Tuesday, March 18, 2008

New evidence about random binding of TFs

From: Nature Reviews Genetics 8, 413-423 (June 2007)
As well as providing a genome-wide glimpse of how gene expression is regulated in cis, this study contains a caution against using ChIP data without further evidence of functionality. Importantly, it is now also clear that CRM occupancy must be thought of in quantitative terms, rather than as an on–off state, to fully explore the function of these sites.

Original research paper discussed in this comment: 18271625

Monday, June 18, 2007

Calculating PWM similarity

Overview:
Most methods score the similarity of each pair of aligned columns and then sum the column scores.

For each column, people have used the following measures:
1. Pearson correlation coefficient [8871566, 10698627, 15735639]
2. Kullback-Leibler (K-L) distance [15980506 (with web tool), 12015892, 14534164]
3. Euclidean distance [14985506]
4. Chi-square test [15319260 (dedicated paper)]
5. Fisher's Exact test [15319260]
6. Average log likelihood ratio [14668220]
7. Sandelin-Wasserman (SW) similarity [15066426]

Shobhit Gupta et al. proposed using a p-value to measure column similarity [17324271]. The p-value can be computed from any of the seven measures above.
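As a rough illustration of the "score each column, then sum" idea, here is a minimal base-R sketch for three of the measures above. It assumes the two PWMs have the same width and are already aligned (no offset search, no reverse complement). The matrices `pwm1` and `pwm2`, the function names, and the symmetrised form of the K-L distance with a small pseudocount are my own choices for illustration, not taken from any of the cited papers.

```r
# Hypothetical 4 x 3 PWMs: rows are A, C, G, T; each column sums to 1.
pwm1 <- matrix(c(0.7, 0.1, 0.1, 0.1,
                 0.1, 0.7, 0.1, 0.1,
                 0.1, 0.1, 0.2, 0.6),
               nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
pwm2 <- matrix(c(0.6, 0.2, 0.1, 0.1,
                 0.2, 0.5, 0.2, 0.1,
                 0.1, 0.1, 0.1, 0.7),
               nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# Per-column measures. Pearson is a similarity (higher = more alike);
# Euclidean and K-L are distances (lower = more alike).
# Note: Pearson is undefined for a perfectly uniform column (zero variance).
col_pearson <- function(x, y) cor(x, y)
col_euclid  <- function(x, y) sqrt(sum((x - y)^2))
col_kl      <- function(x, y, eps = 1e-6) {
  # Assumption: symmetrised K-L with a pseudocount to avoid log(0).
  x <- (x + eps) / sum(x + eps)
  y <- (y + eps) / sum(y + eps)
  0.5 * (sum(x * log(x / y)) + sum(y * log(y / x)))
}

# Sum the per-column scores over all aligned columns.
pwm_score <- function(a, b, col_fun) {
  stopifnot(all(dim(a) == dim(b)))
  sum(vapply(seq_len(ncol(a)),
             function(j) col_fun(a[, j], b[, j]),
             numeric(1)))
}

pearson_total <- pwm_score(pwm1, pwm2, col_pearson)
euclid_total  <- pwm_score(pwm1, pwm2, col_euclid)
kl_total      <- pwm_score(pwm1, pwm2, col_kl)
```

A real comparison would also slide one matrix along the other and keep the best-scoring offset; that step is left out here to keep the sketch short.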

Sunday, March 11, 2007

The debates about PWM

A few papers: 17218526, 11861919, 11410653, 12384591.

Thursday, February 08, 2007

How does a SNP work?

1. In the coding region of a gene, if the SNP results in a non-synonymous mutation, it changes the translated protein sequence and can change the protein's structure and function.
2. In the coding region of a gene, if the SNP is a synonymous mutation, the translated protein sequence is not changed. Interestingly, the protein structure may still change [17185560].

3. In a promoter region, the SNP can change the expression pattern of the downstream gene [17053109, 17053108].
4. In intron, ???
5. In other region, ???
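For the two coding-region cases, the distinction comes down to whether the mutated codon still encodes the same amino acid. Below is a small base-R sketch of that check using the standard genetic code. The function name `classify_coding_snp` and its arguments are hypothetical, made up for this note. It only compares amino acids, so it says nothing about the structural effect of synonymous changes mentioned in item 2.

```r
# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
bases <- c("T", "C", "A", "G")
grid  <- expand.grid(third = bases, second = bases, first = bases,
                     stringsAsFactors = FALSE)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  "")[[1]]
genetic_code <- setNames(aa, paste0(grid$first, grid$second, grid$third))

# Hypothetical helper: classify a single-base change inside one codon.
# codon: reference codon (DNA, coding strand); pos: 1..3; alt: new base.
classify_coding_snp <- function(codon, pos, alt) {
  codon <- toupper(codon)
  alt   <- toupper(alt)
  stopifnot(nchar(codon) == 3, pos %in% 1:3, alt %in% bases)
  mutated <- codon
  substr(mutated, pos, pos) <- alt
  ref_aa <- genetic_code[[codon]]
  alt_aa <- genetic_code[[mutated]]
  list(ref_aa = ref_aa,
       alt_aa = alt_aa,
       type   = if (ref_aa == alt_aa) "synonymous" else "non-synonymous")
}

# GAG (Glu) -> GTG (Val): the amino acid changes.
res_nonsyn <- classify_coding_snp("GAG", 2, "T")
# GAA (Glu) -> GAG (Glu): the amino acid stays the same.
res_syn <- classify_coding_snp("GAA", 3, "G")
```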

Thursday, February 01, 2007

Standard procedure in R for pre-processing of Affy .CEL file

Modified from a post by Soumyaroop Bhattacharya (sbhattacharya@rics.bwh.harvard.edu) at http://lungtranscriptome.bwh.harvard.edu/Microarray%20Data%20Analysis%20using%20Bioconductor.pdf .

Using affy package
Save all the .CEL files for the analysis, together with the corresponding .CDF file for the array type, in one directory. If you are connected to the internet, R will fetch the .CDF file itself. Run R from that directory.
In R, write:
  • library (affy)
  • Data <- ReadAffy() # Reads in Data
Then compute expression intensity values in one of the following ways:

1. RMA
  • eset <- rma(Data)
OR
  • eset <- expresso(Data, normalize.method="quantiles", bgcorrect.method="rma", pmcorrect.method="pmonly", summary.method="medianpolish") # "quantiles" matches rma(); "quantiles.robust" is the robust variant
  • write.exprs(eset, file="myDataRMA.txt")
2. DChip
  • eset <- expresso(Data, normalize.method="invariantset", bg.correct=FALSE, pmcorrect.method="pmonly", summary.method="liwong")
  • write.exprs(eset, file="myDataDChip.txt")

3. MAS 5.0 Expression Intensities (http://www.affymetrix.com/analysis)
  • eset <- mas5(Data)
OR
  • eset <- expresso(Data, normalize=FALSE, bgcorrect.method="mas", pmcorrect.method="mas", summary.method="mas")
  • write.exprs(eset, file="myDataMAS5.txt")
  • Calls <- mas5calls(Data)
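
To get a feel for what the quantile normalization step does to the arrays, here is a toy base-R version run on a small made-up matrix. This is only a sketch: the function `quantile_normalize` and the matrix `x` are hypothetical, it needs at least two rows, it does no NA handling, and it breaks ties by order of appearance, so it will not exactly reproduce what the affy package computes on real data.

```r
# Toy quantile normalization: force every column (array) to share the
# same distribution, namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  # x: numeric matrix, rows = probes, columns = arrays
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  ref    <- rowMeans(sorted)                          # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks back to ref
  dimnames(out) <- dimnames(x)
  out
}

# Made-up intensities for 4 probes on 3 arrays.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4)
x_norm <- quantile_normalize(x)
```

After this step every column of `x_norm` contains the same set of values, and only their order across probes differs between arrays.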

Friday, January 12, 2007

A case where TF concentration plays a role

The Dorsal nuclear gradient establishes the territories of the prospective mesoderm, neuroectoderm, and dorsal ectoderm by activating or repressing zygotic gene expression in a concentration-dependent manner. [16908844]

Tuesday, December 26, 2006

Papers on siRNA repressing translation instead of cleaving mRNA

12600936
12902540