Epigenetics 12/05/07 Statisticians like data. Don’t emphasize method too much; it is not to your advantage. Don’t exaggerate. Speak more clearly. In the next slide, explain epigenetics.
Epigenetic regulation is critical for cell differentiation Epithelial cell (right); liver cell (left)
Gene imprinting
More examples of epigenetic regulation
Epigenetic mechanisms DNA methylation Histone modification Nucleosome positions
DNA methylation Alberts et al. Molecular Biology of the Cell
Methylated genes are silenced
Probable mechanisms for DNA methylation-induced silencing The DNA methylation marker directly interferes with TF binding. The DNA methylation marker is recognized by proteins that cause chromatin structure changes.
DNA in the nucleus is complexed with histones to form nucleosomes [Figure: levels of DNA packaging, with scale labels 1 bp (0.3 nm), 11 nm, 30 nm, 10,000 nm] The nucleosome is the fundamental repeating unit in chromatin. Mention linker DNA. Say that its length is variable. Keep it real short! Don’t say everything in the figure.
Histone modification Acetyl Ubiquityl Methyl Phosphoryl Luger et al. Nature, (1997) Histone tails can be covalently modified in multiple ways at multiple sites Felsenfeld and Groudine, Nature, (2003)
How histone modification is inherited Histone methylation marks may be inherited by local concentration. The exact mechanism for inheritance is unknown. Whether histone modification is inherited at all has not been proven.
Transcriptional regulation by chromatin Nucleosome positioning; histone modification [Figure: TFs and a TF target site on chromatin]
DNA methylation histone modification chromatin H3K9me3 HP1 H3K9me3 H4K16ac
Epigenetic reprogramming during development Methylation marks are erased during cleavage. Methylation of the paternal genome is actively stripped within hours of fertilization. Methylation of the maternal genome is passively erased at a slower rate. De novo methylation occurs after implantation. Another round of demethylation occurs during differentiation. DNA methylation is essential for development.
Epigenetic reprogramming can reverse tumorigenesis Figure 1. Two-step cloning procedure to produce mice from cancer cells. Different tumor cells were used as donors for nuclear transfer into enucleated oocytes. Resultant blastocysts were explanted in culture to produce ES cell lines. The tumorigenic and differentiation potential of these ES cells was assayed in vitro by inducing teratomas in SCID mice (1), and in vivo by injecting cells into diploid (2) or tetraploid (3) blastocysts to generate chimeras and entirely ES-cell-derived mice, respectively. Hochedlinger et al. Genes & Dev, (2004)
Cancer and histone modification Chin, Nature (1998)
Cancer and chromatin BRG1, the motor component of the SWI/SNF chromatin remodeling complex, is mutated in multiple cell lines (Wong et al. 2000): prostate DU145; lung A-427; prostate TSU-Pr-1; lung NCI-H1299; breast ALAB; pancreas Hs 700T … suggesting BRG1 may be a tumor suppressor protein
Genomic view of epigenetic regulation How can genome-wide patterns of epigenetic markers be detected? How do epigenetic factors regulate genome-wide gene expression? How is the genome-wide distribution of epigenetic markers regulated?
Log (mononuc/genomic)
1. Tile microarray: 20 bp offset, 50-mers; Chr III + 233 promoters
2. Hybridize mononucleosomal DNA vs naked genomic DNA
3. Compute Log (mononuc/genomic)
Green stuff doesn’t have linker DNA. Resolution is 20 bp. Nucleosome signals span multiple probes. Midlog phase yeast; mononucleosomal DNA is purified by MNase. Don’t say I didn’t do experiments. We first filter out promoters containing highly repetitive sequences. Then ~100 promoters are randomly chosen. ~100 promoters correspond to cell-cycle genes.
Q: How to filter repetitive sequences? A: Highly repetitive sequences are not tiled. 5 or more contiguous probes with perfect matches. 30 contigs.
Q: What kind of arrays? A: Pat Brown arrays. Glass. 25,000 probes.
Yuan et al., Science, (2005)
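Step 3 above can be illustrated with a minimal sketch. It assumes per-probe intensities for the two channels are already available as plain lists; the function name `log2_ratios`, the `floor` guard against zero intensities, and the toy values are all hypothetical and are not taken from Yuan et al. Base 2 is used because a later slide labels the axis as a Log2 ratio.

```python
import math

def log2_ratios(mononuc, genomic, floor=1.0):
    """Per-probe log2(mononucleosomal / genomic) intensity ratio.

    `floor` is a hypothetical lower bound on intensities so that a zero
    reading never reaches the logarithm.
    """
    if len(mononuc) != len(genomic):
        raise ValueError("both channels must cover the same probes")
    return [
        math.log2(max(m, floor) / max(g, floor))
        for m, g in zip(mononuc, genomic)
    ]

# Made-up intensities for five consecutive probes (20 bp apart).
mononuc = [800.0, 1600.0, 1600.0, 400.0, 200.0]
genomic = [800.0, 800.0, 800.0, 800.0, 800.0]
ratios = log2_ratios(mononuc, genomic)
print(ratios)  # [0.0, 1.0, 1.0, -1.0, -2.0]
```

Positive values mark probes enriched in the mononucleosomal channel; negative values mark probes that are depleted, as expected for linker DNA.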
Nucleosome positioning in yeast [Figure panels: MFA2, HIS3, CHA1, MATa, centromere; tracks: predicted positioned nucs, literature positioned nucs, fuzzy nucs] Fuzzy nucleosomes are real. Here is what they look like in our data. MFA2 (Watson) is the mating pheromone a-factor, made by a cells. HIS3 (Watson) catalyzes the sixth step in histidine biosynthesis; transcription is regulated by Gcn4p. CHA1 (Crick) catalyzes the degradation of both L-serine and L-threonine; required to use serine or threonine as the sole nitrogen source. Yuan et al., Science, (2005)
Stereotyped pattern Average signal (aligned by ATG codon) shows a regular pattern. [Figure: Log2 ratio vs. distance to ATG, with 95% CI] You might expect that nucleosome positions at different promoters all look different. But look: nucleosome positioning has a common pattern, suggesting there may be a basic principle underlying nucleosome positioning. Show alignment wrt NFRs. Inter-nucleosome distance 160~170 bp. Predict the length of 5’ UTR. Yuan et al., Science, (2005)
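The ATG-aligned average with a 95% CI can be sketched as follows. This is only an illustration: it assumes each promoter's log2-ratio profile has already been cut to the same window around its ATG, and it uses a normal-approximation interval (mean ± 1.96 standard errors), which may differ from how the interval on the slide was computed. The function name `aligned_average` and the toy profiles are hypothetical.

```python
import statistics

def aligned_average(profiles, z=1.96):
    """Average ATG-aligned profiles position by position.

    profiles: list of equal-length lists; index k in every list is the same
    distance from the ATG codon. Returns one (mean, lower, upper) tuple per
    position, where the bounds are mean -/+ z * standard error.
    """
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two profiles to estimate a CI")
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("profiles must all have the same length")
    summary = []
    for column in zip(*profiles):  # one column per aligned position
        mean = statistics.mean(column)
        sem = statistics.stdev(column) / n ** 0.5
        summary.append((mean, mean - z * sem, mean + z * sem))
    return summary

# Made-up log2 ratios for three promoters at three aligned positions.
profiles = [
    [1.0, -1.0, 0.5],
    [0.8, -1.2, 0.7],
    [1.2, -0.8, 0.3],
]
for mean, low, high in aligned_average(profiles):
    print(f"{mean:+.2f}  [{low:+.2f}, {high:+.2f}]")
```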
Transcription factor binding sites (TFBSs) are likely to be nucleosome-depleted TFBSs tend to be nucleosome-depleted. Motif sites that are unbound in our condition but bound in other conditions also tend to be nucleosome-depleted. Motif sites that are always unbound do not show nucleosome depletion. Show one color at a time. Highly transcribed genes tend to be more delocalized in ORF. Q: Why does bound (other) also have a strong signal? A: Maybe nucleosome positioning makes accessible the TFBSs that are used in other conditions as well. Thus it gives the potential for activity, not the activity itself. Yuan et al., Science, (2005)
Histone modification in yeast Liu et al., PLoS Biology, (2005)
Co-regulated histone modifications Liu et al., PLoS Biology, (2005)
Nucleosome positioning in human Ozsolak et al., Nat Biotech, (2007)
Histone modification in human Guenther et al., Cell, (2007)
Distinct histone modification pattern in embryonic stem (ES) cells [Figure: one gene shown in ES cells and in differentiated cell types 1, 2, ..., n] ES cells contain both repressive and active markers. Differentiated cells contain either repressive or active markers but not both. H3K27me3: repressive; H3K4me3: active. Bernstein et al. Cell (2006)
Euchromatin and heterochromatin http://respiratory-research.com
Large-scale chromatin domain Rinn et al. Cell (2007)
Large-scale chromatin domain ENCODE, Nature, 2007
Large-scale chromatin domain Open Closed ENCODE, Nature, 2007
Large-scale chromatin domain Open Closed ENCODE, Nature, 2007
DNA methylation in human Eckhardt et al. Nat Gen. (2007)
DNA-methylation pattern in human Figure 1 Type and distribution of amplicons. In total, we analyzed 2,524 amplicons from six distinct categories: 43.7% 5′-UTRs, 22.5% evolutionary conserved regions (ECR), 14.3% intronic regions, 13.3% exonic regions, 3.6% Sp1 transcription factor binding sites and 2.6% ‘other’. Eckhardt et al. Nat Gen. (2007)
Histone modification Acetyl Ubiquityl Methyl Phosphoryl Luger et al. Nature, (1997) Histone tails can be covalently modified in multiple ways at multiple sites Felsenfeld and Groudine, Nature, (2003)
Histone code hypothesis “… multiple histone modifications, acting in a combinatorial or sequential fashion on one or multiple histone tails, specify unique downstream functions …” Strahl and Allis, Nature, (2000)
Don’t get into a long discussion of the code. Simply, different combinations can have different effects. Don’t get into details of Dion’s experiment. Simply, mutagenesis suggests that the code is probably much simpler. H4-lysine acetylation seems to be cumulative.
A remarkable hypothesis proposed by Strahl and Allis is that … But this hypothesis also leads to a dilemma: since the number of possible combinations of histone modifications is overwhelming, how can we possibly decode the histone modifications? On the other hand, there is plenty of evidence that the “histone code” is not as complicated as conjectured. For example, our group mutated H4 tail lysines to arginine, which mimics unacetylatable lysine, in all possible combinations. The overall effect seems to be cumulative rather than combinatorial.
Statistical assessment of the global impact of histone acetylation on gene expression Integrative analysis using multiple genomic data resources (sequence, gene expression, histone modification). Linear regression model: y_i expression; A_ij acetylation; S_i promoter sequence. Key is to estimate sequence-dependent regulatory effects. If the model fits well, it suggests the relationship is not so complicated. Data come from …, expand on sequence part. Yuan et al. Gen Bio (2006)
Estimating sequence-dependent regulation effects Linear regression model with transcription factor binding motifs; S_ij: motif score; linear f(S_i). Scan motifs (MDscan, AlignACE). Filter out insignificant motifs (RSIR). R^2 is about 0.27, reasonably good for this kind of data. Including interaction coefficients increases R^2 by less than 0.01. Repressors have negative coefficients; e.g., RFX1 has a negative coefficient. The effects of the motifs are fitted from data.
Repressor corresponds to negative weights? Say linear model of sequences. Change S_ij to motif scores. Explain. S_ij looks similar to S_i, but they are not the same. Say a few words about Beer-Tavazoie’s motifs. Are they better? One RSIR direction is selected.
Q: What if there is more than one RSIR direction? Would it still help to include the variables corresponding to both directions? A: Yes. RSIR is only an exploratory tool. Andrew Gelman did an experiment: generate data from X^2+y^2=1, and a linear model can fit the data very well. Having more than one RSIR direction can be caused by 1) a nonlinear effect; or 2) a linear effect but an inaccurate SIR direction estimate. In the first case, the variables in the 2nd SIR direction are important factors and should be included in the model. On the other hand, it will be difficult to estimate the full nonlinear effect, so we use the simplified linear model as a proxy. In the second case, the variables selected based on the 1st SIR direction are unreliable. Therefore, using these variables alone may actually ignore some important factors. R-square is about 0.3. Yuan et al. Gen Bio (2006)
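A linear model of this kind can be sketched with ordinary least squares. The sketch below assumes the form y_i = a + sum_j b_j A_ij + sum_j c_j S_ij + e_i, which is one reading of the slide's symbols and not necessarily the exact model of Yuan et al. (2006). Motif scanning and RSIR filtering are not implemented; the data are simulated, and the function name `fit_linear` and all coefficients are hypothetical.

```python
import numpy as np

def fit_linear(y, A, S):
    """Least-squares fit of y_i = a + sum_j b_j A_ij + sum_j c_j S_ij + e_i.

    Returns the coefficient vector (intercept, acetylation terms, motif
    terms, in that order) and the R^2 of the fit.
    """
    X = np.column_stack([np.ones(len(y)), A, S])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Simulated data (not real measurements): 500 genes, 3 acetylation
# sites, 2 motif scores. The first motif acts as a repressor.
rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 3))
S = rng.normal(size=(n, 2))
y = 0.5 + A @ np.array([0.4, 0.3, 0.2]) - 0.6 * S[:, 0] + rng.normal(size=n)

coef, r2 = fit_linear(y, A, S)
print("coefficients:", np.round(coef, 2))
print("R^2:", round(float(r2), 2))
```

In this toy setup the repressor-like motif receives a negative fitted coefficient, which is the sense in which the slide says repressors have negative coefficients.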
Performance of the linear regression model
Performance of the linear regression model
Performance of the linear regression model
Cumulative effect of histone acetylation Test whether including quadratic interactions between different acetylation sites would improve model performance. Data available at three sites. p-values for the quadratic interaction coefficients (g_jk): statistically insignificant. Write out the formula on top. The question is whether including quadratic interaction terms would improve model performance. Coding region acetylation may not be regulatory but serve as a mark. (Don’t discuss unless pressed.)
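One way to frame this test is as a comparison of nested models: an additive model in the acetylation levels versus the same model plus pairwise products A_ij * A_ik with coefficients g_jk. The sketch below computes the R^2 of both models and the standard F statistic for the added terms. It is an illustration on simulated data, not the analysis behind the slide; the sequence terms are omitted for brevity, and the function names are hypothetical.

```python
import itertools
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def pairwise_products(A):
    """Columns A[:, j] * A[:, k] for every pair j < k."""
    pairs = itertools.combinations(range(A.shape[1]), 2)
    return np.column_stack([A[:, j] * A[:, k] for j, k in pairs])

def interaction_f_stat(A, y):
    """Compare the additive model with the additive + interaction model.

    Returns (R^2 additive, R^2 full, F statistic for the added terms).
    """
    n, p = A.shape
    Q = pairwise_products(A)
    q = Q.shape[1]
    r2_add = r_squared(A, y)
    r2_full = r_squared(np.column_stack([A, Q]), y)
    f = ((r2_full - r2_add) / q) / ((1.0 - r2_full) / (n - p - q - 1))
    return r2_add, r2_full, f

# Simulated, purely additive data with three acetylation sites.
rng = np.random.default_rng(1)
n = 400
A = rng.normal(size=(n, 3))
y = A @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n)

r2_add, r2_full, f = interaction_f_stat(A, y)
print(f"R^2 additive={r2_add:.3f}  with interactions={r2_full:.3f}  F={f:.2f}")
```

A p-value follows from the F distribution with (q, n - p - q - 1) degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available. When the effect is cumulative, the gain in R^2 from the interaction terms should be small.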
Reading List
Strahl and Allis 2000: proposed the histone code hypothesis.
Bernstein et al. 2007: an up-to-date review of epigenomics.
Yuan et al. 2005: nucleosome positions in yeast.
Yuan et al. 2006: statistical analysis of histone-related gene expression.