Computational Approaches in Epigenomics. Guo-Cheng Yuan, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health. BIO506, Jan 11th, 2010
Definition: Epigenetics refers to changes in phenotype (appearance) or gene expression caused by mechanisms other than changes in the underlying DNA sequence. (Wikipedia)
Epigenetic mechanisms: nucleosome positions, histone modification, DNA methylation
Chromatin: DNA is packaged into chromatin. The nucleosome is the fundamental unit of chromatin; it wraps 146 bp of DNA. Chromatin structure is hierarchical. (Felsenfeld and Groudine 2003)
Nucleosome and histone modification: The first layer of chromatin structure looks like “beads on a string”. A nucleosome is made of core histone proteins. Amino acids on the N-terminus of histones can be covalently modified. (Felsenfeld and Groudine 2003)
DNA methylation: DNA methylation normally occurs only at CpG dinucleotides and can be inherited during cell division. (Alberts et al., Molecular Biology of the Cell)
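Since methylation is largely confined to CpG sites, a common first computational step is to count them in a sequence. The sketch below is illustrative only: the function name `cpg_stats` and the example sequence are hypothetical, and the observed/expected ratio is a widely used convention that does not come from the slides.

```python
def cpg_stats(seq: str) -> dict:
    """Count CpG dinucleotides and compute a simple observed/expected ratio."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cpg = s.count("CG")
    # Expected CpG count if C and G were placed independently.
    expected = (c * g) / n if n else 0.0
    return {
        "cpg": cpg,
        "gc_fraction": (c + g) / n if n else 0.0,
        "obs_exp": cpg / expected if expected else 0.0,
    }


stats = cpg_stats("ACGTCGCG")  # hypothetical toy sequence
print(stats)
```

A ratio well above what is typical for the genome marks a CpG-rich region; any threshold for calling such regions would have to be chosen for the data at hand.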
Why do we care? Epigenetics is an extra layer of transcriptional control. Epigenetics plays an important role in development. Epigenetic mechanisms can cause cancer and other diseases. Epigenetic patterns are reversible and can be influenced by the environment.
Our goals: characterize cell-type specific epigenetic states; elucidate epigenetic targeting mechanisms; understand epigenetic regulation in cell differentiation; find epigenetic signatures of diseases. Diagram labels: epigenomic data, microarray, DNA sequence, TF binding, …, computational model.
Chromatin domains: intrachromosomal interactions, large-scale histone modification patterns, chromatin loops
A hidden Markov model for prediction of multi-gene chromatin domains (Jessica Larson)
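To make the idea concrete, here is a minimal sketch of how a two-state hidden Markov model could segment a per-gene signal into domains using Viterbi decoding. It is not the model from the talk: the state names, Gaussian emissions, transition probabilities, and the toy signal are all assumptions chosen for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for a sequence of observations."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit(s, obs[0]) for s in states}]
    back = []
    for x in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit(s, x)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def gaussian_log_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


# Hypothetical parameters: genes inside a domain have an elevated signal.
STATES = ("domain", "background")
MEANS = {"domain": 2.0, "background": 0.0}
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "domain": {"domain": math.log(0.9), "background": math.log(0.1)},
    "background": {"domain": math.log(0.1), "background": math.log(0.9)},
}


def log_emit(state, x):
    return gaussian_log_pdf(x, MEANS[state], 1.0)


# Toy signal for nine consecutive genes along a chromosome (made up).
signal = [0.1, -0.3, 0.2, 2.1, 1.8, 2.4, 1.9, 0.0, -0.2]
path = viterbi(signal, STATES, LOG_START, LOG_TRANS, log_emit)
print(path)
```

The high self-transition probability is what makes neighbouring genes tend to share a state, so the decoded path forms contiguous multi-gene blocks rather than isolated calls.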
Prediction results
Targeting mechanism for epigenetic factors: nucleosome positions, histone modification pattern
An N-score model to predict nucleosome positions (Yuan and Liu). Figure labels: dinucleotide frequency signal, wavelet basis, signal decomposition, wavelet energy (E1, E2, E3).
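The figure labels suggest turning a dinucleotide signal into wavelet energies at several scales. A minimal sketch of that kind of feature extraction follows. The Haar basis, the choice of dinucleotides, and the function names are assumptions made here for illustration; the slide does not specify them, and this is not the published N-score code.

```python
import math


def dinucleotide_signal(seq, dinucs=("AA", "TT", "TA")):
    """1.0 where one of the listed dinucleotides starts at that position, else 0.0."""
    s = seq.upper()
    return [1.0 if s[i:i + 2] in dinucs else 0.0 for i in range(len(s) - 1)]


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail).

    A trailing unpaired sample is dropped when the length is odd.
    """
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]
    return approx, detail


def wavelet_energies(signal, levels=3):
    """Sum of squared detail coefficients at each of `levels` scales (E1, E2, ...)."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies


seq = "AATTATGCGCAATTTAAGCGC"  # hypothetical toy sequence
energies = wavelet_energies(dinucleotide_signal(seq), levels=3)
print(energies)
```

Energies like these could then serve as inputs to a classifier that separates nucleosome-bound from nucleosome-free sequence; which classifier and which scales matter is left open here.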
N-score prediction in two yeast species Lanterman et al.
Polycomb targets developmental genes in ES cells (Boyer et al.; Kim et al. 2008). Figure labels: Polycomb, Oct4, Nanog, Sox2; expressed, repressed.
A computational model: BART. BART (Bayesian Additive Regression Trees) is a Bayesian sum-of-regression-trees model (Chipman et al. 2007). Figure labels: Motif A, Motif B, Motif C, each with No/Yes branches.
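The sketch below illustrates only the prediction side of a sum-of-trees model: each small tree splits on whether a motif is present, and the tree outputs are added and passed through a probit link to give a probability. It does not fit anything and is not BART's Bayesian sampling procedure; the trees, leaf values, and motif names are made up for illustration.

```python
import math


# A tree is either a leaf value (float) or a tuple
# (motif, subtree_if_absent, subtree_if_present).
def tree_predict(tree, motifs_present):
    """Walk one tree using yes/no questions about motif presence."""
    while not isinstance(tree, float):
        motif, if_no, if_yes = tree
        tree = if_yes if motif in motifs_present else if_no
    return tree


def sum_of_trees(trees, motifs_present):
    """Add up the leaf values reached in every tree."""
    return sum(tree_predict(t, motifs_present) for t in trees)


def probit(z):
    """Standard normal CDF, mapping a real-valued score to a probability."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical trees with made-up leaf values.
TREES = [
    ("motif_A", -0.4, ("motif_B", 0.1, 0.6)),
    ("motif_C", -0.2, 0.3),
]

score = sum_of_trees(TREES, {"motif_A", "motif_B"})
probability = probit(score)
print(score, probability)
```

In an actual BART fit, many such trees are sampled from a posterior and the predictions are averaged over the samples; here a single fixed set of trees stands in for one such sample.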
Overall prediction accuracy: AUC = 0.82. Figure labels: ROC on testing data (all factors, 5 factors, CpG, random); axes: number of cell types in which the gene is targeted, propensity score. (Spring Liu; Zhen Shao)
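For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen target gene receives a higher score than a randomly chosen non-target gene. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the reported 0.82.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = targeted gene, 0 = non-targeted gene (made-up scores).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

A random predictor scores about 0.5 on this measure and a perfect ranking scores 1.0, which is the scale on which the "random" curve and the other curves in the figure would be compared.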
An integrated network (Jess Mar). Figure labels: TF network + Polycomb; Hox, Dnmt1; cell-type A, cell-type B.
Future directions: How do genetic and epigenetic factors work together to regulate cell-type specific gene expression? How does the integrated regulatory network change across cell types? Are there epigenetic signatures associated with common diseases and, if so, what role do they play?
Acknowledgment: Jessica Larson; Yingchun (Spring) Liu; Zhen Shao; John Quackenbush Lab (Jess Mar); Stuart Orkin Lab (Xiaohua Shen, Jongwan Kim); Steve Altschuler; Ollie Rando; Jun Liu; Claudia Adams Barr Program