Introduction to epigenetics: chromatin modifications, DNA methylation and the CpG Island landscape (part 2) Héctor Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)
How do we measure DNA methylation?
Microarray Data
One question… Where do we measure? At least 7 arrays are needed to measure the entire genome. CpGs are depleted. The remaining CpGs cluster.
CpG Islands
But variation seen outside
McRBC: No Methylation. Cuts at AmCG or GmCG (mC = methylated cytosine). Input
McRBC Methylation
McRBC after GEL Methylation
Now unmethylated No Methylation
McRBC after Gel No Methylation
Gene Expression Normalization does not work well here
We use control probes
There are also waves
Smoothing
McRBC on tiling two-channel array: we smooth
Proportion of neighboring CpG also methylated/not methylated
True signal (simulated)
Observed data
Observed data and true signal
What is methylated (above 50%)?
Naïve approach
Many false positives (FP)
Smooth
No FP, but one false negative
Smooth less? No FN, lots of FP
We prefer this!
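The trade-off on the preceding slides (naïve thresholding versus smoothing before thresholding) can be sketched with a toy signal. Everything here is an assumption made for illustration: the block-shaped true signal, the deterministic alternating "noise" standing in for measurement error, the moving-average smoother, and the window width of 5. It is not the smoother used to produce the slides' plots.

```python
def moving_average(values, window):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def count_errors(estimate, truth, cutoff=0.5):
    """Return (false positives, false negatives) for 'methylated' calls."""
    fp = sum(e > cutoff and t <= cutoff for e, t in zip(estimate, truth))
    fn = sum(e <= cutoff and t > cutoff for e, t in zip(estimate, truth))
    return fp, fn


# Hypothetical true signal: unmethylated, methylated, unmethylated blocks.
truth = [0.3] * 20 + [0.7] * 20 + [0.3] * 20
# Alternating +/-0.25 offsets play the role of measurement noise.
observed = [t + (0.25 if i % 2 == 0 else -0.25) for i, t in enumerate(truth)]

naive_errors = count_errors(observed, truth)
smooth_errors = count_errors(moving_average(observed, 5), truth)
print("naive  (FP, FN):", naive_errors)
print("smooth (FP, FN):", smooth_errors)
```

Changing the window argument moves errors between the two columns, which is the "smooth less?" question on the slides.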
CHARM DMR for three tissues (five replicates) Irizarry et al, Nature Genetics 2009
Some findings [Irizarry et al., 2009, Nat. Genetics]
Tissue easily distinguished
Cancer DMR
Many Regions like this Note: hypo and hyper methylation
Both hyper and hypo methylated
Cancer and Tissue DMRs coincide
DMR enriched in Shores
Still affects expression T-DMRs
Still affects expression C-DMRs
USING SEQUENCING (BS-SEQ)
[Figure: the double-stranded sequence TTCGATTACGATTCGATTACGA / AAGCTAATGCTAAGCTAATGCT, with a CH3 (methyl) mark, shown for Liver and Brain]
[Figure: the same double-stranded sequence repeated across several copies, one carrying a CH3 mark; 85% Methylation at chr3:44,031,616-44,031,626]
Bisulfite Treatment
Before treatment: GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT
After treatment: GGGGAGCAGTATGGAGGAGTTTTCGGTTGATT
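Bisulfite treatment converts unmethylated C to a base that is read as T, while protected (methylated) C stays C. A minimal sketch of that rule follows. The function name and the `protected` argument are illustrative. In the slide's pair of sequences two Cs are left unconverted (0-based positions 6 and 23), so the example treats those two positions as protected; that is read off the slide, not a statement about the biology of this locus.

```python
def bisulfite_convert(seq, protected=()):
    """Read every C as T unless its 0-based position is in `protected`."""
    protected = set(protected)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


original = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"
# Positions 6 and 23 are the Cs that survive in the slide's treated sequence.
treated = bisulfite_convert(original, protected={6, 23})
print(original)
print(treated)
```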
BS-seq [Figure: pileup of bisulfite reads over a reference CpG; a read showing C at the CpG counts as methylation evidence, a read showing T counts as unmethylated] Coverage: 13 Methylation Evidence: 13 Methylation Percentage: 100%
BS-seq [Figure: same pileup, with some reads showing T at the CpG] Coverage: 13 Methylation Evidence: 9 Methylation Percentage: 69%
BS-seq [Figure: same pileup, with most reads showing T at the CpG] Coverage: 13 Methylation Evidence: 4 Methylation Percentage: 31%
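The three summaries above come from tallying the base each read shows at the CpG cytosine. A minimal sketch, assuming the base calls at one position have already been extracted into a string and that only C and T calls count toward coverage (both are assumptions; the function name is illustrative):

```python
def methylation_summary(calls):
    """Return (coverage, methylation evidence, rounded percentage).

    `calls` holds one base call per read at a CpG cytosine:
    C = still C after bisulfite (methylated), T = converted (unmethylated).
    Other characters such as N are ignored.
    """
    informative = [c for c in calls.upper() if c in "CT"]
    coverage = len(informative)
    evidence = informative.count("C")
    percent = round(100 * evidence / coverage) if coverage else None
    return coverage, evidence, percent


for calls in ("C" * 13, "C" * 9 + "T" * 4, "C" * 4 + "T" * 9):
    print(methylation_summary(calls))
```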
BS-seq
Alignment is much trickier:
– Naïve strategy: do nothing, and hope there are not many CpGs in a single read
– Smarter strategy: “bisulfite convert” the reference: turn all Cs to Ts. This also needs to be done on the reverse-complement reference and on the reads
– Smartest strategy: be unbiased and try all combinations of methylated/un-methylated CpGs in each read. Computationally expensive (see Hansen et al, 2011, for a strategy)
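The "smarter strategy" can be sketched as follows: convert every C to T in the read, in the reference, and in the reverse-complement reference, then match in the converted alphabet. This is a toy under stated assumptions: exact substring matching stands in for a real aligner, the function names are illustrative, and only C-to-T converted reads are handled.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion: read every C as T."""
    return seq.upper().replace("C", "T")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_bs_read(read, ref):
    """Return (strand, 0-based start) hits of a bisulfite read.

    Positions on the '-' strand are offsets into the reverse complement.
    """
    hits = []
    conv_read = c_to_t(read)
    for strand, template in (("+", ref), ("-", revcomp(ref))):
        conv_ref = c_to_t(template)
        start = conv_ref.find(conv_read)
        while start != -1:
            hits.append((strand, start))
            start = conv_ref.find(conv_read, start + 1)
    return hits


ref = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"
read = "GGAGTTTTCGGTTGA"  # fragment of a bisulfite-treated copy of ref
print(read in ref)               # the naive exact match fails
print(find_bs_read(read, ref))   # the converted match succeeds
```

Matching in the converted alphabet ignores methylation state during alignment; the methylation call is made afterwards by comparing the original read with the original reference.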
BS-seq
There are similarities to SNP calling (we’ll see this in a couple of weeks), EXCEPT: we want to measure percentages
– Use a binomial model to estimate p, the percentage of methylation
– Allow for sequencing errors, coverage differences, etc.
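Under the binomial model, a site with k methylated reads out of n has the maximum-likelihood estimate p = k/n, and coverage determines how much that estimate can be trusted. The sketch below adds a Wilson score interval as one possible way to express that uncertainty; the interval choice and the function name are assumptions, and sequencing errors are not modeled.

```python
import math


def methylation_estimate(meth_reads, coverage, z=1.96):
    """Binomial MLE of the methylation fraction with a Wilson score interval.

    Returns (p_hat, lower, upper); z=1.96 gives roughly 95% coverage.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    p_hat = meth_reads / coverage
    denom = 1 + z * z / coverage
    center = (p_hat + z * z / (2 * coverage)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / coverage + z * z / (4 * coverage ** 2)
    ) / denom
    return p_hat, max(0.0, center - half), min(1.0, center + half)


# Same observed fraction, ten times the coverage: the interval narrows.
print(methylation_estimate(9, 13))
print(methylation_estimate(90, 130))
```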
Measuring DNA Methylation
Estimating percentages: use a “local-likelihood” method
– Based on loess
(Plot courtesy of Kasper Hansen)
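A simplified illustration of the local-likelihood idea: at a target position, maximize a kernel-weighted binomial log-likelihood over nearby CpGs. With a constant local model the maximizer is the weighted ratio of methylated reads to coverage. This is a sketch under assumptions (local-constant fit, tricube weights borrowed from loess, made-up counts and names) and not the procedure behind the plot, which may fit local polynomials.

```python
def tricube(u):
    """Tricube kernel, as used for loess weights; zero outside |u| < 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_methylation(positions, meth, cov, x0, bandwidth):
    """Local-constant binomial likelihood estimate at position x0.

    Maximizing sum_i w_i * [k_i*log(p) + (n_i - k_i)*log(1 - p)]
    gives p = sum(w_i * k_i) / sum(w_i * n_i).
    """
    num = den = 0.0
    for x, k, n in zip(positions, meth, cov):
        w = tricube((x - x0) / bandwidth)
        num += w * k
        den += w * n
    return num / den if den > 0 else float("nan")


# Hypothetical CpG positions, methylated read counts, and coverage.
positions = [100, 120, 150, 400]
meth = [9, 10, 2, 0]
cov = [10, 10, 4, 10]
print(local_methylation(positions, meth, cov, x0=120, bandwidth=50))
```

High-coverage CpGs contribute more to the estimate than low-coverage ones, which is the practical advantage over smoothing the raw per-site percentages.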
BS-seq Lister et al. 2009, Nature
Gene Expression Regulation: DNA methylation in promoter regions Lister et al. 2009, Nature
DNA methylation patterns within genomic regions Lister et al. 2009
Putting it together
What were we after?
The epigenetic progenitor origin of human cancer [Feinberg, et al., Nature Reviews Genetics, 2006]
Stochastic epigenetic variation as driving force of disease [Feinberg & Irizarry, PNAS, 2009]
– Phenotypic variation, perhaps epigenetically mediated, increases disease susceptibility
– Increased epigenetic and gene expression variability of specific genes/regions is a defining characteristic of cancer
What did we do? Custom Illumina methylation microarray Confirmed increased epigenetic variability in specific regions across five cancer types
What did we do? Custom Illumina methylation microarray Confirmed increased epigenetic variability in specific regions across five cancer types Confirmed same sites are involved in tissue differentiation
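One simple way to quantify "increased epigenetic variability" at a site is the ratio of across-sample variances in tumor versus normal tissue. The sketch below is an assumption about how such a comparison could be set up, with made-up site names and values; the slides do not state which statistic was used.

```python
from statistics import variance


def variance_ratios(tumor, normal):
    """Per-site ratio of tumor to normal sample variance.

    `tumor` and `normal` map a site name to a list of methylation values.
    Sites missing from either group, or with zero normal variance, are skipped.
    """
    ratios = {}
    for site, tumor_values in tumor.items():
        if site not in normal:
            continue
        normal_var = variance(normal[site])
        if normal_var > 0:
            ratios[site] = variance(tumor_values) / normal_var
    return ratios


# Hypothetical methylation fractions for two sites, four samples per group.
tumor = {"site_a": [0.1, 0.5, 0.9, 0.3], "site_b": [0.48, 0.52, 0.5, 0.5]}
normal = {"site_a": [0.48, 0.5, 0.52, 0.5], "site_b": [0.47, 0.5, 0.53, 0.5]}
ratios = variance_ratios(tumor, normal)
print(ratios)
```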
What did we do? Custom Illumina methylation microarray Whole genome sequencing of bisulfite treated DNA – Found large blocks of hypo-methylation (sometimes Mbps long) in colon cancer
What did we do? Custom Illumina methylation microarray Whole genome sequencing of bisulfite treated DNA – Found large blocks of hypo-methylation (sometimes Mbps long) in colon cancer – These regions coincide with hyper-variable regions across cancer types
What did we do? Custom Illumina methylation microarray Whole genome sequencing of bisulfite treated DNA Gene Expression Analysis
Gene Expression Data
When using multiple microarray experiments, proper normalization is key [McCall, et al., Biostatistics 2010]
Normalization is key
– fRMA: a single-chip normalization procedure
– GNUSE: a single-chip quality metric
– Barcode: a single-chip common-scale measurement
What did we do? Custom Illumina methylation microarray Whole genome sequencing of bisulfite treated DNA Gene Expression Analysis – Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks [Corrada Bravo, et al., under review]
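An enrichment statement of this kind ("hyper-variable genes are enriched in hypo-methylation blocks") is commonly checked with a hypergeometric tail probability. The sketch below shows that calculation with made-up counts; the test choice, the function name, and the numbers are assumptions, not the analysis in the cited manuscript.

```python
from math import comb


def hypergeom_enrichment_p(total, in_blocks, selected, selected_in_blocks):
    """P(X >= selected_in_blocks) when `selected` genes are drawn at random.

    total: all genes; in_blocks: genes inside hypo-methylation blocks;
    selected: hyper-variable genes; selected_in_blocks: their overlap.
    """
    upper = min(in_blocks, selected)
    tail = sum(
        comb(in_blocks, k) * comb(total - in_blocks, selected - k)
        for k in range(selected_in_blocks, upper + 1)
    )
    return tail / comb(total, selected)


# Hypothetical counts: 20 genes, 10 in blocks, 5 hyper-variable, all 5 in blocks.
print(hypergeom_enrichment_p(20, 10, 5, 5))
```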
What are we doing next? Custom Illumina methylation microarray Whole genome sequencing of bisulfite treated DNA Gene Expression Analysis – Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks
Bigger gene expression study
– 7,741 HGU133plus2 samples
– 598 normal tissue samples, 4,886 tumor samples
– 176 different tissue types
– 175 different GEO studies
Bigger gene expression study [Corrada Bravo, et al., under review]
What are we doing next? Custom Illumina methylation microarray Whole genome sequencing of bisulfite treated DNA Gene Expression Analysis – Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks – Tissue-specific genes have hyper-variable gene expression across cancer types [Corrada Bravo, et al., under review]