Regulomics I: Methods to read out regulatory functions
Identifying regulatory functions in genomes. Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)
Genes are not just protein-coding sequences. [Figure: expression of gene A in limb, forebrain and neural tube under the control of tissue-specific TFs (Limb TFs, Brain TFs, Neural TFs)]
Regulatory mutations can cause profound phenotypes. Lettice et al. Hum Mol Genet 12:1725 (2003); Sagai et al. Development 132:797 (2005)
Three essential questions. Q1: Where are regulatory elements located in the genome? Q2: What regulatory functions do they encode? Q3: What genes do they control? We will use promoters and enhancers as our examples, but there are other regulatory functions.
Q1: Mapping regulatory elements in genomes. [Figure: browser view of Chr5: 133,876,119 – 134,876,119 with gene and transcription tracks] Regulatory elements (REs) are not easily detected by sequence analysis. Examine biochemical correlates of RE activity in cells/tissues: chromatin immunoprecipitation (ChIP-seq); DNase-seq and FAIRE; methylated DNA immunoprecipitation (MeDIP).
Biochemical indicators of regulatory function: 1. TF binding. 2. Histone modification (H3K27ac, H3K4me3). 3. Chromatin modifiers & coactivators (p300, MLL). 4. DNA looping factors (cohesin).
Methods: ChIP-seq (TFs, histone mods); chromatin accessibility (DNase, FAIRE). From Furey (2012) Nat Rev Genet 13:840
Method I: ChIP-seq. Align reads to reference; use peaks of mapped reads to identify binding events. [Figure: ChIP and input signal tracks with peak call; PCR step]
Calling peaks in ChIP-seq data. ChIP-seq is an enrichment method, so it requires a statistical framework for determining the significance of enrichment. ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control. Input = sonicated chromatin collected prior to immunoprecipitation. [Figure: input track and peak call, showing enrichment relative to control]
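To make "enrichment relative to an input control" concrete, here is a minimal sketch that computes a per-window ChIP/input ratio after scaling the input to the ChIP library size. The function name, the pseudocount and the example counts are illustrative assumptions, not part of any particular peak caller.

```python
def fold_enrichment(chip_counts, input_counts, chip_total, input_total, pseudocount=1.0):
    """Per-window ChIP/input ratio after scaling input to the ChIP library size.

    chip_counts / input_counts: read counts in the same genomic windows.
    chip_total / input_total: total mapped reads in each library.
    pseudocount: assumed smoothing value that avoids division by zero.
    """
    scale = chip_total / input_total  # puts input counts on the ChIP depth
    return [
        (chip + pseudocount) / (inp * scale + pseudocount)
        for chip, inp in zip(chip_counts, input_counts)
    ]


# Hypothetical counts in four windows; the input library is twice as deep.
chip_counts = [3, 4, 60, 5]
input_counts = [6, 8, 10, 10]
ratios = fold_enrichment(chip_counts, input_counts,
                         chip_total=10_000_000, input_total=20_000_000)
```

Only the third window stands out once depth is accounted for. A ratio alone says nothing about significance, which is why a statistical model is still needed.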
There are many ChIP-seq peak callers available. Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)
Generating ChIP-seq peak profiles. Artifacts: repeats, PCR duplicates. From Park (2009) Nat Rev Genet 10:669
Assessing statistical significance. Assume the read distribution follows a Poisson distribution: many sites in input data will have some reads by chance, and some sites will have many reads. Poisson assumption + sequencing depth: # of reads at a site (S). Empirical FDR: call peaks in input (using ChIP as control); FDR = ratio of # of peaks of a given enrichment value called in input vs. ChIP. From Pepke et al (2009) Nat Meth 6:S22
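The two ideas on this slide can be sketched in a few lines: a Poisson tail probability for seeing at least k reads in a window, and an empirical FDR from a sample-swap peak call. The uniform background rate (total reads × window size / genome size) and all numbers in the example are simplifying assumptions for illustration; real peak callers typically use more local background estimates.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def empirical_fdr(chip_peak_scores, input_peak_scores, threshold):
    """Ratio of peaks called in input (ChIP as control) to peaks called in ChIP."""
    n_chip = sum(score >= threshold for score in chip_peak_scores)
    n_input = sum(score >= threshold for score in input_peak_scores)
    return n_input / n_chip if n_chip else float("nan")


# Assumed numbers: 20M reads, 200 bp windows, 3e9 bp genome, uniform background.
lam = 20_000_000 * 200 / 3_000_000_000   # expected reads per window by chance
p_value = poisson_sf(10, lam)            # chance of >= 10 reads in one window

# Hypothetical peak scores from the real call and the swapped call.
fdr = empirical_fdr([10, 20, 30, 40], [5, 12], threshold=10)
```

With these toy scores, one swapped-call peak passes the threshold against four real ones, giving an FDR of 0.25. The direct `1 - cdf` form is fine for a sketch but loses precision for extremely small tail probabilities.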
Assessing statistical significance: sequencing depth matters. Poisson assumption + sequencing depth: # of reads at a site (S). From Park (2009) Nat Rev Genet 10:669
ChIP-seq signal profiles vary depending on factor: transcription factors, Pol II, histone mods. From Park (2009) Nat Rev Genet 10:669
Mapping chromatin accessibility: DNase I, FAIRE. From Furey (2012) Nat Rev Genet 13:840
DNase I hypersensitivity identifies regulatory elements… [Figure: DNase I hypersensitive sites] Case studies: which TFs? Oct4? CTCF? Song et al., Genome Res 21:1757 (2011)
…but needs to be combined with other data to determine what is actually bound, such as TF ChIP… [Figure: DHS signal in GM12878; RNA PolII ChIP in GM12878]
… or motif analysis. DHS sites in human ES cells. From Neph (2012) Nature 489:83
Q2: Making sense of regulatory functions. Compare multiple biological states. Integrate multiple data sources: TF function; histone modification; potential target genes; existing genome annotations.
Regulatory function is dependent on biological context. [Figure: gene A regulated by Brain TFs in forebrain, Neural TFs in neural tube and Limb TFs in limb]
Identifying tissue-specific regulatory function. Clustering ChIP-seq signal at 20,000 bound sites in limb and brain. [Figure: clustered ChIP-seq signal, Limb vs. Brain, with three groups: sites strongly marked in Limb, sites strongly marked in both, sites strongly marked in Brain]
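The three groups in the figure come from clustering. As a deliberately simpler stand-in (a fixed log-ratio threshold, not the clustering used for the slide), the sketch below sorts sites into "limb", "brain" or "both". The threshold, pseudocount and signal values are assumptions chosen for illustration.

```python
import math


def classify_sites(limb_signal, brain_signal, min_log2_ratio=1.0, pseudocount=1.0):
    """Label each site by the log2 ratio of limb to brain ChIP-seq signal.

    Signals are assumed to be depth-normalized already.
    min_log2_ratio=1.0 means a two-fold difference is required (an assumed cutoff).
    """
    labels = []
    for limb, brain in zip(limb_signal, brain_signal):
        log_ratio = math.log2((limb + pseudocount) / (brain + pseudocount))
        if log_ratio >= min_log2_ratio:
            labels.append("limb")
        elif log_ratio <= -min_log2_ratio:
            labels.append("brain")
        else:
            labels.append("both")
    return labels


# Hypothetical normalized signal at three bound sites.
limb = [50.0, 4.0, 30.0]
brain = [5.0, 40.0, 28.0]
labels = classify_sites(limb, brain)
```

A hard threshold is easy to reason about but sensitive to the cutoff; clustering lets the group boundaries come from the data instead.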
Identifying tissue-specific regulatory function: what is the function of limb- and brain-specific sites? Assign enhancers to genes based on proximity (not ideal). GREAT (bejerano.stanford.edu/great/): gene ontology annotation assigned to regulatory sequences.
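Proximity-based assignment can be sketched as a nearest-TSS lookup on one chromosome. This is only the naive rule the slide calls "not ideal"; it is not GREAT's association procedure, and the gene names and coordinates are hypothetical.

```python
import bisect


def nearest_gene(enhancer_pos, tss_positions, gene_names):
    """Return (gene, distance) for the TSS closest to an enhancer position.

    tss_positions must be non-empty, sorted ascending and on the same
    chromosome as the enhancer; gene_names is in the same order.
    """
    i = bisect.bisect_left(tss_positions, enhancer_pos)
    # Only the TSS just left and just right of the enhancer can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - enhancer_pos))
    return gene_names[best], abs(tss_positions[best] - enhancer_pos)


# Hypothetical TSS coordinates on one chromosome.
tss_positions = [1_000, 50_000, 200_000]
gene_names = ["geneA", "geneB", "geneC"]
gene, distance = nearest_gene(60_000, tss_positions, gene_names)
```

The nearest gene is not necessarily the regulated gene, since enhancers can act over long distances and skip intervening genes.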
Q2: Making sense of regulatory functions. Compare multiple biological states. Integrate multiple data sources: TF function; histone modification; potential target genes; existing genome annotations.
Example from PS1: CTCF and RAD21 (cohesin). Annotate (GREAT).
CTCF and cohesin co-occupy many sites: promoters, insulators, enhancers. From Kagey et al (2010) Nature 467:430
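Co-occupancy of two factors is usually measured by intersecting their peak sets. The sketch below is a single-chromosome sweep over two sorted peak lists; the function name and the coordinates are illustrative, and it assumes peaks within each list do not overlap one another.

```python
def overlapping_peaks(peaks_a, peaks_b):
    """Return peaks in A that overlap at least one peak in B.

    Peaks are (start, end) tuples with half-open coordinates on one chromosome.
    Each list is assumed sorted by start and free of internal overlaps.
    """
    shared = []
    j = 0
    for start_a, end_a in peaks_a:
        # Skip B peaks that end before this A peak starts.
        while j < len(peaks_b) and peaks_b[j][1] <= start_a:
            j += 1
        # The first remaining B peak overlaps if it starts before A ends.
        if j < len(peaks_b) and peaks_b[j][0] < end_a:
            shared.append((start_a, end_a))
    return shared


# Hypothetical peak coordinates for the two factors.
ctcf_peaks = [(100, 200), (500, 600), (900, 1000)]
rad21_peaks = [(150, 250), (700, 800), (950, 1050)]
shared = overlapping_peaks(ctcf_peaks, rad21_peaks)
fraction_shared = len(shared) / len(ctcf_peaks)
```

For real data with multiple chromosomes, the same sweep would run per chromosome, or an interval tool would do the intersection.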
Annotate (GREAT). CTCF: marks insulators and promoters; enhancers? RAD21 (cohesin): marks insulators, promoters and enhancers? Include histone modification data (Wednesday’s lecture).
Identifying bound motifs from ChIP-seq data. CTCF: ~20,000 binding sites identified by ChIP. Analyze with GREAT and the MEME suite: http://meme.nbcr.net/meme/ From Furey (2012) Nat Rev Genet 13:840
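The intuition behind motif discovery is that sequences under peaks share short words that are rare elsewhere. The sketch below ranks k-mers by their frequency in peak sequences relative to background. It is a much simpler technique than the motif models fitted by the MEME suite, it ignores the reverse strand, and the sequences are made-up toy data rather than real binding sites.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every k-mer on the given strand across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enriched_kmers(peak_seqs, background_seqs, k, pseudocount=1.0):
    """Rank k-mers by (frequency in peaks) / (frequency in background)."""
    peak = kmer_counts(peak_seqs, k)
    background = kmer_counts(background_seqs, k)
    peak_total = sum(peak.values())
    background_total = sum(background.values())
    scores = {}
    for kmer, n in peak.items():
        peak_freq = (n + pseudocount) / (peak_total + pseudocount)
        bg_freq = (background[kmer] + pseudocount) / (background_total + pseudocount)
        scores[kmer] = peak_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy sequences (hypothetical): each "peak" contains the word GACCA.
peak_seqs = ["TTGACCATT", "AAGACCAGG", "CGGACCATC"]
background_seqs = ["TTTTTTTTT", "ACGTACGTA", "GGGGCCCCA"]
ranked = enriched_kmers(peak_seqs, background_seqs, k=5)
top_kmer = ranked[0][0]
```

Exact k-mers cannot capture degenerate positions, which is one reason dedicated tools represent motifs as position weight matrices.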
Caveat: single TF binding events often do not indicate regulatory function. Many TFs are present at high concentrations in the nucleus; TF motifs are abundant in the genome; single TF binding events may be incidental. Look for combinations of marks/TF binding events, e.g. an enhancer-associated histone modification.
Q3: Identifying the target genes for regulatory elements. [Figure: gene A regulated by Brain TFs in forebrain, Neural TFs in neural tube and Limb TFs in limb]
Chromosome Conformation Capture. Sequence: 4C, 5C, Hi-C. ChIP for specific factors: ChIA-PET.
3C evaluates specific interaction possibilities by qPCR. Dekker et al Nat Rev Genet 14:390 (2013)
4C identifies genome-wide interactions for a single “bait” sequence.
ChIA-PET identifies interactions involving a particular factor. From Kieffer-Kwon et al. (2013) Cell 155:1507
In principle, Hi-C captures all interactions, but is limited by sequencing depth. Dekker et al Nat Rev Genet 14:390 (2013)
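A back-of-the-envelope calculation shows why depth limits Hi-C resolution: the number of possible bin pairs grows with the square of the number of bins. The sketch assumes a 3 Gb genome, 100 million read pairs and contacts spread uniformly over all bin pairs. All three are simplifications, since real contacts are concentrated at short genomic distances.

```python
def mean_contacts_per_bin_pair(n_read_pairs, genome_size, bin_size):
    """Average read pairs per bin pair if contacts were spread uniformly."""
    n_bins = -(-genome_size // bin_size)      # ceiling division
    n_bin_pairs = n_bins * (n_bins + 1) // 2  # unordered pairs, incl. same-bin
    return n_read_pairs / n_bin_pairs


# Assumed values for illustration only.
GENOME_SIZE = 3_000_000_000
READ_PAIRS = 100_000_000

coarse = mean_contacts_per_bin_pair(READ_PAIRS, GENOME_SIZE, bin_size=1_000_000)
fine = mean_contacts_per_bin_pair(READ_PAIRS, GENOME_SIZE, bin_size=10_000)
```

Under these assumptions, shrinking the bin size 100-fold cuts the average coverage per bin pair by roughly 10,000-fold, so fine-scale maps need far deeper sequencing.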
Hierarchical organization of the genome. Cohesin-mediated interactions. Dekker et al Nat Rev Genet 14:390 (2013); Gorkin et al Cell Stem Cell 14:762 (2014)
Summary: relevant overview papers on methodologies are posted on the class wiki. Wednesday: Epigenetics and the histone code.