1
DNA:chromatin interactions
Exploring transcription factor binding and the epigenomic landscape
Chris Seward
2
Introductions
- Cell and Developmental Biology PhD Candidate in Dr. Lisa Stubbs’ Laboratory
- Currently looking for a post-doc position!
- Molecular Roots of the Social Brain Project
  - Dr. Lisa Stubbs - Mouse
  - Dr. Gene Robinson - Honeybee
  - Dr. Alison Bell - Stickleback
  - Dr. Saurabh Sinha - Computer Science
  - Dr. Dave Zhao - Statistics
- The project seeks to understand the genomic and molecular response to social stimulus across social species
3
Genomics in Social Research
Research Methods
- Social Stimulus
- Animal Behavior: Behavioral Testing and Scoring
- Molecular Response (Transcription Factor Activation): Epigenetics and motif analysis
- Genetic Response (Differential Gene Expression): RNA-seq and qPCR
- Development (Developmental Regulators): Epigenetics and motif analysis
4
Why Epigenomics?
- RNA-seq can show you changes in gene expression across the genome, but how do you know what caused those changes?
- If RNA-seq shows that a transcription factor changes expression, what is it doing?
- Motif analysis of promoter regions works for identifying probable known regulators, but what if the regulator is unknown?
- What if the regulators bind far away from the promoter?
- What about more advanced regulatory mechanisms?
- Solution: Epigenomics
5
Outline Epigenomics background ChIP experimental design
ChIP bioinformatics Other chromatin assays Higher order chromatin assays
6
- Eukaryotic genomes are complex structures composed of modified and unmodified DNA, RNA, and many types of interacting proteins
- Most DNA is wrapped around a “histone core” to form nucleosomes
- The classical histone protein complexes bind very tightly to DNA and prevent association with other proteins
- Modifications of the classical histones, or their replacement with unusual histone types under certain conditions, can “loosen” the interaction with DNA, allowing access to transcription factors, RNA polymerase, and other proteins
7
Histone Modifications
All four core histones in the octamer have “tails” that can be modified in various ways, but the most consequential modifications, with respect to transcriptional activity, appear to involve methylation or acetylation of lysines (K) in histone H3
8
Histone H3 modifications, especially methylation and acetylation, mark “open” or “closed” DNA
- CLOSED: Histones bound more tightly to DNA
  - H3K27me3, H3K9me3
- OPEN: Histones can be displaced by TFs, RNA polymerase, and other proteins
  - H3K27ac, H3K4me1, H3K4me3
- Histone marks, together with other assays of open chromatin, are presently the only reliable indicators of the locations and activities of regulatory elements
9
Many types of regulatory elements
(Figure label: Gene transcribed)
- Promoter binding factors: TFs, TATA binding factors, and other site-specific binders
  - Recruit additional proteins: co-factors, RNA polymerase, and others
- Enhancers: tissue-specific activators of transcription
  - Binding sites for proteins that interact with the promoter to enhance transcription
- Silencers: also prevalent, but more difficult to detect and assay
  - Many transcription factors repress, rather than enhance, gene expression
- “Enhancers” and “Silencers” are not mutually exclusive! Most regulatory elements can serve either function, depending on the proteins bound at a particular time
- Insulators: “boundary elements” that shield genes from enhancers or heterochromatin proteins in neighboring gene “territories”
  - Involved in establishing loop structures that isolate genes
10
How to find them? Chromatin ImmunoPrecipitation (ChIP)
- An antibody to a DNA-binding protein is used to “fish out” DNA bound to the protein in a cell
- DNA and protein are crosslinked in the cell using a brief treatment with a low concentration of high-quality formaldehyde
- Crosslinked chromatin is sheared, usually by sonication, to yield short fragments of DNA+protein complexes
- An antibody to a TF or other binding protein is used to fish out fragments containing that DNA-binding protein
- DNA is then “released” and can be analyzed by various methods: PCR, microarray, sequencing
- Creates a pool of sequences highly enriched in binding sites for a particular protein
11
ChIP can be used to map DNA:protein interactions of virtually any type
- Histone modifications
- RNA polymerase and elongation factors, to find promoters and active sites of transcription
- Proteins involved in DNA recombination, repair, and replication
- DNA methylation proteins
12
ChIP can be used to map DNA:protein interactions of virtually any type
- Secondary interactions (no direct linkage to DNA)
  - Histone modifying proteins, such as SWI/SNF, histone deacetylases, histone methylases
  - Cofactors that bind to TFs at particular sites, and that stabilize chromatin loops
  - Proteins that link chromatin to nuclear matrix or envelope
- All of these methods require highly specific and efficient antibodies (which are rare!)
(Figure labels: Nuclear Matrix, Envelope, Loop Structures)
13
Outline Epigenomics background ChIP experimental design
ChIP bioinformatics Other chromatin assays Higher order chromatin assays
14
ChIP Antibodies
- ENCODE maintains lists of ChIP-validated antibodies: a good starting point
- Otherwise, validate yourself!
  - IHC showing nuclear protein
  - Western blot showing a single clear band
  - ChIP-PCR for a known binding site sequence
  - ChIP-seq followed by motif analysis
- Try to find higher-concentration antibodies
15
Best Practices for ChIP Experimental Design
- Requires 1-2 million cells per IP for common histone modifications
- Requires 5-10 million cells for most transcription factors
- Other methods like ATAC and ChIPmentation can reduce cell number requirements (more later)
- Biological replicates are great! (But technical replicate IPs alone are generally accepted)
- Frozen tissue may give significantly lower yields than tissue that is freshly collected, fixed, and reduced to nuclei
- Fixed, washed nuclei can be stored for years at -80°C
16
ChIP Sequencing Criteria
- Many ChIP-seq samples can be sequenced per Illumina lane
- Typically 50bp+ unpaired (single-end) sequencing is enough
- Depth needed depends on genome size, quality, and sequencer
- Always sequence technical replicate IP samples
- Always sequence an “input” background sample without antibody for each tissue
- This means 3 sequenced items per “sample”

Species           Genome Size   # reads / IP   HiSeq 2500   HiSeq 4000
Human             3.2 Gb        20m+           ~10          ~20
Mouse             2.8 Gb        15m+           ~13          ~24
Stickleback Fish  500 Mb        10m+           ~18          ~30
Honeybee          250 Mb        7m+            ~40
17
Outline Epigenomics background ChIP experimental design
ChIP bioinformatics Other chromatin assays Higher order chromatin assays
18
ChIP Bioinformatics Pipeline
- Align reads to genome
- Look at your data!
- Call peaks
- Look at your data again!
- Annotate peaks / Gene Ontology
- Identify differential peaks?
- Identify co-binding?
- Motif analysis?
19
ChIP Bioinformatics
(Figure labels: sonicated fragment, binding site, ends sequenced)
- First step is to map reads: Bowtie, Novoalign, BWA, or other
- ChIP-seq reads surround but may not contain the DNA binding site
  - Sequence is generated from the ends of randomly sheared fragments, which overlap at the protein binding site
  - Gives rise to two adjacent sets of read peaks separated by ~2X fragment length (~500bp)
  - Defines a “shift” distance between read peaks at which you will find the true ChIP peak summit (~200bp)
- Programs like MACS and HOMER automatically subtract your control (genomic input) from sample reads to define a final set of peaks
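The idea of two strand-specific read pile-ups flanking the binding site can be illustrated with a toy calculation. The sketch below is not how MACS or HOMER work internally; it is a brute-force cross-correlation under simplifying assumptions: reads are reduced to single 5' coordinates, the function name `estimate_shift` and all coordinates are invented for illustration, and the summit is taken as the midpoint between the two pile-ups.

```python
from collections import Counter

def estimate_shift(forward_starts, reverse_ends, max_shift=400):
    """Return the offset d (bp) that best aligns forward-strand read starts
    with reverse-strand read ends (brute-force cross-correlation)."""
    reverse_counts = Counter(reverse_ends)
    best_d, best_score = 0, -1
    for d in range(max_shift + 1):
        # Count forward reads that have a reverse read exactly d bp downstream.
        score = sum(reverse_counts[pos + d] for pos in forward_starts)
        if score > best_score:
            best_d, best_score = d, score
    return best_d

# Toy data (hypothetical): forward reads pile up upstream of the binding
# site, reverse reads pile up downstream of it.
forward = [895, 898, 900, 900, 902, 905]
reverse = [1095, 1098, 1100, 1100, 1102, 1105]

d = estimate_shift(forward, reverse)
# Assumption: the summit lies halfway between the two strand pile-ups.
summit = sum(forward) / len(forward) + d / 2
print(d, summit)
```

Real peak callers estimate this distance from many high-confidence peaks at once and also model the input control, so this is only a way to picture why reads need to be shifted toward the binding site.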
20
ChIP Analytical challenges
- Genomic background
  - Shear efficiency is not really “random”
  - Some genomic regions are fragile and sensitive
  - Some regions are protected from shear or degradation
- Other artifacts
  - Centromeres: repeat sequences that are not all represented in the genome sequence build
  - Polymorphic regions, such as regions modified in cell line DNA
  - Repeats: most programs cannot manage sequence reads that are not mapped uniquely
- Peak width
  - Transcription factors typically give sharp ~200bp peaks; chromatin marks are more diffuse
  - If planning to call differential peaks, peak width should be locked between samples
21
Traditional methods fail with broad, flat peaks
- Most tools are designed for TF proteins: isolated, sharp peaks
- Certain chromatin proteins, and modified histones in certain regions, bind continuously to large regions of chromatin and do not yield “peaks”
- MACS in default mode will carve the “mesa” into many peaks, or not detect it at all
- New settings in MACS 2 can be set to overcome this problem
- HOMER has a wide variety of settings ideal for data of different types
22
Look At Your Data!
- Once you have aligned reads and called putative peaks, it is important to actually look at your data and see if it looks believable
- Peak calling software will often still call peaks from failed experiments!
- IGV, UCSC Genome Browser, and Galaxy Track Browser are great tools for experimental validation
23
Looking at your Data
(Figure tracks: RNA, H3K27ac, H3K27me3, H3K4me3, H3K4me1, Annotation, Peaks; example peaks are annotated “False Peak!”, “??”, and “✔”)
- Are your peaks enriched over the background?
- Do your technical replicates look similar?
- Are your peaks associated with genes?
- Are your peaks similar to existing data sets?
UCSC Genome Browser
- Amazing resource of epigenomics data and genome annotations
- Visualize your ChIP data on UCSC to compare to existing tracks
- Can set up a public/private “track hub” for publication of your data
24
Differential Peaks
- Identifying when peak sets have changed between conditions can be tricky
- For comparisons with lots of presence/absence (+/-) peak changes, just subtract the peak sets to find new or missing peaks in one sample
- To spot changes in peak magnitude (+ / ++), more advanced methods are required
  - Most involve re-calling peaks using the experimental sample as the IP and another sample's IP as the input
  - HOMER has more advanced differential peak finding mechanisms and can utilize biological replicates
- Linking differential regulatory peaks to differentially expressed genes interrogates each step of a biological process
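The simple presence/absence case, subtracting one peak set from another, can be sketched in a few lines. This is a minimal illustration, assuming peaks are already loaded as `(chrom, start, end)` tuples with half-open coordinates; the function names and the toy peak lists are hypothetical, and a real analysis would more likely use a dedicated interval tool.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def subtract_peaks(peaks_a, peaks_b):
    """Peaks in A that overlap no peak in B (quadratic scan; fine for toy data)."""
    return [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]

# Hypothetical peak calls from two conditions.
control = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
treated = [("chr1", 150, 350), ("chr2", 50, 250), ("chr2", 900, 1100)]

gained = subtract_peaks(treated, control)  # present only after treatment
lost = subtract_peaks(control, treated)    # present only in control
print("gained:", gained)
print("lost:", lost)
```

Note that this treats any 1 bp overlap as "the same peak", which is one reason locking peak width between samples matters before comparing sets.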
25
Co-binding factor finding
- Some experiments aim to identify positions where two factors bind at the same location
  - Example: TCF4 is a repressor when bound alone, but an activator when bound together with beta-catenin
  - Example: an H3K4me3 + H3K27ac dual peak indicates an active promoter
- Galaxy intersect tool, bedtools intersect, or HOMER mergePeaks can identify these co-bound sites
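Conceptually, finding co-bound sites is an interval intersection. The sketch below shows the operation itself rather than the behavior of any of the tools named above; the function name `intersect_peaks`, the `min_overlap` parameter, and the toy coordinates are assumptions made for illustration.

```python
def intersect_peaks(peaks_a, peaks_b, min_overlap=1):
    """Return the overlapping segments between two peak lists.
    Peaks are (chrom, start, end) tuples with half-open coordinates."""
    shared = []
    for chrom_a, start_a, end_a in peaks_a:
        for chrom_b, start_b, end_b in peaks_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if end - start >= min_overlap:
                shared.append((chrom_a, start, end))
    return sorted(shared)

# Hypothetical peak calls for two histone marks.
h3k4me3 = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr3", 100, 900)]
h3k27ac = [("chr1", 1500, 2600), ("chr1", 8000, 8500), ("chr3", 700, 1500)]

dual_marked = intersect_peaks(h3k4me3, h3k27ac)
print(dual_marked)
```

Raising `min_overlap` is one way to demand more than a marginal touch between the two peaks before calling a site co-bound.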
26
Peak Annotation
Now that we have peaks, what do they mean?
- Many peaks intersect promoters directly, but some may be 100 kb+ from the nearest gene
- Different interpretations may lead to different conclusions
  - Typical promoter region (mouse): -5 kb / +2 kb
  - Typical regulatory domain: +/- 100 kb
- GREAT genome tool is a good place to start for human, mouse, and zebrafish
  - Identifies nearest genes and performs Gene Ontology analysis
- HOMER has advanced peak annotation scripts: annotatePeaks.pl
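To make the promoter and regulatory-domain windows concrete, here is a small sketch that assigns a peak to its nearest transcription start site (TSS) and labels it using the -5 kb / +2 kb and +/- 100 kb windows quoted above. It is not the GREAT or annotatePeaks.pl algorithm; the function name, the gene list, and the single-chromosome simplification are all assumptions for illustration.

```python
def annotate_peak(peak_center, genes, promoter=(-5000, 2000), domain=100_000):
    """Assign a peak to its nearest TSS and label it by strand-aware offset.
    genes: list of (name, tss, strand); all on one chromosome (assumption)."""
    name, tss, strand = min(genes, key=lambda g: abs(peak_center - g[1]))
    offset = peak_center - tss
    if strand == "-":
        offset = -offset  # upstream of a minus-strand gene is a larger coordinate
    if promoter[0] <= offset <= promoter[1]:
        label = "promoter"
    elif abs(offset) <= domain:
        label = "regulatory domain"
    else:
        label = "distal"
    return name, offset, label

# Hypothetical genes and peak centers.
genes = [("GeneA", 50_000, "+"), ("GeneB", 400_000, "-")]
for center in (47_000, 120_000, 403_000, 230_000):
    print(center, annotate_peak(center, genes))
```

Changing the `promoter` or `domain` windows changes which peaks count as promoter-associated, which is exactly how different interpretations can lead to different conclusions.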
27
Motif-finding
- Differential peak sets can be submitted for motif analysis to find enriched motifs
- TF ChIP: motif scanning can reveal novel binding sites or validate ChIP results with known binding sites
- Histone ChIP: motif scanning in differential peaks can identify the active regulatory proteins responsible
- HOMER and MEME-ChIP are great ChIP motif finders
- Covered in detail this afternoon
28
Outline Epigenomics background ChIP experimental design
ChIP bioinformatics Other chromatin assays Higher order chromatin assays
29
DNase sensitivity assays are antibody free
The first approach: from Crawford, 2006 (Francis Collins’ laboratory)
- Digest with DNase I to “erase” all the hypersensitive regions
- Polish and ligate the remaining double-strand ends
- Ligate 5’-biotinylated linkers to the DS ends
- Shear (sonicate) or restriction-digest DNA into smaller fragments
- Purify end sequences on a streptavidin column
- Release sequences, add new linkers, and sequence
- Does not allow footprinting, because TF binding sites inside the HS regions have been digested away
30
Latest (and better) approach: sequences DNase-sensitive regions per se and permits transcription factor “footprinting”
- The easiest method uses low concentrations of DNase I to generate short fragments at sensitive (“open”) sites
- Released fragments can be blunt-ended, ligated to linkers, and sequenced directly
- Permits DNase footprinting: very deep sequencing can “see” short protected regions that are absent from the released DNA and appear as “valleys” inside the DNase-sensitive peaks; these regions are protected from DNase I because they are occupied by TF proteins
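The “valley inside a peak” idea can be pictured with a toy scan over per-base DNase I cut counts. This is only a sketch: the thresholds (`low`, `high`, `min_len`, `max_len`), the function name, and the cut-count vector are invented for illustration, and published footprinting tools use statistical models rather than fixed cutoffs.

```python
def find_footprints(cuts, low=1, high=5, min_len=6, max_len=30):
    """Return (start, end) half-open runs of low-cut positions that are
    flanked on both sides by a high-cut position."""
    footprints = []
    i, n = 0, len(cuts)
    while i < n:
        if cuts[i] > low:
            i += 1
            continue
        j = i
        while j < n and cuts[j] <= low:
            j += 1  # extend the run of protected (low-cut) positions
        flanked = i > 0 and j < n and cuts[i - 1] >= high and cuts[j] >= high
        if flanked and min_len <= j - i <= max_len:
            footprints.append((i, j))
        i = j
    return footprints

# Hypothetical per-base cut counts across one hypersensitive site.
cuts = [0, 0, 6, 8, 9, 7, 0, 1, 0, 0, 1, 0, 0, 0, 8, 9, 7, 6, 0, 0]
footprints = find_footprints(cuts)
print(footprints)
```

The low-count runs at the two edges are ignored because they are not flanked by cutting on both sides; only the protected stretch inside the sensitive region is reported.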
31
Related methods and twists on the theme (see Furey et al., 2012 for review)
- Exo-ChIP
  - Follows sonication with an exonuclease step, to “pare back” all but the protein-protected region in ChIP
- “Nano-ChIP” methods
  - ChIP normally required ~10^7 cells as input; hard to achieve for many cell types
  - Nano-ChIP can be carried out in several ways:
    - With carrier DNA: not the best for sequence analysis but can be done
    - Amplification after ChIP: very tricky because it can cause serious biases and artifacts, but can be done with care; linear amplification is the best strategy
    - ATAC and tagmentation: a new method that creates libraries directly by transposon insertion
  - The problem is library preparation, which needs a minimal amount of input for success
32
ATAC-seq and Tagmentation
- Uses a transposase that has been modified to insert Illumina sequencing primers
- On untreated DNA, it prefers to insert in open chromatin; this is known as ATAC-seq (Greenleaf, 2013)
- Needs to be done on freshly collected tissue
33
ChIP Tagmentation
- Tagmentation can also be used to insert sequencing tags into immunoprecipitated DNA after or during ChIP (Schmidl, 2015)
- This allows you to make ChIP libraries from very small numbers of cells: 50,000 or fewer!
34
Tagmentation Bioinformatics
- The bioinformatics pipeline for tagmented samples is similar to ChIP, but may require read trimming to remove transposon/index contamination
- Due to increased PCR amplification / bias in tagmentation vs traditional ChIP, it may be difficult to compare peak magnitudes
- Biological replicates strongly recommended
35
DNA Methylation Assays
- Bisulphite Sequencing (review: Li, 2011)
  - Treatment of DNA with bisulphite followed by PCR amplification allows detection of methylated regions
  - Requires very deep sequencing ($$$)
- Methyl-DIP (Weber, 2005)
  - Enriches for methylcytosine using antibodies, like ChIP
  - Analysis is identical to standard ChIP
- Methyl-Binding-Domain Capture (review: Nair, 2011)
  - Uses beads coated with MBD protein that bind methylated DNA directly
  - Analysis is also identical to standard ChIP
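Bisulphite sequencing works because bisulphite converts unmethylated cytosines to uracil, which is read as T after PCR, while methylated cytosines stay as C. The sketch below shows that readout on a toy example; it assumes the read is already aligned to the same strand of the reference with no gaps, and the function name and sequences are invented for illustration.

```python
def methylation_calls(reference, read):
    """Compare a bisulphite-converted read to the reference (same strand,
    already aligned, equal length). Returns {position: bool} for each
    reference cytosine: True = still C (methylated), False = read as T."""
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[pos] = True
        elif read_base == "T":
            calls[pos] = False
        # Any other base is treated as a mismatch; no call is made.
    return calls

# Hypothetical reference and one converted read.
reference = "ACGTCCGATCGA"
read      = "ACGTTTGATCGA"

calls = methylation_calls(reference, read)
methylated_fraction = sum(calls.values()) / len(calls)
print(calls, methylated_fraction)
```

Because each cytosine needs many independent reads before its methylation level can be estimated, this per-base readout is also why the method requires very deep sequencing.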
36
Outline Epigenomics background ChIP experimental design
ChIP bioinformatics Other chromatin assays Higher order chromatin assays
37
Back to the nucleus: Distant regulatory elements interact with promoters (and each other) through long-range chromatin loops
- Regulatory elements are essentially “docking sites” for specific types of DNA-binding proteins
  - Transcription factors, TATA-binding factors, and others
- These proteins serve to attract co-factors, which then mediate protein:protein interactions across chromatin loops
- Very long-range interactions are common in vertebrates, less so in invertebrate species with higher coding:noncoding ratios
- ChIP with an antibody that binds to “E” DNA will bring down “P” DNA as well
  - Proteins are crosslinked very efficiently to each other, as well as to DNA, by formaldehyde treatment
  - When crosslinking is reversed the complex falls apart, and both DNA fragments are released independently
  - Only one sequence binds to the TF! This is a common issue in analysis of ChIP
(Figure labels: TF; shear chromatin by sonication or restriction enzyme)
38
Chromatin conformation capture methods can identify these loop-linked sequences
- Ends of the co-captured DNA fragments are ligated while still captured on the antibody-bead with the protein complex
- DNA is released and can be:
  - Queried by PCR for enrichment of suspected candidate interactors
  - Circularized and PCR amplified using a primer from a “bait” region (4C)
  - Directly sequenced for all-by-all interactions (5C, Hi-C, ChIA-PET)
- Issues include:
  - Random co-ligation between fragments that are not truly connected in the cell
  - Over-crosslinking, which may incorrectly join sequences that are merely nearby
- Provides a view of 3-D chromatin architecture, especially important for mammalian cells
(Figure from Wikipedia)
39
Genomics Methods Summary
- RNA: RNA-Seq, Gro-Seq
- DNA: Variant Calling
- Chromatin State: DNase + ATAC
- Histone Modifications: Histone ChIP
- Transcription Factors: TF ChIP
- Chromatin Structure: 3C + 4C + HiC
- DNA Methylation: Bisulphite, MBP/MethylDIP
40
Applied Genomics
Research Methods
- Social Stimulus
- Animal Behavior: Behavioral Testing and Scoring
- Molecular Response (Transcription Factor Activation): Epigenetics and motif analysis
- Genetic Response (Differential Gene Expression): RNA-seq and qPCR
- Development (Developmental Regulators): Epigenetics and motif analysis
Research Findings
- Nuclear receptor TFs enter nucleus, bind DNA
- Neurotransmitters, hormonal signaling
- Dormant developmental pathways activated, social learning
- Animals are stressed!
Saul*, Seward* et al., Genome Research, 2017
41
Questions?