DNA:chromatin interactions

Presentation transcript:

DNA:chromatin interactions Exploring transcription factor binding and the epigenomic landscape Chris Seward

Introductions Cell and Developmental Biology PhD Candidate in Dr. Lisa Stubbs’ Laboratory Currently looking for Post-Doc! seward2@Illinois.edu Molecular Roots of the Social Brain Project Dr. Lisa Stubbs - Mouse Dr. Gene Robinson – Honeybee Dr. Alison Bell – Stickleback Dr. Saurabh Sinha – Computer Science Dr. Dave Zhao – Statistics Project seeks to understand the genomic and molecular response to social stimulus across social species

Genomics in Social Research: Research Methods
Social Stimulus: Animal Behavior (Behavioral Testing and Scoring)
Molecular Response: Transcription Factor Activation (Epigenetics and motif analysis)
Genetic Response: Differential Gene Expression (RNASeq and qPCR)
Development: Developmental regulators (Epigenetics and motif analysis)

Why Epigenomics? RNAseq can show you changes in gene expression across the genome, but how do you know what caused those changes? If RNAseq shows a Transcription Factor changes expression, what is it doing? Motif analysis of promoter regions works for identifying probable known regulators, but what if the regulator is unknown? What if the regulators bind far away from the promoter? What about more advanced regulatory mechanisms? Solution: Epigenomics

Outline Epigenomics background ChIP experimental design ChIP bioinformatics Other chromatin assays Higher order chromatin assays

Eukaryotic genomes are complex structures comprised of modified and unmodified DNA, RNA and many types of interacting proteins Most DNA is wrapped around a “histone core”, to form nucleosomes The classical histone protein complexes bind very tightly to DNA and prevent association with other proteins Modifications of the classical histones, or their replacement with unusual histone types under certain conditions, can “loosen” the interaction with DNA, allowing access to transcription factors, RNA polymerase, and other proteins

Histone Modifications All four core histones in the nucleosome octamer have “tails” that can be modified in various ways, but the most consequential modifications, with respect to transcriptional activity, appear to involve methylation or acetylation of lysines (K) in histone H3

Histone H3 modifications, especially methylation and acetylation, mark “open” or “closed” DNA CLOSED: Histones bound more tightly to DNA: H3K27me3, H3K9me3 OPEN: Histones can be displaced by TFs, RNA Polymerase, and other proteins: H3K27ac, H3K4me1, H3K4me3 Histone marks, together with other assays of open chromatin, are presently the only reliable indicators of the locations and activities of regulatory elements

Many types of regulatory elements Gene transcribed Promoter Binding Factors: TFs, TATA binding factors, and other site-specific binders Recruit additional proteins: co-factors, RNA polymerase and others Enhancers: Tissue-specific activators of transcription Binding sites for proteins that interact with the promoter to enhance transcription Silencers: Also prevalent, but more difficult to detect and assay Many transcription factors repress, rather than enhance, gene expression “Enhancers” and “Silencers” are not mutually exclusive! Most regulatory elements can serve either function, depending on the proteins bound at a particular time Insulators: “boundary elements” that shield genes from enhancers or heterochromatin proteins in neighboring gene “territories” Involved in establishing loop structures that isolate genes

How to find them? Chromatin ImmunoPrecipitation (ChIP) Antibody to a DNA binding protein is used to “fish out” DNA bound to the protein in a cell DNA and protein are crosslinked in the cell using brief treatment with low concentration of high quality formaldehyde Crosslinked chromatin is sheared, usually by sonication, to yield short fragments of DNA+protein complexes Antibody to a TF or other binding protein used to fish out fragments containing that DNA binding protein DNA is then “released” and can be analyzed by various methods: PCR, microarray, sequencing Creates a pool of sequences highly enriched in binding sites for a particular protein

ChIP can be used to map DNA:protein interactions of virtually any type Histone Modifications RNA polymerase and elongation factors, to find promoters and active sites of transcription Proteins involved in DNA recombination, repair, and replication DNA Methylation Proteins

ChIP can be used to map DNA:protein interactions of virtually any type Secondary interactions (no direct linkage to DNA) Histone modifying proteins, such as SWI/SNF, histone deacetylases, histone methylases Cofactors that bind to TFs at particular sites, and that stabilize chromatin loops Proteins that link chromatin to nuclear matrix or envelope All of these methods require highly specific and efficient antibodies (which are rare!) Nuclear Matrix Envelope Loop Structures

Outline Epigenomics background ChIP experimental design ChIP bioinformatics Other chromatin assays Higher order chromatin assays

ChIP Antibodies ENCODE maintains lists of ChIP-validated antibodies, which are a good starting point Otherwise, validate yourself! IHC showing nuclear protein Western blot showing a single clear band ChIP PCR for a known binding site sequence ChIP-seq followed by motif analysis Try to find higher-concentration antibodies

Best Practices for ChIP Experimental Design Requires 1-2 million cells per IP for common Histone Modifications Requires 5-10 million cells for most transcription factors Other methods like ATAC and ChIPmentation can reduce cell # requirements (more later) Biological replicates are great! (But technical replicate IPs alone are generally accepted) Frozen tissue may give significantly lower yields than tissue freshly collected, fixed, and reduced to nuclei. Fixed, washed nuclei can be stored for years at -80°C

ChIP Sequencing Criteria Many ChIP-seq samples can be sequenced per Illumina lane Typically 50bp+ unpaired sequencing is enough Depth needed dependent on genome size, quality, sequencer Always sequence technical replicate IP samples Always sequence an “input” background sample without antibody for each tissue This means 3 sequenced items per “sample”
Species            Genome Size   # reads / IP   HiSeq 2500   HiSeq 4000
Human              3.2 Gb        20m+           ~10          ~20
Mouse              2.8 Gb        15m+           ~13          ~24
Stickleback Fish   500 Mb        10m+           ~18          ~30
Honeybee           250 Mb        7m+            ~40 (only one value given)
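
The "3 sequenced items per sample" rule turns into simple lane arithmetic. A minimal sketch, assuming the HiSeq figures on this slide are read as libraries per lane (an interpretation, not stated on the slide); `lanes_needed` is a hypothetical helper name:

```python
import math


def lanes_needed(n_samples: int, libraries_per_lane: int, ips_per_sample: int = 2) -> int:
    """Lanes required when every sample contributes replicate IPs plus one input.

    Assumption: libraries_per_lane is how many libraries fit on one lane
    at the read depth you want.
    """
    libraries = n_samples * (ips_per_sample + 1)  # e.g. 2 IPs + 1 input = 3
    return math.ceil(libraries / libraries_per_lane)


# Example: 8 samples, assuming ~13 libraries fit on one lane
print(lanes_needed(8, 13))
```

Rounding up matters here: 24 libraries at 13 per lane still need a second lane.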

Outline Epigenomics background ChIP experimental design ChIP bioinformatics Other chromatin assays Higher order chromatin assays

ChIP Bioinformatics Pipeline Align reads to genome Look at your data! Call peaks Look at your data again! Annotate peaks / Gene Ontology Identify differential peaks? Identify Co-binding? Motif analysis?

ChIP Bioinformatics [Figure labels: sonicated fragment; binding site; ends sequenced] First step is to map reads: Bowtie, Novoalign, BWA or other ChIP-seq reads surround but may not contain the DNA binding site Sequence is generated from the ends of randomly sheared fragments, which overlap at the protein binding site Gives rise to two adjacent sets of read peaks separated by ~ 2X fragment length (~500bp) Defines a “shift” distance between read peaks at which you will find the true ChIP peak summit (~200bp) Programs like MACS and HOMER automatically subtract your control (genomic input) from sample reads to define a final set of peaks
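
The offset between the forward-strand and reverse-strand read peaks can be estimated directly from read start positions. A minimal sketch of that idea on toy data; it is a simplified stand-in, not the model that MACS or HOMER actually builds, and `estimate_shift` is a hypothetical name:

```python
from collections import Counter


def estimate_shift(forward_starts, reverse_starts, max_shift=400):
    """Return the offset d (bp) that pairs the most forward-strand 5' ends
    with a reverse-strand 5' end located exactly d bp downstream."""
    reverse_counts = Counter(reverse_starts)
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        # Counter returns 0 for positions with no reverse read
        score = sum(reverse_counts[pos + d] for pos in forward_starts)
        if score > best_score:
            best_shift, best_score = d, score
    return best_shift


# Toy data: two binding sites, reverse reads sit 200 bp downstream of forward reads
forward = [100, 105, 110, 1000, 1003]
reverse = [300, 305, 310, 1200, 1203]
print(estimate_shift(forward, reverse))
```

The binding site is then expected roughly midway between the paired forward and reverse read positions.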

ChIP Analytical challenges Genomic Background Shear efficiency is not really “random” Some genomic regions are fragile and sensitive Some regions are protected from shear or degradation Other artifacts Centromeres: repeat sequences that are not all represented in the genome sequence build Polymorphic regions, such as regions modified in cell line DNA Repeats: most programs cannot manage sequence reads that are not mapped uniquely Peak width Transcription factors are typically sharp ~200bp peaks; chromatin marks are more diffuse If planning to call differential peaks, peak width should be locked between samples

Traditional methods fail with broad, flat peaks Most tools designed for TF proteins: isolated, sharp peaks Certain chromatin proteins, and modified histones in certain regions, bind continuously to large regions of chromatin and do not yield “peaks” MACS in default mode will carve the “mesa” into many peaks, or not detect it at all New settings in MACS 2 can be set to overcome this problem HOMER has a wide variety of settings ideal for data of different types
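
One way to picture a broad-domain approach: call enrichment in small windows, then stitch neighbouring windows into a single domain instead of reporting each as a peak. A minimal sketch under that assumption; it is not the algorithm used by MACS 2 or HOMER, and `merge_enriched_windows` and `max_gap` are hypothetical names:

```python
def merge_enriched_windows(windows, max_gap=1000):
    """windows: iterable of (start, end) enriched bins on one chromosome.
    Bins separated by at most max_gap bp are merged into one broad domain."""
    domains = []
    for start, end in sorted(windows):
        if domains and start - domains[-1][1] <= max_gap:
            # Close enough to the previous domain: extend it
            domains[-1][1] = max(domains[-1][1], end)
        else:
            domains.append([start, end])
    return [tuple(d) for d in domains]


# Three nearby windows form one "mesa"; the distant window stays separate
print(merge_enriched_windows([(0, 500), (800, 1300), (1500, 2000), (10000, 10500)]))
```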

Look At Your Data! Once you have aligned reads and putative peaks called, it is important to actually look at your data and see if it looks believable. Peak calling software will often still call peaks from failed experiments! IGV, UCSC Genome Browser, Galaxy Track Browser are great tools for experimental validation

Looking at your Data [Figure: genome browser tracks for RNA, H3K27ac, H3K27me3, H3K4me3, H3K4me1, annotation, and called peaks, with one false peak marked] Are your peaks enriched over the background? Do your technical replicates look similar? Are your peaks associated with genes? Are your peaks similar to existing data sets? UCSC Genome Browser Amazing resource of epigenomics data and genome annotations Visualize your ChIP data on UCSC to compare to existing tracks Can set up a public/private “track hub” for publication of your data

Differential Peaks Identifying when peak sets have changed between conditions can be tricky For comparisons with lots of (+/-) peak changes, just subtract the peak-sets to find new or missing peaks in one sample To spot changes in peak magnitude (+ / ++), more advanced methods are required Most involve re-calling peaks using experimental sample as the IP and using another sample IP as the input. HOMER has more advanced differential peak finding mechanisms and can utilize biological replicates Linking of differential regulatory peaks to differential genes interrogates each step of a biological process
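
For presence/absence (+/-) changes, "subtracting" peak sets means keeping the peaks in one sample that overlap nothing in the other. A minimal sketch, assuming peaks are (chrom, start, end) tuples with half-open coordinates; the function names are hypothetical and this does not address changes in peak magnitude:

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def unique_peaks(peaks_a, peaks_b):
    """Peaks in sample A with no overlapping peak in sample B."""
    return [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]


control = [("chr1", 100, 300), ("chr1", 5000, 5200)]
treated = [("chr1", 150, 350), ("chr2", 5000, 5200)]
print(unique_peaks(treated, control))  # peaks gained in the treated sample
print(unique_peaks(control, treated))  # peaks lost in the treated sample
```

The nested loop is fine for illustration; real peak sets are usually handled with sorted intervals or a dedicated tool.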

Co-binding factor finding Some experiments may want to identify positions where two factors both bind at one location Example: TCF4 is a repressor when bound alone, or an activator when bound together with Beta-catenin Example: H3K4me3+H3K27ac dual peak indicates active promoters Galaxy intersect tool, bedtools intersect, or HOMER mergePeaks can identify these co-bound sites
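
Conceptually, finding co-bound sites is an interval intersection. A minimal pure-Python sketch of that operation on toy coordinates; it illustrates the idea only and is not how the Galaxy tool, bedtools, or HOMER implement it (`intersect_peaks` is a hypothetical name):

```python
def intersect_peaks(peaks_a, peaks_b):
    """Return the shared (chrom, start, end) region for every overlapping pair.
    Coordinates are assumed half-open, as in BED files."""
    shared = []
    for chrom_a, start_a, end_a in peaks_a:
        for chrom_b, start_b, end_b in peaks_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap
                shared.append((chrom_a, start, end))
    return shared


h3k4me3 = [("chr1", 1000, 2000), ("chr2", 500, 900)]
h3k27ac = [("chr1", 1500, 2500), ("chr2", 1000, 1200)]
print(intersect_peaks(h3k4me3, h3k27ac))  # candidate active promoters
```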

Peak Annotation Now that we have peaks, what do they mean? Many peaks intersect promoters directly, but some may be 100 kb+ from the nearest gene Different interpretations may lead to different conclusions Typical promoter region (mouse) -5kb/+2kb Typical regulatory domain +/- 100kb GREAT genome tool is a good place to start for Human, Mouse, Zebrafish Identifies nearest genes and performs Gene Ontology Analysis HOMER has advanced peak annotation scripts: annotatePeaks.pl
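
The promoter and regulatory-domain windows above can be applied as a simple nearest-TSS rule. A minimal sketch, assuming a gene list of (name, TSS, strand) on the peak's chromosome; the labels, the function name, and the rule itself are illustrative assumptions and do not reproduce GREAT or annotatePeaks.pl:

```python
def annotate_peak(peak_center, genes, upstream=5000, downstream=2000, domain=100_000):
    """genes: list of (name, tss, strand) on the same chromosome as the peak.
    Returns (nearest gene, signed distance to its TSS, label)."""
    name, tss, strand = min(genes, key=lambda g: abs(peak_center - g[1]))
    # Negative offset = upstream of the TSS in the gene's own orientation
    offset = (peak_center - tss) if strand == "+" else (tss - peak_center)
    if -upstream <= offset <= downstream:
        label = "promoter"            # -5kb/+2kb window
    elif abs(offset) <= domain:
        label = "regulatory domain"   # within +/- 100kb
    else:
        label = "distal"
    return name, offset, label


genes = [("GeneA", 10_000, "+"), ("GeneB", 300_000, "-")]  # made-up genes
print(annotate_peak(7_000, genes))
print(annotate_peak(60_000, genes))
print(annotate_peak(303_000, genes))
```

Changing the window sizes changes which peaks count as promoter-associated, which is one reason different interpretations can lead to different conclusions.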

Motif-finding Differential peak sets can be submitted for motif analysis to find enriched motifs TF ChIP Motif scanning can reveal novel binding sites or validate ChIP results with known binding sites Histone ChIP motif scanning in differential peaks can identify the active regulatory proteins responsible HOMER and MEME-ChIP are great ChIP motif finders Covered in detail this afternoon
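
Scanning peak sequences for a known motif can be sketched with a consensus string and IUPAC codes. This is a simplified illustration of known-motif scanning on both strands, not the enrichment statistics that HOMER or MEME-ChIP compute; the consensus and sequence below are made up for the example:

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def motif_hits(seq, consensus):
    """Return sorted (position, strand) hits; positions are 0-based on the
    forward strand and mark the leftmost base covered by the motif."""
    # Lookahead so that overlapping matches are all reported
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    width = len(consensus)
    hits += [(len(seq) - m.start() - width, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


print(motif_hits("CCGATAAGGTTATCGG", "GATAR"))
```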

Outline Epigenomics background ChIP experimental design ChIP bioinformatics Other chromatin assays Higher order chromatin assays

DNase sensitivity assays are antibody free The first approach: from Crawford, 2006 (Francis Collins’ laboratory) Digest with DNase I to “erase” all the hypersensitive regions Polish and ligate the remaining double-strand ends Ligate 5’-biotinylated linkers to the DS ends Shear (sonicate) or restriction-digest DNA into smaller fragments Purify end sequences on a streptavidin column Release sequences, add new linkers, and sequence Does not allow footprinting, because TF binding sites inside the HS regions have been digested away

Latest (and better) approach: sequences DNase sensitive regions per se and permits transcription factor “Footprinting” The easiest method uses low concentrations of DNase I to generate short fragments at sensitive (“open”) sites Released fragments can be blunt-ended, ligated to linkers and sequenced directly Permits DNase Footprinting: Very deep sequencing can “see” short protected regions that are absent from the released DNA; these appear as “valleys” inside the DNase sensitive peaks, protected from DNase I because they are occupied by TF proteins
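
The "valley inside a peak" idea can be sketched as a search for runs of low cut counts within a hypersensitive region. A minimal sketch on toy per-base counts; the thresholds and the function name are assumptions chosen for illustration, not a published footprinting method:

```python
def find_footprints(cuts, min_width=6, max_fraction=0.2):
    """cuts: per-base DNase I cut counts across one hypersensitive peak.
    Returns (start, end) runs, half-open, where counts stay at or below
    max_fraction of the peak's mean count for at least min_width bases."""
    threshold = max_fraction * sum(cuts) / len(cuts)
    footprints, run_start = [], None
    # The infinite sentinel closes a run that reaches the end of the peak
    for i, count in enumerate(list(cuts) + [float("inf")]):
        if count <= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_width:
                footprints.append((run_start, i))
            run_start = None
    return footprints


# Toy peak: heavy cutting on both sides of an 8 bp protected stretch
profile = [20] * 10 + [1] * 8 + [20] * 10
print(find_footprints(profile))
```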

Related methods and twists on the theme (see Furey et al., 2012 for review) Exo-ChIP Follows sonication with an exonuclease step, to “pare back” all but the protein-protected region in ChIP “Nano-ChIP” Methods ChIP normally required ~10^7 cells as input; hard to achieve for many cell types Nano ChIP can be carried out in several ways: With carrier DNA: not the best for sequence analysis but can be done Amplification after ChIP: very tricky because it can cause serious biases and artifacts, but can be done with care; linear amplification is the best strategy ATAC and Tagmentation: a new method that creates libraries directly by transposon insertion The problem is library preparation, which needs a minimal amount of input for success

ATAC-seq and Tagmentation Uses transposase that has been modified to insert Illumina sequencing primers On untreated DNA, prefers to insert in open chromatin and is known as ATAC-seq (Greenleaf, 2013) Needs to be done on freshly collected tissue

ChIP Tagmentation Tagmentation can also be used to insert sequencing tags into immunoprecipitated DNA after or during ChIP (Schmidl, 2015) This allows you to make ChIP libraries from very small numbers of cells 50,000 or fewer!

Tagmentation Bioinformatics Bioinformatics pipeline from tagmented samples is similar to ChIP, but may require read trimming to remove transposon/index contamination Due to increased PCR amplification / bias in tagmentation vs traditional ChIP, it may be difficult to compare peak magnitudes Biological replicates strongly recommended

DNA Methylation Assays Bisulphite Sequencing (Review Li, 2011) Treatment of DNA with bisulphite followed by PCR amplification allows detection of methylated regions Requires very deep sequencing ($$$) Methyl-DIP (Weber, 2005) Enriches for methylcytosine using antibodies like ChIP Analysis is identical to standard ChIP Methyl-Binding-Domain Capture (Review Nair, 2011) Uses beads coated with MBD protein that bind methylated DNA directly Analysis is also identical to standard ChIP
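
Bisulphite treatment converts unmethylated cytosines to uracil, which reads as T after PCR, while methylated cytosines still read as C. The methylation level at a site is therefore the fraction of reads that still show C. A minimal sketch with a hypothetical helper name and made-up counts:

```python
def methylation_level(c_reads: int, t_reads: int):
    """Fraction of reads supporting a methylated cytosine at one site.
    Returns None when the site has no coverage."""
    total = c_reads + t_reads
    return c_reads / total if total else None


# A site covered by 20 reads, 15 of which still read C after conversion
print(methylation_level(15, 5))
```

Each site needs enough reads for this fraction to be meaningful, which is part of why deep sequencing is required.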

Outline Epigenomics background ChIP experimental design ChIP bioinformatics Other chromatin assays Higher order chromatin assays

Back to the nucleus: Distant regulatory elements interact with promoters (and each other) through long-range chromatin loops Regulatory elements are essentially “docking sites” for specific types of DNA-binding proteins Transcription factors, TATA-binding factors, and others These proteins serve to attract co-factors, which then mediate protein:protein interactions across chromatin loops Very long range interactions are common in vertebrates, less so in invertebrate species with higher coding:noncoding ratios ChIP with an antibody that binds to “E” DNA will bring down “P” DNA as well Proteins are crosslinked very efficiently to each other, as well as to DNA, by formaldehyde treatment When crosslinking is reversed the complex falls apart, and both DNA fragments are released independently Only one sequence binds to the TF! Common issue in analysis of ChIP [Figure labels: TF; shear chromatin (sonication or restriction enzyme)]

Chromatin conformation capture methods can identify these loop-linked sequences Ends of the co-captured DNA fragments are ligated while still captured on the antibody-bead with protein complex DNA is released and can be: Queried by PCR for enrichment of suspected candidate interactors Circularized and PCR amplified using a primer from a “bait” region (4C) Directly sequenced for all X all interactions (5C, Hi-C, ChIA-PET) Issues include random co-ligation between fragments that are not truly connected in the cell, and over-crosslinking, which may incorrectly join sequences that are merely nearby Provides a view of 3-D chromatin architecture, especially important for mammalian cells [Figure credit: Wikipedia]
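
For all-by-all data, ligated read pairs are typically summarized as a binned contact matrix. A minimal sketch for one chromosome, assuming each pair is given as two 0-based positions; the function name, bin size, and toy pairs are illustrative only, and no normalization is applied:

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Count ligation pairs per pair of bins; the matrix is symmetric."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


# Two pairs link the first and last kilobase; one pair is local
pairs = [(100, 2500), (150, 2900), (1200, 1300)]
for row in contact_matrix(pairs, chrom_length=3000, bin_size=1000):
    print(row)
```

Counts far from the diagonal are the candidate long-range loops; random co-ligation adds low-level background everywhere.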

Genomics Methods Summary
RNA: RNA-Seq, GRO-Seq
DNA: Variant Calling
Chromatin State: DNase + ATAC
Histone Modifications: Histone ChIP
Transcription Factors: TF ChIP
Chromatin Structure: 3C + 4C + HiC
DNA Methylation: Bisulphite, MBP/MethylDIP

Applied Genomics: Research Methods
Social Stimulus: Animal Behavior (Behavioral Testing and Scoring)
Molecular Response: Transcription Factor Activation (Epigenetics and motif analysis)
Genetic Response: Differential Gene Expression (RNASeq and qPCR)
Development: Developmental regulators (Epigenetics and motif analysis)
Applied Genomics: Research Findings
Nuclear Receptor TFs enter nucleus, bind DNA
Neurotransmitters, hormonal signaling
Dormant Developmental Pathways activated, social learning
Animals are stressed!
Saul*, Seward* et al, Genome Research, 2017

Questions?