Analysis of ChIP-Seq Data
Biological Sequence Analysis, BNFO 691/602, Spring 2014
Mark Reimers

Analysis of ChIP-Seq Data
Genomic Data Analysis Course, Moscow, July 2013
Mark Reimers, Ph.D.

What Are the Questions?
- Where are histone modifications located?
- Where do TFs bind to DNA?
- Where do miRNAs or RNA-binding proteins (RBPs) bind to 3' UTRs?
- How much does binding differ between samples?

Why ChIP-Seq?
ChIP-Seq is ideal for mapping the locations where regulatory proteins bind DNA, and it is now the standard method.
- There are typically 'only' 2,…,000 active binding sites, each with a footprint of ~… base pairs
ChIP-Seq is similarly efficient for mapping uncommon histone modifications and RNA polymerase occupancy, because the genomic regions they occupy are very narrow.

Chromatin Immunoprecipitation
(Figure from Massie, EMBO Reports, 2008)
Chromatin immunoprecipitation (ChIP) is a method for selecting DNA fragments that lie near specific proteins or specific histone modifications.

Chromatin Immunoprecipitation: Steps
1. Proteins are cross-linked to DNA by formaldehyde or by UV light. NB: proteins become cross-linked to each other even more than to DNA.
2. The DNA is fragmented.
3. Antibodies are introduced. NB: cross-linking may disrupt the epitopes the antibodies recognize.
4. The antibodies, with their bound chromatin, are pulled out (often on magnetic beads).
5. The DNA is released and sequenced.

CLIP-Seq: A Related Assay
Cross-linking immunoprecipitation sequencing (CLIP-Seq) is used to map the locations of RNA-binding proteins on mRNA. Even miRNA binding can be mapped indirectly, by CLIP-Seq with antibodies raised against Argonaute, the protein that binds miRNAs in the silencing complex.

What ChIP-Seq Data Look Like
(Figure from Rozowsky et al., Nature Biotechnology, 2009)

The Value of Controls: ChIP vs. Control Reads
In the figure, red dots are windows containing ChIP peaks and black dots are windows containing control peaks, which are used for the FDR calculation.
NB: non-specific enrichment depends on the protocol, so controls are needed for every batch run.
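The slide does not spell out how control peaks enter the FDR calculation. One common approach is to call peaks a second time with the ChIP and control roles swapped, and to treat "peaks" found in the control as false positives: at a score threshold t, FDR(t) is estimated as (control peaks scoring >= t) / (ChIP peaks scoring >= t). The minimal sketch below assumes both peak lists were scored the same way and that sequencing depth was already accounted for; the function name `empirical_fdr` is hypothetical.

```python
from bisect import bisect_left


def empirical_fdr(chip_scores, control_scores):
    """Estimate FDR at each distinct ChIP peak score, assuming control peaks come
    from a swapped (control vs. ChIP) peak call scored on the same scale."""
    chip = sorted(chip_scores)
    ctrl = sorted(control_scores)
    fdr = {}
    for t in sorted(set(chip)):
        n_chip = len(chip) - bisect_left(chip, t)  # ChIP peaks scoring >= t
        n_ctrl = len(ctrl) - bisect_left(ctrl, t)  # control "peaks" scoring >= t
        fdr[t] = min(1.0, n_ctrl / n_chip)         # cap the ratio at 1
    return fdr


# Five ChIP peaks, three swapped-control peaks
print(empirical_fdr([5, 8, 12, 20, 30], [5, 6, 9]))
```

Raising the threshold from 5 to 8 drops the estimate from 3/5 to 1/4, and no control peak reaches 12. Real pipelines often also force the estimate to be monotone in t; that step is omitted here.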

Goals of Analysis
1. Identify genomic regions ('peaks') where TFs bind or histones are modified
2. Quantify and compare levels of binding or histone modification between samples
3. Characterize the relationships between chromatin state and gene expression or splicing

General Characteristics of ChIP-Seq Data
- Fragments are quite large relative to TF binding sites
- ChIP-exo (ChIP followed by exonuclease treatment) trims fragments back to within a few bases of the bound protein, which sharpens resolution
- Histone modifications cover broader regions of DNA than TF binding sites do
- Histone modification signals often undulate, following well-positioned nucleosomes

ChIP Reads Pile Up in 'Peaks' on Opposite Strands Flanking TF Binding Sites
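Because each read covers only the 5' end of its fragment, plus-strand reads pile up just upstream of a binding site and minus-strand reads just downstream. One widely used way to estimate the distance between the two piles is strand cross-correlation: slide the minus-strand counts against the plus-strand counts and take the shift d with the highest correlation. Shifting every read d/2 toward the fragment centre then stacks both strands over the site. The sketch below assumes 0-based positions on a small toy genome, with minus-strand reads given by their 5' (rightmost) coordinate; the function names are hypothetical.

```python
from math import sqrt


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def estimate_shift(plus_5p, minus_5p, genome_len, max_shift):
    """Shift d (bp) maximizing plus/minus strand correlation; d approximates fragment length."""
    plus = [0] * genome_len
    minus = [0] * genome_len
    for p in plus_5p:
        plus[p] += 1
    for m in minus_5p:
        minus[m] += 1
    best_d, best_r = 0, float("-inf")
    for d in range(max_shift + 1):
        # compare plus-strand counts at x with minus-strand counts at x + d
        r = _pearson(plus[: genome_len - d], minus[d:])
        if r > best_r:
            best_d, best_r = d, r
    return best_d


def shifted_centres(plus_5p, minus_5p, d):
    """Move every read's 5' end half the estimated shift toward the fragment centre."""
    half = d // 2
    return sorted([p + half for p in plus_5p] + [m - half for m in minus_5p])


# Toy data: three sites, reads 75 bp upstream (+) and downstream (-) of each
sites, jitter = [300, 900, 1500], [-3, -1, 0, 2, 4]
plus = [s - 75 + j for s in sites for j in jitter]
minus = [s + 75 + j for s in sites for j in jitter]
d = estimate_shift(plus, minus, 2000, 300)
print(d, shifted_centres(plus, minus, d)[:5])
```

Here the two strands line up at d = 150, and after shifting, both strands' reads sit on the binding sites.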

ChIP-Seq for Transcription Factors
There are typically several thousand distinct peaks across the genome. It is not clear how many of the lower peaks represent low-affinity binding sites.
(From Rozowsky et al., Nature Biotechnology, 2009)

ChIP-Seq for Polymerase
Fine mapping of Pol II occupancy shows peaks at the 5' and 3' ends of genes.
(From Rahl et al., Cell, 2010)
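Profiles like the 5'/3' Pol II pattern are often summarized as a "metagene": each gene body is rescaled to a fixed number of bins, oriented 5' to 3', and coverage is averaged across genes. The sketch below is a simplified illustration under assumed inputs (per-base coverage for one chromosome as a list, genes as 0-based half-open `(start, end, strand)` tuples, no flanking regions); it is not the method of the cited paper, and `metagene_profile` is a hypothetical name.

```python
def metagene_profile(coverage, genes, n_bins=10):
    """Mean coverage per bin after rescaling each gene body to n_bins bins (5' -> 3').

    coverage: per-base read depth along one chromosome (toy data).
    genes: (start, end, strand) tuples, 0-based half-open coordinates.
    """
    totals = [0.0] * n_bins
    used = 0
    for start, end, strand in genes:
        length = end - start
        if length < n_bins:
            continue  # too short to split into n_bins non-empty bins
        profile = []
        for b in range(n_bins):
            lo = start + b * length // n_bins
            hi = start + (b + 1) * length // n_bins
            seg = coverage[lo:hi]
            profile.append(sum(seg) / len(seg))
        if strand == "-":
            profile.reverse()  # minus-strand genes run right to left
        for b, v in enumerate(profile):
            totals[b] += v
        used += 1
    return [t / used for t in totals] if used else totals


# Two toy genes, one per strand, each with a high 5' end and a smaller 3' peak
cov = [10] * 10 + [1] * 80 + [5] * 10 + [5] * 10 + [1] * 80 + [10] * 10
print(metagene_profile(cov, [(0, 100, "+"), (100, 200, "-")]))
```

Reversing minus-strand genes is what lets their 5' ends line up with those of plus-strand genes in the average.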

ChIP-Seq for Histone Modifications
Many histone modifications cover longer stretches of DNA rather than forming sharp peaks. Different modifications may have different profiles, and it is not clear how best to compare them.
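For broad marks, one option is to flag fixed-size windows as enriched (by any per-window test) and then merge nearby enriched windows into domains, tolerating short gaps. This is similar in spirit to island-based broad callers such as SICER, but the sketch below is only a minimal illustration: the window size, the gap allowance, and the name `merge_enriched_windows` are assumptions.

```python
def merge_enriched_windows(enriched, window, max_gap):
    """Merge enriched window indices into broad domains.

    enriched: indices of windows already flagged as enriched.
    window: window size in bp.
    max_gap: number of non-enriched windows tolerated inside one domain.
    Returns (start_bp, end_bp) pairs, 0-based half-open.
    """
    idx = sorted(set(enriched))
    if not idx:
        return []
    domains = []
    run_start = prev = idx[0]
    for w in idx[1:]:
        if w - prev - 1 <= max_gap:
            prev = w  # gap is small enough: extend the current domain
            continue
        domains.append((run_start * window, (prev + 1) * window))
        run_start = prev = w
    domains.append((run_start * window, (prev + 1) * window))
    return domains


print(merge_enriched_windows([2, 3, 5, 6, 10, 20, 21], window=200, max_gap=1))
```

With a one-window gap allowed, windows 2 to 6 form a single 1 kb domain, while window 10 and windows 20 to 21 stay separate.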

Issues in Analysis of ChIP-Seq Data
- Many false positive peaks
  - How should controls be used in the data analysis?
  - How should reads starting at the same locus be counted?
- What are appropriate controls?
  - Naked DNA, untreated (input) chromatin, IgG
- Some DNA regions are not uniquely identifiable ('mappability')
- How should different samples be compared?
  - Overlap between the results of different peak-finding algorithms is often poor
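Two of these issues can be illustrated together. Many reads starting at exactly the same position and strand are often PCR duplicates, so a common default is to keep at most one per site. A control can then set the expected count per window: the sketch scales control counts to the ChIP sequencing depth, never lets the expected rate fall below the genome-wide background (loosely similar to how MACS uses the larger of a background and a local rate), and applies a Poisson tail test. Window size, the alpha cutoff, and all function names are illustrative assumptions, and the naive Poisson tail is only adequate for small toy counts.

```python
from collections import Counter
from math import exp


def cap_duplicates(reads, max_per_site=1):
    """Keep at most max_per_site reads per (5' position, strand)."""
    seen = Counter()
    kept = []
    for pos, strand in reads:
        if seen[(pos, strand)] < max_per_site:
            seen[(pos, strand)] += 1
            kept.append((pos, strand))
    return kept


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); naive summation, fine for small toy values."""
    term, cdf = exp(-lam), 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def call_windows(chip, control, window, genome_len, alpha=1e-5):
    """Return (window_index, chip_count, p_value) for windows enriched over the control."""
    n_win = (genome_len + window - 1) // window
    chip_counts = [0] * n_win
    ctrl_counts = [0] * n_win
    for pos, _ in chip:
        chip_counts[pos // window] += 1
    for pos, _ in control:
        ctrl_counts[pos // window] += 1
    scale = len(chip) / max(len(control), 1)  # rescale control to ChIP depth
    lam_bg = len(chip) / n_win                 # genome-wide expected reads per window
    called = []
    for w in range(n_win):
        lam = max(lam_bg, ctrl_counts[w] * scale)  # never expect less than background
        p = poisson_sf(chip_counts[w], lam)
        if p < alpha:
            called.append((w, chip_counts[w], p))
    return called


bg = [(w * 100 + 10, "+") for w in range(100)]     # one background read per window
peak = [(5000 + i, "+") for i in range(30)]         # 30 distinct reads in window 50
dups = [(5000, "+")] * 30                           # 30 copies of one read in window 50
ctrl = [(w * 100 + 50, "-") for w in range(100)]    # flat control
print(call_windows(bg + peak, ctrl, 100, 10000))
print(call_windows(cap_duplicates(bg + dups), ctrl, 100, 10000))
```

Thirty distinct reads in one window are called as a peak; thirty copies of a single read are not, once duplicates are capped.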

Mappability Issues
Many TF binding sites and histone modifications lie in low-complexity or repeat regions of DNA. With short reads (under 75 bp) that contain some sequencing errors, it may not be possible to identify uniquely (map) the locus a read came from.
UCSC provides a set of mappability tracks:
- Select Mapping and Sequencing Tracks
- Select Mapability
- 35, 40, 50 & 70-mer mappability tracks are available (some with different error allowances)
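The idea behind a k-mer mappability track can be sketched directly: a position is uniquely mappable for reads of length k if the k-mer starting there occurs nowhere else in the genome on either strand. The sketch below counts canonical k-mers (the smaller of a k-mer and its reverse complement) with exact matching only, whereas the UCSC tracks also consider mismatches; the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mappability(genome, k):
    """1 if the k-mer starting at position i is unique on both strands (exact matches), else 0."""
    kmers = []
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        kmers.append(min(kmer, revcomp(kmer)))  # canonical form covers both strands
    counts = Counter(kmers)
    return [1 if counts[km] == 1 else 0 for km in kmers]


track = mappability("AAAAAACGTC", 3)
print(track, sum(track) / len(track))
```

In this toy sequence the poly-A stretch is unmappable, and so are ACG and CGT: they are reverse complements of each other, so a read from either strand could have come from both loci.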