Chromatin basics & ChIP-seq analysis

BS312 – Genome Bioinformatics, Lecture 5: Chromatin basics & ChIP-seq analysis. Vladimir Teif

Next generation sequencing analysis

Chromatin basics -- reminder https://micro.magnet.fsu.edu/cells/nucleus/images/chromatinstructurefigure1.jpg

Transcription factor-centric view. Figure labels: transcription factor (TF) concentrations; protein assembly at regulatory regions; transcription start site; proteins produced (including TFs). Teif et al. (2013), Methods, 62, 26-38

Transcription factor-centric view. Figure labels: transcription factor (TF) concentrations; enhancer; RNA polymerase (the enzyme which makes RNA); promoter; proteins produced (including TFs). Teif et al. (2013), Methods, 62, 26-38

Histone modifications-centric view Turner B.M. (2005) Nature Structural & Molecular Biology, 12, 110 - 112

Histone modifications-centric view http://dev.biologists.org/content/139/6/1045

NGS METHODS AND THEIR APPLICATIONS Chromatin domains Hi-C Figure adapted from http://www.scienceinschool.org

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing)
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks
4. Release DNA and submit for sequencing
Adapted from www.VisiScience.com

MNase-seq (Micrococcal Nuclease digestion followed by sequencing). MNase = Micrococcal Nuclease (an enzyme that cuts DNA between nucleosomes). Teif et al. (2013), Methods, 62, 26-38

FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements followed by sequencing) Giresi et al. (2007), Genome Res. 17, 877–885

DNase-seq (DNase I digestion followed by sequencing) Wang et al. (2012), PLoS ONE 7, e42414

ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) How transposase works: https://www.youtube.com/watch?v=XYZHMGUGq6o Buenrostro et al. (2013) Nat Methods. 10, 1213-1218

Methods for 1D genome mapping. Meyer & Liu, Nature Reviews Genetics 15, 709–721 (2014)

Methods for 1D genome mapping. Tsompana and Buck, Epigenetics & Chromatin (2014) 7:33

Timeline of NGS methods. Bulk methods that require many cells: Rivera and Ren (2013), Cell, 155, 39-55. Single-cell methods: Hu et al., Front. Cell Dev. Biol., 2018

Where to get NGS data?
Do your own experiment
Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo
Sequence Read Archive (SRA) https://www.ncbi.nlm.nih.gov/sra
European Nucleotide Archive https://www.ebi.ac.uk/ena
The Cancer Genome Atlas (TCGA) https://tcga-data.nci.nih.gov/tcga
Exome Aggregation Consortium (ExAC) http://exac.broadinstitute.org/
You also have to upload your data!

How to analyze NGS data?
Ask a bioinformatician: you need to explain what you want, and for that you need to understand what can be done and how
Do it yourself:
Command line -> become a bioinformatician
Online wrappers -> simpler, but with file size limits
Example of a convenient online tool: Galaxy http://galaxy.essex.ac.uk/

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing)
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks
4. Release DNA and submit for sequencing
Adapted from www.VisiScience.com

Experiment Data analysis http://www4.utsouthwestern.edu/mcdermottlab/NGS/index.html

ChIP-seq analysis workflow www.utsouthwestern.edu/labs.bioinformatics-core/analysis/chip-seq.png

NGS data after sequencing but before mapping: .fastq file (aka “raw” data)
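The FASTQ layout can be illustrated with a short Python sketch. Each read occupies four lines: an identifier starting with @, the sequence, a separator line starting with +, and a quality string of the same length as the sequence. The record below is invented for illustration, and the Phred+33 quality encoding is an assumption that holds for most current Illumina data but should be checked for older files.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ block (assumes Phred+offset qualities)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Each quality character encodes one integer score per base.
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


# Hypothetical single-read example (not taken from a real dataset).
example = [
    "@read_1",
    "GATTACAGATTACA",
    "+",
    "IIIIIIIIII####",
]
records = list(parse_fastq(example))
for rec in records:
    print(rec.name, rec.sequence, min(rec.qualities), max(rec.qualities))
```

In this toy record the first ten bases have high quality scores and the last four have very low ones, which is the typical pattern that read-trimming tools look for.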

Mapping with Bowtie http://bowtie-bio.sourceforge.net/manual.shtml
-v <N> Allow no more than N mismatches, where N may be a number from 0 through 3
-p <N> Use N computer processors/cores in parallel
-m <N> Disregard reads with more than N possible alignments

Guess what this command does:
bowtie -v 2 -p 2 -m 1 mm9 filename.fastq filename.map
-v <N> Allow no more than N mismatches, where N may be a number from 0 through 3
-p <N> Use N computer processors/cores in parallel
-m <N> Disregard reads with more than N possible alignments
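To make the meaning of -v and -m concrete, here is a toy Python sketch. It is not how Bowtie works internally (Bowtie uses a genome index, and this sketch scans only the forward strand by brute force); it only mimics the two filtering rules on a short invented reference. All function names and sequences are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_positions(read: str, reference: str, max_mismatches: int) -> List[int]:
    """All start positions where the read fits with at most max_mismatches (like -v)."""
    n = len(read)
    return [
        start
        for start in range(len(reference) - n + 1)
        if hamming(read, reference[start:start + n]) <= max_mismatches
    ]


def keep_read(read: str, reference: str,
              max_mismatches: int = 2, max_alignments: int = 1) -> bool:
    """Keep a read only if it maps, and to no more than max_alignments places (like -m)."""
    hits = candidate_positions(read, reference, max_mismatches)
    return 0 < len(hits) <= max_alignments


# Invented reference in which "ACGTTG" occurs twice and "GGATCC" once.
reference = "ACGTTGCAAGGATCCTTACGTTGCA"

print(candidate_positions("GGATCC", reference, 0))
print(candidate_positions("ACGTTG", reference, 0))
print(keep_read("ACGTTG", reference, max_mismatches=2, max_alignments=1))
```

With -m 1, a read that fits equally well in two places is discarded, because we cannot tell which location it came from. This is why repetitive regions of the genome often show no signal after mapping.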

NGS data after mapping: .bed files (BED format). Alignment tools: Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW; TopHat (for RNA-seq)

Reads can align to overlapping locations, so we need to count all reads at each base pair. http://biocluster.ucr.edu/~rkaundal/workshops/R_feb2016/ChIPseq/ChIPseq.html
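Counting reads at each base pair can be sketched in a few lines of Python. The example assumes BED-style coordinates (0-based start, end not included) and uses invented read positions on a 10 bp toy chromosome; real tools such as those named on the next slide do the same bookkeeping at genome scale.

```python
from typing import List, Tuple


def per_base_coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Return the number of reads covering each position 0..length-1.

    Reads are (start, end) pairs, 0-based and half-open as in BED files.
    """
    # Difference array: +1 where a read starts, -1 just after it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    coverage = []
    running = 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage


# Three hypothetical overlapping reads.
reads = [(0, 4), (2, 6), (3, 8)]
coverage = per_base_coverage(reads, 10)
print(coverage)  # [1, 1, 2, 3, 2, 2, 1, 1, 0, 0]
```

The resulting list of numbers is the occupancy landscape: one value per base pair, highest where most reads pile up.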

From mapped reads to occupancy landscapes HOMER, BedTools, BamTools, NucTools Teif et al., Methods, 2012

Calculating occupancy with HOMER http://homer.ucsd.edu/homer/ngs/tagDir.html
makeTagDirectory <Directory Name> [options] <alignment file>

Quality control (QC) http://homer.ucsd.edu/homer/ngs/tagDir.html

Quality control (QC): good ChIP-seq vs. bad ChIP-seq http://homer.ucsd.edu/homer/ngs/tagDir.html

Data view in genome browsers Jung et al., NAR 2014 UCSC Genome Browser (online) IGV (install on a local computer)

UCSC Genome Browser https://genome.ucsc.edu/

Create UCSC files with HOMER http://homer.ucsd.edu/homer/ngs/ucsc.html
makeUCSCfile <tag directory> -o auto

Peak shapes can be different. Park P. J., Nature Reviews Genetics, 2009

Systematic analysis requires identifying all peaks in all datasets and comparing the differences. Bardet et al. (2012) Nature Protocols, 7, 45-61

Peak calling is a method to identify areas in a genome enriched with aligned reads Wilbanks EG (2010) PLoS ONE 5, e11471.

Peak calling: finding the peaks. Input: a control sample prepared in the same way as the ChIP-seq sample but without the antibody, so it has no specific enrichment of our protein of interest. Pepke et al. (2009), Nature Methods, 6, S22–S32.

Peak calling: defining statistical significance

Peak calling: defining statistical significance. Is this peak statistically significant? Tools: MACS (good for TFs), SICER (histones, etc.), HOMER (universal), PeakSeq, edgeR, CisGenome. Park P. J., Nature Reviews Genetics, 2009

Finding peaks with HOMER http://homer.ucsd.edu/homer/ngs/peaks.html

Guess what this command does:
findPeaks ChIPDirectory -style factor -i InputDirectory
We need to map our ChIP-seq and its Input (control), then create their HOMER tag directories ChIPDirectory and InputDirectory, then find peaks using both directories.
Additional optional parameters:
-F <#> Enrichment ratio ChIP vs. Input (by default 4-fold)
-P <#> P-value cutoff (by default 0.0001)
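The two thresholds above can be illustrated with a simplified Python sketch. This is not HOMER's actual implementation: it assumes that read counts in a candidate region follow a Poisson distribution whose mean is the library-size-scaled Input count, and it uses an arbitrary floor of one read for the expected count. The function names, read counts and library sizes are invented.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Start from P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if i > lam and term <= total * 1e-16:
            break
    return min(total, 1.0)


def call_peak(chip_count: int, input_count: int,
              chip_total: int, input_total: int,
              min_fold: float = 4.0, max_p: float = 1e-4) -> bool:
    """Apply a fold-enrichment and a Poisson p-value threshold to one region."""
    # Scale the Input count to the ChIP library size; the floor of 1.0 is an assumption.
    expected = max(input_count * chip_total / input_total, 1.0)
    fold = chip_count / expected
    p_value = poisson_tail(chip_count, expected)
    return fold >= min_fold and p_value <= max_p


# Hypothetical regions, with equal library sizes for simplicity.
total = 1_000_000
print(call_peak(40, 5, total, total))  # strong enrichment
print(call_peak(12, 5, total, total))  # fold enrichment below 4
print(call_peak(4, 1, total, total))   # 4-fold, but too few reads to be significant
```

The last case shows why both thresholds are useful: a region with 4 reads against 1 in the Input reaches the 4-fold cutoff, yet such a small count can easily arise by chance.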

ChIP-seq: reads to peaks/regions. MACS, SICER, HOMER, PeakSeq, edgeR, DESeq, CisGenome

Peaks/regions in BED format
pos2bed.pl peakfile.txt > peakfile.bed
bed2pos.pl peakfile.bed > peakfile.txt

Intersecting genomic regions BedTools (command line) Galaxy (online)
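What "intersecting" means can be shown with a minimal Python sketch. It compares every region of one set with every region of the other, which is fine for a toy example but far slower than what BedTools does on real data. Coordinates are assumed to be BED-style (0-based, end not included), and the peak and promoter lists are invented.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def intersect(a: List[Region], b: List[Region]) -> List[Region]:
    """Return the overlapping part of every pair of regions from a and b."""
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # regions that only touch do not overlap
                overlaps.append((chrom_a, start, end))
    return overlaps


# Hypothetical ChIP-seq peaks and promoter regions.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
promoters = [("chr1", 150, 250), ("chr1", 600, 700), ("chr2", 50, 120)]

shared = intersect(peaks, promoters)
print(shared)  # [('chr1', 150, 200), ('chr2', 100, 120)]
```

The same operation answers questions such as "which of my peaks fall into promoters?" or "which peaks are shared between two cell types?".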

Genomic features are also regions Mattout et al., Genome Biology, 2015

Let’s look at many similar regions. Each horizontal line is one genomic region. deepTools, NucTools https://github.com/fidelram/deepTools/wiki/Visualizations

ChIP-seq heat maps for all genes, scaled with respect to their start (TSS) and end (TES) https://github.com/fidelram/deepTools/wiki/Visualizations
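Scaling genes of different lengths to a common TSS-to-TES axis can be sketched as follows. This is a simplified illustration of the idea and not the deepTools code: each gene is cut into the same number of bins, the signal is averaged within each bin, and genes on the minus strand are flipped so that the TSS is always on the left. The signal track and gene coordinates are invented.

```python
from typing import List


def scale_region(signal: List[float], start: int, end: int,
                 strand: str, n_bins: int) -> List[float]:
    """Average signal[start:end] in n_bins equal-width bins, oriented TSS -> TES."""
    length = end - start
    if length < n_bins:
        raise ValueError("region is shorter than the number of bins")
    row = []
    for b in range(n_bins):
        lo = start + (b * length) // n_bins
        hi = start + ((b + 1) * length) // n_bins
        window = signal[lo:hi]
        row.append(sum(window) / len(window))
    # On the minus strand the TSS is at the right end, so flip the row.
    return row[::-1] if strand == "-" else row


def aggregate_profile(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of the heat map: the average profile over all regions."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy occupancy track and two hypothetical genes of different length and strand.
signal = [float(i) for i in range(12)]
genes = [(2, 10, "+"), (0, 4, "-")]

heatmap = [scale_region(signal, s, e, strand, n_bins=4) for s, e, strand in genes]
profile = aggregate_profile(heatmap)
print(heatmap)  # [[2.5, 4.5, 6.5, 8.5], [3.0, 2.0, 1.0, 0.0]]
print(profile)  # [2.75, 3.25, 3.75, 4.25]
```

Each row of the heat map is one gene; averaging the rows column by column gives the aggregate profile.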

Cluster heatmaps deepTools 2.0 https://github.com/fidelram/deepTools/wiki/Visualizations

Comparing cluster heatmaps between two cell conditions NucTools

Histone modifications around TSS deepTools http://www.ie-freiburg.mpg.de/bioinformaticsfac

Motif enrichment analysis HOMER, MEME Pavlaki et al., 2017

Finding motifs with HOMER HOMER takes the coordinates of all ChIP-seq peaks, looks at the corresponding DNA sequences of each peak and finds the common consensus motifs that are encountered in many of these peaks. Then HOMER looks in a database and reports which motifs are similar to already known TF binding motifs, and which motifs are new.
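The core idea of comparing peak sequences with background sequences can be illustrated with a toy Python sketch. HOMER's actual algorithm is considerably more elaborate; here we simply count in how many sequences each exact k-mer occurs and rank k-mers by the ratio between peaks and background. The pseudocount, the function names and all sequences are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple


def sequences_with_kmer(seqs: List[str], k: int) -> Counter:
    """For each k-mer, count the number of sequences that contain it at least once."""
    counts: Counter = Counter()
    for seq in seqs:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts


def enriched_kmers(peaks: List[str], background: List[str], k: int,
                   pseudocount: float = 1.0) -> List[Tuple[str, float]]:
    """Rank k-mers found in peaks by their enrichment over background sequences."""
    fg = sequences_with_kmer(peaks, k)
    bg = sequences_with_kmer(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_frac = (n + pseudocount) / (len(peaks) + pseudocount)
        bg_frac = (bg[kmer] + pseudocount) / (len(background) + pseudocount)
        scores[kmer] = fg_frac / bg_frac
    # Highest enrichment first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented peak sequences that share one 7-mer, and invented background sequences.
peak_seqs = ["AATGACTCATT", "CCTGACTCAGG", "GTTGACTCACA"]
background_seqs = ["AATCGGCTAAT", "CCGTTAGGCAG", "GTACCGATACA"]

ranking = enriched_kmers(peak_seqs, background_seqs, k=7)
print(ranking[0])  # ('TGACTCA', 4.0)
```

The top-ranked k-mer is the one shared by all peak sequences and absent from the background. A real tool would then compare it with a database of known TF binding motifs.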

http://meme-suite.org The MEME Suite is even more sophisticated and contains all tools that are needed for motif analysis

Summary of ChIP-seq analysis:
Map all reads
Occupancy calculation
Differential peak calling
Intersection of different signals
Correlation of different signals
Motif enrichment in peaks

Take home message
Raw reads -> mapping -> peak calling
MUST KNOW:
Where NGS data is stored (GEO, etc.)
~100s of types of NGS experiments; we focus on chromatin
ChIP-seq data structure
RAW DATA; MAPPED READS; REGIONS; SITES
GENOME BROWSERS. PEAKS. PEAK CALLING
HEATMAP; AGGREGATE PROFILE; GENE ONTOLOGY (GO)
Optional video: https://www.youtube.com/watch?v=Ob9xGBPvr_s