1
Chromatin basics & ChIP-seq analysis
BS312 – Genome Bioinformatics, Lecture 5: Chromatin basics & ChIP-seq analysis. Vladimir Teif
2
Next generation sequencing analysis
3
Chromatin basics -- reminder
4
Transcription factor-centric view
Transcription factor (TF) concentrations; Protein assembly at regulatory regions; Transcription start site; Proteins produced (including TFs)
Teif et al. (2013), Methods. 62, 26-38
5
Transcription factor-centric view
Transcription factor (TF) concentrations; Enhancer; RNA polymerase: enzyme which makes RNA; Promoter; Proteins produced (including TFs)
Teif et al. (2013), Methods. 62, 26-38
6
Histone modifications-centric view
Turner B.M. (2005) Nature Structural & Molecular Biology, 12,
7
Histone modifications-centric view
8
NGS METHODS AND THEIR APPLICATIONS
Chromatin domains Hi-C Figure adapted from
10
ChIP-seq (Chromatin Immunoprecipitation followed by sequencing)
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks
4. Release DNA and submit for sequencing
Adapted from
11
MNase-seq (Micrococcal Nuclease digestion followed by sequencing)
MNase = Micrococcal Nuclease (enzyme that cuts DNA between nucleosomes)
Teif et al. (2013), Methods, 62, 26-38
12
FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
sequencing Giresi et al (2007), Genome Res. 17, 877–885
13
DNase-seq (DNase I digestion followed by sequencing)
Wang et al. (2012), PLoS ONE 7, e42414
14
ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing)
How transposase works: Buenrostro et al. (2013) Nat Methods. 10,
15
Methods for 1D genome mapping
Meyer & Liu, Nature Reviews Genetics 15, 709–721 (2014)
16
Methods for 1D genome mapping
Tsompana and Buck, Epigenetics & Chromatin 7, 33 (2014)
17
Timeline of NGS methods
Bulk methods that require many cells: Rivera and Ren (2013), Cell, 155, 39-55
Single-cell methods: Hu et al., Front. Cell Dev. Biol., 2018
18
Where to get NGS data?
Do your own experiment
Gene Expression Omnibus (GEO)
Sequence Read Archive (SRA)
European Nucleotide Archive
The Cancer Genome Atlas (TCGA)
Exome Aggregation Consortium (ExAC)
You also have to upload your data!
19
How to analyze NGS data?
Ask a bioinformatician: you need to explain what you want, and for that you need to understand what can be done and how
Do it yourself:
Command line –> become a bioinformatician
Online wrappers –> simpler, but file size limits
Example of a convenient online tool: Galaxy
20
ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing)
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks
4. Release DNA and submit for sequencing
Adapted from
21
Experiment Data analysis
22
ChIP-seq analysis workflow
23
NGS data after sequencing but before mapping (.fastq file, aka “raw” data)
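To make the "raw" data format concrete, here is a minimal Python sketch of reading a .fastq file. It assumes the common layout of four lines per read (header starting with `@`, sequence, `+` separator, quality string) and Phred+33 quality encoding; both are assumptions about the input, and the function names are illustrative rather than part of any tool mentioned in the lecture.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or blank line) stops the iteration
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no information here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record: {header!r}")
        yield header[1:], seq, qual


def phred33(qual):
    """Convert a quality string to integer scores, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in qual]


# Two toy reads stand in for a real file opened with open("filename.fastq").
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
records = list(read_fastq(example))
```

Each record keeps the base calls together with a per-base quality score, which is why the file is much larger than the sequences alone.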
24
Mapping with Bowtie http://bowtie-bio.sourceforge.net/manual.shtml
-v <N> Allow no more than N mismatches, where N may be a number from 0 through 3
-p <N> Use N computer processors/cores in parallel
-m <N> Disregard reads with more than N possible alignments
25
Guess what this command does
bowtie -v 2 -p 2 -m 1 mm9 filename.fastq filename.map
-v <N> Allow no more than N mismatches, where N may be a number from 0 through 3
-p <N> Use N computer processors/cores in parallel
-m <N> Disregard reads with more than N possible alignments
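The effect of the mismatch limit and the multi-mapping filter can be illustrated with a toy mapper in Python. This is only a sketch of the two rules: it scans one strand of a short reference by brute force, whereas Bowtie itself uses a genome index and is far more elaborate. The function and parameter names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches, max_alignments):
    """Return all start positions where the read aligns, or [] if it is discarded."""
    hits = [
        start
        for start in range(len(reference) - len(read) + 1)
        if hamming(read, reference[start:start + len(read)]) <= max_mismatches
    ]
    # Same idea as the -m option: drop reads with too many possible alignments.
    return hits if len(hits) <= max_alignments else []


reference = "ACGTACGTTTGACCA"
unique_hit = map_read("TTGAC", reference, max_mismatches=2, max_alignments=1)
repeat_hit = map_read("ACGT", reference, max_mismatches=0, max_alignments=1)
```

Here `unique_hit` is `[8]`, while the read `ACGT` occurs twice in the reference and is therefore discarded when only one alignment is allowed.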
26
NGS data after mapping: .bed files (BED format)
Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW, TopHat (for RNA-seq)
27
Reads can align to overlapping locations
We need to count all reads at each base pair
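Counting the reads that cover each base pair can be sketched in a few lines of Python. The example assumes reads are given as BED-style half-open intervals (start included, end excluded) on a single chromosome; the function name is illustrative and is not how HOMER or BedTools implement it internally.

```python
def occupancy(reads, chrom_length):
    """Per-base-pair read coverage from (start, end) half-open intervals."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta  # running sum gives the number of reads over this base
        coverage.append(running)
    return coverage


reads = [(0, 4), (2, 6), (3, 5)]
profile = occupancy(reads, chrom_length=8)
```

For these three overlapping reads the profile is `[1, 1, 2, 3, 2, 1, 0, 0]`: the position covered by all three reads gets the value 3.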
28
From mapped reads to occupancy landscapes
HOMER, BedTools, BamTools, NucTools Teif et al., Methods, 2012
29
Calculating occupancy with HOMER
makeTagDirectory <Directory Name> [options] <alignment file>
30
Quality control (QC)
31
Quality control (QC)
Good ChIP-seq vs. bad ChIP-seq
32
Data view in genome browsers
Jung et al., NAR 2014
UCSC Genome Browser (online)
IGV (install on a local computer)
33
UCSC Genome Browser
34
Create UCSC files with HOMER
makeUCSCfile <tag directory> -o auto
36
Peak shapes can be different
Park P. J., Nature Genetics, 2009
37
Systematic analysis requires identifying all peaks in all datasets and comparing the differences
Bardet et al. (2012) Nature Protocols, 7, 45-61
38
Peak calling is a method to identify areas in a genome enriched with aligned reads
Wilbanks EG (2010) PLoS ONE 5, e11471.
39
Peak calling: finding the peaks
Input: sample that was prepared in the same way as in the ChIP-seq, but no antibody was added, so it has no specific enrichment of our protein of interest Pepke et al. (2009). Nature Methods, 6, S22–S32.
40
Peak calling: defining statistical significance
41
Peak calling: defining statistical significance
MACS (good for TFs), SICER (histones, etc.), HOMER (universal), PeakSeq, edgeR, CisGenome
Is this peak statistically significant?
Park P. J., Nature Genetics, 2009
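One common way to ask whether a peak is statistically significant is to compare the read count in the ChIP sample with the count expected from the Input, using a Poisson model. The Python sketch below illustrates that idea only; it is not the algorithm of MACS, HOMER or any other tool listed here. It assumes the two samples are already normalised to the same sequencing depth, and the function names and the cut-off values used in the example are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)  # guard against tiny negative rounding errors


def is_peak(chip_count, input_count, p_cutoff, min_fold):
    """Call a region a peak if it passes both the P-value and the fold filter."""
    expected = max(input_count, 1)  # pseudocount avoids dividing by zero
    p_value = poisson_sf(chip_count, expected)
    fold = chip_count / expected
    return p_value <= p_cutoff and fold >= min_fold


strong = is_peak(chip_count=40, input_count=5, p_cutoff=1e-4, min_fold=4.0)
weak = is_peak(chip_count=8, input_count=5, p_cutoff=1e-4, min_fold=4.0)
```

With these illustrative thresholds, 40 reads against 5 in the Input is called a peak, while 8 against 5 is not.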
42
Finding peaks with HOMER
43
Guess what this command does
findPeaks ChIPDirectory -style factor -i InputDirectory
We need to map our ChIP-seq and its Input (control), then create their HOMER tag directories ChIPDirectory and InputDirectory, then find peaks using both these directories.
Additional optional parameters:
-F <#> Enrichment ratio ChIP vs. Input (by default 4-fold)
-P <#> P-value cut off (by default
44
ChIP-seq: reads to peaks/regions
MACS, SICER, HOMER, PeakSeq, edgeR, DESeq, CisGenome
45
Peaks/regions in BED format
pos2bed.pl peakfile.txt > peakfile.bed
bed2pos.pl peakfile.bed > peakfile.txt
46
Intersecting genomic regions
BedTools (command line) Galaxy (online)
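What "intersecting" two sets of regions means can be shown with a short Python sketch. It assumes BED-style half-open coordinates and compares every pair of regions, which is fine for a toy example but much slower than the sorted or indexed approaches real tools rely on; the function name is illustrative.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping segments between two lists of (chrom, start, end)."""
    overlaps = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue  # regions on different chromosomes never overlap
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                overlaps.append((chrom_a, start, end))
    return overlaps


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
promoters = [("chr1", 150, 250), ("chr2", 300, 400)]
shared = intersect(peaks, promoters)
```

Only the first peak overlaps a promoter, so `shared` is `[("chr1", 150, 200)]`.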
47
Genomic features are also regions
Mattout et al., Genome Biology, 2015
48
Let’s look at many similar regions
Each horizontal line is one genomic region
deepTools, NucTools
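The idea behind such a heat map can be sketched in Python: cut the same-sized window out of the occupancy signal around each region of interest, stack the windows as rows, and average the columns to get an aggregate profile. This is a simplified illustration with a fixed window around a centre point (for example a TSS); it does not reproduce the scaling or clustering that deepTools or NucTools perform, and the function names are hypothetical.

```python
def heatmap_rows(signal, centers, flank):
    """One row per region: the signal from center - flank to center + flank."""
    rows = []
    for center in centers:
        lo, hi = center - flank, center + flank + 1
        if lo < 0 or hi > len(signal):
            continue  # skip regions too close to the chromosome ends
        rows.append(signal[lo:hi])
    return rows


def aggregate_profile(rows):
    """Column-wise average of the heat map rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


signal = [0, 1, 3, 1, 0, 0, 2, 4, 2, 0]
rows = heatmap_rows(signal, centers=[2, 7], flank=2)
profile = aggregate_profile(rows)
```

The two rows are `[0, 1, 3, 1, 0]` and `[0, 2, 4, 2, 0]`, and their average `[0.0, 1.5, 3.5, 1.5, 0.0]` is the aggregate profile.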
49
ChIP-seq heat maps for all genes, scaled with respect to their start (TSS) and end (TES)
50
Cluster heatmaps deepTools 2.0
51
Comparing cluster heatmaps between two cell conditions
NucTools
52
Histone modifications around TSS
deepTools
53
Motif enrichment analysis
HOMER, MEME Pavlaki et al., 2017
54
Finding motifs with HOMER
HOMER takes the coordinates of all ChIP-seq peaks, looks at the corresponding DNA sequences of each peak and finds the common consensus motifs that are encountered in many of these peaks. Then HOMER looks in a database and reports which motifs are similar to already known TF binding motifs, and which motifs are new.
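The first half of this procedure, finding sequence patterns shared by many peaks, can be illustrated with a simple k-mer count in Python. This is a deliberately naive sketch: it looks only for exact k-mers on one strand and scores them by a ratio against background sequences, whereas HOMER's actual motif discovery is considerably more sophisticated. The function names, the pseudocount and the threshold are assumptions made for the example.

```python
from collections import Counter


def kmer_presence(sequences, k):
    """Count in how many sequences each k-mer occurs at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(peaks, background, k, min_ratio):
    """k-mers found in a larger fraction of peaks than of background sequences."""
    peak_counts = kmer_presence(peaks, k)
    bg_counts = kmer_presence(background, k)
    result = []
    for kmer, n in peak_counts.items():
        peak_frac = n / len(peaks)
        # Pseudocount keeps the ratio finite when a k-mer is absent from background.
        bg_frac = (bg_counts[kmer] + 1) / (len(background) + 1)
        ratio = peak_frac / bg_frac
        if ratio >= min_ratio:
            result.append((kmer, round(ratio, 2)))
    return sorted(result, key=lambda item: (-item[1], item[0]))


peak_seqs = ["TTCACGTGAA", "GGCACGTGCC", "ATCACGTGTA"]
background_seqs = ["TTTTTTTTTT", "GGGGGGGGGG", "ATATATATAT"]
top = enriched_kmers(peak_seqs, background_seqs, k=6, min_ratio=3.0)
```

In this toy input the 6-mer `CACGTG` is present in every peak sequence and in none of the background sequences, so it is the only k-mer reported.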
55
The MEME Suite is even more sophisticated and contains all tools that are needed for motif analysis
56
Summary of ChIP-seq analysis:
Map all reads
Occupancy calculation
Differential peak calling
Intersection of different signals
Correlation of different signals
Motif enrichment in peaks
57
Take home message
Raw reads -> mapping -> peak calling
MUST KNOW:
Where NGS data is stored (GEO, etc.)
~100s of types of NGS experiments; we focus on chromatin
ChIP-seq data structure
RAW DATA; MAPPED READS; REGIONS; SITES
GENOME BROWSERS. PEAKS. PEAK CALLING
HEATMAP; AGGREGATE PROFILE; GENE ONTOLOGY (GO)
Optional video: