Epigenetics System Biology Workshop: Introduction

Irina Shchukina, 11/27/2018

Outline
A very short intro to gene expression regulation
Some NGS technologies available to study regulation
Overview of a standard pipeline for ChIP-seq data processing
Next: practical session with ChIP-seq data

Chromatin organization
Heterochromatin – tightly packed, inaccessible DNA (6–8)
Euchromatin – generally more active regions (1–5)

Nucleosomes
Nucleosomes are made of 8 histone proteins (two copies each of H2A, H2B, H3, and H4).
+H1 – linker histone
~1.65 turns of DNA (147 bp) wrap around each histone octamer
Histone tails stick out and can be recognized by other proteins
Chemical modifications of histones influence DNA accessibility
Histone modifications are dynamic: they can be added, erased, and recognized
Examples: H3K4me1, H3K4me3, H3K27ac

Transcription factors
Proteins that bind to DNA and control transcription of DNA into RNA
Many interactions with other genome regions, and with the corresponding histone modifications, take place.
An enhancer may be located very far from the gene it regulates.

Role of histone modifications
H3K27ac – distinguishes active enhancers from poised ones
H3K27me3 – repression of transcription; its main methyltransferase is EZH2, which is a part of PRC2
H3K4me1 – enhancer mark
H3K4me3 – promoters, active transcription
H3K36me3 – gene body, active transcription

Chromatin immunoprecipitation: ChIP-seq
A good antibody is needed for clean data
Input sample – no specific IP; a control for the background noise level, required for normalization and peak calling
Can be used for both transcription factors and histone modifications
Different proteins → different types of data → different processing:
  TFs bind a narrow region of DNA (5–30 bp)
  Some histone modifications are very broad: H3K36me3 may cover the entire body of an actively transcribed gene
(We will see some examples in the next part)

Ultra-low-input (ULI) ChIP-seq
The standard ChIP-seq protocol requires 1–5 million cells (or even more): a serious limitation for many studies (human samples, rare populations, etc.)
Ultra-low-input ChIP-seq allows you to go as low as 100,000 cells per sample.
The price is data quality and consistency:
  higher noise
  variable signal-to-noise ratio between samples within one prep
  hard to process (as you will see in the practical session)
Major difference in protocol: no crosslinking step. Often MNase digestion is used instead of sonication to decrease the noise level and keep DNA-protein complexes intact without crosslinking.

Publicly available data
ENCODE: raw and processed data are available. People actually care about this stuff and have generated tons of data.
Blueprint:

Standard ChIP-seq pipeline
Raw data → Alignment → Peak calling → Interpretation
Heavy computational work: the machine-time- and memory-consuming part. GTAC runs its standard pipeline for you.
The following slides contain many keywords, details, tool names, links, etc.: they are there for you to google.

Raw sequencing data: FASTQ files
4 lines per read:
  Line 1: read ID (starts with @)
  Line 2: actual sequence
  Line 3: +, optionally followed by the ID or any description
  Line 4: encoded quality of each nucleotide
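
To make the four-line layout concrete, here is a minimal Python sketch of a FASTQ reader. It assumes the common Phred+33 quality encoding (older Illumina data may use a different offset), and the function name read_fastq and the example read are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Hypothetical single-read example.
example = "@read1\nGATTACA\n+\nIIIIII#\n"
records = list(read_fastq(io.StringIO(example)))
print(records)
```

In practice an existing parser (for example the one in Biopython) is usually a better choice; the sketch only shows what the four lines mean.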

Alignment
Finds the location in the genome for each read
Tools to use: bowtie or BWA (shouldn’t make much difference)
NB: use consistent reference genome versions!
  hg18 → hg19 (=GRCh37) → hg38 (=GRCh38)
  mm8 → mm9 → mm10
If you want to compare positional data:
  Realign one of your datasets
  LiftOver tool from UCSC ( bin/hgLiftOver)
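
A small guard against mixing genome versions can be sketched as follows. This is an illustrative helper, not part of any tool: the function names and the dataset labels are hypothetical, and the alias table covers only the equivalences named on the slide plus mm10 = GRCm38.

```python
# Assumption: a deliberately small alias table; extend it for your own assemblies.
ASSEMBLY_ALIASES = {
    "hg19": "GRCh37", "grch37": "GRCh37",
    "hg38": "GRCh38", "grch38": "GRCh38",
    "mm10": "GRCm38", "grcm38": "GRCm38",
}


def canonical_assembly(name):
    """Map an assembly label to one canonical name; unknown labels pass through."""
    cleaned = name.strip()
    return ASSEMBLY_ALIASES.get(cleaned.lower(), cleaned)


def check_same_assembly(datasets):
    """datasets maps a dataset label to its assembly; raise if assemblies differ."""
    if not datasets:
        raise ValueError("no datasets given")
    canonical = {label: canonical_assembly(a) for label, a in datasets.items()}
    if len(set(canonical.values())) > 1:
        raise ValueError(f"mixed reference genomes: {canonical}")
    return next(iter(canonical.values()))


# Hypothetical example: two labels for the same human assembly are accepted.
print(check_same_assembly({"my_chip": "hg19", "public_peaks": "GRCh37"}))
```

If the check fails, the options are the two listed above: realign one dataset or lift its coordinates over.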

Alignment output: BAM/SAM files
Detailed format description:
Set of utilities: SAMtools
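
For orientation, a SAM alignment line is tab-separated with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The sketch below parses one such line and decodes two FLAG bits. The record type, helper names and example read are invented for illustration; for real work SAMtools or a library such as pysam is the usual route.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise FLAG
    rname: str   # reference (chromosome) name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    seq: str     # read sequence


def parse_sam_line(line):
    """Parse the mandatory fields of one SAM alignment line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[9])


def is_unmapped(record):
    return bool(record.flag & 0x4)   # FLAG bit 0x4: read unmapped


def is_reverse_strand(record):
    return bool(record.flag & 0x10)  # FLAG bit 0x10: read on the reverse strand


# Hypothetical alignment line.
line = "read1\t16\tchr1\t100\t42\t7M\t*\t0\t0\tGATTACA\tIIIIIII"
record = parse_sam_line(line)
print(record, is_unmapped(record), is_reverse_strand(record))
```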

Visualization
Genome browsers (JBR, IGV, UCSC) work with the bigWig (=bw) format
BAM to bigWig conversion: deepTools suite, bamCoverage tool
You can normalize data before visualization:
  Sequencing depth (bamCoverage 1x normalization)
  Input (bamCompare)
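
The arithmetic behind the two normalizations can be sketched in a few lines of Python. This is an approximation of the idea, not the deepTools implementation: the function names are hypothetical, coverage is assumed to be already computed per bin, and the pseudocount of 1 in the log2 ratio is an assumption.

```python
import math


def one_x_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Factor that rescales coverage so the genome-wide average becomes 1x."""
    mean_depth = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / mean_depth


def normalize_depth(coverage, scale):
    """Apply a depth scale factor to per-bin coverage values."""
    return [value * scale for value in coverage]


def log2_ratio(chip, control, pseudocount=1.0):
    """Per-bin log2(ChIP / input); the pseudocount avoids division by zero."""
    return [math.log2((c + pseudocount) / (i + pseudocount))
            for c, i in zip(chip, control)]


# Hypothetical numbers: 1M reads of 200 bp on a 100 Mb genome give 2x mean depth.
scale = one_x_scale_factor(1_000_000, 200, 100_000_000)
print(scale)
print(normalize_depth([4, 0, 2], scale))
print(log2_ratio([3, 0], [1, 0]))
```

Depth normalization makes tracks from samples sequenced to different depths comparable; the input ratio additionally removes background that is shared between the ChIP and the input sample.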

Visualization
Different types of ChIP-seq data actually look different!
ChIP–seq: advantages and challenges of a maturing technology. Peter Park. Nat. reviews

Peak calling
The most sensitive step. Choosing an appropriate tool is very important!
Gold-standard tools:
  MACS (narrow peaks: TFs and some histone modifications)
  SICER (broad histone modifications)
Data quality significantly affects results.
ULI-ChIP-seq data requires special treatment: SPAN. More in the practical session.
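
To give a feel for what a peak caller tests, here is a toy Python sketch that flags genome bins where the ChIP read count is improbably high given the input sample. It is not the MACS, SICER or SPAN algorithm: the fixed bins, the single Poisson test per bin, the background floor min_lambda and the p-value cutoff are all simplifying assumptions, and real tools additionally model local background, fragment size and multiple testing.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by direct summation (fine for small counts)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cumulative = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        cumulative += term
    # Floating-point rounding can push the difference slightly below zero.
    return max(0.0, 1.0 - cumulative)


def call_enriched_bins(chip_counts, input_counts, chip_total, input_total,
                       p_cutoff=1e-5, min_lambda=1.0):
    """Return indices of bins where ChIP is enriched over depth-scaled input."""
    scale = chip_total / input_total  # bring input to the ChIP sequencing depth
    enriched = []
    for index, (chip, control) in enumerate(zip(chip_counts, input_counts)):
        expected = max(control * scale, min_lambda)  # floor avoids zero background
        if poisson_upper_tail(chip, expected) < p_cutoff:
            enriched.append(index)
    return enriched


# Hypothetical counts in three bins; only the middle bin stands out.
enriched_bins = call_enriched_bins([2, 30, 3], [2, 2, 3],
                                   chip_total=1000, input_total=1000)
print(enriched_bins)
```

The sketch also shows why the input sample matters: without it there is no estimate of the expected background in each bin.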

Peak calling output: BED
The BED format describes genomic intervals (it is not specific to ChIP-seq).
3 required fields: chromosome, start, end. The other fields depend on the tool used.
Detailed description:
Toolkit: bedtools
BED files can be visualized in genome browsers together with bigWig files.
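
A minimal Python sketch of reading a BED line and measuring the overlap of two intervals is shown below. BED coordinates are 0-based with an exclusive end, which is what makes the overlap arithmetic so short. The function names and intervals are made up for illustration; bedtools does this at scale.

```python
def parse_bed_line(line):
    """Return (chrom, start, end, extra_fields) from one tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED needs at least chromosome, start and end")
    return fields[0], int(fields[1]), int(fields[2]), fields[3:]


def overlap_length(a, b):
    """Number of shared bases between two (chrom, start, end, ...) intervals."""
    if a[0] != b[0]:
        return 0  # different chromosomes never overlap
    # Half-open coordinates: intervals that merely touch share zero bases.
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


# Hypothetical peaks.
peak_a = parse_bed_line("chr1\t100\t200\tpeak1\t50")
peak_b = parse_bed_line("chr1\t150\t250")
peak_c = parse_bed_line("chr1\t200\t300")
print(peak_a, overlap_length(peak_a, peak_b), overlap_length(peak_a, peak_c))
```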

Quality control: ENCODE
ENCODE established a set of formal data quality standards and metrics:
Raw data → QC! → Alignment → QC! → Peak calling → QC! → Interpretation

Quality control
Other things to look at:
Raw data: FastQC – one report per FASTQ file; MultiQC – summarizes multiple outputs into a single report
Alignment: the aligner usually produces a report. Look at the % of aligned reads, % of multimappers, etc. The report can be processed with MultiQC.
Peak calling: there is no ultimate QC metric; you need to look at multiple scores.
General advice:
  Visualize your data! Check genes that you know are bound/active/repressed/…
  Comparing to an existing dataset may be a good sanity check

Interpretation: binding motifs
Particularly useful for transcription factors: can be part of the QC routine for TFs with a known binding motif.
Major tool: the MEME suite. MEME-ChIP – specialized version for ChIP-seq.
Not too many sequences (e.g. 1,000–2,000), not too wide (~ bp); otherwise it may take forever to calculate
Input is a .fa file with the actual nucleotide sequences. It may be generated using bedtools:
  bedtools getfasta -fi GRCm38.genome.fa -fo peaks.seq.fa -bed peaks.bed
Submit your jobs and be patient
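
One way to keep the input small is to take only the top-scoring peaks and trim each to a fixed width around its center before running bedtools getfasta. The Python sketch below illustrates this under assumptions: the function name is hypothetical, peaks are (chrom, start, end, score) tuples whose score column depends on your peak caller, and the width is left as a required argument because the recommended value is not given here.

```python
def prepare_motif_regions(peaks, max_peaks, width):
    """Keep the max_peaks best-scoring peaks, each resized to width bp around its center."""
    best = sorted(peaks, key=lambda peak: peak[3], reverse=True)[:max_peaks]
    regions = []
    for chrom, start, end, score in best:
        center = (start + end) // 2
        new_start = max(0, center - width // 2)
        # Note: the end is not clipped at the chromosome end in this sketch.
        regions.append((chrom, new_start, new_start + width, score))
    return regions


def to_bed_lines(regions):
    """Format regions as tab-separated BED lines (chrom, start, end)."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end, _ in regions]


# Hypothetical peaks and settings.
peaks = [("chr1", 1000, 2000, 5.0), ("chr1", 5000, 5100, 9.0), ("chr2", 0, 40, 1.0)]
regions = prepare_motif_regions(peaks, max_peaks=2, width=100)
print("\n".join(to_bed_lines(regions)))
```

The resulting lines can be written to a BED file and passed to the getfasta command above.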

Interpretation: annotation
Match peaks and genes. Possible approaches:
  Most common way: find the gene that is closest to the peak. Why might your favorite gene be missing from the resulting list?
  Assign all genes located within a selected range from a peak
  Mostly for TFs: consider only peaks located near a TSS (e.g. [-10kb, +3kb] around the TSS)
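
Two of these approaches can be contrasted in a short Python sketch: nearest-gene assignment and a strand-aware window around the TSS. Everything here is illustrative: the function names, the toy gene list and the use of the peak center as the peak position are assumptions, and the default window simply mirrors the [-10kb, +3kb] example.

```python
def nearest_gene(peak_center, chrom, genes):
    """Name of the gene whose TSS is closest to the peak, or None if none on chrom."""
    candidates = [gene for gene in genes if gene[1] == chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda gene: abs(gene[2] - peak_center))[0]


def genes_with_peak_near_tss(peak_center, chrom, genes,
                             upstream=10_000, downstream=3_000):
    """Names of all genes whose [-upstream, +downstream] TSS window contains the peak."""
    hits = []
    for name, gene_chrom, tss, strand in genes:
        if gene_chrom != chrom:
            continue
        # Signed offset along the direction of transcription:
        # negative means the peak lies upstream of the TSS.
        offset = peak_center - tss if strand == "+" else tss - peak_center
        if -upstream <= offset <= downstream:
            hits.append(name)
    return hits


# Hypothetical genes as (name, chromosome, TSS position, strand).
genes = [("A", "chr1", 10_000, "+"), ("B", "chr1", 30_000, "-"), ("C", "chr2", 500, "+")]
print(nearest_gene(12_000, "chr1", genes))
print(genes_with_peak_near_tss(12_000, "chr1", genes))
print(genes_with_peak_near_tss(35_000, "chr1", genes))
```

Nearest-gene assignment returns exactly one gene per peak, which is one reason a gene of interest can drop out of the list when a neighbouring gene happens to be closer.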

Interpretation: pathway enrichment
GREAT:
  Choose your genome version
  Upload your BED file
  Optionally adjust the association rules
  Submit
The result includes:
  Enrichment analysis against multiple pathway databases
  Matched peaks and genes
  Some positional analysis (e.g. distribution of distances from TSS)

Interpretation: differential analysis
Not an established analysis; there are no gold-standard tools.
Multiple biological replicates are strongly recommended!
Good review: A comprehensive comparison of tools for differential ChIP-seq analysis. Steinhauser, Kurzawa, Eils and Herrmann

Next: practical session
Thank you!
