Epigenetics System Biology Workshop: Introduction

Irina Shchukina, 11/27/2018

Outline
- A very short introduction to gene expression regulation
- Some NGS technologies available to study regulation
- Overview of a standard pipeline for ChIP-seq data processing
- Next: practical session with ChIP-seq data

Chromatin organization
- Heterochromatin – tightly packed, largely inaccessible DNA (levels 6–8 in the figure)
- Euchromatin – less condensed, generally more active regions (levels 1–5)
https://www.nature.com/scitable/content/chromatin-has-highly-complex-structure-with-several-113743374

Nucleosomes
- The nucleosome core is made of 8 histone proteins: two copies each of H2A, H2B, H3 and H4
- H1 is the linker histone
- ~1.65 turns of DNA (147 base pairs) wrap around each histone core
- Histone tails stick out of the nucleosome and can be recognized by other proteins
- Chemical modifications of histones influence DNA accessibility
- Histone modifications are dynamic: they can be added (written), recognized (read) and erased
- Examples: H3K4me1, H3K4me3, H3K27ac
http://en.wikipedia.org/wiki/Nucleosome
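
Mark names such as H3K4me3 follow a compact convention: histone, modified residue (one-letter amino acid code, K = lysine), residue position, modification type. A minimal Python sketch of splitting such names, assuming only simple names (histone variants such as H2A.Z and combined marks are not handled); `parse_mark` and `HistoneMark` are hypothetical names used for illustration:

```python
import re
from typing import NamedTuple


class HistoneMark(NamedTuple):
    histone: str       # e.g. "H3"
    residue: str       # one-letter amino acid code, "K" = lysine
    position: int      # residue position in the histone protein
    modification: str  # e.g. "me1", "me3", "ac"


# Assumption: simple names only, e.g. H3K4me3 or H2BK120ub1;
# histone variants (H2A.Z) and combined marks are not handled.
_MARK_RE = re.compile(r"^(H2A|H2B|H3|H4|H1)([A-Z])(\d+)([a-z]+\d*)$")


def parse_mark(name: str) -> HistoneMark:
    """Split a histone mark name into its components."""
    match = _MARK_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification = match.groups()
    return HistoneMark(histone, residue, int(position), modification)


for mark in ["H3K4me1", "H3K4me3", "H3K27ac"]:
    print(parse_mark(mark))
```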

Transcription factors
- Proteins that bind to DNA and control the transcription of DNA into RNA
- Many interactions with other genomic regions and with the corresponding histone modifications take place!
- An enhancer may be located very far from the gene it regulates.
http://www.assignmentpoint.com/science/biology/transcription-factor.html
https://www.boundless.com/biology/gene-expression/eukaryotic-transcription-gene-regulation/transcriptional-enhancers-and-repressors/

Role of histone modifications
- H3K27ac – distinguishes active enhancers from poised ones
- H3K27me3 – repression of transcription; deposited by PRC2, whose catalytic methyltransferase subunit is EZH2 (or its paralog EZH1)
- H3K4me1 – enhancer mark
- H3K4me3 – promoters, active transcription
- H3K36me3 – gene body, active transcription
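
The roles listed above are often combined to give a region a coarse label. A deliberately simplified Python sketch, assuming we already know which marks were detected in a region and using a hand-written rule order (`annotate_region` and `RULES` are illustrative names, not a published method; dedicated chromatin-state tools learn mark combinations from data instead):

```python
from typing import Iterable, List, Set, Tuple

# Ordered rules built only from the mark roles above; the first rule whose
# marks are all present wins. This is a toy, not a real chromatin-state model.
RULES: List[Tuple[Set[str], str]] = [
    ({"H3K4me3"}, "active promoter"),
    ({"H3K4me1", "H3K27ac"}, "active enhancer"),
    ({"H3K4me1"}, "poised enhancer"),  # H3K4me1 without H3K27ac
    ({"H3K36me3"}, "transcribed gene body"),
    ({"H3K27me3"}, "repressed"),
]


def annotate_region(marks: Iterable[str]) -> str:
    """Return a coarse label for a region given the marks detected there."""
    present = set(marks)
    for required, label in RULES:
        if required <= present:  # all required marks are present
            return label
    return "unannotated"


print(annotate_region({"H3K4me1", "H3K27ac"}))  # active enhancer
print(annotate_region({"H3K4me1"}))             # poised enhancer
```

The rule order matters: checking H3K4me1 + H3K27ac before H3K4me1 alone is what lets H3K27ac separate active from poised enhancers.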

Chromatin immunoprecipitation: ChIP-seq
- A good antibody is needed for clean data
- Input sample: no specific IP; it controls for the background noise level and is required for normalization and peak calling
- Can be used for both transcription factors and histone modifications
- Different proteins → different types of data → different processing:
  - TFs bind narrow regions of DNA (5–30 bp)
  - Some histone modifications are very broad: H3K36me3 may cover the entire body of an actively transcribed gene
- (We will see some examples in the next part)
http://www.bio.brandeis.edu/haberlab/jehsite/chIP.html
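
Why the input sample matters can be seen in a simple fold-enrichment calculation: the ChIP read count in a genomic window is compared with the input count after scaling for sequencing depth. A minimal Python sketch with hypothetical counts; the pseudocount of 1 is an assumption, and real peak callers model the background more carefully:

```python
import math


def fold_enrichment(chip_reads: int, input_reads: int,
                    chip_total: int, input_total: int,
                    pseudocount: float = 1.0) -> float:
    """ChIP signal over input in one genomic window.

    The input count is first rescaled to the ChIP library size, so a
    deeper input library does not masquerade as depletion. The
    pseudocount avoids division by zero in windows with no input reads.
    """
    scaled_input = input_reads * (chip_total / input_total)
    return (chip_reads + pseudocount) / (scaled_input + pseudocount)


# Hypothetical window: 200 ChIP reads vs 20 input reads,
# with 20 M ChIP reads and 10 M input reads in total.
fe = fold_enrichment(200, 20, 20_000_000, 10_000_000)
print(round(fe, 2), round(math.log2(fe), 2))
```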

Ultra-low-input (ULI) ChIP-seq
- The standard ChIP-seq protocol requires 1–5 million cells (or even more): a serious limitation for many studies (human samples, rare cell populations, etc.)
- Ultra-low-input ChIP-seq allows you to go as low as 100,000 cells per sample
- The price is data quality and consistency:
  - higher noise
  - variable signal-to-noise ratio between samples within one prep
  - harder to process (as you will see in the practical session)
- Major difference in the protocol: no crosslinking step. MNase digestion is often used instead of sonication to decrease the noise level and keep DNA–protein complexes intact without crosslinking.

Publicly available data
- ENCODE: https://www.encodeproject.org
  Raw and processed data are available. People actually care about this stuff and have generated tons of data.
- Blueprint: http://www.blueprint-epigenome.eu

Standard ChIP-seq pipeline
Raw data → Alignment → Peak calling → Interpretation
- Heavy computational work: the machine-time- and memory-consuming part
- GTAC runs its standard pipeline for you
- The following slides have many keywords, details, tool names, links, etc.; they are for googling
https://www.encodeproject.org/chip-seq/histone/

Raw sequencing data: FASTQ files
4 lines per read:
- Line 1: @ followed by the read ID
- Line 2: the actual sequence
- Line 3: + (optionally followed by the read ID again or any description)
- Line 4: encoded quality of each nucleotide (one character per base)
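
The four-line layout is simple enough to parse directly. A minimal Python sketch, assuming an uncompressed FASTQ file with one line per sequence and the Phred+33 quality encoding used by current Illumina pipelines (older files may use a different offset); `read_fastq` is a hypothetical helper, and in practice a library parser or FastQC would do this:

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (header text, sequence, per-base Phred scores) per 4-line record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: quality = ASCII code of the character minus 33
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


example = io.StringIO("@read1\nACGT\n+\nII#5\n")
for read_id, seq, quals in read_fastq(example):
    print(read_id, seq, quals)  # read1 ACGT [40, 40, 2, 20]
```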

Alignment
- Finds the location in the genome for each read
- Tools to use: Bowtie or BWA (shouldn't make much difference)
- NB: use consistent reference genome versions!
  - hg18 → hg19 (=GRCh37) → hg38 (=GRCh38)
  - mm8 → mm9 → mm10
- If you want to compare positional data from different genome versions:
  - realign one of your datasets, or
  - use the LiftOver tool from UCSC (https://genome.ucsc.edu/cgi-bin/hgLiftOver)
https://galaxyproject.github.io/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html

Alignment output: BAM/SAM files
- Detailed format description: https://samtools.github.io/hts-specs/SAMv1.pdf
- Set of utilities: SAMtools
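
To see what the format holds, here is a minimal Python sketch that splits one SAM alignment line into the 11 mandatory fields defined in the SAM specification and decodes a few FLAG bits. The example read is hypothetical, `parse_sam_line` is an illustrative name, and for real BAM files SAMtools (or a library binding) is the right tool. Note that POS in SAM is 1-based:

```python
from typing import Any, Dict

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Parse one tab-separated SAM alignment line (not an @ header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    rec: Dict[str, Any] = dict(zip(SAM_FIELDS, cols[:11]))
    for key in INT_FIELDS:
        rec[key] = int(rec[key])
    rec["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields
    flag = rec["FLAG"]
    # A few commonly used FLAG bits from the SAM specification
    rec["unmapped"] = bool(flag & 0x4)
    rec["reverse_strand"] = bool(flag & 0x10)
    rec["secondary"] = bool(flag & 0x100)
    rec["duplicate"] = bool(flag & 0x400)
    return rec


line = "read1\t16\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], rec["reverse_strand"])  # chr1 100 True
```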

Visualization
- Genome browsers (JBR, IGV, UCSC) work with the bigWig (=bw) format
- BAM to bigWig conversion: deepTools suite, bamCoverage tool
- You can normalize data before visualization:
  - by sequencing depth (bamCoverage, 1x normalization)
  - by input (bamCompare)
https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html
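
The arithmetic behind depth normalization is short. A minimal Python sketch of counts-per-million scaling and of "1x" scaling, where coverage is divided by the expected mean coverage (mapped reads × fragment length / effective genome size). It follows our reading of the deepTools documentation, not its code; all numbers are hypothetical and the function names are illustrative. bamCoverage itself is run from the command line with options such as `--normalizeUsing RPGC` and `--effectiveGenomeSize`:

```python
from typing import List


def cpm_scale(mapped_reads: int) -> float:
    """Counts-per-million factor: coverage x factor = reads per million mapped."""
    return 1_000_000 / mapped_reads


def one_x_scale(mapped_reads: int, fragment_length: int,
                effective_genome_size: int) -> float:
    """Factor that rescales coverage so the genome-wide mean is ~1x.

    Expected mean coverage = mapped reads x fragment length / genome size;
    multiplying by the inverse gives 1x-normalized signal.
    """
    return effective_genome_size / (mapped_reads * fragment_length)


def normalize(bin_coverage: List[float], factor: float) -> List[float]:
    return [value * factor for value in bin_coverage]


# Hypothetical library: 20 M mapped reads, 200 bp fragments,
# effective genome size ~2.7 Gb (human-like).
factor = one_x_scale(20_000_000, 200, 2_700_000_000)
print(normalize([0.0, 3.0, 15.0], factor))
```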

Visualization
Different ChIP-seq data actually look different!
https://www.encodeproject.org/chip-seq/histone/
ChIP–seq: advantages and challenges of a maturing technology. Peter J. Park. Nat. Rev. Genet. (2009)

Peak calling
- The most sensitive step. Choosing an appropriate tool is very important!
- Gold-standard tools:
  - MACS (narrow peaks: TFs and some histone modifications)
  - SICER (broad histone modifications)
- Data quality significantly affects results. ULI-ChIP-seq data require special treatment: SPAN. More on this in your practical session.
https://www.encodeproject.org/chip-seq/histone/
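
MACS, for instance, models read counts with a Poisson distribution and calls a window enriched when its count is unlikely under a local background rate. A simplified Python sketch of that idea: the tail probability P(X ≥ k) and a conservative local rate taken as the largest of a genome-wide rate and nearby control-based estimates. The counts, rates and helper names are hypothetical, and this is not MACS's implementation:

```python
import math
from typing import Iterable


def _log_pmf(i: int, lam: float) -> float:
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # Upper tail is large here, so 1 - P(X <= k - 1) is accurate enough.
        cdf = sum(math.exp(_log_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - cdf)
    # k > lam: terms shrink monotonically, so sum the tail directly,
    # which keeps precision for very small p-values.
    total, i = 0.0, k
    term = math.exp(_log_pmf(k, lam))
    while term > 0.0 and term >= total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def local_lambda(genome_lambda: float, control_lambdas: Iterable[float]) -> float:
    """Conservative background: the largest of the available rate estimates."""
    return max([genome_lambda, *control_lambdas])


# Hypothetical window with 25 ChIP reads; rates are expected reads per window.
lam = local_lambda(2.0, [3.5, 4.0, 3.0])
print(lam, poisson_sf(25, lam))
```

Taking the maximum of the background estimates makes the test harder to pass in regions where the control itself is locally high.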

Peak calling output: BED
- The BED format describes genomic intervals (it is not specific to ChIP-seq)
- 3 required fields: chromosome, start, end; the other fields depend on the tool used
- Detailed description: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
- Toolkit: bedtools
- Can be visualized in genome browsers together with bigWig files
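
One detail worth knowing: BED coordinates are 0-based and half-open (the start base is included, the end base is not), unlike the 1-based POS field in SAM. A minimal Python sketch that reads the three required columns and computes interval overlap the way half-open coordinates imply; `Interval`, `parse_bed` and `overlap_bp` are hypothetical names, and bedtools (e.g. `bedtools intersect`) is the real tool for this job:

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, included
    end: int    # 0-based, excluded (half-open)

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Keep the 3 required BED columns; skip blank, comment and track lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append(Interval(chrom, int(start), int(end)))
    return intervals


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of shared bases; 0 for different chromosomes or no overlap."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


peaks = parse_bed(["chr1\t100\t200\tpeak1\t50\n", "chr1\t200\t300\n"])
print(peaks[0].length, overlap_bp(peaks[0], peaks[1]))  # 100 0
```

Because the end is exclusive, two book-ended intervals such as [100, 200) and [200, 300) share no bases.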

Quality control: ENCODE
ENCODE established a set of formal data quality standards and metrics: https://www.encodeproject.org/data-standards/
Raw data (QC!) → Alignment (QC!) → Peak calling (QC!) → Interpretation
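
One of the metrics used in the ENCODE ChIP-seq standards is FRiP, the fraction of reads that fall into called peaks. A minimal Python sketch, assuming each read is reduced to a single 0-based position (for example its 5' end) and peaks are 0-based, half-open (chrom, start, end) tuples; the helper names and data are hypothetical, and acceptable values should be taken from the ENCODE standards page:

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_peaks(peaks: Iterable[Tuple[str, int, int]]) -> Dict[str, List[List[int]]]:
    """Sort and merge overlapping peaks per chromosome."""
    merged: Dict[str, List[List[int]]] = defaultdict(list)
    for chrom, start, end in sorted(peaks):
        blocks = merged[chrom]
        if blocks and start <= blocks[-1][1]:
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return merged


def frip(read_positions: Iterable[Tuple[str, int]],
         peaks: Iterable[Tuple[str, int, int]]) -> float:
    """Fraction of reads whose position falls inside any peak."""
    merged = merge_peaks(peaks)
    starts = {chrom: [s for s, _ in blocks] for chrom, blocks in merged.items()}
    total = in_peaks = 0
    for chrom, pos in read_positions:
        total += 1
        blocks = merged.get(chrom)
        if not blocks:
            continue
        idx = bisect.bisect_right(starts[chrom], pos) - 1  # last block starting <= pos
        if idx >= 0 and pos < blocks[idx][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0


peaks = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 0, 50)]
reads = [("chr1", 120), ("chr1", 249), ("chr1", 250), ("chr2", 10), ("chr3", 5)]
print(frip(reads, peaks))  # 0.6
```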

Quality control
Other things to look at:
- Raw data: FastQC produces a report per FASTQ file; MultiQC summarizes multiple outputs into a single report
- Alignment: the aligner usually produces a report. Look at the % of aligned reads, % of multimappers, etc. These reports can also be processed with MultiQC.
- Peak calling: there is no ultimate QC metric; you need to look at multiple scores.
General advice:
- Visualize your data! Check genes that you know are bound/active/repressed/…
- Comparing to an existing dataset may be a good sanity check
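
The aligner's headline numbers can also be recomputed from the alignments themselves. A minimal Python sketch over (FLAG, MAPQ) pairs, assuming single-end reads; aligners encode multimapping in MAPQ differently, so the low-MAPQ fraction is only a rough proxy and the cut-off of 10 is a hypothetical choice. In practice `samtools flagstat` reports similar counts:

```python
from typing import Dict, Iterable, Tuple


def alignment_summary(records: Iterable[Tuple[int, int]],
                      min_mapq: int = 10) -> Dict[str, float]:
    """Summarize (FLAG, MAPQ) pairs taken from SAM/BAM records.

    Secondary (0x100) and supplementary (0x800) lines are skipped so each
    read is counted once. Low MAPQ is used as a rough multimapper proxy;
    the default cut-off is arbitrary.
    """
    total = mapped = low_mapq = 0
    for flag, mapq in records:
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if flag & 0x4:  # unmapped
            continue
        mapped += 1
        if mapq < min_mapq:
            low_mapq += 1
    return {
        "reads": total,
        "pct_aligned": 100.0 * mapped / total if total else 0.0,
        "pct_low_mapq": 100.0 * low_mapq / mapped if mapped else 0.0,
    }


print(alignment_summary([(0, 42), (16, 1), (4, 0), (256, 0), (0, 30)]))
```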

Interpretation: binding motifs
- Particularly useful for transcription factors; can be part of the QC routine for TFs with a known binding motif.
- Major tool: the MEME suite (http://meme-suite.org). MEME-ChIP is a specialized version for ChIP-seq.
- Use not too many sequences (e.g. 1,000–2,000) and not too wide (~100–200 bp), otherwise the calculation may take forever
- Input is a .fa file with the actual nucleotide sequences. It may be generated using bedtools:
  bedtools getfasta -fi GRCm38.genome.fa -fo peaks.seq.fa -bed peaks.bed
- Submit your jobs and be patient
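
Keeping the motif search manageable usually means taking the strongest peaks and trimming them to a fixed width around the summit before extracting sequences. A minimal Python sketch, assuming peaks given as (chrom, start, end, score, summit_offset) with the summit offset relative to start (as in MACS2 narrowPeak output) and a genome held in memory as a dict of strings; the function names and toy genome are hypothetical, and for real genomes bedtools getfasta does the extraction:

```python
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[str, int, int, float, int]  # chrom, start, end, score, summit offset


def top_peak_windows(peaks: Sequence[Peak], n: int = 1000,
                     width: int = 200) -> List[Tuple[str, int, int]]:
    """Top-n peaks by score, each resized to `width` bp centered on its summit."""
    ranked = sorted(peaks, key=lambda p: p[3], reverse=True)[:n]
    half = width // 2
    windows = []
    for chrom, start, _end, _score, summit in ranked:
        center = start + summit
        win_start = max(0, center - half)  # clip at the chromosome start
        windows.append((chrom, win_start, win_start + width))
    return windows


def to_fasta(windows: Sequence[Tuple[str, int, int]], genome: Dict[str, str]) -> str:
    """FASTA text with chrom:start-end headers (0-based, half-open coordinates)."""
    lines = []
    for chrom, start, end in windows:
        lines.append(f">{chrom}:{start}-{end}")
        lines.append(genome[chrom][start:end])  # slicing truncates at the chromosome end
    return "\n".join(lines) + "\n"


genome = {"chr1": "ACGTACGTAC" * 3}  # toy 30 bp chromosome
peaks = [("chr1", 0, 20, 5.0, 10), ("chr1", 5, 25, 9.0, 5)]
print(to_fasta(top_peak_windows(peaks, n=1, width=4), genome), end="")
```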

Interpretation: annotation
Match peaks and genes. Possible approaches:
- Most common way: find the gene closest to the peak. Why might your favorite gene be missing from the resulting list?
- Assign all genes located within a selected range from a peak
- Mostly for TFs: consider only peaks located near the TSS (e.g. [-10 kb, +3 kb] around the TSS)
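
The difference between the approaches is easy to see in code. A minimal Python sketch, assuming genes are given as hypothetical (name, chrom, TSS, strand) tuples and each peak is represented by its center; distances are signed relative to the gene's direction, so negative means upstream. The nearest-gene rule returns a single gene, which is one reason a gene you expect can drop out of the list:

```python
from typing import List, Optional, Sequence, Tuple

Gene = Tuple[str, str, int, str]  # name, chrom, TSS (0-based), strand "+"/"-"
Region = Tuple[str, int, int]     # chrom, start, end (0-based, half-open)


def signed_distance(pos: int, tss: int, strand: str) -> int:
    """Distance from TSS in the gene's orientation; negative = upstream."""
    d = pos - tss
    return d if strand == "+" else -d


def nearest_gene(peak: Region, genes: Sequence[Gene]) -> Optional[Tuple[str, int]]:
    """Closest TSS on the same chromosome, as (gene name, signed distance)."""
    chrom, start, end = peak
    center = (start + end) // 2
    best: Optional[Tuple[str, int]] = None
    for name, g_chrom, tss, strand in genes:
        if g_chrom != chrom:
            continue
        d = signed_distance(center, tss, strand)
        if best is None or abs(d) < abs(best[1]):
            best = (name, d)
    return best


def genes_in_window(peak: Region, genes: Sequence[Gene],
                    upstream: int = 10_000, downstream: int = 3_000) -> List[str]:
    """All genes for which the peak lies within [-upstream, +downstream] of the TSS."""
    chrom, start, end = peak
    center = (start + end) // 2
    return [name for name, g_chrom, tss, strand in genes
            if g_chrom == chrom
            and -upstream <= signed_distance(center, tss, strand) <= downstream]


genes = [("GeneA", "chr1", 10_000, "+"), ("GeneB", "chr1", 20_000, "-"),
         ("GeneC", "chr2", 5_000, "+")]
peak = ("chr1", 17_000, 17_200)
print(nearest_gene(peak, genes))     # ('GeneB', 2900)
print(genes_in_window(peak, genes))  # ['GeneB']
```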

Interpretation: pathway enrichment
GREAT: http://great.stanford.edu/public/html/
- Choose your genome version
- Upload your BED file
- Optionally adjust the association rules
- Submit
The result includes:
- Enrichment analysis against multiple pathway databases
- Matched peaks and genes
- Some positional analysis (e.g. distribution of distances from the TSS)
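
GREAT's region-based enrichment test is, as its authors describe it, a binomial test over genomic regions: if the regulatory domains of a term's genes cover a fraction p of the genome, how surprising is it that k of n input peaks fall inside them? A minimal Python sketch of that test with hypothetical numbers; it does not reproduce GREAT's association rules, and the function names are illustrative:

```python
import math


def _log_binom_pmf(i: int, n: int, p: float) -> float:
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1 (log space avoids overflow)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return min(1.0, sum(math.exp(_log_binom_pmf(i, n, p)) for i in range(k, n + 1)))


# Hypothetical: 1,000 peaks, the term's regulatory domains cover 2% of the
# genome, and 45 peaks fall inside them (20 expected by chance).
n_peaks, frac_genome, hits = 1000, 0.02, 45
print("fold enrichment:", (hits / n_peaks) / frac_genome)
print("binomial p-value:", binomial_sf(hits, n_peaks, frac_genome))
```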

Interpretation: differential analysis
- Not an established analysis; there are no gold-standard tools.
- Multiple biological replicates are strongly recommended!
- Good review: A comprehensive comparison of tools for differential ChIP-seq analysis. Steinhauser, Kurzawa, Eils and Herrmann. Briefings in Bioinformatics (2016)

Thank you!
Next: practical session