BS222 – Genome Science Lecture 8


BS222 – Genome Science Lecture 8: NGS applications, Part 1. Vladimir Teif

Module structure:
Genomes, sequencing projects and genomic databases (VT) (Oct 9, 2018)
Sequencing technologies (VT) (Oct 11, 2018)
Genome architecture I: protein coding genes (VT) (Oct 16, 2018)
Genome architecture II: transcription regulation (VT) (Oct 18, 2018)
Genome architecture III: 3D chromatin organisation (VT) (Oct 23, 2018)
Epigenetics overview (PVW) (Oct 25, 2018)
DNA methylation and other DNA modifications (VT) (Oct 30, 2018)
NGS applications I: Experiments and basic analysis (VT) (Nov 1, 2018)
NGS applications II: Data integration (VT) (Nov 8, 2018)
Comparative genomics (JP, guest lecture) (Nov 13, 2018)
SNPs, CNVs, population genomics (LS, guest lecture) (Nov 15, 2018)
Histone modifications (PVW) (Nov 20, 2018)
Non-coding RNAs (PVW) (Nov 22, 2018)
Genome stability (PVW) (Nov 27, 2018)
Transcriptomics (PVW) (Nov 29, 2018)
Year's best paper (PVW) (Dec 6, 2018)
Revision lecture (all lecturers; spring term)

NGS techniques vs NGS applications
NGS techniques: how to sequence DNA (or RNA) (covered in Lecture 2; a funny recap is in this video: https://www.youtube.com/watch?v=-7GK1HXwCtE)
NGS applications: how to design experiments to answer a specific biological question

Examples of NGS applications: chromatin domains (Hi-C). Figure adapted from http://www.scienceinschool.org

Types of NGS applications
RNA-seq, GRO-seq, CAGE, SAGE, CLIP-seq, Drop-seq: gene expression; non-coding RNA
ChIP-seq, MNase-seq, DNase-seq, ATAC-seq, etc.: protein binding; histone modifications; chromatin accessibility; nucleosome positioning
Bisulfite sequencing: DNA methylation
Hi-C, 3C, 4C, ChIA-PET, etc.: chromatin loops
Amplicon sequencing: targeted regions; phylogenomics; metagenomics
Whole Genome Sequencing (WGS): de novo assembly (new species or new analyses)
A curated bibliography of *seq methods (~100 methods) can be found at https://liorpachter.wordpress.com/seq/

RNA-seq (RNA sequencing) https://en.wikipedia.org/wiki/RNA-Seq

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing)
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment the DNA (sonication or digestion)
3. Immunoprecipitate with an antibody against the target nuclear protein and reverse the crosslinks
4. Release the DNA and submit it for sequencing
Adapted from www.VisiScience.com

MNase-seq (Micrococcal Nuclease digestion followed by sequencing)
MNase = micrococcal nuclease (an enzyme that preferentially cuts the linker DNA between nucleosomes). Teif et al. (2012), Methods, 62, 26-38

FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements followed by sequencing). Giresi et al. (2007), Genome Res. 17, 877–885

DNase-seq (DNase I digestion followed by sequencing). Wang et al. (2012), PLoS ONE 7, e42414

ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) How transposase works: https://www.youtube.com/watch?v=XYZHMGUGq6o Buenrostro et al. (2013) Nat Methods. 10, 1213-1218

Methods for 1D genome mapping. Meyer & Liu, Nature Reviews Genetics 15, 709–721 (2014)

Methods for 1D genome mapping. Tsompana and Buck, Epigenetics & Chromatin 2014, 7:33

NGS methods for DNA methylation: bisulfite sequencing; affinity purification (e.g. MeDIP)

Chromosome Conformation Capture methods map the locations of DNA-DNA loops. Rao et al., Cell 159, 1665–1680 (2014)

Rivera and Ren (2013), Cell, 155, 39-55. Since 2017, DNA loops can be measured with 100-bp resolution (Bonev et al., Cell, 2017)

Timeline of NGS methods: bulk methods that require many cells (Rivera and Ren (2013), Cell, 155, 39-55) and single-cell methods (Hu et al., Front. Cell Dev. Biol., 2018)

Where to get NGS data?
Do your own experiment
Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo
Sequence Read Archive (SRA): https://www.ncbi.nlm.nih.gov/sra
European Nucleotide Archive (ENA): https://www.ebi.ac.uk/ena
The Cancer Genome Atlas (TCGA): https://tcga-data.nci.nih.gov/tcga
Exome Aggregation Consortium (ExAC): http://exac.broadinstitute.org/
You also have to upload your own data!

Next generation sequencing analysis

How to analyze NGS data?
Ask a bioinformatician: you need to explain what you want, and for that you need to understand what can be done and how.
Do it yourself:
Command line -> become a bioinformatician
Online wrappers -> simpler, but with file size limits
Example of a convenient online tool: Galaxy http://galaxy.essex.ac.uk/


Experiment Data analysis http://www4.utsouthwestern.edu/mcdermottlab/NGS/index.html

ChIP-seq data analysis www.utsouthwestern.edu/labs.bioinformatics-core/analysis/chip-seq.png

Unmapped sequenced reads (this is the "raw", primary data).
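Raw reads are usually delivered as FASTQ files, in which each read takes four lines: an `@` header with the read ID, the bases, a `+` separator, and one quality character per base. The sketch below parses that structure and decodes the quality characters, assuming the Phred+33 encoding used by current Illumina pipelines; the read itself is made up for illustration.

```python
# Hypothetical example read; the quality line assumes Phred+33 encoding.
FASTQ_TEXT = """@read_1
ACGTACGTAC
+
IIIIIHHH##
"""


def parse_fastq(text):
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain complete 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: quality score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


for read_id, seq, scores in parse_fastq(FASTQ_TEXT):
    print(read_id, seq, scores)  # read_1 ACGTACGTAC [40, 40, 40, 40, 40, 39, 39, 39, 2, 2]
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so the two `#` bases (Q = 2) at the end of this read are unreliable.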

Mapped reads are characterised by their locations in the genome. Alignment tools: Bowtie, BWA, ELAND, Novoalign (short-read mappers); BLAST, ClustalW (general-purpose sequence alignment); TopHat (spliced alignment for RNA-seq)
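After mapping, each read is described by where it landed. Mappers such as Bowtie and BWA can write this as SAM, a tab-separated format whose columns include the read name, a bit-flag, the chromosome, the 1-based leftmost position, the mapping quality and the CIGAR string. A minimal sketch that extracts these fields from one alignment line; the example line is invented.

```python
SAM_LINE = "read_1\t16\tchr1\t10468\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIHHH##"

FLAG_UNMAPPED = 0x4   # SAM flag bit: segment unmapped
FLAG_REVERSE = 0x10   # SAM flag bit: SEQ is reverse complemented


def parse_sam_line(line):
    """Return the location of one SAM alignment, or None if the read is unmapped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    flag = int(fields[1])
    if flag & FLAG_UNMAPPED:
        return None
    return {
        "read": fields[0],
        "chrom": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "strand": "-" if flag & FLAG_REVERSE else "+",
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


print(parse_sam_line(SAM_LINE))
```

Real SAM files also start with header lines beginning with `@`, which a full reader would skip; in practice libraries such as pysam are used instead of hand-written parsing.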

Reads can align to overlapping locations, so we need to count all reads covering each base pair. http://biocluster.ucr.edu/~rkaundal/workshops/R_feb2016/ChIPseq/ChIPseq.html
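Counting the reads that cover each base pair gives a coverage (pileup) track. A minimal sketch, assuming reads are already given as 0-based, half-open [start, end) intervals on one chromosome (as in BED files); it uses a difference array so each read costs a constant amount of work instead of one update per base.

```python
def coverage(reads, chrom_length):
    """Number of reads covering each base of a chromosome of length chrom_length."""
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        if not 0 <= start < end <= chrom_length:
            raise ValueError(f"read ({start}, {end}) lies outside the chromosome")
        diff[start] += 1   # coverage rises where a read starts
        diff[end] -= 1     # and falls just after it ends
    cov, running = [], 0
    for pos in range(chrom_length):
        running += diff[pos]
        cov.append(running)
    return cov


reads = [(0, 5), (2, 7), (4, 6)]   # three overlapping reads (made-up positions)
print(coverage(reads, 10))         # [1, 1, 2, 2, 3, 2, 1, 0, 0, 0]
```

For ChIP-seq, many pipelines first extend or shift each read towards the estimated fragment length so that coverage reflects whole DNA fragments rather than sequenced ends.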

ChIP-seq landscapes depend on the protein. Park P. J., Nature Reviews Genetics, 2009

We can compare different experimental datasets (e.g. 5mC) for the same genomic region. Gifford et al., Cell 2013

We can compare different experimental conditions in a genome browser: UCSC Genome Browser (online) or IGV (installed on a local computer). Jung et al., NAR 2014

Systematic analysis requires identifying all peaks in all datasets and comparing the differences. Bardet et al. (2012) Nature Protocols, 7, 45-61

Peak calling is a method to identify areas in a genome enriched with aligned reads Wilbanks EG (2010) PLoS ONE 5, e11471.
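In its simplest form, peak calling means finding runs of bases whose coverage exceeds some cutoff. The sketch below does only that, with a hypothetical fixed `threshold` and `min_len`; it is a toy illustration, not how MACS, HOMER or other real peak callers decide significance.

```python
def call_regions(cov, threshold, min_len=1):
    """Return [start, end) runs where coverage >= threshold and length >= min_len."""
    regions, start = [], None
    for pos, depth in enumerate(cov):
        if depth >= threshold and start is None:
            start = pos                       # entering an enriched run
        elif depth < threshold and start is not None:
            if pos - start >= min_len:
                regions.append((start, pos))  # leaving an enriched run
            start = None
    if start is not None and len(cov) - start >= min_len:
        regions.append((start, len(cov)))     # run reaches the chromosome end
    return regions


cov = [0, 1, 5, 6, 7, 2, 0, 4, 4, 0]
print(call_regions(cov, threshold=4))             # [(2, 5), (7, 9)]
print(call_regions(cov, threshold=4, min_len=3))  # [(2, 5)]
```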

Peak calling: finding the peaks. Input: a control sample prepared in the same way as the ChIP-seq sample, but without adding the antibody, so it has no specific enrichment for our protein of interest. Pepke et al. (2009). Nature Methods, 6, S22–S32.
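The input control matters partly because read counts must be compared at equal sequencing depth. A minimal sketch of one common idea: scale the input count in a region by the ratio of total reads in the two libraries, then take the ChIP/input ratio. The pseudocount is an assumption added to avoid dividing by zero; real tools use more careful normalisation.

```python
def fold_enrichment(chip_count, input_count, chip_total, input_total, pseudocount=1.0):
    """ChIP/input ratio for one region after scaling input to the ChIP library size."""
    scale = chip_total / input_total          # library-size ratio
    expected = input_count * scale            # input reads expected at ChIP depth
    return (chip_count + pseudocount) / (expected + pseudocount)


# 50 ChIP reads vs 10 input reads, with the ChIP library sequenced twice as deeply
print(fold_enrichment(50, 10, chip_total=20_000_000, input_total=10_000_000))  # about 2.43
```

Without the depth scaling, the same region would appear five-fold enriched instead of roughly two-fold.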


Peak calling: defining statistical significance. Is this peak statistically significant? Tools: MACS (good for TFs), SICER (histones, etc.), HOMER (universal), PeakSeq, edgeR, CisGenome. Park P. J., Nature Reviews Genetics, 2009
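Peak callers decide significance with a statistical model of background read counts. MACS, for example, models counts with a Poisson distribution whose rate λ is estimated locally from the control. The sketch below shows only the core step under a simplifying assumption: a single, known background λ per window, and the p-value P(X ≥ k) of seeing at least k reads there.

```python
import math


def poisson_upper_tail(k, lam, max_terms=100_000):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summed term by term in log space."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    total = 0.0
    for i in range(k, k + max_terms):
        # log P(X = i) = -lam + i*log(lam) - log(i!)
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        if i > lam and term <= total * 1e-16:
            break  # remaining terms are negligible and keep shrinking
    return min(total, 1.0)


# Hypothetical window: 20 reads where the background predicts 2 on average
print(poisson_upper_tail(20, 2.0))
```

Summing the upper tail directly, rather than computing 1 minus the lower tail, keeps very small p-values accurate. Because thousands of windows are tested genome-wide, real peak callers also correct these p-values for multiple testing.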

Important: peaks are just genomic regions

Genes are also genomic regions (differential expression tools: DESeq, edgeR, Cuffdiff)
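Because genes are just regions, reads can be counted per gene exactly as per peak. Tools such as DESeq and edgeR then fit statistical count models across replicates; the sketch below shows only the underlying idea of normalising to counts per million (CPM) and taking a log2 fold change between two samples. Gene names and counts are invented, and the pseudocount is an assumption.

```python
import math


def cpm(counts):
    """Counts per million: scale each gene's count by the sample's total reads."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_change(counts_a, counts_b, pseudocount=1.0):
    """log2(B/A) per gene after CPM normalisation; both samples need the same genes."""
    a, b = cpm(counts_a), cpm(counts_b)
    return {gene: math.log2((b[gene] + pseudocount) / (a[gene] + pseudocount)) for gene in a}


sample_a = {"geneA": 100, "geneB": 900}   # hypothetical counts, condition A
sample_b = {"geneA": 300, "geneB": 700}   # hypothetical counts, condition B
print(log2_fold_change(sample_a, sample_b))
```

Here geneA rises roughly three-fold (log2 fold change about 1.58); without replicates, though, there is no way to tell whether such a change is significant, which is what the count models in DESeq and edgeR address.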

DNA methylation: also genomic regions. Individual CpGs; differentially methylated regions (DMRs). Tools: DMRcaller, Bismark
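At the level of individual CpGs, bisulfite sequencing yields counts of reads in which the C stayed C (methylated) or was converted to T (unmethylated); Bismark, for instance, reports such counts. A minimal sketch of turning counts into methylation levels, for single CpGs and pooled over a region; positions and counts are made up.

```python
def methylation_level(methylated, unmethylated):
    """Fraction of reads reporting methylation at a site; None if no coverage."""
    total = methylated + unmethylated
    return methylated / total if total else None


def region_methylation(cpgs, start, end):
    """Pooled methylation level of CpGs with start <= position < end."""
    inside = [(m, u) for pos, m, u in cpgs if start <= pos < end]
    return methylation_level(sum(m for m, _ in inside), sum(u for _, u in inside))


# (position, methylated reads, unmethylated reads): hypothetical CpGs
cpgs = [(100, 9, 1), (110, 5, 5), (250, 0, 10)]
print(methylation_level(9, 1))             # 0.9
print(region_methylation(cpgs, 100, 200))  # 0.7
```

Comparing such region-level values between conditions is the starting point for calling differentially methylated regions.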

Any genomic regions can be intersected: BedTools (command line), Galaxy (online)
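Intersecting regions means finding pairs that overlap. BedTools does this efficiently on large files; the sketch below shows only the underlying overlap test with a naive quadratic loop, assuming BED-style 0-based, half-open (chrom, start, end) tuples. All coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(regions_a, regions_b):
    """All overlapping (a, b) pairs; O(len(a) * len(b)), fine for small examples."""
    return [(a, b) for a in regions_a for b in regions_b if overlaps(a, b)]


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
genes = [("chr1", 150, 300), ("chr2", 300, 400)]
print(intersect(peaks, genes))  # [(('chr1', 100, 200), ('chr1', 150, 300))]
```

With half-open coordinates, intervals that merely touch (one ends at 200, the next starts at 200) do not overlap.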

We can calculate the distribution of TF binding sites among different genomic features. Toropainen et al. (2016) Scientific Reports, 6, 33510
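A minimal sketch of such a distribution: assign each peak centre to a feature category and report the fraction of peaks per category. The categories, the priority rule (promoter before gene body) and the 1 kb `PROMOTER_WINDOW` are hypothetical choices for illustration, as are the gene and peak coordinates.

```python
from collections import Counter

PROMOTER_WINDOW = 1000  # hypothetical: a peak within 1 kb of a TSS counts as "promoter"


def classify(chrom, center, genes):
    """Assign one peak centre to 'promoter', 'gene body' or 'intergenic'."""
    category = "intergenic"
    for g_chrom, start, end, strand in genes:
        if g_chrom != chrom:
            continue
        tss = start if strand == "+" else end - 1   # TSS position depends on the strand
        if abs(center - tss) <= PROMOTER_WINDOW:
            return "promoter"                        # promoter takes priority
        if start <= center < end:
            category = "gene body"
    return category


def feature_distribution(peaks, genes):
    """Fraction of peaks falling into each category."""
    counts = Counter(classify(c, (s + e) // 2, genes) for c, s, e in peaks)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


genes = [("chr1", 5000, 20000, "+"), ("chr1", 40000, 50000, "-")]   # made-up genes
peaks = [("chr1", 4800, 5200), ("chr1", 10000, 10200),
         ("chr1", 49500, 49700), ("chr1", 30000, 30200)]
print(feature_distribution(peaks, genes))
```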

We can also calculate the enrichment of binding sites of our TF in different genomic regions. Mattout et al., Genome Biology, 2015
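One simple way to express enrichment is observed/expected: the fraction of binding sites that fall in a region type, divided by the fraction of the genome that region type covers. A value above 1 suggests enrichment. The sketch assumes sites would otherwise be spread uniformly over the genome, which is a simplification, and the numbers are invented.

```python
def enrichment(sites_in_region, sites_total, region_bp, genome_bp):
    """Observed/expected ratio of binding sites in a region type (uniform background)."""
    observed = sites_in_region / sites_total   # fraction of sites in the regions
    expected = region_bp / genome_bp           # fraction of the genome they cover
    return observed / expected


# 300 of 1,000 peaks fall in regions covering 30 Mb of a 1 Gb genome (hypothetical)
print(enrichment(300, 1000, region_bp=30_000_000, genome_bp=1_000_000_000))  # about 10
```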

…Or study the DNA sequence inside the peaks to find common motifs (HOMER, MEME). Massie et al., EMBO J. (2011) 30, 2719–2733
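Tools such as HOMER and MEME discover motifs de novo and test them against a background model. A much simpler related question is how many peaks contain a known motif on either DNA strand; the sketch below answers that with a regular expression. The motif and sequences are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, read backwards)."""
    return seq.translate(COMPLEMENT)[::-1]


def fraction_with_motif(sequences, motif_regex):
    """Fraction of sequences with at least one motif match on either strand."""
    pattern = re.compile(motif_regex)
    hits = sum(
        1 for seq in sequences
        if pattern.search(seq) or pattern.search(reverse_complement(seq))
    )
    return hits / len(sequences)


peak_sequences = ["AAGATAAGTT", "CCCCCCCC", "CCTTATCCC"]   # made-up peak sequences
print(fraction_with_motif(peak_sequences, "GATAAG"))      # 2 of 3 peaks match
```

The third sequence matches only on the reverse strand, which is why both strands are searched; comparing this fraction with the same fraction in shuffled or random background sequences would indicate whether the motif is actually enriched.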

What else can we do with peaks?
Compare two experimental conditions to see which peaks appear/disappear (e.g. protein binding gained/lost)
Compute associations of our protein with different genes (e.g. define which genes are regulated by this protein)
Study the DNA sequence inside the peaks (e.g. to find which other TFs co-bind with our protein of interest)
Look at how our peaks are arranged with respect to other peaks (e.g. to check for interactions with other proteins)
etc.
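A common first step for associating peaks with genes is to assign each peak to the gene with the nearest transcription start site (TSS). This is only a heuristic (regulatory elements can act on distant genes through DNA loops), and the gene names and coordinates below are hypothetical.

```python
def nearest_gene(peak, tss_list):
    """Return (gene, distance) for the TSS closest to the peak centre, or None."""
    chrom, start, end = peak
    center = (start + end) // 2
    best = None
    for gene, g_chrom, tss in tss_list:
        if g_chrom != chrom:
            continue                      # only genes on the same chromosome
        distance = abs(center - tss)
        if best is None or distance < best[1]:
            best = (gene, distance)
    return best


tss_list = [("geneA", "chr1", 1000), ("geneB", "chr1", 5000), ("geneC", "chr2", 1200)]
print(nearest_gene(("chr1", 4000, 4200), tss_list))  # ('geneB', 900)
print(nearest_gene(("chr3", 100, 200), tss_list))    # None
```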

Take home message
NGS data structure: NGS data are very large text files, so NGS analysis needs "large" computers.
MUST KNOW:
NGS data structure: ~100s of types of NGS experiments; we focus on ChIP-seq here
Where is NGS data stored? (GEO, etc.)
RAW DATA; MAPPED READS; REGIONS; SITES
GENOME BROWSERS
PEAKS; PEAK CALLING
Optional video: https://www.youtube.com/watch?v=Ob9xGBPvr_s