Analysis of genomes and transcriptomes using ChIP-seq and RNA-seq

Analysis of genomes and transcriptomes using ChIP-seq and RNA-seq Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH ICGEB – Practical Course "Bioinformatics: Computer Methods in Molecular Biology" June 26-30, 2017

The central dogma

Overview of an RNA–seq experiment Data generation and analysis Reference-based and de novo transcriptome assembly Popular aligner and assembler software

Overview of a ChIP–seq experiment Data generation and analysis ChIP profiles Peak calling in strand-specific profiles

Using chromatin immunoprecipitation (ChIP) followed by massively parallel sequencing, the specific DNA sites that interact with transcription factors or other chromatin-associated proteins (non-histone ChIP) and sites that correspond to modified nucleosomes (histone ChIP) can be profiled. The ChIP process enriches the crosslinked proteins or modified nucleosomes of interest using an antibody specific to the protein or the histone modification. Purified DNA can be sequenced on any of the next-generation platforms (Ref. 12). The basic concepts are similar for different platforms: common adaptors are ligated to the ChIP DNA and clonally clustered amplicons are generated. The sequencing step involves the enzyme-driven extension of all templates in parallel. After each extension, the fluorescent labels that have been incorporated are detected through high-resolution imaging. On the Illumina Solexa Genome Analyzer (bottom left), clusters of clonal sequences are generated by bridge PCR, and sequencing is performed by sequencing-by-synthesis. On the Roche 454 and Applied Biosystems (ABI) SOLiD platforms (bottom middle), clonal sequencing features are generated by emulsion PCR and amplicons are captured on the surface of micrometre-scale beads. Beads with amplicons are then recovered and immobilized to a planar substrate to be sequenced by pyrosequencing (for the 454 platform) or by DNA ligase-driven synthesis (for the SOLiD platform). On single-molecule sequencing platforms such as the HeliScope by Helicos (bottom right), fluorescent nucleotides incorporated into templates can be imaged at the level of single molecules, which makes clonal amplification unnecessary.

a | Examples of the profiles generated by chromatin immunoprecipitation followed by sequencing (ChIP–seq) or by microarray (ChIP–chip). Shown is a section of the binding profiles of the chromodomain protein Chromator, as measured by ChIP–chip (unlogged intensity ratio; blue) and ChIP–seq (tag density; red) in the Drosophila melanogaster S2 cell line. The tag density profile obtained by ChIP–seq reveals specific positions of Chromator binding with higher spatial resolution and sensitivity. The ChIP–seq input DNA (control experiment) tag density is shown in grey for comparison. b | Examples of different types of ChIP–seq tag density profiles in human T cells. Profiles for different types of proteins and histone marks can have different types of features, such as: sharp binding sites, as shown for the insulator binding protein CTCF (CCCTC-binding factor; red); a mixture of shapes, as shown for RNA polymerase II (orange), which has a sharp peak followed by a broad region of enrichment; medium size broad peaks, as shown for histone H3 trimethylated at lysine 36 (H3K36me3; green), which is associated with transcription elongation over the gene; or large domains, as shown for histone H3 trimethylated at lysine 27 (H3K27me3; blue), which is a repressive mark that is indicative of Polycomb-mediated silencing. BPIL2, bactericidal/permeability-increasing protein-like 2; FBXO7, F box only 7; NPC1, Niemann-Pick disease, type C1; Pros35, proteasome 35 kDa subunit; SYN3, synapsin III. Data for part b are from Ref. 25.

DNA fragments from a chromatin immunoprecipitation experiment are sequenced from the 5' end. Therefore, the alignment of these tags to the genome results in two peaks (one on each strand) that flank the binding location of the protein or nucleosome of interest. This strand-specific pattern can be used for the optimal detection of enriched regions. To create an approximate distribution of all fragments, each tag location can be extended by an estimated fragment size in the appropriate orientation and the number of fragments can be counted at each position.
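The tag-extension idea described on this slide can be sketched in a few lines of Python. This is only an illustration, not the method of any particular peak caller: the function name fragment_density, the (position, strand) tuple layout, the 0-based coordinates, and the 200 bp fragment size are all assumptions made for the example.

```python
from collections import Counter

def fragment_density(tags, fragment_size):
    """Approximate fragment coverage from strand-specific tags.

    tags: iterable of (position, strand) pairs, where position is the
    0-based coordinate of the tag's 5' end and strand is "+" or "-".
    (This input layout is an assumption made for the sketch.)
    """
    density = Counter()
    for pos, strand in tags:
        if strand == "+":
            # Forward-strand tags are extended to the right.
            start, end = pos, pos + fragment_size
        elif strand == "-":
            # Reverse-strand tags are extended to the left.
            start, end = pos - fragment_size + 1, pos + 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
        for i in range(max(start, 0), end):
            density[i] += 1
    return density

# Two forward tags and two reverse tags flanking a hypothetical binding site.
example_tags = [(100, "+"), (102, "+"), (299, "-"), (301, "-")]
density = fragment_density(example_tags, fragment_size=200)
print(density[200])  # coverage near the middle of the flanked region
```

Counting per base in a Counter keeps the sketch short; for real data sets an interval-based or array-based representation would be far more efficient.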

NCBI Submission Portal https://submit.ncbi.nlm.nih.gov

The Sequence Read Archive (SRA) The SRA is an entirely new resource at NCBI. It is being designed specifically to meet the challenges presented by massively parallel sequencing technologies. It provides a central repository for next-generation sequencing data.

The Sequence Read Archive (SRA) Concepts
Study – A study is a set of experiments and has an overall goal.
Experiment – An experiment is a consistent set of laboratory operations on input material with an expected result.
Sample – An experiment targets one or more samples. Results are expressed in terms of individual samples or bundles of samples as defined by the experiment.
Run – Results are called runs. Runs comprise the data gathered for a sample or sample bundle and refer to a defining experiment.
Submission – A submission is a package of metadata and/or data objects and a directive for what to do with those objects.
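One way to picture how these concepts relate is as a small data model. The sketch below is illustrative only: the class and field names are assumptions made for the example and do not reproduce the actual SRA metadata schema. Runs are nested under their experiment here to reflect that each run refers to a defining experiment.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sample:
    name: str  # hypothetical field: a label for the input material

@dataclass
class Run:
    data_files: List[str]  # data gathered for a sample or sample bundle

@dataclass
class Experiment:
    description: str       # the laboratory operations performed
    samples: List[Sample]  # an experiment targets one or more samples
    runs: List[Run] = field(default_factory=list)  # results of the experiment

@dataclass
class Study:
    goal: str  # a study has an overall goal
    experiments: List[Experiment] = field(default_factory=list)

@dataclass
class Submission:
    directive: str  # what to do with the packaged objects
    objects: list   # metadata and/or data objects

# A made-up study with one experiment, one sample and one run.
study = Study(
    goal="Profile transcription factor binding",
    experiments=[
        Experiment(
            description="ChIP-seq library preparation and sequencing",
            samples=[Sample(name="example cell line")],
            runs=[Run(data_files=["reads.fastq"])],
        )
    ],
)
submission = Submission(directive="add", objects=[study])
print(len(study.experiments), len(study.experiments[0].runs))
```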

The Sequence Read Archive (SRA) Submission process
1. Create an NCBI PDA account.
2. Register a BioProject and receive the BioProject accession (PRJNA#).
3. Register a BioSample and receive the BioSample accession (SAMN#).
4. Complete the submission metadata on the SRA website. You will receive the FTP information after creating a Run.
5. Transfer the data files. For FTP, use put to transmit the file(s) to the private FTP box. For Aspera, use the ascp program to transfer data files to the private account.
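Before filling in the SRA metadata, it can help to check that the BioProject and BioSample accessions from the earlier steps are in hand. The helper below is a hypothetical sketch: the function name is invented, and the patterns simply assume that "#" in PRJNA# and SAMN# stands for a run of digits. It does not contact NCBI or confirm that an accession exists.

```python
import re

# Assumed shapes: the prefix shown on the slide followed by digits only.
BIOPROJECT_PATTERN = re.compile(r"PRJNA\d+")
BIOSAMPLE_PATTERN = re.compile(r"SAMN\d+")

def missing_prerequisites(bioproject, biosample):
    """Return a list of problems that should be fixed before submitting."""
    problems = []
    if not BIOPROJECT_PATTERN.fullmatch(bioproject or ""):
        problems.append("BioProject accession should look like PRJNA#")
    if not BIOSAMPLE_PATTERN.fullmatch(biosample or ""):
        problems.append("BioSample accession should look like SAMN#")
    return problems

# Placeholder accessions, made up for the example.
print(missing_prerequisites("PRJNA000000", "SAMN00000000"))
print(missing_prerequisites("PRJNA000000", ""))
```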

The Sequence Read Archive (SRA) Submission process http://www.ncbi.nlm.nih.gov/Traces/sra_sub/sub.cgi?&m=submissions&s=default

The Sequence Read Archive (SRA) Web access http://www.ncbi.nlm.nih.gov/sra/?term=SRX045419

The Sequence Read Archive (SRA) Web access http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=viewer&m=data&s=viewer&run=SRR111938

The Sequence Read Archive (SRA) Toolkit http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software

dbEST release 130101: Summary by Organism (01 January 2013)
Organism                              ESTs
Homo sapiens (human)                  8,704,790
Mus musculus + domesticus (mouse)     4,853,570
Zea mays (maize)                      2,019,137
Sus scrofa (pig)                      1,669,337
Bos taurus (cattle)                   1,559,495
Arabidopsis thaliana (thale cress)    1,529,700

UniGene

UniGene - Statistics

Gene Expression Omnibus (GEO) http://www.youtube.com/watch?v=J0i_B76zq2w

Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/geo2r/ http://www.youtube.com/watch?v=EUPmGWS8ik0

Transcriptome Shotgun Assembly (TSA) TSA is an archive of computationally assembled sequences from primary data such as ESTs, traces and Next Generation Sequencing Technologies. The overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods instead of by traditional cloning and sequencing of cloned cDNAs.
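To make the idea of assembling overlapping reads into transcripts concrete, here is a toy greedy overlap merger. It is a deliberately simplified sketch and not the algorithm used by any real transcriptome assembler: it ignores sequencing errors, reverse complements, and reads contained inside other reads, and the function names and the minimum overlap of 3 are assumptions chosen for the example.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

# Three made-up reads that tile a short sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can misassemble repeats, which is one reason production assemblers typically rely on graph-based approaches instead.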

A typical transcriptome (454) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488962/

RNA-seq data analysis with Bowtie, TopHat and Cufflinks
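As a rough outline of how the three tools are commonly chained, the sketch below only builds and prints the command lines; it does not run them. The file names, output directories, index name and thread count are placeholders invented for the example, and it assumes a Bowtie 2 index built with bowtie2-build, as used by TopHat 2. Check each tool's manual for the options appropriate to your data.

```python
import shlex

def rnaseq_commands(reference_fasta, reads_fastq,
                    index_base="genome_index", threads=4):
    """Return the three command lines as argument lists (nothing is executed)."""
    return [
        # 1. Build a Bowtie 2 index from the reference genome.
        ["bowtie2-build", reference_fasta, index_base],
        # 2. Align reads with TopHat, which is splice-aware.
        ["tophat", "-p", str(threads), "-o", "tophat_out",
         index_base, reads_fastq],
        # 3. Assemble transcripts from TopHat's alignments with Cufflinks.
        ["cufflinks", "-p", str(threads), "-o", "cufflinks_out",
         "tophat_out/accepted_hits.bam"],
    ]

commands = rnaseq_commands("genome.fa", "reads.fastq")
for command in commands:
    print(shlex.join(command))
```

Keeping each command as a list makes it straightforward to hand it to subprocess.run later without worrying about shell quoting.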