RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop.



The Basic Tuxedo Suite
References
– Trapnell C, et al. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics
– Trapnell C, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology
– Kim D, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology
– Roberts A, et al. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology
– Roberts A, et al. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics
– Trapnell C, et al. Differential analysis of gene regulation at transcript resolution with RNA-Seq. Nature Biotechnology
Cufflinks assembles transcripts
Cuffdiff identifies differential expression of genes/transcripts/promoters

Alignment and Differential Expression
Read set(s) + Existing annotation (GTF) → TopHat → bam file(s) → Cuffdiff → Toptables, etc.
We followed these steps with the single-end reads.

But, do we have all the genes?
For organisms with genomes, gene models are stored in GTF files
Assumptions:
– The GTF file contains annotation for ALL transcripts and genes
– All splice sites, start/stop codons, etc. are correct
Are these assumptions correct for every sequenced organism?
RNA-Seq reads can be used to independently construct genes and splice variants using limited or no annotation
Method used depends on how much sequence information there is for the organism…
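One quick way to see what an annotation actually claims is to count the transcripts it lists for each gene. The following is a minimal Python sketch, assuming a standard nine-column, tab-separated GTF with `gene_id` and `transcript_id` attributes on `exon` records; the function name and the example records are made up for illustration.

```python
from collections import defaultdict

def transcripts_per_gene(gtf_lines):
    """Map each gene_id to the set of transcript_id values seen on exon records."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only exon records are considered in this sketch
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return dict(genes)

# Hypothetical records: geneA has two annotated isoforms, geneB has one.
example = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\tdemo\texon\t100\t250\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.2";',
    'chr1\tdemo\texon\t900\t950\t.\t-\t.\tgene_id "geneB"; transcript_id "geneB.1";',
]
counts = {gene: len(tx) for gene, tx in transcripts_per_gene(example).items()}
print(counts)
```

A gene that shows a single isoform here is only known to have one annotated isoform; whether more are expressed is exactly what the RNA-Seq reads can test.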

Gene Construction (Alignment) vs. Assembly
Figure: Haas and Zody (2010) Nat. Biotech. 28:421-3
Figure labels: Genome-Sequenced Organisms; Novel or Non-Model Organisms; Trinity software

Gene / Transcriptome Construction
Annotation can be improved – even for well-annotated model organisms
– Identify all expressed exons
– Combine expressed exons into genes
– Find all splice variants for a gene
– Discover novel transcripts
For newly sequenced organisms
– Validate ab initio annotation
– Comparison between different annotation sets
Can assist in finding some types of contamination
– Reconstruction of rRNA genes
– Genomic/mitochondrial DNA in RNA library preps

Reference Annotation Based Transcript (RABT) Assembly
Tools, in order: TopHat, Cufflinks, Cuffmerge, Cuffcompare, Cuffdiff
Inputs: Read set(s); Existing annotation (GTF) [optional]
Outputs along the way: bam file(s); Read-set specific GTF(s); Merged GTF; Final assembly (GTF and stats); Toptables, etc.
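The chain of tools above can be sketched as a small Python helper that only builds the command lines, without running anything. This is a hedged outline, not the workshop's actual commands: the file names, sample names, and output directory layout are hypothetical, and the flags (`tophat -G`, `cufflinks -g` for RABT mode, `cuffmerge -g/-s`, `cuffcompare -r`, `cuffdiff -L`) should be checked against the manual for the installed TopHat and Cufflinks versions.

```python
def rabt_commands(samples, genome_index, genome_fasta, annotation_gtf, out_dir="rabt_out"):
    """Build (not run) command lines for a RABT-style workflow.

    samples: list of (sample_name, condition, fastq_path) tuples.
    Returns the commands plus the lines to write to cuffmerge's assembly list.
    """
    commands, manifest = [], []
    bams_by_condition = {}
    for name, condition, fastq in samples:
        tophat_dir = f"{out_dir}/{name}_tophat"
        cufflinks_dir = f"{out_dir}/{name}_cufflinks"
        bam = f"{tophat_dir}/accepted_hits.bam"
        # Spliced alignment, using known junctions from the annotation.
        commands.append(["tophat", "-G", annotation_gtf, "-o", tophat_dir, genome_index, fastq])
        # -g uses the annotation as a guide (RABT) while still allowing novel transcripts.
        commands.append(["cufflinks", "-g", annotation_gtf, "-o", cufflinks_dir, bam])
        manifest.append(f"{cufflinks_dir}/transcripts.gtf")
        bams_by_condition.setdefault(condition, []).append(bam)

    merged_gtf = f"{out_dir}/merged/merged.gtf"
    # cuffmerge takes a text file listing one per-sample GTF per line.
    commands.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                     "-o", f"{out_dir}/merged", f"{out_dir}/assemblies.txt"])
    commands.append(["cuffcompare", "-r", annotation_gtf, "-o", f"{out_dir}/cuffcmp", merged_gtf])
    # cuffdiff takes one comma-separated BAM list per condition, in label order.
    commands.append(["cuffdiff", "-o", f"{out_dir}/cuffdiff", "-L", ",".join(bams_by_condition), merged_gtf]
                    + [",".join(bams) for bams in bams_by_condition.values()])
    return commands, manifest

samples = [("ctrl1", "control", "ctrl1.fastq"), ("treat1", "treated", "treat1.fastq")]
commands, manifest = rabt_commands(samples, "genome_index", "genome.fa", "genes.gtf")
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the commands as data makes it easy to review them before execution, for example by writing `manifest` to `assemblies.txt` and passing each list to `subprocess.run`.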

TopHat Spliced Alignment to a Genome

Reference Annotation Based Transcript (RABT) Assembly

Cufflinks – Identification of Incompatible Fragments
Figure label: Incompatible alignment

Cufflinks – Minimum Paths to Transcripts

Cufflinks – Abundance Estimation
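Cufflinks reports abundances in FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below shows only the arithmetic of that unit, assuming each fragment has already been assigned to a single transcript; it ignores the statistical model Cufflinks uses to divide fragments among isoforms and any effective-length or bias correction. The function name and numbers are illustrative.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

# 500 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments
value = fpkm(500, 2000, 20_000_000)
print(value)
```

Doubling the transcript length or the library size halves the value, which is why FPKM can be compared across transcripts and samples more readily than raw counts.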

Merging Cufflinks Assemblies

So Now We’ve Explored These Tools…

We’ve Used Other Software in Conjunction
HTSeq-count → Raw Counts → edgeR
(But HTSeq-count and edgeR are independent)

And Then Came Some Extensions…

Modules Introduced in 2014
Cuffquant
– Improves efficiency of running multiple samples
– Stores data in a compressed “.cxb” format that can later be analyzed with cuffdiff or cuffnorm
Cuffnorm
– Generates tables of expression values that are normalized for library size
– Tables are used as input to Monocle
Monocle
– Used to analyze single-cell expression data
– Trapnell, et al., 2014, Nat. Biotech. 32:381

…But Software Continues to Evolve
HISAT (Hierarchical Indexing for Spliced Alignment of Transcripts)
– Kim et al., 2015, Nat. Methods
– Planned to be TopHat3
– Faster than other aligners
– More accurate on simulated reads

…But Software Continues to Evolve
StringTie
– Pertea et al., 2015, Nat. Biotech
– Probable successor to Cufflinks2
– Assembles more transcripts (based on simulated reads)
Ballgown
– Frazee et al., 2015, Nat. Biotech
– Bioconductor R package
– Probable successor to Cuffdiff2
– Includes useful Tablemaker preprocessor

A New Potential Game-Changer (2015)
Kallisto (“Near-Optimal RNA-Seq Quantification”)
– Bray et al.
– Extremely fast, uses pseudo-alignment based on k-mers and de Bruijn graphs
Figure panels: Speed; Accuracy
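The idea behind k-mer based pseudo-alignment can be illustrated with a toy Python sketch: instead of finding where a read aligns, find which transcripts are compatible with all of its k-mers. This is not Kallisto's implementation (it uses a plain dictionary rather than a de Bruijn graph, and it treats any unseen k-mer as "no compatible transcript", which is a simplification made here); the sequences and names are invented.

```python
from collections import defaultdict

def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k):
    """Return the transcripts compatible with every k-mer in the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            return set()  # simplification: an unseen k-mer rules out every transcript
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

# Two hypothetical isoforms sharing their first 8 bases.
transcripts = {"t1": "ACGTTGCAGGATCC", "t2": "ACGTTGCATTCGAA"}
k = 5
index = build_kmer_index(transcripts, k)

shared = pseudoalign("ACGTTGCA", index, k)   # read from the shared region
unique = pseudoalign("GCAGGATC", index, k)   # read spanning the t1-specific end
absent = pseudoalign("GGGGGGGG", index, k)   # read matching neither isoform
print(shared, unique, absent)
```

Because only set membership is tracked, no base-level alignment is computed, which gives a rough sense of where the speed comes from.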

A Few Words About Bacterial RNA-Seq

Eukaryotic and Bacterial Gene Structures are Different
Eukaryotes
– Gene structure includes introns and exons
– Splicing, poly-adenylation
– Each mRNA is a discrete molecule when translated
Bacteria / Prokaryotes
– Individual genes and groups of genes in operons
– Generally, no splicing, no polyA
– One mRNA can contain coding sequences for multiple proteins
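One practical consequence of polycistronic mRNA is that a single read can overlap two neighboring genes in an operon. The toy Python sketch below applies a union-style rule, loosely modeled on the default "union" mode of HTSeq-count, in which a read overlapping more than one gene is reported as ambiguous rather than counted for either. The coordinates, gene names, and function are hypothetical, and strand is ignored.

```python
def assign_read(read_start, read_end, genes):
    """Assign a read to a gene using a simple union-style overlap rule.

    genes: dict of gene name -> (start, end), 1-based inclusive coordinates.
    """
    overlapping = {name for name, (start, end) in genes.items()
                   if read_start <= end and read_end >= start}
    if not overlapping:
        return "__no_feature"
    if len(overlapping) == 1:
        return next(iter(overlapping))
    return "__ambiguous"  # overlaps two or more genes

# Two adjacent genes in a hypothetical operon, 9 bp apart.
operon = {"geneA": (100, 500), "geneB": (510, 900)}

print(assign_read(200, 250, operon))    # inside geneA only
print(assign_read(480, 530, operon))    # spans the geneA/geneB boundary
print(assign_read(1000, 1050, operon))  # downstream of both genes
```

With closely spaced or overlapping bacterial genes, a noticeable share of reads can fall into the ambiguous category under such a rule.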

Bacterial RNA-Seq Considerations
rRNA depletion strategies may leave considerable amounts of non-coding RNA molecules
Splicing-aware aligners (such as TopHat) may not be useful
Reads from polycistronic mRNA may overlap two genes
– How would HTSeq-count handle this?
Compare alignments to the genome with alignments to the transcriptome
– Some aligners, such as bwa-mem, will report secondary alignments
– Transcriptome alignments can be used to generate a counts table for edgeR
Specialized software, such as Rockhopper (stand-alone,