The iPlant Collaborative


The iPlant Collaborative: Community Cyberinfrastructure for Life Science
Tools and Services Workshop
Transcriptome assembly with SOAPdenovo-Trans

RNA-Seq without a reference genome: overview of work
Workflow: generate sequence, QC and processing, transcriptome assembly, characterize transcript abundance, visualization.
We will focus mostly on the transcriptome assembly itself, as this is the greatest challenge in working without a reference genome and involves several steps.

Transcriptome assembly
Reminder: you'll need to do some thinking and reading here.
Workflow: good read data, assemble transcriptome, evaluate and refine, on to mapping.
The most costly step (in time and compute) is determining empirically which parameters yield the best assembly for a given dataset and organism.
We can suggest some good directions, but we can't promise "right" answers.
Get familiar with examples of work and reasoning on approaches relevant to your organism.

Transcriptome assembly from RNA-Seq: challenges (just a few of them)
- DNA assembly assumes even sequencing depth, which is not true in RNA-Seq: in DNA, extra reads indicate a repetitive region; in RNA, they indicate higher expression.
- De novo assembly from RNA-Seq also needs higher coverage (30X or more).
- Sequencing error correction is needed, especially in highly expressed transcripts.
- Multiple transcript variants (splicing) can confound assembly.

Transcriptome assembly from RNA-Seq: some reminders
- Practically the only agreement on which software is the right one to use is that there is no "right" one (we'll need to experiment*).
- Don't use fewer than 200 to 500 million RNA reads, mate-paired, of 100 bp or better length and high quality, and expect to get a complete transcriptome. (1)
- Despite the challenges, making your own transcriptome is still very useful and perhaps more practical than assembling your own genome.
1. Don Gilbert, "How to get Best mRNA Transcript assemblies", January 2013. http://eugenes.org/EvidentialGene/

Transcriptome assembly from RNA-Seq Overview Pre-analysis: Data generation

Experimental consideration: library prep. Generate the best data you can!
Remove ribosomal RNAs. There are two options, each with pros and cons:
- Poly(A) selection: effective, but will miss ncRNAs and non-polyadenylated transcripts.
- rRNA depletion: hybridize rRNAs and remove them, but this may introduce different biases (e.g. against highly expressed transcripts).

Experimental consideration: library prep. Generate the best data you can!
PCR amplification: most protocols have a PCR amplification step, which introduces bias (e.g. against high GC content). Some alternative protocols or technologies (PacBio) can avoid amplification but have their own issues.

Experimental consideration: library prep. Generate the best data you can!
Strand specificity: if possible, using a strand-specific protocol can simplify future analyses.

Experimental consideration: library prep. Generate the best data you can!
Figure labels: "No orientation" versus "Strandedness preserved".
Source: http://www.giga.ulg.ac.be/jcms/prod_1025901/en/transcriptome-analysis-with-strand-specific-libraries

Trinity Overview Analysis

Trinity overview
Trinity: a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly.
Source: Manfred G. Grabherr et al., "Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data", Nat Biotechnol. 2011 May 15; 29(7): 644–652.


Trinity overview
Trinity aggregates isolated transcript graphs.
Source: ftp://ftp.broad.mit.edu/pub/users/bhaas/rnaseq_workshop/rnaseq_workshop_2014/rnaseq_workshop_slides.pdf

Why Trinity? Some comparisons
Source: Manfred G. Grabherr et al., "Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data", Nat Biotechnol. 2011 May 15; 29(7): 644–652.

Trinity overview
- Inchworm first finds the node with the highest number of reads and extends it on both sides by picking the highest-intensity neighboring nodes. In the figure, Inchworm finds three segments: A, C, and the blue part of B.
- Chrysalis takes the three segments determined by Inchworm and clusters them into two groups that correspond to two genes.
- Butterfly reconstructs the full gene structures, including the alternative splice forms A and B.
Source: http://www.homolog.us/blogs/blog/2011/08/25/de-novo-transcriptome-assemblers-oases-trinity-etc-iv/
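The greedy extension attributed to Inchworm above can be illustrated with a toy sketch. This is a simplified illustration, not Trinity's actual implementation: the function names `count_kmers` and `greedy_contig`, the value of k, and the example reads are all assumptions made for the example, and details such as error filtering and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_contig(counts, k):
    """Seed with the most abundant k-mer, then extend right and left,
    each time choosing the most abundant unused overlapping k-mer."""
    seed, _ = counts.most_common(1)[0]
    used = {seed}
    contig = seed

    # Extend to the right: candidates share the contig's last k-1 bases.
    while True:
        suffix = contig[-(k - 1):]
        candidates = [(counts[suffix + b], suffix + b) for b in "ACGT"
                      if counts[suffix + b] > 0 and suffix + b not in used]
        if not candidates:
            break
        _, best = max(candidates)
        used.add(best)
        contig += best[-1]

    # Extend to the left: candidates share the contig's first k-1 bases.
    while True:
        prefix = contig[:k - 1]
        candidates = [(counts[b + prefix], b + prefix) for b in "ACGT"
                      if counts[b + prefix] > 0 and b + prefix not in used]
        if not candidates:
            break
        _, best = max(candidates)
        used.add(best)
        contig = best[0] + contig

    return contig


# Hypothetical reads that overlap to spell GGACGTTC.
reads = ["GGACGT", "GACGTT", "ACGTTC"]
print(greedy_contig(count_kmers(reads, 4), 4))  # prints GGACGTTC
```

Because each k-mer is used at most once, a greedy pass like this yields only one path through a branching region, which is consistent with the slide's point that later stages (Chrysalis and Butterfly) are needed to recover alternative splice forms.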

Trinity: k-mers and de Bruijn graphs
- Reads are split into k-mers.
- A de Bruijn graph is constructed from the k-mers.
Source: Jeffrey A. Martin and Zhong Wang, "Next-generation transcriptome assembly", Nat. Rev. Genet., doi:10.1038/nrg3068, published online 7 September 2011.
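The two steps named on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses one common formulation in which nodes are (k-1)-mers and each k-mer contributes an edge, and the helper names `kmers` and `de_bruijn` and the example reads are hypothetical. Real assemblers also track k-mer counts and handle reverse complements.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all overlapping substrings of length k, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge from its
    (k-1)-base prefix to its (k-1)-base suffix."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Hypothetical overlapping reads.
reads = ["ACGTAC", "CGTACG"]
graph = de_bruijn(reads, 4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Storing successors in a set means that a k-mer seen in several reads contributes a single edge, so duplicate observations do not enlarge the graph.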

Trinity: k-mers and de Bruijn graphs
- Redundancies are collapsed.
- Paths through the graph that explain the observed sequence generate the alignments.
Source: Jeffrey A. Martin and Zhong Wang, "Next-generation transcriptome assembly", Nat. Rev. Genet., doi:10.1038/nrg3068, published online 7 September 2011.

Trinity: key results
- Trinity.fasta: file containing the assembled transcripts.
- Trinity groups transcripts into clusters based on shared sequence content.
- Such a transcript cluster is very loosely referred to as a gene.
- This information is encoded in the Trinity FASTA accession, e.g.:

Trinity: where to go from here
- Examine the quality of the assembly: N50 statistic, core gene representation.
- Transcript annotation: Trinotate.

Assembly quality: N50
Source: http://schatzlab.cshl.edu/
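For reference, N50 is the length L such that contigs of length L or longer contain at least half of the total assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the example lengths are illustrative and not taken from the slides.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths: total is 54, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

In practice the lengths would come from the sequences in the assembly FASTA file, for example Trinity.fasta.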

CEGMA: how good is assembly coverage?
Source: "CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes", Bioinformatics (2007) 23 (9): 1061-1067, doi: 10.1093/bioinformatics/btm071, first published online March 1, 2007.

Keep asking: ask.iplantcollaborative.org

The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI-0735191).