RNA-Seq Empowers Transcriptome Studies: extract RNA, convert to cDNA, then sequence on a next-gen sequencer (pick your favorite)


Generating RNA-Seq: How to Choose?
Many different instruments hit the scene in the last decade: Illumina, 454, SOLiD, Helicos, Pacific Biosciences, Ion Torrent, Oxford Nanopore.
Popular choices for RNA-Seq today:
Illumina [Current RNA-Seq workhorse]
Pacific Biosciences [Full-length single molecule sequencing]
Oxford Nanopore [Newly emerging technology for full-length single molecule sequencing]
Slide courtesy of Joshua Levin, Broad Institute.

RNA-Seq: How do we make cDNA?
mRNA (5' to 3') is primed with random hexamers (R6).
cDNA first strand synthesis: reverse transcriptase.
cDNA second strand synthesis: RNase H, DNA polymerase, DNA ligase.
Product: Illumina cDNA library.
Slide courtesy of Joshua Levin, Broad Institute.

Overview of RNA-Seq From:

Common Data Formats for RNA-Seq
FASTA format:
>61DFRAAXX100204:1:100:10494:3070/1
AAACAACAGGGCACATTGTCACTCTTGTATTTGAAAAACACTTTCCGGCCAT
FASTQ format (read name line starting with '@', sequence, '+', then one ASCII-encoded quality character per base):
AAACAACAGGGCACATTGTCACTCTTGTATTTGAAAAACACTTTCCGGCCAT
+
Read quality values:
AsciiEncodedQual(x) = -10 * log10(Pwrong(x)) + 33
AsciiEncodedQual('C') = 67
So, Pwrong('C') = 10^((67 - 33) / (-10)) = 10^-3.4 ≈ 0.0004
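The quality formula above can be checked with a few lines of Python. This is a minimal sketch, assuming Phred+33 encoding as on the slide; the function name `phred33_error_prob` is made up for illustration and is not part of any RNA-Seq tool.

```python
def phred33_error_prob(qual_char: str) -> float:
    """Return the base-call error probability for one Phred+33 quality character."""
    q = ord(qual_char) - 33        # Phred score Q = ASCII code minus the offset 33
    return 10 ** (-q / 10)         # invert Q = -10 * log10(Pwrong)


if __name__ == "__main__":
    # 'C' has ASCII code 67, so Q = 34 and Pwrong = 10^-3.4
    for ch in "C<AB":
        print(ch, ord(ch) - 33, f"{phred33_error_prob(ch):.5f}")
```

Older Illumina pipelines used an offset of 64 rather than 33, so the offset would need to be a parameter when reading such files.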

Paired-end reads
Two FASTQ files; the read name indicates the left (/1) or right (/2) read of a paired-end fragment.
AAACAACAGGGCACATTGTCACTCTTGTATTTGAAAAACACTTTCCGGCCAT
CTCAAATGGTTAATTCTCAGGCTGCAAATATTCGTTCAGGATGGAAGAACA
+
C<CCCCCCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBCCCC
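The /1 and /2 naming convention lends itself to a quick sanity check that two FASTQ files are in matching order. The helper below is a sketch under that assumption only; `mate_of` and `is_name_pair` are illustrative names, and newer Illumina headers encode the mate differently.

```python
def mate_of(read_name: str) -> str:
    """Return the name the mate should carry, given a name ending in /1 or /2."""
    base, sep, suffix = read_name.rpartition("/")
    if sep != "/" or suffix not in ("1", "2"):
        raise ValueError(f"not a /1 or /2 paired-end read name: {read_name!r}")
    return f"{base}/{'2' if suffix == '1' else '1'}"


def is_name_pair(left_name: str, right_name: str) -> bool:
    """True if left_name is a /1 read and right_name is its /2 mate."""
    return left_name.endswith("/1") and mate_of(left_name) == right_name


if __name__ == "__main__":
    left = "61DFRAAXX100204:1:100:10494:3070/1"
    print(mate_of(left), is_name_pair(left, mate_of(left)))
```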

Transcript Reconstruction from RNA-Seq Reads (Nature Biotech, 2010)
Align reads to the genome, then assemble: TopHat, Cufflinks. The Tuxedo Suite: End-to-end Genome-based RNA-Seq Analysis Software Package.
Non-model organisms: “I don’t have a reference genome!”
Assemble de novo, then align transcripts to a genome if one exists: Trinity, GMAP. Trinity: End-to-end Transcriptome-based RNA-Seq Analysis Software Package.

The General Approach to De novo RNA-Seq Assembly Using De Bruijn Graphs

Sequence Assembly via De Bruijn Graphs From Martin & Wang, Nat. Rev. Genet. 2011
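As a rough illustration of the graph construction, the sketch below builds a de Bruijn graph from reads: nodes are (k-1)-mers and every k-mer contributes one edge from its prefix to its suffix, weighted by how often the k-mer occurs. The function names and the toy reads are assumptions for illustration; real assemblers add error pruning, reverse-complement handling, and compact storage.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to {successor node: k-mer count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            # edge from the k-mer's (k-1)-prefix to its (k-1)-suffix
            graph[kmer[:-1]][kmer[1:]] += 1
    return {node: dict(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    toy_reads = ["GATTACA", "TTACAGA"]  # made-up reads sharing TTACA
    for node, successors in de_bruijn_graph(toy_reads, 4).items():
        print(node, "->", successors)
```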

Contrasting Genome and Transcriptome Assembly
Genome Assembly: uniform coverage; single contig per locus; double-stranded.
Transcriptome Assembly: exponentially distributed coverage levels; multiple contigs per locus (alt splicing); strand-specific.

Trinity Aggregates Isolated Transcript Graphs
Genome Assembly: single massive graph. Entire chromosomes represented.
Trinity Transcriptome Assembly: many thousands of small graphs. Ideally, one graph per expressed gene.

Trinity – How it works:
RNA-Seq reads -> linear contigs (Inchworm) -> de Bruijn graphs (Chrysalis) -> transcripts + isoforms (Butterfly)
Thousands of disjoint graphs

Inchworm Algorithm
Decompose all reads into overlapping k-mers (25-mers).
Identify the seed k-mer as the most abundant k-mer, ignoring low-complexity k-mers.
Extend the k-mer at the 3’ end, guided by coverage: among the four candidate extensions (G, A, T, C), follow the one with the highest k-mer count.
Report contig: ….AAGATTACAGA….
Remove assembled k-mers from the catalog, then repeat the entire process.
[Animation: seed GATTACA (count 9) is extended one base at a time, with the counts of the candidate G, A, T, C extensions shown at each step.]
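The greedy loop described on the Inchworm slides can be sketched in a few lines. This is a simplified illustration, not Trinity's implementation: it extends only toward the 3' end, skips the low-complexity filter, ignores reverse complements, and requires k >= 2. All function names and toy reads are assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Build the k-mer catalog: k-mer -> number of occurrences across reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_right(seed, counts, k):
    """Greedily extend seed at its 3' end, always taking the best-supported base."""
    contig, used = seed, {seed}
    while True:
        suffix = contig[-(k - 1):]
        candidates = [(counts[suffix + b], suffix + b) for b in "GATC"
                      if counts[suffix + b] > 0 and suffix + b not in used]
        if not candidates:
            return contig, used
        _, best = max(candidates)      # highest count wins; ties broken by k-mer text
        contig += best[-1]
        used.add(best)


def inchworm_like(reads, k):
    """Seed with the most abundant k-mer, extend, remove used k-mers, repeat."""
    counts = count_kmers(reads, k)
    contigs = []
    while counts:
        seed = max(counts, key=counts.get)
        contig, used = extend_right(seed, counts, k)
        for kmer in used:
            del counts[kmer]           # assembled k-mers leave the catalog
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    toy_reads = ["ATTACAG"] * 4 + ["ATTACAT"]   # made-up reads with one branch
    print(inchworm_like(toy_reads, 4))
```

In the toy run the well-supported branch is followed first, and the leftover k-mer from the minor branch is reported as its own short contig, which mirrors how alternative isoforms end up in separate Inchworm contigs.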

Inchworm Contigs from Alt-Spliced Transcripts
Expressed isoforms: Isoform A, Isoform B.
[Figure: graphical representation of the expressed isoforms, shaded by expression from low to high.]
Inchworm reports separate contigs (shown joined by "+") that have no k-mers in common.

Chrysalis Re-groups Related Inchworm Contigs
Chrysalis uses (k-1) overlaps and read support to link related Inchworm contigs.

Chrysalis
Integrate isoforms via k-1 overlaps.
Build de Bruijn graphs (ideally, one per gene).
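The grouping step can be pictured as connected components over contigs that share a (k-1)-mer. The sketch below uses union-find for that; it is an illustration only and leaves out the read-support requirement that Chrysalis also applies. The function name and toy contigs are assumptions.

```python
from collections import defaultdict


def cluster_contigs(contigs, k):
    """Group contigs that share at least one (k-1)-mer (union-find components)."""
    parent = list(range(len(contigs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    first_seen = {}                          # (k-1)-mer -> first contig index holding it
    for idx, contig in enumerate(contigs):
        for i in range(len(contig) - (k - 1) + 1):
            sub = contig[i:i + k - 1]
            if sub in first_seen:
                parent[find(idx)] = find(first_seen[sub])   # merge the two components
            else:
                first_seen[sub] = idx

    clusters = defaultdict(list)
    for idx, contig in enumerate(contigs):
        clusters[find(idx)].append(contig)
    return sorted(clusters.values())


if __name__ == "__main__":
    toy_contigs = ["AAGATTACA", "TACAGGG", "CCCCTTTT"]   # made-up contigs
    print(cluster_contigs(toy_contigs, 5))
```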

Thousands of Chrysalis Clusters

(isoforms and paralogs)

Butterfly Example 1: Reconstruction of Alternatively Spliced Transcripts
Butterfly’s Compacted Sequence Graph
Reconstructed Transcripts Aligned to Mouse Genome (Reference structure)

Butterfly Example 2: Teasing Apart Transcripts of Paralogous Genes (Ap2a1, Ap2a2)

Strand-specific RNA-Seq is Preferred
Computationally: fewer confounding graph structures in de novo assembly, e.g. forward != reverse complement (GGAA != TTCC).
Biologically: separate sense vs. antisense transcription.
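The GGAA vs. TTCC point can be made concrete with a small sketch. Without strand information, an assembler typically has to treat a k-mer and its reverse complement as the same key; with strand-specific reads the two stay distinct. The `kmer_key` helper is a hypothetical name used only for this illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase ACGT only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_key(kmer, strand_specific):
    """Key used to store a k-mer: as-is if stranded, else the canonical form."""
    if strand_specific:
        return kmer
    return min(kmer, reverse_complement(kmer))   # both strands collapse to one key


if __name__ == "__main__":
    print(reverse_complement("GGAA"))
    print(kmer_key("GGAA", True), kmer_key("TTCC", True))
    print(kmer_key("GGAA", False), kmer_key("TTCC", False))
```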

dUTP 2nd Strand Method: Our Favorite
Modified from Parkhomchuk et al. (2009) Nucleic Acids Res. 37:e123
RNA
-> First-strand synthesis with normal dNTPs (cDNA)
-> Second-strand synthesis with dTTP -> dUTP (second strand carries U)
-> Adaptor ligation
-> USER™ (Uracil-Specific Excision Reagent): remove “U”s
-> PCR and paired-end sequencing
Slide courtesy of Joshua Levin, Broad Institute.

Overlapping UTRs from Opposite Strands: Schizosaccharomyces pombe (fission yeast)

Antisense-dominated Transcription

Trinity output: A multi-fasta file

Evaluating the quality of your transcriptome assembly
Read representation by assembly
Full-length transcript reconstruction
Contig N50 leveraging expression data (ExN50)
Detonate

Evaluating the quality of your transcriptome assembly: Read representation by assembly
Align reads to the assembled transcripts using Bowtie.
A typical ‘good’ assembly has ~80% of reads mapping to the assembly, and ~80% of those are properly paired.
Given a read pair, the possible mapping contexts in the Trinity assembly are reported: proper pairs, improper pairs, left only, right only.
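Once the four mapping contexts have been counted, turning them into percentages is straightforward. The sketch below assumes the counts are already available as a dictionary; the category keys and the numbers are invented for illustration and do not come from any real alignment.

```python
def mapping_context_percentages(counts):
    """Convert per-category fragment counts into percentages of all aligned fragments."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned fragments to summarize")
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    toy_counts = {"proper_pairs": 800, "improper_pairs": 100,
                  "left_only": 60, "right_only": 40}
    for category, pct in mapping_context_percentages(toy_counts).items():
        print(f"{category}\t{pct:.1f}%")
```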

% samtools tview alignments.bam target.fasta
An assembled transcript contig is only as good as its read support.

IGV

Can Examine Transcript Read Support Using IGV

Can align Trinity transcripts to genome scaffolds to examine intron/exon structures (Trinity transcripts aligned to the genome using GMAP)

Evaluating the quality of your transcriptome assembly: Full-length Transcript Detection via BLASTX
[Figure: Trinity transcript aligned against a known protein (SWISSPROT), M ... *. Haas et al. Nat. Protoc. * Mouse transcriptome.]
Have you sequenced deeply enough?

The Contig N50 statistic
“At least half of assembled bases are in contigs that are at least N50 bases in length”
In genome assemblies: used often to judge ‘which assembly is better’.
In transcriptome assemblies: N50 is not very useful.
Overzealous isoform annotation for long transcripts drives N50 higher.
Very sensitive reconstruction of short, lowly expressed transcripts drives N50 lower.
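The definition quoted above translates directly into code: sort contig lengths from longest to shortest and walk down until at least half of the total bases are covered. A minimal sketch, with a made-up list of lengths:

```python
def n50(lengths):
    """Largest L such that contigs of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:      # at least half of the assembled bases reached
            return length


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]   # made-up contig lengths
    print(n50(toy_lengths))
```

Adding many short contigs to the list pulls the value down, and adding extra long isoforms pushes it up, which is the sensitivity the slide warns about.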

Expression
Often, most assembled transcripts are *very* lowly expressed. (How many ‘transcripts & genes’ are there really?)
[Plot: cumulative # of transcripts against -1 * minimum TPM, with an annotation at 20k transcripts. * Salamander transcriptome: 1.4 million Trinity transcript contigs, N50 ~ 500 bases.]

Compute N50 Based on the Top-most Highly Expressed Transcripts (ExN50)
Sort contigs by expression value, in descending order.
Compute N50 given minimum % total expression data thresholds => ExN50.
Example: at 90% of expression data, N50 = 3457, and 24K transcripts.
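The ExN50 recipe can be sketched as follows: rank transcripts by expression, keep the top ones until they account for the requested share of total expression, then compute N50 on that subset. This is an illustration of the idea only, not Trinity's own script; the `(length, expression)` tuple layout and the toy values are assumptions.

```python
def n50(lengths):
    """Largest L such that contigs of length >= L contain at least half of all bases."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def exn50(transcripts, pct):
    """Return (N50, transcript count) for the top transcripts covering pct% of expression.

    transcripts: list of (length, expression) tuples; pct: value in (0, 100].
    """
    if not transcripts or not 0 < pct <= 100:
        raise ValueError("need transcripts and 0 < pct <= 100")
    ranked = sorted(transcripts, key=lambda t: t[1], reverse=True)
    target = sum(expr for _, expr in ranked) * pct / 100.0
    selected, running = [], 0.0
    for length, expr in ranked:
        if running >= target:         # requested share of expression already covered
            break
        selected.append(length)
        running += expr
    return n50(selected), len(selected)


if __name__ == "__main__":
    # Made-up (length, expression) values, for illustration only
    toy = [(3000, 500.0), (2000, 300.0), (1000, 100.0), (300, 50.0), (200, 50.0)]
    for pct in (90, 100):
        print(pct, exn50(toy, pct))
```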

ExN50 Profiles for Different Trinity Assemblies Using Different Read Depths
Note the shift in ExN50 profiles as you assemble more and more reads.
[Plots: ExN50 profiles for assemblies built from thousands of reads and from millions of reads. * Candida transcriptome]

Detonate Software
Ref Genome-based metric
RSEM-EVAL: genome-free metric
“RSEM-EVAL [sic] uses a novel probabilistic model-based method to compute the joint probability of both an assembly and the RNA-Seq data as an evaluation score.”
Li et al. Evaluation of de novo transcriptome assemblies from RNA-Seq data, Genome Biology 2014