RNA-seq library prep introduction

RNA-seq library prep introduction NESCent Academy

Outline
- Methodologies and history
- RNA-seq challenges
- Library preparation methods
- Common queries
- Validation
- Spike-ins and future-proofing your work

Gene expression

RNA sequencing
- Samples of interest: Condition 1 (normal colon), Condition 2 (colon tumor)
- Isolate RNAs
- Generate cDNA, fragment, size select, add linkers
- Sequence ends
- Map to genome, transcriptome, and predicted exon junctions
- Output: 100s of millions of paired reads, 10s of billions of bases of sequence
- Downstream analysis

Methodologies for RNA-Seq studies
- Mapping transcription start sites
- Strand-specific RNA-Seq
- Characterization of alternative splicing patterns
- Gene fusion detection
- Targeted approaches using RNA-Seq
- Small RNA profiling
- Direct RNA sequencing
- Profiling low-quantity RNA samples

Pre-NGS transcriptomics
Hybridization-based approaches:
- Genomic tiling microarrays
- Fluorescently labelled cDNA with microarrays
Sequence-based approaches:
- Sanger sequencing of cDNA or EST libraries
- Serial analysis of gene expression (SAGE)
- Cap analysis of gene expression (CAGE)
- Massively parallel signature sequencing (MPSS)

RNA-seq

Challenges
- RNAs consist of small exons that may be separated by large introns, so mapping reads to the genome is challenging.
- The relative abundances of RNAs vary widely, by 10^5–10^7-fold (5–7 orders of magnitude).
- Because RNA sequencing works by random sampling, a small fraction of highly expressed genes (e.g. ribosomal and mitochondrial genes) may consume the majority of reads.
- RNAs come in a wide range of sizes; small RNAs must be captured separately.
- PolyA selection of large RNAs may result in 3' end bias.
- RNA is fragile compared to DNA (easily degraded).
- Bacterial samples may need to be depleted of rRNA, since bacterial mRNAs cannot be enriched by polyA selection.
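The random-sampling point can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: reads are drawn in proportion to transcript abundance (transcript length is ignored), and the toy sample, with two hypothetical very abundant transcripts standing in for ribosomal/mitochondrial RNAs, is invented for illustration.

```python
import random


def expected_top_fraction(abundances, top_k):
    """Expected share of reads from the top_k most abundant transcripts,
    assuming reads are drawn in proportion to abundance (length ignored)."""
    ranked = sorted(abundances.values(), reverse=True)
    return sum(ranked[:top_k]) / sum(ranked)


def simulate_top_fraction(abundances, top_k, n_reads, seed=0):
    """Draw n_reads at random, weighted by abundance, and return the observed
    fraction that come from the top_k most abundant transcripts."""
    rng = random.Random(seed)
    names = list(abundances)
    weights = [abundances[name] for name in names]
    top = set(sorted(names, key=abundances.get, reverse=True)[:top_k])
    draws = rng.choices(names, weights=weights, k=n_reads)
    return sum(name in top for name in draws) / n_reads


# Hypothetical toy sample: two very abundant transcripts plus 1,000 rare ones.
toy = {"abundant_1": 100_000, "abundant_2": 100_000}
toy.update({f"gene_{i}": 10 for i in range(1000)})

print(round(expected_top_fraction(toy, 2), 3))  # 200000 / 210000 -> 0.952
print(simulate_top_fraction(toy, 2, n_reads=100_000))
```

Although the two abundant transcripts are only 2 of 1,002, they take about 95% of 100,000 reads, leaving roughly 5 reads for each rare transcript.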

Rubbish in = Rubbish out

RNA-seq library prep methodologies
Two main routes for mRNA-seq preparation:
- Illumina TruSeq prep
- Script-seq
Generally, Script-seq is our favourite.

Illumina TruSeq RNA library prep
- 2 days for 8 samples
- Size selection step
- Adaptor ligation and standard library preparation
- 5 µg of total RNA
- ~$100 per sample
- Not strand-specific

Script-seq method
- 2 hours for 12 samples
- < 1 µg of RNA
- ~$150 per sample
- Strand-specific

cDNA library preparation: RNA fragmentation and cDNA fragmentation compared
a | Fragmentation of oligo-dT primed cDNA (blue line) is more biased towards the 3' end of the transcript. RNA fragmentation (red line) provides more even coverage along the gene body, but is relatively depleted at both the 5' and 3' ends. Note that the ratio between the maximum and minimum expression level (the dynamic range) is 44 for microarrays and 9,560 for RNA-Seq. The tag count is the average sequencing coverage for 5,000 yeast ORFs.
b | A specific yeast gene, SES1 (seryl-tRNA synthetase), is shown.
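Both quantities in this figure can be computed from coverage data. The sketch below is a minimal illustration, not the method used to make the figure: it bins per-base coverage into a 5'-to-3' gene-body profile and computes the dynamic range as the ratio of the highest to the lowest non-zero expression level. The coverage vector and expression levels are hypothetical.

```python
def gene_body_profile(coverage, n_bins=10):
    """Mean per-base coverage in n_bins equal-width bins along a transcript.

    `coverage` is assumed to be ordered 5' -> 3' (already strand-corrected)."""
    length = len(coverage)
    if length < n_bins:
        raise ValueError("transcript is shorter than the number of bins")
    profile = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = (b + 1) * length // n_bins
        window = coverage[start:end]
        profile.append(sum(window) / len(window))
    return profile


def dynamic_range(levels):
    """Ratio between the highest and lowest non-zero expression levels."""
    nonzero = [x for x in levels if x > 0]
    return max(nonzero) / min(nonzero)


# Hypothetical 100 bp transcript whose coverage rises toward the 3' end,
# mimicking the oligo-dT primed cDNA pattern.
cov = list(range(1, 101))
profile = gene_body_profile(cov)
print(profile[0], profile[-1])         # 5.5 95.5
print(profile[-1] / profile[0])        # 3'/5' coverage ratio
print(dynamic_range([0, 2, 50, 100]))  # 50.0
```

A flat profile indicates even gene-body coverage; a profile that climbs toward the last bins is the 3' bias described for oligo-dT primed cDNA.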

Common questions: How much library depth is needed for RNA-seq?
My advice: don't ask this question if you want a simple answer… It depends on a number of factors:
- The question being asked of the data: gene expression? Alternative expression? Mutation calling?
- Tissue type, RNA preparation, quality of input RNA, library construction method, etc.
- Sequencing type: read length, paired vs. unpaired, etc.
- Computational approach and resources
To decide, identify publications with similar goals or run a pilot experiment.
Good news: 1/8 to 1 lane of recent Illumina HiSeq data should be enough for most purposes.

Coverage versus depth
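The arithmetic behind depth and coverage is simple, even though its interpretation is not. This sketch assumes a lane is split evenly among multiplexed samples and defines mean coverage as bases sequenced divided by target size; the lane yield, read length, and transcriptome size are hypothetical placeholders, not HiSeq specifications.

```python
def reads_per_sample(reads_per_lane, samples_per_lane):
    """Reads available to each sample when a lane is split evenly by multiplexing."""
    return reads_per_lane / samples_per_lane


def mean_coverage(n_fragments, read_length, target_size, paired=False):
    """Average fold coverage: bases sequenced divided by the size of the target
    (e.g. total transcriptome length in bp)."""
    reads = n_fragments * (2 if paired else 1)
    return reads * read_length / target_size


# Hypothetical numbers: a lane yielding 200 million read pairs, split 8 ways,
# 100 bp paired-end reads, and a 50 Mb transcriptome.
LANE_PAIRS = 200_000_000
per_sample = reads_per_sample(LANE_PAIRS, 8)
print(per_sample)                                                # 25000000.0
print(mean_coverage(per_sample, 100, 50_000_000, paired=True))  # 100.0
```

Because RNA abundances span several orders of magnitude, this average says little about any single transcript: a highly expressed gene may be covered thousands of times while a rare one receives a handful of reads, so depth (reads per sample) and per-gene coverage are not interchangeable.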

Common questions: What mapping strategy should I use for RNA-seq?
It depends on read length.
< 50 bp reads:
- Use an aligner like BWA with a genome + junction database. The junction database needs to be tailored to read length.
- Alternatively, use a standard junction database for all read lengths and an aligner that allows substring alignments for the junctions only (e.g. BLAST, which is slow).
- An assembly strategy may also work (e.g. Trans-ABySS).
> 50 bp reads:
- Use a spliced aligner such as TopHat, or de novo assembly with Trinity.
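Why must a junction database be tailored to read length? One common construction, assumed here rather than taken from the slide, joins the last L − m bases of one exon to the first L − m bases of the next, where L is the read length and m the minimum overhang required in each exon. Any read crossing the junction then fits entirely within the junction sequence. The function names and toy exons are hypothetical.

```python
def junction_sequence(exon_a, exon_b, read_length, min_overhang=1):
    """Sequence spanning the junction from the end of exon_a to the start of exon_b.

    Each side contributes read_length - min_overhang bases, so any read that
    crosses the junction with at least min_overhang bases in each exon lies
    entirely within the returned sequence."""
    flank = read_length - min_overhang
    if min_overhang < 1 or flank < 1:
        raise ValueError("need 1 <= min_overhang < read_length")
    # If an exon is shorter than the flank, slicing returns the whole exon;
    # a real database would extend into the neighbouring exon instead.
    return exon_a[-flank:] + exon_b[:flank]


def spanning_reads(exon_a, exon_b, read_length, min_overhang=1):
    """All reads of read_length that cross the junction with >= min_overhang
    bases in each exon (used here only to check junction_sequence)."""
    transcript = exon_a + exon_b
    junction = len(exon_a)
    reads = []
    for start in range(junction - read_length + min_overhang,
                       junction - min_overhang + 1):
        if start >= 0 and start + read_length <= len(transcript):
            reads.append(transcript[start:start + read_length])
    return reads


junc = junction_sequence("AAAAACCCCC", "GGGGGTTTTT", read_length=4)
print(junc)  # CCCGGG
print(all(r in junc for r in spanning_reads("AAAAACCCCC", "GGGGGTTTTT", 4)))  # True
```

Since the flank length depends on L, a database built for 36 bp reads is too short to hold 75 bp junction-spanning reads, which is why it must be rebuilt for each read length.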

Common questions: How reliable are expression predictions from RNA-seq?
- Are novel exon-exon junctions real? What proportion validate by RT-PCR and Sanger sequencing?
- Are differential/alternative expression changes observed between tissues accurate? How well do differential expression values correlate with qPCR?
384 validations by qPCR, RT-PCR and Sanger sequencing; the study also includes a comparison to microarrays. See the ALEXA-Seq publication for details:
Griffith et al. Alternative expression analysis by RNA sequencing. Nature Methods. 2010 Oct;7(10):843-847.
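The qPCR comparison comes down to correlating two sets of fold changes. This is a minimal sketch, not the ALEXA-Seq procedure: RNA-seq log2 fold changes are taken from normalized counts with a pseudocount, qPCR log2 fold changes come from the 2^-ΔΔCt method (which assumes roughly 100% amplification efficiency), and the two are compared with a Pearson correlation. All input values are hypothetical.

```python
import math


def log2_fold_change(count_1, count_2, pseudocount=1.0):
    """log2(condition 2 / condition 1) from normalized RNA-seq counts; the
    pseudocount avoids division by zero for genes with no reads."""
    return math.log2((count_2 + pseudocount) / (count_1 + pseudocount))


def qpcr_log2_fold_change(delta_delta_ct):
    """log2 fold change from qPCR under the 2^-ddCt method."""
    return -delta_delta_ct


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genes: (condition 1, condition 2) counts and qPCR ddCt values.
rnaseq = [log2_fold_change(c1, c2) for c1, c2 in [(99, 399), (199, 49), (9, 9)]]
qpcr = [qpcr_log2_fold_change(d) for d in [-1.8, 2.1, 0.2]]
print(rnaseq)  # [2.0, -2.0, 0.0]
print(pearson(rnaseq, qpcr))
```

Working in log2 space makes up- and down-regulation symmetric, so a 4-fold increase and a 4-fold decrease carry equal weight in the correlation.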

Common questions: How many replicates?
As many as you can afford. TopHat/Cufflinks statistics work best with three or more biological replicates.

Validation (qualitative) 33 of 192 assays shown. Overall validation rate = 85%

RNA-seq vs Microarray

Spike-in controls
How can you identify limits of detection and ensure your data can be compared to future platforms or new library prep methods (e.g. how does Oxford Nanopore compare to Illumina sequencing)?
- Spike RNA of known concentration into your total RNA: http://tools.invitrogen.com/content/sfs/manuals/4455352C.pdf
- Cost: $20 per sample

RNA-seq spike-in protocol

Assessing lower limit of detection
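The plot for this slide did not survive, so the definition used here is an assumption: the lower limit of detection is taken as the lowest known spike-in concentration at which (and above which) spike-ins are consistently observed with at least a minimum number of reads. The thresholds, counts, and concentration units are hypothetical.

```python
from collections import defaultdict


def lower_limit_of_detection(spikes, min_reads=1, min_detected_fraction=1.0):
    """Lowest spike-in concentration that is reliably detected.

    spikes: iterable of (known_concentration, observed_read_count) pairs.
    Walks down from the highest concentration and stops at the first level
    where fewer than min_detected_fraction of spike-ins have >= min_reads reads.
    Returns None if even the highest concentration fails."""
    detected = defaultdict(list)
    for conc, count in spikes:
        detected[conc].append(count >= min_reads)
    lod = None
    for conc in sorted(detected, reverse=True):
        calls = detected[conc]
        if sum(calls) / len(calls) < min_detected_fraction:
            break
        lod = conc
    return lod


# Hypothetical spike-in counts; concentrations are in arbitrary units.
spikes = [(100, 5000), (100, 4800), (10, 510), (10, 450),
          (1, 40), (1, 0), (0.1, 0)]
print(lower_limit_of_detection(spikes))                             # 10
print(lower_limit_of_detection(spikes, min_detected_fraction=0.5))  # 1
```

Stopping at the first failing level, instead of taking the lowest concentration that happens to pass, keeps a lucky read at a very low concentration from overstating sensitivity.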

Assessing fold change response
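This sketch assumes spike-ins were added to two samples (A and B) at known concentration ratios, since the slide's figure did not survive. It regresses the observed log2(B/A) of each spike-in on its expected log2 ratio with ordinary least squares. Counts are assumed to be already normalized for library size, and all values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fold_change_response(expected_ratios, counts_a, counts_b, pseudocount=0.5):
    """Regress observed log2(B/A) for each spike-in on its expected log2 ratio.

    A slope near 1 and an intercept near 0 mean observed fold changes
    track the known ones."""
    expected = [math.log2(r) for r in expected_ratios]
    observed = [math.log2((b + pseudocount) / (a + pseudocount))
                for a, b in zip(counts_a, counts_b)]
    return fit_line(expected, observed)


# Hypothetical spike-ins mixed at known B:A ratios of 4, 1 and 0.5.
slope, intercept = fold_change_response([4, 1, 0.5], [100, 200, 400],
                                        [400, 200, 200], pseudocount=0)
print(slope, intercept)  # 1.0 0.0
```

A slope below 1 would indicate compressed fold changes (e.g. from saturation or background), which is the kind of platform behaviour spike-ins let you compare across library preps.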

Take home
- Start with good quality total RNA: 1-10 µg
- Have 3 or more biological replicates
- Unless you have good reason, use a Script-seq type protocol
- Use a standard spike-in as an internal control and to ensure samples can be compared across platforms
- Don't forget to validate key findings with qPCR!