Peter Tsai Bioinformatics Institute, University of Auckland


RNA Sequencing
Peter Tsai, Bioinformatics Institute, University of Auckland

What is RNA-seq?
- Study of transcriptomes
- Identify known genes, exons, splicing events, ncRNA and miRNA
- Discover novel genes or transcripts
- Estimate abundances of transcripts (quantitative expression)
- Find differentially expressed transcripts between different conditions
- Reconstruct the transcriptome

General workflow
1. Raw data
2. QC
3. Either map to a reference genome, or perform de novo transcriptome assembly (which requires downstream annotation)
4. Estimate abundance
5. Normalisation
6. Differential expression analysis

Quality checks and mapping
- Use FastQC or SolexaQA for quality checks.
- Trim off low-quality regions and keep only properly paired reads.
- Most QC software assumes normality, but in RNA-seq data you will probably see non-normality.
- You might see some duplicated reads; these are probably due to highly expressed genes.
- For reference mapping, use a tool that can map across splice junctions between exons, e.g. TopHat.
- For de novo assembly, use software designed to reconstruct transcriptomes from RNA-seq data, e.g. Trinity.

Expression value in RNA-seq
- The expression value is the total number of reads mapped to a gene/transcript (count data, raw counts, or digital gene expression).
- Complexity of using simple counts:
  - Sequencing depth: the higher the sequencing depth, the higher the counts.
  - Gene length: counts are proportional to the length of the gene times the mRNA expression level.
  - Count distribution: counts can be distributed differently among samples.

Normalisation
- RPKM (Mortazavi et al., 2008): Reads Per Kilobase of exon model per Million mapped reads.
- FPKM (Trapnell et al., 2010): Fragments Per Kilobase of exon model per Million mapped reads.
- Paired-end RNA-Seq experiments produce two reads per fragment, but that doesn't necessarily mean that both reads will be mappable, so FPKM counts fragments rather than individual reads.
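The RPKM definition above can be sketched as a short calculation. This is a minimal illustration, not the code of any particular package: the function name `rpkm`, the gene names and all numbers are hypothetical. FPKM uses the same formula with fragment counts in place of read counts.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    count              -- reads mapped to the gene
    gene_length_bp     -- exon model length in base pairs
    total_mapped_reads -- all mapped reads in the sample
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: two genes with the same raw count but different lengths.
counts = {"geneA": 500, "geneB": 500}
lengths = {"geneA": 1000, "geneB": 4000}
total = 10_000_000

values = {gene: rpkm(counts[gene], lengths[gene], total) for gene in counts}
for gene, value in values.items():
    print(gene, value)
```

With equal raw counts, the longer gene gets the lower RPKM, which is the gene-length effect that the normalisation is meant to remove.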

Data exploration
[Figure: Replicate 1 vs Replicate 2]

Gene.ID/Description  logFC         logCPM       LR           PValue       FDR
1                    2.563086301   5.07961611   28.4599795   9.57E-08     2.72E-05
2                    4.003686266   2.330395704  28.3288251   1.02E-07
3                    2.71372512    9.704651395  25.01930526  5.68E-07     0.000100653
4                    -2.052703196  3.402621025  21.11492168  4.33E-06     0.000575287
5                    1.95117636    4.438847349  19.21195535  1.17E-05     0.001244651
6                    2.465833373   12.20593577  10.91756889  0.000952565  0.084460792
7                    1.817858683   5.308092036  10.3738524   0.001278126  0.097137553
8                    1.577603322   6.556675456  9.690419768  0.001852312  0.110687766
9                    1.20515812    4.542565518  9.670466698  0.001872537
10                   1.233090336   10.08249873  9.289827985  0.002304298  0.122588652
11                   1.120581944   12.14988136  7.710102379  0.005491264  0.265577482
12                   1.045292369   4.913492018  7.039209923  0.00797442   0.350270537
13                   1.089867189   3.885246135  6.912558621  0.008559242
14                   1.353955354   2.21406615   5.976193603  0.014500264  0.551010036
15                   1.049933686   3.281031472  5.737563572  0.016605812  0.588952795
16                   -1.032999983  1.480514873  4.712476717  0.029944481  0.995653998
17                   -1.313778857  4.325330722  4.169234925  0.041164384  0.998742102
18                   0.864451602   4.338668381  3.479808135  0.062121942
19                   -0.766266641  5.2972332    3.443865378  0.063486998

[Figure labels: Up-regulated, Down-regulated]
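As a rough illustration of how rows in the results table map onto the up-regulated and down-regulated groups, the sketch below labels a few rows by the sign of logFC and an FDR cutoff. The logFC and FDR values are copied from the table; the cutoff of 0.05 and the function name `classify` are assumptions for illustration and are not stated in the slides.

```python
# (row id, logFC, FDR) copied from rows 1, 3, 4, 5 and 6 of the results table.
rows = [
    (1, 2.563086301, 2.72e-05),
    (3, 2.71372512, 0.000100653),
    (4, -2.052703196, 0.000575287),
    (5, 1.95117636, 0.001244651),
    (6, 2.465833373, 0.084460792),
]

FDR_CUTOFF = 0.05  # assumed threshold, not taken from the slides


def classify(log_fc, fdr, cutoff=FDR_CUTOFF):
    """Label a gene by the direction of change if it passes the FDR cutoff."""
    if fdr >= cutoff:
        return "not significant"
    return "up-regulated" if log_fc > 0 else "down-regulated"


calls = {row_id: classify(log_fc, fdr) for row_id, log_fc, fdr in rows}
for row_id, call in calls.items():
    print(row_id, call)
```

Row 6 has a large positive logFC but does not pass the assumed cutoff, which shows why fold change alone is not enough to call a gene differentially expressed.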

ERCC spike-in control
- A set of external RNA transcripts with known concentrations.
- Used to assess dynamic range and lower limit of detection.
- Used to assess fold-change response.
- Serves as an internal control, in order to measure against defined performance criteria.

Dynamic range and lower limit of detection
- The dynamic range can be measured as the difference between the highest and lowest concentration of ERCC transcript detected in each sample.
- The lower limit of detection (LLD) is a measure of sensitivity, and it is defined as the lowest molar amount of ERCC transcript detected in each sample, with user-defined threshold values for determining detection.
- This translates to ~323,000 control molecules detected per 100 ng poly(A) RNA.
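The two definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the spike-in names, concentrations, read counts and the detection threshold `MIN_READS` are all hypothetical, and a transcript is treated as detected when its read count reaches that user-defined threshold.

```python
# (spike-in name, known concentration, observed read count); all hypothetical.
spikes = [
    ("spike_A", 0.5, 0),
    ("spike_B", 2.0, 3),
    ("spike_C", 64.0, 110),
    ("spike_D", 1024.0, 2100),
]

MIN_READS = 1  # assumed user-defined detection threshold

# Concentrations of the spike-ins that count as detected in this sample.
detected = [conc for _, conc, reads in spikes if reads >= MIN_READS]
if not detected:
    raise ValueError("no spike-in transcript passed the detection threshold")

# LLD: the lowest concentration among detected spike-ins.
lld = min(detected)
# Dynamic range: difference between highest and lowest detected concentration.
dyn_range = max(detected) - lld

print("LLD:", lld)
print("Dynamic range:", dyn_range)
```

Raising `MIN_READS` makes detection stricter, which raises the LLD and narrows the dynamic range for the same sample.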

Fold-change response

How much library depth is needed for RNA-seq?
- It depends on a number of factors:
  - Biological questions
  - Complexity of the organism
  - Types of analysis
  - Types of RNA (e.g. miRNA, lncRNA)
- Do a literature search for similar work.
- Run a pilot experiment.

Summary
- Have 3 or more biological replicates.
- Analyse your data with different normalisation methods.
- Perform data exploration.
- Use a standard spike-in as an internal control.
- Validate with qPCR.