RNA-seq: the future of transcriptomics ……. ?


Disclaimer: Tiago Hori is not an expert on RNA-seq.

RNA-seq, or RNA sequencing, is not a completely novel idea (Wang et al., 2009). Precedents: SAGE, long-SAGE, MPSS. Recent developments in next-generation sequencing (NGS) have made whole-transcriptome analyses more accessible.
Does it work? Comparison with microarrays; advantages and disadvantages.
How does it work? Challenges.
Are microarrays going to go extinct?
Weapons of choice:

Marioni et al., 2008: Microarray intensities correlate well with RNA-seq count data, and Affymetrix fold-changes correlate well with Illumina-based RNA-seq fold-changes.
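As a rough illustration of how such a platform comparison could be computed, the sketch below correlates log2 fold-changes from an array with log2 fold-changes from read counts. All values are made up for five hypothetical genes and are not taken from Marioni et al.; the function names and the pseudocount of 1 are assumptions chosen for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def log2_fold_change(a, b, pseudocount=1.0):
    """log2(b / a), with a pseudocount (an assumption) to avoid log of zero."""
    return math.log2((b + pseudocount) / (a + pseudocount))

# Made-up illustrative values for five genes in two conditions.
array_a = [120.0, 800.0, 45.0, 2300.0, 310.0]   # array intensities, condition A
array_b = [240.0, 790.0, 20.0, 4700.0, 300.0]   # array intensities, condition B
counts_a = [35, 410, 6, 1500, 98]               # read counts, condition A
counts_b = [80, 395, 2, 3100, 101]              # read counts, condition B

array_lfc = [log2_fold_change(a, b) for a, b in zip(array_a, array_b)]
seq_lfc = [log2_fold_change(a, b) for a, b in zip(counts_a, counts_b)]

r = pearson(array_lfc, seq_lfc)
print(f"Correlation of fold-changes: {r:.3f}")
```

With real data a rank-based correlation may be preferable, since a few extreme genes can dominate a Pearson coefficient.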

The pros and cons of RNA-seq: do the benefits definitely outweigh the problems?
Advantages (Wang et al., 2009):
Identifies not only differentially expressed genes but also differential allelic expression, SNPs, splice variants, and new genes or isoforms.
Is not limited to a fixed set of probes.
Is not affected by the background signal or saturation that make low- and high-expression transcripts hard to study.

The pros and cons of RNA-seq: do the benefits definitely outweigh the problems?
Disadvantages:
Cost.
Dependence on a reference genome or transcriptome (but see Trapnell et al., 2010, Nature Biotechnology, who used 430 million paired-end reads to assemble a transcriptome de novo).
Large amounts of data, requiring large storage space and computational power.
Statistical methods are still in their infancy.

How does it work? (Ozsolak and Milos, 2011)
Figure labels: Agilent polyA selection; NimbleGen selection array; generation of target cDNA (sequence-specific, e.g. for allele discrimination); Helicos sequencing.

How does it work? (Oshlack et al., 2010)

Mapping
Challenges: computational power required; exon junctions; alleles and SNPs.
Two main methods: hash tables (local alignment, similar to BLAST); prefix/suffix tries.
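A minimal sketch of the hash-table idea, assuming a simple seed-and-verify scheme: every k-mer of the reference is stored in a dictionary, a read is seeded with its first k-mer, and each candidate position is then checked base by base. The function names, the toy reference, and the mismatch limit are all hypothetical; real hash-based aligners are considerably more elaborate.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify by counting mismatches."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTACGTTAGCCGATAGGCTA"   # toy reference, made up
index = build_kmer_index(reference, k=4)

print(map_read("TAGCCGAT", reference, index, k=4))  # exact match
print(map_read("TAGCCGTT", reference, index, k=4))  # one mismatch, e.g. a SNP
```

Seeding only with the first k-mer means a mismatch inside the seed causes the read to be missed, which is one reason practical tools use several seeds per read.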

BFAST (Homer et al., 2009) and BWA-SW (Li and Durbin, 2010). One of the biggest challenges in mapping is reducing the “RAM footprint” of the reference genome, which is done through different ways of indexing the reference. The other challenge is to map accurately while still allowing reads that differ from the reference (e.g. because of SNPs or sequencing errors) to be mapped.
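To make the indexing idea concrete, here is a toy sketch of exact matching by backward search over a Burrows-Wheeler transform, the family of index that BWA builds on. It is an illustration only, not BWA's implementation: the naive construction below keeps the full suffix array and a full occurrence table, so it does not itself save memory (practical tools sample or compress these structures), and all names are hypothetical.

```python
def bwt_index(text):
    """Build a suffix array, first-column offsets, and occurrence table for text."""
    text += "$"  # unique terminator that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    first = {}  # first[ch] = number of characters in text smaller than ch
    total = 0
    for ch in alphabet:
        first[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, first, occ

def backward_search(pattern, sa, first, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))

reference = "ACGTACGTTAGCCGATAGGCTA"   # toy reference, made up
sa, first, occ = bwt_index(reference)
print(backward_search("TAG", sa, first, occ))
```

The search consumes the pattern from its last character to its first, narrowing the range of suffix-array rows at each step; handling mismatches requires extending this search, which is not shown here.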

Data summarization: there are three main ways of summarizing your data (Oshlack et al., 2010): counts per exon, counts per transcript, and counts per gene.
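The sketch below shows two of these summaries, counts per exon and counts per gene, for a made-up annotation and made-up read start positions. It assumes each read is assigned by its start coordinate alone and that exons do not overlap; counts per transcript are left out because they require deciding which isoform a read came from. All names and coordinates are hypothetical.

```python
from collections import Counter

# Hypothetical annotation: (gene, exon_id, start, end), half-open coordinates.
exons = [
    ("geneA", "A.e1", 100, 200),
    ("geneA", "A.e2", 300, 400),
    ("geneB", "B.e1", 1000, 1200),
]

def summarize(read_starts, exons):
    """Count reads per exon and per gene; reads outside all exons are skipped."""
    exon_counts = Counter()
    gene_counts = Counter()
    for pos in read_starts:
        for gene, exon_id, start, end in exons:
            if start <= pos < end:
                exon_counts[exon_id] += 1
                gene_counts[gene] += 1
                break  # exons are assumed not to overlap
    return exon_counts, gene_counts

read_starts = [110, 150, 310, 1005, 1100, 1150, 5000]  # made-up mapped positions
exon_counts, gene_counts = summarize(read_starts, exons)
print(dict(exon_counts))
print(dict(gene_counts))
```

The linear scan over exons is only suitable for a toy example; a real annotation would need an interval index.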

Normalization: is RNA-seq data an absolute mRNA count?
Within libraries: length bias; sequencing efficiency.
Between libraries: sequencing depth; over-representation of highly expressed transcripts.
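One widely used way to address two of these factors, transcript length and sequencing depth, is RPKM (reads per kilobase of transcript per million mapped reads). The slide does not name a method, so RPKM is used here only as an example; it does not correct for over-representation of highly expressed transcripts. The counts and library sizes are made up.

```python
def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)

# Same hypothetical 2 kb gene in two libraries of different depth.
shallow = rpkm(count=500, length_bp=2000, total_mapped_reads=10_000_000)
deep = rpkm(count=1000, length_bp=2000, total_mapped_reads=20_000_000)
print(shallow, deep)

# A gene twice as long with twice the reads gets the same value.
longer = rpkm(count=1000, length_bp=4000, total_mapped_reads=10_000_000)
print(longer)
```

The raw counts differ by a factor of two between the libraries, yet the normalized values agree, which is the point of dividing by depth and length.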

Differential expression detection
Challenges:
Requires biological replication, but perhaps not technical replication.
Count data are discrete rather than continuous.
There is evidence that the counts follow a negative binomial distribution, which resembles the Poisson distribution but lets the variance exceed the mean.
Type I error must be controlled across many tests (false discovery).
Bioconductor packages:
edgeR: developed for SAGE; uses a modified Fisher exact test for overdispersed data (means and variance estimated using maximum likelihood).
DESeq: similar to edgeR, but uses a different model to estimate means and variance (empirical estimation of the mean-variance relationship).
baySeq: empirical Bayes inference to test for differential expression.
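Two of the points above can be sketched in a few lines. The first function gives the negative binomial variance under the common parameterization variance = mean + dispersion × mean², which reduces to the Poisson case when the dispersion is zero. The second applies the Benjamini-Hochberg adjustment, one standard way of controlling the false discovery rate; the slide does not say which procedure is meant, so that choice is an assumption. Neither function is the API of edgeR, DESeq, or baySeq, and the p-values are made up.

```python
def nb_variance(mean, dispersion):
    """Negative binomial variance; dispersion = 0 gives the Poisson variance."""
    return mean + dispersion * mean ** 2

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(nb_variance(100, 0.0))   # Poisson: variance equals the mean
print(nb_variance(100, 0.1))   # overdispersed: variance far exceeds the mean

pvalues = [0.01, 0.04, 0.03, 0.5]   # made-up per-gene p-values
print(benjamini_hochberg(pvalues))
```

The overdispersion term grows with the square of the mean, so it matters most for highly expressed genes, where a Poisson model would understate the variability between biological replicates.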

What do you do with the data and what does it all mean?
Systems biology: DAVID and other techniques from microarray analysis used for GO enrichment; KEGG pathways.
Resources: