Ji-hye Choi 2014.11.20 24 August 2014. Introduction (2006) ABRF-NGS (the Association fo Biomolecular Resource Facilities next-generation sequencing study)

Slides:



Advertisements
Similar presentations
RNA-Seq as a Discovery Tool
Advertisements

RNA-seq library prep introduction
Capturing the chicken transcriptome with PacBio long read RNA-seq data OR Chicken in awesome sauce: a recipe for new transcript identification Gladstone.
Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"
An Introduction to Studying Expression Data Through RNA-seq
RNAseq.
We processed six samples in triplicate using 11 different array platforms at one or two laboratories. we obtained measures of array signal variability.
Transcriptomics Breakout. Topics Discussed Transcriptomics Applications and Challenges For Each Systems Biology Project –Host and Pathogen Bacteria Viruses.
Transcriptome Sequencing with Reference
Peter Tsai Bioinformatics Institute, University of Auckland
RNA-seq: the future of transcriptomics ……. ?
Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data Alex Zelikovsky Department of Computer Science Georgia State University Joint work.
Data Analysis for High-Throughput Sequencing
Greg Phillips Veterinary Microbiology
Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520
Transcriptomics Jim Noonan GENE 760.
Tyson A. Clark, Ph.D. February 11, 2015
Ribosomal Profiling Data Handling and Analysis
High Throughput Sequencing
mRNA-Seq: methods and applications
Li and Dewey BMC Bioinformatics 2011, 12:323
Expression Analysis of RNA-seq Data
Todd J. Treangen, Steven L. Salzberg
Bioinformatics and OMICs Group Meeting REFERENCE GUIDED RNA SEQUENCING.
Bioinformatics Institute work with ASAS Genomics Centre By Dan Jones.
Amandine Bemmo 1,2, David Benovoy 2, Jacek Majewski 2 1 Universite de Montreal, 2 McGill university and Genome Quebec innovation centre Analyses of Affymetrix.
RNAseq analyses -- methods
RNA-Seq Analysis Simon V4.1.
ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2.
Verna Vu & Timothy Abreo
The iPlant Collaborative
Comparison of Microarray Data Generated from Degraded RNA using Five Different Target Synthesis Methods and Commercial Microarrays Scott Tighe and Tim.
Glue Grant Human Transcriptome Array. 2 Affymetrix Confidential PNAS (9) ; published ahead of print February 11, 2011, doi: /pnas
E XOME SEQUENCING AND COMPLEX DISEASE : practical aspects of rare variant association studies Alice Bouchoms Amaury Vanvinckenroye Maxime Legrand 1.
Transcriptomics Sequencing. over view The transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA, and other non coding RNA produced.
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.
Log 2 (expression) H3K4me2 score A SLAMF6 log 2 (expression) Supplementary Fig. 1. H3K4me2 profiles vary significantly between loci of genes expressed.
Introduction to RNAseq
Geuvadis Analysis Meeting 16/02/2012 Micha Sammeth CNAG – Barcelona.
( ) 2.2 FC autism control FGR1OP2. ( ) 1.4 FC autism control SMAD2.
Biases in RNA-Seq data. Transcript length bias Two transcripts of length 50 and 100 have the same abundance in a control sample. The expression of both.
No reference available
Affymetrix User’s Group Meeting Boston, MA May 2005 Keynote Topics: 1. Human genome annotations: emergence of non-coding transcripts -tiling arrays: study.
CyVerse Workshop Transcriptome Assembly. Overview of work RNA-Seq without a reference genome Generate Sequence QC and Processing Transcriptome Assembly.
RNA Sequencing and transcriptome reconstruction Manfred G. Grabherr.
Canadian Bioinformatics Workshops
Reliable Identification of Genomic Variants from RNA-seq Data Robert Piskol, Gokul Ramaswami, Jin Billy Li PRESENTED BY GAYATHRI RAJAN VINEELA GANGALAPUDI.
Canadian Bioinformatics Workshops
Library QA & QC Day 1, Video 3
Conclusions Smart-Seq2 is a single-cell RNA- seq method that generates cDNA libraries with longer average size and higher yield compared to SMARTer™. Smart-seq2.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Simon v RNA-Seq Analysis Simon v
RNA-Seq for the Next Generation RNA-Seq Intro Slides
RNA-Seq analysis in R (Bioconductor)
Lab meeting
S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.
Comparison of Clinical Targeted Next-Generation Sequence Data from Formalin-Fixed and Fresh-Frozen Tissue Specimens  David H. Spencer, Jennifer K. Sehn,
Expression profiling of snoRNAs in normal hematopoiesis and AML
Christopher R. Cabanski, Vincent Magrini, Malachi Griffith, Obi L
Adrien Le Thomas, Georgi K. Marinov, Alexei A. Aravin  Cell Reports 
Alternative Splicing QTLs in European and African Populations
Volume 5, Issue 4, Pages e4 (October 2017)
Alterations in mRNA 3′ UTR Isoform Abundance Accompany Gene Expression Changes in Human Huntington’s Disease Brains  Lindsay Romo, Ami Ashar-Patel, Edith.
Predicting Gene Expression from Sequence
Universal Alternative Splicing of Noncoding Exons
Sequence Analysis - RNA-Seq 2
Sequence Analysis - RNA-Seq 1
Volume 11, Issue 7, Pages (May 2015)
Derek de Rie and Imad Abuessaisa Presented by: Cassandra Derrick
Presentation transcript:

Ji-hye Choi August 2014

Introduction (2006) ABRF-NGS (the Association fo Biomolecular Resource Facilities next-generation sequencing study) Prior work comparing microarray platforms and methods(MAQC)  high inter-platform concordance

(2013) Introduction (2014)

Introduction Considered variables on this study -library preparation methods (poly-A-enriched and ribo-depleted) -size-specific fractionation (1,2 and 3kb) -damaging effects of tissue fixation with formalin: RNA integrity (using heat, RNase A and sonication to degrade the RNA) -sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS, Roche 454) The largest comprehensive comparison of results from degraded, full-length and size-selected RNA across sequencing platforms and protocols.

RNA used in the ABRF-NGS study RNA samples: Well-characterized and have been used for many benchmarking studies. - MAQC A: Universal Human Reference RNA (740000, Agilent Technologies) - MAQC B: Ambion First Choice Human Brain Reference RNA (AM6000, Life Technologies) - ERCC spike-in synthetic transcripts ( , Life Technologies) were added to A and B as control

Figure1. Experimental design and sequencing platforms. *site: Lab

Mapping rate and quality To map the sequencing reads to the human genome (hg19) -vendor-recommended alignment algorithms -ELAND (HiSeq) -TMAP (PGM and Proton) -GSRM (454) -GMAP (PacBio) -‘universal’ platform-agnostic aligners -STAR  Each platform-specific algorithm produced better mapping rates with the exception of ELAND Quality mesearment using mismatch rate -single-base substitutions: 0.6~7.1% -Insertion/deletion type mismatch rates: % -TMAP, GSRM vs STAR  tradeoff between mapping rate and accuracy Fig1b 99% 90%

Coverage of genes Figure2. Transcript coverage across all genes detected. The ribo-depleted RNA samples consistently showed more uniform gene coverage than did poly-A-selected libraries. “banding” or altered coverage distributions by the use of a different library kit version (Illumina RNA truSeq v3) at site W. long-read technology (PacBio): the highest and most uniform coverage of full-length transcripts with enrichment for both the 3’ poly-A tail and an antibody for the 5’ methylguanylate cap. Most platform showed less coverage at the 5’ and 3’ end of the transcripts. *common detected gene  100 segment

Reproducibly detect and quantify genes and splice junctions Figure3. Intra- and inter-platform variation of RNA-seq transcript metrics. (a) Inter-site CV (Coefficient variation) of normalized gene expression for transcripts detected across all sites.  illumina (b) Inter-platform and intra-platform normalized gene expression Spearman correlation coefficients for samples A and B.  overall high (d,e) Sequenced bases (depth) were plotted against the number of detected genes or the number of detected splice junctions for known GENCODE junctions. (f) More efficient splice junction detection was observed for long read platforms (PAC, 454) (g) Most known junctions were detected by three or more platforms, indicating concordance among RNA-seq methods. The novel junctions defined by independent observation on three or more platforms were less numerous than known junctions. known: higher inter-platform agreement Novel: lower concordance than known (due to lower expression levels) *CV: 변동계수 *PacBio was not included because of low read count for many genes 250bp*2 FigS26 read depth & #generead depth & #junctionread length & #junction

Splice junction detection and DEGs Figure4. Inter-platform consistency of splicing and differential expression analysis. (a) As a representative plot for RNA splicing, transcripts from the SPR9 gene are shown in a sashimi plot across five platforms and two ilumina library protocols. (b) Starting from the set of genes detected at any expression level on all platforms, the numbers of A versus B differentially expressed genes uniquely or repeatedly detected at statistically significant thresholds (FDR 2) are shown; sets of greater than 1,000 genes are indicated in red, 100–999 in yellow. Rare-isoform quantification benefits the most from greater read depths (Illumina, Proton). Uniformity of coverage across exons is improved with long-read technology (PacBio). The ability of each platform to detect differentially expressed genes (DEGs; A vs B). Although high read-depth platforms showed greater DEG overlap, each platform produced unique subsets (6-11%). 2 isoforms 3 isoforms

Influence of library preparation on transcriptome profiles Figure5. Differentially expressed genes in ribo-depleted and poly-A-enriched libraries. poly-A enrichment vs ribosomal RNA depletion (a) The percentage of reads that map to various gene sequence categories. -read source distribution is very different -intron: 40-47% vs 7-12% -ribo-depleted libraries: noncoding RNAs -poly-A : protein-coding genes, mitochondrial genes (b) DEGs were detected in all pairwise comparisons of the original (A,B) and mixed samples (c) Results were similar for both library types from the common set of detected genes at all fold-change (FC) and false discovery rate (FDR) threshold tested. -The overall numbers of DEGs detected between the biologically distinct samples were also consistent between library preparation methods. (d) Both library types show similar accuracy as evidenced by Mattews Correlation Coefficients (MCC) with RT-qPCR assays. (802 TaqMan assays for these same RNA samples; GSE5350) -similar accuracy

Summary High intraplatform (R>0.86), inter-platform (R>0.83) concordance for expression measures across the deep-count platforms highly variable efficiency and cost for splice junction and variant(?) detection between all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq