Estimation of alternative splicing isoform frequencies from RNA-Seq data Ion Mandoiu Computer Science and Engineering Department University of Connecticut.

Presentation transcript:

Estimation of alternative splicing isoform frequencies from RNA-Seq data Ion Mandoiu Computer Science and Engineering Department University of Connecticut Joint work with Marius Nicolae, Serghei Mangul, and Alex Zelikovsky

Outline
Introduction
EM Algorithm
Results
Conclusions and future work

Alternative Splicing [Nilsen & Graveley 10]

RNA-Seq
Make cDNA & shatter into fragments
Sequence fragment ends
Map reads
Gene Expression (GE)
Isoform Discovery (ID)
Isoform Expression (IE)
[Figure: gene with exons A, B, C, D, E; example isoforms ABC, AC, DE]

Gene Expression Challenges
Read ambiguity (multireads)
What is the gene length?
[Figure: gene with exons A, B, C, D, E]

Previous approaches to GE
Ignore multireads
[Mortazavi et al. 08]
– Fractionally allocate multireads based on unique read estimates
[Pasaniuc et al. 10]
– EM algorithm for solving ambiguities
Gene length: sum of lengths of exons that appear in at least one isoform
– Underestimates expression levels for genes with 2 or more isoforms [Trapnell et al. 10]
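The gene-length definition above (sum of lengths of exons that appear in at least one isoform) can be illustrated with a minimal sketch. The function name, the half-open `(start, end)` interval convention, and the exon coordinates are assumptions made for illustration; they do not come from the slides.

```python
def union_exon_length(isoforms):
    """Total length of exons appearing in at least one isoform.

    isoforms: list of isoforms, each a list of half-open (start, end)
    exon intervals (a hypothetical representation).
    """
    # Pool every exon from every isoform and sort by start position.
    intervals = sorted(exon for iso in isoforms for exon in iso)
    total = 0
    cur_start, cur_end = None, None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical gene with isoforms ABC and AC (coordinates are made up).
A, B, C = (0, 100), (200, 300), (400, 500)
gene_length = union_exon_length([[A, B, C], [A, C]])  # 300
ac_length = sum(end - start for start, end in [A, C])  # 200
```

In this made-up example, if every read came from isoform AC (length 200), normalizing the read count by the union length of 300 would give a lower expression value than normalizing by the isoform's own length, which is the underestimation the slide refers to.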

Read Ambiguity in IE
[Figure: gene with exons A, B, C, D, E; isoform AC]

Previous approaches to IE
[Jiang & Wong 09]
– Poisson model, single reads only
[Li et al. 10]
– EM algorithm, single reads only
[Feng et al. 10]
– Convex quadratic program, pairs used only for ID
[Trapnell et al. 10]
– Extends Jiang's model to paired reads
– Fragment length distribution

Our contribution
EM algorithm for IE
– Single and paired reads
– Fragment length distribution
– Strand information
– Base quality scores

Read-Isoform Compatibility

Fragment length distribution
Paired reads
Single reads
[Figure: reads placed on isoforms ABC and AC, for paired and single reads]
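One way a fragment length distribution can help resolve paired-read ambiguity is sketched below. This is not taken from the slides: it assumes a normal fragment length distribution (the experimental setup later in the talk uses mean 250 and std. dev. 25) and assumes that the fragment length implied by a read pair under each candidate isoform is already known. The function names and the example lengths are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def isoform_weights(implied_lengths, mean=250.0, sd=25.0):
    """Normalize fragment-length densities into per-isoform weights.

    implied_lengths: dict mapping isoform name -> fragment length the
    read pair would imply if it came from that isoform (assumed input).
    """
    dens = {iso: normal_pdf(length, mean, sd)
            for iso, length in implied_lengths.items()}
    total = sum(dens.values())
    if total == 0.0:
        # All densities underflowed; fall back to equal weights.
        return {iso: 1.0 / len(dens) for iso in dens}
    return {iso: d / total for iso, d in dens.items()}


# A pair with one end in exon A and one in exon C: the implied fragment
# is longer under ABC (exon B included) than under AC.
weights = isoform_weights({"ABC": 450, "AC": 250})
```

With these made-up lengths, nearly all of the weight goes to AC, because a 450-base fragment is very unlikely under the assumed distribution.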

IsoEM algorithm
E-step
M-step
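The slide names only the two steps, so the following is a generic EM sketch for isoform frequency estimation rather than the IsoEM implementation itself. It assumes each read comes with a weight for every isoform it is compatible with, and that each isoform has an effective length supplied by the caller. The function name, inputs, and iteration count are all hypothetical.

```python
def em_isoform_frequencies(read_weights, eff_lengths, n_iter=200):
    """Estimate isoform frequencies with a simple EM loop.

    read_weights: list of dicts, one per read, mapping isoform -> weight
                  (only compatible isoforms are listed).
    eff_lengths:  dict mapping isoform -> effective length (> 0).
    """
    isoforms = list(eff_lengths)
    # Start from uniform frequencies.
    f = {i: 1.0 / len(isoforms) for i in isoforms}
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to frequency times weight.
        counts = {i: 0.0 for i in isoforms}
        for w in read_weights:
            denom = sum(f[i] * wi for i, wi in w.items())
            if denom == 0.0:
                continue  # read cannot be explained by current estimates
            for i, wi in w.items():
                counts[i] += f[i] * wi / denom
        # M-step: length-normalize the expected counts and renormalize.
        norm = sum(counts[i] / eff_lengths[i] for i in isoforms)
        if norm == 0.0:
            break  # no usable reads; keep the current estimates
        f = {i: counts[i] / eff_lengths[i] / norm for i in isoforms}
    return f


# Made-up example: 30 reads compatible with both isoforms and 10 reads
# compatible only with ABC (e.g. reads overlapping exon B).
reads = [{"ABC": 1.0, "AC": 1.0}] * 30 + [{"ABC": 1.0}] * 10
freqs = em_isoform_frequencies(reads, {"ABC": 300.0, "AC": 200.0})
```

Ambiguous reads are reallocated on every iteration, so the reads that are unique to one isoform end up steering how the shared reads are divided.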

Experimental setup
Human genome UCSC known isoforms
GNFAtlas2 gene expression levels
– Uniform/geometric expression of gene isoforms
Normally distributed fragment lengths
– Mean 250, std. dev. 25

Accuracy measures
Error Fraction (EF)
– Percentage of isoforms (or genes) with relative error larger than a given threshold t
Median Percent Error (MPE)
– Threshold t for which EF is 50%
r²
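The two error measures above can be sketched as follows. Since MPE is the threshold at which half of the isoforms exceed it, the sketch computes it as the median relative error. The function names are hypothetical, and skipping isoforms whose true value is zero is an assumption made here, not something the slides state.

```python
import statistics


def relative_errors(true, est):
    """Relative error per isoform; isoforms with true value 0 are skipped
    (an assumption of this sketch)."""
    return [abs(est[k] - true[k]) / true[k] for k in true if true[k] > 0]


def error_fraction(true, est, t):
    """Percentage of isoforms whose relative error exceeds threshold t."""
    errs = relative_errors(true, est)
    return 100.0 * sum(e > t for e in errs) / len(errs)


def median_percent_error(true, est):
    """Median relative error, in percent: the threshold at which EF is 50%."""
    return 100.0 * statistics.median(relative_errors(true, est))


# Made-up values for four isoforms.
true_expr = {"a": 10.0, "b": 20.0, "c": 40.0, "d": 100.0}
est_expr = {"a": 11.0, "b": 30.0, "c": 40.0, "d": 50.0}
ef_25 = error_fraction(true_expr, est_expr, 0.25)
mpe = median_percent_error(true_expr, est_expr)
```

In this example the relative errors are 0.1, 0.5, 0 and 0.5, so two of the four isoforms exceed a threshold of 0.25 and the median error is about 30%.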

Isoform Error Fraction Curves 30M single reads of length 25

Gene Error Fraction Curves 30M single reads of length 25

Read Length Effect Fixed sequencing throughput (750Mb)

Effect of Pairs & Strand Information
1-60M 75bp reads
[Trapnell et al. 10] r² = 0.95 for 13M PE reads

Conclusions & Future Work
Presented EM algorithm for estimating isoform/gene expression levels
– Integrates fragment length distribution, base qualities, pair and strand info
Ongoing work
– Confidence intervals
– Allele-specific expression
– Integration with isoform discovery
– Reconstruction & frequency estimation for virus quasispecies

Acknowledgments NSF awards IIS , IIS , and IIS