1
Marius Nicolae
Computer Science and Engineering Department, University of Connecticut
Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky
2
Introduction
EM Algorithm
Results
Conclusions and future work
3
Make cDNA & shatter into fragments
Sequence fragment ends
Map reads
Gene Expression (GE)
Isoform Discovery (ID)
Isoform Expression (IE)
[Figure: ABCDE; ABC, AC, DE]
4
Read ambiguity (multireads)
What is the gene length?
[Figure: ABCDE]
5
Ignore multireads
[Mortazavi et al. 08]
  ◦ Fractionally allocate multireads based on unique read estimates
[Pasaniuc et al. 10]
  ◦ EM algorithm for solving ambiguities
Gene length: sum of lengths of exons that appear in at least one isoform
Underestimate expression levels for genes with 2 or more isoforms [Trapnell et al. 10]
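The gene-length issue can be illustrated with a small sketch. The exon lengths and the read count below are hypothetical, and treating the letters A, B, C as exons of isoforms ABC and AC is an assumed reading of the slide figures. The sketch only shows why normalizing by the union of exons can lower the apparent expression when the expressed isoform is shorter than that union.

```python
# Hypothetical exon lengths in bp; the letters follow the slides' ABC / AC example.
EXON_LEN = {"A": 300, "B": 200, "C": 400}

# Isoforms as ordered lists of exons (assumed reading of the slide figures).
ISOFORMS = {"ABC": ["A", "B", "C"], "AC": ["A", "C"]}


def isoform_length(name):
    """Length of one isoform: the sum of its exon lengths."""
    return sum(EXON_LEN[e] for e in ISOFORMS[name])


def union_gene_length():
    """Gene length as defined on the slide: exons appearing in at least one isoform."""
    exons = {e for exon_list in ISOFORMS.values() for e in exon_list}
    return sum(EXON_LEN[e] for e in exons)


# Suppose (hypothetically) that all reads come from isoform AC.
reads = 1400
density_true = reads / isoform_length("AC")    # reads per bp of the expressed isoform
density_union = reads / union_gene_length()    # reads per bp under the union length

print(isoform_length("AC"), union_gene_length())
print(density_true, density_union)
```

With these made-up numbers the union length is larger than the length of AC, so the same read count yields a lower per-base density under the union definition.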
6
[Figure: ABCDE, AC]
7
[Jiang & Wong 09]
  ◦ Poisson model, single reads only
[Li et al. 10]
  ◦ EM Algorithm, single reads only
[Feng et al. 10]
  ◦ Convex quadratic program, pairs used only for ID
[Trapnell et al. 10]
  ◦ Extends Jiang’s model to paired reads
  ◦ Fragment length distribution
8
EM Algorithm for IE
  ◦ Single and paired reads
  ◦ Fragment length distribution
  ◦ Strand information
  ◦ Base quality scores
Solving GE by adding isoform levels
9
Introduction
EM Algorithm
Results
Conclusions and future work
11
[Figure: paired reads and single reads aligned to isoforms ABC and AC]
12
E-step M-step
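The slide's E-step and M-step formulas did not survive in this transcript. As a rough illustration only, here is a generic textbook EM for mixture proportions applied to read-to-isoform assignments. It is not necessarily the update rule used in the talk. The function name `em_isoform_fractions`, the `weights` structure, and the toy reads are all hypothetical; in a fuller model the weights would presumably carry fragment length, strand, and base quality information, and read fractions would still need a length normalization to become isoform frequencies.

```python
def em_isoform_fractions(weights, n_isoforms, n_iter=200):
    """Generic EM sketch (assumed form, not the talk's exact updates).

    weights: one dict per read, mapping isoform index -> compatibility weight.
    Returns the estimated fraction of reads originating from each isoform.
    """
    f = [1.0 / n_isoforms] * n_isoforms  # start from uniform fractions
    for _ in range(n_iter):
        # E-step: expected number of reads from each isoform under current f.
        n = [0.0] * n_isoforms
        for w in weights:
            total = sum(f[j] * wj for j, wj in w.items())
            if total == 0.0:
                continue  # read is incompatible with every isoform under current f
            for j, wj in w.items():
                n[j] += f[j] * wj / total
        # M-step: new fractions are the normalized expected counts.
        s = sum(n)
        if s == 0.0:
            break  # nothing could be assigned; keep the current estimate
        f = [nj / s for nj in n]
    return f


# Toy data: 6 reads unique to isoform 0, 2 unique to isoform 1, 2 multireads.
toy_reads = [{0: 1.0}] * 6 + [{1: 1.0}] * 2 + [{0: 1.0, 1: 1.0}] * 2
fractions = em_isoform_fractions(toy_reads, n_isoforms=2)
print(fractions)
```

In this toy case the multireads end up split in proportion to the fractions implied by the unique reads, which is the behavior the EM iteration is meant to capture.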
13
Introduction
EM Algorithm
Results
Conclusions and future work
14
Human genome UCSC known isoforms
GNFAtlas2 gene expression levels
  ◦ Uniform/geometric expression of gene isoforms
Normally distributed fragment lengths
  ◦ Mean 250, std. dev. 25
15
Error Fraction (EF)
  ◦ Percentage of isoforms (or genes) with relative error larger than given threshold t
Median Percent Error (MPE)
  ◦ Threshold t for which EF is 50%
r²
  ◦ Coefficient of determination
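The three measures above can be sketched in a few lines. This is a minimal reading of the slide definitions, with several assumptions: relative error is taken as |estimate − truth| / truth with truth > 0, MPE is computed as the median relative error expressed in percent, and r² is computed as the squared Pearson correlation. The function names and the toy numbers are hypothetical.

```python
import math
import statistics


def relative_errors(true_vals, est_vals):
    """Relative error per isoform (assumes every true value is positive)."""
    return [abs(e - t) / t for t, e in zip(true_vals, est_vals)]


def error_fraction(true_vals, est_vals, t):
    """EF: percentage of isoforms whose relative error exceeds threshold t."""
    errs = relative_errors(true_vals, est_vals)
    return 100.0 * sum(1 for x in errs if x > t) / len(errs)


def median_percent_error(true_vals, est_vals):
    """MPE: the median relative error, in percent (threshold where EF is about 50%)."""
    return 100.0 * statistics.median(relative_errors(true_vals, est_vals))


def r_squared(true_vals, est_vals):
    """r^2 taken here as the squared Pearson correlation (an assumption)."""
    mt, me = statistics.fmean(true_vals), statistics.fmean(est_vals)
    cov = sum((t - mt) * (e - me) for t, e in zip(true_vals, est_vals))
    var_t = sum((t - mt) ** 2 for t in true_vals)
    var_e = sum((e - me) ** 2 for e in est_vals)
    return cov * cov / (var_t * var_e)


# Toy values, made up for illustration.
truth = [10.0, 20.0, 30.0, 40.0]
estimate = [11.0, 20.0, 24.0, 60.0]
print(error_fraction(truth, estimate, 0.15))
print(median_percent_error(truth, estimate))
print(r_squared(truth, estimate))
```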
16
30M single reads of length 25
Main difference between IsoEM and RSEM is fragment length modeling
17
30M single reads of length 25
18
Fixed sequencing throughput (750Mb)
50bp reads better than 100bp!
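A quick arithmetic check of what fixed throughput means for read counts, assuming 750Mb denotes 750 million sequenced bases and that throughput is simply the number of reads times the read length. The helper name is hypothetical.

```python
THROUGHPUT_BP = 750_000_000  # assumed meaning of "750Mb": 750 million bases


def reads_at_fixed_throughput(read_len_bp, throughput_bp=THROUGHPUT_BP):
    """Number of reads obtainable at a fixed total number of sequenced bases."""
    return throughput_bp // read_len_bp


for read_len in (50, 100):
    print(read_len, reads_at_fixed_throughput(read_len))
```

Under these assumptions, halving the read length doubles the number of reads, which is one plausible reading of why shorter reads can do better at the same throughput.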
19
1-60M 75bp reads
Pairs help, strand info doesn’t
[Trapnell et al. 10] r² = .95 for 13M PE reads
20
Introduction
EM Algorithm
Results
Conclusions and future work
21
Presented EM algorithm for isoform frequency estimation that exploits fragment length distribution for both single and paired reads
  ◦ Significant accuracy improvement over existing methods
  ◦ Code and datasets to be released publicly soon
Ongoing extensions
  ◦ Confidence intervals
  ◦ Allele-specific isoform expression
  ◦ Testing for novel isoforms
  ◦ Integration with isoform discovery