Published by Laurence Oliver. Modified over 9 years ago.
1
Accurate Estimation of Gene Expression Levels from Digital Gene Expression Sequencing Data
Marius Nicolae and Ion Măndoiu (University of Connecticut, USA)
2
Outline
  DGE/SAGE-Seq protocol
  EM algorithm
  Experimental results
  Conclusions
3
RNA-Seq Protocol
  Make cDNA and shatter into fragments
  Sequence fragment ends
  Map reads
  Applications: Isoform Discovery (ID), Isoform Expression (IE), Gene Expression (GE)
  (Figure labels: A B C D E)
4
DGE Protocol
  Cleave with anchoring enzyme (AE)
  Attach primer for tagging enzyme (TE)
  Cleave with tagging enzyme
  Map tags
  Application: Gene Expression (GE)
  (Figure labels: AAAAA; CATG (AE); TCCRAC (TE); A B C D E)
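The protocol steps above can be mimicked in silico. The following is a minimal sketch, not the authors' code: it assumes a tag starts at an anchoring enzyme site and extends a fixed number of bases downstream, and the name `candidate_tags` and its parameters are hypothetical.

```python
def candidate_tags(mrna, site="CATG", tag_len=21):
    """List the tags an mRNA could produce, ordered from the 3' end.

    Assumption: a tag is the AE recognition site plus the bases that
    follow it, tag_len bases in total. Sites too close to the 3' end
    to yield a full-length tag are skipped.
    """
    tags = []
    pos = mrna.find(site)
    while pos != -1:
        tag = mrna[pos:pos + tag_len]
        if len(tag) == tag_len:
            tags.append(tag)
        pos = mrna.find(site, pos + 1)  # continue with the next site
    tags.reverse()  # index 0 is now the site closest to the 3' end
    return tags


# Toy transcript with two CATG sites followed by a poly-A tail.
mrna = "GG" + "CATG" + "AAACCCGGGTTT" + "CATG" + "ACGT" * 4 + "AAAAAA"
tags = candidate_tags(mrna)
```

Under complete digestion only `tags[0]` would be observed. With partial digestion the later entries become possible as well, which is one source of the ambiguity the talk addresses.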
5
Our Approach
  Previous methods
    Discard ambiguous tags [Asmann et al. 09, Zaretzki et al. 10]
    Heuristics to rescue some ambiguous tags [Wu et al. 10]
  New DGE-EM algorithm
    Uses all tags, including all ambiguous ones
    Uses quality scores
    Takes into account partial digest and gene isoforms
6
Tag Formation Probability
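The formula on this slide is not preserved in the transcript. As a hedged illustration of how a partial digest can be modeled, the sketch below assumes each anchoring enzyme site is cleaved independently with probability p and that the tag comes from the cleaved site closest to the 3' end. Both function names are hypothetical.

```python
def tag_formation_probability(j, p):
    """Probability that the tag is formed at site j (1 = closest to 3' end).

    Assumed model: sites 1..j-1 are left uncut and site j is cut.
    """
    if j < 1:
        raise ValueError("site index j must be >= 1")
    if not 0 < p <= 1:
        raise ValueError("cutting probability p must be in (0, 1]")
    return p * (1 - p) ** (j - 1)


def detection_probability(k, p):
    """Probability that a transcript with k sites yields any tag at all."""
    return 1 - (1 - p) ** k


# With p = 0.5 the third site from the 3' end forms the tag 12.5% of the time.
prob_third_site = tag_formation_probability(3, 0.5)
```

With p = 1.0 (complete digest) only the first site has nonzero probability, which matches the p=1.0 and p=.5 settings named on the backup slides.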
7
Tag-Isoform Compatibility
8
DGE-EM Algorithm
  assign random values to all f(i)
  while not converged
    E-step
      init all n(i,j) to 0
      for each tag t
        for (i,j,w) in t
          ...
    M-step
      for each isoform i
        ...
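The loop structure above can be sketched in Python. The update formulas are assumptions, since the transcript does not preserve them: the E-step splits each tag among its compatible (i, j, w) triples in proportion to w · f(i) · p(1-p)^(j-1), and the M-step renormalizes per-isoform counts after dividing by the chance that the isoform yields any tag. The names `dge_em`, `num_sites`, and `p` are hypothetical.

```python
import random


def dge_em(tags, num_sites, p=0.5, max_iter=1000, tol=1e-9, seed=0):
    """Estimate isoform frequencies f(i) from tags (illustrative sketch).

    tags: list of tag classes, each a list of (isoform, site, weight)
          triples; site is 1-based, counted from the 3' end.
    num_sites: dict mapping every isoform to its number of AE sites.
    """
    rng = random.Random(seed)
    # assign random values to all f(i), normalized to sum to 1
    f = {i: rng.random() + 1e-6 for i in num_sites}
    total = sum(f.values())
    f = {i: v / total for i, v in f.items()}

    for _ in range(max_iter):
        # E-step: expected number of tags n(i, j) from site j of isoform i
        n = {}
        for t in tags:
            scores = [w * f[i] * p * (1 - p) ** (j - 1) for (i, j, w) in t]
            z = sum(scores)
            if z == 0:
                continue  # tag cannot be explained under current settings
            for (i, j, _w), s in zip(t, scores):
                n[(i, j)] = n.get((i, j), 0.0) + s / z

        # M-step: correct counts for undetected transcripts, renormalize
        est = {}
        for i, k in num_sites.items():
            n_i = sum(n.get((i, j), 0.0) for j in range(1, k + 1))
            detect = 1 - (1 - p) ** k
            est[i] = n_i / detect if detect > 0 else 0.0
        total = sum(est.values())
        if total == 0:
            break  # no usable tags; keep the current estimate
        new_f = {i: v / total for i, v in est.items()}

        delta = max(abs(new_f[i] - f[i]) for i in f)
        f = new_f
        if delta < tol:
            break
    return f


# 30 tags unique to A, 10 unique to B, 20 shared between A and B.
tags = ([[("A", 1, 1.0)]] * 30 + [[("B", 1, 1.0)]] * 10
        + [[("A", 1, 1.0), ("B", 1, 1.0)]] * 20)
freqs = dge_em(tags, {"A": 1, "B": 1}, p=1.0)
```

In this toy input the shared tags end up split 3:1, in line with the unique tags, rather than being discarded.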
9
MAQC Data (UHRR, HBRR)
  DGE
    9 Illumina libraries, 238M 20bp tags [Asmann et al. 09]
    Anchoring enzyme DpnII (GATC)
  RNA-Seq
    6 libraries, 47-92M 35bp reads each [Bullard et al. 10]
  qPCR
    Quadruplicate measurements for 832 Ensembl genes [MAQC Consortium 06]
10
Compared Algorithms
  DGE
    Uniq [Asmann et al. 09, Zaretzki et al. 10]
    DGE-EM
  RNA-Seq
    IsoEM [Nicolae et al. 10]
    Cufflinks [Trapnell et al. 10]
11
DGE-EM vs. Uniq on HBRR Library 4
12
DGE vs. RNA-Seq
13
DGE vs. RNA-Seq
14
DGE vs. RNA-Seq
15
Synthetic Data
  1-30M tags, lengths 14-26bp
  UCSC hg19 genome and known isoforms
  Simulated expression levels
    Gene expression for 5 tissues from the GNFAtlas2
    Geometric expression for the isoforms of each gene
  Anchoring enzymes from REBASE
    DpnII (GATC) [Asmann et al. 09]
    NlaIII (CATG) [Wu et al. 10]
    CviJI (RGCY, R=G or A, Y=C or T)
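The three recognition sites listed above differ in how often they occur, because RGCY matches four distinct sequences. A small sketch of locating such sites follows, using only the degenerate codes defined on this slide. The helper names `site_regex` and `find_sites` are hypothetical and not part of any published tool.

```python
import re

# Base codes used on the slide: R = G or A, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def site_regex(pattern):
    """Compile a recognition pattern into a regex.

    The lookahead lets overlapping occurrences be reported too.
    """
    body = "".join(IUPAC[base] for base in pattern)
    return re.compile("(?=(" + body + "))")


def find_sites(seq, pattern):
    """Return the 0-based start positions of every site in seq."""
    return [m.start() for m in site_regex(pattern).finditer(seq)]


seq = "AAGATCAAAGCTAACATGAAGGCCAA"
dpnii_sites = find_sites(seq, "GATC")   # DpnII
nlaiii_sites = find_sites(seq, "CATG")  # NlaIII
cviji_sites = find_sites(seq, "RGCY")   # CviJI, matches AGCT and GGCC here
```

In this toy sequence the degenerate CviJI pattern finds two sites where each of the other enzymes finds one.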
16
MPE for 30M 21bp tags
  RNA-Seq: 8.3 MPE
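The slides do not spell out MPE. Assuming it stands for median percent error, that is, the median over genes of the relative error of the estimate against the known simulated expression level, it could be computed as in this sketch. The function name is hypothetical.

```python
from statistics import median


def median_percent_error(estimated, true):
    """Median of 100 * |estimate - truth| / truth over all genes.

    Assumption: genes whose true expression is zero are skipped, since
    their percent error is undefined.
    """
    errors = [100 * abs(e - t) / t for e, t in zip(estimated, true) if t != 0]
    if not errors:
        raise ValueError("no genes with nonzero true expression")
    return median(errors)


# Two genes are off by 10% and one by 100%, so the median is 10%.
mpe = median_percent_error([110, 90, 200], [100, 100, 100])
```

Because it is a median, the measure is not dominated by a few badly estimated genes.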
17
Conclusions
  Introduced new DGE-EM algorithm
    Improves accuracy over previous methods by using ambiguous tags and considering isoforms and partial digestion
    Source code freely available at
  First direct comparison of RNA-Seq and DGE protocols
    Best inference algorithms yield comparable cost-normalized accuracy on MAQC data
  Simulations suggest possible DGE protocol improvements
    Enzymes with degenerate recognition sites (e.g. CviJI)
    Optimizing cutting probability
18
Questions?
ACKNOWLEDGEMENTS
Work supported in part by NSF awards IIS-0546457 and IIS-0916948
19
Anchoring Enzyme Statistics
20
RNA-Seq
21
DGE enzyme GATC p=1.0
22
DGE enzyme CATG p=1.0
23
DGE enzyme RGCY p=1.0
24
DGE enzyme GATC p=.5
25
DGE enzyme CATG p=.5
26
DGE enzyme RGCY p=.5