Marius Nicolae and Ion Măndoiu (University of Connecticut, USA)


1 Accurate Estimation of Gene Expression Levels from Digital Gene Expression Sequencing Data
Marius Nicolae and Ion Măndoiu (University of Connecticut, USA)

2 Outline
DGE/SAGE-Seq protocol
EM algorithm
Experimental results
Conclusions

3 RNA-Seq Protocol
Make cDNA & shatter into fragments
Sequence fragment ends
Map reads
Analyses: Isoform Discovery (ID), Isoform Expression (IE), Gene Expression (GE)
[Figure labels: A B C D E]

4 DGE Protocol
Cleave with anchoring enzyme (AE)
Attach primer for tagging enzyme (TE)
Cleave with tagging enzyme
Map tags
Gene Expression (GE)
[Figure labels: poly-A tail AAAAA, AE site CATG, TE site TCCRAC, A B C D E]
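The protocol steps above can be mimicked in silico. The following is a minimal sketch, not the authors' code: it assumes a tag is the anchoring-enzyme recognition site plus the bases immediately downstream of it, and the function name `candidate_tags` and its parameters are hypothetical.

```python
def candidate_tags(transcript, ae_site="CATG", tag_len=21):
    """Return (rank, tag) pairs for every AE site that yields a full-length tag.

    Assumption: a tag starts at the AE recognition site and extends
    tag_len bases toward the 3' end. Rank 1 is the site closest to the
    3' end among the sites that yield a full-length tag.
    """
    hits = []
    pos = transcript.find(ae_site)
    while pos != -1:
        tag = transcript[pos:pos + tag_len]
        if len(tag) == tag_len:          # skip sites too close to the 3' end
            hits.append((pos, tag))
        pos = transcript.find(ae_site, pos + 1)
    hits.sort(reverse=True)              # largest offset = closest to 3' end
    return [(rank, tag) for rank, (_, tag) in enumerate(hits, start=1)]


# Toy transcript with two CATG sites and short 8bp tags.
toy = "TTCATGAAAACCCCCATGGGGGTTTT"
tags = candidate_tags(toy, ae_site="CATG", tag_len=8)
```

With complete digestion only the rank-1 tag would be expected; with partial digestion, tags from higher-ranked sites can also appear.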

5 Our Approach
Previous methods
  Discard ambiguous tags [Asmann et al. 09, Zaretzki et al. 10]
  Heuristics to rescue some ambiguous tags [Wu et al. 10]
New DGE-EM algorithm
  Uses all tags, including all ambiguous ones
  Uses quality scores
  Takes into account partial digest and gene isoforms

6 Tag Formation Probability

7 Tag-Isoform Compatibility

8 DGE-EM Algorithm
assign random values to all f(i)
while not converged
  E-step
    init all n(i,j) to 0
    for each tag t
      for (i,j,w) in t
  M-step
    for each isoform i
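The loop structure above can be filled in as a runnable sketch. The update formulas below are assumptions, not taken from the slide: a tag from the j-th AE site (counted from the 3' end) is assumed to form with probability p(1-p)^(j-1), where p is the cutting probability, and the M-step is assumed to divide each isoform's expected tag count by the probability that the isoform is cut at all before normalizing. The names `dge_em`, `num_sites`, `tol`, and `seed` are hypothetical.

```python
import random


def dge_em(tags, num_sites, p=0.5, max_iter=1000, tol=1e-9, seed=0):
    """Sketch of an EM loop for isoform frequencies f(i).

    tags: list of tags; each tag is a list of (i, j, w) triples giving a
          compatible isoform i, its 1-based site index j, and a weight w.
    num_sites: dict mapping isoform -> number of AE sites (assumed >= 1).
    """
    rng = random.Random(seed)
    isoforms = sorted(num_sites)
    # assign random values to all f(i), normalized to sum to 1
    f = {i: rng.random() + 1e-6 for i in isoforms}
    total = sum(f.values())
    f = {i: v / total for i, v in f.items()}

    for _ in range(max_iter):
        # E-step: n starts empty, so every n(i,j) is implicitly 0
        n = {}
        for t in tags:
            scores = [w * f[i] * p * (1 - p) ** (j - 1) for i, j, w in t]
            z = sum(scores)
            if z == 0:
                continue
            # split this tag fractionally across its compatible (i, j) pairs
            for (i, j, _), s in zip(t, scores):
                n[(i, j)] = n.get((i, j), 0.0) + s / z

        # M-step: re-estimate f(i) from expected counts (assumed correction)
        raw = {}
        for i in isoforms:
            n_i = sum(c for (iso, _), c in n.items() if iso == i)
            p_any = 1 - (1 - p) ** num_sites[i]   # isoform cut at least once
            raw[i] = n_i / p_any if p_any > 0 else 0.0
        total = sum(raw.values())
        if total == 0:
            break
        new_f = {i: raw[i] / total for i in isoforms}
        converged = max(abs(new_f[i] - f[i]) for i in isoforms) < tol
        f = new_f
        if converged:
            break
    return f


# Toy input: 3 tags unique to A, 1 unique to B, 2 compatible with both.
toy_tags = ([[("A", 1, 1.0)]] * 3 + [[("B", 1, 1.0)]]
            + [[("A", 1, 1.0), ("B", 1, 1.0)]] * 2)
f = dge_em(toy_tags, {"A": 1, "B": 1}, p=1.0)
```

On this toy input the two ambiguous tags are split in proportion to the current estimates, so the result settles near f(A) = 0.75 and f(B) = 0.25 rather than the 3:1 ratio being derived from unique tags alone by coincidence of discarding data.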

9 MAQC Data (UHRR, HBRR)
DGE
  9 Illumina libraries, 238M 20bp tags [Asmann et al. 09]
  Anchoring enzyme DpnII (GATC)
RNA-Seq
  6 libraries, 47-92M 35bp reads each [Bullard et al. 10]
qPCR
  Quadruplicate measurements for 832 Ensembl genes [MAQC Consortium 06]

10 Compared Algorithms
DGE
  Uniq [Asmann et al. 09, Zaretzki et al. 10]
  DGE-EM
RNA-Seq
  IsoEM [Nicolae et al. 10]
  Cufflinks [Trapnell et al. 10]

11 DGE-EM vs. Uniq on HBRR Library 4

12 DGE vs. RNA-Seq

13 DGE vs. RNA-Seq

14 DGE vs. RNA-Seq

15 Synthetic Data
1-30M tags, lengths 14-26bp
UCSC hg19 genome and known isoforms
Simulated expression levels
  Gene expression for 5 tissues from the GNFAtlas2
  Geometric expression for the isoforms of each gene
Anchoring enzymes from REBASE
  DpnII (GATC) [Asmann et al. 09]
  NlaIII (CATG) [Wu et al. 10]
  CviJI (RGCY, R=G or A, Y=C or T)
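To illustrate how the three recognition sites listed above differ, here is a small sketch that locates them in a sequence, expanding the degenerate codes R and Y as defined on the slide. The helper names `site_regex` and `find_sites` are hypothetical, and the toy sequence is made up.

```python
import re

# Expansion of the nucleotide codes used on the slide (R = G or A, Y = C or T).
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def site_regex(recognition):
    """Compile a pattern for a recognition site such as 'GATC' or 'RGCY'."""
    body = "".join(CODES[c] for c in recognition)
    # A lookahead makes overlapping occurrences visible to finditer.
    return re.compile("(?=(" + body + "))")


def find_sites(seq, recognition):
    """Return 0-based start offsets of every occurrence of the site in seq."""
    return [m.start() for m in site_regex(recognition).finditer(seq)]


toy = "AAGATCGGCATGAGCTAA"
dpnii = find_sites(toy, "GATC")   # DpnII
nlaiii = find_sites(toy, "CATG")  # NlaIII
cviji = find_sites(toy, "RGCY")   # CviJI, degenerate site
```

Because RGCY matches four distinct 4-mers, a degenerate site like this occurs more often in a typical sequence than a fixed 4-mer such as GATC or CATG.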

16 MPE for 30M 21bp tags
RNA-Seq: 8.3 MPE
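The slide does not spell out MPE. Assuming it stands for median percent error, i.e. the median over genes of the relative error of the estimate against the true expression level, it could be computed as in the sketch below. The function name and the choice to skip genes with zero true expression are assumptions.

```python
from statistics import median


def median_percent_error(estimated, true):
    """Median of |estimate - truth| / truth, in percent.

    Genes whose true expression is 0 are skipped (an assumption, made
    here only to avoid division by zero).
    """
    errors = [100.0 * abs(e - t) / t for e, t in zip(estimated, true) if t > 0]
    return median(errors)


mpe = median_percent_error([110, 90, 300], [100, 100, 200])
```

Here the per-gene errors are 10%, 10%, and 50%, so the median is 10%; the median is insensitive to the one badly estimated gene.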

17 Conclusions
Introduced new DGE-EM algorithm
  Improves accuracy over previous methods by using ambiguous tags and considering isoforms and partial digestion
  Source code freely available at
First direct comparison of RNA-Seq and DGE protocols
  Best inference algorithms yield comparable cost-normalized accuracy on MAQC data
Simulations suggest possible DGE protocol improvements
  Enzymes with degenerate recognition sites (e.g. CviJI)
  Optimizing cutting probability

18 Questions?
ACKNOWLEDGEMENTS
Work supported in part by NSF awards IIS-0546457 and IIS-0916948

19 Anchoring Enzyme Statistics

20 RNA-Seq

21 DGE enzyme GATC p=1.0

22 DGE enzyme CATG p=1.0

23 DGE enzyme RGCY p=1.0

24 DGE enzyme GATC p=.5

25 DGE enzyme CATG p=.5

26 DGE enzyme RGCY p=.5

