Scalable Algorithms for Next-Generation Sequencing Data Analysis
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
Next Generation Sequencing
Roche/454, Illumina HiSeq, SOLiD 5500, Ion Proton, PacBio RS, Oxford Nanopore
Ongoing Projects
Transcriptome analysis
- Transcriptome quantification and differential expression analysis
- Computational deconvolution of heterogeneous samples
- Transcriptome and meta-transcriptome assembly
Viral quasispecies
- Quasispecies reconstruction from NGS reads
- IBV evolution and vaccine optimization
- Transmission graphs
Immunoinformatics
- Genomics-guided immunotherapy
- Deep panning for early cancer detection
Sequencing error correction, genome assembly and scaffolding, metabolomics, biomarker selection, …
- More info & software at
Transcriptome Quantification
RNA-PhASE pipeline for allele-specific isoform expression
[Figure labels: ABC, AC]
IsoEM algorithm for isoform expression estimation
- Incorporates fragment length distribution, hexamer bias correction, …
Ion Torrent MAQC datasets
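A minimal sketch of the expectation-maximization idea behind isoform expression estimation. This is not the IsoEM implementation: the function name and the input format (one dict of isoform-to-compatibility-weight per read) are assumptions made for illustration, and the weights stand in for quantities such as fragment length probabilities. The result is the fraction of reads attributed to each isoform, without length normalization.

```python
def em_isoform_frequencies(read_weights, n_iter=200):
    """Estimate per-isoform read fractions with a basic EM loop.

    read_weights: list of dicts, one per read, mapping an isoform id to a
    compatibility weight (hypothetical input format for this sketch).
    """
    isoforms = sorted({iso for read in read_weights for iso in read})
    freq = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(n_iter):
        counts = {iso: 0.0 for iso in isoforms}
        # E-step: split each read among its compatible isoforms in
        # proportion to (current frequency * compatibility weight).
        for read in read_weights:
            total = sum(freq[iso] * w for iso, w in read.items())
            if total == 0.0:
                continue
            for iso, w in read.items():
                counts[iso] += freq[iso] * w / total
        # M-step: renormalize the fractional read counts.
        assigned = sum(counts.values())
        freq = {iso: c / assigned for iso, c in counts.items()}
    return freq


# Toy input: 60 reads unique to isoform "ABC", 20 unique to "AC",
# and 20 reads equally compatible with both.
reads = [{"ABC": 1.0}] * 60 + [{"AC": 1.0}] * 20 + [{"ABC": 1.0, "AC": 1.0}] * 20
frequencies = em_isoform_frequencies(reads)
```

In this toy input the ambiguous reads end up split in the same 3:1 ratio as the unique reads, which is the fixed point of the update.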
Differential Expression
Fast estimation enables the use of accurate bootstrapping-based methods
MAQC 454 datasets: UHRR SRX vs HBRR SRX002935
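A rough sketch of bootstrapping for differential expression, under simplifying assumptions: each read is represented only by the label of the gene it was assigned to, and the expression estimate is a plain read fraction with a pseudocount. In a real pipeline each bootstrap replicate would presumably re-run the expression estimator, which is why estimation speed matters. The function name, parameters, and thresholds are hypothetical.

```python
import math
import random


def bootstrap_fold_change_support(reads_a, reads_b, gene,
                                  n_boot=200, min_log2_fc=1.0, seed=0):
    """Fraction of bootstrap replicates in which `gene` changes by at
    least `min_log2_fc` (in log2 units) between two conditions."""
    rng = random.Random(seed)

    def log2_fold_change(sample_a, sample_b):
        # Pseudocount of 1 avoids division by zero for unexpressed genes.
        frac_a = (sample_a.count(gene) + 1) / (len(sample_a) + 1)
        frac_b = (sample_b.count(gene) + 1) / (len(sample_b) + 1)
        return math.log2(frac_a / frac_b)

    supporting = 0
    for _ in range(n_boot):
        # Resample reads with replacement within each condition.
        resampled_a = rng.choices(reads_a, k=len(reads_a))
        resampled_b = rng.choices(reads_b, k=len(reads_b))
        if abs(log2_fold_change(resampled_a, resampled_b)) >= min_log2_fc:
            supporting += 1
    return supporting / n_boot


# Toy data: gene "g" accounts for 20% of reads in condition A, 5% in B.
condition_a = ["g"] * 200 + ["other"] * 800
condition_b = ["g"] * 50 + ["other"] * 950
support_changed = bootstrap_fold_change_support(condition_a, condition_b, "g")
support_unchanged = bootstrap_fold_change_support(condition_a, condition_a, "g")
```

A gene would be called differentially expressed when the support exceeds a chosen cutoff; the cutoff itself is a design choice not specified on the slide.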
Computational Deconvolution of Heterogeneous Samples
Goal: characterize expression of mesoderm progenitor cells
- Whole-transcriptome expression data for NSB cell mixtures + single-cell qPCR data for a few genes
Three-step approach
- Cluster single-cell qPCR data and infer "reduced" cell type signatures
- Infer mixing proportions based on reduced signatures using quadratic programming
- Infer full expression signatures based on mixing proportions, solving one quadratic program per gene
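The second step above can be sketched as a small constrained least-squares problem: find proportions p >= 0 with sum(p) = 1 minimizing ||S p - m||^2, where S holds the reduced signatures and m the mixture expression. The slide does not say which solver is used; this sketch swaps in projected gradient descent in pure Python in place of a quadratic programming solver, and all names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    sorted_desc = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, value in enumerate(sorted_desc, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / i
        if value - candidate > 0:
            theta = candidate  # keep the threshold from the last valid index
    return [max(x - theta, 0.0) for x in v]


def estimate_mixing_proportions(signatures, mixture, n_iter=2000):
    """signatures[g][k]: expression of gene g in cell type k.
    mixture[g]: expression of gene g in the mixed sample."""
    n_genes, n_types = len(signatures), len(signatures[0])
    # Step 1/L, where L = 2 * ||S||_F^2 upper-bounds the gradient's
    # Lipschitz constant for the objective ||S p - m||^2.
    step = 1.0 / (2.0 * sum(x * x for row in signatures for x in row))
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(signatures[g][k] * p[k] for k in range(n_types)) - mixture[g]
                    for g in range(n_genes)]
        gradient = [2.0 * sum(signatures[g][k] * residual[g] for g in range(n_genes))
                    for k in range(n_types)]
        p = project_to_simplex([p[k] - step * gradient[k] for k in range(n_types)])
    return p


# Toy example: 3 genes, 2 cell types, true proportions 0.3 and 0.7.
reduced_signatures = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
mixture_expression = [3.0, 7.0, 5.0]
proportions = estimate_mixing_proportions(reduced_signatures, mixture_expression)
```

The third step has the same shape with the roles swapped: the proportions are fixed and one small problem per gene is solved for that gene's expression in each cell type.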
Reference-Guided Transcriptome Reconstruction
[Figure: candidate transcripts t1, t2, t3, t4]
TRIP: Transcriptome Reconstruction using Integer Programming
Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution
- empirically determined during library preparation
- implied by "mapping" read pairs
Example fragment length distribution: mean 500; std. dev. 50
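A simplified sketch of the selection idea, not the TRIP formulation itself (the slide does not give the integer program). Here a read pair counts as "explained" by a candidate transcript when the fragment length it implies on that transcript lies within k standard deviations of the library mean, and a brute-force search over subsets stands in for integer programming. Transcript names, coordinates, and the parameter k are hypothetical.

```python
from itertools import combinations


def transcript_offset(exons, pos):
    """Map a genomic position to a transcript coordinate, or None if the
    position is not in any exon. Exons are sorted half-open [start, end)."""
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start
    return None


def implied_fragment_length(exons, left, right):
    """Fragment length implied by a read pair whose first and last bases
    map to genomic positions `left` and `right` on this transcript."""
    a, b = transcript_offset(exons, left), transcript_offset(exons, right)
    if a is None or b is None:
        return None
    return b - a + 1


def smallest_explaining_set(transcripts, read_pairs, mean=500.0, sd=50.0, k=2.0):
    """Smallest set of transcripts explaining every explainable read pair."""
    explains = {}
    for name, exons in transcripts.items():
        explained = set()
        for i, (left, right) in enumerate(read_pairs):
            length = implied_fragment_length(exons, left, right)
            if length is not None and abs(length - mean) <= k * sd:
                explained.add(i)
        explains[name] = explained
    target = set().union(*explains.values())
    names = sorted(transcripts)
    # Brute force over subsets by increasing size (exponential; toy inputs only).
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if set().union(*(explains[n] for n in subset)) == target:
                return list(subset)
    return []


candidates = {
    "t1": [(0, 300), (1000, 1300)],              # skips the middle exon
    "t2": [(0, 300), (500, 700), (1000, 1300)],  # includes the middle exon
    "t3": [(0, 1300)],                           # one unspliced block
}
pairs = [(50, 1249), (100, 1099), (200, 1149)]
selected = smallest_explaining_set(candidates, pairs)
```

With these toy coordinates the first pair implies a 500-base fragment only on t1, while the other two imply 500 and 450 bases on t2, so t3 is not needed.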
De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont
Uncultured bacterial symbiont produces bryostatins
- Symbiont absent in Northern Atlantic populations
De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont
Developing a scalable multi-sample meta-transcriptome assembly pipeline based on differential-coverage clustering of reads
Acknowledgements
Sahar Al Seesi, Abdul Banday, Amir Bayegan, Gabriel Ilie, Caroline Jakuba, James Lindsay, Rahul Kanadia, Craig Nelson, Marius Nicolae, Adrian Caciula, Nicole Lopanik, Serghei Mangul, Yvette Temate Tiagueu, Alex Zelikovsky