Scalable Algorithms for Next-Generation Sequencing Data Analysis Ion Mandoiu UTC Associate Professor in Engineering Innovation Department of Computer Science.


Scalable Algorithms for Next-Generation Sequencing Data Analysis Ion Mandoiu UTC Associate Professor in Engineering Innovation Department of Computer Science & Engineering

Next Generation Sequencing
- Roche/454 FLX Titanium
- Illumina HiSeq 2000
- SOLiD 4/5500
- Ion Proton Sequencer

Next Generation Sequencing

A transformative technology
- Re-sequencing
- De novo sequencing
- RNA-Seq
- Non-coding RNAs
- Structural variation
- ChIP-Seq
- Methyl-Seq
- Shape-Seq
- Chromosome conformation
- Viral quasispecies
- … many more biological measurements “reduced” to NGS sequencing

Mandoiu Lab
Main Research Areas: Bioinformatics Algorithms; Development of Computational Methods for Next-Gen Sequencing Data Analysis
Ongoing Projects
RNA-Seq Analysis (NSF, NIH, Life Technologies)
- Novel transcript reconstruction
- Allele-specific isoform expression
Viral quasispecies reconstruction (USDA)
- IBV evolution and vaccine optimization
Sequencing error correction, genome assembly and scaffolding, metabolomics, biomarker selection, …
- More info & software at
- Computational deconvolution of heterogeneous samples

Epi-Seq Bioinformatics Pipeline Source code & binaries available at

Hybrid Read Alignment Approach
[Figure: mRNA reads -> Transcript Library Mapping and Genome Mapping -> Transcript mapped reads and Genome mapped reads -> Read Merging -> Mapped reads]
- More efficient compared to spliced alignment onto genome
- Stringent filtering: reads with multiple alignments are discarded
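The read merging and filtering step can be illustrated with a minimal sketch. It assumes each mapper reports hits as (read_id, chromosome, position) tuples and that transcript hits have already been converted to genome coordinates; the function name and this data layout are hypothetical and are not taken from the pipeline itself.

```python
from collections import defaultdict

def merge_alignments(transcript_hits, genome_hits):
    """Merge hits from both mappers and keep uniquely placed reads only.

    Each argument is an iterable of (read_id, chrom, pos) tuples.
    Assumption: transcript hits are already in genome coordinates.
    """
    loci = defaultdict(set)
    for read_id, chrom, pos in list(transcript_hits) + list(genome_hits):
        loci[read_id].add((chrom, pos))
    # Stringent filter: a read seen at more than one distinct locus is discarded.
    return {rid: next(iter(places)) for rid, places in loci.items()
            if len(places) == 1}

transcript_hits = [("r1", "chr1", 100), ("r2", "chr2", 50)]
genome_hits = [("r1", "chr1", 100), ("r2", "chr3", 70), ("r3", "chr1", 5)]
print(merge_alignments(transcript_hits, genome_hits))
```

In this toy input, r1 agrees across both mappers and is kept, r3 is found only by genome mapping and is kept, and r2 is dropped because the two mappers place it at different loci.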

Clipping Alignments

Removal of PCR Artifacts

Variant Detection and Genotyping
[Figure: reads aligned to the reference genome; labels: Reference genome, Locus i, R_i (reads covering locus i)]
Reference genome:
AACGCGGCCAGCCGGCTTCTGTCGGCCAGCAGCCAGGAATCTGGAAACAATGGCTACAGCGTGC
Reads:
AACGCGGCCAGCCGGCTTCTGTCGGCCAGCCGGCAG
CGCGGCCAGCCGGCTTCTGTCGGCCAGCAGCCCGGA
GCGGCCAGCCGGCTTCTGTCGGCCAGCCGGCAGGGA
GCCAGCCGGCTTCTGTCGGCCAGCAGCCAGGAATCT
GCCGGCTTCTGTCGGCCAGCAGCCAGGAATCTGGAA
CTTCTGTCGGCCAGCCGGCAGGAATCTGGAAACAAT
CGGCCAGCAGCCAGGAATCTGGAAACAATGGCTACA
CCAGCAGCCAGGAATCTGGAAACAATGGCTACAGCG
CAAGCAGCCAGGAATCTGGAAACAATGGCTACAGCG
GCAGCCAGGAATCTGGAAACAATGGCTACAGCGTGC

Variant Detection and Genotyping
- Pick genotype with the largest posterior probability
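Picking the genotype with the largest posterior can be sketched as follows. This is an illustrative model, not necessarily the one used in the pipeline: it assumes a diploid genotype, independent reads, a per-base error rate e with 0 < e < 1 spread evenly over the three wrong bases, and a hypothetical unnormalized prior (`alt_prior`) that favors the homozygous reference genotype.

```python
import math
from itertools import combinations_with_replacement

def genotype_posteriors(bases, error_rates, ref, alt_prior=1e-3):
    """Posterior over the 10 diploid genotypes at one locus.

    bases: read bases covering the locus; error_rates: one value per base.
    alt_prior is an assumed prior weight for every non-reference genotype.
    """
    log_post = {}
    for a1, a2 in combinations_with_replacement("ACGT", 2):
        lp = math.log(1.0 if (a1, a2) == (ref, ref) else alt_prior)
        for b, e in zip(bases, error_rates):
            p1 = 1 - e if b == a1 else e / 3   # read drawn from allele a1
            p2 = 1 - e if b == a2 else e / 3   # read drawn from allele a2
            lp += math.log(0.5 * p1 + 0.5 * p2)
        log_post[a1 + a2] = lp
    top = max(log_post.values())               # normalize in log space
    z = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / z for g, v in log_post.items()}

def call_genotype(bases, error_rates, ref):
    post = genotype_posteriors(bases, error_rates, ref)
    return max(post, key=post.get)

print(call_genotype("AAAACCCC", [0.01] * 8, ref="A"))
```

With four A and four C reads at 1% error, the heterozygous genotype AC outweighs the prior preference for AA; with eight A reads the call stays AA.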

Accuracy as Function of Coverage

Haplotyping
Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome
- Novel mutations are present on only one chromosome copy
- For epitope prediction we need to know if nearby mutations appear in phase

Locus   Mutation    Alleles
1       SNV         C,T
2       Deletion    C,-
3       SNV         A,G
4       Insertion   -,GC

Locus   Mutation    Haplotype 1   Haplotype 2
1       SNV         T             C
2       Deletion    C             -
3       SNV         A             G
4       Insertion   -             GC

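RefHap Algorithm
- Reduce the problem to Max-Cut
- Solve Max-Cut
- Build haplotypes according to the cut

Locus   1   2   3   4   5
f1      *   0   1   1   0
f2      1   1   0   *   1
f3      1   *   *   0   *
f4      *   0   0   *   1

[Figure: graph on fragments f1, f2, f3, f4 with edge weights 3, 1, 1, and the two haplotypes h derived from the cut]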
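The three steps can be sketched on the four fragments from this slide. The edge weight used here (mismatches minus matches over shared loci) and the exhaustive search are assumptions made for illustration; an exhaustive search is only feasible for a handful of fragments, so this should not be read as the RefHap implementation.

```python
from itertools import combinations, product

def edge_weight(f, g):
    """Assumed score: +1 per mismatch, -1 per match, over loci both cover."""
    return sum((1 if a != b else -1)
               for a, b in zip(f, g) if a != "*" and b != "*")

def max_cut_haplotypes(fragments):
    n = len(fragments)
    w = {(i, j): edge_weight(fragments[i], fragments[j])
         for i, j in combinations(range(n), 2)}
    best_side, best_val = None, None
    # Exhaustive Max-Cut; fragment 0 is pinned to side 0 to break symmetry.
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest
        val = sum(wt for (i, j), wt in w.items() if side[i] != side[j])
        if best_val is None or val > best_val:
            best_side, best_val = side, val
    # Consensus: flip alleles of side-1 fragments, then majority vote per locus.
    h1 = ""
    for k in range(len(fragments[0])):
        votes = [int(f[k]) ^ s for f, s in zip(fragments, best_side)
                 if f[k] != "*"]
        h1 += "1" if 2 * sum(votes) > len(votes) else "0"  # ties, no data -> 0
    h2 = "".join("1" if c == "0" else "0" for c in h1)
    return h1, h2, best_val

fragments = ["*0110", "110*1", "1**0*", "*00*1"]  # f1..f4 from the slide
print(max_cut_haplotypes(fragments))  # ('00110', '11001', 5)
```

Under this scoring, the positive edges are f1-f2 (3), f1-f3 (1) and f1-f4 (1), so the best cut separates f1 from the other three fragments, and the two haplotypes are complements of each other.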

Epitope Prediction
- Profile weight matrix (PWM) model
- J.W. Yewdell, E. Reits and J. Neefjes. Making sense of mass destruction: quantitating MHC class I antigen presentation. Nature Reviews Immunology, 3: , 2003
- C. Lundegaard et al. MHC Class I Epitope Binding Prediction Trained on Small Data Sets. In Lecture Notes in Computer Science, 3239: , 2004
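A profile weight matrix scores a peptide by summing one position-specific weight per residue. The sketch below shows only that mechanism: the function names, the three-position matrix, its weights, and the penalty for unlisted residues are all hypothetical, whereas real MHC class I matrices typically cover 9-mers with weights trained on binding data.

```python
def pwm_score(peptide, pwm, missing=-1.0):
    """Sum of position-specific weights; `missing` is an assumed penalty
    for residues that a position's column does not list."""
    if len(peptide) != len(pwm):
        raise ValueError("peptide length must match the number of PWM columns")
    return sum(col.get(aa, missing) for aa, col in zip(peptide, pwm))

def scan_protein(protein, pwm, missing=-1.0):
    """Score every window of the protein; return (offset, peptide, score)
    tuples sorted from best to worst."""
    k = len(pwm)
    hits = [(i, protein[i:i + k], pwm_score(protein[i:i + k], pwm, missing))
            for i in range(len(protein) - k + 1)]
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Toy matrix with made-up weights, one dict per peptide position.
toy_pwm = [{"A": 1.0, "L": 2.0}, {"K": 0.5}, {"V": 3.0}]
print(scan_protein("ALKVA", toy_pwm)[0])  # (1, 'LKV', 5.5)
```

Candidate epitopes would then be the top-scoring windows that overlap a detected mutation.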

Results on Tumor Data

Deep Panning for Early Cancer Detection

Deep Panning for Early Cancer Detection
[Figure: phage displaying a peptide; labels: Phage envelope, Phage DNA, Peptide coding sequence, Peptide; residues shown: F R D K c E P A D Q V N P R Y L A C E F W]

Deep Panning for Early Cancer Detection
[Figure: workflow with the following steps]
- Phage library + Serum antibodies
- Incubation
- Elution of antibody bound phage
- Amplification in E. coli
- Another round of selection
- Making DNA library from phage DNA
- NextGen Sequencing
- Generating peptide mimotope profile of serum antibodies

Preliminary Results

Overlap for   Two different sera   The same serum
5-mer         8.3%                 27.6%
6-mer         2.9%                 20.7%
7-mer         2.6%                 18.8%
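The slides do not state how the overlap percentages were computed. One plausible reading, sketched here purely as an assumption, is to take the most frequent k-mers from the peptides sequenced in each experiment and report the share of the smaller set that also appears in the other. The function names and the normalization by the smaller set are hypothetical.

```python
from collections import Counter

def top_kmers(peptides, k, n):
    """Return the n most frequent k-mers across all peptide sequences."""
    counts = Counter(p[i:i + k] for p in peptides
                     for i in range(len(p) - k + 1))
    return {kmer for kmer, _ in counts.most_common(n)}

def overlap_percent(a, b):
    """Shared k-mers as a percentage of the smaller set (assumed definition)."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))

run1 = top_kmers(["NAVQTMT", "NAVQTMT", "GPLYSSL"], k=7, n=2)
run2 = top_kmers(["NAVQTMT", "PIYRSEA"], k=7, n=2)
print(overlap_percent(run1, run2))  # 50.0
```

Comparing two runs on the same serum against runs on two different sera would then give the kind of contrast shown in the table.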

Preliminary Results
[Table: columns Control and Cancer, samples A B C E H D F G I J; binomial p= ; per-sample values not recoverable]
Peptides:
- 7-mer: NAVQTMT, GPLYSSL
- 6-mer: PIYRSE, GVEDRL, NPLERN
- 5-mer: GELMT, PVEWY, GPVEW, IVHLQ, NAIEL

Ongoing Work: Understanding Cancer Evolution

Acknowledgments
Ekaterina Nenastyeva, Alexander Zelikovsky, Pramod Srivastava, Duan Fei, Sahar Al Seesi, Jorge Duitama, Yurij Ionov

Acknowledgements

Questions?