Bioinformatics pipeline for detection of immunogenic cancer mutations by high throughput mRNA sequencing Jorge Duitama 1, Ion Mandoiu 1, and Pramod Srivastava.

Presentation transcript:

Bioinformatics pipeline for detection of immunogenic cancer mutations by high throughput mRNA sequencing
Jorge Duitama (1), Ion Mandoiu (1), and Pramod Srivastava (2)
(1) University of Connecticut, Department of Computer Science & Engineering
(2) University of Connecticut Health Center

Immunology Background
J.W. Yewdell, E. Reits and J. Neefjes. Making sense of mass destruction: quantitating MHC class I antigen presentation. Nature Reviews Immunology, 3: , 2003

Cancer Immunotherapy
Tumor mRNA Sequencing → Tumor Specific Epitopes Discovery → Peptides Synthesis → Immune System Training → Tumor Remission
– Example reads from tumor mRNA sequencing: CTCAATTGATGAAATTGTTCTGAAACT, GCAGAGATAGCTAAAGGATACCGGGTT, CCGGTATCCTTTAGCTATCTCTGCCTC, CTGACACCATCTGTGTGGGCTACCATG, …, AGGCAAGCTCATGGCCAAATCATGAGA
– Example tumor-specific epitopes: SYFPEITHI, ISETDLSLL, CALRRNESL, …

Analysis Pipeline
– Input: tumor mRNA paired-end (PE) reads
– CCDS mapping and genome mapping, producing CCDS mapped reads and genome mapped reads
– Read merging, producing mapped reads and unmapped reads
– Mapped reads: variants detection → tumor-specific mutations → epitopes prediction → tumor-specific CTL epitopes
– Unmapped reads: gene fusion & novel transcript detection

Read Merging

Genome       CCDS         Agree?   Hard Merge   Soft Merge
Unique       Unique       Yes      Keep         Keep
Unique       Unique       No       Throw        Throw
Unique       Multiple     No       Throw        Keep
Unique       Not mapped   No       Keep         Keep
Multiple     Unique       No       Throw        Keep
Multiple     Multiple     No       Throw        Throw
Multiple     Not mapped   No       Throw        Throw
Not mapped   Unique       No       Keep         Keep
Not mapped   Multiple     No       Throw        Throw
Not mapped   Not mapped   Yes      Throw        Throw
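The merging rules in the table can be sketched as a small decision function. This is only an illustration of the table, not the pipeline's actual code: the function name `keep_read`, the status strings, and the `mode` parameter are all hypothetical.

```python
# Hypothetical status labels for how a read aligned against one reference.
UNIQUE, MULTIPLE, NOT_MAPPED = "unique", "multiple", "not_mapped"


def keep_read(genome: str, ccds: str, agree: bool, mode: str = "hard") -> bool:
    """Return True if the read is kept after merging, following the table.

    genome, ccds: alignment status against the genome and against CCDS.
    agree: whether the two alignments place the read at the same location
           (only meaningful when both alignments are unique).
    mode: "hard" or "soft" merge.
    """
    if mode not in ("hard", "soft"):
        raise ValueError("mode must be 'hard' or 'soft'")
    statuses = {genome, ccds}
    if genome == UNIQUE and ccds == UNIQUE:
        # Both unique: keep only when the two alignments agree.
        return agree
    if statuses == {UNIQUE, NOT_MAPPED}:
        # Unique in one reference, absent from the other: always kept.
        return True
    if statuses == {UNIQUE, MULTIPLE}:
        # Unique in one reference, ambiguous in the other:
        # thrown by hard merge, kept by soft merge.
        return mode == "soft"
    # Every remaining combination (no unique alignment at all) is thrown.
    return False
```

The only rows where the two modes differ are those pairing a unique alignment with a multiple alignment, so soft merge keeps strictly more reads than hard merge.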

Variant Calling Methods
– Binomial: test used in e.g. [Levy et al 07, Wheeler et al 08] for calling SNPs from genomic DNA
– Posterior: picks the genotype with the best posterior probability given the reads, assuming uniform priors
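A minimal sketch of the two calling strategies at a single biallelic site follows. The slide gives no formulas, so the details are assumptions: a fixed per-base error rate instead of per-read quality scores, a one-sided binomial tail as the test, and a significance threshold `alpha`. All function names are hypothetical.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_binomial(ref_count: int, alt_count: int,
                  error_rate: float = 0.01, alpha: float = 0.05) -> bool:
    """Call a variant when the alternative-allele reads are unlikely
    to be explained by sequencing errors alone (assumed test)."""
    if alt_count == 0:
        return False
    n = ref_count + alt_count
    return binomial_tail(alt_count, n, error_rate) < alpha


def call_posterior(ref_count: int, alt_count: int, error_rate: float = 0.01) -> str:
    """Pick the genotype with the highest posterior probability.

    With uniform priors the posterior is proportional to the likelihood,
    so the best genotype is the one maximizing the log-likelihood.
    """
    # Probability that a read shows the alternative allele under each genotype.
    p_alt = {"ref/ref": error_rate, "ref/alt": 0.5, "alt/alt": 1 - error_rate}
    log_lik = {
        g: alt_count * math.log(p) + ref_count * math.log(1 - p)
        for g, p in p_alt.items()
    }
    return max(log_lik, key=log_lik.get)
```

For example, `call_posterior(5, 5)` returns `"ref/alt"` and `call_posterior(10, 0)` returns `"ref/ref"`. Because the heterozygous model fixes the expected alternative-allele fraction at 0.5, strongly skewed allelic expression in mRNA data can pull a true heterozygous site toward a homozygous call.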

Epitopes Prediction
– Predictions include MHC binding, TAP transport efficiency, and proteasomal cleavage
– C. Lundegaard et al. MHC Class I Epitope Binding Prediction Trained on Small Data Sets. In Lecture Notes in Computer Science, 3239: , 2004

Accuracy Assessment of Variants Detection
63 million Illumina mRNA reads generated from blood cell tissue of HapMap individual NA12878 (NCBI SRA database accession number SRX000566)
– We selected HapMap SNPs in known exons for which there was at least one mapped read by any method (22,362 homozygous reference, 7,893 heterozygous or homozygous variant)
– True positives: called variants for which the HapMap genotype is heterozygous or homozygous variant
– False positives: called variants for which the HapMap genotype is homozygous reference
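The true positive and false positive definitions above can be sketched as a simple count. The data layout is an assumption made for illustration: site identifiers are arbitrary hashable keys, and the genotype labels `"hom_ref"`, `"het"`, and `"hom_var"` are hypothetical.

```python
def assess_calls(called_sites, hapmap_genotypes):
    """Count true and false positives among called variant sites.

    called_sites: iterable of site identifiers where a variant was called.
    hapmap_genotypes: dict mapping site identifier to one of
        "hom_ref", "het", "hom_var" for the selected HapMap SNPs.
    Sites absent from hapmap_genotypes are outside the evaluation set
    and are ignored.
    """
    true_pos = false_pos = 0
    for site in called_sites:
        genotype = hapmap_genotypes.get(site)
        if genotype is None:
            continue  # not one of the selected HapMap SNPs
        if genotype == "hom_ref":
            false_pos += 1  # called, but HapMap says homozygous reference
        else:
            true_pos += 1  # called, and HapMap says het or homozygous variant
    return true_pos, false_pos
```

With three evaluated sites `{"a": "hom_ref", "b": "het", "c": "hom_var"}` and calls at `a`, `b`, and an unlisted site `d`, the function returns `(1, 1)`.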

Comparison of Variant Calling Strategies: Genome Mapping, Alt. coverage ≥ 1

Comparison of Variant Calling Strategies: Genome Mapping, Alt. coverage ≥ 3

Comparison of Mapping Strategies: Posterior, Alt. coverage ≥ 3

Results on Meth A Reads
– 6.75 million Illumina reads from mRNA isolated from a mouse tumor cell line
– We found 6775 variant candidates after hard merge mapping, posterior variant calling, and a minimum of three reads per alternative allele
– 934 variants produced 1439 epitopes with SYFPEITHI score higher than 15 for the mutated peptide
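One variant can yield several candidate epitopes because every peptide window that covers the mutated residue is a separate candidate. The sketch below enumerates those windows. It is an illustration under assumptions: the window length of 9 matches the example peptides shown earlier in the slides, but the lengths actually scored by the pipeline are not stated, and the function name is hypothetical.

```python
def peptides_covering(protein: str, pos: int, k: int = 9) -> list:
    """Return every length-k window of `protein` that contains index `pos`.

    protein: amino acid sequence of the mutated protein.
    pos: 0-based index of the mutated residue.
    k: peptide length (9 is an assumption, not taken from the slides).
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos must index a residue of the protein")
    first = max(0, pos - k + 1)          # leftmost window start that still covers pos
    last = min(pos, len(protein) - k)    # rightmost start that fits inside the protein
    return [protein[start:start + k] for start in range(first, last + 1)]
```

A mutation far from both ends of the protein gives `k` windows, and one near either end gives fewer. Each window would then be scored, for example with SYFPEITHI, for both the mutated and the reference sequence.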

SYFPEITHI Scores Distribution of Mutated Peptides

Distribution of SYFPEITHI Score Differences Between Mutated and Reference Peptides

Validation Results
– We are using mass spectrometry to confirm presentation of epitopes on the cell surface
– Mutations reported by [Noguchi et al 94] were found by this pipeline
– We are performing Sanger sequencing of PCR amplicons to confirm reported mutations

Conclusions & Ongoing Work
– We implemented a bioinformatics pipeline for characterizing tumor immunomes
– Identified tumor epitopes are being evaluated for therapeutic effect
– Ongoing work:
  – Detect short immunogenic indels and novel transcripts
  – Include predictions of TAP transport efficiency and proteasomal cleavage
  – Refine the posterior method to increase mutation detection robustness in the presence of differential allelic expression

Acknowledgments
– Brent Graveley and Duan Fei (UCHC)
– NSF awards IIS , IIS , and DBI
– UCONN Research Foundation UCIG grant

Questions? Thanks