

Bioinformatics tools for viral quasispecies reconstruction from next-generation sequencing data and vaccine optimization
PD: Ion Măndoiu, UConn
Co-PDs: Mazhar Khan, UConn; Rachel O’Neill, UConn; Alex Zelikovsky, GSU

Outline
- Background & aims of the project
- Bioinformatics tools for quasispecies spectrum reconstruction from NGS reads
- Experimental validation on IBV data
- Summary and ongoing work

Infectious Bronchitis Virus (IBV)
- Group 3 coronavirus
- Biggest single cause of economic loss in US poultry farms
- Young chickens: coughing, tracheal rales, dyspnea
- Broiler chickens: reduced growth rate
- Layers: egg production drops 5-50%; eggs are thin-shelled with watery albumen
- Worldwide distribution, with dozens of serotypes in circulation
- Co-infection with multiple serotypes is not uncommon, creating conditions for recombination
- Figure labels: IBV-infected egg defects; IBV-infected embryo; normal embryo

IBV Vaccination
- Broadly used, most commonly with attenuated live vaccine
- Short-lived protection
- Layers need to be re-vaccinated multiple times during their lifespan
- Vaccines might undergo selection in vivo and regain virulence [Hilt, Jackwood, and McKinley 2008]

RNA Virus Replication
- High mutation rate (~10^-4)
- Figure source: Lauring & Andino, PLoS Pathogens 2011
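
To put the mutation rate in perspective, the short calculation below estimates how many mutations a single genome copy would carry. It is only a sketch: it assumes the ~10^-4 figure is per nucleotide per replication cycle and that the IBV genome is roughly 27,600 nt long. Neither assumption is stated on the slide.

```python
# Back-of-envelope arithmetic; both constants are assumptions, not slide data.
MUTATION_RATE = 1e-4      # assumed: substitutions per nucleotide per replication
GENOME_LENGTH = 27_600    # assumed: approximate IBV genome length in nucleotides


def expected_mutations(rate, length):
    """Expected number of mutations introduced in one genome copy."""
    return rate * length


def prob_error_free_copy(rate, length):
    """Probability that one copy has no mutation, treating sites as independent."""
    return (1.0 - rate) ** length


if __name__ == "__main__":
    print("expected mutations per copy:", expected_mutations(MUTATION_RATE, GENOME_LENGTH))
    print("P(error-free copy):", prob_error_free_copy(MUTATION_RATE, GENOME_LENGTH))
```

Under these assumptions most copies differ from their template at one or more positions, which is why an infected host carries a cloud of related variants rather than a single sequence.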

Evolution of IBV
- Quasispecies identified by cloning and Sanger sequencing in both IBV-infected poultry and commercial vaccines [Jackwood, Hilt, and Callison 2003; Hilt, Jackwood, and McKinley 2008]

How Are Quasispecies Contributing to Virus Persistence and Evolution?
- Variants differ in:
  - Virulence
  - Ability to escape immune response
  - Resistance to antiviral therapies
  - Tissue tropism
- Source: Lauring & Andino, PLoS Pathogens 2011

Project Aims
- Develop bioinformatics tools for accurate reconstruction of quasispecies sequences and their frequencies from next-generation reads
- Study quasispecies persistence and evolution of IBV in commercial layer flocks following vaccination
- Use results of this study to optimize vaccine development and vaccination protocols


Next Generation Sequencing
- Illumina HiSeq 2000: up to 6 billion PE reads/run; read length 35-100 bp
- Roche/454 FLX Titanium: 400-600 million bases/run; read length up to 1,000 bp
- Ion Torrent PGM: 1-10 million reads/run; read length up to 400 bp
- SOLiD 4/5500: 1.4-2.4 billion PE reads/run; read length 35-50 bp
- Image source: http://www.economist.com/node/16349358

Shotgun vs. Amplicon Reads
- Shotgun reads: starting positions distributed ~uniformly
- Amplicon reads: predefined start/end positions covering fixed overlapping windows
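
The difference between the two read layouts can be illustrated with a small sketch. The function names, parameters, and the uniform sampling model are illustrative assumptions and are not taken from ViSpA or VirA.

```python
import random


def shotgun_starts(genome_len, read_len, n_reads, seed=0):
    """Sample read start positions uniformly, as an idealized shotgun library."""
    rng = random.Random(seed)
    return [rng.randint(0, genome_len - read_len) for _ in range(n_reads)]


def amplicon_windows(genome_len, amplicon_len, overlap):
    """Return half-open (start, end) windows of fixed length and fixed overlap."""
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_len - overlap
    windows = []
    start = 0
    while True:
        end = min(start + amplicon_len, genome_len)
        windows.append((start, end))
        if end == genome_len:   # last window reaches the end of the genome
            break
        start += step
    return windows


if __name__ == "__main__":
    print(amplicon_windows(1000, 400, 100))
    print(sorted(shotgun_starts(1000, 400, 5)))
```

Every amplicon read begins and ends at one of the few window boundaries, whereas shotgun reads can start anywhere, so consecutive shotgun reads overlap by varying amounts.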

Reconstruction from Shotgun Reads: ViSpA
- Input: shotgun reads
- Pipeline: Error Correction -> Read Alignment -> Preprocessing of Aligned Reads -> Read Graph Construction -> Contig Assembly -> Frequency Estimation
- Output: quasispecies sequences w/ frequencies
- User-specified parameters: (A) number of mismatches, (B) mutation rate
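
As a rough illustration of the read graph construction step, the sketch below links two aligned reads when the second starts inside the first, extends past its end, and agrees with it on the overlap. This is a generic overlap graph written for this transcript, not ViSpA's implementation. The `max_mismatches` argument only loosely mirrors the user-specified number of mismatches, since the slide does not say how the tool applies that parameter.

```python
def overlaps_consistently(a, b, max_mismatches=0):
    """a and b are (start, sequence) pairs in reference coordinates.

    True if b starts strictly inside a, ends after a, and the two reads
    disagree at no more than max_mismatches positions of their overlap.
    """
    a_start, a_seq = a
    b_start, b_seq = b
    a_end = a_start + len(a_seq)
    b_end = b_start + len(b_seq)
    if not (a_start < b_start < a_end < b_end):
        return False
    overlap_len = a_end - b_start
    a_part = a_seq[b_start - a_start:]
    b_part = b_seq[:overlap_len]
    mismatches = sum(x != y for x, y in zip(a_part, b_part))
    return mismatches <= max_mismatches


def build_read_graph(reads, max_mismatches=0):
    """Return adjacency lists: read index -> indices of reads that can follow it."""
    graph = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j and overlaps_consistently(a, b, max_mismatches):
                graph[i].append(j)
    return graph


if __name__ == "__main__":
    toy_reads = [(0, "ACGTAC"), (3, "TACGGA"), (3, "TTCGGA")]  # hypothetical reads
    print(build_read_graph(toy_reads))
```

Paths through such a graph spell out candidate full-length sequences. The quadratic pairwise comparison is acceptable for a toy example but would need indexing by position for real read sets.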

Reconstruction from Amplicon Reads: VirA
- Inputs: error-corrected read data (SAM/BAM); reference in FASTA format
- Stages: Estimate Amplicons, Amplicon Read Graph, Max-Bandwidth Paths, Frequency Estimation
- Output: viral population variants with frequencies
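
In the usual graph-theoretic sense, a max-bandwidth path is one whose smallest edge weight is as large as possible. The sketch below finds such a path with a Dijkstra-style search. How VirA weights its amplicon read graph is not shown in this transcript, so the use of read counts as edge capacities and the toy graph are assumptions for illustration only.

```python
import heapq


def max_bandwidth_path(graph, source, sink):
    """graph: dict node -> dict neighbor -> positive capacity.

    Returns (bottleneck, path), or (0, []) when sink is unreachable.
    Nodes must be mutually comparable (e.g. all strings) for heap tie-breaking.
    """
    best = {source: float("inf")}     # best known bottleneck to each node
    parent = {source: None}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0):
            continue                  # stale heap entry
        if node == sink:
            break
        for nxt, cap in graph.get(node, {}).items():
            cand = min(width, cap)
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                parent[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    if sink not in best:
        return 0, []
    path = []
    node = sink
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[sink], path[::-1]


if __name__ == "__main__":
    # Hypothetical graph; capacities stand in for supporting read counts.
    toy = {"s": {"a": 5, "b": 3}, "a": {"t": 2, "b": 4}, "b": {"t": 4}, "t": {}}
    print(max_bandwidth_path(toy, "s", "t"))
```

A path with a high bottleneck is supported by many reads at every step, which makes it a natural candidate for an abundant variant.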

Amplicon Sequencing Challenges
- Multiple reads from consecutive amplicons may match over their overlap
- Distinct quasispecies may be indistinguishable in an amplicon interval
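
The second challenge can be made concrete with a toy example: variants that differ somewhere in the genome may still be identical inside a given amplicon window. The sequences, names, and windows below are made up for illustration.

```python
def distinct_within(variants, start, end):
    """Group variant names by their subsequence in the half-open window [start, end)."""
    groups = {}
    for name, seq in variants.items():
        groups.setdefault(seq[start:end], []).append(name)
    return groups


def indistinguishable_groups(variants, windows):
    """For each window, list the sets of variants that look identical inside it."""
    result = {}
    for start, end in windows:
        groups = distinct_within(variants, start, end)
        result[(start, end)] = [sorted(names) for names in groups.values() if len(names) > 1]
    return result


if __name__ == "__main__":
    # Hypothetical variants: Q2 differs from Q1 near the end, Q3 near the start.
    toy_variants = {
        "Q1": "AAAACCCCGGGG",
        "Q2": "AAAACCCCGGTG",
        "Q3": "ATAACCCCGGGG",
    }
    print(indistinguishable_groups(toy_variants, [(0, 6), (4, 12)]))
```

Here Q1 and Q2 cannot be told apart in the first window, and Q1 and Q3 cannot be told apart in the second, so reads from the overlap alone do not determine how to link the two windows.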


IBV Genome
- RT-PCR of S1 using redesigned primers
- Figure source: Rev. Bras. Cienc. Avic. vol.12 no.2, Campinas, Apr./June 2010

Experiment 1
- 10-clone pool: C1 20%, C2 20%, C3 15%, C4 15%, C5 10%, C6 10%, C7 4%, C8 4%, C9 1%, C10 1%
- M42 sample: 53 plasmid clones (V1, V2, V3, ..., Vn)
- 454 reads -> assembled quasispecies (PV1, PV2, PV3, …, PVk)
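
A pool with known proportions makes it possible to check estimated frequencies against the truth. The sketch below shows a generic expectation-maximization (EM) estimator that splits each ambiguous read among the candidate sequences it matches. It is not claimed to be the estimator used in ViSpA or VirA, and the two candidate sequences and the reads are hypothetical toy data rather than the clone sequences.

```python
def compatible(read, candidate):
    """A read is (start, sequence); it matches if the candidate agrees exactly there."""
    start, seq = read
    return candidate[start:start + len(seq)] == seq


def em_frequencies(reads, candidates, n_iter=100):
    """Estimate candidate frequencies from reads by EM on a read/candidate match table."""
    names = list(candidates)
    # Keep only reads explained by at least one candidate.
    compat = []
    for read in reads:
        row = [name for name in names if compatible(read, candidates[name])]
        if row:
            compat.append(row)
    if not compat:
        raise ValueError("no read matches any candidate")
    freq = {name: 1.0 / len(names) for name in names}
    for _ in range(n_iter):
        counts = dict.fromkeys(names, 0.0)
        for row in compat:
            total = sum(freq[name] for name in row)
            for name in row:
                # E-step: share the read in proportion to current frequencies.
                counts[name] += freq[name] / total
        # M-step: new frequency is the expected share of reads.
        freq = {name: counts[name] / len(compat) for name in names}
    return freq


if __name__ == "__main__":
    toy_candidates = {"C1": "ACGTACGT", "C2": "ACGTTCGT"}  # differ at position 4
    toy_reads = [(2, "GTAC")] * 6 + [(2, "GTTC")] * 2 + [(0, "ACG")] * 4
    print(em_frequencies(toy_reads, toy_candidates))
```

In the toy data, six reads support only C1, two support only C2, and four match both. The ambiguous reads end up divided in the same 3:1 ratio as the unambiguous ones.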

Evaluated Reconstruction Flows

Reads Statistics & Coverage
Number of reads per sample. The slide lists four column headers (Uncorrected, SAET corrected, ShoRAH corrected, KEC corrected) but three values per sample.
Sample            Read counts
M42 isolate       53062, 50858, 48945
M42 clone pool    21040, 19439, 17122

Reads Validation

How well we predicted the Sanger clones; how good our predictions are

Average Prediction Error

Neighbor-Joining Tree for M42 Sanger Clones & ViSpA Quasispecies

Experiment 2

Reads Statistics & Coverage
Number of reads per sample. The slide lists four column headers (Uncorrected, SAET corrected, ShoRAH corrected, KEC corrected) but three values per sample.
Sample         Read counts
M41 Vaccine    92113, 87883, 85311
Field #1       38502, 33685, 32521
Field #2       132513, 123370, 111686
Field #3       76906, 71408, 64507
Field #4       44467, 41653, 37295

Neighbor-Joining Tree for Sanger clones and ViSpA Reconstructed Sequences


Summary
- Developed software tools for quasispecies reconstruction from both shotgun and amplicon next-generation reads
- Code and executables freely available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html and http://alan.cs.gsu.edu/vira/
- ViSpA plugin developed for users of Ion Torrent, available on the Ion community
- Experimental results on both simulated and real data show improved accuracy tradeoffs compared to previous methods
- Tools are applicable to quasispecies studies of other viruses

Ongoing Work
- Deployment of ViSpA and VirA on Galaxy servers maintained at UConn and GSU
- Tool validation on Ion Torrent reads
- Comparison of shotgun and amplicon based reconstruction methods
- Combining long and short read technologies
- Quasispecies persistence studies using longitudinal sampling

Tool Validation for Ion Torrent Reads
- Shotgun IBV reads generated using 316 Ion chip
  - 2,384,007 reads (1,177,740 after SAET correction)
  - Mean length 203.58 bp
- ViSpA results
  - 23 quasispecies with estimated frequency > 0.5%, 2,200 total
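
The read counts above allow a quick back-of-envelope check of how much data survives correction and how deep the coverage might be. This is a sketch under assumptions not stated on the slide: that the mean length describes the corrected reads, that every corrected read maps to the genome, and that the IBV genome is about 27,600 nt long.

```python
RAW_READS = 2_384_007
CORRECTED_READS = 1_177_740
MEAN_READ_LENGTH = 203.58   # bp; assumed here to describe the corrected reads
GENOME_LENGTH = 27_600      # assumed approximate IBV genome length, not from the slide


def retained_fraction(raw, corrected):
    """Fraction of reads that remain after error correction."""
    return corrected / raw


def mean_depth(n_reads, mean_len, genome_len):
    """Average coverage depth: total sequenced bases divided by genome length."""
    return n_reads * mean_len / genome_len


if __name__ == "__main__":
    print("retained fraction:", retained_fraction(RAW_READS, CORRECTED_READS))
    print("approximate mean depth:", mean_depth(CORRECTED_READS, MEAN_READ_LENGTH, GENOME_LENGTH))
```

Roughly half of the reads remain after correction. Under the stated assumptions the average depth is still in the thousands, which is the regime in which variants at frequencies below 1% can plausibly be observed.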

Longitudinal Sampling
- Amplicon / shotgun sequencing

Contributors
- Georgia State University: Bassam Tork, Ekaterina Nenastyeva, Alex Artyomenko, Serghei Mangul, Nicholas Mancuso, Alexander Zelikovsky
- University of Connecticut: Rachel O’Neill, Ph.D.; Mazhar Khan, Ph.D.; Hongjun Wang, Ph.D.; Craig Obergfell; Andrew Bligh
- University of Maryland: Irina Astrovskaya, Ph.D.