Joachim De Schrijver. Short introduction on 454 sequencing; Variant Identification Pipeline; possibilities of a DB-oriented pipeline; examples.

Presentation transcript:

Joachim De Schrijver

• Short introduction on 454 sequencing
• Variant Identification Pipeline
• Possibilities of a DB-oriented pipeline
• Examples
  ◦ Coverage
  ◦ Improving PCR
  ◦ Fast Q assessment
  ◦ Homopolymers

• Roche/454 GS-FLX sequencing:
  ◦ Pyrosequencing
  ◦ ± 400,000 reads/run
  ◦ Average length: bp
• Applications:
  ◦ Resequencing: variant identification
  ◦ De novo (genome) sequencing: assembly of new regions, plasmids or entire genomes
• Standard software:
  ◦ Variants: Amplicon Variant Analyzer (AVA)
  ◦ Assembly: standard 454 assembler

• Standard software
  ◦ + Easy to use
  ◦ + Reproducible results on similar datasets
  ◦ + GUI (graphical user interface)
  ◦ - No answer for ‘non-standard’ questions
    ▪ Methylation experiments
    ▪ Different types of experiments grouped together
    ▪ …
  ◦ - What about ‘hidden’ information?
    ▪ Homopolymer error rates
    ▪ Quality score ~ length of sequenced read
    ▪ ‘Multirun’ information
    ▪ …

• Modular and database-oriented pipeline
• Modular:
  ◦ Efficient planning
  ◦ Scalable
• Database (DB):
  ◦ No loss of data
  ◦ Grouping several runs together

• Basic idea: data is processed once and stored in the DB. Results (reports) are calculated ‘on the fly’ from the DB data.
  ◦ Fast & efficient
  ◦ Calculations only happen once
  ◦ Everybody can access the database without risk of data modification
  ◦ Reporting is independent of the data processing
• Paper: De Schrijver et al. Analysing 454 sequences with a modular and database oriented Variant Identification Pipeline
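The "store once, report on the fly" idea can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module and a single hypothetical `reads` table. The table layout, the column names and the `report_reads_per_mid` helper are assumptions made for illustration and are not taken from the actual pipeline.

```python
import sqlite3

def build_db(reads):
    """Store processed reads once.

    `reads` is an iterable of (run, mid, amplicon, length, mapped) tuples.
    The schema is hypothetical, not the pipeline's real one.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reads ("
        "run TEXT, mid TEXT, amplicon TEXT, length INTEGER, mapped INTEGER)"
    )
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?)", reads)
    conn.commit()
    return conn

def report_reads_per_mid(conn):
    """Compute a report from the stored data with a read-only query."""
    return conn.execute(
        "SELECT mid, COUNT(*), SUM(mapped) FROM reads GROUP BY mid ORDER BY mid"
    ).fetchall()

# Made-up example rows from two runs, grouped together in one database.
example_reads = [
    ("run1", "MID1", "ampA", 250, 1),
    ("run1", "MID1", "ampB", 90, 0),
    ("run2", "MID2", "ampA", 245, 1),
]
conn = build_db(example_reads)
print(report_reads_per_mid(conn))
```

Because the report is only a `SELECT`, new report types can be added later without reprocessing the raw data.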

• VIP was originally developed for variant identification
• Now being used in:
  ◦ Amplicon resequencing
  ◦ De novo shotgun
  ◦ Methylation
  ◦ ~ Solexa experiments
• ‘Hidden’ data can be extracted using intelligent querying strategies
• Results per lane/multiplex MID/run…

• Coverage can be calculated per
  ◦ Lane
  ◦ MID
  ◦ Amplicon
  ◦ Base position
• Assessment of errors (PCR dropouts vs. human errors)
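As a rough sketch of how coverage might be broken down per MID, amplicon and base position, the following assumes each mapped read is available as a `(mid, amplicon, start, end)` tuple with 0-based, end-exclusive coordinates. That input format, the function names and the "no reads at all" dropout rule are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def coverage_per_base(alignments):
    """Return {(mid, amplicon): Counter{position: depth}}."""
    cov = defaultdict(Counter)
    for mid, amplicon, start, end in alignments:
        for pos in range(start, end):
            cov[(mid, amplicon)][pos] += 1
    return cov

def mean_coverage(cov, amplicon_lengths):
    """Mean depth per (mid, amplicon); uncovered positions count as zero."""
    return {
        key: sum(depths.values()) / amplicon_lengths[key[1]]
        for key, depths in cov.items()
    }

def dropouts(cov, expected_pairs):
    """Expected (mid, amplicon) pairs that received no reads at all."""
    return [pair for pair in expected_pairs if pair not in cov]

# Made-up alignments for illustration.
alignments = [
    ("MID1", "ampA", 0, 4),
    ("MID1", "ampA", 2, 6),
    ("MID2", "ampA", 0, 3),
]
cov = coverage_per_base(alignments)
means = mean_coverage(cov, {"ampA": 6})
missing = dropouts(cov, [("MID1", "ampA"), ("MID2", "ampA"), ("MID2", "ampB")])
```

A pair that appears in `missing` only says that no reads were observed. Deciding whether that is a PCR dropout or a human error still needs the lab context.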

• Amplicon resequencing experiment
• Goal: variant identification
• Length distributions
  ◦ Mapped
  ◦ Unmapped
  ◦ ‘Short’ mapped
• Additional length separation + improved PCR
• Result: improved efficiency

• Can the length of a homopolymer be assessed using the Q score?
• Yes, when the homopolymer length is < 6 bp
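One way to explore the relation between homopolymer length and Q score is to extract every homopolymer run from a read together with its quality values. The sketch below is an illustration, not the method used in the talk: the `(sequence, qualities)` input format is assumed, and summarising each run by its minimum Q score is an arbitrary choice.

```python
from collections import defaultdict
from itertools import groupby

def homopolymer_runs(seq, quals, min_len=2):
    """Yield (base, run_length, quality_scores) for each homopolymer run."""
    pos = 0
    for base, group in groupby(seq):
        n = len(list(group))
        if n >= min_len:
            yield base, n, quals[pos:pos + n]
        pos += n

def mean_min_q_by_length(reads, min_len=2):
    """Mean of the per-run minimum Q score, grouped by run length.

    The minimum is an assumed summary statistic, chosen for illustration.
    """
    totals = defaultdict(lambda: [0, 0])  # run length -> [sum of minima, count]
    for seq, quals in reads:
        for _, n, run_quals in homopolymer_runs(seq, quals, min_len):
            totals[n][0] += min(run_quals)
            totals[n][1] += 1
    return {n: total / count for n, (total, count) in totals.items()}

# Made-up read for illustration.
reads = [("AACGGGT", [40, 30, 38, 35, 28, 20, 37])]
runs = list(homopolymer_runs(*reads[0]))
summary = mean_min_q_by_length(reads)
```

Tabulating `summary` over many reads shows whether Q scores separate run lengths well enough to be informative, which the slide reports is the case below 6 bp.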

• Fast assessment of the quality of a run
[Figure, two panels: “Lab work OK” and “Errors in lab work”]

• Biobix – UGent: Wim Van Criekinge, Tim De Meyer, Geert Trooskens, Tom Vandekerkhove, Leander Van Neste, Gerben Mensschaert
• CMG – UZ Gent: Jo Vandesompele, Jan Hellemans, Filip Pattyn, Steve Lefever, Kim Deleeneer, Jean-Pierre Renard
• NXT-GNT: Paul Coucke, Sofie Bekaert, Filip Van Nieuwerburgh, Dieter Deforce, Wim Van Criekinge, Jo Vandesompele

Questions?