Challenge analysis Ladang Auxane Mombaerts Laurent Uyttendaele Vincent


The effect of Next-Generation Sequencing technology on complex trait research: Challenge analysis
Presented by Ladang Auxane, Mombaerts Laurent, Uyttendaele Vincent
December 10, 2013
University of Liège GBIO0009-1 : Krystel Van Steen, Kyrill Bessonov

TABLE OF CONTENTS
Introduction
Challenge analysis
  1. Optimizing parameters for study design
  2. Storing and handling data
  3. Mapping and aligning
  4. Variant calling
  5. Analyzing low frequency and rare variants
Applications
Discussion
Conclusion

INTRODUCTION
What is Next-Generation Sequencing (NGS)?
  High throughput
  Low cost
Applications
[Figure: timeline of sequencing, from the 1970s (F. Sanger) until now]

CHALLENGE ANALYSIS
1. Optimizing parameters for study design
Three main parameters:
  High cost-to-data ratio. Sequence only parts of the genome?
  Power depends on depth of coverage.
  Sample selection.
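As a rough illustration of the "power depends on depth of coverage" point, the sketch below computes the expected coverage from the usual relation (number of reads × read length / genome size) and the binomial probability of seeing at least a minimum number of variant-supporting reads at a heterozygous site. The function names, the 0.5 allele fraction and the threshold of 3 reads are illustrative assumptions, not values from the slides; sequencing errors and uneven coverage are ignored.

```python
from math import comb


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def detection_probability(depth: int, min_alt_reads: int = 3,
                          alt_fraction: float = 0.5) -> float:
    """P(at least min_alt_reads of `depth` reads carry the variant allele).

    Assumes each read samples the variant allele independently with
    probability alt_fraction (0.5 for a heterozygous site) and no errors.
    """
    p_below = sum(
        comb(depth, k) * alt_fraction ** k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical run: 600 million reads of 100 bp on a 3 Gb genome.
    print(f"{mean_coverage(600_000_000, 100, 3_000_000_000):.1f}x")
    for depth in (4, 10, 30):
        print(depth, round(detection_probability(depth), 4))
```

Under these assumptions, a depth of 4 gives only a 0.3125 chance of seeing three or more variant reads at a heterozygous site, which is why low-coverage designs trade per-sample power for a larger number of samples.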

CHALLENGE ANALYSIS
2. Storing and handling data
Two years ago: the concept of NGS was still theoretical.
Today: devices are operational and affordable → raw data available.

CHALLENGE ANALYSIS
Production of raw data
  Using a fluorescent-dye DNA sequencer
  Labeling of DNA strands with 4 fluorescent dyes
  Separation of fragments by electrophoresis
  Monitoring by chromatography
Storage of raw data
  One run can produce up to 4 Tb of data → requires a huge storage capacity
Handling of raw data
  Algorithms will be applied for mapping → requires powerful computing tools
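To give a feel for the storage requirement, here is a back-of-the-envelope sketch that estimates the size of the base calls alone when stored as uncompressed FASTQ text (four lines per read). The function name and the 40-character header length are assumptions made for illustration; the figure on the slide also covers instrument output such as images and intensities, which this estimate does not include.

```python
def fastq_bytes(n_reads: int, read_length: int, header_length: int = 40) -> int:
    """Approximate size in bytes of an uncompressed FASTQ file.

    Each record has four lines, each ending with a newline:
    header, sequence, '+' separator, quality string.
    """
    record = (header_length + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * record


def human_readable(n_bytes: int) -> str:
    """Format a byte count with decimal (SI) units."""
    value = float(n_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if value < 1000 or unit == "TB":
            return f"{value:.1f} {unit}"
        value /= 1000


if __name__ == "__main__":
    # Hypothetical run: one billion reads of 100 bases.
    print(human_readable(fastq_bytes(1_000_000_000, 100)))
```

For one billion 100-base reads this gives 245.0 GB of text before compression.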

CHALLENGE ANALYSIS
3. Mapping and aligning algorithms
De novo assembly
  Sequencing a genome without the use of a reference genome.
  Reads are assembled by an overlapping method.
Mapping
  Building a sequence that is similar to a reference genome.
  Reads are aligned on the backbone.
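The two strategies can be contrasted with a toy sketch. The greedy overlap merge and the exact-match mapper below are deliberately simplified stand-ins chosen for illustration, not the algorithms used by real assemblers or aligners (which tolerate errors and use indexes); all function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """De novo toy: repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


def map_reads(reads: list[str], reference: str) -> dict[str, list[int]]:
    """Mapping toy: every exact-match start position of each read on the reference."""
    hits = {}
    for read in reads:
        positions, start = [], reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)
        hits[read] = positions
    return hits


if __name__ == "__main__":
    reference = "ATGGCGTACGTTAGC"
    reads = ["ATGGCGT", "GCGTACG", "TACGTTAGC"]
    print(greedy_assemble(reads))      # assembly without the reference
    print(map_reads(reads, reference)) # placement on the reference backbone
```

In this example the three reads assemble back into the 15-base sequence without any reference, while mapping places them at positions 0, 3 and 6 of the backbone.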

CHALLENGE ANALYSIS
Improving the speed and efficiency of algorithms to deal with large throughput
Detecting non-unique mapping (reads that correspond to several different locations in the reference genome)
Taking into consideration different base qualities (degrees of certainty)
Using a more accurate reference genome (including individual sequences)
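Two of these points lend themselves to a small sketch: flagging reads that fit more than one place in the reference, and turning base qualities into error probabilities. The code assumes Phred-scaled qualities (Q = -10 log10 p) stored with an ASCII offset of 33, which is a common FASTQ convention but is not stated on the slides; exact matching stands in for a real aligner, and the function names are hypothetical.

```python
def phred_to_error_prob(q: int) -> float:
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def count_placements(read: str, reference: str) -> int:
    """Number of exact-match positions of a read, overlapping hits included."""
    count, start = 0, reference.find(read)
    while start != -1:
        count += 1
        start = reference.find(read, start + 1)
    return count


def is_uniquely_mapped(read: str, reference: str) -> bool:
    """True when the read fits exactly one location of the reference."""
    return count_placements(read, reference) == 1


if __name__ == "__main__":
    reference = "ACGTACGTTT"
    for read in ("ACGT", "CGTTT"):
        print(read, count_placements(read, reference),
              is_uniquely_mapped(read, reference))
    print([phred_to_error_prob(q) for q in decode_qualities("I5!")])
```

Here "ACGT" fits two locations and would be reported as non-unique, while "CGTTT" fits only one. A base with quality 40 has a much smaller error probability than one with quality 20, so the two should not count equally as evidence.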

CHALLENGE ANALYSIS
4. Variant calling
Distinguishing true variants from sequencing or mapping errors → decreasing the number of false-positive SNP calls
Detecting misalignment around indels
  Indel in the middle of a read: perfect match on either side → the algorithm opens a gap.
  Indel at one end of the read: the indel is hard to recognize → misalignment of the read → false-positive SNP call
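The indel problem can be reproduced on a toy example. The sketch below simulates error-free reads from a sample carrying a one-base deletion and places them on the reference without gaps, listing the resulting mismatches. The sequence and the helper names are made up for illustration; real aligners weigh mismatch penalties against gap penalties rather than counting mismatches directly.

```python
def read_with_deletion(reference: str, start: int, end: int, deleted: int) -> str:
    """Error-free read over reference[start:end] from a sample lacking base `deleted`."""
    return reference[start:deleted] + reference[deleted + 1:end]


def ungapped_mismatches(read: str, reference: str, start: int = 0):
    """Mismatches (position, reference base, read base) of a gap-free placement."""
    window = reference[start:start + len(read)]
    return [
        (start + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]


if __name__ == "__main__":
    reference = "ACGTTGCAAGCTTAGGCTAC"
    middle = read_with_deletion(reference, 0, 16, deleted=6)
    end = read_with_deletion(reference, 0, 16, deleted=13)
    print(len(ungapped_mismatches(middle, reference)))  # many mismatches
    print(ungapped_mismatches(end, reference))          # a single mismatch
```

With the deletion in the middle of the read, the gap-free placement produces six mismatches, so opening a gap is clearly the better explanation. With the deletion two bases from the end, the same placement shows a single A→G mismatch at position 13, which is cheaper than a gap under typical scoring and ends up looking like a SNP.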

CHALLENGE ANALYSIS
Considering different error rates depending on the base location
  Nucleobases at the extremities have a higher error rate.
  In case of misalignment: a confident but false-positive SNP call.
  SOLUTION: algorithms that consider a recalibrated “base quality score” and select only the central portion of a read.
Decreasing the number of errors introduced by PCR artefacts
  PCR → non-uniform coverage of the reference genome → over-represented reads
  SOLUTION: paired-end sequencing libraries to discard clonal reads
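Both solutions can be sketched in a few lines. The example below treats read pairs whose fragments start and end at the same reference coordinates as clonal copies and keeps only the best-quality one, and it trims a fixed number of bases from each end of a read. The `ReadPair` record, the coordinate-based duplicate rule and the fixed trim length are simplifying assumptions for illustration, not the behaviour of a specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    name: str
    chrom: str
    start: int           # leftmost mapped position of the fragment
    end: int             # rightmost mapped position of the fragment
    mean_quality: float  # average base quality over both mates


def remove_clonal_pairs(pairs: list[ReadPair]) -> list[ReadPair]:
    """Keep one pair per (chrom, start, end): the one with the best quality."""
    best: dict[tuple[str, int, int], ReadPair] = {}
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.end)
        if key not in best or pair.mean_quality > best[key].mean_quality:
            best[key] = pair
    return list(best.values())


def central_portion(read: str, trim: int) -> str:
    """Drop `trim` bases from each end; empty string if nothing would remain."""
    return read[trim:len(read) - trim] if len(read) > 2 * trim else ""


if __name__ == "__main__":
    pairs = [
        ReadPair("a", "chr1", 100, 350, 30.0),
        ReadPair("b", "chr1", 100, 350, 35.0),  # same fragment as "a"
        ReadPair("c", "chr1", 120, 350, 28.0),  # different start: kept
    ]
    print([p.name for p in remove_clonal_pairs(pairs)])
    print(central_portion("AACGTACGTT", 2))
```

Using both ends of the fragment is what makes paired-end libraries useful here: two independent fragments are unlikely to share both coordinates, whereas single-end reads that merely start at the same position are much harder to tell apart from PCR copies.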

CHALLENGE ANALYSIS
5. Analyzing low frequency and rare variants
May be a painful step!
Single-point analysis
  Low power
  Would require hundreds of thousands of individuals
Across sample sets (composite analysis)
  Somewhat lighter in terms of computing time and data volume
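The low power of single-point tests can be illustrated with a toy case-control data set. The sketch below assumes that "composite analysis" refers to a collapsing (burden-style) approach in which an individual counts as a carrier if they carry any rare variant in a region; this reading, the genotype matrix and the function names are assumptions for illustration. The test used is a one-sided Fisher exact test computed from the hypergeometric distribution.

```python
from math import comb


def carriers_per_variant(genotypes: list[list[int]]) -> list[int]:
    """Number of carriers of each variant (columns) across individuals (rows)."""
    return [sum(column) for column in zip(*genotypes)]


def collapsed_carriers(genotypes: list[list[int]]) -> int:
    """Number of individuals carrying at least one variant in the region."""
    return sum(1 for row in genotypes if any(row))


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """P(at least case_carriers carriers fall among the cases) under no association."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(total, n_cases)


if __name__ == "__main__":
    # Toy data: 10 cases, 10 controls, 4 rare variants (1 = carrier).
    cases = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
             [1, 0, 0, 0]] + [[0, 0, 0, 0]] * 5
    controls = [[0, 0, 0, 0]] * 10
    for n_case, n_ctrl in zip(carriers_per_variant(cases),
                              carriers_per_variant(controls)):
        print("single-point p =", round(fisher_one_sided(n_case, 10, n_ctrl, 10), 4))
    print("collapsed p =", round(fisher_one_sided(
        collapsed_carriers(cases), 10, collapsed_carriers(controls), 10), 4))
```

In this toy data set a variant seen in one case and no controls gives p = 0.5 on its own, whereas collapsing the four variants (five carriers among cases, none among controls) gives p of about 0.016.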

APPLICATIONS
The number of scientific publications has exploded!

DISCUSSION
Development of new study designs
Development of more effective methods to distinguish errors from low frequency and rare variants
Development of the most appropriate strategy to identify a given disease

DISCUSSION
Cost-benefit analysis
Whole genome sequencing is unlikely to be cost-effective, as it still presents huge challenges.
  → couple a reduction in costs with an increase in efficiency and accuracy.
  → make NGS platforms marketable, competitive and usable for clinical applications.

DISCUSSION
Validation analysis
Standards for NGS clinical genomics are required, for instance to validate test accuracy.
  → important downstream impact on patient diagnosis and management.

DISCUSSION
Current knowledge and research
Lack of knowledge about:
  what a SNP implies
  how to detect interactions between genes
  what influence gene expression has
  how genomic variation should be interpreted
  …
The more tests we run, the more knowledge we gain, and the more associations between phenotype and genome we can establish.

CONCLUSION
Multiple issues
  Study design
  Error handling
  Data interpretation
Enables a wide variety of applications

REFERENCES
A G Day-Williams, E Zeggini, The effect of Next-Generation Sequencing technology on complex trait research, Eur J Clin Invest 2011, Vol 41: 561-567.
http://videos.rennes.inria.fr/genopole/GenOuest-2010/ [online on 7th December 2013]
http://www.qiagen.com/products/applications/next-generation-sequencing/#Dataanalysis [online on 9th December 2013]
Figures
http://www.labtimes.org/labtimes/method/methods/2010_01.lasso
http://nextgenseek.com/2012/01/illumina-launches-a-new-faster-sequencer-hiseq-2500/
http://www.nucleics.com/DNA_sequencing_support/DNA-sequencing-dye-blobs.html
http://videos.rennes.inria.fr/genopole/GenOuest-2010/peterlongo_plateforme_2010_10_26.pdf
http://www.genomicslawreport.com/index.php/tag/illumina/
http://www.cancer.gov/cancertopics/understandingcancer/geneticvariation/page40