Metagenomics Assembly Hubert DENISE



Metagenomics assembly
Two main approaches:
- building a consensus ("overlap-layout-consensus")
- generating De Bruijn k-mer graphs

Genomics assembly: building a consensus
Based on 'word' overlap between reads.
reads:
  I_like_EBI_met
   _like_EBI_meta
     ike_EBI_metage
        _EBI_metagenom
          BI_metagenomic
           I_metagenomics
contig:
  I_like_EBI_metagenomics
[Figure: read depth along the contig, labelled from high to low]
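The word-overlap idea above can be sketched as a greedy merge: repeatedly join the two reads with the longest suffix/prefix overlap. This is only an illustrative sketch, not the method of any particular assembler; the function names and the `min_len` threshold are assumptions made for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlap remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining reads do not overlap: several contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = [
    "I_like_EBI_met", "ike_EBI_metage", "_EBI_metagenom",
    "I_metagenomics", "_like_EBI_meta", "BI_metagenomic",
]
print(greedy_assemble(reads))
```

With repeated sequences, the longest overlap is not always the correct one, which is why greedy merging alone is not sufficient on real data.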

Metagenomics assembly: building a consensus
Issues: read length and repeated sequences
[Figure: reads whose placement is ambiguous, marked "???"]

Genomics assembly: building a consensus
Practical solution: using coverage / read-depth information
Coverage: ratio between contigs (figure labels: 3, 1, 1, 1)
This allows the elimination of one of the possible assemblies.

Genomics assembly: building a consensus
Practical solution: using paired-end reads
Paired ends: distance information between sequences
This allows the identification of the correct assembly.

Genomics assembly: De Bruijn k-mer graphs
k-mers are generated by breaking reads into multiple overlapping words of fixed length (k).
Read: I_like_EBI_metagenomics
k=5: I_lik _like like_ ike_E ke_EB e_EBI _EBI_ EBI_m BI_me I_met _meta metag etage tagen ageno genom enomi nomic omics
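A minimal sketch of the k-mer step, assuming (as in the slides that follow) that each node is a k-mer and that consecutive k-mers of a read are linked by their (k-1)-character overlap. The names `kmers` and `kmer_graph` are illustrative, not taken from any tool.

```python
def kmers(seq, k):
    """All overlapping words of length k, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_graph(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in a read."""
    graph = {}
    for read in reads:
        words = kmers(read, k)
        for word in words:
            graph.setdefault(word, set())
        for left, right in zip(words, words[1:]):
            graph[left].add(right)  # left[1:] == right[:-1] by construction
    return graph


words = kmers("I_like_EBI_metagenomics", 5)
graph = kmer_graph(["I_like_EBI_metagenomics"], 5)
print(len(words), words[0], words[-1])
```

A read of length L yields L - k + 1 k-mers, so the 23-character example gives 19 words for k=5.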

Genomics assembly: using k-mers
Each node represents a 14-mer (k=14); links between nodes are 13-mer overlaps.
Branches in the graph represent partially overlapping sequences.
(T. Brown, 2012)

Genomics assembly: using k-mers
Single nucleotide variations cause k-long branches; they don't rejoin quickly.
(T. Brown, 2012)
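Why a branch is k nodes long can be checked directly: a single changed base is covered by exactly k overlapping k-mers, provided it sits at least k-1 bases from either end and none of those k-mers occurs elsewhere. The two sequences below are made-up examples differing at one position.

```python
def kmer_set(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def branch_sizes(ref, alt, k):
    """Number of k-mers private to each of two sequences."""
    ref_kmers, alt_kmers = kmer_set(ref, k), kmer_set(alt, k)
    return len(ref_kmers - alt_kmers), len(alt_kmers - ref_kmers)


ref = "ACGTTGCATGGACCTAGGCTA"
alt = "ACGTTGCATGTACCTAGGCTA"  # hypothetical variant: G -> T at position 11
print(branch_sizes(ref, alt, 5))
```

Each sequence contributes its own run of k private k-mers, which is what appears as a two-sided branch (a bubble) in the graph.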

Genomics assembly: De Bruijn k-mer graphs
Building the graph is demanding, but navigating through it is quick and memory efficient.
- branches: ambiguity in assembly
- short dead-end branches: low coverage
- bubbles: sequencing errors or polymorphism?
- converging and diverging paths: repeats
Therefore there is a need for biological knowledge and information from other sequences to fully reconstruct a genome.
(J.R. Miller et al., Genomics (2010))

Metagenomics assembly: what to use?
There are a number of (more or less metagenome-adapted) solutions out there:
- MetaVelvet, MetaIDBA and khmer "partition" the assembly De Bruijn graph into sections from different organisms, and then assemble those individually. This allows them to adjust coverage parameters "locally".
- Genovo uses a 'generative probabilistic model' to identify likely sequence reconstructions.
- Euler deals with repeats by identifying an Eulerian path (visiting every edge exactly once) in the De Bruijn graph.
- and SOAPdenovo (graph), Newbler (for 454, consensus), MetAMOS…
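The Eulerian-path idea can be illustrated with Hierholzer's algorithm. This is a generic textbook sketch, not the Euler assembler's actual implementation. It assumes the edge-centric form of the graph, where each k-mer is an edge from its (k-1)-character prefix to its (k-1)-character suffix, and it assumes the k-mers form a single connected path.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return the node sequence of a path using every directed edge once."""
    out_edges = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for u, v in edges:
        out_edges[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the node with one surplus outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            stack.append(out_edges[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(edges) + 1:
        raise ValueError("edges do not form a single Eulerian path")
    return path


def reconstruct(kmer_list):
    """Spell the sequence given by an Eulerian path through the k-mers."""
    edges = [(word[:-1], word[1:]) for word in kmer_list]
    path = eulerian_path(edges)
    return path[0] + "".join(node[-1] for node in path[1:])


text = "I_like_EBI_metagenomics"
words = [text[i:i + 5] for i in range(len(text) - 5 + 1)]
print(reconstruct(sorted(words)))  # input order of the k-mers does not matter
```

When a (k-1)-mer is repeated in the sequence, more than one Eulerian path can exist, so the reconstruction is no longer unique.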

Genomics assembly: choosing k-mer
Tools such as Velvet Advisor are available.
(Butler et al., Genome Res, 2009)

Judging genomics assembly
How to judge the better assembly in the absence of external information?
[Figure: assemblies obtained with parameters 1 and with parameters 2]
Measurements:
- number of contigs (1)
- length of contigs (2)
- nucleotides involved (1)
- N50: weighted median such that 50% of the entire assembly is contained in contigs equal to or larger than this value

Judging metagenomics assembly
[Figure: parameters 1 | parameters 2]
total length: 17, contigs: 7, N50 = 3
total length: 15.5, contigs: 5, N50 = 2
Therefore the assembly obtained with parameters 2 will be considered the best.
Calculating N50:
- order the sequences by decreasing length
- add lengths until 50% of the nucleotides is reached
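The two-step N50 recipe above translates almost line for line into code. A minimal sketch; the function name `n50` and the example contig lengths are made up for illustration and are not the values from the slide.

```python
def n50(lengths):
    """Length of the contig at which the running total reaches 50%."""
    ordered = sorted(lengths, reverse=True)  # step 1: decreasing length
    half = sum(ordered) / 2
    running = 0
    for length in ordered:                   # step 2: add until 50% reached
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


example_lengths = [5, 4, 3, 2, 1]  # hypothetical contig lengths
print(n50(example_lengths))
```

For the example, the total is 15, so the halfway mark of 7.5 is passed when the second contig (length 4) is added.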

Judging metagenomics assembly
For metagenomics, in addition to N50, we can also use the fact that sequences originate from different species:
- % GC varies between species (20 to 80%), so contigs from different species can be separated from each other.
- all predicted CDSs from a single contig should be annotated as being from the same species (using Blast, for example).
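The % GC criterion can be sketched as follows. The function names, the example contigs and the 50% cut-off are assumptions for illustration only; real binning tools combine GC content with other signals.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def split_by_gc(contigs, threshold=50.0):
    """Split contigs into (low GC, high GC) lists around a cut-off."""
    low = [c for c in contigs if gc_percent(c) < threshold]
    high = [c for c in contigs if gc_percent(c) >= threshold]
    return low, high


contigs = ["ATATTTAAGC", "GCGCGGCCAT", "AATTAATTAA"]  # made-up sequences
low, high = split_by_gc(contigs)
print([round(gc_percent(c), 1) for c in contigs])
```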

EBI Metagenomics currently does not perform assembly
Why?
- absence of reference genome
- short reads make chimaeras inevitable
What are the consequences of not performing assembly?
- cannot link taxonomy information to functional annotations
- cannot currently perform viral taxonomy analysis
EBI Metagenomics pipeline validation, e.g. re-analysis of Hess et al., Science (2011) 331:463

Public metagenomics portals
- Do not perform assembly but accept assembled data
- Perform assembly

Hubert DENISE