1
Metagenomics Assembly Hubert DENISE (hudenise@ebi.ac.uk)
2
Metagenomics assembly
2 main approaches:
- building a consensus (“overlap–layout–consensus”)
- generating De Bruijn k-mer graphs
3
Genomics assembly: building a consensus
Based on ‘word’ overlap
[Figure: the reads I_like_EBI_met, _like_EBI_meta, ike_EBI_metage, _EBI_metagenom, BI_metagenomic and I_metagenomics overlap to form the contig I_like_EBI_metagenomics; a read-depth scale runs from high to low.]
4
Metagenomics assembly: building a consensus
Issues: read length and repeated sequences
[Figure: ambiguous placements marked "???"]
5
Genomics assembly: building a consensus
Practical solution: using coverage / read-depth information
Coverage: ratio between contigs (figure labels: 3111)
Allows the elimination of one of the possible assemblies.
6
Genomics assembly: building a consensus
Practical solution: using paired-end reads
Paired ends: distance information between sequences
Allows the identification of the correct assembly.
7
Genomics assembly: De Bruijn k-mer graphs
k-mers are generated by breaking reads into multiple overlapping words of fixed length (k).
Example with k=5 for the read I_like_EBI_metagenomics:
I_lik _like like_ ike_E ke_EB e_EBI _EBI_ EBI_m BI_me I_met _meta metag etage tagen ageno genom enomi nomic omics
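The k-mer decomposition on this slide can be sketched in a few lines of Python. This is only an illustration: the function name kmers is an assumption made for the example and does not come from any assembler.

```python
def kmers(read, k):
    """Return every overlapping word of length k, in read order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


words = kmers("I_like_EBI_metagenomics", 5)
print(len(words))   # 19 words for a 23-character read (23 - 5 + 1)
print(words[:3])    # ['I_lik', '_like', 'like_']
```

A read of length L always yields L - k + 1 k-mers, so a larger k gives fewer, more specific words.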
8
Genomics assembly: using k-mers
Each node represents a 14-mer (k=14); links between nodes are 13-mer overlaps.
Branches in the graph represent partially overlapping sequences.
T. Brown, 2012
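As a rough sketch of the graph described on this slide, the Python below treats each k-mer as a node and links two k-mers when they are consecutive in a read, that is, when they overlap by k-1 characters. It reuses the toy reads from the consensus slide with k=5 instead of 14 to keep the output small. The names build_graph and walk are hypothetical, and the unbranched walk is a simplification: real assemblers must also resolve branches.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            # Consecutive k-mers share k-1 characters.
            graph[read[i:i + k]].add(read[i + 1:i + 1 + k])
    return graph


def walk(graph, start):
    """Follow the graph while each node has exactly one unvisited successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:          # stop instead of looping around a cycle
            break
        contig += nxt[-1]        # each step adds one new character
        seen.add(nxt)
        node = nxt
    return contig


reads = ["I_like_EBI_met", "_like_EBI_meta", "ike_EBI_metage",
         "_EBI_metagenom", "BI_metagenomic", "I_metagenomics"]
graph = build_graph(reads, 5)
print(walk(graph, "I_lik"))  # I_like_EBI_metagenomics
```

Because no 5-mer repeats in this toy sentence, the graph is a single unbranched path and the walk recovers the whole contig.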
9
Genomics assembly: using k-mers
Single nucleotide variations cause k-long branches; they don’t rejoin quickly.
T. Brown, 2012
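Why a single nucleotide variation produces a branch of length k can be checked on a toy example: every k-mer that spans the changed position differs, and there are k of them when the change is away from the sequence ends. The two sequences below are made up for the illustration, and kmer_set is a hypothetical helper.

```python
def kmer_set(seq, k):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


k = 4
reference = "ATGGCGTACCTTGA"   # made-up sequence
variant = "ATGGCGAACCTTGA"     # same sequence with T -> A at index 6

only_ref = kmer_set(reference, k) - kmer_set(variant, k)
only_var = kmer_set(variant, k) - kmer_set(reference, k)

print(sorted(only_ref))               # ['CGTA', 'GCGT', 'GTAC', 'TACC']
print(len(only_ref), len(only_var))   # 4 4
```

Each allele contributes its own run of k nodes, so the two paths stay apart for k steps before they can share a node again.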
10
Genomics assembly: De Bruijn k-mer graphs
Building the graph is demanding, but navigating through it is quick and memory efficient.
- branches: ambiguity in assembly
- short dead-end branches: low coverage
- bubbles: sequencing errors or polymorphism?
- converging and diverging paths: repeats
Biological knowledge and information from other sequences are therefore needed to fully reconstruct a genome.
J.R. Miller et al. / Genomics (2010)
11
Metagenomics assembly: what to use?
There are a number of (more or less metagenome-adapted) solutions out there:
- MetaVelvet, MetaIDBA and khmer “partition” the assembly de Bruijn graph into sections from different organisms, and then assemble those individually. This allows them to adjust coverage parameters “locally”.
- Genovo uses a 'generative probabilistic model' to identify likely sequence reconstructions.
- Euler deals with repeats by identifying an Eulerian path (visiting every edge exactly once) in the De Bruijn graph.
- and SOAPdenovo (graph), Newbler (for 454, consensus), MetAMOS…
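To make the Eulerian-path idea concrete, here is a minimal sketch using Hierholzer's algorithm on a toy graph in which each k-mer is an edge between its (k-1)-mer prefix and suffix. It is not the Euler assembler's actual implementation: the function name, the made-up read and the assumption that an Eulerian path exists are all illustrative.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a node list that uses every edge exactly once.

    Assumes such a path exists (Hierholzer's algorithm, no validation).
    """
    out = defaultdict(list)
    balance = defaultdict(int)        # out-degree minus in-degree
    for u, v in edges:
        out[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start where out-degree exceeds in-degree, else anywhere (a cycle).
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out[node]:
            stack.append(out[node].pop())   # follow an unused edge
        else:
            path.append(stack.pop())        # dead end: emit the node
    return path[::-1]


read, k = "ATGGCGTGCA", 3                   # made-up read
edges = [(read[i:i + k - 1], read[i + 1:i + k])
         for i in range(len(read) - k + 1)]
path = eulerian_path(edges)
sequence = path[0] + "".join(node[-1] for node in path[1:])
print(sequence)
```

Because the 2-mers TG and GC each occur twice in this read, more than one Eulerian path exists, and the printed sequence need not match the original read. This is the repeat ambiguity that coverage and paired-end information help to resolve.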
12
Genomics assembly: choosing k-mer
Tools such as Velvet Advisor ( http://dna.med.monash.edu.au/~torsten/velvet_advisor/ ) are available.
Butler et al., Genome Res, 2009
13
Judging genomics assembly
How to judge the better assembly in the absence of external information?
[Figure: assemblies obtained with parameters 1 and parameters 2]
Measurements:
- number of contigs (1)
- length of contigs (2)
- nucleotides involved (1)
- N50: weighted median such that 50% of the entire assembly is contained in contigs equal to or larger than this value
14
Judging metagenomics assembly
Calculating N50:
- order the sequences by decreasing length,
- add lengths until 50% of nucleotides is reached
parameters 1: total length: 17, contigs: 7, N50 = 3
parameters 2: total length: 15.5, contigs: 5, N50 = 2
Therefore the assembly obtained with parameters 2 will be considered the best.
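The two-step N50 recipe can be written as a short Python function. This is a sketch under the definition given on the measurements slide, where N50 is a contig length. The function name n50 and the contig lengths are invented for the example and are not the values from the figure.

```python
def n50(lengths):
    """Length of the contig at which the running total reaches 50% of the assembly."""
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):   # longest contigs first
        running += length
        if running >= half:
            return length


contigs = [8, 4, 3, 2, 2, 1]   # hypothetical contig lengths, total 20
print(n50(contigs))            # 4, because 8 + 4 = 12 reaches half of 20
```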
15
Judging metagenomics assembly
For metagenomics, in addition to N50, we can also use the fact that sequences originate from different species:
- % GC varies between species (20 to 80%), so contigs from different species can be separated from each other.
- all predicted CDSs from a single contig should be annotated as being from the same species (using Blast for example).
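The % GC criterion is easy to compute per contig. The sketch below uses made-up contig names and sequences, and gc_percent is a hypothetical helper. Deciding where to draw the line between species is left out, since the slides do not specify a method.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


contigs = {                       # made-up contigs
    "contig_1": "ATATTAAATTTAGATA",
    "contig_2": "GCGGCCGCTAGCGGCC",
    "contig_3": "ATTTAATACATTAAAT",
}
for name, seq in sorted(contigs.items(), key=lambda item: gc_percent(item[1])):
    print(name, round(gc_percent(seq), 1))
```

Contigs with very different GC percentages are unlikely to come from the same genome, although similar values alone do not prove a common origin.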
16
EBI Metagenomics currently does not perform assembly
Why?
- absence of reference genome
- short reads make chimaeras inevitable
What are the consequences of not performing assembly?
- cannot link taxonomy information to functional annotations
- cannot currently perform viral taxonomy analysis
EBI Metagenomics pipeline validation, e.g. re-analysis of Hess et al, Science (2011) 331:463
17
Public Metagenomics portals
- http://www.ebi.ac.uk/metagenomics/
- http://metagenomics.anl.gov/
- http://camera.calit2.net/
- http://img.jgi.doe.gov/
[Figure: the portals are grouped under "Do not perform assembly but accept assembled data" and "Perform assembly"]
18
Hubert DENISE (hudenise@ebi.ac.uk)