1
Annotating Metagenomes Using the SEED
Rob Edwards
Department of Computer Sciences, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
NSF/EU Cyberinfrastructure Meeting, Washington, DC
www.nmpdr.org
www.theseed.org
3
How much has been sequenced?
[Figure: number of known sequences by year, labeled with the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]
4
How much will be sequenced?
[Figure: projected sequencing scale, with labels: everybody in San Diego; everybody in USA; all cultured bacteria; 100 people; one genome from every species; most major microbial environments]
5
What do we want from annotations?
Consistent
Accurate
Available
Reliable
6
Consistent
7
The Importance of Consistency
Consistency: the same genes are connected to the same functional role
Enables communication
Required for most comparative genomics assays
8
hisA
FIG function: Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)
Other functions in RefSeq:
phosphoribosylformimino-5-aminoimidazole carboxamide
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase
N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5-phosphoribosyl)-4-imidazolecarboxamide isomerase
N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]
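To illustrate why these variant names cannot simply be reconciled by string cleanup, here is a small Python sketch. The names are the ones listed on the hisA slide (the truncated "ribotide..." entry is left out rather than guessed); the `normalize` rules (case-folding, dropping an "(EC ...)" tag, removing prime marks, collapsing spaces) are a hypothetical cleanup chosen for illustration, not anything the SEED or RefSeq is known to apply.

```python
import re

# Function strings from the hisA slide; the truncated "ribotide..." entry is omitted.
names = [
    "Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)",  # FIG
    "phosphoribosylformimino-5-aminoimidazole carboxamide",
    "phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase",
    "1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase",
    "N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5-phosphoribosyl)-4-imidazolecarboxamide isomerase",
    "N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase",
    "Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]",
]


def normalize(name: str) -> str:
    """Hypothetical cleanup: case-fold, drop an "(EC ...)" tag, drop primes, collapse spaces."""
    name = name.lower()
    name = re.sub(r"\s*\(ec [\d.\-]+\)", "", name)  # remove e.g. " (ec 5.3.1.16)"
    name = name.replace("'", "")                    # 5'-phospho -> 5-phospho
    return re.sub(r"\s+", " ", name).strip()


print(len(set(names)))                     # 7 distinct strings as written
print(len({normalize(n) for n in names}))  # 5 distinct after cleanup
```

Cleanup merges only the trivial variants (the FIG name with its lowercase RefSeq twin, and the two forms differing by prime marks); genuine synonyms such as "Phosphoribosyl isomerase A" stay distinct, which is the kind of gap that attaching curated functional roles to protein families is meant to close.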
9
Measuring Consistency
Define a set of protein families such that each family contains genes playing the same function
Attach functional roles to protein families
Measure the consistency of the annotations made to genes within each family:
1. "Consistency" is the chance that two proteins from the same family have the same function
2. Evaluate both families and functions
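The consistency measure can be sketched in Python as follows. This is a minimal illustration, assuming families are given as a mapping from a hypothetical family ID to the function strings of its member proteins, and that "same function" means identical strings. `family_consistency` gives the chance that two distinct members of one family share a function; `database_consistency` pools pairs over all families; `function_consistency` is one possible reading of "evaluate functions" (an assumption, not necessarily the SEED's own definition).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Hypothetical input shape: family ID -> function strings of its member proteins.
Families = Dict[str, List[str]]


def family_consistency(annotations: List[str]) -> Optional[float]:
    """Chance that two distinct proteins from one family carry the same function string."""
    n = len(annotations)
    if n < 2:
        return None  # a singleton family has no pairs to compare
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(annotations).values())
    all_pairs = n * (n - 1) // 2
    return same_pairs / all_pairs


def database_consistency(families: Families) -> Optional[float]:
    """Pool unordered pairs over all families, so larger families weigh more."""
    same_pairs = all_pairs = 0
    for annotations in families.values():
        n = len(annotations)
        same_pairs += sum(c * (c - 1) // 2 for c in Counter(annotations).values())
        all_pairs += n * (n - 1) // 2
    return same_pairs / all_pairs if all_pairs else None


def function_consistency(families: Families) -> Dict[str, float]:
    """For each function: chance that another member of the same family shares it."""
    same: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for annotations in families.values():
        n = len(annotations)
        if n < 2:
            continue  # singletons have no partners
        for func, c in Counter(annotations).items():
            same[func] += c * (c - 1)   # ordered pairs (p, q), p != q, both annotated func
            total[func] += c * (n - 1)  # ordered pairs starting at a protein annotated func
    return {func: same[func] / total[func] for func in total}


# Hypothetical example families.
example = {
    "FAM1": ["HisA", "HisA", "HisA", "hypothetical protein"],
    "FAM2": ["HisB", "HisB"],
}
for fam, annotations in example.items():
    print(fam, family_consistency(annotations))  # FAM1 0.5, then FAM2 1.0
print(database_consistency(example))  # 4/7, about 0.571
print(function_consistency(example))
```

A family annotated entirely as "hypothetical protein" scores 1.0 under this measure, which is why consistency has to be paired with a separate accuracy check.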
10
Consistency among databases
11
Accurate
12
How to measure accuracy
If everything were called “hypothetical protein”, the database would be 100% consistent
Need to measure accuracy (specificity) as well as consistency
Sample 100 proteins at random from the “curated” set (i.e., annotations believed to be correct)
Manually inspect the annotations to score correctness
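The sampling step can be sketched in Python. This is a minimal illustration under stated assumptions: the curated protein IDs are hypothetical placeholders, the correct/incorrect verdicts are made-up stand-ins for the manual inspection (not real results), and the Wilson score interval is an addition here, included only to show how much uncertainty a 100-protein sample leaves.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def draw_sample(curated_ids: Sequence[str], k: int = 100, seed: int = 0) -> List[str]:
    """Pick k protein IDs uniformly at random, without replacement, from the curated set."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn and re-inspected
    return rng.sample(list(curated_ids), k)


def score_accuracy(verdicts: Dict[str, bool], z: float = 1.96) -> Tuple[float, float, float]:
    """Fraction judged correct, plus an approximate 95% Wilson score interval."""
    n = len(verdicts)
    p = sum(verdicts.values()) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half, centre + half


# Hypothetical curated set and placeholder verdicts standing in for manual inspection.
curated = [f"protein_{i}" for i in range(5000)]
sample = draw_sample(curated)
verdicts = {pid: (i % 10 != 0) for i, pid in enumerate(sample)}  # 90 of 100 marked correct
p, lo, hi = score_accuracy(verdicts)
print(f"accuracy {p:.2f}, 95% interval {lo:.3f}-{hi:.3f}")
```

With 90 of 100 judged correct, the interval runs from about 0.83 to 0.94, so a 100-protein sample pins accuracy down only to within several percentage points.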
13
Available
14
http://metagenomics.theseed.org
Free service
User registration/login
Free to upload sequences in several formats
Automatically annotates sequences
Download in several formats
Complete genomes too: http://www.nmpdr.org/anno-server
Soon to come: plasmids, phages, other short genomes
15
Metagenome Metabolic Reconstruction
16
Metabolic potential in environments
17
Phylogenomics
18
Comparing Metagenomes to Genomes (or other metagenomes!)
19
Reliable (Believable)
20
Metabolic potential in environments
21
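From sequences to environments
[Figure: CDA plot (axes labeled 60.2% and 21.7%) separating metagenomes from mine, saltern, marine, microbialites, coral, fish, animals, and freshwater environments, with functional categories sulfur, respiration, capsule, motility, membrane transport, stress, signaling, phosphorus, and RNA]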
22
What do we want from annotations?
Consistent
Accurate
Available
Reliable
When do we want it? NOW
23
Acknowledgements
Environmental Genomics: Forest Rohwer, Rohwer lab members, all the labs that provided sequence
Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen
Statistics: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito
FIG: Ross Overbeek, Veronika Vonstein, Annotators
25
Subsystems make up metabolism
[Figure credit: Wikipedia, Metabolism portal, http://en.wikipedia.org/wiki/Portal:Metabolism]