Annotating Metagenomes Using the SEED Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division,


1 Annotating Metagenomes Using the SEED. Rob Edwards, Department of Computer Sciences, San Diego State University; Mathematics and Computer Sciences Division, Argonne National Laboratory. NSF/EU Cyberinfrastructure Meeting, Washington, DC. www.nmpdr.org, www.theseed.org

2

3 How much has been sequenced? [Figure: number of known sequences by year. Labels: First bacterial genome; 100 bacterial genomes; 1,000 bacterial genomes; Environmental sequencing.]

4 How much will be sequenced? [Figure labels: 100 people; Everybody in San Diego; Everybody in USA; All cultured Bacteria; One genome from every species; Most major microbial environments.]

5 What do we want from annotations? Consistent, Accurate, Available, Reliable

6 Consistent

7 The Importance of Consistency. Consistency: the same genes are connected to the same functional role. Enables communication. Required for most comparative genomics assays.

8 hisA
FIG function: Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)
Other functions in RefSeq:
  phosphoribosylformimino-5-aminoimidazole carboxamide
  phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
  phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
  1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino]imidazole-4-carboxamide isomerase
  N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5-phosphoribosyl)-4-imidazolecarboxamide isomerase
  N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
  Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino]imidazole-4-carboxamide isomerase]

9 Measuring Consistency
Define a set of protein families such that each family contains genes that perform the same function.
Attach functional roles to protein families.
Measure the consistency of the annotations made to genes within each family:
1. "Consistency" is the odds that two proteins from the same family have the same function.
2. Evaluate both families and functions.
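The pairwise measure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SEED's own code: it reads "odds" as the probability that a randomly chosen pair of proteins from one family carries the same annotation, it treats two annotations as "the same" only when their text matches exactly, and the names `family_consistency`, `database_consistency`, and the toy family data are all hypothetical.

```python
from collections import Counter
from math import comb


def family_consistency(functions):
    """Fraction of protein pairs in one family that share an annotation.

    `functions` is a list of annotation strings, one per protein.
    Returns None when the family has fewer than two proteins (no pairs).
    """
    n = len(functions)
    if n < 2:
        return None
    # Proteins sharing an annotation string form comb(count, 2) matching pairs.
    same_pairs = sum(comb(c, 2) for c in Counter(functions).values())
    return same_pairs / comb(n, 2)


def database_consistency(families):
    """Pool all within-family pairs across a dict of family -> annotations."""
    same_pairs = 0
    total_pairs = 0
    for functions in families.values():
        same_pairs += sum(comb(c, 2) for c in Counter(functions).values())
        total_pairs += comb(len(functions), 2)
    return same_pairs / total_pairs if total_pairs else None


# Toy data (hypothetical): one family with a spelling variant, one clean family.
families = {
    "fam1": ["isomerase A", "isomerase A", "isomerase A", "Isomerase-A"],
    "fam2": ["kinase B", "kinase B"],
}
print(family_consistency(families["fam1"]))  # 3 matching pairs out of 6
print(database_consistency(families))        # 4 matching pairs out of 7
```

Because the comparison is an exact string match, spelling variants of one function (as in the hisA slide) count as disagreements, which is the kind of inconsistency the measure is meant to expose.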

10 Consistency among databases

11 Accurate

12 How to measure accuracy
If everything were called “hypothetical protein”, the database would be 100% consistent.
Need to measure accuracy (specificity) as well as consistency.
Sample 100 proteins at random from the “curated” set (i.e. those believed to be correct).
Manually inspect annotations to score correctness.
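The sampling step above might look like the following Python sketch. It is an assumption-laden illustration rather than the procedure actually used: the function names, the protein identifiers, and the representation of reviewer scores as booleans are all hypothetical, and the manual inspection itself still happens outside the code.

```python
import random


def sample_for_review(curated_ids, k=100, seed=None):
    """Draw up to k distinct protein ids at random from the curated set."""
    rng = random.Random(seed)   # a fixed seed makes the draw repeatable
    ids = sorted(curated_ids)   # stable order so the seed fully determines the sample
    return rng.sample(ids, min(k, len(ids)))


def accuracy(scores):
    """Fraction of reviewed annotations judged correct.

    `scores` maps protein id -> True/False as assigned by a human reviewer.
    """
    if not scores:
        raise ValueError("no reviewed annotations")
    return sum(scores.values()) / len(scores)


# Hypothetical curated set of 500 proteins; a reviewer would score the sample by hand.
curated = [f"protein_{i}" for i in range(500)]
to_review = sample_for_review(curated, k=100, seed=1)
print(len(to_review))

# Hypothetical reviewer verdicts for four proteins: three correct, one wrong.
print(accuracy({"p1": True, "p2": True, "p3": False, "p4": True}))
```

Sampling without replacement keeps every reviewed protein distinct, so the resulting fraction is a plain estimate of accuracy over the curated set.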

13 Available

14 http://metagenomics.theseed.org
Free service
User registration/log in
Free to upload sequences in several formats
Automatically annotates sequences
Download in several formats
Complete genomes too: http://www.nmpdr.org/anno-server
Soon to come: plasmids, phages, other short genomes

15 Metagenome Metabolic Reconstruction

16 Metabolic potential in environments

17 Phylogenomics

18 Comparing Metagenomes to Genomes (or other metagenomes!)

19 Reliable (Believable)

20 Metabolic potential in environments

21 From sequences to environments [Figure: plot with axes labeled CDA 60.2% and CDA 21.7%. Environment labels: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater. Function labels: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA.]

22 What do we want from annotations? Consistent, Accurate, Available, Reliable. When do we want it? NOW

23 Acknowledgements
Environmental Genomics: Forest Rohwer; Rohwer lab members; all the labs that provided sequence
Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen
Statistics: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito
FIG: Ross Overbeek, Veronika Vonstein, annotators

24

25 Subsystems make up metabolism. Wikipedia Metabolism: http://en.wikipedia.org/wiki/Portal:Metabolism

