The SEED Family


1 The SEED Family www.nmpdr.org www.theseed.org

2 How much has been sequenced? [Plot: number of known sequences by year; marked milestones: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]

3 Annotations vs. sequences

4 Subsystems Make Up Metabolism (Wikipedia Metabolism portal: http://en.wikipedia.org/wiki/Portal:Metabolism)

5 Subsystem spreadsheet (conceptually)
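
As a rough illustration of the spreadsheet idea, a subsystem can be pictured as a table with one row per genome and one column per functional role, where each cell holds the genes that fill that role. The sketch below assumes that layout. The role names are the standard serine biosynthesis enzymes, but the genome names and gene identifiers are made-up placeholders, not SEED data.

```python
# Conceptual subsystem spreadsheet: rows are genomes, columns are functional roles.
# Genome names and gene IDs are hypothetical placeholders.
ROLES = [
    "D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95)",
    "Phosphoserine aminotransferase (EC 2.6.1.52)",
    "Phosphoserine phosphatase (EC 3.1.3.3)",
]

spreadsheet = {
    "genome_A": {
        ROLES[0]: ["geneA_0012"],
        ROLES[1]: ["geneA_0340"],
        ROLES[2]: ["geneA_0977"],
    },
    "genome_B": {
        ROLES[0]: ["geneB_0101"],
        ROLES[1]: [],  # no gene assigned to this role yet
        ROLES[2]: ["geneB_0555", "geneB_0556"],
    },
}


def empty_cells(sheet, roles):
    """Return (genome, role) pairs whose cell has no gene assigned."""
    return [(g, r) for g, row in sheet.items() for r in roles if not row.get(r)]


def print_sheet(sheet, roles):
    """Print one line per genome with the gene count in each role column."""
    for genome, row in sheet.items():
        counts = [str(len(row.get(r, []))) for r in roles]
        print(genome, *counts, sep="\t")


print_sheet(spreadsheet, ROLES)
print(empty_cells(spreadsheet, ROLES))
```

Empty cells are the interesting ones: they mark roles that a genome might be expected to have but that have not been annotated.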

6 Three level “hierarchy”
● Amino Acids and Derivatives
– Alanine, serine, and glycine
    Serine Biosynthesis
● Amino Acids and Derivatives
– Lysine, threonine, methionine, and cysteine
    Methionine Biosynthesis
Make your own subsystems!
Over 1,000 Subsystems
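
One way to picture the three levels is as a nested mapping: category, then subcategory, then a list of subsystems. The sketch below uses only the entries named on the slide. The function names and the "My Subsystem" entry are hypothetical and are not part of any SEED API.

```python
# Three-level classification: category -> subcategory -> list of subsystems.
hierarchy = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def find_subsystem(tree, name):
    """Return (category, subcategory, subsystem) for a subsystem, or None."""
    for category, subcategories in tree.items():
        for subcategory, subsystems in subcategories.items():
            if name in subsystems:
                return (category, subcategory, name)
    return None


def add_subsystem(tree, category, subcategory, name):
    """Register a new subsystem, creating the upper levels if needed."""
    subsystems = tree.setdefault(category, {}).setdefault(subcategory, [])
    if name not in subsystems:
        subsystems.append(name)


# "Make your own subsystems!": add a hypothetical user-defined subsystem.
add_subsystem(hierarchy, "Amino Acids and Derivatives",
              "Alanine, serine, and glycine", "My Subsystem")
print(find_subsystem(hierarchy, "Methionine Biosynthesis"))
```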

7

8 Annotation of Complete Genomes
● Automated, user-originated processing
● Takes 1-7 hours depending on size and complexity of the genome
● ~2,000 external submissions, including hundreds of genomes not yet publicly released
● Reannotation of >500 genomes complete
● 1,000 users, 200 organizations, 25 countries
http://rast.nmpdr.org/

9 The annotation process (complete genomes)
● Find the phylogenetic neighborhood of your genome
● Look for proteins that related organisms have
– Core proteins
– Subset of all subsystems
● Use those calls as a training set for CRITICA/Glimmer
– Intrinsic training set!

10 This one’s for Gary

11 Automatic metabolic reconstruction
● Subsystem, GO, and KEGG connections
– KEGG EC numbers
– KEGG reaction numbers
– SEED reaction numbers (Chris Henry)
● Metabolic flux models
– Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)
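
Flux balance analysis (FBA) works from a stoichiometric matrix S with one row per metabolite and one column per reaction, and looks for flux vectors v that satisfy S·v = 0. The sketch below shows how such a matrix can be assembled from a reaction list. The three toy reactions and their identifiers are invented for illustration; this is not the SEED model-generation pipeline.

```python
# Toy reactions: metabolite -> stoichiometric coefficient (negative = consumed).
# Reaction IDs and metabolites are hypothetical.
reactions = {
    "R1_uptake": {"A": 1},             # (outside) -> A
    "R2_convert": {"A": -1, "B": 1},   # A -> B
    "R3_export": {"B": -1},            # B -> (outside)
}


def stoichiometric_matrix(rxns):
    """Build S with one row per metabolite and one column per reaction."""
    metabolites = sorted({m for coeffs in rxns.values() for m in coeffs})
    reaction_ids = list(rxns)
    matrix = [[rxns[r].get(m, 0) for r in reaction_ids] for m in metabolites]
    return metabolites, reaction_ids, matrix


def is_steady_state(matrix, fluxes):
    """Check S·v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(sum(s * v for s, v in zip(row, fluxes)) == 0 for row in matrix)


metabolites, reaction_ids, S = stoichiometric_matrix(reactions)
for name, row in zip(metabolites, S):
    print(name, row)
print(is_steady_state(S, [2, 2, 2]))
```

A full FBA run would additionally put bounds on each flux and maximize an objective (typically a biomass reaction) with a linear programming solver; that step is left out here.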

12

13 The Populated Subsystem

14 Automatically compare metabolic reconstructions
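
If each metabolic reconstruction is reduced to the set of functional roles found in a genome, a comparison comes down to set operations. The sketch below assumes that representation. The role sets assigned to the two genomes are illustrative only and do not describe real organisms.

```python
def compare_reconstructions(roles_a, roles_b):
    """Split functional roles into shared, only in A, and only in B."""
    a, b = set(roles_a), set(roles_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Hypothetical role sets for two genomes.
genome_a = {
    "D-3-phosphoglycerate dehydrogenase",
    "Phosphoserine aminotransferase",
    "Phosphoserine phosphatase",
}
genome_b = {
    "D-3-phosphoglycerate dehydrogenase",
    "Phosphoserine phosphatase",
    "Methionine synthase",
}

result = compare_reconstructions(genome_a, genome_b)
for key, roles in result.items():
    print(key, roles)
```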

15 Find and suggest candidate functions
● Rapidly correct missing annotations
● Add more members to subsystems
Improves future genome annotations! (especially with new subsystems)
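
A very simple way to flag possibly missing annotations is to look for roles that most related genomes have but the target genome lacks. The sketch below is that heuristic only, under the assumption that each genome is represented as a set of role names. The 0.8 threshold and the function name are arbitrary choices for illustration, and this is not presented as the method the SEED uses.

```python
from collections import Counter


def suggest_missing_roles(target_roles, neighbor_role_sets, min_fraction=0.8):
    """Roles absent from the target but present in >= min_fraction of neighbors."""
    counts = Counter(role for roles in neighbor_role_sets for role in set(roles))
    needed = min_fraction * len(neighbor_role_sets)
    return sorted(
        role for role, n in counts.items()
        if n >= needed and role not in target_roles
    )


# Hypothetical target genome and three related genomes.
target = {"D-3-phosphoglycerate dehydrogenase", "Phosphoserine phosphatase"}
neighbors = [
    {"D-3-phosphoglycerate dehydrogenase", "Phosphoserine aminotransferase",
     "Phosphoserine phosphatase"},
    {"D-3-phosphoglycerate dehydrogenase", "Phosphoserine aminotransferase",
     "Methionine synthase"},
    {"Phosphoserine aminotransferase", "Phosphoserine phosphatase",
     "Methionine synthase"},
]

candidates = suggest_missing_roles(target, neighbors)
print(candidates)
```

Each suggested role is only a candidate: a curator or a similarity search would still have to find the gene that actually carries out the function.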

16 The Live ASM Test, Philadelphia, 2009
● 10 genomes submitted on Thursday at 6 pm
● First annotation complete before 8 am Friday
● Remaining annotations completed Friday before noon
● (there were others in the pipeline too!)
● Presentation ASM 2009 Tuesday, 8 pm

17 Subsystems coverage of sequenced Archaea

18 Prophages
● PHANTOME: Mya Breitbart, Matt Sullivan, Jeff Elhai, Rob Edwards (NSF)
● Haloferax sulfurifontis prophage

19 Comparing complete genomes to metagenomes
● Metagenomics RAST has 300 public metagenomes
● Compared using tblastx

20 Human Poop

21 High Salinity Salterns, San Diego, July 2004. Thanks: Nick Celms, Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer

22 [Figure panels: low salinity salterns and high salinity salterns; July 2004 and Nov 2005]

23 The metagenomics RAST server

24 Automated Processing

25 Summary View www.nmpdr.org www.theseed.org

26 Metagenomics Tools Annotation & Subsystems www.nmpdr.org www.theseed.org

27 Metagenomics Tools Annotation & KEGG maps

28 Metagenomics Tools Recruitment Plots

29 Metagenomics Tools Phylogenetic Reconstruction

30 Metagenomics Tools Comparative Tools

31 Computational Requirements: ~19 hours of compute per input megabyte [Plot: hours of compute time vs. input size (MB)]

32 How much so far
● Total: 3,348 metagenomes; 318,630,847 sequences; 82,945,869,083 bp (83 Gbp)
● Largest metagenome: 729 Mbp, 11,719,618 reads
● Public: 393 metagenomes; 54,306,078 sequences; 22,160,008,455 bp (22 Gbp)
● Compute time (on a single CPU): 1,575,971 hours = 65,665 days = 179 years
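
The compute-time figures can be checked with a few lines of arithmetic. The sketch below uses only the totals quoted on the slide; the one assumption is that a base pair takes roughly one byte of input (sequence headers ignored), with 1 MB taken as 10^6 bytes.

```python
# Totals as quoted on the slide.
total_bp = 82_945_869_083
compute_hours = 1_575_971

days = compute_hours / 24
years = days / 365

# Assumption: about one byte of input per base pair, 1 MB = 10**6 bytes.
input_mb = total_bp / 1_000_000
hours_per_mb = compute_hours / input_mb

print(f"{days:,.0f} days, {years:.1f} years")
print(f"{hours_per_mb:.1f} compute hours per input MB")
```

Under that assumption the totals work out to about 19 hours per megabyte, which agrees with the per-megabyte estimate quoted for the computational requirements.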

33 Lots of computers, no pattern

34 Does it work?

35 Lots of sequences, all pyrosequencing

36 Metagenomics Tools Functional Heat Maps

37 From Sequences To Environments (Dinsdale et al., Nature 2008) [Figure: plot with axes labeled CDA 60.2% and CDA 21.7%; functional categories: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA; environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]

38 BACK!

