Published by Andrew Lamb. Modified over 8 years ago.
1
The SEED Family
www.nmpdr.org www.theseed.org
2
How much has been sequenced?
[Figure: number of known sequences by year, with milestones marked: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]
3
Annotations vs. sequences
4
Subsystems Make Up Metabolism
Wikipedia Metabolism portal: http://en.wikipedia.org/wiki/Portal:Metabolism
5
Subsystem spreadsheet (conceptually)
6
Three-level “hierarchy”
● Amino Acids and Derivatives
–Alanine, serine, and glycine
  –Serine Biosynthesis
● Amino Acids and Derivatives
–Lysine, threonine, methionine, and cysteine
  –Methionine Biosynthesis
Make your own subsystems!
Over 1,000 Subsystems
8
Annotation of Complete Genomes
● Automated, user-originated processing
● Takes 1-7 hours depending on the size and complexity of the genome
● ~2,000 external submissions, including hundreds of genomes not yet publicly released
● Reannotation of >500 genomes complete
● 1,000 users, 200 organizations, 25 countries
http://rast.nmpdr.org/
9
The annotation process (complete genomes)
● Find the phylogenetic neighborhood of your genome
● Look for proteins that related organisms have
–Core proteins
–Subset of all subsystems
● Use those calls as a training set for CRITICA/Glimmer
–Intrinsic training set!
10
This one’s for Gary
11
Automatic metabolic reconstruction
● Subsystem, GO, and KEGG connections
–KEGG EC numbers
–KEGG reaction numbers
–SEED reaction numbers (Chris Henry)
● Metabolic flux models
–Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)
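To make "FBA matrices" concrete: flux balance analysis works on a stoichiometric matrix with one row per metabolite and one column per reaction. The sketch below builds such a matrix from a reaction list. The reactions, metabolite names, and the function name `stoichiometric_matrix` are hypothetical and chosen only for illustration; this is not the SEED pipeline's code.

```python
# Toy reactions (hypothetical, for illustration only). Each reaction maps
# metabolite -> stoichiometric coefficient; negative means consumed.
reactions = {
    "R1": {"A": -1, "B": 1},   # A -> B
    "R2": {"B": -1, "C": 2},   # B -> 2 C
    "R3": {"A": -1, "C": 1},   # A -> C
}

def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S).

    S[i][j] is the coefficient of metabolite i in reaction j,
    or 0 when the metabolite does not take part in that reaction.
    """
    reaction_ids = sorted(reactions)
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    S = [[reactions[r].get(m, 0) for r in reaction_ids] for m in metabolites]
    return metabolites, reaction_ids, S

metabolites, reaction_ids, S = stoichiometric_matrix(reactions)
for name, row in zip(metabolites, S):
    print(name, row)
```

A flux model would then look for flux vectors v with S·v = 0 under bounds and an objective; that optimization step is left out here.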
13
The Populated Subsystem
14
Automatically compare metabolic reconstructions
15
Find and suggest candidate functions
● Rapidly correct missing annotations
● Add more members to subsystems
Improves future genome annotations! (especially with new subsystems)
16
The Live ASM Test
Philadelphia, 2009
● 10 genomes submitted on Thursday at 6 pm
● First annotation complete before 8 am Friday
● Remaining annotations completed Friday before noon
● (there were others in the pipeline too!)
● Presentation: ASM 2009, Tuesday, 8 pm
17
Subsystems coverage of sequenced Archaea
18
Prophages
PHANTOME: Mya Breitbart, Matt Sullivan, Jeff Elhai, Rob Edwards (NSF)
Haloferax sulfurifontis prophage
19
Comparing complete genomes to metagenomes
● Metagenomics RAST has 300 public metagenomes
● Compared using tblastx
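tblastx compares nucleotide sequences at the protein level by translating both query and subject in all six reading frames. The six-frame translation step can be sketched as follows. This is a minimal illustration using the standard genetic code, not BLAST's implementation, and the function names are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frames(seq):
    """Return translations for frames +1, +2, +3, then -1, -2, -3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(s[offset:]) for s in (seq, rc) for offset in range(3)]

print(six_frames("ATGGCCTAA"))
```

Working in protein space lets distantly related sequences match even when their nucleotide sequences have diverged, at the cost of six translations per sequence on each side of the comparison.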
20
Human Poop
21
High Salinity Salterns
San Diego, July 2004
Thanks: Nick Celms, Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer
22
Low salinity salterns, High salinity salterns
July 2004, Nov 2005
23
The metagenomics RAST server
24
Automated Processing
25
Summary View
26
Metagenomics Tools: Annotation & Subsystems
27
Metagenomics Tools: Annotation & KEGG maps
28
Metagenomics Tools: Recruitment Plots
29
Metagenomics Tools: Phylogenetic Reconstruction
30
Metagenomics Tools: Comparative Tools
31
Computational Requirements
[Figure: hours of compute time vs. input size (MB)]
~19 hours of compute per input megabyte
32
How much so far
● Total: 3,348 metagenomes
–318,630,847 sequences
–82,945,869,083 bp (83 Gbp)
● Largest metagenome: 729 Mbp, 11,719,618 reads
● Public: 393 metagenomes
–54,306,078 sequences
–22,160,008,455 bp (22 Gbp)
● Compute time (on a single CPU): 1,575,971 hours = 65,665 days = 179 years
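The compute-time figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that one megabyte of input corresponds to roughly one million base pairs (one byte per base, ignoring FASTA headers); that assumption and the function name `estimate_cpu_hours` are not from the slides.

```python
# Figures taken from the slides.
TOTAL_BP = 82_945_869_083      # total base pairs processed
TOTAL_CPU_HOURS = 1_575_971    # single-CPU compute time
HOURS_PER_MB = 19              # ~19 hours of compute per input megabyte

def estimate_cpu_hours(input_bp, hours_per_mb=HOURS_PER_MB):
    """Estimate single-CPU hours for an input of input_bp base pairs.

    Assumption (not from the slides): 1 MB of input ~ 1,000,000 bp.
    """
    return input_bp / 1_000_000 * hours_per_mb

days = TOTAL_CPU_HOURS / 24
years = days / 365
implied_rate = TOTAL_CPU_HOURS / (TOTAL_BP / 1_000_000)

print(f"{days:,.0f} days, {years:.1f} years")
print(f"implied rate: {implied_rate:.2f} hours per MB")
print(f"largest metagenome (729 Mbp): {estimate_cpu_hours(729_000_000):,.0f} hours")
```

Under that assumption the totals agree with the ~19 hours per megabyte rate, and the largest metagenome alone would account for more than a year of single-CPU time.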
33
Lots of computers, no pattern
34
Does it work?
35
Lots of sequences, all pyrosequencing
36
Metagenomics Tools: Functional Heat Maps
37
From Sequences To Environments
[Figure: plot with axes CDA 60.2% and CDA 21.7%; functional categories: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA; environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]
Dinsdale et al., Nature 2008
38
BACK!