High Throughput Computational Sequence Analysis Rob Edwards Argonne National Laboratory San Diego State University.


1 High Throughput Computational Sequence Analysis Rob Edwards redwards@salmonella.org Argonne National Laboratory San Diego State University

2 How much has been sequenced [Figure: number of known sequences by year; labels: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]

3 How much will be sequenced [Figure labels: 100 people; everybody in San Diego; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments]

4 High Performance Computing

5 TeraGrid

6 The Teragrid National Resource

7 Life Sciences Gateway to TeraGrid

8 Subsystems

9 Subsystems make up metabolism [Image: Wikipedia Metabolism portal, http://en.wikipedia.org/wiki/Portal:Metabolism]

10 Subsystems are not just metabolism
–Enzyme complex (image: http://aig.cs.man.ac.uk/gallery/Utopia/)
–Cell Machinery (image: http://webdeptos.uma.es/)
–Cell Processes (image: http://www.brown.edu/)

11 http://www.theseed.org

12

13 Growth in generation of subsystems

14 Microbial Genomics Annotation Platform
Goal 1: Automate the generation of high quality annotations by leveraging the information contained in SubSystems and FIGfams.
Goal 2: Minimize turnaround time. Initial target 48 hours

15 Automated process consisting of:
–Gene calling
–Initial annotation of function
–Initial metabolic reconstruction
Process takes 1-7 hours depending on size and complexity of the genome
~20 genomes per day
Password protected, secure, private
Release to public databases if required
Freely available annotation service
http://www.nmpdr.org/anno-server/index48.cgi
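To make the "gene calling" step concrete, here is a minimal sketch of the simplest possible approach: scanning the forward strand for open reading frames (an ATG followed in-frame by a stop codon). This is a toy stand-in, not the method used by the annotation service; the function name `find_orfs` and the `min_codons` parameter are hypothetical, and a real gene caller would also scan the reverse strand and use statistical models.

```python
# Toy open-reading-frame (ORF) scan, forward strand only.
# Hypothetical illustration of "gene calling"; not the service's algorithm.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs, 0-based half-open, for ATG...stop ORFs.

    The codon count used for the min_codons filter includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    for start, end in find_orfs("CCATGAAATTTTAGCC"):
        print(start, end)
```

Each reported interval could then be handed to the later steps (function annotation, metabolic reconstruction) as a candidate gene.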

16 Some estimate of annotation quality

17 Evaluation / Viewing

18 Download results
We provide a number of export formats:
–GenBank, FASTA, GFF3, Excel
–Can easily be extended to all formats supported by BioPerl
Genomes can be deleted by the user at any time (we keep them for max. 120 days)
Genomes can be directly imported into the SEED if the user wishes
All genomes are password protected
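As an illustration of two of the export formats listed above, the following sketch writes one annotated feature as a GFF3 line and one sequence as a FASTA record. It is an assumption-laden example, not the service's exporter: the helper names (`gff3_line`, `fasta_record`), the `source` value "example", and the sample identifiers are all hypothetical. The nine tab-separated columns and the 1-based inclusive coordinates follow the GFF3 format.

```python
# Hypothetical helpers showing the shape of GFF3 and FASTA output.
# They are not part of the SEED or of any annotation-server API.

def _escape_gff3(value):
    """Percent-encode characters that are reserved in GFF3 attribute values."""
    for char, code in (("%", "%25"), (";", "%3B"), ("=", "%3D"),
                       ("&", "%26"), (",", "%2C"), ("\t", "%09")):
        value = value.replace(char, code)
    return value


def gff3_line(seqid, start, end, strand, feature_id, product,
              source="example", ftype="CDS", phase="0"):
    """Format one feature as a GFF3 line (start/end are 1-based, inclusive)."""
    attrs = f"ID={_escape_gff3(feature_id)};product={_escape_gff3(product)}"
    columns = [seqid, source, ftype, str(start), str(end),
               ".", strand, phase, attrs]
    return "\t".join(columns)


def fasta_record(seq_id, sequence, width=60):
    """Format one sequence as a FASTA record wrapped at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{seq_id}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print("##gff-version 3")
    print(gff3_line("contig1", 3, 14, "+", "gene1", "hypothetical protein"))
    print(fasta_record("gene1", "ATGAAATTTTAG"), end="")
```

Keeping each format behind its own small function is one way an exporter could be extended to further formats, in the spirit of the BioPerl remark on the slide.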

19 Metagenomics SEED

20 http://metagenomics.theseed.org

21 Metagenome Metabolic Reconstruction

22 Starch utilization in cow rumens

23 Metabolic potential in environments

24 Too much will be sequenced [Figure labels: 100 people; everybody in San Diego; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments]

25 Acknowledgements
Argonne National Laboratory: Rick Stevens, Bob Olson, Folker Meyer
San Diego State University: Forest Rohwer
Fellowship for Interpretation of Genomes: Ross Overbeek, Veronika Vonstein
The Annotators

