
What's True For E. coli… Enlisting The Community In Ongoing Genome Annotation Jim Hu EcoliHub/EcoliWiki Texas A&M University.


1 What's True For E. coli… Enlisting The Community In Ongoing Genome Annotation Jim Hu EcoliHub/EcoliWiki Texas A&M University

2 Why more E. coli websites?
The number of E. coli databases is large
Extensive coverage exists for many aspects of E. coli biology
Journals contain half a century of E. coli data
Don't we already know everything?

3 Why more E. coli websites?
The number of E. coli databases is large
Extensive coverage exists for many aspects of E. coli biology
Journals contain half a century of E. coli data
Don't we already know everything?
Points 1-3: the problem isn't the amount of information, it's finding it
Point 4: No

4 Why more E. coli websites?
Part of what we don't know yet is how the things we do know fit together
Most of us need help mining what's out there
"The diversity of information on different genomes, proteins, phenotypes and so on makes it difficult to keep track of all details." (Molecular Systems Biology 3:128, 2007)

5 Problems and approaches
Finding data from different resources
–EcoliHub - information from collaborating biological electronic data resources
Making data curation faster, cheaper, and better
–EcoliWiki - community annotation for E. coli K-12
Community functional curation for cross-species comparison
–GONUTS - a community Gene Ontology resource
1-2:30 today: Session 173/K, Poster K-133, Board 0542, "EcoliHub: Development of the Information Resource"

6 Integrating information from multiple sites
EcoliHub is based on web services
A user query to EcoliHub is passed on to participating sites
http://ecolihub.org or http://ecolicommunity.org

7 Integrating information from multiple sites
EcoliHub is based on web services
A user query to EcoliHub is passed on to participating sites
EcoliHub gathers the responses and assembles output for the user
http://ecolihub.org or http://ecolicommunity.org
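The pass-on-and-gather flow described on slides 6 and 7 can be sketched as a fan-out/aggregate step. This is a minimal, hypothetical illustration, not EcoliHub's code: the node functions (`ecocyc_stub`, `regulondb_stub`, `failing_stub`) are invented stand-ins that return canned strings instead of calling the real EcoCyc or RegulonDB web services, whose interfaces the slides do not describe, and reporting a failed node as empty rather than aborting the whole query is an assumed design choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A node handler takes a query string and returns a list of result strings.
NodeHandler = Callable[[str], List[str]]


# Hypothetical stand-ins for participating sites. They return canned
# strings so the sketch runs offline; they are NOT real EcoCyc or
# RegulonDB APIs.
def ecocyc_stub(query: str) -> List[str]:
    return [f"EcoCyc record for {query}"]


def regulondb_stub(query: str) -> List[str]:
    return [f"RegulonDB operon containing {query}"]


def failing_stub(query: str) -> List[str]:
    raise ConnectionError("node unavailable")


def hub_query(query: str, nodes: Dict[str, NodeHandler]) -> Dict[str, List[str]]:
    """Pass the query to every node in parallel and gather the responses.

    A node that raises an error is recorded with an empty list instead
    of failing the whole query (an assumed policy).
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=len(nodes) or 1) as pool:
        futures = {name: pool.submit(handler, query) for name, handler in nodes.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []
    return results


def assemble(results: Dict[str, List[str]]) -> str:
    """Assemble per-node answers into one report, one section per node."""
    lines: List[str] = []
    for name in sorted(results):
        lines.append(f"[{name}]")
        items = results[name] or ["(no response)"]
        lines.extend(f"  {item}" for item in items)
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = {"EcoCyc": ecocyc_stub, "RegulonDB": regulondb_stub, "Offline": failing_stub}
    print(assemble(hub_query("lacZ", nodes)))
```

Run as a script, this prints one section per node, with the unreachable stub reported as "(no response)". The user sees a single assembled answer and never has to visit each site, which is the point the slides make.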

8 Integrating information from multiple sites

9 But the users won't have to start at the EcoliHub site

10 Integrating information from multiple sites
But the users won't have to start at the EcoliHub site
EcoliHub will provide the infrastructure to help member sites do peer-to-peer queries
Who has info? Try EcoCyc and RegulonDB

11 Integrating information from multiple sites
But the users won't have to start at the hub site
EcoliHub will provide the infrastructure to help member sites do peer-to-peer queries
The users don't need to know or care about EcoliHub

12 What kinds of nodes are connected to EcoliHub?
So far:
–EcoCyc: everything E. coli; professionally curated
–EcoGene*: everything E. coli; professionally curated
–GenoBase: functional genomics and resources
–EcoliPredict: protein structure models
–OU GenExpDB: transcriptomes, experimental data
–RegulonDB*: operons and regulons
–EcoliWiki: everything E. coli; community curated
–GONUTS: community curation of the Gene Ontology; not just E. coli
More coming…

13 The need for Annotation is growing

14 “What is true of Escherichia coli is true of the elephant” - Jacques Monod
“Thanks to annotation creep, what’s false for E. coli is false for the elephant too” - Jim Hu
http://www.pasteur.fr/infosci/archives/mon/im_ele.html

15 People are limiting for annotation
Major MODs (model organism databases such as EcoCyc, SGD, WormBase, FlyBase, MGI, ZFIN, TAIR, etc.) employ large numbers of PhD-level curators
This model is problematic for the future of biocuration, and not just for E. coli
–Curators are expensive: NIH and NSF cannot afford to staff every organism at this level
–Broad expertise across all areas is hard: curators have to read papers in areas they were not trained in, and may not recognize the significance of those papers
Can we make it:
–cheaper?
–faster?
–better?

16 The Wikipedia approach
Get your user community to work for free!
Many groups have tried community annotation, with mixed success (at best)
Wikipedia has added more than a million articles in English since I made the first version of this slide!

17 EcoliWiki
http://ecoliwiki.org (or .net or .com), or come from EcoliHub

18 EcoliWiki philosophy
Any registered user can edit
Any registered user can register new users
Any registered user can create new pages
It's easier to revise than to create new content
–Seed content from other sites, mostly EcoCyc
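The access rules on this slide amount to a very flat permission model, which can be sketched as a small check. This is a hypothetical illustration, not EcoliWiki's actual configuration: the action names (`edit`, `create_page`, `register_user`, `read`) are invented, and allowing anonymous visitors to read but nothing else is an assumption, since the slide only says what registered users can do.

```python
from dataclasses import dataclass

# Hypothetical action names. The slide says registered users can edit,
# register new users, and create pages; letting anonymous visitors
# read (but nothing else) is an assumption added for illustration.
REGISTERED_ACTIONS = frozenset({"read", "edit", "create_page", "register_user"})
ANONYMOUS_ACTIONS = frozenset({"read"})


@dataclass
class User:
    name: str
    registered: bool


def can(user: User, action: str) -> bool:
    """Return True if this user may perform the action."""
    allowed = REGISTERED_ACTIONS if user.registered else ANONYMOUS_ACTIONS
    return action in allowed


def register(sponsor: User, new_name: str) -> User:
    """Any registered user can register a new user; others cannot."""
    if not can(sponsor, "register_user"):
        raise PermissionError(f"{sponsor.name} cannot register new users")
    return User(new_name, registered=True)


if __name__ == "__main__":
    curator = User("curator", registered=True)
    newcomer = register(curator, "newcomer")
    visitor = User("visitor", registered=False)
    print(can(newcomer, "edit"), can(visitor, "edit"), can(visitor, "read"))
```

The design point the sketch tries to capture is that there is no separate moderator tier: every registered user has the same rights, including vouching for new users.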

19 But won't that invite chaos? GenBank's managers are dead set against letting users into GenBank's files, however. They say there already are procedures to deal with errors in the database, and researchers themselves have created secondary databases that improve on what GenBank has to offer. "That we would wholesale start changing people's records goes against our idea of an archive," says David Lipman, director of the National Center for Biotechnology Information (NCBI), GenBank's home in Bethesda, Maryland. "It would be chaos."

20 Correct compared to what? NCBI RefSeq: Wikipedia:

21 Correct compared to what? NCBI RefSeq: Wikipedia:

22 Correct compared to what? NCBI RefSeq: Wikipedia:

23 Correct compared to what?

24 This is how biology achieves fidelity
[Image: a collage of books I haven’t read]

25 Biology Wikis are proliferating

26 Participation is the major challenge
Anyone can edit ≠ Anyone will edit
Wikipedia: a tiny fraction of the users edit anything
–A tiny fraction of those do major editing
–Really big denominator
Outreach to increase our user base

27 Participation is the major challenge
Tools to make it easier to edit

28 Participation is the major challenge
Biggest difference from other systems:
–Partial annotations are wanted
–It doesn't matter if you don't know the wiki markup
–It doesn't matter if what you're adding isn't fully worked out
Someone else can fix it
And you can fix what others write

29 Community annotation for everyone
What if I don't work on E. coli?
Community annotation of gene function via the Gene Ontology
Gene Ontology Normal Usage Tracking System (GONUTS)
http://gowiki.tamu.edu

30 Community annotation for everyone
Annotation pages based on UniProt IDs

31 The future of EcoliHub and EcoliWiki
Making the resource more useful to the community
–incorporating more resources
–providing integration workflows
–teaching users how to use them
–adding content people want
Making the approach available to other biology communities
–reusable open source tools
–public web services
E. COLI 2008
Don't forget the acknowledgements!

32 Thanks to
EcoliWiki/GONUTS Team
–Chris Elsik
–Gwen Knapp
–Debby Siegele
–Daniel Renfro
–Jerry Tsai
–Xiaotao Qu
–Rosemarie Swanson
–Anand Venkatraman
–Adrienne Zweifel
Sabbatical hosts
–SGD/Stanford
–Stein Lab/CSHL
GO consortium
EcoliHub Team Leaders
–Barry Wanner, PI, Purdue
–Walid Aref, co-PI, Purdue
–Tyrell Conway, co-PI, Oklahoma
–Mike Gribskov, co-PI, Purdue
–Peter Karp, co-PI, SRI
–Daisuke Kihara, co-PI, Purdue
Funding
NIH U24-GM077905
URLs: http://ecolihub.org http://ecoliwiki.org http://gowiki.tamu.edu

