Published by Rebecca Johnson. Modified over 9 years ago.
1
What's True For E. coli… Enlisting The Community In Ongoing Genome Annotation Jim Hu EcoliHub/EcoliWiki Texas A&M University
2
Why more E. coli websites?
The number of E. coli databases is large
Extensive coverage exists for many aspects of E. coli biology
Journals contain half a century of E. coli data
Don't we already know everything?
3
Why more E. coli websites?
The number of E. coli databases is large
Extensive coverage exists for many aspects of E. coli biology
Journals contain half a century of E. coli data
Don't we already know everything?
Points 1-3: the problem isn't the amount of information, it's finding it
Point 4: No
4
Why more E. coli websites?
Part of what we don't know yet is how the things we do know fit together
Most of us need help mining what's out there
"The diversity of information on different genomes, proteins, phenotypes and so on makes it difficult to keep track of all details." Molecular Systems Biology 3:128 (2007)
5
Problems and approaches
Finding data from different resources
–EcoliHub - information from collaborating biological electronic data resources
Making data curation faster, cheaper, and better
–EcoliWiki - community annotation for E. coli K-12
Community functional curation for cross-species comparison
–GONUTS - a community Gene Ontology resource
1-2:30 today: Session 173/K, Poster K-133, Board 0542. EcoliHub: Development of the Information Resource
6
Integrating information from multiple sites EcoliHub is based on web services A user query to EcoliHub is passed on to participating sites http://ecolihub.org or http://ecolicommunity.org
7
Integrating information from multiple sites EcoliHub is based on web services A user query to EcoliHub is passed on to participating sites EcoliHub gathers the responses and assembles output for the user http://ecolihub.org or http://ecolicommunity.org
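The query flow on this slide (pass the query to participating sites, gather the responses, assemble one output) can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not describe EcoliHub's actual service interface, so the site handlers below are hypothetical local stand-ins for remote web services, and `federated_query` and the returned fields are invented names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for participating sites. A real node would be a
# remote web service; here each one is a local function returning a dict.
def ecocyc_stub(gene):
    return {"gene": gene, "pathways": ["placeholder pathway"]}

def regulondb_stub(gene):
    return {"gene": gene, "operon": "placeholder operon"}

def offline_stub(gene):
    # Simulates a member site that cannot be reached.
    raise ConnectionError("site unavailable")

SITES = {
    "EcoCyc": ecocyc_stub,
    "RegulonDB": regulondb_stub,
    "Offline": offline_stub,
}

def federated_query(gene, sites=SITES):
    """Pass one query to every site, gather responses, assemble one result."""
    def ask(item):
        name, handler = item
        try:
            return name, {"ok": True, "data": handler(gene)}
        except Exception as exc:
            # One failing site should not sink the whole answer.
            return name, {"ok": False, "error": str(exc)}

    # Query all sites concurrently, then assemble a single dict for the user.
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        return dict(pool.map(ask, sites.items()))

result = federated_query("lacZ")
```

Recording failures per site, instead of raising, lets the hub still assemble a partial answer when one participating site is down.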
8
Integrating information from multiple sites
9
But the users won't have to start at the EcoliHub site
10
Integrating information from multiple sites
But the users won't have to start at the EcoliHub site
EcoliHub will provide the infrastructure to help member sites do peer-to-peer queries
Example exchange: "Who has info?" "Try EcoCyc and RegulonDB"
11
Integrating information from multiple sites
But the users won't have to start at the hub site
EcoliHub will provide the infrastructure to help member sites do peer-to-peer queries
The users don't need to know or care about the EcoliHub
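One way to picture the peer-to-peer infrastructure is a shared registry that a member site consults to learn which peers to query directly. The sketch below is an assumption, not the EcoliHub design: the `REGISTRY` contents, the topic labels, and the `who_has` function are all hypothetical.

```python
# Hypothetical registry: which member sites advertise which kinds of data.
# The topic labels are illustrative, not taken from EcoliHub.
REGISTRY = {
    "EcoCyc": {"genes", "pathways", "regulation"},
    "RegulonDB": {"regulation"},
    "EcoliPredict": {"structure models"},
}

def who_has(topic, registry=REGISTRY, asking_site=None):
    """Return the sorted names of sites that advertise data on `topic`.

    The asking site, if given, is left out of its own answer.
    """
    return sorted(
        name
        for name, topics in registry.items()
        if topic in topics and name != asking_site
    )

# A member site asks "who has info?" and then queries those peers itself.
print(who_has("regulation", asking_site="EcoliWiki"))  # ['EcoCyc', 'RegulonDB']
```

With a lookup like this the hub only answers "who has info?"; the asking site then contacts the peers on its own, so its users never need to visit the hub.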
12
What kinds of nodes are connected to EcoliHub?
So far:
–EcoCyc: everything E. coli; professionally curated
–EcoGene*: everything E. coli; professionally curated
–GenoBase: functional genomics and resources
–EcoliPredict: protein structure models
–OU GenExpDB: transcriptomes, experimental data
–RegulonDB*: operons and regulons
–EcoliWiki: everything E. coli; community curated
–GONUTS: community curation of the Gene Ontology; not just E. coli
More coming…
13
The need for annotation is growing
14
“What is true of Escherichia coli is true of the elephant” - Jacques Monod
“Thanks to annotation creep, what’s false for E. coli is false for the elephant too” - Jim Hu
http://www.pasteur.fr/infosci/archives/mon/im_ele.html
15
People are the limiting factor for annotation
Major MODs (EcoCyc, SGD, Wormbase, Flybase, MGI, Zfin, TAIR etc.) employ large numbers of PhD-level curators
This model is problematic for the future of biocuration, and not just for E. coli
–Curators are expensive: NIH and NSF cannot afford to staff every organism at this level
–Broad expertise across all areas is hard: curators have to read papers in areas they were not trained in, and may not recognize the significance of those papers
Can we make it:
–cheaper?
–faster?
–better?
16
The Wikipedia approach Get your user community to work for free! Many groups have tried community annotation, with mixed success (at best) Wikipedia has added more than a million articles in English since I made the first version of this slide!
17
EcoliWiki
http://ecoliwiki.org (or .net or .com), or come from EcoliHub
18
EcoliWiki philosophy
Any registered user can edit
Any registered user can register new users
Any registered user can create new pages
It's easier to revise than to create new content
–Seed content from other sites, mostly EcoCyc
19
But won't that invite chaos? GenBank's managers are dead set against letting users into GenBank's files, however. They say there already are procedures to deal with errors in the database, and researchers themselves have created secondary databases that improve on what GenBank has to offer. "That we would wholesale start changing people's records goes against our idea of an archive," says David Lipman, director of the National Center for Biotechnology Information (NCBI), GenBank's home in Bethesda, Maryland. "It would be chaos."
20
Correct compared to what? NCBI RefSeq: Wikipedia:
21
Correct compared to what? NCBI RefSeq: Wikipedia:
22
Correct compared to what? NCBI RefSeq: Wikipedia:
23
Correct compared to what?
24
This is how biology achieves fidelity A collage of books I haven’t read
25
Biology Wikis are proliferating
26
Participation is the major challenge Anyone can edit ≠ Anyone will edit Wikipedia: a tiny fraction of the users edit anything –A tiny fraction of those do major editing –Really big denominator Outreach to increase our user base
27
Participation is the major challenge Tools to make it easier to edit
28
Participation is the major challenge Biggest difference from other systems: –Partial annotations are wanted –It doesn't matter if you don't know the wiki markup –It doesn't matter if what you're adding isn't fully worked out Someone else can fix it And you can fix what others write
29
Community annotation for everyone What if I don't work on E. coli? Community annotation of gene function via the Gene Ontology Gene Ontology Normal Usage Tracking System (GONUTS) http://gowiki.tamu.edu
30
Community annotation for everyone Annotation pages based on UniProt IDs
31
The future of EcoliHub and EcoliWiki
Making the resource more useful to the community
–incorporating more resources
–providing integration workflows
–teaching users how to use them
–adding content people want
Making the approach available to other biology communities
–reusable open source tools
–public web services
E. COLI 2008 don't forget the acknowledgements!
32
Thanks to
EcoliWiki/GONUTS Team
–Chris Elsik
–Gwen Knapp
–Debby Siegele
–Daniel Renfro
–Jerry Tsai
–Xiaotao Qu
–Rosemarie Swanson
–Anand Venkatraman
–Adrienne Zweifel
Sabbatical hosts
–SGD/Stanford
–Stein Lab/CSHL
GO consortium
EcoliHub Team Leaders
–Barry Wanner, PI, Purdue
–Walid Aref, co-PI, Purdue
–Tyrell Conway, co-PI, Oklahoma
–Mike Gribskov, co-PI, Purdue
–Peter Karp, co-PI, SRI
–Daisuke Kihara, co-PI, Purdue
Funding: NIH U24-GM077905
URLs: http://ecolihub.org http://ecoliwiki.org http://gowiki.tamu.edu