1
CBioC: Massive Collaborative Curation of Biomedical Literature
Future Directions
2
Recap: The Problem
Curation of “knowledge” nuggets from biomedical articles.
  About 15 million abstracts in PubMed.
  3 million published by US and EU researchers during 1994-2004 (800 articles per day).
  300K articles published so far reporting protein-protein interactions in human, yeast and mouse.
  BIND (in 7 yrs): 23K; DIP: 3K; MINT: 2.4K.
3
Recap: our proposed solution
Harness available human power: scientists around the world
Seamlessly provide curation platform (web-based) to pop up while they research
Even a little input from each counts
Collaborators get immediate rewards
11
Future work & projects
Extraction of other relationships (gene-disease, gene-organ...)
  Have a prototype in a related project; it needs improvement and formal testing (measuring accuracy).
Extraction of organism info for each entity in a relationship
  High priority. Use existing software for extraction, but need to use biological databases and algorithms to deduce info that is not explicit, and allow users to correct this info. Example: PMID 16107876.
Use ontologies and some automated tools to ensure consistency and cross-link info
  2 people. Information entered by users needs to be validated against existing DBs & ontologies. Also, need to tag our data for cross-reference. Example.
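The validation step for user-entered information can be pictured with a minimal sketch. It assumes a tiny in-memory controlled vocabulary; the `ORGANISM_VOCAB` table and the function names are hypothetical and are not CBioC's actual data model. A real deployment would load terms from the external databases and ontologies the slide refers to.

```python
# Hypothetical controlled vocabulary: canonical ID -> accepted names.
# The IDs are NCBI Taxonomy identifiers; the table itself is illustrative.
ORGANISM_VOCAB = {
    "NCBITaxon:9606": {"homo sapiens", "human"},
    "NCBITaxon:10090": {"mus musculus", "mouse"},
    "NCBITaxon:4932": {"saccharomyces cerevisiae", "yeast"},
}


def build_index(vocab):
    """Invert the vocabulary into a name -> canonical ID lookup."""
    index = {}
    for canonical_id, names in vocab.items():
        for name in names:
            index[name] = canonical_id
    return index


def validate_entry(user_text, index):
    """Return the canonical ID for a user-entered name, or None when the
    name is unknown and should be flagged for manual review."""
    normalized = " ".join(user_text.lower().split())
    return index.get(normalized)


index = build_index(ORGANISM_VOCAB)
print(validate_entry("  Homo  sapiens ", index))  # NCBITaxon:9606
print(validate_entry("zebrafish", index))         # None
```

Storing the canonical ID rather than the free text is also what would make the curated data taggable for cross-reference.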
12
Future work & projects (2)
Support query processing in CBioC at a basic level
  Users want/need to access the facts directly: not only “related articles” but facts about specific vote patterns, entities, etc.
Incorporate data from other interaction databases
  Done for one (BIND), but needs to be revamped to include other databases and left semi-automated for updates.
Integrate CBioC data w/ other traditionally curated databases
  Allow users to transparently access and query all the biological interaction databases. Need to map schemas, select appropriate sources “on the fly”, and provide provenance explanation on query results.
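The integration idea (map schemas, query several sources, report provenance) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the source names, field names, and toy records are invented and do not reflect the real schema of CBioC, BIND, or any other database. Interactions are also treated as undirected, which is a simplification.

```python
# Hypothetical sources: each has its own field names and a mapping
# from those fields to a shared schema with keys "a", "b", and "type".
SOURCES = {
    "cbioc": {
        "mapping": {"entity1": "a", "entity2": "b", "relation": "type"},
        "records": [
            {"entity1": "GeneA", "entity2": "GeneB", "relation": "binds"},
        ],
    },
    "other_db": {
        "mapping": {"mol_a": "a", "mol_b": "b", "kind": "type"},
        "records": [
            {"mol_a": "GeneB", "mol_b": "GeneA", "kind": "binds"},
            {"mol_a": "GeneA", "mol_b": "GeneC", "kind": "activates"},
        ],
    },
}


def to_common(record, mapping):
    """Rename a source record's fields into the shared schema."""
    return {common: record[field] for field, common in mapping.items()}


def query_entity(entity, sources):
    """Return {(entity pair, type): [source names]} for every fact that
    mentions the entity; the list of names is the fact's provenance."""
    results = {}
    for name, source in sources.items():
        for record in source["records"]:
            fact = to_common(record, source["mapping"])
            if entity in (fact["a"], fact["b"]):
                # frozenset makes (A, B) and (B, A) the same interaction.
                key = (frozenset((fact["a"], fact["b"])), fact["type"])
                results.setdefault(key, []).append(name)
    return results


for (pair, kind), provenance in query_entity("GeneA", SOURCES).items():
    print(sorted(pair), kind, "reported by", provenance)
```

Keeping the per-fact list of sources is the simplest form of provenance explanation: a user can see which databases agree on a fact and which report it alone.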
13
Future work & projects (3)
Image extension: extracts images & information about images and allows collaborative curation
  Take PDFs & other structured documents, and extract images with their captions & references within the text, then let users polish. Related.
Develop adaptable software platform for similar applications
  This is to be a flexible (adaptable) system that users can “generate” online for their own scientific needs. A “non-scientific” example.
Curating & representing pathways: linking related facts
  Others have done representation, but need to design & implement a UI consistent w/ CBioC for curation. Example.
14
Future work & projects (4)
Recommender system that uses data from “user network” (votes, authors, etc.)
  Have a related project that recommends, but need to take advantage of CBioC’s data.
Handle incomplete data in CBioC
  Data obtained from text extraction or data integration is inherently incomplete. Here, we seek to predict missing values (using domain knowledge) and process queries even w/ the incomplete data.
Handle uncertain data in CBioC
  Associate confidence levels with all the facts curated by CBioC, based on user trustworthiness, and use these appropriately while processing user queries.
Support advanced query processing
  UI to allow the uncertainty & incompleteness handling features described above.
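One simple way to attach a confidence level to a fact from user votes and user trustworthiness is a trust-weighted agreement ratio, smoothed toward a neutral prior. The sketch below is an assumption chosen for illustration, not the scoring method proposed in the slides; the `Vote` class, the trust scores, and the `prior` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    user: str
    agrees: bool  # True if the user confirms the extracted fact


def fact_confidence(votes, trust, prior=0.5, prior_weight=1.0):
    """Trust-weighted share of agreeing votes, smoothed toward `prior`.

    `trust` maps user -> weight in [0, 1]; unknown users get weight 0.
    With no votes the result is simply `prior`.
    """
    agree = prior * prior_weight
    total = prior_weight
    for vote in votes:
        weight = trust.get(vote.user, 0.0)
        total += weight
        if vote.agrees:
            agree += weight
    return agree / total


def confident_facts(facts, trust, min_confidence):
    """Keep only facts whose confidence reaches the query's threshold."""
    return [
        name for name, votes in facts.items()
        if fact_confidence(votes, trust) >= min_confidence
    ]


trust = {"alice": 0.9, "bob": 0.1}
facts = {
    "fact_1": [Vote("alice", True), Vote("bob", False)],
    "fact_2": [Vote("alice", False), Vote("bob", True)],
}
print(confident_facts(facts, trust, min_confidence=0.6))  # ['fact_1']
```

The threshold passed at query time is what lets a query interface expose uncertainty handling: the same curated data can answer both a strict and a permissive query.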