CBioC: Massive Collaborative Curation of Biomedical Literature Future Directions.



Recap: The Problem
- Curation of “knowledge” nuggets from biomedical articles.
- About 15 million abstracts in PubMed.
- 3 million published by US and EU researchers during (800 articles per day).
- 300K articles published so far reporting protein-protein interactions in human, yeast and mouse.
- BIND (in 7 yrs): 23K; DIP: 3K; MINT: 2.4K.

Recap: our proposed solution
- Harness available human power: scientists around the world.
- Seamlessly provide a curation platform (web-based) that pops up while they research.
- Even a little input from each counts.
- Collaborators get immediate rewards.

Future work & projects
- Extraction of other relationships (gene-disease, gene-organ...)
  Have a prototype in a related project; it needs improvement and formal testing (measuring accuracy).
- Extraction of organism info for each entity in a relationship
  High priority. Use existing software for extraction, but need to use biological databases and algorithms to deduce info that is not explicit, and allow users to correct this info. Example, PMID
- Use ontologies and some automated tools to ensure consistency and cross-link info
  2 people. Information entered by users needs to be validated against existing DBs & ontologies. Also, need to tag our data for cross-reference. Example
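The validation step in the last item can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the vocabulary, the identifier format, and the function names are hypothetical, and a real system would load terms from an ontology or reference database rather than a hard-coded table.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical miniature vocabulary mapping lowercase synonyms to a
# canonical identifier. The identifier format is made up for this sketch.
VOCABULARY: Dict[str, str] = {
    "tp53": "GENE:TP53",
    "p53": "GENE:TP53",
    "mdm2": "GENE:MDM2",
}


@dataclass
class ValidationResult:
    entered: str                 # text exactly as the user typed it
    canonical_id: Optional[str]  # None when the term is not in the vocabulary


def validate_entity(name: str, vocabulary: Dict[str, str] = VOCABULARY) -> ValidationResult:
    """Look up a user-entered entity name, ignoring case and outer whitespace."""
    key = name.strip().lower()
    return ValidationResult(entered=name, canonical_id=vocabulary.get(key))


if __name__ == "__main__":
    for term in [" P53 ", "mdm2", "unknown-gene"]:
        result = validate_entity(term)
        status = result.canonical_id or "no match, flag for user review"
        print(f"{term!r}: {status}")
```

Returning the canonical identifier alongside the entered text also covers the cross-referencing need: two users who type different synonyms end up tagged with the same identifier.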

Future work & projects (2)
- Support query processing in CBioC at a basic level
  Users want/need to access the facts directly: not only “related articles” but facts about specific vote patterns, entities, etc.
- Incorporate data from other interaction databases
  Done for one (BIND), but needs to be revamped to include other databases & left semi-automated for updates.
- Integrate CBioC data w/ other traditionally curated databases
  Allow users to transparently access and query all the biological interaction databases. Need to map schemas, select appropriate sources “on the fly”, and provide provenance explanation on query results.
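As a rough sketch of what schema mapping with provenance could look like, the example below normalizes records from two sources onto a shared pair schema and reports which sources support each result. The field names (`molA`, `agent`, and so on) are invented for illustration and do not reflect the real BIND or CBioC schemas; interactions are assumed to be undirected.

```python
from collections import defaultdict

# Hypothetical per-source field mappings onto a shared
# (protein_a, protein_b) schema. Field names are invented.
SCHEMA_MAPS = {
    "BIND": {"protein_a": "molA", "protein_b": "molB"},
    "CBioC": {"protein_a": "agent", "protein_b": "target"},
}


def normalize(source, record):
    """Map a source-specific record to a sorted (protein, protein) pair."""
    mapping = SCHEMA_MAPS[source]
    a = record[mapping["protein_a"]]
    b = record[mapping["protein_b"]]
    # Sorting treats the interaction as undirected (an assumption).
    return tuple(sorted((a, b)))


def query_interactions(protein, sources):
    """Return {interaction pair: [sources reporting it]} for one protein."""
    provenance = defaultdict(list)
    for source, records in sources.items():
        for record in records:
            pair = normalize(source, record)
            if protein in pair:
                provenance[pair].append(source)
    return dict(provenance)


if __name__ == "__main__":
    sources = {
        "BIND": [{"molA": "TP53", "molB": "MDM2"}],
        "CBioC": [
            {"agent": "MDM2", "target": "TP53"},
            {"agent": "AKT1", "target": "GSK3B"},
        ],
    }
    for pair, where in query_interactions("TP53", sources).items():
        print(pair, "reported by", where)
```

Keeping the list of contributing sources with each result is the simplest form of provenance explanation: the user sees not just the fact but where it came from.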

Future work & projects (3)
- Image extension: extracts images & information about images and allows collaborative curation.
  Take PDFs & other structured documents, and extract images with their captions & references within the text, then let users polish. Related.
- Develop adaptable software platform for similar applications.
  This is to be a flexible (adaptable) system that users can “generate” online for their own scientific needs. A “non-scientific” example.
- Curating & representing pathways: linking related facts
  There are others that have done representation, but need to design & implement UI consistent w/ CBioC for curation. Example.

Future work & projects (4)
- Recommender system that uses data from “user network” (votes, authors, etc.)
  Have a related project that recommends, but need to take advantage of CBioC’s data.
- Handle incomplete data in CBioC
  Data obtained from text extraction or data integration is inherently incomplete. Here, we seek to predict missing values, using domain knowledge, and process queries even w/ the incomplete data.
- Handle uncertain data in CBioC
  Associate confidence levels with all the facts curated by CBioC based on user trustworthiness, and use these appropriately while processing user queries.
- Support advanced query processing
  UI to allow the uncertainty & incompleteness handling features described above.
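One simple way to associate a confidence level with a fact is a trust-weighted share of agreeing votes. The slides do not specify CBioC's scoring scheme, so the formula, the default trust value, and all names below are assumptions made for illustration only.

```python
def fact_confidence(votes, trust, default_trust=0.5):
    """Trust-weighted fraction of votes that agree with a fact.

    votes: list of (user, agrees) pairs, where agrees is a bool.
    trust: dict mapping user -> trustworthiness in [0, 1].
    Users missing from `trust` get `default_trust` (an assumed value).
    Returns None when there is no weighted evidence at all.
    """
    total = 0.0
    agreeing = 0.0
    for user, agrees in votes:
        weight = trust.get(user, default_trust)
        total += weight
        if agrees:
            agreeing += weight
    return agreeing / total if total > 0 else None


def filter_facts(facts, trust, min_confidence):
    """Keep facts whose confidence meets a query-time threshold."""
    kept = []
    for fact, votes in facts.items():
        score = fact_confidence(votes, trust)
        if score is not None and score >= min_confidence:
            kept.append((fact, score))
    return kept


if __name__ == "__main__":
    trust = {"alice": 0.9, "bob": 0.3}
    facts = {
        "TP53 binds MDM2": [("alice", True), ("bob", False), ("carol", True)],
        "AKT1 binds GSK3B": [("bob", True), ("alice", False)],
    }
    for fact, score in filter_facts(facts, trust, min_confidence=0.6):
        print(f"{fact}: {score:.2f}")
```

Applying the threshold at query time rather than at curation time keeps low-confidence facts in the database, so they can still surface if later votes raise their score.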