An Ontology for Protein-Protein Interaction Data
Karen Jantz, CIS Honors Project, December 7, 2006



Overview
• Problem Statement
• Objectives
• Approach
• Background
• Methodology
• Evaluation
• Demonstration
• Conclusion

Problem Statement
• Several sources for protein-protein interaction data
  ◦ Different schemata
  ◦ Different purposes
  ◦ Different strengths/weaknesses

Objectives
• Unify the data
• Enable data mining
• Evaluate reliability of data across data sources
• Gain new information about the entire data set
• Enable others to easily add other data sources to the set

Approach: ontology
• ontology, n.: (1) that which exists (philosophy); (2) that which is represented (artificial intelligence)
• A descriptive data model
• Defines the entities and relationships within a domain
• Based upon data
• Human-readable

Approach: ontology
• Data integration: enables simultaneous querying across multiple databases
• Data transformation: enables interchange between database formats
• Data mining: enables reasoning and learning over the entire data set
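To make the data-integration point concrete, here is a minimal Python sketch, assuming facts from two sources have already been expressed in one shared vocabulary as subject-predicate-object triples (the data model underlying OWL). The property names `hasParticipant` and `fromSource`, the protein IDs, and the `partners` helper are hypothetical, not taken from the project's ontology.

```python
from collections import namedtuple

# A triple links a subject to an object through a named property.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical facts from two sources, already in one shared vocabulary.
dip_facts = [
    Triple("int1", "hasParticipant", "P1"),
    Triple("int1", "hasParticipant", "P2"),
    Triple("int1", "fromSource", "DIP"),
]
bind_facts = [
    Triple("int2", "hasParticipant", "P1"),
    Triple("int2", "hasParticipant", "P3"),
    Triple("int2", "fromSource", "BIND"),
]


def partners(triples, protein):
    """Return (partner, source) pairs for every interaction involving `protein`."""
    by_subject = {}
    for t in triples:
        by_subject.setdefault(t.subject, []).append(t)
    results = set()
    for facts in by_subject.values():
        participants = {t.obj for t in facts if t.predicate == "hasParticipant"}
        sources = {t.obj for t in facts if t.predicate == "fromSource"}
        if protein in participants:
            for partner in participants - {protein}:
                for source in sources:
                    results.add((partner, source))
    return results


# One query runs over the union of both sources.
print(sorted(partners(dip_facts + bind_facts, "P1")))  # [('P2', 'DIP'), ('P3', 'BIND')]
```

Because both sources use the same property names, the query does not need to know which database a fact came from; that is the integration benefit the slide describes.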

Background: Data Sources  DIP (Jing Xia) D atabase of I nteracting P roteins Most reliable data set Jing Xia  BIND (Abhijit Erande, Aaron Schoenhofer) B iomolecular I nteractions N etwork D atabank Very large data set Contains interactions, molecular complexes, and pathways

Background: Data Sources  MINT M olecular INT eractions database experimentally verified protein interactions Evaluates confidence level  IntAct Not limited to binary interactions Allows user submissions  mips CYGD M unich I nformation C enter for P rotein S equences: C omprehensive Y east G enome D atabase Limited to yeast Focuses on sequencing

Background: Tools  Protégé Open-Source Project Graphical ontology editor Interacts with OWL Reasoner Detailed API for modifying ontologies programmatically

Background: Tools  Prompt A Protégé Plugin Enables ontology mapping Enables ontology comparison

Background: Related Work
• PSI-MI
  ◦ Controlled vocabulary for PPI data
  ◦ Not a proposed database structure
  ◦ Decreases the strength of information
  ◦ Helpful in defining relationships and keys

Methodology: Overview
[Diagram: the data sources (DIP, BIND, MIPS, MINT, IntAct) are combined through a unified ontology into a unified data set, which a web interface queries.]
• Q: What interactions have been observed with protein A?
• Q: What experiments give evidence for a given interaction?
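The two example questions on this slide can be sketched as queries over a unified data set. This Python sketch assumes a hypothetical `Interaction` record holding participants, supporting experiments, and reporting sources; the field names, IDs, and method labels are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    interaction_id: str
    participants: frozenset
    experiments: list = field(default_factory=list)  # supporting detection methods
    sources: set = field(default_factory=set)        # databases reporting it


def interactions_with(data, protein):
    """Q1: which interactions have been observed with `protein`?"""
    return [i for i in data if protein in i.participants]


def evidence_for(data, interaction_id):
    """Q2: which experiments give evidence for a given interaction?"""
    for i in data:
        if i.interaction_id == interaction_id:
            return list(i.experiments)
    return []


unified = [
    Interaction("I1", frozenset({"A", "B"}), ["two-hybrid"], {"DIP", "MINT"}),
    Interaction("I2", frozenset({"A", "C"}), ["coimmunoprecipitation"], {"IntAct"}),
    Interaction("I3", frozenset({"B", "C"}), ["two-hybrid"], {"BIND"}),
]

print([i.interaction_id for i in interactions_with(unified, "A")])  # ['I1', 'I2']
print(evidence_for(unified, "I1"))  # ['two-hybrid']
```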

Methodology: Design
• Review the singular database schemata and determine their strengths/weaknesses
• View data files
  ◦ Native formats
  ◦ PSI-MI formats
• Create a unified schema of the data sources
• Create the unified ontology in Protégé
• Create each singular database as a subset of the unified ontology
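One way to read "each singular database as a subset of the unified ontology" is as a schema-containment check: every class and property a source uses must exist in the unified model. The sketch below assumes schemata are plain dictionaries of class name to property names; `UNIFIED`, `SOURCES`, and `missing_from_unified` are hypothetical and are not how Protégé represents ontologies.

```python
# Hypothetical unified schema: class name -> set of property names.
UNIFIED = {
    "Protein": {"name", "organism", "sequence", "uniprot_id"},
    "Interaction": {"participants", "detection_method", "confidence", "source"},
}

# Hypothetical per-source schemata, each intended to be a subset of UNIFIED.
SOURCES = {
    "DIP": {"Protein": {"name", "uniprot_id"},
            "Interaction": {"participants", "detection_method"}},
    "MINT": {"Protein": {"name", "organism"},
             "Interaction": {"participants", "confidence"}},
}


def missing_from_unified(source_schema, unified=UNIFIED):
    """Return {class: properties} that a source uses but the unified model lacks."""
    gaps = {}
    for cls, props in source_schema.items():
        extra = props - unified.get(cls, set())
        if extra:
            gaps[cls] = extra
    return gaps


for name, schema in SOURCES.items():
    print(name, missing_from_unified(schema) or "subset of unified model")
```

A source that introduced a new class or property would show up as a gap, which signals that the unified model needs to grow before that source can be added.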

Protégé Screenshot

Methodology: Data Import
• DOMParser: load data from XML
• Protégé-OWL API: insert entities into the singular databases
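The slides do not show the import code. As a minimal sketch of DOM-based loading, the Python below uses the standard library's `xml.dom.minidom` in place of the DOMParser named on the slide. The XML layout is a simplified, hypothetical format, not the actual PSI-MI schema, and the step that inserts entities through the Protégé-OWL API is not shown.

```python
from xml.dom import minidom

# Simplified, hypothetical interaction file; real PSI-MI XML is far richer.
SAMPLE = """<interactions>
  <interaction id="I1" source="DIP">
    <participant>A</participant>
    <participant>B</participant>
    <method>two-hybrid</method>
  </interaction>
  <interaction id="I2" source="DIP">
    <participant>A</participant>
    <participant>C</participant>
    <method>coimmunoprecipitation</method>
  </interaction>
</interactions>"""


def text_of(node):
    """Concatenate the text children of an element, trimming whitespace."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE).strip()


def load_interactions(xml_text):
    """Parse the XML into a DOM tree and return one dict per <interaction>."""
    doc = minidom.parseString(xml_text)
    records = []
    for el in doc.getElementsByTagName("interaction"):
        records.append({
            "id": el.getAttribute("id"),
            "source": el.getAttribute("source"),
            "participants": [text_of(p) for p in el.getElementsByTagName("participant")],
            "methods": [text_of(m) for m in el.getElementsByTagName("method")],
        })
    return records


for r in load_interactions(SAMPLE):
    print(r["id"], r["participants"], r["methods"])
```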

Methodology: Transformation
• Use Prompt to create a mapping from each specific data source to the unified ontology
• Use the Prompt mappings to insert individuals from each singular ontology into the unified model
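Prompt builds mappings between ontologies interactively; the Python sketch below only approximates the idea with a hand-written, hypothetical property-name mapping per source, applied to each individual. It does not use Prompt or reproduce its mapping format, and the property names are illustrative.

```python
# Hypothetical mappings: source property name -> unified property name.
MAPPINGS = {
    "DIP": {"interactor": "participant", "exp_type": "detection_method"},
    "MINT": {"partner": "participant", "method": "detection_method", "score": "confidence"},
}


def to_unified(source, individual):
    """Rename an individual's properties using its source's mapping.

    Unmapped properties are kept under a source-prefixed name so nothing is dropped.
    """
    mapping = MAPPINGS[source]
    unified = {"source": source}
    for prop, value in individual.items():
        unified[mapping.get(prop, f"{source}:{prop}")] = value
    return unified


dip_row = {"interactor": ["A", "B"], "exp_type": "two-hybrid"}
mint_row = {"partner": ["A", "B"], "method": "two-hybrid", "score": 0.8, "note": "curated"}

print(to_unified("DIP", dip_row))
print(to_unified("MINT", mint_row))
```

Keeping unmapped properties under a prefixed name, rather than discarding them, is one way to avoid losing source-specific detail; the slides do not say how the project handled such properties.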

Methodology: Transformation
• Duplicate data
  ◦ Need to fill in attributes on existing records
  ◦ Write an ‘Algorithm Plugin’ for Prompt to determine when individuals are the same
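The ‘Algorithm Plugin’ has to decide when two individuals are the same and then fill in attributes missing from the existing record. As a sketch, assuming a hypothetical identity rule (same unordered set of participants and same detection method), the merge step could look like this in Python; it is not the plugin's actual logic and does not use any Prompt API.

```python
def same_key(record):
    """Hypothetical identity rule: same unordered participants and same method."""
    return (frozenset(record["participant"]), record.get("detection_method"))


def merge_duplicates(records):
    """Merge records that share a key, filling attributes missing on the first copy.

    Each merged record replaces the single 'source' field with a 'sources' set.
    """
    merged = {}
    for rec in records:
        key = same_key(rec)
        if key not in merged:
            # First time this individual is seen: copy it and start a source set.
            first = {k: v for k, v in rec.items() if k != "source"}
            first["sources"] = {rec["source"]}
            merged[key] = first
            continue
        existing = merged[key]
        existing["sources"].add(rec["source"])
        # Fill in only attributes that are absent or None on the existing record.
        for prop, value in rec.items():
            if prop != "source" and existing.get(prop) is None:
                existing[prop] = value
    return list(merged.values())


records = [
    {"source": "DIP", "participant": ["A", "B"], "detection_method": "two-hybrid", "confidence": None},
    {"source": "MINT", "participant": ["B", "A"], "detection_method": "two-hybrid", "confidence": 0.8},
    {"source": "IntAct", "participant": ["A", "C"], "detection_method": "two-hybrid"},
]
for rec in merge_duplicates(records):
    print(sorted(rec["participant"]), sorted(rec["sources"]), rec.get("confidence"))
```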

Prompt Screenshot - Mapping

Methodology: Query Interface
• Export Protégé data into MySQL
• Web interface for collecting data
• Working with domain experts to determine useful views and queries
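The export-and-query step can be sketched with Python's built-in `sqlite3` module standing in for MySQL, so the example runs without a database server. The two-table schema and the `export` and `partners_of` helpers are hypothetical; with MySQL, a MySQL driver would replace `sqlite3`, and its placeholder style may differ.

```python
import sqlite3


def export(records, conn):
    """Write unified interaction records into two relational tables."""
    conn.executescript("""
        CREATE TABLE interaction (id TEXT PRIMARY KEY, method TEXT);
        CREATE TABLE participant (interaction_id TEXT REFERENCES interaction(id),
                                  protein TEXT);
    """)
    for r in records:
        conn.execute("INSERT INTO interaction VALUES (?, ?)", (r["id"], r["method"]))
        conn.executemany("INSERT INTO participant VALUES (?, ?)",
                         [(r["id"], p) for p in r["participants"]])
    conn.commit()


def partners_of(conn, protein):
    """Proteins observed in any interaction with `protein` (self-join on participant)."""
    rows = conn.execute("""
        SELECT DISTINCT other.protein
        FROM participant AS me
        JOIN participant AS other
          ON other.interaction_id = me.interaction_id AND other.protein <> me.protein
        WHERE me.protein = ?
        ORDER BY other.protein
    """, (protein,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
export([
    {"id": "I1", "method": "two-hybrid", "participants": ["A", "B"]},
    {"id": "I2", "method": "coimmunoprecipitation", "participants": ["A", "C"]},
], conn)
print(partners_of(conn, "A"))  # ['B', 'C']
```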

Evaluation  Performance Transformation Time in Protégé Query Time for Web Interface  Size Minimize redundancy in data model Minimize duplicate data

Evaluation  Correctness Domain Experts  Dr. Brown, Dr. Wang Maintain proper data relationships  Utility Enrich data

Evaluation

Demonstration

Future Work
• Complete transformations
• Import data
• Evaluate ontology
• Add other databases to the model

Conclusions
• Adequate start
• Needs improvement, evolution, and more data sources
• As the project matures, the ontology will be ready for use in the biological domain
• It will then be easier to gain information about protein-protein interactions

References  AAAI.org - AITopics: “Ontology”  Protégé owl.html owl.html  Prompt  PSI-MI

References  BIND  DIP  IntAct  MINT  MIPS

Q & A