An Ontology for Protein-Protein Interaction Data. Karen Jantz, CIS Honors Project, December 7, 2006.


1 An Ontology for Protein-Protein Interaction Data
Karen Jantz, CIS Honors Project, December 7, 2006

2 Overview
- Problem Statement
- Objectives
- Approach
- Background
- Methodology
- Evaluation
- Demonstration
- Conclusion

3 Problem Statement
- Several sources for protein-protein interaction data
- Different schemata
- Different purposes
- Different strengths/weaknesses

4 Objectives
- Unify the data
- Enable data mining
- Evaluate reliability of data across data sources
- Gain new information about the entire data set
- Enable others to easily add other data sources to the set

5 Approach: ontology
- ontology (n.): 1. that which exists (philosophy); 2. that which is represented (artificial intelligence)
- A descriptive data model
- Defines the entities and relationships within a domain
- Based upon data
- Human-readable
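To make "entities and relationships within a domain" concrete, here is a minimal sketch of how such a model might look for interaction data. The class names (`Protein`, `Experiment`, `Interaction`), their fields, and the sample values are assumptions chosen for illustration; they are not taken from the project's actual ontology.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Protein:
    """An entity: one protein, identified by a (hypothetical) name."""
    name: str


@dataclass(frozen=True)
class Experiment:
    """An entity: one piece of experimental evidence and the database reporting it."""
    method: str
    source_db: str


@dataclass
class Interaction:
    """A relationship: links a set of proteins to the experiments that support it."""
    participants: frozenset
    evidence: list = field(default_factory=list)


# Toy data, invented for this sketch.
a, b = Protein("A"), Protein("B")
interaction = Interaction(frozenset({a, b}))
interaction.evidence.append(Experiment("two-hybrid", "DIP"))
```

Using a set for `participants` means the model is not restricted to binary interactions and does not care about the order in which proteins are listed.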

6 Approach: ontology
- Data integration: enables simultaneous querying across multiple databases
- Data transformation: enables interchange between database formats
- Data mining: enables reasoning and learning over the entire data set

7 Background: Data Sources
- DIP (Jing Xia)
  - Database of Interacting Proteins
  - Most reliable data set
- BIND (Abhijit Erande, Aaron Schoenhofer)
  - Biomolecular Interaction Network Database
  - Very large data set
  - Contains interactions, molecular complexes, and pathways

8 Background: Data Sources
- MINT
  - Molecular INTeraction database
  - Experimentally verified protein interactions
  - Evaluates confidence level
- IntAct
  - Not limited to binary interactions
  - Allows user submissions
- MIPS CYGD
  - Munich Information Center for Protein Sequences: Comprehensive Yeast Genome Database
  - Limited to yeast
  - Focuses on sequencing

9 Background: Tools
- Protégé
  - Open-source project
  - Graphical ontology editor
  - Interacts with OWL reasoner
  - Detailed API for modifying ontologies programmatically

10 Background: Tools
- Prompt
  - A Protégé plugin
  - Enables ontology mapping
  - Enables ontology comparison

11 Background: Related Work
- PSI-MI
  - Controlled vocabulary for PPI data
  - Not a proposed database structure
  - Decreases the strength of information
  - Helpful in defining relationships and keys

12 Methodology: Overview
[Diagram labels: DIP, BIND, MIPS, MINT, IntAct; Unified Ontology; Unified Data Set; Web Interface]
- Q: What interactions have been observed with protein A?
- Q: What experiments give evidence for a given interaction?

13 Methodology: Design
- Review the singular database schemata and determine strengths/weaknesses
- View data files
  - Native formats
  - PSI-MI formats
- Create a unified schema of the data sources
- Create the unified ontology in Protégé
- Create each singular database as a subset of the unified ontology

14 Protégé Screenshot

15 Methodology: Data Import
- DOMParser: load data from XML
- Protégé-OWL API: insert entities into singular databases
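The XML-loading step can be sketched roughly as follows. This uses Python's standard `xml.dom.minidom` as a stand-in for the DOMParser named on the slide, and plain dictionaries in place of Protégé-OWL individuals. The XML layout, element names, and sample values are invented for illustration; they are not the PSI-MI schema or any source database's native format.

```python
from xml.dom.minidom import parseString

# Hypothetical, heavily simplified input; real source files are far richer.
SAMPLE_XML = """<interactions>
  <interaction id="i1" method="two-hybrid">
    <participant>A</participant>
    <participant>B</participant>
  </interaction>
  <interaction id="i2" method="pull-down">
    <participant>A</participant>
    <participant>C</participant>
  </interaction>
</interactions>"""


def load_interactions(xml_text, source_db):
    """Parse the XML into a DOM tree and return one dict per <interaction>."""
    doc = parseString(xml_text)
    records = []
    for node in doc.getElementsByTagName("interaction"):
        # Each <participant> holds a single text node with the protein name.
        participants = [
            p.firstChild.data.strip()
            for p in node.getElementsByTagName("participant")
        ]
        records.append({
            "id": node.getAttribute("id"),
            "method": node.getAttribute("method"),
            "participants": participants,
            "source": source_db,
        })
    return records


records = load_interactions(SAMPLE_XML, "DIP")
```

In the workflow the slide describes, each extracted record would then be inserted as an individual into the ontology for its source database rather than kept as a dictionary.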

16 Methodology: Transformation
- Use Prompt to create a mapping for each specific data source to the unified ontology
- Use Prompt mappings to insert individuals from each singular ontology into the unified model
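The idea of a per-source mapping into the unified model can be illustrated with a small sketch. This is not how Prompt stores or applies its mappings; it only shows the general technique of renaming source-specific fields to unified ones. All field names on both sides (`interactor_a`, `molA`, `protein_1`, and so on) are hypothetical.

```python
# One mapping per data source: source field name -> unified field name.
# Every name here is invented for illustration.
FIELD_MAPPINGS = {
    "DIP": {
        "interactor_a": "protein_1",
        "interactor_b": "protein_2",
        "technique": "method",
    },
    "BIND": {
        "molA": "protein_1",
        "molB": "protein_2",
        "exp_type": "method",
    },
}


def to_unified(record, source):
    """Rename a source record's fields to the unified names; drop unmapped fields."""
    mapping = FIELD_MAPPINGS[source]
    unified = {mapping[key]: value for key, value in record.items() if key in mapping}
    unified["source"] = source  # keep provenance so sources can be compared later
    return unified


dip_record = {"interactor_a": "A", "interactor_b": "B", "technique": "two-hybrid"}
bind_record = {"molA": "A", "molB": "C", "exp_type": "pull-down", "internal_id": 17}

unified_records = [to_unified(dip_record, "DIP"), to_unified(bind_record, "BIND")]
```

Supporting an additional data source then amounts to adding one more entry to `FIELD_MAPPINGS`, which mirrors the objective of letting others add sources easily.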

17 Methodology: Transformation
- Duplicate data
  - Need to fill in attributes on existing records
  - Write ‘Algorithm Plugin’ for Prompt to determine when individuals are the same
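A minimal sketch of the duplicate-handling idea follows. It is not the Prompt plugin itself. It assumes, purely for illustration, that two records describe the same interaction when they name the same unordered pair of proteins; a real matching rule would likely need to reconcile protein identifiers across databases first. The record fields are hypothetical.

```python
def interaction_key(record):
    """Identity rule (an assumption): same unordered protein pair = same interaction."""
    return frozenset((record["protein_1"], record["protein_2"]))


def merge_records(records):
    """Merge duplicates, filling in attributes that earlier records left empty."""
    merged = {}
    for rec in records:
        target = merged.setdefault(interaction_key(rec), {"sources": []})
        for field_name, value in rec.items():
            if field_name == "source":
                # Track every database that reports this interaction.
                if value not in target["sources"]:
                    target["sources"].append(value)
            elif target.get(field_name) is None:
                # Only fill gaps; never overwrite an existing value.
                target[field_name] = value
    return list(merged.values())


# Toy input, invented for this sketch.
incoming = [
    {"protein_1": "A", "protein_2": "B", "method": None, "source": "DIP"},
    {"protein_1": "B", "protein_2": "A", "method": "two-hybrid", "source": "BIND"},
    {"protein_1": "A", "protein_2": "C", "method": "pull-down", "source": "MINT"},
]
merged_records = merge_records(incoming)
```

Here the first two records collapse into one: the missing `method` is filled in from the second record, and `sources` lists both databases, which also gives a simple count of how many sources agree on an interaction.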

18 Prompt Screenshot - Mapping

19 Methodology: Query Interface
- Export Protégé data into MySQL
- Web interface for collecting data
- Working with domain experts to determine useful views, queries
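The two example questions from the methodology overview (which interactions involve a given protein, and which experiments support a given interaction) could be answered over a relational export roughly as sketched below. The sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL. The table layout, column names, and rows are assumptions made for illustration, not the project's schema or real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL export
conn.executescript("""
CREATE TABLE interaction (
    id INTEGER PRIMARY KEY,
    protein_1 TEXT NOT NULL,
    protein_2 TEXT NOT NULL
);
CREATE TABLE evidence (
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    method TEXT NOT NULL,
    source_db TEXT NOT NULL
);
-- Toy rows, invented for this sketch.
INSERT INTO interaction VALUES (1, 'A', 'B'), (2, 'C', 'A'), (3, 'B', 'D');
INSERT INTO evidence VALUES
    (1, 'two-hybrid', 'DIP'),
    (1, 'pull-down', 'BIND'),
    (2, 'two-hybrid', 'MINT');
""")


def partners_of(protein):
    """Q: what interactions have been observed with this protein?"""
    rows = conn.execute(
        """SELECT CASE WHEN protein_1 = ? THEN protein_2 ELSE protein_1 END
           FROM interaction
           WHERE protein_1 = ? OR protein_2 = ?
           ORDER BY 1""",
        (protein, protein, protein),
    )
    return [row[0] for row in rows]


def evidence_for(interaction_id):
    """Q: what experiments give evidence for a given interaction?"""
    rows = conn.execute(
        "SELECT method, source_db FROM evidence "
        "WHERE interaction_id = ? ORDER BY source_db",
        (interaction_id,),
    )
    return rows.fetchall()
```

With the toy rows above, `partners_of("A")` returns `["B", "C"]` and `evidence_for(1)` returns the BIND and DIP evidence rows. A web interface would wrap functions like these behind request handlers.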

20 Evaluation
- Performance
  - Transformation time in Protégé
  - Query time for web interface
- Size
  - Minimize redundancy in data model
  - Minimize duplicate data

21 Evaluation
- Correctness
  - Domain experts: Dr. Brown, Dr. Wang
  - Maintain proper data relationships
- Utility
  - Enrich data

22 Evaluation

23 Demonstration

24 Future Work
- Complete transformations
- Import data
- Evaluate ontology
- Add other databases to model

25 Conclusions
- Adequate start
- Needs improvement, evolution, more data sources
- As the project matures, the ontology will be ready for use in the biological domain
- Will be able to more easily gain information about protein-protein interactions

26 References
- AAAI.org, AITopics: “Ontology” http://www.aaai.org/AITopics/html/ontol.html
- Protégé http://protege.stanford.edu/overview/protege-owl.html
- Prompt http://protege.cim3.net/cgi-bin/wiki.pl?Prompt
- PSI-MI http://psidev.sourceforge.net/mi/xml/doc/user

27 References
- BIND http://www.bind.ca
- DIP http://www.dip.doe-mbi.ucla.edu
- IntAct http://www.ebi.ac.uk/intact/site/
- MINT http://mint.bio.uniroma2.it/mint/Welcome.do
- MIPS http://mips.gsf.de/genre/proj/yeast

28 Q & A

