SRI International Bioinformatics. The Ocelot Frame Knowledge Representation System. Peter D. Karp, Ph.D., Bioinformatics Research Group, SRI International.

Presentation transcript:

1 The Ocelot Frame Knowledge Representation System
Peter D. Karp, Ph.D., Bioinformatics Research Group, SRI International

2 Frame Knowledge Representation Systems
- Long history of development in the AI knowledge representation community
- Distant cousin of object-oriented databases (convergent evolution)
- Background reading on frame systems
  - P. Karp, The design space of frame knowledge representation systems
  - P. Karp, Distinguishing Knowledge Bases and Data Bases: Who's on First and What's on Second

3 Ocelot Information
- P.D. Karp et al., A collaborative environment for authoring large knowledge bases, J Intelligent Information Systems 13:
- Ocelot Users Guide

4 Pathway Tools Architecture
[Architecture diagram; component labels:]
- Pathway Genome Navigator: Web Mode, Desktop Mode
- Protein Editor, Pathway Editor, Reaction Editor
- Lisp API, PerlCyc API, JavaCyc API
- Generic Frame Protocol
- Ocelot DBMS
- Oracle or MySQL; Disk File

5 Ocelot Data Model
- Ocelot database
  - Aka DB, Knowledge Base, KB, PGDB
- An Ocelot database is a collection of frames and slots

6 Ocelot Frames
- Two kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- A symbolic frame name (id, key) uniquely identifies each frame
  - Examples: EG10223, TRP, Proteins
- Classes have Superclass(es), Subclass(es), Instance(s)
- Instances have one or more parent classes
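The class/instance structure above can be sketched in a few lines of Python. This is only an illustration, not Ocelot's implementation: the `Frame` and `KB` names, the hyphenated frame names, the choice of `Pathways` as parent of the TCA cycle, and the `example-biosynthesis` instance are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str                                   # unique symbolic frame name (id, key)
    is_class: bool                              # True for classes, False for instances
    parents: set = field(default_factory=set)   # names of direct parent classes


class KB:
    """Toy knowledge base: a dictionary of frames keyed by frame name."""

    def __init__(self):
        self.frames = {}

    def add(self, name, is_class, parents=()):
        if name in self.frames:
            raise ValueError(f"frame name already in use: {name}")
        for p in parents:
            if p not in self.frames or not self.frames[p].is_class:
                raise ValueError(f"parent must be an existing class: {p}")
        self.frames[name] = Frame(name, is_class, set(parents))
        return self.frames[name]

    def all_parents(self, name):
        """Return direct and inherited parent classes of a frame."""
        seen = set()
        stack = list(self.frames[name].parents)
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(self.frames[p].parents)
        return seen

    def instances_of(self, cls):
        """Return names of all instances below a class, at any depth."""
        return {f.name for f in self.frames.values()
                if not f.is_class and cls in self.all_parents(f.name)}


kb = KB()
kb.add("Genes", is_class=True)
kb.add("Pathways", is_class=True)
kb.add("Biosynthetic-Pathways", is_class=True, parents=["Pathways"])
kb.add("trpA", is_class=False, parents=["Genes"])
kb.add("TCA-cycle", is_class=False, parents=["Pathways"])                      # parent assumed
kb.add("example-biosynthesis", is_class=False, parents=["Biosynthetic-Pathways"])  # hypothetical
print(sorted(kb.instances_of("Pathways")))
```

Keying every frame by its name is what makes the name act as an identifier: a second `add` with the same name is rejected.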

7 Slots
- Encode attributes and properties of a frame
  - Molecular weight, gene coordinates, comments
- Represent relationships between frames
  - The value of a slot is the identifier of another frame

8 Slots
- Number of values
  - Single valued
  - Multivalued: sets or lists
- Slot values
  - Integer, real, string, symbol (frame name)
- Every slot is described by a slot frame (slotunit) in a KB that defines meta information about that slot
  - Datatype, classes it pertains to, constraints
  - Enumerations
  - Two slots are inverses if they encode opposite relationships
    - Slot Product in class Genes
    - Slot Gene in class Polypeptides
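A minimal Python sketch of multivalued slots and inverse slots, using the Product/Gene pair from the slide. The `SlotKB` class, its method names, and the polypeptide frame name `TrpA-polypeptide` are hypothetical. Keeping the inverse slot updated automatically is an assumption about how inverses would be maintained, not a description of Ocelot internals.

```python
from collections import defaultdict


class SlotKB:
    """Toy store: frame name -> slot name -> list of values."""

    def __init__(self):
        self.values = defaultdict(lambda: defaultdict(list))
        self.inverse = {}   # slot name -> name of its inverse slot

    def declare_inverse(self, slot_a, slot_b):
        # Record that slot_a and slot_b encode opposite relationships.
        self.inverse[slot_a] = slot_b
        self.inverse[slot_b] = slot_a

    def add_value(self, frame, slot, value):
        vals = self.values[frame][slot]
        if value not in vals:
            vals.append(value)
        inv = self.inverse.get(slot)
        if inv is not None:
            # The value is a frame name; record the back-reference on it.
            back = self.values[value][inv]
            if frame not in back:
                back.append(frame)

    def get(self, frame, slot):
        return list(self.values[frame][slot])


slots = SlotKB()
slots.declare_inverse("Product", "Gene")
slots.add_value("trpA", "Product", "TrpA-polypeptide")   # hypothetical frame name
print(slots.get("TrpA-polypeptide", "Gene"))
```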

9 Ocelot Data Model
- Frame data model compared to relational model:
  - Minimizes size of schema relative to semantic complexity
  - Inheritance lets us define new classes by modifying existing classes
  - Relational normalization breaks multivalued attributes into separate tables; this is not needed in the frame data model

10 Ocelot Schema
- Schema is stored within the DB
- Schema is self documenting
- Slot frames define metadata about slots
- Schema evolution facilitated by
  - Easy addition/removal of slots, or alteration of slot datatypes
  - Flexible data formats that do not require dumping/reloading of data
  - New versions of Pathway Tools include a schema upgrade function
    - Updates schema to match that of new MetaCyc version
    - Transforms data into new schema

11 Ocelot Storage System Architecture
- Persistent storage via disk files or Oracle or MySQL
- Oracle or MySQL (RDBMS KBs)
  - Concurrent development by multiple users
  - Incrementally fault in frames as referenced by the application
  - Incrementally save modified frames only
  - Stores complete transaction history of PGDB
- Disk files
  - Updating by a single user at a time
  - Read in entirety at start of session
  - Write in entirety at every save

12 [Figure: multiple users tapping into one MySQL server]

13 Ocelot Storage Subsystem RDBMS KBs
- RDBMS schema is independent of application schema
- DBMS is submerged within Ocelot, invisible to users
- Frames transferred from DBMS to Ocelot
  - On demand
  - By background prefetcher
  - Memory cache
  - Persistent disk cache speeds performance via Internet

14 Ocelot Frame Faulting
- When a frame is referenced by Pathway Tools
  - Look in Ocelot virtual memory
  - Look in disk cache
  - Look in RDBMS
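The lookup order above can be sketched as a three-tier read path. In this Python illustration plain dictionaries stand in for the disk cache and the RDBMS, and the `FaultingStore` name is hypothetical. Copying a frame into the disk cache after an RDBMS fetch is an assumption the slide does not state.

```python
class FaultingStore:
    """Toy frame store that faults frames in from slower tiers on first reference."""

    def __init__(self, disk_cache, rdbms):
        self.memory = {}              # stands in for Ocelot virtual memory
        self.disk_cache = disk_cache  # dict standing in for the persistent disk cache
        self.rdbms = rdbms            # dict standing in for Oracle or MySQL
        self.trace = []               # (frame name, tier that answered), for inspection

    def get_frame(self, name):
        # 1. Look in memory.
        if name in self.memory:
            self.trace.append((name, "memory"))
            return self.memory[name]
        # 2. Look in the disk cache.
        if name in self.disk_cache:
            frame, source = self.disk_cache[name], "disk-cache"
        # 3. Look in the RDBMS.
        elif name in self.rdbms:
            frame, source = self.rdbms[name], "rdbms"
            self.disk_cache[name] = frame   # assumption: populate the cache on a miss
        else:
            raise KeyError(name)
        self.memory[name] = frame
        self.trace.append((name, source))
        return frame


store = FaultingStore(disk_cache={"F1": {"MW": [3]}},
                      rdbms={"F1": {"MW": [3]}, "F2": {"MW": [7]}})
store.get_frame("F2")
store.get_frame("F2")
store.get_frame("F1")
print(store.trace)
```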

15 Ocelot RDBMS Transaction History
- RDBMS KBs store complete transaction history
- Stored as sequences of GFP operations executed by the user or by Pathway Tools
- Right click -> Show -> Changes in pop-up window
- Used to compute gene last-curated date
- Can be used to open a PGDB in an earlier state
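Storing history as a sequence of operations means an earlier state can be rebuilt by replaying that sequence up to a chosen transaction. The Python sketch below illustrates the idea only. The operation names, the tuple encoding, and the `state_at` helper are illustrative stand-ins, not the actual GFP operation set or Ocelot's storage format.

```python
def apply_op(kb, op):
    """Apply one logged operation to a KB held as {frame: {slot: [values]}}."""
    kind = op[0]
    if kind == "create-frame":
        kb[op[1]] = {}
    elif kind == "delete-frame":
        kb.pop(op[1], None)
    elif kind == "put-slot-values":
        _, frame, slot, values = op
        kb[frame][slot] = list(values)
    else:
        raise ValueError(f"unknown operation: {kind}")


def state_at(history, upto):
    """Rebuild the KB as of transaction id `upto` by replaying the log in order."""
    kb = {}
    for txn_id, ops in history:
        if txn_id > upto:
            break
        for op in ops:
            apply_op(kb, op)
    return kb


# Hypothetical log: (transaction id, list of operations), ordered by id.
history = [
    (1, [("create-frame", "P"), ("put-slot-values", "P", "MW", [3])]),
    (2, [("put-slot-values", "P", "MW", [4])]),
    (3, [("delete-frame", "P")]),
]
print(state_at(history, 1))
print(state_at(history, 2))
```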

16 Ocelot RDBMS Concurrency Control
- When user A saves updates:
  - Ocelot queries all transactions that occurred since A last saved or since the start of A's session
  - Ocelot compares the operations in those transactions with the updates made by A
  - If conflicts are found, save does not occur and conflicts are reported to the user
  - If no conflicts, save proceeds
  - Other user transactions are evaluated into A's session
    - Refresh
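The save procedure above resembles optimistic concurrency control, and a minimal Python sketch of that flow follows. It assumes a shared in-memory log in place of the RDBMS, and it simplifies the conflict test to "another user touched the same frame and slot". The `SharedLog` and `Session` names are hypothetical.

```python
class SharedLog:
    """Stands in for the transaction table kept in the RDBMS."""

    def __init__(self):
        self.transactions = []   # each transaction: list of (frame, slot, values)


class Session:
    def __init__(self, log):
        self.log = log
        self.kb = {}             # local view: (frame, slot) -> values
        self.pending = []        # updates made in this session, not yet saved
        for txn in log.transactions:
            self._evaluate(txn)
        self.seen = len(log.transactions)   # transactions already reflected locally

    def _evaluate(self, txn):
        for frame, slot, values in txn:
            self.kb[(frame, slot)] = list(values)

    def update(self, frame, slot, values):
        self.kb[(frame, slot)] = list(values)
        self.pending.append((frame, slot, list(values)))

    def save(self):
        """Return a list of conflicts; an empty list means the save went through."""
        others = self.log.transactions[self.seen:]       # transactions since last save
        mine = {(f, s) for f, s, _ in self.pending}
        conflicts = [(f, s) for txn in others for f, s, _ in txn if (f, s) in mine]
        if conflicts:
            return conflicts                             # save does not occur
        for txn in others:
            self._evaluate(txn)                          # refresh from other users
        self.log.transactions.append(list(self.pending))
        self.pending = []
        self.seen = len(self.log.transactions)
        return []


log = SharedLog()
a, b = Session(log), Session(log)
a.update("P", "MW", [4])
b.update("P", "pI", [6.5])
print(a.save(), b.save())   # different slots: both saves succeed
a.update("P", "MW", [5])
b.update("P", "MW", [7])
print(a.save(), b.save())   # same slot: the second save reports a conflict
```

A rejected save leaves the session's pending updates in place, so the user can inspect the reported conflicts before deciding what to do.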

17 Ocelot Update Conflicts
- Examples of conflicting updates:
  - User A deletes frame F; User B modifies a value in a slot of F
  - User A changes MW of protein P from 3 to 4; User B changes MW of protein P from 3 to 5
- Examples of updates that don't conflict:
  - User A updates frame E; User B updates frame F
  - User A updates the value of P.MW; User B updates the value of P.pI
  - Users A and B both delete all values of P.MW
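The examples above suggest a pairwise conflict rule, sketched here in Python. The tuple encoding of operations and the `conflicts` function are hypothetical. Two points are assumptions rather than statements from the slides: identical writes to the same slot are treated as compatible (generalizing the "both delete all values" example), and two deletions of the same frame are treated as compatible.

```python
def conflicts(op_a, op_b):
    """Decide whether two operations from different users conflict.

    Operations are ("delete-frame", frame) or ("put", frame, slot, values).
    """
    if op_a[1] != op_b[1]:
        return False                      # different frames never conflict
    kinds = {op_a[0], op_b[0]}
    if kinds == {"delete-frame"}:
        return False                      # assumption: same outcome, no conflict
    if "delete-frame" in kinds:
        return True                       # one deletes the frame, the other edits it
    _, _, slot_a, vals_a = op_a
    _, _, slot_b, vals_b = op_b
    if slot_a != slot_b:
        return False                      # same frame, different slots
    return list(vals_a) != list(vals_b)   # same slot: conflict unless results agree


# The slide's examples, in order.
print(conflicts(("delete-frame", "F"), ("put", "F", "MW", [4])))
print(conflicts(("put", "P", "MW", [4]), ("put", "P", "MW", [5])))
print(conflicts(("put", "E", "MW", [4]), ("put", "F", "MW", [4])))
print(conflicts(("put", "P", "MW", [4]), ("put", "P", "pI", [6.5])))
print(conflicts(("put", "P", "MW", []), ("put", "P", "MW", [])))
```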

18 Revert KB Operation
- Undoes all changes in current session

19 Pathway Tools / BioCyc Software/Database Bundles
- Each downloadable Pathway Tools configuration contains a combination of PGDBs
- Those PGDBs are loaded into Lisp virtual memory
- Build process:
  - Start Common Lisp
  - Load all Pathway Tools compiled Lisp code into virtual memory
  - Load all PGDBs for that configuration into virtual memory
  - Save virtual memory image as binary executable file

20 Full BioCyc or Tier Configuration
- 507 PGDBs loaded into virtual memory

21 BioCyc at 10,000 Genomes
- Scalability of current approach is limited
- New approach: For full BioCyc, store PGDBs not in virtual memory but in Franz AllegroCache
- AllegroCache is a Common Lisp object-oriented database
- Implementation now in hand for Ocelot
- We have done extensive performance testing
- Performance looks good to 10,000 PGDBs