Research Collaboratory for Structural Bioinformatics
Macromolecular Structure Middleware
OpenMMS: An Ontology Driven Architecture

Overview
- The mmCIF Ontology
- OpenMMS Toolkit
- Macromolecular Structure (MMS) Metamodel
- Parser, XML, SQL / CORBA Servers and Clients
- CORBA
- UML and the future...

How do we “Enable” Science?
- Promote well-defined Macromolecular Structure (MMS) specifications
- Distribution: Open Interfaces
  - Now: flat files, W3 browsing and searching
  - Future: XML, SQL, CORBA

Why OpenMMS?
- Allows programmers to create efficient, high-performance, and robust applications more easily.
- A Java-only toolkit that creates XML, CORBA, and relational DB representations of the mmCIF Macromolecular Structure data.
- Source code is publicly available, so users can easily modify the metamodel or create an entirely new one.

What Do We Mean by an Ontology Driven Architecture?
What do we mean by an Ontology?
A bridge between our world of natural language and the world of machines.

mmCIF Dictionary and Data Files
- Based on the Ontology for Macromolecular Structure defined by the International Union of Crystallography
- Replaces the older 80-column PDB files
- The mmCIF Dictionary contains over 140 Category and 1600 Item definitions
- Open, extensible
- Provides a well-defined reference standard for data distribution

OpenMMS Toolkit Data Flow
[Diagram: mmCIF Data Files (Reference Standard), mmCIF Parsers, XML Files, Relational Database, CORBA Server, Applications]

Metamodel Information Flow
[Diagram: mmCIF Dictionary, mmCIF Ontology Metamodel, Metamodel Framework, CORBA IDL, SQL Schema, XML DTD, Java Data Loaders, JDBC Loaders]

What can OpenMMS do?
- The PDBase program will load any or all PDB files into any SQL-92 compatible database (Oracle, MySQL, Sybase...)
- Translate any PDB file into an XML file.
- Contains two CORBA servers:
  - The Reference server will cache and serve data read from PDB flat files.
  - The DB server will cache and serve data read from a SQL database (very quickly...)
- All source code is written in Java and publicly available.

Some Advantages of Using an Ontology Driven Architecture
- Scales to very large ontologies
- More reliable and maintainable code
- Transfer between representations
- Scientific correctness of representation
- Help in maintaining backward compatibility

How does one actually represent an ontology? (OpenMMS Internal Metamodel Overview)
[Diagram: Root, Module, Interface, Field, Struct, Field, Visitor Abstract Class, Visitor Subclass]

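The node and visitor names on this slide suggest the Visitor design pattern: the metamodel is a tree of nodes, and each target representation is produced by its own visitor subclass. The following is a minimal sketch under that assumption. All class names here (MetaNode, MetaStruct, MetaField, MetaVisitor, SqlSchemaVisitor) and the type mapping are hypothetical and are not taken from the OpenMMS source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor base class; one subclass per generated representation.
abstract class MetaVisitor {
    abstract void visitStruct(MetaStruct s);
    abstract void visitField(MetaField f);
}

// Hypothetical metamodel node; every node accepts a visitor.
abstract class MetaNode {
    final String name;
    MetaNode(String name) { this.name = name; }
    abstract void accept(MetaVisitor v);
}

class MetaField extends MetaNode {
    final String type; // illustrative type names: "string" or "float"
    MetaField(String name, String type) { super(name); this.type = type; }
    void accept(MetaVisitor v) { v.visitField(this); }
}

class MetaStruct extends MetaNode {
    final List<MetaField> fields = new ArrayList<>();
    MetaStruct(String name) { super(name); }
    MetaStruct add(String fieldName, String type) {
        fields.add(new MetaField(fieldName, type));
        return this;
    }
    void accept(MetaVisitor v) { v.visitStruct(this); }
}

// Visitor subclass that emits a SQL table definition for a struct.
class SqlSchemaVisitor extends MetaVisitor {
    private final StringBuilder out = new StringBuilder();
    private boolean firstColumn;

    void visitStruct(MetaStruct s) {
        out.append("CREATE TABLE ").append(s.name).append(" (");
        firstColumn = true;
        for (MetaField f : s.fields) f.accept(this);
        out.append(");");
    }

    void visitField(MetaField f) {
        if (!firstColumn) out.append(", ");
        firstColumn = false;
        // Assumed mapping: float becomes REAL, everything else VARCHAR(255).
        String sqlType = f.type.equals("float") ? "REAL" : "VARCHAR(255)";
        out.append(f.name).append(' ').append(sqlType);
    }

    String result() { return out.toString(); }
}
```

A second visitor subclass could walk the same nodes to emit IDL or a DTD, which is one plausible way a single metamodel can drive several representations.
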
mmCIF Parsers
- General-purpose, low-level access to data
- Parsers available in many languages
- The OpenMMS toolkit includes a Java parser
  - Uses the “Builder” design pattern
  - An application subclasses the abstract Builder class and stores data into its own data structures

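The Builder arrangement described on this slide can be sketched as follows. This is a simplified illustration, not the OpenMMS parser: the class and method names (CifBuilder, SimpleCifParser, MapBuilder) are hypothetical, and the parser handles only "data_" headers and single-line "_category.item value" pairs, ignoring loops, quoted strings, and multi-line values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstract builder: the parser calls these methods as it reads.
abstract class CifBuilder {
    abstract void beginDataBlock(String name);
    abstract void item(String category, String item, String value);
}

// Simplified parser that knows nothing about how the data is stored.
class SimpleCifParser {
    static void parse(String text, CifBuilder builder) {
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            if (line.startsWith("data_")) {
                builder.beginDataBlock(line.substring(5));
            } else if (line.startsWith("_")) {
                String[] parts = line.split("\\s+", 2);
                if (parts.length < 2) continue; // value on a later line: not handled in this sketch
                int dot = parts[0].indexOf('.');
                if (dot < 0) continue;
                builder.item(parts[0].substring(1, dot), parts[0].substring(dot + 1), parts[1]);
            }
        }
    }
}

// Application-side subclass that stores items in its own map.
class MapBuilder extends CifBuilder {
    String block;
    final Map<String, String> values = new LinkedHashMap<>();
    void beginDataBlock(String name) { block = name; }
    void item(String category, String item, String value) {
        values.put(category + "." + item, value);
    }
}
```

An application that only needs a few categories can ignore the rest in its builder, so the parser never has to allocate structures the application does not want.
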
MMS in XML
- Large flat files (open and close tags)
- Tables can be grouped by rows or columns
- XML from SQL query
  - Many requests from Web browsers don’t really need or want all the data
  - Software is available from DB vendors and ISVs for creating XML files from SQL result sets
  - Smaller files load faster

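As a rough illustration of the row-grouped layout, the sketch below turns a small result set into XML. It is an assumption-laden example: the rows are plain in-memory maps rather than a JDBC result set, the class name RowsToXml and the element names are invented, and the output does not follow any OpenMMS DTD.

```java
import java.util.List;
import java.util.Map;

class RowsToXml {
    // Escape the characters that are unsafe in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Row-grouped layout: one <row> element per row, one child element per column.
    static String toXml(String table, List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(table).append(">\n");
        for (Map<String, String> row : rows) {
            sb.append("  <row>");
            for (Map.Entry<String, String> col : row.entrySet()) {
                sb.append('<').append(col.getKey()).append('>')
                  .append(escape(col.getValue()))
                  .append("</").append(col.getKey()).append('>');
            }
            sb.append("</row>\n");
        }
        sb.append("</").append(table).append('>');
        return sb.toString();
    }
}
```

Because the query selects only the columns a client asked for, the resulting document stays small, which is the point of the "XML from SQL query" bullet.
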
Relational DB Expression
- SQL-92 compatible
- Schemas for all the standard DB vendors
- Fast and flexible keyword searches
- The PDBase loader allows structures to be selectively loaded
- Oracle instance tested
  - 14,556 structures
  - 16 GB, 88 million atom records

A very high-level (and very rough) classification of communication
- Person-to-Person communication
- Person-to-Machine communication
  - HTTP/HTML
- Machine-to-Machine communication
  - CORBA, SQL, .NET, SOAP
- Not communications -> data formats
  - XML, mmCIF (STAR), many more …

What is CORBA?
Common Object Request Broker Architecture
Defines a family of open software interface specifications for distributed object computing.

What is an Object?
“A Data Structure with an Attitude”
Programs = Algorithms + Data Structures
Object-oriented programming principle: group each part of an algorithm with the data structures it uses

Side View of a Distributed Application
[Diagram: Client (e.g. a Java applet), Middleware, IDL, Network: Internet (TCP/IP), Middleware, Server (e.g. mainframe computer)]

The “Hourglass” view of the Internet
[Diagram labels: Applications; OO High-Level Interface; HTTP, CORBA, .NET; Reliable Bitstream; TCP, RTP, ...; Unreliable Datagrams; IP; (ATM, Ethernet, V.90, SONET...); Copper, Glass, Radio Spectrum]

Where is CORBA?
- Inside every Java Runtime Environment.
- Commonly used in middle-tier and backend (e.g. database) connections.
- Open source and commercial implementations available
- Usually buried deep inside the software
  - Difficult or impossible to tell when it is being used

What is Distributed Object Computing?
- Extends the benefits of object-oriented technology across process and machine boundaries to encompass entire networks.
- Attempts to make remote objects appear to programmers as if they were local objects in the same process. This is called location transparency.

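Location transparency can be illustrated without a real ORB. In the sketch below, client code is written against an interface and cannot tell whether it holds the local object or a stub that forwards the call. Everything here is hypothetical: the interface EntryLookup, the string-based request format, and the in-process "transport" function are stand-ins for what IDL-generated stubs, skeletons, and IIOP would provide in CORBA.

```java
import java.util.Map;
import java.util.function.Function;

// The interface is all the client sees; in CORBA it would be generated from IDL.
interface EntryLookup {
    String title(String entryId);
}

// Ordinary in-process implementation.
class LocalEntryLookup implements EntryLookup {
    private final Map<String, String> titles;
    LocalEntryLookup(Map<String, String> titles) { this.titles = titles; }
    public String title(String entryId) { return titles.getOrDefault(entryId, "unknown"); }
}

// Stand-in for a client stub: marshals the call into a request string
// and hands it to a transport function.
class EntryLookupStub implements EntryLookup {
    private final Function<String, String> transport;
    EntryLookupStub(Function<String, String> transport) { this.transport = transport; }
    public String title(String entryId) { return transport.apply("title:" + entryId); }
}

// Stand-in for a server skeleton: unmarshals the request and calls the real object.
class EntryLookupSkeleton {
    private final EntryLookup target;
    EntryLookupSkeleton(EntryLookup target) { this.target = target; }
    String handle(String request) {
        String[] parts = request.split(":", 2);
        if (parts.length == 2 && parts[0].equals("title")) return target.title(parts[1]);
        throw new IllegalArgumentException("unsupported request: " + request);
    }
}

class LookupClient {
    // Client code is identical whether 'lookup' is the local object or a stub.
    static String describe(EntryLookup lookup, String id) {
        return id + ": " + lookup.title(id);
    }
}
```

In a real deployment the transport would cross a process or machine boundary, and failures and latency would make the remote case observably different; the sketch shows only the programming-model idea.
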
Advantages of Distributed Object Computing
- Easier (and faster) for programmers to create distributed applications
- Increases reliability
- Increases maintainability
- Increases portability
- Increases extensibility

The Alphabet Soup
- OMG = Object Management Group (consortium of 800+ companies, founded in ...)
- IDL = Interface Definition Language

Boundaries, Interfaces
- The key is to focus on boundaries and interfaces: how things fit together
- Not on the internal details of how they’re built; assume those will be diverse and changing
[Diagram label: Shape of boundary is defined in IDL]

Boundaries, Interfaces
- The interface to an object can be distributed over a network
- The glue that binds the parts together is the ORB
[Diagram label: Shape of boundary is defined in IDL]

CORBA Independence
- Open standard for distributed object-oriented design
- Independent of hardware platform
- Independent of operating system
- Independent of programming language
- Independent of object location

Object Request Broker
- ORBs mediate between objects and the things that use them (clients)
[Diagram: Client, Object, IDL, Object Request Broker]

Terminology
- IIOP: the Internet Inter-ORB Protocol, defined in the spec as a vendor-independent, wire-level network protocol on top of TCP/IP. This allows ORB implementations from different vendors to interoperate.

ORBs: Medium for Integration
[Diagram: Java, Perl, C++, C, Ada, Java, VB, ActiveX; ORB; CORBA / IIOP (Internet Inter-ORB Protocol)]

CORBA Facilities: Industry Standards in Vertical Markets
- Manufacturing
- Finance
- Life Sciences Research
- C4I
- Many others...

Using CORBA to access Macromolecular Structure Data
- No parsing of flat files
- Direct access to binary data structures
- Strongly typed data
- Granularity of access
- Indices and presence flags pre-computed
- Highest performance

OMG/LSR Macromolecular Structure Adoption Process
- August 1999: RFP issued
- March 2000: Initial Submission
- September 2000: Revised Submission
- February 2001: Adopted Spec by the OMG
- 4Q 2001: OpenMMS LSR/MMS 1.0 compliant implementation source code publicly available
- February 2002: Approved as a Formal OMG Available Specification

Using the CORBA MMS Server
An excerpt from a legacy PDB-formatted file, ATOM records (4hhb.ent):
    ...
    ATOM 6 CG1 VAL A
    ATOM 7 CG2 VAL A
    ATOM 8 N LEU A
    ATOM 9 CA LEU A
    ATOM 10 C LEU A
    ATOM 11 O LEU A
    ATOM 12 CB LEU A
    ATOM 13 CG LEU A
    ATOM 14 CD1 LEU A
    ATOM 15 CD2 LEU A

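For contrast with the typed CORBA access, this is roughly what a client must do to read one legacy ATOM record by hand. The sketch assumes the standard fixed-width PDB column layout (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, coordinates in 31-54). The class name PdbAtom is hypothetical, and the sketch does no validation of short or malformed lines.

```java
// Minimal fixed-column reader for a single PDB ATOM record (sketch, no validation).
class PdbAtom {
    final int serial;
    final String name;
    final String resName;
    final char chainId;
    final int resSeq;
    final float x, y, z;

    PdbAtom(String line) {
        serial  = Integer.parseInt(line.substring(6, 11).trim());  // columns 7-11
        name    = line.substring(12, 16).trim();                   // columns 13-16
        resName = line.substring(17, 20).trim();                   // columns 18-20
        chainId = line.charAt(21);                                 // column 22
        resSeq  = Integer.parseInt(line.substring(22, 26).trim()); // columns 23-26
        x = Float.parseFloat(line.substring(30, 38).trim());       // columns 31-38
        y = Float.parseFloat(line.substring(38, 46).trim());       // columns 39-46
        z = Float.parseFloat(line.substring(46, 54).trim());       // columns 47-54
    }
}
```

Every client that reads flat files repeats this column arithmetic and string-to-number conversion, which is the work a server returning typed structures removes.
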
LSR/MMS “ATOM Record”
DsLSRMacromolecularStructure.idl excerpt:
    struct AtomSite {
        string id;
        IndexId type_symbol;
        AtomIndex label;
        IndexId label_entity;
        VectorXYZ cartn;
        float occupancy;
        float b_iso_or_equiv;
    };

Example Code and Resulting Output
    // Fetch the entry by its PDB ID, then list every atom site.
    Entry e = entryFactory.get_entry_from_id("4hhb");
    AtomSite[] a = e.get_atom_site_list();
    for (int i = 0; i < a.length; i++) {
        // Print atom id, element symbol, and Cartesian coordinates.
        System.out.println(a[i].id + " " + a[i].type_symbol.id + " ("
            + a[i].cartn.x + ", " + a[i].cartn.y + ", " + a[i].cartn.z + ")");
    }
produces:
    1 N (11.065, 7.352, 9.598)
    2 C (12.436, 7.764, 9.902)
    3 C (12.883, 7.09, )
    4 O (12.088, 7.0, )
    5 C (12.611, 9.264, 10.06)
    ...

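To try the printing logic without a CORBA server, the IDL types can be replaced by hand-written stand-ins. The sketch below is an assumption: the real classes are generated from the IDL by an IDL-to-Java compiler and carry more fields; only the fields the example touches are modeled here, and AtomSitePrinter is a hypothetical helper.

```java
// Hand-written stand-ins for IDL-generated types (only the fields used below).
class IndexId {
    String id;
    IndexId(String id) { this.id = id; }
}

class VectorXYZ {
    float x, y, z;
    VectorXYZ(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

class AtomSite {
    String id;
    IndexId type_symbol;
    VectorXYZ cartn;
    AtomSite(String id, IndexId typeSymbol, VectorXYZ cartn) {
        this.id = id;
        this.type_symbol = typeSymbol;
        this.cartn = cartn;
    }
}

class AtomSitePrinter {
    // Same formatting as the slide's println call, returned as a string.
    static String format(AtomSite a) {
        return a.id + " " + a.type_symbol.id + " ("
            + a.cartn.x + ", " + a.cartn.y + ", " + a.cartn.z + ")";
    }
}
```

The client reads typed fields directly (a float is already a float), so no column parsing or string conversion appears anywhere in the code.
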
What are the alternatives to CORBA?
- TCP/IP sockets: byte stream
- DCOM, COM+, OLE, .NET (Microsoft only)
  - DCOM/CORBA bridges are available from several vendors
- SOAP (Simple Object Access Protocol)
  - XML based

Unified Modeling Language (UML)
What do all those arrows and boxes mean?
- Schematic language for defining software
- Graphical representations
- UML = Things, Relations and Diagrams
- 9 types of diagrams
- The most commonly used diagram is the “Class Diagram”

UML Class Diagram Example
[Class diagram]
EntryFactory
    get_version()
    get_entry_id_list()
    get_entry_modification_dates()
    native_formats_supported()
    get_native_entry_representation()
EntryIdList * EntryId
Identifier
ModificationDateList * ModificationDate
ModificationDate
    Entry_id : EntryId
    date : TimeBase::TimeT

UML Class Diagram Basics
[Class box]
Class_Name (underlined for class instances, italics for abstract classes)
Variables:
    var1 : Type
    var2 : Type
Methods:
    method1()
    method2()
    method3()
Details may be omitted if not important

UML Relationships
[Diagram: Dependency, Association, Generalization (Inheritance), Aggregation; multiplicity labels: *, *, 0..1]

UML Example
[Class diagram]
EntryFactory
    get_version()
    get_entry_id_list()
    get_entry_modification_dates()
    native_formats_supported()
    get_native_entry_representation()
EntryIdList * EntryId
Identifier
ModificationDateList * ModificationDate
ModificationDate
    Entry_id : EntryId
    Date : TimeBase::TimeT

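One way to read such a class diagram is to translate it into code: a class box with attributes becomes a data class, a box with operations becomes an interface, and a "*" multiplicity becomes a collection. The sketch below does this for part of the diagram. It is illustrative only: the Java names, the return types (which the diagram does not show), and the use of String and long for EntryId and TimeBase::TimeT are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Class box with two attributes: entry_id : EntryId and date : TimeBase::TimeT.
class ModificationDate {
    final String entryId; // EntryId modeled as a plain String (assumption)
    final long date;      // stands in for TimeBase::TimeT (assumption)
    ModificationDate(String entryId, long date) {
        this.entryId = entryId;
        this.date = date;
    }
}

// Class box with operations only; return types are guesses.
interface EntryFactorySketch {
    String getVersion();
    List<String> getEntryIdList();                       // "*" multiplicity becomes a List
    List<ModificationDate> getEntryModificationDates();  // "*" multiplicity becomes a List
}

// Trivial in-memory implementation, just enough to exercise the interface.
class InMemoryEntryFactory implements EntryFactorySketch {
    private final List<ModificationDate> dates = new ArrayList<>();

    void register(String entryId, long date) {
        dates.add(new ModificationDate(entryId, date));
    }

    public String getVersion() { return "sketch-1"; }

    public List<String> getEntryIdList() {
        List<String> ids = new ArrayList<>();
        for (ModificationDate d : dates) ids.add(d.entryId);
        return ids;
    }

    public List<ModificationDate> getEntryModificationDates() {
        return Collections.unmodifiableList(dates);
    }
}
```
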
XMI: XML Metadata Interchange
- UML is a graphical representation; some way is needed to exchange UML models between applications
- XMI is used to store and transmit UML models
- XML based
- Defines XML tags for classes, relationships between classes, etc.

OMG MDA
- Platform Independent Models (PIMs) that define the interface are defined in UML
- The PIMs are translated to Platform Specific Models (PSMs) such as CORBA, SOAP, .NET, or XML Schemas
- The CORBA servers and clients may be the same, but now the interface is defined in UML and the IDL is then generated from the UML

MDA Platform Independent to Platform Dependent Translation
[Diagram: UML translated to CORBA, SOAP, XML, .NET]

Thanks and Acknowledgments
- Phil Bourne
- John Westbrook
- David Benton
- Karl Konnerth
- Lynn TenEyck