4/12/2015 7:49 AM Architecting an RDF/OWL based Enterprise Conformance and Compliance Registry at the National Cancer Institute Cecil O. Lynch, MD, MS UC Davis Pathology Informatics NCI Chief Semantic Architect
Page 2 Outline What will we cover in this talk? –NCI Semantic Infrastructure Version 2 caGRID 2.0 – What’s new in 2? Services Aware Interoperability Framework (SAIF) – What does it do for semantics? Enterprise Conformance and Compliance Framework Registry (ECCF registry) – What is it and how do we use it? –Biomedical Research Integrated Domain Group Model (BRIDG) – How is OWL used in this model? What is the future for BRIDG OWL development? –What is the impact of the work at NCI for the Semantic Web and Ontology community?
Page 3 Semantic Infrastructure V2 Overarching/Core Requirements Lower the barrier-to-entry for participation in caBIG ® caBIG ® 1.x is too heavily front-loaded. Provide a “linear value proposition” to all stakeholders Easy things should be easy to do. Support legacy data and functionality Next-generation caBIG ® is evolution, not revolution Leverage caBIG ® 1.x Lessons Learned Leverage technology and semantic progress in the larger scientific and commercial communities
Page 4 Semantic Infrastructure 2.0 from 50,000 feet... Design-time and run-time integration with caGrid 2.0 Artifact management (design-time and run-time) Meta-data-driven service discovery and governance CBIIT SAIF IG ECCF (including Conformance testing) Forms definition (e.g. CRFs) Decision support Semantically-aware workflow caBIG® Clinical Information Suite (caCIS) project CDISC CSHARE project
Page 5 caGRID 2.0 Lower barriers to entry for all stakeholders (scientists, clinicians, technologists, and informaticists), creating a working environment in which “easy things are easy to do” Leverage the increasingly mature collection of publicly available open source infrastructure and the expanding trends in user-friendly platforms Continue to provide support and migration strategies for users of caGRID 1.X
Page 6 caGRID 1.x Work Flow Concepts representing the domain must exist in a terminology server (EVS). Common Data Elements (ISO11179), which use those concepts and controlled vocabulary along with other information, must exist for every class and attribute to be used in the object model. An object model that has every class and attribute annotated with CDEs must exist which represents the data types to be used. A schema must be generated that reflects how the object model will actually look when serialized to XML. The annotated object model must be submitted to NCI CBIIT for review and acceptance. The annotated model must have a corresponding physical data model that describes exactly which class and attributes go into which tables and rows. Once the model is approved, the caCore and caGrid development tools can be used to create and expose the grid service.
Page 7 caGRID 2.0 Interaction with SIV2 Data Representation and Information Models –referenced in SIV2 Data and metadata discovery – SPARQL endpoint to provide SIV2 access through REST Service Discovery and Utilization –both caGRID registry and interaction with service metadata in SIV2 Service Semantics – maintained in SIV2 Data Semantics – SIV2 function Data Discovery and Exploration – Query history acquired by caGRID, linkages instantiated in SIV2 Service Interface Mediation - shared responsibility High-Throughput Data and Computation – SIV2 to capture the metadata about the mapping of models to binary content data types improving service choreography
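A minimal sketch of what "SPARQL endpoint to provide SIV2 access through REST" could look like from a client's side, assuming a standard SPARQL 1.1 Protocol endpoint. The endpoint URL and the use of `dcterms:title` are placeholders for illustration; the slides do not specify either. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; the slides do not give one.
ENDPOINT = "https://example.org/siv2/sparql"

# Illustrative query: list artifacts that carry a Dublin Core title.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?artifact ?title
WHERE { ?artifact dcterms:title ?title . }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request (not sent here)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a live endpoint.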
Page 8 Services Aware Interoperability Framework (SAIF)
Page 9 The Lens of SAIF (1): Contextualizing SAIF SAIF: the intersection of SOA, MDA, CSI, Distributed Systems Architecture, and HL7 (e.g. HDF, Core Principles), which together provide goals, artifacts, portions of a methodology, and a framework for defining the HL7 EA, a robust, durable business-oriented set of constructs that provide extensibility, reuse, and governance. You are here (Vous êtes ici) Service Oriented Architecture Reference Model For Open Distributed Processing Model Driven Architecture Computable Semantic Interoperability Health Level 7 (Implementation Guide)
Page 10 The Lens of SAIF (2): Services-Oriented Architecture SOA (Services-Oriented Architecture) –SAIF is “services-aware,” i.e., not “just about services” Service awareness (at the architecture level) surfaces need for: –Behavioural Framework built around Contracts and Roles –Well-defined Conformance/Compliance Framework (ECCF) –Attention to “separation of concerns” (static vs behavioural) –Requirement for Governance (GF) –Technology-Independent specifications Conformance certified for each technology binding
Page 11 The Lens of SAIF (3): Model-Driven Architecture MDA (Model-Driven Architecture) enables –Levels of abstraction that layer complexity from Conceptual through Logical to Implementation Support/enforce SOA thinking Support partitioning of artifacts to layers of Conformance/Compliance Framework –Solid tooling support
Page 12 The Lens of SAIF (4): Computable Semantic Interoperability (CSI) CSI (Computable Semantic Interoperability) –Pillar #1: Common Model across all domains-of-interest Multiple domains, one or more domain analysis models Common Model Components Universally applied Static and Behavioural (“dynamic”) semantics –Pillar #2: Elements from Model(s) #1 bound to robust data type specification (e.g. ADT, ISO 21090) –Pillar #3: Methodology for binding terms from concept-based terminologies –Pillar #4: A formally-defined process for specifying the static and behavioural semantics for Working Interoperability (WI) scenarios
Page 13 The Lens of SAIF (5): RM-ODP (1) ISO Standard (RM – ODP, ISO/IEC IS | ITU-T X.900 ) SAIF uses the Reference Model for Open Distributed Processing (RM-ODP) as its lingua franca to categorize the various artifacts –Five non-orthogonal, non-hierarchical Viewpoints in which Conformance Assertions are made or validated Conformance Statements made (Conformance asserted): –Enterprise/ Business VP –Informational VP –Computational VP –Engineering VP –Technology VP –Conformance Statements validated via Conformance testing of Implementation-specific Conformance Assertions made against Conformance Statements.
Page 14 RM-ODP (2) ISO Standard (RM – ODP, ISO/IEC IS | ITU-T X.900 ) Why? True? Where? How? What? SAIF Specification Stack is made up of Conformance Statements and Compliance Validations. In SAIF, the artifacts are constructed via Constraint Patterns sorted by RM-ODP Viewpoints.
Page 15 RM-ODP (3) ISO Standard (RM – ODP, ISO/IEC IS | ITU-T X.900 ) RM-ODP Viewpoints are –Non-hierarchical –Non-orthogonal –Each Viewpoint can (and often will) contain a hierarchy of layered information Information Business / Enterprise Computational Engineering Technology
Page 16 The Lens of the SAIF (6): Health Level 7 SAIF takes a number of enterprise architecture best practices / approaches and contextualizes them to HL7 including –Use of existing HL7 artifacts Core Principles HDF RMIMs etc. –Awareness of HL7 business context –Dedication to HL7 Mission and Goals RE Working Interoperability
Page 17 Enterprise Conformance and Compliance Framework
Page 18 caBIG® Compatibility Guidelines: Today Compatibility Guidelines Today Platform Specific Annotated Models (CDE’s) Service Interfaces
Page 19 caBIG® Interoperability Specifications: Tomorrow Conformance Guidelines Tomorrow Layered Specifications Testable Conformance Behavioral Semantics Traceability Binding to standard models and data types
Page 20 CIM DAM PIM DIM PSM Vocabulary Information Framework
Page 21 Artifact Management (ECCF Registry) Manage lifecycle, governance and versioning of the models, content and derived forms (e.g. CRFs) –Establish and manage relationships and dependencies between models, content, and forms –Manage content provenance, jurisdiction, authority, and intellectual property –Tools to hide complexity of underlying semantic models –Support multiple representations/views of information Provide access control and other security constraints for content Define meta-data to enhance artifact/content discovery –Support usage scenarios and context for the information in the SI Support appropriate value set content and binding management –Value set queries, etc.
Page 22 caBIG® Clinical Information Suite (caCIS) as an SIV2 consumer
Page 23 caCIS Requirements CBIIT implementation of Service-Aware Interoperability Framework –ISO Data Types –HL7 Development Framework (HDF) Modeling/MDA Tooling –Clinical Document Architecture (CDA) Publishing of templates Evaluating semantics of CDA documents –ECCF-related Needs Modeling constructs to facilitate complete and valid system specification. Traceability across RM-ODP viewpoints and MDA layers. –RM-ODP Reference Model for Open Distributed Processing –MDA Model-Driven Architecture Formal expression of conformance assertions. Reasoning / Decision Support –Structured eligibility criteria –Adverse event reporting
Page 24 caCIS Metrics SVN Repository holds 4.86 GB of content covering Analysis, Architecture, Development, QA and Deployment: 100,023 files organized into 41,489 folders File types include Word documents, UML diagrams, XMI files, XML files, Visio diagrams, OWL files, JPEG images, HTML files, Java code, Excel files, PDF documents, Cmaps, text files and others Increasingly difficult to find files of interest and contextual relevance Makes reuse difficult to impossible File relationships are limited to folder organization
Page 25 Proposed ECCF Metadata Management Apply DITA transforms to current SVN document artifacts to define high level metadata –Pilot testing has been completed and looks feasible Convert DITA headings to RDF triples Capture Dublin Core document metadata Query for common RDF statements to link objects as an automated first pass at linking artifacts Follow this with manual review for metrics
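The steps on this slide can be sketched roughly as follows, assuming each artifact has already been transformed to a simple DITA topic. The `http://example.org/...` namespaces, the `heading` predicate, and the two sample topics are invented for illustration; only the Dublin Core terms namespace is a real vocabulary. Artifacts that share an identical statement are proposed as link candidates for later manual review.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
ECCF = "http://example.org/eccf#"       # hypothetical vocabulary
BASE = "http://example.org/artifact/"   # hypothetical artifact URIs

# Two invented DITA topics standing in for transformed SVN documents.
DOCS = {
    "arch-overview": """<topic id="arch-overview"><title>Architecture Overview</title>
        <body><section><title>ISO 21090 Data Types</title></section>
              <section><title>Service Interfaces</title></section></body></topic>""",
    "qa-plan": """<topic id="qa-plan"><title>QA Plan</title>
        <body><section><title>ISO 21090 Data Types</title></section></body></topic>""",
}


def dita_to_triples(xml_text):
    """Turn a topic's title and section headings into (s, p, o) triples."""
    topic = ET.fromstring(xml_text)
    subject = BASE + topic.get("id")
    triples = [(subject, DCTERMS + "title", topic.findtext("title"))]
    for heading in topic.findall("./body/section/title"):
        triples.append((subject, ECCF + "heading", heading.text))
    return triples


def link_candidates(triples):
    """Pair up artifacts that share the same predicate and object."""
    by_statement = defaultdict(set)
    for s, p, o in triples:
        by_statement[(p, o)].add(s)
    links = set()
    for subjects in by_statement.values():
        links.update(combinations(sorted(subjects), 2))
    return sorted(links)


def to_ntriples(triples):
    """Serialize triples with plain-literal objects as N-Triples."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


all_triples = [t for doc in DOCS.values() for t in dita_to_triples(doc)]
print(to_ntriples(all_triples))
print(link_candidates(all_triples))
```

In this toy data the two topics share the heading "ISO 21090 Data Types", so they come out as the single candidate pair.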
Page 26 Proposed ECCF Model Transforms All conceptual modeling is done in UML and these UML artifacts will be converted to OWL using the Eclipse eCore based EMF Ontology Definition Meta-model (EODM) that takes an EMF HL7 models are developed using the RMIM designer plug-in for Visio –Artifacts include the vsd file, an XML schema model and a MIF file –NCI is currently building an HL7 MIF to OWL converter that allows any V3 model to be precisely defined in OWL 2 capturing all constraints on the model as well as pointers to vocabulary bindings
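As a rough illustration of capturing model constraints in OWL, the sketch below renders a class with attribute cardinalities as Turtle, using `owl:Restriction` axioms. It does not read real MIF or UML/XMI; the `ModelClass` and `Attribute` structures, the example namespace, and the sample class are hypothetical stand-ins for whatever the actual converters extract.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PREFIXES = """@prefix : <http://example.org/model#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


@dataclass
class Attribute:
    name: str
    min_card: int = 0
    max_card: Optional[int] = None  # None means unbounded


@dataclass
class ModelClass:
    name: str
    parent: Optional[str] = None
    attributes: List[Attribute] = field(default_factory=list)


def restriction(prop, kind, n):
    """Render one anonymous cardinality restriction in Turtle."""
    return (f"[ a owl:Restriction ; owl:onProperty :{prop} ; "
            f'owl:{kind} "{n}"^^xsd:nonNegativeInteger ]')


def class_to_turtle(cls):
    """Emit the class, its parent, and its cardinality constraints.

    Property declarations (owl:ObjectProperty / owl:DatatypeProperty)
    are omitted to keep the sketch short.
    """
    supers = []
    if cls.parent:
        supers.append(f":{cls.parent}")
    for attr in cls.attributes:
        if attr.min_card > 0:
            supers.append(restriction(attr.name, "minCardinality", attr.min_card))
        if attr.max_card is not None:
            supers.append(restriction(attr.name, "maxCardinality", attr.max_card))
    lines = [f":{cls.name} a owl:Class"]
    lines += [f"    rdfs:subClassOf {s}" for s in supers]
    return " ;\n".join(lines) + " .\n"


# Illustrative class; the names are not taken from an actual model file.
example = ModelClass("StudySubject", "Role", [Attribute("statusCode", 1, 1)])
print(PREFIXES + "\n" + class_to_turtle(example))
```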
Page 27 Proposed ECCF Conformance Testing Develop the OWL ontology for SAIF matrix representation to provide the ECCF meta-model Define the ECCF meta-model relations to the model artifact type Use the HL7 RIM MIF to OWL transform to define all classes as a model classification profile for constraint checking Use the standard OWL reasoners to classify new models according to the ECCF matrix meta-model Identify failed classification errors to inform users of compliance level
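The classification idea can be pictured with a toy stand-in. Here plain set inclusion takes the place of an OWL reasoner: each ECCF matrix cell (viewpoint by layer) is defined by a set of required features, and a model classifies into every cell whose requirements it meets. The feature names and the cell definitions are invented for illustration, and a real reasoner works under open-world semantics that this sketch ignores.

```python
# Hypothetical slice of the ECCF matrix: (viewpoint, layer) -> required features.
MATRIX = {
    ("Informational", "CIM"): {"domain_analysis_model"},
    ("Informational", "PIM"): {"domain_analysis_model", "datatype_binding"},
    ("Informational", "PSM"): {"domain_analysis_model", "datatype_binding",
                               "xml_schema"},
}


def classify(features):
    """Return every matrix cell whose required features are all present."""
    return sorted(cell for cell, required in MATRIX.items()
                  if required <= features)


def missing_for(declared_cell, features):
    """List what a model lacks to conform to the cell it claims."""
    return sorted(MATRIX[declared_cell] - features)


# An invented model that claims PSM-level conformance.
model_features = {"domain_analysis_model", "datatype_binding"}
print(classify(model_features))
print(missing_for(("Informational", "PSM"), model_features))
```

The second call is the analogue of a failed classification: it reports what the model still lacks for the compliance level it claims.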
Page 29 Biomedical Research Integrated Domain Group Model (BRIDG)
Page 30 BRIDG Project Stakeholders Clinical Data Interchange Standards Consortium (CDISC) HL7 Regulated Clinical Research Information Management Technical Committee (HL7 RCRIM WG) National Cancer Institute (NCI), including the Cancer Biomedical Informatics Grid (caBIG™) project Food and Drug Administration (FDA)
Page 31 BRIDG Project Goals Produce a shared view of the dynamic and static semantics of a common domain-of-interest, specifically the domain of protocol-driven research and its associated regulatory artifacts. Aid stakeholders and their communities to achieve computable semantic interoperability (CSI), i.e. the ability for information systems to exchange at a machine-to-machine level the meaning (rather than simply the structure) of data and/or to effectively combine functionality across machine/system boundaries. Provide a shared view for multiple audiences and for multiple purposes through layering of the model as: –An abstract UML model friendly to the general community. –An HL7 V3 model that expresses the UML with extensions to further refine the UML, giving a more useful view to developers using the BRIDG model. –An OWL intermediate layer that allows precise mapping between the UML and HL7 V3 layers and allows reasoning across the models for classification and error testing.
Page 32 BRIDG Models Structure
Page 33 BRIDG UML View
Page 34 BRIDG HL7 V3 View
Page 35 BRIDG OWL View
Page 36 BRIDG OWL Today and Tomorrow Current OWL construction is complex, time consuming and costly There is always a lag between versions of the BRIDG UML/RIM views and the OWL version due to the time needed to build the OWL model additions A task has been approved and funded to build an automated MIF to OWL transform tool The tool will be generic to any MIF, so it will handle all RIM constructs Reasoning over equivalent class structures occurs on the V3 side, so most of the work is there and can be completely automated
Page 37 What is the impact of the work at NCI for the Semantic Web and Ontology community?
Page 38 What is the impact of the work at NCI for the Semantic Web and Ontology community? Impact is bidirectional - NCI is a consumer of the W3C and OMG specifications and feeds back to these communities by participating in the W3C Life Sciences group and consuming NCBO ontologies All tooling and infrastructure built at NCI is open source and available without restrictions to all The BRIDG model is one of the most complicated ontologies from a reasoning perspective and has been shared with Clark and Parsia to aid in the development of future Pellet versions for reasoning optimization NCI Roadmaps for the Semantic Infrastructure and caGRID 2.0 are open and NCI would greatly appreciate feedback from the Semantic community
Page 39 Questions? https://wiki.nci.nih.gov/display/CBIITseminfra/Semantic+Infrastructure+2.0+Roadmap+Wiki or https://wiki.nci.nih.gov/display/CBIITtech/caGrid+2.0+Roadmap+Wiki