Semantic Integration Layer for The As-Is Enterprise Data Warehouse
Dave Lush, Senior SME, Aha! Analytics
Aha! Analytics, 2278 Baldwin Drive
Phone: (937) , FAX: (866)
This Briefing is: UNCLASSIFIED
Purpose(s)
- Communicate Some Observations About the General Data Integration Problem
- Cite and Discuss the Semantic Technologies
- Propose a Semantic Data Integration Layer for the General Data Warehouse Architecture
- Discuss a LexisNexis SSI Data Analytics Supercomputer (DAS) Based Solution
- Present Initial Thoughts on the Plan
Topics
- Purpose
- Background
- Givens/Problems/Tasks
- Approaches to Data/Info Integration
- Semantic Technologies
- General Solution Architecture
- LNSSI DAS Based Solution Architecture
- Thoughts On the Plan
Background
- Data Integration Problems
- Application and Enterprise Model Based Approaches
- Data Integration Problems Persist
- Not Adequately Leveraging Available Metadata
- Need for Improved Discovery and Semantic Integration
- Emergence of Semantic Technologies
- Emergence of LNSSI DAS Capability
The Primary Givens/Problem/Task
- Givens:
  - A Collection of Disparate Legacy Databases Perhaps Already Migrated to an Enterprise Data Warehouse
  - Each with Own Independently Developed Logical Data Model and Query Interface
  - The Requirement To Pose Single Unified Queries Across The Collection Of Legacy Databases And Achieve Semantically Consistent (Coherent) Results
- The Problem:
  - Difficulties in Achieving Useful Results Because of Unresolved Semantic Disconnects in the Disparate Logical Models
  - Note: The Problem Is Not Primarily One of Discovery of Relevant Already Existing Product Objects But Rather One of Discovering and Semantically Integrating Requisite Product Content From Multiple Sources
- The Task at Hand:
  - Define, Design, and Implement a Capability for the Semantic Integration and Unified Query of the Collection of Disparate Legacy Databases to Achieve Semantically Coherent Results
Basic Data/Info Integration Approaches
- Application Centric Approach
  - Do It All in the Application Layer Via Ad Hoc Hand Coding
  - This Is Very Expensive And Difficult!
- Enterprise Information Model and Data Warehouse Approach
  - Do It Via EDW ETL Methods/Tools in Context of Strict Conformance with Overarching Enterprise Info Model
  - This Is Also Very Expensive And Difficult And Requires Great Discipline!
- Enterprise Information Integration (EII) Approach
  - Establish Common Single View of Disparate Legacy Sources
  - Process/Parse Common Domain-wide Queries into Individual Legacy Source Queries and Execute Source Queries
  - Integrate Source Query Results into Unified Response to the Domain-wide Query
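The three EII steps (common view, per-source queries, integrated response) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two in-memory SQLite databases, their table and column names, and the rule that quantities from both sources are summed. It is not a description of any particular EII product.

```python
import sqlite3

# Two hypothetical legacy sources with independently designed schemas.
def make_supply_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE parts (part_no TEXT, qty_on_hand INTEGER)")
    db.executemany("INSERT INTO parts VALUES (?, ?)",
                   [("A-100", 12), ("B-200", 0)])
    return db

def make_maintenance_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE component (item_code TEXT, spares INTEGER)")
    db.executemany("INSERT INTO component VALUES (?, ?)",
                   [("A-100", 3), ("C-300", 7)])
    return db

# Step 1: the common single view is one logical relation (part_id, quantity).
# Step 2: each source gets its own query that answers the common query.
SOURCE_QUERIES = {
    "supply": "SELECT part_no, qty_on_hand FROM parts WHERE part_no = ?",
    "maintenance": "SELECT item_code, spares FROM component WHERE item_code = ?",
}

def unified_quantity(part_id, sources):
    """Step 3: run the per-source queries and integrate the rows."""
    total = 0
    found_in = []
    for name, db in sources.items():
        for _key, qty in db.execute(SOURCE_QUERIES[name], (part_id,)):
            total += qty
            found_in.append(name)
    return {"part_id": part_id, "quantity": total, "sources": found_in}

sources = {"supply": make_supply_db(), "maintenance": make_maintenance_db()}
result = unified_quantity("A-100", sources)
print(result)
```

Note that the per-source queries here are still written by hand; the sketch shows only the shape of the mediation, not how the mappings are obtained.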
The Basic Data Integration Challenge
[Diagram labels: Application; Data Interface; Data Interface; Legacy Databases]
The application must process the unified query, formulate and submit associated queries against the disparate databases, and properly integrate the results into a unified response. This requires that the application handle disparate data interfaces, and that it contain the necessary semantics regarding the problem domain, the relationships/mappings between the problem domain and the legacy data models, and the code that accomplishes the mappings.
[Diagram note: Logical models for these databases were generally developed independently of each other.]
The Enterprise Data Warehouse Approach
[Diagram labels: Application; Data Warehouse Services Layer; Enterprise Data Warehouse; target data; source data; Extract Transform Load (ETL) Services; Common Enterprise Model & Meta-data; Meta-Data Mgt Tool]
The legacy databases are migrated to a data warehouse in the context of an overarching enterprise data model so that the logical data models for the individual databases are semantically consistent with the overall model. The application still must process the unified query, formulate and submit associated queries against the disparate warehouse databases, and properly integrate the results into a unified response. In theory, though, this process should not have a serious semantic inconsistency problem, because the individual logical databases in the warehouse are supposed to have logical models that are consistent with an overarching enterprise information model.
[Diagram note: Logical models for these databases are consistent with overarching domain model.]
Problems Ensue
- The Imperative to Abide by a Standard Global Data Model Does Not Prevail
- Stove Piped DBs Abound
- Semantics of the Stove Pipes Are Inconsistent
- Federated Queries Yield Semantically Inconsistent Results
- Cannot Replace/Re-engineer Legacy DBs Housed in the EDW
- Cannot Replace the EDW Platform (e.g., Teradata) In Use Today
New Imperative
- Must Have Some Effective Way to Semantically Integrate the Information Acquired from the Multiplicity of Databases
Semantic Integration
- Use Semantic Technologies in Context of the EII Approach (cited previously)
  - Unified Ontology of Current Situation/View Is Developed and Expressed in OWL or Appropriate Successor Language
  - Semantic Relationships Between Legacy Data and Rules for Transformation From Legacy to Current View Are Specified and Captured Via OWL or Appropriate Successor Language
  - Queries in Terms of Current Unified View Are Parsed and Transformed Into Queries of Legacy Sources by a Semantic Query Engine.
  - Individual Legacy Source Queries Are Executed.
  - Results Are Transformed and Processed Into a Unified Response by the Semantic Mash-up Engine.
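The difference from hand-coded integration is that the legacy-to-unified relationships are held as data and drive both query rewriting and result transformation. The sketch below illustrates that idea only. A plain Python dict stands in for the OWL linkage ontology and rules, and every source name, table, column, and conversion in it is hypothetical.

```python
# Hypothetical linkage mappings. For each legacy source, every property of the
# unified view maps to (legacy column, transformation into the unified form).
# In the proposed design this knowledge would be captured in OWL and rules.
LINKAGE = {
    "personnel_db": {
        "table": "emp",
        "properties": {
            "name": ("full_nm", str.strip),
            "height_cm": ("height_in", lambda inches: round(inches * 2.54, 1)),
        },
    },
    "medical_db": {
        "table": "patient",
        "properties": {
            "name": ("patient_name", str.strip),
            "height_cm": ("height_cm", float),
        },
    },
}

def rewrite(unified_properties, source):
    """Translate unified-view properties into a SQL string for one source."""
    mapping = LINKAGE[source]
    columns = [mapping["properties"][p][0] for p in unified_properties]
    return f"SELECT {', '.join(columns)} FROM {mapping['table']}"

def to_unified(unified_properties, source, row):
    """Apply the per-property transformation rules to one legacy result row."""
    props = LINKAGE[source]["properties"]
    return {p: props[p][1](value) for p, value in zip(unified_properties, row)}

query = ["name", "height_cm"]
sql_personnel = rewrite(query, "personnel_db")
sql_medical = rewrite(query, "medical_db")
# A made-up row as the personnel source might return it (height in inches).
unified_row = to_unified(query, "personnel_db", (" Pat Doe ", 70))
print(sql_personnel)
print(sql_medical)
print(unified_row)
```

Adding a third source in this arrangement means adding one more mapping entry rather than new application code, which is the property the semantic approach is after.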
Semantic Technologies
- Rapidly Maturing with Very Noteworthy Applications
  - Enhanced Knowledge Discovery
  - Data/Knowledge Integration
- Foundational Semantic Technology Constructs
  - Ontology: Machine Readable Specification of the Essence of a Given Domain
  - Machine Readable Knowledge/Facts
  - Machine Readable Rules
- Standard Language(s) for Expressing the Above
  - XML, RDF, RDFS, OWL, RuleML
- RDF Triple Store Capabilities for Storing the Above
- Standard Query Languages for Searching the Above
  - SPARQL
- Open Source Semantic Application Frameworks
- Commercial Capabilities
  - Oracle Semantic Technologies
  - TopQuadrant
  - Metatomix
  - Ontoprise
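To make the triple store and query language constructs concrete, here is a toy illustration in Python. It stores facts as (subject, predicate, object) triples and answers a conjunctive pattern query in which names starting with "?" are variables, loosely in the spirit of a SPARQL basic graph pattern. It is not a SPARQL engine, and the "ex:" identifiers and facts are invented for the example.

```python
# A toy in-memory triple store: a set of (subject, predicate, object) tuples.
TRIPLES = {
    ("ex:Aircraft", "rdfs:subClassOf", "ex:Asset"),
    ("ex:tail42", "rdf:type", "ex:Aircraft"),
    ("ex:tail42", "ex:basedAt", "ex:BaseAlpha"),
    ("ex:truck7", "rdf:type", "ex:Vehicle"),
    ("ex:truck7", "ex:basedAt", "ex:BaseAlpha"),
}

def match(pattern, triple, binding):
    """Extend a variable binding so the pattern equals the triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def query(patterns, triples=TRIPLES):
    """Return every binding that satisfies all patterns (a conjunctive query)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings

# Which aircraft are based where?
results = query([("?x", "rdf:type", "ex:Aircraft"),
                 ("?x", "ex:basedAt", "?base")])
print(results)
```

Each additional pattern is evaluated as a scan over all triples for every partial binding, which is adequate for a handful of facts and hints at why real triple stores rely on indexing.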
The General Solution Architecture
- Semantic Layer Between Apps and Data
  - Unifying Domain Ontology
  - Linkage Ontology
  - OWL/RDF Data Management
  - Semantic Query Engine
  - Semantic Mash-up
- Semantic Tech Architecture and Building Blocks
  - RDF(S), OWL, RuleML
  - Jena
  - Oracle Semantic Technologies
  - Semantic Application Development Environment (e.g., TopQuadrant)
The Semantic Approach
[Diagram labels: Application; Semantic Layer (Semantic Query Engine; Semantic Query Results Mash-Up; Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; OWL/RDF DBMS; Ontology & Rule Authoring Tools); Data Warehouse Services Layer; Enterprise Data Warehouse]
The legacy databases have been migrated to the data warehouse independently, each with its own logical model. The overall domain has a robust domain ontology. There are linking ontologies and rules which relate and map the domain ontology to the underlying logical models. The application captures the unified query and submits it to the semantic engine. The semantic engine processes the query to a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
Typical EDW Architecture
- Does Not Have a Data Integration Layer
- This Is a Problem If Total Discipline in Conforming to Enterprise Model Is Not Exercised
- And It Has Not Been Exercised
  - Legacy Databases Independently Migrated
- Accomplishing Data Integration in the Application Layer Is Difficult and Expensive
Current EDW Architecture
[Figure 4: Layered EDW Architecture. Labels: Users; Power Users; Presentation Tier; AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Presentation Transformation Tier; Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other); Application Tier; Business Intelligence Tools: Cognos, BOBJ, Other (Siebel, MS); AJAX; Data Access Tier (ODBC/JDBC); EDW Data Tier; Pre-Generated Cache]
Key Observations
- The as-is architecture does not include an explicit knowledge management or semantics layer.
- This is a problem because the effectiveness of the as-is EDW depends on its ability to resolve semantic disconnects between the databases that must be queried.
- Building these capabilities into the application layer code is very difficult and costly, and it is not responsive to highly dynamic situations.
EDW Architecture with Semantic Layer
[Figure 5: EDW Architecture with Semantic Layer. Labels: Users; Power Users; Presentation Tier; AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Presentation Transformation Tier; Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other); Application Tier; Business Intelligence Tools: Cognos, BOBJ, Other (Siebel, MS); AJAX; Semantic Tier; Semantic Tools; Semantic Layer (Semantic Query Engine; Semantic Query Results Mash-Up; Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; OWL/RDF DBMS; Ontology & Rules Authoring Tools); Data Access Tier (ODBC/JDBC); EDW Data Tier; Pre-Generated Cache]
Key Observations
- The to-be architecture includes a semantics layer which mediates between the domain query and the data layer.
- The semantic layer includes the domain ontology and linkage ontologies, which drive the processing of domain queries and the semantic mash-up of individual query results coming from the source databases.
A Major Obstacle: Computational Complexity
- Many Operations of the Semantic Layer Are Computationally Intensive
- Complex Queries Across Multiple Large Data Sources Are Computationally Intensive
- Some Kind of Specialized Solution to Execution of the Semantic Operations and Multiple Source Queries Is Required
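One way to see where the cost comes from is rule-based inference over an ontology. The sketch below is a deliberately naive forward-chaining loop in Python that applies two RDFS-style rules until no new facts appear. The class names and facts are invented, and real inference engines use far better strategies than this. The point is only that each pass amounts to a self-join over the fact set, so the work grows quickly with the number of facts and the depth of the hierarchy.

```python
def forward_chain(facts):
    """Apply two RDFS-style rules until no new facts appear (a fixpoint).

    Rule 1: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    Rule 2: (x type a), (a subClassOf b)       => (x type b)
    Returns the closed fact set and the number of passes that were needed.
    """
    facts = set(facts)
    passes = 0
    while True:
        passes += 1
        sub = [(s, o) for s, p, o in facts if p == "subClassOf"]
        typ = [(s, o) for s, p, o in facts if p == "type"]
        derived = set()
        # Rule 1: join subClassOf with itself (nested loop, quadratic per pass).
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    derived.add((a, "subClassOf", c))
        # Rule 2: join type facts with subClassOf facts.
        for x, a in typ:
            for a2, b in sub:
                if a == a2:
                    derived.add((x, "type", b))
        if derived <= facts:
            return facts, passes
        facts |= derived

base = {
    ("F16", "subClassOf", "Fighter"),
    ("Fighter", "subClassOf", "Aircraft"),
    ("Aircraft", "subClassOf", "Asset"),
    ("tail42", "type", "F16"),
}
closure, passes = forward_chain(base)
print(len(base), "asserted facts ->", len(closure), "facts after inference")
```

Even in this four-fact example the inferred set is more than twice the size of the asserted set, and every pass re-examines all pairs of facts.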
Hypothesis
- The LNSSI DAS Capability Can Be Brought to Bear on Execution of the Semantic Operations and Of Course the Source Queries As Well with Significant Benefits
- So the Big Question Is: Can the LNSSI DAS Be Applied to Large Ontologies, Rule Sets, and Large Data to Provide a Very High Performance Semantic Query Engine?
The LNSSI DAS Based Solution Architecture
- Semantic Layer
  - Federated Domain Ontologies
  - Transformation Rules
  - Semantic Engines
- LNSSI DAS Used in Two Contexts:
  - Semantic Ops in the Semantic Layer
  - Source Queries at the Data Services Level
The LNSSI DAS Based Solution Architecture
[Diagram labels: Query Application; Semantic Layer Services (Semantic Update/Query Engine; Semantic Query Results Mash-Up; Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; LNSSI DAS Based OWL/RDF Query; RDMS Based OWL/RDF DBMS; Ontology & Rule Authoring Tools); LNSSI DAS Services Layer; Enterprise Data Warehouse]
The legacy databases have been migrated to the data warehouse independently, each with its own logical model. The overall domain has a robust domain ontology. There are linking ontologies and rules which relate and map the domain ontology to the underlying logical models. The application captures the unified query and submits it to the semantic engine. The semantic engine processes the query to a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
What To Do?
- Let's Execute a Prototype Project To Test the Hypothesis That the LNSSI DAS Can Be Successfully Brought to Bear On Large Data Integration Problems Requiring Semantic Integration
General Approach
- Find a Sponsor with Requisite $
- Form Appropriate Team and Agreements
- Initiate a Prototype Project
- Apply Semantic Technologies
  - Ontology, RDF(S), OWL
  - RDF Triple Store, SPARQL
  - Inference Engine(s)
  - Jena
- Leverage LNSSI DAS Architecture and Capability
- Create an LNSSI Semantic Integration Platform
- Find a Benchmark Problem for Which There Is Already Data, Associated Semantics, and Existing Query Performance Data
Major Activities
- Project Initiation
- Initial Analysis, Technology Research, and Knowledge Engineering
- CONOPS and System Requirements Development/Specification
- Detailed Program/Project Management
- Knowledge Engineering
  - Domain Ontology Acquisition/Development
  - Ontology Legacy Data Relationship, Mapping, and Rule Acquisition/Development
  - Acquisition/Development of the Underlying Data
- Architecture Development and System Design
  - Detailed Apps Requirements/Design
  - Inclusion of RDF/OWL Data Mgt
  - Inclusion of Semantic Query Engine
  - Development of Semantic Mashup Capabilities
- Implementation Planning
- Implementation
- Test and Eval