This Briefing is: UNCLASSIFIED Aha! Analytics 2278 Baldwin Drive Phone: (937) 477-2983, FAX: (866) 450-3812 1 Dave Lush, Senior SME Aha! Analytics Semantic.

1 Semantic Integration Layer for The As-Is Enterprise Data Warehouse
Dave Lush, Senior SME, Aha! Analytics
Aha! Analytics, 2278 Baldwin Drive, Phone: (937) 477-2983, FAX: (866) 450-3812
This Briefing is: UNCLASSIFIED

2 Purpose(s)
- Communicate Some Observations About the General Data Integration Problem
- Cite and Discuss the Semantic Technologies
- Propose a Semantic Data Integration Layer for the General Data Warehouse Architecture
- Discuss a LexisNexis SSI (LNSSI) Data Analytics Supercomputer (DAS) Based Solution
- Present Initial Thoughts on the Plan

3 Topics
- Purpose
- Background
- Givens/Problems/Tasks
- Approaches to Data/Info Integration
- Semantic Technologies
- General Solution Architecture
- LNSSI DAS Based Solution Architecture
- Thoughts On the Plan

4 Background
- Data Integration Problems
- Application and Enterprise Model Based Approaches
- Data Integration Problems Persist
- Not Adequately Leveraging Available Metadata
- Need for Improved Discovery and Semantic Integration
- Emergence of Semantic Technologies
- Emergence of LNSSI DAS Capability

5 The Primary Givens/Problem/Task
- Givens:
  - A Collection of Disparate Legacy Databases, Perhaps Already Migrated to an Enterprise Data Warehouse, Each with Its Own Independently Developed Logical Data Model and Query Interface
  - The Requirement To Pose Single Unified Queries Across The Collection Of Legacy Databases And Achieve Semantically Consistent (Coherent) Results
- The Problem:
  - Difficulties in Achieving Useful Results Because of Unresolved Semantic Disconnects in the Disparate Logical Models
  - Note: The Problem Is Not Primarily One of Discovering Relevant, Already Existing Product Objects But Rather One of Discovering and Semantically Integrating Requisite Product Content From Multiple Sources
- The Task at Hand:
  - Define, Design, and Implement a Capability for the Semantic Integration and Unified Query of the Collection of Disparate Legacy Databases to Achieve Semantically Coherent Results
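
A small sketch may make "unresolved semantic disconnects" concrete. Everything in it is hypothetical and not taken from the briefing: the two databases, their field names, and the status codes are invented for illustration. It shows how a single query phrased in one source's vocabulary silently misses equivalent facts held in another source.

```python
# Hypothetical rows from two independently modeled legacy databases.
# fleet_db encodes readiness as a status code; depot_db uses a 0/1 flag.
fleet_db = [
    {"tail_no": "87-0001", "status": "FMC"},
    {"tail_no": "87-0002", "status": "NMC"},
]
depot_db = [
    {"acft_id": "87-0003", "ready": 1},
    {"acft_id": "87-0004", "ready": 0},
]


def naive_count_mission_capable(*sources):
    """Apply fleet_db's vocabulary to every source (the semantic disconnect)."""
    return sum(
        1 for rows in sources for row in rows if row.get("status") == "FMC"
    )


def reconciled_count_mission_capable():
    """Interpret each source through its own encoding of the same concept."""
    fleet = sum(1 for row in fleet_db if row["status"] == "FMC")
    depot = sum(1 for row in depot_db if row["ready"] == 1)
    return fleet + depot


print(naive_count_mission_capable(fleet_db, depot_db))  # undercounts
print(reconciled_count_mission_capable())
```

The naive query runs without error, which is what makes the disconnect hard to notice: the result is simply wrong.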

6 Basic Data/Info Integration Approaches
- Application Centric Approach
  - Do It All in the Application Layer Via Ad Hoc Hand Coding
  - This Is Very Expensive And Difficult!
- Enterprise Information Model and Data Warehouse Approach
  - Do It Via EDW ETL Methods/Tools in Context of Strict Conformance with Overarching Enterprise Info Model
  - This Is Also Very Expensive And Difficult And Requires Great Discipline!
- Enterprise Information Integration (EII) Approach
  - Establish Common Single View of Disparate Legacy Sources
  - Process/Parse Common Domain-wide Queries into Individual Legacy Source Queries and Execute Source Queries
  - Integrate Source Query Results into Unified Response to the Domain-wide Query
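
The three EII steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the briefing's design: the two in-memory SQLite databases, their table and column names, and the `unified_stock_query` function are all hypothetical stand-ins for real legacy sources and a real EII engine.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create one hypothetical legacy database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two sources holding the same kind of fact under different schemas.
supply = make_source(
    "CREATE TABLE parts (part_no TEXT, qty_on_hand INTEGER)",
    "INSERT INTO parts VALUES (?, ?)",
    [("A-100", 12), ("B-200", 0)],
)
maintenance = make_source(
    "CREATE TABLE components (item_code TEXT, stock INTEGER)",
    "INSERT INTO components VALUES (?, ?)",
    [("A-100", 3), ("C-300", 7)],
)

# Step 1, common single view: every source query returns (item_id, quantity).
SOURCE_QUERIES = {
    "supply": (
        supply,
        "SELECT part_no, qty_on_hand FROM parts WHERE qty_on_hand >= ?",
    ),
    "maintenance": (
        maintenance,
        "SELECT item_code, stock FROM components WHERE stock >= ?",
    ),
}


def unified_stock_query(min_quantity):
    """Steps 2 and 3: run each source query, then merge into one response."""
    totals = {}
    for conn, sql in SOURCE_QUERIES.values():
        for item_id, quantity in conn.execute(sql, (min_quantity,)):
            totals[item_id] = totals.get(item_id, 0) + quantity
    return dict(sorted(totals.items()))


print(unified_stock_query(1))
```

Here the mapping from the unified view to each source is hand-written SQL, which is exactly the burden the semantic approach discussed later aims to move out of application code.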

7 The Basic Data Integration Challenge
The application must process the unified query, formulate and submit associated queries against the disparate databases, and properly integrate the results into a unified response. This requires that the application handle disparate data interfaces, that it contain the necessary semantics regarding the problem domain and the relationships/mappings between the problem domain and the legacy data models, and that it contain the code that accomplishes the mappings. Logical models for these databases were generally developed independently of each other.
[Diagram labels: Application, Data Interface, Data Interface, Legacy Databases]

8 The Enterprise Data Warehouse Approach
The legacy databases are migrated to a data warehouse in the context of an overarching enterprise data model so that the logical data models for the individual databases are semantically consistent with the overall model. The application still must process the unified query, formulate and submit associated queries against the disparate warehouse databases, and properly integrate the results into a unified response. But this process in theory shouldn't have serious semantic inconsistency problems, because the individual logical databases in the warehouse are supposed to have logical models that are consistent with an overarching enterprise information model.
[Diagram labels: Application, Data Warehouse Services Layer, Enterprise Data Warehouse, target data, Extract Transform Load (ETL) Services, source data, Common Enterprise Model & Meta-data, Meta-Data Mgt Tool. Annotation: Logical models for these databases are consistent with overarching domain model.]

9 Problems Ensue
- The Imperative to Abide by a Standard Global Data Model Does Not Prevail
- Stove Piped DBs Abound
- Semantics of the Stove Pipes Are Inconsistent
- Federated Queries Yield Semantically Inconsistent Results
- Cannot Replace/Re-engineer Legacy DBs Housed in the EDW
- Cannot Replace the EDW Platform (e.g., Teradata) In Use Today

10 New Imperative
- Must Have Some Effective Way to Semantically Integrate the Information Acquired from the Multiplicity of Databases

11 Semantic Integration
- Use Semantic Technologies in Context of the EII Approach (cited previously)
- Unified Ontology of Current Situation/View Is Developed and Expressed in OWL or Appropriate Successor Language
- Semantic Relationships Between Legacy Data and Rules for Transformation From Legacy to Current View Are Specified and Captured Via OWL or Appropriate Successor Language
- Queries in Terms of Current Unified View Are Parsed and Transformed Into Queries of Legacy Sources by a Semantic Query Engine.
- Individual Legacy Source Queries Are Executed.
- Results Are Transformed and Processed Into a Unified Response by the Semantic Mash-up Engine.
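
The flow above (unified view, captured mappings and transformation rules, query rewriting, mash-up) can be sketched as follows. This is an illustrative toy only: a plain Python dictionary stands in for the OWL linkage ontology, the two sources are lists of dictionaries, and every name (`LINKAGE`, `rewrite`, `query_unified`, the field names, the sample values) is hypothetical rather than part of any real semantic query engine.

```python
KM_PER_NM = 1.852  # kilometers per nautical mile

# Stand-in for the linkage ontology and rules: for each source, each
# unified property maps to (legacy field, transformation rule).
LINKAGE = {
    "source_a": {
        "tail_number": ("tail_no", str),
        "range_nm": ("range", float),  # already in nautical miles
    },
    "source_b": {
        "tail_number": ("acft_id", str),
        "range_nm": ("range_km", lambda km: round(km / KM_PER_NM, 1)),
    },
}

# Hypothetical legacy data, each source with its own logical model.
SOURCES = {
    "source_a": [{"tail_no": "87-0001", "range": 2400}],
    "source_b": [{"acft_id": "87-0002", "range_km": 4630}],
}


def rewrite(source, properties):
    """Translate unified property names into that source's legacy fields."""
    return [LINKAGE[source][prop][0] for prop in properties]


def query_unified(properties):
    """Rewrite the query per source, execute it, and mash up the results."""
    unified = []
    for source, rows in SOURCES.items():
        legacy_fields = rewrite(source, properties)
        for row in rows:
            record = {}
            for prop, field in zip(properties, legacy_fields):
                transform = LINKAGE[source][prop][1]
                record[prop] = transform(row[field])
            unified.append(record)
    return unified


for record in query_unified(["tail_number", "range_nm"]):
    print(record)
```

The point of the design is that adding a source means adding a mapping entry, not changing the query or the application code.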

12 Semantic Technologies
- Rapidly Maturing with Very Noteworthy Applications
  - Enhanced Knowledge Discovery
  - Data/Knowledge Integration
- Foundational Semantic Technology Constructs
  - Ontology: Machine Readable Specification of the Essence of a Given Domain
  - Machine Readable Knowledge/Facts
  - Machine Readable Rules
- Standard Language(s) for Expressing the Above
  - XML, RDF, RDFS, OWL, RuleML
- RDF Triple Store Capabilities for Storing the Above
- Standard Query Languages for Searching the Above
  - SPARQL
- Open Source Semantic Application Frameworks
- Commercial Capabilities
  - Oracle Semantic Technologies http://www.oracle.com/technology/tech/semantic_technologies/index.html
  - TopQuadrant
  - Metatomix
  - Ontoprise
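
To give a feel for what a triple store and a SPARQL-style query do, here is a toy, pure-Python sketch. It is not a real RDF store or SPARQL engine: facts are plain tuples, the `ex:` names are made up, and `match` and `query` are hypothetical helpers that only handle simple conjunctive patterns. A production system would use an actual triple store such as those listed above.

```python
# Facts as (subject, predicate, object) triples, following the RDF data model.
TRIPLES = {
    ("ex:Part", "rdfs:subClassOf", "ex:Asset"),
    ("ex:part42", "rdf:type", "ex:Part"),
    ("ex:part42", "ex:storedAt", "ex:depot7"),
    ("ex:depot7", "rdf:type", "ex:Depot"),
}


def match(pattern, triples=TRIPLES):
    """Yield variable bindings for one triple pattern ('?x' terms are variables)."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(patterns, triples=TRIPLES):
    """Join several triple patterns, in the spirit of a SPARQL graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that earlier patterns have already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Roughly: SELECT ?p ?d WHERE { ?p rdf:type ex:Part . ?p ex:storedAt ?d }
print(query([("?p", "rdf:type", "ex:Part"), ("?p", "ex:storedAt", "?d")]))
```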

13 The General Solution Architecture
- Semantic Layer Between Apps and Data
  - Unifying Domain Ontology
  - Linkage Ontology
  - OWL/RDF Data Management
  - Semantic Query Engine
  - Semantic Mash-up
- Semantic Tech Architecture and Building Blocks
  - RDF(S), OWL, RuleML
  - Jena
  - Oracle Semantic Technologies
  - Semantic Application Development Environment (e.g., TopQuadrant)

14 The Semantic Approach
The legacy databases have been migrated to the data warehouse independently, each with its own logical model. The overall domain has a robust domain ontology. There are linking ontologies and rules that relate and map the domain ontology to the underlying logical models. The application captures and submits the unified query to the semantic engine. The semantic engine processes the query to a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
[Diagram labels: Application, Semantic Layer (Semantic Query Engine; Semantic Query Results Mash-Up; Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; OWL/RDF DBMS; Ontology & Rule Authoring Tools), Data Warehouse Services Layer, Enterprise Data Warehouse]
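
The "derived facts" in the semantic layer come from applying rules to domain and legacy facts. As a minimal sketch of that idea, the function below forward-chains a single standard RDFS rule (if x has type C and C is a subclass of D, then x has type D) until nothing new can be derived. The facts and the `infer_types` helper are hypothetical, and a real inference engine would support many more rules.

```python
# Hypothetical legacy fact plus two domain-ontology facts, as triples.
LEGACY_AND_DOMAIN_FACTS = {
    ("ex:f16", "rdf:type", "ex:Fighter"),
    ("ex:Fighter", "rdfs:subClassOf", "ex:Aircraft"),
    ("ex:Aircraft", "rdfs:subClassOf", "ex:Asset"),
}


def infer_types(triples):
    """Forward-chain the subclass rule until no new facts are derived."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        derived = {
            (x, "rdf:type", d)
            for (x, p1, c) in facts
            if p1 == "rdf:type"
            for (c2, p2, d) in facts
            if p2 == "rdfs:subClassOf" and c2 == c
        }
        if not derived <= facts:
            facts |= derived
            changed = True
    return facts


all_facts = infer_types(LEGACY_AND_DOMAIN_FACTS)
for fact in sorted(all_facts - LEGACY_AND_DOMAIN_FACTS):
    print("derived:", fact)
```

With the derived facts in place, a unified query for all assets would find `ex:f16` even though no legacy source ever recorded it as an asset.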

15 Typical EDW Architecture
- Does Not Have a Data Integration Layer
  - This Is a Problem If Total Discipline in Conforming to Enterprise Model Is Not Exercised
  - And It Has Not Been Exercised
- Legacy Databases Independently Migrated
- Accomplishing Data Integration in the Application Layer Is Difficult and Expensive

16 Current EDW Architecture
Figure 4: Layered EDW Architecture
[Diagram labels: Users, Power Users; Presentation Tier: AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Presentation Transformation Tier: Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other), Pre-Generated Cache, AJAX; Application Tier: Business Intelligence Tools (Cognos, BOBJ, Other (Siebel, MS)); Data Access Tier (ODBC/JDBC); EDW Data Tier]
Key Observations
- The as-is architecture does not include an explicit knowledge management or semantics layer.
- This is a problem because the effectiveness of the as-is EDW depends on its ability to resolve semantic disconnects between the databases that must be queried.
- Building these capabilities into the application layer code is very difficult and costly, and is not responsive to highly dynamic situations.

17 EDW Architecture with Semantic Layer
Figure 5: EDW Architecture with Semantic Layer
[Diagram labels: Users, Power Users; Presentation Tier: AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Presentation Transformation Tier: Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other), Pre-Generated Cache, AJAX; Application Tier: Business Intelligence Tools (Cognos, BOBJ, Other (Siebel, MS)); Semantic Tier: Semantic Tools, Semantic Layer (Semantic Query Engine; Semantic Query Results Mash-Up; Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; OWL/RDF DBMS; Ontology & Rules Authoring Tools); Data Access Tier (ODBC/JDBC); EDW Data Tier]
Key Observations
- The to-be architecture includes a semantics layer that mediates between the domain query and the data layer.
- The semantic layer includes the domain ontology and linkage ontologies, which drive the processing of domain queries and the semantic mash-up of individual query results coming from the source databases.

18 A Major Obstacle: Computational Complexity
- Many Operations of the Semantic Layer Are Computationally Intensive
- Complex Queries Across Multiple Large Data Sources Are Computationally Intensive
- Some Kind of Specialized Solution to Execution of the Semantic Operations and Multiple Source Queries Is Required

19 Hypothesis
- The LNSSI DAS Capability Can Be Brought to Bear, with Significant Benefits, on Execution of the Semantic Operations and, Of Course, the Source Queries As Well
- So the Big Question Is:
  - Can the LNSSI DAS Be Applied to Large Ontologies, Rule Sets, and Large Data to Provide a Very High Performance Semantic Query Engine?

20 The LNSSI DAS Based Solution Architecture
- Semantic Layer
  - Federated Domain Ontologies
  - Transformation Rules
  - Semantic Engines
- LNSSI DAS Used in Two Contexts:
  - Semantic Ops in the Semantic Layer
  - Source Queries at the Data Services Level

21 The LNSSI DAS Based Solution Architecture
The legacy databases have been migrated to the data warehouse independently, each with its own logical model. The overall domain has a robust domain ontology. There are linking ontologies and rules that relate and map the domain ontology to the underlying logical models. The application captures and submits the unified query to the semantic engine. The semantic engine processes the query to a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
[Diagram labels: Query Application, Semantic Layer Services (Semantic Update/Query Engine; Semantic Query Results Mash-Up; Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; LNSSI DAS Based OWL/RDF Query; RDBMS Based OWL/RDF DBMS; Ontology & Rule Authoring Tools), LNSSI DAS Services Layer, Enterprise Data Warehouse]

22 What To Do?
- Let's Execute a Prototype Project
  - To Test the Hypothesis That the LNSSI DAS Can Be Successfully Brought to Bear On Large Data Integration Problems Requiring Semantic Integration

23 General Approach
- Find a Sponsor with Requisite $
- Form Appropriate Team and Agreements
- Initiate a Prototype Project
- Apply Semantic Technologies
  - Ontology, RDF(S), OWL
  - RDF Triple Store, SPARQL
  - Inference Engine(s)
  - Jena
- Leverage LNSSI DAS Architecture and Capability
- Create an LNSSI Semantic Integration Platform
- Find a Benchmark Problem for Which There Is Already Data, Associated Semantics, and Existing Query Performance Data

24 Major Activities
- Project Initiation
- Initial Analysis, Technology Research, and Knowledge Engineering
- CONOPS and System Requirements Development/Specification
- Detailed Program/Project Management
- Knowledge Engineering
  - Domain Ontology Acquisition/Development
  - Ontology Legacy Data Relationship, Mapping, and Rule Acquisition/Development
  - Acquisition/Development of the Underlying Data
- Architecture Development and System Design
  - Detailed Apps Requirements/Design
  - Inclusion of RDF/OWL Data Mgt
  - Inclusion of Semantic Query Engine
  - Development of Semantic Mashup Capabilities
- Implementation Planning
- Implementation
- Test and Eval


