Published by Allison Campbell. Modified over 9 years ago.
1 Lessons from the TSIMMIS Project
Yannis Papakonstantinou
Department of Computer Science & Engineering
University of California, San Diego
2 Overview
TSIMMIS’ goals, technical challenges, and solutions
Insufficiencies of the TSIMMIS framework
Going forward
3 Information Resides on Heterogeneous Information Sources
different interfaces
different data representations
redundant and conflicting information
Diagram labels: WWW, Ticker Tape, Personal database, Dialog
4 Goal: System Providing Integrated View of Heterogeneous Data
Integration System
–collects and combines information
–provides integrated view, uniform user interface
Diagram labels: WWW, Ticker Tape, Personal database, Dialog
5 The Wrapper and Mediator Architecture
Diagram labels: Client, Mediator, Wrapper, Common Data Model, Ticker Tape, Dialog
Data labels: business reports, portfolios for each company, stock market prices
6 The Data Warehousing Approach to Integration
Diagram labels: Client, Mediator, Stored Integrated View, Wrapper, Ticker Tape, Dialog
7 The Lazy Integration Approach
Diagram labels: Client, Mediator, Wrapper, Ticker Tape, Dialog
Mediator tasks: Query Decomposition, Translation and Result Fusion
Data labels: IBM portfolio, IBM price, IBM related reports, IBM related reports (in common model)
8 Wrappers & Mediators from High-Level Specifications
Diagram labels: Client, Mediator, Wrapper, Source, Mediator Specification Interpreter, Mediator Specification, Wrapper Generator, Wrapper Specification
9 Challenge: Sources Without a Well-Structured Schema
semistructured
–irregular
–deeply nested
–cross-referenced
incomplete schema knowledge
–autonomous
–dynamic
Examples
–HTML pages
–SGML documents
–genome data
–chemical structures
–bibliographic information
–results of the integration process
10 Challenge: Different and Limited Source Capabilities
Diagram labels: Client, Mediator (U = A + B), Wrapper (A), Wrapper (B)
Query shown: retrieve IBM data
11 Mediator Has to Adapt to Query Capabilities of Sources
Diagram labels: Client, Mediator (U = A + B), Wrapper (A), Wrapper (B)
Queries shown: retrieve everything; retrieve IBM data
(A) does not allow selection
12 Part B
Semistructured Data Representation
Mediator Generation
Wrapper Generation
Capabilities-Based Rewriting
13 Representation of Semistructured Information using OEM
Diagram labels: semantic object-id, structural object-id, label, Atomic Value, Set Value
14 Graph Representation of OEM Data
Graph labels: faculty, first_name “John”, last_name “Doe”, rank “professor”, http://www/~doe
15 OEM Structures Represent Arbitrary Labeled Graphs
Graph labels: faculty, first_name “John”, last_name “Doe”, rank “professor”, http://www/~doe
Graph labels: faculty, name “Mary Smith”, project “Air DB”, paper, author name “John Doe”, author name “Mary Smith”, title “Thin Air DB”, http://www/~smith
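The OEM slides name four parts of an object: an object-id, a label, and either an atomic value or a set value. A minimal Python sketch of that shape follows. The class name `OEMObject`, the helper `children`, and the subobject ids such as `&1` are hypothetical, and treating `http://www/~doe` as the object-id of the faculty object is an assumption read off the diagram labels.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    oid: str                                 # object-id
    label: str                               # label
    value: Union[str, List["OEMObject"]]     # atomic value, or set of subobjects

    def is_atomic(self) -> bool:
        return not isinstance(self.value, list)

    def children(self, label: str) -> List["OEMObject"]:
        """Subobjects carrying the given label (empty for atomic objects)."""
        if self.is_atomic():
            return []
        return [child for child in self.value if child.label == label]


# Assumption: the URL on the slide is the object-id of the faculty object.
# The ids "&1".."&3" are made up for illustration.
doe = OEMObject("http://www/~doe", "faculty", [
    OEMObject("&1", "first_name", "John"),
    OEMObject("&2", "last_name", "Doe"),
    OEMObject("&3", "rank", "professor"),
])
```

Because a set value is just a list of further objects, irregular records (a faculty object with `name` and `project` instead of `first_name` and `rank`) need no schema change.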
16 Overview
Semistructured Data Representation
Mediator Generation
–Example of mediator specification
–Language expressiveness
–Implementation and performance
Wrapper Generation
Capabilities-Based Rewriting
17 Merge Information Relating to a Faculty
s1: faculty name “John Doe” rank “professor” papers...
s2: person name “John Doe” birthday “April 1”
View: faculty name “John Doe” rank “professor” birthday “April 1” papers...
18 Mediator Specification Example
}> :- }>@s1
}> :- }>@s2
s1: faculty name “John Doe” rank “professor” papers...
s2: person name “John Doe” birthday “April 1”
View: faculty name “John Doe” rank “professor” birthday “April 1” papers...
19 Mediator Specification Example: Semantics of Rule Bodies
}> :- }>@s1
}> :- }>@s2
s1: faculty name “John Doe” rank “professor” papers...
s2: person name “John Doe” birthday “April 1”
View: faculty name “John Doe” rank “professor” birthday “April 1” papers...
20 Mediator Specification Example: Semantics of Rule Heads
}> :- }>@s1
}> :- }>@s2
s1: faculty name “John Doe” rank “professor” papers...
s2: person name “John Doe” birthday “April 1”
View: “John Doe” faculty name “John Doe” rank “professor” birthday “April 1” papers...
21 Incrementally Add to Semantically Identified Object
}> :- }>@s1
}> :- }>@s2
s1: faculty name “John Doe” rank “professor” papers...
s2: person name “John Doe” birthday “April 1”
View: “John Doe” faculty name “John Doe” rank “professor” birthday “April 1” papers...
22 Irregularities & Incomplete Schema Knowledge
}> :- }>@s1
s1: faculty name “John Doe” rank “professor” papers; faculty name “Mary Smith” project “Air DB”
s2: person name “John Doe” birthday “April 1”
View: “John Doe” faculty name “John Doe” rank “professor” birthday “April 1” papers; “Mary Smith” faculty name “Mary Smith” project “Air DB”
23 Second Rule Attaches More Subobjects to View Objects
}> :- }>@s1
}> :- }>@s2
s1: faculty name “John Doe” rank “professor” papers...
s2: person name “John Doe” birthday “April 1”
View: “John Doe” faculty name “John Doe” rank “professor” birthday “April 1” papers...
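The fusion walked through on the preceding slides can be approximated in Python. This is a sketch under stated assumptions, not MSL: the rule patterns did not survive in this transcript, so the code assumes that the person's name serves as the semantic object-id and that each rule copies the matching source object's subobjects into the view object with that id, creating it if it does not exist yet. The function `apply_rule` and the dictionary layout are hypothetical.

```python
# Source contents modeled on the slides' example data.
s1 = [
    {"label": "faculty", "name": "John Doe", "rank": "professor"},
    {"label": "faculty", "name": "Mary Smith", "project": "Air DB"},
]
s2 = [{"label": "person", "name": "John Doe", "birthday": "April 1"}]


def apply_rule(view, source, source_label):
    """Copy subobjects of matching source objects into the view object whose
    semantic id is the source object's name (assumption: create if missing)."""
    for obj in source:
        if obj["label"] != source_label:
            continue
        target = view.setdefault(obj["name"], {})
        for field, value in obj.items():
            if field == "label":
                continue
            values = target.setdefault(field, [])
            if value not in values:        # identical values are fused, not repeated
                values.append(value)
    return view


view = {}
apply_rule(view, s1, "faculty")   # stands in for the rule over s1
apply_rule(view, s2, "person")    # stands in for the rule over s2
```

After both calls, the `"John Doe"` view object carries `rank` from s1 and `birthday` from s2, while `"Mary Smith"` keeps only the subobjects s1 happens to have. No fixed schema is required for either.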
24 Language Expressiveness
Information fusion problems solved by MSL
–Irregularities
–Incomplete knowledge of source structure
–Transformation of cross-referenced structures
–Inconsistent and redundant data
–Use of arbitrary matching criteria
Theoretical analysis of expressiveness
–Consider the relational representation of OEM graphs. Then MSL is equivalent to “SQL + special form of transitive closure”
25 Inconsistent and Redundant Information
}> :- }>@s1
}> :- }>@s2 AND NOT }>@s1
s1: faculty name “John Doe” rank “associate”
s2: person name “John Doe” rank “assistant”
View: “John Doe” faculty name “John Doe” rank “associate” rank “assistant”
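One plausible reading of the `AND NOT ...@s1` condition is a source-priority policy: take a value from s2 only when s1 does not already supply one. The sketch below illustrates that reading only. The rule patterns are missing from this transcript, so which source holds which rank, the function name `fused_rank`, and the extra person "Ann Lee" are all assumptions.

```python
# Assumed contents: s1 reports "associate", s2 reports "assistant" for John Doe.
s1_rank = {"John Doe": "associate"}
s2_rank = {"John Doe": "assistant", "Ann Lee": "assistant"}  # "Ann Lee" is made up


def fused_rank(s1_rank, s2_rank):
    view = dict(s1_rank)                    # first rule: every rank from s1
    for name, rank in s2_rank.items():      # second rule: rank from s2 ...
        if name not in s1_rank:             # ... AND NOT already supplied by s1
            view[name] = rank
    return view


ranks = fused_rank(s1_rank, s2_rank)
```

Under this policy the conflicting s2 value for John Doe is dropped, while s2 still contributes ranks for people s1 does not know about.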
26 Overview
Semistructured Data Representation
Mediator Generation
–Example of mediator specification
–Language expressiveness
–Implementation and performance
Wrapper Generation
Capabilities-Based Rewriting
27 Mediator Specification Interpreter Architecture
Diagram labels: Query, Mediator Specification, Query Rewriter, logical datamerge program, Cost-Based Optimizer, plan, Datamerge Engine, Queries to Wrappers, Results, Result
28 Query Rewriting When the Origins of Information Are Known
}> :- :- }>@s1 }> :- }>@s2 }> :- }> AND X>65000
29 Query Rewriter Pushes Conditions to Sources
}> :- :- }>@s1 }> :- }>@s2 }> :- }> AND X>65000
logical datamerge program
}> :- ( }> AND X>65000)@s1 AND }>@s2
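The effect of pushing the condition `X>65000` into the s1 part of the plan can be shown with a small Python sketch. The data is hypothetical, and because the transcript does not say what `X` denotes, the field name `salary` is an assumption. `query_source` is a stand-in for a wrapper call, not a TSIMMIS API.

```python
# Hypothetical source contents.
S1 = [{"name": "John Doe", "salary": 70000}, {"name": "Mary Smith", "salary": 60000}]
S2 = [{"name": "John Doe", "birthday": "April 1"}, {"name": "Mary Smith", "birthday": "May 2"}]


def query_source(rows, condition=None):
    """Stand-in for a wrapper call: returns the objects the source ships back."""
    return [r for r in rows if condition is None or condition(r)]


def join_on_name(left, right):
    return [{**a, **b} for a in left for b in right if a["name"] == b["name"]]


def filter_at_mediator():
    left, right = query_source(S1), query_source(S2)
    result = [r for r in join_on_name(left, right) if r["salary"] > 65000]
    return result, len(left) + len(right)      # (answer, objects shipped)


def filter_at_source():
    left = query_source(S1, lambda r: r["salary"] > 65000)   # condition pushed to s1
    right = query_source(S2)
    return join_on_name(left, right), len(left) + len(right)
```

Both plans return the same answer, but the pushed-down plan ships fewer objects from s1. That saving is the motivation for the rewriting step.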
30 Passing Bindings & Local Join Plans
Plans shown: Passing Bindings (over s1, s2), Local Join (over s1, s2)
:- <person { }> :- }> AND X>65000 :- <person { }> }>:- }> AND X>65000 N
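The two plan shapes named on this slide can be contrasted in a short Python sketch. It assumes that `N` is the name variable shared by the s1 and s2 patterns and that `X` is a salary. The data and function names are hypothetical.

```python
# Hypothetical source contents.
S1 = [
    {"name": "John Doe", "salary": 70000},
    {"name": "Mary Smith", "salary": 60000},
    {"name": "Ann Lee", "salary": 80000},
]
S2 = [
    {"name": "John Doe", "birthday": "April 1"},
    {"name": "Mary Smith", "birthday": "May 2"},
    {"name": "Ann Lee", "birthday": "June 3"},
]


def select_s1(min_salary):
    return [r for r in S1 if r["salary"] > min_salary]


def local_join(min_salary):
    left = select_s1(min_salary)
    right = list(S2)                        # one query that ships all of s2
    joined = [{**a, **b} for a in left for b in right if a["name"] == b["name"]]
    return joined, {"s2_queries": 1, "s2_objects": len(right)}


def passing_bindings(min_salary):
    left = select_s1(min_salary)
    joined, shipped = [], 0
    for row in left:                        # one s2 query per binding of N
        matches = [r for r in S2 if r["name"] == row["name"]]
        shipped += len(matches)
        joined.extend({**row, **m} for m in matches)
    return joined, {"s2_queries": len(left), "s2_objects": shipped}
```

Passing bindings issues more queries but ships fewer objects. Local join does the opposite, which is the kind of trade-off a cost-based optimizer has to weigh.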
31 Query Decomposition When the Origins of Information Are Unknown
}> :- }> }> :- }>@s1 }> :- }>@s2
32 Plan Considers All Possible Sources of birthday
}> :- }> }> :- }>@s1 }> :- }>@s2
Plan labels: name, s2, s1, name, birthday
33 Overview
Semistructured-Data Representation
Mediator Generation
Wrapper Generation
Capabilities-Based Rewriting
34 Query Translation in Wrappers
Diagram labels: Wrapper, Query Translator, Result Translator, Source
Query shown: SELECT * FROM person WHERE name=“Smith”
Source commands shown: find -all; find -n Smith
35 Rapid Query Translation Using Templates and Actions
Diagram labels: Template Interpreter, Result Translator, Source
Query shown: SELECT * FROM person WHERE name=“Smith”
Source commands shown: find -all; find -n Smith
Templates and actions:
SELECT * FROM person {emit “find -all” }
SELECT * FROM person WHERE name=$N {emit “find -n $N”}
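The template-and-action idea can be sketched as a tiny interpreter in Python. The two templates and their emitted commands come from the slide. The matching strategy (whitespace tokenization, case-insensitive keywords, quote stripping) and the function names `match` and `translate` are assumptions made for illustration, not the TSIMMIS wrapper toolkit.

```python
# (template, action) pairs taken from the slide; $N is a placeholder.
TEMPLATES = [
    ("SELECT * FROM person WHERE name=$N", "find -n $N"),
    ("SELECT * FROM person", "find -all"),
]


def match(template, query):
    """Return placeholder bindings if the query fits the template, else None."""
    t_tokens, q_tokens = template.split(), query.split()
    if len(t_tokens) != len(q_tokens):
        return None
    bindings = {}
    for t, q in zip(t_tokens, q_tokens):
        if "$" in t:
            prefix, var = t.split("$", 1)          # e.g. "name=" and "N"
            if not q.lower().startswith(prefix.lower()):
                return None
            bindings[var] = q[len(prefix):].strip('"“”')
        elif t.lower() != q.lower():
            return None
    return bindings


def translate(query):
    """Emit the source command of the first template the query matches."""
    for template, action in TEMPLATES:
        bindings = match(template, query)
        if bindings is not None:
            command = action
            for var, value in bindings.items():
                command = command.replace("$" + var, value)
            return command
    raise ValueError("unsupported query: " + query)
```

For example, `translate('SELECT * FROM person WHERE name="Smith"')` yields `find -n Smith`, and a query that fits no template is rejected as unsupported. Values containing spaces are outside what this token-based sketch handles.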
36 Description of Infinite Sets of Supported Queries
uses recursive nonterminals
Example:
–job description contains word w1 and word w2 and...
–SELECT subset(person) FROM person WHERE \CJob
\CJob : job LIKE $W AND \CJob
\CJob : TRUE
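The two productions for the recursive nonterminal translate directly into a recursive recognizer. The Python sketch below checks whether a WHERE clause can be derived from `job LIKE $W AND <CJob>` or `TRUE` and collects the bound words. The function name `match_cjob` and the whitespace tokenization are assumptions.

```python
def match_cjob(tokens):
    """Return the words bound to $W if the tokens derive from CJob, else None."""
    # Production 2:  CJob : TRUE
    if tokens == ["TRUE"]:
        return []
    # Production 1:  CJob : job LIKE $W AND CJob
    if (len(tokens) >= 4 and tokens[0] == "job"
            and tokens[1] == "LIKE" and tokens[3] == "AND"):
        rest = match_cjob(tokens[4:])        # recurse on the remaining condition
        if rest is not None:
            return [tokens[2]] + rest
    return None


words = match_cjob("job LIKE database AND job LIKE systems AND TRUE".split())
```

Two short productions thus describe conditions with any number of `job LIKE` conjuncts, which a finite list of templates could not.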
37 Overview
Semistructured-Data Representation
Mediator Generation
Wrapper Generation
Capabilities-Based Rewriting
38 Capabilities-Based Rewriter in Mediator Architecture
Diagram labels: Query, Mediator Specification, Wrapper Supported Queries Description, Query Rewriter, logical datamerge program, Capabilities-Based Rewriter, supported plans, Cost-Based Optimizer, optimal plan, Datamerge Engine
39 Capabilities-Based Rewriter Finds Supported Plans
Supported Queries
SELECT * FROM A WHERE salary>65000
SELECT * FROM A
40 Capabilities-Based Rewriter Finds Most-Selective Supported Plans
Supported Queries
SELECT * FROM B WHERE salary>65000
SELECT * FROM B WHERE salary>65000
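The behavior described on the last two slides can be sketched in Python: send the most selective query each source supports and apply any remaining condition at the mediator. The `Wrapper` class, the `supports_selection` flag, and the row data are hypothetical. Earlier slides state only that source A does not allow selection, and the code assumes source B does.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Wrapper:
    name: str
    rows: List[Dict]
    supports_selection: bool

    def query(self, min_salary: Optional[int] = None) -> List[Dict]:
        if min_salary is None:
            return list(self.rows)                       # SELECT * FROM <source>
        if not self.supports_selection:
            raise ValueError(f"{self.name} cannot evaluate selections")
        return [r for r in self.rows if r["salary"] > min_salary]


def retrieve(wrapper: Wrapper, min_salary: int) -> Tuple[List[Dict], str]:
    """Use the most selective supported query; filter the rest at the mediator."""
    if wrapper.supports_selection:
        return wrapper.query(min_salary), "selection pushed to source"
    rows = wrapper.query()                               # retrieve everything
    return [r for r in rows if r["salary"] > min_salary], "filtered at mediator"


rows = [{"name": "John Doe", "salary": 70000}, {"name": "Mary Smith", "salary": 60000}]
a = Wrapper("A", rows, supports_selection=False)
b = Wrapper("B", rows, supports_selection=True)
```

Calling `retrieve(a, 65000)` and `retrieve(b, 65000)` returns the same rows but different plan descriptions, so the client never needs to know which source can evaluate the selection.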
41 Capabilities-Based Rewriter Architecture
Diagram labels: Query, Query Capabilities Description, Component SubQuery Discovery, Component SubQueries, Plan Construction, Plans (not fully optimized), Plan Refinement, Algebraically optimal plans
42 What TSIMMIS Achieved
system for integration of heterogeneous sources
challenges and solutions
–semistructured data & incomplete schema knowledge: appropriate specification language and query processing algorithms
–limited and different query capabilities: query translation algorithm; capabilities-based query rewriting algorithm
43 Overview
TSIMMIS’ goals, technical challenges, and solutions
Insufficiencies of the TSIMMIS framework
Going forward
44 Insufficiencies of the TSIMMIS framework
OEM was really unstructured data
–some loose and partial schematic info may pay off tremendously
too “databasy” user/mediator/source interaction
45 Overview
TSIMMIS’ goals, technical challenges, and solutions
Insufficiencies of the TSIMMIS framework
Going forward
46 Web emerges as a Distributed DB and XML as its Data Model
Diagram labels: Data Source, Native XML Database, Legacy Source, Wrapper, XML View Document(s), XMAS Query Language
Also export:
1. Schemas & Metadata (XML-Data, RDF,…)
2. Description of supported queries
47 Definition of Integrated Views
Diagram labels: Data Source (three), XML View Document(s), Mediator, View Definition in XMAS, Integrated XML View Document(s)
48 Non-Materialized Views in the MIX mediator system
Diagram labels: Blended Browsing & Querying (BBQ) GUI, Application, DOM for Virtual XML Doc’s, MIX Mediator, XMAS query, XML document, DTD Inference, Integrated View DTD, Query Processor, View Definition in XMAS, XML Source, Source DTD
49 Diagram labels: RDB, RDB2XML Wrapper, XML Source 1, XML Source 2, DTD, MIX Mediator, XMAS Mediator View Definition, DTD Inference, View DTD, Resolution, Simplification, Unfolded Query, Translation to Algebra, Optimization, Execution, XMAS Query, XML Document Fragments, Blended Browsing & Querying (BBQ) GUI, DOM (VXD) Client API, Application