1
Unity Demonstration
Dr. Ramon Lawrence
University of Iowa
ramon-lawrence@uiowa.edu
2
Outline
Motivation and Background
Two basic integration approaches:
- global as view (GAV)
- local as view (LAV)
What is the open problem?
How Unity is different
Using Unity example
Benefits and Contributions
Future Work
3
Motivation
There are many integration environments:
- Operational systems within an organization
- System integration during company merger
- Data warehouses, intranets, and the WWW
Users require information from many data sources, which often do not work together.
4
What is Integration?
Two levels of integration:
- Schema integration - the description of the data
- Data integration - the individual data instances
Integration handles the different mechanisms for storing data (structural conflicts), for referencing data (naming conflicts), and for attributing meaning to the data (semantic conflicts).
5
Two Current Approaches
Current state-of-the-art integration systems can all be reduced to a logical basis.
- For this demo, assume the data is physically stored in the relational model and queried using Datalog.
There are two basic "database" approaches to integration:
- global as view approach - the extraction and integration of data is defined simultaneously with the global view definition
  - TSIMMIS using Object Exchange Model (OEM)
- local as view approach - pre-defines the global view and then defines what portion of the global view each local source provides
  - Information Manifold using description logic
6
BodyWorks Systems
[Diagram: Web Server, Custom Accounting Package, Shipment Tracking Software; Customer Order Database, Invoice Database, Shipment Database]
7
BodyWorks Systems
Question: Who has a complete picture of a customer's order, or the entire customer relationship?
8
BodyWorks Systems
Answer: No one, but management wants to know...
9
Data Warehouse Approach
[Diagram: Order Database, Invoice Database, and Shipment Database, each followed by Gather, Refine, Aggregate, Store, leading into the Warehouse]
Features:
- static, materialized view
- performs data cleansing and aggregation
- historical more than operational
10
Query-Driven Dynamic Approach
[Diagram: Order Database, Invoice Database, Shipment Database; Wrapper; mediator]
Order Database:
  Cust(id,name,addr,city,state,cty)
  Order(oid,cid,odate)
  OrdProd(oid,pid,amt,pr)
  Prod(id,name,pr,desc)
Invoice Database:
  Cust(id,name,addr,city,state,cty)
  Invoice(invId,custId,shipId,iDate)
  InvProd(invId,prodId,amt,pr)
  Prod(id,name,pr,desc)
Shipment Database:
  Cust(id,name,addr,city,state,cty)
  Shipment(shipid,oid,cid,shipdate)
  ShipProd(shipid,prodid,amt)
  Prod(id,name,pr,desc,inv)
Features:
- view dynamically built
- data is extracted at query-time
- still typically read-only
11
Global as View Approach
Define global objects by specifying how to extract their information from the local sources.
Requires that the administrator defining the global view understand the semantics of every local data source.
Further, if the local views or the global view must change for any reason (such as adding a new data source), the global view must be re-compiled.
12
Global as View Example
TSIMMIS MSL example extracting customer info:
  }>@med :- customer { }@invoiceDB
  }>@med :- customer { }@orderDB
  }>@med :- customer { }@shipmentDB
Equivalent SQL: union the results of the following 3 queries (matching ids if possible):
  orderDB:    SELECT * FROM customer
  invoiceDB:  SELECT * FROM customer
  shipmentDB: SELECT * FROM customer
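The union described on this slide can be sketched in Python. This is only an illustration under stated assumptions: the three sources are stood in for by in-memory SQLite databases, the customer table is reduced to hypothetical (id, name) columns, the sample rows are invented, and global_customer is a made-up helper name, not part of TSIMMIS or Unity.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory source holding a customer(id, name) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return conn


# Hypothetical sample data for the three sources.
sources = {
    "orderDB": make_source([(1, "Ann"), (2, "Bob")]),
    "invoiceDB": make_source([(2, "Bob"), (3, "Cy")]),
    "shipmentDB": make_source([(1, "Ann")]),
}


def global_customer(sources):
    """GAV-style rule: the global customer object is the union of every
    source's customer table, merged on matching ids."""
    merged = {}
    for source_name in sorted(sources):
        query = "SELECT id, name FROM customer"
        for cust_id, cust_name in sources[source_name].execute(query):
            # Keep the first name seen for an id; later duplicates are merged.
            merged.setdefault(cust_id, cust_name)
    return sorted(merged.items())


customers = global_customer(sources)
print(customers)
```

Note that the extraction rule names every source explicitly, so adding a fourth source means editing the rule itself, which is the maintenance cost the GAV slides point out.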
13
Global as View Example (2)
Extract all orders with invoices and shipments:
  }>@med :- }@shipmentDB AND }>@orderDB AND }@invoiceDB
Equivalent SQL (if possible to query multiple databases):
  SELECT shipment.shipid, invoice.invId, order.oid
  FROM shipment, invoice, order
  WHERE shipment.shipid = invoice.shipId AND shipment.oid = order.oid
14
Local as View Approach
Pre-define an integrated global view (GV) that encompasses the information present in all sources.
For each local source, specify the local view as a subset of the information available in the GV.
Building the GV is typically not discussed. However, the LAV approach makes it easier to add or remove sources, as the GV does not have to be updated.
Query processing using the LAV approach is more difficult than with the GAV approach, because the query processor has to determine what information can be extracted from the views.
15
Local as View Example
Consider this global customer relation in the GV:
  customer(id, name, addr)
- Assume that the order, invoice, and shipment databases each contain a customer record only if the customer had an order, invoice, or shipment respectively. Further, assume that only shipmentDB contains a customer address.
Local views of each source:
  orderView(C,N) :- customer(C,N)
  invoiceView(C,N) :- customer(C,N)
  shipView(C,N,A) :- customer(C,N,A)
16
Local as View Example (2)
Let the user pose the following query:
  q(N) :- customer(I, N, A)
- Query asks for all customer names.
- Query processor must determine which views are relevant (in this case all of them).
Local queries on each source:
  q(N) :- orderView(C,N)
  q(N) :- invoiceView(C,N)
  q(N) :- shipView(C,N,A)
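The view-selection step in this example can be sketched as follows. It is a deliberately simplified illustration, not the algorithm of Information Manifold or any other LAV system: each view is described only by the global attributes it exposes, the rows are invented sample data, and relevant_views and answer are hypothetical helper names.

```python
# Attributes of the global relation customer(id, name, addr).
GLOBAL_CUSTOMER = ("id", "name", "addr")

# Each local view lists the global attributes it exposes, plus
# hypothetical sample rows standing in for the source's contents.
views = {
    "orderView": {"attrs": ("id", "name"),
                  "rows": [(1, "Ann"), (2, "Bob")]},
    "invoiceView": {"attrs": ("id", "name"),
                    "rows": [(2, "Bob"), (3, "Cy")]},
    "shipView": {"attrs": ("id", "name", "addr"),
                 "rows": [(1, "Ann", "12 Elm St")]},
}


def relevant_views(views, wanted):
    """A view is relevant if it exposes every attribute the query needs."""
    return [name for name, view in views.items()
            if set(wanted) <= set(view["attrs"])]


def answer(views, wanted):
    """Project the wanted attributes from each relevant view and union them."""
    results = set()
    for name in relevant_views(views, wanted):
        view = views[name]
        positions = [view["attrs"].index(attr) for attr in wanted]
        for row in view["rows"]:
            results.add(tuple(row[pos] for pos in positions))
    return sorted(results)


names = answer(views, ["name"])
print(names)
print(relevant_views(views, ["name", "addr"]))
```

Asking for names uses all three views, while asking for name and address narrows the relevant views to shipView, mirroring the assumption that only shipmentDB stores addresses.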
17
What is the open problem?
The two approaches are both viable methods for solving data integration.
However, the open problem is that neither approach performs schema integration - the construction of the global view itself.
- GAV - GV constructed (schema integration performed) by global designer when specifying extraction rules
- LAV - GV is pre-defined using some previous integration process (most likely manual in nature)
- Both methods rely on the concept of a global user to create the global schema.
18
How Unity is Different
Our integration architecture, called Unity, is different because it approaches the integration problem from a different perspective:
How can we automate, or semi-automate, the construction of the global view by extracting information from the local data sources?
Thus, the integration problem is tackled from a different set of starting assumptions:
- Do not assume a pre-existing or manually created GV.
- However, assume we have a dictionary and a language for describing schema and data element semantics.
- Attempt to automatically build a GV from source descriptions of each data source.
19
The Unity Approach
Given a set of data sources and a dictionary and language to describe data semantics:
- 1) Semi-automatically extract and represent data source semantics in the language using the dictionary.
- 2) Automatically match concepts across data sources by using the dictionary to determine related concepts.
  - This process effectively builds the global level relations or objects initially assumed or created in other approaches.
  - However, since there is no manual intervention, the precision of global view construction is affected by inconsistencies in the descriptions of the data sources and matching concepts.
- 3) Automatically generate queries specified by the user using dictionary terms (not structures) and map the user's query to appropriate data elements in the local sources.
20
Unity Overview
Unity is a software package that implements the integration architecture with a GUI.
Developed using Microsoft Visual C++ 6 and Microsoft Foundation Classes (MFC).
Unity allows the user to:
- Construct and modify standard dictionaries
- Build X-Specs to describe data sources
- Integrate X-Specs into an integrated view
- Transparently query integrated systems using ODBC and automatically generate SQL transactions
21
Unity Example Step #1 - Standard Dictionary
A standard dictionary (SD) provides standardized terms to capture data semantics.
- Hierarchy of terms related by IS-A or HAS-A links
- Contains base set of common database concepts, but new concepts can be added
An SD term is a single, unambiguous semantic definition.
- Several SD entries for a single English word are required if the word has multiple definitions.
The top-level dictionary terms are those proposed by Sowa.
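A term hierarchy of this kind could be represented roughly as below. This is a sketch under assumptions, not Unity's actual data structure (Unity itself is written in C++): the class names, the "Root" placeholder, and the sample terms and definitions are all hypothetical, and the real top-level terms are not reproduced here.

```python
class Term:
    """One unambiguous dictionary entry, linked to its parent term."""

    def __init__(self, name, definition, parent=None, link=None):
        self.name = name
        self.definition = definition
        self.parent = parent
        self.link = link  # "IS-A" or "HAS-A"; None for the root
        self.children = []


class StandardDictionary:
    def __init__(self, root_name="Root"):
        # "Root" is a placeholder, not one of the actual top-level terms.
        self.root = Term(root_name, "placeholder top-level term")
        self.terms = {root_name: self.root}

    def add(self, name, definition, parent, link):
        """Add a new term under an existing parent term."""
        if name in self.terms:
            raise ValueError(f"term already defined: {name}")
        if link not in ("IS-A", "HAS-A"):
            raise ValueError(f"unknown link type: {link}")
        parent_term = self.terms[parent]
        term = Term(name, definition, parent_term, link)
        parent_term.children.append(term)
        self.terms[name] = term
        return term

    def path(self, name):
        """Return term names from the root down to the given term."""
        names = []
        term = self.terms[name]
        while term is not None:
            names.append(term.name)
            term = term.parent
        return names[::-1]


sd = StandardDictionary()
# Two senses of one English word become two separate entries.
sd.add("Order (purchase)", "a request to buy goods", "Root", "IS-A")
sd.add("Order (sequence)", "an arrangement of items", "Root", "IS-A")
sd.add("Product", "an item in a purchase order", "Order (purchase)", "HAS-A")
print(sd.path("Product"))
```

Rejecting a duplicate name is one simple way to keep each entry a single, unambiguous definition.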
23
Unity Example Step #2 - Data Extraction
For each data source, an X-Spec document is constructed that consists of:
- field, table, key, and join information extracted from the ODBC source
- assignment of semantic names for each field and table
Semantic names combine dictionary terms to describe the semantics of schema elements.
- semantic name := [CT_Type] | [CT_Type] PN
- CT_Type := CT | CT {; CT} | CT {,CT}
- CT := context term, PN := property name
- each CT and PN is a single term from the dictionary
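The semantic name grammar above can be read as a small parser. The sketch below is one possible reading of that grammar, not Unity's implementation: parse_semantic_name is a hypothetical function, the sample names "[Order;Product] Price" and "[Customer]" are invented, and treating a mix of ";" and "," as an error is an assumption based on the two alternatives in the CT_Type rule.

```python
import re

# "[" context terms "]" followed by an optional property name.
_SEMANTIC_NAME = re.compile(r"^\[(?P<contexts>[^\[\]]+)\]\s*(?P<prop>.*)$")


def parse_semantic_name(text):
    """Split a semantic name into (context terms, separator, property name)."""
    match = _SEMANTIC_NAME.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    body = match.group("contexts")
    if ";" in body and "," in body:
        # Assumption: the grammar lists ';' and ',' as alternatives.
        raise ValueError("mixed ';' and ',' separators")
    if ";" in body:
        separator = ";"
    elif "," in body:
        separator = ","
    else:
        separator = None
    parts = body.split(separator) if separator else [body]
    contexts = [part.strip() for part in parts]
    if any(not term for term in contexts):
        raise ValueError("empty context term")
    prop = match.group("prop").strip() or None
    return contexts, separator, prop


print(parse_semantic_name("[Order;Product] Price"))
print(parse_semantic_name("[Customer]"))
```

A fuller version would also check each context term and property name against the standard dictionary, since the grammar requires every CT and PN to be a dictionary term.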
24
Unity Example Step #2 - Data Extraction (2)
Semantic names are initially assigned using an automatic algorithm which attempts to find the best matches.
- The integrator can then refine initial semantic name assignments.
Semantic names have two major purposes:
- used as a means for describing, documenting, and comparing concepts across systems
- allow information in the database (and later in the integrated view) to be organized by semantic concept instead of using structures or relations
  - This simplifies querying the database and integrated view because the information is not divided into normalized relations.
26
Unity Example Step #3 - Schema Building
Unlike previous approaches, the global view (or schema) is constructed automatically by combining source specifications (X-Specs).
This is possible because semantic naming of concepts allows matching across systems:
- The same semantic name in two databases is assumed to represent the same concept.
- Hierarchical nature of semantic names (consisting of multiple terms) allows a schema to be built up from pieces of relations or objects from each data source.
Effectively, the global view is synthesized by the union of concepts in the underlying systems.
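The matching rule on this slide (same semantic name, same concept) can be sketched as a merge of per-source mappings into one hierarchy. This is an illustrative simplification rather than Unity's integration algorithm: X-Specs are reduced to hypothetical Python dictionaries keyed by (context terms, property name), the semantic names are invented, and integrate and new_node are made-up helper names.

```python
def new_node():
    """A concept node: its own properties plus nested child contexts."""
    return {"props": {}, "children": {}}


def integrate(xspecs):
    """Union the concepts of all X-Specs into one hierarchy.

    Fields that carry the same semantic name in different databases end up
    under the same node, which is how they are treated as one concept."""
    root = new_node()
    for db in sorted(xspecs):
        for (contexts, prop), (table, column) in xspecs[db].items():
            node = root
            for context_term in contexts:
                node = node["children"].setdefault(context_term, new_node())
            node["props"].setdefault(prop, []).append((db, table, column))
    return root


# Hypothetical X-Spec contents: semantic name -> (table, column).
xspecs = {
    "orderDB": {
        (("Customer",), "Name"): ("Cust", "name"),
        (("Order",), "Id"): ("Order", "oid"),
        (("Order", "Product"), "Price"): ("OrdProd", "pr"),
    },
    "invoiceDB": {
        (("Customer",), "Name"): ("Cust", "name"),
        (("Invoice",), "Id"): ("Invoice", "invId"),
    },
}

view = integrate(xspecs)
print(sorted(view["children"]))
print(view["children"]["Customer"]["props"]["Name"])
```

Because nothing here checks whether two identically named fields really mean the same thing, the result is only as precise as the semantic names assigned in each X-Spec.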
28
Unity Example Step #4 - Query Processing
The query processor:
- Allows the user to formulate queries on the view.
- Translates from semantic names in the context view to structural queries (SQL) on databases.
  - Involves determining correct field and table mappings and discovery of join conditions and join paths
- Retrieves query results and formats them for display to the user.
Client-side query processing:
- Perform joins between databases using common keys.
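The translation from semantic names to SQL, including join-path discovery, might look like the sketch below for a single source. It is a minimal illustration under assumptions, not Unity's query processor: the FIELDS and JOINS tables stand in for X-Spec content, the semantic names are invented, the table is called Orders (hypothetical) to avoid the SQL reserved word, and join paths are found with a plain breadth-first search.

```python
import sqlite3
from collections import deque

# Hypothetical X-Spec content: semantic name -> (table, column).
FIELDS = {
    "[Customer] Name": ("Cust", "name"),
    "[Order] Date": ("Orders", "odate"),
    "[Order;Product] Amount": ("OrdProd", "amt"),
}
# Join information: (table_a, table_b) -> join condition.
JOINS = {
    ("Cust", "Orders"): "Cust.id = Orders.cid",
    ("Orders", "OrdProd"): "Orders.oid = OrdProd.oid",
}


def join_path(start, goal):
    """Breadth-first search for a chain of (table, condition) join steps."""
    graph = {}
    for (left, right), cond in JOINS.items():
        graph.setdefault(left, []).append((right, cond))
        graph.setdefault(right, []).append((left, cond))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for nxt, cond in graph.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(nxt, cond)]))
    raise ValueError(f"no join path from {start} to {goal}")


def to_sql(semantic_names):
    """Map semantic names to columns and add the joins that connect them."""
    columns = [FIELDS[name] for name in semantic_names]
    tables = [columns[0][0]]
    conditions = []
    for table, _ in columns[1:]:
        if table in tables:
            continue
        for step_table, cond in join_path(tables[0], table):
            if step_table not in tables:
                tables.append(step_table)
            if cond not in conditions:
                conditions.append(cond)
    sql = "SELECT " + ", ".join(f"{t}.{c}" for t, c in columns)
    sql += " FROM " + ", ".join(tables)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


sql = to_sql(["[Customer] Name", "[Order;Product] Amount"])
print(sql)

# Run the generated query against invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cust (id INTEGER, name TEXT);
    CREATE TABLE Orders (oid INTEGER, cid INTEGER, odate TEXT);
    CREATE TABLE OrdProd (oid INTEGER, pid INTEGER, amt INTEGER);
    INSERT INTO Cust VALUES (1, 'Ann');
    INSERT INTO Orders VALUES (10, 1, '2001-03-01');
    INSERT INTO OrdProd VALUES (10, 7, 3);
""")
rows = conn.execute(sql).fetchall()
print(rows)
```

The user names only concepts; the intermediate Orders table and both join conditions are supplied by the path search, which is the sense in which joins do not have to be written explicitly.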
30
Benefits and Contributions
The architecture automatically integrates relational schemas into a global view for querying.
Unique contributions:
- Synthesizing a global view from the bottom up instead of top down. This should improve integration scalability.
- Organizing the global view as a hierarchy of concepts instead of relations or predicates simplifies querying, much like the Universal Relation, because the user does not have to specify predicates/relations or join conditions.
- Query processing is achieved by dynamically discovering extraction rules.
  - The discovered rules are similar to extraction rules of GAV systems.
31
Future Work
Unity performs schema integration by extracting data source information and performing global joins.
- However, the global query processor needs to be extended to handle more diverse queries involving:
  - aggregation and grouping, recursive queries, queries with selection conditions that span data sources
  - support for typical data integration problems of scaling, data type conversions, and translation of units
Synthesizing the global view by combining concepts can be improved by exploiting dictionary knowledge:
- Use IS-A relationships in dictionary to improve matching.
- Determine when to create new global level attributes and contexts that are discovered based on interschema relationships.
32
References
Publications:
- Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, Jan. 2000.
- Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages 127-136, Oct. 2000.
- Integrating Relational Database Schemas using a Standardized Dictionary, SAC’2001 - ACM Symposium on Applied Computing, pages 225-230, March 2001.
- Querying Relational Databases without Explicit Joins, DASWIS 2001 - International Workshop on Data Semantics in Web Information Systems (with ER'2001), Nov. 2001.
Further Information:
- http://www.cs.uiowa.edu/~rlawrenc/
33
Extra Slides
34
Data Warehouse Approach
[Diagram: Order Database, Invoice Database, and Shipment Database, each followed by Gather, Refine, Aggregate, Store, leading into the Warehouse]
Order Database:
  Cust(id,name,addr,city,state,cty)
  Order(oid,cid,odate)
  OrdProd(oid,pid,amt,pr)
  Prod(id,name,pr,desc)
Invoice Database:
  Cust(id,name,addr,city,state,cty)
  Invoice(invId,custId,invDate)
  InvProd(invId,prodId,amt,pr)
  Prod(id,name,pr,desc)
Shipment Database:
  Cust(id,name,addr,city,state,cty)
  Shipment(shipid,oid,cid,shipdate)
  ShipProd(shipid,prodid,amt)
  Prod(id,name,pr,desc,inv)
35
Integration Architecture
Architecture Components:
1) Integrated Context View
- user's view of integration
2) X-Spec Editor
- stores schema & metadata
- uses XML
3) Standard Dictionary
- terms to express semantics
4) Integration Algorithm
- combines X-Specs into integrated context view
5) Query Processor
- accepts query on view
- determines data source mappings and joins
- executes queries and formats results
[Diagram labels: Client; Multidatabase Layer (X-Spec Editor, Standard Dictionary, Integration Algorithm, Integrated Context View, Query Processor and ODBC Manager); Subtransactions; Database; X-Spec; Local Transactions]
36
Architecture Components
The architecture consists of four components:
- A standard dictionary (SD) to capture data semantics
  - SD terms are used to build semantic names describing semantics of schema elements.
- X-Specs for storing data semantics
  - Database metadata and semantic names stored using XML
- Integration Algorithm
  - Matches concepts in different databases by semantic names.
  - Produces an integrated view of all database concepts.
- Query Processor
  - Allows the user to formulate queries on the view.
  - Translates from semantic names in integrated view to SQL queries and integrates and formats results.
    - Involves determining correct field and table mappings and discovery of join conditions and join paths
37
Integration Processes
The integration architecture consists of three separate processes:
- Capture process: independently extracts database schema information and metadata into an XML document called an X-Spec.
- Integration process: combines X-Specs into a structurally-neutral hierarchy of database concepts called an integrated context view.
- Query process: allows the user to formulate queries on the integrated view; the query processor maps them to structural queries (SQL), then integrates and formats the results.
38
Architecture Components: Dictionary vs. Knowledge Base
The standard dictionary differs from a knowledge base such as Cyc because:
- Not intended to be a general English dictionary or contain knowledge facts about the world
  - Dictionary is evolved as new terms are required
  - Not all English words are used
- Dictionary provides the system with no "knowledge"
  - Since no facts are stored, system cannot deduce new facts
  - Dictionary terms are just semantic placeholders; integrators, not the system, determine the semantics of the database
- Simplified organization
  - Dictionary is organized as a tree for efficiency and simplicity in determining related concepts
- Re-use of terms
  - Terms are re-used in semantic names