1
Querying Relational Databases without Explicit Joins
Ramon Lawrence, Ken Barker
2
Outline
- Introduction, Motivation, and Background
  - What is wrong with SQL?
  - How can we replace SQL?
- Querying by Context using Semantic Names
  - An example query
- Query Architecture
  - Term dictionary, X-Specs, query processor
- Query Processor Algorithms
  - Field/table mapping discovery, join selection
  - GUI extensions to simplify query construction
- Future work and conclusions
3
Introduction and Motivation
- Despite improvements in “core” database technology, advances in database query languages have not kept pace.
- SQL is still the fundamental basis for most access languages and tools.
- SQL is often difficult to use for beginning users or when formulating queries on large and complex database schemas.
- This motivates the design of a high-level query language for users that is also backwards compatible with SQL.
4
What is wrong with SQL?
- There is nothing wrong with SQL. However, SQL is not a simple query language for many reasons:
  - Querying by structure does not hide the complexities introduced by database normalization.
  - Structures (fields and tables) may be assigned poor names that do not adequately describe their semantics.
  - The notion of a “join” is confusing for beginning users, especially when multiple joins are present.
  - SQL forces structural access, which does not provide logical query transparency and restricts logical schema evolution.
5
SQL as a Standard
- SQL is a universally accepted standard supported by all major relational DB vendors.
  - Object-oriented query languages have also been modeled on SQL.
  - Most vendors provide graphical query tools that simplify SQL construction.
  - SQL is “relatively” standardized across database systems and platforms.
- Conclusion: Any new query language should be backwards compatible with SQL to guarantee its usefulness.
  - However, it is desirable to hide SQL formulation completely at the user level.
6
Previous Work
- There have been several research systems and prototypes in the area. They fall into three categories:
  - 1) Graphical Query Tools and Models
    - Query by example (QBE), database query tools
  - 2) Query by Word Phrases
    - SemQL (WordNet)
    - Information retrieval techniques, web searching
  - 3) User-directed Querying
    - Kaleidoscope (logic)
7
Query Issues
- A desirable query language should:
  - Allow the user to produce results systematically and deterministically.
  - Provide a suitable formalism for representing and querying data.
  - Hide the naming and structures of the database by providing logical and physical access transparency.
  - Allow the user to browse the contents of the database to determine query concepts.
  - Provide a graphical query interface.
  - Be generally applicable to multiple database systems and/or data models.
8
Querying by Context (QBC)
- Querying by context (QBC) is a methodology for querying relational databases by semantics.
  - Querying is performed by selecting semantic names that represent query concepts.
  - Semantic names are assigned once by the DBA to describe database semantics.
- Users query the database by selecting semantic names from the context view.
  - The context view contains all concepts present in the database with appropriate semantic names.
- A query processor maps the user’s selections and criteria to an actual SQL query.
9
Querying by Context Example
10
Querying by Context Example (2)
- User query: Retrieve all orders and customer names where an order contains a product from category ‘Produce’.
- SQL:
    SELECT O.OrderID, CU.CompanyName
    FROM [Categories] AS C, [Customers] AS CU, [Orders] AS O,
         [Products] AS P, [Order Details] AS OD
    WHERE C.CategoryName = 'PRODUCE'
      AND (C.CategoryID = P.CategoryID)
      AND (P.ProductID = OD.ProductID)
      AND (OD.OrderID = O.OrderID)
      AND (O.CustomerID = CU.CustomerID)
13
Query by Context Processes
- The query architecture consists of three separate processes:
  - Capture process: independently extracts database schema information and metadata into an XML document called an X-Spec.
  - Integration process: combines one or more X-Specs into a structurally-neutral hierarchy of database concepts called an integrated context view.
  - Query process: lets the user formulate queries on the integrated view. The query processor maps these to structural queries (SQL), then integrates and formats the results.
14
The Integrated Context View
- Before semantic querying can begin, the DBA must assign semantic names to each field and table in the database.
  - Semantic names and schema mappings are stored in an XML document called an X-Spec.
- Integration of one or more X-Specs produces a structurally-neutral hierarchy of concepts called an integrated context view.
  - We will only look at querying one database.
- A context view (CV) is a valid Universal Relation.
  - Each field is assigned a semantic name which uniquely identifies its semantic connotation.
15
Query Architecture
- Architecture Components:
  - 1) Integrated Context View
    - user’s view of integration
  - 2) X-Spec Editor
    - stores schema & metadata
    - uses XML
  - 3) Standard Dictionary
    - terms to express semantics
  - 4) Integration Algorithm
    - combines X-Specs into integrated context view
  - 5) Query Processor
    - accepts query on view
    - determines data source mappings and joins
    - executes queries and formats results
[Architecture diagram. Labels: Client, Global Query Layer, Query Processor and ODBC Manager, Integrated Context View, Integration Algorithm, Standard Dictionary, X-Spec Editor, X-Spec, Database, Subtransactions, Local Transactions]
16
What is a semantic name?
- A semantic name is a universal, semantic identifier in a domain.
  - Similar to a field name in the Universal Relation.
  - Semantics are guaranteed unique by construction.
- Users query based on the semantic names provided by the DBA.
- Systematic construction of semantic names allows:
  - The system to insert join conditions into the query.
  - The context view to be organized hierarchically by semantic concept to reduce the burden on the user.
17
User Query Formulation
- Users query the context view by:
  - browsing and selecting semantic names from the context view for selection/projection
  - specifying ordering and selection criteria
- Users do not have to specify joins between concepts:
  - Some joins are implicit by virtue of hierarchical semantic names (e.g. [Order;Product] of [Order]).
  - Joins are automatically inserted when required if not directly specified by the user.
  - The user can explicitly direct joins by browsing the CV, which automatically connects concepts.
18
Query Processor
- The query processor translates from a semantic query to SQL, which requires:
  - Mapping from semantic names to field and table names using the supplied X-Spec mappings.
    - One challenge is determining which field mapping to use if multiple are present (e.g. [Order] Id).
  - Insertion of joins to preserve user query semantics.
    - If the user specifies no relationship between concepts, choose the shortest semantic join path; if none applies, select the shortest physical join path.
  - If the user specifies (some) join semantics, then join determination is simplified.
    - The system does not prevent the user from specifying complex, non-standard joins like outer joins.
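The two translation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `FIELD_MAP` is a hypothetical stand-in for the X-Spec mappings (the X-Spec format itself is not shown in these slides), `JOIN_CONDITIONS` is copied from the join predicates in the example SQL query, and `build_sql` is an invented helper name. The sketch assumes each semantic name has exactly one field mapping, which sidesteps the multiple-mapping challenge noted above.

```python
# Hypothetical stand-in for X-Spec mappings: semantic name -> (table, field).
FIELD_MAP = {
    "[Order] Id": ("Orders", "OrderID"),
    "[Customer] Name": ("Customers", "CompanyName"),
    "[Category] Name": ("Categories", "CategoryName"),
}

# Join conditions taken from the example SQL query: (table, table, shared key).
JOIN_CONDITIONS = [
    ("Categories", "Products", "CategoryID"),
    ("Products", "Order Details", "ProductID"),
    ("Order Details", "Orders", "OrderID"),
    ("Orders", "Customers", "CustomerID"),
]


def bracket(table):
    """Quote a table name the way the example SQL does."""
    return f"[{table}]"


def build_sql(selected, criteria, joins):
    """Map semantic names to fields and insert the given join conditions."""
    columns, tables, conditions = [], [], []

    def use(table):
        # Record each table once, in order of first use.
        if table not in tables:
            tables.append(table)

    for name in selected:
        table, field = FIELD_MAP[name]
        use(table)
        columns.append(f"{bracket(table)}.{field}")

    for name, value in criteria.items():
        table, field = FIELD_MAP[name]
        use(table)
        # Illustration only: real code should use parameterized queries.
        escaped = value.replace("'", "''")
        conditions.append(f"{bracket(table)}.{field} = '{escaped}'")

    for left, right, key in joins:
        use(left)
        use(right)
        conditions.append(f"{bracket(left)}.{key} = {bracket(right)}.{key}")

    sql = f"SELECT {', '.join(columns)} FROM {', '.join(bracket(t) for t in tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


sql = build_sql(
    ["[Order] Id", "[Customer] Name"],
    {"[Category] Name": "Produce"},
    JOIN_CONDITIONS,
)
print(sql)
```

Here the join list is passed in by hand. In the approach described on the slides, the processor would choose it from the join graph.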
19
Selecting Joins Using Join Graphs
- Determining which joins to use is simplified by defining a join graph.
- A join graph is an undirected graph where:
  - Each node N_i is a table in the database.
  - There is a link from node N_i to node N_j if there is a join between the two tables.
- A join path is a sequence of joins connecting two nodes in the graph.
- A join tree is a set of joins connecting two or more nodes.
- A join matrix M stores the shortest join paths between any two nodes (tables).
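As a rough sketch of these definitions, the join graph and join matrix can be built with a breadth-first search from every table. The edges below are assumed from the join conditions in the example SQL query, since the join graph figure is not reproduced in this transcript. The function names are illustrative and not taken from the Unity prototype.

```python
from collections import deque

# Joins assumed from the example SQL query: (table, table) -> join key.
JOINS = {
    ("Categories", "Products"): "CategoryID",
    ("Products", "Order Details"): "ProductID",
    ("Order Details", "Orders"): "OrderID",
    ("Orders", "Customers"): "CustomerID",
}


def build_join_graph(joins):
    """Undirected join graph: each table maps to the set of joinable tables."""
    graph = {}
    for left, right in joins:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def shortest_paths_from(graph, start):
    """Breadth-first search giving one shortest join path to every reachable table."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph[node]):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths


def build_join_matrix(graph):
    """Join matrix M: M[a][b] is a shortest join path from table a to table b."""
    return {table: shortest_paths_from(graph, table) for table in graph}


graph = build_join_graph(JOINS)
matrix = build_join_matrix(graph)
print(matrix["Categories"]["Customers"])
# ['Categories', 'Products', 'Order Details', 'Orders', 'Customers']
```

Precomputing the matrix trades memory for lookup speed, which seems reasonable when the schema changes rarely and queries are frequent.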
20
Join Graph for Order Database
21
Join Discovery Results
- Join discovery in a database with a connected, acyclic join graph is simple, as there exists only one join tree for any set of tables.
- For a cyclic join graph, there may exist more than one join tree for a set of tables, and each tree may have different semantics.
  - Users may eliminate possibilities when browsing the context view or using direct specification.
  - Otherwise, the system selects the shortest join path, first considering semantic names and then physical join paths.
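The difference between the acyclic and cyclic cases can be illustrated for the simplest situation, a join path between two tables. The sketch below enumerates all simple paths and flags the request as ambiguous when more than one exists. The direct Customers-Products edge is a hypothetical addition made only to create a cycle, and the tie-break uses physical path length alone. The semantic-name ordering described above is not modeled.

```python
def simple_paths(graph, start, goal, path=None):
    """Return every join path from start to goal that visits no table twice."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for neighbor in sorted(graph.get(start, ())):
        if neighbor not in path:
            found.extend(simple_paths(graph, neighbor, goal, path))
    return found


def choose_join_path(graph, start, goal):
    """Pick a shortest physical join path and report whether alternatives exist."""
    candidates = simple_paths(graph, start, goal)
    if not candidates:
        return None, False
    return min(candidates, key=len), len(candidates) > 1


# Acyclic join graph assumed from the example SQL query.
acyclic = {
    "Categories": {"Products"},
    "Products": {"Categories", "Order Details"},
    "Order Details": {"Products", "Orders"},
    "Orders": {"Order Details", "Customers"},
    "Customers": {"Orders"},
}

# Hypothetical extra join (Customers-Products) that introduces a cycle.
cyclic = {table: set(neighbors) for table, neighbors in acyclic.items()}
cyclic["Customers"].add("Products")
cyclic["Products"].add("Customers")

print(choose_join_path(acyclic, "Customers", "Products"))
print(choose_join_path(cyclic, "Customers", "Products"))
```

In the cyclic case the shortest path is the direct hypothetical join, which may not carry the semantics the user intended. This is why the slides have the user or the semantic names disambiguate first.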
22
Reducing Join Ambiguity
- Join ambiguity can be reduced during query formulation without the user’s knowledge.
  - Example: Retrieve all orders and customer names where an order contains a product in category ‘Produce’.
- Semantic names selected:
  - [Order] Id, [Customer] Name, [Category] Name
  - no join ambiguity if acyclic graph, however…
- Semantic names with no ambiguity:
  - [Order] Id, [Order;Customer] Name, [Order;Product;Category] Name
- Names indicate path from starting context “Order”.
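To show how a hierarchical semantic name encodes a path, the sketch below splits a name such as [Order;Product;Category] Name into its context list and concept, then derives the context-to-context hops. The name syntax is inferred only from the examples on this slide, and the function names are illustrative.

```python
import re

# Pattern inferred from the slide examples: "[Ctx1;Ctx2;...] Concept".
SEMANTIC_NAME = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def parse_semantic_name(name):
    """Split a semantic name into its context list and concept name."""
    match = SEMANTIC_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {name!r}")
    contexts = [part.strip() for part in match.group(1).split(";")]
    return contexts, match.group(2).strip()


def context_hops(name):
    """Consecutive context pairs, read as hops from the starting context."""
    contexts, _ = parse_semantic_name(name)
    return list(zip(contexts, contexts[1:]))


for name in ["[Order] Id", "[Order;Customer] Name", "[Order;Product;Category] Name"]:
    print(name, "->", context_hops(name))
```

Each hop still has to be mapped to one or more physical joins. For example, Order to Product passes through the Order Details table in the example schema.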
23
Reducing Join Ambiguity (2)
- From a user interface perspective, the second set of semantic names can be determined if:
  - When the user browses the [Order] context, the [Customer] name information is merged into the [Order] context using a hidden connection on [Order] Id.
  - Similarly, connect [Product] and [Category] information through [Order] Id (Order to Product) and [Product] Id (Product to Category).
  - As the user browses the view and uses these connections, this information can be exploited to determine appropriate join paths.
24
Query by Context Discussion
- Important benefits of querying by context:
  - System table and field names are not presented to the user, who queries based on semantic names.
  - Database structure is not shown to the user.
  - Field and table mappings are automatically determined based on X-Spec information.
  - Join conditions are inserted as needed, when available, to join tables and preserve the user’s query semantics as specified.
  - Structural neutrality of the context view allows QBC to be extended to non-relational databases and to be used as a query language for integrated databases.
25
Conclusions
- User querying can be simplified by semantic naming of schema constructs (by the DBA) that hierarchically organizes concepts into a view.
- Query by context provides logical query transparency that is suited for databases with schema evolution or integrated systems.
- Users are able to transparently query integrated systems by concept instead of structure.
- Handling join ambiguity is an important component in mapping to SQL.
26
Future Work
- Continuing to refine a prototype of the system called Unity.
- Conducting a comparison study of query by context versus traditional SQL, database query tools, and query by example.
- Extending the query processor to resolve more complex queries.
  - Joins on non-standard keys, joins across databases, etc.
27
References
- Publications:
  - Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, January 2000.
  - Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pp. 127-136, October 2000.
  - Using Unity to Semi-Automatically Integrate Relational Schema, demonstration to appear at ICDE’2002.
  - Integrating Relational Database Schemas using a Standardized Dictionary, SAC’2001 - ACM Symposium on Applied Computing, March 2001.
- Further Information:
  - http://www.cs.uiowa.edu/~rlawrenc/