AutoJoin: Providing Freedom from Specifying Joins Terrence Mason Lixin Wang

Presentation transcript:

AutoJoin: Providing Freedom from Specifying Joins
Terrence Mason, Lixin Wang, Dr. Ramon Lawrence
Iowa Database and Emerging Application Laboratory, University of Iowa
7th International Conference on Enterprise Information Systems (ICEIS 2005), Miami, Florida

Presentation Outline
Define query inference
Query languages that require inference
AutoJoin architecture
Join graph represents a schema
Queries and query interpretations on a join graph
Pre-compute maximal join trees
Algorithm EMO
Query time processing: example
Performance evaluation

Query Inference Problem (New Languages)
The query inference problem requires enumerating and ranking query interpretations of a query such that the query interpretation desired by the user is among the highest ranked interpretations.

Motivation for Query Inference
State-of-the-art query languages require it
  Keyword search: automatically relate keywords across relations of a schema
  Conceptual queries: concepts mapped to the database must be related
  Natural language queries
    Natural language query mapped to concepts
    Relate concepts as in conceptual queries
Current approaches are not scalable
  Tied to a specific language
  Or to a conceptual model

Motivation for Query Inference
Reduces to a graph problem
  Connect relations (nodes) with joins (edges)
  Exponential solutions for highly connected graphs (database graphs are less connected)
Approaches to join determination
  Grow all ways
    Universal Relation (Maier and Ullman, 1983)
    Discover (keyword) (Hristidis and Papakonstantinou, 2002, 2003, 2004)
  Shortest paths
    CQL, Conceptual Query Language (Owei and Navathe, 2001)
  Limited interpretations
    Steiner tree (2-trees) (Wald and Sorenson, 1984)
    Limit number of joins and interpretations (Zhang et al., 1999)
  Find spanning trees of keywords at query time
    DBXplorer keyword search (Agrawal et al., 2002)

Goal of AutoJoin: Consistent, Scalable Inference Engine
Abstract database schema from users
Automatically determine joins to relate relations and attributes
Consistent approach to handle ambiguity in queries
Efficient algorithm to pre-compute potential joins
Minimal overhead at query time
Demonstrate efficiency and scalability
Structured on relational model without any required conceptual models

Example Query on TPC-H Schema
English query: List all parts ordered by Customers in the United States.
Attribute-only SQL
Determine joins with AutoJoin
New formulation for the query inference problem

TPC-H Schema (source: TPC-H BENCHMARK™)
Table: Attributes
Part: partkey, name, mfgr, brand, type, size, container, retailprice, comment
Supplier: suppkey, name, address, nationkey, phone, acctbal, comment
PartSupp: partkey, suppkey, availqty, supplycost, comment
Customer: custkey, name, address, nationkey, phone, acctbal, mktsegment, comment
Order: orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment
LineItem: orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, returnflag, tax, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment
Nation: nationkey, name, regionkey, comment
Region: regionkey, name, comment
Example query: List all parts ordered by Customers in the United States.

Attribute-only Query: Select Part.Name where Nation.Name = 'United States';
  Part.Name: name attribute in Part table
  Nation.Name: name attribute in Nation table
  Select and where are similar to SQL
  No From clause or joins specified
Keyword Query: Part 'United States'
  Maps Part to the Part relation
  Maps 'United States' to a tuple in the Nation relation
  No joins specified

SQL Query
Attribute-only form:
  Select Part.Name where Nation.Name = 'United States';
Full SQL with specified joins and tables:
  SELECT P.name
  FROM part P, nation N, partsupp PS, lineitem LI, orders O, customer C
  WHERE N.name = 'United States'
    AND P.partkey = PS.partkey
    AND PS.partkey = LI.partkey
    AND PS.suppkey = LI.suppkey
    AND O.custkey = C.custkey
    AND C.nationkey = N.nationkey
    AND LI.orderkey = O.orderkey;
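To make concrete what the user no longer has to write, here is a minimal Python sketch that expands an attribute-only query into full SQL once a set of joins has been chosen. The helper `build_sql` and its argument layout are hypothetical, introduced only for illustration; they are not AutoJoin's query builder. The join list mirrors the joins in the slide's SQL query.

```python
def build_sql(select_attr, predicate, joins):
    """Expand an attribute-only query into SQL.

    joins: list of (left_table, right_table, [(left_col, right_col), ...]).
    Hypothetical helper for illustration only.
    """
    tables = []
    conditions = [predicate]
    for left, right, column_pairs in joins:
        for table in (left, right):
            if table not in tables:  # keep first-seen order, no duplicates
                tables.append(table)
        for left_col, right_col in column_pairs:
            conditions.append(f"{left}.{left_col} = {right}.{right_col}")
    return (
        f"SELECT {select_attr} FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(conditions)};"
    )


# Joins taken from the slide's SQL query (parts ordered by customers of a nation).
CUSTOMER_INTERPRETATION = [
    ("partsupp", "part", [("partkey", "partkey")]),
    ("lineitem", "partsupp", [("partkey", "partkey"), ("suppkey", "suppkey")]),
    ("lineitem", "orders", [("orderkey", "orderkey")]),
    ("orders", "customer", [("custkey", "custkey")]),
    ("customer", "nation", [("nationkey", "nationkey")]),
]

sql = build_sql("part.name", "nation.name = 'United States'", CUSTOMER_INTERPRETATION)
print(sql)
```

The generated statement contains the same six join conditions as the hand-written query; only the choice of joins needs to come from an inference step.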

AutoJoin Architecture
[Figure: architecture diagram. Labels shown: User, Query Interface, Query Builder, AutoJoin Inference Engine, Loader, Generator, Ranker, Iterator, XML Document, Relational Database. Labeled flows: Inference Request, Interpretations, Execute Queries.]

Representing Joins of a Schema: Join Graph
Graph representation of a relational schema
Nodes: relations in the schema
Directed edges: foreign key constraints between relations
  Edges are directed from the N side to the 1 side of each relationship
  Maintains the lossless property (no spurious tuples on joins)

Create Join Graph (TPC-H)
Nodes joined: foreign key / join
LineItem to Part: partkey → partkey
LineItem to PartSupp: partkey, suppkey → partkey, suppkey
LineItem to Supplier: suppkey → suppkey
LineItem to Order: l_orderkey → o_orderkey
PartSupp to Part: ps_partkey → p_partkey
PartSupp to Supplier: ps_suppkey → s_suppkey
Supplier to Nation: s_nationkey → n_nationkey
Order to Customer: o_custkey → c_custkey
Customer to Nation: c_nationkey → n_nationkey
Nation to Region: n_regionkey → r_regionkey
[Figure: tables as nodes: PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region]
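The foreign keys listed on this slide can be turned into a join graph directly. The following Python sketch builds an adjacency-list graph with edges directed from the referencing relation (N side) to the referenced relation (1 side). The names `build_join_graph` and `incoming_counts` are illustrative assumptions, not part of AutoJoin.

```python
# Foreign keys from the slide: (referencing relation, referenced relation).
TPCH_FOREIGN_KEYS = [
    ("LineItem", "Part"),
    ("LineItem", "PartSupp"),
    ("LineItem", "Supplier"),
    ("LineItem", "Order"),
    ("PartSupp", "Part"),
    ("PartSupp", "Supplier"),
    ("Supplier", "Nation"),
    ("Order", "Customer"),
    ("Customer", "Nation"),
    ("Nation", "Region"),
]


def build_join_graph(foreign_keys):
    """Return {relation: [referenced relations]} with every relation as a key."""
    graph = {}
    for child, parent in foreign_keys:
        graph.setdefault(child, []).append(parent)
        graph.setdefault(parent, [])  # make sure sink relations appear as nodes
    return graph


def incoming_counts(graph):
    """Count incoming edges (foreign keys pointing at each relation)."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


graph = build_join_graph(TPCH_FOREIGN_KEYS)
counts = incoming_counts(graph)
no_incoming = [node for node, count in counts.items() if count == 0]
shared_targets = sorted(node for node, count in counts.items() if count > 1)
print(no_incoming, shared_targets)
```

In this graph only LineItem has no incoming edge, while Nation, Part, and Supplier are each reachable over more than one foreign key.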

Pre-compute Maximal Join Trees
EMO algorithm on the join graph
  Efficiently computes all trees
  Executes where the previous strategy failed
  Direction of edges results in lossless join trees
Pre-computed
  Executed once prior to query time
  Structures built for query time performance

Compute Lossless Joins
Maximal sets of lossless joins
Ambiguity inherent in the schema
Two types of ambiguity:
  Single relation that plays multiple roles
    Node with more than one incoming edge in the join graph
  Multiple semantic relationships between entities
    Strongly connected components of more than one node

Creation of Maximal Join Trees: Lossless Joins
Efficient algorithm EMO
Determine all reachable graphs from nodes that may be a root for a maximal set of lossless joins
  Identify all strongly connected components (SCCs)
  For each SCC:
    If the SCC is a single node with no incoming edges, create the reachable graph from this node
    If the SCC has multiple nodes, create a reachable graph for each node in the SCC that has no incoming edges from outside the SCC
For each reachable graph, find all spanning trees
Spanning trees represent maximal join trees
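The steps on this slide can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the EMO algorithm itself: SCC membership is found by mutual reachability, a node is treated as a root when none of its incoming edges comes from outside its own SCC (one reading of the slide's rule), and spanning trees are enumerated by picking one incoming edge per non-root node and discarding choices that form a cycle. All function names are hypothetical.

```python
from itertools import product

# TPC-H join graph: relation -> relations it references.
TPCH = {
    "LineItem": ["Part", "PartSupp", "Supplier", "Order"],
    "PartSupp": ["Part", "Supplier"],
    "Supplier": ["Nation"],
    "Order": ["Customer"],
    "Customer": ["Nation"],
    "Nation": ["Region"],
    "Part": [],
    "Region": [],
}


def reachable(graph, start):
    """All nodes reachable from start (including start) by following edges."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_roots(graph):
    """Nodes whose incoming edges (if any) all come from inside their own SCC."""
    reach = {node: reachable(graph, node) for node in graph}
    roots = []
    for node in graph:
        scc = {other for other in reach[node] if node in reach[other]}
        if not any(src not in scc and node in graph[src] for src in graph):
            roots.append(node)
    return roots


def _climbs_to_root(parent, node, root):
    """Follow parent pointers; False if a cycle is hit before reaching root."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent[node]
    return True


def spanning_trees(graph, root):
    """Yield each spanning tree of root's reachable graph as a set of edges."""
    members = reachable(graph, root)
    others = sorted(members - {root})
    incoming = [
        [src for src in sorted(members) if src != node and node in graph[src]]
        for node in others
    ]
    for choice in product(*incoming):  # one incoming edge per non-root node
        parent = dict(zip(others, choice))
        if all(_climbs_to_root(parent, node, root) for node in others):
            yield frozenset((parent[node], node) for node in others)


def maximal_join_trees(graph):
    return [t for root in find_roots(graph) for t in spanning_trees(graph, root)]


trees = maximal_join_trees(TPCH)
print(find_roots(TPCH), len(trees))
```

On the TPC-H join graph this yields LineItem as the only root and eight spanning trees, each with seven joins. The product over incoming edges is exponential in general, so the sketch is only suitable for small schemas.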

Maximal Join Trees of TPC-H
LineItem is the only root for a reachable graph
No strongly connected components
Join graph is the reachable graph
Enumerate spanning trees on the original graph
Remove shortcut joins and re-compute

TPC-H Join Graph
[Figure: join graph over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region]

TPC-H Maximal Join Trees
[Figure: eight maximal join trees, each over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region]

Shortcut Joins: Semantically Equivalent Join Paths
A shortcut join is a join that is semantically equivalent to a longer join path
Core join path (longer) is preserved in the join graph
Shortcut join is removed for join determination
  Otherwise it appears to be a semantically different interpretation of the query
Shortcut join is substituted back into the query
  When no nodes on the core path are in the query (faster execution)
TPC-H has two shortcut joins
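A structural test can flag candidate shortcut joins: an edge is a candidate when its target is also reachable from its source over a longer path. The Python sketch below applies that test to the TPC-H join graph. The test is an assumption made for illustration; semantic equivalence of the two paths is a property of the schema that a designer would still need to confirm, and the slides do not say how AutoJoin identifies shortcut joins. The tree count uses the product of in-degrees, which is valid only for an acyclic graph whose single root reaches every node.

```python
from math import prod

# TPC-H join graph: relation -> relations it references.
TPCH = {
    "LineItem": ["Part", "PartSupp", "Supplier", "Order"],
    "PartSupp": ["Part", "Supplier"],
    "Supplier": ["Nation"],
    "Order": ["Customer"],
    "Customer": ["Nation"],
    "Nation": ["Region"],
    "Part": [],
    "Region": [],
}


def has_alternative_path(graph, src, dst):
    """True if dst is reachable from src without using the direct edge src -> dst."""
    seen = {src}  # never re-enter src, so the direct edge cannot be reused
    stack = []
    for nxt in graph[src]:
        if nxt != dst and nxt not in seen:
            seen.add(nxt)
            stack.append(nxt)
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def candidate_shortcuts(graph):
    """Edges whose endpoints are also connected by a longer directed path."""
    return sorted(
        (src, dst)
        for src in graph
        for dst in graph[src]
        if has_alternative_path(graph, src, dst)
    )


def remove_edges(graph, edges):
    """Return a copy of the graph without the given edges."""
    drop = set(edges)
    return {s: [d for d in targets if (s, d) not in drop] for s, targets in graph.items()}


def count_trees_single_root_dag(graph, root):
    """Spanning-tree count for an acyclic graph whose root reaches every node."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    return prod(indegree[node] for node in graph if node != root)


shortcuts = candidate_shortcuts(TPCH)
core = remove_edges(TPCH, shortcuts)
print(shortcuts)
print(count_trees_single_root_dag(TPCH, "LineItem"), count_trees_single_root_dag(core, "LineItem"))
```

The two candidates found are LineItem to Part and LineItem to Supplier, both of which are also covered by the path through PartSupp. Removing them reduces the number of spanning trees rooted at LineItem from eight to two.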

TPC-H Join Graph: Remove Shortcut Joins
[Figure: join graph over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region. Red: shortcut joins.]

Original TPC-H Maximal Join Trees
[Figure: the eight original maximal join trees, each over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region]

TPC-H Semantically Unique Maximal Join Trees
[Figure: two maximal join trees, labeled 1 and 2, each over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region]

Query and Query Interpretation: AutoJoin Join Graphs
Query: sub-graph of the join graph
  Nodes and (optionally) edges
  If not connected, requires inference
Query interpretation: connected sub-graph of the join graph
  Includes all specified nodes and edges

Example Query
SELECT Part.Name WHERE Nation.Name = 'United States';
Relate Part.Name to Nation.Name
  Part and Nation nodes
  Query of Part and Nation nodes is sent to AutoJoin
The query is ambiguous
  More than one query interpretation
  Nation relates to both Supplier and Customer
  Return the query with the fewest joins first

Efficient Query Time Execution
Find maximal join trees that contain the query nodes
  Reverse index: relation to its set of join trees
  Intersect lists
Build interpretations
  Least common ancestor (vs. recursive prune)
  Pre-compute ancestor lists
No lossless interpretations (no trees)
  Find lossy interpretation
Rank interpretations by cost function
[Callout on slide: maximal sets of lossless joins]
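The query-time steps above can be sketched in Python. The sketch assumes each maximal join tree is stored as a child-to-parent map, uses the two semantically unique TPC-H trees (Nation reached through Customer in tree 1 and through Supplier in tree 2), and ranks by number of relations as a stand-in for the cost function, which the slides do not define. All names are hypothetical.

```python
# Each tree maps a relation to its parent; LineItem is the root of both.
_COMMON = {
    "PartSupp": "LineItem", "Order": "LineItem", "Part": "PartSupp",
    "Supplier": "PartSupp", "Customer": "Order", "Region": "Nation",
}
TREES = {
    1: {**_COMMON, "Nation": "Customer"},
    2: {**_COMMON, "Nation": "Supplier"},
}


def build_reverse_index(trees):
    """Map each relation to the ids of the trees that contain it."""
    index = {}
    for tree_id, parent in trees.items():
        for relation in set(parent) | set(parent.values()):
            index.setdefault(relation, set()).add(tree_id)
    return index


def path_to_root(parent, node):
    """Ancestor list of a node, starting with the node itself (can be pre-computed)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def interpretation(parent, query_nodes):
    """Relations on the paths from each query node up to their least common ancestor."""
    paths = [path_to_root(parent, node) for node in query_nodes]
    common = set(paths[0]).intersection(*paths[1:])
    lca = next(node for node in paths[0] if node in common)  # deepest shared ancestor
    nodes = set()
    for path in paths:
        nodes.update(path[: path.index(lca) + 1])
    return frozenset(nodes)


def infer(trees, index, query_nodes):
    """Distinct interpretations of a non-empty query, fewest relations (joins) first."""
    tree_ids = set.intersection(*(index.get(node, set()) for node in query_nodes))
    found = {interpretation(trees[tree_id], query_nodes) for tree_id in tree_ids}
    return sorted(found, key=lambda nodes: (len(nodes), sorted(nodes)))


INDEX = build_reverse_index(TREES)
ambiguous = infer(TREES, INDEX, ["Part", "Nation"])
unambiguous = infer(TREES, INDEX, ["Supplier", "Order"])
print(ambiguous)
print(unambiguous)
```

For the Part and Nation query the two trees give different sub-trees, so two interpretations are returned with the Supplier one (three joins) ranked ahead of the Customer one (five joins). For Supplier and Order both trees prune to the same sub-tree, so a single interpretation remains.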

Both Trees Contain Query Nodes
Select Part.Name where Nation.Name = 'United States';
[Figure: maximal join trees 1 and 2, each over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region. Red: target nodes.]

Query Processing
[Figure: maximal join trees 1 and 2, each over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region. Red: target nodes. Blue: tree nodes. Gray: nodes to prune.]

Query Interpretations
1. Select Part.Name where Customer.Nation.Name = 'United States';
   [Figure: interpretation over PartSupp, Nation, Part, LineItem, Order, Customer]
2. Select Part.Name where Supplier.Nation.Name = 'United States';
   [Figure: interpretation over PartSupp, Nation, Supplier, Part]

Unambiguous Query
Select Supplier.Name where Order.Id = 73;
[Figure: maximal join trees 1 and 2, each over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region. Red: target nodes.]

Query Processing
Select Supplier.Name where Order.Id = 73;
[Figure: maximal join trees 1 and 2, each over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region. Red: target nodes. Blue: tree nodes. Gray: nodes to prune.]

Query Interpretations
Select Supplier.Name where Order.Id = 73;
[Figure: interpretations from trees 1 and 2, both over PartSupp, Supplier, LineItem, Order]

The Unambiguous Query Interpretation
Select Supplier.Name where Order.Id = 73;
[Figure: single interpretation over PartSupp, Supplier, LineItem, Order]

Additional Interpretations: Lossy Joins
Related through a node involved in two distinct roles
  Two maximal join trees contain all query nodes and have at least one node in common
  Union the maximal join trees
  Common nodes provide the relation between the trees
Interpretation where a node will have two incoming edges
  No longer lossless
Example: Customer and Supplier related through Nation in TPC-H
  Cross products of Customers and Suppliers with the same nation

Beyond Natural Joins
Theta joins
  Merge the two nodes related by the theta join into a single node and re-compute maximal objects
  Expand this node for the final query interpretation with the theta join
Tuple variables
  A query interface may specify tuple variables
  Additional nodes and edges will be added to the join graph to complete the query interpretations

Performance Experiments: Broad Range of Schemas
caBIO (NCI): 149 relations, 213 joins, and 1253 maximal join trees
TPC-H: standard database
  Inferred standard queries (21 specified queries)
  Ambiguity reduced by removing shortcut joins
Tenant: 9 nodes, 50 joins, and 1286 maximal join trees

Performance Results
Time to generate all maximal join trees
  Handles schemas where the previous method failed
  Worst test: 2.7 seconds
  Average: < 1 second
Reduce ambiguity
  Removing shortcut joins reduces ambiguity
  Increased number of unambiguous queries: from 45% to 68% for TPC-H benchmark queries
Minimal overhead of inference at query time
  Average: < 1 millisecond
  Worst test: 7.4 milliseconds

Compute Maximal Join Trees EMO vs. All Ways

Reducing Ambiguity Remove Shortcut Joins

Query Inference Time (Milliseconds)

AutoJoin Conclusions
Scalable inference engine
Efficiently pre-computes maximal join trees
Reduced ambiguity by removing shortcut joins
Overhead is minimal
Complex queries can be inferred
Built directly on the relational model

Future Work
Develop a query language
  Remove the requirement of understanding the underlying schema
  Automatically determines joins
End user interface based on AutoJoin
Query inference for integration systems

Query Inference (Previous) The translation of a query in a query language into an unambiguous representation of the query [Wald and Sorenson, 1984]

Universal Relation
First model to require query inference
Maximal Objects (Maier and Ullman, 1983)
  Lossless join property to identify potential joins
  Grows all ways on hyper-graph
  Returns a union of all query interpretations
Minimum Directed Cost Steiner Tree (Wald and Sorenson, 1984)
  Limited to partial 2-trees
  Returns only the lowest cost query interpretation
Generate a single interpretation
  Do not meet the needs of new query languages
  Limited query interpretations possible

State of the Art Query Languages
Keyword searches
  Keywords map to either specific data, attribute names, or relation names in a database.
  Must identify joins to relate keywords spread across multiple relations.
  Multiple approaches to identifying the top-k relationships between keywords.

Keyword Search: Top-K Relationships
Discover (Hristidis and Papakonstantinou, 2002, 2003, 2004)
  Grow all ways from a keyword
  Limit on number of joins
  Creates extra graphs
DBXplorer (Agrawal et al., 2002)
  Generates spanning trees at query time
BANKS ( )
  Graph of all tuples related by joins
  Must fit in memory (limited to smaller databases)

State of the Art Query Languages
Conceptual query languages or models
  Queries built with concepts that map to a database.
  Remove the burden of knowledge of the schema.
  Must determine joins to relate concepts in query.
  Use conceptual model to determine joins

Conceptual Query Languages
CQL (Owei and Navathe, 2001)
  Queries may include roles or joins required for a query
  Pathfinder algorithm for completing the query
  Based on shortest path between source and target concepts in query
  Semantically Constrained ER Diagram as a graph used to determine joins.
Conceptual Model (Zhang et al., 1999)
  Semantic graph of database
  Search algorithm constrained by number of joins or number of interpretations

State of the Art Query Languages
Natural language queries
  Natural language queries map the language to concepts in a database
  Joins must be determined to relate concepts in database similar to Conceptual Query Languages

Functional Dependencies due to Primary Keys (TPC-H)
Table: Functional dependencies
Part: p_partkey → p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment
Supplier: s_suppkey → s_name, s_address, s_nationkey, s_phone, s_acctbal, s_comment
PartSupp: ps_partkey, ps_suppkey → ps_availqty, ps_supplycost, ps_comment
Customer: c_custkey → c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment
Order: o_orderkey → o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk, o_shippriority, o_comment
LineItem: l_orderkey, l_linenumber → l_partkey, l_suppkey, l_orderkey, l_quantity, l_extendedprice, l_discount, l_returnflag, l_tax, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment
Nation: n_nationkey → n_name, n_regionkey, n_comment
Region: r_regionkey → r_name, r_comment
[Legend on slide: Primary Keys, Foreign Keys]

Functional Dependencies Implied by Foreign Keys (TPC-H)
Table with foreign key, table referenced: functional dependency
LineItem, Part: l_partkey → p_partkey
LineItem, Supplier: l_suppkey → s_suppkey
LineItem, PartSupp: l_partkey, l_suppkey → ps_partkey, ps_suppkey
LineItem, Order: l_orderkey → o_orderkey
PartSupp, Part: ps_partkey → p_partkey
PartSupp, Supplier: ps_suppkey → s_suppkey
Supplier, Nation: s_nationkey → n_nationkey
Customer, Nation: c_nationkey → n_nationkey
Order, Customer: o_custkey → c_custkey
Nation, Region: n_regionkey → r_regionkey
[Legend on slide: Primary Keys, Foreign Keys]

TPC-H Join Graph
[Figure: join graph over PartSupp, Nation, Supplier, Part, LineItem, Order, Customer, Region]