Dynamic Database Integration in a JDBC Driver Terrence Mason and Dr. Ramon Lawrence Iowa Database and Emerging Application Laboratory University of Iowa 7th International Conference on Enterprise Information Systems ICEIS 2005 Miami, Florida
Discuss the contributions of JDBC Driver Review the Architecture Step through an example integration and query (Partitioned TPC-H* dataset) Review the experimental results Demonstrate efficient Database Integration * Presentation
Contributions to Database Integration Standard API for Integration (JDBC) Automatic generation of a global view of integrated data sources –Annotation done locally –Common Vocabulary (National Cancer Institute-EVS) –Scalable to build a global schema Simple Conceptual Query Language Automatic Join Determination for queries Allows evolution of data sources Detects inconsistent data across sources
Unity JDBC Driver Architecture DB 1 DB 2 DB n Embedded Database Engine JDBC SQL Unity JDBC Driver Java Application Semantic Query Results
Extending Standard JDBC API for Integration Standard Java Interfaces for Single Database JDBC Connections extended to Multiple Databases –Connection –Driver Manager –Statement –Result Set
Java Code for JDBC Integration import java.sql.*; public class JDBCApplication { public static void main(String[] args) { {String url = “jdbc:unity://sources.xml"; (1) Connection con;(2) // Load UnityDriver class try { { Class.forName(``unity.jdbc.UnityDriver");} } (3) catch (java.lang.ClassNotFoundException e) { System.exit(1); }(4) try { //Initiate connection (5) con = DriverManager.getConnection(url);(6) Statement stmt = con.createStatement(); (7) ResultSet rst = stmt.executeQuery(`SELECT Part.Name, (8) LineItem.Quantity, Customer.Name (9) WHERE Customer.Name='Customer_25’ ” ); } (10) System.out.println(``Part, Quantity, Customer");(11) while (rst.next()) (12) { System.out.println(rst.getString(``Part.Name")(13) +”,”+rst.getString(``LineItem.Quantity")(14) +”,”+rst.getString(``Customer.Name") ); (15) } con.close(); (16) } (17) catch (SQLException ex) { System.exit(1); } (18) }
XML File to Reference Data Sources SOURCES> <DATABASE> jdbc:microsoft:sqlserver://IDEALAB5.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=xxxxx jdbc:microsoft:sqlserver://IDEALAB5.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=xxxxx <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER><XSPEC>xspec/Order.xml</XSPEC></DATABASE> jdbc:microsoft:sqlserver://IDEALAB3.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=yyyyyy jdbc:microsoft:sqlserver://IDEALAB3.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=yyyyyy <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER><XSPEC>xspec/Part.xml</XSPEC></DATABASE></SOURCES>
- Order Microsoft SQL Server 2000 Microsoft SQL Server Customer Customer CUSTOMER CUSTOMER - Customer.Id C_CUSTKEY C_CUSTKEY int int - Customer.Name C_NAME C_NAME varchar varchar Customer.Nation.Id C_NATIONKEY C_NATIONKEY int int - 4 Organization C_CUSTKEY C_CUSTKEY Organization Organization C_NATIONKEY C_NATIONKEY NATION NATION - CUSTOMER->NATION CUSTOMER->NATION FK__CUSTOMER__C_NATI__7A672E12 FK__CUSTOMER__C_NATI__7A672E12 CUSTOMER CUSTOMER PK__NATION__6E01572D PK__NATION__6E01572D NATION NATION 3 3 Order.xml file (XSpec) Schema Information Table Fields Primary key Foreign key Join Order Database Annotation -Semantic Names Scope of Keys (Global joins) XML Document created Semi-automatically Schema Information - Extracted Automatically from Database Annotation and Scopes – Semi-automatically
Order Database customer(c_custkey, c_name, c_nationkey) orders(o_orderkey,o_custkey,o_orderdate) lineitem(l_orderkey,l_partkey,l_suppkey,l_linenum,l_qty) nation(n_nationkey, n_name,n_regionkey) region(r_regionkey, r_name) Part Database part(p_partkey, p_name, p_mfgr) supplier(s_suppkey, s_name, s_nationkey) partsupp(ps_partkey,ps_suppkey) nation(n_nationkey, n_name, n_regionkey) region(r_regionkey, r_name) Global Schema Part.Id, Part.Name, Part.Manufacturer Supplier.Id, Supplier.Name, Supplier.Nation.Id Order.Id, Order.Customer.Id, Order.Date LineItem.Linenumber, LineItem.Order.Id LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id, Customer.Id, Customer.Name, Customer.Nation.Id Nation.Id, Nation.Name, Nation.Region.Id Region.Id, Region.Name Build Global Schema On Local Database Annotations
Global Schema Part.Id, Part.Name, Part.Manufacturer Supplier.Id, Supplier.Name, Supplier.Nation.Id Order.Id, Order.Customer.Id, Order.Date LineItem.Linenumber, LineItem.Order.Id LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id, Customer.Id, Customer.Name, Customer.Nation.Id Nation.Id, Nation.Name, Nation.Region.Id Region.Id, Region.Name Attribute Only SQL Global Schema Query Language on concepts in Global Schema No FROM clause –Tables not specified Selection conditions on concepts Order by Query: SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer# '
QueryProcessing Steps Query Processing Steps Parse Semantic Query – –Validate concepts – –Create parse tree Map concepts to fields in local databases Determine joins to relate attributes in each local database Build Execution Tree (Relational Algebra) – –Execute a sub-query to each local database – –Find global join or union to relate sub-queries – –Combine sub-queries into single result set
Conceptual Query and Parse Tree Parse Tree: SELECT Identifier: Part.Name Identifier: LineItem.Quantity Identifier: Customer.Name WHERE Comparison_Op: = Identifier: Customer.Name String: 'Customer# ' Conceptual Query: SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer# '
Join Graph Construction Graph represents joins for each local database Edges directed as N:1joins Automatically extracted into XSpec or added to the XSpec. Used to calculate joins for each sub-query
Nation Line Item Order Customer Region Database Join Graphs Part Supp Nation SupplierPart Region Part Database Order Database
Nation Line Item Order Customer Region Map the Concepts in query to Relations Part Supp Nation SupplierPart Region Part Database Order Database Part.Name, LineItem.Quantity, Customer.Name
Line Item Order Customer Determine Local Joins Steiner Tree Approximation Algorithm Part Part Database Order Database Global Join LineItem.Part.Id is foreign key to Part.Id Semantic Query: SELECT Part.Name, LineItem.Quantity, Customer.Name Where Customer.Name = 'Customer# '
Build Execution Tree Relational Algebra Projection –Concepts in SELECT portion of conceptual query –Sub-query projections of required fields (global joins) Selection –WHERE conditions of conceptual query Joins –Determined from Join Graphs –Global joins identified by key scopes
Sub-queries Sent to Each Local Database SQL through JDBC Part Database: SELECT P.P_NAME, P.P_PARTKEY FROM PART AS P Order Database: SELECT L.L_QUANTITY, C.C_NAME, L.L_PARTKEY FROM LINEITEM AS L, CUSTOMER AS C, ORDERS AS O WHERE C.C_NAME = 'Customer# ' AND O.O_CUSTKEY = C.C_CUSTKEY AND L.L_ORDERKEY = O.O_ORDERKEY Local joins determined from join graphs Selection ConditionSelection Condition Elements added to queries in order for the global join to be executed in Unity Driver.
Operator Execution Tree Idealab5 Database Server Idealab1 Client (Unity Driver) Unity Embedded Database Engine Idealab3 Database Server
Experimental Results Dynamic integration is efficient and scalable Minimal overhead Multi-source query processing –Competitive with single source execution –Possible to execute queries on a global schema
Schema Integration Results Multiple Copies of TPC-H (Seconds) Number of Schemas Integrate SchemasConnect Parse Sub-queries Total Time Integration of schemas occurs in linear time based on number of schemas integrated. Integration and Connection executes only once at start up. Not for each query.
Query Small Result Size (76 Tuples) * Only 76 tuples transported over network for single sub-query * Separate requires entire Part table imported to Unity for join
Query Large Result Size (6,000,215 tuples) * Distributed execution of the queries on multiple computers executed faster than a single database server due to parallelism for this particular query.
Conclusions Integration possible in a JDBC Driver Local Annotation permits scalable integrations Minimal Overhead to Process Queries Query Multiple Database on a Global View –No need to specify joins –No requirement to know underlying schemas
Future Works AutoJoin – Scalable inference engine for join determination Improve global query inference Sophisticated Global Query Optimizer Extend to support Federated Database Queries –No global schema –Fully Specified Queries
Queries to Test Unity Performance (Data Labels on Charts) TPC-H - Conceptual query executed through Unity driver against a single source TPC-H database. JDBC TPC-H - SQL query equivalent to conceptual query executed directly through SQL Server JDBC driver on a single source TPC-H database. Partitioned on One Computer - Conceptual query executed on TPC-H data set virtually partitioned into the Part and Order databases