Tutorial of Course Project: Distributed Query Engine
Jun Wang (王军)
East Main Building
Outline
  - Requirements
  - Benchmark
  - Discussion of Design & Implementation
  - Demo
  - Assignment
  - Q&A
Database Management
  - Compulsory commands
      - SELECT
  - Fragmentation
      - Horizontal fragmentation
      - Vertical fragmentation
Architecture
  - P2P architecture
Query Processing
  - SELECT statement
      - One table and multiple tables (JOIN)
      - Types of operator in the predicate: >, <, =
  - Command parsing
  - Query processing
      - General query tree
      - Query tree optimization and reduction
      - Network traffic optimization
User Interface
  - The user should be able to interact with your Distributed Query Engine through the interface.
  - Any type of interface is acceptable:
      - Command-line interface
      - Application-based interface
      - Web-based interface
  - Note: DO NOT focus on the interface design. The interface meets the requirements if it:
      - Lets users input commands
      - Displays the results and additional evaluation metrics
System Outputs
  - The size of the query result set
  - The optimized query tree
  - The time cost of the query
  - The communication cost of the query
Documentation and Report
  - Mid-term presentation
      - Design of the distributed database query engine
      - Project work plan
  - Final report
      - Architecture
      - Query optimization method
      - Implementation of communication protocols
      - System operation specification
      - Instructions for installation, configuration, and operation of the query engine
System Evaluation
  - Demonstration time: 16th week
  - System test environment
      - Operating system: Windows
      - Local DBMS: MySQL
Dataset
  - We simulate a scenario of using a distributed database system.
  - In general, the following are provided:
      - The schema of a database (global tables)
      - The fragmentation schemes
      - The allocation
Dataset (continued)
Fragmentation
  - Horizontal fragmentation
  - Vertical fragmentation
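As a minimal sketch of the two fragmentation styles, horizontal fragmentation can be viewed as selection (whole tuples split by a predicate) and vertical fragmentation as projection (columns split, with the key repeated in every fragment). The relation EMP, its attributes, and all function names below are hypothetical and are not part of the course dataset.

```python
# Hypothetical toy relation; NOT part of the course dataset.
EMP = [
    {"eno": 1, "name": "Ann", "dept": "R&D", "salary": 900},
    {"eno": 2, "name": "Bob", "dept": "Sales", "salary": 700},
    {"eno": 3, "name": "Cid", "dept": "R&D", "salary": 800},
]

def horizontal_fragment(rows, predicate):
    """Selection: keep whole tuples that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]

def vertical_fragment(rows, key, attrs):
    """Projection: keep the key plus a subset of the attributes."""
    return [{a: r[a] for a in (key, *attrs)} for r in rows]

def rebuild_horizontal(*fragments):
    """Reconstruction by union of disjoint horizontal fragments."""
    return [r for frag in fragments for r in frag]

def rebuild_vertical(left, right, key):
    """Reconstruction by joining vertical fragments on the shared key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left]

def same_rows(a, b, key):
    """Compare two relations while ignoring tuple order."""
    by_key = lambda r: r[key]
    return sorted(a, key=by_key) == sorted(b, key=by_key)

# Horizontal: split by a predicate and its negation (complete and disjoint).
h1 = horizontal_fragment(EMP, lambda r: r["dept"] == "R&D")
h2 = horizontal_fragment(EMP, lambda r: r["dept"] != "R&D")

# Vertical: split the non-key attributes; the key "eno" appears in both.
v1 = vertical_fragment(EMP, "eno", ["name"])
v2 = vertical_fragment(EMP, "eno", ["dept", "salary"])
```

Both schemes are lossless here: the union of h1 and h2, and the join of v1 and v2 on eno, each give back EMP.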
Allocation
Overview of Query Processing
  - Decomposition and localization
      - Rewriting: a query → an algebra tree
      - Reduction
  - Optimization
      - Optimize the cost of data transfer
  - Execution
      - Intermediate table storage and access
      - The TOTAL response time after the user issues a query
Decomposition and Localization
  - Evaluation points:
      - The elimination of useless fragments and joins
      - The global optimization of the algebra tree
  - Example:
Decomposition and Localization (continued)
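The elimination of useless fragments can be sketched as a range check: if a fragment's defining predicate cannot overlap the query predicate, the fragment is dropped from the localized tree. The sketch below assumes fragments defined by half-open ranges on a single attribute and query predicates limited to the operators <, >, and =. The range encoding and the function names are assumptions for illustration, not the required design.

```python
import math

# Fragment definitions as half-open ranges [lo, hi) on PRICE, mirroring
# predicates of the form PRICE < 6000 and PRICE >= 6000.
# This encoding is an assumption made for this sketch.
FRAGMENTS = {
    "PARTS1": (-math.inf, 6000),
    "PARTS2": (6000, math.inf),
}

def may_match(frag_range, op, value):
    """Return False only when no value in [lo, hi) can satisfy `attr op value`.

    The check is conservative: it treats the domain as continuous, so it may
    keep a fragment that turns out to be empty for the query, but it never
    drops a fragment that could contribute tuples.
    """
    lo, hi = frag_range
    if op == "<":
        return lo < value
    if op == ">":
        return hi > value
    if op == "=":
        return lo <= value < hi
    raise ValueError(f"unsupported operator: {op}")

def relevant_fragments(fragments, op, value):
    """Names of the fragments that survive localization for one predicate."""
    return [name for name, rng in fragments.items() if may_match(rng, op, value)]
```

For example, relevant_fragments(FRAGMENTS, "<", 6000) keeps only PARTS1, so a query restricted to PRICE < 6000 never needs to touch PARTS2 or any join involving it.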
Optimization
  - Evaluation metric: the amount (in bytes) of data transferred
  - You should provide the following information:
      - The execution plan, in which all operations and data transfers are listed in sequence.
      - The amount of each data transfer and the total amount over all transfers.
  - Note: the amount of data transferred is measured in BYTES before compression (you may compress the transferred data if necessary).
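One possible way to produce the required listing is to keep the execution plan as an ordered list of steps and derive the per-transfer and total byte counts from it. This is only a sketch: the Step fields, the fixed per-tuple size, and the sample plan with its numbers are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One entry of an execution plan (all field names are hypothetical)."""
    description: str
    src: str = ""       # sending site; empty for a purely local operation
    dst: str = ""       # receiving site
    rows: int = 0       # number of tuples shipped
    row_bytes: int = 0  # assumed uncompressed size of one tuple, in bytes

    @property
    def transfer_bytes(self) -> int:
        # Local operations (no sender, or sender == receiver) move no data.
        if not self.src or self.src == self.dst:
            return 0
        return self.rows * self.row_bytes

def plan_report(plan):
    """Return (lines, total_bytes): the plan in sequence plus the transfer sum."""
    lines, total = [], 0
    for number, step in enumerate(plan, start=1):
        cost = step.transfer_bytes
        total += cost
        route = f" [{step.src} -> {step.dst}: {cost} B]" if cost else ""
        lines.append(f"{number}. {step.description}{route}")
    lines.append(f"Total data transfer: {total} B")
    return lines, total

# Made-up plan and sizes, for illustration only.
PLAN = [
    Step("project PNO from PARTS1 at site 2"),
    Step("ship result to site 1", src="site2", dst="site1", rows=2, row_bytes=12),
    Step("ship SUPPLY1 to site 1", src="site3", dst="site1", rows=4, row_bytes=16),
    Step("join at site 1"),
]
LINES, TOTAL = plan_report(PLAN)
```

Sizes are counted before any compression, in line with the note on the slide.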
Execution
  - Evaluation metric: total response time
  - Total response time is the sum of:
      - Time of input receiving
      - Time of query processing (decomposition, localization, and optimization)
      - Time of result display
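A simple way to obtain the total response time as a sum of phases is to time each phase separately and add the results. The sketch below uses Python's time.perf_counter. The ResponseTimer class and the placeholder work inside each phase are assumptions for illustration; a real engine would call its own parser, optimizer, and display code there.

```python
import time
from contextlib import contextmanager

class ResponseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.phases = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.phases[name] = self.phases.get(name, 0.0) + elapsed

    @property
    def total(self):
        """Total response time: the sum over all recorded phases."""
        return sum(self.phases.values())

timer = ResponseTimer()
with timer.phase("input receiving"):
    query = "SELECT SNO FROM SUPPLY"   # placeholder for reading user input
with timer.phase("query processing"):
    tokens = query.split()             # placeholder for parsing and optimization
with timer.phase("result display"):
    rendered = " ".join(tokens)        # placeholder for formatting the result
```

After the three blocks run, timer.phases holds one entry per phase and timer.total is their sum.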
Communication Protocols
  - Access level
      - Client-server protocols
      - Server-server protocols
  - How to design communication protocols
      - Sync vs. async
      - Design of commands and responses
  - How to implement communication protocols
      - Strong vs. economy
      - Techniques
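To make "design of commands and responses" concrete, here is a minimal sketch of one possible message format: each command carries a request id and an operation name, and each response echoes the id so that replies can be matched to requests in either a synchronous or an asynchronous setting. The message fields, the JSON encoding, and the PING operation are assumptions chosen for illustration; the course does not prescribe them.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Command:
    """A request sent to a site server (hypothetical message layout)."""
    request_id: int
    op: str
    args: dict = field(default_factory=dict)

@dataclass
class Response:
    """The reply to a Command; request_id ties it back to the request."""
    request_id: int
    ok: bool
    rows: list = field(default_factory=list)
    error: str = ""

def encode(msg) -> bytes:
    """Serialize a message to UTF-8 JSON bytes."""
    return json.dumps(asdict(msg), sort_keys=True).encode("utf-8")

def decode(cls, data: bytes):
    """Rebuild a message of type `cls` from JSON bytes."""
    return cls(**json.loads(data.decode("utf-8")))

def handle(cmd: Command) -> Response:
    """Toy synchronous handler standing in for a site server."""
    if cmd.op == "PING":
        return Response(cmd.request_id, True)
    return Response(cmd.request_id, False, error=f"unknown op {cmd.op}")

# Round trip: client encodes, server decodes and answers, client decodes.
wire_request = encode(Command(1, "PING"))
wire_reply = encode(handle(decode(Command, wire_request)))
reply = decode(Response, wire_reply)
```

Because messages are plain bytes on the wire, len(wire_request) and len(wire_reply) can also feed the communication-cost metric.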
Database Management
  - Global vs. local
      - Global management
      - Local management
  - GDD
      - Global information of the DDB
      - Storage issues
  - Local DBMS
      - Recommendation: MySQL
Query Processing
  (Diagram labels: Client; sites A, B, C, D; commands)
  - Master site
      - Optimize the query
      - Formulate the execution plan
      - Broadcast the plan
  - All sites
      - Execute commands from the master site
      - Return results
  - The crucial points
      - Global optimization
      - Global execution formulation
Other Issues
  - SQL statement parser
  - Multi-thread mechanism
  - Query tree layout and visualization
Demo
  - For reference only
  - Authors: Shoubin Kong (孔守斌), Jun Wang (王军), FangQiang Yu (余芳强)
Implementation Details
  - Programming language: Java
  - Local DBMS: MySQL
  - Protocol: RMI
An Overview of the System
  (Diagram labels: User; Client; System; Client End; Server End DDBMS, two instances; Communication Protocols)
Deployment
  - Client:
  - Site server 1: :40001
  - Site server 2: :40002
  - Site server 3: :40003
  - Site server 4: :
Database Initialization
  - Use your self-defined commands to initialize the database:
      - Define the 4 sites over 4 servers
      - Create the database
      - Create the tables
      - Fragment the tables
      - Allocate each fragment to a site
Commands
  - Define site
  - Create table
  - Fragment
  - Allocate
  - Import
  - Insert / Delete
  - Select
Summary
  - Requirement driven
  - Perfect vs. good enough
  - Comparative advantage
  - A central management scheme for a distributed project
Assignment: Fragmentation
  Q1: SELECT SNO FROM PARTS, SUPPLY
      WHERE PARTS.PNO = SUPPLY.PNO AND PARTS.PRICE < 6000
  Q2: SELECT SNAME, PNO FROM SUPPLIER, SUPPLY
      WHERE SUPPLIER.SNO = SUPPLY.SNO AND SUPPLIER.COUNTRY = 'USA'
  Q3: SELECT SNO, SNAME, COUNT(*) FROM SUPPLIER, SUPPLY
      WHERE SUPPLIER.SNO = SUPPLY.SNO GROUP BY SUPPLIER.SNO
Assignment: Fragmentation
  - The set of complete and minimal simple predicates:
    {PRICE < 6000, PRICE ≥ 6000, COUNTRY = "USA", COUNTRY ≠ "USA"}
Assignment: Fragmentation
  PARTS: horizontal fragmentation
    PARTS1 = σ_{PRICE < 6000}(PARTS)
    PARTS2 = σ_{PRICE ≥ 6000}(PARTS)

  PARTS1
    PNO  PNAME   PRICE
    P3   VIDEO   5000
    P4   HI-HI   3000

  PARTS2
    PNO  PNAME   PRICE
    P1   PC      10000
    P2   CAMERA  8000
Assignment: Fragmentation
  SUPPLIER: horizontal fragmentation
    SUPPLIER1 = σ_{COUNTRY = "USA"}(SUPPLIER)
    SUPPLIER2 = σ_{COUNTRY ≠ "USA"}(SUPPLIER)

  SUPPLIER1
    SNO  SNAME  COUNTRY
    S1   SN1    USA
    S6   SN6    USA

  SUPPLIER2
    SNO  SNAME  COUNTRY
    S2   SN2    INDIA
    S3   SN3    CHINA
    S4   SN4    CHINA
    S5   SN5    INDIA
Assignment: Fragmentation
  SUPPLY: derived fragmentation
    SUPPLY1 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS1
    SUPPLY2 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS2
    SUPPLY3 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS1
    SUPPLY4 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS2

  SUPPLY1
    SNO  PNO  QTY
    S1   P3   70
    S6   P4   96

  SUPPLY2
    SNO  PNO  QTY
    S1   P1   60
    S6   P2   70

  SUPPLY3
    SNO  PNO  QTY
    S3   P3   55
    S3   P4   96

  SUPPLY4
    SNO  PNO  QTY
    S2   P2   60
    S4   P2   65
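The derived fragmentation of SUPPLY can be checked with a short sketch that applies two semijoins per fragment. The tuples are taken from the fragment tables on the slides. The global relations PARTS, SUPPLIER, and SUPPLY are reconstructed here as the union of their fragments, which is an assumption, and the helper names select and semijoin are hypothetical.

```python
# Global relations, reconstructed as the union of the fragments on the slides.
PARTS = [("P1", "PC", 10000), ("P2", "CAMERA", 8000),
         ("P3", "VIDEO", 5000), ("P4", "HI-HI", 3000)]
SUPPLIER = [("S1", "SN1", "USA"), ("S2", "SN2", "INDIA"), ("S3", "SN3", "CHINA"),
            ("S4", "SN4", "CHINA"), ("S5", "SN5", "INDIA"), ("S6", "SN6", "USA")]
SUPPLY = [("S1", "P3", 70), ("S6", "P4", 96), ("S1", "P1", 60), ("S6", "P2", 70),
          ("S3", "P3", 55), ("S3", "P4", 96), ("S2", "P2", 60), ("S4", "P2", 65)]

def select(rows, predicate):
    """Selection: tuples of `rows` that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def semijoin(left, left_col, right, right_col):
    """Tuples of `left` whose join column has a match in `right`."""
    keys = {r[right_col] for r in right}
    return [l for l in left if l[left_col] in keys]

# Primary horizontal fragments (column 2 is PRICE / COUNTRY).
parts1 = select(PARTS, lambda r: r[2] < 6000)
parts2 = select(PARTS, lambda r: r[2] >= 6000)
supplier1 = select(SUPPLIER, lambda r: r[2] == "USA")
supplier2 = select(SUPPLIER, lambda r: r[2] != "USA")

def derive(suppliers, parts):
    """(SUPPLY semijoin SUPPLIERi on SNO) semijoin PARTSj on PNO."""
    return semijoin(semijoin(SUPPLY, 0, suppliers, 0), 1, parts, 0)

supply1 = derive(supplier1, parts1)
supply2 = derive(supplier1, parts2)
supply3 = derive(supplier2, parts1)
supply4 = derive(supplier2, parts2)
```

The four results match the SUPPLY1 to SUPPLY4 tables, and together they cover every SUPPLY tuple exactly once, so the derived fragmentation is complete and disjoint for this data.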
Assignment: Allocation
  a) Solution
Assignment: Allocation
  a) Solution 2
Assignment: Allocation
  b) Solution
Q&A
Thank You!