
1 Tutorial of Course Project: Distributed Query Engine
Jun Wang (王军)
East Main Building 9-216
18901291504
wjun09@mails.tsinghua.edu.cn

2 Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A
2015-9-5 DDB

3 Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

4 Database Management
• Compulsory Commands
  - SELECT
• Fragmentation
  - Horizontal Fragmentation
  - Vertical Fragmentation

5 Architecture
• P2P Architecture

6 Query Processing
• SELECT statement
  - One table & multi-tables (JOIN)
  - Types of operator in the predicate: >, <, =
• Command Parsing
• Query Processing
  - General query tree
  - Query tree optimization and reduction
  - Network traffic optimization
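
As a rough illustration of the command-parsing step, here is a minimal sketch that handles only the subset named on this slide: SELECT over one or more tables, with WHERE predicates joined by AND and using >, < or =. The names `parse_select` and `Predicate` are hypothetical, and the regular-expression approach is an assumption, not the method used in the course demo. GROUP BY, nested queries and quoted strings containing keywords are not handled.

```python
import re
from dataclasses import dataclass


@dataclass
class Predicate:
    left: str   # column reference, e.g. "PARTS.PRICE"
    op: str     # one of ">", "<", "="
    right: str  # column reference or literal, kept as text


# SELECT <cols> FROM <tables> [WHERE <conditions>] [;]
SELECT_RE = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<tables>.+?)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
PRED_RE = re.compile(r"^\s*(\S+?)\s*(>|<|=)\s*(.+?)\s*$")


def parse_select(sql):
    """Split a simple SELECT into (columns, tables, predicates)."""
    m = SELECT_RE.match(sql)
    if m is None:
        raise ValueError("unsupported statement: " + sql)
    cols = [c.strip() for c in m.group("cols").split(",")]
    tables = [t.strip() for t in m.group("tables").split(",")]
    preds = []
    if m.group("where"):
        # Only conjunctions (AND) are supported in this sketch.
        for part in re.split(r"\s+and\s+", m.group("where"), flags=re.IGNORECASE):
            pm = PRED_RE.match(part)
            if pm is None:
                raise ValueError("unsupported predicate: " + part)
            preds.append(Predicate(*pm.groups()))
    return cols, tables, preds


cols, tables, preds = parse_select(
    "SELECT SNO FROM PARTS, SUPPLY "
    "WHERE PARTS.PNO = SUPPLY.PNO AND PARTS.PRICE < 6000"
)
```

A predicate whose right-hand side is another column (PARTS.PNO = SUPPLY.PNO) is a join condition, while one with a literal (PARTS.PRICE < 6000) is a selection; a query-tree builder would need to tell the two apart.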

7 User Interface
• The user should be able to use the interface to interact with your Distributed Query Engine
• Any type of interface is acceptable
  - Command Line Interface
  - Application-based Interface
  - Web-based Interface
• Note: DO NOT focus on the interface design. The interface meets the requirements if it:
  - Lets users input commands
  - Displays the results and the additional evaluation metrics

8 System Outputs
• The size of the query result set
• The optimized query tree
• The time cost of the query
• The communication cost of the query

9 Documentation and Report
• Mid-term presentation
  - Design of the distributed database query engine
  - Project work plan
• Final report
  - Architecture
  - Query optimization method
  - Implementation of communication protocols
• System operation specification
  - Instructions for installing, configuring, and operating the query engine

10 System Evaluation
• Demonstration Time
  - 16th Week
• System Test Environment
  - Operating system: Windows
  - Local DBMS: MySQL

11 Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

12 Dataset
• We simulate a scenario of using a distributed database system.
• In general, the following are provided:
  - The schema of a database (global tables)
  - The fragmentation schemes
  - The allocation

13 Dataset

14 Fragmentation
• Horizontal Fragmentation
• Vertical Fragmentation

15 Allocation

16 Overview of Query Processing
• Decomposition and Localization
  - Rewriting: a query → an algebra tree
  - Reduction
• Optimization
  - Optimize the cost of data transfer
• Execution
  - Intermediate table storage and access
  - The TOTAL response time after the user issues a query

17 Decomposition and Localization
• Evaluation Points:
  - The elimination of useless fragments and joins
  - The global optimization of the algebra tree
• Example: (figure not included in this transcript)
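
To make "elimination of useless fragments" concrete, here is a minimal sketch of one possible pruning rule, not the example from the slide. It assumes each horizontal fragment is defined by a half-open range [low, high) on a single attribute; `RangeFragment`, `may_match` and `relevant_fragments` are hypothetical names. The fragment definitions for PARTS are borrowed from the assignment slides later in the deck. The test is conservative: it never drops a fragment that could contribute rows, but it may keep one that turns out to be empty.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeFragment:
    name: str
    attr: str
    low: float   # inclusive lower bound
    high: float  # exclusive upper bound


def may_match(frag, attr, op, value):
    """Return False only if no row of the fragment can satisfy the predicate."""
    if attr != frag.attr:
        return True  # predicate on another attribute: cannot prune
    if op == "<":
        return frag.low < value
    if op == ">":
        return frag.high > value
    if op == "=":
        return frag.low <= value < frag.high
    raise ValueError("unsupported operator: " + op)


def relevant_fragments(fragments, predicates):
    """Keep the fragments that every (attr, op, value) predicate may match."""
    return [
        f.name
        for f in fragments
        if all(may_match(f, a, op, v) for a, op, v in predicates)
    ]


PARTS_FRAGMENTS = [
    RangeFragment("PARTS1", "PRICE", -math.inf, 6000),  # PRICE < 6000
    RangeFragment("PARTS2", "PRICE", 6000, math.inf),   # PRICE >= 6000
]

# A query with PARTS.PRICE < 6000 only needs PARTS1.
needed = relevant_fragments(PARTS_FRAGMENTS, [("PRICE", "<", 6000)])
```

After pruning, any join whose operand has no remaining fragments can be dropped from the algebra tree as well.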

18 Decomposition and Localization

19 Optimization
• Evaluation metric: the amount (in bytes) of data transferred
• You should provide the following information:
  - The execution plan, listing all operations and data transfers in sequence.
  - The amount of each data transfer and the total over all transfers.
• Note: the amount of data transferred is measured in bytes BEFORE compression (you may compress the transferred data if necessary).
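
The required report (each transfer in sequence, its size, and the total) could be produced along the following lines. This is a sketch under stated assumptions: the slides do not say how bytes are counted, so here each value is counted as its UTF-8 text encoding before any compression, and `Transfer`, `transfer_bytes` and `report` are hypothetical names. The sample rows are taken from the SUPPLY1 fragment on the assignment slides; the site names are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transfer:
    step: int          # position in the execution plan
    source: str        # sending site
    target: str        # receiving site
    table: str         # name of the (intermediate) table being shipped
    rows: List[Tuple]  # the rows actually sent


def row_bytes(row):
    # Assumption: size of a value = length of its UTF-8 text form.
    return sum(len(str(v).encode("utf-8")) for v in row)


def transfer_bytes(t):
    """Uncompressed size of one transfer."""
    return sum(row_bytes(r) for r in t.rows)


def report(plan):
    """Return printable lines: one per transfer, then the total."""
    lines = []
    total = 0
    for t in sorted(plan, key=lambda t: t.step):
        size = transfer_bytes(t)
        total += size
        lines.append(f"{t.step}. {t.table}: {t.source} -> {t.target}, {size} bytes")
    lines.append(f"total: {total} bytes")
    return lines


plan = [
    Transfer(1, "site2", "site1", "SUPPLY1", [("S1", "P3", 70), ("S6", "P4", 96)]),
    Transfer(2, "site3", "site1", "tmp_join", [("S1", "P3")]),
]
lines = report(plan)
```

Measuring the size before handing the rows to the network layer keeps the metric independent of any compression applied afterwards.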

20 Execution
• Evaluation metric: total response time
• Total response time is the sum of:
  - Time of input receiving
  - Time of query processing (decomposition, localization and optimization)
  - Time of result display
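
One way to collect these components is to time each phase separately and add them up. The sketch below uses `time.perf_counter` from the Python standard library; the class name `ResponseTimer` and the phase labels are hypothetical, and the `pass` statements stand in for real work.

```python
import time
from contextlib import contextmanager


class ResponseTimer:
    """Accumulates wall-clock time per named phase of one query."""

    def __init__(self):
        self.phases = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.phases[name] = self.phases.get(name, 0.0) + elapsed

    @property
    def total(self):
        # Total response time = sum of all recorded phases.
        return sum(self.phases.values())


timer = ResponseTimer()
with timer.phase("input"):
    pass  # read the user's command
with timer.phase("processing"):
    pass  # decomposition, localization, optimization
with timer.phase("display"):
    pass  # show the result set
```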

21 Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

22 Communication Protocols
• Access Level
  - Client-Server Protocols
  - Server-Server Protocols
• How to Design Communication Protocols
  - Sync vs. Async
  - Design of commands and responses
• How to Implement Communication Protocols
  - Strong vs. Economy Techniques

23 Database Management
• Global vs. Local
  - Global Management
  - Local Management
• GDD
  - Global Information of DDB
  - Storage Issues
• Local DBMS Recommendation
  - MySQL

24 Query Processing
• Master site
  - Optimize the query
  - Formulate execution plan
  - Broadcast the plan
• All sites
  - Execute commands from Master site
  - Return results
• The Crucial Points
  - Global Optimization
  - Global Execution Formulation
(Diagram labels: Client; sites A, B, C, D; commands)
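
The master/site interaction can be pictured with a small in-process simulation. This is only a sketch: the sites are plain Python objects rather than networked servers, a "command" is reduced to a row filter, and `Site` and `broadcast` are hypothetical names. It uses `ThreadPoolExecutor` from the standard library so that the sites work concurrently, as they would in a real deployment.

```python
from concurrent.futures import ThreadPoolExecutor


class Site:
    """Stand-in for a site server holding some local rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def execute(self, command):
        # In this sketch a command is just a predicate applied to local rows.
        return [r for r in self.rows if command(r)]


def broadcast(sites, plan):
    """Send each site its command and collect the results.

    `plan` maps a site name to the command that site must run.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = {
            name: pool.submit(sites[name].execute, command)
            for name, command in plan.items()
        }
        return {name: f.result() for name, f in futures.items()}


sites = {"B": Site("B", [1, 5, 9]), "C": Site("C", [2, 8])}
plan = {"B": lambda r: r > 4, "C": lambda r: r > 4}
results = broadcast(sites, plan)
```

In a real system the call to `execute` would be a remote invocation over the server-server protocol, and the master would combine the partial results according to the execution plan.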

25 Other Issues
• SQL Statement Parser
• Multi-Thread Mechanism
• Query Tree Layout and Visualization

26 Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

27 Demo (For Reference Only)
• Authors: Shoubin Kong (孔守斌), Jun Wang (王军), FangQiang Yu (余芳强)

28 Implementation Details
• Programming Language: Java
• Local DBMS: MySQL
• Protocol: RMI

29 An Overview of the System
(Diagram labels: User, Client, System, Client End, Server End, DDBMS, Communication Protocols)

30 Deployment
• Client: 127.0.0.1
• Site server 1: 127.0.0.1:40001
• Site server 2: 127.0.0.1:40002
• Site server 3: 127.0.0.1:40003
• Site server 4: 127.0.0.1:40004
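
A deployment like this is easy to capture as a small lookup table that the client and the site servers share. The sketch below is an assumption about how one might store it; the keys `site1` to `site4` and the helper `address` are hypothetical, while the hosts and ports are the ones listed on the slide.

```python
# Site name -> (host, port), as listed on the deployment slide.
SITES = {
    "site1": ("127.0.0.1", 40001),
    "site2": ("127.0.0.1", 40002),
    "site3": ("127.0.0.1", 40003),
    "site4": ("127.0.0.1", 40004),
}


def address(site):
    """Return the 'host:port' string for a site name."""
    host, port = SITES[site]
    return f"{host}:{port}"
```

Running all four servers on one machine with different ports is convenient for a demo; moving to separate machines only changes the hosts in this table.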

31 Database Initialization
• Use your self-defined commands to initialize the database:
  - Define the 4 sites over 4 servers
  - Create the database
  - Create the tables
  - Fragment the tables
  - Allocate each fragment to a site

32 Commands
• Define site
• Create table
• Fragment
• Allocate
• Import
• Insert / Delete
• Select

33 Summaries
• Requirement Driven
• Perfect vs. Good Enough
• Comparative Advantage
• A Central Management Scheme to a Distributed Project

34 Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

35 Assignment: Fragmentation
Q1: SELECT SNO FROM PARTS, SUPPLY WHERE PARTS.PNO = SUPPLY.PNO AND PARTS.PRICE < 6000
Q2: SELECT SNAME, PNO FROM SUPPLIER, SUPPLY WHERE SUPPLIER.SNO = SUPPLY.SNO AND SUPPLIER.COUNTRY = 'USA'
Q3: SELECT SNO, SNAME, COUNT(*) FROM SUPPLIER, SUPPLY WHERE SUPPLIER.SNO = SUPPLY.SNO GROUP BY SUPPLIER.SNO

36 Assignment: Fragmentation
• The Set of Complete and Minimal Simple Predicates
  {PRICE < 6000, PRICE ≥ 6000, COUNTRY = 'USA', COUNTRY ≠ 'USA'}

37 Assignment: Fragmentation
• PARTS: Horizontal Fragmentation
  PARTS1 = σ price<6000 (PARTS)
  PARTS2 = σ price≥6000 (PARTS)

PARTS1
PNO  PNAME   PRICE
P3   VIDEO   5000
P4   HI-HI   3000

PARTS2
PNO  PNAME   PRICE
P1   PC      10000
P2   CAMERA  8000
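
The two selections can be checked with a few lines of code. This is a minimal sketch using plain tuples in place of a DBMS; the global PARTS table is reconstructed here as the union of the two fragments shown on the slide, and `select` is a hypothetical helper standing in for the σ operator.

```python
# Global PARTS table (PNO, PNAME, PRICE), rebuilt from the slide's fragments.
PARTS = [
    ("P1", "PC", 10000),
    ("P2", "CAMERA", 8000),
    ("P3", "VIDEO", 5000),
    ("P4", "HI-HI", 3000),
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


PARTS1 = select(PARTS, lambda r: r[2] < 6000)    # price < 6000
PARTS2 = select(PARTS, lambda r: r[2] >= 6000)   # price >= 6000

# A correct horizontal fragmentation is complete and disjoint.
complete = sorted(PARTS1 + PARTS2) == sorted(PARTS)
disjoint = not set(PARTS1) & set(PARTS2)
```

Because the two predicates are complements of each other, every row lands in exactly one fragment, which is what the completeness and disjointness checks confirm.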

38 Assignment: Fragmentation
• SUPPLIER: Horizontal Fragmentation
  SUPPLIER1 = σ country='USA' (SUPPLIER)
  SUPPLIER2 = σ country≠'USA' (SUPPLIER)

SUPPLIER1
SNO  SNAME  COUNTRY
S1   SN1    USA
S6   SN6    USA

SUPPLIER2
SNO  SNAME  COUNTRY
S2   SN2    INDIA
S3   SN3    CHINA
S4   SN4    CHINA
S5   SN5    INDIA

39 Assignment: Fragmentation
• SUPPLY: Derived Fragmentation
  SUPPLY1 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS1
  SUPPLY2 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS2
  SUPPLY3 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS1
  SUPPLY4 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS2

SUPPLY1
SNO  PNO  QTY
S1   P3   70
S6   P4   96

SUPPLY2
SNO  PNO  QTY
S1   P1   60
S6   P2   70

SUPPLY3
SNO  PNO  QTY
S3   P3   55
S3   P4   96

SUPPLY4
SNO  PNO  QTY
S2   P2   60
S4   P2   65
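
The derived fragments can be reproduced with two semijoins each. This sketch assumes the usual definition of derived horizontal fragmentation, in which SUPPLY is semijoined first with a SUPPLIER fragment on SNO and then with a PARTS fragment on PNO; the operator symbol itself is not legible in the transcript. The global SUPPLY table is reconstructed as the union of the four fragments on the slide, and `semijoin` is a hypothetical helper.

```python
# Global SUPPLY table (SNO, PNO, QTY), rebuilt from the slide's fragments.
SUPPLY = [
    ("S1", "P3", 70), ("S6", "P4", 96),
    ("S1", "P1", 60), ("S6", "P2", 70),
    ("S3", "P3", 55), ("S3", "P4", 96),
    ("S2", "P2", 60), ("S4", "P2", 65),
]

# Join keys of the owner fragments (from the SUPPLIER and PARTS slides).
SUPPLIER1_SNO = {"S1", "S6"}
SUPPLIER2_SNO = {"S2", "S3", "S4", "S5"}
PARTS1_PNO = {"P3", "P4"}
PARTS2_PNO = {"P1", "P2"}


def semijoin(rows, column, keys):
    """Keep the rows whose value in `column` appears among the owner's keys."""
    return [r for r in rows if r[column] in keys]


def derive(sno_keys, pno_keys):
    # (SUPPLY semijoin SUPPLIERi on SNO) semijoin PARTSj on PNO
    return semijoin(semijoin(SUPPLY, 0, sno_keys), 1, pno_keys)


SUPPLY1 = derive(SUPPLIER1_SNO, PARTS1_PNO)
SUPPLY2 = derive(SUPPLIER1_SNO, PARTS2_PNO)
SUPPLY3 = derive(SUPPLIER2_SNO, PARTS1_PNO)
SUPPLY4 = derive(SUPPLIER2_SNO, PARTS2_PNO)
```

Supplier S5 has no SUPPLY rows, so it contributes nothing to any fragment, and the four fragments together still cover the whole table.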

40 Assignment: Allocation
• a) Solution 1

41 Assignment: Allocation
• a) Solution 2

42 Assignment: Allocation
• b) Solution

43 Q & A
Thank You!

