Tutorial of Course Project: Distributed Query Engine
Jun Wang (王军)
East Main Building 9-216
18901291504



Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A


Database Management
• Compulsory Commands
  - SELECT
• Fragmentation
  - Horizontal Fragmentation
  - Vertical Fragmentation
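As a rough illustration of the two fragmentation types named on this slide, the sketch below treats horizontal fragmentation as a selection of whole rows and vertical fragmentation as a projection that always keeps the key. The `employees` rows, the column names, and all function names are hypothetical and are not part of the course benchmark.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def horizontal_fragment(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Horizontal fragmentation: keep whole rows that satisfy a selection predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows: List[Row], key: str, columns: List[str]) -> List[Row]:
    """Vertical fragmentation: keep a subset of columns, always including the key."""
    kept = [key] + [c for c in columns if c != key]
    return [{c: r[c] for c in kept} for r in rows]


def reconstruct_vertical(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Rebuild the original rows by joining two vertical fragments on the key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Hypothetical sample data, not taken from the course benchmark.
employees = [
    {"id": 1, "name": "Ann", "dept": "R&D", "salary": 5000},
    {"id": 2, "name": "Bob", "dept": "Sales", "salary": 4000},
    {"id": 3, "name": "Cid", "dept": "R&D", "salary": 6000},
]

# Horizontal: two disjoint row sets whose union is the original table.
rnd = horizontal_fragment(employees, lambda r: r["dept"] == "R&D")
others = horizontal_fragment(employees, lambda r: r["dept"] != "R&D")

# Vertical: two column sets that share the key so they can be re-joined.
public = vertical_fragment(employees, "id", ["name", "dept"])
private = vertical_fragment(employees, "id", ["salary"])
```

The union of `rnd` and `others` and the key join of `public` and `private` both give back the original rows, which is the reconstruction property a fragmentation scheme is normally expected to have.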

Architecture
• P2P Architecture

Query Processing
• SELECT statement
  - Single-table and multi-table (JOIN) queries
  - Operators allowed in predicates: >, <, =
• Command Parsing
• Query Processing
  - General query tree
  - Query tree optimization and reduction
  - Network traffic optimization

User Interface
• The user should be able to interact with your Distributed Query Engine through the interface
• Any type of interface is acceptable
  - Command Line Interface
  - Application-based Interface
  - Web-based Interface
• Note: DO NOT focus on the interface design. The interface meets the requirements if it:
  - Lets users input commands
  - Displays the results and the additional evaluation metrics

System Outputs
• The size of the query result set
• The optimized query tree
• The time cost of the query
• The communication cost of the query

Documentation and Report
• Mid-term presentation
  - Design of the distributed database query engine
  - Project work plan
• Final report
  - Architecture
  - Query optimization method
  - Implementation of communication protocols
• System operation specification
  - Instructions for installing, configuring, and operating the query engine

System Evaluation
• Demonstration Time
  - 16th Week
• System Test Environment
  - Operating system: Windows
  - Local DBMS: MySQL

Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

Dataset
• We simulate a scenario in which a distributed database system is used.
• In general, the following are provided:
  - The schema of a database (global tables)
  - The fragmentation schemes
  - The allocation

Dataset

Fragmentation
• Horizontal Fragmentation
• Vertical Fragmentation

Allocation

Overview of Query Processing
• Decomposition and Localization
  - Rewriting: a query → an algebra tree
  - Reduction
• Optimization
  - Optimize the cost of data transfer
• Execution
  - Intermediate table storage and access
  - The TOTAL response time after the user issues a query
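One possible way to represent the algebra tree produced by the rewriting step is sketched here. It builds a tree for query Q1 of the assignment slides (with the PARTS/PARTA spelling normalized to PARTS) with the price selection already pushed below the join, and renders it as indented text. The `Node` class and the operator names (`PROJECT`, `JOIN`, `SELECT`, `SCAN`) are assumptions made for illustration, not a format required by the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One operator in a relational algebra tree (hypothetical structure)."""
    op: str                      # operator name, e.g. "PROJECT"
    detail: str = ""             # attribute list, predicate, or relation name
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one operator per line, root first."""
    label = f"{node.op} {node.detail}".strip()
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Q1: SELECT SNO FROM PARTS, SUPPLY
#     WHERE PARTS.PNO = SUPPLY.PNO AND PARTS.PRICE < 6000
q1_tree = Node("PROJECT", "SNO", [
    Node("JOIN", "PARTS.PNO = SUPPLY.PNO", [
        Node("SELECT", "PRICE < 6000", [Node("SCAN", "PARTS")]),
        Node("SCAN", "SUPPLY"),
    ]),
])

for line in render(q1_tree):
    print(line)
```

A text rendering like this is one simple way to report the optimized query tree among the system outputs; a graphical layout is equally acceptable.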

Decomposition and Localization
• Evaluation Points:
  - The elimination of useless fragments and joins
  - The global optimization of the algebra tree
• Example:
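The elimination of useless fragments can be sketched as an overlap test between the query predicate and each fragment's defining predicate. The example assumes the PRICE-based horizontal fragments of PARTS from the assignment slides (PRICE < 6000 and PRICE ≥ 6000) and models every predicate as a half-open numeric range. The `Interval` and `Fragment` classes and the `prune` function are hypothetical names, and a real engine would need to handle richer predicates than single-attribute ranges.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """Half-open numeric range [low, high) over a single attribute."""
    low: float
    high: float

    def intersects(self, other: "Interval") -> bool:
        return max(self.low, other.low) < min(self.high, other.high)


@dataclass(frozen=True)
class Fragment:
    name: str
    attribute: str
    domain: Interval  # range selected by the fragment's defining predicate


def prune(fragments: List[Fragment], attribute: str, query_range: Interval) -> List[str]:
    """Keep only the fragments that can still contain rows matching the query."""
    return [
        f.name
        for f in fragments
        if f.attribute != attribute or f.domain.intersects(query_range)
    ]


parts_fragments = [
    Fragment("PARTS1", "PRICE", Interval(-math.inf, 6000)),   # PRICE < 6000
    Fragment("PARTS2", "PRICE", Interval(6000, math.inf)),    # PRICE >= 6000
]

# Query predicate PRICE < 6000: PARTS2 cannot contribute and is eliminated.
needed_for_cheap = prune(parts_fragments, "PRICE", Interval(-math.inf, 6000))

# Query predicate PRICE >= 3000 overlaps both fragments, so both are kept.
needed_for_3000_up = prune(parts_fragments, "PRICE", Interval(3000, math.inf))
```

Any join that involves only eliminated fragments can then be dropped from the localized tree as well.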

Decomposition and Localization

Optimization
• Evaluation metric: the amount of data transferred, in bytes
• You should provide the following information:
  - The execution plan, listing all operations and data transfers in sequence.
  - The amount of each data transfer and the sum over all transfers.
  - Note that the amount of data transferred is measured in BYTES before compression (you may compress the transferred data if necessary).
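A minimal sketch of how the per-transfer and total byte counts might be recorded follows. It assumes rows are serialized as JSON and optionally compressed with zlib; the slides do not prescribe a wire format, so both choices are assumptions. The site names (`site1`, `site2`, `client`), the `Transfer` record, and the helper functions are hypothetical.

```python
import json
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Transfer:
    step: int            # position in the execution plan
    source: str
    target: str
    description: str
    raw_bytes: int       # size before compression: the evaluated metric
    wire_bytes: int      # size actually sent after optional compression


def encode_rows(rows: List[list]) -> bytes:
    """Serialize rows to bytes (JSON is an assumption, not a requirement)."""
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")


def record_transfer(plan: List[Transfer], source: str, target: str,
                    description: str, rows: List[list],
                    compress: bool = True) -> Transfer:
    """Append one data transfer to the plan and return its record."""
    payload = encode_rows(rows)
    wire = zlib.compress(payload) if compress else payload
    transfer = Transfer(len(plan) + 1, source, target, description,
                        len(payload), len(wire))
    plan.append(transfer)
    return transfer


def total_raw_bytes(plan: List[Transfer]) -> int:
    """Sum of all transfers, measured before compression."""
    return sum(t.raw_bytes for t in plan)


# Rows copied from the SUPPLY1 fragment of the assignment slides.
supply1_rows = [["S1", "P3", 70], ["S6", "P4", 96]]

plan: List[Transfer] = []
record_transfer(plan, "site1", "site2", "ship SUPPLY1 for the join", supply1_rows)
record_transfer(plan, "site2", "client", "ship final result", [["S1"], ["S6"]])

for t in plan:
    print(t.step, t.source, "->", t.target, t.description, t.raw_bytes, "bytes")
print("total:", total_raw_bytes(plan), "bytes")
```

Keeping `raw_bytes` and `wire_bytes` separate makes it straightforward to report the pre-compression figure that is evaluated while still compressing what is sent.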

Execution
• Evaluation metric: total response time
• Total response time is the sum of:
  - Time to receive the input
  - Time for query processing (decomposition, localization, and optimization)
  - Time to display the result
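One way to collect the components of the total response time is a small timer that accumulates elapsed time per phase, sketched here. The phase names and the `ResponseTimer` class are hypothetical, and the work done inside each phase is a stand-in for the real input handling, query processing, and display code.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class ResponseTimer:
    """Accumulates elapsed wall-clock time per named phase."""

    def __init__(self) -> None:
        self.phases: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.phases[name] = self.phases.get(name, 0.0) + elapsed

    def total(self) -> float:
        """Total response time: the sum of all recorded phases."""
        return sum(self.phases.values())


timer = ResponseTimer()
with timer.phase("input"):
    command = "SELECT SNO FROM SUPPLY"          # stand-in for reading user input
with timer.phase("processing"):
    tokens = command.split()                     # stand-in for query processing
with timer.phase("display"):
    rendered = " ".join(tokens)                  # stand-in for showing results

print({name: f"{seconds:.6f}s" for name, seconds in timer.phases.items()})
print(f"total: {timer.total():.6f}s")
```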

Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

Communication Protocols
• Access Level
  - Client-Server Protocols
  - Server-Server Protocols
• How to Design Communication Protocols
  - Sync vs. Async
  - Design of commands and responses
• How to Implement Communication Protocols
  - Strong vs. Economy Techniques
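To make the "design of commands and responses" point concrete, here is a sketch of one possible message format: JSON bodies with a 4-byte length prefix so a receiver knows where each message ends. Every field name, the action string `EXECUTE_SQL`, and the framing scheme are assumptions chosen for illustration; the reference demo uses RMI instead, which hides this layer.

```python
import json
import struct
from dataclasses import asdict, dataclass, field


@dataclass
class Command:
    request_id: int          # lets an async caller match responses to requests
    sender: str
    action: str              # hypothetical action name, e.g. "EXECUTE_SQL"
    body: dict = field(default_factory=dict)


@dataclass
class Response:
    request_id: int
    status: str              # "OK" or "ERROR" (hypothetical status values)
    body: dict = field(default_factory=dict)


def frame(message) -> bytes:
    """Encode a message as a big-endian 4-byte length followed by JSON."""
    payload = json.dumps(asdict(message)).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def unframe(data: bytes) -> dict:
    """Decode one framed message back into a plain dictionary."""
    (length,) = struct.unpack("!I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))


cmd = Command(1, "master", "EXECUTE_SQL", {"sql": "SELECT * FROM PARTS1"})
wire = frame(cmd)
decoded = unframe(wire)

reply = Response(decoded["request_id"], "OK", {"rows": [["P3", "VIDEO", 5000]]})
decoded_reply = unframe(frame(reply))
```

Carrying a `request_id` in both directions is what allows the same format to serve synchronous and asynchronous exchanges.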

Database Management
• Global vs. Local
  - Global Management
  - Local Management
• GDD
  - Global Information of DDB
  - Storage Issues
• Local DBMS Recommendation
  - MySQL
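A GDD could be held in memory roughly as sketched here: global table schemas, fragment definitions, and the site each fragment is allocated to, with a JSON dump as one simple answer to the storage question. The class layout, the `site1`/`site2` names, and the textual fragment definitions are assumptions; the PARTS schema and predicates come from the assignment slides.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class FragmentInfo:
    name: str
    table: str
    kind: str            # "horizontal" or "vertical"
    definition: str      # predicate or column list, kept as text here
    site: str = ""       # empty until the fragment is allocated


@dataclass
class GlobalDataDictionary:
    tables: Dict[str, List[str]] = field(default_factory=dict)
    fragments: Dict[str, FragmentInfo] = field(default_factory=dict)

    def add_table(self, name: str, columns: List[str]) -> None:
        self.tables[name] = list(columns)

    def add_fragment(self, info: FragmentInfo) -> None:
        if info.table not in self.tables:
            raise KeyError(f"unknown global table: {info.table}")
        self.fragments[info.name] = info

    def allocate(self, fragment: str, site: str) -> None:
        self.fragments[fragment].site = site

    def fragments_of(self, table: str) -> List[FragmentInfo]:
        return [f for f in self.fragments.values() if f.table == table]

    def to_json(self) -> str:
        """Serialize the whole dictionary, e.g. to store it or ship it to a site."""
        return json.dumps(asdict(self))


gdd = GlobalDataDictionary()
gdd.add_table("PARTS", ["PNO", "PNAME", "PRICE"])
gdd.add_fragment(FragmentInfo("PARTS1", "PARTS", "horizontal", "PRICE < 6000"))
gdd.add_fragment(FragmentInfo("PARTS2", "PARTS", "horizontal", "PRICE >= 6000"))
gdd.allocate("PARTS1", "site1")   # hypothetical site names
gdd.allocate("PARTS2", "site2")
```

Whether the GDD lives on one site or is replicated to every site is a design decision the sketch leaves open.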

Query Processing
• Master site
  - Optimize the query
  - Formulate the execution plan
  - Broadcast the plan
• All sites
  - Execute commands from the master site
  - Return results
• The Crucial Points
  - Global Optimization
  - Global Execution Formulation
[Diagram labels: Client; sites A, B, C, D; commands]
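The master/site interaction can be simulated in a single process as sketched here: the master submits the same command to every site in parallel and merges the returned rows. Threads stand in for network calls, and the `Site` class, the placement of PARTS1 on site A and PARTS2 on site B, and the function names are all assumptions made for illustration. The row values are taken from the assignment slides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


class Site:
    """Stand-in for a remote site holding one local fragment."""

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self.rows = rows

    def execute(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Run the command received from the master against local data."""
        return [r for r in self.rows if predicate(r)]


def run_on_all_sites(sites: List[Site], predicate: Callable[[Row], bool]) -> List[Row]:
    """Master side: send the command to every site in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = [pool.submit(site.execute, predicate) for site in sites]
        merged: List[Row] = []
        for future in futures:          # collect in submission order
            merged.extend(future.result())
    return merged


site_a = Site("A", [{"PNO": "P3", "PNAME": "VIDEO", "PRICE": 5000},
                    {"PNO": "P4", "PNAME": "HI-HI", "PRICE": 3000}])   # PARTS1
site_b = Site("B", [{"PNO": "P1", "PNAME": "PC", "PRICE": 10000},
                    {"PNO": "P2", "PNAME": "CAMERA", "PRICE": 8000}])  # PARTS2

result = run_on_all_sites([site_a, site_b], lambda r: r["PRICE"] > 4000)
```

In a real deployment the master would first use fragment elimination so that only the sites that can contribute rows receive the command.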

Other Issues
• SQL Statement Parser
• Multi-Thread Mechanism
• Query Tree Layout and Visualization

Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

Demo (For Reference Only)
• Authors:
  - Shoubin Kong (孔守斌)
  - Jun Wang (王军)
  - FangQiang Yu (余芳强)

Implementation Details
• Programming Language: Java
• Local DBMS: MySQL
• Protocol: RMI

An Overview of the System
[Diagram labels: User, Client, System, Client End, Server End (DDBMS), Server End (DDBMS), Communication Protocols]

Deployment
• Client:
• Site server 1: :40001
• Site server 2: :40002
• Site server 3: :40003
• Site server 4: :

Database Initialization
• Use your self-defined commands to initialize the database:
  - Define the 4 sites over 4 servers
  - Create the database
  - Create the tables
  - Fragment the tables
  - Allocate each fragment to the sites

Commands
• Define site
• Create table
• Fragment
• Allocate
• Import
• Insert / Delete
• Select
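Since the command syntax is self-defined, a first parsing step might simply recognize which kind of command was entered before handing it to a dedicated handler. The sketch below assumes each command starts with the keywords listed on this slide; the exact keywords, the `classify` function, and the sample command strings are hypothetical.

```python
from typing import List

# Leading keywords for each command kind (an assumed, self-defined syntax).
COMMAND_KINDS: List[str] = [
    "define site", "create table", "fragment", "allocate",
    "import", "insert", "delete", "select",
]


def classify(command: str) -> str:
    """Return the command kind, ignoring case and repeated whitespace."""
    normalized = " ".join(command.strip().lower().split())
    for kind in COMMAND_KINDS:
        if normalized == kind or normalized.startswith(kind + " "):
            return kind
    raise ValueError(f"unrecognized command: {command!r}")


kind_select = classify("SELECT SNO FROM SUPPLY")
kind_define = classify("  Define   Site s1")
kind_insert = classify("insert into PARTS values ('P5', 'TV', 7000)")
```

The returned kind can then index a table of handler functions, which keeps the parser for each command small and separate.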

Summary
• Requirement Driven
• Perfect vs. Good Enough
• Comparative Advantage
• A Central Management Scheme to a Distributed Project

Outline
• Requirements
• Benchmark
• Discussion of Design & Implementation
• Demo
• Assignment
• Q&A

Assignment: Fragmentation
Q1: SELECT SNO FROM PARTS, SUPPLY WHERE PARTS.PNO = SUPPLY.PNO AND PARTS.PRICE < 6000
Q2: SELECT SNAME, PNO FROM SUPPLIER, SUPPLY WHERE SUPPLIER.SNO = SUPPLY.SNO AND SUPPLIER.COUNTRY = 'USA'
Q3: SELECT SNO, SNAME, COUNT(*) FROM SUPPLIER, SUPPLY WHERE SUPPLIER.SNO = SUPPLY.SNO GROUP BY SUPPLIER.SNO

Assignment: Fragmentation
• The Set of Complete and Minimal Simple Predicates
  {PRICE < 6000, PRICE ≥ 6000, COUNTRY = 'USA', COUNTRY ≠ 'USA'}

Assignment: Fragmentation
• PARTS: Horizontal Fragmentation
  - PARTS1 = σ_{PRICE < 6000}(PARTS)
  - PARTS2 = σ_{PRICE ≥ 6000}(PARTS)

PARTS1
PNO  PNAME   PRICE
P3   VIDEO   5000
P4   HI-HI   3000

PARTS2
PNO  PNAME   PRICE
P1   PC      10000
P2   CAMERA  8000

Assignment: Fragmentation
• SUPPLIER: Horizontal Fragmentation
  - SUPPLIER1 = σ_{COUNTRY = 'USA'}(SUPPLIER)
  - SUPPLIER2 = σ_{COUNTRY ≠ 'USA'}(SUPPLIER)

SUPPLIER1
SNO  SNAME  COUNTRY
S1   SN1    USA
S6   SN6    USA

SUPPLIER2
SNO  SNAME  COUNTRY
S2   SN2    INDIA
S3   SN3    CHINA
S4   SN4    CHINA
S5   SN5    INDIA

Assignment: Fragmentation
• SUPPLY: Derived Fragmentation
  - SUPPLY1 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS1
  - SUPPLY2 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS2
  - SUPPLY3 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS1
  - SUPPLY4 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS2

SUPPLY1
SNO  PNO  QTY
S1   P3   70
S6   P4   96

SUPPLY2
SNO  PNO  QTY
S1   P1   60
S6   P2   70

SUPPLY3
SNO  PNO  QTY
S3   P3   55
S3   P4   96

SUPPLY4
SNO  PNO  QTY
S2   P2   60
S4   P2   65
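The derived fragmentation of SUPPLY can be checked with a short semijoin sketch. The PARTS and SUPPLIER rows are copied from the fragment tables on the preceding slides; the global SUPPLY relation is not shown in the slides, so it is assumed here to be the union of the four SUPPLY fragments. The `semijoin` helper and the tuple layout are illustrative choices.

```python
from typing import List, Tuple

Row = Tuple

# (PNO, PNAME, PRICE) and (SNO, SNAME, COUNTRY), copied from the slides.
PARTS = [("P1", "PC", 10000), ("P2", "CAMERA", 8000),
         ("P3", "VIDEO", 5000), ("P4", "HI-HI", 3000)]
SUPPLIER = [("S1", "SN1", "USA"), ("S2", "SN2", "INDIA"), ("S3", "SN3", "CHINA"),
            ("S4", "SN4", "CHINA"), ("S5", "SN5", "INDIA"), ("S6", "SN6", "USA")]
# (SNO, PNO, QTY): assumed to be the union of the four fragments on the slide.
SUPPLY = [("S1", "P3", 70), ("S6", "P4", 96), ("S1", "P1", 60), ("S6", "P2", 70),
          ("S3", "P3", 55), ("S3", "P4", 96), ("S2", "P2", 60), ("S4", "P2", 65)]


def semijoin(left: List[Row], left_col: int, right: List[Row], right_col: int) -> List[Row]:
    """Rows of `left` that have at least one join partner in `right`."""
    keys = {r[right_col] for r in right}
    return [row for row in left if row[left_col] in keys]


# Primary horizontal fragments (selections on PRICE and COUNTRY).
parts1 = [p for p in PARTS if p[2] < 6000]
parts2 = [p for p in PARTS if p[2] >= 6000]
supplier1 = [s for s in SUPPLIER if s[2] == "USA"]
supplier2 = [s for s in SUPPLIER if s[2] != "USA"]

# Derived fragments: semijoin on SNO with a SUPPLIER fragment,
# then semijoin on PNO with a PARTS fragment.
supply1 = semijoin(semijoin(SUPPLY, 0, supplier1, 0), 1, parts1, 0)
supply2 = semijoin(semijoin(SUPPLY, 0, supplier1, 0), 1, parts2, 0)
supply3 = semijoin(semijoin(SUPPLY, 0, supplier2, 0), 1, parts1, 0)
supply4 = semijoin(semijoin(SUPPLY, 0, supplier2, 0), 1, parts2, 0)
```

Under these assumptions the four derived fragments are pairwise disjoint and their union is the whole SUPPLY relation, so the fragmentation is complete and reconstructible.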

Assignment: Allocation
• a) Solution

Assignment: Allocation
• a) Solution 2

Assignment: Allocation
• b) Solution

Q & A
Thank You!