The State of the Art in Distributed Query Processing
by Donald Kossmann
Presented by Chris Gianfrancesco

Introduction
Distributed database technology is becoming an increasingly attractive enhancement to many database systems, driven by:
• Cost and scalability
• Software integration
  ◦ Legacy systems
• New applications
• Market forces

Introduction
Topics covered in this paper:
• Basics of distributed query processing
• Client-server distributed database models
• Heterogeneous distributed database models
• Data placement techniques
• Other distributed architectures

Client-Server Database Systems
Relationships between distributed nodes take a client-server form.
• Client: makes requests of the servers; usually the source of queries
• Server: responds to client requests; usually the source of data
• System architectures: peer-to-peer, strict client-server, middleware/multitier

Architectures: Peer-to-Peer
• All nodes are equivalent
• Each node can act as either a client or a server on demand (it can store data and/or make requests)
• Example: the SHORE system
[Figure: three peer nodes, each acting as server or client]

Architectures: Strict Client-Server
• Each node's role as client or server is predefined and never changes
• Clients supply queries; servers supply data
• Most common architecture in commercial DBMSs
[Figure: client (query source) connected to server (data source)]

Architectures: Middleware/Multitier
• Multiple levels of client-server interaction
• Nodes act as clients to the nodes below them and as servers to the nodes above them
• Examples: SAP R/3, web servers with database back ends
[Figure: Node 1 (client to Node 2) → Node 2 (server to Node 1, client to Node 3) → Node 3 (server to Node 2)]
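
The three-tier chain in the figure can be sketched as follows. This is a minimal illustration, not code from the paper: the `TierNode` class and its `handle` method are hypothetical, and key/value lookups stand in for queries. Each node answers from its own data when it can and otherwise acts as a client to the node below it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierNode:
    """Hypothetical node in a middleware/multitier chain.

    It serves requests from the node above; if it cannot answer from its
    own data, it acts as a client to the node below.
    """
    name: str
    local_data: dict
    lower: Optional["TierNode"] = None

    def handle(self, key, trace):
        trace.append(self.name)  # record which nodes the request visited
        if key in self.local_data:
            return self.local_data[key]
        if self.lower is None:
            raise KeyError(key)
        # Act as a client to the next tier down.
        return self.lower.handle(key, trace)


# Node 1 -> Node 2 -> Node 3, mirroring the figure (data is hypothetical).
node3 = TierNode("Node 3", {"order:42": "shipped"})
node2 = TierNode("Node 2", {"session:7": "active"}, lower=node3)
node1 = TierNode("Node 1", {}, lower=node2)

trace = []
print(node1.handle("order:42", trace), trace)
```

The same class covers a strict client-server pair (a chain of two nodes); a peer-to-peer node would additionally need to accept requests from any peer, which this sketch does not model.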

Architectures: Evaluation
• Peer-to-peer
  ◦ Simplest setup
  ◦ Equal load sharing
• Strict client-server
  ◦ Specialization
  ◦ Administration needed for servers only
• Middleware/multitier
  ◦ Functionality integration
  ◦ Scalability

Client-Server Query Processing
Queries are initiated at clients; data is stored at servers. Where do we execute the query?
• Query shipping: move the query down to the data
• Data shipping: move the data up to the query
• Hybrid shipping: a combination of both

Query Shipping
• The SQL query text is sent down to the server
• The server parses and evaluates the query, then returns the result
• Used in DB2, Oracle, and MS SQL Server
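
A minimal sketch of query shipping, assuming an in-memory SQLite database stands in for the server. The `Server` class, the `emp` table, and its contents are hypothetical. The client sends only SQL text; parsing and evaluation happen entirely at the server, and only result rows travel back.

```python
import sqlite3


class Server:
    """Hypothetical database server: receives SQL text and evaluates it locally."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE emp (id INTEGER, dept TEXT, salary INTEGER)")
        self.conn.executemany(
            "INSERT INTO emp VALUES (?, ?, ?)",
            [(i, "eng" if i % 4 == 0 else "ops", 1000 + i) for i in range(1000)],
        )

    def run_query(self, sql: str, params=()):
        # Parsing, optimization, and execution all happen at the server.
        return self.conn.execute(sql, params).fetchall()


def client_query_shipping(server: Server):
    sql = "SELECT id, salary FROM emp WHERE dept = ? AND salary > ? ORDER BY id"
    # Only the query goes down; only the (small) result comes back.
    return server.run_query(sql, ("eng", 1900))


rows = client_query_shipping(Server())
print(len(rows))
```

Here 1,000 rows stay at the server and only the 24 qualifying rows cross the network, which is the appeal of query shipping; the cost is that every query consumes server CPU.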

Data Shipping
• The client parses the query and requests the data it needs from the server
• The server provides the data; the client executes the query
• Data can be cached at the client (in main memory or on disk)
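
For contrast, here is a minimal sketch of data shipping with a client-side main-memory cache. `PageServer`, `DataShippingClient`, and the page size are hypothetical. The server only hands out pages; the client runs the selection itself and reuses cached pages on later queries.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (id, dept, salary), hypothetical schema


class PageServer:
    """Hypothetical page server: hands out pages, never evaluates queries."""

    def __init__(self, rows: List[Row], page_size: int = 100):
        self.pages = [rows[i:i + page_size] for i in range(0, len(rows), page_size)]
        self.page_requests = 0

    def num_pages(self) -> int:
        return len(self.pages)

    def fetch_page(self, page_no: int) -> List[Row]:
        self.page_requests += 1  # each call models one network round trip
        return list(self.pages[page_no])


class DataShippingClient:
    """Executes queries locally and caches fetched pages in main memory."""

    def __init__(self, server: PageServer):
        self.server = server
        self.cache: Dict[int, List[Row]] = {}

    def _page(self, page_no: int) -> List[Row]:
        if page_no not in self.cache:  # cache miss: ship the data up
            self.cache[page_no] = self.server.fetch_page(page_no)
        return self.cache[page_no]

    def select(self, dept: str, min_salary: int) -> List[Row]:
        # The selection runs at the client over shipped (or cached) pages.
        return [
            row
            for page_no in range(self.server.num_pages())
            for row in self._page(page_no)
            if row[1] == dept and row[2] > min_salary
        ]


rows = [(i, "eng" if i % 4 == 0 else "ops", 1000 + i) for i in range(1000)]
server = PageServer(rows)
client = DataShippingClient(server)
first = client.select("eng", 1900)
second = client.select("eng", 1900)  # served entirely from the client cache
print(len(first), server.page_requests)
```

The first query ships all 10 pages (the high communication cost noted in the evaluation slide); the second costs the server nothing, which is where data shipping's scalability comes from.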

Hybrid Shipping
• Mix and match data shipping and query shipping
• Parts of a query can be executed at any level, according to the query plan
• Data is cached when beneficial

Evaluation
• Query shipping
  ◦ Reliant on server performance
  ◦ Scales poorly as client load increases
• Data shipping
  ◦ Good scalability
  ◦ High communication costs
• Hybrid shipping
  ◦ Potential to outperform both other options
  ◦ More complex optimization

Hybrid Shipping Observations
Some observations about when hybrid shipping performs best:
• Not using the client cache can be preferable
  ◦ if the network transfer cost is lower than the client's own access cost
• Shipping cached data down to the server can pay off
  ◦ if the data is in the client's main memory and the operator executes at the server
• Multiple small updates
  ◦ maintain them at the client and post them to the server only when necessary
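
The three observations can be read as simple decision rules. The sketch below is one possible encoding under that reading; all function and class names are hypothetical, and the costs are abstract numbers supplied by the caller rather than a real cost model.

```python
from dataclasses import dataclass, field
from typing import Dict


def read_from_client_cache(network_transfer_cost: float, client_access_cost: float) -> bool:
    # Observation 1 (hypothetical encoding): bypass the client cache when
    # fetching from the server is cheaper than reading the client's copy,
    # e.g. when the cached copy sits on a slow client disk.
    return not (network_transfer_cost < client_access_cost)


def ship_cached_data_down(cached_in_main_memory: bool, operator_runs_at_server: bool) -> bool:
    # Observation 2: send cached client data down to the server only if it is
    # in client main memory and the consuming operator runs at the server.
    return cached_in_main_memory and operator_runs_at_server


@dataclass
class UpdateBuffer:
    """Observation 3: hold small updates at the client, post them in one batch."""
    pending: Dict[int, int] = field(default_factory=dict)
    messages_sent: int = 0

    def update(self, key: int, value: int) -> None:
        self.pending[key] = value  # later updates to a key overwrite earlier ones

    def flush(self, server_table: Dict[int, int]) -> None:
        # Called only when necessary, e.g. at commit time (hypothetical trigger).
        if self.pending:
            server_table.update(self.pending)  # one message for the whole batch
            self.messages_sent += 1
            self.pending.clear()


server_table = {1: 100, 2: 200}
buf = UpdateBuffer()
for i in range(50):
    buf.update(1, 100 + i)  # 50 small updates to the same record
buf.flush(server_table)
print(server_table[1], buf.messages_sent)
```

Fifty client-side updates reach the server as a single message carrying only the final value.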

Query Optimization
Query plans must also specify where each piece of the query is executed.
• Data shipping: all execution is done at the client
• Query shipping: all execution is done at the server
• Hybrid shipping: the site can be chosen for each operator
• Displaying results to the user always happens at the client
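
These rules suggest a direct way to tell the three policies apart from a plan's site annotations. The sketch assumes a hypothetical encoding in which a plan is a mapping from operator names to `"client"` or `"server"`, with a final `"display"` operator.

```python
from typing import Dict


def shipping_policy(sites: Dict[str, str]) -> str:
    """Classify a plan by its operator-to-site annotations (hypothetical encoding).

    `sites` maps operator names to "client" or "server"; the "display"
    operator must always run at the client.
    """
    if sites.get("display") != "client":
        raise ValueError("results are always displayed at the client")
    others = [site for op, site in sites.items() if op != "display"]
    if all(site == "client" for site in others):
        return "data shipping"
    if all(site == "server" for site in others):
        return "query shipping"
    return "hybrid shipping"


print(shipping_policy({"scan": "server", "join": "client", "display": "client"}))
```

Under hybrid shipping the optimizer is free to produce any mix, which is why its search space (and cost) is larger than for the other two policies.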

Distributed Query Plans
Each operator is annotated with a logical site of execution, so plans are not tied to particular machines and can be shared.
• client: the operator is executed at the client where the query is issued
• server:
  ◦ for scan operators, execute at a site that stores the necessary data
  ◦ for updates, execute at all sites that store the relevant data
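
A minimal sketch of binding these logical annotations to physical sites at runtime. The `Op` class, the `bind_sites` function, and the `replicas` catalog format are hypothetical; only the two "server" rules from the slide are implemented, and any other operator annotated "server" is rejected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Op:
    kind: str                # "scan", "update", "join", "display", ...
    site: str                # logical annotation: "client" or "server"
    table: str = ""          # relation touched by scans/updates
    children: List["Op"] = field(default_factory=list)


def bind_sites(op: Op, client_id: str, replicas: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Resolve logical sites to physical ones; returns (kind, sites) in post-order."""
    bound = []
    for child in op.children:
        bound.extend(bind_sites(child, client_id, replicas))
    if op.site == "client":
        sites = [client_id]                  # where the query was issued
    elif op.kind == "scan":
        sites = [replicas[op.table][0]]      # any one site holding the data
    elif op.kind == "update":
        sites = list(replicas[op.table])     # every site holding a copy
    else:
        raise ValueError(f"no binding rule for {op.kind!r} at {op.site!r}")
    bound.append((op.kind, sites))
    return bound


replicas = {"emp": ["s1", "s2"], "dept": ["s3"]}  # hypothetical catalog
plan = Op("display", "client", children=[
    Op("join", "client", children=[Op("scan", "server", "emp"),
                                   Op("scan", "server", "dept")])])
print(bind_sites(plan, "c7", replicas))
```

Because the plan itself mentions only "client" and "server", the same plan object can be bound for a different client simply by passing another `client_id`, which is what makes such plans shareable.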

Query Optimization: Where?
Should optimization occur at the client or at the server?
• At the client: less load on servers, better scalability
• At the server: more information about system statistics, especially server loads
• Potential solution: initial parsing and query rewriting at the client, further optimization at the server

Query Optimization: Statistics
Even when optimization is done at a server, that server usually does not have full knowledge of the system. The system can either:
• Guess the status of other servers: less accurate, but cheaper
• Ask other servers for their status: fully accurate, but with additional communication costs
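
The guess-or-ask tradeoff can be sketched as a statistics component that either returns a cached (possibly stale) load estimate or polls the remote server. `RemoteServer`, `LoadStats`, and the two-messages-per-poll accounting are hypothetical.

```python
from typing import Dict


class RemoteServer:
    """Hypothetical remote server that can report its current load."""

    def __init__(self, load: float):
        self._load = load

    def current_load(self) -> float:
        return self._load


class LoadStats:
    """Hypothetical statistics component at the optimizing server."""

    def __init__(self, servers: Dict[str, RemoteServer], cached: Dict[str, float]):
        self.servers = servers
        self.cached = dict(cached)  # guesses; may be stale
        self.messages = 0

    def load(self, name: str, ask: bool) -> float:
        if ask:
            # Asking is accurate but costs a request and a reply.
            self.messages += 2
            self.cached[name] = self.servers[name].current_load()
        # Guessing just returns whatever estimate we already hold.
        return self.cached[name]


stats = LoadStats({"s1": RemoteServer(0.9)}, cached={"s1": 0.2})
print(stats.load("s1", ask=False), stats.load("s1", ask=True), stats.messages)
```

The guessed value (0.2) is free but wrong; the polled value (0.9) is correct but costs two messages.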

Query Optimization: When?
There is a tradeoff between accuracy and cost.
• Traditional: optimize once and store the plan
  ◦ No support for changing database conditions
  ◦ No optimization cost incurred at query execution time
• Plan sets: optimize for possible scenarios
  ◦ Generate a few query plans for different conditions
  ◦ Choose a plan based on runtime statistics
• On the fly: observe intermediate results
  ◦ Re-optimize the query if they differ from expectations
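
A minimal sketch of the last two strategies. The plan descriptions, the load threshold of 0.75, and the 2x tolerance are hypothetical values chosen for illustration, not figures from the paper.

```python
from typing import Dict

# Plan sets: a few plans optimized ahead of time for different conditions
# (hypothetical scenarios and plan descriptions).
PLAN_SET: Dict[str, str] = {
    "server_idle": "run join at server",
    "server_busy": "ship data and run join at client",
}


def choose_plan(server_load: float, busy_threshold: float = 0.75) -> str:
    # Runtime: cheaply pick the precomputed plan matching current statistics.
    key = "server_busy" if server_load >= busy_threshold else "server_idle"
    return PLAN_SET[key]


def should_reoptimize(estimated_rows: int, observed_rows: int, tolerance: float = 2.0) -> bool:
    # On the fly: re-optimize if an intermediate result is off from the
    # optimizer's estimate by more than `tolerance`x in either direction.
    est = max(estimated_rows, 1)
    obs = max(observed_rows, 1)
    return max(est / obs, obs / est) > tolerance


print(choose_plan(0.9), should_reoptimize(1000, 5000))
```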

Query Optimization: Two-Step
• Compile time: determine the join order and other plan structure
• Runtime: perform site selection
• Reasonable optimization cost at each step
• Responds well to changing server loads
• Fully utilizes client data caching
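
The two steps can be sketched as two separate functions. Both heuristics are hypothetical stand-ins: "join smallest relations first" for the compile-time optimizer, and "least-loaded site wins" for runtime site selection, with each placed join adding one unit of load.

```python
from typing import Dict, List


def compile_time_join_order(cardinalities: Dict[str, int]) -> List[str]:
    # Step 1 (compile time), hypothetical heuristic: join smallest relations first.
    return sorted(cardinalities, key=cardinalities.get)


def runtime_site_selection(join_order: List[str], candidate_sites: List[str],
                           loads: Dict[str, float]) -> Dict[str, str]:
    # Step 2 (runtime): place each join on the currently least-loaded site.
    placement = {}
    loads = dict(loads)  # do not mutate the caller's statistics
    for i in range(1, len(join_order)):
        site = min(candidate_sites, key=loads.get)
        placement[f"join_{i}"] = site
        loads[site] += 1.0  # hypothetical unit cost for the work just assigned
    return placement


order = compile_time_join_order({"A": 500, "B": 20, "C": 3000})
print(order, runtime_site_selection(order, ["client", "server1"], {"client": 0.5, "server1": 0.1}))
```

Including `"client"` among the candidate sites is what lets the runtime step exploit data already cached at the client.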

Two-Step Optimization: Downside
1. A plan is optimized traditional-style at compile time
2. Site selection is performed at runtime
3. The truly optimal plan can be missed
The optimal plan can be missed because the first optimization step was done without knowledge of the system (such as server loads and where data resides).
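
A toy example makes the downside concrete. All costs below are hypothetical. The compile-time step sees only site-independent costs and fixes the cheaper join order; site-dependent communication costs appear only at runtime, when the order can no longer change.

```python
from itertools import product

SITES = ["client", "server"]

# Hypothetical site-independent costs seen by the compile-time optimizer.
local_cost = {"order_AB_C": 10, "order_BC_A": 12}
# Hypothetical communication costs that depend on where the join runs;
# unknown at compile time.
comm_cost = {
    ("order_AB_C", "client"): 8, ("order_AB_C", "server"): 9,
    ("order_BC_A", "client"): 1, ("order_BC_A", "server"): 6,
}


def total(order: str, site: str) -> int:
    return local_cost[order] + comm_cost[(order, site)]


# Two-step: fix the join order first, then pick the best site for it.
two_step_order = min(local_cost, key=local_cost.get)
two_step_site = min(SITES, key=lambda s: total(two_step_order, s))
two_step = total(two_step_order, two_step_site)

# Joint optimization over orders and sites together.
best_order, best_site = min(product(local_cost, SITES), key=lambda p: total(*p))
joint = total(best_order, best_site)

print(two_step, joint)  # 18 13
```

Two-step optimization commits to `order_AB_C` (cost 10 versus 12) and ends at a total of 18, while the jointly optimal plan, `order_BC_A` at the client, costs 13.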

Query Execution Techniques
• Standard techniques: row blocking, multithreading when possible
• Issue: transactions that combine updates and retrieval queries under hybrid shipping
  ◦ For efficiency, we want to delay propagating updates to the server
  ◦ Other option: run the query before propagating the updates and temporarily pad (patch) its results with the pending updates
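
A minimal sketch of both ideas, under one possible reading of "pad results": the server evaluates the query on its pre-update data and ships the result in row blocks, and the client then patches the result with updates it has not yet propagated. All names and data are hypothetical, and only value updates and inserts (not deletes) are handled.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, int]  # (key, value), hypothetical schema


def row_blocks(rows: Iterable[Row], block_size: int) -> Iterator[List[Row]]:
    # Row blocking: ship rows in blocks instead of one message per row.
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def server_select(table: Dict[int, int], min_value: int) -> Iterator[Row]:
    # The server evaluates the query on its (pre-update) data.
    return ((k, v) for k, v in sorted(table.items()) if v >= min_value)


def patch_results(blocks: Iterable[List[Row]], pending: Dict[int, int], min_value: int) -> List[Row]:
    # The client applies its pending updates to the result: drop rows whose
    # key was updated, then add back updated or inserted rows that qualify.
    result = [r for block in blocks for r in block if r[0] not in pending]
    result += [(k, v) for k, v in pending.items() if v >= min_value]
    return sorted(result)


server_table = {1: 5, 2: 50, 3: 70, 4: 90}
pending = {2: 1, 5: 60}  # updates still held at the client
blocks = list(row_blocks(server_select(server_table, 40), block_size=2))
print(blocks, patch_results(blocks, pending, 40))
```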

Questions? Comments?