R*: An Overview of the Architecture

R*: An Overview of the Architecture
R. Williams et al., IBM Almaden Research Center

Outline: Environment and Data Definitions; Object Naming; Distributed Catalogs; Transaction Management and Commit Protocols; Query Preparation; Query Execution; SQL Additions and Changes.

Environment and Data Definitions: CICS serves as the underlying communication model. Data distribution options: dispersed, replicated, partitioned (horizontally or vertically), and snapshot.
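A minimal in-memory sketch shows how horizontal and vertical partitioning differ. The row representation (Python dicts), the function names, and the site labels are hypothetical and are not R* interfaces. Keeping the key column in every vertical fragment is an assumption of this sketch, made so the original rows can be rebuilt.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def horizontal_partition(rows: List[Row], site_of: Callable[[Row], str]) -> Dict[str, List[Row]]:
    """Place each whole row at the site chosen by site_of (e.g. a value predicate)."""
    fragments: Dict[str, List[Row]] = {}
    for row in rows:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments

def vertical_partition(rows: List[Row], key: str,
                       columns_by_site: Dict[str, List[str]]) -> Dict[str, List[Row]]:
    """Give each site a subset of the columns; every fragment keeps the key column."""
    return {
        site: [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for site, cols in columns_by_site.items()
    }

def reconstruct_vertical(fragments: Dict[str, List[Row]], key: str) -> List[Row]:
    """Rebuild full rows by joining the vertical fragments on the key column."""
    merged: Dict[object, Row] = {}
    for fragment in fragments.values():
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())
```

A horizontal fragment holds complete rows for a subset of the table, while a vertical fragment holds a subset of the columns for every row.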

Figure 1 from paper

Figure 21.4 from CS 432 text

Object Naming: System-Wide Names (SWN) have the form USER@USER_SITE.OBJECT_NAME@BIRTH_SITE.
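A system-wide name can be split into its four parts mechanically. This sketch assumes the separators exactly as given in the SWN format (two '@' characters, and a '.' between USER_SITE and OBJECT_NAME) and no dots inside USER_SITE. The class and function names are hypothetical, and the sketch does not model how R* resolves names that omit some of these parts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemWideName:
    user: str
    user_site: str
    object_name: str
    birth_site: str

    def __str__(self) -> str:
        return f"{self.user}@{self.user_site}.{self.object_name}@{self.birth_site}"

def parse_swn(text: str) -> SystemWideName:
    """Parse USER@USER_SITE.OBJECT_NAME@BIRTH_SITE into its four components."""
    parts = text.split("@")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '@' separators: {text!r}")
    user, middle, birth_site = parts
    # Split on the first '.' only; this assumes USER_SITE itself contains no dot.
    user_site, dot, object_name = middle.partition(".")
    if not dot or not all((user, user_site, object_name, birth_site)):
        raise ValueError(f"malformed system-wide name: {text!r}")
    return SystemWideName(user, user_site, object_name, birth_site)
```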

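Distributed Catalogs: Each site maintains the catalog entries for the objects stored in its own database. Catalog entries may be cached at other sites, and entries are versioned so that an out-of-date cached copy can be detected. Each entry records the object's SWN, type, format, access paths, object reference (for views), and statistics.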
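A cached, versioned catalog can be sketched as a cache that trusts an entry only while its version matches the one held at the object's home site. All class names and fields are hypothetical. For simplicity, this sketch asks the home site for the current version on every lookup; a real system would want to avoid a remote round trip per lookup.

```python
from dataclasses import dataclass, field, replace
from typing import Dict

@dataclass
class CatalogEntry:
    swn: str                 # system-wide name of the object
    obj_type: str            # e.g. "table" or "view"
    version: int             # bumped by the home site whenever the entry changes
    stats: Dict[str, int] = field(default_factory=dict)

class HomeCatalog:
    """Authoritative catalog at the site that stores the objects."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def put(self, entry: CatalogEntry) -> None:
        self._entries[entry.swn] = entry

    def current_version(self, swn: str) -> int:
        return self._entries[swn].version

    def fetch(self, swn: str) -> CatalogEntry:
        return replace(self._entries[swn])  # hand out a copy, as if sent over the network

class CatalogCache:
    """Cache at another site; a cached entry is used only while its version is current."""

    def __init__(self, home: HomeCatalog) -> None:
        self.home = home
        self.cached: Dict[str, CatalogEntry] = {}
        self.fetches = 0

    def lookup(self, swn: str) -> CatalogEntry:
        entry = self.cached.get(swn)
        if entry is None or entry.version != self.home.current_version(swn):
            entry = self.home.fetch(swn)  # cache miss or stale version: refresh from home
            self.cached[swn] = entry
            self.fetches += 1
        return entry
```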

Transaction Management and Commit Protocol: Each transaction is identified by a transaction number of the form SITE.SEQ_NUM (or SITE.TIME). Transactions commit using the two-phase commit (2PC) protocol.
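Here is a minimal sketch of SITE.SEQ_NUM transaction numbering and two-phase commit, assuming in-process participants. The class names are hypothetical. The sketch leaves out what makes 2PC safe in practice: forced log writes, timeouts, and recovery of in-doubt participants after a crash.

```python
import itertools
from typing import List

class TransactionNumbers:
    """Issues numbers of the form SITE.SEQ_NUM; unique as long as site names are unique."""

    def __init__(self, site: str) -> None:
        self.site = site
        self._seq = itertools.count(1)

    def issue(self) -> str:
        return f"{self.site}.{next(self._seq)}"

class Participant:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self, txn: str) -> bool:
        # Phase 1: vote YES (and enter the prepared state) only if this site can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, txn: str, commit: bool) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(txn: str, participants: List[Participant]) -> bool:
    votes = [p.prepare(txn) for p in participants]  # phase 1: collect every vote
    decision = all(votes)                            # commit only if all voted YES
    for p in participants:                           # phase 2: broadcast the decision
        p.finish(txn, decision)
    return decision
```

A single NO vote in phase 1 is enough to abort the transaction at every site.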

Query Preparation: name resolution; authorization check; distributed compilation, which comprises global plan generation/optimization, local access path selection, local optimization, and local view materialization.

Figure 2 from paper

Cost Model: cost is a weighted combination of three components: I/O, CPU, and messages. The message component counts both the number of messages sent and the number of bytes sent.
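If the components combine linearly, the total cost of a plan is the weighted sum of its instruction count, I/O count, message count, and byte count. This sketch assumes that linear form. The field names are hypothetical, and the weights are supplied as inputs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    cpu: float   # cost per CPU instruction
    io: float    # cost per I/O
    msg: float   # cost per message sent
    byte: float  # cost per byte sent

@dataclass(frozen=True)
class PlanEstimate:
    instructions: int
    ios: int
    messages: int
    bytes_sent: int

def total_cost(est: PlanEstimate, w: Weights) -> float:
    cpu = w.cpu * est.instructions
    io = w.io * est.ios
    # The message component has two parts: per-message overhead and per-byte transfer.
    message = w.msg * est.messages + w.byte * est.bytes_sent
    return cpu + io + message
```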

Query Execution: synchronous vs. asynchronous execution; distributed concurrency control; deadlock detection and resolution; crash recovery.
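Deadlock detection is commonly done by finding a cycle in a waits-for graph, where an edge T1 -> T2 means T1 is waiting for a lock held by T2. This sketch assumes the edges from all sites have already been gathered in one place. It is a generic depth-first cycle search, not R*'s own distributed detection algorithm. Resolving the deadlock would then mean aborting one transaction on the returned cycle.

```python
from typing import Dict, List, Optional, Set

def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the waits-for graph as a list of transactions, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {t: WHITE for t in waits_for}
    path: List[str] = []

    def visit(t: str) -> Optional[List[str]]:
        color[t] = GRAY
        path.append(t)
        for u in waits_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GRAY:                  # u is on the current path: cycle found
                return path[path.index(u):]
            if c == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```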

Figure 3 from paper

SQL Additions and Changes: DEFINE SYNONYM; DISTRIBUTE TABLE (HORIZONTALLY, VERTICALLY, or REPLICATED); DEFINE SNAPSHOT; REFRESH SNAPSHOT; MIGRATE TABLE.

R* Optimizer Validation and Performance Evaluation for Distributed Queries
Lothar F. Mackert, Guy M. Lohman, IBM Almaden Research Center

Outline: Distributed Compilation/Optimization; Instrumentation; Experiments and Results.

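Distributed Compilation/Optimization issues: choice of join site; transfer method, either ship whole (send the entire inner table to the join site) or fetch matches (for each outer tuple, send its join value to the inner table's site and fetch back only the matching tuples); cost model.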
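The two transfer methods differ in what they send over the network. This sketch uses in-memory lists and deliberately simplified accounting: ship whole counts as one bulk transfer, and fetch matches counts as one request plus one reply per outer tuple. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]
JoinOutput = Tuple[List[Tuple[Row, Row]], int, int]  # (result, messages, tuples sent)

def ship_whole(outer: List[Row], inner: List[Row], key: str) -> JoinOutput:
    """Send the entire inner table to the join site once, then join locally."""
    messages = 1                  # simplification: one bulk transfer
    tuples_sent = len(inner)      # every inner tuple crosses the network
    by_key: Dict[object, List[Row]] = defaultdict(list)
    for r in inner:
        by_key[r[key]].append(r)
    result = [(o, i) for o in outer for i in by_key.get(o[key], [])]
    return result, messages, tuples_sent

def fetch_matches(outer: List[Row], inner: List[Row], key: str) -> JoinOutput:
    """For each outer tuple, send its join value to the inner site; only matches come back."""
    messages = 0
    tuples_sent = 0
    result: List[Tuple[Row, Row]] = []
    for o in outer:
        messages += 2             # one request carrying o[key], one reply
        matches = [i for i in inner if i[key] == o[key]]
        tuples_sent += len(matches)
        result.extend((o, i) for i in matches)
    return result, messages, tuples_sent
```

Ship whole moves every inner tuple but needs few messages. Fetch matches moves only the matching tuples but pays two messages per outer tuple. This is the trade-off that the separate message and byte weights in the cost model capture.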

Weights Estimation: CPU: the inverse of the MIPS rate; I/O: based on average seek, latency, and transfer time; MSG: based on the number of instructions per message; BYTE: based on the effective transmission speed of the network.
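One way to turn these hardware parameters into weights is to express every weight in seconds. This sketch assumes that choice. It also assumes the MSG weight equals instructions per message times the CPU weight, and the BYTE weight equals the reciprocal of the effective transmission speed. The parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Hardware:
    mips: float                  # CPU speed, millions of instructions per second
    avg_seek_s: float            # average disk seek time (seconds)
    avg_latency_s: float         # average rotational latency (seconds)
    transfer_s: float            # time to transfer one page (seconds)
    instructions_per_msg: float  # CPU instructions to send and receive one message
    bytes_per_s: float           # effective network transmission speed

def estimate_weights(hw: Hardware) -> Dict[str, float]:
    w_cpu = 1.0 / (hw.mips * 1e6)                            # seconds per instruction
    w_io = hw.avg_seek_s + hw.avg_latency_s + hw.transfer_s  # seconds per I/O
    w_msg = hw.instructions_per_msg * w_cpu                  # assumption: CPU time per message
    w_byte = 1.0 / hw.bytes_per_s                            # seconds per byte on the wire
    return {"cpu": w_cpu, "io": w_io, "msg": w_msg, "byte": w_byte}
```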

Figure 2 from paper

Instrumentation: distributed EXPLAIN; distributed COLLECT COUNTERS; a way to force the optimizer to choose a given plan.

Experiment I: Transfer method. Merge-scan join of two tables with 500 tuples each; 50% projection on both tables; 100 distinct values for the join attribute; the join result has 2477 tuples.
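A merge-scan join sorts both inputs on the join attribute and scans them in step, emitting the cross product of each run of equal keys. This sketch works on in-memory (key, payload) tuples and is not R*'s implementation; the function name is hypothetical.

```python
from typing import List, Tuple

def merge_scan_join(left: List[Tuple[int, str]],
                    right: List[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Sort both inputs on the key, then merge them, pairing up runs of equal keys."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    out.append((lk, a[1], b[1]))
            i, j = i_end, j_end
    return out
```

With 500 tuples per table and 100 join values spread uniformly, each value yields 5 × 5 = 25 pairs, for 2500 result tuples. That is close to the 2477 tuples in Experiment I.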

Figure 4 from paper

Figure 3 from paper

Experiment II: Distributed vs. local join. Join of two tables with 1000 tuples each; 50% projection on both tables; 3000 distinct values for the join attribute.

Figure 5 from paper

Figure 6 from paper

Experiment III: Relative importance of the cost components.

Figures 7, 8, 9, and 10 from paper

Experiment IV: Optimizer evaluation. The optimizer's estimates of the number of messages and bytes sent were accurate (less than 2% difference from the measured values), and the estimates were better when the tables were more distributed.

Experiment V: Alternative distributed join methods for two tables: dynamically created indexes, semijoins, and Bloomjoins. The outer table has 1000 tuples; the inner table varies from 100 to 6000 tuples.
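Semijoins and Bloomjoins both shrink the inner table before it is shipped. A semijoin sends the exact set of join values. A Bloomjoin sends a compact Bloom filter that may let a few non-matching tuples through (false positives) but never drops a matching one. The hash construction (SHA-256 with a seed prefix), the filter size, and all names here are hypothetical choices for this sketch.

```python
import hashlib
from typing import Iterator, List, Tuple

class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: object) -> Iterator[int]:
        # Derive num_hashes bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: object) -> None:
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: object) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(value))

Row = Tuple[int, str]  # (join key, payload)

def semijoin(outer: List[Row], inner: List[Row]) -> List[Row]:
    """Send outer's distinct join values; the inner site returns only matching tuples."""
    join_values = {k for k, _ in outer}
    return [r for r in inner if r[0] in join_values]

def bloomjoin(outer: List[Row], inner: List[Row],
              num_bits: int = 1024, num_hashes: int = 3) -> List[Row]:
    """Send a Bloom filter of outer's join values; the inner site returns tuples passing it."""
    bf = BloomFilter(num_bits, num_hashes)
    for k, _ in outer:
        bf.add(k)
    # May include false positives; the final join at the outer site removes them.
    return [r for r in inner if bf.might_contain(r[0])]

def final_join(outer: List[Row], candidates: List[Row]) -> List[Tuple[int, str, str]]:
    return [(a[0], a[1], b[1]) for a in outer for b in candidates if a[0] == b[0]]
```

Either reduction, followed by the final join, yields the same result. The Bloomjoin trades a smaller first message for possibly shipping a few extra inner tuples.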

Figures 11 and 12 from paper

Other Experiments (join methods ordered by cost, cheapest first; R* denotes R*'s standard distributed join methods):
Clustered index, with 50% projection or a wider join column: Bloomjoins < Semijoins < R*
50% projection: at site 1, Bloomjoins < Semijoins < R*; at site 2, Bloomjoins < R* << Semijoins
Wider join column: Bloomjoins < R* << Semijoins