1
R*: An Overview of the Architecture
R. Williams et al., IBM Almaden Research Center
2
Outline
- Environment and Data Definitions
- Object Naming
- Distributed Catalogs
- Transaction Management and Commit Protocols
- Query Preparation
- Query Execution
- SQL Additions and Changes
3
Environment and Data Definitions
- CICS as the underlying communication model
- Data distribution (the partitioning options are sketched below):
  - Dispersed
  - Replicated
  - Partitioned: horizontal or vertical
  - Snapshot
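To make the partitioning options concrete, here is a toy Python sketch of horizontal vs. vertical partitioning of a relation; the table contents, predicates, and site names are invented for illustration, not taken from the paper.

    # Toy illustration of horizontal vs. vertical partitioning.
    # The relation, predicates, and site names are invented examples.

    employees = [
        {"emp_no": 1, "name": "A", "site": "SANJOSE", "salary": 100},
        {"emp_no": 2, "name": "B", "site": "NEWYORK", "salary": 200},
    ]

    # Horizontal partitioning: each site stores the rows that satisfy
    # its predicate (here: the rows "belonging" to that site).
    horizontal = {
        "SANJOSE": [r for r in employees if r["site"] == "SANJOSE"],
        "NEWYORK": [r for r in employees if r["site"] == "NEWYORK"],
    }

    # Vertical partitioning: each site stores a subset of the columns,
    # always including the key so the fragments can be rejoined.
    vertical = {
        "site1": [{k: r[k] for k in ("emp_no", "name")} for r in employees],
        "site2": [{k: r[k] for k in ("emp_no", "salary")} for r in employees],
    }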
4
Figure 1 from paper
5
Figure 21.4 from CS 432 text
6
Object Naming
System Wide Names (SWN) of the form USER@USER_SITE.OBJECT_NAME@BIRTH_SITE
The BIRTH_SITE component (the site where the object was created) keeps names globally unique without a central naming authority, even if the object later migrates.
7
Distributed Catalogs
- Each site's local catalog maintains entries for the objects stored in its database
- Catalog entries may be cached at remote sites; entries are versioned so stale cached copies can be detected (see the sketch below)
- An entry records: SWN, type, format, access paths, object reference (for views), statistics
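A minimal sketch of the versioning idea, with field and function names that are our assumptions rather than R*'s actual catalog layout: a plan compiled from a cached entry remembers the version it saw, and a version mismatch at the owning site forces recompilation.

    from dataclasses import dataclass, field

    # Toy model of version-stamped catalog entries. Field names are
    # illustrative, not R*'s actual catalog structures.

    @dataclass
    class CatalogEntry:
        swn: str                  # system wide name
        version: int              # bumped on every definition change
        access_paths: tuple = ()
        statistics: dict = field(default_factory=dict)

    def validate_cached_plan(plan_version, owning_entry):
        # A plan compiled against a cached entry records the version it
        # saw; the owning site rejects it if the entry has since changed.
        if plan_version != owning_entry.version:
            raise RuntimeError("stale catalog entry: recompile the plan")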
8
Transaction Management and Commit Protocol
- Transaction number: SITE.SEQ_NUM (or SITE.TIME)
- Two-phase commit (2PC), sketched below
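A minimal two-phase commit coordinator sketch, assuming hypothetical participant objects exposing prepare(), commit(), and abort(); the names and error handling are illustrative, not R*'s actual interfaces.

    # Minimal 2PC coordinator. `participants` are hypothetical objects
    # exposing prepare(), commit(), and abort(); names are illustrative.

    def two_phase_commit(participants):
        # Phase 1 (voting): every participant must vote yes to commit.
        prepared = []
        all_yes = True
        for p in participants:
            try:
                vote = p.prepare()
            except Exception:
                vote = False
            prepared.append(p)
            if not vote:
                all_yes = False
                break

        # Phase 2 (decision): unanimous yes -> commit, else abort.
        if all_yes:
            for p in participants:
                p.commit()
            return "committed"
        for p in prepared:
            p.abort()
        return "aborted"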
9
Query Preparation
- Name resolution
- Authorization check
- Distributed compilation:
  - Global plan generation/optimization
  - Local access path selection
  - Local optimization
  - Local view materialization
10
Figure 2 from paper
11
Cost Model
3 weighted components (written out as a formula below):
- I/O
- CPU
- Messages: # of messages sent and # of bytes sent
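Spelled out, the cost is a weighted sum of these components, with the message component split into a per-message and a per-byte term. The weight and count symbols below are our notation, not the paper's:

    \[
      \mathrm{cost} = W_{\mathrm{CPU}}\,N_{\mathrm{inst}}
                    + W_{\mathrm{I/O}}\,N_{\mathrm{io}}
                    + W_{\mathrm{MSG}}\,N_{\mathrm{msg}}
                    + W_{\mathrm{BYT}}\,N_{\mathrm{byte}}
    \]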
12
Query Execution
- Synchronous vs. asynchronous execution
- Distributed concurrency control
- Deadlock detection and resolution (the core cycle check is sketched below)
- Crash recovery
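R*'s deadlock handling is distributed, with sites exchanging wait-for information; the core step at any one site is finding a cycle in a wait-for graph. A minimal single-site sketch, where the graph representation is our choice:

    # Cycle detection on a wait-for graph: a cycle means deadlock.
    # `waits_for` maps each transaction id to the set of transaction
    # ids it is waiting on (single-site view; illustrative only).

    def find_deadlock(waits_for):
        visited, on_path = set(), set()

        def dfs(txn):
            if txn in on_path:        # back edge -> cycle -> deadlock
                return True
            if txn in visited:
                return False
            visited.add(txn)
            on_path.add(txn)
            for other in waits_for.get(txn, ()):
                if dfs(other):
                    return True
            on_path.discard(txn)
            return False

        return any(dfs(t) for t in waits_for)

    # Example: T1 waits for T2, T2 waits for T1 -> deadlock.
    assert find_deadlock({"T1": {"T2"}, "T2": {"T1"}})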
13
Figure 3 from paper
14
SQL Additions and Changes
- DEFINE SYNONYM
- DISTRIBUTE TABLE ... HORIZONTALLY / VERTICALLY / REPLICATED
- DEFINE SNAPSHOT
- REFRESH SNAPSHOT
- MIGRATE TABLE
15
R* Optimizer Validation and Performance Evaluation for Distributed Queries
Lothar F. Mackert, Guy M. Lohman, IBM Almaden Research Center
16
Outline
- Distributed Compilation/Optimization
- Instrumentation
- Experiments and Results
17
Distributed Compilation/Optimization
Issues:
- Join site
- Transfer methods: ship whole vs. fetch matches (contrasted in the sketch below)
- Cost model
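The two transfer methods trade message count against bytes shipped. A schematic Python sketch of the trade-off; the function names, the cost bookkeeping, and the tuples-per-message figure are our illustrative assumptions:

    # Schematic comparison of R*'s two transfer methods for getting
    # inner-table tuples to the join site. All names are illustrative.

    def ship_whole(inner_tuples, tuples_per_msg=100):
        """Ship the entire inner table to the join site: message count
        grows with table size, but every tuple is sent exactly once."""
        messages = -(-len(inner_tuples) // tuples_per_msg)  # ceil division
        bytes_sent = sum(len(str(t)) for t in inner_tuples)
        return messages, bytes_sent

    def fetch_matches(outer_join_values, inner_index):
        """For each outer join value, request only the matching inner
        tuples: two messages per outer tuple, but no unneeded bytes."""
        messages, bytes_sent = 0, 0
        for v in outer_join_values:
            messages += 2                        # request + reply
            for t in inner_index.get(v, ()):
                bytes_sent += len(str(t))
        return messages, bytes_sent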
18
Weights Estimation (a sketch of turning these parameters into weights follows)
- CPU: inverse of the MIPS rate
- I/O: average seek, latency, and transfer time
- MSG: # of instructions needed per message
- BYTE: effective transmission speed of the network
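One way to read the slide: each weight converts a resource count into seconds. A sketch in which the parameter names and exact conversions are our assumptions, not the paper's calibrated values:

    # Convert hardware parameters into per-unit cost weights (seconds
    # per unit). Names and formulas are illustrative assumptions.

    def cost_weights(mips, avg_io_seconds, instructions_per_msg,
                     bytes_per_second):
        w_cpu = 1.0 / (mips * 1e6)            # seconds per instruction
        w_io = avg_io_seconds                 # seek + latency + transfer
        w_msg = instructions_per_msg * w_cpu  # CPU time to send one message
        w_byt = 1.0 / bytes_per_second        # seconds per byte on the wire
        return w_cpu, w_io, w_msg, w_byt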
19
Figure 2 from paper
20
Instrumentation
- Distributed EXPLAIN
- Distributed COLLECT COUNTERS
- Force optimizer (to execute specified plans for validation)
21
Experiment I: Transfer Method
Merge-scan join of 2 tables (merge-scan sketched below):
- 500 tuples in each table
- Both tables projected to 50%
- 100 distinct values for the join attribute
- Join result: 2477 tuples
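For reference, a merge-scan join merges two inputs sorted on the join key, pairing up groups of duplicate keys. A minimal sketch over (key, payload) tuples:

    # Minimal merge-scan join: both inputs sorted on the join key.
    # Handles duplicate keys by cross-joining each matching group.

    def merge_scan_join(left, right):
        left = sorted(left, key=lambda t: t[0])
        right = sorted(right, key=lambda t: t[0])
        i = j = 0
        out = []
        while i < len(left) and j < len(right):
            if left[i][0] < right[j][0]:
                i += 1
            elif left[i][0] > right[j][0]:
                j += 1
            else:
                key = left[i][0]
                i2 = i
                while i2 < len(left) and left[i2][0] == key:
                    i2 += 1
                j2 = j
                while j2 < len(right) and right[j2][0] == key:
                    j2 += 1
                for lt in left[i:i2]:
                    for rt in right[j:j2]:
                        out.append((key, lt[1], rt[1]))
                i, j = i2, j2
        return out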
22
Figure 4 from paper
23
Figure 3 from paper
24
Experiment II: Distributed vs. Local Join
Join of 2 tables:
- 1000 tuples in each table
- Both tables projected to 50%
- 3000 distinct values for the join attribute
25
Figure 5 from paper
26
Figure 6 from paper
27
Experiment III: Relative Importance of Cost Components
28
Figures 7, 8, 9, and 10 from paper
29
Experiment IV: Optimizer Evaluation
- Accurate estimates of # of messages and # of bytes sent (< 2% difference)
- Better estimates when tables are more distributed
30
Experiment V: Alternative Distributed Join Methods
Methods compared (Bloomjoin sketched below):
- Dynamically created indexes
- Semijoins
- Bloomjoins
Setup: 2 tables; 1000 tuples in the outer; inner varied from 100 to 6000 tuples
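A Bloomjoin ships a compact Bloom filter of the outer table's join keys to the inner table's site; only inner tuples whose keys pass the filter travel back, and the real join removes any false positives. A sketch in which the filter size and hash scheme are our illustrative choices:

    import hashlib

    # Bloomjoin sketch: the outer site builds a small bit-vector filter
    # over its join keys; the inner site ships back only tuples that
    # pass the filter. Filter size and hashing are illustrative choices.

    FILTER_BITS = 8192

    def _hashes(key, k=3):
        digest = hashlib.sha256(str(key).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % FILTER_BITS
                for i in range(k)]

    def build_filter(outer_keys):
        bits = bytearray(FILTER_BITS // 8)
        for key in outer_keys:
            for h in _hashes(key):
                bits[h // 8] |= 1 << (h % 8)
        return bits

    def passes(bits, key):
        return all(bits[h // 8] & (1 << (h % 8)) for h in _hashes(key))

    def bloomjoin(outer, inner):
        # outer/inner are lists of (key, payload) at two "sites".
        bits = build_filter(k for k, _ in outer)           # ship the filter
        candidates = [(k, p) for k, p in inner if passes(bits, k)]
        # Join the shipped candidates at the outer site; Bloom-filter
        # false positives are eliminated by the real join here.
        inner_by_key = {}
        for k, p in candidates:
            inner_by_key.setdefault(k, []).append(p)
        return [(k, po, pi) for k, po in outer
                for pi in inner_by_key.get(k, [])]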
31
Figures 11 and 12 from paper
32
Other Experiments (each ordering reads cheapest method first)
- Clustered index: Bloomjoins < Semijoins < R*
- 50% projection:
  - Site 1: Bloomjoins < Semijoins < R*
  - Site 2: Bloomjoins < R* << Semijoins
- Wider join column: Bloomjoins < R* << Semijoins