1
R*: An Overview of the Architecture
R. Williams et al., IBM Almaden Research Center
2
Outline
- Environment and Data Definitions
- Object Naming
- Distributed Catalogs
- Transaction Management and Commit Protocols
- Query Preparation
- Query Execution
- SQL Additions and Changes
3
Environment and Data Definitions
- CICS as the underlying communication model
- Data distribution (the partitioning options are sketched below):
  - Dispersed
  - Replicated
  - Partitioned: horizontal or vertical
  - Snapshot
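To make the partitioning options concrete, here is a toy Python sketch of horizontal vs. vertical partitioning of a relation; the table contents, predicates, and site names are invented for illustration, not taken from the paper.

    # Toy illustration of horizontal vs. vertical partitioning.
    # The relation, predicates, and site names are invented examples.

    employees = [
        {"emp_no": 1, "name": "A", "site": "SANJOSE", "salary": 100},
        {"emp_no": 2, "name": "B", "site": "NEWYORK", "salary": 200},
    ]

    # Horizontal partitioning: each site stores the rows that satisfy
    # its predicate (here: the rows "belonging" to that site).
    horizontal = {
        "SANJOSE": [r for r in employees if r["site"] == "SANJOSE"],
        "NEWYORK": [r for r in employees if r["site"] == "NEWYORK"],
    }

    # Vertical partitioning: each site stores a subset of the columns,
    # always including the key so the fragments can be rejoined.
    vertical = {
        "site1": [{k: r[k] for k in ("emp_no", "name")} for r in employees],
        "site2": [{k: r[k] for k in ("emp_no", "salary")} for r in employees],
    }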
4
Figure 1 from paper
5
Figure 21.4 from CS 432 text
6
Object Naming
System Wide Names (SWN) of the form USER@USER_SITE.OBJECT_NAME@BIRTH_SITE
The BIRTH_SITE component (the site where the object was created) keeps names globally unique without a central naming authority, even if the object later migrates.
7
Distributed Catalogs
- Each site's local catalog maintains entries for the objects stored in its database
- Catalog entries may be cached at remote sites; entries are versioned so stale cached copies can be detected (see the sketch below)
- An entry records: SWN, type, format, access paths, object reference (for views), statistics
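A minimal sketch of the versioning idea, with field and function names that are our assumptions rather than R*'s actual catalog layout: a plan compiled from a cached entry remembers the version it saw, and a version mismatch at the owning site forces recompilation.

    from dataclasses import dataclass, field

    # Toy model of version-stamped catalog entries. Field names are
    # illustrative, not R*'s actual catalog structures.

    @dataclass
    class CatalogEntry:
        swn: str                  # system wide name
        version: int              # bumped on every definition change
        access_paths: tuple = ()
        statistics: dict = field(default_factory=dict)

    def validate_cached_plan(plan_version, owning_entry):
        # A plan compiled against a cached entry records the version it
        # saw; the owning site rejects it if the entry has since changed.
        if plan_version != owning_entry.version:
            raise RuntimeError("stale catalog entry: recompile the plan")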
8
Transaction Management and Commit Protocol
- Transaction number: SITE.SEQ_NUM (or SITE.TIME)
- Two-phase commit (2PC), sketched below
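A minimal two-phase commit coordinator sketch, assuming hypothetical participant objects exposing prepare(), commit(), and abort(); the names and error handling are illustrative, not R*'s actual interfaces.

    # Minimal 2PC coordinator. `participants` are hypothetical objects
    # exposing prepare(), commit(), and abort(); names are illustrative.

    def two_phase_commit(participants):
        # Phase 1 (voting): every participant must vote yes to commit.
        prepared = []
        all_yes = True
        for p in participants:
            try:
                vote = p.prepare()
            except Exception:
                vote = False
            prepared.append(p)
            if not vote:
                all_yes = False
                break

        # Phase 2 (decision): unanimous yes -> commit, else abort.
        if all_yes:
            for p in participants:
                p.commit()
            return "committed"
        for p in prepared:
            p.abort()
        return "aborted"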
9
Query Preparation
- Name resolution
- Authorization check
- Distributed compilation:
  - Global plan generation/optimization
  - Local access path selection
  - Local optimization
  - Local view materialization
10
Figure 2 from paper
11
Cost Model
3 weighted components (written out as a formula below):
- I/O
- CPU
- Messages: # of messages sent and # of bytes sent
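Spelled out, the cost is a weighted sum of these components, with the message component split into a per-message and a per-byte term. The weight and count symbols below are our notation, not the paper's:

    \[
      \mathrm{cost} = W_{\mathrm{CPU}}\,N_{\mathrm{inst}}
                    + W_{\mathrm{I/O}}\,N_{\mathrm{io}}
                    + W_{\mathrm{MSG}}\,N_{\mathrm{msg}}
                    + W_{\mathrm{BYT}}\,N_{\mathrm{byte}}
    \]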
12
Query Execution
- Synchronous vs. asynchronous execution
- Distributed concurrency control
- Deadlock detection and resolution (the core cycle check is sketched below)
- Crash recovery
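R*'s deadlock handling is distributed, with sites exchanging wait-for information; the core step at any one site is finding a cycle in a wait-for graph. A minimal single-site sketch, where the graph representation is our choice:

    # Cycle detection on a wait-for graph: a cycle means deadlock.
    # `waits_for` maps each transaction id to the set of transaction
    # ids it is waiting on (single-site view; illustrative only).

    def find_deadlock(waits_for):
        visited, on_path = set(), set()

        def dfs(txn):
            if txn in on_path:        # back edge -> cycle -> deadlock
                return True
            if txn in visited:
                return False
            visited.add(txn)
            on_path.add(txn)
            for other in waits_for.get(txn, ()):
                if dfs(other):
                    return True
            on_path.discard(txn)
            return False

        return any(dfs(t) for t in waits_for)

    # Example: T1 waits for T2, T2 waits for T1 -> deadlock.
    assert find_deadlock({"T1": {"T2"}, "T2": {"T1"}})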
13
Figure 3 from paper
14
SQL Additions and Changes
- DEFINE SYNONYM
- DISTRIBUTE TABLE ... HORIZONTALLY / VERTICALLY / REPLICATED
- DEFINE SNAPSHOT
- REFRESH SNAPSHOT
- MIGRATE TABLE
15
R* Optimizer Validation and Performance Evaluation for Distributed Queries
Lothar F. Mackert, Guy M. Lohman, IBM Almaden Research Center
16
Outline
- Distributed Compilation/Optimization
- Instrumentation
- Experiments and Results
17
Distributed Compilation/Optimization
Issues:
- Join site
- Transfer methods: ship whole vs. fetch matches (contrasted in the sketch below)
- Cost model
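The two transfer methods trade message count against bytes shipped. A schematic Python sketch of the trade-off; the function names, the cost bookkeeping, and the tuples-per-message figure are our illustrative assumptions:

    # Schematic comparison of R*'s two transfer methods for getting
    # inner-table tuples to the join site. All names are illustrative.

    def ship_whole(inner_tuples, tuples_per_msg=100):
        """Ship the entire inner table to the join site: message count
        grows with table size, but every tuple is sent exactly once."""
        messages = -(-len(inner_tuples) // tuples_per_msg)  # ceil division
        bytes_sent = sum(len(str(t)) for t in inner_tuples)
        return messages, bytes_sent

    def fetch_matches(outer_join_values, inner_index):
        """For each outer join value, request only the matching inner
        tuples: two messages per outer tuple, but no unneeded bytes."""
        messages, bytes_sent = 0, 0
        for v in outer_join_values:
            messages += 2                        # request + reply
            for t in inner_index.get(v, ()):
                bytes_sent += len(str(t))
        return messages, bytes_sent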
18
Weights Estimation (a sketch of turning these parameters into weights follows)
- CPU: inverse of the MIPS rate
- I/O: average seek, latency, and transfer time
- MSG: # of instructions needed per message
- BYTE: effective transmission speed of the network
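One way to read the slide: each weight converts a resource count into seconds. A sketch in which the parameter names and exact conversions are our assumptions, not the paper's calibrated values:

    # Convert hardware parameters into per-unit cost weights (seconds
    # per unit). Names and formulas are illustrative assumptions.

    def cost_weights(mips, avg_io_seconds, instructions_per_msg,
                     bytes_per_second):
        w_cpu = 1.0 / (mips * 1e6)            # seconds per instruction
        w_io = avg_io_seconds                 # seek + latency + transfer
        w_msg = instructions_per_msg * w_cpu  # CPU time to send one message
        w_byt = 1.0 / bytes_per_second        # seconds per byte on the wire
        return w_cpu, w_io, w_msg, w_byt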
19
Figure 2 from paper
20
Instrumentation
- Distributed EXPLAIN
- Distributed COLLECT COUNTERS
- Force optimizer (to execute specified plans for validation)
21
Experiment I: Transfer Method
Merge-scan join of 2 tables (merge-scan sketched below):
- 500 tuples in each table
- Both tables projected to 50%
- 100 distinct values for the join attribute
- Join result: 2477 tuples
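For reference, a merge-scan join merges two inputs sorted on the join key, pairing up groups of duplicate keys. A minimal sketch over (key, payload) tuples:

    # Minimal merge-scan join: both inputs sorted on the join key.
    # Handles duplicate keys by cross-joining each matching group.

    def merge_scan_join(left, right):
        left = sorted(left, key=lambda t: t[0])
        right = sorted(right, key=lambda t: t[0])
        i = j = 0
        out = []
        while i < len(left) and j < len(right):
            if left[i][0] < right[j][0]:
                i += 1
            elif left[i][0] > right[j][0]:
                j += 1
            else:
                key = left[i][0]
                i2 = i
                while i2 < len(left) and left[i2][0] == key:
                    i2 += 1
                j2 = j
                while j2 < len(right) and right[j2][0] == key:
                    j2 += 1
                for lt in left[i:i2]:
                    for rt in right[j:j2]:
                        out.append((key, lt[1], rt[1]))
                i, j = i2, j2
        return out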
22
Figure 4 from paper
23
Figure 3 from paper
24
Experiment II: Distributed vs. Local Join
Join of 2 tables:
- 1000 tuples in each table
- Both tables projected to 50%
- 3000 distinct values for the join attribute
25
Figure 5 from paper
26
Figure 6 from paper
27
Experiment III: Relative Importance of Cost Components
28
Figures 7, 8, 9, and 10 from paper
29
Experiment IV: Optimizer Evaluation
- Accurate estimates of # of messages and # of bytes sent (< 2% difference)
- Better estimates when tables are more distributed
30
Experiment V: Alternative Distributed Join Methods
Methods compared (Bloomjoin sketched below):
- Dynamically created indexes
- Semijoins
- Bloomjoins
Setup: 2 tables; 1000 tuples in the outer; inner varied from 100 to 6000 tuples
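A Bloomjoin ships a compact Bloom filter of the outer table's join keys to the inner table's site; only inner tuples whose keys pass the filter travel back, and the real join removes any false positives. A sketch in which the filter size and hash scheme are our illustrative choices:

    import hashlib

    # Bloomjoin sketch: the outer site builds a small bit-vector filter
    # over its join keys; the inner site ships back only tuples that
    # pass the filter. Filter size and hashing are illustrative choices.

    FILTER_BITS = 8192

    def _hashes(key, k=3):
        digest = hashlib.sha256(str(key).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % FILTER_BITS
                for i in range(k)]

    def build_filter(outer_keys):
        bits = bytearray(FILTER_BITS // 8)
        for key in outer_keys:
            for h in _hashes(key):
                bits[h // 8] |= 1 << (h % 8)
        return bits

    def passes(bits, key):
        return all(bits[h // 8] & (1 << (h % 8)) for h in _hashes(key))

    def bloomjoin(outer, inner):
        # outer/inner are lists of (key, payload) at two "sites".
        bits = build_filter(k for k, _ in outer)           # ship the filter
        candidates = [(k, p) for k, p in inner if passes(bits, k)]
        # Join the shipped candidates at the outer site; Bloom-filter
        # false positives are eliminated by the real join here.
        inner_by_key = {}
        for k, p in candidates:
            inner_by_key.setdefault(k, []).append(p)
        return [(k, po, pi) for k, po in outer
                for pi in inner_by_key.get(k, [])]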
31
Figures 11 and 12 from paper
32
Other Experiments (each ordering reads cheapest method first)
- Clustered index: Bloomjoins < Semijoins < R*
- 50% projection:
  - Site 1: Bloomjoins < Semijoins < R*
  - Site 2: Bloomjoins < R* << Semijoins
- Wider join column: Bloomjoins < R* << Semijoins