
Gamma DBMS Part 1: Physical Database Design Shahram Ghandeharizadeh Computer Science Department University of Southern California

Outline Alternative architectures: shared-disk versus shared-nothing. Declustering techniques.

Shared-Disk Architecture Emerged in the 1980s: many clients share storage and data, so data remains available when a client fails.

Shared-Disk Architecture Advantages:
- Many clients share storage and data (data sharing).
- Redundancy is implemented in one place, protecting all clients from disk failure (high availability).
- Centralized backup: the administrator does not need to care or know how many clients are on the network sharing storage (data backup).

Network failures What about network failures?
- Two host bus adapters per server,
- Each server connected to a different switch.

Shared-Disk Architecture Storage Area Network (SAN):
- Block-level access,
- Writes to storage are immediate,
- Specialized hardware including switches, host bus adapters, disk chassis, battery-backed caches, etc.,
- Expensive,
- Supports transaction processing systems.
Network Attached Storage (NAS):
- File-level access,
- Writes to storage might be delayed,
- Generic hardware,
- Inexpensive,
- Not appropriate for transaction processing systems.

Concepts and Terminology Virtualization:
- Available storage is represented as one HUGE disk drive; e.g., a SAN with a thousand 1.5 TB disks provides 1.5 petabytes of storage,
- Available storage is partitioned into Logical Unit Numbers (LUNs),
- A LUN is presented to one or more servers,
- A LUN appears as a disk drive to a server,
- The SAN places blocks across physical disks intelligently to balance load.
What to do when a PC fails?

Shared-Nothing Each node (blade) consists of one processor, memory, and a disk drive.

Shared-Nothing Each node (blade) may consist of one or several processors, memory, and one or several disk drives.

Shared-Nothing Partition resources to construct logical nodes: with an 8-CPU PC, construct eight logical nodes, each with one CPU, a fraction of the memory, and one disk drive.

Data Declustering Data is partitioned across the nodes (why?):
- Random/round-robin,
- Hash partitioning,
- Range partitioning.
Each piece of a table is termed a fragment. Hash and range are single-attribute declustering strategies. Two multi-attribute declustering strategies:
1. Multi-Attribute GrId deClustering (MAGIC)
2. Bubba’s Extended Range Declustering (BERD)

Horizontal Declustering Logical view: the Emp(name, age, salary) table.

  name    age  salary
  ------  ---  ------
  Bob     20   10K
  Shideh  18   35K
  Ted     50   60K
  Kevin   62   120K
  Angela  55   140K
  Mike    45   90K

Physical view: the rows are spread across the nodes.

Horizontal Declustering No partitioning attribute: random and round-robin. Single-attribute declustering strategies: hash and range. Note: the database administrator must choose one attribute as the partitioning attribute.

Hash Declustering Emp with salary as the partitioning attribute; each row is assigned to a node by hashing on salary (salary % 3):

  Node 1: Ted (50, 60K), Kevin (62, 120K)
  Node 2: Bob (20, 10K), Mike (45, 90K)
  Node 3: Shideh (18, 35K), Angela (55, 140K)

Hash Declustering Selections with equality predicates referencing the partitioning attribute are directed to a single node:
- Retrieve Emp where salary = 60K (SELECT * FROM Emp WHERE salary=60K).
Equality predicates referencing a non-partitioning attribute, and range predicates, are directed to all nodes:
- Retrieve Emp where age = 20,
- Retrieve Emp where salary < 20K (SELECT * FROM Emp WHERE salary<20K).
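A minimal sketch of the routing rule just described, assuming three nodes and the salary % 3 hash from the figure (illustrative Python, not Gamma's actual code; all names are made up):

```python
NUM_NODES = 3

def hash_node(salary):
    """Hash the partitioning attribute (salary, in K) to a node."""
    return salary % NUM_NODES

def route_salary_equals(salary):
    """Equality on the partitioning attribute -> exactly one node."""
    return {hash_node(salary)}

def route_everything_else():
    """Non-partitioning attribute or range predicate -> all nodes,
    because hashing preserves no order and knows no other attribute."""
    return set(range(NUM_NODES))

print(route_salary_equals(60))   # {0}: salary = 60K goes to one node
print(route_everything_else())   # {0, 1, 2}: age = 20 or salary < 20K
```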

Range Declustering Emp with salary as the partitioning attribute; each row is assigned to a node by its salary range:

  Node 1 (0-50K):     Bob (20, 10K), Shideh (18, 35K)
  Node 2 (51K-100K):  Ted (50, 60K), Mike (45, 90K)
  Node 3 (101K-∞):    Kevin (62, 120K), Angela (55, 140K)

Range Declustering Equality and range predicates referencing the partitioning attribute are directed to a subset of the nodes:
- Retrieve Emp where salary = 60K,
- Retrieve Emp where salary < 20K.
In our example, both queries are directed to one node. Predicates referencing a non-partitioning attribute are directed to all nodes.
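The same routing idea for range declustering, as a sketch (assumed boundaries matching the figure, 0-50K / 51K-100K / 101K-∞, with nodes 0-indexed in the code):

```python
import bisect

# Upper bound (in K) of each node's salary range: node 0 holds 0-50K,
# node 1 holds 51K-100K, node 2 holds 101K and above.
UPPER_BOUNDS = [50, 100]

def node_for(salary):
    return bisect.bisect_left(UPPER_BOUNDS, salary)

def nodes_for_range(low, high):
    """A range predicate touches only the nodes whose ranges overlap it."""
    return set(range(node_for(low), node_for(high) + 1))

print(node_for(60))            # 1: salary = 60K is answered by one node
print(nodes_for_range(0, 20))  # {0}: salary < 20K is also one node
```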

An iPSC/2 Intel Hypercube The year is 1988! A 32-processor hypercube. Each node consists of:
- a processor (12 MHz),
- 2 MB of DRAM,
- a 333 MB disk,
- a hypercube interconnect supporting parallel transmission of messages among nodes.

Software Architecture Each node stores its fragment on its local disk drive. Each node may build a B+-tree (clustered or non-clustered) and a hash index on its fragment of a relation. Each node has its own concurrency control and crash recovery mechanism.

Software Architecture

Processes executing on one node share memory – identical to today’s threads! At initialization time, a node starts a fixed number of threads (processes). All threads listen on a well-defined socket, waiting for the Scheduler to dispatch work to them. A message contains the identity that the operator should assume:
- A “switch” statement enables a thread to become a select, project, hash-join build, hash-join probe, etc.
- The message specifies the role of the thread.
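The dispatch mechanism can be illustrated with a small sketch (hypothetical Python, using a queue in place of the well-defined socket and a dictionary lookup in place of the “switch” statement):

```python
import threading, queue

# One handler per operator role a thread can assume.
def do_select(arg):  print("select:", arg)
def do_project(arg): print("project:", arg)
def do_build(arg):   print("hash-join build:", arg)
def do_probe(arg):   print("hash-join probe:", arg)

ROLES = {"select": do_select, "project": do_project,
         "build": do_build, "probe": do_probe}

work = queue.Queue()   # stands in for the well-defined socket

def worker():
    while True:
        role, arg = work.get()   # wait for the Scheduler's message
        ROLES[role](arg)         # the thread "becomes" that operator
        work.task_done()

threading.Thread(target=worker, daemon=True).start()
work.put(("select", "salary < 20K"))
work.join()
```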

A Comparison of Range & Hash Closed simulation model:
- A client generates a range selection predicate: X < age < Y.
- The age attribute values are unique, ranging from 0 to 999,999 (1 million rows).
- A client does not generate a new request until its pending request is processed by Gamma and returned.
- The system is multi-programmed by increasing the number of clients: a multiprogramming level of 8 means there are 8 clients generating requests to the system (independent of one another).
[Figure: clients issuing requests to a 32-node Gamma configuration.]

A Comparison of Range & Hash (Cont…) A 0.01% selection predicate retrieves 100 rows:
- With a clustered B+-tree index, the 100 rows are grouped together in a few disk pages.
- With range partitioning, the predicate is processed by one node.
- With hash partitioning, the predicate is processed by all 32 nodes, with the scheduler coordinating the execution of the predicate on each node and gathering the results from every node.
[Figure: the 32 range partitions of age: 0-31,249; 31,250-62,499; …; 968,750-999,999.]

Declustering Techniques: Tradeoffs [Figure: throughput (queries/second) versus multiprogramming level for a range selection predicate using a clustered B+-tree, 0.01% selectivity (100 records, per the preceding slide); one curve for Range, one for Hash/Random/Round-robin.]

A Comparison of Range & Hash (Cont…) A 1% selection predicate retrieves 10,000 rows:
- With a clustered B+-tree index, the 10,000 rows are grouped together.
- With range partitioning, the predicate is processed using one or two nodes.
- With hash partitioning, the predicate is processed by all the nodes, with the scheduler coordinating the execution of the predicate.
[Figure: the 32 range partitions of age: 0-31,249; 31,250-62,499; …; 968,750-999,999.]

Tradeoffs (Cont…) [Figure: throughput (queries/second) versus multiprogramming level for a range selection predicate using a clustered B+-tree, 1% selectivity (10,000 records, per the preceding slide); one curve for Range, one for Hash/Random/Round-robin.]

Why Does Range Perform Poorly? Range performed poorly because each query (a 1% selection) imposed a high workload on a node; for a query with a minimal (0.01% selection) workload requirement, Range is ideal! Two reasons:
- Random generation of selection predicates does NOT mean a uniform distribution of workload across the nodes.
- The number of ranges is the same as the number of nodes, causing the tail-end servers to observe a lower load.

There are 3³ = 27 ways to assign 3 requests (R1, R2, R3) to the 3 nodes, but only 3! = 6 of them (the permutations of R1, R2, R3) place exactly one request on each node; in the remaining 21 cases at least one node receives two or more requests while another sits idle.
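A brute-force check of this count (a sketch; each of the three requests independently picks one of three nodes):

```python
from itertools import product

assignments = list(product(range(3), repeat=3))   # node chosen per request
uniform = [a for a in assignments if len(set(a)) == 3]

print(len(assignments))                  # 27 = 3**3
print(len(uniform))                      # 6  = 3!
print(len(uniform) / len(assignments))   # ~0.22 chance of perfect balance
```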

Tradeoffs (Cont…) Simple range partitioning may lead to load imbalance for queries with a high selectivity factor (those retrieving many rows):
- Low performance: increased response time and low system throughput.
Consider a table that maintains the grades of students for different exams, range partitioned on the grade.

Tradeoffs (Cont…) Assume a range predicate overlaps 3 partitions, e.g.,
- 0 < grade < 45,
- 45 < grade < 90.

Tradeoffs (Cont…) Higher response time: 2 nodes sit idle while 3 nodes process the query (assuming the overhead of parallelism is negligible). [Figure: the predicate 45 < grade < 90 spans nodes 3, 4, and 5 of the five grade partitions (boundaries at 20, 40, 60, and 80).]

Tradeoffs (Cont…) Lower throughput because node 3 becomes a bottleneck: assuming an even distribution of access to the ranges, when node 3 is utilized 100%, nodes 2 and 4 have a 66% utilization, while nodes 1 and 5 are utilized 33%.

Hybrid Range Partitioning [VLDB’90] To minimize the impact of load imbalance, construct more ranges than nodes, e.g., 10 ranges for a 5-node system:
- Predicates such as “0 < grade < 45” are now directed to all nodes.
- Assuming an even distribution of access to the ranges, where the workload consists of predicates utilizing 3 sequential ranges, when node 3 becomes 100% utilized, nodes 2 and 4 are now utilized 83%, while nodes 1 and 5 are utilized 66% (see the sketch below).
[Figure: grades 0-100 split into ten ranges with boundaries every 10, assigned round-robin to the five nodes.]
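The utilization figures on this and the previous slide can be reproduced with a short sketch (assuming every query spans 3 consecutive ranges and all start positions are equally likely; ranges are assigned round-robin, which reduces to one-range-per-node when the counts match):

```python
from collections import Counter

def utilization(num_ranges, num_nodes, span=3):
    node_of = [r % num_nodes for r in range(num_ranges)]   # round-robin
    hits = Counter()
    for start in range(num_ranges - span + 1):   # every possible query
        for r in range(start, start + span):
            hits[node_of[r]] += 1
    peak = max(hits.values())
    return {n: int(100 * hits[n] / peak) for n in sorted(hits)}

print(utilization(5, 5))    # {0: 33, 1: 66, 2: 100, 3: 66, 4: 33}
print(utilization(10, 5))   # {0: 66, 1: 83, 2: 100, 3: 83, 4: 66}
```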

Multi-Attribute Declustering [SIGMOD’92] Queries with minimal resource requirements should be directed to a few processors. Why? The overhead of parallelism:
1. Impacts query response time adversely,
2. Wastes system resources, reducing throughput.
OLTP has come a long way:
- The heaviest transaction in TPC-C reads approximately 400 records.
- Assuming no disk accesses, a low-end PC processes this transaction in under 1 ms.
- Transactions should be single-sited!

Multi-Attribute Declustering (E.g.) Recall the Emp(name, age, salary) table. The workload consists of two queries, each with a 50% frequency of occurrence:
- Query A, a range query referencing the age attribute; on average it retrieves 5 tuples: Retrieve Emp where age > 21 and age < 22.
- Query B, a range query referencing the salary attribute; on average it retrieves 10 tuples: Retrieve Emp where salary > 50K and salary < 50.5K.
Access methods:
- A non-clustered B+-tree index on age,
- A clustered B+-tree index on salary.
Ideally, both queries should be directed to one node.

Multi-Attribute Declustering (E.g. Cont...) Range decluster Emp using age as the partitioning attribute. Assuming a system configured with nine nodes, the number of employed nodes is:

  Query    Range    Ideal
  A        50% × 1  50% × 1
  B        50% × 9  50% × 1
  Average  5        1

MAGIC Construct a multi-attribute grid directory on the Emp table:
- Each dimension corresponds to a partitioning attribute,
- Each cell represents a fragment of the relation.
[Figure: a 3×3 grid with Age on one axis and Salary on the other; the nine cells (1-9) are the fragments, each assigned to one of nine nodes.]
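A toy grid-directory lookup, sketched under assumed band boundaries (the 3×3 grid and nine-node assignment mirror the figure; the specific bands are made up):

```python
AGE_BANDS    = [(0, 30), (31, 50), (51, 200)]        # grid rows
SALARY_BANDS = [(0, 50), (51, 100), (101, 10**6)]    # grid columns (in K)

# Cell (row, col) -> node: the nine cells map onto nine nodes.
NODE = [[3 * i + j for j in range(3)] for i in range(3)]

def overlapping(bands, lo, hi):
    return [k for k, (b_lo, b_hi) in enumerate(bands)
            if lo <= b_hi and hi >= b_lo]

def nodes_for(age=(0, 200), salary=(0, 10**6)):
    rows = overlapping(AGE_BANDS, *age)
    cols = overlapping(SALARY_BANDS, *salary)
    return sorted(NODE[i][j] for i in rows for j in cols)

print(nodes_for(age=(21, 22)))      # [0, 1, 2]: one grid row, 3 of 9 nodes
print(nodes_for(salary=(60, 61)))   # [1, 4, 7]: one grid column, 3 of 9
```

Either single-attribute query touches only one row or column of cells (3 of 9 nodes), matching the low-correlation table that follows.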

MAGIC (Low Correlation) Low correlation between salary and age attribute values:

  Query  MAGIC    Range    Ideal
  A      50% × 3  50% × 1  50% × 1
  B      50% × 3  50% × 9  50% × 1
  Avg    3        5        1

MAGIC (High Correlation) High correlation between salary and age attribute values:

  Query  MAGIC    Range    Ideal
  A      50% × 1  50% × 1  50% × 1
  B      50% × 1  50% × 9  50% × 1
  Avg    1        5        1
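The “Avg” rows in these tables are frequency-weighted means of the number of nodes each query touches; a quick check (worked from the tables above, no new data):

```python
def avg_nodes(plan):
    """plan: query -> (frequency, nodes touched)."""
    return sum(freq * nodes for freq, nodes in plan.values())

magic_low  = {"A": (0.5, 3), "B": (0.5, 3)}   # MAGIC, low correlation
magic_high = {"A": (0.5, 1), "B": (0.5, 1)}   # MAGIC, high correlation
range_age  = {"A": (0.5, 1), "B": (0.5, 9)}   # range partitioned on age

print(avg_nodes(magic_low), avg_nodes(magic_high), avg_nodes(range_age))
# 3.0 1.0 5.0
```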

BERD Range partition Emp using the salary attribute. For the age attribute, construct an auxiliary relation containing:
1. The age attribute value of each record,
2. The node containing that record.
Range partition the auxiliary relation using the age attribute value.

BERD Emp with salary as the primary partitioning attribute:

  Node 1 (0-50K):     Bob (20, 10K), Shideh (18, 35K)
  Node 2 (51K-100K):  Ted (50, 60K), Mike (45, 90K)
  Node 3 (101K-∞):    Kevin (62, 120K), Angela (55, 140K)

BERD, Auxiliary relation For each Emp record, the auxiliary relation stores the record's age value and the node holding the record:

  (age, node): (20, 1), (18, 1), (50, 2), (45, 2), (62, 3), (55, 3)

BERD, Auxiliary relation Range partition the auxiliary relation using the age attribute:

  Ages 0-20:  (18, 1), (20, 1)
  Ages 21-52: (45, 2), (50, 2)
  Ages 53-∞:  (55, 3), (62, 3)

BERD, Auxiliary relation Each node stores one Emp fragment (partitioned on salary) together with one fragment of the auxiliary relation (partitioned on age):

  Node 1: Emp salary 0-50K (Bob, Shideh) + Aux ages 0-20
  Node 2: Emp salary 51K-100K (Ted, Mike) + Aux ages 21-52
  Node 3: Emp salary 101K-∞ (Kevin, Angela) + Aux ages 53-∞
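Putting the pieces together, a minimal sketch of BERD routing over the example data (hypothetical Python; partition boundaries and values taken from the figures above):

```python
# Auxiliary relation fragments, range-partitioned on age; each tuple is
# (age, Emp node holding the record). Emp itself is partitioned on salary.
AUX = {0: [(18, 1), (20, 1)],    # ages 0-20
       1: [(45, 2), (50, 2)],    # ages 21-52
       2: [(55, 3), (62, 3)]}    # ages 53 and up

def aux_fragment(age):
    return 0 if age <= 20 else (1 if age <= 52 else 2)

def emp_nodes_for_age(lo, hi):
    """Consult the aux fragments covering [lo, hi]; return Emp nodes to visit."""
    fragments = range(aux_fragment(lo), aux_fragment(hi) + 1)
    return sorted({node for f in fragments
                        for age, node in AUX[f] if lo <= age <= hi})

print(emp_nodes_for_age(44, 51))   # [2]: with high age/salary correlation,
                                   # an age query lands on one Emp node
```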

BERD (Cont…) High correlation between age and salary attribute values:

  Query  BERD     Range    Ideal
  A      50% × 1  50% × 1  50% × 1
  B      50% × 1  50% × 9  50% × 1
  Avg    1        5        1

BERD (Cont…) Low correlation between age and salary attribute values:

  Query  BERD     Range    Ideal
  A      50% × 9  50% × 1  50% × 1
  B      50% × 1  50% × 9  50% × 1
  Avg    5        5        1

Is it possible to avoid the lookup in the auxiliary table?

Experimental environment A verified simulation model of the Gamma database machine. A 32-processor system. The database consists of a 100,000-tuple table based on the Wisconsin Benchmark.

Experimental Design Factors varied:
- Correlation between partitioning attribute values: Low, High.
- Workload characteristics (A, B): (Low, Low), (Low, Moderate), (Moderate, Low), (Moderate, Moderate).
- Multiprogramming level.

Low-Low Query Mix (Low Correlation) [Figure: throughput (queries/second) versus multiprogramming level.]

Low-Low Query Mix (High Correlation) [Figure: throughput (queries/second) versus multiprogramming level.]

Low-Moderate Mix (Low Correlation) [Figure: throughput (queries/second) versus multiprogramming level.]

Low-Moderate Mix (High Correlation) [Figure: throughput (queries/second) versus multiprogramming level.]

Moderate-Moderate Mix (Low Correlation) [Figure: throughput (queries/second) versus multiprogramming level.]

Moderate-Moderate Mix (High Correlation) [Figure: throughput (queries/second) versus multiprogramming level.]

Advantages of MAGIC Provides superior performance compared to BERD and Range. Constructs the grid directory using the workload of the relation, changing the shape of the grid directory to compensate for the different frequencies of access to the partitioning attributes. Minimizes the overhead of parallelism. Supports partial declustering of a relation in large systems.

Summary Given the fast speed of CPUs, each query/transaction should ideally be processed by one node.

Parallelism versus Efficient Servers Even if all queries and transactions become single-sited, parallelism is no substitute for smart algorithms that make a single server efficient. Why?

Why? Assume a single server that can process one request per second. Two choices:
1. Extend it with flash and obtain a throughput of 3 requests per second.
2. Buy two additional servers and partition the data across the 3 servers.
Given 3 simultaneous requests issued to each alternative:
- The single-processor system will process 3 requests per second.
- The 3-node system may not provide a throughput of 3 requests per second.

There are 3³ = 27 ways to assign 3 requests to the 3 nodes; only 3! = 6 of them are ideal, placing exactly one request on each node.

Brain Teaser Given N servers and M requests, compute:
- The probability of M/N requests per node.
- The number of ways M requests may map onto N servers, and the probability of each scenario.
Reward for correct answer:
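One route to an answer (a sketch of the counting argument, not the official solution): each of the M requests independently picks one of N servers, giving N^M equally likely assignments, and a perfectly balanced outcome is a multinomial count:

```python
from math import factorial

def p_balanced(n_servers, m_requests):
    """Probability that every server gets exactly M/N of the M requests."""
    assert m_requests % n_servers == 0
    k = m_requests // n_servers
    # multinomial(M; k, ..., k) = M! / (k!)**N balanced assignments
    ways = factorial(m_requests) // factorial(k) ** n_servers
    return ways / n_servers ** m_requests

print(p_balanced(3, 3))   # 6/27 ~ 0.222, matching the 27-ways example
print(p_balanced(3, 6))   # ~ 0.123: perfect balance gets rarer as M grows
```

More generally, a specific count vector (m1, …, mN) occurs in M!/(m1!·…·mN!) ways, each scenario with probability M!/(m1!·…·mN!)/N^M.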