Gamma DBMS (Part 2): Failure Management and Query Processing
Shahram Ghandeharizadeh
Computer Science Department
University of Southern California

Failure Management Techniques: Teradata's Interleaved Declustering
A partitioned table has a primary and a backup copy.
The primary copy is constructed using one of the partitioning techniques.
The secondary copy is constructed by:
 Dividing the nodes into clusters (cluster size 4 in this example), and
 Partitioning each primary fragment (say R0) across the remaining nodes of its cluster, 1, 2, and 3, realizing the backup subfragments r0.0, r0.1, and r0.2.
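A minimal sketch of this placement rule, assuming subfragment ri.k is assigned round-robin to the nodes that follow node i within its cluster (function and variable names are my own illustration, not Teradata's; the mapping matches the r0.* and r1.* assignments on these slides):

```python
# Interleaved declustering placement sketch (illustrative only).
# With cluster size C, the primary fragment R_i of node i is split into
# C-1 backup subfragments spread over the other nodes of i's cluster.

def backup_placement(node, cluster_size):
    """Map the backup subfragments of R_node to the other nodes of its cluster."""
    base = (node // cluster_size) * cluster_size   # first node of the cluster
    return {
        f"r{node}.{k}": base + (node - base + 1 + k) % cluster_size
        for k in range(cluster_size - 1)
    }

print(backup_placement(0, 4))  # {'r0.0': 1, 'r0.1': 2, 'r0.2': 3}
print(backup_placement(1, 4))  # {'r1.0': 2, 'r1.1': 3, 'r1.2': 0}
```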

Teradata's Interleaved Declustering
When a node (say node 1) fails, its backup subfragments process the requests directed at the primary copy of R1.
 The three backup subfragments r1.2, r1.0, and r1.1 reside on nodes 0, 2, and 3.
Note that the load of R1 is distributed across the remaining nodes of the cluster.

Teradata's Interleaved Declustering
MTTR involves:
1. Replacing the failed node with a new one.
2. Reconstructing the primary copy of the fragment assigned to the failed node, R1, by reading r1.2, r1.0, and r1.1 from nodes 0, 2, and 3.
3. Reconstructing the backup subfragments assigned to the failed node: r0.0, r2.2, and r3.1.

Teradata's Interleaved Declustering
When does data become unavailable?
 When a second node in a cluster fails prior to the repair of the first failed node in that cluster.
 Note that the full analysis is a bit more complex than the discussion here.

Teradata's Interleaved Declustering
What is the advantage of making the cluster size equal to 8?
 Better distribution of the workload across the nodes in the presence of a failure.
What is the disadvantage of making the cluster size equal to 8?
 A higher likelihood of data becoming unavailable.
This is a tradeoff between load balancing (in the presence of a failure) and data availability.

Gamma's Chained Declustering
Nodes are divided into disjoint groups called relation clusters.
A relation is assigned to one relation cluster, and its records are declustered across the nodes of that relation cluster using a partitioning strategy (range, hash).
Given a primary fragment Ri, its backup copy ri is assigned to node (i+1) mod M, where M is the number of nodes in the relation cluster.
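A minimal sketch of this assignment rule (my own illustration, not Gamma's code):

```python
# Chained declustering placement: with M nodes in a relation cluster,
# primary fragment R_i lives on node i and its backup r_i on the next
# node in the chain, (i + 1) mod M.

M = 8  # assumed relation-cluster size

def backup_node(i, m=M):
    return (i + 1) % m

for i in range(M):
    print(f"R{i}: primary on node {i}, backup r{i} on node {backup_node(i)}")
```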

Gamma's Chained Declustering
During the normal mode of operation:
 Read requests are directed to the fragments of the primary copy,
 Write requests update both the primary and backup copies.

Gamma's Chained Declustering
In the presence of a failure:
 Both primary and backup fragments are used for read operations,
 Objective: balance the load and avoid bottlenecks!
 Write requests update both the primary and backup copies.
Note:
 The load of R1 (on node 1) is pushed to node 2 in its entirety.
 A fraction of the read requests of each node is pushed to the others, yielding a 1/8 load increase attributed to node 1's failure (see the sketch below).
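A sketch of one balanced assignment (my own illustration; it assumes unit read load per fragment and a relation cluster of M = 9 nodes, so that each survivor's extra load of 1/(M-1) works out to the 1/8 increase quoted above):

```python
# After node `failed` fails, node failed+1 serves all of R_failed via its
# backup copy, keeps only 1/(M-1) of its own primary fragment, and pushes
# the rest to the next backup down the chain; in general node failed+k
# keeps k/(M-1) of its primary. Every survivor then carries M/(M-1) units.

def keep_fractions(m, failed):
    """Fraction of its own primary fragment each surviving node keeps serving."""
    return {(failed + k) % m: k / (m - 1) for k in range(1, m)}

m = 9
keep = keep_fractions(m, failed=1)
# Node 2: all of R1 (1.0) + 1/8 of R2 = 9/8.
# Node 3: the offloaded 7/8 of R2 + 2/8 of R3 = 9/8, and so on.
assert abs((1 + keep[2]) - m / (m - 1)) < 1e-9
```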

Gamma's Chained Declustering
MTTR involves:
 Replacing node 1 with a new node,
 Reconstructing R1 (from r1 on node 2) on the new node,
 Reconstructing the backup copy of R0 (i.e., r0) on the new node.
Note:
 Once node 1 becomes operational, primary copies are again used to process read requests.

Gamma's Chained Declustering
Not every two-node failure in a relation cluster results in data unavailability: two adjacent nodes must fail in order for data to become unavailable.
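A quick check of this property (illustrative code, my own):

```python
# Under chained declustering, node (i+1) mod M holds the only backup of
# R_i, so data is lost exactly when two adjacent nodes fail together.

def data_available(failed_nodes, m):
    return not any((f + 1) % m in failed_nodes for f in failed_nodes)

print(data_available({1, 3}, m=8))  # True: non-adjacent failures are safe
print(data_available({1, 2}, m=8))  # False: R1's primary and backup are both lost
```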

Gamma's Chained Declustering
Re-assignment of the active fragments incurs neither disk I/O nor data movement.

Join
 Hash-join
 A data-flow execution paradigm

Example Join of Emp and Dept
Emp join Dept (using dno):

Emp:
  SS#  Name    Age  Salary  dno
  1    Joe     ...  ...     2
  2    Mary    ...  ...     1
  3    Bob     ...  ...     1
  4    Kathy   ...  ...     2
  5    Shideh  4    4000    1

Dept:
  dno  dname  floor  mgrss#
  1    Toy    1      5
  2    Shoe   2      1

Emp join Dept:
  SS#  Name    Age  Salary  dno  dname  floor  mgrss#
  1    Joe     ...  ...     2    Shoe   2      1
  2    Mary    ...  ...     1    Toy    1      5
  3    Bob     ...  ...     1    Toy    1      5
  4    Kathy   ...  ...     2    Shoe   2      1
  5    Shideh  4    4000    1    Toy    1      5

(Ellipses mark Age and Salary values that are not recoverable from the transcript.)

Hash-Join: 1 Node
A join of tables A and B using attribute j (A.j = B.j) consists of two phases:
1. Build phase: build a main-memory hash table on table A using the join attribute j, e.g., build a hash table on the Dept table using dno as the key of the hash table.
2. Probe phase: scan table B one record at a time and use its attribute j to probe the hash table constructed on table A, e.g., probe the hash table using the rows of the Emp table.
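A minimal single-node sketch of the two phases in Python (an illustration, not Gamma's implementation; the Age and Salary columns are omitted for brevity):

```python
# Build phase: hash the inner table (Dept) on the join attribute dno.
dept = [(1, "Toy", 1, 5), (2, "Shoe", 2, 1)]   # (dno, dname, floor, mgrss#)
emp = [(1, "Joe", 2), (2, "Mary", 1), (3, "Bob", 1),
       (4, "Kathy", 2), (5, "Shideh", 1)]      # (SS#, Name, dno)

hash_table = {}
for row in dept:
    hash_table.setdefault(row[0], []).append(row)   # key on dno

# Probe phase: scan the outer table (Emp) one record at a time and emit
# a joined record for every match on dno.
result = [e + d for e in emp for d in hash_table.get(e[2], [])]
for row in result:
    print(row)
```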

Hash-Join: Build
Read the rows of the Dept table one at a time and place them in a main-memory hash table, hashing each row on dno % 7.
(Figure: a hash table with buckets indexed by dno % 7, holding (1, Toy, 1, 5) and (2, Shoe, 2, 1).)

Hash-Join: Probe
Read the rows of the Emp table one at a time and probe the hash table.
(Figure: the first Emp row, Joe's, probes the hash table holding (1, Toy, 1, 5) and (2, Shoe, 2, 1).)

Hash-Join: Probe
Read the rows of the Emp table, probe the hash table, and produce a result record when a match is found.
(Figure: Joe's row matches (2, Shoe, 2, 1), producing the joined record (1, Joe, ..., 2, Shoe, 2, 1).)

Hash-Join: Probe
The probe phase terminates when all rows of the Emp table have been processed!

Hash-Join
Key challenge: the table used to build the hash table does not fit in main memory!
Solution: a divide-and-conquer approach (a code sketch follows this list):
 Use the inner table (Dept) to construct n memory buckets, where each bucket is a hash table.
 Every time memory is exhausted, spill a fixed number of buckets to disk.
 The build phase terminates with a set of in-memory buckets and a set of disk-resident buckets.
 Read the outer relation (Emp) and probe the in-memory buckets for joining records. Records that map onto the disk-resident buckets are streamed and stored to disk.
 Discard the in-memory buckets to free memory space.
 While disk-resident buckets of the inner relation exist:
  Read as many (say i) of the disk-resident buckets of the inner relation into memory as possible.
  Read the corresponding buckets of the outer relation (Emp) to probe the in-memory buckets for joining records.
  Discard the in-memory buckets to free memory space.
  Delete the i buckets of the inner and outer relations.
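A simplified sketch of this bucket-spilling join (my own illustration: "disk" is simulated with dictionaries so that the control flow, rather than the I/O machinery, is what the code shows):

```python
def bucketed_hash_join(inner, outer, key_inner, key_outer,
                       n_buckets, resident):
    """Join two lists of tuples, keeping only `resident` of the inner
    table's n_buckets partitions in memory during the first pass."""
    in_mem = set(range(resident))

    # Build: partition the inner table; spill non-resident buckets to "disk".
    memory = {b: {} for b in in_mem}
    disk_inner = {b: [] for b in range(resident, n_buckets)}
    for row in inner:
        b = hash(key_inner(row)) % n_buckets
        if b in in_mem:
            memory[b].setdefault(key_inner(row), []).append(row)
        else:
            disk_inner[b].append(row)

    # Probe: join against resident buckets; spill the rest of the outer table.
    result = []
    disk_outer = {b: [] for b in disk_inner}
    for row in outer:
        b = hash(key_outer(row)) % n_buckets
        if b in in_mem:
            result += [r + row for r in memory[b].get(key_outer(row), [])]
        else:
            disk_outer[b].append(row)

    # While loop: bring each disk-resident inner bucket back into memory
    # and probe it with the corresponding outer bucket.
    for b, rows in disk_inner.items():
        table = {}
        for row in rows:
            table.setdefault(key_inner(row), []).append(row)
        for row in disk_outer[b]:
            result += [r + row for r in table.get(key_outer(row), [])]
    return result

# Dept as the inner table, Emp as the outer; dno is attribute 0 of a
# Dept record and attribute 2 of an Emp record in the running example.
dept = [(1, "Toy", 1, 5), (2, "Shoe", 2, 1)]
emp = [(1, "Joe", 2), (2, "Mary", 1), (3, "Bob", 1),
       (4, "Kathy", 2), (5, "Shideh", 1)]
for row in bucketed_hash_join(dept, emp, lambda d: d[0], lambda e: e[2],
                              n_buckets=2, resident=1):
    print(row)
```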

Hash-Join: Build
Two buckets of the Dept table: one in memory, and the second is disk-resident.
(Figure: the dno % 7 hash table holds (1, Toy, 1, 5) in memory, while (2, Shoe, 2, 1) resides on disk.)

Hash-Join: Probe
Read the Emp table and probe the hash table for joining records when dno = 1. Rows with dno = 2 are streamed to disk.
(Figure: all five Emp rows arrive; Joe's and Kathy's rows, with dno = 2, spill to the disk-resident bucket.)

Hash-Join: Probe
The rows of the Emp table with dno = 1 probe the hash table and produce 3 joining records.
(Figure: Joe's and Kathy's rows remain in the disk-resident Emp bucket.)

Hash-Join: While Loop
Read the disk-resident bucket of the Dept table into memory.
(Figure: the bucket holding (2, Shoe, 2, 1) is now in the dno % 7 hash table; Joe's and Kathy's rows wait in the disk-resident Emp bucket.)

Hash-Join: While Loop
Read the disk-resident bucket of the Dept table into memory. Probe it with the disk-resident bucket of the Emp table to produce the remaining two joining records.
(Figure: Joe's and Kathy's rows join with (2, Shoe, 2, 1).)

Parallelism and Hash-Join
Each node may perform the hash-join independently when:
 The join attribute is the declustering attribute of the tables participating in the join operation, and
 The participating tables are declustered across the same number of nodes using the same declustering strategy.
 Even then, the system may re-partition a table (see the next bullet) if its size exceeds the aggregate memory of the nodes it is declustered across.
Otherwise, the data must be re-partitioned to perform the join operation correctly; see the sketch after this slide.
Show an example!
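A small sketch of the local-join test (metadata fields and names are my own, not Gamma's catalog schema):

```python
# A node-local hash join is correct only if both tables are declustered
# on the join attribute, with the same strategy, over the same nodes;
# otherwise tuples with equal join values may sit on different nodes.

def can_join_locally(r_meta, s_meta, join_attr):
    return (r_meta["attr"] == join_attr and s_meta["attr"] == join_attr
            and r_meta["strategy"] == s_meta["strategy"]
            and r_meta["nodes"] == s_meta["nodes"])

r = {"attr": "dno", "strategy": "hash", "nodes": tuple(range(8))}
s = {"attr": "dno", "strategy": "hash", "nodes": tuple(range(8))}
print(can_join_locally(r, s, "dno"))   # True: no data movement needed
print(can_join_locally(r, s, "SS#"))   # False: must re-partition on SS#
```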

Parallelism and Hash-Join (Cont…)
R join S, where R is the inner table.

Data Flow Execution Paradigm
Retrieve all those employees working for the Toy department:

SELECT *
FROM Dept d, Emp e
WHERE d.dno = e.dno AND d.dname = 'Toy';

Data Flow Execution Paradigm
A producer/consumer relationship in which the consumers are activated in advance of the producers.

Data Flow Execution Paradigm
A "split table" contains routing information for the records.
The consumers must be set up in order to activate the producers.
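A minimal sketch of a split table (illustrative names, not Gamma's actual data structure): an operator applies it to each record it produces to decide which consumer process receives that record.

```python
# Route each output record to a consumer by hashing its join attribute;
# this is how a producer's output is partitioned across the processes
# executing the next operator in the data-flow graph.

def make_split_table(attr_index, consumers):
    def route(record):
        return consumers[hash(record[attr_index]) % len(consumers)]
    return route

# Example: a selection on Dept routes its output rows to two join
# processes by hashing on dno (attribute 0 of a Dept record).
route = make_split_table(0, ["join-process-A", "join-process-B"])
print(route((1, "Toy", 1, 5)))
print(route((2, "Shoe", 2, 1)))
```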