Parallel Databases



Introduction
• Basic idea: use multiple disks, memories, and/or processors to speed up querying.
• Measures
  – Throughput – how many tasks can be completed in a given unit of time.
  – Response time – how long it takes to complete one task.
• Using parallelism to reduce response time is called speedup.
• Using parallelism to increase throughput is called scale up.

Problems
• Ideally, we would like linear scale up/speedup. This is not usually the case.
• Why?
  – Start-up costs
  – Interference – different processors need the same resource.
  – Communication costs
  – Some parts may not be parallelizable.
  – Skew – the problem is unlikely to break into equal-sized parts.
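The effect of the parts that cannot be parallelized can be estimated with Amdahl's law. The slide does not name this law; it is brought in here as the standard bound. The function name `amdahl_speedup` and the 90% figure are illustrative assumptions, not values from the presentation.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Illustrative numbers only: 90% of the work parallelizes, 8 processors.
print(round(amdahl_speedup(0.9, 8), 2))  # 4.71
# With no serial part at all, the speedup would be linear.
print(amdahl_speedup(1.0, 8))            # 8.0
```

Even a small serial fraction keeps the speedup well below linear, before start-up, interference, and communication costs are counted.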

Skew Example
• Suppose I have 8 processors to do a query. Ideally, I should be able to do it in 1/8 of the time.
• Now suppose the data is distributed this way:
  – P1: 5%
  – P2: 10%
  – P3: 10%
  – P4: 5%
  – P5: 10%
  – P6: 10%
  – P7: 25%
  – P8: 25%
• P7 and P8 each hold 25% of the data, so the query takes at least 1/4 of the time rather than 1/8.
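The arithmetic behind this example can be sketched as follows. It assumes that each processor's work is proportional to its share of the data and that there are no other overheads; the function name `skewed_speedup` is hypothetical.

```python
# Share of the data held by each of the 8 processors (from the slide).
shares = {"P1": 0.05, "P2": 0.10, "P3": 0.10, "P4": 0.05,
          "P5": 0.10, "P6": 0.10, "P7": 0.25, "P8": 0.25}


def skewed_speedup(shares):
    """Speedup when elapsed time is set by the most heavily loaded processor."""
    return 1.0 / max(shares.values())


print(len(shares))             # 8, the ideal speedup with no skew
print(skewed_speedup(shares))  # 4.0, because P7 and P8 each hold 1/4 of the data
```

The query finishes only when the slowest processor finishes, so half of the potential speedup is lost to skew in this distribution.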

What Can Be Shared?
• Shared Memory
  – Advantages: dynamic partitioning (any process may be allocated some or all of the available memory); cheaper than each processor having its own memory; lower communication cost between processors.
  – Disadvantages: memory can become a bottleneck; scalability is a problem.

Sharing Continued
• Shared Disk
  – Advantages: data need not be replicated, so no synchronization is required; better scalability; fault tolerance may be built into the system.
  – Disadvantages: single point of failure; communication cost is greater.

Sharing III
• Shared Nothing (really a type of distributed DB)
  – Advantages: completely parallel solution; fewer bottlenecks; multiple points of failure (no single failure takes down the whole system); scalability.
  – Disadvantages: cost (for the bean counters); communication costs are greater; multiple points of failure (more components that can fail).

Sharing IV
• Hierarchical
  – Advantages: gains the advantages of both speed and scalability.
  – Disadvantages: how to partition?

Disk Partitioning
• See Wikipedia: Standard RAID levels

Disk Partitioning for DB Usage
• Round Robin Partitioning – like RAID 5.
• Range Partitioning – all tuples with a column value within some range go to the same partition.
• Hash Partitioning – all tuples whose column value hashes to the same value go to the same partition.
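The three schemes above can be sketched as functions that map a tuple to a partition number. This is a minimal illustration, not the presentation's own code: the function names, the cutoff list, and the use of Python's built-in `hash` are all assumptions made for the example.

```python
import bisect


def round_robin_partition(i: int, n: int) -> int:
    """The i-th tuple inserted goes to partition i mod n, whatever its values."""
    return i % n


def range_partition(value, cutoffs) -> int:
    """cutoffs is a sorted list of inclusive upper bounds.

    Values above the last cutoff go to the final partition, so
    len(cutoffs) + 1 partitions are used in total.
    """
    return bisect.bisect_left(cutoffs, value)


def hash_partition(value, n: int) -> int:
    """Equal values always land in the same partition.

    Assumption: Python's built-in hash() stands in for the DBMS hash function.
    It is stable within one process, but string hashes vary between runs.
    """
    return hash(value) % n


# Hypothetical department numbers, partitioned three ways over 3 partitions.
dnos = [1, 4, 5, 5, 12, 27]
print([round_robin_partition(i, 3) for i in range(len(dnos))])
print([range_partition(d, [4, 10]) for d in dnos])
print([hash_partition(d, 3) for d in dnos])
```

Round robin spreads tuples evenly but gives no clue where a given value lives; range and hash partitioning both let a lookup on the partitioning column go to a single partition.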

Usage
• Which is best for:
  – Simple selects – unique match
  – Simple selects – non-unique match
  – Range queries
  – Print unsorted
  – Print sorted

Skew In This Context
• Attribute-Value Skew – many tuples have the same value for the partitioning column.
• Partition Skew – some partitions end up with more tuples, even though the tuples have different values.
  – Change the ranges – use a histogram to better predict cut-offs.
• Time-Value Skew – a partitioning that starts out well balanced acquires skew over time.
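One way to "change the ranges" is to pick cut-offs from an equi-depth histogram, so that each range holds about the same number of tuples. The sketch below is an assumption about how that might be done, not a method given in the slides; `equi_depth_cutoffs` and the sample values are hypothetical.

```python
def equi_depth_cutoffs(values, n_partitions):
    """Return n_partitions - 1 inclusive upper bounds for range partitioning.

    Each range is chosen to hold roughly the same number of the given values.
    """
    if len(values) < n_partitions:
        raise ValueError("need at least one value per partition")
    ordered = sorted(values)
    step = len(ordered) / n_partitions
    return [ordered[int(step * k) - 1] for k in range(1, n_partitions)]


# Made-up column values that are dense at the low end and sparse at the high end.
sample = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
print(equi_depth_cutoffs(sample, 4))  # [2, 8, 34]
```

Equal-width ranges over 1 to 144 would put most of these tuples in the first partition, while the cut-offs above give three tuples per partition. This helps with partition skew only: tuples that share one value cannot be split across ranges, so attribute-value skew remains.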

Parallel Joins
• R ⨝ (A=B) S
  – Range partition R on A and S on B. Send matching ranges to the same processor.
  – Hash partitioning would also work.
• R ⨝ (A<B) S
  – Partition R and replicate S.
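The equijoin case can be sketched as a partitioned hash join. This is a simplified illustration under stated assumptions: the partitions are processed one after another in a single process (each pair could be handed to its own processor), Python's built-in `hash` stands in for the partitioning function, and the sample rows and function names are made up.

```python
from collections import defaultdict


def partition(rows, key, n):
    """Hash partition rows on the given attribute into n lists."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_hash_join(r_part, s_part, a, b):
    """Join one pair of partitions: build a hash table on S, then probe with R."""
    index = defaultdict(list)
    for s in s_part:
        index[s[b]].append(s)
    return [{**r, **s} for r in r_part for s in index.get(r[a], [])]


def partitioned_equijoin(r, s, a, b, n):
    """R join S on R.a = S.b, using the same hash function for both relations."""
    result = []
    for r_part, s_part in zip(partition(r, a, n), partition(s, b, n)):
        # Matching tuples share a partition number, so pairs are independent.
        result.extend(local_hash_join(r_part, s_part, a, b))
    return result


# Hypothetical rows, for illustration only.
emp = [{"LN": "Adams", "Dno": 5}, {"LN": "Baker", "Dno": 4}, {"LN": "Clark", "Dno": 1}]
dept = [{"D#": 5, "Dname": "Research"}, {"D#": 4, "Dname": "Admin"}]
for row in partitioned_equijoin(emp, dept, "Dno", "D#", 2):
    print(row["LN"], row["Dname"])
```

This works for A=B because equal values always hash (or range) to the same partition. For A<B no such guarantee exists, which is why the slide partitions R and replicates S to every processor instead.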

Example
• Emp(Fn, Minit, LN, SSN, Bdate, Addr, Sex, Salary, SuperSSN, Dno)
  – r = 100,000 records
  – bf = 5 records/block
  – b = 20,000 blocks
• Dept(D#, Dname, MGRSSN, MgrStartDate)
  – r = 1250 records
  – bf = 10 records/block
  – b = 125 blocks

Example Query
• I want to perform Emp ⨝ (Dno=D#) Dept.
• How can I parallelize this, and how much can I save?
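One rough way to approach the question is to count the blocks each processor must read. The sketch below rests on assumptions that the slides do not state: both relations are already hash partitioned evenly on the join attribute, each processor reads its own fragments exactly once, and start-up, communication, and skew are ignored. The function name is hypothetical.

```python
import math

EMP_BLOCKS = 20_000   # b for Emp, from the example
DEPT_BLOCKS = 125     # b for Dept, from the example


def blocks_per_processor(p: int) -> int:
    """Blocks one processor reads when both relations are split evenly p ways."""
    return math.ceil(EMP_BLOCKS / p) + math.ceil(DEPT_BLOCKS / p)


serial_blocks = EMP_BLOCKS + DEPT_BLOCKS  # one pass over each relation
for p in (1, 2, 4, 8):
    per_proc = blocks_per_processor(p)
    print(p, per_proc, round(serial_blocks / per_proc, 2))
```

Under these assumptions the block reads per processor shrink almost linearly with the number of processors. The real saving is smaller once the costs listed under Problems are included.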