Akshay Tomar Prateek Singh Lohchubh


Parallel Databases
Akshay Tomar 2012013, Prateek Singh Lohchubh 2012077

Motivation
Innovation in specialised hardware was not making much progress, and it is difficult to build a single machine powerful enough to meet the CPU and I/O needs of large relational databases.
Parallel databases exploit two forms of parallelism:
- Pipelined parallelism: streaming the output of one operator into the input of another operator.
- Partitioned parallelism: splitting an operator into many independent operators, each working on a part of the data.
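The two forms of parallelism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation `ROWS`, the operators `scan` and `select_high_salary`, and the salary threshold are all hypothetical. The generators model the data flow of a pipeline inside one process, whereas a real parallel DBMS would run each operator or partition on its own processor.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy relation of (employee_id, salary) tuples.
ROWS = [(i, 1000 + 10 * i) for i in range(20)]

def scan(rows):
    """Scan operator: yields one tuple at a time."""
    for row in rows:
        yield row

def select_high_salary(rows, threshold=1100):
    """Select operator: consumes its input tuple by tuple as it arrives."""
    for row in rows:
        if row[1] >= threshold:
            yield row

def pipelined(rows):
    # Pipelined parallelism: the output of scan streams straight into select,
    # with no intermediate result materialised in between.
    return list(select_high_salary(scan(rows)))

def partitioned(rows, workers=4):
    # Partitioned parallelism: split the input into parts and run the same
    # operator independently on each part, then merge the partial results.
    parts = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: list(select_high_salary(part)), parts)
    merged = [row for part in results for row in part]
    return sorted(merged)

if __name__ == "__main__":
    print(pipelined(ROWS) == partitioned(ROWS))
```

Both functions return the same set of tuples; only the way the work is organised differs.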

Introduction
Relational queries are well suited to parallel execution because they consist of uniform operations applied to uniform streams of data.
Parallelism goals: speedup and scaleup. The aim is to achieve linear or super-linear speedup and scaleup.
Threats to linear speedup and scaleup: startup, interference, and skew.
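The two goals can be made concrete with the usual definitions from the DeWitt and Gray paper listed in the references: speedup compares the same problem on a small and a big system, while scaleup compares a small problem on a small system with a proportionally bigger problem on a bigger system. The sketch below assumes elapsed times in seconds; the sample numbers are hypothetical, not measurements.

```python
def speedup(small_system_time, big_system_time):
    """Same problem, more hardware: ideal (linear) value is N for N times the hardware."""
    return small_system_time / big_system_time

def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """N times the problem on N times the hardware: ideal (linear) value is 1."""
    return small_system_small_problem_time / big_system_big_problem_time

if __name__ == "__main__":
    # Hypothetical timings: 100 s on 1 node, 25 s on 4 nodes.
    print(speedup(100.0, 25.0))
    # Hypothetical startup overhead of 5 s on the 4-node system.
    print(speedup(100.0, 30.0))
    # Hypothetical 4x larger problem on the 4-node system, same elapsed time.
    print(scaleup(100.0, 100.0))
```

Startup cost, interference between processes, and skew in the partition sizes all show up as a speedup below N or a scaleup below 1.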

Basics of Design Type
Major design types: shared-memory, shared-disk, and shared-nothing.
Parallel DBs are based on the shared-nothing hardware design, in which processors communicate only by sending messages over an interconnection network.
This design causes little interference and gives near-linear speedup and scaleup on complex relational queries.
It offers large possibilities for scaleup (up to hundreds and probably thousands of processors).

Query Optimizer
The optimizer computes a cost for each execution plan, taking into account I/O, CPU, and communication, and attempts to generate the best execution plan for a SQL statement. The best execution plan is defined as the plan with the lowest cost among all considered candidate plans.
Execution plans depend on:
- How the query is written
- Size of the data set
- Layout of the data
- Access structures of the DB
The number of candidate execution plans grows with the number of objects in the FROM clause. Each additional object adds joins, so the number of plans increases exponentially rather than linearly.
Plans are generated for query blocks from the bottom up, i.e. the last/innermost query block is optimized first.
Optimizer operations when choosing an execution plan:
- Evaluation of expressions and conditions: the optimizer first evaluates expressions and conditions containing constants as fully as possible.
- Statement transformation: for complex statements involving, for example, correlated subqueries or views, the optimizer might transform the original statement into an equivalent join statement.
- Choice of optimizer goals: throughput or response time.
- Choice of access paths: full scan, index scan, etc.
- Choice of join orders: which pair of tables is joined first, which table is joined to the result, and so on.
Optimizer components:
- Query Transformer: determines whether it is helpful to change the form of the query so that the optimizer can generate a better execution plan.
- Estimator: uses statistics to compute the cost of each plan.
- Plan Generator: compares the costs of the plans and selects the one with the lowest cost.
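The plan generator's job of comparing candidate plans and keeping the cheapest one can be sketched as follows. Everything concrete here is an assumption made for illustration: the table names and row counts in `TABLE_ROWS`, the single uniform `JOIN_SELECTIVITY`, and the toy cost function that sums intermediate result sizes. Exhaustive enumeration is used only to show why the number of join orders grows so quickly; it is not meant to describe how any particular optimizer searches.

```python
from itertools import permutations
from math import factorial

# Hypothetical statistics: estimated row count per table.
TABLE_ROWS = {"orders": 10_000, "customers": 1_000, "regions": 10}
# Hypothetical uniform selectivity applied to every join.
JOIN_SELECTIVITY = 0.001

def plan_cost(join_order):
    """Toy cost of a left-deep join order: total size of intermediate results."""
    rows = TABLE_ROWS[join_order[0]]
    cost = 0.0
    for table in join_order[1:]:
        rows = rows * TABLE_ROWS[table] * JOIN_SELECTIVITY
        cost += rows
    return cost

def best_plan(tables):
    """Enumerate every join order and return (cheapest order, number considered)."""
    candidates = list(permutations(tables))
    return min(candidates, key=plan_cost), len(candidates)

if __name__ == "__main__":
    order, considered = best_plan(("orders", "customers", "regions"))
    print(order, plan_cost(order), considered)
    # Number of join orders for 3, 5, and 10 tables.
    print([factorial(n) for n in (3, 5, 10)])
```

Under these assumed statistics the cheapest orders join the two small tables first and the large table last, which keeps the intermediate results small.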

Estimator & Plan Generator
- Selectivity: the fraction of rows in the row set that the query selects, with 0 meaning no rows and 1 meaning all rows. Selectivity is tied to a query predicate, such as WHERE last_name LIKE 'A%', or to a combination of predicates.
- Cardinality: the number of rows returned by each operation in an execution plan. It is computed taking into account filters, joins, and operations such as DISTINCT or GROUP BY.
- Cost: represents units of work or resource used. The query optimizer uses disk I/O, CPU usage, and memory usage as units of work.
- Plan Generator: explores various plans for a query block by trying out different access paths, join methods, and join orders.
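A small worked example of the two estimator measures, using the predicate last_name LIKE 'A%' from the slide. The sample column `LAST_NAMES` and the table size of 10,000 rows are hypothetical, and the sketch computes selectivity directly from a sample, whereas a real estimator would typically rely on stored statistics.

```python
# Hypothetical sample of values from a last_name column.
LAST_NAMES = ["Adams", "Allen", "Baker", "Clark", "Abel", "Davis", "Evans", "Austin"]

def selectivity(values, predicate):
    """Fraction of values satisfying the predicate: 0 is no rows, 1 is all rows."""
    if not values:
        return 0.0
    matching = sum(1 for value in values if predicate(value))
    return matching / len(values)

def estimated_cardinality(table_rows, predicate_selectivity):
    """Estimated number of rows an operation returns after applying the predicate."""
    return round(table_rows * predicate_selectivity)

def starts_with_a(name):
    """Python equivalent of the SQL predicate last_name LIKE 'A%'."""
    return name.startswith("A")

if __name__ == "__main__":
    sel = selectivity(LAST_NAMES, starts_with_a)
    print(sel)
    print(estimated_cardinality(10_000, sel))
```

Four of the eight sample names start with "A", so the selectivity is 0.5 and the estimated cardinality for a 10,000-row table is 5,000 rows.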

Parallel Query Optimization
Parallel query optimization is an extension of the serial optimizer. It analyzes the cost of parallel access methods for each combination of join orders, join types, and indexes.
Selections, joins, and data searches benefit from parallel optimization.
Parallel execution is possible because in an RDBMS every query results in a new relation, so the task can be broken into multiple parts.

References
http://pages.cs.wisc.edu/~cs764-1/paralleldb.pdf
http://www.cs.berkeley.edu/~brewer/cs262/5-dewittgray92.pdf
http://dl.acm.org/citation.cfm?id=2353340.2353721

Thank You