Querying Large Databases Rukmini Kaushik. Purpose Research for efficient algorithms and software architectures of query engines.

Presentation transcript:

Querying Large Databases Rukmini Kaushik

Purpose
- Research into efficient algorithms and software architectures for query engines.

Query Execution Engine Architecture
- Query processing algorithms: physical algebra
- Data model: logical algebra

Sorting & Hashing
- Both are memory intensive.
- Memory concerns
  - Merge efficiency and memory management
  - Hash table overflow
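
The merge-efficiency concern can be illustrated with a small sketch of an external merge sort. This is not taken from the slides: the function names, `memory_rows` (how many rows fit in memory at once) and `fan_in` (how many runs are merged at a time) are assumptions made for illustration, and plain Python lists stand in for runs on disk.

```python
import heapq


def make_runs(values, memory_rows):
    """Cut the input into sorted runs that each fit in the assumed memory budget."""
    return [sorted(values[i:i + memory_rows])
            for i in range(0, len(values), memory_rows)]


def merge_runs(runs, fan_in):
    """Merge at most fan_in runs at a time; return (sorted rows, number of merge passes)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while len(runs) > 1:
        # One merge pass: every group of fan_in runs becomes a single longer run.
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With less memory the fan-in shrinks and more merge passes are needed, which is one way to see why memory management matters for sorting.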

Aggregation and Duplicate Removal
- Aggregation concept: describes a set of objects with one value.
- Algorithms, three types
  - Nested loops
  - Sorting
  - Hashing

Aggregation & Duplicate Removal
- Nested loops
  - Easiest of the three
  - Doesn’t work well for large inputs
- Sorting
  - Sorting brings equal elements together, which makes duplicate removal simple.
  - Duplicates should be removed as early as possible.

Aggregation & Duplicate Removal
- Hashing
  - Hash on the grouping attributes.
  - Duplicates can be removed while the hash table is being built.
- Algorithm analysis
  - For sorting and hashing, the number of passes over the data (merge levels or partitioning levels) grows logarithmically with input size.
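
A minimal in-memory sketch of the hashing and sorting approaches, assuming rows are small Python values that fit in memory. The function names and the choice of `sum` as the aggregate are illustrative assumptions, not taken from the slides.

```python
def hash_aggregate(rows, group_key, value_key):
    """Hash on the grouping attribute and keep one running sum per group."""
    totals = {}
    for row in rows:
        key = row[group_key]
        totals[key] = totals.get(key, 0) + row[value_key]
    return totals


def hash_distinct(rows):
    """Drop duplicates while the hash table (here a set) is being built."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


def sort_distinct(rows):
    """Sort so equal rows become adjacent, then keep the first of each group."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result
```

Both duplicate-removal functions return the same set of rows. The hash version keeps first-seen order, while the sort version returns the rows in sorted order.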

Complex Query Execution Plan
- Purpose: to schedule a query with several operations optimally
- Ideas
  - Right-deep plans
  - Left-deep plans

Complex Query Execution Plan
- Prediction
  - Use a decision tree of sub-plans
  - Done by using choose-plan operators
- Major concern: optimal resource allocation
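
One way to picture a choose-plan operator is as a function that picks between prepared sub-plans once a query parameter is known. The sketch below has a single decision node. The table layout, the uniform-distribution selectivity estimate and the `threshold` value are all assumptions made for illustration, not part of the original slides.

```python
import bisect


def build_db(rows):
    """Hypothetical storage: the raw rows plus a sorted price index."""
    by_price = sorted(rows, key=lambda row: row[1])
    return {"rows": rows, "by_price": by_price,
            "prices": [row[1] for row in by_price]}


def full_scan(db, max_price):
    """Sub-plan A: read every row and filter."""
    return [row for row in db["rows"] if row[1] <= max_price]


def index_scan(db, max_price):
    """Sub-plan B: use the sorted price index (a stand-in for a B-tree)."""
    cut = bisect.bisect_right(db["prices"], max_price)
    return db["by_price"][:cut]


def estimated_selectivity(db, max_price):
    """Rough fraction of qualifying rows, assuming uniformly distributed prices."""
    lo, hi = db["prices"][0], db["prices"][-1]
    if hi == lo:
        return 1.0
    return min(1.0, max(0.0, (max_price - lo) / (hi - lo)))


def choose_plan(db, max_price, threshold=0.2):
    """Pick a sub-plan only once the parameter value is bound at run time."""
    if estimated_selectivity(db, max_price) < threshold:
        return index_scan
    return full_scan
```

Calling `choose_plan(db, max_price)(db, max_price)` runs whichever sub-plan was selected. Both sub-plans return the same rows, possibly in a different order.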

Parallel Query Execution Mechanism
- Goal: obtain speed-up and scale-up
- Speed-up
  - Uses extra hardware for a constant-size problem
  - Linear speed-up is optimal
  - Can be expressed as parallel efficiency

Parallel Query Execution Mechanism
- Scale-up
  - Problem size is altered in proportion to the resources
  - Can be expressed as parallel efficiency
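
The usual definitions of these measures can be written down as a short sketch. The function and parameter names are assumptions chosen for illustration, and the times can be in any consistent unit.

```python
def speedup(time_on_one, time_on_n):
    """Same problem, more hardware: how many times faster the larger system is."""
    return time_on_one / time_on_n


def parallel_efficiency(time_on_one, time_on_n, n):
    """Speed-up divided by the number of processing units; 1.0 means linear speed-up."""
    return speedup(time_on_one, time_on_n) / n


def scaleup(time_small_on_one, time_n_times_larger_on_n):
    """Problem and hardware grow together; 1.0 means elapsed time stayed constant."""
    return time_small_on_one / time_n_times_larger_on_n
```

For example, a query that takes 100 seconds on one processor and 25 seconds on eight processors has a speed-up of 4 and a parallel efficiency of 0.5.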

Parallel Query Execution Mechanism
- Parallel vs. distributed systems
- Distributed
  - Locally autonomous
  - Also uses parallelism

Parallel Query Execution Mechanism
- Parallel
  - One center of control
  - Three types
    - Shared memory
    - Shared disk
    - Distributed memory

Parallel Query Execution Mechanism
- Three forms of parallelism
  - Inter-query: servicing multiple requests at the same time
  - Inter-operator: pipelining
  - Intra-operator: executing a single operator on multiple processors

Parallel Query Execution Mechanism
- Implementation
  - Bracket model
  - Operator model
- Bracket model
  - Goal: a generic process template that receives and sends data and performs one operation at a time

Parallel Query Execution Mechanism
- Bracket model (continued)
  - Number of inputs is limited to two
  - Can be run in parallel by having many templates in the system running simultaneously
- Operator model
  - Goal: insert parallel operators into an ordered plan

Parallel Query Execution Mechanism
- Operator model (continued)
  - Uses the exchange operator
- Exchange operator
  - Does not manipulate data
  - Provides capabilities for parallel query processing
  - Changes a complex query into a single process
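
A rough sketch of the idea behind an exchange operator, assuming an iterator-style engine. The class name, the `buffer_size` parameter and the use of a thread with a bounded queue are illustrative assumptions, not the design of any particular system. Rows pass through unchanged; the only effect is that the input below the operator runs concurrently with the consumer above it.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the input stream


class Exchange:
    """Passes rows through unchanged while running its input in another thread."""

    def __init__(self, child, buffer_size=16):
        self.child = child
        # Bounded buffer: the producer blocks when it is full (simple flow control).
        self.buffer = queue.Queue(maxsize=buffer_size)

    def _produce(self):
        try:
            for row in self.child:
                self.buffer.put(row)
        finally:
            # Always signal the end so the consumer cannot wait forever.
            self.buffer.put(_END)

    def __iter__(self):
        threading.Thread(target=self._produce, daemon=True).start()
        while True:
            row = self.buffer.get()
            if row is _END:
                return
            yield row


def scan(n):
    """Hypothetical leaf operator producing the integers 0..n-1."""
    return iter(range(n))


def select_even(rows):
    """Hypothetical filter operator; it is unaware of the exchange below it."""
    return (row for row in rows if row % 2 == 0)
```

A plan such as `select_even(Exchange(scan(10)))` is written exactly like its sequential form `select_even(scan(10))`, which is the point of the operator model. In CPython, threads show the structure rather than a real speed-up for CPU-bound work.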

Parallel Algorithms
- Idea: more focus on algorithms and parallel execution
- Parallel selections and updates
  - Disk input and output should be made parallel
  - Selection: maintain indices near the stored data
  - Updates: use keys for partitioning attributes

Parallel Algorithms
- Parallel sorting
  - Classified by
    - number of parallel inputs
    - number of parallel outputs
  - Algorithms consist of a local sort and a data exchange step

Parallel Algorithms
- Major concern: deadlock
- Deadlock can be avoided by
  - using range partitioning
  - having a sufficiently large data exchange buffer
  - using a modified sort algorithm
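
The combination of a data exchange step and a local sort can be sketched as follows, here with the exchange (range partitioning) done first and the local sorts second. The `boundaries` list and the use of a thread pool are assumptions for illustration; a real system would choose the split points from statistics or sampling and run the local sorts on separate processors or nodes.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def range_partition(values, boundaries):
    """Data exchange step: route each value to the partition covering its key range."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for value in values:
        partitions[bisect.bisect_right(boundaries, value)].append(value)
    return partitions


def parallel_sort(values, boundaries):
    """Range-partition the input, sort every partition locally, then concatenate."""
    partitions = range_partition(values, boundaries)
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_parts = list(pool.map(sorted, partitions))
    result = []
    for part in sorted_parts:
        # Partitions cover disjoint, increasing key ranges, so no final merge is needed.
        result.extend(part)
    return result
```

Because the ranges are disjoint, the sorted partitions only need to be concatenated, so no worker has to wait on rows from another worker during a merge.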

Query Optimization
- Uses the differences between logical and physical aspects
- Must keep track of the properties of the inputs
- Cost models focus on throughput measures

Tuning Query Performance
- Focus: guidelines for improving query performance
- Guidelines from three points of view
  - implementor and vendor
  - database administrator
  - application programmer

Tuning Query Performance
- Implementor
  - System should support indexing and clustering
  - Query optimizer should be reliable and accurate
- Administrator
  - Ensure usage of system facilities

Tuning Query Performance
- Administrator (continued)
  - Carefully choose the physical database design
  - Provide available and efficient processing resources
- Application programmer
  - Provide high-level queries