1
Parallel Databases
Akshay Tomar (2012013), Prateek Singh Lohchubh (2012077)
2
Motivation
Innovation in specialised hardware was not making much progress.
It was difficult to make a single machine powerful enough to meet the CPU and I/O needs of large relational databases.
Parallel databases rely on two kinds of parallelism:
- Pipelined parallelism: streaming the output of one operator into the input of another operator.
- Partitioned parallelism: splitting an operator into many independent operators, each working on a part of the data.
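A minimal Python sketch of the two kinds of parallelism, using threads and in-process queues as stand-ins for real operators and processors (all function names here are hypothetical). Because of CPython's global interpreter lock, it shows the structure of each approach rather than an actual speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of a row stream

def scan(rows, out_q):
    """Producer operator: emits rows one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)

def select(pred, in_q, out_q):
    """Filter operator: consumes each row as soon as it arrives."""
    while True:
        row = in_q.get()
        if row is _DONE:
            out_q.put(_DONE)
            return
        if pred(row):
            out_q.put(row)

def pipelined(rows, pred):
    """Pipelined parallelism: scan and select run concurrently,
    connected by a queue that streams rows from one to the other."""
    q1, q2 = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=scan, args=(rows, q1)),
        threading.Thread(target=select, args=(pred, q1, q2)),
    ]
    for w in workers:
        w.start()
    result = []
    while (row := q2.get()) is not _DONE:
        result.append(row)
    for w in workers:
        w.join()
    return result

def partitioned(rows, pred, n_parts=4):
    """Partitioned parallelism: the same select runs independently
    on each partition, and the partial results are merged."""
    parts = [rows[i::n_parts] for i in range(n_parts)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(lambda p: [r for r in p if pred(r)], parts))
    return [r for part in partials for r in part]

if __name__ == "__main__":
    rows = list(range(20))
    is_even = lambda r: r % 2 == 0
    print(pipelined(rows, is_even))
    print(sorted(partitioned(rows, is_even)))  # merge order is not preserved
```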
3
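Introduction
Relational queries are well suited to parallel execution: they consist of uniform operations applied to uniform streams of data.
Goals of parallelism: speedup and scaleup. The aim is linear (or even super-linear) speedup and scaleup.
- Speedup: the same job runs faster on a larger system; it is linear if N times the hardware makes the job N times faster.
- Scaleup: an N-times larger job runs on N times the hardware; it is linear if the elapsed time stays the same.
Threats to linear speedup and scaleup:
- Startup: the time needed to start a parallel operation.
- Interference: contention for shared resources among the processors.
- Skew: unequal partition sizes, so the slowest partition determines the elapsed time.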
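The speedup and scaleup ratios, and the effect of startup and skew, can be worked through numerically. This is a sketch under a deliberately simple, assumed cost model: a partitioned step costs a fixed startup time plus the time of its largest partition. The function names and numbers are hypothetical.

```python
def speedup(small_time, big_time):
    """Elapsed time on the small system / elapsed time on the big system,
    for the same job. Linear speedup on N times the hardware gives N."""
    return small_time / big_time

def scaleup(small_time_small_job, big_time_big_job):
    """Elapsed time of the small system on the small job / elapsed time of
    the N-times-bigger system on the N-times-bigger job. Linear gives 1.0."""
    return small_time_small_job / big_time_big_job

def partitioned_elapsed(partition_sizes, secs_per_row=1.0, startup=0.0):
    """Assumed model: pay a startup cost, then wait for the slowest
    (largest) partition, which is why skew hurts."""
    return startup + max(partition_sizes) * secs_per_row

serial = partitioned_elapsed([1000])                  # 1 processor
uniform = partitioned_elapsed([250, 250, 250, 250])   # 4 processors, no skew
skewed = partitioned_elapsed([700, 100, 100, 100])    # 4 processors, skewed
print(speedup(serial, uniform))  # linear: 4.0
print(speedup(serial, skewed))   # skew drops it to about 1.43

# A fixed startup cost also pulls speedup below linear.
print(speedup(partitioned_elapsed([1000], startup=50),
              partitioned_elapsed([250] * 4, startup=50)))  # 3.5
```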
4
Basic Design Types
Major design types: shared-memory, shared-disk, and shared-nothing.
Most parallel database systems are based on the shared-nothing hardware design: each processor has its own memory and disks, and processors communicate only by sending messages over an interconnection network.
This keeps interference low and gives near-linear speedup and scaleup on complex relational queries.
It also offers large potential for scaleup (up to hundreds and probably thousands of processors).
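A minimal single-process simulation of a shared-nothing query, assuming hash partitioning of rows across nodes. Each hypothetical `Node` owns a private partition and is reached only through its inbox queue, which stands in for the interconnection network; a coordinator broadcasts a count-by request and merges the partial answers.

```python
import queue
import threading
from collections import Counter

class Node(threading.Thread):
    """Simulated shared-nothing node: private data, message-only access."""
    def __init__(self, node_id, rows):
        super().__init__(daemon=True)
        self.node_id = node_id
        self._rows = rows            # local partition, never shared
        self.inbox = queue.Queue()   # the node's only entry point

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg["op"] == "stop":
                return
            if msg["op"] == "count_by":
                local = Counter(row[msg["column"]] for row in self._rows)
                msg["reply_to"].put((self.node_id, local))

def hash_partition(rows, key, n_nodes):
    """Route each row to exactly one node by hashing its key."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key]) % n_nodes].append(row)
    return parts

def parallel_count_by(nodes, column):
    """Coordinator: broadcast the request, then merge partial counts."""
    replies = queue.Queue()
    for node in nodes:
        node.inbox.put({"op": "count_by", "column": column, "reply_to": replies})
    total = Counter()
    for _ in nodes:
        _, partial = replies.get()
        total.update(partial)
    return total

rows = [{"id": i, "dept": d}
        for i, d in enumerate(["eng", "ops", "eng", "hr", "eng", "ops"])]
nodes = [Node(i, part) for i, part in enumerate(hash_partition(rows, "id", 3))]
for node in nodes:
    node.start()
counts = parallel_count_by(nodes, "dept")
for node in nodes:
    node.inbox.put({"op": "stop"})
print(counts)
```

No node ever reads another node's rows; only the small partial counts cross the simulated network, which is the property that keeps interference low.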
5
Query Optimizer
The optimizer attempts to generate the best execution plan for a SQL statement. It computes a cost for each execution plan, taking into account I/O, CPU, and communication.
Execution plans depend on:
- How the query is written
- Size of the data set
- Layout of the data
- Access structures of the database
The number of candidate execution plans grows with the number of objects in the FROM clause: each additional join multiplies the possible join orders, so the number of plans grows exponentially.
Steps in building an execution plan:
- Evaluation of expressions and conditions: the optimizer first evaluates expressions and conditions containing constants as fully as possible.
- Statement transformation: for complex statements involving, for example, correlated subqueries or views, the optimizer might transform the original statement into an equivalent join statement.
- Choice of optimizer goal: throughput or response time.
- Choice of access paths: full scan, index scan, etc.
- Choice of join order: which table is joined first, and so on.
The best execution plan is the plan with the lowest cost among all candidate plans considered.
Plans are generated for query blocks from the bottom up, i.e. the last (innermost) query block is optimized first.
Optimizer components:
- Query Transformer: determines whether it is helpful to change the form of the query so that the optimizer can generate a better execution plan.
- Estimator: uses statistics to compute the cost of each plan.
- Plan Generator: compares the costs of the plans and selects the one with the lowest cost.
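A toy plan generator helps make the join-order argument concrete. This sketch assumes left-deep join trees, hypothetical table statistics, and a simple cost model (the sum of intermediate result sizes); it enumerates every join order exhaustively, whereas real optimizers prune the search space. Three tables already give 3! = 6 orders, and the count grows factorially.

```python
from itertools import permutations
from math import factorial

# Hypothetical statistics: table cardinalities and join selectivities.
CARD = {"orders": 10_000, "customers": 1_000, "nations": 25}
SEL = {
    frozenset({"orders", "customers"}): 1 / 1_000,
    frozenset({"customers", "nations"}): 1 / 25,
}

def join_selectivity(joined, table):
    """Product of selectivities of predicates linking `table` to the
    tables already joined; 1.0 (a cross product) if none apply."""
    sel = 1.0
    for t in joined:
        sel *= SEL.get(frozenset({t, table}), 1.0)
    return sel

def plan_cost(order):
    """Assumed cost model: sum of the estimated sizes of all
    intermediate results in a left-deep join order."""
    rows = CARD[order[0]]
    joined = [order[0]]
    cost = 0.0
    for t in order[1:]:
        rows = rows * CARD[t] * join_selectivity(joined, t)
        cost += rows
        joined.append(t)
    return cost

def best_plan(tables):
    """Plan generator: cost every join order, keep the cheapest."""
    return min(permutations(tables), key=plan_cost)

best = best_plan(list(CARD))
print(best, plan_cost(best))
print([factorial(n) for n in range(1, 8)])  # [1, 2, 6, 24, 120, 720, 5040]
```

Orders that join `orders` with `nations` first form a 250,000-row cross product, so the generator avoids them; joining the two small tables first is cheapest under this model.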
6
Estimator & Plan Generator
- Selectivity: the fraction of rows in a row set that the query selects, from 0 (no rows) to 1 (all rows). Selectivity is tied to a query predicate, such as WHERE last_name LIKE 'A%', or to a combination of predicates.
- Cardinality: the number of rows returned by each operation in an execution plan. It is estimated from the input row counts and the selectivity of the filters and joins applied.
- Cost: units of work or resources used. The query optimizer uses disk I/O, CPU usage, and memory usage as units of work.
- Plan Generator: explores various plans for a query block by trying out different access paths, join methods, and join orders.
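A minimal sketch of how selectivity feeds cardinality estimates. Here selectivity is measured directly on a small hypothetical sample instead of being derived from stored statistics, and the join estimate uses the common textbook formula |R|·|S| / max(NDV), which assumes uniformly distributed join keys; neither is claimed to be any particular product's method.

```python
def selectivity(rows, predicate):
    """Fraction of rows satisfying the predicate: 0.0 = none, 1.0 = all."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if predicate(r)) / len(rows)

def filter_cardinality(num_rows, sel):
    """Estimated rows returned by a filter: input rows * selectivity."""
    return round(num_rows * sel)

def equijoin_cardinality(rows_r, rows_s, ndv_r, ndv_s):
    """Textbook estimate for R JOIN S ON R.a = S.b under a uniformity
    assumption: |R| * |S| / max(NDV(R.a), NDV(S.b))."""
    return rows_r * rows_s // max(ndv_r, ndv_s)

# Hypothetical sample of last_name values; LIKE 'A%' as a prefix test.
sample = ["Adams", "Baker", "Allen", "Clark", "Ayers"]
sel = selectivity(sample, lambda name: name.startswith("A"))
print(sel)                                            # 0.6
print(filter_cardinality(10_000, sel))                # 6000
print(equijoin_cardinality(10_000, 1_000, 1_000, 1_000))  # 10000
```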
7
Parallel Query Optimization
Parallel query optimization is an extension of the serial optimizer.
It analyzes the cost of parallel access methods for each combination of join orders, join types, and indexes.
Selections, joins, and scans benefit from parallel optimization.
This is possible because in a relational DBMS every operator takes relations as input and produces a new relation, so a query can be broken into multiple independent parts.
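One way a join breaks into independent parts is the partitioned hash join, sketched below under stated assumptions: both inputs are hash-partitioned on the join key with the same hash function and partition count, so matching rows always meet in the same partition pair, and each pair is joined by a separate worker thread (standing in for a separate processor). Function names and data are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition(rows, key, n):
    """Hash-partition rows on the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def hash_join(build, probe, build_key, probe_key):
    """Serial hash join of one partition pair."""
    table = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    return [{**b, **p} for p in probe for b in table.get(p[probe_key], [])]

def parallel_hash_join(r, s, r_key, s_key, n=4):
    """Each partition pair is an independent join; the union of the
    per-pair outputs is the full join result."""
    r_parts = partition(r, r_key, n)
    s_parts = partition(s, s_key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(hash_join, r_parts, s_parts,
                                [r_key] * n, [s_key] * n))
    return [row for part in results for row in part]

customers = [{"cust_id": i, "name": n} for i, n in enumerate(["Ann", "Bob", "Cy"])]
orders = [{"order_id": 100 + i, "cust_id": c} for i, c in enumerate([0, 2, 2, 1, 5])]
joined = parallel_hash_join(customers, orders, "cust_id", "cust_id")
for row in sorted(joined, key=lambda r: r["order_id"]):
    print(row)
```

Order 104 references a customer that does not exist, so it drops out of the result exactly as it would in a serial join.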
8
References http://pages.cs.wisc.edu/~cs764-1/paralleldb.pdf
9
Thank You