8-1 Outline  Overview of Physical Database Design  File Structures  Query Optimization  Index Selection  Additional Choices in Physical Database Design.

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

Tuning: overview Rewrite SQL (Leccotech)Leccotech Create Index Redefine Main memory structures (SGA in Oracle) Change the Block Size Materialized Views,
Chapter 10: Designing Databases
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Chapter Physical Database Design Methodology Software & Hardware Mapping Logical Design to DBMS Physical Implementation Security Implementation Monitoring.
IS 4420 Database Fundamentals Chapter 6: Physical Database Design and Performance Leon Chen.
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Physical Database Monitoring and Tuning the Operational System.
Quick Review of material covered Apr 8 B+-Tree Overview and some definitions –balanced tree –multi-level –reorganizes itself on insertion and deletion.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 11 Database Performance Tuning and Query Optimization.
Modern Systems Analysis and Design Third Edition
1 Chapter 2 Reviewing Tables and Queries. 2 Chapter Objectives Identify the steps required to develop an Access application Specify the characteristics.
Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database.
Chapter 17 Methodology – Physical Database Design for Relational Databases Transparencies © Pearson Education Limited 1995, 2005.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Chapter 6 Physical Database Design. Introduction The purpose of physical database design is to translate the logical description of data into the technical.
Systems analysis and design, 6th edition Dennis, wixom, and roth
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 11 Database Performance Tuning and Query Optimization.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
CSC271 Database Systems Lecture # 30.
IT The Relational DBMS Section 06. Relational Database Theory Physical Database Design.
1 © Prentice Hall, 2002 Physical Database Design Dr. Bijoy Bordoloi.
PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design, 2 nd Edition Copyright 2003 © John Wiley & Sons, Inc. All rights reserved.
TM 7-1 Copyright © 1999 Addison Wesley Longman, Inc. Physical Database Design.
Physical Database Design Chapter 6. Physical Design and implementation 1.Translate global logical data model for target DBMS  1.1Design base relations.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering.
Copyright 2006 Prentice-Hall, Inc. Essentials of Systems Analysis and Design Third Edition Joseph S. Valacich Joey F. George Jeffrey A. Hoffer Chapter.
1 © Prentice Hall, 2002 Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott,
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
Chapter 16 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
Designing Databases Systems Analysis and Design, 7e Kendall & Kendall 13 © 2008 Pearson Prentice Hall.
DataBase Management System What is DBMS Purpose of DBMS Data Abstraction Data Definition Language Data Manipulation Language Data Models Data Keys Relationships.
CIS 210 Systems Analysis and Development Week 6 Part II Designing Databases,
1 Agenda – 04/18/2006 and 04/20/2006 Identify tasks in physical database design. Define the design goals for physical database design. Discuss relevant.
Database Management COP4540, SCS, FIU Physical Database Design (ch. 16 & ch. 3)
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
Chapter 13 Designing Databases Systems Analysis and Design Kendall & Kendall Sixth Edition.
SQL/Lesson 7/Slide 1 of 32 Implementing Indexes Objectives In this lesson, you will learn to: * Create a clustered index * Create a nonclustered index.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
IS 320 Notes for April 15, Learning Objectives Understand database concepts. Use normalization to efficiently store data in a database. Use.
Lec 7 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
Chapter 5 Index and Clustering
Session 1 Module 1: Introduction to Data Integrity
9-1 © Prentice Hall, 2007 Topic 9: Physical Database Design Object-Oriented Systems Analysis and Design Joey F. George, Dinesh Batra, Joseph S. Valacich,
11-1 © Prentice Hall, 2004 Chapter 11: Physical Database Design Object-Oriented Systems Analysis and Design Joey F. George, Dinesh Batra, Joseph S. Valacich,
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Database Systems, 8 th Edition SQL Performance Tuning Evaluated from client perspective –Most current relational DBMSs perform automatic query optimization.
SQL Basics Review Reviewing what we’ve learned so far…….
IT 5433 LM4 Physical Design. Learning Objectives: Describe the physical database design process Explain how attributes transpose from the logical to physical.
Practical Database Design and Tuning
Module 11: File Structure
Indexing Structures for Files and Physical Database Design
Record Storage, File Organization, and Indexes
Physical Database Design
Physical Database Design
Physical Database Design and Performance
Modern Systems Analysis and Design Third Edition
Database Performance Tuning and Query Optimization
CHAPTER 5: PHYSICAL DATABASE DESIGN AND PERFORMANCE
Physical Database Design
Chapter 12 Designing Databases
Practical Database Design and Tuning
The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited)
Chapter 11 Database Performance Tuning and Query Optimization
Evaluation of Relational Operations: Other Techniques
Chapter 17 Designing Databases
Presentation transcript:

8-1 Outline  Overview of Physical Database Design  File Structures  Query Optimization  Index Selection  Additional Choices in Physical Database Design

8-2 Overview of Physical Database Design  Sequence of decision-making processes.  Decisions involve the storage level of a database: file structure and optimization choices.  Importance of providing detailed inputs and using tools that are integrated with DBMS usage of file structures and optimization decisions

8-3 Storage Level of Databases  Closest to the hardware and operating system.  Physical records organized into files.  The number of physical record accesses is an important measure of database performance.  Difficult to predict physical record accesses

8-4 Logical Records (LR) and Physical Records (PR)

8-5 Transferring Physical Records

8-6 Objectives  Minimize response time to access and change a database.  Minimizing computing resources is a substitute measure for response time.  Database resources  Physical record transfers  CPU operations  Communication network usage (distributed processing)

8-7 Constraints  Main memory and disk space  Minimizing main memory and disk space can lead to high response times.  Useful to consider additional memory and disk space

8-8 Combined Measure of Database Performance  Weight combines physical record accesses and CPU usage  Weight is usually close to 0  Mmany CPU operations can be performed in the time to perform one physical record transfer.

8-9 Inputs, Outputs, and Environment

8-10 Difficulty of physical database design  Number of decisions  Relationship among decisions  Detailed inputs  Complex environment  Uncertainty in predicting physical record accesses

8-11 Inputs of Physical Database Design  Physical database design requires inputs specified in sufficient detail.  Table profiles used to estimate performance measures.  Application profiles provide importance of applications.

8-12 Table Profile  Tables  Number of rows  Number of physical records  Columns  Number of unique values  Distribution of values  Correlation of columns  Relationships: distribution of related rows

8-13 Histogram  Specify distribution of values  Two dimensional graph  Column values on the x axis  Number of rows on the y axis  Variations  Equal-width: do not work well with skewed data  Equal-height: control error by the number of ranges

8-14 Equal-Width Histogram

8-15 Equal-Height Histogram

8-16 Application profiles  Application profiles summarize the queries, forms, and reports that access a database.

8-17 File structures  Selecting among alternative file structures is one of the most important choices in physical database design.  In order to choose intelligently, you must understand characteristics of available file structures.

8-18 Sequential Files  Simplest kind of file structure  Unordered: insertion order  Ordered: key order  Simple to maintain  Provide good performance for processing large numbers of records

8-19 Unordered Sequential File

8-20 Ordered Sequential File

8-21 Hash Files  Support fast access by unique key value  Convert a key value into a physical record address  Mod function: typical hash function  Divisor: large prime number close to the file capacity  Physical record number: hash function plus the starting physical record number

8-22 Example: Hash Function Calculations for StdSSN Key

8-23 Hash File after Insertions

8-24 Collision Handling Example

8-25 Hash File Limitations  Poor performance for sequential search  Reorganization when capacity exceeds 70%  Dynamic hash files reduce random search performance but eliminate periodic reorganization

8-26 Multi-Way Tree (Btrees) Files  A popular file structure supported by most DBMSs.  Btree provides good performance on both sequential search and key search.  Btree characteristics:  Balanced  Bushy: multi-way tree  Block-oriented  Dynamic

8-27 Structure of a Btree of Height 3

8-28 Btree Node Containing Keys and Pointers

8-29 Btree Insertion Examples

8-30 Btree Deletion Examples

8-31 Cost of Operations  The height of Btree dominates the number of physical record accesses operation.  Logarithmic search cost  Upper bound of height: log function  Log base: minimum number of keys in a node  Insertion cost  Cost to locate the nearest key  Cost to change nodes

8-32 B+Tree  Provides improved performance on sequential and range searches.  In a B+tree, all keys are redundantly stored in the leaf nodes.  To ensure that physical records are not replaced, the B+tree variation is usually implemented.

8-33 B+tree Illustration

8-34 Index Matching  Determining usage of an index for a query  Complexity of condition determines match.  Single column indexes: =,, =, IN, BETWEEN, IS NULL, LIKE ‘Pattern’ (meta character not the first symbol)  Composite indexes: more complex and restrictive rules

8-35 Index Matching Examples  C2 BETWEEN 10 and 20: match on C2  C3 IN (10,20): match on C3  C1 <> 10: no match  C4 LIKE 'A%‘: match on C4  C4 LIKE '%A‘: no match  C2 = 5 AND C3 = 20 AND C1 = 10: matches on index with C1, C2, and C3

8-36 Bitmap Index  Can be useful for stable columns with few values  Bitmap:  String of bits: 0 (no match) or 1 (match)  One bit for each row  Bitmap index record  Column value  Bitmap  DBMS converts bit position into row identifier.

8-37 Bitmap Index Example Faculty Table Bitmap Index on FacRank

8-38 Bitmap Join Index  Bitmap identifies rows of a related table.  Represents a precomputed join  Can define for a join column or a non-join column  Typically used in query dominated environments such as data warehouses (Chapter 16)

8-39 Summary of File Structures

8-40 Query Optimization  Query optimizer determines implementation of queries.  Major improvement in software productivity  Improve performance with knowledge about the optimization process

8-41 Translation Tasks

8-42 Access Plan Evaluation  Optimizer evaluates thousands of access plans  Access plans vary by join order, file structures, and join algorithm.  Some optimizers can use multiple indexes on the same table.  Access plan evaluation can consume significant resources

8-43 Access Plan Example 1

8-44 Access Plan Example 2

8-45 Join Algorithms  Nested loops: inner and outer loops; universal  Sort merge: join column sorting or indexes  Hybrid join: combination of nested loops and sort merge  Hash join: uses internal hash table  Star join: uses bitmap join indexes

8-46 Improving Optimization Results  Monitor poorly performing access plans  Look for problems involving table profiles and query coding practices  Use hints carefully to improve results  Override optimizer judgment  Cover file structures, join algorithms, and join orders  Use as a last result

8-47 Table Profile Deficiencies  Detailed and current statistics needed  Beware of uniform value assumption and independence assumption  Use hints to overcome optimization blind spots  Estimation of result size for parameterized queries  Correlated columns: multiple index access may be useful

8-48 Query Coding Practices  Avoid functions on indexable columns  Eliminate unnecessary joins  For conditions on join columns, test the condition on the parent table.  Do not use the HAVING clause for row conditions.  Avoid repetitive binding of complex queries  Beware of queries that use complex views

8-49 Index Selection  Most important decision  Difficult decision  Choice of clustered and nonclustered indexes

8-50 Clustering Index Example

8-51 Nonclustering Index Example

8-52 Inputs and Outputs of Index Selection

8-53 Trade-offs in Index Selection  Balance retrieval against update performance  Nonclustering index usage:  Few rows satisfy the condition in the query  Join column usage if a small number of rows result in child table  Clustering index usage:  Larger number of rows satisfy a condition than for nonclustering index  Use in sort merge join algorithm to avoid sorting  More expensive to maintain

8-54 Difficulties of Index Selection  Application weights are difficult to specify.  Distribution of parameter values needed  Behavior of the query optimization component must be known.  The number of choices is large.  Index choices can be interrelated.

8-55 Selection Rules I Rule 1: A primary key is a good candidate for a clustering index. Rule 2: To support joins, consider indexes on foreign keys. Rule 3: A column with many values may be a good choice for a non-clustering index if it is used in equality conditions. Rule 4: A column used in highly selective range conditions is a good candidate for a non-clustering index. Rule 5: A combination of columns used together in query conditions may be good candidates for nonclustering indexes if the joint conditions return few rows, the DBMS optimizer supports multiple index access, and the columns are stable.

8-56 Selection Rules II Rule 6: A frequently updated column is not a good index candidate. Rule 7: Volatile tables (lots of insertions and deletions) should not have many indexes. Rule 8: Stable columns with few values are good candidates for bitmap indexes if the columns appear in WHERE conditions. Rule 9: Avoid indexes on combinations of columns. Most optimization components can use multiple indexes on the same table.

8-57 Index Creation  To create the indexes, the CREATE INDEX statement can be used.  The word following the INDEX keyword is the name of the index.  CREATE INDEX is not part of SQL:2003.  Examples

8-58 Denormalization  Additional choice in physical database design  Denormalization combines tables so that they are easier to query.  Use carefully because normalized designs have important advantages.

8-59 Normalized designs  Better update performance  Require less coding to enforce integrity constraints  Support more indexes to improve query performance

8-60 Repeating Groups  Collection of associated values.  Normalization rules force repeating groups to be stored in an M table separate from an associated one table.  If a repeating group is always accessed with its associated parent table, denormalization may be a reasonable alternative.

8-61 Denormalizing a Repeating Group

8-62 Denormalizing a Generalization Hierarchy

8-63 Codes and Meanings

8-64 Record Formatting  Record formatting decisions involve compression and derived data.  Compression is a trade-off between input- output and processing effort.  Derived data is a trade-offs between query and update operations.

8-65 Storing Derived Data to Improve Query Performance

8-66 Parallel Processing  Parallel processing can improve retrieval and modification performance.  Retrieving many records can be improved by reading physical records in parallel.  Many DBMSs provide parallel processing capabilities with RAID systems.  RAID is a collection of disks (a disk array) that operates as a single disk.

8-67 Striping in RAID Storage Systems

8-68 Other Ways to Improve Performance  Transaction processing: add computing capacity and improve transaction design.  Data warehouses: add computing capacity and store derived data.  Distributed databases: allocate processing and data to various computing locations.

8-69 Summary  Goal: minimize response time  Constraints: disk space, memory, communication bandwidth  Table profiles and application profiles must be specified in sufficient detail.  Environment: file structures and query optimization  Monitor and possibly improve query optimization results  Index selection: most important decision  Other techniques: denormalization, record formatting, and parallel processing