Tuning Relational Systems I. Schema design  Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning,

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

Chapter 18 Methodology – Monitoring and Tuning the Operational System Transparencies © Pearson Education Limited 1995, 2005.
Dimensional Modeling.
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Relational Database. Relational database: a set of relations Relation: made up of 2 parts: − Schema : specifies the name of relations, plus name and type.
Data Warehouse Tuning. 7 - Datawarehouse2 Datawarehouse Tuning Aggregate (strategic) targeting: –Aggregates flow up from a wide selection of data, and.
Relational Normalization Theory. Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Chapter Physical Database Design Methodology Software & Hardware Mapping Logical Design to DBMS Physical Implementation Security Implementation Monitoring.
1 Distributed Databases Chapter Two Types of Applications that Access Distributed Databases The application accesses data at the level of SQL statements.
IS 4420 Database Fundamentals Chapter 6: Physical Database Design and Performance Leon Chen.
Physical Database Monitoring and Tuning the Operational System.
Chapter 11 Data Management Layer Design
Motivation Mobile devices often work offline, and users often need to download large query results for later use. Results are often accessed in small pieces.
Distributed Databases and Query Processing. Distributed DB’s vs. Parallel DB’s Many autonomous processors that may participate in database operations.
Why Normalization? To Reduce Redundancy to 1.avoid modification, insertion, deletion anomolies 2.save space Goal: One Fact in One Place.
Schema Dennis Shasha and Philippe Bonnet, 2013.
Week 6 Lecture Normalization
Index tuning Performance Tuning.
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
IT The Relational DBMS Section 06. Relational Database Theory Physical Database Design.
Getting Started With Ingres VectorWise
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Physical Database Design Chapter 6. Physical Design and implementation 1.Translate global logical data model for target DBMS  1.1Design base relations.
Communicating with the Outside. Overview Package several SQL statements within one call to the database server Embedded procedural language (Transact.
Chapter 16 Methodology – Physical Database Design for Relational Databases.
CS Data Warehouse & Performance Tuning Xiaofang Zhou School of Computing, NUS Office: S URL:
Normalization Transparencies
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
1 Recovery Tuning Main techniques Put the log on a dedicated disk Delay writing updates to the database disks as long as possible Setting proper intervals.
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Chapter 16 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
Physical Database Design I, Ch. Eick 1 Physical Database Design I About 25% of Chapter 20 Simple queries:= no joins, no complex aggregate functions Focus.
1 Schema Refinement, Normalization, and Tuning. 2 Design Steps v The design steps: 1.Real-World 2. ER model 3. Relational Schema 4. Better relational.
Chapter 13 Normalization Transparencies. 2 Chapter 13 - Objectives u Purpose of normalization. u Problems associated with redundant data. u Identification.
Semantic Query Optimization Techniques November 16, 2005 By : Mladen Kovacevic.
CS Operating System & Database Performance Tuning Xiaofang Zhou School of Computing, NUS Office: S URL:
SCUHolliday - coen 1788–1 Schedule Today u Modifications, Schemas, Views. u Read Sections (except and 6.6.6) Next u Constraints. u Read.
1 Functional Dependencies and Normalization Chapter 15.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Schema Tuning. Outline Database design: Normalization –Problem of redundancy –Why? Functional dependency –How to solve? Decomposition –Objective of the.
Methodology – Physical Database Design for Relational Databases.
Announcements Program 3 due Friday Homework 2 out today, due Mon Read: Chapter 3.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Physical Database Design I, Ch. Eick 1 Physical Database Design I Chapter 16 Simple queries:= no joins, no complex aggregate functions Focus of this Lecture:
Lec 7 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
Relational Operator Evaluation. Overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g.,
Chapter 5 Index and Clustering
Physical Database Design DeSiaMorePowered by DeSiaMore 1.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
1 Chapter 4. Tuning a Relational Database System May 2002 Prof. Sang Ho Lee School of Computing, Soongsil University
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
SQL IMPLEMENTATION & ADMINISTRATION Indexing & Views.
Query Tuning. overview Use of Views CREATE VIEW techlocation AS SELECT ssnum, tech.dept, location FROM employee, tech WHERE employee.dept = tech.dept;
1 CS122A: Introduction to Data Management Lecture #15: Physical DB Design Instructor: Chen Li.
Practical Database Design and Tuning
RELATION.
Physical Database Design and Performance
Methodology – Monitoring and Tuning the Operational System
Physical Database Design for Relational Databases Step 3 – Step 8
Database Performance Tuning and Query Optimization
Evaluation of Relational Operations
Practical Database Design and Tuning
Methodology – Monitoring and Tuning the Operational System
Chapter 11 Database Performance Tuning and Query Optimization
Chapter 17 Designing Databases
Presentation transcript:

Tuning Relational Systems I

Schema design  Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning, etc Query rewriting  Using indexes appropriately, avoiding DISTINCTS and ORDER Bys, etc Procedural extensions to relational algebra  Stored procedures, triggers, etc

Schema design  Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning, etc Query rewriting  Using indexes appropriately, avoiding DISTINCTS and ORDER Bys, etc Procedural extensions to relational algebra  Stored procedures, triggers, etc

Database Schema A relation schema is a relation name and a set of attributes R(a int, b varchar[20]); A relation instance for R is a set of records over the attributes in the schema for R.

Some Schema are better than others Schema1 (unnormalized): OnOrder1(supplier_id, part_id, quantity, supplier_address) Schema 2 (normalized): OnOrder2(supplier_id, part_id, quantity); Supplier(supplier_id, supplier_address); 100,000 orders; 2,000 suppliers; supplier_ID: 8-byte integer; supplier_address: 50 bytes Space  Schema 2 saves space Information preservation  Some supplier addresses might get lost with schema 1 if a supplier is deleted once the order has been filled Performance trade-off  Frequent access to address of supplier given an ordered part, then schema 1 is good, specially if there are few updates  Many new orders, schema 1 is not good, because it requires extra data entry effort or extra lookup to the DB system for every part ordered by the supplier

Functional Dependencies X is a set of attributes of relation R, and A is a single attribute of R.  X determines A (the functional dependency X  A holds for R) iff: For any relation instance I of R, whenever there are two records r and r’ in I with the same X values, they have the same A value as well. Interesting FD if A is not an attribute of X OnOrder1(supplier_id, part_id, quantity, supplier_address)  supplier_id  supplier_address is an interesting functional dependency

Key of a Relation Attributes X from R constitute a key of R if X determines every attribute in R and no proper subset of X determines an attribute in R.  A key of a relation is a minimal set of attributes that determines all attributes in the relation OnOrder1(supplier_id, part_id, quantity, supplier_address)  supplier_id, part_id is a key  Supplier_id is not a key, because does not determine part_id Supplier(supplier_id, supplier_address);  Supplier_id is a key  Supplier_id, supplier_address is not a key, because they do not constitute a minimal set of attributes that determines all attributes and supplier_id does

Normalization A relation is normalized if every interesting functional dependency X  A involving attributes in R has the property that X is a key of R. OnOrder1 is not normalized, because the key is constituted by supplier_ID, part_ID together, but supplier_ID by itself determines supplier_address OnOrder2 and Supplier are normalized

Example #1 Suppose that a bank associates each customer with his or her home branch. Each branch is in a specific legal jurisdiction.  Is the relation R(customer, branch, jurisdiction) normalized? What are the functional dependencies?  customer  branch  branch  jurisdiction  customer  jurisdiction  Customer is the key, but a functional dependency exists where customer is not involved.  R is not normalized.

Example #2 Suppose that a doctor can work in several hospitals and receives a salary from each one. Is R(doctor, hospital, salary) normalized? What are the functional dependencies?  doctor, hospital  salary The key is doctor, hospital The relation is normalized.

Example #3 Same relation R(doctor, hospital, salary) and we add the attribute primary_home_address. Each doctor has a primary home address and several doctors can have the same primary home address. Is R(doctor, hospital, salary, primary_home_address) normalized? What are the functional dependencies?  doctor, hospital  salary  doctor  primary_home_address  doctor, hospital  primary_home_address The key is no longer doctor, hospital because doctor (a subset) determines one attribute. A normalized decomposition would be:  R1(doctor, hospital, salary)  R2(doctor, primary_home_address)

Practical Schema Design (e.g. ER modeling) Identify entities in the application (e.g., doctors, hospitals, suppliers). Each entity has attributes (an hospital has an address, a juridiction, …). There are two constraints on attributes: 1. An attribute cannot have attribute of its own. 2. The entity associated with an attribute must functionally determine that attribute.

Practical Schema Design Each entity becomes a relation To those relations, add relations that reflect relationships between entities.  Worksin (doctor_ID, hospital_ID) Identify the functional dependencies among all attributes and check that the schema is normalized:  If functional dependency AB  C holds, then ABC should be part of the same relation.

Tuning normalization Different normalization strategies may guide us to different sets of normalized relations  Which one to choose depends on our application’s query patterns

Example Three attributes: account_ID, balance, address. Functional dependencies:  account_ID  balance  account_ID  address Two normalized schema design:  (account_ID, balance, address) or  (account_ID, balance)  (account_ID, address) Which design is better?

Vertical Partitioning Which design is better depends on the query pattern:  The application that sends a monthly statement is the principal user of the address of the owner of an account  The balance is updated or examined several times a day. The second schema might be better because the relation (account_ID, balance) can be made smaller:  More (account_ID, balance) pairs fit in memory, thus increasing the hit ratio  A scan performs better because there are fewer pages. Here, two relations are better than one, even though they require more space

Vertical Partitioning and Scan R (X,Y,Z)  X is an integer  YZ are large strings Scan Query Vertical partitioning exhibits poor performance when all attributes are accessed. Vertical partitioning provides a speed up if only two of the attributes are accessed.

Tuning Normalization - rule A single normalized relation XYZ is better than two normalized relations XY and XZ if the single relation design allows queries to access X, Y and Z together without requiring a join. The two-relation design is better iff:  Users access tend to partition between the two sets Y and Z most of the time  Attributes Y or Z have large values

Vertical Partitioning and Point Queries R (X,Y,Z)  X is an integer  YZ are large strings A mix of point queries access either XYZ or XY. Vertical partitioning gives a performance advantage if the proportion of queries accessing only XY is greater than 20%. The join is not expensive compared to a simple look- up.

Vertical Antipartitioning: example Brokers base their bond-buying decisions on the price trends of those bonds. The database holds the closing price for the last 3000 trading days, however the 10 most recent trading days are especially important.  (bond_id, issue_date, maturity, …) (bond_id, date, price) Vs.  (bond_id, issue_date, maturity, today_price, …10dayago_price) (bond_id, date, price) Second schema stores redundant info, requires extra space  better for queries that need info about prices in the last 10 days, because it avoids a join and avoids fetching 10 price records per bond

Tuning Denormalization Denormalizing means violating normalization for the sake of performance:  Denormalization speeds up performance when attributes from different normalized relations are often accessed together  Denormalization hurts performance for relations that are often updated.

Denormalizing – data Settings: lineitem ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT ); region( R_REGIONKEY, R_NAME, R_COMMENT ); nation( N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT,); supplier( S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT);  600,000 rows in lineitem, 25 nations, 5 regions, 500 suppliers

Denormalizing – denormalized relation L_REGIONNAME lineitemdenormalized ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME);  600,000 rows in lineitemdenormalized  Cold Buffer  Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

Queries on Normalized vs. Denormalized Schemas Normalized: select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, R_NAME from LINEITEM, REGION, SUPPLIER, NATION where L_SUPPKEY = S_SUPPKEY and S_NATIONKEY = N_NATIONKEY and N_REGIONKEY = R_REGIONKEY and R_NAME = 'EUROPE'; Denormalized: select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME from LINEITEMDENORMALIZED where L_REGIONNAME = 'EUROPE';

Denormalization TPC-H schema Query: find all lineitems whose supplier is in europe. With a normalized schema this query is a 4-way join. If we denormalize lineitem and introduce the name of the region for each lineitem we obtain a 30% throughput improvement.

Aggregate Maintenance In reporting applications, aggregates (sums, averages, etc) are often used For those queries it may be worthwhile to maintain special tables that hold those aggregates in precomputed form Those tables are called materialized views

Example The accounting department of a convenience store chain issues queries every twenty minutes to obtain:  The total dollar amount on order for a particular vendor  The total dollar amount on order by a particular store outlet. Original Schema: Ordernum(ordernum, itemnum, quantity, purchaser, vendor) Item(itemnum, price) Ordernum and Item have a clustering index on itemnum The total dollar queries are expensive. Can you see why?

Solution: aggregation maintenance Add: VendorOutstanding(vendor, amount), where amount is the dollar value of goods on order to the vendor, with a clustering index on vendor StoreOutstanding(purchaser, amount), where amount is the dollar value of goods on order by the purchaser store, with a clustering index on purchaser. Each update to order causes an update to these two redundant tables (triggers can be used to implement this explicitely, materialized views make these updates implicit) Trade-off between update overhead and lookup speed-up.

Materialized Views in Oracle9i Oracle9i supports materialized views: CREATE MATERIALIZED VIEW VendorOutstanding BUILD IMMEDIATE REFRESH COMPLETE ENABLE QUERY REWRITE AS SELECT orders.vendor, sum(orders.quantity*item.price) FROM orders,item WHERE orders.itemnum = item.itemnum group by orders.vendor; Some Options:  BUILD immediate/deferred  REFRESH complete/fast  ENABLE QUERY REWRITE Key characteristics:  Transparent aggregate maintenance  Transparent expansion performed by the optimizer based on cost. It is the optimizer and not the programmer that performs query rewriting

Aggregate Maintenance SQLServer on Windows2000 accounting department schema and queries orders, 1000 items Using triggers for view maintenance On this experiment, the trade-off is largely in favor of aggregate maintenance