CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture#18: Physical Database Design.

Presentation transcript:


Administrivia
HW6 is due right now.
HW7 is out today
– Phase 1: Wed Nov 11th
– Phase 2: Mon Nov 30th
Recitations (WEH 5302):
– Tue Nov 10th
– Tue Nov 17th

HW7: CMU “YikYak”
PHP Web Application
Postgres Database
Phase 1: Design Spec
Phase 2: Implementation

Last Class
Decomposition
– Lossless
– Dependency Preserving
Normal Forms
– 3NF
– BCNF

Today’s Class
Introduction
Index Selection
Denormalization
Decomposition
Partitioning
Advanced Topics

Introduction
After ER design, schema refinement, and the view definitions, we have a conceptual and external schema for our database.
The next step is to create the physical design of the database.

Physical Database Design
Physical design is tightly linked to query optimization.
– Query optimization is usually a “top down” concept.
– But in this lecture we’ll discuss this from the “bottom up”.

Physical Database Design
It is important to understand the application’s workload:
– What kind of queries/updates does it execute?
– How fast is the database growing?
– What is the desired performance metric?

Understanding Queries
For each query in the workload:
– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join conditions?
– How selective are these conditions likely to be?

Understanding Updates
For each update in the workload:
– Which attributes are involved in predicates?
– How selective are these conditions likely to be?
– What types of update operations are there, and which attributes do they affect?
– How often are records inserted/updated/deleted?

Consequences
Changing a database’s design does not magically make every query run faster.
– May require you to modify your queries and/or application logic.
APIs hide implementation details and can help prevent upstream apps from breaking when things change.

General DBA Advice
Modifying the physical design of a database is expensive.
DBAs usually do this when application demand is low.
– Typically Sunday mornings.
– May have to do it whenever the application changes.

Today’s Class
Introduction
Index Selection
Denormalization
Decomposition
Partitioning
Advanced Topics

Index Selection
Which relations should have indexes?
Which attribute(s) or expression(s) should be the search key?
In what order should the attributes appear in the index?
How many indexes should we create?
For each index, what kind of index should it be?

Example #1
    CREATE TABLE users (
      userID INT,
      servID VARCHAR,
      data VARCHAR,
      updated DATETIME,
      PRIMARY KEY (userId)
    );
    CREATE TABLE locations (
      locationID INT,
      servID VARCHAR,
      coordX FLOAT,
      coordY FLOAT
    );
Get the location coordinates of a service for any user with an id greater than some value and whose record was updated on a Tuesday.
    SELECT U.*, L.coordX, L.coordY
      FROM users AS U INNER JOIN locations AS L
        ON (U.servID = L.servID)
     WHERE U.userID > $1
       AND EXTRACT(dow FROM U.updated) = 2;

Example #1: Join Clause
Examine the attributes in the join clause
– Is there an index?
– What is the cardinality of the attributes?
    SELECT U.*, L.coordX, L.coordY
      FROM users AS U INNER JOIN locations AS L
        ON (U.servID = L.servID)
     WHERE U.userID > $1
       AND EXTRACT(dow FROM U.updated) = 2;

Example #1: Where Clause
Examine the attributes in the where clause
– Is there an index?
– How are they accessed?
    SELECT U.*, L.coordX, L.coordY
      FROM users AS U INNER JOIN locations AS L
        ON (U.servID = L.servID)
     WHERE U.userID > $1
       AND EXTRACT(dow FROM U.updated) = 2;

Example #1: Output Clause
Examine the query’s output clause
– What attributes from what tables are needed?
    SELECT U.*, L.coordX, L.coordY
      FROM users AS U INNER JOIN locations AS L
        ON (U.servID = L.servID)
     WHERE U.userID > $1
       AND EXTRACT(dow FROM U.updated) = 2;

Example #1: Summary
Join: U.servID, L.servID
Where: U.userID, U.updated
Output: U.userID, U.servID, U.data, U.updated, L.coordX, L.coordY
    SELECT U.*, L.coordX, L.coordY
      FROM users AS U INNER JOIN locations AS L
        ON (U.servID = L.servID)
     WHERE U.userID > $1
       AND EXTRACT(dow FROM U.updated) = 2;

Index Selection
We already have an index on U.userID.
– Why?
What if we created separate indexes for U.servID and L.servID?
    CREATE INDEX idx_u_servID ON users (servID);
    CREATE INDEX idx_l_servID ON locations (servID);

Index Selection (2)
We still have to look up U.updated.
What if we created another index?
    CREATE INDEX idx_u_servID ON users (servID);
    CREATE INDEX idx_l_servID ON locations (servID);
    CREATE INDEX idx_u_updated ON users (updated);
This doesn’t help our query. Why?
The predicate is on EXTRACT(dow FROM U.updated), not on U.updated itself, so index the expression instead:
    CREATE INDEX idx_u_updated ON users (
      EXTRACT(dow FROM updated));
    SELECT U.*, L.coordX, L.coordY
      FROM users AS U INNER JOIN locations AS L
        ON (U.servID = L.servID)
     WHERE U.userID > $1
       AND EXTRACT(dow FROM U.updated) = 2;
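
The difference between indexing the column and indexing the expression can be checked with a small experiment. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for Postgres, strftime('%w', ...) plays the role of EXTRACT(dow FROM ...), the index name idx_u_updated_dow and the sample rows are made up, and the exact plan text depends on the SQLite version.

```python
import sqlite3

# SQLite stand-in for the Postgres example (assumption). strftime('%w', ...)
# returns the day of week as text, '0' (Sunday) through '6' (Saturday).
DOW = "strftime('%w', updated)"
QUERY = f"SELECT * FROM users WHERE {DOW} = '2'"


def plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "userID INTEGER PRIMARY KEY, servID TEXT, data TEXT, updated TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", "2015-11-10 09:00:00"),  # a Tuesday
        (2, "b", "y", "2015-11-11 09:00:00"),  # a Wednesday
    ],
)

# Index on the raw column: the predicate is on an expression of the column,
# so the planner cannot use this index for a lookup.
conn.execute("CREATE INDEX idx_u_updated ON users (updated)")
plan_plain = plan(conn, QUERY)

# Index on the same expression that appears in the WHERE clause.
conn.execute(f"CREATE INDEX idx_u_updated_dow ON users ({DOW})")
plan_expr = plan(conn, QUERY)

tuesday_ids = [row[0] for row in conn.execute(QUERY)]
print(plan_plain)
print(plan_expr)
print(tuesday_ids)
```

The first plan is a full table scan; the second names the expression index. The query text has to use the same expression as the index definition for the planner to match them.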

Index Selection (3)
The query outputs L.coordX and L.coordY.
This means that we have to fetch the location record.
We can create a covering index.
    CREATE INDEX idx_u_servID ON users (servID);
    CREATE INDEX idx_u_updated ON users (
      EXTRACT(dow FROM updated));
    CREATE INDEX idx_l_servID ON locations (
      servID, coordX, coordY);
Only MSSQL at the time of this lecture (PostgreSQL 11 and later also accept INCLUDE):
    CREATE INDEX idx_l_servID ON locations (servID)
      INCLUDE (coordX, coordY);
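
A minimal sketch of what "covering" buys us, again assuming SQLite via Python's sqlite3 as a stand-in for Postgres and using made-up rows. When every column the query touches is in the index, the planner can answer from the index alone and never fetch the location record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations ("
    "locationID INTEGER, servID TEXT, coordX REAL, coordY REAL)"
)
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?, ?)",
    [(1, "a", 1.0, 2.0), (2, "b", 3.0, 4.0)],  # hypothetical sample rows
)

# Composite index holding the join key plus both output columns.
conn.execute("CREATE INDEX idx_l_servID ON locations (servID, coordX, coordY)")

query = "SELECT coordX, coordY FROM locations WHERE servID = ?"
detail = conn.execute("EXPLAIN QUERY PLAN " + query, ("a",)).fetchall()[0][3]
coords = conn.execute(query, ("a",)).fetchall()

print(detail)  # the plan text mentions a covering index
print(coords)
```

If the query also selected locationID, the index would no longer cover it and the planner would have to go back to the table for each match.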

Index Selection (4)
Can we do any better?
    CREATE INDEX idx_u_updated ON users (
      EXTRACT(dow FROM updated));
    CREATE INDEX idx_l_servID ON locations (
      servID, coordX, coordY);
Is the index on U.servID necessary?
    CREATE INDEX idx_u_servID ON users (servID);
Create a partial index instead:
    CREATE INDEX idx_u_servID ON users (servID)
     WHERE EXTRACT(dow FROM updated) = 2;
Repeat for the other six days of the week!
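
"Repeat for the other six days" is easy to script. The sketch below is an assumption-laden illustration, not the lecture's method: it targets SQLite through Python's sqlite3 (so strftime('%w', ...) replaces EXTRACT(dow FROM ...)), and the per-day index names are hypothetical.

```python
import sqlite3

# Position in the list is the day-of-week number (0 = Sunday), matching
# both Postgres EXTRACT(dow ...) and SQLite strftime('%w', ...).
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def partial_index_ddl(dow):
    """Build the CREATE INDEX statement for one day of the week."""
    return (
        f"CREATE INDEX idx_u_servID_{DAYS[dow]} ON users (servID) "
        f"WHERE strftime('%w', updated) = '{dow}'"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "userID INTEGER PRIMARY KEY, servID TEXT, data TEXT, updated TEXT)"
)
for dow in range(7):
    conn.execute(partial_index_ddl(dow))

index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
]
print(index_names)
```

Each row lands in exactly one of the seven indexes, so the total index size is about the same as one full index on servID, but a query for a single day only has to search that day's slice.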

Index Selection (5)
Should we make the index on users a covering index?
    CREATE INDEX idx_u_everything ON users
      (servID, userID, data)
     WHERE EXTRACT(dow FROM updated) = 2;
– What if U.data is large?
    CREATE INDEX idx_u_everything ON users
      (servID, userID)
     WHERE EXTRACT(dow FROM updated) = 2;
– Should userID come before servID?
    CREATE INDEX idx_u_everything ON users
      (userID, servID)
     WHERE EXTRACT(dow FROM updated) = 2;
– Do we still need the primary key index?

Other Index Decisions
What type of index to use?
– B+Tree, Hash table, Bitmap, R-Tree, Full Text

Today’s Class
Introduction
Index Selection
Denormalization
Decomposition
Partitioning
Advanced Topics

Denormalization
Joins can be expensive, so it might be better to denormalize two tables back into one.
This goes against all of the BCNF goodness that we talked about.
– But we have bills to pay, so this is an example where reality conflicts with the theory…

Game Example #1
    CREATE TABLE players (
      playerID INT PRIMARY KEY,
      ⋮
    );
    CREATE TABLE prefs (
      playerID INT PRIMARY KEY
        REFERENCES players (playerID),
      data VARBINARY
    );

Game Example #1
Get player preferences (1:1)
    SELECT P1.*, P2.*
      FROM players AS P1 INNER JOIN prefs AS P2
        ON P1.playerID = P2.playerID
     WHERE P1.playerID = $1

Game Example #1
Denormalize into parent table
    CREATE TABLE players (
      playerID INT PRIMARY KEY,
      data VARBINARY,
      ⋮
    );
    SELECT P1.*
      FROM players AS P1
     WHERE P1.playerID = $1
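
To make the 1:1 case concrete, here is a small sketch that folds prefs into the parent table and checks that the single-table lookup returns the same data as the join. The assumptions: SQLite via Python's sqlite3 stands in for Postgres, the name column and the players_denorm table name are hypothetical, and BLOB replaces VARBINARY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (playerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE prefs (
        playerID INTEGER PRIMARY KEY REFERENCES players (playerID),
        data BLOB
    );
    INSERT INTO players VALUES (1, 'alice');
    INSERT INTO prefs VALUES (1, x'CAFE');

    -- Denormalized copy: the prefs payload lives in the parent row.
    CREATE TABLE players_denorm (
        playerID INTEGER PRIMARY KEY, name TEXT, data BLOB
    );
    INSERT INTO players_denorm
        SELECT P1.playerID, P1.name, P2.data
          FROM players AS P1 LEFT JOIN prefs AS P2
            ON P1.playerID = P2.playerID;
    """
)

# Original access path: join the two tables on the shared key.
joined = conn.execute(
    "SELECT P1.playerID, P1.name, P2.data "
    "FROM players AS P1 INNER JOIN prefs AS P2 "
    "ON P1.playerID = P2.playerID WHERE P1.playerID = ?",
    (1,),
).fetchone()

# Denormalized access path: one primary-key lookup, no join.
single = conn.execute(
    "SELECT playerID, name, data FROM players_denorm WHERE playerID = ?",
    (1,),
).fetchone()

print(joined == single)
```

The LEFT JOIN in the copy step keeps players that have no prefs row; their data column is simply NULL in the merged table.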

Denormalization (1:n)
It’s harder to denormalize tables with a 1:n relationship.
– Why?
Example: There are multiple “game” instances in our application that players participate in. We need to keep track of each player’s score per game.

Game Example #2
    CREATE TABLE players (
      playerID INT PRIMARY KEY,
      ⋮
    );
    CREATE TABLE games (
      gameID INT PRIMARY KEY,
      ⋮
    );
    CREATE TABLE scores (
      gameID INT REFERENCES games (gameID),
      playerID INT REFERENCES players (playerID),
      score INT,
      PRIMARY KEY (gameID, playerID)
    );

Game Example #2
Get the list of playerIDs for a particular game sorted by their score.
    SELECT S.gameID, S.score, G.*, P.*
      FROM players AS P, games AS G, scores AS S
     WHERE G.gameID = $1
       AND G.gameID = S.gameID
       AND S.playerID = P.playerID
     ORDER BY S.score DESC

Game Example #2
Denormalize scores into the games table:
    CREATE TABLE players (
      playerID INT PRIMARY KEY,
      ⋮
    );
    CREATE TABLE games (
      gameID INT PRIMARY KEY,
      playerID1 INT REFERENCES players (playerID),
      score1 INT,
      playerID2 INT REFERENCES players (playerID),
      score2 INT,
      playerID3 INT REFERENCES players (playerID),
      score3 INT,
      ⋮
    );

Arrays
Denormalize 1:n relationships by storing multiple values in a single attribute.
– Oracle: VARRAY
– Postgres: ARRAY
– DB2/MSSQL/SQLite: UDTs
– MySQL: Fake it with VARBINARY
Requires you to modify your application to manage these arrays.
– DBMS will not enforce foreign key constraints.

Game Example #2
    CREATE TABLE players (
      playerID INT PRIMARY KEY,
      ⋮
    );
    CREATE TABLE games (
      gameID INT PRIMARY KEY,
      playerIDs INT ARRAY,
      scores INT ARRAY,
      ⋮
    );
Equivalent Postgres shorthand:
    CREATE TABLE games (
      gameID INT PRIMARY KEY,
      playerIDs INT[],
      scores INT[],
      ⋮
    );
    INSERT INTO games VALUES (
      1,                            --gameId
      '{4, 3, 1, 5, 2}',            --playerIDs
      '{900, 800, 700, 600, 500}'   --scores
    );
    SELECT P.*, G.*,
           G.scores[idx(G.playerIDs, P.playerID)] AS score
      FROM players AS P JOIN games AS G
        ON P.playerID = ANY(G.playerIDs)
     WHERE G.gameID = $1
     ORDER BY score DESC;
idx() is not a standard SQL function.
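
Storing parallel arrays pushes work into the application. The sketch below shows, in plain Python, the two chores the slides mention: matching each player to the score at the same array position (what the idx() lookup does in the query) and checking references by hand because the DBMS no longer enforces the foreign key. The function names are hypothetical.

```python
def scores_by_player(player_ids, scores):
    """Pair each player ID with the score at the same position, best first."""
    if len(player_ids) != len(scores):
        raise ValueError("playerIDs and scores must have the same length")
    return sorted(zip(player_ids, scores), key=lambda pair: pair[1], reverse=True)


def missing_players(player_ids, known_player_ids):
    """Application-side stand-in for the foreign-key check the DBMS skips."""
    return [pid for pid in player_ids if pid not in known_player_ids]


# The arrays from the INSERT above.
ranking = scores_by_player([4, 3, 1, 5, 2], [900, 800, 700, 600, 500])

# Suppose the players table only contains IDs 1 through 4.
dangling = missing_players([4, 3, 1, 5, 2], {1, 2, 3, 4})

print(ranking)
print(dangling)
```

Nothing stops the two arrays from drifting out of sync or from naming a deleted player, which is why the length check and the reference check have to live in application code.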

Game Example #2
    CREATE TABLE players (
      playerID INT PRIMARY KEY,
      ⋮
    );
    CREATE TABLE games (
      gameID INT PRIMARY KEY,
      playerScores INT[][], -- (playerId, score)
      ⋮
    );
No easy way to query this in pure SQL…

Today’s Class
Introduction
Index Selection
Denormalization
Decomposition
Partitioning
Advanced Topics

Decomposition
Split physical tables up to reduce the overhead of reading data during query execution.
– Vertical: Break off attributes from a table.
– Horizontal: Split data from a single table into multiple tables.
This is an application of normalization.

Wikipedia Example
    CREATE TABLE pages (
      pageID INT PRIMARY KEY,
      title VARCHAR UNIQUE,
      latest INT REFERENCES revisions (revID),
      updated DATETIME
    );
    CREATE TABLE revisions (
      revID INT PRIMARY KEY,
      pageID INT REFERENCES pages (pageID),
      content TEXT,
      updated DATETIME
    );

Wikipedia Example
Load latest revision for page
    SELECT P.*, R.*
      FROM pages AS P INNER JOIN revisions AS R
        ON P.latest = R.revID
     WHERE P.pageID = $1
Get all revision history for page
    SELECT R.revID, R.updated,...
      FROM revisions AS R
     WHERE R.pageID = $1

Wikipedia Example
    CREATE TABLE pages (
      pageID INT PRIMARY KEY,
      title VARCHAR UNIQUE,
      latest INT REFERENCES revisions (revID),
      updated DATETIME
    );
    CREATE TABLE revisions (
      revID INT PRIMARY KEY,
      pageID INT REFERENCES pages (pageID),
      content TEXT,
      updated DATETIME
    );
Avg. Size of Wikipedia Revision: ~16KB

Vertical Decomposition
Split out large attributes into a separate table (aka “normalize”).
Trade-offs:
– Some queries will have to perform a join to get the data that they need.
– But other queries will read less data.

Vertical Decomposition
    CREATE TABLE pages (
      pageID INT PRIMARY KEY,
      title VARCHAR UNIQUE,
      latest INT REFERENCES revisions (revID),
      updated DATETIME
    );
    CREATE TABLE revisions (
      revID INT PRIMARY KEY,
      pageID INT REFERENCES pages (pageID),
      updated DATETIME
    );
    CREATE TABLE revData (
      revID INT REFERENCES revisions (revID),
      content TEXT
    );
    SELECT P.*, R.*, RD.*
      FROM pages AS P, revisions AS R, revData AS RD
     WHERE P.pageID = $1
       AND P.latest = R.revID
       AND R.revID = RD.revID

Horizontal Decomposition
Replace a single table with multiple tables where tuples are assigned to a table based on some condition.
Can mask the changes to the application using views and triggers.

Horizontal Decomposition
    CREATE TABLE revisions (
      revID INT PRIMARY KEY,
      pageID INT REFERENCES pages (pageID),
      updated DATETIME
    );
All new revisions are first added to this table:
    CREATE TABLE revDataNew (
      revID INT REFERENCES revisions (revID),
      content TEXT
    );
Then a revision is moved to this table when it is no longer the latest:
    CREATE TABLE revDataOld (
      revID INT REFERENCES revisions (revID),
      content TEXT
    );
    CREATE VIEW revData AS
      (SELECT * FROM revDataNew)
      UNION
      (SELECT * FROM revDataOld)
    SELECT R.revID, R.updated,...
      FROM revisions AS R
     WHERE R.pageID = $1
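
The move between the two tables can be sketched as follows. This is one possible arrangement, not the lecture's implementation: it uses SQLite via Python's sqlite3, does the move in application code rather than a trigger, models a single page only, and writes the view with UNION ALL and no parentheses because SQLite does not accept parenthesized SELECTs in a compound query. The add_revision helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE revDataNew (revID INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE revDataOld (revID INTEGER PRIMARY KEY, content TEXT);
    -- The view hides the split from readers.
    CREATE VIEW revData AS
        SELECT * FROM revDataNew
        UNION ALL
        SELECT * FROM revDataOld;
    """
)


def add_revision(conn, rev_id, content, previous_latest=None):
    """Insert the new latest revision and demote the previous one."""
    with conn:  # one transaction covers the move and the insert
        if previous_latest is not None:
            conn.execute(
                "INSERT INTO revDataOld "
                "SELECT * FROM revDataNew WHERE revID = ?",
                (previous_latest,),
            )
            conn.execute(
                "DELETE FROM revDataNew WHERE revID = ?", (previous_latest,)
            )
        conn.execute("INSERT INTO revDataNew VALUES (?, ?)", (rev_id, content))


add_revision(conn, 1, "first draft")
add_revision(conn, 2, "second draft", previous_latest=1)

new_ids = [r[0] for r in conn.execute("SELECT revID FROM revDataNew")]
old_ids = [r[0] for r in conn.execute("SELECT revID FROM revDataOld")]
all_ids = [r[0] for r in conn.execute("SELECT revID FROM revData ORDER BY revID")]
print(new_ids, old_ids, all_ids)
```

Because the two tables are disjoint by construction, UNION ALL returns the same rows as UNION while skipping the duplicate-elimination step.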

Today’s Class
Introduction
Index Selection
Denormalization
Decomposition
Partitioning
Advanced Topics

Partitioning
Split a single logical table into disjoint physical segments that are stored/managed separately.
Ideally partitioning is transparent to the application.
– The application accesses logical tables and doesn’t care how things are stored.
– Not always true.

Vertical Partitioning
Store a table’s attributes in a separate location (e.g., file, disk volume).
Have to store tuple information to reconstruct the original record.
[Figure: Partition #1 holds revID, pageID, and updated for Tuple#1 through Tuple#4; Partition #2 holds content for the same four tuples.]

Horizontal Partitioning
Divide the tuples of a table up into disjoint segments based on some partitioning key.
– Hash Partitioning
– Range Partitioning
– Predicate Partitioning
We will cover this more in depth when we talk about distributed databases.
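
The three schemes differ only in how a tuple is mapped to a segment. A rough Python sketch of each mapping follows; the function names, the integer keys, and the boundary values are illustrative assumptions, not part of any DBMS API.

```python
import bisect


def hash_partition(key, num_partitions):
    """Hash partitioning: spread integer keys across partitions by remainder."""
    return key % num_partitions


def range_partition(key, upper_bounds):
    """Range partitioning over sorted, exclusive upper bounds.

    Keys at or above the last bound fall into one extra, final partition.
    """
    return bisect.bisect_right(upper_bounds, key)


def predicate_partition(row, predicates):
    """Predicate partitioning: index of the first predicate the row satisfies."""
    for i, predicate in enumerate(predicates):
        if predicate(row):
            return i
    raise ValueError("row matches no partition predicate")


# Hypothetical predicates: partition 0 holds latest revisions, partition 1 the rest.
is_latest_predicates = [
    lambda row: row["isLatest"],
    lambda row: not row["isLatest"],
]

print(hash_partition(7, 4))
print(range_partition(150, [100, 200]))
print(predicate_partition({"isLatest": False}, is_latest_predicates))
```

Hash partitioning balances load but scatters neighboring keys; range partitioning keeps neighboring keys together, which helps range scans but can create hot partitions.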

Horizontal Partitioning (Postgres)
    CREATE TABLE revisions (
      revID INT PRIMARY KEY,
      pageID INT REFERENCES pages (pageID),
      updated DATETIME
    );
    CREATE TABLE revData (
      revID INT REFERENCES revisions (revID),
      content TEXT,
      isLatest BOOLEAN DEFAULT true
    );
    CREATE TABLE revDataNew (
      CHECK (isLatest = true)
    ) INHERITS (revData);
    CREATE TABLE revDataOld (
      CHECK (isLatest = false)
    ) INHERITS (revData);
Still need triggers to move data between partitions on update.

Today’s Class
Introduction
Index Selection
Denormalization
Decomposition
Partitioning
Advanced Topics

Caching
Queries for content that rarely changes slow down the database.
Use an external cache to store objects.
– Memcached, Facebook Tao
– Application has to maintain consistency.
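
"Application has to maintain consistency" means the application both fills the cache and invalidates it. A minimal cache-aside sketch, under assumptions: a Python dict stands in for an external cache such as Memcached, SQLite via sqlite3 stands in for the database, and get_title, set_title, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (pageID INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO pages VALUES (1, 'Physical design')")

cache = {}                # stands in for an external cache
stats = {"db_reads": 0}   # counts how often we actually query the database


def get_title(page_id):
    """Read through the cache; query the database only on a miss."""
    if page_id in cache:
        return cache[page_id]
    stats["db_reads"] += 1
    row = conn.execute(
        "SELECT title FROM pages WHERE pageID = ?", (page_id,)
    ).fetchone()
    title = row[0] if row else None
    cache[page_id] = title
    return title


def set_title(page_id, title):
    """Write to the database, then invalidate the cached copy ourselves."""
    conn.execute("UPDATE pages SET title = ? WHERE pageID = ?", (title, page_id))
    cache.pop(page_id, None)  # the DBMS knows nothing about the cache


first = get_title(1)
second = get_title(1)  # served from the cache
reads_before_update = stats["db_reads"]
set_title(1, "Physical database design")
third = get_title(1)   # miss after invalidation, goes back to the database
print(first, second, third, stats["db_reads"])
```

If set_title forgot the cache.pop call, readers would keep seeing the old title until the entry was evicted, which is exactly the consistency burden the slide refers to.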

Auto-Tuning
Vendors include tools that can help with the physical design process:
– IBM DB2 Advisor
– Microsoft AutoAdmin
– Oracle SQL Tuning Advisor
– Random MySQL/Postgres tools
Still a very manual process.
We are working on something better…


Next Three Weeks
Database System Internals
– Concurrency Control
– Logging & Recovery
– Distributed DBMSs
– Column Store DBMSs