CSE 544 Data Models and Views Lecture #4 Wednesday, January 19, 2011 Dan Suciu -- 544, Winter 2011 1.

Slides:



Advertisements
Similar presentations
1 Lecture 5: SQL Schema & Views. 2 Data Definition in SQL So far we have see the Data Manipulation Language, DML Next: Data Definition Language (DDL)
Advertisements

Chapter 10: Designing Databases
CMSC724: Database Management Systems Instructor: Amol Deshpande
Introduction to Databases
1 Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes November 14, 2007.
Lecture 03: Views and Constraints Wednesday, October 13, 2010 Dan Suciu -- CSEP544 Fall
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
1 Introduction to Database Systems CSE 444 Lecture 05: Views, Constraints April 9, 2008.
1 Data Definition in SQL So far we have see the Data Manipulation Language, DML Next: Data Definition Language (DDL) Data types: Defines the types. Data.
1 Lecture 06: SQL Friday, January 14, Outline Indexes Defining Views (6.7) Constraints (Chapter 7) We begin E/R diagrams (Chapter 2)
File Systems and Databases
IS 4420 Database Fundamentals Chapter 6: Physical Database Design and Performance Leon Chen.
1 Lecture 05: SQL Wednesday, October 8, Outline Outer joins (6.3.8) Database Modifications (6.5) Defining Relation Schema in SQL (6.6) Indexes.
Chapter 1: Data Models and DBMS Architecture Title: What Goes Around Comes Around Authors: M. Stonebraker, J. Hellerstein Pages: 2-40.
Ch1: File Systems and Databases Hachim Haddouti
Chapter 4 Relational Databases Copyright © 2012 Pearson Education, Inc. publishing as Prentice Hall 4-1.
Chapter 4 Relational Databases Copyright © 2012 Pearson Education 4-1.
Exercises Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone.
SQL (almost end) April 26 th, Agenda HAVING clause Views Modifying views Reusing views.
LECTURE 2 DATABASE SYSTEM CONCEPTS AND ARCHITECTURE.
IST Databases and DBMSs Todd S. Bacastow January 2005.
Copyright © 2004 Pearson Education, Inc. Chapter 1 Introduction.
1 DATABASE TECHNOLOGIES BUS Abdou Illia, Fall 2007 (Week 3, Tuesday 9/4/2007)
Information storage: Introduction of database 10/7/2004 Xiangming Mu.
CSC271 Database Systems Lecture # 30.
Module Title? DBMS Introduction to Database Management System.
Database Technical Session By: Prof. Adarsh Patel.
Database System Concepts and Architecture
Chapter 7: Database Systems Succeeding with Technology: Second Edition.
Architecture for a Database System
File Processing - Database Overview MVNC1 DATABASE SYSTEMS Overview.
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
Physical Database Design I, Ch. Eick 1 Physical Database Design I About 25% of Chapter 20 Simple queries:= no joins, no complex aggregate functions Focus.
Announcements. Data Management Chapter 12 Traditional File Approach  Structure Field  Record  File  Fixed All records have common fields, and a field.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
Lecture2: Database Environment Prepared by L. Nouf Almujally 1 Ref. Chapter2 Lecture2.
Lecture # 3 & 4 Chapter # 2 Database System Concepts and Architecture Muhammad Emran Database Systems 1.
Database Management COP4540, SCS, FIU Physical Database Design (ch. 16 & ch. 3)
1 CS 430 Database Theory Winter 2005 Lecture 2: General Concepts.
Chapter 8 Data and Knowledge Management. 2 Learning Objectives When you finish this chapter, you will  Know the difference between traditional file organization.
Data resource management
Methodology – Physical Database Design for Relational Databases.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
3/6: Data Management, pt. 2 Refresh your memory Relational Data Model
9-1 © Prentice Hall, 2007 Topic 9: Physical Database Design Object-Oriented Systems Analysis and Design Joey F. George, Dinesh Batra, Joseph S. Valacich,
Lecture 03: Normal Forms, Constraints, Views Wednesday, October 12, 2011 Dan Suciu -- CSEP544 Fall
1 Introduction to Database Systems CSE 444 Lecture 04: SQL April 7, 2008.
1 Lecture 5: Outerjoins, Schema Creation and Views Wednesday, January 15th, 2003.
11-1 © Prentice Hall, 2004 Chapter 11: Physical Database Design Object-Oriented Systems Analysis and Design Joey F. George, Dinesh Batra, Joseph S. Valacich,
Presentation on Database management Submitted To: Prof: Rutvi Sarang Submitted By: Dharmishtha A. Baria Roll:No:1(sem-3)
1 10 Systems Analysis and Design in a Changing World, 2 nd Edition, Satzinger, Jackson, & Burd Chapter 10 Designing Databases.
1 Lecture 06 Data Modeling: E/R Diagrams Wednesday, January 18, 2006.
1 Lecture 05: SQL Wednesday, October 8, Outline Database Modifications (6.5) Defining Relation Schema in SQL (6.6) Indexes Defining Views (6.7)
1 Lecture 03 Views, Constraints Tuesday, January 23, 2007.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Introduction: Databases and Database Systems Lecture # 1 June 19,2012 National University of Computer and Emerging Sciences.
Lecture 05: SQL Wednesday, January 12, 2005.
Database Management:.
Introduction to Database Systems CSE 444 Lecture 04: SQL
Cse 344 May 18th – Loss and views.
MANAGING DATA RESOURCES
Lecture 05 Views, Constraints
Lecture 06: SQL Monday, October 11, 2004.
UNIT-I Introduction to Database Management Systems
Lecture 05: SQL Wednesday, October 9, 2002.
Lecture 14: SQL Wednesday, October 31, 2001.
Presentation transcript:

CSE 544 Data Models and Views Lecture #4 Wednesday, January 19, 2011 Dan Suciu , Winter

Announcements Projects: please sign up to meet with me on Friday, between 11-1pm (need about 15’). Before that do this: –Form team –Choose project –Think, so we can have a meaningful discussion Homework 1: due on Monday, 12pm (before the lecture) Wayne State, 1/18/2011 Dan Suciu: Queries on Probabilistic Data 2

CSE Fall 2009 References M. Stonebraker and J. Hellerstein. What Goes Around Comes Around. In "Readings in Database Systems" (aka the Red Book). 4th ed. 3

Data Model Motivation User is concerned with real-world data –Data represents different aspects of user’s business –Data typically includes entities and relationships between them –Example entities are students, courses, products, clients –Example relationships are course registrations, product purchases User somehow needs to define data to be stored in DBMS Data model enables a user to define the data using high-level constructs without worrying about many low-level details of how data will be stored on disk CSE Fall

Levels of Abstraction Disk Physical Schema Conceptual Schema External Schema a.k.a logical schema describes stored data in terms of data model includes storage details file organization indexes schema seen by apps 5 Classical picture. Remember it ! Classical picture. Remember it !

CSE Fall 2009 Outline Different types of data Early data models –IMS –CODASYL Physical and logical independence in the relational model Other data models 6

CSE Fall 2009 Different Types of Data Structured data –What is this ? Examples ? Semistructured data –What is this ? –Examples ? Unstructured data –What is this ? Examples ? 7

CSE Fall 2009 Different Types of Data Structured data –All data conforms to a schema. Ex: business data Semistructured data –Some structure in the data but implicit and irregular –Ex: resume, ads Unstructured data –No structure in data. Ex: text, sound, video, images Our focus: structured data & relational DBMSs 8

CSE Fall 2009 Outline Different types of data Early data models –IMS – late 1960’s and 1970’s –CODASYL – 1970’s Physical and logical independence in the relational model Other data models 9

CSE Fall 2009 Early Proposal 1: IMS What is it ? 10

CSE Fall 2009 Early Proposal 1: IMS Hierarchical data model Record –Type: collection of named fields with data types (+) –Instance: must match type definition (+) –Each instance must have a key (+) –Record types must be arranged in a tree (-) IMS database is collection of instances of record types organized in a tree 11

CSE Fall 2009 IMS Example See Figure 2 in paper “What goes around comes around” 12

CSE Fall 2009 Data Manipulation Language: DL/1 How does a programmer retrieve data in IMS ? 13

CSE Fall 2009 Data Manipulation Language: DL/1 Each record has a hierarchical sequence key (HSK) –Records are totally ordered: depth-first and left-to-right HSK defines semantics of commands: –get_next –get_next_within_parent DL/1 is a record-at-a-time language –Programmer constructs an algorithm for solving the query –Programmer must worry about query optimization 14

CSE Fall 2009 Data storage How is the data physically stored in IMS ? 15

CSE Fall 2009 Data storage Root records –Stored sequentially (sorted on key) –Indexed in a B-tree using the key of the record –Hashed using the key of the record Dependent records –Physically sequential –Various forms of pointers Selected organizations restrict DL/1 commands –No updates allowed with sequential organization –No “get-next” for hashed organization 16

CSE Fall 2009 Data Independence What is it ? 17

CSE Fall 2009 Data Independence Physical data independence: Applications are insulated from changes in physical storage details Logical data independence: Applications are insulated from changes to logical structure of the data Why are these properties important? –Reduce program maintenance as –Logical database design changes over time –Physical database design tuned for performance 18

CSE Fall 2009 IMS Limitations Tree-structured data model –Redundant data, existence depends on parent, artificial structure Record-at-a-time user interface –User must specify algorithm to access data Very limited physical independence –Phys. organization limits possible operations –Application programs break if organization changes Provides some logical independence –DL/1 program runs on logical database –Difficult to achieve good logical data independence with a tree model 19

CSE Fall 2009 Early Proposal 2: CODASYL What is it ? 20

CSE Fall 2009 Early Proposal 2: CODASYL Networked data model Primitives are also record types with keys (+) Network model is more flexible than hierarchy(+) –Ex: no existence dependence Record types are organized into network (-) –A record can have multiple parents –Arcs between records are named –At least one entry point to the network Record-at-a-time data manipulation language (-) 21

CSE Fall 2009 CODASYL Example See Figure 5 in paper “What goes around comes around” 22

CSE Fall 2009 CODASYL Limitations No physical data independence –Application programs break if organization changes No logical data independence –Application programs break if organization changes Very complex Programs must “navigate the hyperspace” Load and recover as one gigantic object 23

CSE Fall 2009 Outline Different types of data Early data models –IMS –CODASYL Physical and logical independence in the relational model Other data models 24

CSE Fall 2009 Relational Model Overview Proposed by Ted Codd in 1970 Motivation: better logical and physical data independence 25

Relational Model Overview Defines logical schema only –No physical schema Set-at-a-time query language Wayne State, 1/18/2011 Dan Suciu: Queries on Probabilistic Data 26

CSE Fall 2009 Physical Independence Definition: Applications are insulated from changes in physical storage details Early models (IMS and CODASYL): No Relational model: Yes –Yes through set-at-a-time language: algebra or calculus –No specification of what storage looks like –Administrator can optimize physical layout 27

CSE Fall 2009 Physical Independence Definition: Applications are insulated from changes in physical storage details Early models (IMS and CODASYL): No Relational model: Yes –Yes through set-at-a-time language: algebra or calculus –No specification of what storage looks like –Administrator can optimize physical layout 28

CSE Fall 2009 Logical Independence Definition: Applications are insulated from changes to logical structure of the data Early models –IMS: some logical independence –CODASYL: no logical independence Relational model –Yes through views 29

Great Debate Pro relational –What where the arguments ? Against relational –What where the arguments ? How was it settled ? CSE Fall

Great Debate Pro relational –CODASYL is too complex –CODASYL does not provide sufficient data independence –Record-at-a-time languages are too hard to optimize –Trees/networks not flexible enough to represent common cases Against relational –COBOL programmers cannot understand relational languages –Impossible to represent the relational model efficiently –CODASYL can represent tables Ultimately settled by the market place CSE Fall

CSE Fall 2009 Outline Different types of data Early data models –IMS –CODASYL Physical and logical independence in the relational model Other data models 32

Other Data Models Entity-Relationship: 1970’s –Successful in logical database design (next lecture) Extended Relational: 1980’s Semantic: late 1970’s and 1980’s Object-oriented: late 1980’s and early 1990’s –Address impedance mismatch: relational dbs  OO languages –Interesting but ultimately failed (several reasons, see paper) Object-relational: late 1980’s and early 1990’s –User-defined types, ops, functions, and access methods Semi-structured: late 1990’s to the present CSE Fall

CSE Fall 2009 Summary Data independence is desirable –Both physical and logical –Early data models provided very limited data independence –Relational model facilitates data independence Set-at-a-time languages facilitate phys. indep. [more next lecture] Simple data models facilitate logical indep. [more next lecture] Flat models are also simpler, more flexible User should specify what they want not how to get it –Query optimizer does better job than human New data model proposals must –Solve a “major pain” or provide significant performance gains 34

Views Dan Suciu , Winter Views are relations, but may not be physically stored. For presenting different information to different users Employee(ssn, name, department, project, salary) Payroll has access to Employee, others only to Developers CREATE VIEW Developers AS SELECT name, project FROM Employee WHERE department = ‘Development’ CREATE VIEW Developers AS SELECT name, project FROM Employee WHERE department = ‘Development’

Example Dan Suciu , Winter CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price) “virtual table”

Dan Suciu , Winter SELECT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 We can later use the view: Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

Types of Views Virtual views: –Pros/cons ??? Materialized views –Pros/cons ?? Dan Suciu , Winter

Types of Views Virtual views: –Used in databases –Computed only on-demand – slow at runtime –Always up to date Materialized views –Used in data warehouses –Pre-computed offline – fast at runtime –May have stale data or expensive synchronization Dan Suciu , Winter

Query Modification Dan Suciu , Winter SELECT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname View: Query: Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

Query Modification Dan Suciu , Winter SELECT u.customer, v.store FROM (SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname) u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT u.customer, v.store FROM (SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname) u, Purchase v WHERE u.customer = v.customer AND u.price > 100 Modified query: Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

Query Modification Dan Suciu , Winter SELECT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname SELECT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname Modified and unnested query: Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

Another Example Dan Suciu , Winter SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 ?? Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

Answer Dan Suciu , Winter SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price) SELECT DISTINCT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname SELECT DISTINCT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname

Applications of Virtual Views Physical data independence. E.g. –Vertical data partitioning –Horizontal data partitioning Security –The view reveals only what the users are allowed to know Dan Suciu , Winter

Vertical Partitioning SSNNameAddressResumePicture MaryHustonClob1…Blob1… SueSeattleClob2…Blob2… JoanSeattleClob3…Blob3… AnnPortlandClob4…Blob4… Resumes SSNNameAddress MaryHuston SueSeattle... SSNResume Clob1… Clob2… SSNPicture Blob1… Blob2… T1 T2T3

Vertical Partitioning Dan Suciu , Winter CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn and T2.ssn=T3.ssn CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn and T2.ssn=T3.ssn

Vertical Partitioning Dan Suciu , Winter SELECT address FROM Resumes WHERE name = ‘Sue’ SELECT address FROM Resumes WHERE name = ‘Sue’ Which of the tables T1, T2, T3 will be queried by the system ?

Vertical Partitioning When to do this: When some fields are large, rarely accessed –E.g. Picture In distributed databases –Customer info site 1, customer orders at site 2 In data integration –T1 comes from one source –T2 comes from a different source Dan Suciu , Winter

Horizontal Partitioning 50 SSNNameCity MaryHuston SueSeattle JoanSeattle AnnPortland --FrankSpokane --JeanSpokane Customers SSNNameCity MaryHuston CustomersInHuston SSNNameCity SueSeattle JoanSeattle CustomersInSeattle SSNNameCity --FrankSpokane --JeanSpokane CustomersInCanada

Horizontal Partitioning Dan Suciu , Winter CREATE VIEW Customers AS CustomersInHuston UNION ALL CustomersInSeattle UNION ALL... CREATE VIEW Customers AS CustomersInHuston UNION ALL CustomersInSeattle UNION ALL... SSNNameCity MaryHuston CustomersInHuston SSNNameCity SueSeattle JoanSeattle CustomersInSeattle SSNNameCity --FrankSpokane --JeanSpokane CustomersInCanada

Horizontal Partitioning Dan Suciu , Winter SELECT name FROM Cusotmers WHERE city = ‘Seattle’ SELECT name FROM Cusotmers WHERE city = ‘Seattle’ SSNNameCity MaryHuston CustomersInHuston SSNNameCity SueSeattle JoanSeattle CustomersInSeattle SSNNameCity --FrankSpokane --JeanSpokane CustomersInCanada Which tables are inspected by the system ?

Horizontal Partitioning Dan Suciu , Winter SELECT name FROM Cusotmers WHERE city = ‘Seattle’ SELECT name FROM Cusotmers WHERE city = ‘Seattle’ Which tables are inspected by the system ? SSNNameCity MaryHuston CustomersInHuston SSNNameCity SueSeattle JoanSeattle CustomersInSeattle SSNNameCity --FrankSpokane --JeanSpokane CustomersInCanada All ! The system doesn’t know where ‘Seattle’ is

Better Dan Suciu , Winter CREATE VIEW Customers AS SELECT *, ‘Huston’ AS City FROM CustomersInHuston UNION ALL SELECT *, ‘Seattle’ AS City FROM CustomersInSeattle UNION ALL... CREATE VIEW Customers AS SELECT *, ‘Huston’ AS City FROM CustomersInHuston UNION ALL SELECT *, ‘Seattle’ AS City FROM CustomersInSeattle UNION ALL... SSNNameCity MaryHuston CustomersInHuston SSNNameCity SueSeattle JoanSeattle CustomersInSeattle SSNNameCity --FrankSpokane --JeanSpokane CustomersInCanada

Better Dan Suciu , Winter SELECT name FROM Cusotmers WHERE city = ‘Seattle’ SELECT name FROM Cusotmers WHERE city = ‘Seattle’ SELECT name FROM CusotmersInSeattle SELECT name FROM CusotmersInSeattle

Horizontal Partitioning Applications: Optimizations: –E.g. archived applications and active applications Distributed databases Data integration Dan Suciu , Winter

SQL Security Model Discretionary access control: –Users × Tables × {SELECT, INSERT, UPDATE, …} –GRANT and REVOKE commands Coarse grained ! Now row-level access control: –Each customer is allowed to see his/her own records Views are quick fix to that Dan Suciu , Winter

Views and Security Dan Suciu , Winter NameAddressBalance MaryHuston SueSeattle-240 JoanSeattle AnnPortland-520 Customers: Fred is not allowed to see this How do we grant Fred access only to Name/Address ?

Views and Security Dan Suciu , Winter NameAddressBalance MaryHuston SueSeattle-240 JoanSeattle AnnPortland-520 Grant Fred access to this Customers: Fred is not allowed to see this CREATE VIEW PublicCustomers SELECT Name, Address FROM Customers CREATE VIEW PublicCustomers SELECT Name, Address FROM Customers

Views and Security Dan Suciu , Winter NameAddressBalance MaryHuston SueSeattle-240 JoanSeattle AnnPortland-520 Customers: John is not allowed to see >0 balances How do we grant John access only to delinquent accounts ?

Views and Security Dan Suciu , Winter NameAddressBalance MaryHuston SueSeattle-240 JoanSeattle AnnPortland-520 Customers: John is not allowed to see >0 balances CREATE VIEW BadCreditCustomers SELECT * FROM Customers WHERE Balance < 0 CREATE VIEW BadCreditCustomers SELECT * FROM Customers WHERE Balance < 0

Technical Problems in Virtual Views Simplifying queries over virtual views Updating virtual views Dan Suciu , Winter

Set v.s. Bag Semantics Dan Suciu , Winter SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM R, S, T WHERE... Set semantics Bag semantics

Unnesting Queries Inner query: set/bag semantics Outer query: set/bag semantics When can we unnest ? Dan Suciu , Winter

SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE...

SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE...

SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE...

SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... NO

SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... NO

Updating Virtual Views V(A1, A2,..) = view over R1, R2, … Insert/modify/delete in/from V Can we push this to R1, R2, … ? –Updatable view = yes. –Non-updatable view = no. Dan Suciu , Winter

Updatable View Dan Suciu , Winter CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 Purchase(customer, product, store) Product(pname, price) INSERT INTO Expensive-Product VALUES(‘Gizmo’) INSERT INTO Expensive-Product VALUES(‘Gizmo’)

Updatable View Dan Suciu , Winter CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 INSERT INTO Product VALUES(‘Gizmo’, NULL) INSERT INTO Product VALUES(‘Gizmo’, NULL) Purchase(customer, product, store) Product(pname, price) INSERT INTO Expensive-Product VALUES(‘Gizmo’) INSERT INTO Expensive-Product VALUES(‘Gizmo’)

Updatable View Dan Suciu , Winter CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) Purchase(customer, product, store) Product(pname, price)

Updatable View Dan Suciu , Winter CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) INSERT INTO Purchase VALUES(‘Joe’,’Gizmo’,NULL) INSERT INTO Purchase VALUES(‘Joe’,’Gizmo’,NULL) Note this Purchase(customer, product, store) Product(pname, price)

Nonupdatable Views Dan Suciu , Winter INSERT INTO CustomerPrice VALUES(‘Joe’, 200) ? ? ? ? ? Most views are non-updateable CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname Purchase(customer, product, store) Product(pname, price)

Query Minimization under Bag Semantics Rule 1: If: x, y are tuple variables over the same table and: The condition x.key = y.key is in the WHERE clause Then combine x, y into a single variable query Dan Suciu , Winter

Query Minimization under Bag Semantics SELECT y.name, x.date FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.price 150 SELECT y.name, x.date FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.price 150 Order(cid, pid, weight, date) Product(pid, name, price) SELECT y.name, x.date FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 SELECT y.name, x.date FROM Order x, Product y WHERE x.pid = y.pid and y.price 150

Query Minimization under Bag Semantics Rule 2: If x ranges over S, y ranges over T, and The condition x.fk = y.key is in the WHERE clause, and there is a not null constraint on x.fk y is not used anywhere else, and Then remove T (and y) from the query Dan Suciu , Winter

Query Minimization under Bag Semantics Order(cid, pid, weight, date) Product(pid, name, price) SELECT x.cid, x.date FROM Order x WHERE x.weight > 20 SELECT x.cid, x.date FROM Order x WHERE x.weight > 20 SELECT x.cid, x.date FROM Order x, Product y WHERE x.pid = y.pid and x.weight > 20 SELECT x.cid, x.date FROM Order x, Product y WHERE x.pid = y.pid and x.weight > 20 What constraints do we need to have for this optimization ?

Materialized Views The result of the view is materialized May speed up query answering significantly But the materialized view needs to be synchronized with the base data Dan Suciu , Winter

Applications of Materialized Views Indexes Denormalization Semantic caching Dan Suciu , Winter

Indexes 82 REALLY important to speed up query processing time. SELECT * FROM Person WHERE name = 'Smith' SELECT * FROM Person WHERE name = 'Smith' CREATE INDEX myindex05 ON Person(name) Person (name, age, city) May take too long to scan the entire Person table Now, when we rerun the query it will be much faster

B+ Tree Index Dan Suciu , Winter AdamBettyCharles….Smith…. We will discuss them in detail in a later lecture.

Creating Indexes 84 Indexes can be created on more than one attribute: SELECT * FROM Person WHERE age = 55 AND city = 'Seattle' Helps in: SELECT * FROM Person WHERE city = 'Seattle' But not in: CREATE INDEX doubleindex ON Person (age, city) Example: SELECT * FROM Person WHERE age = 55 and even in:

CREATE INDEX W ON Product(weight) CREATE INDEX P ON Product(price) CREATE INDEX W ON Product(weight) CREATE INDEX P ON Product(price) Indexes are Materialized Views SELECT weight, price FROM Product WHERE weight > 10 and price < 100 SELECT weight, price FROM Product WHERE weight > 10 and price < 100 Product(pid, name, weight, price, …) W(pid, weight) P(pid, price) SELECT x.weight, y.price FROM W x, P y WHERE x.weight > 10 and y.price < 100 and x.pid = y.pid SELECT x.weight, y.price FROM W x, P y WHERE x.weight > 10 and y.price < 100 and x.pid = y.pid

Denormalization Compute a view that is the join of several tables The view is now a relation that is not in normal form WHY ? Dan Suciu , Winter CREATE VIEW CustomerPrice AS SELECT * FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPrice AS SELECT * FROM Purchase x, Product y WHERE x.product = y.pname Purchase(customer, product, store) Product(pname, price)

Semantic Caching Queries Q1, Q2, … have been executed, and their results are stored in main memory Now we need to compute a new query Q Sometimes we can use the prior results in answering Q These queries can be seen as materialized views 87

Technical Challenges in Managing Views Synchronizing materialized views –A.k.a. incremental view maintenance, or incremental view update Answering queries using views Dan Suciu , Winter

Synchronizing Materialized Views Immediate synchronization = after each update Deferred synchronization –Lazy = at query time –Periodic –Forced = manual 89 Which one is best for: indexes, data warehouses, replication ?

Incremental View Update Dan Suciu , Winter CREATE VIEW FullOrder AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid CREATE VIEW FullOrder AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid UPDATE Product SET price = price / 2 WHERE pid = ‘12345’ UPDATE Product SET price = price / 2 WHERE pid = ‘12345’ Order(cid, pid, date) Product(pid, name, price) UPDATE FullOrder SET price = price / 2 WHERE pid = ‘12345’ UPDATE FullOrder SET price = price / 2 WHERE pid = ‘12345’ No need to recompute the entire view !

Incremental View Update 91 CREATE VIEW Categories AS SELECT DISTINCT category FROM Product CREATE VIEW Categories AS SELECT DISTINCT category FROM Product DELETE Product WHERE pid = ‘12345’ DELETE Product WHERE pid = ‘12345’ Product(pid, name, category, price) DELETE Categories WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’) DELETE Categories WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’) It doesn’t work ! Why ? How can we fix it ?

Incremental View Update 92 CREATE VIEW Categories AS SELECT category, count(*) as c FROM Product GROUP BY category CREATE VIEW Categories AS SELECT category, count(*) as c FROM Product GROUP BY category DELETE Product WHERE pid = ‘12345’ DELETE Product WHERE pid = ‘12345’ Product(pid, name, category, price) UPDATE Categories SET c = c-1 WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’); DELETE Categories WHERE c = 0 UPDATE Categories SET c = c-1 WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’); DELETE Categories WHERE c = 0

Answering Queries Using Views We have several materialized views: –V1, V2, …, Vn Given a query Q –Answer it by using views instead of base tables Variation: Query rewriting using views –Answer it by rewriting it to another query first Example: if the views are indexes, then we rewrite the query to use indexes Dan Suciu , Winter

Rewriting Queries Using Views 94 Purchase(buyer, seller, product, store) Person(pname, city) CREATE VIEW SeattleView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x.pname = y.buyer CREATE VIEW SeattleView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x.pname = y.buyer SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ Goal: rewrite this query in terms of the view Have this materialized view:

Rewriting Queries Using Views Dan Suciu , Winter SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT buyer, seller FROM SeattleView WHERE product= ‘gizmo’ SELECT buyer, seller FROM SeattleView WHERE product= ‘gizmo’

Rewriting is not always possible 96 CREATE VIEW DifferentView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y, Product z WHERE x.city = ‘Seattle’ AND x.pname = y.buyer AND y.product = z.name AND z.price < 100 CREATE VIEW DifferentView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y, Product z WHERE x.city = ‘Seattle’ AND x.pname = y.buyer AND y.product = z.name AND z.price < 100 SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT buyer, seller FROM DifferentView WHERE product= ‘gizmo’ SELECT buyer, seller FROM DifferentView WHERE product= ‘gizmo’ “Maximally contained rewriting”