CS405G: Introduction to Database Systems
Announcement. Today: review. Friday: go over homework; course evaluation.
Materials (review for final): book; slides (all should be on the course website); homework; quizzes; mid-term.
Database Design. Jinze Liu @ University of Kentucky, 11/16/2018
E-R model: entities, attributes, relationships.
From E-R Diagram to Relation Schemas: converting an E-R diagram to relations; keys (superkeys, candidate keys, primary keys); relational integrity constraints.
Key Constraints
Superkey (uniqueness constraint): a set of attributes on which no two distinct tuples can have the same values. Every relation has at least one superkey: the set of all attributes.
Key: a minimal superkey. It satisfies the uniqueness constraint (it is a superkey) and the minimality constraint (no attribute can be removed while still satisfying the uniqueness constraint).
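The two definitions can be checked mechanically against a relation instance. The sketch below is only an illustration: the relation `rows` and its attribute names are made up, and a check on one instance can refute a key but cannot prove that it holds for every valid instance of the schema.

```python
def is_superkey(rows, attrs):
    """True if no two distinct tuples in rows agree on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """A key is a superkey from which no attribute can be removed."""
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


# Hypothetical instance, assumed to contain no duplicate tuples.
rows = [
    {"sid": 1, "name": "Ann", "dept": "CS"},
    {"sid": 2, "name": "Bob", "dept": "CS"},
    {"sid": 3, "name": "Ann", "dept": "EE"},
]

print(is_superkey(rows, ["sid", "name"]))  # unique, but not minimal
print(is_key(rows, ["sid", "name"]))
print(is_key(rows, ["sid"]))
```

Here `{sid, name}` is a superkey but not a key, because `sid` alone is already unique.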
Relational Integrity Constraints: constraints are conditions that must hold on all valid relation instances. There are four main types of constraints: domain constraints (the value of an attribute must come from its domain), key constraints, entity integrity constraints, and referential integrity constraints.
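To see the four constraint types being enforced, here is a small sketch using Python's built-in `sqlite3` module. The `dept` and `student` tables are hypothetical, and the `CHECK` clause is just one way to approximate a domain constraint. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE dept (
    name TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE student (
    sid  TEXT NOT NULL PRIMARY KEY,              -- key + entity integrity
    age  INTEGER CHECK (age BETWEEN 0 AND 150),  -- domain constraint
    dept TEXT REFERENCES dept(name)              -- referential integrity
);
INSERT INTO dept VALUES ('CS');
INSERT INTO student VALUES ('s1', 20, 'CS');
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "domain": violates("INSERT INTO student VALUES ('s2', 200, 'CS')"),
    "key": violates("INSERT INTO student VALUES ('s1', 21, 'CS')"),
    "entity": violates("INSERT INTO student VALUES (NULL, 21, 'CS')"),
    "referential": violates("INSERT INTO student VALUES ('s3', 21, 'EE')"),
}
print(results)
```

Each of the four inserts breaks exactly one constraint type, so all of them are rejected and the table keeps its single valid row.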
Database Normalization: functional dependency; functional closure; keys redefined based on functional dependency; DB normal forms: 1st, 2nd, 3rd, BCNF.
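As a review aid, closure and "keys redefined based on functional dependency" can be sketched in a few lines. The schema R(A, B, C, D) and its FDs below are invented for illustration, and the key search is a brute-force enumeration, fine for exam-sized schemas but not an efficient algorithm.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs; fds is a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


# Hypothetical schema R(A, B, C, D) with AB -> C, C -> D, D -> A.
schema = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]

print(closure({"C"}, fds))
print(candidate_keys(schema, fds))
```

A set of attributes is a superkey exactly when its closure is the whole schema, and a key when no proper subset has that property.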
Three types of non-key FD X -> A: partial dependency, transitive dependency I, transitive dependency II. [Figure: one diagram per type, showing X -> A relative to the key.] Luke Huan, Univ. of Kansas, 11/16/2018
3NF: R is in Third Normal Form (3NF) if for every non-trivial FD X -> A (where A is a single attribute), either X is a superkey of R, or A is a member of at least one key of R. Intuitively, BCNF decomposition on X -> A would “break” the key containing A. [Figure: partial dependency, transitive dependency I, and transitive dependency II set against 2NF, 3NF, and BCNF.]
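The 3NF condition can be tested directly from its definition. The following is a sketch under stated assumptions: the address schema and its two FDs are a made-up example, only the FDs listed are checked, and candidate keys are found by brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(k <= attrs for k in keys) and closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


def violations(schema, fds):
    """Return (BCNF violations, 3NF violations) among the given FDs X -> A."""
    prime = set().union(*candidate_keys(schema, fds))  # members of some key
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # X is a superkey: allowed by both normal forms
        for a in sorted(rhs - lhs):  # ignore the trivial part of the FD
            bcnf.append((frozenset(lhs), a))
            if a not in prime:  # 3NF also allows A to be part of a key
                third.append((frozenset(lhs), a))
    return bcnf, third


# Hypothetical schema: {street, city} -> zip and zip -> city.
schema = {"street", "city", "zip"}
fds = [({"street", "city"}, {"zip"}), ({"zip"}, {"city"})]

bcnf, third = violations(schema, fds)
print(bcnf)   # zip -> city violates BCNF
print(third)  # but not 3NF, because city belongs to a key
```

In this example `zip -> city` has a non-superkey left side, so it violates BCNF, yet `city` is a member of the key `{city, street}`, so the relation is still in 3NF.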
Database Query
Relational Algebra and SQL: SQL queries (SFW; GROUP BY …, HAVING; subqueries); relationship between R.A. and SQL.
Relational algebra: a language for querying relational databases based on operators. [Figure: RelOp operator diagram.] Core set of operators: selection, projection, cross product, union, difference, and renaming. Additional, derived operators: join, natural join, intersection, etc. Compose operators to make complex queries. We are going to cover this in one day! Possible because of the minimalist approach.
Summary of core operators: Selection: σ_p R. Projection: π_L R. Cross product: R × S. Union: R ∪ S. Difference: R - S. Renaming: ρ_S(A1, A2, …) R (does not really add “processing” power).
Summary of derived operators: Join: R ⋈_p S. Natural join: R ⋈ S. Intersection: R ∩ S.
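The derived operators add no new power: each can be written with core operators only. A minimal sketch, assuming relations are Python sets of positional tuples (so renaming is implicit) and using made-up relations `R(a, b)` and `S(b, c)`:

```python
# Core operators over relations represented as sets of tuples.
def select(pred, r):      # σ_p R
    return {t for t in r if pred(t)}


def project(cols, r):     # π_L R, with L given as column positions
    return {tuple(t[i] for i in cols) for t in r}


def cross(r, s):          # R × S
    return {t + u for t in r for u in s}


def difference(r, s):     # R - S
    return r - s


# Derived operators, built only from the core ones above.
def join(pred, r, s):     # R ⋈_p S = σ_p (R × S)
    return select(pred, cross(r, s))


def intersection(r, s):   # R ∩ S = R - (R - S)
    return difference(r, difference(r, s))


# Hypothetical relations R(a, b) and S(b, c).
R = {(1, "x"), (2, "y")}
S = {("x", 10), ("z", 30)}

# Natural join: equate the shared column b, then drop its second copy.
natural = project([0, 1, 3], join(lambda t: t[1] == t[2], R, S))
print(natural)
print(intersection({(1,), (2,)}, {(2,), (3,)}))
```

The natural join is written out by hand here because with positional tuples the shared columns have to be named explicitly.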
Classification of relational operators: Selection (σ_p R), projection (π_L R), cross product (R × S), join (R ⋈_p S), natural join (R ⋈ S), union (R ∪ S), and intersection (R ∩ S) are monotone. Difference (R - S) is monotone w.r.t. R and non-monotone w.r.t. S.
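An operator is monotone if adding tuples to an input never removes tuples from the output. The sketch below illustrates this with small made-up sets; a single passing example does not prove monotonicity, but the failing difference case is a genuine counterexample.

```python
def preserved(before, after):
    """True if every output tuple from before the input grew is still present."""
    return before <= after


R, S = {1, 2, 3}, {3}
R_more, S_more = R | {4}, S | {2}  # the same inputs with one extra tuple each

results = {
    "union, grow S": preserved(R | S, R | S_more),
    "intersection, grow S": preserved(R & S, R & S_more),
    "difference, grow R": preserved(R - S, R_more - S),
    "difference, grow S": preserved(R - S, R - S_more),
}
print(results)
```

Only the last case fails: adding 2 to S removes 2 from R - S.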
Update Operations on Relations: INSERT a tuple; DELETE a tuple; MODIFY a tuple. Constraints should not be violated by updates.
Basic queries: SFW statement
SELECT A1, A2, …, An FROM R1, R2, …, Rm WHERE condition;
Also called an SPJ (select-project-join) query. (Almost) equivalent to the relational algebra query π_A1, A2, …, An (σ_condition (R1 × R2 × … × Rm)).
Semantics of SFW
SELECT E1, E2, …, En FROM R1, R2, …, Rm WHERE condition;
For each t1 in R1:
  For each t2 in R2:
    … …
      For each tm in Rm:
        If condition is true over t1, t2, …, tm:
          Compute and output E1, E2, …, En as a row
t1, t2, …, tm are often called tuple variables. Not 100% correct, we will see.
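The nested-loop reading above translates almost line for line into code. This is a sketch of the semantics only, not of how a DBMS actually evaluates a query; the `student` and `enroll` relations are hypothetical, and `itertools.product` stands in for the m nested loops.

```python
from itertools import product


def sfw(select_exprs, relations, condition):
    """Evaluate SELECT E1..En FROM R1..Rm WHERE condition by nested loops."""
    output = []
    for tuples in product(*relations):   # one tuple variable per relation
        if condition(*tuples):
            output.append(tuple(expr(*tuples) for expr in select_exprs))
    return output


# Hypothetical relations.
student = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enroll = [
    {"sid": 1, "cid": "CS405"},
    {"sid": 1, "cid": "CS315"},
    {"sid": 2, "cid": "CS405"},
]

# SELECT s.name FROM student s, enroll e WHERE s.sid = e.sid;
rows = sfw(
    select_exprs=[lambda s, e: s["name"]],
    relations=[student, enroll],
    condition=lambda s, e: s["sid"] == e["sid"],
)
print(rows)
```

The output is a list, so a student enrolled in two courses appears twice, matching SQL's behavior when `DISTINCT` is not specified.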
Operational semantics of GROUP BY
SELECT … FROM … WHERE … GROUP BY …;
Compute FROM.
Compute WHERE.
Compute GROUP BY: group rows according to the values of the GROUP BY columns.
Compute SELECT for each group. For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group.
Number of groups = number of rows in the final output.
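The four steps can be followed in order in a short sketch. The `enroll` rows and the helper name `group_by` are made up for illustration, and aggregates are passed as plain Python functions over the rows of one group.

```python
from collections import defaultdict


def group_by(rows, where, group_cols, aggregates):
    """FROM -> WHERE -> GROUP BY -> SELECT, producing one output row per group."""
    groups = defaultdict(list)
    for row in rows:                                  # FROM
        if where(row):                                # WHERE
            key = tuple(row[c] for c in group_cols)   # GROUP BY
            groups[key].append(row)
    # SELECT: evaluate every aggregate once per group.
    return [
        key + tuple(agg(members) for agg in aggregates)
        for key, members in groups.items()
    ]


# Hypothetical relation.
enroll = [
    {"dept": "CS", "sid": 1, "grade": 90},
    {"dept": "CS", "sid": 1, "grade": 80},
    {"dept": "CS", "sid": 2, "grade": 70},
    {"dept": "EE", "sid": 3, "grade": 85},
    {"dept": "EE", "sid": 4, "grade": 40},
]

# SELECT dept, COUNT(*), COUNT(DISTINCT sid) FROM enroll
# WHERE grade >= 50 GROUP BY dept;
result = group_by(
    enroll,
    where=lambda r: r["grade"] >= 50,
    group_cols=["dept"],
    aggregates=[
        lambda g: len(g),                       # COUNT(*)
        lambda g: len({r["sid"] for r in g}),   # COUNT(DISTINCT sid)
    ],
)
print(result)
```

The set comprehension in the second aggregate is the "eliminate duplicates within the group" step for a DISTINCT input.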
A DBMS Overview
Physical data organization
Storage hierarchy (Lexington vs. Pluto) → count I/Os.
Disk geometry: three components of access cost; random vs. sequential I/O.
Data layout: record layout (handling variable-length fields, NULLs); block layout (NSM, PAX) → inter-/intra-record locality.
Access paths: primary versus secondary indexes; tree-based indexes (ISAM, B+-tree) → again, reintroduce redundancy to improve performance → fundamental trade-off: query versus update cost.