CS405G: Introduction to Database Systems Final Review
7/2/2015, Jinze, University of Kentucky

Database Design
E-R model
– Entities
– Attributes
– Relationships
Database Design
From E-R Diagram to Relations
– Relations
– Schemas
– Converting E-R diagram to relations
– Keys
  – Super keys
  – Candidate keys
  – Primary keys
– Relational integrity constraints
Key Constraints
Superkey (uniqueness constraint):
– A set of attributes on which no two distinct tuples can have the same values
– Every relation has at least one superkey: the set of all its attributes
Key: a minimal superkey
– Uniqueness constraint: it is a superkey
– Minimality constraint: no attribute can be removed without violating the uniqueness constraint
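The two conditions can be checked mechanically on a single relation instance. The sketch below is illustrative only: the `students` data and the function names are hypothetical, and a check on one instance can refute a key but cannot prove it, because a key constraint must hold on all valid instances.

```python
def is_superkey(rows, attrs):
    """True if no two distinct tuples agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """A key is a superkey from which no single attribute can be removed."""
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


# Hypothetical relation instance, one dict per tuple.
students = [
    {"sid": 1, "name": "Ann", "dept": "CS"},
    {"sid": 2, "name": "Bob", "dept": "CS"},
    {"sid": 3, "name": "Ann", "dept": "EE"},
]

print(is_superkey(students, ["sid", "name", "dept"]))  # all attributes
print(is_key(students, ["sid"]))                       # minimal superkey
print(is_key(students, ["sid", "name"]))               # superkey, not minimal
```

In this particular instance `["name", "dept"]` also passes `is_key`, which shows why keys are declared from the meaning of the data rather than inferred from one snapshot.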
Relational Integrity Constraints
Constraints are conditions that must hold on all valid relation instances. There are four main types of constraints:
1. Domain constraints: the value of an attribute must come from its domain
2. Key constraints
3. Entity integrity constraints
4. Referential integrity constraints
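As a rough illustration of the four constraint types, the sketch below uses Python's built-in `sqlite3` module. The `dept` and `student` tables are hypothetical, and modelling the domain constraint with a `CHECK` clause is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dept (
    dname TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE student (
    sid   TEXT NOT NULL PRIMARY KEY,        -- key + entity integrity
    age   INTEGER CHECK (age >= 0),         -- domain constraint (assumed form)
    dname TEXT REFERENCES dept(dname)       -- referential integrity
);
INSERT INTO dept VALUES ('CS');
INSERT INTO student VALUES ('s1', 20, 'CS');
""")


def rejected(sql):
    """Run one statement; report whether the DBMS refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


violations = {
    "domain": rejected("INSERT INTO student VALUES ('s2', -5, 'CS')"),
    "key": rejected("INSERT INTO student VALUES ('s1', 30, 'CS')"),
    "entity integrity": rejected("INSERT INTO student VALUES (NULL, 30, 'CS')"),
    "referential integrity": rejected("INSERT INTO student VALUES ('s3', 30, 'EE')"),
}
for kind, was_rejected in violations.items():
    print(kind, "violation rejected:", was_rejected)
```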
Database Normalization
– Functional dependency
– Functional closure
– Keys, redefined based on functional dependency
– DB normal forms: 1st, 2nd, 3rd, BCNF
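Redefining keys through functional dependencies usually goes through the attribute closure. The sketch below assumes that "functional closure" here means the closure of an attribute set under a set of FDs; the relation R(A, B, C, D) and its FDs are made up for illustration.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """attrs is a superkey if its closure covers every attribute."""
    return closure(attrs, fds) >= set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A superkey that stops being one when any single attribute is dropped."""
    attrs = set(attrs)
    return is_superkey(attrs, all_attrs, fds) and all(
        not is_superkey(attrs - {a}, all_attrs, fds) for a in attrs
    )


# Hypothetical schema R(A, B, C, D) with A -> B and B -> C.
R = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))            # attributes determined by A
print(is_candidate_key({"A", "D"}, R, fds))   # covers R and is minimal
print(is_candidate_key({"A", "B", "D"}, R, fds))
```

Dropping one attribute at a time is enough for the minimality test because any superset of a superkey is itself a superkey.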
Database Query (7/2/2015, Luke Huan, Univ. of Kansas)
Relational Algebra and SQL
– Relational algebra
– SQL query
  – SFW
  – GROUP BY …, HAVING
  – Subqueries
– Relationship between R.A. and SQL
Relational algebra
A language for querying relational databases based on operators:
[Figure: RelOp]
– Core set of operators: selection, projection, cross product, union, difference, and renaming
– Additional, derived operators: join, natural join, intersection, etc.
– Compose operators to make complex queries
Summary of core operators
– Selection: σ_p R
– Projection: π_L R
– Cross product: R × S
– Union: R ∪ S
– Difference: R - S
– Renaming: ρ_{S(A1, A2, …)} R
  – Does not really add "processing" power
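A minimal sketch of the six core operators, assuming a relation is represented as a Python list of dicts (one dict per tuple, attribute name to value). The representation, function names, and sample data are illustrative choices, not part of the course material.

```python
def _dedupe(rows):
    """Relations are sets: drop duplicate rows, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def select(pred, r):                 # selection: keep rows satisfying pred
    return [row for row in r if pred(row)]


def project(attrs, r):               # projection: keep listed attributes
    return _dedupe({a: row[a] for a in attrs} for row in r)


def cross(r, s):                     # cross product (attribute names disjoint)
    return [{**x, **y} for x in r for y in s]


def union(r, s):                     # union (same schema assumed)
    return _dedupe(r + s)


def difference(r, s):                # difference (same schema assumed)
    return [row for row in r if row not in s]


def rename(mapping, r):              # renaming: old name -> new name
    return [{mapping.get(a, a): v for a, v in row.items()} for row in r]


R = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
S = [{"a": 2, "b": "y"}, {"a": 3, "b": "z"}]

print(union(R, S))
print(difference(R, S))
print(cross(R, rename({"a": "c", "b": "d"}, S)))
```

Renaming is used here only to make the attribute names disjoint before the cross product, which matches the remark that it adds no real processing power.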
Summary of derived operators
– Join: R ⋈_p S
– Natural join: R ⋈ S
– Intersection: R ∩ S
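"Derived" means each of these can be written using only core operators. The sketch below shows one way to do that, again assuming relations are lists of dicts; the `_s` suffix used for renaming and the sample tables are hypothetical and assume no original attribute name already ends that way.

```python
def cross(r, s):
    return [{**x, **y} for x in r for y in s]


def select(pred, r):
    return [row for row in r if pred(row)]


def difference(r, s):
    return [row for row in r if row not in s]


def theta_join(pred, r, s):
    """Join = selection applied to the cross product."""
    return select(pred, cross(r, s))


def natural_join(r, s):
    """Rename shared attributes of s, equate them, then project them away."""
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])
    renamed = [
        {(a + "_s" if a in shared else a): v for a, v in row.items()}
        for row in s
    ]
    matched = theta_join(
        lambda t: all(t[a] == t[a + "_s"] for a in shared), r, renamed
    )
    dropped = {a + "_s" for a in shared}
    return [{a: v for a, v in row.items() if a not in dropped} for row in matched]


def intersection(r, s):
    """Intersection written with difference only: R - (R - S)."""
    return difference(r, difference(r, s))


enroll = [{"sid": 1, "cid": "CS405"}, {"sid": 2, "cid": "CS315"}]
course = [{"cid": "CS405", "title": "Databases"}]
print(natural_join(enroll, course))
```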
Classification of relational operators
Operator        Notation    Monotonicity
Selection       σ_p R       Monotone
Projection      π_L R       Monotone
Cross product   R × S       Monotone
Join            R ⋈_p S     Monotone
Natural join    R ⋈ S       Monotone
Union           R ∪ S       Monotone
Difference      R - S       Monotone w.r.t. R; non-monotone w.r.t. S
Intersection    R ∩ S       Monotone
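An operator is monotone in an input if adding tuples to that input never removes tuples from the output. A small sketch with Python sets (the values are arbitrary) shows why difference is the exception:

```python
R = {1, 2, 3}
S = {3}

before = R - S                     # baseline output of R - S
after_grow_R = (R | {4}) - S       # add a tuple to R
after_grow_S = R - (S | {2})       # add a tuple to S

# Growing R can only grow the output: monotone w.r.t. R.
print(before <= after_grow_R)
# Growing S removed the tuple 2 from the output: non-monotone w.r.t. S.
print(before <= after_grow_S)

# Intersection, by contrast, is monotone in both inputs.
print((R & S) <= ((R | {4}) & S), (R & S) <= (R & (S | {2})))
```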
Update Operations on Relations
– Update operations
  – INSERT a tuple
  – DELETE a tuple
  – MODIFY a tuple
– Constraints should not be violated in updates
Basic queries: SFW statement
SELECT A1, A2, …, An
FROM R1, R2, …, Rm
WHERE condition;
– Also called an SPJ (select-project-join) query
– (Almost) equivalent to the relational algebra query
  π_{A1, A2, …, An} (σ_condition (R1 × R2 × … × Rm))
Semantics of SFW
SELECT E1, E2, …, En
FROM R1, R2, …, Rm
WHERE condition;
– For each t1 in R1:
    For each t2 in R2:
      … …
        For each tm in Rm:
          If condition is true over t1, t2, …, tm:
            Compute and output E1, E2, …, En as a row
– t1, t2, …, tm are often called tuple variables
– Not 100% correct, we will see
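The nested-loop reading above translates almost directly into code. The following is a sketch of that conceptual semantics only, not of how a DBMS actually evaluates a query; the `sfw` helper and the `student` and `enroll` tables are hypothetical.

```python
from itertools import product


def sfw(select_exprs, relations, condition):
    """Evaluate SELECT exprs FROM relations WHERE condition by brute force."""
    output = []
    # product() iterates like nested loops: one tuple variable per relation.
    for tuples in product(*relations):
        if condition(*tuples):
            output.append(tuple(expr(*tuples) for expr in select_exprs))
    return output


# Hypothetical tables: student(sid, name), enroll(sid, cid).
student = [(1, "Ann"), (2, "Bob")]
enroll = [(1, "CS405"), (2, "CS405"), (1, "CS315")]

# SELECT s.name, e.cid FROM student s, enroll e WHERE s.sid = e.sid;
rows = sfw(
    [lambda s, e: s[1], lambda s, e: e[1]],
    [student, enroll],
    lambda s, e: s[0] == e[0],
)
print(rows)
```

Note that this sketch appends every qualifying row to a list, so duplicate output rows are kept rather than eliminated.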
Operational semantics of GROUP BY
SELECT … FROM … WHERE … GROUP BY …;
– Compute FROM
– Compute WHERE
– Compute GROUP BY: group rows according to the values of GROUP BY columns
– Compute SELECT for each group
  – For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group
– Number of groups = number of rows in the final output
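The four steps can be followed literally in a few lines. This is a sketch of the operational semantics for a single input table; the helper names and the `enroll` data are made up for illustration.

```python
def group_by_query(rows, where, group_cols, aggregates):
    """FROM rows WHERE where GROUP BY group_cols, one output row per group."""
    groups = {}
    for row in rows:                                    # compute FROM
        if where(row):                                  # compute WHERE
            key = tuple(row[c] for c in group_cols)     # compute GROUP BY
            groups.setdefault(key, []).append(row)
    # Compute SELECT once per group: grouping columns, then aggregates.
    return [
        key + tuple(agg(members) for agg in aggregates)
        for key, members in groups.items()
    ]


def count_all(members):
    return len(members)


def count_distinct(col):
    # DISTINCT: eliminate duplicates within the group before counting.
    return lambda members: len({m[col] for m in members})


enroll = [
    {"dept": "CS", "sid": 1, "grade": 4},
    {"dept": "CS", "sid": 1, "grade": 3},
    {"dept": "CS", "sid": 2, "grade": 1},
    {"dept": "EE", "sid": 3, "grade": 3},
]

# SELECT dept, COUNT(*), COUNT(DISTINCT sid)
# FROM enroll WHERE grade >= 2 GROUP BY dept;
result = group_by_query(
    enroll,
    where=lambda r: r["grade"] >= 2,
    group_cols=["dept"],
    aggregates=[count_all, count_distinct("sid")],
)
print(result)
```

The output has exactly one row per group, which is the last point on the slide.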
Database Design
Physical data organization
– Storage hierarchy (DC vs. Pluto) → count I/O's
– Disk geometry: three components of access cost; random vs. sequential I/O
– Data layout
  – Record layout (handling variable-length fields, NULLs)
  – Block layout (NSM, PAX) → inter-/intra-record locality
– Access paths
  – Primary versus secondary indexes
  – Tree-based indexes: ISAM, B+-tree
  → Again, reintroduce redundancy to improve performance
  → Fundamental trade-off: query versus update cost
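The three components of disk access cost are commonly taken to be seek time, rotational delay, and transfer time. The sketch below compares random and sequential I/O under that model; the millisecond figures are placeholder assumptions, not measurements of any real disk.

```python
# Hypothetical per-operation costs in milliseconds (assumed values).
SEEK_MS = 5.0        # move the arm to the right track
ROTATION_MS = 3.0    # wait for the block to rotate under the head
TRANSFER_MS = 0.05   # read one block once positioned


def random_io_ms(num_blocks):
    """Every block pays seek + rotational delay + transfer."""
    return num_blocks * (SEEK_MS + ROTATION_MS + TRANSFER_MS)


def sequential_io_ms(num_blocks):
    """Position once, then stream consecutive blocks."""
    return SEEK_MS + ROTATION_MS + num_blocks * TRANSFER_MS


for n in (1, 100, 10_000):
    print(n, random_io_ms(n), sequential_io_ms(n))
```

Under these assumed numbers the gap grows with the number of blocks, which is one reason to lay out related records so that they can be read sequentially.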
Performance Issues on Indexes
– Indexes
  – ISAM
  – B+ tree
– Metrics
  – Storage
  – I/O costs
– Operations
  – Single-value query and range query
  – Insertion and deletion
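For the I/O-cost metric, a back-of-the-envelope estimate for a B+ tree can be sketched as follows. This is an idealized model that assumes every node is full and holds `fanout` entries, with one I/O per node visited; real trees are partly empty and upper levels are often cached, so treat the numbers as rough lower bounds.

```python
def btree_levels(num_keys, fanout):
    """Levels in an idealized tree whose nodes each hold `fanout` entries."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


def point_lookup_ios(levels):
    """Single-value query: one node read per level, root to leaf."""
    return levels


def range_query_ios(levels, matching_keys, fanout):
    """Descend to the first leaf, then follow the leaf chain."""
    leaf_pages = max(1, -(-matching_keys // fanout))   # ceiling division
    return levels + leaf_pages - 1


levels = btree_levels(1_000_000, 100)
print(levels, point_lookup_ios(levels), range_query_ios(levels, 250, 100))
```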
Query Processing Implementation
– Typical query processing operations
  – Selection
  – Join
  – Set operations
– Typical approaches
  – Sequential scans in unsorted database
  – Sorted database
  – What are the trade-offs?
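One way to see the trade-off is to implement the same range selection both ways. The sketch below works on an in-memory list of integer keys as a stand-in for a stored table; the function names and data are illustrative.

```python
from bisect import bisect_left, bisect_right, insort


def scan_select(keys, lo, hi):
    """Unsorted data: every key must be examined."""
    return [k for k in keys if lo <= k <= hi]


def sorted_select(sorted_keys, lo, hi):
    """Sorted data: binary search for both ends, then one contiguous slice."""
    start = bisect_left(sorted_keys, lo)
    end = bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


unsorted_keys = [42, 7, 19, 88, 3, 61, 19]
sorted_keys = sorted(unsorted_keys)

print(sorted(scan_select(unsorted_keys, 10, 50)))
print(sorted_select(sorted_keys, 10, 50))

# The price of sortedness: an insert must keep the order, which shifts
# elements, whereas the unsorted list can simply append.
unsorted_keys.append(25)
insort(sorted_keys, 25)
```

The scan touches every key but makes insertion trivial; the sorted layout answers the query with two binary searches but pays on every update.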