Published by Alan Stephens. Modified over 9 years ago.
1
© D. Wong 2003
Normalization
Purpose: a process to eliminate redundancy in relations due to functional or multivalued dependencies.
Decompose the relation schema into normal forms:
– Boyce-Codd Normal Form (BCNF)
– Third Normal Form (3NF)
– Fourth Normal Form (4NF)
To obtain the new relations, project the original relation (e.g. Movie) onto the new schemas.
To recover the information (i.e. Movie) from the new relations, take their natural join.
2
BCNF Decomposition
Example 3.24, p. 104
Relation: Movie(title, year, length, filmType, studioName, starName)
Key: {title, year, starName}
FD: title year → length filmType studioName is a BCNF violation because {title, year} is not a superkey, so Movie is not in BCNF.
Decomposition:
– Schema 1: {title, year, length, filmType, studioName}
– Schema 2: {title, year, starName}
To obtain the new relations, project Movie onto the two schemas.
To recover the information (i.e. Movie) from the new relations, take their natural join. The decomposition does not lose information.
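The decomposition on this slide can be tried out on a tiny relation instance. The following is a minimal sketch, assuming a handful of made-up Movie tuples; the helper names `project` and `natural_join` are hypothetical and not part of the original slides.

```python
# Hypothetical sample tuples; attribute order follows the Movie schema on the slide.
ATTRS = ("title", "year", "length", "filmType", "studioName", "starName")
movie = {
    ("Star Wars", 1977, 124, "color", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "color", "Fox", "Mark Hamill"),
    ("Mighty Ducks", 1991, 104, "color", "Disney", "Emilio Estevez"),
}

def project(rows, attrs, onto):
    """Project a set of tuples with schema `attrs` onto the attributes in `onto`."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two relations; returns (rows, attribute tuple)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for t in r1:
        for u in r2:
            if all(t[attrs1.index(a)] == u[attrs2.index(a)] for a in common):
                out.add(t + tuple(u[attrs2.index(a)] for a in extra))
    return out, tuple(attrs1) + tuple(extra)

S1 = ("title", "year", "length", "filmType", "studioName")
S2 = ("title", "year", "starName")
r1 = project(movie, ATTRS, S1)   # movie facts are now stored once per movie
r2 = project(movie, ATTRS, S2)   # one tuple per (movie, star) pair
joined, joined_attrs = natural_join(r1, S1, r2, S2)
print(joined == movie)
```

The length, film type and studio of "Star Wars" appear once in `r1` instead of once per star, and joining `r1` with `r2` on title and year gives back the original tuples.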
3
Functional Dependencies (FD)
Given: relation schema R(A1, …, An), and let X and Y be subsets of {A1, …, An}.
FD: X → Y means X functionally determines Y, i.e. any two tuples of R that agree on X also agree on Y.
e.g. A1 A2 … An → B1 B2 … Bm
An FD is an assertion about R that two attributes or sets of attributes in R are dependent on one another.
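One way to make the definition concrete is to test an FD against a relation instance. This is a minimal sketch with made-up tuples; the function name `fd_holds` is hypothetical. Note that an instance can only refute an FD: an FD is an assertion about every legal instance of the schema, so a passing check on sample data is not a proof.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for this X; any later mismatch violates the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample data.
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "starName": "Emilio Estevez"},
]

length_fd = fd_holds(movies, ["title", "year"], ["length"])    # holds on this instance
star_fd = fd_holds(movies, ["title", "year"], ["starName"])    # violated: two stars for one movie
print(length_fd, star_fd)
```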
4
Multivalued Dependencies (MVD)
Given: relation schema R, and let A1 A2 … An and B1 B2 … Bm be subsets of the attributes of R.
MVD: A1 A2 … An →→ B1 B2 … Bm holds in R if:
For each pair of tuples t and u of relation R that agree on all the A's, we can find in R some tuple v that agrees:
1. With both t and u on the A's,
2. With t on the B's, and
3. With u on all attributes of R that are not among the A's or B's.
An MVD is an assertion about R that two attributes or sets of attributes in R are independent of one another.
MVDs cause redundancy not related to FDs in a BCNF schema.
Most common source: putting two or more many-many relationships in a single relation.
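The t, u, v definition translates almost directly into a check on a relation instance. The sketch below is an illustration only: the function name `mvd_holds` and the sample star, addresses and films are assumptions, not taken from the slides.

```python
from itertools import product

def mvd_holds(rows, attrs, lhs, rhs):
    """Check lhs ->> rhs on a relation instance using the t, u, v definition."""
    rel = {tuple(r[a] for a in attrs) for r in rows}
    for t, u in product(rows, repeat=2):
        if all(t[a] == u[a] for a in lhs):
            # v agrees with t on the A's and B's, and with u on everything else.
            v = tuple(t[a] if (a in lhs or a in rhs) else u[a] for a in attrs)
            if v not in rel:
                return False
    return True

ATTRS = ["name", "street", "city", "title", "year"]
addresses = [("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu")]
films = [("Star Wars", 1977), ("Empire Strikes Back", 1980)]
# Every address is paired with every film: addresses and films are independent.
stars = [dict(zip(ATTRS, ("C. Fisher", s, c, t, y)))
         for (s, c) in addresses for (t, y) in films]

full = mvd_holds(stars, ATTRS, ["name"], ["street", "city"])
partial = mvd_holds(stars[:-1], ATTRS, ["name"], ["street", "city"])
print(full, partial)
```

Dropping a single tuple from the full address-by-film combination breaks the MVD, which is why a relation like this has to repeat every address for every film.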
5
MVD Rules
Trivial dependencies rule: if A1 A2 … An →→ B1 B2 … Bm holds for R, then A1 A2 … An →→ C1 C2 … Ck holds, where the C's are the B's plus one or more of the A's. The converse also holds.
Transitive rule: if A1 A2 … An →→ B1 B2 … Bm and B1 B2 … Bm →→ C1 C2 … Ck, then A1 A2 … An →→ C1 C2 … Ck.
Splitting rule does not hold.
E.g. name →→ street city holds, but name →→ street does not.
So always start with the full set of attributes on the right side, because the splitting rule does not hold.
6
More MVD Rules
Every FD is an MVD: if the FD A1 A2 … An → B1 B2 … Bm holds, then swapping the B's between tuples that agree on the A's doesn't create new tuples.
Complementation rule: if X →→ Y, then X →→ Z, where Z is all attributes not in X or Y.
e.g. Star_Star_In {name, street, city, title, year}: name →→ street city implies name →→ title year.
7
Nontrivial MVD
A1 A2 … An →→ B1 B2 … Bm for a relation R is nontrivial if:
1. B1 B2 … Bm is not a subset of A1 A2 … An, and
2. A1 A2 … An B1 B2 … Bm (the A's and B's together) is not all the attributes of R.
8
Fourth Normal Form (4NF)
Decompose relations that have MVDs into 4NF to eliminate the MVDs.
Definition: R is in 4NF if, whenever A1 A2 … An →→ B1 B2 … Bm is a nontrivial MVD, {A1, A2, …, An} is a superkey.
Since every FD is an MVD, 4NF is more stringent than BCNF.
Only nontrivial MVDs have the potential to violate 4NF.
9
4NF Decomposition
Given: relation R, and a nontrivial MVD X →→ Y that violates 4NF
1. Decompose R into the schemas XY and X ∪ (R − Y)
2. Produce the relations by projecting R onto XY and X ∪ (R − Y)
3. Reconstruct R from the new relations using natural join
e.g. Star_Star_In {name, street, city, title, year} and name →→ street city
Decompose Star_Star_In using name →→ street city into {name, street, city} and {name, title, year}
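The three steps can be walked through on the Star_Star_In example. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in DBMS, the sample rows are made up, and the table names `StarAddr` and `StarFilm` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Star_Star_In (name, street, city, title, year)")
rows = [
    ("C. Fisher", "123 Maple St.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "123 Maple St.", "Hollywood", "Empire Strikes Back", 1980),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
]
conn.executemany("INSERT INTO Star_Star_In VALUES (?, ?, ?, ?, ?)", rows)

# Steps 1 and 2: project onto XY = {name, street, city} and X ∪ (R − Y) = {name, title, year}.
conn.execute("CREATE TABLE StarAddr AS SELECT DISTINCT name, street, city FROM Star_Star_In")
conn.execute("CREATE TABLE StarFilm AS SELECT DISTINCT name, title, year FROM Star_Star_In")

# Step 3: reconstruct the original relation with a natural join on name.
rebuilt = set(conn.execute(
    "SELECT name, street, city, title, year FROM StarAddr NATURAL JOIN StarFilm"))
addr_count = conn.execute("SELECT COUNT(*) FROM StarAddr").fetchone()[0]
film_count = conn.execute("SELECT COUNT(*) FROM StarFilm").fetchone()[0]
print(rebuilt == set(rows), addr_count, film_count)
```

The four original tuples shrink to two address tuples and two film tuples, and the natural join recovers all four.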
10
Relationships among normal forms
4NF is the most stringent.
4NF ⇒ BCNF ⇒ 3NF (a relation in 4NF is also in BCNF, and a relation in BCNF is also in 3NF)
11
Lossless-join decomposition
Given: relation R, decomposed into schemas R1, R2, …, Rk, and D is a set of dependencies.
Definition: R1, R2, …, Rk is a lossless-join decomposition (w.r.t. D) if for every relation r for R satisfying D:
r = π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rk(r)
i.e. every such relation r for R is the natural join of its projections onto the Ri's.
The lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition.
However, joins are expensive, so don't over-decompose!
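A small counterexample shows what goes wrong when the property fails. The sketch below uses a made-up relation over attributes A, B, C and hypothetical helpers (`rel`, `project`, `natural_join`); tuples are modeled as frozensets of (attribute, value) pairs.

```python
from itertools import product

def rel(attrs, rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(r, attrs):
    """Projection: keep only `attrs`, dropping duplicate tuples."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def natural_join(r1, r2):
    """Natural join: merge every pair of tuples that agree on their shared attributes."""
    out = set()
    for t, u in product(r1, r2):
        dt, du = dict(t), dict(u)
        if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
            out.add(frozenset({**dt, **du}.items()))
    return out

r = rel(("A", "B", "C"), [(1, 10, "x"), (2, 10, "y")])

# Splitting on B, which determines neither A nor C: the join invents spurious tuples.
lossy = natural_join(project(r, ("A", "B")), project(r, ("B", "C")))
# Splitting on A, which determines B in this instance: the join returns exactly r.
lossless = natural_join(project(r, ("A", "B")), project(r, ("A", "C")))
print(len(r), len(lossy), lossless == r)
```

The lossy join contains the original tuples plus extra ones such as (1, 10, "y"), so the original relation can no longer be told apart from other relations with the same projections.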
12
Structured Query Language (SQL)
A DDL and DML for relational DBMSs
History: ANSI SQL, SQL-92 (SQL2), SQL-99 (SQL3)
SQL-99 extends SQL2 with object-relational features and other new features
Most DBMS vendors implement the core, and then add bells and whistles and variations
Query capability is close to relational algebra, with lots of extensions
Case insensitive, except for characters inside quoted strings ' ', e.g. 'Smith' ≠ 'SMITH'
; is the statement delimiter
13
Example database schema
Movie(title, year, length, inColor, studioName, producerC#)
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
14
SQL Queries – basic form
SELECT attribute(s)
FROM relations / views / subquery
WHERE conditional expression;
15
SQL query examples
1. Example 1:
SELECT * FROM Movie; -- * => all attributes of Movie
2. Example 2:
SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990;
3. Example 3:
SELECT title, length FROM Movie WHERE studioName = 'Disney' AND year = 1990;
16
Duplicates
SQL generally operates on bags instead of sets
Exception: the UNION, INTERSECT and EXCEPT operations
To eliminate duplicates, add the keyword DISTINCT to the SELECT clause
e.g. SELECT DISTINCT starName FROM StarsIn;
Duplicate elimination is costly. Use judiciously.
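The difference between bag and set results is easy to see on a small table. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in DBMS; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT)")
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", [
    ("Star Wars", 1977, "Harrison Ford"),
    ("Raiders of the Lost Ark", 1981, "Harrison Ford"),
    ("Star Wars", 1977, "Carrie Fisher"),
])

# Without DISTINCT the result is a bag: one output row per input row.
bag = conn.execute("SELECT starName FROM StarsIn").fetchall()
# With DISTINCT each star is listed once.
distinct = conn.execute("SELECT DISTINCT starName FROM StarsIn").fetchall()
print(len(bag), sorted(distinct))
```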
17
SQL Correspondence to Relational Algebra
SELECT L -- R.A. project
FROM R -- R.A. operands
WHERE C; -- R.A. select
R.A. expression: π_L(σ_C(R))
When reading and writing queries:
1. FROM -- what relations are involved
2. WHERE -- what the tuple selection criteria are
3. SELECT -- what columns to output
18
Union, Intersection, Difference of Queries
UNION: R1 UNION R2 or (Q1) UNION (Q2)
e.g. (SELECT title, year FROM Movie) UNION (SELECT movieTitle AS title, movieYear AS year FROM StarsIn);
INTERSECT: R1 INTERSECT R2 or (Q1) INTERSECT (Q2)
EXCEPT: R1 EXCEPT R2 or (Q1) EXCEPT (Q2) -- difference
19
Union, Intersection, Difference of Queries (continued)
Q1 and Q2 are queries.
R1 and R2, or the results of Q1 and Q2, should have the same list of attributes and attribute types. Rename if necessary.
Duplicates are eliminated automatically
Add the keyword ALL after UNION, INTERSECT, or EXCEPT to prevent duplicate elimination
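The effect of ALL can be checked with a quick experiment. The sketch below assumes Python's built-in `sqlite3` module as the DBMS and uses made-up rows. Two SQLite-specific caveats apply: it does not accept parentheses around the operand queries, so they are omitted here, and it supports ALL only with UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER)")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT)")
conn.executemany("INSERT INTO Movie VALUES (?, ?)",
                 [("Star Wars", 1977), ("Mighty Ducks", 1991)])
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)",
                 [("Star Wars", 1977, "Carrie Fisher"), ("Star Wars", 1977, "Mark Hamill")])

# The second operand renames its columns so both sides have the same attribute list.
query = ("SELECT title, year FROM Movie {op} "
         "SELECT movieTitle AS title, movieYear AS year FROM StarsIn")

union = conn.execute(query.format(op="UNION")).fetchall()          # duplicates removed
union_all = conn.execute(query.format(op="UNION ALL")).fetchall()  # duplicates kept
intersect = conn.execute(query.format(op="INTERSECT")).fetchall()
difference = conn.execute(query.format(op="EXCEPT")).fetchall()
print(len(union), len(union_all), intersect, difference)
```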
20
SQL and Relational Algebra
The six independent relational algebra operations (union, difference, selection, projection, product, renaming) are implemented by SQL
SQL is relationally complete
21
Some data values in SQL
1. Strings
2. Dates and Times
3. Null values
4. Truth value of Unknown
22
1. Strings
Comparison operators (according to lexicographical order): <, >, <=, >=, =, <>
LIKE -- pattern matching
% -- matches any sequence of 0 or more characters
_ -- matches any one character
E.g.: title LIKE 'Star ____'
E.g.: title LIKE '%''s%'
Can specify an escape character
E.g. title LIKE 'x%x%' ESCAPE 'x'
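The patterns above can be tried against a few titles. This is a sketch assuming Python's built-in `sqlite3` module and made-up titles; the helper `titles` is hypothetical. The escape example uses the pattern '%x%%', a variant of the one on the slide, to find titles containing a literal percent sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT)")
conn.executemany("INSERT INTO Movie VALUES (?)", [
    ("Star Wars",), ("Star Trek",), ("Stargate",), ("Logan's Run",), ("100% Pure",),
])

def titles(where):
    """Return the titles matching a WHERE condition, in alphabetical order."""
    sql = "SELECT title FROM Movie WHERE " + where + " ORDER BY title"
    return [row[0] for row in conn.execute(sql)]

# 'Star', a space, then exactly four more characters.
four_after_star = titles("title LIKE 'Star ____'")
# A doubled quote inside a string constant stands for one apostrophe.
possessive = titles("title LIKE '%''s%'")
# With ESCAPE 'x', the sequence x% means a literal percent sign.
literal_percent = titles("title LIKE '%x%%' ESCAPE 'x'")
print(four_after_star, possessive, literal_percent)
```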
23
2. Dates and Times
Date constant: DATE '2002-10-01'
Time constant: TIME '15:00:02.5'
Timestamp (combines dates and times): TIMESTAMP '2002-10-01 15:00:02.5'
(beware of implementation differences!)
Comparison operators apply
24
3. Null Values
NULL is used to represent:
1. Value unknown
2. Value inapplicable
3. Value withheld
Operations involving NULL:
1. Arithmetic operation: result is NULL
2. Comparison: result is UNKNOWN
NULL is not a constant, therefore NULL cannot be used explicitly as an operand.
Use IS NULL and IS NOT NULL to check for NULL
Read "Pitfalls Regarding Nulls", p. 250
25
4. UNKNOWN
Consider TRUE = 1, FALSE = 0, UNKNOWN = 0.5
1. AND of 2 truth values = min. of the 2 values
2. OR of 2 truth values = max. of the 2 values
3. Negation of v = 1 − v
Refer to Fig. 6.2, p. 250, for the truth table for 3-valued logic
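The min/max model can be written down directly and compared with what a real engine returns. This is a minimal sketch: the names `and3`, `or3`, `not3` and `sql_value` are hypothetical, and SQLite (via Python's built-in `sqlite3` module) is assumed as the engine. SQLite reports both NULL and UNKNOWN as Python `None`, and unlike the rule on the NULL slide it accepts NULL written explicitly as an operand.

```python
import sqlite3

TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0

def and3(a, b):
    return min(a, b)

def or3(a, b):
    return max(a, b)

def not3(v):
    return 1 - v

conn = sqlite3.connect(":memory:")

def sql_value(expr):
    """Evaluate a scalar SQL expression; NULL and UNKNOWN come back as None."""
    return conn.execute("SELECT " + expr).fetchone()[0]

print(and3(TRUE, UNKNOWN), or3(TRUE, UNKNOWN), not3(UNKNOWN))
print(sql_value("NULL + 1"))                 # arithmetic with NULL yields NULL
print(sql_value("NULL = NULL"))              # comparison with NULL yields UNKNOWN
print(sql_value("(NULL = 1) OR (1 = 1)"))    # UNKNOWN OR TRUE is TRUE
print(sql_value("(NULL = 1) AND (1 = 0)"))   # UNKNOWN AND FALSE is FALSE
print(sql_value("NULL IS NULL"))             # IS NULL is the reliable test
```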
26
The Six Clauses in SQL Queries
1. SELECT -- required
2. FROM -- required
3. WHERE
4. GROUP BY
5. HAVING -- if used, must follow a GROUP BY clause
6. ORDER BY
Subqueries may appear in the FROM clause and the WHERE clause
Comments begin with '--'
27
Table level SQL (ref. 6.6, p. 292)
Create table – to define the schema of a base table (ref. 6.6.1 for data type syntax)
E.g.
create table EMP (
    empno int not null,
    lastName varchar(30) not null,
    firstName varchar(30) not null,
    num_of_children int,
    constraint pk_EMP primary key (empno)
);
Drop table – to destroy a base table
e.g. drop table EMP;
28
Tuple Modification Statements (ref. 6.5, p. 286)
Insert – to add a row
Syntax: insert into R(A1, …, An) values (v1, …, vn)
– E.g. insert into emp (empno, lastName, firstName, num_of_children) values (12345, 'Doe', 'John', 1)
– Or insert into emp values (12345, 'Doe', 'John', 1)
Delete – to remove a row
Syntax: delete from R where condition
– E.g. delete from emp where empno = 12345
Update – to modify the contents of a row
Syntax: update R set Ai = value where Aj = targetValue
– E.g. update emp set num_of_children = 2 where empno = 12345
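The three statements can be run in sequence against the EMP table to see their effect. This is a sketch assuming Python's built-in `sqlite3` module as the DBMS; other systems may differ in details such as type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table EMP (
        empno int not null,
        lastName varchar(30) not null,
        firstName varchar(30) not null,
        num_of_children int,
        constraint pk_EMP primary key (empno)
    )""")

# Insert: add one row, naming the target columns explicitly.
conn.execute("insert into emp (empno, lastName, firstName, num_of_children) "
             "values (12345, 'Doe', 'John', 1)")
after_insert = conn.execute("select * from emp").fetchall()

# Update: change one attribute of the rows selected by the where clause.
conn.execute("update emp set num_of_children = 2 where empno = 12345")
after_update = conn.execute(
    "select num_of_children from emp where empno = 12345").fetchone()[0]

# Delete: remove the rows selected by the where clause.
conn.execute("delete from emp where empno = 12345")
after_delete = conn.execute("select count(*) from emp").fetchone()[0]
print(after_insert, after_update, after_delete)
```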
29
Some JOINS in SQL (ref. p. 270)
CROSS JOIN -- R.A. Cartesian product
e.g. Movie CROSS JOIN StarsIn;
JOIN … ON -- R.A. theta-join
e.g. Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;
NATURAL JOIN -- R.A. natural join
e.g. MovieStar NATURAL JOIN MovieExec;
(Standard SQL requires the NATURAL keyword here; a bare MovieStar JOIN MovieExec needs an ON or USING clause.)
OUTERJOINS -- joins that include dangling tuples
30
OUTERJOINS
An operator to augment the result of a join by the dangling tuples, padded with null values.
FULL outerjoin of R1 and R2 is a join that includes all rows from R1 and R2, matched or not. Unmatched rows are padded with NULLs.
LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with NULLs.
RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with NULLs.
The joining may be a NATURAL join or a theta-join
31
Outerjoins Syntax
1. R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2;
E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec;
E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec;
E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec;
32
Outerjoins Syntax (continued)
2. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression;
E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
E.g. 2. Movie LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
E.g. 3. Movie RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
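Comparing an inner join with a left outerjoin on the same data shows where the NULL padding comes from. This is a sketch assuming Python's built-in `sqlite3` module and made-up rows. Only LEFT OUTER JOIN is shown, because older SQLite versions do not support RIGHT and FULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER)")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT)")
conn.executemany("INSERT INTO Movie VALUES (?, ?)",
                 [("Star Wars", 1977), ("Mighty Ducks", 1991)])
# No StarsIn row for Mighty Ducks, so its Movie tuple is dangling.
conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher')")

template = ("SELECT title, year, starName FROM Movie {join} StarsIn "
            "ON title = movieTitle AND year = movieYear ORDER BY title")

inner = conn.execute(template.format(join="JOIN")).fetchall()
left = conn.execute(template.format(join="LEFT OUTER JOIN")).fetchall()
print(inner)
print(left)
```

The inner join drops the dangling movie, while the left outerjoin keeps it with starName set to NULL (`None` in Python).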
33
Use result of joins as subqueries in queries
E.g.
SELECT title, year, length, inColor, studioName, producerC#, starName
FROM Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;