Lecture 10: Database Design, XML
Wednesday, October 20, 2004

Slide 2: Outline
- Design of a relational schema (3.6)
- XML

Slide 3: Normal Forms
- First Normal Form (1NF) = all attributes are atomic
- Second Normal Form (2NF) = old and obsolete
- Third Normal Form (3NF) = this lecture
- Boyce-Codd Normal Form (BCNF) = this lecture
- Others...

Slide 4: Boyce-Codd Normal Form
A simple condition for removing anomalies from relations.
In English (though a bit vague): whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
Formally: a relation R is in BCNF if, whenever A1, ..., An → B is a non-trivial dependency in R, {A1, ..., An} is a superkey for R (that is, it contains a key).
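
The condition can be tested mechanically with attribute closures. The following is a minimal sketch under a few assumptions. FDs are stored as `(lhs, rhs)` pairs of frozensets, and `fd`, `closure` and `bcnf_violations` are hypothetical helper names. The check is applied to the relation the FDs were stated on, and in that case looking only at the given FDs is enough.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names, e.g. fd("SSN", "name age")."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Compute X+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left side is in X+, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_violations(relation, fds):
    """Return the nontrivial FDs whose left side is not a superkey of relation."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        superkey = closure(lhs, fds) >= relation
        if nontrivial and not superkey:
            violations.append((lhs, rhs))
    return violations

person = frozenset("name SSN age hairColor phoneNumber".split())
fds = [fd("SSN", "name age"), fd("age", "hairColor")]
print(sorted(closure({"SSN"}, fds)))       # ['SSN', 'age', 'hairColor', 'name']
print(len(bcnf_violations(person, fds)))   # 2
```

Both FDs of the Person example (slide 7) are reported as violations, because neither SSN nor age determines phoneNumber.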

Slide 5: BCNF Decomposition Algorithm
Repeat
  choose A1, …, Am → B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
until no more violations.
[Figure: R split into R1 (the A's and the B's) and R2 (the A's and the others)]
Question: is there a 2-attribute relation that is not in BCNF?
In practice, we have a better algorithm (next).

Slide 6: BCNF Decomposition Algorithm
BCNF_Decompose(R)
  find X such that: X ≠ X+ ≠ [all attributes]
  if (not found) then "R is in BCNF"
  else
    let Y = X+ - X
    let Z = [all attributes] - X+
    decompose into R1(X ∪ Y) and R2(X ∪ Z)
    BCNF_Decompose(R1)
    BCNF_Decompose(R2)
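
BCNF_Decompose can be transcribed almost line for line. This is a sketch, not a reference implementation, and it makes a few choices of its own:
- FDs are `(lhs, rhs)` frozenset pairs.
- "find X" is done by brute force over subsets, smallest first and in alphabetical order, so it is exponential in the number of attributes.
- X+ is projected onto the current relation by intersecting it with that relation's attributes.

The function names are hypothetical.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Compute X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_x(relation, fds):
    """Find X with X != X+ != relation, where X+ is restricted to relation."""
    attrs = sorted(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & relation  # project the closure onto R
            if x != x_plus and x_plus != relation:
                return x, x_plus
    return None

def bcnf_decompose(relation, fds):
    """Return the list of relations produced by BCNF_Decompose."""
    found = find_x(relation, fds)
    if found is None:
        return [relation]  # R is in BCNF
    x, x_plus = found
    y = x_plus - x
    z = relation - x_plus
    return bcnf_decompose(x | y, fds) + bcnf_decompose(x | z, fds)

result = bcnf_decompose(frozenset("A B C D".split()), [fd("A", "B"), fd("B", "C")])
print([sorted(r) for r in result])  # [['B', 'C'], ['A', 'B'], ['A', 'D']]
```

With this search order, the example of slide 8 produces R3(B, C), R4(A, B) and R2(A, D). The Person example of slide 7 produces Hair, People and Phone. A different order for trying X can produce a different (still BCNF) decomposition.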

Slide 7: Example BCNF Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Find X such that: X ≠ X+ ≠ [all attributes]
Iteration 1: Person: SSN+ = SSN, name, age, hairColor
  Decompose into: P(SSN, name, age, hairColor) and Phone(SSN, phoneNumber)
Iteration 2: P: age+ = age, hairColor
  Decompose into: People(SSN, name, age), Hair(age, hairColor), Phone(SSN, phoneNumber)
What is the key?
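
One way to answer "What is the key?" is to search for the minimal attribute sets whose closure covers the whole relation. The sketch below does a brute-force search, smallest sets first, and skips any superset of a key already found. The helper names are hypothetical, and the search is only practical for small schemas like these.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Compute X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(relation, fds):
    """Return all minimal X whose closure contains every attribute of relation."""
    keys = []
    attrs = sorted(relation)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            if any(k <= x for k in keys):
                continue  # a superset of a key is a superkey, not a key
            if closure(x, fds) >= relation:
                keys.append(x)
    return keys

person = frozenset("name SSN age hairColor phoneNumber".split())
fds = [fd("SSN", "name age"), fd("age", "hairColor")]
print([sorted(k) for k in candidate_keys(person, fds)])  # [['SSN', 'phoneNumber']]
```

Nothing determines phoneNumber, and nothing other than SSN itself determines SSN. Both therefore appear in every key, and together they are enough.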

Slide 8: Other Example
R(A, B, C, D) with A → B, B → C
Iteration 1: X = A: A+ = ABC
  split R into R1(A, B, C) and R2(A, D)
Iteration 2: X = B: B+ = BC
  split R1 into R3(B, C) and R4(A, B); the result is R3(B, C), R4(A, B), R2(A, D)
What happens if at iteration 1 we pick X = AB?
What is the key?

Slide 9: 3NF: A Problem with BCNF
FDs: Unit → Company; Company, Product → Unit
Unit+ = Unit, Company
[Figure: R(Unit, Company, Product) decomposed into R1(Unit, Company), with Unit → Company, and R2(Unit, Product)]
We lose the FD: Company, Product → Unit!!

Slide 10: So What's the Problem?
No problem so far. All local FDs are satisfied.
R1(Unit, Company), with Unit → Company:
Unit     | Company
Galaga99 | UW
Bingo    | UW
R2(Unit, Product):
Unit     | Product
Galaga99 | Databases
Bingo    | Databases
Let's put all the data back into a single table again:
Unit     | Company | Product
Galaga99 | UW      | Databases
Bingo    | UW      | Databases
This violates the FD: Company, Product → Unit
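
The same check can be run on the rows from the slide. This sketch stores the tables as lists of tuples and joins them on Unit. It then tests each FD with a hypothetical helper `fd_holds`, which reports whether rows that agree on the left-side positions also agree on the right-side positions.

```python
# Rows copied from the slide; each table is a list of tuples.
unit_company = [("Galaga99", "UW"), ("Bingo", "UW")]                 # R1(Unit, Company)
unit_product = [("Galaga99", "Databases"), ("Bingo", "Databases")]   # R2(Unit, Product)

def join_on_unit(r1, r2):
    """Natural join of R1(Unit, Company) and R2(Unit, Product) on Unit."""
    return [(u1, company, product)
            for (u1, company) in r1
            for (u2, product) in r2
            if u1 == u2]

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs positions also agree on the rhs positions."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

joined = join_on_unit(unit_company, unit_product)   # rows (Unit, Company, Product)
print(fd_holds(unit_company, [0], [1]))  # True: Unit -> Company holds in R1
print(fd_holds(joined, [1, 2], [0]))     # False: Company, Product -> Unit is violated
```

Each local table passes its own check, and only the joined table exposes the violation.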

Slide 11: The Problem
- We started with a table R and its FDs
- We decomposed R into BCNF tables R1, R2, … with their own FD1, FD2, …
- We can reconstruct R from R1, R2, …
- But we cannot reconstruct the FDs of R from FD1, FD2, …

Slide 12: Solution: Third Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in third normal form if, whenever there is a nontrivial dependency A1, A2, ..., An → B for R, then {A1, A2, ..., An} is a superkey for R, or B is part of a key.
Tradeoff:
- BCNF = no anomalies, but may lose some FDs
- 3NF = keeps all FDs, but may have some anomalies
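
The two definitions differ only in the escape clause "or B is part of a key". The sketch below checks both on the Unit/Company/Product relation from slide 9. It checks only the given FDs, which is enough when they were stated on this same relation. The helper names are hypothetical.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Compute X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure covers relation (brute force)."""
    keys = []
    attrs = sorted(relation)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= relation:
                keys.append(x)
    return keys

def is_bcnf(relation, fds):
    """Every nontrivial FD has a superkey on its left side."""
    return all(rhs <= lhs or closure(lhs, fds) >= relation for lhs, rhs in fds)

def is_3nf(relation, fds):
    """Every nontrivial FD has a superkey on the left or only key attributes on the right."""
    prime = frozenset().union(*candidate_keys(relation, fds))  # attributes in some key
    for lhs, rhs in fds:
        if closure(lhs, fds) >= relation:
            continue  # the left side is a superkey
        if not (rhs - lhs) <= prime:
            return False  # a nontrivial right-side attribute is in no key
    return True

r = frozenset({"Unit", "Company", "Product"})
fds = [fd("Unit", "Company"), fd("Company Product", "Unit")]
print(is_bcnf(r, fds))  # False: Unit is not a superkey
print(is_3nf(r, fds))   # True: Company is part of the key {Company, Product}
```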

Slide 13: 3NF Decomposition Algorithm
3NF_Decompose(R)
  let K = [all attributes that are part of some key]
  find X such that: X+ - X - K ≠ ∅ and X+ ≠ [all attributes]
  if (not found) then "R is in 3NF"
  else
    let Y = X+ - X - K
    let Z = [all attributes] - (X ∪ Y)
    decompose into R1(X ∪ Y) and R2(X ∪ Z)
    3NF_Decompose(R1)
    3NF_Decompose(R2)
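
3NF_Decompose can be transcribed in the same style as the BCNF version. In this sketch, K is recomputed from the candidate keys of each sub-relation. X is searched smallest first in alphabetical order, so a different search order could pick a different X. The function names are hypothetical, and the brute-force search only suits small examples.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Compute X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure covers relation (brute force)."""
    keys = []
    attrs = sorted(relation)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= relation:
                keys.append(x)
    return keys

def three_nf_decompose(relation, fds):
    """Return the list of relations produced by 3NF_Decompose."""
    k = frozenset().union(*candidate_keys(relation, fds))  # attributes in some key
    attrs = sorted(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & relation  # project X+ onto this relation
            y = x_plus - x - k
            if y and x_plus != relation:
                z = relation - (x | y)
                return three_nf_decompose(x | y, fds) + three_nf_decompose(x | z, fds)
    return [relation]  # R is in 3NF

fds = [fd("A B", "C"), fd("C", "D"), fd("D", "B"), fd("D", "E")]
result = three_nf_decompose(frozenset("A B C D E".split()), fds)
print([sorted(r) for r in result])  # [['C', 'E'], ['A', 'B', 'C', 'D']]
```

On the example of slide 14, the search first finds X = C, as the slide does, and ends with R1(C, E) and R2(A, B, C, D).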

Slide 14: Example of 3NF Decomposition
R(A, B, C, D, E) with AB → C, C → D, D → B, D → E
Keys (need to compute X+ for several Xs): AB, AC, AD
K = {A, B, C, D}
Pick X = C: C+ = BCDE
  C → BDE is a BCNF violation
  For 3NF: remove B, D (part of K): C → E is a 3NF violation
Decompose: R1(C, E), R2(A, B, C, D)
R1 is in 3NF
R2 is in 3NF (because its keys are AB, AC, AD)

Slide 15: 3NF vs. BCNF Decomposition
[Figure: decomposition of a relation ABCDEFGHK under 3NF and under BCNF]

Slide 16: XML Outline
- XML (4.6, 4.7)
  - This lecture: syntax, semistructured data
  - Next lectures: DTDs, XPath, XQuery

Slide 17: Additional Readings on XML
- XQuery from the Experts, Katz, Ed.: the reference on XQuery
- http://www.w3.org/XML/1999/XML-in-10-points
- www.zvon.org/xxl/XMLTutorial/General/book_en.html
- http://db.bell-labs.com/galax/
- Main source: www.w3.org (but hard to read)

Slide 18: XML
- eXtensible Markup Language
- XML 1.0: a recommendation from the W3C, 1998
- Roots: SGML (a very nasty language)
- After the roots: a format for sharing data

Slide 19: XML Data
- Relational data does not have a syntax
  - I can't "give" you my relational database
  - We need to import it from some other syntax, like CSV (comma-separated values)
- XML = rich syntax for data
  - But XML is not relational: it is semistructured
- Usage:
  - Map any data to XML
  - Store it in files, exchange it on the Web, etc.
  - Even query it directly, using XPath and XQuery

Slide 20: From HTML to XML
HTML describes the presentation

Slide 21: HTML
Bibliography
Foundations of Databases. Abiteboul, Hull, Vianu. Addison Wesley, 1995
Data on the Web. Abiteboul, Buneman, Suciu. Morgan Kaufmann, 1999

Slide 22: XML
[Example: the bibliography above marked up in XML: Foundations… Abiteboul Hull Vianu Addison Wesley 1995 …]
XML describes the content
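
To make "XML describes the content" concrete, this sketch builds the bibliography as XML with Python's standard `xml.etree.ElementTree`. The tag names book, title and author come from the terminology slide. The names bibliography, publisher and year are assumptions, not taken from the original slide.

```python
import xml.etree.ElementTree as ET

# Bibliography data from the HTML slide; some tag names below are assumptions.
books = [
    ("Foundations of Databases", ["Abiteboul", "Hull", "Vianu"], "Addison Wesley", "1995"),
    ("Data on the Web", ["Abiteboul", "Buneman", "Suciu"], "Morgan Kaufmann", "1999"),
]

bib = ET.Element("bibliography")
for title, authors, publisher, year in books:
    book = ET.SubElement(bib, "book")
    ET.SubElement(book, "title").text = title
    for name in authors:
        ET.SubElement(book, "author").text = name  # one element per author
    ET.SubElement(book, "publisher").text = publisher
    ET.SubElement(book, "year").text = year

xml_text = ET.tostring(bib, encoding="unicode")
print(xml_text)

# Because the markup names the content, a program can pick out the authors directly.
print([a.text for a in bib.find("book").findall("author")])  # ['Abiteboul', 'Hull', 'Vianu']
```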

Slide 23: XML Terminology
- tags: book, title, author, …
- start tag: <book>, end tag: </book>
- elements: <book>…</book>, <author>…</author>
- elements are nested
- empty element: <tag></tag>, abbreviated <tag/>
- an XML document: a single root element
- well-formed XML document: its tags are properly matched and nested
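
Well-formedness is exactly what an XML parser enforces. This sketch treats "parses without error" as "well-formed", using `ET.fromstring` from the standard library. The function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if text parses as XML: matched, properly nested tags and a single root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<book><title>DB</title></book>"))  # True
print(is_well_formed("<book><title>DB</book></title>"))  # False: improper nesting
print(is_well_formed("<book/><book/>"))                  # False: two root elements
print(is_well_formed("<book></book>") and is_well_formed("<book/>"))  # True: empty element, both forms
```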

Slide 24: More XML: Attributes
[Example: a book (Foundations of Databases, Abiteboul …, 1995) with part of its data stored in attributes]
Attributes are an alternative way to represent data
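
This sketch stores the same fact once as a child element and once as an attribute, then reads it back both ways with ElementTree. Storing the year as an attribute named `year` is an illustrative choice, not the slide's markup.

```python
import xml.etree.ElementTree as ET

# The same fact (the year) stored as a child element and as an attribute.
as_element = ET.fromstring(
    "<book><title>Foundations of Databases</title><year>1995</year></book>")
as_attribute = ET.fromstring(
    '<book year="1995"><title>Foundations of Databases</title></book>')

print(as_element.find("year").text)  # 1995
print(as_attribute.get("year"))      # 1995
print(as_attribute.attrib)           # {'year': '1995'}
```

One practical difference: an attribute holds a single string and cannot be repeated on the same element. A child element can repeat and can have its own structure.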

Slide 25: More XML: Oids and References
[Example: person elements for Jane, Mary and John that carry object identifiers (oids) and refer to each other through them]
Oids and references in XML are just syntax
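
"Just syntax" means that a plain parser sees a reference only as a string, so the application has to resolve it. In this sketch the attribute names `id` and `mother`, and the link from John to Mary, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the attribute names and the John -> Mary link are invented.
DOC = """<people>
  <person id="o1"><name>Jane</name></person>
  <person id="o2"><name>Mary</name></person>
  <person id="o3" mother="o2"><name>John</name></person>
</people>"""

root = ET.fromstring(DOC)
# The parser keeps "o2" as an ordinary string; we build the oid index ourselves.
by_id = {p.get("id"): p for p in root.iter("person")}

def mother_of(person):
    """Follow the (hypothetical) mother reference, or return None if absent."""
    ref = person.get("mother")
    return by_id[ref].findtext("name") if ref else None

print(mother_of(by_id["o3"]))  # Mary
print(mother_of(by_id["o1"]))  # None
```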

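Slide 26: More XML: CDATA Section
Syntax: <![CDATA[ ... ]]>
Example: a CDATA section containing the characters < and >, which are taken literally rather than parsed as markup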
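
A quick way to see what a CDATA section does is to parse one with ElementTree. The element name `example` is arbitrary. The content comes back as plain text, and no child elements are created.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, <, > and & are ordinary characters, not markup.
elem = ET.fromstring("<example><![CDATA[<a> & <b> are not tags here]]></example>")
print(elem.text)  # <a> & <b> are not tags here
print(len(elem))  # 0: no child elements were created

# On output, ElementTree escapes these characters instead of writing a CDATA section.
print(ET.tostring(elem, encoding="unicode"))
```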

Slide 27: More XML: Entity References
Syntax: &entityname;
Example: this is less than &lt;
Some entities:
&lt;    <
&gt;    >
&amp;   &
&apos;  '
&quot;  "
&#...;  a Unicode character given by its number (e.g. &#60; is <)
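
The five predefined entities and numeric character references are expanded by any XML parser. This sketch shows the expansion with ElementTree and the reverse direction with `xml.sax.saxutils.escape`. By default, `escape` only handles &, < and >.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

elem = ET.fromstring("<element>this is less than &lt;</element>")
print(elem.text)  # this is less than <

# The five predefined entities plus a numeric character reference (&#60; is '<').
sample = ET.fromstring("<e>&lt; &gt; &amp; &apos; &quot; &#60;</e>")
print(sample.text)  # < > & ' " <

# Going the other way: escape() rewrites &, < and > in text content.
print(escape("a < b & c"))  # a &lt; b &amp; c
```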

Slide 28: More XML: Processing Instructions
Syntax: <?target argument?>
Example: a product (Alarm Clock, 19.99) with processing instructions embedded in its markup
What do they mean?

Slide 29: More XML: Comments
Syntax: <!-- this is a comment -->
Yes, they are part of the data model!!!
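
Whether comments and processing instructions survive parsing depends on the tool. By default, ElementTree drops them. Since Python 3.8, a `TreeBuilder` created with `insert_comments=True, insert_pis=True` keeps them as nodes in the tree. In this sketch, the PI target `compute` and the product markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical product document with one PI and one comment.
DOC = """<product>
  <?compute price?>
  <!-- this is a comment -->
  <name>Alarm Clock</name>
  <price>19.99</price>
</product>"""

def parse_keeping_everything(text):
    """Parse text, keeping comments and processing instructions as tree nodes."""
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    parser = ET.XMLParser(target=builder)
    parser.feed(text)
    return parser.close()

def kinds(root):
    """Label each child as 'pi', 'comment', or its element tag."""
    labels = []
    for child in root:
        if child.tag is ET.Comment:
            labels.append("comment")
        elif child.tag is ET.ProcessingInstruction:
            labels.append("pi")
        else:
            labels.append(child.tag)
    return labels

print(kinds(parse_keeping_everything(DOC)))  # ['pi', 'comment', 'name', 'price']
print(kinds(ET.fromstring(DOC)))             # ['name', 'price']: default parser drops them
```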

Slide 30: XML Namespaces
- http://www.w3.org/TR/REC-xml-names (1/99)
- name ::= [prefix:]localpart
[Example document: … 15 …]

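Slide 31: XML Namespaces
- syntactic: a prefixed name and an unprefixed name with the same local part are different names
- semantic: the prefix is bound to a URL, which can provide the schema
- elements written with the prefix belong to this namespace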
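
ElementTree shows both roles directly. Internally, a prefixed name becomes `{namespace-URL}localpart`, so the prefix itself disappears and only the URL identifies the namespace. In this sketch the prefix `isbn`, the URL `http://example.org/isbn` and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URL and values, for illustration only.
NS_URL = "http://example.org/isbn"
DOC = f"""<book xmlns:isbn="{NS_URL}">
  <number>15</number>
  <isbn:number>123</isbn:number>
</book>"""

root = ET.fromstring(DOC)
# Same local name, different namespaces: two distinct names.
print([child.tag for child in root])  # ['number', '{http://example.org/isbn}number']

# Queries map their own prefix to the URL; it need not match the document's prefix.
ns = {"i": NS_URL}
print(root.find("number").text)        # 15
print(root.find("i:number", ns).text)  # 123
```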

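Slide 32: From Relational Data to XML Data
Relation Persons:
name | phone
John | 3634
Sue  | 6343
Dick | 6363
XML:
<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>
[Figure: the same data as a tree: a persons root with one row node per tuple, each with name and phone children]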
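
The mapping on this slide is mechanical: one `row` element per tuple and one child element per column. The sketch below assumes that all values are already strings. The function name `relation_to_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Rows of the relation Persons(name, phone) from the slide.
persons = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]
columns = ("name", "phone")

def relation_to_xml(root_tag, columns, rows):
    """Map a relation to XML: one <row> per tuple, one child element per column."""
    root = ET.Element(root_tag)
    for row in rows:
        row_elem = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_elem, col).text = value
    return root

xml_text = ET.tostring(relation_to_xml("persons", columns, persons), encoding="unicode")
print(xml_text)
```

The schema names (persons, row, name, phone) are written out again for every tuple. That repetition is what the next slide means by schema elements becoming part of the data.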

Slide 33: XML Data
- XML is self-describing
- Schema elements become part of the data
  - Relational schema: persons(name, phone)
  - In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
- Consequence: XML is much more flexible
- XML = semistructured data

Slide 34: Semi-structured Data Explained
Missing attributes:
[Example: John has a name and phone 1234; Joe has a name but no phone!]
Could represent in a table with nulls:
name | phone
John | 1234
Joe  | -
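
Flattening semistructured data into a table means choosing a null for each missing element. In this sketch, `findtext` returns `None` when a child is absent, and that plays the role of the null. The wrapper tags `people` and `person` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the wrapper tags people/person are assumptions.
DOC = """<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
</people>"""

def to_rows(root):
    """Flatten person elements into (name, phone) tuples; a missing phone becomes None."""
    return [(p.findtext("name"), p.findtext("phone")) for p in root.iter("person")]

print(to_rows(ET.fromstring(DOC)))  # [('John', '1234'), ('Joe', None)]
```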

Slide 35: Semi-structured Data Explained
Repeated attributes:
[Example: Mary has two phones, 2345 and 3456!]
Impossible in tables:
name | phone
Mary | 2345, 3456 ???
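
In XML a repeated element is simply there twice, and `findall` returns all copies. A flat (1NF) table cannot hold both phones in one cell. This sketch shows one common workaround, a separate table with one row per (name, phone) pair. The `person` wrapper tag is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for Mary's two phones.
DOC = "<person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>"

person = ET.fromstring(DOC)
name = person.findtext("name")
phones = [p.text for p in person.findall("phone")]
print(phones)  # ['2345', '3456']

# One relational alternative: a separate Phone(name, phone) table, one row per phone.
phone_rows = [(name, ph) for ph in phones]
print(phone_rows)  # [('Mary', '2345'), ('Mary', '3456')]
```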

Slide 36: Semistructured Data Explained
- Attributes with different types in different objects
  [Example: John Smith, phone 1234, where the name is structured into parts!]
- Nested collections (no 1NF)
- Heterogeneous collections: one element may contain children of different kinds
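
When the same element has different types in different objects, code that reads it has to handle each shape. In this sketch one name is plain text and the other is structured. The tag names `first` and `last`, and the `people`/`person` wrappers, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first name is a string, the second is structured.
DOC = """<people>
  <person><name>John Smith</name><phone>1234</phone></person>
  <person><name><first>John</first><last>Smith</last></name><phone>1234</phone></person>
</people>"""

def display_name(name_elem):
    """Return a simple name's text, or join the parts of a structured name."""
    if len(name_elem) == 0:
        return name_elem.text
    return " ".join(part.text for part in name_elem)

for person in ET.fromstring(DOC).iter("person"):
    print(display_name(person.find("name")))  # John Smith (both times)
```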

