1 Lecture 10: Database Design XML Wednesday, October 20, 2004.

Slides:



Advertisements
Similar presentations
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
Advertisements

Lecture 6: Design Constraints and Functional Dependencies January 21st, 2004.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 311 Database Systems I The Semistructured Data Model.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
CSE 636 Data Integration XML Semistructured Data Document Type Definitions.
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
Functional Dependencies Definition: If two tuples agree on the attributes A, A, … A 12n then they must also agree on the attributes B, B, … B 12m Formally:
1 Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design April 16 & 18, 2008.
Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000.
1 COS 425: Database and Information Management Systems XML and information exchange.
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
1 Lecture 06 The Relational Data Model. 2 Outline Relational Data Model Functional Dependencies FDs in ER Logical Schema Design Reading Chapter 8.
1 Introduction to XML Yanlei Diao UMass Amherst April 19, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.
M.P. Johnson, DBMS, Stern/NYU, Spring C : Database Management Systems Lecture #7 Matthew P. Johnson Stern School of Business, NYU Spring,
XML and Databases 198:541. XML Motivation  Huge amounts of unstructured data on the web: HTML documents  No structure information  Only format instructions.
Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.
End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =
XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.
M.P. Johnson, DBMS, Stern/NYU, Sp20041 C : Database Management Systems Lecture #6 Matthew P. Johnson Stern School of Business, NYU Spring, 2004.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Managing XML and Semistructured Data Lecture 2: XML Prof. Dan Suciu Spring 2001.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Relation Decomposition A, A, … A 12n Given a relation R with attributes Create two relations R1 and R2 with attributes B, B, … B 12m C, C, … C 12l Such.
Functional Dependencies and Relational Schema Design.
XML – a data sharing standard DSC340 Mike Pangburn.
XML: Extensible Markup Language FST-UMAC Gong Zhiguo.
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
Lecture 09: Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)
1 Introduction to Semistructured Data and XML. 2 How the Web is Today  HTML documents often generated by applications consumed by humans only easy access:
More XML: semantics, DTDs, XPATH February 18, 2004.
Transactions, Relational Algebra, XML February 11 th, 2004.
Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
1 Lecture 7: Normal Forms, Relational Algebra Monday, 10/15/2001.
Tallahassee, Florida, 2015 COP4710 Database Systems Relational Design Fall 2015.
1 Lecture 10: Database Design Wednesday, January 26, 2005.
Functional Dependencies and Relational Schema Design.
Lecture 13: Relational Decomposition and Relational Algebra February 5 th, 2003.
1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Introduction to XML Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
1 Lecture 9: Database Design Wednesday, January 25, 2006.
Formal definition of a key A key is a set of attributes A 1,..., A n such that for any other attribute B: A 1,..., A n  B A minimal key is a set of attributes.
Lecture 11: Functional Dependencies
Lecture 14: Relational Algebra Projects XML?
Management of XML and Semistructured Data
Management of XML and Semistructured Data
Problems in Designing Schema
Lecture 11 XML Wednesday, Oct. 24, 2001.
Cse 344 May 16th – Normalization.
Functional Dependencies and Relational Schema Design
Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design October 12 & 15, 2007.
Lecture 12: XML, XPath, XQuery
Semi-Structured data (XML Data MODEL)
Lecture 8: Database Design
Lecture 07: E/R Diagrams and Functional Dependencies
Lecture 9: XML Monday, October 17, 2005.
CSE 544: Lecture 5 XML 4/15/2002.
Lecture 8: XML Data Wednesday, October
Introduction to Database Systems CSE 444 Lecture 10 XML
Semi-Structured data (XML)
Lecture 6: Functional Dependencies
Lecture 11: XML and Semistructured Data
Lecture 11: Functional Dependencies
Lecture 09: Functional Dependencies
Presentation transcript:

1 Lecture 10: Database Design XML Wednesday, October 20, 2004

2 Outline Design of a Relational schema (3.6) XML

3 Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others...

4 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, should determine all the attributes of R. A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R

5 BCNF Decomposition Algorithm A’s Others B’s R1R1 Is there a 2-attribute relation that is not in BCNF ? Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations R2R2 In practice, we have a better algorithm (next):

6 BCNF Decomposition Algorithm BCNF_Decompose(R) find X s.t.: X ≠X + ≠ [all attributes] if (not found) then “R is in BCNF” else let Y = X + - X let Z = [all attributes] - X + decompose into R1(X  Y) and R2(X  Z) BCNF_Decompose(R1) BCNF_Decompose(R2)

7 Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Iteration 1: Person SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) Iteration 2: P age+ = age, hairColor Decompose: People(SSN, name, age) Hair(age, hairColor) Phone(SSN, phoneNumber) Iteration 1: Person SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) Iteration 2: P age+ = age, hairColor Decompose: People(SSN, name, age) Hair(age, hairColor) Phone(SSN, phoneNumber) Find X s.t.: X ≠X + ≠ [all attributes] What is the key ?

8 Other Example R(A,B,C,D) A  B, B  C Iteration 1: X = A: A + = ABC –split R into R1(A,B,C) R2(A,D) Iteration 2: X = B: B + =BC –Split R into R3(B,C), R4(A,B), R2(A,D) What happens if at iteration 1 we pick X = AB ? What is the key ?

9 3NF: A Problem with BCNF Unit  Company Company, Product  Unit Unit + = Unit, Company We loose the FD: Company, Product  Unit !! UnitCompanyProduct UnitCompanyUnitProduct Unit  Company

10 So What’s the Problem? No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again: UnitCompany Galaga99UW BingoUW UnitProduct Galaga99Databases BingoDatabases UnitCompanyProduct Galaga99UWDatabases BingoUWDatabases Unit  Company Company, Product  Unit Violates the FD:

11 The Problem We started with a table R and FD We decomposed R into BCNF tables R 1, R 2, … with their own FD 1, FD 2, … We can reconstruct R from R 1, R 2, … But we cannot reconstruct FD from FD 1, FD 2, …

12 Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } a super-key for R, or B is part of a key. A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } a super-key for R, or B is part of a key. Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies

13 3NF Decomposition Algorithm 3NF_Decompose(R) let K = [all attributes that are part of some key] find X s.t.: X + - X - K ≠  and X + ≠ [all attributes] if (not found) then “R is in 3NF” else let Y = X + - X - K let Z = [all attributes] - (X  Y) decompose into R1(X  Y) and R2(X  Z) 3NF_Decompose(R1) 3NF_Decompose(R2) 3NF_Decompose(R) let K = [all attributes that are part of some key] find X s.t.: X + - X - K ≠  and X + ≠ [all attributes] if (not found) then “R is in 3NF” else let Y = X + - X - K let Z = [all attributes] - (X  Y) decompose into R1(X  Y) and R2(X  Z) 3NF_Decompose(R1) 3NF_Decompose(R2)

14 Example of 3NF decomposition R(A,B,C,D,E): AB  C C  D D  B D  E AB  C C  D D  B D  E Keys: (need to compute X+, for several Xs) AB, AC, AD K = {A, B, C, D} Pick X = C C+ = BCDE C  BDE is a BCNF violation For 3NF: remove B, D (part of K): C  E is a 3NF violation Decompose: R1(C, E), R2(A,B,C,D) R1 is in 3NF R2 is in 3NF (because its keys: AB, AC, AD)

15 BCNF 3NF v.s. BCNF Decomposition ABCDEFGHK ABCDEEFGHK EFGGHK ABCCDE ABABABABABABAB AB 3NF

16 XML Outline XML (4.6, 4.7) –This lecture: syntax, semistructured data –Next lectures: DTDs, XPath, XQuery

17 Additional Readings on XML XQuery from the Experts, Katz, Ed. –The reference on Xquery Main source: (but hard to read)

18 XML eXtensible Markup Language XML 1.0 – a recommendation from W3C, 1998 Roots: SGML (a very nasty language). After the roots: a format for sharing data

19 XML Data Relational data does not have a syntax –I can’t “give” you my relational database –Need to import it from other other syntax, like CSV (comma- separated-values) XML = rich syntax for data –But XML is not relational: semistructured Usage: –Map any data to XML –Store it in files, exchange on the Web, etc. –Even query it directly, using XPath, XQuery

20 From HTML to XML HTML describes the presentation

21 HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteoul, Buneman, Suciu Morgan Kaufmann, 1999 Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteoul, Buneman, Suciu Morgan Kaufmann, 1999

22 XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content

23 XML Terminology tags: book, title, author, … start tag:, end tag: elements: …, … elements are nested empty element: abbrv. an XML document: single root element well formed XML document: if it has matching tags

24 More XML: Attributes Foundations of Databases Abiteboul … 1995 Foundations of Databases Abiteboul … 1995 attributes are alternative ways to represent data

25 More XML: Oids and References Jane Mary John Jane Mary John oids and references in XML are just syntax

26 More XML: CDATA Section Syntax: Example: <>]]>

27 More XML: Entity References Syntax: &entityname; Example: this is less than < Some entities: << >> && &apos;‘ "“ &Unicode char

28 More XML: Processing Instructions Syntax: Example: What do they mean ? Alarm Clock 19.99

29 More XML: Comments Syntax Yes, they are part of the data model !!!

30 XML Namespaces (1/99) name ::= [prefix:]localpart … 15 …. … 15 ….

31 … … XML Namespaces syntactic:, semantic: provide URL for schema Belong to this namespace

32 From Relational Data to XML Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” Persons XML: persons NamePhone John3634 Sue6343 Dick6363

33 XML Data XML is self-describing Schema elements become part of the data –Reational schema: persons(name,phone) –In XML,, are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data

34 Semi-structured Data Explained Missing attributes: Could represent in a table with nulls John 1234 Joe John 1234 Joe  no phone ! namephone John1234 Joe-

35 Semi-structured Data Explained Repeated attributes Impossible in tables: Mary Mary  two phones ! namephone Mary ???

36 Semistructured Data Explained Attributes with different types in different objects Nested collections (no 1NF) Heterogeneous collections: – contains both s and s John Smith 1234 John Smith 1234  structured name !