CS 480: Database Systems Lecture 25 March 15, 2013.


2 How good is BCNF?
There are database schemas in BCNF that do not seem to be sufficiently normalized.
Consider a relation inst_info (ID, child_name, phone), where an instructor may have more than one phone and can have multiple children.
Example (inst_info): instructor 99999 has children David and William, so each of the instructor's phones appears once per child.

3 How good is BCNF?
There are no non-trivial functional dependencies, and therefore the relation is in BCNF.
Insertion anomalies remain: if we add a phone to 99999, we need to add two tuples, (99999, David, ) and (99999, William, ), each with the new phone in the last position.


5 How good is BCNF?
Therefore, it is better to decompose inst_info into:
inst_child (ID, child_name), with tuples (99999, David) and (99999, William)
inst_phone (ID, phone), with one tuple per phone of instructor 99999
This suggests the need for normal forms higher than BCNF, forms that depend on more than functional dependencies alone.
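The effect of the decomposition can be sketched in Python. This is only an illustration: the phone values are hypothetical (the slide's actual numbers are not available here), and natural_join is a helper written for this sketch.

```python
# Hypothetical phone values; the slide's actual numbers are not available.
inst_child = {("99999", "David"), ("99999", "William")}
inst_phone = {("99999", "555-0101"), ("99999", "555-0102")}

def natural_join(children, phones):
    """Join inst_child and inst_phone on ID, giving (ID, child_name, phone)."""
    return {
        (cid, child, phone)
        for (cid, child) in children
        for (pid, phone) in phones
        if cid == pid
    }

# The original relation is recovered by joining the two projections.
inst_info = natural_join(inst_child, inst_phone)
print(len(inst_info))  # 2 children x 2 phones

# Adding one phone is a single insert in the decomposed schema...
inst_phone.add(("99999", "555-0103"))

# ...but it corresponds to one new tuple per child in inst_info.
new_info = natural_join(inst_child, inst_phone)
print(len(new_info - inst_info))
```

In the decomposed schema the anomaly disappears because each fact (a child, a phone) is stored exactly once.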

6 Multivalued Dependencies (MVDs)
Let R be a relation schema and let α ⊆ R and β ⊆ R.
The multivalued dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]
This differs from functional dependencies, which prohibit certain tuples from existing. MVDs instead require that other tuples of a certain form be present in the relation.
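The definition translates fairly directly into a check on a concrete relation instance. The following is a minimal sketch, assuming a relation is represented as a set of tuples with a fixed attribute order; the function name satisfies_mvd and the sample phone values are made up for illustration.

```python
from itertools import product

def satisfies_mvd(rows, attrs, alpha, beta):
    """Return True if alpha ->-> beta holds on the relation instance `rows`.

    rows  : iterable of tuples
    attrs : attribute names, in the same order as the tuple columns
    alpha, beta : collections of attribute names
    """
    rows = set(rows)
    idx = {a: i for i, a in enumerate(attrs)}

    for t1, t2 in product(rows, repeat=2):
        # Only pairs that agree on alpha are constrained.
        if any(t1[idx[a]] != t2[idx[a]] for a in alpha):
            continue
        # t3 takes its beta values from t1 and everything else (R - beta) from t2.
        t3 = tuple(t1[idx[a]] if a in beta else t2[idx[a]] for a in attrs)
        if t3 not in rows:
            return False
        # t4 is covered when the loop visits the swapped pair (t2, t1).
    return True

attrs = ("ID", "child_name", "phone")
full = {
    ("99999", "David", "555-0101"), ("99999", "David", "555-0102"),
    ("99999", "William", "555-0101"), ("99999", "William", "555-0102"),
}
print(satisfies_mvd(full, attrs, {"ID"}, {"child_name"}))
print(satisfies_mvd(full - {("99999", "William", "555-0102")},
                    attrs, {"ID"}, {"child_name"}))
```

Note that such a check can only refute an MVD on one instance; whether the dependency holds on the schema is a statement about all legal relations.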

7 MVD Tabular representation of α →→ β

8 Example
Let R be a relation schema with a set of attributes that are partitioned into 3 nonempty subsets: Y, Z, W.
We say that Y →→ Z (Y multidetermines Z) if and only if, for all possible relations r(R):
if < y1, z1, w1 > ∈ r and < y1, z2, w2 > ∈ r, then < y1, z1, w2 > ∈ r and < y1, z2, w1 > ∈ r.
Note that since the behavior of Z and W is identical, it follows that Y →→ Z if Y →→ W.

9 Example
In our example:
ID →→ child_name
ID →→ phone_number
The formal definition above is meant to capture the notion that a particular value of Y (ID) has associated with it a set of values of Z (child_name) and a set of values of W (phone_number), and that these two sets are in some sense independent of each other.
Note: if Y → Z then Y →→ Z.
In contrast to functional dependencies, if α →→ β and β →→ γ, α →→ γ will not necessarily hold.
Proof of the first remark? It follows from the explanation on the previous slide.

10 Use of MVDs We use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies.
2. To specify constraints on the set of legal relations. We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.
If a relation r fails to satisfy a given multivalued dependency, we can construct a relation r′ that does satisfy the multivalued dependency by adding tuples to r.
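Constructing r′ from r can be sketched as a fixed-point loop that keeps adding the tuples the MVD requires until nothing is missing. This is an illustrative sketch, not a prescribed algorithm; close_under_mvd and the sample values are assumptions made for the example.

```python
def close_under_mvd(rows, attrs, alpha, beta):
    """Return the smallest superset of `rows` that satisfies alpha ->-> beta."""
    result = set(rows)
    idx = {a: i for i, a in enumerate(attrs)}
    changed = True
    while changed:
        changed = False
        snapshot = list(result)
        for t1 in snapshot:
            for t2 in snapshot:
                # Only pairs agreeing on alpha force new tuples.
                if any(t1[idx[a]] != t2[idx[a]] for a in alpha):
                    continue
                # Required tuple: beta from t1, the remaining attributes from t2.
                t3 = tuple(t1[idx[a]] if a in beta else t2[idx[a]] for a in attrs)
                if t3 not in result:
                    result.add(t3)
                    changed = True
    return result

attrs = ("ID", "child_name", "phone")
r = {("99999", "David", "555-0101"), ("99999", "William", "555-0102")}
r_prime = close_under_mvd(r, attrs, {"ID"}, {"child_name"})
for row in sorted(r_prime):
    print(row)
```

Here r violates ID →→ child_name, and r′ adds the two missing child/phone combinations.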

11 Theory of MVDs
From the definition of multivalued dependency, we can derive the following rule: if α → β, then α →→ β. That is, every functional dependency is also a multivalued dependency.
The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D.
We can compute D+ from D, using the formal definitions of functional dependencies and multivalued dependencies.
Such reasoning is manageable for very simple multivalued dependencies, which seem to be the most common in practice.
For complex dependencies, it is better to reason about sets of dependencies using a system of inference rules (see Appendix B, online). These include Armstrong's axioms together with some additional rules.

12 Fourth Normal Form
A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:
α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
α is a superkey for schema R
If a relation is in 4NF, it is in BCNF.
Decomposition is similar to the BCNF decomposition, i.e., decompose into α ∪ β and R − (β − α).
Why is a schema in 4NF also in BCNF? Every functional dependency α → β that holds is also in D+ as the multivalued dependency α →→ β, so the 4NF condition applies to it: either it is trivial or α is a superkey. (If it is trivial only because α ∪ β = R, then α → β already makes α a superkey.) This is exactly the BCNF condition.
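The 4NF test and the decomposition step can be sketched for a single given MVD. The sketch below makes two simplifying assumptions that should be stated loudly: it checks only the MVD it is handed (not all of D+), and it decides "superkey" using attribute closure under the functional dependencies alone. The function names are made up for this illustration.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under functional dependencies given as (lhs, rhs) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def violates_4nf(R, alpha, beta, fds):
    """True if alpha ->-> beta is non-trivial on R and alpha is not a superkey.

    Assumption: the superkey test uses only the FDs in `fds`.
    """
    R, alpha, beta = set(R), set(alpha), set(beta)
    trivial = beta <= alpha or (alpha | beta) == R
    superkey = attribute_closure(alpha, fds) >= R
    return not (trivial or superkey)

def decompose(R, alpha, beta):
    """Split R into (alpha U beta) and R - (beta - alpha)."""
    R, alpha, beta = set(R), set(alpha), set(beta)
    return alpha | beta, R - (beta - alpha)

R = {"ID", "child_name", "phone"}
print(violates_4nf(R, {"ID"}, {"child_name"}, fds=[]))
r1, r2 = decompose(R, {"ID"}, {"child_name"})
print(sorted(r1), sorted(r2))
```

Applied to inst_info with ID →→ child_name, the split yields the inst_child and inst_phone schemas, on each of which the remaining MVD is trivial.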

13 Further Normal Forms
Join dependencies generalize multivalued dependencies and lead to project-join normal form (PJNF), also called fifth normal form.
A class of even more general constraints leads to a normal form called domain-key normal form.
Problem with these generalized constraints: they are hard to reason with, and no sound and complete set of inference rules exists. Hence they are rarely used.


15 Introduction XML: Extensible Markup Language
Defined by the WWW Consortium (W3C).
Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML.
Documents have tags giving extra information about sections of the document.
E.g. <title> XML </title> <slide> Introduction …</slide>
Extensible, unlike HTML: users can add new tags, and separately specify how the tag should be handled for display.
A markup language defines which part is CONTENT, which part is MARKUP, and what the markup means.
Unlike HTML, XML does not prescribe how content is displayed; it is more of a data representation language.

16 XML Introduction
The ability to specify new tags, and to create nested tag structures, makes XML a great way to exchange data, not just documents.
Much of the use of XML has been in data exchange applications, not as a replacement for HTML.
Tags make data (relatively) self-documenting. E.g.
<university>
  <department>
    <dept_name> Comp. Sci. </dept_name>
    <building> Taylor </building>
    <budget> </budget>
  </department>
  <course>
    <course_id> CS-101 </course_id>
    <title> Intro. to Computer Science </title>
    <dept_name> Comp. Sci </dept_name>
    <credits> 4 </credits>
  </course>
</university>
We do not need to consult a schema to understand the meaning of the text. We can see the meaning from the structure.
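As a small illustration of the self-documenting point, the university document can be read with Python's standard xml.etree.ElementTree module, and the field names come straight from the tags. This is a sketch only; the empty budget element is kept as it appears on the slide.

```python
import xml.etree.ElementTree as ET

doc = """
<university>
  <department>
    <dept_name> Comp. Sci. </dept_name>
    <building> Taylor </building>
    <budget> </budget>
  </department>
  <course>
    <course_id> CS-101 </course_id>
    <title> Intro. to Computer Science </title>
    <dept_name> Comp. Sci </dept_name>
    <credits> 4 </credits>
  </course>
</university>
"""

root = ET.fromstring(doc)

# Each child of <university> becomes a record whose keys are the nested tag names.
records = []
for element in root:
    fields = {child.tag: (child.text or "").strip() for child in element}
    records.append((element.tag, fields))
    print(element.tag, fields)
```

No separate schema was consulted: the tag names alone tell us which value is a building and which is a course title.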

17 XML: Motivation
Data interchange is critical in today’s networked world. Examples:
  Banking: funds transfer
  Order processing (especially inter-company orders)
  Scientific data
    Chemistry: ChemML, …
    Genetics: BSML (Bio-Sequence Markup Language), …
Paper flow of information between organizations is being replaced by electronic flow of information.
Each application area has its own set of standards for representing information.
XML has become the basis for all new generation data interchange formats.
Paper flow involved a procedure that included manual entry and a person understanding the form. With XML, the format is specified by the structure, which helps automate data exchange in a standard data format.

18 XML: Motivation
Earlier generation formats were based on plain text with line headers indicating the meaning of fields.
  Similar in concept to email headers
  Do not allow for nested structures, and have no standard “type” language
  Tied too closely to low-level document structure (lines, spaces, etc.)
Each XML-based standard defines what the valid elements are, using:
  XML type specification languages to specify the syntax: DTD (Document Type Definition), XML Schema
  Textual descriptions of the semantics
XML allows new tags to be defined as required. However, this may be constrained by DTDs.
A wide variety of tools is available for parsing, browsing and querying XML documents/data.

19 Comparison with Relational Data
Inefficient: tags, which in effect represent schema information, are repeated.
Better than relational tuples as a data-exchange format:
  Unlike relational tuples, XML data is self-documenting due to the presence of tags
  Non-rigid format: tags can be added
  Allows nested structures
  Wide acceptance, not only in database systems, but also in browsers, tools, and applications

20 Structure of XML Data
Tag: label for a section of data.
Element: section of data beginning with <tagname> and ending with matching </tagname>.
Elements must be properly nested.
  Proper nesting: <course> … <title> …. </title> </course>
  Improper nesting: <course> … <title> …. </course> </title>
  Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
Every document must have a single top-level element.
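The nesting rule can be sketched with a stack: push each start tag, and require every end tag to match the most recently opened one. This is a deliberately simplified illustration, not a real XML parser: the regular expression assumes plain tags without attributes, comments, or self-closing forms, and properly_nested is a name chosen for this sketch.

```python
import re

# Matches <name> and </name>; attributes and self-closing tags are out of scope.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)\s*>")

def properly_nested(text):
    """Return True if every start tag is closed by a matching end tag in order."""
    stack = []
    for closing, name in TAG.findall(text):
        if not closing:
            stack.append(name)          # start tag: remember it
        elif not stack or stack.pop() != name:
            return False                # end tag does not match the innermost open tag
    return not stack                    # nothing may be left open

print(properly_nested("<course> x <title> y </title> </course>"))
print(properly_nested("<course> x <title> y </course> </title>"))
```

The sketch does not check the single top-level element rule; a full parser such as Python's xml.etree.ElementTree rejects both kinds of error.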

21 Example of Nested Elements
<purchase_order>
  <identifier> P-101 </identifier>
  <purchaser> …. </purchaser>
  <itemlist>
    <item>
      <identifier> RS1 </identifier>
      <description> Atom powered rocket sled </description>
      <quantity> 2 </quantity>
      <price> </price>
    </item>
    <item>
      <identifier> SG2 </identifier>
      <description> Superb glue </description>
      <quantity> 1 </quantity>
      <unit-of-measure> liter </unit-of-measure>
      <price> </price>
    </item>
  </itemlist>
</purchase_order>

22 Motivation for Nesting
Nesting of data is useful in data transfer.
  Example: elements representing item nested within an itemlist element (no need to do joins to get all the items).
Nesting is not supported, or is discouraged, in relational databases.
  With multiple orders, customer name and address would be stored redundantly.
  Normalization replaces the nested structure in each order with a foreign key into a table storing customer name and address information.
  Nesting is supported in object-relational databases.
But nesting is appropriate when transferring data, because the external application does not have direct access to data referenced by a foreign key.
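The contrast between the nested form and the normalized form can be sketched by flattening a purchase order into relational-style rows, where the order identifier plays the role of the foreign key. This is an illustration under assumptions: the XML is a shortened copy of the purchase order example (purchaser and price omitted), and the row layout is made up for the sketch.

```python
import xml.etree.ElementTree as ET

order_xml = """<purchase_order>
  <identifier> P-101 </identifier>
  <itemlist>
    <item>
      <identifier> RS1 </identifier>
      <description> Atom powered rocket sled </description>
      <quantity> 2 </quantity>
    </item>
    <item>
      <identifier> SG2 </identifier>
      <description> Superb glue </description>
      <quantity> 1 </quantity>
    </item>
  </itemlist>
</purchase_order>"""

order = ET.fromstring(order_xml)
order_id = order.findtext("identifier").strip()

# Nested form: the items travel with the order, so no join is needed to list them.
# Normalized form: one row per item, linked back to the order by order_id.
item_rows = [
    (order_id,
     item.findtext("identifier").strip(),
     int(item.findtext("quantity")))
    for item in order.iter("item")
]

for row in item_rows:
    print(row)
```

A receiver of the nested document has everything in one message, whereas a receiver of only the item rows would need access to the order table to resolve order_id.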

