DASWIS 2001. NF-SS: A Normal Form for Semistructured Schemata. Xiaoying Wu, Tok Wang Ling, Sin Yeung Lee, Mong Li Lee, National University of Singapore.

Presentation transcript:

DASWIS NF-SS: A Normal Form for Semistructured Schemata Xiaoying Wu, Tok Wang Ling, Sin Yeung Lee, Mong Li Lee National University of Singapore Gillian Dobbie University of Auckland, New Zealand

Outline
1. Motivations
2. Semistructured schema and its data tree
3. Integrity constraints for semistructured data
4. NF-SS: Normal Form for Semistructured Schemata
5. Designing a semistructured schema into NF-SS
6. Discussion of the design approach
7. Comparison with related proposals
8. Summary

Motivation: Example 1
<!ELEMENT department (course+)>
<!ATTLIST department name ID #REQUIRED>
<!ATTLIST course cid ID #REQUIRED title CDATA #IMPLIED>
<!ATTLIST student sid ID #REQUIRED name CDATA #REQUIRED age CDATA #IMPLIED>
[Figure: schema tree with nodes department, name, course, cid, title, student, sid, name, age and grade, and multiplicity marks +, * and ?]

Motivation (cont.)
• Redundancy: name and age of a student
• Updating anomaly:
  – Insertion
  – Rewriting
  – Deletion

Motivation: Example 2
name CDATA #REQUIRED>
<!ATTLIST subject cid ID #REQUIRED>
<!ATTLIST day CDATA #REQUIRED hour CDATA #REQUIRED>
• Path anomaly:
  – The schema doesn’t reflect the integrity constraint: tid, day, hour → cid, room#

Semistructured Schema and Data Tree
A semistructured schema is defined to be D = (E, A, B, P, R, r), where:
• E is a finite set of object types in D.
• A is a finite set of attributes, disjoint from E.
• B is a set of basic domain types such as string, integer and Boolean.
• P is a function from E to object type definitions, each carrying a symbol in {*, +, ?, 1} called its multiplicity. E.g.: P(course) = student*
• R is a function from E to the power set of A. E.g.: R(student) = {sid, name, age}
• r ∈ E is called the object type of the root. E.g.: r = department
[Figure: the schema tree from Example 1, annotated with "E: object type", "A: attributes", "multiplicity" and "r: root object type"]
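A minimal Python sketch of the tuple D = (E, A, B, P, R, r) for the department schema may make the definition more concrete. The class name Schema, the check method and the dictionary encoding of P and R are illustrative assumptions, not part of the slides; treating grade as an optional child object type of student is also an assumption, since the slide states only P(course), R(student) and r explicitly.

```python
from dataclasses import dataclass

# Hypothetical encoding of D = (E, A, B, P, R, r); names are illustrative only.
@dataclass
class Schema:
    E: set    # object types
    A: set    # attributes, disjoint from E
    B: set    # basic domain types
    P: dict   # object type -> list of (child object type, multiplicity)
    R: dict   # object type -> set of attributes
    r: str    # object type of the root

    def check(self):
        """Verify the side conditions stated in the definition."""
        assert self.E.isdisjoint(self.A), "A must be disjoint from E"
        assert self.r in self.E, "the root must be an object type"
        for parent, children in self.P.items():
            assert parent in self.E
            for child, mult in children:
                assert child in self.E and mult in {"*", "+", "?", "1"}
        for obj, attrs in self.R.items():
            assert obj in self.E and attrs <= self.A
        return True

D = Schema(
    E={"department", "course", "student", "grade"},
    A={"name", "cid", "title", "sid", "age"},   # "name" is shared by department and student
    B={"string", "integer", "Boolean"},
    P={"department": [("course", "+")],
       "course": [("student", "*")],            # P(course) = student*
       "student": [("grade", "?")]},            # assumption: grade is an optional child of student
    R={"department": {"name"},
       "course": {"cid", "title"},
       "student": {"sid", "name", "age"}},      # R(student) = {sid, name, age}
    r="department",
)
D.check()
```

Keeping P and R as separate maps mirrors the definition, where object-type children and attributes are distinct components of the schema.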

Semistructured Schema and Data Tree (cont.)
A data tree T with respect to a semistructured schema D = (E, A, B, P, R, r) is defined to be a tree T = (V, lab, obj, att, val, root), showing a database instance.
[Figure: example data tree. A department node (name: CS) with course nodes (cid: cs4221, title: database design; cid: cs5220, title: data Mining) and student nodes (sid: s01, name: Jack, age: 21, grade “A”; sid: s02, name: Tom); the student with sid s01, name Jack and age 21 appears twice.]

Semistructured Schema and Data Tree (cont.)
• The path of a node n in semistructured schema D is denoted Path_D(n). E.g.: Path_D for student is /department/course/student
• The path of a node v in data tree T is denoted Path_T(v). E.g.: Path_T for student “s02” is /department/course/student
• The target set of node n in T, T[n], is {v : v ∈ V, n ∈ E ∪ A, Path_T(v) = Path_D(n)}. E.g.: the target set T[student] includes the student nodes with sid “s02”, etc.
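The target set can be sketched in Python as a traversal that collects every data-tree node whose path equals the given schema path. The Node class, the helper names and the placement of students under particular courses are assumptions made for illustration; only the node labels and values come from the example data tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # tag of an object or attribute node
    value: object = None            # set for attribute nodes only
    children: list = field(default_factory=list)

def target_set(root, schema_path):
    """Return all nodes v with Path_T(v) == schema_path ('/'-joined labels)."""
    found = []
    def walk(node, prefix):
        path = prefix + "/" + node.label
        if path == schema_path:
            found.append(node)
        for child in node.children:
            walk(child, path)
    walk(root, "")
    return found

def student(sid, name):
    # Hypothetical helper: builds a student object node with two attributes.
    return Node("student", children=[Node("sid", sid), Node("name", name)])

# Assumed layout: s01 and s02 under cs4221, s01 again under cs5220.
T = Node("department", children=[
    Node("name", "CS"),
    Node("course", children=[Node("cid", "cs4221"),
                             student("s01", "Jack"), student("s02", "Tom")]),
    Node("course", children=[Node("cid", "cs5220"), student("s01", "Jack")]),
])

students = target_set(T, "/department/course/student")
sids = [n.children[0].value for n in students]   # ['s01', 's02', 's01']
```

Because the match is on the full path, the same student appearing under two courses contributes two nodes to T[student], which is the redundancy the motivation slides point at.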

Semistructured Schema and Data Tree (cont.)
• Two nodes from two data trees w.r.t. schema D satisfy value equality (=_v) iff
  – they are attribute nodes with the same tag and the same value; or
  – they are object nodes with the same tag and their children are pairwise value equal.
• Let T1 and T2 be two data trees w.r.t. schema D = (E, A, B, P, R, r), and let X ∈ E ∪ A. T1 and T2 agree on X, denoted T1 =_X T2, iff the following condition holds: t1 ∈ T1[X], t2 ∈ T2[X], such that (t1 =_v t2).
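The two cases of value equality translate naturally into a recursive check. The sketch below is illustrative only: encoding a node as a (tag, payload) pair is an assumption, and so is comparing children in order, since the slide says "pairwise" without fixing an ordering.

```python
def value_equal(u, v):
    """Value equality of two nodes given as (tag, payload) pairs.

    An attribute node carries a scalar payload; an object node carries a
    list of child nodes. This encoding is a hypothetical one for the sketch.
    """
    (tag_u, pay_u), (tag_v, pay_v) = u, v
    if tag_u != tag_v:
        return False
    u_is_obj, v_is_obj = isinstance(pay_u, list), isinstance(pay_v, list)
    if u_is_obj != v_is_obj:
        return False
    if not u_is_obj:
        # Attribute nodes: same tag and same value.
        return pay_u == pay_v
    # Object nodes: same tag and children pairwise value equal
    # (compared position by position, an assumption of this sketch).
    return len(pay_u) == len(pay_v) and all(
        value_equal(a, b) for a, b in zip(pay_u, pay_v))

jack_a = ("student", [("sid", "s01"), ("name", "Jack"), ("age", 21)])
jack_b = ("student", [("sid", "s01"), ("name", "Jack"), ("age", 21)])
tom = ("student", [("sid", "s02"), ("name", "Tom")])

same = value_equal(jack_a, jack_b)   # True
different = value_equal(jack_a, tom)  # False
```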

Integrity Constraints for Semistructured Data
• Extended Functional Dependency (EFD): Let D = (E, A, B, P, R, r) be a semistructured schema, and let X ⊆ E ∪ A and Y ⊆ E ∪ A. That Y is extended functionally dependent on X is denoted X → Y. Let S denote a set of data trees that are images of D. S satisfies X → Y iff for any data trees T1, T2 in S, if they agree on every component in X, then they agree on Y; that is, ∀ T1, T2 ∈ S ((∀ x ∈ X, T1 =_x T2) ⇒ T1 =_Y T2).
• Inference rules for EFDs:
  – E1 (reflexivity): if Y ⊆ X, then X → Y, for any X, Y ⊆ E ∪ A.
  – E2 (augmentation): if X → Y, then XZ → YZ, for any X, Y, Z ⊆ E ∪ A.
  – E3 (transitivity): if X → Y and Y → Z, then X → Z, for any X, Y, Z ⊆ E ∪ A.
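Rules E1 to E3 have the same shape as Armstrong's axioms for relational functional dependencies, so the familiar derived rules can be obtained the same way. As a worked illustration (the union rule itself is not stated on the slide), X → Y and X → Z together give X → YZ:

```latex
\begin{align*}
&\text{Given: } X \to Y, \quad X \to Z \\
&X \to XY  && \text{E2: augment } X \to Y \text{ with } X \text{ (note } XX = X\text{)} \\
&XY \to YZ && \text{E2: augment } X \to Z \text{ with } Y \\
&X \to YZ  && \text{E3: transitivity on the two lines above}
\end{align*}
```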

Integrity Constraints for Semistructured Data (cont.)
• Notation: O1[…], …, O_i[…], …, O_(n-1)[…] → O_n[…]
• An EFD X → Y is a partial EFD if there exists an X’ ⊂ X such that X’ → Y; otherwise it is a full EFD. E.g.: (1) … is a partial EFD; (2) … is a full EFD.
• X → Y is said to be coherent iff /X/Y is a path in D; otherwise it is called an incoherent EFD. E.g.: … is an incoherent EFD, since /teacher/time/subject is not a path in the schema.
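The coherence test amounts to checking whether the left-hand side followed by the right-hand side forms a path in the schema tree. The Python sketch below rests on two assumptions: the leading "/" is read as "starting at the root", and the teacher/subject/time schema is a hypothetical layout chosen so that /teacher/time/subject is not a path, since the actual schema figure is not available here.

```python
def is_coherent(children, root, lhs, rhs):
    """Check whether /lhs[0]/.../lhs[-1]/rhs is a root-anchored path.

    `children` maps each node to the set of its child nodes (object types
    or attributes). The function name and encoding are illustrative.
    """
    steps = list(lhs) + [rhs]
    if steps[0] != root:
        return False
    # Every consecutive pair must be a parent-child edge of the schema.
    return all(b in children.get(a, set()) for a, b in zip(steps, steps[1:]))

# Hypothetical schema: teacher -> subject -> time.
children = {
    "teacher": {"tid", "subject"},
    "subject": {"cid", "time"},
    "time": {"day", "hour", "room#"},
}

incoherent = is_coherent(children, "teacher", ["teacher", "time"], "subject")  # False
coherent = is_coherent(children, "teacher", ["teacher", "subject"], "time")    # True
```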

Integrity Constraints for Semistructured Data (cont.)
• If there exists Y ⊆ E ∪ A such that X → Y, Y → Z and Y ↛ X, then Z is transitively extended functionally dependent on X via Y. E.g.: age is transitively dependent on course via student, since (1) …, (2) … and ….
[Figure: the department/course/student schema tree from Example 1]
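Because the inference rules for EFDs mirror Armstrong's axioms, a transitive dependency can be tested with the usual closure computation. This is a sketch under that assumption, not the authors' procedure; the function names and the two EFDs in the example (course → student, student → age) are illustrative stand-ins for the example's EFDs.

```python
def closure(components, efds):
    """Components derivable from `components` by reflexivity, augmentation
    and transitivity, computed as a fixpoint over the given EFDs."""
    result = set(components)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in efds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(efds, lhs, rhs):
    """True if lhs -> rhs follows from the given EFDs."""
    return set(rhs) <= closure(lhs, efds)

def is_transitive(efds, x, y, z):
    """Z is transitively dependent on X via Y: X -> Y, Y -> Z, and not Y -> X."""
    return (implies(efds, x, y) and implies(efds, y, z)
            and not implies(efds, y, x))

# Assumed EFDs for the department/course/student example.
efds = [({"course"}, {"student"}), ({"student"}, {"age"})]

via_student = is_transitive(efds, {"course"}, {"student"}, {"age"})   # True
not_via_self = is_transitive(efds, {"student"}, {"student"}, {"age"})  # False
```

The third condition, not Y → X, is what separates a genuine transitive dependency from one where X and Y determine each other.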

Integrity Constraints for Semistructured Data (cont.)
Theorem: Let D = (E, A, B, P, R, r) be a semistructured schema and X, Y, Z ⊆ E ∪ A. If Z is transitively dependent on X via Y, then there exists a data tree of D where a rewriting anomaly occurs upon updating the values of Z.

Integrity Constraints for Semistructured Data (cont.)
• Key constraints: based on EFD semantics.
• Notation: K_O = O1[…]/…/O_i[…]/…/O_n[…] denotes the key of an object type O in semistructured schema D, where /O1/…/O is a path in D. If n equals one, K_O is called an absolute key; otherwise it is called a relative key.
• Example: K_book = … is an absolute key; K_chapter = … is a relative key; K_section = … is a relative key.

Integrity Constraints for Semistructured Data (cont.)
Let D be a semistructured schema and O be its root object type. The set of basic dependencies of D, denoted BD(D), is defined as follows:
• Let X, Y be children of O. Non-trivial extended functional dependencies of the form X → Y, where X is a key of O or Y is part of a key of O, are in BD(D).
• Let O1 be a sub-object type of O, and let D1 be the schema tree rooted at O1 with K_O added as attribute(s) of O1. Then BD(D1) ⊆ BD(D).
• No other non-trivial dependency is in BD(D).

NF-SS
Let D be a semistructured schema and O be its root object type. D is in Normal Form for Semistructured Schemata (NF-SS) iff:
1. O has at least one key.
2. For any non-trivial EFD of the form X → Y satisfied by O, where X and Y are attributes of O, either X is a key of O or Y is part of a key of O.
3. For any sub-object type O1 of O:
   (a) the schema tree rooted at O1, with K_O added to O1 as components and everything else unchanged, is in NF-SS;
   (b) K_O ∩ K_O1 = ∅ or K_O ⊆ K_O1, where K_O and K_O1 are the keys of O and O1 respectively;
   (c) O1 is not transitively dependent on K_O.
4. Any non-trivial EFD in D can be derived from BD(D) by using the inference rules for EFDs.

Designing Semistructured Schema into NF-SS
• We adopt a restructuring approach for the design.
• We propose four heuristic restructuring rules, which involve:
  – decomposing object types;
  – creating new object types;
  – regrouping components of an object type.
• Objective:
  – Remove transitive, partial and incoherent EFDs from the given dependency and key constraints.

Designing Semistructured Schema into NF-SS (cont.)
Rule 1 (Remove Transitive Dependency by Decomposition). Given an object type O in a semistructured schema D, suppose some non-prime component(s) Y of O are transitively dependent on some key of O, i.e., K_O → X, X → Y, X ↛ K_O and X ∩ K_O = ∅. Then restructure the schema as follows:
1. Duplicate X to form new node(s) Z.
2. Move Y and all the descendants of Y, with their corresponding edges, under Z.
3. Make X a foreign key of O, and add a reference edge from the original node X to Z.
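The three steps of Rule 1 can be sketched on a small dictionary-based schema tree. Everything about the encoding is an assumption for illustration: nodes are stored by name with a list of child names and an optional reference target, the node names O, K, X, Y, Y1 and Z are placeholders, and Z is left as a separate root because the rule does not say where the new node is attached.

```python
import copy

def apply_rule1(nodes, o, x, y, z):
    """Sketch of Rule 1 on nodes: name -> {"children": [names], "ref": target or None}.

    o: object type O, x: determinant node X, y: dependent node Y,
    z: name for the new node Z (all names hypothetical).
    """
    out = copy.deepcopy(nodes)          # leave the input schema untouched
    # Step 1: duplicate X to form the new node Z.
    out[z] = copy.deepcopy(out[x])
    # Step 2: move Y under Z; Y's descendants follow because they hang off Y.
    out[o]["children"].remove(y)
    out[z]["children"].append(y)
    # Step 3: X stays in O as a foreign key with a reference edge to Z.
    out[x]["ref"] = z
    return out

nodes = {
    "O":  {"children": ["K", "X", "Y"], "ref": None},
    "K":  {"children": [], "ref": None},
    "X":  {"children": [], "ref": None},
    "Y":  {"children": ["Y1"], "ref": None},
    "Y1": {"children": [], "ref": None},
}

restructured = apply_rule1(nodes, "O", "X", "Y", "Z")
```

After the call, O keeps K and X, Z holds Y together with its subtree, and X carries a reference to Z, so the values of Y are stored once per X value rather than once per occurrence of O.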

Designing Semistructured Schema into NF-SS (cont.)
• Example 5.1: schema D satisfies the following EFDs: (1) …; (2) ….
[Figure: schema tree for Example 5.1 (nodes include department and grade)]

Designing Semistructured Schema into NF-SS (cont.)
Rule 2 (Remove Path Anomaly by Path Splitting). Given a semistructured schema D, suppose there exists an incoherent EFD O1[…], …, O_n[…] → Y, where Y is either an object type or an attribute, and there exists a path P that contains {O1, …, O_n, Y}. Path P can be split into two sub-paths P1 and P2, where P1 contains only {O1, …, O_n} and Y, while P2 contains {O1, …, O_n} and (P − Y).

Designing Semistructured Schema into NF-SS (cont.)
• Example 5.2: schema D satisfies the following EFDs: (1) ….
[Figure: schema tree for Example 5.2 (nodes include ClassRoom, time and subject)]

Designing Semistructured Schema into NF-SS (cont.)
Rule 3 (Remove Partial Dependency by Creating a New Object Type). Given an object type O in a semistructured schema, let X be a set of prime attributes of O, and Y be the set of O’s attributes. Let O1 be a sub-object type of O. If (K_O − X) → O1 and no proper superset of X satisfies this property, then restructure the schema as follows:
1. (K_O … Y − X) becomes the only attribute(s) of O, while O1 remains its sub-object type.
2. Create a new object type O2 that is a direct component of O.
3. Move the rest of the components of O, with all their descendants and corresponding edges, under O2.

Designing Semistructured Schema into NF-SS (cont.)
• Example 5.3: schema D is shown in Figure (a). It satisfies the EFDs { … → D, … → O2, … → O1, … → E }, and the key of O is {A, B}.

Designing Semistructured Schema into NF-SS (cont.)
Rule 4 (Restructuring to Satisfy Condition 3(b) of the NF-SS Definition). Given an object type O in a semistructured schema D, let X be a set of O’s attributes and single-valued atomic sub-object types, and let O1 be a complex sub-object type of O. Suppose O1 has relative key K_O1, but K_O ⊈ K_O1 and K_O1 ⊈ K_O. Let Y be K_O ∩ K_O1 ∩ X, with Y ≠ ∅. D is restructured as follows:
1. O1 remains a sub-object type of O.
2. Make Y components of O.
3. Create a new object type O2 as a child of O; the remaining components of O (excluding Y) become children of O2.

Designing Semistructured Schema into NF-SS (cont.)
• Example 5.4: schema D in Figure (a) satisfies the EFDs … → O1, … → O2, and the key of O is {K, A, B}.

Designing Semistructured Schema into NF-SS (cont.)
Algorithm 1: Restructuring Algorithm
Input: a set S of semistructured schemas, and a set of EFDs for S.
Output: a set of semistructured schemas that are in NF-SS.
Begin
1. for each semistructured schema D in S do
     if D is not in NF-SS then
       repeat until no further change:
         (1) if there exists a transitive EFD K_O → X, X → Y with X ↛ K_O for an object type O in D:
               Case X ∩ K_O = ∅: apply Rule 1 to remove the transitive EFD.
               Case X ⊂ K_O: apply Rule 3 to remove the transitive EFD.
               Case X ∩ K_O ≠ ∅: apply Rule 4 to remove the transitive EFD.
         (2) if there exists an incoherent EFD, then apply Rule 2 to remove it.
2. output S.
End
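The case analysis in step (1) can be sketched as a small dispatch function. This assumes the three cases are read as: X disjoint from K_O (Rule 1), X properly contained in K_O (Rule 3), and X overlapping K_O otherwise (Rule 4). The function name choose_rule is hypothetical, and the rules themselves are not implemented here.

```python
def choose_rule(x, key):
    """Pick the restructuring rule for a transitive EFD K_O -> X, X -> Y.

    x: the determinant X, key: the key K_O, both given as iterables of
    component names. Returns the rule number (1, 3 or 4).
    """
    x, key = set(x), set(key)
    if x == key:
        # Then X -> K_O holds trivially, so the EFD is not transitive.
        raise ValueError("X equals K_O; not a transitive EFD")
    if not (x & key):
        return 1      # X and K_O are disjoint: Rule 1
    if x < key:
        return 3      # X is a proper subset of K_O: Rule 3
    return 4          # X overlaps K_O without being contained in it: Rule 4

rule_disjoint = choose_rule({"X"}, {"K"})            # 1
rule_subset = choose_rule({"A"}, {"A", "B"})         # 3
rule_overlap = choose_rule({"A", "C"}, {"A", "B"})   # 4
```

Checking the subset case before the general overlap case matters, because a proper subset of K_O also has a non-empty intersection with it.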

Discussion of the Restructuring Approach for Designing
• Are the restructuring rules complete? No.
  – Covering is not guaranteed.
  – Dependency preservation is not guaranteed.
• Does it give a unique solution? No.
  – The result depends on the order in which the dependencies are examined.
• The design task can be made easier if more semantics are available.
  – In [5], we have proposed another approach for designing semistructured databases using ORA-SS, a semantically rich model.
• Nevertheless, it does give practical heuristics and provides insights into the normalization task for semistructured databases.

Comparison with Related Proposals
• The first attempt to define a normal form for semistructured data ([ER’99] S. Y. Lee, M. L. Lee, T. W. Ling and L. A. Kalinichenko) [3]:
  – Defines a schema called S3-Graph, which makes no distinction between element nodes and attribute nodes and has no cardinality specification.
  – Proposes S3-NF, but lacks key constraints, an essential part of database design.
  – The decomposition method may not be able to remove some other kinds of anomalies, such as partial dependencies and path anomalies, that may exist in a schema.
• The most recent proposal, XNF (XML Normal Form) ([ER 2001] D. W. Embley and W. Y. Mok) [2]:
  – It mainly provides algorithms to translate a schema, represented in a conceptual model called CM hypergraphs, into a scheme-tree forest in XNF.
  – Like S3-Graph, a scheme tree doesn't lend itself to XML definition.
  – XNF isn’t formulated with the concept of a key.
  – The algorithms given suffer from inefficiency.
  – A large set of results is expected.

Summary
• A normal form for semistructured schemata:
  – It incorporates integrity constraints.
  – It guarantees no redundancy, and hence no undesirable updating anomalies, for conforming semistructured databases.
  – It gives more reasonable representations of real-world semantics.
• A restructuring approach for designing semistructured databases:
  – A set of heuristic restructuring rules is proposed.
  – An algorithm for iteratively restructuring a schema into NF-SS is developed.
  – It provides insights into the normalization task for semistructured databases.

References
1. J. Clark and S. DeRose. XML Path Language (XPath). W3C Working Draft, November.
2. D. W. Embley and W. Y. Mok. Developing XML Documents with Guaranteed “Good” Properties. Proceedings of the 20th International Conference on Conceptual Modeling (ER), 2001.
3. S. Y. Lee, M. L. Lee, T. W. Ling and L. A. Kalinichenko. Designing Good Semi-structured Databases. Proceedings of the 18th International Conference on Conceptual Modeling (ER), 1999.
4. T. W. Ling and L. L. Yan. NF-NR: A Practical Normal Form for Nested Relations. Journal of Systems Integration, Vol. 4, 1994.
5. Xiaoying Wu, Tok Wang Ling, Mong Li Lee, Gillian Dobbie. Designing Semistructured Databases Using the ORA-SS Model. Accepted for publication in Proceedings of the 2nd International Conference on Web Information Systems Engineering (WISE), IEEE Computer Society, Kyoto, Japan, December 2001.

Q&A