Data Modeling using XML Schemas
Murali Mani
Extreme 2002
What this talk is not about
Not about document-style markup such as "X gave a two thumbs up for the Fugitive, …". Instead, we talk about data modeling from a database perspective.
What is the database perspective?
Our world consists of:
- Entities
- Relationships: binary (1:1, 1:many, many:many), n-ary, recursive
- Attributes for entities
- Attributes for relationships
Outline of the talk
- How XML can contribute to the DB community
- Introduction to the ER model
- How ER concepts are modeled using the relational model
- Mapping ER concepts to the XML model
- Constraint specification for XML: what are the options?
- Subtyping for XML processing: do we need it, and what are the options?
How XML can contribute to the DB community
- Standard exchange format
- Superior data model?
  - Recursive relationships
  - Union types, e.g. Person (name | (lastname, firstname), age, address)
  - Friendlier representation of relationships?
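A union type such as Person (name | (lastname, firstname), age, address) can be checked by treating the sequence of child tag names as a string and matching it against a regular expression. This is a minimal sketch using Python's standard xml.etree.ElementTree and re modules; the regex encoding of the content model and the sample documents are illustrative assumptions, not a schema validator.

```python
import re
import xml.etree.ElementTree as ET

# Content model from the slide, encoded (as an illustrative assumption) as a regex
# over the space-separated sequence of child tag names.
PERSON_MODEL = re.compile(r"(name|lastname firstname) age address")

def conforms(person: ET.Element) -> bool:
    """Return True if the children of <person> follow the union content model."""
    tags = " ".join(child.tag for child in person)
    return PERSON_MODEL.fullmatch(tags) is not None

# Hypothetical sample documents: one per branch of the union, plus one that mixes them.
doc1 = ET.fromstring("<person><name>X</name><age>25</age><address>A1</address></person>")
doc2 = ET.fromstring("<person><lastname>Y</lastname><firstname>Z</firstname>"
                     "<age>55</age><address>A2</address></person>")
doc3 = ET.fromstring("<person><name>X</name><lastname>Y</lastname>"
                     "<age>25</age><address>A1</address></person>")

print(conforms(doc1), conforms(doc2), conforms(doc3))  # True True False
```

The relational model has no direct counterpart: one would typically use nullable columns for name, lastname, and firstname plus an extra constraint saying which combination is allowed.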
Recursive relationships, in XML and as a relation:
person (person*)
person (person?)

Person | Age | Father
X      | 25  | Y
Y      | 55  | null
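A minimal sketch of turning the Person | Age | Father table into the recursive content model person (person?), assuming the optional child person is the father. The element and attribute names (person, name, age) and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Relational form from the slide: Person -> (Age, Father); Father refers to another Person.
rows = {"X": (25, "Y"), "Y": (55, None)}

def nest(name: str) -> ET.Element:
    """Build person (person?) by nesting each person's father inside it (assumed reading)."""
    age, father = rows[name]
    elem = ET.Element("person", name=name, age=str(age))
    if father is not None:
        elem.append(nest(father))
    return elem

def ancestors(elem: ET.Element) -> list:
    """Follow the recursion downward: a person's father is its only person child."""
    chain = []
    child = elem.find("person")
    while child is not None:
        chain.append(child.get("name"))
        child = child.find("person")
    return chain

x = nest("X")
print(ET.tostring(x, encoding="unicode"))
print(ancestors(x))  # ['Y']
```

Nesting in the other direction, person (person*), would place each person's children inside it; since every person has at most one father, each person then appears exactly once.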
Data Modeling
What is a data model?
- Structural specification
- Specification of constraints
- Operations to retrieve/update the data
Stages in database design:
- Conceptual model
- Logical model
- Physical model
Conceptual model and logical model: absolutely no (or almost no) redundancy
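Database Design and Redundancy

Prof  | Age
Muntz | 60

Student | BS | Prof
MM      | CS | Muntz
YC      | EE | Muntz

Student | BS | Prof  | Age
MM      | CS | Muntz | 60
YC      | EE | Muntz | 60

The combined relation stores Muntz's age once per student, so the same fact is repeated.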
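The cost of that redundancy can be made concrete: with Prof determining Age, the combined relation stores Muntz's age in every student row, so updating only one row leaves the data inconsistent. A small Python sketch using the slide's values; the updated age 61 is hypothetical.

```python
# Tables from the slide.
prof = {"Muntz": 60}                                      # Prof -> Age
student = [("MM", "CS", "Muntz"), ("YC", "EE", "Muntz")]  # (Student, BS, Prof)

# The combined design (Student, BS, Prof, Age) copies Age into every student row.
combined = [(s, bs, p, prof[p]) for (s, bs, p) in student]

def consistent(rows) -> bool:
    """Prof -> Age must hold: every row for the same Prof carries the same Age."""
    seen = {}
    for (_, _, p, age) in rows:
        if seen.setdefault(p, age) != age:
            return False
    return True

# Updating Muntz's age in only one row breaks the dependency (an update anomaly).
updated = [combined[0][:3] + (61,), combined[1]]

print(combined)                                # [('MM', 'CS', 'Muntz', 60), ('YC', 'EE', 'Muntz', 60)]
print(consistent(combined), consistent(updated))  # True False
```

In the two-table design the age is stored once, so this anomaly cannot occur.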
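Database design and redundancy

Person | Address | City | State | zip
X      | A1      | LAX  | CA    | 90066
Y      | A2      | LAX  | CA    | 90066

Since zip determines City and State, these values are repeated for every person with the same zip.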
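Assuming the dependency zip → (City, State), the relation can be split into Person(Person, Address, zip) and Zip(zip, City, State) and joined back without loss. This plain Python sketch shows the standard normalization step; the decomposition itself is not shown on the slide.

```python
# Rows from the slide: (Person, Address, City, State, zip).
rows = [("X", "A1", "LAX", "CA", "90066"),
        ("Y", "A2", "LAX", "CA", "90066")]

def decompose(rows):
    """Split on the assumed dependency zip -> (City, State)."""
    person = [(p, addr, z) for (p, addr, _, _, z) in rows]
    zips = {}
    for (_, _, city, state, z) in rows:
        # If the dependency did not hold, the split would lose information.
        if zips.setdefault(z, (city, state)) != (city, state):
            raise ValueError(f"zip {z} maps to two (City, State) pairs")
    return person, zips

person, zips = decompose(rows)
print(person)  # [('X', 'A1', '90066'), ('Y', 'A2', '90066')]
print(zips)    # {'90066': ('LAX', 'CA')}

# Joining back on zip reconstructs the original rows.
rejoined = [(p, a, *zips[z], z) for (p, a, z) in person]
print(rejoined == rows)  # True
```

City and State are now stored once per zip code instead of once per person.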
Entity Relationship (ER) model
Consider students and professors in a dept, with a relationship advisor:

Student | Prof  | since
MM      | Muntz | 1998
YC      | Muntz | 2000
ER Model (contd…)
N-ary relationship
Relational Model
- Every relation has a key.
- Relationships are represented using foreign keys.
- A foreign key from A to B represents an A (_, 1) : B (_, _) relationship, i.e., each A is related to at most one B.

Supplier | Part | City | lastShipment
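The Supplier | Part | City | lastShipment relation can be sketched in SQLite through Python's sqlite3 module. This assumes the n-ary relationship is many:many:many, so its key is the combination of the three participants; the row values and the helper insert_shipment are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One relation for the ternary relationship; assumed key = all three participants.
conn.execute("""
    CREATE TABLE shipment (
        supplier TEXT, part TEXT, city TEXT, lastShipment TEXT,
        PRIMARY KEY (supplier, part, city)
    )""")

def insert_shipment(row) -> bool:
    """Insert a row; return False if it violates the key."""
    try:
        conn.execute("INSERT INTO shipment VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_shipment(("s1", "p1", "c1", "2001-12-01")))  # True
print(insert_shipment(("s1", "p1", "c1", "2002-01-15")))  # False: same participants again
print(insert_shipment(("s1", "p1", "c2", "2002-01-15")))  # True: different city
```

The relationship attribute lastShipment is simply another column, because the relationship has a relation of its own.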
Relational Model (contd…)

Supplier | Part | City | lastShipment

PName
Muntz

Student | Professor | since
MM      | Muntz     | 1998
YC      | Muntz     | 2000
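A sketch of the Student and Professor relations in SQLite (via Python's sqlite3 module), with the Professor column of student declared as a foreign key to professor's PName. Because each student row holds a single Professor value, the foreign key captures an A (_, 1) : B (_, _) relationship, and the relationship attribute since lives in the student row. The helper add_student and the rejected row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE professor (PName TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE student (
        Student TEXT PRIMARY KEY,
        Professor TEXT REFERENCES professor(PName),  -- at most one advisor per student
        since INTEGER
    )""")
conn.execute("INSERT INTO professor VALUES ('Muntz')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("MM", "Muntz", 1998), ("YC", "Muntz", 2000)])

def add_student(name, prof, since) -> bool:
    """Insert a student; return False if the foreign key (or the key) is violated."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (name, prof, since))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_student("ZZ", "Nobody", 2001))  # False: no such professor
```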
Relationships in XML model
An A (1, 1) : B (_, _) relationship (each A is related to exactly one B) can be represented using parent-child relationships as
B (A*)
e.g., prof (student*)
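With prof (student*), the advisor relationship is implicit in the tree: a student's advisor is the prof element that encloses it, and the relationship attribute since can sit on the student element. A minimal ElementTree sketch; the wrapper element dept, the attribute names, and the helper advisor_of are hypothetical.

```python
import xml.etree.ElementTree as ET

# prof (student*): each student element sits inside exactly one prof element.
doc = ET.fromstring("""
<dept>
  <prof name="Muntz">
    <student name="MM" since="1998"/>
    <student name="YC" since="2000"/>
  </prof>
</dept>""")

def advisor_of(doc: ET.Element) -> dict:
    """Recover the relationship: a student's advisor is its enclosing prof."""
    return {s.get("name"): p.get("name")
            for p in doc.iter("prof") for s in p.findall("student")}

print(advisor_of(doc))  # {'MM': 'Muntz', 'YC': 'Muntz'}
```

No reference check is needed: the document structure itself guarantees that every student has exactly one professor.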
Using ID/IDREF to represent relationships
An A (_, 1) : B (_, _) relationship can be represented using ID/IDREF:
- Define an ID attribute for B
- Define an IDREF attribute for A referring to B
Example: prof carries the ID; student carries the IDREF.
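ElementTree does not read a DTD, so it gives ID attributes no special treatment. The sketch below therefore checks the two ID/IDREF rules by hand: IDs are unique, and every IDREF points at an existing ID. The attribute names id and advisor and the helper resolve_idrefs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: prof has an ID attribute "id"; student has an IDREF attribute "advisor".
doc = ET.fromstring("""
<dept>
  <prof id="muntz"><PName>Muntz</PName></prof>
  <student name="MM" advisor="muntz" since="1998"/>
  <student name="YC" advisor="muntz" since="2000"/>
</dept>""")

def resolve_idrefs(doc: ET.Element) -> dict:
    """Check ID uniqueness and IDREF targets by hand; map student name -> professor name."""
    ids = {}
    for e in doc.iter():
        i = e.get("id")
        if i is not None:
            if i in ids:
                raise ValueError(f"duplicate ID {i!r}")
            ids[i] = e
    result = {}
    for s in doc.iter("student"):
        ref = s.get("advisor")
        if ref is None:
            continue                      # (_, 1): a student may have no advisor
        if ref not in ids:
            raise ValueError(f"dangling IDREF {ref!r}")
        result[s.get("name")] = ids[ref].findtext("PName")
    return result

print(resolve_idrefs(doc))  # {'MM': 'Muntz', 'YC': 'Muntz'}
```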
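Using ID/IDREFS to represent relationships: not really…
ID/IDREFS can represent any binary relationship A (_, _) : B (_, _), but cannot represent attributes of the relationship.
Example: a student element whose IDREFS attribute lists its professors has no place to record a separate since value for each professor.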
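The limitation can be seen directly. An IDREFS value is a whitespace-separated list, so it yields the (student, professor) pairs. A relationship attribute such as since, however, is a single attribute of the student element. One workaround is to make the relationship itself an element that carries its own attributes, as in the sketch's second document. All names and the second since value are hypothetical.

```python
import xml.etree.ElementTree as ET

# IDREFS: one attribute lists all of a student's professors (many:many).
doc = ET.fromstring("""
<dept>
  <prof id="p1"/><prof id="p2"/>
  <student name="MM" professors="p1 p2" since="1998"/>
</dept>""")

pairs = [(s.get("name"), ref)
         for s in doc.iter("student")
         for ref in s.get("professors").split()]  # IDREFS is whitespace-separated
print(pairs)  # [('MM', 'p1'), ('MM', 'p2')]
# "since" is one attribute of student, so it cannot differ per (student, prof) pair.

# Workaround: a relationship element with its own attributes.
rel = ET.fromstring("""
<dept>
  <advisor student="MM" prof="p1" since="1998"/>
  <advisor student="MM" prof="p2" since="2000"/>
</dept>""")
since = {(a.get("student"), a.get("prof")): a.get("since") for a in rel.iter("advisor")}
print(since)  # {('MM', 'p1'): '1998', ('MM', 'p2'): '2000'}
```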
Using foreign keys to represent relationships
student (SName, Professor, since)
professor (PName)
(the Professor field of student refers to the PName of professor)
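In XML Schema, this design would be expressed with an identity constraint such as xs:key on professor/PName and an xs:keyref from student/Professor. The sketch below only mimics those two checks by hand with ElementTree. It is not a schema processor, and the wrapper element dept and the helper check_keyref are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<dept>
  <professor><PName>Muntz</PName></professor>
  <student><SName>MM</SName><Professor>Muntz</Professor><since>1998</since></student>
  <student><SName>YC</SName><Professor>Muntz</Professor><since>2000</since></student>
</dept>""")

def check_keyref(doc: ET.Element) -> bool:
    """Rough analogue of a key on professor/PName plus a keyref from student/Professor."""
    keys = [p.findtext("PName") for p in doc.iter("professor")]
    if None in keys or len(keys) != len(set(keys)):
        return False  # key violated: missing or duplicate PName
    return all(s.findtext("Professor") in keys for s in doc.iter("student"))

print(check_keyref(doc))  # True
```

Compared with ID/IDREF, the reference is by value (the professor's name) rather than by a document-wide ID attribute.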
Summary so far…
- XML schemas allow us to represent relationships in a friendlier way.
- All foreign key constraints can be represented using parent-child or ID/IDREF relationships; we do not really need foreign keys.
- IDREFS is not recommended for representing relationships.
Constraint specification in XML: questions to be asked
- Node equality vs value equality (equivalently: can a path field produce an element?)
- Can a path field produce a set of elements/values? If so, with what semantics?
- Should a path field exist? (equivalently: can a path field return empty?)
- Should path expressions traverse only down the tree?
- Should our constraints be based on type selectors or on path expression selectors?
- If we use path expression or type selectors, do we need relative keys?
Node Equality
Makes things easier, but…
When are two elements equal? When their serialized string values, ignoring the order of attributes, are the same.
We have used the order among child nodes in defining node equality…
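The definition can be approximated by a recursive comparison. Compare the tag, the attributes as a dictionary (so their order is ignored), and the text. Then compare the children pairwise in document order, so their order matters. This ElementTree sketch approximates "same serialization" under those assumptions. The function name node_equal is hypothetical, and the comparison is whitespace-sensitive.

```python
import xml.etree.ElementTree as ET

def node_equal(a: ET.Element, b: ET.Element) -> bool:
    """Equal if tag, attributes (order ignored), text, and ordered children all match."""
    if a.tag != b.tag or a.attrib != b.attrib:   # dict comparison ignores attribute order
        return False
    if (a.text or "") != (b.text or ""):
        return False
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):                       # children compared in document order
        # Text following a child (its "tail") is part of the parent's content.
        if (x.tail or "") != (y.tail or "") or not node_equal(x, y):
            return False
    return True

p1 = ET.fromstring('<prof name="Muntz" age="60"><student>MM</student><student>YC</student></prof>')
p2 = ET.fromstring('<prof age="60" name="Muntz"><student>MM</student><student>YC</student></prof>')
p3 = ET.fromstring('<prof name="Muntz" age="60"><student>YC</student><student>MM</student></prof>')

print(node_equal(p1, p2), node_equal(p1, p3))  # True False
```

The second result shows the caveat on the slide: the same students in a different order make the elements unequal.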
Can a path field produce a set of values?
Example: professor (Pname, Age), with values 60, Muntz, Chu
If a type X has a key (X1, X2, …, Xn), and for an element of type X each field Xi produces a set of values Yi, then the set Y1 × Y2 × … × Yn should be unique.
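One possible reading of "unique" is that no tuple of Y1 × Y2 × … × Yn may be shared by two distinct elements of type X. The sketch below checks that reading with itertools.product. Elements are modelled as dictionaries from field name to a set of values. This semantics, the function key_holds, and the data are assumptions for illustration.

```python
from itertools import product

def key_holds(elements, fields) -> bool:
    """Assumed reading: the tuples in Y1 x ... x Yn of two distinct elements never overlap."""
    owner = {}
    for idx, elem in enumerate(elements):
        for t in product(*(sorted(elem[f]) for f in fields)):
            if owner.setdefault(t, idx) != idx:
                return False  # tuple t is produced by two different elements
    return True

# Hypothetical data: each element's PName field yields a set of values.
ok = [{"PName": {"n1", "n2"}}, {"PName": {"n3"}}]
clash = [{"PName": {"n1", "n2"}}, {"PName": {"n2"}}]

print(key_holds(ok, ["PName"]), key_holds(clash, ["PName"]))  # True False
```

Under this reading, two elements may share a value in one field as long as they differ in some other key field.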
Should a path expression traverse only down the tree?
The trade-off is relative keys vs traversing up the tree. For example, consider students and professors with one difference: a student can have multiple professors. Consider the same design:
Professor (PName, Student*)
Student (Sname)
The key for student can be specified either as (professor, Sname), which traverses up the tree, or as a relative key: relative to professor, the key for student is (Sname).
But this is a bad design anyway, since a student with several professors is repeated under each of them…
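Both formulations can be checked on the design above. A relative key looks only downward from each Professor. The absolute key (PName, Sname) pairs each student with a field of its parent. The two checks agree when PName is itself a key for Professor, which is assumed here. The wrapper element dept, the names P1, P2, S1, S2, and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Design from the slide: Professor (PName, Student*), Student (Sname).
doc = ET.fromstring("""
<dept>
  <Professor><PName>P1</PName>
    <Student><Sname>S1</Sname></Student><Student><Sname>S2</Sname></Student>
  </Professor>
  <Professor><PName>P2</PName>
    <Student><Sname>S1</Sname></Student>
  </Professor>
</dept>""")

def relative_key_holds(doc: ET.Element) -> bool:
    """Relative key: Sname is unique among the students of each Professor."""
    for prof in doc.iter("Professor"):
        counts = Counter(s.findtext("Sname") for s in prof.findall("Student"))
        if any(c > 1 for c in counts.values()):
            return False
    return True

def absolute_key_holds(doc: ET.Element) -> bool:
    """Absolute key (parent's PName, Sname): needs a field one step up the tree."""
    pairs = [(prof.findtext("PName"), s.findtext("Sname"))
             for prof in doc.iter("Professor") for s in prof.findall("Student")]
    return len(pairs) == len(set(pairs))

print(relative_key_holds(doc), absolute_key_holds(doc))  # True True
```

Student S1 is stored twice, once under each professor. This duplication is what makes the design bad.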
Three different constraint specifications
- UCM (WWW10): type selectors, no relative keys; path expressions can produce a set of values.
- Keys for XML (WWW10): path selectors, relative keys specified through paths; path expressions cannot produce a set of values.
- W3C XML Schema: path selectors, relative keys specified through types; path expressions cannot produce a set of values.
Commonalities across the 3 specifications
- No concept of node equality
- Path expressions traverse only down the tree
- A path field should exist
Summary about Data Modeling
- Entity types map to element types.
- Some relationship types map to element types.
- Ability to define element types: RELAX NG gives us this ability; in XML Schema, it is not so easy.
- Key constraints based on type selectors seem the right way to go.
XML Processing and Subtyping
Subtyping is essential for static type checking:
  function f1 : a{A} B*, C* {
    for $x in a//name return <B/>;
    for $x in a//name return <C/>;
  }
  function f2 : d{(B, B)*, (C, C)* | B, (B, B)*, C, (C, C)*} { … }
Is this type-safe? This is the type-inferencing vs type-checking problem.
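The example can be explored with sequences of child elements written as strings such as "BBCC". The sketch makes three assumptions. The first loop of f1 emits one B per name and the second one C per name. f1's result is passed to f2. Regular expressions can stand in for the sequence types. Under these assumptions, every actual output of f1 has the form BⁿCⁿ and is accepted by f2's input type. The inferred type B*, C* is not a subtype of that input type, because it also contains, for example, the single sequence "B".

```python
import re

# f2's declared input type: (B, B)*, (C, C)* | B, (B, B)*, C, (C, C)*
F2_INPUT = re.compile(r"(BB)*(CC)*|B(BB)*C(CC)*")
# The type a regular type inferencer would assign to f1's result: B*, C*
F1_INFERRED = re.compile(r"B*C*")

def f1_output(n: int) -> str:
    """Assumed behaviour of f1 on n names: n B elements followed by n C elements."""
    return "B" * n + "C" * n

# Every actual output (checked here for n < 50) is accepted by f2...
print(all(F2_INPUT.fullmatch(f1_output(n)) for n in range(50)))  # True
# ...yet B*, C* is not a subtype of f2's input type: "B" is in one but not the other.
print(bool(F1_INFERRED.fullmatch("B")), bool(F2_INPUT.fullmatch("B")))  # True False
```

Because {BⁿCⁿ} is not a regular language, no regular type describes f1's output exactly. A checker that infers B*, C* must therefore reject a program that never actually fails.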
Two techniques for subtyping
- Implicit: tree/hedge language inclusion. A type A is a subtype of type B iff L(A) is a sublanguage of L(B); used in XDuce.
- Explicit: the user specifies the type hierarchy, as in XML Schema.
Explicit subtyping “implicitly” solves the type-inferencing vs type-checking problem.
Implicit subtyping poses several interesting research problems.
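A sketch that contrasts the two techniques. With explicit subtyping, the checker only looks up the declared hierarchy. With implicit subtyping, A is a subtype of B iff L(A) ⊆ L(B). Here types are regular expressions over one-letter tag names, and inclusion is only tested on all sequences up to a fixed length. This bounded test is a stand-in: an inclusion checker such as XDuce's decides the question with tree automata instead. The type names, DECLARED_PARENT, and both functions are hypothetical.

```python
import re
from itertools import product

# Explicit subtyping: the user declares the hierarchy.
DECLARED_PARENT = {"GradStudent": "Student", "Student": "Person"}

def explicit_subtype(a, b) -> bool:
    """a <: b iff b is reached from a by following declared parents."""
    while a is not None:
        if a == b:
            return True
        a = DECLARED_PARENT.get(a)
    return False

# Implicit subtyping: A <: B iff L(A) is a subset of L(B). This only *tests* inclusion
# on every sequence up to max_len; it is not a decision procedure.
def implicit_subtype_upto(a: str, b: str, alphabet: str = "BC", max_len: int = 8) -> bool:
    ra, rb = re.compile(a), re.compile(b)
    for n in range(max_len + 1):
        for word in map("".join, product(alphabet, repeat=n)):
            if ra.fullmatch(word) and not rb.fullmatch(word):
                return False  # word is in L(A) but not in L(B)
    return True

print(explicit_subtype("GradStudent", "Person"))                # True
print(implicit_subtype_upto("(BB)*", "B*"))                     # True
print(implicit_subtype_upto("B*C*", "(BB)*(CC)*|B(BB)*C(CC)*"))  # False
```

The explicit check is a simple lookup, whereas the implicit one needs a language-inclusion algorithm. This asymmetry is one source of the research problems mentioned on the slide.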