M.P. Johnson, DBMS, Stern/NYU, Sp2004

C : Database Management Systems
Lecture #4
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004

Agenda
- Last time: finished E/R models per se
- Announcement: there may be occasional pop quizzes at the start of class
  - on the reading
  - counting toward the participation/attendance grade
- This time:
  - Intro to the relational model
  - Converting E/Rs to relations
  - Functional dependencies
  - Keys and superkeys in terms of FDs
  - Finding keys for relations

Review: E/R example
Exercise: students enroll in courses and get grades
- Enrollments as a connecting entity set
  - represents students taking the course
  - holds the grade of a student for a course
- Draw the E/R diagram
- Indicate weak entity sets & their keys
- Is the grade part of the key for enrollments?

Next topic: the Relational Data Model (3.1)
- Database model (E/R, other): diagrams (E/R)
- Relational schema: tables
  - column names: attributes
  - rows: tuples
- Physical storage: complex file organization and index structures

Relations as tables
Product table/relation:

Name         Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $        photography  Canon
MultiTouch   $        household    Hitachi

- Column headers: attribute names
- Rows: tuples/rows/records/entities

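A minimal sketch of the Product relation as an actual table, using Python's built-in sqlite3 module. The column types and the choice of Name as primary key are assumptions made for illustration; only the two rows whose prices appear in full above are loaded.

```python
import sqlite3

# In-memory database; table and column names follow the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Product (
        Name         TEXT PRIMARY KEY,  -- assumption: Name identifies a product
        Price        REAL,              -- assumption: prices stored as numbers
        Category     TEXT,
        Manufacturer TEXT
    )
    """
)

# Only the rows whose prices are given in full.
rows = [
    ("gizmo", 19.99, "gadgets", "GizmoWorks"),
    ("Power gizmo", 29.99, "gadgets", "GizmoWorks"),
]
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", rows)

# Attribute names come from the schema; tuples come from the instance.
cursor = conn.execute("SELECT * FROM Product")
attribute_names = [col[0] for col in cursor.description]
tuples = cursor.fetchall()
print(attribute_names)
print(tuples)
```

The header row of the table corresponds to attribute_names, and each remaining row corresponds to one element of tuples.
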
Relational terminology
- A relation is composed of tuples
- Tuples are composed of attribute values
- Each attribute has an atomic type
- Relation schema: relation name + attribute names + attribute types
- Relation instance: set of tuples
  - order doesn’t matter
- Database schema: set of relation schemas
- Database instance: a relation instance for every relation in the schema

Relations as sets
- Remember: a math relation is a subset of the cross-product of the attribute value sets
  - R subset-of S x T
  - Product subset-of Name x Price x Cat x Mft
- One member of Product: (gizmo, $19.99, gadgets, GizmoWorks) in Product
- DB relation instance = math relation
- Q: If relations are sets, why call them “instances”?
- A: R is a member of the powerset P(S x T)
  - powerset = set of all subsets

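The set view can be checked directly on a toy example. This is only a sketch: the two small value sets S and T are hypothetical, chosen so that the cross-product stays tiny.

```python
from itertools import product

# Tiny attribute value sets (hypothetical, to keep the cross-product small).
S = {"gizmo", "Power gizmo"}
T = {"gadgets", "photography"}

cross = set(product(S, T))  # S x T: every possible (name, category) pair

# One relation instance: some subset of the cross-product.
R = {("gizmo", "gadgets"), ("Power gizmo", "gadgets")}

is_relation = R <= cross         # R subset-of S x T
num_instances = 2 ** len(cross)  # size of the powerset P(S x T)

# Order of tuples does not matter: both literals denote the same set.
same = R == {("Power gizmo", "gadgets"), ("gizmo", "gadgets")}
print(is_relation, num_instances, same)
```

Each of the num_instances subsets of S x T is one possible instance of the same schema, which is one way to read the powerset answer above.
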
More on tuples
- Formally, a tuple can also be a mapping from attribute names to (correctly typed) values:
    name          -> gizmo
    price         -> $19.99
    category      -> gadgets
    manufacturer  -> GizmoWorks
- NB: an ordered tuple is equivalent to a mapping
- Sometimes we refer to a tuple by itself (note the order of attributes):
    (gizmo, $19.99, gadgets, GizmoWorks)
  or
    Product(gizmo, $19.99, gadgets, GizmoWorks)

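The equivalence between the ordered and the mapping view can be sketched in a few lines, assuming the attribute order is fixed by the schema. The variable names are illustrative only.

```python
# Attribute order as fixed by the schema.
attributes = ("name", "price", "category", "manufacturer")

ordered = ("gizmo", "$19.99", "gadgets", "GizmoWorks")

# Ordered tuple -> mapping: pair each value with its attribute name.
as_mapping = dict(zip(attributes, ordered))

# Mapping -> ordered tuple: read the values back in schema order.
back = tuple(as_mapping[a] for a in attributes)

print(as_mapping["price"])
print(back == ordered)
```

Going from the tuple to the mapping and back loses nothing, as long as both sides agree on the attribute order.
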
Updates
- The database maintains a current database state
- Modifications of data:
  - add a tuple
  - delete a tuple
  - update an attribute value in a tuple
- DB relation instance = math relation
- Idea: we saw a particular Product DB instance
  - add or delete rows: a different DB relation instance
  - technically, a different math relation
  - to the DBMS, still the same relation
- Modifications to the data are frequent
- Updates to the schema are rare and painful (Why?)

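A small sketch of the point that modifications change the instance but not the schema, again with sqlite3. The helper instance() and the new price 24.99 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product "
    "(Name TEXT PRIMARY KEY, Price REAL, Category TEXT, Manufacturer TEXT)"
)

def instance(conn):
    """Return the current relation instance as a set of tuples."""
    return set(conn.execute("SELECT * FROM Product"))

def schema(conn):
    """Return the stored CREATE TABLE text for Product."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'Product'"
    ).fetchone()[0]

schema_before = schema(conn)

# Add tuples.
conn.execute("INSERT INTO Product VALUES ('gizmo', 19.99, 'gadgets', 'GizmoWorks')")
conn.execute("INSERT INTO Product VALUES ('Power gizmo', 29.99, 'gadgets', 'GizmoWorks')")
before = instance(conn)

# Update an attribute value in a tuple (24.99 is a made-up value).
conn.execute("UPDATE Product SET Price = 24.99 WHERE Name = 'Power gizmo'")
after_update = instance(conn)

# Delete a tuple.
conn.execute("DELETE FROM Product WHERE Name = 'gizmo'")
after_delete = instance(conn)

schema_after = schema(conn)
print(before != after_update, after_update != after_delete)
print(schema_before == schema_after)
```

Each modification yields a different set of tuples, while the schema text stays the same throughout.
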
E/R models to relations (3.2)
- Recall the justification:
  - design is easier in E/R
  - implementation is easier/faster in the relational model
- Parallel to program compilation:
  - design is easier in C/Java/whatever
  - implementation is easier/faster in machine/byte code
- Strategy:
  1. apply semi-mechanical conversion rules
  2. improve by combining some relations
  3. improve by normalization
     - involves finding functional dependencies

E/R conversion rules
- Entity set -> relation
  - attributes: attributes of the entity set
  - key: key of the ES
- Relationship -> relation
  - attributes: keys of the entity sets/roles
  - key: depends on multiplicity
- NB: the mapping of types is not one-one
  - We’ll see: the mapping of tokens is not one-one
- Special treatment:
  - weak entity sets
  - isa relations & subclasses

Entity Sets
Entity set Students, with attributes ssn, name, address
Rel: Students

SSN   Name    Address
      John    South Carolina
      Howard  Park Avenue

Entity Sets
Entity set Course, with attributes CourseID, CourseName

Binary many-to-many relationships
- E/R diagram: Students (ssn, S_Name, S_addr) -- Enrolls -- Course (CourseID, Course-Name)
- Relation: Enrolls(ssn, CourseID)
- Key: keys of both entity sets
  - this is why we learned to recognize keys

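A sketch of the rule for a many-to-many relationship: the Enrolls relation holds the keys of both entity sets, and the two together form its key. All ssn and CourseID values below are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, S_Name TEXT, S_addr TEXT);
    CREATE TABLE Course   (CourseID TEXT PRIMARY KEY, CourseName TEXT);

    -- The relationship becomes a relation over the keys of both entity sets;
    -- its key is the pair (ssn, CourseID).
    CREATE TABLE Enrolls (
        ssn      TEXT REFERENCES Students (ssn),
        CourseID TEXT REFERENCES Course (CourseID),
        PRIMARY KEY (ssn, CourseID)
    );

    -- Hypothetical sample data.
    INSERT INTO Students (ssn) VALUES ('s1'), ('s2');
    INSERT INTO Course (CourseID) VALUES ('c1'), ('c2');
    INSERT INTO Enrolls VALUES ('s1', 'c1'), ('s1', 'c2'), ('s2', 'c1');
    """
)

# One student in many courses and one course with many students are both fine;
# only a repeated (ssn, CourseID) pair violates the key.
try:
    conn.execute("INSERT INTO Enrolls VALUES ('s1', 'c1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM Enrolls").fetchone()[0]
print(duplicate_rejected, enrollment_count)
```

Neither ssn nor CourseID alone could serve as the key here, since each value repeats across rows.
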
Many-to-one relationships
- E/R diagram: Movies (MovieID, Title, Year) -- owns (CopyrightNo) --> Studios (StudioID, Name, Address)
- Key: key of the “many” entity set

Movies
MovieID  Title       Year
M101     Mr. Ripley  1999
M202     Sylia       2003

Studios
StudioID  Name     Address
S35       Miramax  NYC
S73       Disney   Orlando

Owns
MovieID  CopyrightNo  StudioID
M101     CN11111      S73
M202     CN22222      S35

Improving on many-one
- Note the rules applied:
  - Movies rel.: all atts from the Movies ES
  - Studios rel.: all atts from the Studios ES
  - Owns rel.: its own att, plus the key atts from the Movies & Studios ESs
- But: Owns: Movies -> Studios is many-one
  - for each row in Movies, there’s at most one row in Owns
  - so just add the Owns data to Movies

Many-to-one: a better design

Movies
MovieID  Title       Year
M101     Mr. Ripley  1999
M202     Sylia       2003

Owns
MovieID  CopyrightNo  StudioID
M101     CN11111      S73
M202     CN22222      S35

Movies’
MovieID  Title                Year  StudioID  CopyrightNo
M101     Talented Mr. Ripley  1999  S73       CN11111
M202     Sylia                2003  S35       CN22222

Q: What if a movie’s Owns row were missing?

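One way to sketch the combined design is a left outer join of Movies with Owns, which also suggests an answer to the question above: a movie with no Owns row still appears, with NULLs in the Owns columns. The table name MoviesPrime stands in for Movies’, and the third movie (M303, with no owner recorded) is added here as an assumption to show the missing-row case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Movies (MovieID TEXT PRIMARY KEY, Title TEXT, Year INTEGER);

    -- Many-one from Movies to Studios: MovieID alone is the key of Owns.
    CREATE TABLE Owns (MovieID TEXT PRIMARY KEY, CopyrightNo TEXT, StudioID TEXT);

    INSERT INTO Movies VALUES
        ('M101', 'Mr. Ripley', 1999),
        ('M202', 'Sylia', 2003),
        ('M303', 'P.D. Love', 2002);  -- assumption: a movie with no Owns row
    INSERT INTO Owns VALUES
        ('M101', 'CN11111', 'S73'),
        ('M202', 'CN22222', 'S35');

    -- Fold the Owns data into Movies; unmatched movies get NULLs.
    CREATE TABLE MoviesPrime AS
        SELECT m.MovieID, m.Title, m.Year, o.StudioID, o.CopyrightNo
        FROM Movies AS m
        LEFT JOIN Owns AS o ON o.MovieID = m.MovieID;
    """
)

merged = conn.execute("SELECT * FROM MoviesPrime ORDER BY MovieID").fetchall()
for row in merged:
    print(row)
```

Because each movie has at most one Owns row, the merge adds columns but never adds rows, so MovieID remains a key of the combined relation.
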
Many-to-many relationships again
- Combining relations won’t work for many-many relationships
- E/R diagram: Movies -- acts -- Stars

Movies
MovieID  Title       Year
M101     Mr. Ripley  1999
M202     Sylia       2003
M303     P.D. Love   2002

Stars
StarID  Name          Address
T400    Gwyneth P.    Bev. Hills
T401    P.S. Hoffman  Hollywood
T402    Jude Law      Palm Springs

Acts
MovieID  StarID
M101     T400
M202     T400
M101     T401
M101     T402
M303     T401

Many-to-many relationships again
And here’s why:

MovieID  Title                Year  StarID
M101     Talented Mr. Ripley  1999  T400
M101     Talented Mr. Ripley  1999  T401
M101     Talented Mr. Ripley  1999  T402
M202     Sylia                2003  T400
M303     Punch Drunk Love     2002  T401

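The redundancy can be counted directly. This sketch rebuilds the merged table from the Movies and Acts data of the running example and counts how many times each movie's title and year end up stored.

```python
# Movies and Acts data from the running example.
movies = {
    "M101": ("Mr. Ripley", 1999),
    "M202": ("Sylia", 2003),
    "M303": ("P.D. Love", 2002),
}
acts = [
    ("M101", "T400"),
    ("M202", "T400"),
    ("M101", "T401"),
    ("M101", "T402"),
    ("M303", "T401"),
]

# Folding Acts into Movies: one merged row per (movie, star) pair.
merged = [(movie_id, *movies[movie_id], star_id) for movie_id, star_id in acts]

# Count how often each movie's title and year are repeated.
copies = {}
for movie_id, title, year, star_id in merged:
    copies[movie_id] = copies.get(movie_id, 0) + 1

print(copies)
print(len(merged), "rows for", len(movies), "movies")
```

M101 appears once per star, so its title and year are stored three times, and MovieID is no longer a key of the merged relation.
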
Multiway relationships & roles
- E/R diagram: enrolls connects Students (SSN, Name), Courses (CourseID, Name), and TAs (TA_SSN, Name), with TAs in two roles: tutors and graders
- Different roles are treated as different entity sets
- Key: keys of the “many” entity sets

Multiway relationships & roles
Enrolls(S_SSN, Course_ID, Tutor_SSN, Grader_SSN)

- Students(SSN, Name), with names George, Dick
- TAs(TA_SSN, Name), with names Wesley, Howard, John
- Courses(CourseID, Name), with names Databases, Software
- Enrolls columns: S_SSN, CourseID, Tutor_SSN, Grader_SSN

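A sketch of how the two TA roles show up as two separate columns of Enrolls, each referring to the same TAs relation. The TA names come from the slide; every ID and every tutor/grader assignment below is a hypothetical placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TAs (TA_SSN TEXT PRIMARY KEY, Name TEXT);

    -- One column per role; the key comes from the "many" entity sets.
    CREATE TABLE Enrolls (
        S_SSN      TEXT,
        Course_ID  TEXT,
        Tutor_SSN  TEXT REFERENCES TAs (TA_SSN),
        Grader_SSN TEXT REFERENCES TAs (TA_SSN),
        PRIMARY KEY (S_SSN, Course_ID)
    );

    -- Hypothetical IDs and role assignments.
    INSERT INTO TAs VALUES ('t1', 'Wesley'), ('t2', 'Howard'), ('t3', 'John');
    INSERT INTO Enrolls VALUES
        ('s1', 'c1', 't1', 't2'),
        ('s2', 'c1', 't1', 't3');
    """
)

# Join TAs twice, once per role.
rows = conn.execute(
    """
    SELECT e.S_SSN, tutor.Name, grader.Name
    FROM Enrolls AS e
    JOIN TAs AS tutor  ON tutor.TA_SSN  = e.Tutor_SSN
    JOIN TAs AS grader ON grader.TA_SSN = e.Grader_SSN
    ORDER BY e.S_SSN
    """
).fetchall()
print(rows)
```

The same TA can appear as tutor in one row and grader in another, which is why the roles need distinct column names.
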
Converting weak ESs: differences
- E/R diagram: Crew (Crew_ID) -- Unit-of --> Studio (StudioName, address)
- Atts of the Crew rel. are:
  - attributes of Crew
  - key attributes of the supporting ESs
- Supporting relationships may be omitted (why?)

Crew
Crew_ID  StudioName
C1       Miramax
C1       Disney
C2       Miramax

Weak entity sets - relationships
- E/R diagram: Insurance (IName, Address) -- Subscribes -- Crew (Crew_ID) -- Unit-of --> Studio (StudioName, address)

Insurance
IName      Address
Aetna      th Av. NY
BlueCross  th Av. NY

Weak entity sets - relationships
- Non-supporting relationships for weak ESs are converted
  - keys include the entire weak ES key

Subscribes
Crew_ID  StudioName  Insurer
C21      Universal   Aetna
C22      Disney      BlueCross
C21      Universal   Aetna

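A sketch of why Subscribes must carry the entire weak-entity key. It reuses crews C1/Miramax, C1/Disney and C2/Miramax as sample data; which crew subscribes to which insurer is a hypothetical assignment, and the composite foreign key is one possible way to express the rule in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    -- Weak entity set: Crew_ID is unique only within a studio.
    CREATE TABLE Crew (
        Crew_ID    TEXT,
        StudioName TEXT,
        PRIMARY KEY (Crew_ID, StudioName)
    );

    -- A non-supporting relationship carries the whole weak-ES key.
    CREATE TABLE Subscribes (
        Crew_ID    TEXT,
        StudioName TEXT,
        Insurer    TEXT,
        PRIMARY KEY (Crew_ID, StudioName, Insurer),
        FOREIGN KEY (Crew_ID, StudioName) REFERENCES Crew (Crew_ID, StudioName)
    );

    INSERT INTO Crew VALUES ('C1', 'Miramax'), ('C1', 'Disney'), ('C2', 'Miramax');

    -- Hypothetical subscriptions.
    INSERT INTO Subscribes VALUES
        ('C1', 'Miramax', 'Aetna'),
        ('C1', 'Disney', 'BlueCross');
    """
)

# Crew_ID alone is ambiguous: two different crews are both numbered C1.
c1_crews = conn.execute(
    "SELECT COUNT(*) FROM Crew WHERE Crew_ID = 'C1'"
).fetchone()[0]

# A subscription for a crew that does not exist (C2 at Disney) is rejected.
try:
    conn.execute("INSERT INTO Subscribes VALUES ('C2', 'Disney', 'Aetna')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

print(c1_crews, dangling_rejected)
```

With only Crew_ID in Subscribes, there would be no way to tell which of the two C1 crews holds a given policy.
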
Conversion example
- Video store rental example, plus some atts
- E/R diagram: Rental connects VideoStore, Customer, and Movie
  - attributes shown: date, year, MName, address, Cname, MID
- Q: Conversion to relations?

Conversion example, continued
- Resulting binary-relationship version
- E/R diagram: entity sets Rental, Customer, Store, Movie; relationships StoreOf, MovieOf, BuyerOf
  - attributes shown: date, year, MName, address, Cname, MID
- Q: Conversion to relations?

Converting inheritance hierarchies (3.3)
- No best way
- Several non-ideal methods:
  - E/R-style: each ES -> relation
  - OO-style: each possible “object” -> relation
  - nulls-style: each rooted hierarchy -> relation
    - non-applicable fields are filled in with nulls
- Each method has pros & cons, and there exist situations favoring each

Converting inheritance hierarchies
- Root entity set: Movies, with title, year, length, and stars
- Subclasses (isa): Cartoons, with Voices; Murder-Mysteries, with Weapon
- Example entity: Lion King (component)

Inheritance: E/R-style conversion
Each ES -> relation

Root entity set: Movies(title, year, length)
Title         Year  length
Star Wars
Scream
Roger Rabbit
Lion King

Subclass: MurderMysteries(title, year, murderWeapon)
Title      Year  murderWeapon
Scream     1988  Knife
R. Rabbit  1990  Knife

Subclass: Cartoons(title, year)
Title         Year
Lion King     1993
Roger Rabbit  1990

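A sketch of the E/R-style layout in sqlite3, using the titles and years shown in the subclass tables. Lengths are not given, so they are left NULL, and Star Wars is omitted because its year is not shown. The point is that an entity such as Roger Rabbit has a row in every relation it belongs to, so collecting all of its data takes a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Movies (
        title TEXT, year INTEGER, length INTEGER,
        PRIMARY KEY (title, year)
    );
    CREATE TABLE MurderMysteries (
        title TEXT, year INTEGER, murderWeapon TEXT,
        PRIMARY KEY (title, year)
    );
    CREATE TABLE Cartoons (
        title TEXT, year INTEGER,
        PRIMARY KEY (title, year)
    );

    -- Every movie has a row in the root relation (length left NULL).
    INSERT INTO Movies (title, year) VALUES
        ('Scream', 1988), ('Roger Rabbit', 1990), ('Lion King', 1993);
    -- Subclass relations repeat only the key, plus their own attributes.
    INSERT INTO MurderMysteries VALUES
        ('Scream', 1988, 'Knife'), ('Roger Rabbit', 1990, 'Knife');
    INSERT INTO Cartoons VALUES
        ('Lion King', 1993), ('Roger Rabbit', 1990);
    """
)

# Cartoons that are also murder mysteries: all three relations are needed.
cartoon_mysteries = conn.execute(
    """
    SELECT m.title, m.year, mm.murderWeapon
    FROM Movies AS m
    JOIN Cartoons AS c
      ON c.title = m.title AND c.year = m.year
    JOIN MurderMysteries AS mm
      ON mm.title = m.title AND mm.year = m.year
    """
).fetchall()
print(cartoon_mysteries)
```

A question about plain movie attributes, by contrast, would touch only the Movies relation.
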
Subclasses: object-oriented approach
Every possible “subtree” (what’s this?) -> relation:

1. Movies
   Title         Year  length
   Star Wars
2. Movies + Cartoons
   Title         Year  length
   Lion King
3. Movies + Murder-Mysteries
   Title         Year  length  Murder-Weapon
   Scream                      Knife
4. Movies + Cartoons + Murder-Mysteries
   Title         Year  length  Murder-Weapon
   Roger Rabbit                Knife

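For a hierarchy with a single level below the root, the subtrees can be enumerated mechanically: each one is the root plus some subset of its subclasses. The sketch below assumes exactly that shape; deeper hierarchies would need a recursive enumeration, which is not shown.

```python
from itertools import combinations

root = "Movies"
subclasses = ["Cartoons", "Murder-Mysteries"]  # direct children of the root

# A subtree here is the root together with any subset of its subclasses.
subtrees = []
for size in range(len(subclasses) + 1):
    for chosen in combinations(subclasses, size):
        subtrees.append((root, *chosen))

for number, subtree in enumerate(subtrees, start=1):
    print(number, " + ".join(subtree))
```

With k subclasses directly under the root this yields 2**k relations, which is four for the two subclasses here.
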
Subclasses: nulls approach
- One relation for the entire hierarchy
- Any non-applicable fields are NULL

Title         Year  length  Murder-Weapon
Star Wars                   NULL
Lion King                   NULL
Scream                      Knife
Roger Rabbit                Knife

- Q: How do we know if a movie is a MM?
- Q: How do we know if a movie is a cartoon?

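One possible reading of the first question, sketched in sqlite3: a movie is a murder mystery exactly when its murder-weapon field is not NULL. The year and length columns are left out of this sketch, and the column name murderWeapon is an illustrative spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One relation for the whole hierarchy (year and length omitted here).
    CREATE TABLE Movies (title TEXT PRIMARY KEY, murderWeapon TEXT);
    INSERT INTO Movies VALUES
        ('Star Wars', NULL),
        ('Lion King', NULL),
        ('Scream', 'Knife'),
        ('Roger Rabbit', 'Knife');
    """
)

# Murder mysteries: the subclass-specific field is filled in.
mysteries = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM Movies WHERE murderWeapon IS NOT NULL ORDER BY title"
    )
]

# NULL must be tested with IS NULL / IS NOT NULL; '= NULL' matches no rows.
wrong_test = conn.execute(
    "SELECT title FROM Movies WHERE murderWeapon = NULL"
).fetchall()

print(mysteries)
print(wrong_test)
```

No column in this relation is specific to cartoons, so the same test cannot answer the second question; some other recorded information would be needed.
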
Subclasses methods: considerations
1. Query time ~ number of tables accessed in the query
   - nulls: best, since each entity is a single row
   - multi-node questions: “Find 1999 films with length > 150 mins”
     - E/R: just Movies, so fast
     - OO: Movies AND cartoons, so slow
   - single-node questions: “Find weapons in >150-min. cartoons”
     - E/R: Movies, Cartoons AND MMs, so slow
     - OO: just MoviesCMM, so fast
2. Number of relations per entity set
   - nulls: just one, so very few
   - E/R: one per ES, so a medium number
   - OO: exponential in #ESs, so very many (how many?)
3. Number/size of rows per entity
   - nulls: one “long” row per entity
     - non-applicables become null
   - OO: one (all-relevant) row per entity, so smallest
   - E/R: several (though all relevant) rows per entity
Which of E/R and OO is better depends (on what?)