M.P. Johnson, DBMS, Stern/NYU, Sp2004
C : Database Management Systems
  Lecture #3
  Matthew P. Johnson
  Stern School of Business, NYU
  Spring, 2004
Agenda
  Last time: E/R models, some design issues
  This time: more design, "carving at the joints"
    Redundancy
    Whether an element should be an attribute or an entity set
    Replacing a relationship with entity sets
  Constraints
    Identifying and specifying key attributes of an entity set
    Recognizing other types of single-valued constraints
    Representing referential integrity constraints
    Identifying and representing general constraints
  Weak entity sets
Review
  Multiplicity review:
    Square-of? (e.g., (3,9))
    Cube-of? (e.g., (-3,-27))
    Wife-of?
    Wife-of-in-Utah?
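The multiplicity questions above can be checked mechanically on sample pairs. The following is a minimal sketch, assuming a relationship is given as a finite set of (left, right) pairs; the function name `multiplicity` and the sample ranges are illustrative assumptions, not part of the lecture. A finite sample can only refute a multiplicity claim, never prove it for all possible data.

```python
def multiplicity(pairs):
    """Classify a binary relationship given as (left, right) pairs."""
    pairs = set(pairs)
    lefts = [a for a, _ in pairs]
    rights = [b for _, b in pairs]
    # Each left value has at most one partner on the right.
    left_unique = len(lefts) == len(set(lefts))
    # Each right value has at most one partner on the left.
    right_unique = len(rights) == len(set(rights))
    if left_unique and right_unique:
        return "one-one"
    if left_unique:
        return "many-one"
    if right_unique:
        return "one-many"
    return "many-many"


# Sample data over a small, arbitrary range of integers.
square_of = [(x, x * x) for x in range(-3, 4)]   # contains (3, 9) and (-3, 9)
cube_of = [(x, x ** 3) for x in range(-3, 4)]    # contains (-3, -27)

print(multiplicity(square_of))
print(multiplicity(cube_of))
```

On these samples, Square-of is many-one because 3 and -3 share the square 9, while Cube-of is one-one.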
Design Principles
  Faithfulness
  Avoiding redundancy
  Simplicity
  Choice of relationships
  Picking elements
Avoiding redundancy
  Say everything exactly once
    Minimizes database storage requirements
    More important: prevents possible update errors
      Simplest (but not only) example: data is modified in one place but not the other (more later)
  Example: spot the redundancy
    [E/R diagram: Studios (Name, Address, Phone) connected by Own to Movies (Name, Length, StudioName)]
    Redundancy: Movies "knows" the studio two ways
Spot more redundancy
  [E/R diagram: Movies (Name, Length, StudioName, SAddress, SPhone)]
  Different redundancy: studio info listed for every movie!
    Name             Length  Studio   SAddress  SPhone
    Pulp Fiction     …       Miramax  NYC       212-…
    Sylvia           …       Miramax  NYC       212-…
    Jay & Sil. Bob   …       Miramax  NYC       212-…
    …
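The update error that redundancy invites can be shown in a few lines. This is a minimal sketch using plain Python dictionaries in place of a database; the new address "LA" and the helper name `studio_addresses` are hypothetical values chosen for illustration.

```python
# Redundant design: the studio address is repeated in every movie row.
movies_redundant = [
    {"name": "Pulp Fiction", "studio": "Miramax", "saddress": "NYC"},
    {"name": "Sylvia", "studio": "Miramax", "saddress": "NYC"},
]

# Hypothetical update applied to one row but not the other.
movies_redundant[0]["saddress"] = "LA"


def studio_addresses(rows, studio):
    """Collect every address recorded for the given studio."""
    return {r["saddress"] for r in rows if r["studio"] == studio}


# The same studio now has two different addresses.
conflicting = studio_addresses(movies_redundant, "Miramax")

# Non-redundant design: the address is stored exactly once, in Studios.
studios = {"Miramax": {"address": "NYC"}}
movies = [
    {"name": "Pulp Fiction", "studio": "Miramax"},
    {"name": "Sylvia", "studio": "Miramax"},
]
studios["Miramax"]["address"] = "LA"  # one update, seen by every movie
addresses_seen = {studios[m["studio"]]["address"] for m in movies}

print(sorted(conflicting), sorted(addresses_seen))
```

In the second design there is no way to update the address for one movie and forget another, because the movies only refer to the studio.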
Don't add relationships that are implied
  [E/R diagram: Students, Courses, TAs; relationships Enrolls (Students, Courses), TA-of (Courses, TAs), Assist (Students, TAs)]
  Suppose each course again has <=1 TA
  Q: Is the following good design?
  A: If TAs other than the course's TA can help students, then yes.
     If not, then no: we can already connect Students and TAs by going through Courses, so Assist is redundant!
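When only the course's own TA helps its students, the Students-TAs connection can be computed rather than stored. A minimal sketch, assuming Enrolls is a set of (student, course) pairs and TA-of maps each course to at most one TA; all student, course, and TA names here are made up for illustration.

```python
# Enrolls: (student, course) pairs. Names are hypothetical.
enrolls = {("alice", "DBMS"), ("bob", "DBMS"), ("alice", "OS")}

# TA-of: each course has at most one TA, so a dict suffices.
# "OS" has no TA in this sample.
ta_of = {"DBMS": "howard"}

# Derive Assist by going through Courses instead of storing it.
derived_assist = {(s, ta_of[c]) for s, c in enrolls if c in ta_of}

print(sorted(derived_assist))
```

Storing Assist separately would only make sense if it could contain pairs that this derivation does not produce.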
Correct E/R models may contain loops
  Person plays multiple roles:
    employee of company
    buyer of product
  [E/R diagram: Person (ssn, name, address), Company (name, stockprice), Product (name, category, price); relationships buys, makes, employs]
More design
  [E/R diagram: Students, Enrolls, Courses (Course-ID, CName, TA-Name, TA-ID)]
  Q: What's wrong with this design?
  A: Repeating TA names and IDs is redundant
     If a TA is not TAing any course now, we lose the TA's data!
     TA should get its own entity set (ES)
Opposite problem: Entity or attribute?
  Some E/Rs are improved by removing entities
  Can convert entity set E into attributes of F if:
    1. R: F -> E is many-one
       one-one counts because it is a special case
    2. Attributes of E are independent of each other
       knowing one attribute value doesn't tell us another attribute value
  Then remove E and add all attributes of E to F
Entity -> attribute
  [E/R diagram, before: Students, Enrolls, Courses (CName, Room), Assists, TA (TA-Name)]
  [E/R diagram, after: Students, Enrolls, Courses (Course-ID, CName, Room, TA-Name)]
Convert TA entity again?
  [E/R diagram: Students, Enrolls, Courses (Course-ID, CName, Room), Assists, TA (TA-Name)]
  No! Multiple TAs allowed
    Violates condition (1)
    Redundant course data:
      CName  CID  Room  TA-Name
      DBMS              Howard
      DBMS              Wesley
      …
Convert TA entity again?
  [E/R diagram: Students, Enrolls, Courses (Course-ID, CName, Room), Assists, TA (TA-Name, TA-ID, TA-Favorite-Color)]
  No! TA has dependent fields
    Violates condition (2)
    How can we tell? Redundant TA data:
      CName    TA-Name  TA-ID  TA-Color
      DBMS     Ralph    678    Green
      A.Soft.  Ralph    678    Green
      …
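Condition (2) asks whether knowing one attribute value tells us another. A rough way to probe this on sample rows is sketched below; the function name `determines` and the lowercase column names are assumptions made for the example. As with any check on data, a sample can show that a dependency fails, but it cannot prove that one holds in general.

```python
def determines(rows, x, y):
    """True if, in these rows, equal x-values always come with equal y-values."""
    seen = {}
    for row in rows:
        # Remember the first y seen for each x; any later mismatch breaks it.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


# The two rows shown on the slide, after folding TA into Courses.
courses = [
    {"cname": "DBMS", "ta_name": "Ralph", "ta_id": 678, "ta_color": "Green"},
    {"cname": "A.Soft.", "ta_name": "Ralph", "ta_id": 678, "ta_color": "Green"},
]

# TA-ID tells us TA-Color in this sample: the TA fields are not independent.
print(determines(courses, "ta_id", "ta_color"))
# TA-ID does not tell us the course name.
print(determines(courses, "ta_id", "cname"))
```

The repeated (Ralph, 678, Green) triple is exactly the redundant TA data the slide points at.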
Entity or attributes?
  Should student address be an entity or an attribute?
  If a student may have multiple addresses, it must be an entity
    e.g., campus address, permanent address
    attributes cannot be set-valued
  If we need to examine the structure of the address, it must be an entity
    e.g., find all students from NYS but not NYC
    If it is an attribute, then it's probably a simple string: no structure!
  NB: this choice is a microcosm of the entire miniworld
    (much of the) power of a DB comes from the structure imposed on the data
Larger example DB design
  Application: library database. Authors have written books about various subjects; different libraries in the system may carry these books.
  Entities (with attributes in parentheses):
    Authors (ssn, name, phone, birthdate)
    Books (ISBN, title)
    Subjects (sname)
    Libraries (lname)
  Relationships [associating entities in square brackets]:
    Wrote-on [Authors, Subjects]
    Carry [Libraries, Subjects]
    On [Books, Subjects]
E/R of DB design
  [E/R diagram: Author (ssn, Name, phone, birthdate) wrote-on Subject (SName); Library (LName) Carries Subject; Book (ISBN, Title) On Subject]
Poor initial design
  First design is a poor model of this system
  Problems:
    no direct relationship associating authors and books
    no direct relationship associating libraries and books
  Common queries are complex and difficult:
    What libraries carry books by a given author?
    What books has a given author written?
    Who is the author of a given book?
  Some are not supported:
    How many copies does a library have of a given book?
    What edition of a book does the library have?
Larger example DB design 2
  Application: library database as before
  Entities (with attributes in parentheses):
    Authors (ssn, name, phone, birthdate)
    Books (ISBN, title)
    Subjects (sname)
    Libraries (lname)
  Relationships [associating entities in square brackets] (attributes in parentheses):
    Wrote [Authors, Books]
    Carries [Libraries, Books] (quantity, edition)
    On [Books, Subjects]
E/R of improved DB design
  Rule of thumb: things that are often queried together should be closely connected
  [E/R diagram: Author (ssn, Name, phone, birthdate) wrote Book (ISBN, Title); Library (LName) Carries Book, with relationship attributes Edition and Quantity; Book On Subject (SName)]
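To see why the improved design makes the common queries easy, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout is one plausible rendering of the improved E/R design, not the lecture's own schema, and every row value (author 'a1', books 'b1' and 'b2', libraries 'Main' and 'Branch') is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Books (isbn TEXT PRIMARY KEY, title TEXT);
CREATE TABLE Libraries (lname TEXT PRIMARY KEY);
-- Relationships from the improved design.
CREATE TABLE Wrote (ssn TEXT, isbn TEXT, PRIMARY KEY (ssn, isbn));
CREATE TABLE Carries (lname TEXT, isbn TEXT, quantity INTEGER, edition INTEGER,
                      PRIMARY KEY (lname, isbn));
-- Hypothetical sample data.
INSERT INTO Authors VALUES ('a1', 'Author One');
INSERT INTO Books VALUES ('b1', 'Book One'), ('b2', 'Book Two');
INSERT INTO Libraries VALUES ('Main'), ('Branch');
INSERT INTO Wrote VALUES ('a1', 'b1');
INSERT INTO Carries VALUES ('Main', 'b1', 3, 2), ('Branch', 'b2', 1, 1);
""")


def libraries_carrying(author_ssn):
    """What libraries carry books by a given author?"""
    rows = conn.execute(
        """SELECT DISTINCT c.lname
           FROM Wrote w JOIN Carries c ON c.isbn = w.isbn
           WHERE w.ssn = ?
           ORDER BY c.lname""",
        (author_ssn,),
    )
    return [r[0] for r in rows]


def copies(lname, isbn):
    """How many copies does a library have of a given book?"""
    row = conn.execute(
        "SELECT quantity FROM Carries WHERE lname = ? AND isbn = ?",
        (lname, isbn),
    ).fetchone()
    return row[0] if row else 0


print(libraries_carrying("a1"), copies("Main", "b1"))
```

The first query needs a single join because Wrote and Carries both attach directly to Books; the second is answerable at all only because Carries now has a quantity attribute.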
Next topic: Constraints (2.3)
  Review: programmer-defined rules stating what should always be true about consistent databases
  Restrictions on data:
    Keys (e.g. SSN uniquely identifies people)
    Single-value constraints (e.g. everyone has 1 father)
    Referential integrity (e.g. a person's record refers to a father -> the father must exist)
    Domain constraints (e.g. gender in M/F, age within a numeric range)
    General constraints (e.g. no more than 10 customers per sales rep)
  Can't infer constraints from data
    may hold "accidentally"
    they are a part of the schema
E/R keys
  A key uniquely identifies an entity in its entity set (ES)
    An attribute or set of attributes
    Two entities cannot agree on all attributes of the key
    These attributes determine all others
  Every ES has a key, possibly including all attributes
  Primary key attributes are underlined
  More than one possible key: candidate keys, primary key
  ISA hierarchy: root entity set has all key attributes
  Practical tip: create an intentional key attribute
    E.g. SSN, course-id, employee-id, etc.
    SSN is likely shorter than (name, address)
    Prevents quasi-redundancy
  [E/R diagram: Person (ssn, name, address)]
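Once a key is declared in the schema, the database itself refuses two entities that agree on it. A minimal sketch with Python's `sqlite3` module, assuming a Person table keyed on ssn; the names, addresses, and ssn value are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ssn is declared as the key; name and address are ordinary attributes.
conn.execute("CREATE TABLE Person (ssn TEXT PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO Person VALUES ('111', 'Pat', '1 Main St')")

try:
    # A different person with the same ssn violates the key constraint.
    conn.execute("INSERT INTO Person VALUES ('111', 'Sam', '2 Oak Ave')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

person_count = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
print(duplicate_rejected, person_count)
```

The constraint lives in the schema, so it holds no matter which rows happen to be present.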
Single-valued constraints
  "At most one" value
    drawn with sharp arrows
  E.g. attributes: could be null or one value
  Many-one relationships: the "one" part is single-valued
  Key attributes are single-valued but can't be null
  [E/R diagram: TA, Assists, Course]
Referential integrity
  "Exactly one value"
    Non-null attributes
    Relationships
  A non-null value refers to an entity that exists
    Refer to the entity with a foreign key
  HTML analogy: no broken links
  Programming analogy: no dangling pointers
  Ways of handling deletion:
    Prevent deletion as long as a referrer exists
    Enforce deletion of all referrers
  [E/R diagram: Instructor, Taught, Course]
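The two ways of handling deletion correspond to two foreign-key actions in SQL. The sketch below uses Python's `sqlite3` module with SQLite's `ON DELETE RESTRICT` and `ON DELETE CASCADE`; the table and column names are assumptions, and the helper `build` exists only for this example. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build(on_delete):
    """Create a tiny Instructor/Course database with the given delete action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE Instructor (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Course ("
        " cname TEXT PRIMARY KEY,"
        " instructor TEXT NOT NULL"
        f" REFERENCES Instructor(name) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO Instructor VALUES ('Howard')")
    conn.execute("INSERT INTO Course VALUES ('Intro to Screaming', 'Howard')")
    return conn


# Option 1: prevent deletion as long as a referrer exists.
restrict = build("RESTRICT")
try:
    restrict.execute("DELETE FROM Instructor WHERE name = 'Howard'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# Option 2: enforce deletion of all referrers.
cascade = build("CASCADE")
cascade.execute("DELETE FROM Instructor WHERE name = 'Howard'")
remaining = cascade.execute("SELECT COUNT(*) FROM Course").fetchone()[0]

print(blocked, remaining)
```

Either way, no Course row is ever left pointing at an instructor that no longer exists.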
Referential integrity: E/R example
  [E/R diagram: Students, Enrolls, Courses, Taught, Instructor]
  Insertion: must refer only to an existing entity
  Suppose we need to add
    course: "Intro to Screaming"
    instructor: Prof. Howard
  Q: Which order?
  Q: What if the relationship were exactly-exactly, i.e., referential integrity in both directions?
  A: Put both inserts in one transaction (xact); more later
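The "both inserts in one transaction" answer can be tried out in SQLite, which supports deferred foreign-key checks. This is a sketch under stated assumptions: the schema (including the `teaches` column) is invented to make referential integrity run in both directions, and `DEFERRABLE INITIALLY DEFERRED` postpones the check until commit. Other database systems may handle this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Instructor (
    name TEXT PRIMARY KEY,
    teaches TEXT NOT NULL
        REFERENCES Course(cname) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE Course (
    cname TEXT PRIMARY KEY,
    instructor TEXT NOT NULL
        REFERENCES Instructor(name) DEFERRABLE INITIALLY DEFERRED
);
""")

# Both inserts in one transaction: each refers to the other, and the
# references are only checked when the transaction commits.
with conn:
    conn.execute("INSERT INTO Course VALUES ('Intro to Screaming', 'Howard')")
    conn.execute("INSERT INTO Instructor VALUES ('Howard', 'Intro to Screaming')")

# A course on its own (hypothetical names) fails at commit time.
try:
    with conn:
        conn.execute("INSERT INTO Course VALUES ('Orphan Course', 'Nobody')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
    conn.rollback()  # discard the failed transaction if it is still open

course_count = conn.execute("SELECT COUNT(*) FROM Course").fetchone()[0]
print(orphan_rejected, course_count)
```

With the checks deferred, the order of the two inserts inside the transaction no longer matters.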
Other kinds of constraints
  Domain constraints
    E.g. date: must be after 1980
    Enumerated type: grades A through F, no E
  No specific E/R notation: mention with attribute or relationship
  General constraints:
    A class may have no more than 100 students; a student may not have more than 6 courses
    [E/R diagram: Students, Enroll, Courses, with edge labels <=6 and <=100]
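Domain and general constraints like these are often enforced by code at the point where data is added. A minimal sketch in plain Python, assuming enrollments are kept as a set of (student, course) pairs; the function names `enroll` and `check_grade`, and the student and course identifiers, are hypothetical.

```python
MAX_COURSES_PER_STUDENT = 6
MAX_STUDENTS_PER_COURSE = 100
VALID_GRADES = {"A", "B", "C", "D", "F"}  # enumerated domain: no "E"

enrollments = set()  # (student, course) pairs


def enroll(student, course):
    """Add an enrollment, enforcing both general constraints."""
    if (student, course) in enrollments:
        return  # already enrolled; nothing to do
    courses_taken = sum(1 for s, _ in enrollments if s == student)
    class_size = sum(1 for _, c in enrollments if c == course)
    if courses_taken >= MAX_COURSES_PER_STUDENT:
        raise ValueError(f"{student} already has {courses_taken} courses")
    if class_size >= MAX_STUDENTS_PER_COURSE:
        raise ValueError(f"{course} already has {class_size} students")
    enrollments.add((student, course))


def check_grade(grade):
    """Enforce the domain constraint on grades."""
    if grade not in VALID_GRADES:
        raise ValueError(f"invalid grade: {grade!r}")
    return grade


# Six courses are accepted; the seventh violates the <=6 constraint.
for i in range(6):
    enroll("s1", f"c{i}")
try:
    enroll("s1", "c6")
    seventh_rejected = False
except ValueError:
    seventh_rejected = True

print(seventh_rejected, len(enrollments))
```

In a real DBMS the domain constraint would typically be declared in the schema, while count-based rules like these need triggers, assertions, or application code.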
Next topic: Weak entity sets (2.4)
  Definition: some or all key attributes belong to another ES
  Why:
    An entity set is part of a hierarchy (not ISA)
    Connecting entity sets
  The key consists of:
    0, 1 or more of its own attributes
    Key attributes of entity sets reached through supporting relationships
Conditions of supporting relationships
  Supporting relationship R: E -> F
    R is many-one (E-F) or one-one
    R is binary (Why?)
    Referential integrity from E to F, i.e. a rounded arrow
    Attributes supplied to E are key attributes of F
  F itself may be weak
    supported by another entity set G, and so on recursively
  [E/R diagram: E, R, F, with attributes A1 and A2]
Requirements for weak entity sets
  If there are several supporting relationships from E to F
    Keys of several different entities from F appear as foreign keys of E
  Other many-one relationships
    Not necessarily supporting
  [E/R diagram: Purchases (A1, A2, A3); relationships From, By, At-store; entity sets People, Stores]
Weak entity sets
  Example: hierarchy, species and genus
  Idea: species name is unique per genus only
  [E/R diagram: Species (name), Belongs-to, Genus (name)]
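A weak entity set like Species borrows part of its key from its supporting entity set. One way to sketch this is with a composite primary key in SQLite via Python's `sqlite3` module; the table and column names are assumptions, and the sample rows use the species name "vulgaris", which occurs in both the genus Sturnus and the genus Beta.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Genus (name TEXT PRIMARY KEY);
-- Species is weak: its key is (genus, name), and genus must exist.
CREATE TABLE Species (
    genus TEXT NOT NULL REFERENCES Genus(name),
    name  TEXT NOT NULL,
    PRIMARY KEY (genus, name)
);
INSERT INTO Genus VALUES ('Sturnus'), ('Beta');
-- The same species name is fine in two different genera.
INSERT INTO Species VALUES ('Sturnus', 'vulgaris'), ('Beta', 'vulgaris');
""")

try:
    # The same species name twice within one genus violates the key.
    conn.execute("INSERT INTO Species VALUES ('Beta', 'vulgaris')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

species_count = conn.execute("SELECT COUNT(*) FROM Species").fetchone()[0]
print(duplicate_rejected, species_count)
```

The species name alone is not a key; only the pair (genus, name) identifies a species.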
Weak entity sets
  The video store connecting entity set was an example of a weak entity set
  Key: date, MID, SID, CID
  [E/R diagram: Rental (date); supporting relationships MovieOf to Product (MID), StoreOf to Store (SID), BuyerOf to Customer (CID)]
Conceptual Design Using the ER Model
  Subject/design choices:
    Should a concept be modeled as an entity or an attribute?
    Should a concept be modeled as an entity or a relationship?
    Identifying relationships: binary or multiway?
  Constraints in the ER Model:
    Important in determining the best design
    A lot of data semantics can (and should) be captured
  Normalization improves the design further (later)
Agenda
  Intro to relational model
  Converting ER diagrams to relations
  Functional dependencies
  Keys and superkeys in terms of FDs
  Finding keys for relations
  Rules of FDs
Next topic: the Relational Data Model (3.1)
  Database model (E/R, other) -> relational schema -> physical storage
    Database model: diagrams (E/R)
    Relational schema: tables
      column names: attributes
      rows: tuples
    Physical storage: complex file organization and index structures
Relations as tables
  Product table/relation:
    Name         Price   Category     Manufacturer
    gizmo        $19.99  gadgets      GizmoWorks
    Power gizmo  $29.99  gadgets      GizmoWorks
    SingleTouch  $       photography  Canon
    MultiTouch   $       household    Hitachi
  Header row: attribute names
  Body rows: tuples/rows/records/entities
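The table view of a relation maps directly onto a set of named tuples. A minimal sketch in Python, using only the two Product rows whose prices are legible on the slide; the helper `project` is an illustrative assumption rather than anything defined in the lecture.

```python
from collections import namedtuple

# Attribute names become the fields of the tuple type.
Product = namedtuple("Product", ["name", "price", "category", "manufacturer"])

# A relation is a set of tuples: no duplicates, no inherent order.
product_relation = {
    Product("gizmo", 19.99, "gadgets", "GizmoWorks"),
    Product("Power gizmo", 29.99, "gadgets", "GizmoWorks"),
    Product("gizmo", 19.99, "gadgets", "GizmoWorks"),  # duplicate collapses
}


def project(relation, *attrs):
    """Keep only the named attributes of every tuple."""
    return {tuple(getattr(t, a) for a in attrs) for t in relation}


manufacturers = project(product_relation, "manufacturer")
print(len(product_relation), manufacturers)
```

Because the relation is a set, listing the gizmo row twice still leaves two tuples, and projecting onto Manufacturer yields a single value.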