10/2/2001 SIMS 257 – Database Management
Database Design: From Conceptual Design to Physical Implementation - Relational Model
University of California, Berkeley
School of Information Management and Systems

Review
- Database Design Process
- Normalization

Database Design Process
[Diagram labels: Conceptual requirements (one per application, Applications 1-4); Conceptual Model; Logical Model; Internal Model; External Model (one per application, Applications 1-4)]

Normalization
- Normalization theory is based on the observation that relations with certain properties support inserting, updating and deleting data more effectively than other sets of relations containing the same data
- Normalization is a multi-step process beginning with an “unnormalized” relation
  - Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management.

Normal Forms
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)

Normalization
- 1NF: Functional dependency of nonkey attributes on the primary key; atomic values only
- 2NF: Full functional dependency of nonkey attributes on the primary key
- 3NF: No transitive dependency between nonkey attributes
- Boyce-Codd and higher: All determinants are candidate keys; single multivalued dependency

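The dependency conditions above can be explored on sample data. The following is a minimal sketch, not part of the original slides: the helper `holds` and the sample order rows are hypothetical. It checks whether a functional dependency is contradicted by a set of rows. Sample data can only refute a dependency; it cannot prove that the dependency holds in general.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical relation with primary key (order_no, item_no).
rows = [
    {"order_no": 1, "item_no": 1, "cust_id": 7, "cust_name": "Ann"},
    {"order_no": 1, "item_no": 2, "cust_id": 7, "cust_name": "Ann"},
    {"order_no": 2, "item_no": 1, "cust_id": 8, "cust_name": "Bob"},
]

# cust_id depends on only part of the key: a partial dependency (2NF issue).
partial = holds(rows, ["order_no"], ["cust_id"])
# cust_name depends on the nonkey attribute cust_id: a transitive dependency (3NF issue).
transitive = holds(rows, ["cust_id"], ["cust_name"])
# item_no 1 appears in two different orders, so item_no does not determine order_no.
refuted = holds(rows, ["item_no"], ["order_no"])
```

Here `partial` and `transitive` come out true, which suggests splitting the relation into separate order, order-item, and customer relations.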
Normalization
- Normalization is performed to reduce or eliminate insertion, deletion or update anomalies.
- However, a completely normalized database may not be the most efficient or effective implementation.
- “Denormalization” is sometimes used to improve efficiency.

Denormalization
- Usually driven by the need to improve query speed
- Query speed is improved at the expense of more complex or problematic DML (Data Manipulation Language) for updates, deletions and insertions.

Downward Denormalization
Before:
  Customer (ID, Address, Name, Telephone)
  Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID)
After:
  Customer (ID, Address, Name, Telephone)
  Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)

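A minimal sketch of the trade-off in downward denormalization, using Python's built-in sqlite3 module rather than Access. The table names, column names, helper `rename_customer`, and sample rows are illustrative assumptions. Reading the customer name for an order no longer needs a join, but a name change must now be written in two places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT, telephone TEXT
);
-- After denormalization: cust_name is copied down from customer into orders.
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    date_taken TEXT,
    cust_id INTEGER REFERENCES customer(cust_id),
    cust_name TEXT
);
INSERT INTO customer VALUES (1, 'Example Books', '1 Main St', '555-0100');
INSERT INTO orders VALUES (100, '2001-10-02', 1, 'Example Books');
""")

# Faster read: the name comes straight from orders, with no join.
name_without_join = conn.execute(
    "SELECT cust_name FROM orders WHERE order_no = 100"
).fetchone()[0]


def rename_customer(conn, cust_id, new_name):
    """More complex write: both copies must change in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE customer SET name = ? WHERE cust_id = ?", (new_name, cust_id)
        )
        conn.execute(
            "UPDATE orders SET cust_name = ? WHERE cust_id = ?", (new_name, cust_id)
        )


rename_customer(conn, 1, "Example Books Ltd")
name_after_rename = conn.execute(
    "SELECT cust_name FROM orders WHERE order_no = 100"
).fetchone()[0]
```

If the second UPDATE were forgotten, the two copies of the name would disagree, which is exactly the update anomaly that normalization removes.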
Upward Denormalization
Before:
  Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)
  Order Item (Order No, Item No, Item Price, Num Ordered)
After:
  Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name, Order Price)
  Order Item (Order No, Item No, Item Price, Num Ordered)

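A minimal sketch of upward denormalization, again with sqlite3 standing in for Access. It assumes that Order Price is the sum of Item Price times Num Ordered over the order's items, which the slide does not state; the helper `add_item` and the sample values are hypothetical. The stored total saves an aggregate query on every read but has to be maintained on every change to the items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    order_price REAL DEFAULT 0   -- derived value stored "upward" in the parent
);
CREATE TABLE order_item (
    order_no INTEGER REFERENCES orders(order_no),
    item_no INTEGER,
    item_price REAL,
    num_ordered INTEGER,
    PRIMARY KEY (order_no, item_no)
);
INSERT INTO orders (order_no) VALUES (100);
""")


def add_item(conn, order_no, item_no, item_price, num_ordered):
    """Insert an item and refresh the stored order total in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO order_item VALUES (?, ?, ?, ?)",
            (order_no, item_no, item_price, num_ordered),
        )
        conn.execute(
            """UPDATE orders
               SET order_price = (SELECT SUM(item_price * num_ordered)
                                  FROM order_item WHERE order_no = ?)
               WHERE order_no = ?""",
            (order_no, order_no),
        )


add_item(conn, 100, 1, 10.0, 2)
add_item(conn, 100, 2, 5.0, 1)

# The total is read from orders directly, with no aggregate over order_item.
stored_total = conn.execute(
    "SELECT order_price FROM orders WHERE order_no = 100"
).fetchone()[0]
```

Deleting or repricing an item would need the same refresh step; omitting it leaves the stored total stale.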
Today: New Design
- Today we will build the COOKIE database, starting from (rough) requirements and working through the conceptual model and the logical model to the physical implementation in Access.

ER Diagram Symbols
- Entity: Rectangles are used to indicate entities (that is, the representatives or records describing persons, things, or events in the database)
- Attribute: Ovals are used to indicate the attributes associated with an entity or relationship (that is, the pieces of information recorded in the database about the entity or relationship)
- Primary key: An underlined name indicates that the attribute is a primary key (that is, it can uniquely identify the entity)
- Relationship: Diamonds are used to indicate relationships between entities (that is, some association between the data records of different entities)

Cookie Requirements
- Cookie is a bibliographic database that contains information for a hypothetical union catalog of several libraries.
- Need to record which books are held by which libraries
- Need to search on bibliographic information
  - Author, title, subject, call number for a given library, etc.
- Need to know who publishes the books for ordering, etc.

Cookie Database
- There are currently 6 main types of entities in the database
  - Authors (Authors)
  - Books (bibfile)
  - Local call numbers (callfile)
  - Libraries (libfile)
  - Publishers (pubfile)
  - Subject headings (subfile)
- Additional entities
  - Links between subjects and books (indxfile)
  - Links between authors and books (AU_BIB)

AUTHORS
- author -- The author’s name (we do not distinguish between personal and corporate authors)
- Au_id -- A unique id for the author

AUTHORS
[ER diagram: entity Authors with attributes Author and AU ID]

BIBFILE
- Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are:
  - accno -- An “accession” or serial number
  - title -- The title of the book
  - loc -- Location of publication (where published)
  - date -- Date of publication
  - price -- Price of the book
  - pagination -- Number of pages
  - ill -- What type of illustrations (maps, etc.), if any
  - height -- Height of the book in centimeters

Books/BIBFILE
[ER diagram: entity Books with attributes accno, Title, Loc, Date, Price, Pagination, Height, Ill]

CALLFILE
- CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are:
  - accno -- The book accession number
  - libid -- The id of the holding library
  - callno -- The call number of the book in the particular library
  - copies -- The number of copies held by the particular library

LocalInfo/CALLFILE
[ER diagram: entity CALLFILE with attributes accno, libid, Callno, Copies]

LIBFILE
- LIBFILE contains information about the libraries participating in this union catalog. Its attributes include:
  - libid -- Library id number
  - library -- Name of the library
  - laddress -- Street address for the library
  - lcity -- City name
  - lstate -- State code (postal abbreviation)
  - lzip -- Zip code
  - lphone -- Phone number
  - mop through suncl -- Library opening and closing times for each day of the week

Libraries/LIBFILE
[ER diagram: entity LIBFILE with attributes Libid, Library, laddress, lcity, lstate, lzip, lphone, MOp, MCl, TuOp, TuCl, WOp, WCl, ThOp, ThCl, FOp, FCl, SatOp, SatCl, SunOp, SunCl]

PUBFILE
- PUBFILE contains information about the publishers of books. Its attributes include:
  - pubid -- The publisher’s id number
  - publisher -- Publisher name
  - paddress -- Publisher street address
  - pcity -- Publisher city
  - pstate -- Publisher state
  - pzip -- Publisher zip code
  - pphone -- Publisher phone number
  - ship -- Standard shipping time in days

Publisher/PUBFILE
[ER diagram: entity PUBFILE with attributes pubid, Publisher, paddress, pcity, pstate, pzip, pphone, Ship]

SUBFILE
- SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are:
  - subcode -- Subject identification number
  - subject -- The subject heading/description

Subjects/SUBFILE
[ER diagram: entity SUBFILE with attributes Subject and subid]

INDXFILE
- INDXFILE provides a way to allow many-to-many mapping of subject headings to books. Its attributes consist entirely of links to other tables:
  - subcode -- Link to subject id
  - accno -- Link to book accession number

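A minimal sketch of how a link table such as INDXFILE resolves a many-to-many relationship, using sqlite3 in place of Access. The column names follow the slides; the column types, the composite primary key, and the subject assignments in the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
-- The link table holds only the two foreign keys (assumed composite key).
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
-- Illustrative rows: the subject assignments are made up.
INSERT INTO bibfile VALUES (1, 'The Aims of Education and Other Essays');
INSERT INTO bibfile VALUES (2, 'Universities: American, English, German');
INSERT INTO subfile VALUES (1, 'Higher education');
INSERT INTO subfile VALUES (2, 'Essays');
INSERT INTO indxfile VALUES (1, 1);
INSERT INTO indxfile VALUES (1, 2);
INSERT INTO indxfile VALUES (2, 1);
""")

# One subject, many books.
books_on_subject = [row[0] for row in conn.execute(
    """SELECT b.title
       FROM subfile s
       JOIN indxfile i ON i.subcode = s.subcode
       JOIN bibfile b ON b.accno = i.accno
       WHERE s.subject = ?
       ORDER BY b.title""",
    ("Higher education",),
)]

# One book, many subjects.
subjects_of_book = [row[0] for row in conn.execute(
    """SELECT s.subject
       FROM indxfile i
       JOIN subfile s ON s.subcode = i.subcode
       WHERE i.accno = ?
       ORDER BY s.subject""",
    (1,),
)]
```

The same two-join pattern would apply to AU_BIB between authors and books.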
Linking Subjects and Books
[ER diagram: entity INDXFILE with attributes accno and subid]

AU_BIB
- AU_BIB provides a way to allow many-to-many mapping between books and authors. It also consists only of links to other tables:
  - AU_ID -- Link to the AUTHORS table
  - ACCNO -- Link to the BIBFILE table

Linking Authors and Books
[ER diagram: entity AU_BIB with attributes accno and AU ID]

Some examples of Cookie Searches
- Who wrote Microcosmographia Academica?
- How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays?
- Which branches in Berkeley’s public library system are open on Sunday?
- What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German?
- What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?
- Print a list of the Mechanics Library holdings, in descending order by height.
- What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?
- Which library closes earliest on Friday night?

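Two of these searches can be sketched as SQL run through Python's sqlite3 module (a stand-in for Access queries). The tables are cut down to the columns each query needs. The closing times are made up for illustration, and storing them as 24-hour 'HH:MM' text is an assumption; the slides do not say how times are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (au_id INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE au_bib (au_id INTEGER, accno INTEGER);
CREATE TABLE libfile (libid INTEGER PRIMARY KEY, library TEXT, fcl TEXT);

INSERT INTO authors VALUES (1, 'Cornford, F. M.');
INSERT INTO bibfile VALUES (10, 'Microcosmographia Academica');
INSERT INTO au_bib VALUES (1, 10);
-- Made-up Friday closing times; 'HH:MM' text sorts chronologically.
INSERT INTO libfile VALUES (1, 'Moffitt Library', '22:00');
INSERT INTO libfile VALUES (2, 'Mechanics Library', '18:00');
""")

# "Who wrote Microcosmographia Academica?"
who_wrote = conn.execute(
    """SELECT a.author
       FROM authors a
       JOIN au_bib ab ON ab.au_id = a.au_id
       JOIN bibfile b ON b.accno = ab.accno
       WHERE b.title = ?""",
    ("Microcosmographia Academica",),
).fetchall()

# "Which library closes earliest on Friday night?"
earliest_friday = conn.execute(
    "SELECT library, fcl FROM libfile ORDER BY fcl LIMIT 1"
).fetchone()
```

The first query walks the AU_BIB link table; the second needs only LIBFILE because the closing times are stored as attributes of the library.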
Cookie ER Diagram
[ER diagram: AUTHORS (AU ID, Author), AU_BIB (AU ID, accno), BIBFILE (accno, pubid), PUBFILE (pubid), CALLFILE (accno, libid), LIBFILE (libid), INDXFILE (subcode, accno), SUBFILE (subcode)]
Note: diagram contains only attributes used for linking

What Problems?
- What sorts of problems and missing features arise given the previous ER diagram?

Problems Identified
- Subtitles, parallel titles?
- Edition information
- Series information
- Lending status
- Material type designation
- Genre, class information
- Better codes (ISBN?)
- Missing information (ISBN)
- Authority control for authors
- Missing/incomplete data
- Data entry problems
- Ordering information
- Illustrations
- Subfield separation (such as last_name, first_name)
- Separate personal and corporate authors

Problems (Cont.)
- Location field inconsistent
- No notes field
- No language field
- Zipcode doesn’t support plus-4
- No publisher shipping addresses
- No (indexable) keyword search capability
- No support for multivolume works
- No support for URLs
  - to online version
  - to libraries
  - to publishers

Original Cookie ER Diagram
[ER diagram: AUTHORS (AU ID, Author), AU_BIB (AU ID, accno), BIBFILE (accno, pubid), PUBFILE (pubid), CALLFILE (accno, libid), LIBFILE (libid), INDXFILE (subcode, accno), SUBFILE (subcode)]
Note: diagram contains only attributes used for linking

Cookie2: Separate Name Authorities
[ER diagram: BIBFILE, PUBFILE, CALLFILE, LIBFILE, INDXFILE and SUBFILE with their linking attributes (accno, pubid, libid, subcode), plus AUTHFILE and AUTHBIB; attributes shown for these two: nameid, name, authtype, accno]

Cookie3: Keywords
[ER diagram: BIBFILE, PUBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, AUTHFILE and AUTHBIB with their linking attributes, plus KEYMAP and TERMS; attributes shown for these two: accno, termid]

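One way the KEYMAP and TERMS entities could support keyword search is sketched below with sqlite3. Only accno and termid appear in the diagram; the `term` column, the choice to index title words, the tokenizing rule, and the helpers `index_title` and `search` are all assumptions for illustration.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE terms (termid INTEGER PRIMARY KEY, term TEXT UNIQUE);
CREATE TABLE keymap (accno INTEGER, termid INTEGER, PRIMARY KEY (accno, termid));
""")


def index_title(conn, accno, title):
    """Split a title into lowercase words and link each word to the book."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    with conn:
        for word in words:
            # Reuse an existing term row if the word was seen before.
            conn.execute("INSERT OR IGNORE INTO terms (term) VALUES (?)", (word,))
            termid = conn.execute(
                "SELECT termid FROM terms WHERE term = ?", (word,)
            ).fetchone()[0]
            conn.execute("INSERT OR IGNORE INTO keymap VALUES (?, ?)", (accno, termid))


def search(conn, word):
    """Return titles of books whose indexed words include `word`."""
    rows = conn.execute(
        """SELECT b.title
           FROM terms t
           JOIN keymap k ON k.termid = t.termid
           JOIN bibfile b ON b.accno = k.accno
           WHERE t.term = ?
           ORDER BY b.title""",
        (word.lower(),),
    )
    return [row[0] for row in rows]


books = [
    (1, "Microcosmographia Academica"),
    (2, "The Aims of Education and Other Essays"),
    (3, "Universities: American, English, German"),
]
for accno, title in books:
    conn.execute("INSERT INTO bibfile VALUES (?, ?)", (accno, title))
    index_title(conn, accno, title)

education_hits = search(conn, "Education")
```

Because each term is stored once in TERMS, a lookup is an indexed match on `term` followed by joins, rather than a scan of every title.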
Cookie 4: Series
[ER diagram: BIBFILE, PUBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, AUTHFILE, AUTHBIB, KEYMAP and TERMS with their linking attributes, plus SERIES; attributes shown for SERIES: seriesid, ser_title]

Cookie 5: Circulation
[ER diagram: BIBFILE, PUBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS and SERIES with their linking attributes, plus CIRC and PATRON; attributes shown for these two: circid, copynum, patronid]

Mapping to Relations
- Take each entity
  - BIBFILE
  - LIBFILE
  - CALLFILE
  - SUBFILE
  - PUBFILE
  - INDXFILE
- And make it a table...

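The mapping can be sketched as table definitions. The lecture builds the tables in Access; this sketch uses SQL through Python's sqlite3 module instead. Column names come from the earlier slides (the day-of-week column names follow the LIBFILE diagram, lowercased), while the column types, primary keys, and foreign keys are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pubfile (
    pubid INTEGER PRIMARY KEY, publisher TEXT, paddress TEXT, pcity TEXT,
    pstate TEXT, pzip TEXT, pphone TEXT, ship INTEGER
);
CREATE TABLE bibfile (
    accno INTEGER PRIMARY KEY, title TEXT, loc TEXT, "date" TEXT,
    price REAL, pagination TEXT, ill TEXT, height INTEGER,
    pubid INTEGER REFERENCES pubfile(pubid)
);
CREATE TABLE libfile (
    libid INTEGER PRIMARY KEY, library TEXT, laddress TEXT, lcity TEXT,
    lstate TEXT, lzip TEXT, lphone TEXT,
    mop TEXT, mcl TEXT, tuop TEXT, tucl TEXT, wop TEXT, wcl TEXT,
    thop TEXT, thcl TEXT, fop TEXT, fcl TEXT, satop TEXT, satcl TEXT,
    sunop TEXT, suncl TEXT
);
-- One row per (book, library) holding; the composite key is an assumption.
CREATE TABLE callfile (
    accno INTEGER REFERENCES bibfile(accno),
    libid INTEGER REFERENCES libfile(libid),
    callno TEXT, copies INTEGER,
    PRIMARY KEY (accno, libid)
);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created, one per entity.
table_names = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
```

Each entity becomes one table and each attribute one column; the link entities (CALLFILE, INDXFILE) carry the keys of the tables they connect.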
Implementing the Physical Database...
- For each of the entities, we will build a table…
- Start up Access…
- Use “New” in Tables…
- Loading data
- Entering data
- Data entry forms