8/28/97 Information Organization and Retrieval: Controlled Vocabularies: Name Authority Control. University of California, Berkeley, School of Information Management and Systems.

1 Controlled Vocabularies: Name Authority Control
University of California, Berkeley, School of Information Management and Systems
SIMS 202: Information Organization and Retrieval, 8/28/97

2 Review
–Mapping to the relational model
–Database design and normalization
–ER diagrams and the assignment

3 Normalization
–First Normal Form: atomic values only; functional dependency of nonkey attributes on the primary key
–Second Normal Form: full functional dependency of nonkey attributes on the primary key
–Third Normal Form: no transitive dependency between nonkey attributes
–Boyce-Codd and higher: all determinants are candidate keys; no more than a single multivalued dependency

4 Unnormalized Relations
The first step in normalization is to convert the data into a two-dimensional table.
In unnormalized relations, data can repeat within a column (a single cell may hold a repeating group of values).
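A minimal Python sketch of getting from an unnormalized relation to atomic values, assuming a hypothetical book record whose "subjects" column holds a repeating group (a list). Flattening produces one row per value, so every row/column intersection holds a single value. The column names and rows are invented for illustration.

```python
from typing import Any

# Hypothetical unnormalized rows: "subjects" holds a repeating group.
unnormalized = [
    {"accno": 1, "title": "Book A", "subjects": ["Education", "Universities"]},
    {"accno": 2, "title": "Book B", "subjects": ["Education"]},
]


def flatten_repeating_group(rows: list[dict[str, Any]], group: str) -> list[dict[str, Any]]:
    """Return one row per value in the repeating-group column."""
    flat = []
    for row in rows:
        for value in row[group]:
            new_row = {k: v for k, v in row.items() if k != group}
            new_row[group] = value  # now a single atomic value
            flat.append(new_row)
    return flat


for r in flatten_repeating_group(unnormalized, "subjects"):
    print(r)
```

After flattening, accno alone no longer identifies a row; the pair (accno, subjects) does.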

5 Unnormalized Relation

6 First Normal Form
To move to First Normal Form, a relation must contain only atomic values at each row and column intersection.
–No repeating groups
–A column or set of columns is called a candidate key when its values uniquely identify each row in the relation and no proper subset of those columns does so.
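The candidate-key test can be sketched in Python as two checks: the column values are unique across rows, and no smaller subset of the columns is also unique. The CALLFILE-style rows are hypothetical. A sample table can only rule a key out; whether a key really holds depends on the meaning of the data, not on one instance.

```python
from itertools import combinations

# Hypothetical CALLFILE-style rows (values invented for illustration).
rows = [
    {"accno": 1, "libid": 10, "copies": 2},
    {"accno": 1, "libid": 11, "copies": 1},
    {"accno": 2, "libid": 10, "copies": 1},
]


def is_unique(rows, cols):
    """True if no two rows share the same values in `cols`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, cols):
    """Unique in this instance, and no proper subset of `cols` is also unique."""
    if not is_unique(rows, cols):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(cols))
        for subset in combinations(cols, size)
    )


print(is_candidate_key(rows, ("accno", "libid")))            # True
print(is_candidate_key(rows, ("accno",)))                    # False: accno 1 repeats
print(is_candidate_key(rows, ("accno", "libid", "copies")))  # False: not minimal
```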

7 First Normal Form

8 Second Normal Form
A relation is in Second Normal Form when it is in First Normal Form and every nonkey attribute is fully functionally dependent on the primary key.
–That is, every nonkey attribute needs the full primary key for unique identification; no nonkey attribute depends on only part of a composite key.
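A Python sketch of spotting a Second Normal Form violation: for each nonkey attribute, test whether some proper subset of a composite primary key already determines it. The table is a hypothetical CALLFILE variant with the library name copied into it; in these invented rows, library depends on libid alone, which is a partial dependency. Small samples can also show coincidental dependencies, so results need checking against what the data means.

```python
from itertools import combinations

# Hypothetical rows keyed by (accno, libid), with a redundant "library" column.
rows = [
    {"accno": 1, "libid": 10, "copies": 2, "library": "Library X"},
    {"accno": 1, "libid": 11, "copies": 1, "library": "Library Y"},
    {"accno": 2, "libid": 10, "copies": 1, "library": "Library X"},
]


def determines(rows, lhs, rhs):
    """True if lhs -> rhs holds in this instance (equal lhs values imply equal rhs)."""
    mapping = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if key in mapping and mapping[key] != row[rhs]:
            return False
        mapping[key] = row[rhs]
    return True


def partial_dependencies(rows, primary_key, nonkey):
    """(subset, attribute) pairs where part of the key already determines the attribute."""
    found = []
    for attr in nonkey:
        for size in range(1, len(primary_key)):
            for subset in combinations(primary_key, size):
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found


print(partial_dependencies(rows, ("accno", "libid"), ["copies", "library"]))
# [(('libid',), 'library')]
```

Moving library into a table keyed by libid alone removes the partial dependency; the Cookie design keeps library names in LIBFILE in just this way.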

9 Second Normal Form

10 Second Normal Form

11 Third Normal Form
A relation is in Third Normal Form if it is in Second Normal Form and there is no transitive functional dependency between nonkey attributes.
–When one nonkey attribute can be determined from one or more other nonkey attributes, there is said to be a transitive functional dependency.
The side effect column in the Surgery table is determined by the drug administered.
–Side effect depends on drug, which is itself a nonkey attribute, so side effect depends on the primary key only transitively; therefore Surgery is not in 3NF.
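The usual fix is to move the transitively dependent column into its own relation keyed by its determinant. A Python sketch, assuming a hypothetical Surgery layout (surgery_id, patient, drug, side_effect) with invented rows: decomposing on drug yields a Drug relation, and joining the two relations back on drug reproduces the original rows, so nothing is lost.

```python
# Hypothetical Surgery rows: surgery_id is the primary key, and each drug
# always has the same side effect (drug -> side_effect).
surgery = [
    {"surgery_id": 1, "patient": "P1", "drug": "Drug A", "side_effect": "Nausea"},
    {"surgery_id": 2, "patient": "P2", "drug": "Drug B", "side_effect": "Rash"},
    {"surgery_id": 3, "patient": "P3", "drug": "Drug A", "side_effect": "Nausea"},
]


def decompose(rows, determinant, dependents):
    """Move `dependents` into a separate relation keyed by `determinant`.

    Assumes determinant -> dependents actually holds; if two rows disagreed,
    the later row's values would silently win.
    """
    main = [{k: v for k, v in r.items() if k not in dependents} for r in rows]
    lookup = {}
    for r in rows:
        entry = {determinant: r[determinant]}
        entry.update({d: r[d] for d in dependents})
        lookup[r[determinant]] = entry
    return main, list(lookup.values())


def natural_join(left, right, on):
    """Rejoin two relations on their shared column."""
    index = {r[on]: r for r in right}
    return [{**left_row, **index[left_row[on]]} for left_row in left if left_row[on] in index]


surgery_3nf, drugs = decompose(surgery, "drug", ["side_effect"])
print(drugs)  # [{'drug': 'Drug A', 'side_effect': 'Nausea'}, {'drug': 'Drug B', 'side_effect': 'Rash'}]
print(natural_join(surgery_3nf, drugs, "drug") == surgery)  # True
```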

12 Third Normal Form

13 Third Normal Form

14 Joins

15 More on Assignment and ER
–Just what is this Cookie database?
–What sort of ways might it be used?
–What are those ER symbols again?

16 Original Assignment
Examine the Cookie database using Access and look at the ER diagram for it posted on the assignments page. Consider the possibilities of book publications:
–What are the problems with the database?
–What new fields would you add to the database, and where?
–Draw a new ER diagram showing your design.

17 Cookie ER Diagram
[Figure: Cookie ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, PUBFILE, SUBFILE, INDXFILE. Relationships: Has call, Has copy, publishes, Has index, Has subject. Linking attributes: accno, libid, pubid, subcode.]
Note: the diagram contains only the attributes used for linking.

18 Cookie Database
Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries.
There are currently five main types of entities in the database (and one linking relation):
–Books (bibfile)
–Local call numbers (callfile)
–Libraries (libfile)
–Publishers (pubfile)
–Subject headings (subfile)
–Links between subjects and books (indxfile)

19 BIBFILE
Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are:
–accno -- an “accession” or serial number
–author -- The author’s name
–title -- The title of the book
–loc -- Location of publication (where published)
–date -- Date of publication
–price -- Price of the book
–pagination -- Number of pages
–ill -- What type of illustrations (maps, etc.), if any
–height -- Height of the book in centimeters

20 CALLFILE
CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are:
–accno -- the book accession number
–libid -- the id of the holding library
–callno -- the call number of the book in the particular library
–copies -- the number of copies held by the particular library

21 LIBFILE
LIBFILE contains information about the libraries participating in this union catalog. Its attributes include:
–libid -- Library id number
–library -- Name of the library
–laddress -- Street address for the library
–lcity -- City name
–lstate -- State code (postal abbreviation)
–lzip -- ZIP code
–lphone -- Phone number
–mop through suncl -- Library opening and closing times for each day of the week

22 PUBFILE
PUBFILE contains information about the publishers of books. Its attributes include:
–pubid -- The publisher’s id number
–publisher -- Publisher name
–paddress -- Publisher street address
–pcity -- Publisher city
–pstate -- Publisher state
–pzip -- Publisher ZIP code
–pphone -- Publisher phone number
–ship -- Standard shipping time in days

23 SUBFILE
SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are:
–subcode -- Subject identification number
–subject -- The subject heading/description

24 INDXFILE
INDXFILE provides a many-to-many mapping of subject headings to books. Its attributes consist entirely of links to other tables:
–subcode -- link to subject id
–accno -- link to book accession number
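How a linking relation like INDXFILE supports many-to-many lookups can be sketched with Python's standard-library sqlite3 module. The tables are cut down to the linking columns plus title; column types, subject strings, and rows are all assumptions for illustration, not the actual Cookie data. Because INDXFILE holds only links, its primary key is the (subcode, accno) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, author TEXT, title TEXT);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
-- Only links: the key is the pair of foreign keys.
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno   INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO bibfile VALUES (?, ?, ?)",
                 [(1, "Author A", "Book A"), (2, "Author B", "Book B")])
conn.executemany("INSERT INTO subfile VALUES (?, ?)",
                 [(100, "Education, Higher"), (200, "Universities")])
# Book 1 has two subjects; subject 100 is assigned to two books.
conn.executemany("INSERT INTO indxfile VALUES (?, ?)",
                 [(100, 1), (200, 1), (100, 2)])


def titles_for_subject(subject):
    """Follow subfile -> indxfile -> bibfile to list titles on a subject."""
    rows = conn.execute(
        """SELECT b.title
           FROM subfile s
           JOIN indxfile i ON i.subcode = s.subcode
           JOIN bibfile b ON b.accno = i.accno
           WHERE s.subject = ?
           ORDER BY b.title""",
        (subject,),
    )
    return [title for (title,) in rows]


print(titles_for_subject("Education, Higher"))  # ['Book A', 'Book B']
```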

25 Some Examples of Cookie Searches
–Who wrote Microcosmographia Academica?
–How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays?
–Which branches in Berkeley’s public library system are open on Sunday?
–What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German?
–What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?
–Print a list of the Mechanics Library holdings, in descending order by height.
–What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?
–Which library closes earliest on Friday night?
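As one illustration, the "Mechanics Library holdings in descending order by height" search joins LIBFILE, CALLFILE, and BIBFILE. This Python sketch uses the standard-library sqlite3 module with a cut-down schema (only the columns the query needs, types assumed) and a few invented rows; the name string "Mechanics Library" is an assumption, not taken from the real Cookie data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT, height REAL);
CREATE TABLE libfile (libid INTEGER PRIMARY KEY, library TEXT);
CREATE TABLE callfile (
    accno  INTEGER REFERENCES bibfile(accno),
    libid  INTEGER REFERENCES libfile(libid),
    callno TEXT,
    copies INTEGER,
    PRIMARY KEY (accno, libid)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO bibfile VALUES (?, ?, ?)",
                 [(1, "Book A", 21.0), (2, "Book B", 25.5), (3, "Book C", 18.0)])
conn.executemany("INSERT INTO libfile VALUES (?, ?)",
                 [(10, "Mechanics Library"), (11, "Other Library")])
conn.executemany("INSERT INTO callfile VALUES (?, ?, ?, ?)",
                 [(1, 10, "CALL-1", 1), (2, 10, "CALL-2", 2), (3, 11, "CALL-3", 1)])

# Library name -> its holdings (callfile) -> book details (bibfile).
HOLDINGS_BY_HEIGHT = """
SELECT b.title, c.callno, b.height
FROM libfile l
JOIN callfile c ON c.libid = l.libid
JOIN bibfile b ON b.accno = c.accno
WHERE l.library = ?
ORDER BY b.height DESC
"""

for row in conn.execute(HOLDINGS_BY_HEIGHT, ("Mechanics Library",)):
    print(row)
# ('Book B', 'CALL-2', 25.5)
# ('Book A', 'CALL-1', 21.0)
```

The other searches follow the same pattern: pick the tables that hold the needed attributes, then join them through the linking columns shown in the ER diagram.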

26 ER Diagram Symbols
–Rectangles are used to indicate entities (that is, the representations or records describing persons, things, or events in the database).
–Ovals are used to indicate the attributes associated with an entity or relationship (that is, the pieces of information recorded in the database about the entity or relationship).
–An underlined attribute name indicates that the attribute is a primary key (that is, it can uniquely identify the entity).
–Diamonds are used to indicate relationships between entities (that is, some association between the data records of different entities).

27 Cookie ER Diagram
[Figure: the Cookie ER diagram from slide 17, shown again.]

28 Assignment Goal
The main intent is to have you start thinking about how databases are structured, and what types of information can or should be included when designing a database.
The main task is to look for MISSING elements in the current design, or elements that are badly designed given the particular data:
–What attributes and/or new relations need to be added to the database?

29 And now for something completely different...

30 Today
–Controlled vocabularies
–Choice of names
–Form of names
–Name authority files

31 Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.

32 Controlled Vocabularies
–Names and name authorities (today)
–Cognitive basis of categorization and subject classification (Thursday)
–Design of controlled vocabularies for subject access: thesaurus design (next week)

33 Names
Cutter’s objectives of bibliographic description:
–To enable a person to find a document of which the author is known.
–To show what the library has by a given author.
The first objective serves access; the second serves collocation.

34 Problems with Names
–How many names should be associated with a document?
–Which of these should be the “main entry”?
–What form should each of the names take?
–What references should be made from other possible forms of the names that haven’t been used?

35 The Problem
Proliferation of the forms of names:
–Different names for the same person
–Different people with the same name
Examples:
–Books in Print (semi-controlled, but not consistent)
–ERIC author index (not controlled)

36 Rules for Description
AACR II and other sets of descriptive cataloging rules provide guidelines for:
–Determining the number of name entries
–Choosing a main entry
–Deciding on the form of name to be used
–Deciding when to make references

37 Authority Control
Authority control is concerned with the creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established headings) based on some set of rules.
If you have rules, why do you need to keep track of all of the headings?
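One answer to that question: the rules tell a cataloger how to choose a heading, but a catalog (or a program) still needs a record of which heading was chosen and which variant forms point to it. A minimal Python sketch of such a lookup, using a few see-from (400) references from the John Creasey record on slide 42 with the $d subfield codes removed; the matching key (lowercase, punctuation dropped) is a hypothetical simplification, not an authority-file standard.

```python
import re

# A few see-from (400) references from the John Creasey authority record.
AUTHORITY = {
    "Creasey, John": ["Cooke, M. E.", "Hope, Brian, 1908-1973", "Ranger, Ken", "Wilde, Jimmy"],
}


def normalize(name: str) -> str:
    """Hypothetical matching key: lowercase, punctuation to spaces, collapse spaces."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


# Map every form (established or variant) to the established heading.
LOOKUP = {}
for heading, variants in AUTHORITY.items():
    for form in [heading, *variants]:
        LOOKUP[normalize(form)] = heading


def established_form(name: str):
    """Return the established heading, or None if the name is not under control."""
    return LOOKUP.get(normalize(name))


print(established_form("Cooke, M.E."))  # Creasey, John
print(established_form("RANGER, KEN"))  # Creasey, John
print(established_form("Smith, John"))  # None
```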

38 Conditions of Authorship?
–Single person or single corporate entity
–Unknown or anonymous authors
–Shared responsibility
–Collections or editorially assembled works
–Works of mixed responsibility (e.g., translations)
–Related works

39 Added Entries
Personal names:
–Collaborators
–Editors, compilers, writers
–Translators (in some cases)
–Illustrators (in some cases)
–Other persons associated with the work (such as the honoree of a Festschrift)
Corporate names:
–Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.

40 Choice of Name
AACR II says that the predominant form of the name used in a particular author’s writings should be chosen as the form of name. References should be made from the other forms of the name.
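The "predominant form" rule can be approximated by counting how often each form appears in an author's works: the most frequent form becomes the heading, and every other form gets a reference. A Python sketch with invented forms; real cataloging also weighs fullness and language, and resolves ties by judgment rather than by first occurrence as `Counter.most_common` does here.

```python
from collections import Counter


def choose_heading(usages):
    """Most frequent form becomes the heading; every other form gets a see-from reference.

    Ties go to the form seen first (Counter.most_common ordering), which a
    cataloger would normally resolve by judgment instead.
    """
    counts = Counter(usages)
    heading, _ = counts.most_common(1)[0]
    references = sorted(form for form in counts if form != heading)
    return heading, references


# Hypothetical forms found in one author's works.
usages = ["Smith, J. A.", "Smith, John A.", "Smith, John A.", "Smith, Johnny"]
heading, refs = choose_heading(usages)
print(heading)  # Smith, John A.
print(refs)     # ['Smith, J. A.', 'Smith, Johnny']
```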

41 Form of the Name
When names appear in multiple forms, one form needs to be chosen. Criteria for choice are:
–Fullness (e.g., full names vs. initials only)
–Language of the name
–Spelling (choose the predominant form)
Entry element:
–John Smith or Smith, John?
–Mao Zedong or Zedong, Mao? (Mao Tse Tung?)
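The entry-element question comes down to which word leads the heading. In both examples the surname is the entry element; what differs is where the surname sits in the name as written. A deliberately naive Python sketch, with a hypothetical `surname_first` flag that the caller must supply from knowledge of the naming tradition:

```python
def entry_element(name_as_written: str, surname_first: bool) -> str:
    """Return 'Surname, Forenames' from a name as it appears on the item.

    Naive: assumes the surname is a single word at one end of the name.
    """
    parts = name_as_written.split()
    if len(parts) < 2:
        return name_as_written  # single-word names (e.g. "Credo") stay as written
    if surname_first:
        surname, forenames = parts[0], parts[1:]
    else:
        surname, forenames = parts[-1], parts[:-1]
    return f"{surname}, {' '.join(forenames)}"


print(entry_element("John Smith", surname_first=False))  # Smith, John
print(entry_element("Mao Zedong", surname_first=True))   # Mao, Zedong
```

The naive rule breaks on compound surnames: "Henry St. John" comes out as "John, Henry St." instead of the "St. John, Henry" reference in the Creasey record on slide 42, which is exactly why cataloging rules spell out entry elements case by case.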

42 Name Authority Files
ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
053 PR6005.R517
100 10 Creasey, John
400 10 Cooke, M. E.
400 10 Cooke, Margaret,$d1908-1973
400 10 Cooper, Henry St. John,$d1908-1973
400 00 Credo,$d1908-1973
400 10 Fecamps, Elise
400 10 Gill, Patrick,$d1908-1973
400 10 Hope, Brian,$d1908-1973
400 10 Hughes, Colin,$d1908-1973
400 10 Marsden, James
400 10 Matheson, Rodney
400 10 Ranger, Ken
400 20 St. John, Henry,$d1908-1973
400 10 Wilde, Jimmy
500 10 $wnnnc$aAshe, Gordon,$d1908-1973

43 Name Authority Files
ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91 RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91
040 OCoLC$cOCoLC
100 10 Marric, J. J.,$d1908-1973
500 10 $wnnnc$aCreasey, John
663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John
670 OCLC 13441825: His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric)
670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric)
670 Pseuds. and nicknames dict., c1987$b(Creasey, John, 1908-1973; British author; pseud.: Marric, J. J.)

44 Name Authority Files
ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
100 10 Butler, William Vivian,$d1927-
400 10 Butler, W. V.$q(William Vivian),$d1927-
400 10 Marric, J. J.,$d1927-
670 His The durable desperadoes, 1973.
670 His The young detective's handbook, c1981:$bt.p. (W.V. Butler)
670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric)
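The heading lines in these records follow a visible pattern: a three-digit tag, two indicator characters, then subfields marked with "$" (the display omits "$a" when it comes first). A toy Python parser for just those display lines (100, 400, 500) can pull out the established heading, its see-from references, and its see-also references. It is a sketch of this screen format only, not a MARC 21 parser; lines such as 040, 053, 663, and 670 are shown without indicators and are not handled.

```python
def parse_field(line: str):
    """Split '400 10 Cooke, Margaret,$d1908-1973' into tag, indicators, subfields."""
    tag, rest = line[:3], line[4:]
    indicators, data = rest[:2], rest[3:]
    if not data.startswith("$"):
        data = "$a" + data  # the display omits $a when it comes first
    subfields = [(chunk[0], chunk[1:]) for chunk in data.split("$")[1:] if chunk]
    return tag, indicators, subfields


def heading_and_references(lines):
    """Return (1XX heading, 4XX see-from forms, 5XX see-also forms)."""
    heading, see_from, see_also = None, [], []
    for line in lines:
        tag, _, subfields = parse_field(line)
        # Join subfield values into a display form; $w is control data, so skip it.
        name = " ".join(value for code, value in subfields if code != "w")
        if tag.startswith("1"):
            heading = name
        elif tag.startswith("4"):
            see_from.append(name)
        elif tag.startswith("5"):
            see_also.append(name)
    return heading, see_from, see_also


# Heading lines from the William Vivian Butler record.
record = [
    "100 10 Butler, William Vivian,$d1927-",
    "400 10 Butler, W. V.$q(William Vivian),$d1927-",
    "400 10 Marric, J. J.,$d1927-",
]
print(heading_and_references(record))
# ('Butler, William Vivian, 1927-', ['Butler, W. V. (William Vivian), 1927-', 'Marric, J. J., 1927-'], [])
```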

