11/7/2000Information Organization and Retrieval Controlled Vocabularies: Name Authority Control University of California, Berkeley School of Information.

Presentation transcript:

11/7/2000 Information Organization and Retrieval
Controlled Vocabularies: Name Authority Control
University of California, Berkeley School of Information Management and Systems
SIMS 202: Information Organization and Retrieval

Review
Dublin Core
Other Metadata Systems
Cognitive basis of categorization and subject classification

Dublin Core Elements
Title, Creator, Subject, Description, Publisher, Other Contributors, Date, Resource Type, Format, Resource Identifier, Source, Language, Relation, Coverage, Rights Management

Issues in Dublin Core
Lack of guidance on what to put into each element
How to structure or organize at the element level?
How to ensure consistency across descriptions for the same persons, places, things, etc.?

More Metadata Systems
The following is a sample of metadata systems for a variety of special types of data, documents, and objects.

Types of Metadata Systems and Standards
Naming and ID systems – URLs, ISBNs
Bibliographic description – MARC, Dublin Core, TEI, etc.
Music – SMDL
Images and objects – CIMI, VRA Core Categories
Numeric data – DDI, SDSM
Geospatial data – FGDC
Collections – EAD

Metadata Resources
Check the Links section from the class home page
The best site is the “Digital Library: Metadata Resources” page from IFLA

Hierarchical vs. Faceted (Subject Heading vs. Descriptor) Category Systems

Controlled Vocabulary (the following slides follow Bates 88)
Start with the text of the document
Attempt to “control” or regularize:
–The concepts expressed within: mutually exclusive, exhaustive
–The language used to express those concepts: limit the normal linguistic variations; regulate word order and structure of phrases; reduce the number of synonyms or near-synonyms
Also, provide cross-references between concepts and their expression.

Classification Schemes
Classify possible concepts.
Goals:
–Completely distinct conceptual categories (mutually exclusive)
–Complete coverage of conceptual categories (exhaustive)

Assigning Headings vs. Descriptors
Subject headings
–Assign one (or a few) complex heading(s) to the document
Descriptors
–Mix and match
How would we describe recipes using each technique?

Subject Heading vs. Descriptor
WILSONLINE
–Athletes
–Athletes -- Health & Hygiene
–Athletes -- Nutrition
–Athletes -- Physical Exams
–…
–Athletics
–Athletics -- Administration
–Athletics -- Equipment -- Catalogs
–…
–Sports -- Accidents and injuries
–Sports -- Accidents and injuries -- Prevention
ERIC
–Athletes
–Athletic Coaches
–Athletic Equipment
–Athletic Fields
–Athletics
–…
–Sports Psychology
–Sportsmanship

Subject Headings vs. Descriptors
Subject headings
Describe the contents of an entire document
Designed to be looked up in an alphabetical index
–Look up document under its heading
Few (1-5) headings per document
Descriptors
Describe one concept within a document
Designed to be used in Boolean searching
–Combine to describe the desired document
Many (5-25) descriptors per document
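The idea of combining descriptors in a Boolean search can be sketched in a few lines of Python. This is only an illustration: the document ids and the descriptor assignments are hypothetical, and the helper names build_index and search_and are made up for this sketch.

```python
from collections import defaultdict

# Hypothetical documents, each indexed with several single-concept descriptors.
DOCS = {
    "doc1": {"Athletes", "Nutrition"},
    "doc2": {"Athletes", "Athletic Equipment"},
    "doc3": {"Nutrition", "Sports Psychology"},
}

def build_index(docs):
    """Map each descriptor to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, descriptors in docs.items():
        for descriptor in descriptors:
            index[descriptor].add(doc_id)
    return index

def search_and(index, *descriptors):
    """Boolean AND: documents that carry every requested descriptor."""
    postings = [index.get(d, set()) for d in descriptors]
    return set.intersection(*postings) if postings else set()

INDEX = build_index(DOCS)
print(search_and(INDEX, "Athletes", "Nutrition"))
```

A subject heading system would instead file doc1 once under a single precoordinated heading such as "Athletes -- Nutrition", to be found by looking up that heading.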

Hierarchical Classification
–Each category is successively broken down into smaller and smaller subdivisions
–No item occurs in more than one subdivision
–Each level is divided out by a “character of division”, also known as a feature.
Example: distinguish Literature based on:
–Language
–Genre
–Time Period

Hierarchical Classification
[Tree diagram: Literature is divided by language (English, French, Spanish); each language by genre (Prose, Poetry, Drama); each genre by period (16th, 17th, 18th, 19th, ...)]

Labeled Categories for Hierarchical Classification
LITERATURE
100 English Literature
  110 English Prose
    English Prose 16th Century
    English Prose 17th Century
    English Prose 18th Century
  English Poetry
    121 English Poetry 16th Century
    122 English Poetry 17th Century
  English Drama
    130 English Drama 16th Century
    …
200 French Literature

Faceted Classification
Create a separate, free-standing list for each characteristic of division (feature).
Combine features to create a classification.

Faceted Classification along with Labeled Categories
A Language
–a English
–b French
–c Spanish
B Genre
–a Prose
–b Poetry
–c Drama
C Period
–a 16th Century
–b 17th Century
–c 18th Century
–d 19th Century
Aa English Literature
AaBa English Prose
AaBaCa English Prose 16th Century
AbBbCd French Poetry 19th Century
BcCd Drama 19th Century
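As a rough sketch of how such a faceted notation can be expanded into a label, the following Python uses the facet tables from this slide. The function name decode and the rule that a language on its own is labeled "... Literature" are assumptions made for illustration.

```python
# Facet tables from the slide: capital letter = facet, lowercase letter = value.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}

def decode(notation):
    """Expand a notation such as 'AbBbCd' into a readable label."""
    if len(notation) % 2 != 0:
        raise ValueError("notation must be pairs of facet and value letters")
    chosen = {}
    for i in range(0, len(notation), 2):
        facet, value = notation[i], notation[i + 1]
        chosen[facet] = FACETS[facet][1][value]
    # Emit the chosen values in the fixed facet order A, B, C.
    parts = [chosen[f] for f in ("A", "B", "C") if f in chosen]
    # Assumed rule: a language on its own is labeled as that language's literature.
    if set(chosen) == {"A"}:
        parts.append("Literature")
    return " ".join(parts)

print(decode("AbBbCd"))
```

Because each facet is a free-standing list, a facet can simply be left out of the notation, which a strict hierarchy does not allow.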

Important Question: How to use both types of classification structures?
How to look through them?
How to use them in search?

Today
More on controlled vocabularies
Choice of names
Form of names
Name authority files
Types of controlled vocabularies

Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.

Controlled Vocabularies
Names and name authorities, and other types of controlled vocabulary (today)
Design of controlled vocabularies for subject access: thesaurus design (Thursday)

Names
Cutter’s objectives of bibliographic description:
–To enable a person to find a document of which the author is known
–To show what the library has by a given author
The first serves access
The second serves collocation

Problems with Names
How many names should be associated with a document?
Which of these should be the “main entry”?
What form should each of the names take?
What references should be made from other possible forms of names that haven’t been used?

The Problem
Proliferation of the forms of names
–Different names for the same person
–Different people with the same names
Examples
–From Books in Print (semi-controlled but not consistent)
–ERIC author index (not controlled)

Rules for Description
AACR II and other sets of descriptive cataloging rules provide guidelines for:
–Determining the number of name entries
–Choosing a main entry
–Deciding on the form of name to be used
–Deciding when to make references

Authority Control
Authority control is concerned with the creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established terms) based on some set of rules.
If you have rules, why do you need to keep track of all of the headings? Can’t you just infer the headings from the rules?

Conditions of Authorship?
Single person or single corporate entity
Unknown or anonymous authors
–Fictitiously ascribed works
Shared responsibility
Collections or editorially assembled works
Works of mixed responsibility (e.g. translations)
Related works

Added Entries
Personal names
–Collaborators
–Editors, compilers, writers
–Translators (in some cases)
–Illustrators (in some cases)
–Other persons associated with the work (such as the honoree in a Festschrift)
Corporate names
–Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.

Choice of Name
AACR II says that the predominant form of the name used in a particular author’s writings should be chosen as the form of name.
References should be made from the other forms of the name.

Form of the Name
When names appear in multiple forms, one form needs to be chosen. Criteria for choice are:
–Fullness (e.g. full names vs. initials only)
–Language of the name
–Spelling (choose the predominant form)
Entry element:
–John Smith or Smith, John?
–Mao Zedong or Zedong, Mao? (Mao Tse Tung?)
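The entry-element question can be made concrete with a small Python sketch. The function entry_form and its family_name_first flag are hypothetical, and the sketch deliberately ignores prefixes, compound surnames, and titles, which real cataloging rules handle case by case.

```python
def entry_form(name, family_name_first=False):
    """Put the family name first as the entry element, followed by a comma.

    family_name_first=True means the name is already written family name
    first (as with "Mao Zedong"), so the first word is the entry element.
    """
    parts = name.split()
    if len(parts) < 2:
        return name  # single-word names have nothing to invert
    if family_name_first:
        family, rest = parts[0], parts[1:]
    else:
        family, rest = parts[-1], parts[:-1]
    return f"{family}, {' '.join(rest)}"

print(entry_form("John Smith"))
print(entry_form("Mao Zedong", family_name_first=True))
```

A blind "last word first" rule would turn Mao Zedong into "Zedong, Mao", which is why the choice cannot be made mechanically from the string alone and the decision has to be recorded somewhere.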

Name Authority Files
ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
053 PR6005.R
Creasey, John
Cooke, M. E
Cooke, Margaret,$d
Cooper, Henry St. John,$d
Credo,$d
Fecamps, Elise
Gill, Patrick,$d
Hope, Brian,$d
Hughes, Colin,$d
Marsden, James
Matheson, Rodney
Ranger, Ken
St. John, Henry,$d
Wilde, Jimmy
$wnnnc$aAshe, Gordon,$d
Different names for the same person

Name Authority Files
ID:NAFO ST:p EL:n STH:a MS:n UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
OCoLC$cOCoLC
Marric, J. J.,$d
$wnnnc$aCreasey, John
663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John
670 OCLC : His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric)
670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric)
670 Pseuds. and nicknames dict., c1987$b(Creasey, John, ; British author; pseud.: Marric, J. J.)

Name Authority Files
ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
Butler, William Vivian,$d
Butler, W. V.$q(William Vivian),$d
Marric, J. J.,$d
His The durable desperadoes,
His The young detective's handbook, c1981:$bt.p. (W.V. Butler)
670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric)
Different people writing with the same name
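A name authority file can be thought of as a lookup table from every recorded form of a name to the established heading(s). The Python sketch below is a simplification under stated assumptions: the AuthorityRecord class is hypothetical, only a few of the names from the records above are included, and the split between "see from" and "see also" references is a guess, since the field tags did not survive in this transcript.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityRecord:
    heading: str                                   # established form
    see_from: list = field(default_factory=list)   # variant forms not used as headings
    see_also: list = field(default_factory=list)   # related established headings

# Assumed assignment of names to reference types; illustrative only.
RECORDS = [
    AuthorityRecord("Creasey, John",
                    see_from=["Cooke, M. E.", "Gill, Patrick", "Ranger, Ken"],
                    see_also=["Ashe, Gordon"]),
    AuthorityRecord("Marric, J. J.", see_also=["Creasey, John"]),
    AuthorityRecord("Butler, William Vivian",
                    see_from=["Butler, W. V."],
                    see_also=["Marric, J. J."]),
]

def normalize(name):
    """Ignore case, periods, commas, and extra spaces when matching names."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())

def build_lookup(records):
    """Map every heading and variant form to the established heading(s)."""
    lookup = {}
    for rec in records:
        for form in [rec.heading, *rec.see_from]:
            lookup.setdefault(normalize(form), set()).add(rec.heading)
    return lookup

LOOKUP = build_lookup(RECORDS)

def resolve(name):
    """Return the established heading(s) for a name as typed by a searcher."""
    return sorted(LOOKUP.get(normalize(name), set()))

print(resolve("gill, patrick"))
```

The "see also" links are what let a searcher move from Marric, J. J. to both Creasey and Butler, the two different people who wrote under that name.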

Other Types of Controlled Vocabularies
Gazetteers (geographic names)
Code lists (e.g. LC Language Codes)
Subject heading lists
Classification schemes
Thesauri

Structure of an IR System
[Diagram of an information storage and retrieval system, adapted from Soergel, p. 19. Search line: interest profiles and queries; formulating query in terms of descriptors; storage of profiles; Store 1: profiles/search requests. Storage line: documents and data; indexing (descriptive and subject); storage of documents; Store 2: document representations. Comparison/matching of the two stores yields potentially relevant documents. Rules of the game = rules for subject indexing + thesaurus (which consists of lead-in vocabulary and indexing language).]

Uses of Controlled Vocabularies
Library subject headings, classification, and authority files
Commercial journal indexing services and databases
Yahoo and other Web classification schemes
Online and manual systems within organizations
–SunSolve
–MacArthur

Types of Indexing Languages
Uncontrolled keyword indexing
Indexing languages
–Controlled, but not structured
Thesauri
–Controlled and structured
Classification systems
–Controlled, structured, and coded
Faceted classification systems

Indexing Languages
An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.
An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.

Indexing Languages
Library of Congress Subject Headings
Yellow Pages topics
Wilson indexes (“Readers’ Guide”)

Thesauri
A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower, and other related terms.
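One way to picture this structure is as a small graph of preferred terms with typed links. The Python sketch below is illustrative only: the Term and Thesaurus classes are hypothetical, and the sample relationships are invented for the example rather than taken from any published thesaurus.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    preferred: str
    used_for: set = field(default_factory=set)   # synonyms / equivalent terms
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)

class Thesaurus:
    def __init__(self):
        self.terms = {}     # preferred term -> Term
        self.lead_in = {}   # any known wording (lowercased) -> preferred term

    def add(self, preferred, used_for=()):
        term = self.terms.setdefault(preferred, Term(preferred))
        self.lead_in[preferred.lower()] = preferred
        for variant in used_for:
            term.used_for.add(variant)
            self.lead_in[variant.lower()] = preferred
        return term

    def link_broader(self, narrower, broader):
        """Record both directions of a broader/narrower link."""
        self.add(narrower).broader.add(broader)
        self.add(broader).narrower.add(narrower)

    def link_related(self, a, b):
        """Related-term links are symmetric."""
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def lookup(self, word):
        """Lead the searcher from any known wording to the preferred term."""
        return self.lead_in.get(word.lower())

# Invented sample entries.
thesaurus = Thesaurus()
thesaurus.add("Athletes", used_for=["Sportspeople"])
thesaurus.link_broader("Athletic Equipment", "Equipment")
thesaurus.link_related("Athletes", "Athletics")
print(thesaurus.lookup("sportspeople"))
```

Keeping the reciprocal links in sync inside link_broader and link_related is the main design point: a broader link without its matching narrower link is a common consistency error in hand-maintained vocabularies.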

Thesauri (cont.)
National and international standards for thesauri
–ANSI/NISO z American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
–ANSI/NISO Draft Standard Z x -- American National Standard Guidelines for Indexes in Information Retrieval
–ISO Documentation -- Guidelines for the establishment and development of monolingual thesauri
–ISO Documentation -- Guidelines for the establishment and development of multilingual thesauri

Thesauri (cont.)
Examples:
–The ERIC Thesaurus of Descriptors
–The Art and Architecture Thesaurus
–The Medical Subject Headings (MeSH) of the National Library of Medicine

Classification Systems
A classification system is an indexing language often based on a broad ordering of topical areas. Thesauri and classification systems both use this broad ordering and maintain a structure of broader, narrower, and related topics. Classification schemes commonly use a coded notation for representing a topic and its place in relation to other terms.

Classification Systems (cont.)
Examples:
–The Library of Congress Classification System
–The Dewey Decimal Classification System
–The ACM Computing Reviews Categories
–The American Mathematical Society Classification System

Automatic Indexing and Classification
Automatic indexing is typically the simple deriving of keywords from a document and providing access to all of those words.
More complex automatic indexing systems attempt to select controlled vocabulary terms based on terms in the document.
Automatic classification attempts to automatically group similar documents using either:
–A fully automatic clustering method
–An established classification scheme and a set of documents already indexed by that scheme
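The two levels of automatic indexing described here can be sketched in Python as follows. The stopword list, the LEAD_IN table, and both function names are assumptions made for illustration; a production system would use a much larger vocabulary and better text normalization.

```python
import re

# A tiny, assumed stopword list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Hypothetical lead-in table from document words to controlled terms.
LEAD_IN = {
    "athlete": "Athletes",
    "athletes": "Athletes",
    "coach": "Athletic Coaches",
    "coaches": "Athletic Coaches",
}

def derive_keywords(text):
    """Simple automatic indexing: every non-stopword becomes an access point."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def suggest_terms(text):
    """More complex indexing: pick controlled terms triggered by document words."""
    return sorted({LEAD_IN[w] for w in derive_keywords(text) if w in LEAD_IN})

print(derive_keywords("The Art of Indexing"))
print(suggest_terms("The coach and the athletes"))
```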

Clustering
Agglomerative methods: polythetic, exclusive or overlapping, unordered; clusters are order-dependent.
Rocchio’s method:
1. Select initial centers (i.e. seed the space)
2. Assign docs to highest matching centers and compute centroids
3. Reassign all documents to centroid(s)
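The three numbered steps on this slide can be sketched in Python as a single seed, assign, and reassign pass. This is a simplified illustration and not Rocchio's full algorithm: it assumes raw term-count vectors, cosine similarity as the matching function, exclusive assignment to one cluster, and hypothetical sample documents.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Average term weights over a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def assign(docs, centers):
    """Index of the best-matching center for each document."""
    return [max(range(len(centers)), key=lambda i: cosine(d, centers[i]))
            for d in docs]

def seed_cluster(docs, seed_ids):
    centers = [dict(docs[i]) for i in seed_ids]      # step 1: seed the space
    labels = assign(docs, centers)                   # step 2: assign docs ...
    for i in range(len(centers)):                    # ... and compute centroids
        members = [d for d, label in zip(docs, labels) if label == i]
        if members:                                  # keep the seed if nothing matched it
            centers[i] = centroid(members)
    return assign(docs, centers)                     # step 3: reassign to centroids

# Hypothetical documents as bags of words.
DOCS = [Counter(text.split()) for text in (
    "athletes nutrition diet",
    "athletes diet",
    "thesaurus indexing terms",
    "indexing terms",
)]
print(seed_cluster(DOCS, seed_ids=[0, 2]))
```

Choosing different seeds, or presenting the documents in a different order when seeds are picked from the stream, can change the result, which is the order dependence the slide mentions.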

Automatic Class Assignment
[Diagram: a document is submitted to a search engine]
1. Create pseudo-documents representing intellectually derived classes
2. Search using document contents
3. Obtain ranked list
4. Assign document to N categories ranked over threshold, OR assign to top-ranked category
Automatic class assignment: polythetic, exclusive or overlapping, usually ordered; clusters are order-independent, usually based on an intellectually derived scheme
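The four steps on this slide can be sketched in Python with cosine similarity standing in for the search engine. Everything specific here is an assumption for illustration: the two class pseudo-documents, the bag-of-words vectors, and the function names rank_classes and assign_classes.

```python
import math
from collections import Counter

def vec(text):
    """Bag-of-words vector for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: hypothetical pseudo-documents, one per intellectually derived class.
PSEUDO_DOCS = {
    "Sports": "athletes athletics sports coaches",
    "Cataloging": "cataloging authority names headings",
}

def rank_classes(doc_text, pseudo_docs):
    """Steps 2 and 3: use the document as a query and rank the classes."""
    query = vec(doc_text)
    scored = [(cosine(query, vec(text)), name) for name, text in pseudo_docs.items()]
    return sorted(scored, reverse=True)

def assign_classes(doc_text, pseudo_docs, threshold=None):
    """Step 4: all classes at or over the threshold, or only the top-ranked class."""
    ranked = rank_classes(doc_text, pseudo_docs)
    if threshold is None:
        return [ranked[0][1]] if ranked and ranked[0][0] > 0 else []
    return [name for score, name in ranked if score >= threshold]

print(assign_classes("name authority headings for authors", PSEUDO_DOCS))
```

Unlike seed-based clustering, the result for one document does not depend on which other documents have been processed, since each document is matched against the same fixed set of class pseudo-documents.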