2002.09.12 - SLIDE 1IS 202 - FALL 2002 Lecture 06: Controlled Vocabularies Introduction Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and.

Presentation transcript:

SLIDE 1IS FALL 2002 Lecture 06: Controlled Vocabularies Introduction Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 SIMS 202: Information Organization and Retrieval Some slides in this lecture were developed by Prof. Marti Hearst

SLIDE 2IS FALL 2002 Lecture Contents Review –Dublin Core –Other Metadata Systems Controlled Vocabularies Name Authority Files –Choice of Names –Form of Names Other Types of Controlled Vocabularies Faceted vs. Hierarchic Organization of Vocabularies

SLIDE 3IS FALL 2002 Lecture Contents Review –Metadata Systems –Dublin Core Controlled Vocabularies Name Authority Files –Choice of Names –Form of Names Other Types of Controlled Vocabularies Faceted vs. Hierarchic Organization of Vocabularies

SLIDE 4IS FALL 2002 Metadata Systems and Standards Naming and ID systems – URLs, ISBNs Bibliographic description – MARC, Dublin Core, TEI, etc. Music – SMDL Images and objects – CIMI, VRA core categories Numeric data – DDI, SDSM Geospatial data – FGDC Collections – EAD

SLIDE 5IS FALL 2002 Dublin Core Simple metadata for describing internet resources For “Document-Like Objects” 15 Elements (in base DC)

SLIDE 6IS FALL 2002 Dublin Core Elements Title Creator Subject Description Publisher Other Contributors Date Resource Type Format Resource Identifier Source Language Relation Coverage Rights Management
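As a rough illustration, the fifteen elements listed above could be held in a plain Python mapping. This is only a sketch: the helper names are hypothetical, the record values are taken loosely from this lecture's title slide, and the "Family, Given" form of the creator names is an assumption rather than anything the element set prescribes.

```python
# The 15 element names as listed on the slide.
DC_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Other Contributors", "Date", "Resource Type", "Format",
    "Resource Identifier", "Source", "Language", "Relation",
    "Coverage", "Rights Management",
]

# Illustrative record; the name form and code values are assumptions.
record = {
    "Title": "Lecture 06: Controlled Vocabularies Introduction",
    "Creator": ["Larson, Ray", "Davis, Marc"],
    "Date": "20020912",
    "Format": "text/html",
    "Language": "eng",
}


def unknown_elements(rec):
    """Return keys in rec that are not among the 15 element names."""
    return sorted(k for k in rec if k not in DC_ELEMENTS)


def missing_elements(rec):
    """Return element names that the record does not fill in."""
    return [e for e in DC_ELEMENTS if e not in rec]
```

Here `unknown_elements(record)` is empty, while `missing_elements(record)` lists the ten elements the sketch leaves blank. Nothing in the mapping itself controls what goes into each value.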

SLIDE 7IS FALL 2002 Title Label: TITLE The name given to the resource by the CREATOR or PUBLISHER

SLIDE 8IS FALL 2002 Author or Creator Label: CREATOR The person(s) or organization(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.

SLIDE 9IS FALL 2002 Subject and Keywords Label: SUBJECT The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings or Art and Architecture Thesaurus descriptors) as well.

SLIDE 10IS FALL 2002 Description Label: DESCRIPTION A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.

SLIDE 11IS FALL 2002 Publisher Label: PUBLISHER The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.

SLIDE 12IS FALL 2002 Other Contributors Label: CONTRIBUTORS Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators, and convenors).

SLIDE 13IS FALL 2002 Date Label: DATE The date the resource was made available in its present form. The recommended best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI X[…]. In this scheme, the date element for the day this is written would be […], or December 3, […]. Many other schema are possible, but if used, they should be identified in an unambiguous manner.
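The 8-digit YYYYMMDD form recommended above can be produced and checked with Python's standard library. This is a minimal sketch; the function names `to_dc_date` and `from_dc_date` are hypothetical.

```python
from datetime import date, datetime


def to_dc_date(d: date) -> str:
    """Format a date as an 8-digit YYYYMMDD string."""
    return d.strftime("%Y%m%d")


def from_dc_date(s: str) -> date:
    """Parse an 8-digit YYYYMMDD string; raise ValueError if malformed."""
    if len(s) != 8 or not s.isdigit():
        raise ValueError(f"expected 8 digits, got {s!r}")
    return datetime.strptime(s, "%Y%m%d").date()
```

For the date of this lecture, `to_dc_date(date(2002, 9, 12))` gives `"20020912"`, and `from_dc_date` turns that string back into the same date.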

SLIDE 14IS FALL 2002 Resource Type Label: RESOURCE TYPE The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. One preliminary set of such types can be found at the following URL (now out of date):

SLIDE 15IS FALL 2002 Format Label: FORMAT The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.
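One way to draw FORMAT values from an enumerated list is to let Python's standard `mimetypes` module guess the Internet Media Type from a file name. This is a sketch only: the wrapper name `guess_format` and the file names are hypothetical, and the guess is based on the extension, not on the file's content.

```python
import mimetypes


def guess_format(filename: str):
    """Return the media type guessed from the file extension, or None."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type
```

For example, `guess_format("sonnet.html")` yields `"text/html"` and `guess_format("photo.jpg")` yields `"image/jpeg"`.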

SLIDE 16IS FALL 2002 Resource Identifier Label: IDENTIFIER String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.

SLIDE 17IS FALL 2002 Source Label: SOURCE The work, either print or electronic, from which this resource is derived, if applicable. For example, an html encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.

SLIDE 18IS FALL 2002 Language Label: LANGUAGE Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with the Z39.53 three character codes for written languages. See:

SLIDE 19IS FALL 2002 Relation Label: RELATION Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

SLIDE 20IS FALL 2002 Coverage Label: COVERAGE The spatial locations and temporal duration characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

SLIDE 21IS FALL 2002 Rights Management Label: RIGHTS The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.

SLIDE 22IS FALL 2002 Issues in Dublin Core Lack of guidance on what to put into each element How to structure or organize at the element level? How to ensure consistency across descriptions for the same persons, places, things, etc.?

SLIDE 23IS FALL 2002 Metadata Structures and languages for the description of information resources and their elements (components or features) “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)

SLIDE 24IS FALL 2002 Metadata Often two main types of metadata are distinguished: –Descriptive metadata Describes the information/data object and its properties May use a variety of descriptive formats and rules –Topical metadata Describes the topic or “aboutness” of an information/data object May include a variety of vocabularies for describing subjects, topics, categories, etc.

SLIDE 25IS FALL 2002 Lecture Contents Review –Metadata Systems –Dublin Core Controlled Vocabularies Name Authority Files –Choice of Names –Form of Names Other Types of Controlled Vocabularies Faceted vs. Hierarchic Organization of Vocabularies

SLIDE 26IS FALL 2002 Controlled Vocabularies Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information That is, it is an attempt to provide a consistent set of descriptions for use in (or as) metadata

SLIDE 27IS FALL 2002 Controlled Vocabularies Names and name authorities Gazetteers (geographic names) Code lists (e.g., LC language codes) Subject heading lists Classification schemes Thesauri

SLIDE 28IS FALL 2002 Lecture Contents Review –Metadata Systems –Dublin Core Controlled Vocabularies Name Authority Files –Choice of Names –Form of Names Other Types of Controlled Vocabularies Faceted vs. Hierarchic Organization of Vocabularies

SLIDE 29IS FALL 2002 Names Remember Cutter’s objectives of bibliographic description? –To enable a person to find a document of which the author is known –To show what the library has by a given author First serves access Second serves collocation

SLIDE 30IS FALL 2002 Problems with Names How many names should be associated with a document? Which of these should be the “main entry?” What form should each of the names take? What references should be made from other possible forms of names that haven’t been used?

SLIDE 31IS FALL 2002 The Problem Proliferation of the forms of names –Different names for the same person –Different people with the same names Examples –from Books in Print (semi-controlled but not consistent) –ERIC author index (not controlled)

SLIDE 32IS FALL 2002 Goethe …etc…

SLIDE 33IS FALL 2002 John Muir

SLIDE 34IS FALL 2002 Pauline Cochrane nee Atherton

SLIDE 35IS FALL 2002 Pauline Cochrane nee Atherton

SLIDE 36IS FALL 2002 Rules for Description AACR II and other sets of descriptive cataloging rules provide guidelines for: –Determining the number of name entries –Choosing a main entry –Deciding on the form of name to be used –Deciding when to make references

SLIDE 37IS FALL 2002 Authority Control Authority control is concerned with creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established) based on some set of rules If you have rules, why do you need to keep track of all of the headings? Can’t you just infer the headings from the rules?

SLIDE 38IS FALL 2002 Conditions of Authorship? Single person or single corporate entity Unknown or anonymous authors –Fictitiously ascribed works Shared responsibility Collections or editorially assembled works Works of mixed responsibility (e.g., translations) Related works

SLIDE 39IS FALL 2002 Added Entries Personal names –Collaborators –Editors, compilers, writers –Translators (in some cases) –Illustrators (in some cases) –Other persons associated with the work (such as the honoree in a festschrift) Corporate names –Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.

SLIDE 40IS FALL 2002 Choice of Name AACR II says that the predominant form of the name used in a particular author’s writings should be chosen as the form of name References should be made from the other forms of the name

SLIDE 41IS FALL 2002 Form of the Name When names appear in multiple forms, one form needs to be chosen Criteria for choice are: –Fullness (e.g., full names vs. initials only) –Language of the name –Spelling (choose predominant form) Entry element: –John Smith or Smith, John? –Mao Zedong or Zedong, Mao? (Mao Tse Tung?)
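The entry-element question above can be made concrete with a small sketch. The function `invert_name` and its `family_first` flag are hypothetical. The point is that a mechanical "last word first" rule works for John Smith but gives the wrong entry element for a name that is already written family name first.

```python
def invert_name(name: str, family_first: bool = False) -> str:
    """Return a 'Family, Given' entry form for a personal name.

    family_first=True means the family name is already written first,
    as in 'Mao Zedong'; this has to be known, it cannot be inferred.
    """
    parts = name.split()
    if len(parts) < 2:
        return name  # single-word names are left unchanged
    if family_first:
        family, given = parts[0], " ".join(parts[1:])
    else:
        family, given = parts[-1], " ".join(parts[:-1])
    return f"{family}, {given}"
```

With the default rule, `invert_name("John Smith")` gives `"Smith, John"` but `invert_name("Mao Zedong")` gives `"Zedong, Mao"`. Passing `family_first=True` gives `"Mao, Zedong"`. The code needs outside knowledge about the name, which is one reason headings are recorded rather than inferred.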

SLIDE 42IS FALL 2002 Name Authority Files ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC 053 PR6005.R Creasey, John Cooke, M. E Cooke, Margaret,$d Cooper, Henry St. John,$d Credo,$d Fecamps, Elise Gill, Patrick,$d Hope, Brian,$d Hughes, Colin,$d Marsden, James Matheson, Rodney Ranger, Ken St. John, Henry,$d Wilde, Jimmy $wnnnc$aAshe, Gordon,$d Different names for the same person

SLIDE 43IS FALL 2002 Name Authority Files ID:NAFO ST:p EL:n STH:a MS:n UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d OCoLC$cOCoLC Marric, J. J.,$d $wnnnc$aCreasey, John 663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John 670 OCLC : His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric) 670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric) 670 Pseuds. and nicknames dict., c1987$b(Creasey, John, ; British author; pseud.: Marric, J. J.)

SLIDE 44IS FALL 2002 Name Authority Files ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC Butler, William Vivian,$d Butler, W. V.$q(William Vivian),$d Marric, J. J.,$d His The durable desperadoes, His The young detective's handbook, c1981:$bt.p. (W.V. Butler) 670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric) Different people writing with the same name
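The two problems shown in these records (one person writing under several names, and one name used by several people) can be sketched as a lookup table. This is a deliberately simplified illustration, not the structure of a MARC authority record. The table holds only a few of the names that appear in the records above, and the names `AUTHORITY`, `persons_for`, and `names_for` are hypothetical.

```python
# Name as it appears on an item -> established headings of the people
# who have written under that name (simplified from the records above).
AUTHORITY = {
    "Creasey, John": ["Creasey, John"],
    "Ashe, Gordon": ["Creasey, John"],
    "Marric, J. J.": ["Creasey, John", "Butler, William Vivian"],
    "Butler, W. V.": ["Butler, William Vivian"],
    "Butler, William Vivian": ["Butler, William Vivian"],
}


def persons_for(name):
    """People who have used this name (more than one means a shared name)."""
    return AUTHORITY.get(name, [])


def names_for(person):
    """All names under which this person's works may be found."""
    return sorted(n for n, people in AUTHORITY.items() if person in people)
```

`persons_for("Marric, J. J.")` returns both Creasey and Butler, while `names_for("Creasey, John")` returns `["Ashe, Gordon", "Creasey, John", "Marric, J. J."]`. Neither answer can be derived from a formatting rule alone. Both depend on a maintained file.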

SLIDE 45IS FALL 2002 The Haunting of Lauran Paine 1. Paine, Lauran. ALSO KNOWN AS: Carrel, Mark. Thompson, Russ. Andrews, A. A. Benton, Will. Bradford, Will. Bradley, Concho. Brennan, Will. Carter, Nevada. Allen, Clay. Almonte, Rosa. Armour, John. Cassady, Claude. Glendenning, Donn. Kelley, Ray. Kilgore, John. Martin, Tom. Slaughter, Jim. Standish, Buck. … Batchelor, Reg. Beck, Harry. Bedford, Kenneth. Bosworth, Frank. Bovee, Ruth. Cassidy, Claude. Custer, Clint. Dana, Amber. Dana, Richard. Davis, Audrey. Drexler, J. F. Duchesne, Antoinette. Fisher, Margot. Fleck, Betty. Frost, Joni. Gordon, Angela. Gorman, Beth. Hayden, Jay. Houston, Will. Howard, Troy. Ingersol, Jared. … Kelly, Ray. Ketchum, Jack. Liggett, Hunter. Lucas, J. K. Lyon, Buck. Morgan, Arlene. Morgan, Valerie. O'Connor, Clint. St. George, Arthur. Sharp, Helen. Thorn, Barbara. Archer, Dennis. Clark, Badger.

SLIDE 46IS FALL 2002 Some Interesting Ones…

SLIDE 47IS FALL 2002 Lecture Contents Review –Dublin Core –Other Metadata Systems Controlled Vocabularies Name Authority Files –Choice of Names –Form of Names Other Types of Controlled Vocabularies Faceted vs. Hierarchic Organization of Vocabularies

SLIDE 48IS FALL 2002 Structure of an IR System [Diagram of an Information Storage and Retrieval System, adapted from Soergel, p. 19. Search Line: Interest profiles & Queries; Formulating query in terms of descriptors; Storage of profiles; Store1: Profiles/Search requests. Storage Line: Documents & data; Indexing (Descriptive and Subject); Storage of Documents; Store2: Document representations. Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language). Comparison/Matching of the two stores yields Potentially Relevant Documents.]

SLIDE 49IS FALL 2002 Uses of Controlled Vocabularies Library subject headings, classification, and authority files Commercial journal indexing services and databases Yahoo, and other web classification schemes Online and manual systems within organizations –SunSolve –MacArthur

SLIDE 50IS FALL 2002 Types of Indexing Languages Uncontrolled keyword indexing Indexing languages –Controlled, but not structured Thesauri –Controlled and structured Classification systems –Controlled, structured, and coded Faceted thesauri and classification systems

SLIDE 51IS FALL 2002 Indexing Languages An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents An Indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms

SLIDE 52IS FALL 2002 Indexing Languages Library of Congress Subject Headings Yellow pages topics Wilson indexes (“reader’s guide”)

SLIDE 53IS FALL 2002 Thesauri A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among –Synonymous –Equivalent –Broader –Narrower, and –Other related terms
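The links listed above map naturally onto a small record type. The sketch below is an assumption-laden illustration: the class name `Term`, the function `preferred`, and the sample terms are all invented here and are not drawn from any published thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str                                     # preferred term (descriptor)
    used_for: list = field(default_factory=list)   # synonyms / equivalents (UF)
    broader: list = field(default_factory=list)    # broader terms (BT)
    narrower: list = field(default_factory=list)   # narrower terms (NT)
    related: list = field(default_factory=list)    # other related terms (RT)


# Hypothetical entries, for illustration only.
THESAURUS = {
    "Dogs": Term("Dogs", used_for=["Canines", "Hounds"],
                 broader=["Mammals"], narrower=["Terriers"], related=["Pets"]),
    "Mammals": Term("Mammals", narrower=["Dogs"]),
}


def preferred(entry_term):
    """Map a searcher's term to the preferred descriptor, or None."""
    for term in THESAURUS.values():
        if entry_term == term.label or entry_term in term.used_for:
            return term.label
    return None
```

A searcher who types "Canines" is led to the descriptor "Dogs" via `preferred("Canines")`. From there the broader, narrower, and related links suggest ways to widen or narrow the search.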

SLIDE 54IS FALL 2002 Thesauri (Cont.) National and international standards for thesauri –ANSI/NISO z American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri –ANSI/NISO Draft Standard Z x -- American National Standard Guidelines for Indexes in Information Retrieval –ISO Documentation -- Guidelines for the establishment and development of monolingual thesauri –ISO Documentation -- Guidelines for the establishment and development of multilingual thesauri

SLIDE 55IS FALL 2002 Thesauri (Cont.) Examples: –The ERIC Thesaurus of Descriptors –The Art and Architecture Thesaurus –The Medical Subject Headings (MESH) of the National Library of Medicine

SLIDE 56IS FALL 2002 Classification Systems A classification system is an indexing language often based on a broad ordering of topical areas Thesauri and classification systems both use this broad ordering and maintain a structure of broader, narrower, and related topics Classification schemes commonly use a coded notation for representing a topic and its place in relation to other terms

SLIDE 57IS FALL 2002 Classification Systems (Cont.) Examples: –The Library of Congress Classification System –The Dewey Decimal Classification System –The ACM Computing Reviews Categories –The American Mathematical Society Classification System

SLIDE 58IS FALL 2002 Using Controlled Vocabulary Start with the text of the document Attempt to “control” or regularize: –The concepts expressed within mutually exclusive exhaustive –The language used to express those concepts limit the normal linguistic variations regulate word order and structure of phrases reduce the number of synonyms or near-synonyms Also, provide cross-references between concepts and their expression Slide author: Marti Hearst (These slides follow Bates 88)

SLIDE 59IS FALL 2002 Classification Schemes Classify possible concepts. Goals: –Completely distinct conceptual categories (mutually exclusive) –Complete coverage of conceptual categories (exhaustive) Slide author: Marti Hearst

SLIDE 60IS FALL 2002 Assigning Headings vs. Descriptors Subject headings –Assign one (or a few) complex heading(s) to the document Descriptors –Mix and match How would we describe recipes using each technique? Slide author: Marti Hearst

SLIDE 61IS FALL 2002 Subject Heading vs. Descriptors Wilsonline –Athletes –Athletes -- Health & hygiene –Athletes -- Nutrition –Athletes -- Physical Exams –… –Athletics –Athletics -- Administration –Athletics -- Equipment -- Catalogs –… –Sports -- Accidents and Injuries –Sports -- Accidents and Injuries -- Prevention ERIC –Athletes –Athletic Coaches –Athletic Equipment –Athletic Fields –Athletics –… –Sports Psychology –Sportsmanship Slide author: Marti Hearst

SLIDE 62IS FALL 2002 Subject Headings vs. Descriptors Subject headings: Describe the contents of an entire document Designed to be looked up in an alphabetical index –Look up document under its heading Few (1-5) headings per document Descriptors: Describe one concept within a document Designed to be used in Boolean searching –Combine to describe the desired document Many (5-25) descriptors per document Slide author: Marti Hearst
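The "combine to describe the desired document" idea behind descriptors can be sketched with Python sets. The descriptor strings are ERIC terms from the previous slide, but the document identifiers and their assignments are invented for illustration, as are the function names.

```python
# Hypothetical postings: document id -> descriptors assigned to it.
INDEX = {
    "doc1": {"Athletes", "Sports Psychology"},
    "doc2": {"Athletes", "Athletic Equipment"},
    "doc3": {"Athletic Coaches", "Sportsmanship", "Sports Psychology"},
}


def search_all(*descriptors):
    """Boolean AND: documents indexed with every given descriptor."""
    wanted = set(descriptors)
    return sorted(doc for doc, terms in INDEX.items() if wanted <= terms)


def search_any(*descriptors):
    """Boolean OR: documents indexed with at least one given descriptor."""
    wanted = set(descriptors)
    return sorted(doc for doc, terms in INDEX.items() if wanted & terms)
```

`search_all("Athletes", "Sports Psychology")` narrows the result to `["doc1"]`, whereas `search_any("Sports Psychology")` returns `["doc1", "doc3"]`. A pre-coordinated subject heading such as "Athletes -- Nutrition" would instead be looked up as a single string.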

SLIDE 63IS FALL 2002 Lecture Contents Review –Dublin Core –Other Metadata Systems Controlled Vocabularies Name Authority Files –Choice of Names –Form of Names Other Types of Controlled Vocabularies Faceted vs. Hierarchic Organization of Vocabularies

SLIDE 64IS FALL 2002 Hierarchical Classification Each category is successively broken down into smaller and smaller subdivisions No item occurs in more than one subdivision Each level divided out by a “character of division” (also known as a feature) –Example: Distinguish “Literature” based on: –Language –Genre –Time Period Slide author: Marti Hearst

SLIDE 65IS FALL 2002 Hierarchical Classification [Tree diagram: Literature divides into English, French, Spanish; each language divides into Prose, Poetry, Drama; each genre divides into 16th, 17th, 18th, 19th, ...] Slide author: Marti Hearst
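The Literature example can be sketched as a nested mapping in which each level applies one characteristic of division. This is only an illustration under stated assumptions: the function names are hypothetical, and the order of division (language, then genre, then period) is fixed in advance, which is the defining constraint of a hierarchy.

```python
LANGUAGES = ["English", "French", "Spanish"]
GENRES = ["Prose", "Poetry", "Drama"]
PERIODS = ["16th Century", "17th Century", "18th Century", "19th Century"]


def build_tree(levels):
    """Nest each characteristic of division under the previous one."""
    if not levels:
        return {}
    head, rest = levels[0], levels[1:]
    return {value: build_tree(rest) for value in head}


tree = {"Literature": build_tree([LANGUAGES, GENRES, PERIODS])}


def leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path; each leaf occurs in exactly one place."""
    for label, child in node.items():
        path = prefix + (label,)
        if child:
            yield from leaf_paths(child, path)
        else:
            yield path
```

Three languages, three genres, and four periods give 36 leaves, the first being `("Literature", "English", "Prose", "16th Century")`. Because language is divided first, there is no single node for "all 19th-century poetry" in this tree.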

SLIDE 66IS FALL 2002 Labeled Categories for Hierarchical Classification LITERATURE –100 English Literature 110 English Prose –English Prose 16th Century –English Prose 17th Century –English Prose 18th Century – English Poetry –121 English Poetry 16th Century –122 English Poetry 17th Century – English Drama –130 English Drama 16th Century –… –200 French Literature Slide author: Marti Hearst

SLIDE 67IS FALL 2002 Faceted Classification Create a separate, free-standing list for each characteristic or division (feature) Combine features to create a classification Slide author: Marti Hearst

SLIDE 68IS FALL 2002 Faceted Classification Along With Labeled Categories A Language –a English –b French –c Spanish B Genre –a Prose –b Poetry –c Drama C Period –a 16th Century –b 17th Century –c 18th Century –d 19th Century Aa English Literature AaBa English Prose AaBaCa English Prose 16th Century AbBbCd French Poetry 19th Century BcCd Drama 19th Century Slide author: Marti Hearst
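The facet notation on this slide can be decoded mechanically, which is a minimal way to see how free-standing lists combine into a classification. The table below copies the facets from the slide. The function name `decode` and the assumption that a notation is a sequence of two-letter pairs (facet letter, then value letter) are introduced here.

```python
# Facet letter -> (facet name, {value letter -> label}), as on the slide.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation):
    """Turn a notation such as 'AaBaCa' into its combined label."""
    if len(notation) % 2:
        raise ValueError("notation must be pairs of facet and value letters")
    labels = []
    for i in range(0, len(notation), 2):
        facet, value = notation[i], notation[i + 1]
        labels.append(FACETS[facet][1][value])  # KeyError if unknown
    return " ".join(labels)
```

`decode("AaBaCa")` gives `"English Prose 16th Century"` and `decode("AbBbCd")` gives `"French Poetry 19th Century"`. A facet can be left out, so a genre-and-period class exists without naming a language. Note that `decode("Aa")` returns just `"English"`, where the slide labels that class "English Literature".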

SLIDE 69IS FALL 2002 Important Questions How to use both types of classification structures? How to look through them? How to use them in search? Slide author: Marti Hearst

SLIDE 70IS FALL 2002 Next Time Multimedia Information Organization and Retrieval (MED) Readings for next time (in Protected) –“Indexing the Content of Multimedia Documents” (S. W. Smoliar, L. D. Wilcox) –“Computational Media Aesthetics: Finding Meaning Beautiful” (C. Dorai, S. Venkatesh) –“The Holy Grail of Content-Based Media Analysis” (S. Chang)

SLIDE 71IS FALL 2002 Homework (!) Do Readings Receive and integrate feedback on Assignment 2 to iterate your Photo Use Scenario (nothing to turn in on this yet) Assignment 3: Photo Metadata Design –Due by Thursday, September 19