8/28/97 Information Organization and Retrieval: Controlled Subject Vocabularies and Thesauri. University of California, Berkeley, School of Information Management and Systems.


Controlled Subject Vocabularies and Thesauri
University of California, Berkeley, School of Information Management and Systems
SIMS 202: Information Organization and Retrieval

Review
Controlled vocabularies
Choice of names
Form of names
Name authority files

Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.

Name Authority Files
ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
Other Versions: earlier
040 DLC $c DLC $d DLC $d OCoLC
053 PR6005.R
Creasey, John
Other names: Cooke, M. E.; Cooke, Margaret; Cooper, Henry St. John; Credo; Fecamps, Elise; Gill, Patrick; Hope, Brian; Hughes, Colin; Marsden, James; Matheson, Rodney; Ranger, Ken; St. John, Henry; Wilde, Jimmy
$w nnnc $a Ashe, Gordon
Different names for the same person

Name Authority Files
ID:NAFO ST:p EL:n STH:a MS:n UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
040 OCoLC $c OCoLC
Marric, J. J.
$w nnnc $a Creasey, John
663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under $b Creasey, John
670 OCLC: His Gideon's day, 1955 $b (hdg.: Creasey, John; usage: J.J. Marric)
670 LC data base, 6/10/91 $b (hdg.: Creasey, John; usage: J.J. Marric)
670 Pseuds. and nicknames dict., c1987 $b (Creasey, John; British author; pseud.: Marric, J. J.)

Name Authority Files
ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
Other Versions: earlier
040 DLC $c DLC $d DLC $d OCoLC
Butler, William Vivian
Butler, W. V. $q (William Vivian)
Marric, J. J.
His The durable desperadoes
His The young detective's handbook, c1981: $b t.p. (W.V. Butler)
670 His Gideon's way, 1986: $b CIP t.p. (William Vivian Butler writing as J.J. Marric)
Different people writing with the same name
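The two cases in these records, one person with many names and one name shared by different people, can be modeled as a lookup from name forms to the persons' authorized headings. The sketch below is a simplified, hypothetical structure (`AUTHORITY_FILE`, `build_name_index` and `resolve` are illustrative names, not part of any cataloging system). It uses a few names from the records above and, unlike the actual records, simply attaches "Marric, J. J." to both persons instead of establishing it as a separate heading with see-also references.

```python
from collections import defaultdict

# Hypothetical, simplified authority data drawn from the records above:
# authorized heading for each person -> other name forms linked to it.
AUTHORITY_FILE = {
    "Creasey, John": ["Ashe, Gordon", "Cooke, Margaret", "Marric, J. J."],
    "Butler, William Vivian": ["Butler, W. V.", "Marric, J. J."],
}


def build_name_index(authority_file):
    """Invert the file: map every name form to the headings it may denote."""
    index = defaultdict(set)
    for heading, variants in authority_file.items():
        index[heading].add(heading)  # the heading itself is a valid search form
        for name in variants:
            index[name].add(heading)
    return index


def resolve(name, index):
    """Return the sorted list of headings that a searched name may refer to."""
    return sorted(index.get(name, set()))


index = build_name_index(AUTHORITY_FILE)
print(resolve("Ashe, Gordon", index))   # ['Creasey, John']
print(resolve("Marric, J. J.", index))  # ['Butler, William Vivian', 'Creasey, John']
```

The first lookup gathers one person's many names under a single heading; the second shows a single name leading to two different people, which the searcher must then disambiguate.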

Categorization Summary
Processes of categorization underlie many of the issues having to do with information organization.
Categorization is messier than our computer systems would like.
Human categories have graded membership, consisting of family resemblances.
Family resemblance is expressed in part by which subset of features is shared.
It is also determined by underlying understandings of the world that do not get represented in most systems.

Today
Origins and uses of controlled vocabularies for information retrieval
Types of indexing languages, thesauri, and classification systems
Process of design and development of thesauri

Origins
Very early history of content representation
– Sumerian tokens and “envelopes”
– Alexandria: pinakes
– Indices

Origins
Biblical indexes and concordances
Journal indexes
“Information explosion” following WWII
– Cranfield studies of indexing languages and information retrieval
– Development of bibliographic databases
– Index Medicus: production and MEDLARS searching

Origins
Communication theory revisited
Problems with transmission of meaning
[Diagram: Source → Encoding → Channel (with Noise) → Decoding → Destination, carrying a Message]
[Diagram: Source → Encoding (writing/indexing) → Storage → Decoding (retrieval/reading) → Destination, carrying a Message]

What Is a “Controlled Vocabulary”?
“The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W. H. Auden)
Similarly, there are too many ways of expressing or explaining the topic of a document.
A controlled vocabulary consists of a set of rules for topic identification and indexing, plus a thesaurus, which in turn consists of a “lead-in vocabulary” and a limited, selective “indexing language”, sometimes with special coding or structures.
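A lead-in vocabulary can be pictured as a lookup table from the many ways of expressing a topic to the limited set of terms of the indexing language. A minimal Python sketch, assuming a hypothetical table `LEAD_IN` and helper `to_descriptors` (neither comes from the slides):

```python
# Hypothetical lead-in vocabulary: entry terms (preferred and non-preferred)
# mapped to the descriptor of the indexing language that should be used.
LEAD_IN = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "motor cars": "Automobiles",
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
}


def to_descriptors(phrases, lead_in):
    """Translate free-text phrases into descriptors.

    Returns (descriptors, unmatched); unmatched phrases are reported rather
    than silently dropped, since they may signal missing lead-in terms.
    """
    descriptors, unmatched = [], []
    for phrase in phrases:
        key = " ".join(phrase.lower().split())  # normalize case and spacing
        if key in lead_in:
            if lead_in[key] not in descriptors:  # keep each descriptor once
                descriptors.append(lead_in[key])
        else:
            unmatched.append(phrase)
    return descriptors, unmatched


print(to_descriptors(["Cars", "motor  cars", "Heart attack", "traffic"], LEAD_IN))
# (['Automobiles', 'Myocardial infarction'], ['traffic'])
```

Both phrasings for cars collapse to one descriptor, which is the point of vocabulary control; "traffic" has no entry and is returned for review.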

Structure of an IR System
[Diagram: Information Storage and Retrieval System, adapted from Soergel, p. 19]
Search line: Interest profiles & queries → Formulating query in terms of descriptors → Storage of profiles → Store 1: Profiles/search requests
Storage line: Documents & data → Indexing (descriptive and subject) → Storage of documents → Store 2: Document representations
Comparison/matching of Store 1 against Store 2 → Potentially relevant documents
Both lines follow the “rules of the game” = rules for subject indexing + thesaurus (which consists of lead-in vocabulary and indexing language)
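In this model both lines meet at the comparison/matching step: queries and documents are each expressed in descriptors, so matching reduces to comparing descriptor sets. The slide does not specify a matching rule; the sketch below uses one simple rule, coordination-level matching (ranking documents by how many query descriptors they share), and the store contents and the `match` function are hypothetical.

```python
# Hypothetical Store 2: document representations as sets of descriptors.
DOCUMENT_STORE = {
    "doc1": {"Thesauri", "Indexing", "Information retrieval"},
    "doc2": {"Classification", "Information retrieval"},
    "doc3": {"Cataloging", "Authority files"},
}


def match(query_descriptors, store):
    """Coordination-level matching: rank documents by the number of query
    descriptors they share; documents sharing none are not returned."""
    query = set(query_descriptors)
    scored = [(len(query & descriptors), doc_id)
              for doc_id, descriptors in store.items()
              if query & descriptors]
    # Highest overlap first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


print(match({"Thesauri", "Information retrieval"}, DOCUMENT_STORE))
# [(2, 'doc1'), (1, 'doc2')]
```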

Uses of Controlled Vocabularies
Library subject headings, classification, and authority files
Commercial journal indexing services and databases
Yahoo and other Web classification schemes
Online and manual systems within organizations
– SunSolve
– MacArthur

Types of Indexing Languages
Uncontrolled keyword indexing
Indexing languages
– Controlled, but not structured
Thesauri
– Controlled and structured
Classification systems
– Controlled, structured, and coded
Faceted classification systems

Indexing Languages
An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.
An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.

Indexing Languages
Library of Congress Subject Headings
Yellow Pages topics
Wilson indexes (“Reader’s Guide”)

Thesauri
A thesaurus is a collection of selected vocabulary (preferred terms, or descriptors) with links among synonymous, equivalent, broader, narrower, and other related terms.
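The links named in this definition map naturally onto one record per descriptor. A minimal Python sketch, assuming the conventional abbreviations UF (used for), BT (broader term), NT (narrower term) and RT (related term); the class, the function names and the sample terms are hypothetical. It records hierarchical links in both directions and follows NT links to expand a search term to everything beneath it.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One preferred term and its links to other terms."""
    term: str
    uf: set = field(default_factory=set)  # "used for": synonyms/equivalents
    bt: set = field(default_factory=set)  # broader terms
    nt: set = field(default_factory=set)  # narrower terms
    rt: set = field(default_factory=set)  # other related terms


def add_hierarchy(thesaurus, broader, narrower):
    """Record a BT/NT pair in both directions so the links stay reciprocal."""
    thesaurus[broader].nt.add(narrower)
    thesaurus[narrower].bt.add(broader)


def narrower_closure(thesaurus, term):
    """All terms below `term` in the hierarchy (useful for query expansion)."""
    found, stack = set(), [term]
    while stack:
        for nt in thesaurus[stack.pop()].nt:
            if nt not in found:
                found.add(nt)
                stack.append(nt)
    return found


thesaurus = {t: Descriptor(t) for t in ["Vehicles", "Automobiles", "Trucks", "Sports cars"]}
thesaurus["Automobiles"].uf.add("Cars")
add_hierarchy(thesaurus, "Vehicles", "Automobiles")
add_hierarchy(thesaurus, "Vehicles", "Trucks")
add_hierarchy(thesaurus, "Automobiles", "Sports cars")
print(sorted(narrower_closure(thesaurus, "Vehicles")))
# ['Automobiles', 'Sports cars', 'Trucks']
```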

Thesauri (cont.)
National and international standards for thesauri
– ANSI/NISO z American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
– ANSI/NISO Draft Standard Z x -- American National Standard Guidelines for Indexes in Information Retrieval
– ISO Documentation -- Guidelines for the establishment and development of monolingual thesauri
– ISO Documentation -- Guidelines for the establishment and development of multilingual thesauri

Thesauri (cont.)
Examples:
– The ERIC Thesaurus of Descriptors
– The Art and Architecture Thesaurus
– The Medical Subject Headings (MeSH) of the National Library of Medicine

Classification Systems
A classification system is an indexing language often based on a broad ordering of topical areas.
Thesauri and classification systems both use this broad ordering and maintain a structure of broader, narrower, and related topics.
Classification schemes commonly use a coded notation for representing a topic and its place in relation to other terms.
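Because a coded notation encodes a topic's place in the structure, broader classes can often be read directly from the code. A minimal sketch with a hypothetical notation in which each added digit narrows the class (real schemes have their own, more complex notational rules), showing how the notation yields both the chain of broader classes and the classified order:

```python
# Hypothetical schedule: each extra digit narrows the class, so the
# broader classes of a code are its prefixes.
SCHEDULE = {
    "5": "Science",
    "51": "Mathematics",
    "516": "Geometry",
    "54": "Chemistry",
}


def broader_classes(code, schedule):
    """Return (code, caption) pairs from the broadest class down to `code`."""
    return [(code[:i], schedule[code[:i]])
            for i in range(1, len(code) + 1)
            if code[:i] in schedule]


def classified_listing(schedule):
    """Sorting the codes as strings puts each class directly before its subclasses."""
    return ["  " * (len(code) - 1) + f"{code} {schedule[code]}"
            for code in sorted(schedule)]


print(broader_classes("516", SCHEDULE))
# [('5', 'Science'), ('51', 'Mathematics'), ('516', 'Geometry')]
print("\n".join(classified_listing(SCHEDULE)))
```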

Classification Systems (cont.)
Examples:
– The Library of Congress Classification System
– The Dewey Decimal Classification System
– The ACM Computing Reviews Categories
– The American Mathematical Society Classification System

Automatic Indexing and Classification
Automatic indexing is typically the simple derivation of keywords from a document, providing access to all of those words.
More complex automatic indexing systems attempt to select controlled vocabulary terms based on terms in the document.
Automatic classification attempts to group similar documents automatically, using either:
– a fully automatic clustering method, or
– an established classification scheme and a set of documents already indexed by that scheme.
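The two levels of automatic indexing described here can be contrasted in a few lines: the simple approach keeps every (non-stopword) word, while the more complex one maps document words to controlled vocabulary terms. A minimal sketch; the stopword list, the word-to-term table `TERM_MAP` and the `min_count` cutoff are hypothetical, and real systems use far richer matching than exact word lookup.

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on", "with"}

# Hypothetical mapping from document words to controlled vocabulary terms.
TERM_MAP = {
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
    "indexing": "Indexing",
    "retrieval": "Information retrieval",
}


def keywords(text):
    """Simple automatic indexing: every non-stopword token, with its frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def controlled_terms(text, term_map, min_count=1):
    """Select controlled terms whose mapped words occur at least min_count times."""
    counts = Counter()
    for word, n in keywords(text).items():
        if word in term_map:
            counts[term_map[word]] += n
    return sorted(t for t, n in counts.items() if n >= min_count)


doc = "Thesauri support indexing; a thesaurus links terms for retrieval."
print(sorted(keywords(doc)))
print(controlled_terms(doc, TERM_MAP))
# ['Indexing', 'Information retrieval', 'Thesauri']
```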

Clustering
Agglomerative methods
[Diagram: individual documents (Doc) successively merged into clusters]
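Agglomerative methods build clusters bottom-up: every document starts as its own cluster and the most similar clusters are merged step by step. A minimal sketch using single-link merging and Jaccard similarity between documents' term sets; both choices, the stopping threshold, and the sample data are assumptions, since the slide names only the family of methods.

```python
def jaccard(a, b):
    """Similarity of two term sets: |shared| / |combined|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def single_link_clusters(docs, threshold):
    """Agglomerative clustering with single linkage.

    Start with each document in its own cluster and repeatedly merge the two
    most similar clusters (similarity = best pair of members) until no pair
    of clusters is at least `threshold` similar.
    """
    clusters = [[doc_id] for doc_id in docs]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(jaccard(docs[a], docs[b])
                          for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


docs = {
    "d1": {"thesaurus", "descriptor", "indexing"},
    "d2": {"thesaurus", "descriptor", "hierarchy"},
    "d3": {"clustering", "similarity", "documents"},
    "d4": {"clustering", "similarity", "merge"},
}
print(single_link_clusters(docs, threshold=0.3))
# [['d1', 'd2'], ['d3', 'd4']]
```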

Automatic Class Assignment
[Diagram: Doc → Search Engine]
1. Search using document contents.
2. Obtain ranked list.
3. Assign document to N categories ranked over threshold.
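The three steps can be sketched by treating each category as a searchable profile of terms. The slide does not say what the search engine indexes or how it scores; the category profiles, the cosine similarity on raw term counts, and the parameters `n` and `threshold` below are assumptions for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_categories(doc_terms, profiles, n, threshold):
    """Sketch of automatic class assignment:
    1. search the category profiles using the document's contents,
    2. obtain a ranked list of categories,
    3. assign the document to at most n categories scoring above threshold."""
    doc = Counter(doc_terms)
    ranked = sorted(((cosine(doc, Counter(terms)), name)
                     for name, terms in profiles.items()), reverse=True)
    return [(name, round(score, 3))
            for score, name in ranked[:n] if score > threshold]


# Hypothetical category profiles (e.g. terms drawn from already-classified documents).
profiles = {
    "Thesauri": ["thesaurus", "descriptor", "broader", "narrower"],
    "Cataloging": ["catalog", "record", "heading", "authority"],
    "Clustering": ["cluster", "similarity", "merge"],
}
doc_terms = ["thesaurus", "descriptor", "heading", "descriptor"]
print(assign_categories(doc_terms, profiles, n=2, threshold=0.1))
# [('Thesauri', 0.612), ('Cataloging', 0.204)]
```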

Development of a Thesaurus
1. Term selection
2. Merging and development of concept classes
3. Definition of broad subject fields and subfields
4. Development of classificatory structure
5. Review, testing, application, revision

1. Preliminary Term Selection
Select sources for the collection of terms.
– Prearranged sources
– Open-ended sources
Assign codes to each source.
Selection of terms
– For part of the prearranged and for all open-ended sources
Enter terms into the database with all information.

2. Merging and Development of Concept Classes
Sort the term database into alphabetical order.
First round: merge information for identical terms, possibly pulling information from additional sources.
Second round: merge synonyms or terms in the same concept class.
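The two merge rounds can be sketched over a small term database. Assumptions, not from the slides: terms count as identical when they match case-insensitively, each record carries the source codes assigned during term selection, and the synonym decisions for the second round (the hypothetical `CONCEPT_CLASS` table) have already been made by the thesaurus builder.

```python
from collections import defaultdict

# Hypothetical term database from term selection: (term, source code) pairs.
raw_terms = [
    ("Thesauri", "P1"),
    ("thesauri", "O1"),
    ("Subject headings", "P2"),
    ("Thesaurus", "O2"),
    ("Vocabulary control", "O1"),
]

# Hypothetical synonym decision: "thesaurus" belongs to the class of "thesauri".
CONCEPT_CLASS = {"thesaurus": "thesauri"}


def merge_terms(raw, concept_class):
    # Sort into alphabetical order (case-insensitive), as in the slide.
    ordered = sorted(raw, key=lambda pair: pair[0].lower())

    # First round: merge information for identical terms (pool their sources).
    identical = defaultdict(set)
    for term, source in ordered:
        identical[term.lower()].add(source)

    # Second round: merge synonyms into one concept class.
    classes = defaultdict(lambda: (set(), set()))  # key -> (terms, sources)
    for term, sources in identical.items():
        terms, pooled = classes[concept_class.get(term, term)]
        terms.add(term)
        pooled.update(sources)
    return {key: (sorted(terms), sorted(pooled))
            for key, (terms, pooled) in sorted(classes.items())}


for key, value in merge_terms(raw_terms, CONCEPT_CLASS).items():
    print(key, value)
```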

3. Definition of Broad Subject Fields and Subfields
Define broad subject fields and sort terms into these broad fields.
Define subfields within each broad field and sort terms into these subfields.
Work out the detailed structure
– Select preferred terms
– Merge information for terms in the same concept class
Repeat these steps
– for each subfield within a broad field
– and for each broad field
– until all terms have been consolidated and preferred terms selected
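A minimal sketch of the grouping and selection this step describes, assuming each concept class has already been tagged with a broad field and subfield and that the preferred term is the candidate found most often in the sources. That selection rule is an assumption (in practice it is an editorial decision), and all names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical concept classes, each tagged with a broad field and subfield,
# with a count of how often each candidate term was found in the sources.
concept_classes = [
    {"field": "Information science", "subfield": "Indexing",
     "candidates": {"thesauri": 12, "thesaurus": 5}},
    {"field": "Information science", "subfield": "Indexing",
     "candidates": {"subject headings": 7}},
    {"field": "Information science", "subfield": "Retrieval",
     "candidates": {"information retrieval": 9, "IR": 4}},
]


def structure(classes):
    """Sort concept classes into broad fields and subfields, selecting one
    preferred term per class; the other candidates become lead-in terms."""
    tree = defaultdict(lambda: defaultdict(list))
    for cc in classes:
        counts = cc["candidates"]
        # Most frequently found candidate wins; ties go to the alphabetically first.
        preferred = min(counts, key=lambda t: (-counts[t], t))
        non_preferred = sorted(t for t in counts if t != preferred)
        tree[cc["field"]][cc["subfield"]].append((preferred, non_preferred))
    return {f: {s: sorted(entries) for s, entries in subs.items()}
            for f, subs in tree.items()}


print(structure(concept_classes))
```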

4. Development of Classificatory Structure
Produce a preliminary version of the classified index and update the working database.
Improve the classificatory structure.
Reality check: produce a version of the classified index and distribute it to users/experts.

5. Final Stages
Review
Testing
Application
Revision

Review
Discuss the classified index with users/experts.
– Select descriptors and checklist descriptors.
Assign notational symbols.
Produce the main thesaurus and indexes.

Review (cont.)
Check cross-references and insert them where needed.
Produce a test version.
Test by indexing.
Modify as needed.
Produce the production version.
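Checking cross-references largely means making sure every link has its reciprocal: a BT on one term needs an NT on the other, RT runs both ways, and a UF needs a matching USE. A minimal sketch over a hypothetical draft thesaurus stored as nested dictionaries (the layout and function name are assumptions); it inserts missing reciprocals and reports them so an editor can review each one.

```python
# Each relation and the relation that must appear on the other term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def insert_missing_reciprocals(thesaurus):
    """Add every missing reciprocal cross-reference in place.

    thesaurus: term -> {relation: set of terms}
    Returns a list of (term, relation, target) links that were inserted.
    """
    added = []
    for term, relations in list(thesaurus.items()):
        for relation, targets in list(relations.items()):
            back = RECIPROCAL[relation]
            for target in sorted(targets):
                entry = thesaurus.setdefault(target, {})
                if term not in entry.setdefault(back, set()):
                    entry[back].add(term)
                    added.append((target, back, term))
    return added


draft = {
    "Vehicles": {"NT": {"Automobiles"}},
    "Automobiles": {"UF": {"Cars"}},
    "Trucks": {"BT": {"Vehicles"}},
}
print(insert_missing_reciprocals(draft))
# [('Automobiles', 'BT', 'Vehicles'), ('Cars', 'USE', 'Automobiles'), ('Vehicles', 'NT', 'Trucks')]
print(insert_missing_reciprocals(draft))  # a second pass finds nothing: []
```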

Testing a Thesaurus
Assign descriptors to a sample set of NEW documents (use enough of them to reveal any gaps in the thesaurus).
Test retrieval using sample questions, and see how effectively the thesaurus maps them to the appropriate descriptors.
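The second test can be made measurable by comparing, for each sample question, the descriptors the thesaurus maps it to with the descriptors an expert expected. A minimal sketch; the expert judgments, the lead-in table and the use of recall and precision of the mapping are assumptions for illustration. Unmapped phrases are collected as candidate gaps.

```python
def evaluate_mapping(samples, lead_in):
    """For each sample question, compare the descriptors the thesaurus maps
    its phrases to against the descriptors an expert expected.

    Returns per-question (recall, precision) plus all unmapped phrases,
    which point at possible gaps in the thesaurus."""
    results, gaps = [], []
    for phrases, expected in samples:
        got = set()
        for phrase in phrases:
            descriptor = lead_in.get(phrase.lower())
            if descriptor is None:
                gaps.append(phrase)
            else:
                got.add(descriptor)
        hits = len(got & expected)
        recall = hits / len(expected) if expected else 1.0
        precision = hits / len(got) if got else 0.0
        results.append((recall, precision))
    return results, gaps


# Hypothetical lead-in vocabulary and expert-judged sample questions.
lead_in = {"cars": "Automobiles", "automobiles": "Automobiles",
           "trucks": "Trucks", "lorries": "Trucks"}
samples = [
    (["cars", "lorries"], {"Automobiles", "Trucks"}),
    (["electric scooters", "cars"], {"Scooters", "Automobiles"}),
]
print(evaluate_mapping(samples, lead_in))
# ([(1.0, 1.0), (0.5, 1.0)], ['electric scooters'])
```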

The Indexing Process
Concept identification → term selection (via thesaurus) → term assignment

Application: The Indexing Process (Manual)
[Flowchart, adapted from ISO 5963, p. 5]
Start: examine the document and identify significant concepts; consider the first concept.
Establish a term denoting the concept.
Does the thesaurus contain a term for the concept?
– If yes: is it a preferred term? If not, consider the preferred term and select it. Consider any associated terms in the thesaurus (NT, BT): would the concept be better represented by one of these terms? If so, prefer the alternative term(s).
– If no: can the concept be expressed by combining terms? If so, consider each of these terms. If not, is the term suitable? If yes, admit the new term into the thesaurus; if no, select an alternative term to represent the concept.
Assign the terms to the document.
Is there another concept? If yes, repeat for the next concept; if no, end.
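A much simplified sketch of this flow, under stated assumptions: concepts arrive as phrases; the thesaurus is reduced to a hypothetical entry vocabulary mapping each entry term to its preferred term; a phrase missing from the thesaurus counts as "expressible by combining terms" only if each of its words is an entry term; and the indexer's judgment of whether a new term is suitable is passed in as a function. The step that weighs associated NT/BT terms is left out.

```python
def index_document(concepts, entry_vocabulary, is_suitable):
    """Simplified manual-indexing flow (sketch).

    entry_vocabulary: entry term (lower case) -> preferred term; preferred
        terms map to themselves. Admitted new terms are added in place.
    is_suitable: indexer's judgment on whether a new term may be admitted.
    Returns (assigned descriptors, admitted new terms, unresolved concepts).
    """
    assigned, admitted, unresolved = [], [], []
    for concept in concepts:                       # consider each concept in turn
        term = concept.lower()                     # establish a term for it
        if term in entry_vocabulary:               # thesaurus contains the term:
            candidates = [entry_vocabulary[term]]  # use its preferred term
        else:
            parts = [entry_vocabulary.get(w) for w in term.split()]
            if len(parts) > 1 and all(parts):      # expressible by combining terms
                candidates = parts
            elif is_suitable(concept):             # admit new term into thesaurus
                entry_vocabulary[term] = concept
                admitted.append(concept)
                candidates = [concept]
            else:                                  # needs an alternative term
                unresolved.append(concept)
                candidates = []
        for descriptor in candidates:              # assign terms to the document
            if descriptor not in assigned:
                assigned.append(descriptor)
    return assigned, admitted, unresolved


vocabulary = {"cars": "Automobiles", "automobiles": "Automobiles",
              "safety": "Safety", "design": "Design"}
print(index_document(["Cars", "Safety design", "Crash tests", "Misc"],
                     vocabulary, is_suitable=lambda c: c != "Misc"))
# (['Automobiles', 'Safety', 'Design', 'Crash tests'], ['Crash tests'], ['Misc'])
```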

Thesaurus Revision and Updates
There will always be new concepts, products, or expressions that need to be added to the thesaurus.
– Set a regular schedule of reviews and revisions.
– Collect complaints, problems, etc., and fold them into revisions of the thesaurus.

References
Soergel, D. Indexing Languages and Thesauri: Construction and Maintenance. Los Angeles: Melville Publishing Co., 1974.
Foskett, A. C. The Subject Approach to Information. London: Clive Bingley.
Standards:
– ANSI/NISO z American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
– ANSI/NISO Draft Standard Z x -- American National Standard Guidelines for Indexes in Information Retrieval
– ISO Documentation -- Guidelines for the establishment and development of monolingual thesauri
– ISO Documentation -- Guidelines for the establishment and development of multilingual thesauri