2007.04.04 - SLIDE 1IS 257 – Fall 2007 Thesaurus Construction and Use University of California, Berkeley School of Information IS 245: Organization of.

Presentation transcript:

SLIDE 1 (IS 257 – Fall 2007): Thesaurus Construction and Use
University of California, Berkeley, School of Information
IS 245: Organization of Information In Collections

SLIDE 2: Lecture Overview
Review
–Facetted Classification
  »Traditional vs. Facetted Classification
  »Designing Facetted Classifications
Today
–Thesaurus design
–Steps in Thesaurus development
–Indexing

SLIDE 3: Hierarchical Classification
[Tree diagram: Literature branches into English, French, and Spanish; each language into Prose, Poetry, and Drama; each genre into 16th, 17th, 18th, 19th, ... centuries]
Slide author: Marti Hearst

SLIDE 4: Labeled Categories for Hierarchical Classification
LITERATURE
–100 English Literature
  –110 English Prose
    –English Prose 16th Century
    –English Prose 17th Century
    –English Prose 18th Century
  –English Poetry
    –121 English Poetry 16th Century
    –122 English Poetry 17th Century
  –English Drama
    –130 English Drama 16th Century
    –…
–200 French Literature
Slide author: Marti Hearst

SLIDE 5: Facetted Categories
Mutually exclusive
–Non-overlapping, distinct categories
Relational
–Relations between facets, subfacets, and foci (elements) are not restricted to hierarchical generalization-specialization relations
Composable
–Combined using grammars of order and relation to form compound descriptions

SLIDE 6: Facetted Classification Along With Labeled Categories
A Language
–a English
–b French
–c Spanish
B Genre
–a Prose
–b Poetry
–c Drama
C Period
–a 16th Century
–b 17th Century
–c 18th Century
–d 19th Century

Aa English Literature
AaBa English Prose
AaBaCa English Prose 16th Century
AbBbCd French Poetry 19th Century
BcCd Drama 19th Century
Slide author: Marti Hearst
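The notation on this slide can be expanded mechanically. The following is a minimal sketch, assuming each code is a sequence of (facet letter, focus letter) pairs; the function name `decode` and the rule that a bare language code is labeled "<Language> Literature" are assumptions made for illustration, not part of the slides.

```python
import re

# Facet tables copied from the slide; the decoding rules below are assumptions.
FACETS = {
    "A": {"a": "English", "b": "French", "c": "Spanish"},      # Language
    "B": {"a": "Prose", "b": "Poetry", "c": "Drama"},          # Genre
    "C": {"a": "16th Century", "b": "17th Century",
          "c": "18th Century", "d": "19th Century"},           # Period
}


def decode(notation: str) -> str:
    """Expand a notation such as 'AaBaCa' into its label."""
    pairs = re.findall(r"([A-Z])([a-z])", notation)
    # Reject anything that is not a clean run of facet/focus pairs.
    if "".join(facet + focus for facet, focus in pairs) != notation:
        raise ValueError(f"malformed notation: {notation!r}")
    labels = [FACETS[facet][focus] for facet, focus in pairs]
    # Assumption: a language on its own is labeled '<Language> Literature'.
    if [facet for facet, _ in pairs] == ["A"]:
        labels.append("Literature")
    return " ".join(labels)


print(decode("AaBaCa"))  # English Prose 16th Century
print(decode("AbBbCd"))  # French Poetry 19th Century
```

Because each facet is decoded independently, adding a new focus (say, another language) only requires one new table entry rather than a new branch for every genre and period.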

SLIDE 7: Ranganathan
PMEST Facets
–P(ersonality) WHO: The most important types or names of things for the particular discipline
–M(atter) WHAT: Constituent materials
–E(nergy) HOW: Action or activity terms
–S(pace) WHERE: Where things occur
–T(ime) WHEN: When things occur

SLIDE 8: “Classical” CRG/BC2 Facet Analysis
Entity
Kind
Part
Property
Material
Process
Operation
Patient
Product
By-Product
Agent
Space
Time

SLIDE 9: “Classical” Facet Analysis
What is being done?
–Entity
–Kind
–Product
–By-Product
What are its parts?
–Part
What are its properties?
–Property
–Material
How is this achieved?
–Process
By what means?
–Operation
By whom?
–Agent
–Patient
Where?
–Space
When?
–Time

SLIDE 10: “Classical” Facet Analysis
Nouns
–Entity
–Kind
–Part
–Patient
–Product
–By-Product
–Agent
Adjectives
–Property
–Material
Intransitive Verb
–Process
Transitive Verb
–Operation
Adverb
–Space
–Time

SLIDE 11: Semantic and Syntactic Relationships
Semantic relationships
–Is-A (thing/kind, genus/species)
    Mammals
    –Primates
      »Humans
–Has-Parts
    Human
    –Head
      »Eyes
Syntactic relationships
–Compounds
    Wheat + harvesting = “wheat harvesting”
    Object + operation = operation on object

SLIDE 12: Facetted Classification
Clearly distinguishes between semantic relationships and syntactic relationships
–Semantic relationships
  »Within a facet
  »Containment relations
–Syntactic relationships
  »Across facets
  »Combinatoric relations
Have a “syntax” for syntactic combination of semantic terms

SLIDE 13: Power of Facet Combinations
The syntactic relations of facetted classifications enable a small controlled vocabulary to produce
–Many, many structured descriptions
–Complex, but formally structured descriptions using nested compound descriptions
–Descriptions for things we do not have words for
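To make the point about combinations concrete, here is a small sketch that enumerates descriptions from the Language, Genre, and Period facets used earlier in the lecture. It assumes a very simple "syntax": at most one focus per facet, combined in a fixed facet order. Real facetted schemes allow richer combination rules, so the count below is only a lower bound for illustration.

```python
from itertools import product

# Ten controlled terms in total, grouped into three facets.
facets = {
    "Language": ["English", "French", "Spanish"],
    "Genre": ["Prose", "Poetry", "Drama"],
    "Period": ["16th Century", "17th Century", "18th Century", "19th Century"],
}


def descriptions(facets):
    """Yield every description using at most one focus per facet (never none)."""
    # None stands for "this facet is left unspecified".
    options = [[None] + foci for foci in facets.values()]
    for combo in product(*options):
        chosen = [focus for focus in combo if focus is not None]
        if chosen:
            yield " ".join(chosen)


all_descriptions = list(descriptions(facets))
vocabulary_size = sum(len(foci) for foci in facets.values())
print(vocabulary_size)        # 10
print(len(all_descriptions))  # 79, i.e. (3+1) * (3+1) * (4+1) - 1
```

Ten terms yield 79 distinct descriptions here, whereas an enumerative hierarchy would need a separate labeled class for each one.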

SLIDE 14: Today
More on thesaurus standards and examples

SLIDE 15: Types of Indexing Languages
Uncontrolled keyword indexing
Indexing languages
–Controlled, but not structured
Thesauri
–Controlled and structured
Classification systems
–Controlled, structured, and coded
Facetted classification systems

SLIDE 16: Thesauri
A Thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower and other related terms.
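One way to picture this definition is as a small data structure. The sketch below is an assumption about how such links might be stored, using the conventional abbreviations UF (used for), BT (broader term), NT (narrower term), and RT (related term); the class and method names are hypothetical. The hierarchy reuses the Mammals/Primates/Humans example from the semantic-relationships slide, and the entry term "Homo sapiens" is added purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One preferred term and its links."""
    term: str
    used_for: set = field(default_factory=set)   # UF: non-preferred synonyms
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT


class Thesaurus:
    def __init__(self):
        self.descriptors = {}   # preferred term -> Descriptor
        self.use = {}           # non-preferred term -> preferred term

    def descriptor(self, term):
        """Return the descriptor for a preferred term, creating it if needed."""
        if term not in self.descriptors:
            self.descriptors[term] = Descriptor(term)
        return self.descriptors[term]

    def add_equivalent(self, preferred, entry_term):
        self.descriptor(preferred).used_for.add(entry_term)
        self.use[entry_term] = preferred

    def add_broader(self, narrower, broader):
        # BT and NT are reciprocal, so both sides are updated together.
        self.descriptor(narrower).broader.add(broader)
        self.descriptor(broader).narrower.add(narrower)

    def add_related(self, a, b):
        # RT is symmetric.
        self.descriptor(a).related.add(b)
        self.descriptor(b).related.add(a)


t = Thesaurus()
t.add_broader("Primates", "Mammals")
t.add_broader("Humans", "Primates")
t.add_equivalent("Humans", "Homo sapiens")  # illustrative, not from the slides
print(t.descriptors["Primates"])
```

Updating both directions of each link in one call keeps the reciprocal references consistent, which is the property that cross-reference checking later in the development process is meant to verify.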

SLIDE 17: Thesaurus Standards
National and International Standards for Thesauri
–ANSI/NISO z -- American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
–ANSI/NISO Draft Standard Z x -- American National Standard Guidelines for Indexes in Information Retrieval
–ISO 2788 -- Documentation -- Guidelines for the establishment and development of monolingual thesauri
–ISO 5964 -- Documentation -- Guidelines for the establishment and development of multilingual thesauri

SLIDE 18: Thesaurus Examples
Examples
–Non-Facetted
  »The ERIC Thesaurus of Descriptors
–Semi-Facetted
  »The Medical Subject Headings (MESH) of the National Library of Medicine
–Facetted
  »The Art and Architecture Thesaurus

SLIDE 19: ERIC Thesaurus – Entry

SLIDE 20: ERIC Thesaurus – Alphabetic

SLIDE 21: ERIC Thesaurus – KWIC Index

SLIDE 22: ERIC Thesaurus – Hierarchies

SLIDE 23: ERIC Thesaurus – Groups

SLIDE 24: ERIC Thesaurus – Online

SLIDE 25: MESH – Entry

SLIDE 26: MESH – Alphabetic

SLIDE 27: MESH – Tree Structures

SLIDE 28: MESH – KWOC Index

SLIDE 29: MESH – Online

SLIDE 30: AAT – Facets

SLIDE 31: AAT – Hierarchies (print)

SLIDE 32: AAT – Hierarchies (online)

SLIDE 33: AAT – Entry (online)

SLIDE 34: Lecture Overview
Thesaurus Design and Development
–Controlled Vocabularies for topical description
–Thesaurus Design
–Steps In Thesaurus Development (intro)

SLIDE 35: Why Develop a Thesaurus?
To provide a conceptual structure or “space” for a body of information
–To make it possible to adequately describe the topical content of information resources at an appropriate level of generality or specificity
–To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material)

SLIDE 36: Why Develop a Thesaurus?
To provide vocabulary (or terminological) control
–When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with
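Vocabulary control can be sketched as a lookup from any starting term to a single preferred term. The entry vocabulary below ("cars", "motor cars", "automobiles") is hypothetical and exists only to show the mechanism; the normalization step (case folding and whitespace collapsing) is likewise an assumption about what "regardless of the terms they start with" might involve in practice.

```python
# Hypothetical entry vocabulary: every known term points at one preferred term.
USE = {
    "automobiles": "automobiles",   # the preferred term maps to itself
    "cars": "automobiles",
    "motor cars": "automobiles",
}


def preferred_term(term: str):
    """Return the preferred term for any known starting term, else None."""
    normalized = " ".join(term.casefold().split())
    return USE.get(normalized)


print(preferred_term("Cars"))          # automobiles
print(preferred_term("Motor  cars"))   # automobiles
print(preferred_term("trucks"))        # None
```

A result of None marks a term the thesaurus does not yet lead anywhere, which is exactly the kind of gap that testing and revision are meant to surface.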

SLIDE 37: Preliminary Considerations
What is used now?
–Continue using an existing thesaurus?
–Ad hoc modification of existing thesaurus?
–Develop a new well-structured thesaurus?
What is the scope and complexity of the subject field?
What kind of retrieval objects or data will be dealt with?
How exhaustive and specific is the desired description of objects?

SLIDE 38: Preliminary Considerations
The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus
–It is better to plan for a larger and more comprehensive system than a smaller system that rapidly will become inadequate as the database grows
Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists

SLIDE 39: Development of a Thesaurus
Term Selection.
Merging and Development of Concept Classes.
Definition of Broad Subject Fields and Subfields.
Development of Classificatory structure
Review, Testing, Application, Revision.

SLIDE 40: Term Selection
Select sources for the collection of terms.
–Prearranged Sources
–Open-ended Sources
Assign codes to each source.
Selection of terms
–For part of prearranged and for all open-ended sources
Enter terms into database with all information.

SLIDE 41: Kinds of Sources
Prearranged Sources
–Existing descriptor lists, classification schemes, and thesauri. This includes universal schemes like DDC or LCSH.
–Nomenclatures of single disciplines
–Treatises on the terminology of a field
–Encyclopedias, lexica, dictionaries and glossaries.
–Tables of contents of textbooks and handbooks
–Indexes of journals or abstracting journals
–Indexes of other publications in the field

SLIDE 42: Kinds of Sources
Open-ended sources
–Lists of search requests or interest profiles
–Description of projects/activities to be served by the information retrieval system.
–Discussion with specialists in the field
–Sample of documents in the field
  »Ask users why and how these documents relate to the field.
  »Have documents indexed by experts in the field
–Lists of titles of documents in the field
–Abstracts and reviews of documents
–Your own knowledge

SLIDE 43: Selection of Sources
Prearranged sources require less effort in gathering the material, and may already indicate some relationships between terms and concepts, and among the terms themselves.
Open-ended sources can reflect current terminology and may provide more complete coverage.
Choose a set of sources that are current, as complete as possible, and considered authoritative.

SLIDE 44: Selection of Sources
Each selected source is assigned an ID for tracking its use in the development of the thesaurus.
–Useful when making decisions about which terms to prefer
–Useful for backtracking when questions arise (where did this come from?)
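Source tracking might look like the following in a working database. This is a minimal sketch: the source IDs ("S1", "S2"), their descriptions, and the function names are all hypothetical, chosen only to show how a term can be traced back to where it came from.

```python
from collections import defaultdict

# Hypothetical source register: ID -> description of the source.
SOURCES = {
    "S1": "existing descriptor list (prearranged)",
    "S2": "sample of documents in the field (open-ended)",
}

# term -> set of source IDs in which the term was found
term_sources = defaultdict(set)


def record(term: str, source_id: str) -> None:
    """Record that a candidate term was taken from a registered source."""
    if source_id not in SOURCES:
        raise KeyError(f"unregistered source: {source_id}")
    term_sources[term].add(source_id)


def provenance(term: str) -> list:
    """Answer 'where did this come from?' for a candidate term."""
    return sorted(term_sources.get(term, ()))


record("wheat harvesting", "S1")
record("wheat harvesting", "S2")
record("reaping", "S2")
print(provenance("wheat harvesting"))  # ['S1', 'S2']
print(provenance("reaping"))           # ['S2']
```

A term attested in several sources, especially authoritative ones, is a natural candidate when deciding which of several synonyms to prefer.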

SLIDE 45: Selection of Terms
Terms can be transferred directly from prearranged sources to the recording medium (cards or database)
–Have to decide which terms and references to include, or to take the whole source

SLIDE 46: Selection of Terms
In open-ended sources you read through the source and pick out terms (i.e., words and phrases) that might be useful in retrieval or as references to other terms.
Alternatively, use keyword and phrase extraction software to create lists of terms and select from those.
Transfer selected terms to the recording medium (cards or database).
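As a rough idea of what keyword and phrase extraction software does, the sketch below counts single words and two-word phrases, skipping candidates that begin or end with a stopword. The stopword list, the function name, and the sample texts are assumptions for illustration; production tools use far more sophisticated linguistic processing, and a person still selects from the resulting list.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to",
             "is", "are", "by", "on", "with"}


def candidate_terms(texts, max_words=2):
    """Count candidate words and phrases of up to max_words words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Phrases that start or end with a stopword are rarely useful terms.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts


titles = ["Harvesting of wheat", "Wheat harvesting in the field"]
for term, count in candidate_terms(titles).most_common():
    print(count, term)
```

The output is only a list of candidates ranked by frequency; deciding which of them are useful in retrieval remains an intellectual task.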

SLIDE 47: Merging and Development of Concept Classes
Sort Term DB into alphabetical order.
First Round: Merge information for Identical terms -- possibly pulling info from additional sources.
Second Round: Merge synonyms or terms in the same concept class.
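The two merging rounds can be sketched as follows, assuming each record in the term database is a (term, source ID) pair. The sample records and the synonym judgment that "reaping" belongs with "harvesting" are hypothetical; in practice the second round relies on editorial decisions rather than a ready-made table.

```python
from itertools import groupby

# Hypothetical working records: (term as found, source ID).
records = [("Wheat", "S2"), ("harvesting", "S1"),
           ("wheat", "S1"), ("Reaping", "S2")]


def merge_identical(records):
    """First round: sort alphabetically and merge identical terms."""
    def key(record):
        return record[0].casefold()

    merged = {}
    for term, group in groupby(sorted(records, key=key), key=key):
        merged[term] = sorted({source for _, source in group})
    return merged


def merge_concept_classes(merged, same_concept):
    """Second round: merge terms judged to belong to the same concept class."""
    classes = {}
    for term, sources in merged.items():
        concept = same_concept.get(term, term)
        entry = classes.setdefault(concept, {"terms": set(), "sources": set()})
        entry["terms"].add(term)
        entry["sources"].update(sources)
    return classes


merged = merge_identical(records)
print(merged)
# {'harvesting': ['S1'], 'reaping': ['S2'], 'wheat': ['S1', 'S2']}

# Editorial judgment (assumed here): 'reaping' joins the 'harvesting' class.
classes = merge_concept_classes(merged, {"reaping": "harvesting"})
print(sorted(classes["harvesting"]["terms"]))  # ['harvesting', 'reaping']
```

Sorting first matters because `groupby` only merges adjacent records, which mirrors the slide's instruction to put the term database into alphabetical order before merging.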

SLIDE 48: Definition of Broad Subject Fields and Subfields
Define Broad Subject fields and sort terms into these broad fields
Define subfields within each broad field and sort terms into these subfields.
Work out the detailed structure
–Select Preferred Terms
–Merge information for terms in the same concept class
Repeat these steps
–for each subfield within a broad field
–and for each broad field
–Until all terms have been consolidated and preferred terms selected

SLIDE 49: Development of Classificatory Structure
Produce preliminary version of classified index and update the working database.
Improve classificatory structure
Reality check: produce and distribute a version of the classified index. Distribute to users/experts.

SLIDE 50: Final Stages
Review
Testing
Application
Revision

SLIDE 51: Review
Discuss classified index with users/experts.
–Select descriptors and checklist descriptors.
Assign Notational Symbols
Produce Main Thesaurus & Indexes

SLIDE 52: Review (cont.)
Check cross references and insert where needed
Produce Test Version
Test by Indexing
Modify as needed
Produce Production Version.

SLIDE 53: Testing a Thesaurus
Assign descriptors to a sample set of NEW documents (use enough to get an idea of any gaps in the thesaurus).
Test retrieval using sample questions and see how effectively the thesaurus maps them to the appropriate descriptors.
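A simple way to look for gaps is to check which concepts from the sample documents the thesaurus cannot map to a descriptor. The sketch below assumes the entry vocabulary is a plain dictionary from lower-cased terms to descriptors; the function names and sample data are hypothetical, and a real test would also involve human judgment about whether the mapped descriptors are actually appropriate.

```python
def find_gaps(sample_concepts, entry_vocabulary):
    """Split sample concepts into (mapped, gaps) using an entry-term lookup."""
    mapped, gaps = {}, []
    for concept in sample_concepts:
        key = concept.casefold()
        if key in entry_vocabulary:
            mapped[concept] = entry_vocabulary[key]
        else:
            gaps.append(concept)
    return mapped, gaps


def coverage(sample_concepts, entry_vocabulary):
    """Fraction of sample concepts the thesaurus could map to a descriptor."""
    if not sample_concepts:
        return 0.0
    _, gaps = find_gaps(sample_concepts, entry_vocabulary)
    return (len(sample_concepts) - len(gaps)) / len(sample_concepts)


# Hypothetical entry vocabulary and concepts taken from new documents.
vocabulary = {"wheat": "wheat", "reaping": "harvesting"}
sample = ["Wheat", "Reaping", "Threshing"]

mapped, gaps = find_gaps(sample, vocabulary)
print(gaps)                                    # ['Threshing']
print(round(coverage(sample, vocabulary), 2))  # 0.67
```

The list of gaps is the more useful output: each unmapped concept is either a missing descriptor or a missing cross reference.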

SLIDE 54: Flow of Work in Thesaurus Construction
[Flowchart, based on Soergel, pp. Decision boxes have Yes/No branches. Box labels:]
–Select Sources
–Assign codes
–Select Terms
–Record Selected Terms
–Sort Terms
–Merge identical Terms
–Define Broad Subject Fields
–Merge Terms in Same Concept class
–Sort Terms into Broad Subject Fields
–Define Subfields within one Subject Field
–Work out detailed structure of the Subject Field
–Select Preferred Terms
–All Subfields of Broad Subject finished? (decision)
–All Broad Subjects finished? (decision)
–Improve Class Structure
–Print Classified Index and review
–Discuss with Experts and Users
–Select descriptors and checklist items
–Produce Full Thesaurus and Check references
–Assign Notation
–Review and Test
–Many Modifications? (decision)
–Revise as needed

SLIDE 55: The Indexing Process
Concept identification
term selection (via thesaurus)
term assignment

SLIDE 56: Application: The Indexing Process (Manual)
[Flowchart, adapted from ISO 5963, p.5. Decision boxes have YES/NO branches. Box labels:]
–Start
–Examine Document and Identify Significant Concepts
–Consider First Concept
–Establish Term Denoting Concept
–Does Thesaurus contain term for Concept (decision)
–Preferred Term? (decision)
–Select Preferred Term
–Consider Preferred Term
–Is Term suitable (decision)
–Consider any associated terms in Thesaurus (NT, BT)
–Would Concept be better represented by one of these terms (decision)
–Prefer Alternative Term(s)
–Select Alternative term to represent Concept
–Can Concept be expressed combining terms? (decision)
–Consider Each of These Terms
–Admit New Term Into Thesaurus
–Assign Terms to Document
–Is There Another Concept (decision)
–End
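The core decisions of the manual indexing flow can be approximated in a few lines. This is a heavily simplified sketch, not the ISO 5963 procedure itself: it leaves out the suitability check and the review of broader and narrower terms, and it assumes that "combining terms" means every word of the concept is already a known term. The function name and sample data are hypothetical.

```python
def index_document(concepts, preferred, use):
    """Assign thesaurus terms to a document's concepts.

    preferred: set of preferred terms (new terms are added to it in place)
    use:       dict mapping non-preferred terms to preferred terms
    Returns (assigned_terms, newly_admitted_terms).
    """
    assigned, admitted = [], []
    for concept in concepts:
        term = concept.casefold()
        if term in preferred:
            # The thesaurus contains the term and it is preferred.
            assigned.append(term)
        elif term in use:
            # The thesaurus contains the term but points to a preferred one.
            assigned.append(use[term])
        else:
            parts = term.split()
            known = all(p in preferred or p in use for p in parts)
            if len(parts) > 1 and known:
                # Assumed rule: express the concept by combining known terms.
                assigned.extend(use.get(p, p) for p in parts)
            else:
                # No way to express the concept: admit a new term.
                preferred.add(term)
                admitted.append(term)
                assigned.append(term)
    return assigned, admitted


preferred = {"wheat", "harvesting"}
use = {"reaping": "harvesting"}
assigned, admitted = index_document(["Wheat reaping", "Threshing"], preferred, use)
print(assigned)   # ['wheat', 'harvesting', 'threshing']
print(admitted)   # ['threshing']
```

Newly admitted terms are returned separately so that they can be fed into the next round of thesaurus revision instead of silently accumulating.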

SLIDE 57: Thesaurus Revision and Updates
There will always be new concepts, products, or expressions that need to be added to the thesaurus.
–Set a regular schedule of reviews and revisions.
–Collect complaints, problems, etc. and fold into revision of the thesaurus

SLIDE 58: References
Soergel, D. Indexing Languages and Thesauri: Construction and Maintenance. Los Angeles: Melville Publishing Co., 1974.
Foskett, A.C. The Subject Approach to Information. London: Clive Bingley,
Standards:
–ANSI/NISO z -- American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
–ANSI/NISO Draft Standard Z x -- American National Standard Guidelines for Indexes in Information Retrieval
–ISO 2788 -- Documentation -- Guidelines for the establishment and development of monolingual thesauri
–ISO 5964 -- Documentation -- Guidelines for the establishment and development of multilingual thesauri