Facetted Classification and Thesauri Introduction


Facetted Classification and Thesauri Introduction
University of California, Berkeley School of Information
IS 245: Organization of Information In Collections
IS 257 – Fall 2009

Lecture Overview
- Facetted Classification
- Traditional vs. Facetted Classification
- Designing Facetted Classifications
- Thesaurus Design (intro)

Agenda
- Facetted Classification
- Traditional vs. Facetted Classification
- Designing Facetted Classifications
- Thesaurus Design

Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information. That is, it is an attempt to provide a consistent set of descriptions for use in (or as) metadata.

Hierarchical Classification
- Each category is successively broken down into smaller and smaller subdivisions
- No item occurs in more than one subdivision
- Each level is divided out by a “character of division” (also known as a feature)
- Example: distinguish “Literature” based on:
  - Language
  - Genre
  - Time Period
Slide author: Marti Hearst

Hierarchical Classification
[Tree diagram rooted at Literature]
- Level 1 (language): English, French, Spanish
- Level 2 (genre): Prose, Poetry, Drama
- Level 3 (period): 16th, 17th, 18th, 19th, ...
Slide author: Marti Hearst

Labeled Categories for Hierarchical Classification
LITERATURE
- 100 English Literature
  - 110 English Prose
    - 111 English Prose 16th Century
    - 112 English Prose 17th Century
    - 113 English Prose 18th Century
    - ...
  - 120 English Poetry
    - 121 English Poetry 16th Century
    - 122 English Poetry 17th Century
  - 130 English Drama
    - 131 English Drama 16th Century
    - ...
- 200 French Literature
Slide author: Marti Hearst

Facetted Categories
- Mutually exclusive: non-overlapping, distinct categories
- Relational: relations between facets, subfacets, and foci (elements) are not restricted to hierarchical generalization-specialization relations
- Composable: combined using grammars of order and relation to form compound descriptions

Facetted Classification Along With Labeled Categories
- A Language
  - a English
  - b French
  - c Spanish
- B Genre
  - a Prose
  - b Poetry
  - c Drama
- C Period
  - a 16th Century
  - b 17th Century
  - c 18th Century
  - d 19th Century
Compound labels:
- Aa English Literature
- AaBa English Prose
- AaBaCa English Prose 16th Century
- AbBbCd French Poetry 19th Century
- BcCd Drama 19th Century
Slide author: Marti Hearst
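The notation on this slide can be sketched in a few lines of code. This is only an illustration, assuming that every facet is named by one uppercase letter and every focus by one lowercase letter; the names `FACETS`, `expand_notation`, and `label` are hypothetical and do not come from the lecture.

```python
import re

# Facets and foci as listed on the slide:
# facet letter -> (facet name, {focus letter: focus label}).
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def expand_notation(code):
    """Expand a compound code such as 'AaBaCa' into {facet name: focus label}."""
    pairs = re.findall(r"([A-Z])([a-z])", code)
    # Reject codes that contain anything other than facet/focus letter pairs.
    if "".join(facet + focus for facet, focus in pairs) != code:
        raise ValueError(f"malformed notation: {code!r}")
    description = {}
    for facet_letter, focus_letter in pairs:
        facet_name, foci = FACETS[facet_letter]
        description[facet_name] = foci[focus_letter]
    return description


def label(code):
    """Render a label in Language, Genre, Period order.

    When no genre is given, the general word 'Literature' stands in for it,
    as in 'Aa English Literature' on the slide.
    """
    d = expand_notation(code)
    parts = [d.get("Language"), d.get("Genre", "Literature"), d.get("Period")]
    return " ".join(part for part in parts if part)


print(label("AaBaCa"))
print(label("Aa"))
```

Because each facet is looked up independently, a new focus (say, a fourth language) extends every compound description without renumbering anything, which is the contrast with the enumerated hierarchy on the previous slide.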

Ranganathan PMEST Facets
- P(ersonality) WHO: The most important types or names of things for the particular discipline
- M(atter) WHAT: Constituent materials
- E(nergy) HOW: Action or activity terms
- S(pace) WHERE: Where things occur
- T(ime) WHEN: When things occur

“Classical” CRG/BC2 Facet Analysis
Entity, Kind, Part, Property, Material, Process, Operation, Patient, Product, By-Product, Agent, Space, Time

“Classical” Facet Analysis
- What is being done? Entity, Kind, Product, By-Product
- What are its parts? Part
- What are its properties? Property, Material
- How is this achieved? Process
- By what means? Operation
- By whom? Agent, Patient
- Where? Space
- When? Time

“Classical” Facet Analysis
- Nouns: Entity, Kind, Part, Patient, Product, By-Product, Agent
- Adjectives: Property, Material
- Intransitive Verb: Process
- Transitive Verb: Operation
- Adverb: Space, Time

Semantic and Syntactic Relationships
- Semantic relationships
  - Is-A (thing/kind, genus/species): Mammals > Primates > Humans
  - Has-Parts: Human > Head > Eyes
- Syntactic relationships
  - Compounds: Wheat + harvesting = “wheat harvesting”
  - Object + operation = operation on object
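A small sketch may help separate the two kinds of relationship. It assumes the Is-A and Has-Parts examples from the slide are stored as plain dictionaries; the names `IS_A`, `HAS_PART`, `ancestors`, `parts`, and `compound` are hypothetical.

```python
# Semantic relationships: stored once, inside the vocabulary itself.
IS_A = {"Humans": "Primates", "Primates": "Mammals"}   # term -> its genus
HAS_PART = {"Human": ["Head"], "Head": ["Eyes"]}       # whole -> its parts


def ancestors(term):
    """Follow Is-A links upward and return every broader kind, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def parts(whole):
    """Return all direct and indirect parts of a whole, depth first."""
    found = []
    for part in HAS_PART.get(whole, []):
        found.append(part)
        found.extend(parts(part))
    return found


def compound(obj, operation):
    """Syntactic relationship: built at description time by combining terms."""
    return f"{obj} {operation}"


print(ancestors("Humans"))
print(parts("Human"))
print(compound("wheat", "harvesting"))
```

The semantic links are fixed data that every description shares, while the compound is assembled on demand from two independent terms.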

Facetted Classification
- Clearly distinguishes between semantic relationships and syntactic relationships
- Semantic relationships
  - Within a facet
  - Containment relations
- Syntactic relationships
  - Across facets
  - Combinatoric relations
- Has a “syntax” for syntactic combination of semantic terms

Power of Facet Combinations
The syntactic relations of facetted classifications enable a small controlled vocabulary to produce:
- Many, many structured descriptions
- Complex, but formally structured descriptions using nested compound descriptions
- Descriptions for things we do not have words for
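The size of this effect is easy to check by counting. The sketch below reuses the Language, Genre, and Period facets from the literature example and assumes that any facet may be left unspecified in a description (as in "English Literature"); that rule is an assumption made for the count, not something the slides state.

```python
from itertools import product

facets = {
    "Language": ["English", "French", "Spanish"],
    "Genre": ["Prose", "Poetry", "Drama"],
    "Period": ["16th Century", "17th Century", "18th Century", "19th Century"],
}

# Each facet contributes one of its foci or nothing (None).
options = [[None] + foci for foci in facets.values()]

# Keep every combination that names at least one focus.
descriptions = [combo for combo in product(*options) if any(combo)]

vocabulary_size = sum(len(foci) for foci in facets.values())
print(vocabulary_size, "terms yield", len(descriptions), "distinct descriptions")
```

Adding a single focus to one facet adds one term to the vocabulary but multiplies the number of available descriptions.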

Example: Objects
- Red Plastic Glass
- Blue Paper Straw

IS202 Project Team Facetted Classifications (2004)
007
- Personality: Straw, Glass
- Operation: Drinking, Slurping, Sipping
- Material: Plastic, Paper
- Color: Blue, Red
ARTery
- Color
- Size
- Material
- Weight
- Shape
- Radius/Circumference
- Density
- Volume/Capacity
- Function/Use
- Hardness/Softness
- Yin/Yang

IS202 Project Team Facetted Classifications (2004)
Culture Feed
- Color: Red, Blue
- Material: Plastic, Paper
- Use: Drink from, Drink with
- Dimensions: Circumference, Height, Diameter
Picture Portal
- Color: Red, Blue
- Material: Paper, Plastic
- Use: Containment, Transport
- Shape: Torus, Planar
- # Holes: 1

IS202 Project Team Facetted Classifications (2004)
F.U.N.
- Shape
- Color
- Material
- Rigidity
- Function: Container, Conduit
- Locale
- Weight
- Size
MNM
- Functionality: What it does, What you can do with it
- Physical Properties: Color, Shape, Material

IS202 Project Team Facetted Classifications (2004)
pillBox
- Function: Container, Conduit
- Form
  - Shape: Cylinder
  - Composition: Paper, Plastic
  - Color: Blue, Red
  - Size: Tall and skinny, Short and fat
Team iTour
- Color: Red, Blue
- State: Solid, Non-porous, Flexible
- Material: Plastic, Paper
- Geometry: Cylindrical, Hollow
- Function: Container, Drinking, Sucking, Blowing

Example: Objects
- Gray Metal Glass
- Two Yellow Plastic Straws

Example: Objects
- Function: Drinking
- Form
  - Shape: Cylinder
  - Material: Plastic
  - Color: Red
  - Number: 1
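Once objects are described facet by facet, retrieval reduces to matching on facet values. The following is a minimal sketch using the four example objects from these slides; the `Object` facet name, the `Number` value of 1 for the single items, and the `select` helper are assumptions added for illustration.

```python
# Facetted descriptions of the example objects (one dictionary per object).
objects = [
    {"Object": "Glass", "Color": "Red", "Material": "Plastic", "Number": 1},
    {"Object": "Straw", "Color": "Blue", "Material": "Paper", "Number": 1},
    {"Object": "Glass", "Color": "Gray", "Material": "Metal", "Number": 1},
    {"Object": "Straw", "Color": "Yellow", "Material": "Plastic", "Number": 2},
]


def select(items, **criteria):
    """Return the items whose facets match every given facet=value pair."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in criteria.items())
    ]


# Any facet, or any combination of facets, can be the entry point.
print(select(objects, Material="Plastic"))
print(select(objects, Object="Straw", Color="Yellow"))
```

No facet is privileged here: the same records answer a query by material, by color, or by both, which a single fixed hierarchy could not do without duplicating items.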

Agenda
- Facetted Classification
- Traditional vs. Facetted Classification
- Designing Facetted Classifications
- Thesaurus Design

Facetted Classification Design
- Collect examples that need to be classified
- Identify candidates for facets and subfacets
- Test classification scheme on examples for facet orthogonality
- Order foci within facets
- Explicate grammar for ordering and combining facets and subfacets
- Test classification scheme on examples for combinatoric power
- Extend foci for comprehensiveness where applicable
- Create new facets and subfacets where needed
- Test classification scheme on new examples, especially boundary cases
- Iterate and refine throughout
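Part of the orthogonality test can be automated. The sketch below flags any focus label that turns up under more than one facet, which is only a rough proxy: real orthogonality is a conceptual judgment about whether facets overlap in meaning, not just in wording. The function name `overlapping_foci` and the second, deliberately flawed scheme are hypothetical.

```python
from collections import defaultdict


def overlapping_foci(scheme):
    """Map each focus that appears under several facets to those facets."""
    seen = defaultdict(set)
    for facet, foci in scheme.items():
        for focus in foci:
            seen[focus.lower()].add(facet)
    return {focus: sorted(found) for focus, found in seen.items() if len(found) > 1}


# A scheme in the style of the project team examples.
scheme = {
    "Function": ["Container", "Conduit"],
    "Shape": ["Cylinder"],
    "Composition": ["Paper", "Plastic"],
    "Color": ["Blue", "Red"],
}

# The same scheme with an extra facet that duplicates part of Composition.
flawed = dict(scheme, Material=["Plastic", "Glass"])

print(overlapping_foci(scheme))
print(overlapping_foci(flawed))
```

A non-empty result is a prompt to merge or redefine facets before moving on to ordering the foci.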

Facet Guidelines
- Terms on the same level in the ontology should be of the same level and type
- Facets, subfacets, and foci should have a discernible order
- Use of capitalization and singular/plural forms should be uniform
Example:
- Sports
  - Team Sports
    - Baseball
    - Football
    - Basketball
  - Solo Sports
    - Marathon Running

Ordering Foci (“Array”)
- Simple to complex (Locomotions: walk, run, jump, skip, hurdle, cartwheel)
- Common/popular to uncommon/unpopular (Vegetarian Pizza Toppings: mushroom, onion, olive, artichoke, pineapple, pine nuts)
- Spatial, geographical, or geometric (Southwestern States: California, Nevada, Arizona, New Mexico)
- Chronological, historical, or evolutionary (Dinosaur Eras: Triassic, Jurassic, Cretaceous)
- Canonical (pre-established order) (Playground Counting: Eenie, Meenie, Miney, Mo)
- Alphabetical (Boys’ Names: Al, Bob, Chuck, David, Ed, Frank, George, Harry)
- Size (T-Shirts: Small, Medium, Large, XL, XXL)
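Most of these orderings cannot be derived from the terms themselves, so a scheme has to store the order explicitly. As a sketch, assuming the order is kept as a plain list next to the facet (the names `SIZE_ORDER` and `order_foci` are hypothetical):

```python
# Pre-established order for the Size facet, taken from the T-shirt example.
SIZE_ORDER = ["Small", "Medium", "Large", "XL", "XXL"]


def order_foci(foci, canonical=None):
    """Sort foci by a stored canonical order, or alphabetically if none is given."""
    if canonical is None:
        return sorted(foci)
    rank = {term: position for position, term in enumerate(canonical)}
    # A focus missing from the canonical list raises KeyError, which exposes
    # gaps in the stored order instead of silently misplacing the term.
    return sorted(foci, key=lambda term: rank[term])


print(order_foci(["XL", "Small", "Large"], SIZE_ORDER))
print(order_foci(["Chuck", "Al", "Bob"]))
```

Alphabetical order is the only case that needs no stored list, which may be why it is often used as a fallback when no more meaningful order is available.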

Agenda
- Facetted Classification
- Traditional vs. Facetted Classification
- Designing Facetted Classifications
- Thesaurus Design (intro)

Types of Indexing Languages
- Uncontrolled keyword indexing
- Indexing languages: controlled, but not structured
- Thesauri: controlled and structured
- Classification systems: controlled, structured, and coded
- Facetted classification systems

Thesauri
A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower, and other related terms.
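This definition maps directly onto a small data structure. The sketch below is one possible representation, not a standard format: it uses the conventional abbreviations UF (used for), BT (broader term), NT (narrower term), and RT (related term) in comments, and the class names, method names, and the Softball and Gridiron terms are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One preferred term (descriptor) and its links to other terms."""
    term: str
    used_for: set = field(default_factory=set)   # UF: synonymous or equivalent terms
    broader: set = field(default_factory=set)    # BT: broader terms
    narrower: set = field(default_factory=set)   # NT: narrower terms
    related: set = field(default_factory=set)    # RT: other related terms


class Thesaurus:
    def __init__(self):
        self.entries = {}

    def entry(self, term):
        """Return the entry for a term, creating it on first use."""
        return self.entries.setdefault(term, Entry(term))

    def add_broader(self, term, broader):
        """Record a BT link and its reciprocal NT link together."""
        self.entry(term).broader.add(broader)
        self.entry(broader).narrower.add(term)

    def add_related(self, a, b):
        """RT links are symmetric, so store both directions."""
        self.entry(a).related.add(b)
        self.entry(b).related.add(a)

    def add_used_for(self, term, variant):
        """Record a non-preferred variant that leads to this descriptor."""
        self.entry(term).used_for.add(variant)


t = Thesaurus()
t.add_broader("Team Sports", "Sports")
t.add_broader("Baseball", "Team Sports")
t.add_related("Baseball", "Softball")      # hypothetical related term
t.add_used_for("Football", "Gridiron")     # hypothetical entry term
print(t.entries["Team Sports"])
```

Writing reciprocal links in one call keeps the structure consistent, which is the kind of check that the later "check references" step in thesaurus construction performs by hand.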

Thesaurus Standards
National and International Standards for Thesauri
- ANSI/NISO Z39.19-1994 - American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
- ANSI/NISO Draft Standard Z39.4-199x - American National Standard Guidelines for Indexes in Information Retrieval
- ISO 2788 - Documentation - Guidelines for the establishment and development of monolingual thesauri
- ISO 5964 - Documentation - Guidelines for the establishment and development of multilingual thesauri

Thesaurus Examples
- Non-Facetted: The ERIC Thesaurus of Descriptors
- Semi-Facetted: The Medical Subject Headings (MESH) of the National Library of Medicine
- Facetted: The Art and Architecture Thesaurus

ERIC Thesaurus – Entry

ERIC Thesaurus – Alphabetic

ERIC Thesaurus – KWIC Index

ERIC Thesaurus – Hierarchies

ERIC Thesaurus – Groups

ERIC Thesaurus – Online http://www.ericfacility.net/extra/pub/thessearch.cfm

MESH – Entry

MESH – Alphabetic

MESH – Tree Structures

MESH – KWOC Index

MESH – Online http://www.nlm.nih.gov/mesh/meshhome.html

AAT – Facets

AAT – Hierarchies (print)

AAT – Hierarchies (online) http://www.getty.edu/research/tools/vocabulary/aat/

AAT – Entry (online)

Lecture Overview
- Thesaurus Design and Development
- Controlled Vocabularies for topical description
- Thesaurus Design
- Steps in Thesaurus Development (intro)

Why Develop a Thesaurus?
- To provide a conceptual structure or “space” for a body of information
- To make it possible to adequately describe the topical content of information resources at an appropriate level of generality or specificity
- To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material)
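"Most of the relevant material without too much irrelevant material" corresponds to the two standard retrieval measures, recall and precision. The slide does not name them, so the sketch below is offered as a reading of that phrase; the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents returned, three documents actually relevant.
precision, recall = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d3", "d5"})
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A thesaurus aims at both measures at once: synonym control brings in relevant items that use different wording, while specific descriptors keep unrelated items out.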

Why Develop a Thesaurus?
- To provide vocabulary (or terminological) control
- When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with
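In practice, this kind of control is a lookup from whatever term the user starts with to the preferred term. A minimal sketch, assuming a hand-built table of USE references; the vocabulary shown (Automobiles and its variants) is hypothetical and is not taken from any of the thesauri discussed here.

```python
# Hypothetical vocabulary: one preferred term reached from several entry terms.
PREFERRED = {"Automobiles"}
USE = {
    "cars": "Automobiles",
    "motor cars": "Automobiles",
    "autos": "Automobiles",
}


def resolve(term):
    """Lead an indexer's or searcher's term to the preferred term, if any."""
    key = " ".join(term.lower().split())   # ignore case and extra spaces
    for preferred in PREFERRED:
        if preferred.lower() == key:
            return preferred
    return USE.get(key)                    # None when the term is unknown


for start in ["Cars", "motor  cars", "automobiles", "bicycles"]:
    print(start, "->", resolve(start))
```

Because indexers and searchers both pass through the same table, they converge on one descriptor even when they begin from different words.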

Preliminary Considerations
- What is used now?
  - Continue using an existing thesaurus?
  - Ad hoc modification of existing thesaurus?
  - Develop a new well-structured thesaurus?
- What is the scope and complexity of the subject field?
- What kind of retrieval objects or data will be dealt with?
- How exhaustive and specific is the desired description of objects?

Preliminary Considerations
- The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus
- It is better to plan for a larger and more comprehensive system than for a smaller system that will rapidly become inadequate as the database grows
- Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists

Development of a Thesaurus
- Term selection
- Merging and development of concept classes
- Definition of broad subject fields and subfields
- Development of classificatory structure
- Review, testing, application, revision

Flow of Work in Thesaurus Construction
[Flowchart; node labels in the order they appear]
- Select Sources
- Assign codes
- Select Terms
- Record Selected Terms
- Sort Terms
- Merge identical Terms
- Define Broad Subject Fields
- Merge Terms in Same Concept class
- Sort Terms into Broad Subject Fields
- Define Subfields within one Subject Field
- Work out detailed structure of the Subject Field
- Select Preferred Terms
- All Subfields of Broad Subject finished?
- All Broad Subjects finished?
- Improve Class Structure
- Yes / No (decision branch labels)
- Print Classified Index and review
- Discuss with Experts and Users
- Select descriptors and checklist items
- Produce Full Thesaurus and Check references
- Assign Notation
- Review and Test
- Many Modifications?
- Revise as needed
Based on Soergel, pp 327-333
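The early clerical steps in this flow (record terms, sort terms, merge identical terms) are the ones most easily handed to a program. The sketch below assumes each recorded term arrives with a code for the source it came from; the source codes, the sample terms, and the normalization rule (lowercase, collapsed spaces) are all assumptions for illustration and are not taken from Soergel.

```python
from collections import defaultdict


def merge_identical(records):
    """Merge identical candidate terms and keep the sources of each.

    `records` is an iterable of (term, source_code) pairs. Terms that differ
    only in case or spacing are treated as identical.
    """
    merged = defaultdict(set)
    for term, source in records:
        key = " ".join(term.lower().split())
        merged[key].add(source)
    # Sorting gives the alphabetical working list used in the later steps.
    return dict(sorted(merged.items()))


# Hypothetical candidate terms recorded from three sources.
records = [
    ("Ground water", "S1"),
    ("ground  water", "S2"),
    ("Aquifers", "S1"),
    ("Wells", "S3"),
    ("aquifers", "S3"),
]

for term, sources in merge_identical(records).items():
    print(term, sorted(sources))
```

Merging terms that belong to the same concept class, as opposed to terms that are merely spelled the same, is a separate step that still needs human judgment.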