1
Facetted Classification and Thesauri Introduction
University of California, Berkeley School of Information IS 245: Organization of Information In Collections IS 257 – Fall 2009
2
Lecture Overview
  Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
  Thesaurus Design (intro)
3
Agenda
  Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
  Thesaurus Design
4
Controlled Vocabularies
  Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.
  That is, it is an attempt to provide a consistent set of descriptions for use in (or as) metadata.
5
Hierarchical Classification
  Each category is successively broken down into smaller and smaller subdivisions
  No item occurs in more than one subdivision
  Each level is divided out by a “character of division” (also known as a feature)
  Example: Distinguish “Literature” based on:
    Language
    Genre
    Time Period
  Slide author: Marti Hearst
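The successive division described on this slide can be sketched in Python. This is a minimal illustration and not part of the original slides: the function `build_hierarchy` and the placeholder records (`work-1` and so on) are hypothetical, and the order of the characters of division (language, then genre, then period) is assumed from the slide's example.

```python
from collections import defaultdict

# Hypothetical records; the titles are placeholders, not real works.
ITEMS = [
    {"title": "work-1", "language": "English", "genre": "Prose", "period": "16th"},
    {"title": "work-2", "language": "English", "genre": "Poetry", "period": "17th"},
    {"title": "work-3", "language": "French", "genre": "Poetry", "period": "19th"},
    {"title": "work-4", "language": "English", "genre": "Prose", "period": "16th"},
]

def build_hierarchy(items, divisions):
    """Split items by each character of division, in the given fixed order."""
    if not divisions:
        return sorted(item["title"] for item in items)
    first, rest = divisions[0], divisions[1:]
    groups = defaultdict(list)
    for item in items:
        groups[item[first]].append(item)
    return {value: build_hierarchy(members, rest) for value, members in groups.items()}

tree = build_hierarchy(ITEMS, ["language", "genre", "period"])
print(tree["English"]["Prose"]["16th"])
```

Because the order of division is fixed, every item lands in exactly one leaf. Finding all poetry regardless of language means walking every language branch, which is the limitation that facetted schemes address.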
6
Hierarchical Classification
  Literature
    English
      Prose
        16th, 17th, 18th, 19th, ...
      Poetry
      Drama
    French
    Spanish
  Slide author: Marti Hearst
7
Labeled Categories for Hierarchical Classification
  LITERATURE
  100 English Literature
    110 English Prose
      111 English Prose 16th Century
      112 English Prose 17th Century
      113 English Prose 18th Century
      ...
    120 English Poetry
      121 English Poetry 16th Century
      122 English Poetry 17th Century
    130 English Drama
      131 English Drama 16th Century
      …
  200 French Literature
  Slide author: Marti Hearst
8
Facetted Categories
  Mutually exclusive: Non-overlapping, distinct categories
  Relational: Relations between facets, subfacets, and foci (elements) are not restricted to hierarchical generalization-specialization relations
  Composable: Combined using grammars of order and relation to form compound descriptions
9
Facetted Classification Along With Labeled Categories
  A Language
    a English
    b French
    c Spanish
  B Genre
    a Prose
    b Poetry
    c Drama
  C Period
    a 16th Century
    b 17th Century
    c 18th Century
    d 19th Century
  Examples:
    Aa English Literature
    AaBa English Prose
    AaBaCa English Prose 16th Century
    AbBbCd French Poetry 19th Century
    BcCd Drama 19th Century
  Slide author: Marti Hearst
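The notation on this slide can be read mechanically: each pair of letters names a facet and a focus within it. The following Python sketch shows one way to decode and encode such notations. The facet tables are copied from the slide; the function names `decode` and `encode`, and the rule that facets must appear in the order A, B, C at most once each, are assumptions made for illustration.

```python
# Facet tables copied from the slide; the parsing rules below are an assumption.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}

def decode(notation):
    """Expand a compound notation such as 'AaBaCa' into a readable label."""
    if len(notation) % 2 != 0:
        raise ValueError(f"expected facet/focus letter pairs: {notation!r}")
    labels = []
    previous = ""
    for i in range(0, len(notation), 2):
        facet, focus = notation[i], notation[i + 1]
        if facet not in FACETS or focus not in FACETS[facet][1]:
            raise ValueError(f"unknown facet or focus: {facet + focus!r}")
        if facet <= previous:
            raise ValueError("facets must follow the order A, B, C, once each")
        labels.append(FACETS[facet][1][focus])
        previous = facet
    return " ".join(labels)

def encode(**foci):
    """Build a notation from keyword arguments such as language='English'."""
    parts = []
    for letter, (name, table) in FACETS.items():  # dict order gives A, B, C
        label = foci.get(name.lower())
        if label is not None:
            code = next(k for k, v in table.items() if v == label)
            parts.append(letter + code)
    return "".join(parts)

print(decode("AaBaCa"))
print(encode(language="French", genre="Poetry", period="19th Century"))
```

Any facet can be left out, so the same three small tables describe works at whatever level of detail is known. The slide's label for `Aa` adds the word "Literature", which this sketch does not reproduce.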
10
Ranganathan PMEST Facets
  P(ersonality) WHO: The most important types or names of things for the particular discipline
  M(atter) WHAT: Constituent materials
  E(nergy) HOW: Action or activity terms
  S(pace) WHERE: Where things occur
  T(ime) WHEN: When things occur
11
“Classical” CRG/BC2 Facet Analysis
  Entity, Kind, Part, Property, Material, Process, Operation, Patient, Product, By-Product, Agent, Space, Time
12
“Classical” Facet Analysis
  What is being done? Entity, Kind, Product, By-Product
  What are its parts? Part
  What are its properties? Property, Material
  How is this achieved? Process
  By what means? Operation
  By whom? Agent, Patient
  Where? Space
  When? Time
13
“Classical” Facet Analysis
  Nouns: Entity, Kind, Part, Patient, Product, By-Product, Agent
  Adjectives: Property, Material
  Intransitive Verb: Process
  Transitive Verb: Operation
  Adverb: Space, Time
14
Semantic and Syntactic Relationships
  Semantic relationships
    Is-A (thing/kind, genus/species): Mammals > Primates > Humans
    Has-Parts: Human > Head > Eyes
  Syntactic relationships
    Compounds: Wheat + harvesting = “wheat harvesting”
    Object + operation = operation on object
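The difference between the two kinds of relationship can be made concrete with a short Python sketch. The terms come from the slide; the dictionaries and the function names `broader_chain`, `all_parts`, and `compound` are hypothetical and chosen only for illustration.

```python
# Semantic relations (stored links between terms), using the slide's examples.
IS_A = {"Humans": "Primates", "Primates": "Mammals"}
HAS_PART = {"Human": ["Head"], "Head": ["Eyes"]}

def broader_chain(term):
    """Follow Is-A links upward from a term to the top of its hierarchy."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def all_parts(whole):
    """Collect direct and indirect parts by following Has-Parts links."""
    parts = []
    for part in HAS_PART.get(whole, []):
        parts.append(part)
        parts.extend(all_parts(part))
    return parts

def compound(obj, operation):
    """Syntactic relation: combine an object term with an operation term."""
    return f"{obj} {operation}"

print(broader_chain("Humans"))
print(all_parts("Human"))
print(compound("wheat", "harvesting"))
```

The semantic links are stored as data and looked up, while the syntactic compound is built on demand from two independent terms.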
15
Facetted Classification
  Clearly distinguishes between semantic relationships and syntactic relationships
  Semantic relationships
    Within a facet
    Containment relations
  Syntactic relationships
    Across facets
    Combinatoric relations
  Have a “syntax” for syntactic combination of semantic terms
16
Power of Facet Combinations
  The syntactic relations of facetted classifications enable a small controlled vocabulary to produce:
    Many, many structured descriptions
    Complex, but formally structured descriptions using nested compound descriptions
    Descriptions for things we do not have words for
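A quick count shows how fast descriptions multiply. The Python sketch below reuses the Language, Genre, and Period facets from the earlier labeled-categories slide; treating every facet as optional is an assumption made here to illustrate partial descriptions.

```python
from itertools import product
from math import prod

FACETS = {
    "Language": ["English", "French", "Spanish"],
    "Genre": ["Prose", "Poetry", "Drama"],
    "Period": ["16th Century", "17th Century", "18th Century", "19th Century"],
}

# Number of terms that have to be defined and maintained.
vocabulary_size = sum(len(foci) for foci in FACETS.values())

# One focus from every facet.
full_descriptions = list(product(*FACETS.values()))

# Allow each facet to be left out (the +1), but exclude the empty description.
partial_or_full = prod(len(foci) + 1 for foci in FACETS.values()) - 1

print(vocabulary_size, len(full_descriptions), partial_or_full)
```

Ten terms yield 36 full descriptions, or 79 when facets may be omitted. Adding one more focus to any facet increases the vocabulary by one term but the number of descriptions by a whole product of the other facets.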
17
Example: Objects
  Red Plastic Glass
  Blue Paper Straw
18
IS202 Project Team Facetted Classifications (2004)
  007
    Personality: Straw, Glass
    Operation: Drinking, Slurping, Sipping
    Material: Plastic, Paper
    Color: Blue, Red
  ARTery
    Color, Size, Material, Weight, Shape, Radius/Circumference, Density, Volume/Capacity, Function/Use, Hardness/Softness, Yin/Yang
19
IS202 Project Team Facetted Classifications (2004)
  Culture Feed
    Color: Red, Blue
    Material: Plastic, Paper
    Use: Drink from, Drink with
    Dimensions: Circumference, Height, Diameter
  Picture Portal
    Color: Red, Blue
    Material: Paper, Plastic
    Use: Containment, Transport
    Shape: Torus, Planar
    # Holes: 1
20
IS202 Project Team Facetted Classifications (2004)
  F.U.N.
    Shape
    Color
    Material
    Rigidity
    Function: Container, Conduit
    Locale
    Weight
    Size
  MNM
    Functionality: What it does, What you can do with it
    Physical Properties: Color, Shape, Material
21
IS202 Project Team Facetted Classifications (2004)
  pillBox
    Function: Container, Conduit
    Form
    Shape: Cylinder
    Composition: Paper, Plastic
    Color: Blue, Red
    Size: Tall and skinny, Short and fat
  Team iTour
    Color: Red, Blue
    State: Solid, Non-porous, Flexible
    Material: Plastic, Paper
    Geometry: Cylindrical, Hollow
    Function: Container, Drinking, Sucking, Blowing
22
Example: Objects
  Gray Metal Glass
  Two Yellow Plastic Straws
23
Example: Objects
  Facets: Function, Form, Material, Color, Number
  Function: Drinking
  Form Shape: Cylinder
  Material: Plastic
  Color: Red
  Number: 1
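Once objects are described facet by facet, retrieval becomes a matter of matching foci. The Python sketch below is illustrative only: the first record follows the slide's description (which appears to fit the red plastic glass), while the facet values for the two straw records and the function name `select` are assumptions.

```python
# Facet names follow the slide; values for the straw records are assumed.
OBJECTS = [
    {"name": "red plastic glass", "Function": "Drinking", "Shape": "Cylinder",
     "Material": "Plastic", "Color": "Red", "Number": 1},
    {"name": "blue paper straw", "Function": "Drinking", "Shape": "Cylinder",
     "Material": "Paper", "Color": "Blue", "Number": 1},
    {"name": "yellow plastic straws", "Function": "Drinking", "Shape": "Cylinder",
     "Material": "Plastic", "Color": "Yellow", "Number": 2},
]

def select(records, **criteria):
    """Return names of records whose facets match every given focus."""
    return [r["name"] for r in records
            if all(r.get(facet) == focus for facet, focus in criteria.items())]

print(select(OBJECTS, Material="Plastic"))
print(select(OBJECTS, Material="Plastic", Number=1))
```

Each added criterion narrows the result set, and criteria can be given in any order because the facets are independent of one another.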
24
Agenda
  Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
  Thesaurus Design
25
Facetted Classification Design
  Collect examples that need to be classified
  Identify candidates for facets and subfacets
  Test classification scheme on examples for facet orthogonality
  Order foci within facets
  Explicate grammar for ordering and combining facets and subfacets
  Test classification scheme on examples for combinatoric power
  Extend foci for comprehensiveness where applicable
  Create new facets and subfacets where needed
  Test classification scheme on new examples, especially boundary cases
  Iterate and refine throughout
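Two of the testing steps above lend themselves to simple automated checks. The Python sketch below is a crude proxy, not a full test of orthogonality: it flags foci that appear under more than one facet, and example values that no facet can yet express. The draft scheme, the example record, and the function names `shared_foci` and `missing_foci` are all hypothetical.

```python
def shared_foci(scheme):
    """Map each focus that appears under more than one facet to those facets."""
    facets_by_focus = {}
    for facet, foci in scheme.items():
        for focus in foci:
            facets_by_focus.setdefault(focus, []).append(facet)
    return {focus: facets for focus, facets in facets_by_focus.items()
            if len(facets) > 1}

def missing_foci(examples, scheme):
    """List (example, facet, value) triples the scheme cannot express yet."""
    gaps = []
    for name, description in examples.items():
        for facet, value in description.items():
            if value not in scheme.get(facet, []):
                gaps.append((name, facet, value))
    return gaps

# Hypothetical draft scheme with a deliberate overlap between Shape and Form.
DRAFT = {
    "Color": ["Red", "Blue"],
    "Material": ["Plastic", "Paper"],
    "Shape": ["Cylinder"],
    "Form": ["Cylinder", "Hollow"],
}
EXAMPLES = {"yellow plastic straws": {"Color": "Yellow", "Material": "Plastic"}}

print(shared_foci(DRAFT))
print(missing_foci(EXAMPLES, DRAFT))
```

A shared focus suggests that two facets overlap and should be merged or redefined. A missing focus suggests that the facet needs extending, which corresponds to the "extend foci for comprehensiveness" step.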
26
Facet Guidelines
  Terms on the same level in the ontology should be of the same level and type
  Facets, subfacets, and foci should have a discernible order
  Use of capitalization and singular/plural forms should be uniform
  Example:
    Sports
      Team Sports: Baseball, Football, Basketball
      Solo Sports: Marathon Running
27
Ordering Foci (“Array”)
  Simple to complex (Locomotions: walk, run, jump, skip, hurdle, cartwheel)
  Common/popular to uncommon/unpopular (Vegetarian Pizza Toppings: mushroom, onion, olive, artichoke, pineapple, pine nuts)
  Spatial, geographical, or geometric (Southwestern States: California, Nevada, Arizona, New Mexico)
  Chronological, historical, or evolutionary (Dinosaur Eras: Triassic, Jurassic, Cretaceous)
  Canonical (pre-established order) (Playground Counting: Eenie, Meenie, Miney, Mo)
  Alphabetical (Boys' Names: Al, Bob, Chuck, David, Ed, Frank, George, Harry)
  Size (T-Shirts: Small, Medium, Large, XL, XXL)
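Most of these orders cannot be derived from the spelling of the terms, so the array has to be stored explicitly. A minimal Python sketch, using the T-shirt sizes from the slide (the function name `order_foci` and the `stock` list are hypothetical):

```python
# A canonical array from the slide: the order is by size, not by spelling.
SIZES = ["Small", "Medium", "Large", "XL", "XXL"]

def order_foci(values, array):
    """Sort values by their position in a pre-established array."""
    rank = {focus: position for position, focus in enumerate(array)}
    return sorted(values, key=lambda value: rank[value])

stock = ["XL", "Small", "XXL", "Large", "Medium"]
print(order_foci(stock, SIZES))  # size order
print(sorted(stock))             # alphabetical order, for comparison
```

The alphabetical sort puts "Large" before "Medium" and "Small", which is why a scheme should record the intended order of its foci rather than rely on a default sort.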
28
Agenda
  Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
  Thesaurus Design (intro)
29
Types of Indexing Languages
  Uncontrolled keyword indexing
  Indexing languages: Controlled, but not structured
  Thesauri: Controlled and structured
  Classification systems: Controlled, structured, and coded
  Facetted classification systems
30
Thesauri
  A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower, and other related terms.
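The definition above maps naturally onto a small record type. The Python sketch below is one possible representation, not a standard data model: the field names follow the conventional thesaurus abbreviations (UF for "used for", BT for broader term, NT for narrower term, RT for related term), while the `Entry` class, the `missing_reciprocals` check, and the sample terms are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One preferred term (descriptor) and its links to other terms."""
    term: str
    used_for: list = field(default_factory=list)   # UF: synonyms, equivalents
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT

def missing_reciprocals(thesaurus):
    """Find BT/NT links that lack the matching link in the other direction."""
    problems = []
    for term, entry in thesaurus.items():
        for bt in entry.broader:
            if bt not in thesaurus or term not in thesaurus[bt].narrower:
                problems.append((term, "BT", bt))
        for nt in entry.narrower:
            if nt not in thesaurus or term not in thesaurus[nt].broader:
                problems.append((term, "NT", nt))
    return problems

# Hypothetical entries; the BT link on "Baseball" is deliberately omitted.
THESAURUS = {
    "Sports": Entry("Sports", used_for=["Athletics"], narrower=["Team Sports"]),
    "Team Sports": Entry("Team Sports", broader=["Sports"], narrower=["Baseball"]),
    "Baseball": Entry("Baseball"),
}

print(missing_reciprocals(THESAURUS))
```

Checking that every broader-term link has a matching narrower-term link is the kind of clerical consistency work that thesaurus construction involves alongside the intellectual effort.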
31
Thesaurus Standards
  National and International Standards for Thesauri
    ANSI/NISO z: American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
    ANSI/NISO Draft Standard Z x: American National Standard Guidelines for Indexes in Information Retrieval
    ISO 2788: Documentation - Guidelines for the establishment and development of monolingual thesauri
    ISO 5964: Documentation - Guidelines for the establishment and development of multilingual thesauri
32
Thesaurus Examples
  Non-Facetted: The ERIC Thesaurus of Descriptors
  Semi-Facetted: The Medical Subject Headings (MESH) of the National Library of Medicine
  Facetted: The Art and Architecture Thesaurus
33
ERIC Thesaurus – Entry
34
ERIC Thesaurus – Alphabetic
35
ERIC Thesaurus – KWIC Index
36
ERIC Thesaurus – Hierarchies
37
ERIC Thesaurus – Groups
38
ERIC Thesaurus – Online
39
MESH – Entry
40
MESH – Alphabetic
41
MESH – Tree Structures
42
MESH – KWOC Index
43
MESH - Online http://www.nlm.nih.gov/mesh/meshhome.html
44
AAT – Facets
45
AAT – Hierarchies (print)
46
AAT – Hierarchies (online)
47
AAT – Entry (online)
48
Lecture Overview
  Thesaurus Design and Development
  Controlled Vocabularies for topical description
  Thesaurus Design
  Steps In Thesaurus Development (intro)
49
Why Develop a Thesaurus?
  To provide a conceptual structure or “space” for a body of information
  To make it possible to adequately describe the topical content of information resources at an appropriate level of generality or specificity
  To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material)
50
Why Develop a Thesaurus?
  To provide vocabulary (or terminological) control
    When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with
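Vocabulary control of this kind amounts to a lookup from any lead-in term to the single preferred term. A minimal Python sketch follows; the vocabulary, the normalization rule (lowercasing and collapsing whitespace), and the function name `preferred_term` are hypothetical.

```python
# Hypothetical vocabulary: lead-in terms on the left, preferred terms on the right.
USE = {
    "cars": "Automobiles",
    "motor cars": "Automobiles",
    "automobiles": "Automobiles",
    "movies": "Motion Pictures",
    "films": "Motion Pictures",
    "motion pictures": "Motion Pictures",
}

def preferred_term(term):
    """Return the preferred term for a lead-in term, or None if it is unknown."""
    normalized = " ".join(term.lower().split())
    return USE.get(normalized)

print(preferred_term("Motor  Cars"))
print(preferred_term("films"))
```

Indexers and searchers who start from different words arrive at the same descriptor, so the index and the query end up using matching terms.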
51
Preliminary Considerations
  What is used now?
    Continue using an existing thesaurus?
    Ad hoc modification of existing thesaurus?
    Develop a new well-structured thesaurus?
  What is the scope and complexity of the subject field?
  What kind of retrieval objects or data will be dealt with?
  How exhaustive and specific is the desired description of objects?
52
Preliminary Considerations
  The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus
  It is better to plan for a larger and more comprehensive system than for a smaller system that will rapidly become inadequate as the database grows
  Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists
53
Development of a Thesaurus
  Term selection
  Merging and development of concept classes
  Definition of broad subject fields and subfields
  Development of classificatory structure
  Review, testing, application, revision
54
Flow of Work in Thesaurus Construction
  Flowchart (based on Soergel, pp); box labels in extracted order:
    Select Sources; Assign codes; Select Terms; Record Selected Terms; Sort Terms; Merge identical Terms; Define Broad Subject Fields; Merge Terms in Same Concept class; Sort Terms into Broad Subject Fields; Define Subfields within one Subject Field; Work out detailed structure of the Subject Field; Select Preferred Terms
    Decision: All Subfields of Broad Subject finished?
    Decision: All Broad Subjects finished?
    Improve Class Structure
    Branch labels: Yes, No
    Print Classified Index and review; Discuss with Experts and Users; Select descriptors and checklist items; Produce Full Thesaurus and Check references; Assign Notation; Review and Test
    Decision: Many Modifications?
    Revise as needed
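The early clerical steps in this flow of work (assign codes to sources, record selected terms, sort terms, merge identical terms) can be sketched in a few lines of Python. This is an illustration only: the source codes `S1` and `S2`, the sample terms, the function name `merge_identical_terms`, and the rule that terms are identical when they match after lowercasing and whitespace cleanup are all assumptions.

```python
from collections import defaultdict

def merge_identical_terms(records):
    """Merge (term, source_code) records whose terms match after normalization."""
    merged = defaultdict(set)
    for term, source_code in records:
        key = " ".join(term.lower().split())
        merged[key].add(source_code)
    # Sort terms alphabetically and list the sources that supplied each one.
    return {term: sorted(codes) for term, codes in sorted(merged.items())}

# Hypothetical candidate terms recorded from two coded sources.
RECORDS = [
    ("Thesauri", "S1"),
    ("thesauri", "S2"),
    ("Indexing  languages", "S1"),
    ("Classification systems", "S2"),
]

for term, sources in merge_identical_terms(RECORDS).items():
    print(term, sources)
```

Keeping the source codes with each merged term preserves the evidence for the later steps, such as choosing preferred terms and discussing the draft with experts and users.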