Facetted Classification and Thesauri Introduction


1 Facetted Classification and Thesauri Introduction
University of California, Berkeley
School of Information
IS 245: Organization of Information in Collections
IS 257 – Fall 2009

2 Lecture Overview
Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
Thesaurus Design (intro)

3 Agenda
Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
Thesaurus Design

4 Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, and classifications) with the intent of aiding the searcher in finding information.
That is, it is an attempt to provide a consistent set of descriptions for use in (or as) metadata.

5 Hierarchical Classification
Each category is successively broken down into smaller and smaller subdivisions
No item occurs in more than one subdivision
Each level is divided out by a “character of division” (also known as a feature)
Example: Distinguish “Literature” based on:
  Language
  Genre
  Time Period
Slide author: Marti Hearst

6 Hierarchical Classification
Literature
  English, French, Spanish
    Prose, Poetry, Drama
      16th, 17th, 18th, 19th, ...
Slide author: Marti Hearst

7 Labeled Categories for Hierarchical Classification
LITERATURE
100 English Literature
  110 English Prose
    111 English Prose 16th Century
    112 English Prose 17th Century
    113 English Prose 18th Century
    ...
  120 English Poetry
    121 English Poetry 16th Century
    122 English Poetry 17th Century
    ...
  130 English Drama
    131 English Drama 16th Century
    ...
200 French Literature
Slide author: Marti Hearst

8 Facetted Categories
Mutually exclusive: non-overlapping, distinct categories
Relational: relations between facets, subfacets, and foci (elements) are not restricted to hierarchical generalization-specialization relations
Composable: combined using grammars of order and relation to form compound descriptions

9 Facetted Classification Along With Labeled Categories
A Language
  a English
  b French
  c Spanish
B Genre
  a Prose
  b Poetry
  c Drama
C Period
  a 16th Century
  b 17th Century
  c 18th Century
  d 19th Century
Compound labels:
  Aa English Literature
  AaBa English Prose
  AaBaCa English Prose 16th Century
  AbBbCd French Poetry 19th Century
  BcCd Drama 19th Century
Slide author: Marti Hearst
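The notation above can be decoded mechanically. The following is a minimal Python sketch, not part of the original slides: the `FACETS` table layout and the `decode` and `label` helpers are assumptions made for illustration, and the rule that a code without a Genre focus reads as "Literature" is inferred from the label "Aa English Literature".

```python
import re

# Facets and foci as listed on the slide; the dict layout is an assumption.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation: str) -> dict:
    """Split a compound code such as 'AaBaCa' into facet/focus pairs."""
    pairs = re.findall(r"([A-Z])([a-z])", notation)
    # Reject input that is not a clean sequence of facet+focus letters.
    if "".join(facet + focus for facet, focus in pairs) != notation:
        raise ValueError(f"malformed notation: {notation!r}")
    result = {}
    for facet_code, focus_code in pairs:
        name, foci = FACETS[facet_code]
        result[name] = foci[focus_code]
    return result


def label(notation: str) -> str:
    """Render a readable label in the order Language, Genre, Period."""
    parts = decode(notation)
    words = [
        parts.get("Language"),
        parts.get("Genre", "Literature"),  # no genre: literature in general
        parts.get("Period"),
    ]
    return " ".join(word for word in words if word)


print(label("AaBaCa"))
print(label("AbBbCd"))
```

Because each facet contributes its own letter pair, a code may omit any facet and still be decoded, which a purely hierarchical numbering cannot do.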

10 Ranganathan PMEST Facets
P(ersonality) WHO: The most important types or names of things for the particular discipline
M(atter) WHAT: Constituent materials
E(nergy) HOW: Action or activity terms
S(pace) WHERE: Where things occur
T(ime) WHEN: When things occur

11 “Classical” CRG/BC2 Facet Analysis
Entity, Kind, Part, Property, Material, Process, Operation, Patient, Product, By-Product, Agent, Space, Time

12 “Classical” Facet Analysis
What is being done? Entity, Kind, Product, By-Product
What are its parts? Part
What are its properties? Property, Material
How is this achieved? Process
By what means? Operation
By whom? Agent, Patient
Where? Space
When? Time

13 “Classical” Facet Analysis
Nouns: Entity, Kind, Part, Patient, Product, By-Product, Agent
Adjectives: Property, Material
Intransitive Verb: Process
Transitive Verb: Operation
Adverb: Space, Time

14 Semantic and Syntactic Relationships
Semantic relationships
  Is-A (thing/kind, genus/species)
    Mammals > Primates > Humans
  Has-Parts
    Human > Head > Eyes
Syntactic relationships
  Compounds
    Wheat + harvesting = “wheat harvesting”
    Object + operation = operation on object
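The difference between the two kinds of relationship can be sketched in Python. This is an illustrative sketch only: the dictionaries and the `ancestors`, `all_parts`, and `compound` helpers are assumed names, not part of the lecture. Semantic links are stored and followed, while a syntactic compound is built on demand.

```python
# Is-A links from the slide, stored as child -> parent.
IS_A = {"Humans": "Primates", "Primates": "Mammals"}

# Has-Parts links from the slide, stored as whole -> list of parts.
HAS_PARTS = {"Human": ["Head"], "Head": ["Eyes"]}


def ancestors(term):
    """Follow Is-A links upward, nearest kind first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def all_parts(whole):
    """Collect parts, and parts of parts, depth first."""
    parts = []
    for part in HAS_PARTS.get(whole, []):
        parts.append(part)
        parts.extend(all_parts(part))
    return parts


def compound(obj, operation):
    """Syntactic relation: object + operation = operation on object."""
    return f"{obj} {operation}"


print(ancestors("Humans"))
print(all_parts("Human"))
print(compound("wheat", "harvesting"))
```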

15 Facetted Classification
Clearly distinguishes between semantic relationships and syntactic relationships
Semantic relationships
  Within a facet
  Containment relations
Syntactic relationships
  Across facets
  Combinatoric relations
Has a “syntax” for the syntactic combination of semantic terms

16 Power of Facet Combinations
The syntactic relations of facetted classifications enable a small controlled vocabulary to produce:
  Many, many structured descriptions
  Complex, but formally structured descriptions using nested compound descriptions
  Descriptions for things we do not have words for
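A short Python sketch makes the arithmetic concrete. It reuses the Language, Genre, and Period facets from the earlier literature example; the variable names, and the choice to model an unspecified facet as `None`, are assumptions made for illustration.

```python
from itertools import product

# The three literature facets used earlier in the lecture.
FACETS = {
    "Language": ["English", "French", "Spanish"],
    "Genre": ["Prose", "Poetry", "Drama"],
    "Period": ["16th Century", "17th Century", "18th Century", "19th Century"],
}

# Size of the controlled vocabulary: one entry per focus.
vocabulary_size = sum(len(foci) for foci in FACETS.values())

# Descriptions that pick exactly one focus from every facet.
full = list(product(*FACETS.values()))

# Descriptions in which any facet may be left unspecified (None),
# excluding the empty description that specifies nothing.
partial = [
    combo
    for combo in product(*[[None] + foci for foci in FACETS.values()])
    if any(combo)
]

print(vocabulary_size, len(full), len(partial))
```

Ten foci yield 36 fully specified descriptions (3 x 3 x 4) and 79 descriptions when facets may be omitted (4 x 4 x 5, minus the empty one), without any of them being enumerated in advance.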

17 Example: Objects
Red Plastic Glass
Blue Paper Straw

18 IS202 Project Team Facetted Classifications (2004)
Team 007
  Personality: Straw, Glass
  Operation: Drinking, Slurping, Sipping
  Material: Plastic, Paper
  Color: Blue, Red
Team ARTery
  Color, Size, Material, Weight, Shape, Radius/Circumference, Density, Volume/Capacity, Function/Use, Hardness/Softness, Yin/Yang

19 IS202 Project Team Facetted Classifications (2004)
Team Culture Feed
  Color: Red, Blue
  Material: Plastic, Paper
  Use: Drink from, Drink with
  Dimensions: Circumference, Height, Diameter
Team Picture Portal
  Color: Red, Blue
  Material: Paper, Plastic
  Use: Containment, Transport
  Shape: Torus, Planar
  # Holes: 1

20 IS202 Project Team Facetted Classifications (2004)
Team F.U.N.
  Shape
  Color
  Material
  Rigidity
  Function: Container, Conduit
  Locale
  Weight
  Size
Team MNM
  Functionality: What it does, What you can do with it
  Physical Properties: Color, Shape, Material

21 IS202 Project Team Facetted Classifications (2004)
Team pillBox
  Function: Container, Conduit
  Form
    Shape: Cylinder
    Composition: Paper, Plastic
    Color: Blue, Red
    Size: Tall and skinny, Short and fat
Team iTour
  Color: Red, Blue
  State: Solid, Non-porous, Flexible
  Material: Plastic, Paper
  Geometry: Cylindrical, Hollow
  Function: Container, Drinking, Sucking, Blowing

22 Example: Objects
Gray Metal Glass
Two Yellow Plastic Straws

23 Example: Objects
Function: Drinking
Form
  Shape: Cylinder
  Material: Plastic
  Color: Red
  Number: 1
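A facetted description of this kind maps naturally onto a record with one field per facet. The Python sketch below is an assumption-laden illustration: the `Description` class and `select` helper are hypothetical, and only the red plastic object's values come from the slide. The facet values assigned to the other three example objects are guesses based on their names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """One focus per facet, following the slide's facet names."""
    function: str
    shape: str
    material: str
    color: str
    number: int


# Only the first entry's values are given on the slide; the rest are assumed.
ITEMS = {
    "red plastic glass": Description("Drinking", "Cylinder", "Plastic", "Red", 1),
    "blue paper straw": Description("Drinking", "Cylinder", "Paper", "Blue", 1),
    "gray metal glass": Description("Drinking", "Cylinder", "Metal", "Gray", 1),
    "two yellow plastic straws": Description("Drinking", "Cylinder", "Plastic", "Yellow", 2),
}


def select(**criteria):
    """Return the names of items whose facets match every given value."""
    return [
        name
        for name, description in ITEMS.items()
        if all(getattr(description, facet) == value
               for facet, value in criteria.items())
    ]


print(select(material="Plastic"))
print(select(material="Plastic", number=1))
```

Note that Function and Shape take the same value for all four objects under these assumptions, so those two facets alone would not tell a glass from a straw.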

24 Agenda
Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
Thesaurus Design

25 Facetted Classification Design
Collect examples that need to be classified
Identify candidates for facets and subfacets
Test classification scheme on examples for facet orthogonality
Order foci within facets
Explicate grammar for ordering and combining facets and subfacets
Test classification scheme on examples for combinatoric power
Extend foci for comprehensiveness where applicable
Create new facets and subfacets where needed
Test classification scheme on new examples, especially boundary cases
Iterate and refine throughout
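One small part of the orthogonality test can be automated. The Python sketch below is a rough proxy only, under the assumption that a focus appearing under more than one facet signals overlapping facets; the `overlapping_foci` helper and the "Finish" facet are hypothetical. Real orthogonality testing still requires classifying examples by hand.

```python
def overlapping_foci(facets):
    """Return each focus that is listed under more than one facet."""
    seen = {}
    for facet, foci in facets.items():
        for focus in foci:
            seen.setdefault(focus, []).append(facet)
    return {focus: names for focus, names in seen.items() if len(names) > 1}


# A clean draft scheme, using foci from the student examples.
clean = {
    "Function": ["Container", "Conduit"],
    "Material": ["Paper", "Plastic"],
    "Color": ["Blue", "Red"],
}

# A hypothetical flawed draft: "Plastic" is listed under two facets.
flawed = {
    "Material": ["Plastic", "Paper"],
    "Finish": ["Glossy", "Plastic"],
}

print(overlapping_foci(clean))
print(overlapping_foci(flawed))
```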

26 Facet Guidelines
Terms at the same level of the ontology should be of the same type and level of generality
Facets, subfacets, and foci should have a discernible order
Use of capitalization and singular/plural forms should be uniform
Example:
Sports
  Team Sports
    Baseball
    Football
    Basketball
  Solo Sports
    Marathon Running

27 Ordering Foci (“Array”)
Simple to complex (Locomotions: walk, run, jump, skip, hurdle, cartwheel)
Common/popular to uncommon/unpopular (Vegetarian Pizza Toppings: mushroom, onion, olive, artichoke, pineapple, pine nuts)
Spatial, geographical, or geometric (Southwestern States: California, Nevada, Arizona, New Mexico)
Chronological, historical, or evolutionary (Dinosaur Eras: Triassic, Jurassic, Cretaceous)
Canonical (pre-established order) (Playground Counting: Eenie, Meenie, Miney, Mo)
Alphabetical (Boys’ Names: Al, Bob, Chuck, David, Ed, Frank, George, Harry)
Size (T-Shirts: Small, Medium, Large, XL, XXL)
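Most of these orderings cannot be derived from the terms themselves, so the array order has to be recorded explicitly. A minimal Python sketch, assuming a hypothetical `order_foci` helper that falls back to alphabetical order when no canonical order is supplied:

```python
SIZE_ORDER = ["Small", "Medium", "Large", "XL", "XXL"]


def order_foci(foci, canonical=None):
    """Sort foci by a recorded array order, or alphabetically if none is given."""
    if canonical is None:
        return sorted(foci)
    rank = {focus: position for position, focus in enumerate(canonical)}
    return sorted(foci, key=rank.__getitem__)


sizes = ["XL", "Small", "XXL", "Medium", "Large"]
print(order_foci(sizes))              # alphabetical: not the intended order
print(order_foci(sizes, SIZE_ORDER))  # size order from the slide
print(order_foci(["Ed", "Al", "Chuck", "Bob"]))
```

A focus missing from the canonical list raises `KeyError`, which is a deliberate choice here: it flags a term whose position in the array has not yet been decided.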

28 Agenda
Facetted Classification
  Traditional vs. Facetted Classification
  Designing Facetted Classifications
Thesaurus Design (intro)

29 Types of Indexing Languages
Uncontrolled keyword indexing
Indexing languages: controlled, but not structured
Thesauri: controlled and structured
Classification systems: controlled, structured, and coded
Facetted classification systems

30 Thesauri
A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower, and other related terms.
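The definition suggests a simple data structure. The Python sketch below is illustrative only: the `Descriptor` and `Thesaurus` classes and their method names are assumptions, not taken from any thesaurus standard or software. The sports terms come from the facet guideline example; "Athletics" and "Coaching" are hypothetical terms added to show synonym and related-term links.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """A preferred term and its links to other terms."""
    term: str
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)
    used_for: set = field(default_factory=set)  # non-preferred synonyms


class Thesaurus:
    def __init__(self):
        self.descriptors = {}

    def add(self, term):
        """Return the descriptor for a term, creating it if needed."""
        return self.descriptors.setdefault(term, Descriptor(term))

    def link_broader(self, narrow, broad):
        """Record a broader/narrower pair in both directions."""
        self.add(narrow).broader.add(broad)
        self.add(broad).narrower.add(narrow)

    def link_related(self, first, second):
        """Record a symmetric related-term link."""
        self.add(first).related.add(second)
        self.add(second).related.add(first)

    def add_synonym(self, entry_term, preferred):
        """Record a non-preferred term under its preferred descriptor."""
        self.add(preferred).used_for.add(entry_term)


thesaurus = Thesaurus()
thesaurus.link_broader("Team Sports", "Sports")
thesaurus.link_broader("Baseball", "Team Sports")
thesaurus.link_related("Sports", "Coaching")   # hypothetical related term
thesaurus.add_synonym("Athletics", "Sports")   # hypothetical synonym

print(thesaurus.descriptors["Sports"])
```

Storing each link in both directions keeps the reciprocal references consistent, so a broader-term entry can never exist without its matching narrower-term entry.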

31 Thesaurus Standards
National and International Standards for Thesauri
ANSI/NISO Z39.19: American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
ANSI/NISO Draft Standard Z39.4-199x: American National Standard Guidelines for Indexes in Information Retrieval
ISO 2788: Documentation: Guidelines for the establishment and development of monolingual thesauri
ISO 5964: Documentation: Guidelines for the establishment and development of multilingual thesauri

32 Thesaurus Examples
Non-Facetted: The ERIC Thesaurus of Descriptors
Semi-Facetted: The Medical Subject Headings (MESH) of the National Library of Medicine
Facetted: The Art and Architecture Thesaurus

33 ERIC Thesaurus – Entry

34 ERIC Thesaurus – Alphabetic

35 ERIC Thesaurus – KWIC Index

36 ERIC Thesaurus – Hierarchies

37 ERIC Thesaurus – Groups

38 ERIC Thesaurus – Online

39 MESH – Entry

40 MESH – Alphabetic

41 MESH – Tree Structures

42 MESH – KWOC Index

43 MESH – Online http://www.nlm.nih.gov/mesh/meshhome.html

44 AAT – Facets

45 AAT – Hierarchies (print)

46 AAT – Hierarchies (online)

47 AAT – Entry (online)

48 Lecture Overview
Thesaurus Design and Development
Controlled Vocabularies for topical description
Thesaurus Design
Steps in Thesaurus Development (intro)

49 Why Develop a Thesaurus?
To provide a conceptual structure or “space” for a body of information
To make it possible to adequately describe the topical content of information resources at an appropriate level of generality or specificity
To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material)

50 Why Develop a Thesaurus?
To provide vocabulary (or terminological) control
When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the term they start with
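Vocabulary control of this kind amounts to a lookup from any entry term to the single preferred term. A minimal Python sketch, in which the `USE` table, the `preferred` helper, and all of the automobile terms are hypothetical examples rather than entries from a real thesaurus:

```python
# Hypothetical USE references: entry term (normalized) -> preferred term.
USE = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "motor cars": "Automobiles",
    "autos": "Automobiles",
}


def preferred(term):
    """Map whatever term the user starts with to the preferred term.

    Case and extra whitespace are ignored. Returns None when the term
    is not in the controlled vocabulary.
    """
    key = " ".join(term.lower().split())
    return USE.get(key)


for query in ["Cars", "motor  cars", "Automobiles", "boats"]:
    print(query, "->", preferred(query))
```

Indexers and searchers then meet at the same descriptor even when they begin from different words.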

51 Preliminary Considerations
What is used now?
  Continue using an existing thesaurus?
  Ad hoc modification of existing thesaurus?
  Develop a new well-structured thesaurus?
What is the scope and complexity of the subject field?
What kind of retrieval objects or data will be dealt with?
How exhaustive and specific is the desired description of objects?

52 Preliminary Considerations
The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus.
It is better to plan for a larger and more comprehensive system than for a smaller system that will rapidly become inadequate as the database grows.
Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists.

53 Development of a Thesaurus
Term selection
Merging and development of concept classes
Definition of broad subject fields and subfields
Development of classificatory structure
Review, testing, application, revision

54 Flow of Work in Thesaurus Construction
[Flowchart. Based on Soergel, pp]
Select Sources
Assign codes
Select Terms
Record Selected Terms
Sort Terms
Merge identical Terms
Define Broad Subject Fields
Merge Terms in Same Concept class
Sort Terms into Broad Subject Fields
Define Subfields within one Subject Field
Work out detailed structure of the Subject Field
Select Preferred Terms
Decision (Yes/No): All Subfields of Broad Subject finished?
Decision (Yes/No): All Broad Subjects finished?
Improve Class Structure
Print Classified Index and review
Discuss with Experts and Users
Select descriptors and checklist items
Produce Full Thesaurus and Check references
Assign Notation
Review and Test
Decision: Many Modifications?
Revise as needed

