Psychology of Category Structure Facets vs. Hierarchies SIMS 202 Profs. Hearst & Larson UC Berkeley SIMS Fall 2000
Last Time l Symbols and Language l Lexical Relations
Major Lexical Relations l WordNet classifies lexical relations –Synonymy, Polysemy, Metonymy, Hyponymy/Hyperonymy, Meronymy, Antonymy l These are important properties of words l Which of these apply to concepts?
Today l Psychology of Categorization l How to combine attributes to categorize information –Subject Headings vs. Descriptors –Hierarchies vs. Facets
Psychology of Categorization
Category Structure l Defining Category Membership –Necessary and Sufficient Conditions –Properties of Categorization »Characteristic Features »Centrality/Typicality »Basic Level Categories
Defining Category Membership l Necessary and Sufficient Conditions: –Every condition must be met. –No other conditions can be required. »Example: A prime number: l An integer divisible only by itself and 1. Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc. »Example: mother l A woman who has given birth to a child.
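A category defined by necessary and sufficient conditions can be written as a plain yes/no test. The sketch below does this for the prime number example. The function name `is_prime` is illustrative, and the code assumes the modern convention that primes are integers greater than 1.

```python
def is_prime(n: int) -> bool:
    """Classical category test: every condition must hold, and no others are needed."""
    # Assumption: modern convention, so 0, 1 and negative integers are excluded.
    if n < 2:
        return False
    # n is prime when no integer d with 2 <= d <= sqrt(n) divides it.
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


print([n for n in range(1, 20) if is_prime(n)])
# → [2, 3, 5, 7, 11, 13, 17, 19]
```

The test returns only True or False: under this kind of definition there is no notion of one member being a better example than another.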
Can category membership be defined? What are the necessary and sufficient conditions for something to be a game?
Definition of Game l Famous example by Wittgenstein –Classic categories assume clear boundaries defined by common properties (necessary and sufficient conditions) l Counterexample: “Game” –No common properties shared by all games »card games, ball games, Olympic games, children’s games »not all involve competition: ring-around-the-rosie »not all involve skill: dice games »not all involve luck: chess –No fixed boundary; can be extended to new games »video games l Alternative: Concepts related by Family Resemblances
Properties of Categorization l Family Resemblance –Members of a category may be related to one another without all members having any property in common. »Instead, they may share a large subset of traits. »Some attributes are more likely given that others have been seen. –Example: feathers, wings, twittering,... »Likely to be a bird, but not all features apply to “emu” »Unlikely to see an association with “barks”
Properties of Categorization l Centrality –Example: Prime Numbers »Definition: An integer greater than 1 divisible only by itself and 1 »Examples: 2, 3, 5, 7, 11, 13, 17, … –A very clear-cut category. Or is it? »Can one number be “more prime” than another? –Centrality: some members of a category may be “better examples” than others. »Example: robins vs. chickens vs. emus
Properties of Categorization l Characteristic Features –Perceived degree of category membership has to do with which features define the category. –Members usually do not have ALL the necessary features, but have some subset. –Those members that have more of the central features are seen as more central members. –People have conceptions of typical members.
Testing for Centrality/Typicality l Ask a series of questions, compare how long it takes people to answer. –True or false: »An apple is a fruit. »A plum is a fruit. »A coconut is a fruit. »An olive is a fruit. »A tomato is a fruit. –Rosch and Mervis: »The more features a fruit shares with the other fruits, the more typical a member of the class it is.
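The idea that typicality grows with the number of shared features can be sketched as a simple score. This is a minimal illustration, not Rosch and Mervis's actual procedure: the feature lists are invented for the example, and `family_resemblance` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical feature lists, invented for illustration only.
fruits = {
    "apple":   {"sweet", "grows_on_tree", "eaten_raw", "round"},
    "plum":    {"sweet", "grows_on_tree", "eaten_raw", "round", "has_pit"},
    "coconut": {"grows_on_tree", "round", "hard_shell"},
    "olive":   {"grows_on_tree", "has_pit", "salty"},
    "tomato":  {"eaten_raw", "round", "used_in_salad"},
}


def family_resemblance(members):
    """Score each member by how many OTHER members share each of its features."""
    counts = Counter(f for feats in members.values() for f in feats)
    return {
        name: sum(counts[f] - 1 for f in feats)
        for name, feats in members.items()
    }


scores = family_resemblance(fruits)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
# → ['plum', 'apple', 'coconut', 'tomato', 'olive']
```

With these made-up features, the members that share the most features with the rest of the category rank as most typical, and the ranking is graded rather than all-or-nothing.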
Characteristic Features –Is a cat on a mat a cat? –Is a dead cat a cat? –Is a photo of a cat a cat? –Is a cat with three legs a cat? –Is a cat that barks a cat? –Is a cat with a dog’s brain a cat? –Is a cat with every cell replaced by a dog’s cells a cat?
Properties of Categorization l Basic-level Categories: –Categories are organized into a hierarchy from the most general to the most specific, but the level that is most cognitively basic is “in the middle” of the hierarchy l Basic-level Primacy: –Basic-level categories are functionally primary with respect to factors including ease of cognitive processing (learning, reasoning, recognition, etc).
Basic Level Categories l Brown 1958, 1965; Berlin et al. 1972, 1973 l Folk biology: –unique beginner: plant, animal –life form: tree, bush, flower –generic name: pine, oak, maple, elm –specific name: Ponderosa pine, white pine –varietal name: western Ponderosa pine l No overlap between levels l Level 3 (generic name) is basic –corresponds to genus
Characteristics of Basic-level Categories: Language –People name things more readily at basic level. –Name learned earliest in childhood. –Languages have simpler names at basic level. –Sounds like the “real name”. –Name used more frequently. »Strange to call a dime a coin, a metal object –Names used in neutral context. »There’s a dog on the porch. »There’s a terrier on the porch.
Characteristics of Basic-level Categories: Concepts –Things perceived more holistically at the basic level (rather than by parts). –People interact with basic and more specific levels similarly. –Things are remembered more readily at basic level. –Folk biological categories correspond accurately to scientific biological categories only at the basic level.
Three Psychologically Primary Levels l SUPERORDINATE: animal, furniture l BASIC LEVEL: dog, chair l SUBORDINATE: terrier, rocker l Children take longer to learn superordinate categories l Superordinate categories are not associated with mental images or motor actions l How are these levels related to –Hyponymy –Hyperonymy
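One way to think about the question is to store the three levels as "is a kind of" links and walk them. The sketch below is an assumption-laden toy: `PARENT`, `hyperonyms` and `is_hyponym_of` are hypothetical names, and the data is just the examples from the slide.

```python
# Each key "is a kind of" its value (a hyponym pointing at its hyperonym).
PARENT = {
    "terrier": "dog",      # subordinate -> basic level
    "dog": "animal",       # basic level -> superordinate
    "rocker": "chair",
    "chair": "furniture",
}


def hyperonyms(term):
    """Return the chain of increasingly general terms above `term`."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_hyponym_of(specific, general):
    """True when `general` appears somewhere above `specific`."""
    return general in hyperonyms(specific)


print(hyperonyms("terrier"))
# → ['dog', 'animal']
```

Note that the links themselves treat every level alike. Nothing in this structure marks "dog" or "chair" as cognitively basic; that information would have to be added separately.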
Categories vs. Words l Necessary and Sufficient conditions for Mother? »mother(A,B) -> female(A), gave-birth-to(A,B), same-species(A,B), … l What about: »Birth mother vs. adoptive mother »Rearing role vs. biological role »Surrogate mother »Cloning l Need to distinguish between the word used and the underlying concept(s) it stands for.
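The rule for mother can be sketched as a predicate to see where it breaks. This is only an illustration: the `Person` class, the names, and the `gave_birth` set are invented, and the conditions elided by "…" in the rule are simply left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    female: bool
    species: str = "human"


def is_mother(a, b, gave_birth):
    """Classical rule: female(A), gave-birth-to(A,B), same-species(A,B)."""
    return a.female and (a.name, b.name) in gave_birth and a.species == b.species


# Hypothetical people: Ann gave birth to Cy, Bea adopted and raised Cy.
ann = Person("Ann", female=True)
bea = Person("Bea", female=True)
cy = Person("Cy", female=False)
gave_birth = {("Ann", "Cy")}

print(is_mother(ann, cy, gave_birth))  # → True
print(is_mother(bea, cy, gave_birth))  # → False
```

The rule rejects Bea even though everyday usage would call her Cy's mother, which is the gap between the word and the underlying concepts that the slide points to.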
Summary –Processes of categorization underlie many of the issues having to do with information organization –Categorization is messier than our computer systems would like –Human categories have graded membership, consisting of family resemblances. »Family resemblance is expressed in part by which subset of features are shared »It is also determined by underlying understandings of the world that do not get represented in most systems –Basic level categories, as well as subordinate and superordinate categories, seem to be cognitively real.
Hierarchical vs. Faceted (Subject Heading vs. Descriptor) Category Systems
Controlled Vocabulary (The following slides follow Bates 88) l Start with the text of the document l Attempt to “control” or regularize: –The concepts expressed within »mutually exclusive »exhaustive –The language used to express those concepts »limit the normal linguistic variations »regulate word order and structure of phrases »reduce the number of synonyms or near-synonyms l Also, provide cross-references between concepts and their expression.
Classification Schemes l Classify possible concepts. l Goals: –Completely distinct conceptual categories (mutually exclusive) –Complete coverage of conceptual categories (exhaustive)
Assigning Headings vs. Descriptors l Subject headings –assign one (or a few) complex heading(s) to the document l Descriptors –Mix and match: assign several simple terms that can be combined l How would we describe recipes using each technique?
Subject Heading vs. Descriptor l WILSONLINE (subject headings) –Athletes –Athletes -- Health & Hygiene –Athletes -- Nutrition –Athletes -- Physical Exams –… –Athletics –Athletics -- Administration –Athletics -- Equipment -- Catalogs –… –Sports -- Accidents and injuries –Sports -- Accidents and injuries -- prevention l ERIC (descriptors) –Athletes –Athletic Coaches –Athletic Equipment –Athletic Fields –Athletics –… –Sports psychology –Sportsmanship
Subject Headings vs. Descriptors l Subject headings –Describe the contents of an entire document –Designed to be looked up in an alphabetical index »Look up document under its heading –Few (1-5) headings per document l Descriptors –Describe one concept within a document –Designed to be used in Boolean searching »Combine to describe the desired document –Many (5-25) descriptors per document
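Boolean searching over descriptors can be sketched with an inverted index and set operations. The documents and their tags below are invented for illustration, and `search_and` / `search_or` are hypothetical helper names, not part of any real retrieval system.

```python
from collections import defaultdict

# Hypothetical documents tagged with descriptor-style terms.
docs = {
    "doc1": {"Athletes", "Athletic Equipment"},
    "doc2": {"Athletes", "Sports psychology"},
    "doc3": {"Athletic Coaches", "Sportsmanship", "Sports psychology"},
}

# Inverted index: descriptor -> set of documents that carry it.
index = defaultdict(set)
for doc_id, descriptors in docs.items():
    for descriptor in descriptors:
        index[descriptor].add(doc_id)


def search_and(*terms):
    """Documents carrying ALL of the given descriptors."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(*terms):
    """Documents carrying AT LEAST ONE of the given descriptors."""
    return set().union(*(index.get(t, set()) for t in terms))


print(search_and("Athletes", "Sports psychology"))
# → {'doc2'}
```

Each descriptor names a single concept, so the searcher builds the description of the desired document at query time by combining terms, rather than looking it up under one precomposed heading.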
Assigning Headings vs. Descriptors How would we create a cookbook using each technique?
Hierarchical Classification –Each category is successively broken down into smaller and smaller subdivisions –No item occurs in more than one subdivision –Each level divided out by a “character of division”. Also known as a feature. »Example: distinguish Literature based on: l Language l Genre l Time Period
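Hierarchical Classification [Figure: tree diagram. Literature divides by language into English, French, and Spanish; each language divides by genre into Prose, Poetry, and Drama; each genre divides by period into 16th, 17th, 18th, 19th, ...]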
Labeled Categories for Hierarchical Classification l LITERATURE –100 English Literature »110 English Prose l 111 English Prose 16th Century l 112 English Prose 17th Century l 113 English Prose 18th Century l... »120 English Poetry l 121 English Poetry 16th Century l 122 English Poetry 17th Century l... »130 English Drama l 131 English Drama 16th Century l … –200 French Literature
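The successive subdivision can be sketched by applying the characters of division in a fixed order. This is a toy construction: the lists reuse the example values for language, genre, and period, and `build_hierarchy` is a hypothetical name. It deliberately assigns no class numbers.

```python
LANGUAGES = ["English", "French", "Spanish"]
GENRES = ["Prose", "Poetry", "Drama"]
PERIODS = ["16th Century", "17th Century", "18th Century", "19th Century"]


def build_hierarchy():
    """Divide Literature by language, then genre, then period, in that fixed order."""
    tree = {}
    for lang in LANGUAGES:
        tree[f"{lang} Literature"] = {
            f"{lang} {genre}": [f"{lang} {genre} {period}" for period in PERIODS]
            for genre in GENRES
        }
    return tree


tree = build_hierarchy()
leaves = [
    leaf
    for genres in tree.values()
    for periods in genres.values()
    for leaf in periods
]
print(len(leaves))
# → 36
```

Every combination has to be enumerated ahead of time (3 x 3 x 4 = 36 leaves here), each leaf sits under exactly one parent, and because language is divided first there is no single node that gathers, say, all 19th-century drama across languages.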
Faceted Classification l Create a separate, free-standing list for each characteristic of division (feature). l Combine features to create a classification.
Faceted Classification along with Labeled Categories l A Language –a English –b French –c Spanish l B Genre –a Prose –b Poetry –c Drama l C Period –a 16th Century –b 17th Century –c 18th Century –d 19th Century l Aa English Literature l AaBa English Prose l AaBaCa English Prose 16th Century l AbBbCd French Poetry 19th Century l BcCd Drama 19th Century
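The facet notation can be sketched as a small decoder. The `FACETS` table copies the three lists from the slide; the functions `decode` and `describe`, and the assumption that a code is a sequence of two-letter pairs (facet letter, then value letter), are illustrative choices.

```python
# Facet letter -> (facet name, value letter -> value), copied from the lists above.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(code):
    """Split a code such as 'AaBaCa' into facet/value pairs and look each one up."""
    if len(code) % 2:
        raise ValueError("code must consist of facet/value letter pairs")
    result = {}
    for i in range(0, len(code), 2):
        facet, value = code[i], code[i + 1]
        name, values = FACETS[facet]
        result[name] = values[value]
    return result


def describe(code):
    """Join the decoded values into a readable label."""
    return " ".join(decode(code).values())


print(describe("AaBaCa"))  # → English Prose 16th Century
print(describe("AbBbCd"))  # → French Poetry 19th Century
print(decode("BcCd"))      # → {'Genre': 'Drama', 'Period': '19th Century'}
```

Only 3 + 3 + 4 = 10 entries are stored, and any facet can be left out: `BcCd` omits the language facet and so covers 19th-century drama in every language.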
Important Question: How to use both types of classification structures? l How to look through them? l How to use them in search?