Published by Laura Harrison. Modified over 9 years ago.
1
Copyright © 2006 Access Innovations, Inc.
Building Taxonomies, Part 4
Alice Redmond-Neal, Access Innovations, Inc.
Enterprise Search Summit, New York City, May 21, 2006
2
Evaluating terms
  Do terms represent all necessary concepts?
    – Gap analysis
  Do terms capture necessary details?
    – Level of granularity
  Are terms understood by users?
    – Domain expert vs. common user
3
Talk about terms
  Term format
  Grammatical issues
  Singular and plural forms
  Spelling
  Abbreviations and acronyms
  Capitalization
  Other punctuation
  Consistency
4
Term format
  KISS: Keep it short and simple
    – 1-2-3 words
  Effect on search
  Factoring, Postcoordination (coming)
  Grammatical issues
    – Nouns and noun phrases
    – Verbish things
    – Adjectives
    – Adverbs
    – Initial articles
5
Most terms are nouns
  Nouns or simple noun phrases (phrase = compound or bound term)
    – Adj + Noun: Art history (ANSI/NISO standard)
    – Noun + Prep + Noun: History of art (ISO standard)
    – Exceptions: Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.
6
Other parts of speech
  Verbs
    – Gerund form: Fishing
  Adjectives
    – Not used in isolation
    – Very rare (lots in Art & Architecture Thesaurus)
    – OK when combined with another term: Dental bridges
  Adverbs
    – No, except as part of proper name: Very Large Array
  Articles
    – No, except as part of proper name: El Salvador, Le Mans
7
Singular and plural forms
  Plural form for count nouns
    – “how many” clouds, animals, highways
  Singular form for mass nouns
    – “how much” security, oxygen, rain
  Exceptions
    – Body parts in medicine singular (heart, foot)
    – Unique entities singular (Brooklyn Bridge)
    – User warrant plural/singular (fishes)
  stocks? fishes? monies?
8
Term spelling
  Preferred spelling depends on audience
    – Multinational company may need alternative spellings in same taxonomy
  Use most widely accepted spelling
  Use secondary spelling as NonPreferred Term (synonym)
  Exception:
    – Proper names: Labour Party
9
Abbreviations and acronyms
  Use only when full form is rarely seen: SCUBA, LASER, DNA, LASIK
  Use full form if abbreviation is not widely used and understood
    – Automated teller machines (for ATM)
    – Driving while intoxicated (for DWI)
  Alternative becomes NonPreferred Term
  Use and acceptance always shifting
  Be consistent
10
Capitalization
  Standards: use all lower case
    – Exceptions:
        Initialisms: DNA
        Proper names: Queen Mary
        Trade names: Thesaurus Master™
        Taxonomic names: Homo sapiens
  Much variation in practice
11
Parentheses
  Use only for
    – Parenthetical qualifiers to disambiguate homographs
        Bridges (Dentistry), Bridges (Roadways), Bridges (Music)
    – Different meanings for singular / plural word forms
        Bridges [all the above] vs. Bridge (Card game)
        Wood (Material) vs. Woods (Forest)
        Damage (Injury) vs. Damages (Law)
    – Facet indicators: Paint (by finish)
    – Part of the term: benzo(a)pyrene
    – Trademark indicator: (tm) becomes ™
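The qualifier convention on this slide can be checked mechanically. What follows is a minimal sketch, not part of the original presentation: the names `QualifiedTerm`, `split_qualifier`, and `group_homographs` are hypothetical, and the sketch assumes a qualifier is always a trailing parenthetical preceded by a space. Under that assumption, embedded parentheses such as those in benzo(a)pyrene are left untouched, while a facet indicator such as Paint (by finish) is parsed the same way as a qualifier.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class QualifiedTerm(NamedTuple):
    """A term split into its base form and optional parenthetical qualifier."""
    base: str
    qualifier: Optional[str]


# Assumption: a qualifier is a trailing "(...)" preceded by whitespace.
# Parentheses inside a term, as in "benzo(a)pyrene", do not match.
_QUALIFIER = re.compile(r"^(?P<base>.+?)\s+\((?P<qualifier>[^()]+)\)$")


def split_qualifier(term: str) -> QualifiedTerm:
    """Separate 'Bridges (Dentistry)' into ('Bridges', 'Dentistry')."""
    cleaned = term.strip()
    match = _QUALIFIER.match(cleaned)
    if match is None:
        return QualifiedTerm(cleaned, None)
    return QualifiedTerm(match.group("base"), match.group("qualifier"))


def group_homographs(terms: Iterable[str]) -> Dict[str, List[Optional[str]]]:
    """Group terms by base form so homographs and their qualifiers sit together."""
    groups: Dict[str, List[Optional[str]]] = defaultdict(list)
    for term in terms:
        parsed = split_qualifier(term)
        groups[parsed.base].append(parsed.qualifier)
    return dict(groups)
```

A base form that maps to more than one qualifier is a homograph set; a base form that appears both with and without a qualifier may deserve an editor's attention.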
12
Hyphens
  Generally avoid: nonfiction
  Use only if
    – Omitting the hyphen would be ambiguous
        cocitation vs. co-occurrence
    – The hyphen is part of the term
        n-body problem
        p-benzoquinone
        CD-ROM
13
Other punctuation bits
  Apostrophes
    – Keep for possessive case
  Diacritical marks
    – Keep if possible: Québec
  Other random marks
    – Keep if part of a proper name: A&W Root Beer, Standard & Poor’s
14
Compound terms (aka bound terms) and factored terms
  Term consisting of more than one word that represents a single concept
  Keep compound term or factor out (split)?
15
Compound terms are precoordinated
  Elements are bound together to specify a concept at the indexing stage
  Can’t change the parts
    Water pollution
    Library science
    Television influence on preschoolers
    Chicken dinner with turnips and rutabagas: no substitutions of menu items!
16
Factored terms can be postcoordinated
  Elements can be strung together to specify a concept at the search stage
  Elements can be mixed and combined as needed
    – Few clothing pieces → several outfits
  The sum of the elements reflects the concept (usually)
17
To factor or not to factor
  Is each factor a single concept?
  Is each factor in your thesaurus?
  If YES, break term down to factors:
    California highway construction → California + Highways + Construction
  If NO, or if factoring would be confusing, retain the compound term
    Children’s television → Television + Children ??
    Science library → Library + Science ??
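The decision rule on this slide can be written down as a small function. This is only a sketch under stated assumptions: `factor_or_keep` and its parameters are hypothetical names, the proposed factors are supplied by an editor rather than computed, and the "factoring would be confusing" judgment is represented by an explicit `keep_whole` set because it is an editorial call that code cannot make.

```python
from typing import AbstractSet, List, Sequence


def factor_or_keep(
    compound: str,
    proposed_factors: Sequence[str],
    thesaurus: AbstractSet[str],
    keep_whole: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return the factors if factoring is allowed, else the compound term.

    compound          the candidate compound term
    proposed_factors  the editor's suggested single-concept factors
    thesaurus         terms already present in the thesaurus
    keep_whole        compounds an editor has judged confusing to split
    """
    # Editorial override: splitting would confuse users, so retain the compound.
    if compound in keep_whole:
        return [compound]
    # Factor only when every proposed factor is already a thesaurus term.
    if proposed_factors and all(f in thesaurus for f in proposed_factors):
        return list(proposed_factors)
    return [compound]


thesaurus = {"California", "Highways", "Construction", "Library", "Science"}

factored = factor_or_keep(
    "California highway construction",
    ["California", "Highways", "Construction"],
    thesaurus,
)
retained = factor_or_keep(
    "Science library",
    ["Library", "Science"],
    thesaurus,
    keep_whole={"Science library"},
)
```

Here `factored` holds the three separate terms, while `retained` holds the compound unchanged even though both of its factors exist in the thesaurus.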
18
Precoordination positives
  User expectations: Rapid transit
    – Occurs commonly in data
    – Splitting would be odd
    – Reflects a single concept for the audience
  Better accuracy: captures specific concepts precisely
  Fewer false drops
  Term information is retained (Related Terms, NonPreferred Terms, Scope Notes, …)
19
Precoordination negatives
  Poorer total recall
  Term proliferation
    – Combinations and permutations increase thesaurus size
  Higher cost
  Limited flexibility in expressing new concepts
20
Postcoordination pros and cons
  Pros
    – Higher recall
    – Lower cost
    – Greater flexibility: enables expression of new concepts through novel combinations
  Cons
    – Lower accuracy, some false drops
        Library science ≠ Library + Science
        Art museums ≠ Art + Museums
  Postcoordination is implicit in most online searches (implied AND between search words)
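The implied AND and the false drops it can produce are easy to see with a toy inverted index. The sketch below is illustrative only: `build_index` and `postcoordinate` are hypothetical names, and the document identifiers and their index terms are invented for the example. Postcoordination is modeled as a set intersection over the postings of each search term.

```python
from typing import Dict, List, Set


def build_index(doc_terms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each index term to the set of documents tagged with it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def postcoordinate(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Combine terms at search time with an implied AND (set intersection)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Invented sample data: doc_a is about library science; doc_b is about the
# library of a science museum; doc_c is about science in general.
factored_docs = {
    "doc_a": ["Library", "Science"],
    "doc_b": ["Library", "Science", "Museums"],
    "doc_c": ["Science"],
}
precoordinated_docs = {
    "doc_a": ["Library science"],
    "doc_b": ["Science museums", "Library"],
    "doc_c": ["Science"],
}

factored_hits = postcoordinate(build_index(factored_docs), "Library", "Science")
precoordinated_hits = build_index(precoordinated_docs).get("Library science", set())
```

With the factored index, the search returns both `doc_a` and `doc_b`; `doc_b` is a false drop. With the precoordinated term, only `doc_a` comes back, at the cost of a larger vocabulary.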
21
About “and”
  Avoid “and” in terms: not a single concept
    Instead of: Children and television
    Factor and postcoordinate
    USE Media influence + Television + Children
  “and” OK when both elements are members of a broader class
    Vessels
      Ships and boats
  Your need for granularity may dictate your choice
22
So far you’ve got
  Hierarchy
  Complete term records
    – Broader and Narrower Terms
        Polyhierarchies when needed
    – Preferred/NonPreferred Terms (equivalence relationships)
    – Related Terms (associative relationships)
    – Scope Notes
    – Correct term format
    – Compound terms when needed
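A term record with the relationships listed on this slide might be represented as follows. This is a sketch, not the data model of any particular thesaurus tool: `TermRecord`, its field names, and `build_use_lookup` are hypothetical, and the case-insensitive matching is an assumption. The sample entry reuses the earlier ATM example, where the full form is preferred and the abbreviation is the NonPreferred Term.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class TermRecord:
    """One thesaurus entry with its hierarchical and other relationships."""
    preferred: str
    broader: List[str] = field(default_factory=list)        # more than one = polyhierarchy
    narrower: List[str] = field(default_factory=list)
    non_preferred: List[str] = field(default_factory=list)  # equivalence relationships
    related: List[str] = field(default_factory=list)        # associative relationships
    scope_note: str = ""


def build_use_lookup(records: Iterable[TermRecord]) -> Dict[str, str]:
    """Map every preferred and NonPreferred form (lower-cased) to its preferred term."""
    lookup: Dict[str, str] = {}
    for record in records:
        lookup[record.preferred.lower()] = record.preferred
        for variant in record.non_preferred:
            lookup[variant.lower()] = record.preferred
    return lookup


records = [
    TermRecord(
        preferred="Automated teller machines",
        non_preferred=["ATM"],
    ),
    TermRecord(
        preferred="Driving while intoxicated",
        non_preferred=["DWI"],
    ),
]
use_lookup = build_use_lookup(records)
```

Looking up `use_lookup["atm"]` resolves the abbreviation to the full preferred form, which is the behavior a USE reference provides to an indexer or searcher.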
23
Notation
  Symbols (numbers, letters, hyphens, colons…)
    – 1: Apples
        1.1: Granny Smith
        1.2: Winesap
  Another kind of ordering (non-alphabetic)
    – Chronological, positional, numeric sequence, or other logical sequence for user group
    – Same terms presented differently
    – Different user groups, different purposes
  Adjunct to verbal expression of term
  Secondary to verbal concept organization
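Ordering terms by notation rather than alphabetically takes a little care, because a plain string sort would place 1.10 before 1.2. The sketch below assumes purely numeric, dot-separated notations like those on the slide; `notation_key` and `sort_by_notation` are hypothetical names, and the 1.10 entry (Fuji) is invented to show the difference.

```python
from typing import Dict, List, Tuple


def notation_key(notation: str) -> Tuple[int, ...]:
    """Turn '1.10' into (1, 10) so segments compare as numbers, not text."""
    return tuple(int(part) for part in notation.split("."))


def sort_by_notation(terms: Dict[str, str]) -> List[str]:
    """Return term labels in notation order (parents before their children)."""
    return [terms[notation] for notation in sorted(terms, key=notation_key)]


terms = {
    "1.2": "Winesap",
    "1": "Apples",
    "1.10": "Fuji",          # invented entry, not from the slide
    "1.1": "Granny Smith",
}
ordered = sort_by_notation(terms)
```

Notations that mix letters, hyphens, or colons would need a different key function; this one raises `ValueError` on any non-numeric segment.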
24
Automatic taxonomy construction
  Words and phrases from documents
  Based on frequency and co-occurrence of words
  No semantic analysis
  Produces list of possible terms
  Requires editorial analysis
    – hierarchical and conceptual organization
    – association of related concepts
    – identifying and deduplicating equivalent concepts
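To make "frequency and co-occurrence" concrete, here is a deliberately simple sketch of how a candidate list could be produced. It is not the method of any specific product: `tokenize` and `candidate_terms` are hypothetical names, the tokenizer keeps only ASCII letters, no stop words are removed, and co-occurrence is counted once per document for each unordered word pair. As the slide says, there is no semantic analysis here, so the output is only raw material for an editor.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(
    documents: Iterable[str], min_count: int = 2
) -> Tuple[Dict[str, int], Dict[Tuple[str, str], int]]:
    """Return frequent words and frequently co-occurring word pairs.

    Word counts are total occurrences across all documents.
    Pair counts are the number of documents containing both words.
    """
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for text in documents:
        words = tokenize(text)
        word_counts.update(words)
        # Sorting makes each unordered pair appear in one canonical order.
        for pair in combinations(sorted(set(words)), 2):
            pair_counts[pair] += 1
    frequent_words = {w: c for w, c in word_counts.items() if c >= min_count}
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_count}
    return frequent_words, frequent_pairs


documents = [
    "Water pollution rules",
    "Water pollution data",
    "Library science",
]
frequent_words, frequent_pairs = candidate_terms(documents)
```

On this tiny sample, the words water and pollution and the pair (pollution, water) pass the threshold; deciding that they form the single concept Water pollution, and where it sits in the hierarchy, remains editorial work.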
25
Review, edit, test, edit, use, edit, and maintain, i.e. edit
  Review
    – Users
    – Expert reviewers
  Test
    – Index 500+ documents (more for variable writing style; fewer for strict style)
    – Monitor search log
  Edit and maintain
    – Add term
    – Change existing term
    – Change term status
    – Delete term
    – Add term relationship
    – Delete term relationship
    – Add/modify Scope Note
    – Change overall structure
  Consider machine automated / assisted indexing software