Copyright © 2006 Access Innovations, Inc. 1 Building Taxonomies Alice Redmond-Neal Access Innovations, Inc. Enterprise Search Summit New York City, May.

Presentation transcript:

Copyright © 2006 Access Innovations, Inc. 1 Building Taxonomies Alice Redmond-Neal Access Innovations, Inc. Enterprise Search Summit New York City, May 21, 2006

Copyright © 2006 Access Innovations, Inc. 2 So what’s a taxonomy? Words – controlled vocabulary Used as labels for indexing – descriptive metadata Attached to documents, digital objects, or physical objects Organized to aid retrieval – hierarchical structure –Hierarchical presentation of a thesaurus

Copyright © 2006 Access Innovations, Inc. 3 Perspectives on taxonomies Taxonomist (aka Lexicographer, Thesaurus builder) Information architect Indexer Searcher Each has a different view and need for words in retrieving information. Each need relates to using a taxonomy for indexing.

Copyright © 2006 Access Innovations, Inc. 4 Taxonomies for information retrieval online Conceptual framework for web content – reflects organization of knowledge in a domain Foundation for information architecture Often 3 levels deep – depends on domain May be hidden or displayed

Copyright © 2006 Access Innovations, Inc. 5 Info retrieval starts with a knowledge organization system Uncontrolled list Name authority file Synonym set/ring Controlled vocabulary Taxonomy Thesaurus Ontology Semantic network Not complex Highly complex LOTS OF OVERLAP!

Copyright © 2006 Access Innovations, Inc. 6 Structure of controlled vocabularies List of words Synonyms Taxonomy Thesaurus Ambiguity control Ambiguity control Ambiguity cont’l Synonym control Synonym control Synonym cont’l Hierarchical rel’s Hierarchical rel’s Associative rel’s INCREASING COMPLEXITY

Copyright © 2006 Access Innovations, Inc. 7 Controlled vocabulary construction standards ANSI (American National Standards Institute) NISO (National Information Standards Organization) ISO (International Organization for Standardization) BS (British Standards Institution) Differences are minor and diminishing. ANSI/NISO Z revision approved.

Copyright © 2006 Access Innovations, Inc. 8 Taxonomy defined – ANSI/NISO Z “A controlled vocabulary consisting of preferred terms all of which are connected in a hierarchy or polyhierarchy.” Missing: equivalence, homographic, and associative relationships and notes – features of a THESAURUS.

Copyright © 2006 Access Innovations, Inc. 9 Taxonomy as an organization system Controlled vocabulary Hierarchical format –Parent-child relationships Specific items appear as final leaves on hierarchy branches Common on websites –Pick list –Browsable directory –Other variations

Copyright © 2006 Access Innovations, Inc. 10 Thesaurus as an organization system Controlled vocabulary Focus on conceptual classes, not specifics Hierarchy – implicit if not displayed –Parent-child relationships Various display formats may be available Network of relationships between terms helps user to find information –Cousins, friends, aliases Scope notes, term history More elaborate and informative Long established standards

Copyright © 2006 Access Innovations, Inc. 11 Thesaurus defined – ANSI/NISO Z , “A controlled vocabulary of terms in natural language that are designed for postcoordination...” “Terms are arranged…so that various relationships are displayed clearly…” “The controlled vocabulary is established by information specialists or lexicographers and is generally employed in indexing.”

Copyright © 2006 Access Innovations, Inc. 12 Thesaurus defined – ANSI/NISO Z “A controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified by standardized relationship indicators, which must be employed reciprocally. Its purposes are to promote consistency in the indexing of content objects, especially for postcoordinated information storage and retrieval systems, and to facilitate browsing and searching by linking entry terms with terms. Thesauri may also facilitate the retrieval of content objects in free text searching.”

Copyright © 2006 Access Innovations, Inc. 13 Standards and pragmatism Standards are your friends –Lead to richer, more informative product –Promote interoperability -- Allow you to adopt or adapt other controlled vocabularies –Promote predictability –Allow repurposing within your organization and by other organizations Follow standards for taxonomy building –Incorporate authority files / final nodes as needed Your taxonomy or thesaurus must meet your needs

Copyright © 2006 Access Innovations, Inc. 14 Your taxonomy / thesaurus end product Reflects –scope of your concern –degree of precision you need Facilitates –data storage and retrieval by vocabulary control –discovery of ideas Promotes learning –preferred terminology –relationships among concepts –organized guide to your field

Copyright © 2006 Access Innovations, Inc. 15 Talk about terms and taxonomies How to choose terms How to ensure term clarity, avoid ambiguity –Vocabulary control—why and how How to format terms Terms within a taxonomy—the big picture

Copyright © 2006 Access Innovations, Inc. 16 How do you choose terms? Importance in the subject area Use in the literature, by the organization or community Necessary degree of specificity or detail Relationship with other controlled vocabularies

Copyright © 2006 Access Innovations, Inc. 17 Vocabulary control – why? “The need for vocabulary control arises from two basic features of natural language, namely: two or more words or terms can be used to represent a single concept, and two or more words that have the same spelling can represent different concepts.” ANSI/NISO Z

Copyright © 2006 Access Innovations, Inc. 18 Vocabulary control through disambiguation Synonyms – de-duplicate meanings Multiple words for the same concept –President of the United States, POTUS –Biological technology, Biotech Homographs (polysemes) – eliminate ambiguity Same written word used for multiple meanings –Balloon—which kind?, Box—which kind? –Cells, Mercury, Records, Bridge/Bridges, Bush

Copyright © 2006 Access Innovations, Inc. 19 Vocabulary control – how? Organize terms to show which of two or more synonymous terms is preferred or authorized for use to distinguish between homographs to indicate hierarchical and associative relationships among terms

Copyright © 2006 Access Innovations, Inc. 20 Vocabulary control – in practice Use unambiguous terms, clear to the user group Distinguish between terms that appear similar Use Scope Notes when necessary Use terms as elements that can be coordinated in a flexible manner Create compound terms (noun+modifier) when necessary

Copyright © 2006 Access Innovations, Inc. 21 One term / one concept “Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard) “Each descriptor included in a thesaurus should represent a single concept (or unit of thought). …frequently expressed by a single-word term but in many cases a multiword term is required.” (ANSI/NISO Z )

Copyright © 2006 Access Innovations, Inc. 22 A “term” synonym ring Term Node Subject heading Category Descriptor

Copyright © 2006 Access Innovations, Inc. 23 So what’s a concept? “A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.” Three main categories –Abstract concepts –Concrete entities –Proper nouns

Copyright © 2006 Access Innovations, Inc. 24 Concrete entities as terms Things and their physical parts –primates head –buildings floors Materials –cement –wood –lead

Copyright © 2006 Access Innovations, Inc. 25 Abstract concepts as terms Actions and events –evolution, skating, management, ceremonies Abstract entities –law, theory Properties of things, materials, and actions –strength, efficiency Disciplines and sciences –physics, meteorology, mathematics Units of measurement –pounds, kilograms, miles, meters, nanoseconds

Copyright © 2006 Access Innovations, Inc. 26 Proper nouns as terms Individual entities – “classes of one” – expressed as proper nouns –San Francisco, Lake Michigan Thesaurus standards prefer to exclude proper names, persons, and trade names. Extensive lists → authority files. Taxonomies include them as final nodes.

Copyright © 2006 Access Innovations, Inc. 27 Pop quiz – which qualify as terms? rooms living rooms living room furniture “single unit of thought” schools public schools public school curricula marketing and advertising societal issues information ethics, plagiarism, credibility information literacy, lifelong learning

Copyright © 2006 Access Innovations, Inc. 28 The term record Main Term (MT) Top Term (TT) Broader Terms (BT) Narrower Terms (NT) Related Terms (RT) –See also (SA) Scope Note (SN) History (H) NonPreferred Term (NP) –Used for (UF), See (S) see Lexicographer’s lexicon = subject term, heading, node, category, descriptor, class TAXONOMY THESAURUS
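
The term record on this slide maps naturally onto a small data structure. The following is a minimal sketch, assuming Python; the class name, field names, and the `has_thesaurus_features` helper are hypothetical choices for illustration, not part of any standard or of the presenter's software. The sample terms are taken from other slides in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One term record; field names are illustrative, not from any standard."""
    main_term: str                                          # MT
    top_term: str = ""                                      # TT
    broader: list[str] = field(default_factory=list)        # BT
    narrower: list[str] = field(default_factory=list)       # NT
    related: list[str] = field(default_factory=list)        # RT / SA
    scope_note: str = ""                                    # SN
    history: str = ""                                       # H
    non_preferred: list[str] = field(default_factory=list)  # NP / UF

    def has_thesaurus_features(self) -> bool:
        """True when the record goes beyond plain hierarchy (RT, SN, or UF)."""
        return bool(self.related or self.scope_note or self.non_preferred)


# A hierarchy-only record (taxonomy level) and one with an equivalence entry.
elections = TermRecord(
    main_term="Elections",
    top_term="Politics",
    broader=["Politics"],
    narrower=["Gubernatorial elections", "Mayoral elections", "Presidential elections"],
)
spiders = TermRecord(main_term="Spiders", non_preferred=["Arachnids"])
```

A record that carries only MT, TT, BT, and NT is enough for a taxonomy; adding RT, SN, or UF entries moves it toward a thesaurus record.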

Copyright © 2006 Access Innovations, Inc. 29 Build a taxonomy – simple steps Get paper and pencil –Sharpen pencil Define subject field Collect terms Organize terms Fill in gaps Flesh out and interrelate terms You’re done!

Copyright © 2006 Access Innovations, Inc. 30 Define subject field Review representative collection of content Determine: –Core areas –Peripheral topics Psychology Education Sociology Law Scope can be modified later

Copyright © 2006 Access Innovations, Inc. 31 Before you go on: Build or buy? Survey existing thesaurus/taxonomy resources for your domain Test for –Scope –Depth Make-or-break terms –Cost Don’t reinvent the wheel!

Copyright © 2006 Access Innovations, Inc. 32 Collect terms Your documents and databases Departmental terminology Text books and their indexes (indices) Book tables of contents and indexes Journal quarterly indexes Encyclopediae Lexicons, glossaries on the topic Web resources Users and experts Search logs

Copyright © 2006 Access Innovations, Inc. 33 Gather terms from search logs Beyond the Spider: The Accidental Thesaurus (Richard Wiggins, Information Today, Oct 2002) Top ~100 search terms from search logs Match to web site with appropriate answer Basis for favorites or best bets, presented at the top of results list. (AKA behavior-based taxonomy) Not a thesaurus or taxonomy, but still a useful source of terms.
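
Pulling the top search terms from a log can be sketched in a few lines. This is only an illustration, assuming the log has already been reduced to one query string per entry; the function name `top_queries` and the sample queries are made up for the example.

```python
from collections import Counter


def top_queries(log_lines, n=100):
    """Count normalized queries and return the n most frequent as (query, count)."""
    counts = Counter(q.strip().lower() for q in log_lines if q.strip())
    return counts.most_common(n)


# Hypothetical log extract: case and stray whitespace vary, one entry is blank.
sample_log = ["parking", "Parking ", "library hours", "parking", "",
              "library hours", "tuition"]
best_bet_candidates = top_queries(sample_log, n=2)
```

The resulting list is a source of candidate terms and best bets; deciding which queries are synonyms of one another still needs an editor.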

Copyright © 2006 Access Innovations, Inc. 34 Organize terms – roughly Sort terms into several major categories – logical groups of similar concepts as Top Terms –Identify core areas and peripheral topics –10 – 20 to start –Consider moving proper names to authority files Result: loose collection of terms under several main headings –Rough and tentative – see how it fits as you go –Initial gap analysis –Add / modify / delete as needed

Copyright © 2006 Access Innovations, Inc. 35 Labelling a concept – cognitive linguistics Most-used labels are middle in range from abstract to specific --- relates to search Linguistic universal – true across cultures Unique beginner Life form Generic Specific Varietal Insurance Health insurance Group health insurance Practical application?

Copyright © 2006 Access Innovations, Inc. 36 Craft the Top Terms Toughest job and most important step! Dictates further organization Determines how browsers/searchers perceive the taxonomy –Coverage –Formality Establish the concept first, tweak the wording later

Copyright © 2006 Access Innovations, Inc. 37 Usefulness of a term – the “duh” factor Some terms are so basic for a domain that they have little or no value –“Sports” in Sports Illustrated –“Technology” in Technology Review –“Golf” in Golf Magazine How useful will the term be for indexing? –Apply to everything in the domain? –Distinguish important concepts? –If term is needed, specify limited use conditions in Scope Note

Copyright © 2006 Access Innovations, Inc. 38 Hierarchy structures – variations on a theme Not pre-determined –Wines → type → variety → region → cost –Or Wines → cost → type…. Varies by user group and needs –May have multiple views of same content –Standard alpha view or customized notation Affects information architecture, i.e. how web site functions

Copyright © 2006 Access Innovations, Inc. 39 How do terms relate? Hierarchical relationships -- Parents and their children Equivalence relationships -- Aliases Associative relationships -- Cousins TAXONOMY THESAURUS

Copyright © 2006 Access Innovations, Inc. 40 Hierarchical relationships Broader Term represents the category Narrower Term represents the specific Three types: –Generic relationship (BTG/NTG) –Whole-part relationship (BTP/NTP) –Instance relationship (BTI/NTI) BTs/NTs have a reciprocal relationship
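
The reciprocity of BT/NT links can be enforced by never storing one direction without the other. A minimal sketch, assuming an in-memory store; the `Hierarchy` class and its method names are hypothetical, and the example terms come from the elections slide.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links so that each link is always recorded in both directions."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its Broader Terms
        self.narrower = defaultdict(set)  # term -> its Narrower Terms

    def add_bt(self, term, broader_term):
        # Recording the BT also records the reciprocal NT.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        """All terms reachable by following BT links upward."""
        seen = set()
        stack = [term]
        while stack:
            for bt in self.broader.get(stack.pop(), ()):
                if bt not in seen:
                    seen.add(bt)
                    stack.append(bt)
        return seen


h = Hierarchy()
h.add_bt("Elections", "Politics")
h.add_bt("Presidential elections", "Elections")
h.add_bt("Mayoral elections", "Elections")
```

Because a term may be given more than one Broader Term, the same structure also accommodates polyhierarchy.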

Copyright © 2006 Access Innovations, Inc. 41 Broader to Narrower Terms Politics → Elections → Gubernatorial elections, Presidential elections, Mayoral elections Levels shown: Generic, Specific, Varietal

Copyright © 2006 Access Innovations, Inc. 42 Hierarchy – Generic (genus-species) relationship Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs) Applies to entities, actions, properties, agents – not just biological taxonomies Examples (BT: NTs) –Value: Cultural value, Economic value, Moral value, Social value –Teachers: Adult educators, School teachers, Special ed teachers, Student teachers –Thinking: Contemplation, Divergent thinking, Lateral thinking, Reasoning

Copyright © 2006 Access Innovations, Inc. 43 Generic relationship test – 1 Both terms in same fundamental category “All-and-some” test –Rodents / Squirrels: SOME rodents are squirrels, ALL squirrels are rodents –Pests / Squirrels: SOME pests are squirrels, NOT ALL squirrels are pests

Copyright © 2006 Access Innovations, Inc. 44 Generic relationship test – 2 Terms: Pests, Squirrels, Rodents Passes: ALL squirrels are rodents Fails (x): NOT ALL squirrels are pests Fails (x): NOT ALL pests are rodents
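
The all-and-some test can be expressed with sets, treating each term as the set of things it covers. This is a sketch for intuition only: the function name and the member lists are invented, and in practice the test is an editorial judgment rather than a computation.

```python
def all_and_some(parent_members, child_members):
    """Generic (BT/NT) test: SOME parents are children and ALL children are parents."""
    some = bool(parent_members & child_members)
    all_children_included = child_members <= parent_members
    return some and all_children_included


# Hypothetical membership lists, chosen to mirror the slide's conclusions.
rodents = {"grey squirrel", "red squirrel", "house mouse", "beaver"}
squirrels = {"grey squirrel", "red squirrel"}
pests = {"grey squirrel", "house mouse", "cockroach"}

squirrels_under_rodents = all_and_some(rodents, squirrels)  # passes
squirrels_under_pests = all_and_some(pests, squirrels)      # fails: not all squirrels are pests
pests_under_rodents = all_and_some(rodents, pests)          # fails: not all pests are rodents
```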

Copyright © 2006 Access Innovations, Inc. 45 Hierarchy – Whole-part relationship Also known as meronymy or partonomy Four types allowed in thesaurus standards –Body systems and organs: Ear → Middle ear –Geographical locations: Bernalillo County → Albuquerque –Fields of study: Geology → Physical geology –Hierarchical organizational/corporate/social/political structures: Diocese → Parish

Copyright © 2006 Access Innovations, Inc. 46 Hierarchy – Instance relationship General category (common noun) = BT Individual example (proper noun) = NT –Seas: Baltic Sea, Caspian Sea, Mediterranean Sea –New York museums: Guggenheim Museum, Museum of Modern Art, Museum of Natural History Essentially identical to “final node” in taxonomies. Best practice: long list → move to authority file

Copyright © 2006 Access Innovations, Inc. 47 Polyhierarchical relationship Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT) New to ANSI/NISO standards –Sporks (BTs: Spoons, Forks) –Nurse administrators (BTs: Nurses, Health administrators) –Accounting (BTs: Finance, Careers)

Copyright © 2006 Access Innovations, Inc. 48 Equivalence relationship Preferred Term –Thesaurus term and valid for indexing –Thesaurus notation: USE NonPreferred Term –Not valid for indexing –An alias or imposter –Entry point, directs user to Preferred Term –Thesaurus notation: UF or NPT Examples –Spiders UF Arachnids –Plant pathology USE Phytopathology

Copyright © 2006 Access Innovations, Inc. 49 Equivalence – when to use Synonyms, slang, quasi-synonyms Scientific and trade names –Ibuprofen UF Motrin™ Lexical variants –Fiber optics UF Fibre optics –Mouse UF Mice Upward posting of narrow concepts not specified in taxonomy or thesaurus –Social class UF Elite, Middle class, Working class Get equivalent terms from search logs, brainstorming…
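
Equivalence relationships amount to a lookup from any entry term to its Preferred Term. The sketch below assumes a simple case-insensitive dictionary; the names `USE`, `add_uf`, and `resolve` are hypothetical, and the term pairs are the ones shown on the equivalence slides.

```python
USE = {}  # entry term (lowercased) -> Preferred Term


def add_uf(preferred, *non_preferred):
    """Register a Preferred Term and the NonPreferred Terms it is 'used for'."""
    USE[preferred.lower()] = preferred
    for alias in non_preferred:
        USE[alias.lower()] = preferred


def resolve(entry):
    """Return the Preferred Term for an entry term, or None if it is unknown."""
    return USE.get(entry.strip().lower())


add_uf("Spiders", "Arachnids")
add_uf("Phytopathology", "Plant pathology")
add_uf("Ibuprofen", "Motrin")
add_uf("Fiber optics", "Fibre optics")
add_uf("Social class", "Elite", "Middle class", "Working class")  # upward posting
```

Only the value returned by `resolve` would be valid for indexing; the aliases serve as entry points.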

Copyright © 2006 Access Innovations, Inc. 50 Associative relationship Related Terms (RTs) ~ cousins “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms) –Should siblings be Related Terms?? Both terms are valid thesaurus terms for indexing, and have reciprocal relationship Expands user’s awareness, reflects thesaurus coverage of unanticipated areas Standards describe specific types (see Lexicon)

Copyright © 2006 Access Innovations, Inc. 51 Sibling rivalry and facets Format and sense of sibling terms should be consistent If siblings don’t coexist well, separate them Subdivide large groups of terms into facets, mutually exclusive subcategories Growing demand with faceted navigation Facet examples –Properties, Materials, Agents, Actions, Influence –Objects, Styles and periods, Color, Shape (Art & Architecture Thesaurus)

Copyright © 2006 Access Innovations, Inc. 52 Faceted classification Pharmaceuticals –(by action) Anti-inflammatory agents… –(by chemical structure) Alkaloids… –(by indication) Pain… –(by use) Immunosuppression… Facet indicators (aka Node labels), not to be used for indexing

Copyright © 2006 Access Innovations, Inc. 53 Faceting challenge Paint –Oil paint –High-gloss paint –Interior paint –Matte paint –Latex paint –Semi-gloss paint –Exterior paint Propose facet indicators and subgroup these paint varieties into facets.

Copyright © 2006 Access Innovations, Inc. 54 Scope Notes (SN) Indicate meaning of the term in the context of this thesaurus, for this audience –Stress – Metal, Psychological, Physiological Indicate any restriction in meaning Indicate range of topics covered Provide direction for indexers; for terms often confused, may suggest an alternative term Use only as needed – not for every term Establish and stick with consistent format Be concise

Copyright © 2006 Access Innovations, Inc. 55 Evaluating terms Do terms represent all necessary concepts? –Gap analysis Do terms capture necessary details? –Level of granularity Are terms understood by users? –Domain expert vs. common user

Copyright © 2006 Access Innovations, Inc. 56 Talk about terms Term format Grammatical issues Singular and plural forms Spelling Abbreviations and acronyms Capitalization Other punctuation Consistency

Copyright © 2006 Access Innovations, Inc. 57 Term format KISS – Keep it short and simple –1-2-3 words Effect on search Factoring, Postcoordination (coming) Grammatical issues –Nouns and noun phrases –Verbish things –Adjectives –Adverbs –Initial articles

Copyright © 2006 Access Innovations, Inc. 58 Most terms are nouns Nouns or simple noun phrases (phrase = compound or bound term) –Adj + Noun – Art history (ANSI/NISO standard) Noun + Prep + Noun – History of art (ISO standard) –Exceptions – Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.

Copyright © 2006 Access Innovations, Inc. 59 Other parts of speech Verbs –Gerund form: Fishing Adjectives –Not used in isolation –Very rare (lots in Art & Architecture Thesaurus) –OK when combined with another term – Dental bridges Adverbs –No, except as part of proper name – Very Large Array Articles –No, except as part of proper name – El Salvador, Le Mans

Copyright © 2006 Access Innovations, Inc. 60 Singular and plural forms Plural form for count nouns –“how many” clouds, animals, highways Singular form for mass nouns –“how much” security, oxygen, rain Exceptions –Body parts in medicine → singular (heart, foot) –Unique entities → singular (Brooklyn Bridge) –User warrant → plural/singular (fishes) stocks? fishes? monies?

Copyright © 2006 Access Innovations, Inc. 61 Term spelling Preferred spelling depends on audience –Multinational company may need alternative spellings in same taxonomy Use most widely accepted spelling Use secondary spelling as NonPreferred Term (synonym) Exception: –Proper names – Labour Party

Copyright © 2006 Access Innovations, Inc. 62 Abbreviations and acronyms Use only when full form is rarely seen – SCUBA, LASER, DNA, LASIK Use full form if abbreviation is not widely used and understood –Automated teller machines – for ATM –Driving while intoxicated – for DWI Alternative becomes NonPreferred Term Use and acceptance always shifting Be consistent

Copyright © 2006 Access Innovations, Inc. 63 Capitalization Standards: use all lower case –Exceptions: Initialisms – DNA; Proper names – Queen Mary; Trade names – Thesaurus Master™; Taxonomic names – Homo sapiens Much variation in practice

Copyright © 2006 Access Innovations, Inc. 64 Parentheses Use only for –Parenthetical qualifiers to disambiguate homographs: Bridges (Dentistry), Bridges (Roadways), Bridges (Music) –Different meanings for singular / plural word forms: Bridges [all the above] vs. Bridge (Card game); Wood (Material) vs. Woods (Forest); Damage (Injury) vs. Damages (Law) –Facet indicators – Paint (by finish) –Part of the term – benzo(a)pyrene –Trademark indicator (tm) becomes ™
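
Separating a term from its trailing parenthetical qualifier is a simple parsing task. A minimal sketch, assuming the qualifier is always the final parenthesized group; the regular expression and the function name `split_qualifier` are illustrative choices, not taken from the standards.

```python
import re

# A term followed by one trailing "(qualifier)"; parentheses inside the term
# itself, as in benzo(a)pyrene, do not match because the label must end with ")".
QUALIFIED = re.compile(r"^(?P<term>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def split_qualifier(label):
    """Return (term, qualifier); qualifier is None when the label has none."""
    match = QUALIFIED.match(label)
    if match:
        return match.group("term"), match.group("qualifier")
    return label, None


dentistry = split_qualifier("Bridges (Dentistry)")
card_game = split_qualifier("Bridge (Card game)")
chemical = split_qualifier("benzo(a)pyrene")
```

Note that a facet indicator such as Paint (by finish) has the same shape, so telling qualifiers from facet indicators would need an extra rule.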

Copyright © 2006 Access Innovations, Inc. 65 Hyphens Generally avoid: nonfiction Use only if –Omitting the hyphen would be ambiguous: cocitation vs. co-occurrence –The hyphen is part of the term: n-body problem, p-benzoquinone, CD-ROM

Copyright © 2006 Access Innovations, Inc. 66 Other punctuation bits Apostrophes –Keep for possessive case Diacritical marks –Keep if possible – Québec Other random marks –Keep if part of a proper name – A&W Root Beer, Standard & Poor’s

Copyright © 2006 Access Innovations, Inc. 67 Compound terms (aka bound terms) and factored terms Term consisting of more than one word that represents a single concept Keep compound term or factor out (split)?

Copyright © 2006 Access Innovations, Inc. 68 Compound terms are precoordinated Elements are bound together to specify a concept at the indexing stage Can’t change the parts Water pollution Library science Television influence on preschoolers Chicken dinner with turnips and rutabagas- no substitutions of menu items!

Copyright © 2006 Access Innovations, Inc. 69 Factored terms can be Postcoordinated Elements can be strung together to specify a concept at the search stage Elements can be mixed and combined as needed –Few clothing pieces → several outfits The sum of the elements reflects the concept (usually)

Copyright © 2006 Access Innovations, Inc. 70 To factor or not to factor Is each factor a single concept? Is each factor in your thesaurus? If YES, break term down to factors: California highway construction → California + Highways + Construction If NO, or if factoring would be confusing, retain the compound term Children’s television → Television + Children ?? Science library → Library + Science ??

Copyright © 2006 Access Innovations, Inc. 71 Precoordination positives User expectations – Rapid transit –Occurs commonly in data –Splitting would be odd –Reflects a single concept for the audience Better accuracy – captures specific concepts precisely Fewer false drops Term information is retained (Related Terms, NonPreferred Terms, Scope Notes, …)

Copyright © 2006 Access Innovations, Inc. 72 Precoordination negatives Poorer total recall Term proliferation –Combinations and permutations increase thesaurus size Higher cost Limited flexibility in expressing new concepts

Copyright © 2006 Access Innovations, Inc. 73 Postcoordination pros and cons Higher recall Lower cost Greater flexibility – enables expression of new concepts through novel combinations x Lower accuracy, some false drops –Library science NOT = Library + Science –Art museums NOT = Art + Museums Postcoordination is implicit in most online searches (implied AND between search words)
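
The false-drop problem can be shown with a toy index. This is a sketch under stated assumptions: two invented documents, one about the discipline of library science and one about a science library, indexed once with compound terms and once with factored terms. The `search` function stands in for the implied AND of an online search.

```python
def search(index, *terms):
    """Return ids of documents indexed with ALL of the given terms (implied AND)."""
    wanted = set(terms)
    return {doc_id for doc_id, doc_terms in index.items() if wanted <= doc_terms}


# Hypothetical documents: doc1 is about the discipline, doc2 about a kind of library.
precoordinated = {
    "doc1": {"Library science"},
    "doc2": {"Science library"},
}
postcoordinated = {
    "doc1": {"Library", "Science"},
    "doc2": {"Library", "Science"},
}

precise_hits = search(precoordinated, "Library science")
broad_hits = search(postcoordinated, "Library", "Science")
```

With compound terms the search for the discipline returns only doc1; with factored terms it returns both documents, which is higher recall at the price of a false drop.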

Copyright © 2006 Access Innovations, Inc. 74 About “and” Avoid “and” in terms – not a single concept Instead of: Children and television Factor and postcoordinate USE Media influence + Television + Children “and” OK when both elements are members of a broader class Vessels Ships and boats Your need for granularity may dictate your choice

Copyright © 2006 Access Innovations, Inc. 75 So far you’ve got Hierarchy Complete term records –Broader and Narrower Terms Polyhierarchies when needed –Preferred/NonPreferred Terms (equivalence relationships) –Related Terms (associative relationships) –Scope Notes –Correct term format –Compound terms when needed

Copyright © 2006 Access Innovations, Inc. 76 Notation Symbols (numbers, letters, hyphens, colons…) –1: Apples 1.1: Granny Smith 1.2: Winesap Another kind of ordering (non-alphabetic) –Chronological, positional, numeric sequence, or other logical sequence for user group –Same terms presented differently –Different user groups, different purposes Adjunct to verbal expression of term Secondary to verbal concept organization
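
Notation gives a second ordering of the same terms. A minimal sketch, assuming dotted numeric notation like the apples example; the `notation_key` helper is hypothetical, and the "1.10: Fuji" entry is an invented addition to show why the parts must be compared as numbers rather than as text.

```python
def notation_key(notation):
    """Turn '1.10' into (1, 10) so that parts compare numerically, not as strings."""
    return tuple(int(part) for part in notation.split("."))


terms_by_notation = {
    "1.2": "Winesap",
    "1": "Apples",
    "1.10": "Fuji",          # invented entry, not from the slide
    "1.1": "Granny Smith",
}

notational_view = [terms_by_notation[n] for n in sorted(terms_by_notation, key=notation_key)]
alphabetical_view = sorted(terms_by_notation.values())
```

The two views contain the same terms; only the presentation order differs.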

Copyright © 2006 Access Innovations, Inc. 77 Review, edit, test, edit, use, edit, and maintain, i.e. edit Review –Users –Expert reviewers Test –Index 500+ documents (more for variable writing style; fewer for strict style) –Monitor search log Edit and maintain –Add term –Change existing term –Change term status –Delete term –Add term relationship –Delete term relationship –Add/modify Scope Note –Change overall structure Consider machine automated / assisted indexing software

Copyright © 2006 Access Innovations, Inc. 78 Automatic taxonomy construction Words and phrases from documents Based on frequency and co-occurrence of words No semantic analysis Produces list of possible terms Requires editorial analysis –hierarchical and conceptual organization –association of related concepts –identifying and deduplicating equivalent concepts
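
A frequency and co-occurrence pass of the kind described here can be sketched as follows. This is an illustration only, not how any particular product works: the stopword list, the `min_count` threshold, and the sample sentences are all assumptions, and counts are per document rather than per occurrence.

```python
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on"}


def candidate_terms(documents, min_count=2):
    """Return (frequent words, frequent co-occurring word pairs) by document count."""
    word_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        # Unique non-stopword tokens in this document.
        words = {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}
        word_counts.update(words)
        pair_counts.update(combinations(sorted(words), 2))
    frequent = {w: c for w, c in word_counts.items() if c >= min_count}
    pairs = {p: c for p, c in pair_counts.items() if c >= min_count}
    return frequent, pairs


docs = [
    "Water pollution in rivers",
    "Water pollution and highway construction",
    "Highway construction costs",
]
frequent_words, frequent_pairs = candidate_terms(docs)
```

The output is only a list of possible terms. Nothing in it says that "water" and "pollution" form one concept or where that concept belongs in a hierarchy, which is the editorial work the slide refers to.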

Copyright © 2006 Access Innovations, Inc. 79 Show ‘em what you’ve got – displays for every user Thesaurus/taxonomy views and functions depend on audience and purpose –taxonomists –indexers –corporate workers –public searchers

Copyright © 2006 Access Innovations, Inc. 80 For the taxonomist Hierarchy view Alphabetic view Permuted (KWIC) view Single term record view Graphical view Notational view Deleted terms Candidate terms Retrieve term record Find term in hierarchy view Taxonomists NEED MOST and WANT even MORE!
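
Of the views listed, the permuted view is the easiest to generate mechanically: each term is filed under every word it contains. A simplified sketch, assuming whitespace-separated words and no stopword handling; the function name is hypothetical and the terms come from the insurance example earlier in the deck.

```python
def permuted_index(terms):
    """Return sorted (keyword, term) pairs: each term filed under each of its words."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.lower(), term))
    return sorted(entries)


index = permuted_index(["Insurance", "Health insurance", "Group health insurance"])
filed_under_health = [term for keyword, term in index if keyword == "health"]
```

A taxonomist looking under "health" then finds both Health insurance and Group health insurance, even though neither would sort under H and G together in a plain alphabetical list.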

Hierarchy Alphabetical Permuted (KWIC) Term record

Notation view

Copyright © 2006 Access Innovations, Inc. 83 For the indexer Search to retrieve term record Access to Scope Notes, Related Terms, NonPreferred Terms Hierarchy view for the big picture Automated proposal of indexing terms

Copyright © 2006 Access Innovations, Inc. 85 For the searcher Browsable directory (Yahoo.com, MediaSleuth.com) Faceted navigation (MOMA.org, LandsEnd.com) Alpha term list or terms grouped by letter Drop down list with selected terms Portal view – complete or partial taxonomy –Display terms may be identical to taxonomy terms –Display terms may be variants, mapped to taxonomy terms Taxonomy may not be accessible – requires random guessing

Display taxonomy categories Results from sample of 1,100 documents (not all categories are populated)

Copyright © 2006 Access Innovations, Inc. 87 Reveal Narrower Terms

Copyright © 2006 Access Innovations, Inc. 88 Select taxonomy category to display titles

Copyright © 2006 Access Innovations, Inc. 89 Access full bibliographic record

Copyright © 2006 Access Innovations, Inc. 90 Faceted navigation

Copyright © 2006 Access Innovations, Inc. 91 SLA website and thesaurus

Copyright © 2006 Access Innovations, Inc. 92 SLA search

Copyright © 2006 Access Innovations, Inc. 93 Search query: THESAURUS Precision search based on M.A.I. indexing: 3 hits Free text, no indexing → 0 hits Concept indexing – effect on retrieval

Copyright © 2006 Access Innovations, Inc. 95 Leverage taxonomy term information to aid search Search: kangaroo Broader Terms Narrower Terms Related Terms Use (synonyms)
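
Using term information to aid search usually means expanding the user's query with terms from the matching record. The sketch below is hypothetical throughout: the slide shows only the query "kangaroo" and the relationship types, so the record contents (Marsupials, Red kangaroos, Wallabies, roos) and the `expand` function are invented for illustration.

```python
# Invented record; only the query word "kangaroo" appears on the slide.
THESAURUS = {
    "Kangaroos": {
        "BT": ["Marsupials"],
        "NT": ["Red kangaroos"],
        "RT": ["Wallabies"],
        "UF": ["kangaroo", "roos"],
    },
}


def expand(query, thesaurus, relations=("NT",)):
    """Map a query to its Preferred Term and add terms from the chosen relations."""
    q = query.strip().lower()
    for preferred, record in thesaurus.items():
        entry_terms = [preferred] + record.get("UF", [])
        if q in (t.lower() for t in entry_terms):
            expanded = [preferred]
            for rel in relations:
                expanded.extend(record.get(rel, []))
            return expanded
    return [query]  # no match: fall back to the query as typed


narrow_expansion = expand("kangaroo", THESAURUS)
wide_expansion = expand("kangaroo", THESAURUS, relations=("BT", "NT", "RT"))
```

Expanding through Narrower Terms tends to keep results on topic, while adding Broader and Related Terms widens the search; which relations to follow is a design choice for the search interface.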

Copyright © 2006 Access Innovations, Inc. 96 Indexing rule Term record

Copyright © 2006 Access Innovations, Inc. 97 What we’ve covered Taxonomy – from different perspectives Collecting and organizing concepts Term choice and vocabulary control Taxonomy structure Term relationships Term format Factored and compound terms Constructing a simple taxonomy Display variations for different users

Copyright © 2006 Access Innovations, Inc. 98 “The Computer and the Poet” “The biggest single need in computer technology is not for improved circuitry, or enlarged capacity, or prolonged memory, or miniaturized containers, but for better questions and better use of answers.” Norman Cousins, editorial in The Saturday Review, July 23, 1966 special issue on “The New Computer Age” Through taxonomies, effectively applied through indexing, we aim to efficiently connect the questions and the answers.

Copyright © 2006 Access Innovations, Inc. 99 Thanks for your attention! Alice Redmond-Neal Access Innovations, Inc. Data Harmony software Questions? Comments?