Published by Arthur Harvey. Modified over 8 years ago.
1
INLS 520 – Fall 2007 Erik Mitchell INLS 520 Information Organization
2
Review Last week –Types of categorization & classification structures Classification –Definitions –Look at Library classification systems for Dewey & Library of Congress
3
Today Controlled vocabularies –Types –Basic concepts Related technologies –Metadata standards –Example Systems Knowledge organization systems –Term Lists, Thesauri, Taxonomies, Ontologies
4
Concepts & definitions Controlled Vocabularies –“organized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.” (Warner via Leise, Fast) –“the primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval” (ANSI Z39.19) Knowledge organization systems –“tools that present the organized interpretation of knowledge structures” (Hjørland) –“classification schemes that organize materials at a general level…, subject headings that provide more detailed access, and authority files that control variant versions of key information” (Hodge) –“It depends on what the meaning of the words 'is' is.” (Clinton)
5
Uses of controlled vocabulary (1) Define scope, content, and context of information Navigation, breadcrumbs Map to user terminology Enhance browsing, searching Term consistency and relationships
6
Functions of a CV Removes ambiguity –Synonyms, homonyms, polysemes Defines relationships –Equivalence, hierarchical, associative (BT, NT, RT, CR), reciprocity Provides context –Category, scope, qualifiers, modifiers, scope notes
7
Types of Controlled Vocabularies Term Lists –Glossaries, Dictionaries, Gazetteers, Folksonomies Synonym rings –Z39.19 example –Oracle Text Taxonomies –Website navigation scheme Thesauri / Ontologies –Authority files, subject thesauri, topic maps
8
A conceptual map http://www.taxotips.com/
9
CV Concepts Content Analysis –Ambiguity –Synonymy –Exhaustivity –Specificity –Co-extensivity –Aboutness –Semantic structure –Warrant (User, Literary, Organization) Form Analysis –Linguistics –Grammar –Semiotics –Single / Multiple terms Indexing & Retrieval –Pre vs. Post Coordinate –Recall vs. Precision –Natural language processing (NLP)
10
Content Analysis (1) Ambiguity –Each term should relate to a single concept Synonymy –Each concept should be identified by a single entry Specificity –Using the most specific word or phrase expressing the subject Exhaustivity –The extent to which the entire document is indexed (Summarization, depth) Co-extensivity –“Assign as many terms as needed to bring out the main theme, and according to guidelines sub-themes.” (p. 29, Lancaster) –“nothing more, nothing less” Semantic Structure –Terms can be related with equivalence, hierarchical, or associative relationships (Use, See, NT, BT, RT)
11
Content Analysis (2) Aboutness = Subject/topic? –Wilson (1968) Author intent, topicality, relationship to other resources, textual analysis –Fairthorne (1969) Intentional aboutness (author), extensional aboutness (document) –Maron (1977) Objective about (document), subjective about (user), and retrieval about (information retrieval) –Hjørland (2001) “Closely related to theories of meaning, interpretation, and epistemology”
12
Content Analysis (3) Wilson’s criteria for evaluating aboutness (1968) –Identify author’s purpose (intent) –Weigh the predominant topics, elements (topical analysis) –Group/count a document’s use of concepts and references (bibliometrics) –Identify essential elements (text analysis)
13
Content Analysis (4) Literary Warrant –“The inclusion of a vocabulary term in a controlled vocabulary based on its appearance in one or more content items. For example, a medical text may use the term ‘oncology.’ Based on literary warrant, that term would be included in the controlled vocabulary even though the general public uses the term ‘cancer.’” (Glosso-Thesaurus) User Warrant –“The inclusion of a vocabulary term in a controlled vocabulary based on use by users. Such terms can be identified through search log analysis or free listing.” (Glosso-Thesaurus) Organizational Warrant –“Justification for the...selection of a preferred term due to the characteristics and context of the organization using the resource” (ANSI Z39.19)
14
Form Analysis –Linguistics Syntax/Form (grammar) Morphology (internal word structure) Semantics (meaning) Pragmatics, discourse analysis (word/phrase use) –Semiotics Study of signs/symbols –Lexical structure Document layout, markup, tags (think DOM)
15
Indexing & Retrieval Pre/Post-Coordinate Pre-coordinate: organization prior to retrieval Post-coordinate: organization at the point of retrieval Recall / Precision Recall: number of relevant docs retrieved / total number of relevant docs in collection Precision: number of relevant docs retrieved / total number of docs retrieved Natural language processing Uses semantics and syntax to automatically distill ‘aboutness’
16
Recall & Precision
A collection of 100 documents

CV Entry                   # of docs
Controlled Vocabularies    100
Faceted analysis           20
Ontologies                 5
OWL                        1
RDF                        3

Searches
–“Vocabularies”: Recall 100/100 = 1, Precision 100/100 = 1
–“Facet”: Recall 20/100 = .2, Precision 20/28 = .71
–“OWL”: Recall 1/100 = .01, Precision 1/1 = 1

Recall = # of relevant docs retrieved / total # of relevant docs in collection
Precision = # of relevant docs retrieved / total # of docs retrieved
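The two measures can be sketched in a few lines of Python. This is only an illustration using the standard set-based definitions; the function name `recall_precision` and the document ids are made up for the example and do not come from the slides.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: set of document ids the search returned
    relevant:  set of document ids that are actually relevant to the query
    """
    hits = retrieved & relevant  # relevant documents that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: 28 documents retrieved, 20 of them relevant,
# out of 25 relevant documents in the whole collection.
retrieved = set(range(1, 29))                          # ids 1..28
relevant = set(range(1, 21)) | {96, 97, 98, 99, 100}   # 20 found + 5 missed
recall, precision = recall_precision(retrieved, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Widening a search usually raises recall and lowers precision, which is why the two are reported together.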
17
Term List Examples Authority files – Maps to preferred terms –Library of Congress –Encoded Archival Context –Union List of Artist Names Glossaries/Dictionaries –Words & definitions, sometimes topic focused –Glosso-Thesaurus Folksonomies –Contextualization, Trend discovery, Personal Information Synonym rings – Used for back-end equivalence in searching –Princeton WordNet
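A synonym ring used for back-end equivalence might look roughly like the sketch below. The rings, the terms in them, and the `expand_query` helper are all hypothetical; real systems such as WordNet expose much richer structures.

```python
# Each ring is a set of terms treated as equivalent at search time.
# The terms are made-up examples, not taken from any published vocabulary.
SYNONYM_RINGS = [
    {"car", "automobile", "auto"},
    {"heart attack", "myocardial infarction"},
]


def expand_query(term):
    """Return the term plus every other member of its ring, sorted."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return sorted(ring)
    return [term]  # no ring found: search the term as typed


print(expand_query("Heart attack"))
```

Unlike a thesaurus, a ring has no preferred term: every member is simply swapped in for every other member when the query runs.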
18
Thesauri & taxonomy examples List of vocabularies –http://www.slais.ubc.ca/resources/indexing/database1.htm –Taxonomy warehouse Two Examples –Health & Ageing Thesaurus –Thesaurus of Geographic names
19
Interoperable system example NCBI Entrez –35 databases using interoperable controlled vocabulary systems to provide rich meta-searching Cross-database discovery – search for “heart attack” Cross-database linking – search for aconitase, follow the “other links” tab.
20
Vocabulary and Classification systems - exercise Organization structures –Term Lists / Enumerative systems –Hierarchies –Trees –Paradigms –Facets / Associative relationships –Folksonomies Break into groups, discuss & list –Goal –Structure –Issues –Benefits Resources –Kwasnik, Boxes & arrows
21
Hierarchies Features –Inclusiveness –“Is-a” relationship –Inheritance –Transitivity –Systematic –Mutually exclusive –Necessary and sufficient Issues Illusion of completeness Multiple perspectives Lack of comprehensive knowledge Difference in scale Lack of transitivity Strict rules Benefits Comprehensive Economy of notation Inheritance Inference Real definitions Holistic perspective High level view
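Inheritance and transitivity in an "is-a" hierarchy can be shown with a small sketch. The classes, attributes, and function names below are illustrative assumptions, not part of the lecture material.

```python
# child -> parent ("is-a") links; the classes are illustrative only.
IS_A = {
    "beagle": "dog",
    "dog": "mammal",
    "mammal": "animal",
}

# Attributes declared directly on a class.
ATTRIBUTES = {
    "animal": {"alive"},
    "mammal": {"warm-blooded"},
    "dog": {"barks"},
}


def ancestors(term):
    """Walk the is-a chain upward; by transitivity every ancestor is a broader class."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def inherited_attributes(term):
    """A class carries its own attributes plus those of every ancestor."""
    attrs = set(ATTRIBUTES.get(term, set()))
    for parent in ancestors(term):
        attrs |= ATTRIBUTES.get(parent, set())
    return attrs


print(ancestors("beagle"))
print(sorted(inherited_attributes("beagle")))
```

The single-parent dictionary also makes the "mutually exclusive" feature concrete: each class sits in exactly one place.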
22
Trees Features –Hierarchy without inheritance –Varied relationships (beyond is-a) –Partitive relationships Issues Rigidity One-way perspective Selective perspective (single attribute) Benefits –Shows a primary relationship well –Indicates distance between objects –Shows relative frequency
23
Paradigms Features –Horizontal, multi-dimensional –Matrix allows assignment of attributes rather than placement in hierarchy Issues More extensive knowledge required Limited explanatory power Limited overview, navigational abilities Benefits –Naming allows abstraction –Definition/distinction allows assignment of attributes –Matrix allows comparison of attributes –Empty values tell us something
24
Faceted Classification Features –Multi-dimensional –Multi-relationship driven –Triples, object with attribute Issues Lack of obvious relationships Difficult to navigate, visualize Harder to establish facets Benefits Accommodates Partial Knowledge Flexible, Hospitable Expressive Bottom-up, not top-down Multi-theoretical Multi-perspective
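A faceted description can be sketched as a set of independent attribute/value pairs per item. The items, facet names, and the `filter_by_facets` helper are hypothetical and only meant to show the idea.

```python
# Each item is described by independent facets (facet name -> value).
# The facets and items are hypothetical.
ITEMS = {
    "doc1": {"topic": "ontologies", "format": "article", "audience": "researcher"},
    "doc2": {"topic": "ontologies", "format": "tutorial", "audience": "student"},
    "doc3": {"topic": "thesauri", "format": "article"},  # audience not yet known
}


def filter_by_facets(items, **wanted):
    """Return ids of items whose facets match every requested value."""
    return sorted(
        item_id
        for item_id, facets in items.items()
        if all(facets.get(name) == value for name, value in wanted.items())
    )


print(filter_by_facets(ITEMS, format="article"))
print(filter_by_facets(ITEMS, topic="ontologies", audience="student"))
```

`doc3` has no audience facet yet and can still be found by topic or format, which is one way to read "accommodates partial knowledge".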
25
Folksonomy Features –Single level description –Open vocabulary list –User supplied/harvested tags Issues Lack of controlled vocabulary Lack of relationship/hierarchy assignment Lack of definition of intent Benefits Flexible User-Centered Harvestable(?) – for what?
26
Relationships Equivalence (Term Lists) –“use”, “see”, “isVersionOf”, “isFormatOf” Hierarchical (Thesauri, Taxonomies) –Generic – “is a” –Partitive – “is part of”, “has part”, “has conceptual part”, “member of” –Instance – Associative (Facets, Ontologies) –“isReferencedBy”, “isRequiredBy”, “hasDerivative”
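One way to picture equivalence, hierarchical, and associative links together is a small thesaurus structure that keeps every relationship reciprocal. The `Thesaurus` class and the sample terms are assumptions made for this sketch; BT, NT, RT, USE, and UF are the usual thesaurus relationship codes.

```python
from collections import defaultdict

# Reciprocal pairs: recording one side of a relationship implies the other.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> relationship code -> set of target terms
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, term, rel, target):
        """Record term --rel--> target together with the reciprocal link."""
        self.links[term][rel].add(target)
        self.links[target][RECIPROCAL[rel]].add(term)

    def related(self, term, rel):
        """Return the targets of one relationship type, sorted."""
        return sorted(self.links[term][rel])


t = Thesaurus()
t.relate("Thesauri", "BT", "Controlled vocabularies")  # hierarchical
t.relate("Thesauri", "RT", "Ontologies")               # associative
t.relate("Thesaurus", "USE", "Thesauri")               # equivalence

print(t.related("Controlled vocabularies", "NT"))
print(t.related("Thesauri", "UF"))
```

Storing the reciprocal at write time means a browse from either end of a relationship sees the same link.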
27
Choosing a framework Use questions –Who is your user, what are their needs? –What systems are your users familiar with? –Will this system be internal/external? Content questions –How extensive, defined is the information? –Is your subject matter static or fluid? –What organizational framework best describes your content? System Questions –What access are you trying to provide? –What external pressures exist? –What external entities/theories will interact with this system?
28
Interoperability issues Similarity of subject matter in domains Multiple CVs accepted in a domain Specificity/granularity of content indexing Use of synonyms, warrant Intended use, purpose of system
29
Creating a CV (1) Design methods –Re-use existing, start with content & desired use ideas –Committee / community approach Top-down –Concept driven Bottom-up –Document driven –Empirical approach Deductive approach –Select terms, create relationships, perform term control Inductive approach –Establish CV at outset, build hierarchies on an as-needed basis
30
Creating a CV (2) Top-Down –Identify audience –Identify all topics, concepts, uses, and context of the domain –Sort topics identified into an appropriate organization scheme (enumerative, hierarchical, faceted) –Solidify structure and clean up gaps & redundancies –Assign documents to categories, test retrieval Bottom-up –Identify audience –Survey documents for topics/concepts –Build system on the fly – let content drive structure and limits of system –Identify gaps & redundancies in system –Test retrieval
31
Creating a CV (3) Think about scope, use, content, maintenance Gather Terms –Based on existing systems, content –Based on user needs/expectations –Investigate issues of specificity, exhaustivity, granularity Build hierarchies, relationships –Broader/narrower terms, Related terms, Use/Use for, see/see also Establish Rules Implement Evaluate Maintain http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary
32
Evaluating a CV Goals Determine if the CV solves retrieval needs of user/system Determine if CV matches user’s content model/term expectations Methods Expert evaluation of CV User based card sorting compared to actual CV Identification of non-included documents Analysis of use of system - HCI
33
CV Maintenance Primary responsibility –Editor, board, committee New terms –Is it really new or a different view –What is the proper form & placement Modified terms –Include a change log –Use a “USE” reference to point to new term Deleted terms –Unused / Overused terms –May want to keep for historical retrieval purposes Modification history –Use modification notes, date/time stamps
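The maintenance practices above (change log, USE references, date/time stamps) could be modeled along these lines. The `Vocabulary` class, its method names, and the sample terms are invented for illustration and are not a prescribed design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Vocabulary:
    terms: set = field(default_factory=set)
    use_refs: dict = field(default_factory=dict)    # retired term -> preferred term
    change_log: list = field(default_factory=list)  # (timestamp, action, note)

    def _log(self, action, note):
        stamp = datetime.now(timezone.utc).isoformat()
        self.change_log.append((stamp, action, note))

    def add(self, term):
        self.terms.add(term)
        self._log("add", term)

    def rename(self, old, new):
        """Replace a term and leave a USE reference behind for old queries."""
        self.terms.discard(old)
        self.terms.add(new)
        self.use_refs.pop(new, None)  # a reinstated term must not redirect away
        self.use_refs[old] = new
        self._log("modify", f"{old} USE {new}")

    def resolve(self, term):
        """Follow USE references until a current term is reached."""
        while term in self.use_refs:
            term = self.use_refs[term]
        return term


cv = Vocabulary()
cv.add("Folksonomy")
cv.rename("Folksonomy", "Folksonomies")
print(cv.resolve("Folksonomy"))
print(len(cv.change_log))
```

Keeping the retired term as a USE reference rather than deleting it is what preserves historical retrieval: old queries still land on the current term.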
34
Class exercise Protégé overview –Orientation –Object types (Classes, Slots, Instances) –Relationships (hierarchies, associative) Replication of the Glosso-Thesaurus –Visit the Boxes & Arrows Glosso-Thesaurus –Look at the data there and come up with a structure in Protégé that allows replication of the thesaurus –Some issues to consider are: Do you want terms to be classes or instances? What is the easiest way to show the relationships (broader term, narrower term, etc.)? Do you need to allow multiple relationships for a given type (BT, RT, etc.)? If you have multiple classes, at what level should you create the slots?
35
Next Week More on Knowledge organization systems –Taxonomies, Ontologies –More work with Protégé