INLS 520 – Fall 2007 Erik Mitchell INLS 520 Information Organization.


Review
  – Last week
    – Types of categorization & classification structures
  – Classification
    – Definitions
    – A look at the Dewey and Library of Congress library classification systems

Today
  – Controlled vocabularies
    – Types
    – Basic concepts
  – Related technologies
    – Metadata standards
    – Example systems
  – Knowledge organization systems
    – Term lists, thesauri, taxonomies, ontologies

Concepts & definitions
  – Controlled vocabularies
    – “organized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.” (Warner via Leise, Fast)
    – “the primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval” (ANSI Z39.19)
  – Knowledge organization systems
    – “tools that present the organized interpretation of knowledge structures” (Hjørland)
    – “classification schemes that organize materials at a general level…, subject headings that provide more detailed access, and authority files that control variant versions of key information” (Hodge)
    – “It depends on what the meaning of the words 'is' is.” (Clinton)

Uses of controlled vocabulary (1)
  – Define scope, content, and context of information
  – Navigation, breadcrumbs
  – Map to user terminology
  – Enhance browsing, searching
  – Term consistency and relationships

Functions of a CV
  – Removes ambiguity
    – Synonyms, homonyms, polysemes
  – Defines relationships
    – Equivalence, hierarchical, associative (BT, NT, RT, CR)
    – Reciprocity
  – Provides context
    – Category, scope, qualifiers, modifiers, scope notes
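Reciprocity means a relationship recorded in one direction is also recorded in the other: if A has broader term (BT) B, then B has narrower term (NT) A, and related terms (RT) point both ways. The minimal sketch below assumes a hypothetical in-memory `Thesaurus` class (not the API of any particular tool) that enforces this whenever a link is added.

```python
from collections import defaultdict

class Thesaurus:
    """Hypothetical in-memory thesaurus that keeps BT/NT/RT links reciprocal."""

    def __init__(self):
        # relationship type -> term -> set of linked terms
        self.links = {"BT": defaultdict(set), "NT": defaultdict(set), "RT": defaultdict(set)}

    def add_broader(self, term, broader):
        # Recording a BT automatically records the inverse NT.
        self.links["BT"][term].add(broader)
        self.links["NT"][broader].add(term)

    def add_related(self, a, b):
        # RT is symmetric, so it is stored in both directions.
        self.links["RT"][a].add(b)
        self.links["RT"][b].add(a)

    def related(self, term, rel):
        # .get() avoids creating empty entries for unknown terms.
        return sorted(self.links[rel].get(term, set()))

t = Thesaurus()
t.add_broader("Thesauri", "Controlled vocabularies")
t.add_broader("Taxonomies", "Controlled vocabularies")
t.add_related("Thesauri", "Taxonomies")
print(t.related("Controlled vocabularies", "NT"))  # ['Taxonomies', 'Thesauri']
print(t.related("Taxonomies", "RT"))  # ['Thesauri']
```

Deriving the inverse at insertion time means an editor never has to remember to enter both halves of a pair.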

Types of controlled vocabularies
  – Term lists
    – Glossaries, dictionaries, gazetteers, folksonomies
  – Synonym rings
    – Z39.19 example
    – Oracle Text
  – Taxonomies
    – Website navigation scheme
  – Thesauri / ontologies
    – Authority files, subject thesauri, topic maps

A conceptual map [figure]

CV concepts
  – Content analysis
    – Ambiguity
    – Synonymy
    – Exhaustivity
    – Specificity
    – Co-extensivity
    – Aboutness
    – Semantic structure
    – Warrant (user, literary, organizational)
  – Form analysis
    – Linguistics
    – Grammar
    – Semiotics
    – Single / multiple terms
  – Indexing & retrieval
    – Pre- vs. post-coordinate
    – Recall vs. precision
    – Natural language processing (NLP)

Content analysis (1)
  – Ambiguity
    – Each term should relate to a single concept
  – Synonymy
    – Each concept should be identified by a single entry
  – Specificity
    – Use the most specific word or phrase expressing the subject
  – Exhaustivity
    – The extent to which the entire document is indexed (summarization, depth)
  – Co-extensivity
    – “Assign as many terms as needed to bring out the main theme, and according to guidelines sub-themes.” (p. 29, Lancaster)
    – “nothing more, nothing less”
  – Semantic structure
    – Terms can be linked by equivalence, hierarchical, or associative relationships (USE, See, NT, BT, RT)

Content analysis (2)
  – Aboutness = subject/topic?
  – Wilson (1968)
    – Author intent, topicality, relationship to other resources, textual analysis
  – Fairthorne (1969)
    – Intentional aboutness (author), extensional aboutness (document)
  – Maron (1977)
    – Objective about (document), subjective about (user), and retrieval about (information retrieval)
  – Hjørland (2001)
    – “Closely related to theories of meaning, interpretation, and epistemology”

Content analysis (3)
Wilson’s criteria for evaluating aboutness (1968)
  – Identify the author’s purpose (intent)
  – Weigh the predominant topics, elements (topical analysis)
  – Group/count a document’s use of concepts and references (bibliometrics)
  – Identify essential elements (text analysis)

Content analysis (4)
  – Literary warrant
    – “The inclusion of a vocabulary term in a controlled vocabulary based on its appearance in one or more content items. For example, a medical text may use the term ‘oncology.’ Based on literary warrant, that term would be included in the controlled vocabulary even though the general public uses the term ‘cancer.’” (Glosso-Thesaurus)
  – User warrant
    – “The inclusion of a vocabulary term in a controlled vocabulary based on use by users. Such terms can be identified through search log analysis or free listing.” (Glosso-Thesaurus)
  – Organizational warrant
    – “Justification for the...selection of a preferred term due to the characteristics and context of the organization using the resource” (ANSI Z39.19)

Form analysis
  – Linguistics
    – Syntax/form (grammar)
    – Morphology (internal word structure)
    – Semantics (meaning)
    – Pragmatics, discourse analysis (word/phrase use)
  – Semiotics
    – Study of signs/symbols
  – Lexical structure
    – Document layout, markup, tags (think DOM)

Indexing & retrieval
  – Pre-/post-coordinate
    – Pre-coordinate: organization prior to retrieval (concepts combined at indexing time)
    – Post-coordinate: organization at the point of retrieval (concepts combined at search time)
  – Recall / precision
    – Recall: number of relevant docs retrieved / total number of relevant docs in the collection
    – Precision: number of relevant docs retrieved / total number of docs retrieved
  – Natural language processing
    – Uses semantics and syntax to automatically distill ‘aboutness’
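The difference between pre- and post-coordination is where concepts get combined. In this sketch the documents and terms are hypothetical, and the `Libraries--Automation` heading only imitates the style of a subject-heading string. The post-coordinate index stores single concepts and combines them with a Boolean AND at search time; the pre-coordinate index stores headings that were already combined by the indexer.

```python
# Hypothetical documents and index terms, for illustration only.
# Post-coordinate indexing: each document gets independent single-concept terms.
post_index = {
    "doc1": {"Libraries", "Automation"},
    "doc2": {"Libraries", "History"},
    "doc3": {"Museums", "Automation"},
}

# Pre-coordinate indexing: concepts are combined into one heading when indexing.
pre_index = {
    "doc1": {"Libraries--Automation"},
    "doc2": {"Libraries--History"},
    "doc3": {"Museums--Automation"},
}

def post_coordinate_search(index, *terms):
    """Combine concepts at search time with a Boolean AND."""
    wanted = set(terms)
    return sorted(doc for doc, doc_terms in index.items() if wanted <= doc_terms)

def pre_coordinate_search(index, heading):
    """Match a heading whose concepts were combined at indexing time."""
    return sorted(doc for doc, headings in index.items() if heading in headings)

print(post_coordinate_search(post_index, "Libraries", "Automation"))  # ['doc1']
print(post_coordinate_search(post_index, "Automation"))  # ['doc1', 'doc3']
print(pre_coordinate_search(pre_index, "Libraries--Automation"))  # ['doc1']
```

Post-coordination lets the searcher build any combination on the fly; pre-coordination fixes the combinations in advance, so a search for a bare concept such as "Automation" finds nothing unless the system also splits or browses headings.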

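Recall & precision
A collection of 100 documents, indexed with these CV entries:
  CV entry                  # of docs
  Controlled Vocabularies   100
  Faceted analysis          20
  Ontologies                5
  OWL                       1
  RDF                       3
Searches
  – “Vocabularies”: retrieves 100 of 100 docs in the collection (1); all 100 relevant docs are among them
  – “Facet”: retrieves 20 of 100 docs (.2 of the collection); 20 of the 28 relevant docs are among them
  – “OWL”: retrieves 1 of 100 docs (.01 of the collection); the 1 relevant doc is among them
In each search every retrieved doc is relevant, so precision = 1 (100/100, 20/20, 1/1), while recall = 100/100 = 1, 20/28 ≈ .71, and 1/1 = 1.
Recall = # of relevant docs retrieved / total # of relevant docs in collection
Precision = # of relevant docs retrieved / total # of docs retrieved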
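The two formulas can be checked with plain Python sets. This sketch assumes the "Facet" search reading given by the slide's figures: 20 documents retrieved, all of them relevant, out of 28 relevant documents in the collection. The document IDs are made-up integers.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single search."""
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# "Facet" search: 28 relevant docs in the collection, 20 retrieved,
# every retrieved doc relevant. IDs are hypothetical.
relevant = set(range(28))
retrieved = set(range(20))
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.71
```

Both measures share the same numerator (relevant docs retrieved); only the denominator differs, which is why they tend to trade off against each other as a search is broadened or narrowed.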

Term list examples
  – Authority files: map to preferred terms
    – Library of Congress
    – Encoded Archival Context
    – Union List of Artist Names
  – Glossaries/dictionaries
    – Words & definitions, sometimes topic-focused
    – Glosso-Thesaurus
  – Folksonomies
    – Contextualization, trend discovery, personal information
  – Synonym rings: used for back-end equivalence in searching
    – Princeton WordNet
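A synonym ring treats every member as equivalent and names no preferred term, which makes it suited to silent query expansion behind a search box. The rings below are hypothetical examples, not entries taken from WordNet or Oracle Text.

```python
# Hypothetical synonym rings: all terms in a ring are equivalent for searching,
# and none is marked as preferred (unlike an authority file).
SYNONYM_RINGS = [
    {"heart attack", "myocardial infarction", "cardiac infarction"},
    {"car", "automobile", "auto"},
]

def expand_query(query_terms, rings=SYNONYM_RINGS):
    """Back-end equivalence: add every ring-mate of each query term."""
    expanded = set()
    for term in query_terms:
        term = term.lower()
        expanded.add(term)
        for ring in rings:
            if term in ring:
                expanded |= ring
    return expanded

print(sorted(expand_query(["Heart attack"])))
# ['cardiac infarction', 'heart attack', 'myocardial infarction']
```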

Thesauri & taxonomy examples
  – List of vocabularies: database1.htm
  – Taxonomy Warehouse
  – Two examples
    – Health & Ageing Thesaurus
    – Thesaurus of Geographic Names

Interoperable system example
NCBI Entrez
  – 35 databases using interoperable controlled vocabulary systems to provide rich meta-searching
  – Cross-database discovery: search for “heart attack”
  – Cross-database linking: search for aconitase, follow the “other links” tab

Vocabulary and classification systems - exercise
  – Organization structures
    – Term lists / enumerative systems
    – Hierarchies
    – Trees
    – Paradigms
    – Facets / associative relationships
    – Folksonomies
  – Break into groups, discuss & list
    – Goal
    – Structure
    – Issues
    – Benefits
  – Resources
    – Kwasnik; Boxes & Arrows

Hierarchies
  – Features
    – Inclusiveness
    – “Is-a” relationship
    – Inheritance
    – Transitivity
    – Systematic
    – Mutually exclusive
    – Necessary and sufficient
  – Issues
    – Illusion of completeness
    – Multiple perspectives
    – Lack of comprehensive knowledge
    – Difference in scale
    – Lack of transitivity
    – Strict rules
  – Benefits
    – Comprehensive
    – Economy of notation
    – Inheritance
    – Inference
    – Real definitions
    – Holistic perspective
    – High-level view
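Inheritance and transitivity are what make a strict is-a hierarchy useful for inference. In this sketch the classes and attributes are hypothetical: each class points to its parent, is-a holds for every ancestor (transitivity), and a class collects the attributes of all its ancestors (inheritance).

```python
# Hypothetical is-a hierarchy: child -> parent.
PARENT = {
    "Mammal": "Animal",
    "Dog": "Mammal",
    "Beagle": "Dog",
}
# Attributes defined at each level; lower levels inherit everything above.
ATTRIBUTES = {
    "Animal": {"breathes"},
    "Mammal": {"has fur"},
    "Dog": {"barks"},
    "Beagle": {"floppy ears"},
}

def ancestors(cls):
    """Follow is-a links upward; by transitivity every ancestor is an is-a target."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain

def is_a(cls, other):
    return cls == other or other in ancestors(cls)

def inherited_attributes(cls):
    attrs = set(ATTRIBUTES.get(cls, set()))
    for anc in ancestors(cls):
        attrs |= ATTRIBUTES.get(anc, set())
    return attrs

print(ancestors("Beagle"))  # ['Dog', 'Mammal', 'Animal']
print(is_a("Beagle", "Animal"))  # True
print(sorted(inherited_attributes("Beagle")))  # ['barks', 'breathes', 'floppy ears', 'has fur']
```

The "economy of notation" benefit shows up here: "breathes" is stated once at the top and never repeated lower down.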

Trees
  – Features
    – Hierarchy without inheritance
    – Varied relationships (beyond is-a)
    – Partitive relationships
  – Issues
    – Rigidity
    – One-way perspective
    – Selective perspective (single attribute)
  – Benefits
    – Shows a primary relationship well
    – Indicates distance between objects
    – Shows relative frequency
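A tree can record a part-of relationship without implying inheritance, and its shape gives a simple notion of distance between objects. The car-parts tree here is a hypothetical example; distance is counted as the number of edges through the nearest shared ancestor.

```python
# Hypothetical partitive (part-of) tree: child -> parent. Nothing is inherited:
# a wheel is part of a car, not a kind of car.
PART_OF = {
    "Engine": "Car",
    "Wheel": "Car",
    "Piston": "Engine",
    "Spark plug": "Engine",
    "Tire": "Wheel",
}

def path_to_root(node):
    path = [node]
    while node in PART_OF:
        node = PART_OF[node]
        path.append(node)
    return path

def tree_distance(a, b):
    """Number of edges between two nodes, via their nearest shared ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    for i, node in enumerate(path_a):
        if node in path_b:
            return i + path_b.index(node)
    raise ValueError("nodes are not in the same tree")

print(tree_distance("Piston", "Spark plug"))  # 2
print(tree_distance("Piston", "Tire"))  # 4
```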

Paradigms
  – Features
    – Horizontal, multi-dimensional
    – Matrix allows assignment of attributes rather than placement in a hierarchy
  – Issues
    – More extensive knowledge required
    – Limited explanatory power
    – Limited overview, navigational abilities
  – Benefits
    – Naming allows abstraction
    – Definition/distinction allows assignment of attributes
    – Matrix allows comparison of attributes
    – Empty values tell us something
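A paradigm places each name at the intersection of two attribute dimensions, so a term is defined by its attributes and a missing cell becomes visible. This sketch uses a hypothetical kinship matrix (relationship by sex); the `None` cell marks a combination for which there is no common everyday English word.

```python
# Hypothetical paradigm: kinship terms placed by relationship (row) and sex (column).
COLS = ["female", "male", "either"]
MATRIX = {
    "parent": ["mother", "father", "parent"],
    "sibling": ["sister", "brother", "sibling"],
    "child": ["daughter", "son", "child"],
    # No common everyday English word fills the last cell; the gap is informative.
    "parent's sibling": ["aunt", "uncle", None],
}

def attributes_of(term):
    """Return the (row, column) attributes that define a term, or None."""
    for row, cells in MATRIX.items():
        if term in cells:
            return row, COLS[cells.index(term)]
    return None

def empty_cells():
    """List attribute combinations that have no name."""
    return [(row, COLS[i]) for row, cells in MATRIX.items()
            for i, term in enumerate(cells) if term is None]

print(attributes_of("uncle"))  # ("parent's sibling", 'male')
print(empty_cells())  # [("parent's sibling", 'either')]
```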

Faceted classification
  – Features
    – Multi-dimensional
    – Multi-relationship driven
    – Triples, object with attribute
  – Issues
    – Lack of obvious relationships
    – Difficult to navigate, visualize
    – Harder to establish facets
  – Benefits
    – Accommodates partial knowledge
    – Flexible, hospitable
    – Expressive
    – Bottom-up, not top-down
    – Multi-theoretical
    – Multi-perspective
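In a faceted scheme each object carries values on several independent facets, so it can be reached through any combination of them instead of through one fixed path. The clothing records and facet names below are hypothetical.

```python
# Hypothetical faceted records: object -> {facet: value}.
ITEMS = {
    "item1": {"material": "wool", "garment": "sweater", "season": "winter"},
    "item2": {"material": "cotton", "garment": "shirt", "season": "summer"},
    "item3": {"material": "wool", "garment": "socks", "season": "winter"},
    "item4": {"material": "cotton", "garment": "sweater", "season": "spring"},
}

def facet_filter(items, **selected):
    """Keep items matching every selected facet value; other facets are ignored."""
    return sorted(name for name, facets in items.items()
                  if all(facets.get(f) == v for f, v in selected.items()))

def facet_counts(items, facet):
    """Count items per value of one facet, as a faceted browse display might."""
    counts = {}
    for facets in items.values():
        value = facets.get(facet)
        counts[value] = counts.get(value, 0) + 1
    return counts

print(facet_filter(ITEMS, material="wool"))  # ['item1', 'item3']
print(facet_filter(ITEMS, garment="sweater", material="cotton"))  # ['item4']
print(facet_counts(ITEMS, "season"))  # {'winter': 2, 'summer': 1, 'spring': 1}
```

Because facets are independent, an item missing one value can still be found through the others, which is one way the scheme "accommodates partial knowledge".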

Folksonomy
  – Features
    – Single-level description
    – Open vocabulary list
    – User-supplied/harvested tags
  – Issues
    – Lack of controlled vocabulary
    – Lack of relationship/hierarchy assignment
    – Lack of definition of intent
  – Benefits
    – Flexible
    – User-centered
    – Harvestable(?): for what?
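Harvesting a folksonomy usually starts with counting tags, and the counts expose the issues listed above. In this hypothetical set of (user, document, tag) taggings, variant forms stay separate because nothing controls the vocabulary, and a personal tag such as "toread" carries no shared meaning.

```python
from collections import Counter

# Hypothetical (user, document, tag) taggings from an open, flat vocabulary.
TAGGINGS = [
    ("alice", "doc1", "ontology"),
    ("bob", "doc1", "Ontologies"),
    ("carol", "doc2", "ontology"),
    ("alice", "doc2", "semweb"),
    ("bob", "doc3", "toread"),
]

raw_counts = Counter(tag for _, _, tag in TAGGINGS)
# Crude normalization (lowercasing only) merges case variants but not
# singular/plural forms; a controlled vocabulary would handle those.
normalized = Counter(tag.lower() for _, _, tag in TAGGINGS)

print(raw_counts.most_common(1))  # [('ontology', 2)]
print(normalized["ontologies"])  # 1
```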

Relationships
  – Equivalence (term lists)
    – “use”, “see”, “isVersionOf”, “isFormatOf”
  – Hierarchical (thesauri, taxonomies)
    – Generic: “is a”
    – Partitive: “is part of”, “has part”, “has conceptual part”, “member of”
    – Instance
  – Associative (facets, ontologies)
    – “isReferencedBy”, “isRequiredBy”, “hasDerivative”
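The equivalence ("use"/"see") relationship sends an entry term to a preferred term, and its inverse ("used for", UF) lists the entry terms gathered under a preferred term. The table below is a hypothetical example; the lookup also guards against a USE reference accidentally pointing back on itself.

```python
# Hypothetical equivalence table: non-preferred (entry) term -> preferred term.
USE = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "cars": "automobiles",
}

def preferred(term):
    """Return the preferred form of a term by following USE references."""
    seen = set()
    term = term.lower()
    while term in USE:
        if term in seen:
            raise ValueError(f"USE cycle at {term!r}")
        seen.add(term)
        term = USE[term]
    return term

def used_for(pref):
    """Inverse (UF) view: every entry term that points to a preferred term."""
    return sorted(entry for entry, target in USE.items() if target == pref)

print(preferred("Heart attack"))  # myocardial infarction
print(used_for("myocardial infarction"))  # ['heart attack', 'mi']
```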

Choosing a framework
  – Use questions
    – Who is your user, and what are their needs?
    – What systems are your users familiar with?
    – Will this system be internal or external?
  – Content questions
    – How extensive and well defined is the information?
    – Is your subject matter static or fluid?
    – What organizational framework best describes your content?
  – System questions
    – What access are you trying to provide?
    – What external pressures exist?
    – What external entities/theories will interact with this system?

Interoperability issues
  – Similarity of subject matter in domains
  – Multiple CVs accepted in a domain
  – Specificity/granularity of content indexing
  – Use of synonyms, warrant
  – Intended use, purpose of system

Creating a CV (1)
  – Design methods
    – Re-use an existing CV, or start with content & desired-use ideas
    – Committee / community approach
  – Top-down
    – Concept-driven
  – Bottom-up
    – Document-driven
    – Empirical approach
  – Deductive approach
    – Select terms, create relationships, perform term control
  – Inductive approach
    – Establish CV at outset, build hierarchies on an as-needed basis

Creating a CV (2)
  – Top-down
    – Identify audience
    – Identify all topics, concepts, uses, and context of the domain
    – Sort the identified topics into an appropriate organization scheme (enumerative, hierarchical, faceted)
    – Solidify structure and clean up gaps & redundancies
    – Assign documents to categories, test retrieval
  – Bottom-up
    – Identify audience
    – Survey documents for topics/concepts
    – Build system on the fly: let content drive the structure and limits of the system
    – Identify gaps & redundancies in the system
    – Test retrieval
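The bottom-up step "survey documents for topics/concepts" can start from something as simple as document frequency: words that recur across many documents become candidate terms for an editor to review. This sketch assumes a hypothetical stop list, three made-up documents, and a `min_docs` threshold chosen for illustration; it produces candidates only, not a finished vocabulary.

```python
import re
from collections import Counter

# Hypothetical stop list; a real project would tune its own.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "are", "with"}

def candidate_terms(documents, min_docs=2):
    """Propose words that appear in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document (document frequency).
        words = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        doc_freq.update(words)
    return sorted(w for w, n in doc_freq.items() if n >= min_docs)

docs = [
    "Thesauri and taxonomies organize metadata.",
    "Metadata standards for digital libraries.",
    "Taxonomies support navigation in digital libraries.",
]
print(candidate_terms(docs))  # ['digital', 'libraries', 'metadata', 'taxonomies']
```

Counting appearances in the content itself is a rough, automated form of literary warrant; the gap and redundancy checks in the list above still need human judgment.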

Creating a CV (3)
  – Think about scope, use, content, maintenance
  – Gather terms
    – Based on existing systems, content
    – Based on user needs/expectations
    – Investigate issues of specificity, exhaustivity, granularity
  – Build hierarchies, relationships
    – Broader/narrower terms, related terms, use/use for, see/see also
  – Establish rules
  – Implement
  – Evaluate
  – Maintain
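Once rules are established they can be checked mechanically. This sketch assumes three hypothetical rules (broader terms and USE targets must be preferred terms, and the hierarchy must contain no cycles) and a small made-up vocabulary in which one USE reference deliberately points to a form that is not the preferred one.

```python
# Hypothetical vocabulary: preferred term -> its broader terms, plus USE references.
BT = {
    "thesauri": ["controlled vocabularies"],
    "taxonomies": ["controlled vocabularies"],
    "controlled vocabularies": [],
}
USE = {"thesaurus": "thesauri", "cv": "controlled vocabulary"}

def check_rules(bt, use):
    problems = []
    # Rule 1: every broader term must itself be a preferred term.
    for term, broaders in bt.items():
        for b in broaders:
            if b not in bt:
                problems.append(f"BT {b!r} of {term!r} is not a preferred term")
    # Rule 2: USE must point to a preferred term.
    for entry, target in use.items():
        if target not in bt:
            problems.append(f"USE target {target!r} of {entry!r} is not a preferred term")
    # Rule 3: no term may be its own ancestor.
    def has_cycle(term, path):
        if term in path:
            return True
        return any(has_cycle(b, path | {term}) for b in bt.get(term, []))
    for term in bt:
        if has_cycle(term, frozenset()):
            problems.append(f"hierarchy cycle reachable from {term!r}")
    return problems

for problem in check_rules(BT, USE):
    print(problem)
# USE target 'controlled vocabulary' of 'cv' is not a preferred term
```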

Evaluating a CV
  – Goals
    – Determine if the CV solves the retrieval needs of the user/system
    – Determine if the CV matches the user’s content model/term expectations
  – Methods
    – Expert evaluation of CV
    – User-based card sorting compared to actual CV
    – Identification of non-included documents
    – Analysis of use of system (HCI)
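One way to compare a user's card sort with the actual CV is pairwise agreement (the Rand index): the share of item pairs that both groupings treat the same way, either together in both or apart in both. This is one possible measure, not one the slide prescribes; the terms and groups below are hypothetical, and both groupings are assumed to cover the same items.

```python
from itertools import combinations

def pairwise_agreement(grouping_a, grouping_b):
    """Rand index: share of item pairs placed together in both groupings
    or apart in both. Each grouping maps item -> group label."""
    items = sorted(grouping_a)
    pairs = list(combinations(items, 2))
    agree = sum(
        (grouping_a[x] == grouping_a[y]) == (grouping_b[x] == grouping_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)

# Hypothetical data: where the CV files four terms vs. where one user sorted the cards.
cv = {"thesauri": "KOS", "taxonomies": "KOS", "Dublin Core": "Metadata", "MARC": "Metadata"}
user = {"thesauri": "KOS", "taxonomies": "Metadata", "Dublin Core": "Metadata", "MARC": "Metadata"}
print(pairwise_agreement(cv, user))  # 0.5
```

A score of 1.0 means the user grouped every pair exactly as the CV does; low-agreement pairs point to the terms worth re-examining.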

CV maintenance
  – Primary responsibility
    – Editor, board, committee
  – New terms
    – Is it really new, or a different view?
    – What is the proper form & placement?
  – Modified terms
    – Include a change log
    – Use a “USE” reference to point to the new term
  – Deleted terms
    – Unused / overused terms
    – May want to keep for historical retrieval purposes
  – Modification history
    – Use modification notes, date/time stamps
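The maintenance practices above (USE references for modified terms, keeping deleted terms for historical retrieval, time-stamped modification notes) can be combined in one small store. The `Vocabulary` class and the sample terms are hypothetical, a sketch of the bookkeeping rather than the data model of any particular tool.

```python
from datetime import datetime, timezone

class Vocabulary:
    """Hypothetical CV store that logs every change with a UTC timestamp."""

    def __init__(self, terms):
        self.preferred = set(terms)
        self.use = {}         # old or variant form -> current preferred term
        self.retired = set()  # deleted terms kept for historical retrieval
        self.log = []         # (timestamp, action, details)

    def _record(self, action, details):
        self.log.append((datetime.now(timezone.utc).isoformat(), action, details))

    def add(self, term):
        self.preferred.add(term)
        self._record("add", term)

    def modify(self, old, new):
        # Replace the preferred form but keep the old one as a USE reference,
        # so records indexed under the old form still resolve.
        self.preferred.discard(old)
        self.preferred.add(new)
        self.use[old] = new
        self._record("modify", f"{old} USE {new}")

    def delete(self, term, keep_for_history=True):
        self.preferred.discard(term)
        if keep_for_history:
            self.retired.add(term)
        self._record("delete", term)

v = Vocabulary({"Folksonomies", "Thesauri"})
v.modify("Folksonomies", "Social tagging")
v.delete("Thesauri")
for stamp, action, details in v.log:
    print(stamp, action, details)
```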

Class exercise
  – Protégé overview
    – Orientation
    – Object types (classes, slots, instances)
    – Relationships (hierarchies, associative)
  – Replication of the Glosso-Thesaurus
    – Visit the Boxes & Arrows Glosso-Thesaurus
    – Look at the data there and come up with a structure in Protégé that allows replication of the thesaurus
    – Some issues to consider:
      – Do you want terms to be classes or instances?
      – What is the easiest way to show the relationships (broader term, narrower term, etc.)?
      – Do you need to allow multiple relationships for a given type (BT, RT, etc.)?
      – If you have multiple classes, at what level should you create the slots?

Next week
  – More on knowledge organization systems
    – Taxonomies, ontologies
  – More work with Protégé