Module 6a: Intro to Controlled Vocabularies, Taxonomies and Classification IMT530: Organization of Information Resources Winter 2007 Michael Crandall.

Presentation transcript:


Module 6a Outline
Where we are
Controlled vocabularies
Types of controlled vocabularies
Tagging
Overview of building vocabularies

Recap
We looked at the indexing process to see how controlled vocabularies can be used to enhance access to information
  – Different methods of indexing provide different results
  – Need to decide on your approach based on an analysis of your business objectives, the user needs, and the domain
  – A combination of automatic and human indexing is often the best solution

Overview of Subject Representation
Subject analysis
  – a technique used to determine the “subject(s)” and disciplinary context exemplified by an object
Subject indexing
  – a technique through which subject terms (words, taxonomic categories, or notation) are added to an object representation to describe the subject content of the object
Controlled vocabularies
  – standards containing controlled subject terms (words, taxonomic categories, or notation) used in the indexing process

Controlled Vocabulary: Definition
A controlled vocabulary is a list of terms (words or phrases) or codes (notation) used for indexing
Almost always, controlled vocabularies show relationships among terms

Purpose of Controlled Vocabularies
Specific Purposes
  – To provide access to content by subject, through providing hierarchical and associative relationships and synonym control for the terms used in the domain
  – To increase precision in retrieval and display by controlling homographs (words that are spelled the same but have different meanings)
General Purposes
  – Assist users by conveying meaning, orientation, and structure in a subject area
  – Assist users by providing rich relationships among concepts and terms
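
Synonym control and homograph control can be illustrated with a small lookup. This is only a sketch: the terms, the `USE` and `HOMOGRAPHS` tables, and the `resolve` function are hypothetical and are not taken from any real vocabulary. Synonyms are mapped to one preferred term, and a homograph is split into qualified terms so that the searcher has to pick a meaning.

```python
# Hypothetical mini-vocabulary, for illustration only.
# Synonym control: every entry term points to a single preferred term.
USE = {
    "automobiles": "cars",
    "motor cars": "cars",
    "cars": "cars",
}

# Homograph control: one spelling, several meanings, each given a qualifier.
HOMOGRAPHS = {
    "mercury": ["mercury (planet)", "mercury (element)"],
}


def resolve(term):
    """Return the preferred term(s) for a search term.

    One result means the term is controlled and unambiguous, several
    results mean the searcher must choose between homographs, and an
    empty list means the term is not in the vocabulary.
    """
    key = term.strip().lower()
    if key in HOMOGRAPHS:
        return list(HOMOGRAPHS[key])
    if key in USE:
        return [USE[key]]
    return []


print(resolve("Automobiles"))
print(resolve("Mercury"))
```

Indexing every document about automobiles under the single term "cars" is what lets a search on any of the synonyms retrieve the same set.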

Buckland
Proposes five different vocabularies in any system:
  – Authors
  – Indexers
  – Syndetic structure
  – Searchers
  – Formulated queries
Formal tradition vs. document tradition

Types of Controlled Vocabularies
Subject Heading List
Taxonomy
Thesauri
Classification Scheme
More terminology on Leonard Will’s site
  – Zeng, M.L. (2005). Construction of controlled vocabularies: A primer.

Subject Heading Lists
General list of terms (words and phrases), not limited by discipline or subject area
Terms are called subject headings
The distinction between thesauri & subject heading lists is largely historical (subject heading lists are older); there are very few subject heading lists because they are so expensive to maintain
Terms are mainly subject attributes, but there are many exemplified attributes used in subdivisions
Example: Library of Congress Subject Headings (LCSH), used in library catalogs
  – Sample terms: “France – Colonies – History – 18th century”; “Time and space – Juvenile fiction”; “Frogs” (notice the use of subdivisions, marked here by dashes; thesauri seldom use subdivisions)
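
The structure of a subdivided heading can be made explicit by splitting it into a main heading and its subdivisions. This is a minimal sketch: `parse_heading` is a hypothetical helper, and it assumes the spaced dash used on the slide as the separator, which may differ from how a given catalog stores or displays headings.

```python
def parse_heading(heading, sep=" – "):
    """Split a displayed subject heading into (main heading, subdivisions).

    Assumes subdivisions are separated by a spaced dash, as shown on the slide.
    """
    main, *subdivisions = [part.strip() for part in heading.split(sep)]
    return main, subdivisions


print(parse_heading("France – Colonies – History – 18th century"))
print(parse_heading("Frogs"))
```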

Taxonomies
List of terms (words and phrases) that may be general or subject/discipline/domain specific
Terms are called taxons or (simply) terms
Terms represent subjects, disciplines/domains, and exemplified attributes
Used in digital environments only
Examples: Microsoft Corporation intranet taxonomies; Yahoo taxonomy used in the Yahoo directory
  – Sample terms from the Yahoo taxonomy (in Yahoo, you’ll find these at the top of the screen as you browse through the directory): “Education”; “Science > Agriculture > Research > Government Agencies”; “Health > Nursing”; “Health > Education”
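
The sample terms above are paths through a hierarchy, and a short sketch can turn them into a nested structure. The `build_tree` function and the nested-dictionary representation are assumptions made for illustration, not a description of how the Yahoo directory was implemented.

```python
def build_tree(paths, sep=" > "):
    """Build a nested dict from 'A > B > C' style taxonomy paths."""
    tree = {}
    for path in paths:
        node = tree
        for term in path.split(sep):
            # Create the child node on first sight, then descend into it.
            node = node.setdefault(term.strip(), {})
    return tree


paths = [
    "Education",
    "Science > Agriculture > Research > Government Agencies",
    "Health > Nursing",
    "Health > Education",
]
tree = build_tree(paths)
print(sorted(tree))            # top-level categories
print(sorted(tree["Health"]))  # children of "Health"
```

Note that "Education" appears twice, once at the top level and once under "Health". In a taxonomy the same word can label different nodes, and the path is what gives each one its context.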

Thesauri
Thesauri (pl.) / Thesaurus (s.)
  – List of terms (words and phrases) that are usually limited to a specific subject or disciplinary area
  – Terms listed in a thesaurus are often called descriptors
  – Thesauri were mostly defined and developed after the advent of the computer and were created for use in a computerized environment (or with computers in mind)
  – Terms are usually subject (about) attributes, but some thesauri also contain exemplified (example of) attributes
  – Example: ERIC Thesaurus (education)
    Sample terms from the ERIC Thesaurus: “School community relationship”; “College entrance exams”; “Age grade placement”

“Classification” Schemes
Chart of subject categories contextualized by a hierarchical structure
Terms are lists of codes (notation)
Terms are called classes and class numbers
Classification schemes make use of disciplinary, subject, and (sometimes) exemplified attributes
Often used to arrange physical documents; sometimes used in online environments

“Classification” Example
Examples: Dewey Decimal Classification (DDC); Universal Decimal Classification (UDC); Colon Classification
Sample entries (DDC):
  – 510 (meaning: “Mathematics” (a discipline and a subject))
  – (meaning: “Mathematics / Linear, multilinear, multidimensional algebras / Factor algebras”)
  – (meaning: “Social problems and services / Problems of and services to the poor / Financial assistance”)

Four Types of Classification
Kwasnik describes four classification systems
  – Hierarchies
  – Trees
  – Paradigms
  – Facets
Paradigms are useful primarily for analysis of subject gaps and relationships in a constrained space
Trees are a poor form of hierarchy with limited relationships
We’ll look at the other two in some detail over the next two weeks

Hierarchies
Good for representation of knowledge in mature domains where the nature of the entities and relationships is well known
You’ll see examples of these in the thesauri that we will look at in today’s exercise
Require a model that describes what entities are included, with rules of association and distinction
Tend to be monolithic and cumbersome for large domains

Facets
Actually a different approach rather than a different structure
  – May use hierarchies or trees as part of the structure
  – Originated in the work of S.R. Ranganathan
    Proposed that any object could be viewed in five ways: personality, matter, energy, space and time (PMEST)
  – Being used more and more in modern information systems because of flexibility in meeting multiple needs
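
The flexibility of facets can be sketched by describing each object along the five PMEST dimensions and filtering on any combination of them. The records, their facet values, and the `select` function below are hypothetical examples, and assigning real subjects to PMEST facets takes more judgment than this suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetedRecord:
    title: str
    personality: str  # the main entity or topic
    matter: str       # material or property
    energy: str       # process or action
    space: str        # place
    time: str         # period


def select(records, **facets):
    """Return records whose facet values match every given facet."""
    return [
        r for r in records
        if all(getattr(r, name) == value for name, value in facets.items())
    ]


# Hypothetical records, for illustration only.
records = [
    FacetedRecord("Rice irrigation in India", "rice", "water", "irrigation", "India", "1950s"),
    FacetedRecord("Wheat irrigation in Egypt", "wheat", "water", "irrigation", "Egypt", "1950s"),
    FacetedRecord("Rice harvesting in India", "rice", "grain", "harvesting", "India", "1990s"),
]

print([r.title for r in select(records, space="India")])
print([r.title for r in select(records, energy="irrigation", time="1950s")])
```

Unlike a single hierarchy, no facet has to come first: the same records can be approached by place, by process, or by period, depending on what the user needs.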

Collaborative Tagging
Points out issues of “basic level” and “collective sensemaking”
Tug of war between personal storage
  – Identifying qualities
  – Self reference
  – Task organizing
and public nature of access
  – What or who it is about
  – What it is
  – Who owns it
  – Categories
Stability emerges from imitation and shared experience

Trees vs. Tags
Weinberger’s article postulates three types of vocabularies
  – Trees (hierarchies)
  – Facets
  – Tags
Golder/Huberman and Weinberger both point out that each approach can be useful in particular situations
  – Choosing your approach is part of the process of subject and domain analysis
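
One way to see how a shared vocabulary emerges from uncontrolled tags is to tally what proportion of all tags on a resource each tag accounts for. This is a minimal sketch with made-up data; `tag_proportions` is a hypothetical helper, and lowercasing is the only normalization it applies.

```python
from collections import Counter


def tag_proportions(taggings):
    """Given one list of tags per user, return each tag's share of all tags."""
    counts = Counter(
        tag.strip().lower() for tags in taggings for tag in tags
    )
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical tags applied by three users to the same bookmark.
taggings = [
    ["Python", "programming"],
    ["python", "tutorial"],
    ["programming", "python"],
]
props = tag_proportions(taggings)
for tag, share in sorted(props.items(), key=lambda item: -item[1]):
    print(f"{tag}: {share:.2f}")
```

Tracking these proportions as more users tag the resource would show whether a few tags come to dominate, which is the kind of stability the slides attribute to imitation and shared experience.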

Steps in Constructing CVs
Define your domain
Gather concepts
  – From user interviews, search logs, content analysis, preexisting vocabularies
Select your approach
Extract terminology
Control your terms
Organize your terms
Maintain, maintain, maintain
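
Two of the steps above, extracting terminology and controlling terms, can be sketched using a search log as the source of concepts. The queries, the variant mapping, and the function names are hypothetical. In practice the choice of preferred term is an editorial decision, not something the code can make.

```python
from collections import Counter


def candidate_terms(queries, min_count=2):
    """Extract terminology: keep queries seen at least min_count times."""
    # Normalize case and whitespace so trivial variants are counted together.
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    return [term for term, n in counts.most_common() if n >= min_count]


def control(terms, variants):
    """Control terms: map each variant to its preferred term, dropping repeats."""
    preferred = []
    for term in terms:
        chosen = variants.get(term, term)
        if chosen not in preferred:
            preferred.append(chosen)
    return preferred


# Hypothetical intranet search log.
queries = [
    "Expense report", "expense  report", "travel policy",
    "expenses", "expenses", "vacation",
]
candidates = candidate_terms(queries)
# Hypothetical editorial decision: "expenses" is a variant of "expense report".
vocabulary = control(candidates, {"expenses": "expense report"})
print(candidates)
print(vocabulary)
```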

Questions?
If not, take a break!!!

Exercise 6a
Purpose is to explore some existing controlled vocabularies to investigate their differences and similarities, how useful they might be for subject access, and to become familiar with the structure of controlled vocabularies in general
Spend the next 45 minutes on Exercise 6a
Ask questions and talk!!!
Be sure to hand in completed work at the end of class for credit!!!

Thursday
We’ll start to look at ways to build controlled vocabularies and the rules associated with them
Remember to read assignments BEFORE class