Controlling values The equivalence relationship. The vocabulary problem What is this?


Controlling values
The equivalence relationship

The vocabulary problem
What is this?

Synonymy
Restroom, bathroom, toilet, loo, facilities, WC, ladies’ room, men’s room, little girls’ room, little boys’ room...
Synonymy: Using different words to identify the same concept.

Another vocabulary problem
What is mercury?
What is bank?
What is python?
What is java?

Polysemy
Polysemy: Using the same word (morphologically speaking) to identify different concepts.
Java: Island in Indonesia, variety of coffee bean, generic term for coffee, object-oriented programming language.

Yet more vocabulary problems
The White House has been lobbying Congress to support the proposed budget...
Freedom of the press is an important value in the United States...
I’m tired of taking the bus; I need some new wheels...

Metonymy and synecdoche
Metonymy: Using a related concept to stand for another concept.
Synecdoche: Using the word for part of something to stand for the entire thing.

Do people label consistently?
No.
Furnas and colleagues asked people (including subject experts) to label a variety of items (recipes, text editing operations, “common content objects”).
Surprise: there was little agreement among the names submitted by participants.
Conclusion: “The idea of an ‘obvious,’ ‘self-evident,’ or ‘natural’ term is a myth! Since even the best possible name is not very useful, it follows that there can exist no rules, guidelines or procedures for choosing a good name, in the sense of ‘accessible to the unfamiliar user.’”

What to do?
Furnas and colleagues suggest that interface designers:
Implement unlimited aliasing.
Disambiguate terms that can be used in multiple senses by presenting possibilities to users and asking them to select the appropriate one.

Limitations of the Furnas study
Participants were asked to label objects, not to say how they would search for them.
The study assumes a search interface, not a browsing (or menu-driven) interface.
In a search interface, users must recall or guess an object’s name.
In a browsing interface, users merely need to recognize the appropriate term.

Vocabulary problems and information systems
Designers of information organization systems have long grappled with the ambiguities of language.
Synonymy, polysemy, and so on complicate the goal of collocating, or bringing together, like items in an information system.

Vocabulary control
In LIS, vocabulary control is similar to Furnas’s idea of aliasing: concepts are associated with their synonyms.
One term is designated as preferred: this is the term used in a display.
Other labels associated with the concept are used in searching.
Example: Search Nordstrom.com for “frock” and get “dresses” instead.

Example of a controlled term
Preferred term: bathroom
Equivalent terms: restroom, loo, toilet, WC, ladies’ room, mens’ room, little girls’ room, little boys’ room, ladies room, ladys room, lady’s room, ladie’s room, ladys’ room...
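The entry above can be sketched as a lookup table that maps every equivalent term to the preferred term. This is only a minimal illustration, not how any particular catalog implements it; the names PREFERRED, EQUIVALENTS, normalize, and resolve are made up for the example, and only a few of the slide's equivalent terms are included.

```python
from typing import Optional

# Hypothetical equivalence table based on the slide's example entry.
PREFERRED = "bathroom"
EQUIVALENTS = ["restroom", "loo", "toilet", "WC", "ladies' room", "ladies room"]


def normalize(term: str) -> str:
    """Lowercase and unify apostrophes so lookups ignore trivial variation."""
    return term.strip().lower().replace("\u2019", "'")


# Every entry term (preferred or equivalent) points at the preferred term.
ENTRY_TO_PREFERRED = {
    normalize(t): PREFERRED for t in [PREFERRED, *EQUIVALENTS]
}


def resolve(term: str) -> Optional[str]:
    """Return the preferred term for a search term, or None if it is uncontrolled."""
    return ENTRY_TO_PREFERRED.get(normalize(term))
```

Under these assumptions, a search for "Loo" or "WC" would be displayed under "bathroom", while a term outside the vocabulary returns None and could fall back to plain keyword matching.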

Equivalence can be relative
Similar concepts may be treated as equivalents; this is a design decision by the vocabulary creator.
Example: The vocabulary includes this preferred term: Beer
These terms are designated as equivalents: ale, porter, stout, pilsner, bock, IPA, malt liquor, barley wine.

Disambiguation in vocabularies
Polysemous terms are often identified by adding qualifying terms in parentheses.
Mercury (chemical element)
Mercury (god in Roman mythology)
Search engines may ask users to select the sense they want.
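One way to picture parenthetical qualifiers is a table from a bare word to its qualified headings, where more than one heading means the user has to pick a sense. The sketch below is an assumption-laden illustration: the SENSES table and the function names are hypothetical, and the qualifiers for "java" are paraphrased from the polysemy slide rather than taken from a real vocabulary.

```python
from typing import Dict, List

# Hypothetical sense table; each heading carries a parenthetical qualifier.
SENSES: Dict[str, List[str]] = {
    "java": [
        "Java (island)",
        "Java (coffee)",
        "Java (programming language)",
    ],
    "beer": ["Beer"],  # only one sense, so no qualifier is needed
}


def senses_for(term: str) -> List[str]:
    """Return every qualified heading recorded for a bare search term."""
    return SENSES.get(term.strip().lower(), [])


def needs_disambiguation(term: str) -> bool:
    """True when the interface should ask the user which sense is meant."""
    return len(senses_for(term)) > 1
```

With this table, a search for "Java" would trigger a prompt listing three headings, while "beer" resolves directly.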

Digression into the library catalog
Library catalogs have three traditional access points: author, title, and subject.
In the old card catalog, these were the three ways that users could search.
Each of these access points has associated vocabulary control.

Control of names
In library cataloging, controlled vocabularies for authors, titles, and subjects are called authority files.
Authority files both disambiguate names that identify multiple people or items and group variations for the same person or item (that is, they deal with polysemy and synonymy).

Authority file examples
In the UT author authority file, the headings for Patricia Williams show that:
Names are disambiguated by using middle initials and dates of birth.
Cross references are used for some authors.
There may still be two headings for one person.
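The two jobs of an authority file (separating people who share a name, and grouping the name variants of one person) can be sketched with a small record type. Everything here is hypothetical: the AuthorityRecord class, the helper functions, and the "Doe" headings are invented for illustration and are not taken from the UT authority file.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AuthorityRecord:
    heading: str                     # authorized form, qualified by initial and birth year
    variants: Tuple[str, ...] = ()   # cross-referenced forms of the same person's name


# Invented records: two different people who both appear as "Doe, Jane".
RECORDS = [
    AuthorityRecord("Doe, Jane A., 1950-", ("Doe, Jane", "Doe, J. A.")),
    AuthorityRecord("Doe, Jane M., 1972-", ("Doe, Jane", "Doe-Smith, Jane")),
]


def build_index(records: List[AuthorityRecord]) -> Dict[str, List[str]]:
    """Map each heading or variant (lowercased) to the authorized headings it may refer to."""
    index: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        for name in (record.heading, *record.variants):
            index[name.lower()].append(record.heading)
    return index


def find_headings(name: str, index: Dict[str, List[str]]) -> List[str]:
    """Return all authorized headings a searched name could point to."""
    return index.get(name.lower(), [])
```

In this sketch, searching "Doe, Jane" returns both headings (the shared name needs disambiguation), while "Doe-Smith, Jane" leads to a single heading (the variant is grouped under one person).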

Fun digression: Pseudonyms in the catalog
The current catalog maintains pseudonymous identities (in older catalogs, everything went under the author’s real name).
For example, “Carolyn Keene,” the name used by multiple people as the author of the Nancy Drew novels, is maintained as an author entity in the authority file.

Thesauri
Thesauri are a type of controlled vocabulary that includes equivalence, hierarchical, and associative relationships.
Thesauri can also be faceted (that is, represent multiple aspects of a concept...we will discuss facets in depth later).
Thesauri are often developed to deal with subjects of documents, and we will talk a lot about this beginning in a few weeks.

Example thesaurus entry
Dark chocolate
  BT Chocolate
  RT Single-origin chocolate
  UF Semisweet chocolate
     Baker’s chocolate
     Sweet chocolate
  SN Chocolate without milk solids and with less than 70 percent chocolate mass.
BT: broader term, one level up in a hierarchy
RT: related term, in another facet or hierarchical branch
UF: Use for; synonyms, or non-preferred terms
SN: Scope note; definitions or usage guidelines
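The same entry can be held as a small data structure, with one field per relationship type. This is a sketch under assumptions: the ThesaurusEntry class and the use_references helper are hypothetical names, and real thesaurus software would typically store far more (identifiers, narrower terms, facets).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThesaurusEntry:
    term: str                                           # preferred term
    broader: List[str] = field(default_factory=list)    # BT
    related: List[str] = field(default_factory=list)    # RT
    use_for: List[str] = field(default_factory=list)    # UF (non-preferred terms)
    scope_note: str = ""                                # SN


# The slide's example entry, transcribed into the structure.
dark_chocolate = ThesaurusEntry(
    term="Dark chocolate",
    broader=["Chocolate"],
    related=["Single-origin chocolate"],
    use_for=["Semisweet chocolate", "Baker's chocolate", "Sweet chocolate"],
    scope_note=(
        "Chocolate without milk solids and with less than "
        "70 percent chocolate mass."
    ),
)


def use_references(entries: List[ThesaurusEntry]) -> Dict[str, str]:
    """Invert the UF lists into pointers from non-preferred to preferred terms."""
    return {
        alt.lower(): entry.term
        for entry in entries
        for alt in entry.use_for
    }
```

Inverting UF in this way gives the equivalence lookup: a search for "semisweet chocolate" is redirected to "Dark chocolate", while BT and RT stay available for browsing up or across the hierarchy.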

Controlled vocabulary example: MeSH and PubMed
The Medical Subject Headings (MeSH) are used to index journal articles for the PubMed database.
Keyword searches in PubMed are automatically expanded with MeSH.
Searches can also be explicitly limited to MeSH terms, which can increase precision.
The comparison to a system like Google Scholar is illuminating.
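The difference between expanding a keyword search and limiting it to a subject heading can be sketched as string building. This is not PubMed's actual mapping logic: the ENTRY_TERMS table and both functions are hypothetical, with a single illustrative mapping, and the output only imitates PubMed's field-tag query style.

```python
from typing import Dict, Optional

# Hypothetical one-entry table from a lay keyword to a subject heading.
ENTRY_TERMS: Dict[str, str] = {
    "heart attack": "Myocardial Infarction",
}


def expand(query: str) -> str:
    """Broaden a keyword search by OR-ing in the matching heading (favors recall)."""
    keyword = query.strip()
    heading = ENTRY_TERMS.get(keyword.lower())
    keyword_clause = f'"{keyword}"[All Fields]'
    if heading is None:
        return keyword_clause
    return f'"{heading}"[MeSH Terms] OR {keyword_clause}'


def restrict(query: str) -> Optional[str]:
    """Limit the search to the heading alone (favors precision); None if no heading matches."""
    heading = ENTRY_TERMS.get(query.strip().lower())
    if heading is None:
        return None
    return f'"{heading}"[MeSH Terms]'
```

The expanded form retrieves articles indexed under the heading even when they never use the searcher's words; the restricted form drops articles that merely mention the phrase in passing.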

Summary
Controlled vocabularies increase precision and recall in searching by identifying equivalent terms.
Authority files are types of controlled vocabularies.
Thesauri are subject-based controlled vocabularies that include hierarchical and associative relationships in addition to equivalence relationships.
Thesauri can also be used as browsing interfaces.