Controlling values The equivalence relationship

The vocabulary problem What is this?

Synonymy Restroom, bathroom, toilet, loo, facilities, WC, ladies’ room, men’s room, little girls’ room, little boys’ room... Synonymy: Using different words to identify the same concept.

Another vocabulary problem What is mercury? What is bank? What is python? What is java?

Polysemy Polysemy: Using the same word (morphologically speaking) to identify different concepts. Java: Island in Indonesia, specific variety of coffee bean, generic term for coffee, object-oriented programming language.

Yet more vocabulary problems The White House has been lobbying Congress to support the proposed budget... Freedom of the press is an important value in the United States... I’m tired of taking the bus; I need some new wheels...

Metonymy and synecdoche Metonymy: Using a related concept to stand for another concept. Synecdoche: Using the word for part of something to stand for the entire thing.

Furnas et al.’s experiment Furnas et al. asked people (including subject experts) to label a variety of items (recipes, text editing operations, “common content objects”). Surprisingly, there was little agreement among the names submitted by participants. Conclusion: “The idea of an ‘obvious,’ ‘self-evident,’ or ‘natural’ term is a myth! Since even the best possible name is not very useful, it follows that there can exist no rules, guidelines or procedures for choosing a good name, in the sense of ‘accessible to the unfamiliar user.’”

Furnas et al.’s recommendations Furnas et al. suggest that interface designers: Implement unlimited aliasing. Disambiguate terms that can be used in multiple senses by presenting possibilities to users and asking them to select the appropriate one.

Limitations of Furnas’s study Participants were asked to label objects, not how they would search for objects. The study assumes a search interface, not a browsing (or menu-driven) interface. In a search interface, users must recall or guess an object’s name. In a browsing interface, users merely need to recognize the appropriate term.

Vocabulary problems and information systems Designers of organizational systems have been grappling with the ambiguities of language for many years. Synonymy, polysemy, and so on complicate the goal to collocate, or bring together, like items in an information system (those by the same author, with the same title, or on the same subject).

Vocabulary control In LIS, vocabulary control is similar to Furnas’s idea of aliasing: multiple terms that might stand for the same concept are grouped together. One term is typically designated as preferred: this is the term used in a display (or, in a card catalog, the card with the preferred term would actually have the entry; the other terms would just be cross-references).

Example of a controlled term Preferred term: bathroom Equivalent terms: restroom, loo, toilet, WC, ladies’ room, mens’ room, little girls’ room, little boys’ room, ladies room, ladys room, lady’s room, ladie’s room, ladys’ room...
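The grouping above can be sketched as a lookup table that maps every equivalent term to its preferred term. This is a minimal illustration, not how any particular catalog implements it: the names `CONTROLLED_TERMS`, `normalize`, `build_index`, and `preferred_term` are hypothetical, and the normalization rule (lowercase, drop apostrophes) is an assumption chosen so that spelling variants such as “lady’s room” and “ladys room” collapse to one key.

```python
import re
from typing import Dict, List, Optional

# Equivalence table based on the slide's example. A real vocabulary would
# hold many preferred terms, each with its own list of variants.
CONTROLLED_TERMS: Dict[str, List[str]] = {
    "bathroom": [
        "restroom", "loo", "toilet", "WC",
        "ladies' room", "men's room",
        "little girls' room", "little boys' room",
        "ladys room",
    ],
}


def normalize(term: str) -> str:
    """Lowercase, drop straight or curly apostrophes, collapse whitespace."""
    term = re.sub(r"['’]", "", term.lower())
    return " ".join(term.split())


def build_index(table: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each normalized variant, and the preferred term itself, to the preferred term."""
    index: Dict[str, str] = {}
    for preferred, variants in table.items():
        for term in [preferred, *variants]:
            index[normalize(term)] = preferred
    return index


def preferred_term(term: str, index: Dict[str, str]) -> Optional[str]:
    """Return the preferred term, or None if the term is not in the vocabulary."""
    return index.get(normalize(term))


INDEX = build_index(CONTROLLED_TERMS)
print(preferred_term("Ladie’s room", INDEX))  # bathroom
print(preferred_term("kitchen", INDEX))       # None
```

Terms outside the vocabulary return `None`, which is the point at which a real system would fall back to keyword search or suggest nearby terms.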

Digression into the library catalog Library catalogs have three traditional access points: author, title, and subject. In the old card catalog, these were the three ways that users could search. Each of these access points has associated vocabulary control.

Catalog entries Entry is an old term for a catalog record. For example, Herman Melville’s Moby Dick might have an entry in the card catalog under the subject Fiction—Whaling. The main entry designates the primary access point and, in the card catalog, the card with all the bibliographic information. (Other entries might have a cross-reference to the main entry only.) The entry for Moby Dick under Fiction—Whaling might say merely “See Melville, Herman. Moby Dick.”

Main entry confusion For many people, the designation of a primary access point or main entry is anachronistic in the world of online systems. We can search any attribute now: why select a “primary” one? Taylor notes three arguments for retaining the main entry: standardization of citation, subarrangement, and collocation of works.

Control of names Names, such as author or title names, are controlled via authority files. Authority files both disambiguate names that identify multiple people or items and group variations for the same person or item (that is, they deal with polysemy and synonymy).

Authority file examples In the UT author authority file: headings for Patricia Williams: Names are disambiguated by using middle initials and dates of birth. Cross references are used for some authors. There may still be two headings for one person!

Digression 2: Power catalog searching To increase the precision in library catalog searches, avoid keyword searching. Instead, search the appropriate authority file first, then search using the preferred heading. Magic! Searching the authority file typically necessitates proper query formation (e.g., last name, first name for author searches).
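The two-step search described above can be sketched as follows. Everything here is toy data under stated assumptions: `AUTHORITY_FILE` and `CATALOG` are hypothetical in-memory stand-ins for a real authority file and catalog, and the heading forms are simplified.

```python
from typing import Dict, List

# Toy authority file: normalized name variant -> authorized heading.
AUTHORITY_FILE: Dict[str, str] = {
    "twain, mark": "Twain, Mark, 1835-1910",
    "clemens, samuel langhorne": "Twain, Mark, 1835-1910",
}

# Toy catalog records (illustrative only).
CATALOG: List[Dict[str, str]] = [
    {"title": "Adventures of Huckleberry Finn", "author": "Twain, Mark, 1835-1910"},
    {"title": "Mark Twain: A Life", "author": "Powers, Ron"},
]


def keyword_search(query: str) -> List[str]:
    """Match the query against any field of the record."""
    q = query.lower()
    return [r["title"] for r in CATALOG
            if any(q in value.lower() for value in r.values())]


def author_search(name: str) -> List[str]:
    """Step 1: find the authorized heading. Step 2: search on that heading only."""
    heading = AUTHORITY_FILE.get(name.strip().lower())
    if heading is None:
        return []
    return [r["title"] for r in CATALOG if r["author"] == heading]


print(keyword_search("twain"))
print(author_search("Clemens, Samuel Langhorne"))
```

The keyword search also returns the biography, because “Twain” appears in its title, while the heading search returns only works entered under the author’s heading. Note that the authority lookup expects the “last name, first name” form, which mirrors the query-formation requirement mentioned above.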

Digression 3: Pseudonyms in the catalog Pseudonymous identities are maintained in AACR2 (in older catalogs, everything went under the author’s real name). For example, “Carolyn Keene,” the name used by multiple people as the author for the Nancy Drew novels, is maintained as an author entity in the authority file.

Controlled subject vocabularies Subject vocabularies have varying amounts of structure (e.g., relationships between terms). Thesauri may include equivalence, hierarchical, and associative relationships. Thesauri can also be faceted (that is, represent multiple aspects of a subject...we will discuss facets in depth later in the course).

Example thesaurus entry
Dark chocolate
  BT Chocolate
  RT Single-origin chocolate
  UF Semisweet chocolate
     Baker’s chocolate
     Sweet chocolate
  SN Chocolate without milk solids and with less than 70 percent chocolate mass.
BT: broader term, one level up in a hierarchy
RT: related term, in another facet or hierarchical branch
UF: Use for; synonyms, or non-preferred terms
SN: Scope note; definitions or usage guidelines
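One way to hold such an entry in a program is a small record type with one field per relationship. This is a sketch only: the class `ThesaurusEntry` and the helper `use_references` are hypothetical names, not part of any thesaurus standard or library.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ThesaurusEntry:
    term: str                                          # preferred term
    broader: List[str] = field(default_factory=list)   # BT
    related: List[str] = field(default_factory=list)   # RT
    use_for: List[str] = field(default_factory=list)   # UF (non-preferred terms)
    scope_note: str = ""                               # SN


dark = ThesaurusEntry(
    term="Dark chocolate",
    broader=["Chocolate"],
    related=["Single-origin chocolate"],
    use_for=["Semisweet chocolate", "Baker's chocolate", "Sweet chocolate"],
    scope_note="Chocolate without milk solids and with less than "
               "70 percent chocolate mass.",
)


def use_references(entries: Iterable[ThesaurusEntry]) -> Dict[str, str]:
    """Invert the UF lists: non-preferred term (lowercased) -> preferred term."""
    return {variant.lower(): entry.term
            for entry in entries
            for variant in entry.use_for}


USE = use_references([dark])
print(USE["semisweet chocolate"])  # Dark chocolate
```

Inverting the UF lists gives the equivalence lookup a search interface needs, while the BT and RT fields remain available for browsing up or across the hierarchy.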

Equivalence in thesauri Similar concepts may be treated as equivalents as judged appropriate by the thesaurus designer. Examples: Beer UF ale, porter, stout, pilsner, bock, IPA... Cartography UF maps

Disambiguation in thesauri Polysemous terms are often identified by adding qualifying terms in parentheses. Mercury (element) Mercury (god in Roman mythology) Search engines may ask users to select the sense they want.
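The “ask the user to pick a sense” approach can be sketched with a table of qualified terms. The table `SENSES`, the functions `candidates` and `resolve`, and the exact qualifiers are illustrative assumptions built from the senses of “Java” listed earlier in the deck.

```python
from typing import Dict, List, Optional

# Hypothetical sense table: search word -> qualified preferred terms.
SENSES: Dict[str, List[str]] = {
    "java": ["Java (island)", "Java (coffee)", "Java (programming language)"],
    "cartography": ["Cartography"],
}


def candidates(term: str) -> List[str]:
    """Return every sense recorded for the term (empty if the term is unknown)."""
    return SENSES.get(term.strip().lower(), [])


def resolve(term: str, choice: Optional[int] = None) -> Optional[str]:
    """Return a single sense, or None when the user still has to choose.

    `choice` is the index of the sense the user selected from candidates().
    """
    senses = candidates(term)
    if len(senses) == 1:
        return senses[0]      # unambiguous: no prompt needed
    if senses and choice is not None:
        return senses[choice]
    return None               # unknown term, or ambiguous with no choice yet


print(candidates("Java"))
print(resolve("Java", choice=2))  # Java (programming language)
```

An interface built on this would call `candidates` first, show the list when it has more than one item, and pass the selection back to `resolve`.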

Using controlled vocabularies: MeSH and PubMed The Medical Subject Headings (MeSH) are used to index journal articles for the PubMed database. Keyword searches in PubMed are automatically expanded with MeSH. Searches can also be explicitly limited to MeSH terms, which can increase precision. The comparison to a system like Google Scholar is illuminating.
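The idea of expanding a keyword with a controlled heading can be illustrated with a toy query builder. This is not PubMed’s actual mapping algorithm: the one-row table `ENTRY_TERMS` and the functions `expand` and `mesh_only` are hypothetical, and only the general shape of a field-tagged Boolean query is borrowed from PubMed.

```python
from typing import Dict, Optional

# Toy mapping from a user's keyword to a subject heading (illustrative only).
ENTRY_TERMS: Dict[str, str] = {
    "heart attack": "Myocardial Infarction",
}


def expand(keyword: str) -> str:
    """Broaden the search: the heading OR the original keyword in any field."""
    heading = ENTRY_TERMS.get(keyword.strip().lower())
    if heading is None:
        return f'"{keyword}"[All Fields]'
    return f'"{heading}"[MeSH Terms] OR "{keyword}"[All Fields]'


def mesh_only(keyword: str) -> Optional[str]:
    """Narrow the search: the heading alone, or None if no heading is known."""
    heading = ENTRY_TERMS.get(keyword.strip().lower())
    return f'"{heading}"[MeSH Terms]' if heading else None


print(expand("heart attack"))
print(mesh_only("heart attack"))
```

The expanded form favors recall, since it still matches articles that use the keyword but were not indexed under the heading. The heading-only form favors precision.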

Standards for controlled vocabularies There are a number of standards for thesaurus construction: ISO (ISO 25964), NISO (ANSI/NISO Z39.19), British (BS 8723). These can be quite detailed, but they provide mostly syntactic guidance: e.g., terms should take noun form.

Summary Controlled vocabularies increase precision and recall in searching by identifying equivalent terms. Authority files are types of controlled vocabularies that describe preferred forms of author names and names of works. Thesauri are subject-based controlled vocabularies that include hierarchical and associative relationships in addition to equivalence relationships. Thesauri can also be used as browsing interfaces.
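Precision and recall can be made concrete with a small worked example. Precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved. The document IDs and result sets below are invented purely for illustration and are not measurements of any real system.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|"""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented scenario: documents 1-4 are about the Java programming language.
relevant = {1, 2, 3, 4}

# A keyword search for "java" finds two of them plus two about coffee.
keyword_results = {1, 2, 5, 6}

# A search on a controlled heading finds three of them plus one stray record.
controlled_results = {1, 2, 3, 7}

print(precision_recall(keyword_results, relevant))     # (0.5, 0.5)
print(precision_recall(controlled_results, relevant))  # (0.75, 0.75)
```

In this made-up case the controlled search improves both measures: grouping equivalent terms finds more relevant documents, and disambiguation excludes more irrelevant ones.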