1 CS 430: Information Discovery Lecture 21 Thesauruses and Gazetteers.


1 CS 430: Information Discovery Lecture 21 Thesauruses and Gazetteers

2 Course Administration

3 Lexicon and Thesaurus
A lexicon contains information about words, their morphological variants, and their grammatical usage.
A thesaurus relates words by meaning:
    ship, vessel, sail; craft, navy, marine, fleet, flotilla
    book, writing, work, volume, tome, tract, codex
    search, discovery, detection, find, revelation
(From Roget's Thesaurus, 1911)

4 Thesaurus in Information Retrieval
Use of a thesaurus in indexing (precoordination)
A. Manual
Used to guide a human indexer to assign standard terms and associations.
    computer-aided instruction
        see also education
        UF teaching machines
        BT educational computing
        TT computer applications
        RT education
        RT teaching
From: INSPEC Thesaurus
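
The INSPEC entry above can be read as a small record: UF ("used for") lists non-preferred synonyms, BT a broader term, TT the top term of the hierarchy, and RT related terms. The following is a minimal sketch of how an indexing aid might map a candidate term to the standard term. The class and function names (`ThesaurusEntry`, `build_lookup`, `standard_term`) are hypothetical and are not part of INSPEC or of the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One preferred term with the relationship codes shown on the slide."""
    term: str
    used_for: list[str] = field(default_factory=list)  # UF: non-preferred synonyms
    broader: list[str] = field(default_factory=list)   # BT: broader terms
    top: list[str] = field(default_factory=list)       # TT: top terms
    related: list[str] = field(default_factory=list)   # RT: related terms


# The single INSPEC entry quoted on the slide, transcribed as data.
ENTRIES = [
    ThesaurusEntry(
        term="computer-aided instruction",
        used_for=["teaching machines"],
        broader=["educational computing"],
        top=["computer applications"],
        related=["education", "teaching"],
    )
]


def build_lookup(entries):
    """Map every preferred and non-preferred term to its preferred term."""
    lookup = {}
    for entry in entries:
        lookup[entry.term] = entry.term
        for variant in entry.used_for:
            lookup[variant] = entry.term
    return lookup


def standard_term(candidate, lookup):
    """Return the preferred term for a candidate, or None if it is unknown."""
    return lookup.get(candidate.strip().lower())


LOOKUP = build_lookup(ENTRIES)
print(standard_term("Teaching Machines", LOOKUP))  # computer-aided instruction
```

A human indexer would still choose among the suggestions; the lookup only guarantees that whichever synonym is typed, the same standard term is assigned.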

5 Thesaurus in Information Retrieval
Use of a thesaurus in indexing (precoordination)
B. Automatic
Divide terms into thesaurus classes. Replace similar terms by a thesaurus class.
    408  dislocation         409  blast-cooled
         junction                 heat-flow
         minority-carrier         heat-transfer
         n-p-n
         p-n-p               410  anneal
         point-contact            strain
         recombine
         transition
         unijunction
From: Salton and McGill
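
A minimal sketch of the replacement step, assuming the class membership as read from the excerpt above (408, 409, and 410 with their terms). The `C408`-style labels and the function name `index_terms` are hypothetical choices for illustration; terms that belong to no class are passed through unchanged.

```python
# Class numbers and member terms as read from the Salton and McGill excerpt.
THESAURUS_CLASSES = {
    408: ["dislocation", "junction", "minority-carrier", "n-p-n", "p-n-p",
          "point-contact", "recombine", "transition", "unijunction"],
    409: ["blast-cooled", "heat-flow", "heat-transfer"],
    410: ["anneal", "strain"],
}

# Invert the table once so each term maps directly to its class label.
TERM_TO_CLASS = {
    term: f"C{number}"
    for number, terms in THESAURUS_CLASSES.items()
    for term in terms
}


def index_terms(tokens):
    """Replace each token that belongs to a thesaurus class by its class label."""
    return [TERM_TO_CLASS.get(tok.lower(), tok.lower()) for tok in tokens]


print(index_terms(["Junction", "heat-flow", "silicon", "strain"]))
# ['C408', 'C409', 'silicon', 'C410']
```

Because documents and queries are both passed through the same mapping, a query on "heat-transfer" would match a document indexed under "heat-flow".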

6 Desirable Properties for Information Retrieval Thesaurus is specific to a subject area. Contains only terms of interest for identification within that subject area. Ambiguous terms are coded only for the senses important for that field. Target is that each thesaurus class should include terms of moderate frequency. Ideally the classes should have similar frequency.

7 Art and Architecture Thesaurus Controlled vocabulary for describing and retrieving information: fine art, architecture, decorative art, and material culture. Almost 120,000 terms for objects, textual materials, images, architecture and culture from all periods and all cultures. Used by archives, museums, and libraries to describe items in their collections. Used to search for materials. Used by computer programs for information retrieval and natural language processing. A project of the J. Paul Getty Trust.

8 Art and Architecture Thesaurus Provides the terminology for objects, and the vocabulary necessary to describe them, such as style, period, shape, color, construction, or use, and scholarly concepts, such as theories, or criticism. Concept: a cluster of terms, one of which is established as the preferred term, or descriptor. Categories: associated concepts, physical attributes, styles and periods, agents, activities, materials, and objects.

9 Art and Architecture Thesaurus: Sample Record
Record ID:
Descriptor: rhyta
Note: Refers to vessels from Ancient Greece, eastern Europe, or the Middle East that typically have a closed form with two openings, one at the top for filling and one at the base so that liquid could stream out. They are often in the shape of a horn or an animal's head, and were typically used as a drinking cup or for pouring wine into another vessel.
Hierarchy: Containers [TQ]

10 Art and Architecture Thesaurus: Sample Record (continued)
Terms:
    rhyta
    rhyton (alternate, singular)
    protomai
    protome
    rhea
    rheon
    rheons
Related concepts:
    stirrup cups
    sturzbechers
    drinking vessels
    ceremonial vessels

11 MeSH -- Medical Subject Headings Controlled vocabulary for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE. About 19,000 primary subject headings. Thesaurus of 110,000 chemical terms. Total vocabulary over 300,000 terms. The National Library of Medicine provides MeSH subject headings for each of the 400,000 articles that it indexes every year. "MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts."

12 MeSH -- Medical Subject Headings MeSH hierarchy: general terms, e.g., anatomy, organisms, diseases, biological sciences; anatomy is divided into sixteen topics, e.g., body regions and musculoskeletal system; body regions is divided into sections, e.g., abdomen, axilla, back etc.

13 Example of MeSH hierarchy
Biological Sciences [G]
    Biological Sciences [G01] +
    Health Occupations [G02] +
    Environment and Public Health [G03] +
    Biological Phenomena, Cell Phenomena, and Immunity [G04] +
    Genetics [G05] +
    Biochemical Phenomena, Metabolism, and Nutrition [G06] +
    Physiological Processes [G07] +
    Reproductive and Urinary Physiology [G08] +
    Circulatory and Respiratory Physiology [G09] +
    Digestive, Oral, and Skin Physiology [G10] +
    Musculoskeletal, Neural, and Ocular Physiology [G11] +
    Chemical and Pharmacologic Phenomena [G12] +

14 Example of MeSH hierarchy (continued)
Physiological Processes [G07]
    Adaptation, Physiological [G07.062] +
    Aging [G07.168] +
    Body Constitution [G07.265] +
    Body Temperature [G07.315]
        Body Temperature Regulation [G ] +
        Skin Temperature [G ]
    Chronobiology [G07.450] +
    Electrophysiology [G07.453] +
    Fluid Shifts [G07.503]
    Growth and Embryonic Development [G07.553] +
    Homeostasis [G07.621] +
    Tensile Strength [G07.900]
    Tropism [G07.950] +
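
The tree numbers in the hierarchy encode position: each dot-separated segment adds one level, so G07.315 sits directly under G07. A minimal sketch of how that notation supports navigation follows. It uses only headings whose tree numbers appear in full on the slides, and the helper names (`parent`, `descendants`) are hypothetical rather than part of any NLM tool.

```python
# Headings and tree numbers copied from the slides. Entries whose tree
# numbers are truncated in the transcript are deliberately left out.
MESH = {
    "G07": "Physiological Processes",
    "G07.062": "Adaptation, Physiological",
    "G07.168": "Aging",
    "G07.265": "Body Constitution",
    "G07.315": "Body Temperature",
    "G07.450": "Chronobiology",
    "G08": "Reproductive and Urinary Physiology",
}


def parent(tree_number):
    """Drop the last dot-separated segment; top-level numbers have no parent."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def descendants(tree_number, table):
    """Return all tree numbers in the table that lie below the given one."""
    prefix = tree_number + "."
    return sorted(n for n in table if n.startswith(prefix))


print(MESH[parent("G07.315")])        # Physiological Processes
print(descendants("G07", MESH))
# ['G07.062', 'G07.168', 'G07.265', 'G07.315', 'G07.450']
```

A search system can use `descendants` to widen a query on a general heading so that it also retrieves articles indexed under the narrower headings.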

15 Example of MeSH hierarchy (continued)
MeSH Heading: Body Temperature
Tree Number: E
Tree Number: G
Entry Term: Organ Temperature
See Also: Fever
See Also: Thermography
See Also: Thermometers
Allowable Qualifiers: DE GE IM PH RE
Unique ID: D001831

16 Observations about Manually Maintained Thesaurus Permits a very rich structure of relationships. Most effective when the user of the search system is skilled in the discipline and trained in the use of the thesaurus (e.g., a medical librarian). Needs continual updating as a field develops new terminology. Expensive to create and maintain.

17 Gazetteers The Alexandria Digital Library (ADL): geolibrary at University of California at Santa Barbara where a primary attribute of objects is location on Earth (e.g., map, satellite photograph). Geographic footprint: latitude and longitude values that represent a point, a bounding box, a linear feature, or a complete polygonal boundary. Gazetteer: list of geographic names, with geographic locations and other descriptive information. Geographic name: proper name for a geographic place or feature (e.g., Santa Barbara County, Mount Washington, St. Francis Hospital, and Southern California)

18 Alexandria Thesaurus: Example
canals
A feature type category for places such as the Erie Canal.
Used for: The category canals is used instead of any of the following.
    canal bends
    canalized streams
    ditch mouths
    ditches
    drainage canals
    drainage ditches
    ... more...
Broader Terms: Canals is a sub-type of hydrographic structures.

19 Alexandria Thesaurus: Example (continued)
canals (continued)
Related Terms: The following is a list of other categories related to canals (non-hierarchical relationships).
    channels
    locks
    transportation features
    tunnels
Scope Note: Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power.

20 Use of a Gazetteer Answers the "Where is" question; for example, "Where is Santa Barbara?" Translates between geographic names and locations. A user can find objects by matching the footprint of a geographic name to the footprints of the collection objects. Locates particular types of geographic features in a designated area. For example, a user can draw a box around an area on a map and find the schools, hospitals, lakes, or volcanoes in the area.
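
A minimal sketch of the two uses described above: matching the footprint of a named place against the footprints of collection objects, and finding features of given types inside a user-drawn box. All names and coordinates here are hypothetical placeholders, footprints are simplified to bounding boxes, and boxes that cross the 180 degree meridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Footprint as a longitude/latitude rectangle in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other):
        """True if the two rectangles share at least one point."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains_point(self, lon, lat):
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Hypothetical gazetteer entry: geographic name -> footprint.
GAZETTEER = {"Example Town": BoundingBox(-120.0, 34.0, -119.5, 34.5)}

# Hypothetical collection objects, each with its own footprint.
COLLECTION = [
    ("map-1", BoundingBox(-119.8, 34.2, -119.0, 35.0)),
    ("photo-2", BoundingBox(-100.0, 40.0, -99.0, 41.0)),
]

# Hypothetical point features: (name, feature type, longitude, latitude).
FEATURES = [
    ("Example School", "school", -119.7, 34.3),
    ("Example Lake", "lake", -119.6, 34.4),
    ("Far Hospital", "hospital", -99.5, 40.5),
]


def find_objects(place_name):
    """Name -> footprint -> collection objects whose footprints overlap it."""
    footprint = GAZETTEER[place_name]
    return [obj_id for obj_id, box in COLLECTION if box.overlaps(footprint)]


def features_in_box(box, feature_types):
    """Features of the requested types that fall inside a user-drawn box."""
    return [name for name, ftype, lon, lat in FEATURES
            if ftype in feature_types and box.contains_point(lon, lat)]


print(find_objects("Example Town"))  # ['map-1']
print(features_in_box(BoundingBox(-120.0, 34.0, -119.5, 34.5),
                      {"school", "hospital"}))  # ['Example School']
```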

21 Alexandria Gazetteer: Example from a search on "Tulsa"
Feature name                              State  County  Type     Latitude  Longitude
Tulsa                                     OK     Tulsa   pop pl   360914N   W
Tulsa Country Club                        OK     Osage   locale   360958N   W
Tulsa County                              OK     Tulsa   civil    360600N   W
Tulsa Helicopters Incorporated Heliport   OK     Tulsa   airport  360500N   W
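
The latitude values in the search results appear to be packed as degrees, minutes, and seconds followed by a hemisphere letter (360914N read as 36 degrees 09 minutes 14 seconds north). That reading is an assumption based on the values shown, and the function name `dms_to_decimal` is hypothetical. Under that assumption, a minimal sketch of converting such a value to decimal degrees for footprint comparisons:

```python
def dms_to_decimal(value):
    """Convert packed DDMMSS[N|S] or DDDMMSS[E|W] text to signed decimal degrees."""
    if len(value) < 2:
        raise ValueError(f"too short to be a coordinate: {value!r}")
    hemisphere = value[-1].upper()
    digits = value[:-1]
    if hemisphere not in "NSEW" or not digits.isdigit() or len(digits) not in (6, 7):
        raise ValueError(f"unrecognized coordinate: {value!r}")
    seconds = int(digits[-2:])
    minutes = int(digits[-4:-2])
    degrees = int(digits[:-4])
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by the usual convention.
    return -decimal if hemisphere in "SW" else decimal


print(round(dms_to_decimal("360914N"), 4))  # 36.1539
```

Longitudes would use the same function with a three-digit degree field; the longitude digits are missing from the transcript, so none are converted here.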

22 Challenges for the Alexandria Gazetteer Content standard: A standard conceptual schema for gazetteer information. Feature types: A type scheme to categorize individual features that is rich in term variants and extensible. Temporal aspects: Geographic names and attributes change through time. "Fuzzy" footprints: Extent of a geographic feature is often approximate or ill-defined (e.g., Southern California).

23 Challenges for the Alexandria Gazetteer (continued) Quality aspects: (a) Indicate the accuracy of latitude and longitude data. (b) Ensure that the reported coordinates agree with the other elements of the description. Spatial extents: (a) Points do not represent the extent of geographic locations and are therefore only minimally useful. (b) Bounding boxes often include too much territory (e.g., the bounding box for California also includes Nevada).
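
The bounding-box problem can be made concrete with a small sketch. The triangle below is a hypothetical region, not real geography: its bounding box is a full square, so a point can fall inside the box while lying outside the region itself. The polygon test is a standard ray-casting check, used here only for illustration, and the function names are hypothetical.

```python
def bounding_box(polygon):
    """Smallest axis-aligned rectangle (min_x, min_y, max_x, max_y) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(x, y, box):
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical triangular region; its bounding box is the square (0, 0)-(4, 4).
REGION = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
BOX = bounding_box(REGION)

print(in_box(3.0, 3.0, BOX), in_polygon(3.0, 3.0, REGION))  # True False
print(in_box(1.0, 1.0, BOX), in_polygon(1.0, 1.0, REGION))  # True True
```

A common compromise is to use the bounding box as a cheap first filter and apply the polygon test only to the candidates that pass it.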

24 Examples of Gazetteers
Alexandria Digital Library
    Linda L. Hill, James Frew, and Qi Zheng, Geographic Names: The Implementation of a Gazetteer in a Georeferenced Digital Library. D-Lib Magazine, 5: 1, January
Getty Thesaurus of Geographic Names