1 CS 430: Information Discovery Lecture 16 Thesauruses and Gazetteers.

1 CS 430: Information Discovery Lecture 16 Thesauruses and Gazetteers

2 Shared Work!!! Some programs for Assignment 2 had sections of identical code! This is not acceptable. 1. If you incorporate code from other sources, it must be acknowledged. 2. If you work with a colleague: (a) You must write your own assignment. (b) You should acknowledge the joint preparation. IF YOU HAVE NOT FOLLOWED THESE PRINCIPLES, CONTACT ME DIRECTLY.

3 Course Administration
Midterm examination: Wednesday, October 31, 7:30 to 9:00. Room: To be announced.
Three questions based on readings and lectures.
Open book.
Sample examination: See the Notices page on the course web site for last year's midterm and a set of PowerPoint slides that discuss the solutions. (This examination had four questions.)

4 Examination
Suggest that you bring:
    Textbook
    Copies of lecture slides
    Discussion class readings
The examination will be only on material covered in the lectures and in the discussion classes. The objective is to reward people who regularly attend class and prepare thoroughly for the discussion sections.

5 Course Administration Syllabus changes Because of a National Science Foundation meeting that was rescheduled after September 11: Lecture on December 4 is cancelled. Topics for other lectures have been reordered. There are no changes in the readings or assignments.

6 Lexicon and Thesaurus
Lexicon contains information about words, their morphological variants, and their grammatical usage.
Thesaurus relates words by meaning:
    ship, vessel, sail; craft, navy, marine, fleet, flotilla
    book, writing, work, volume, tome, tract, codex
    search, discovery, detection, find, revelation
(From Roget's Thesaurus, 1911)

7 Thesaurus in Information Retrieval
Use of a thesaurus in indexing (precoordination)
A. Manual
Used to guide human indexer to assign standard terms and associations.
computer-aided instruction
    see also education
    UF teaching machines
    BT educational computing
    TT computer applications
    RT education
    RT teaching
From: INSPEC Thesaurus

8 Thesaurus in Information Retrieval
Use of a thesaurus in indexing (precoordination)
B. Automatic
Divide terms into thesaurus classes. Replace similar terms by a thesaurus class.
408: dislocation, junction, minority-carrier, n-p-n, p-n-p, point-contact, recombine, transition, unijunction
409: blast-cooled, heat-flow, heat-transfer
410: anneal, strain
From: Salton and McGill
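The replacement step on this slide can be sketched in a few lines of Python. This is only an illustration: the class membership assumes the Salton and McGill listing reads as three classes (408, 409, 410), and the names `THESAURUS_CLASSES` and `to_class_ids` are hypothetical, not taken from the lecture.

```python
# Thesaurus classes as read from the slide (class membership is an assumption
# about how the flattened listing should be grouped).
THESAURUS_CLASSES = {
    408: ["dislocation", "junction", "minority-carrier", "n-p-n", "p-n-p",
          "point-contact", "recombine", "transition", "unijunction"],
    409: ["blast-cooled", "heat-flow", "heat-transfer"],
    410: ["anneal", "strain"],
}

# Invert the table so each term maps to its class number.
TERM_TO_CLASS = {
    term: class_id
    for class_id, terms in THESAURUS_CLASSES.items()
    for term in terms
}


def to_class_ids(terms):
    """Replace each known term by its thesaurus class; keep unknown terms."""
    result = []
    for term in terms:
        key = term.lower()
        result.append(TERM_TO_CLASS.get(key, key))
    return result


print(to_class_ids(["Anneal", "heat-flow", "laser"]))
```

Two documents that use "heat-flow" and "heat-transfer" respectively would then share the index entry 409, which is the point of replacing similar terms by a class.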

9 Desirable Properties for Information Retrieval Thesaurus is specific to a subject area. Contains only terms of interest for identification within that subject area. Ambiguous terms are coded only for the senses important for that field. Target is that each thesaurus class should include terms of moderate frequency. Ideally the classes should have similar frequency.

10 Art and Architecture Thesaurus
Controlled vocabulary for describing and retrieving information: fine art, architecture, decorative art, and material culture.
Almost 120,000 terms for objects, textual materials, images, architecture and culture from all periods and all cultures.
Used by archives, museums, and libraries to describe items in their collections.
Used to search for materials.
Used by computer programs for information retrieval and natural language processing.
A project of the J. Paul Getty Trust.

11 Art and Architecture Thesaurus Provides the terminology for objects, and the vocabulary necessary to describe them, such as style, period, shape, color, construction, or use, and scholarly concepts, such as theories, or criticism. Concept: a cluster of terms, one of which is established as the preferred term, or descriptor. Categories: associated concepts, physical attributes, styles and periods, agents, activities, materials, and objects.

12 Art and Architecture Thesaurus: Sample Record
Record ID:
Descriptor: rhyta
Note: Refers to vessels from Ancient Greece, eastern Europe, or the Middle East that typically have a closed form with two openings, one at the top for filling and one at the base so that liquid could stream out. They are often in the shape of a horn or an animal's head, and were typically used as a drinking cup or for pouring wine into another vessel.
Hierarchy: Containers [TQ]

13 Art and Architecture Thesaurus: Sample Record (continued)
Terms:
    rhyta
    rhyton (alternate, singular)
    protomai
    protome
    rhea
    rheon
    rheons
Related concepts:
    stirrup cups
    sturzbechers
    drinking vessels
    ceremonial vessels
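A concept in this sense is a cluster of terms with one preferred descriptor, so a retrieval program mainly needs a lookup from any variant to the descriptor. The following is a minimal sketch using the terms of the sample record; the `Concept` class and the function names are hypothetical and do not reflect how the Getty data is actually distributed.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A cluster of terms; `descriptor` is the preferred term."""
    descriptor: str
    variants: list = field(default_factory=list)
    related: list = field(default_factory=list)


def build_index(concepts):
    """Map every term (descriptor or variant, lowercased) to its concept."""
    index = {}
    for concept in concepts:
        for term in [concept.descriptor, *concept.variants]:
            index[term.lower()] = concept
    return index


rhyta = Concept(
    descriptor="rhyta",
    variants=["rhyton", "protomai", "protome", "rhea", "rheon", "rheons"],
    related=["stirrup cups", "sturzbechers", "drinking vessels",
             "ceremonial vessels"],
)
INDEX = build_index([rhyta])


def preferred_term(term):
    """Return the descriptor for a term, or None if the term is unknown."""
    concept = INDEX.get(term.lower())
    return concept.descriptor if concept else None


print(preferred_term("Rhyton"))
```

An indexer or a search front end could call `preferred_term` on whatever the user types, so that "rhyton" and "rheons" both retrieve items described with "rhyta".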

14 MeSH -- Medical Subject Headings
Controlled vocabulary for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE.
About 19,000 primary subject headings.
Thesaurus of 110,000 chemical terms.
Total vocabulary over 300,000 terms.
National Library of Medicine provides MeSH subject headings for each of the 400,000 articles that it indexes every year.
"MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts."

15 MeSH -- Medical Subject Headings MeSH hierarchy: general terms, e.g., anatomy, organisms, diseases, biological sciences; anatomy is divided into sixteen topics, e.g., body regions and musculoskeletal system; body regions is divided into sections, e.g., abdomen, axilla, back etc.

16 Example of MeSH hierarchy
Biological Sciences [G]
    Biological Sciences [G01] +
    Health Occupations [G02] +
    Environment and Public Health [G03] +
    Biological Phenomena, Cell Phenomena, and Immunity [G04] +
    Genetics [G05] +
    Biochemical Phenomena, Metabolism, and Nutrition [G06] +
    Physiological Processes [G07] +
    Reproductive and Urinary Physiology [G08] +
    Circulatory and Respiratory Physiology [G09] +
    Digestive, Oral, and Skin Physiology [G10] +
    Musculoskeletal, Neural, and Ocular Physiology [G11] +
    Chemical and Pharmacologic Phenomena [G12] +

17 Example of MeSH hierarchy (continued)
Physiological Processes [G07]
    Adaptation, Physiological [G07.062] +
    Aging [G07.168] +
    Body Constitution [G07.265] +
    Body Temperature [G07.315]
        Body Temperature Regulation [G ] +
        Skin Temperature [G ]
    Chronobiology [G07.450] +
    Electrophysiology [G07.453] +
    Fluid Shifts [G07.503]
    Growth and Embryonic Development [G07.553] +
    Homeostasis [G07.621] +
    Tensile Strength [G07.900]
    Tropism [G07.950] +
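The tree numbers in brackets encode the hierarchy: a heading's narrower terms are those whose tree number extends its own with a further dotted segment. A minimal sketch of navigating by tree number follows; it uses only a few headings whose numbers are legible on these slides, and the function names are hypothetical.

```python
# A small excerpt of headings keyed by tree number (from the slides).
TREE = {
    "G06": "Biochemical Phenomena, Metabolism, and Nutrition",
    "G07": "Physiological Processes",
    "G07.062": "Adaptation, Physiological",
    "G07.168": "Aging",
    "G07.315": "Body Temperature",
    "G07.450": "Chronobiology",
}


def narrower(tree_number):
    """Headings anywhere below `tree_number`, found by dotted-prefix match."""
    prefix = tree_number + "."
    return sorted(name for number, name in TREE.items()
                  if number.startswith(prefix))


def broader(tree_number):
    """The immediate parent heading, or None at the top of this excerpt."""
    parent, _, _ = tree_number.rpartition(".")
    return TREE.get(parent)


print(narrower("G07"))
print(broader("G07.315"))
```

A search system can use the same prefix test to expand a query on a general heading to all of its narrower headings.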

18 Example of MeSH hierarchy (continued)
MeSH Heading: Body Temperature
Tree Number: E
Tree Number: G
Entry Term: Organ Temperature
See Also: Fever
See Also: Thermography
See Also: Thermometers
Allowable Qualifiers: DE GE IM PH RE
Unique ID: D001831

19 Observations about Manually Maintained Thesaurus
Permits very rich structure of relationships.
Most effective when user of search system is skilled in the discipline and trained in the use of the thesaurus (e.g., medical librarian).
Needs continual updating as a field develops new terminology.
Expensive to create and maintain.

20 Gazetteers The Alexandria Digital Library (ADL): geolibrary at University of California at Santa Barbara where a primary attribute of objects is location on Earth (e.g., map, satellite photograph). Geographic footprint: latitude and longitude values that represent a point, a bounding box, a linear feature, or a complete polygonal boundary. Gazetteer: list of geographic names, with geographic locations and other descriptive information. Geographic name: proper name for a geographic place or feature (e.g., Santa Barbara County, Mount Washington, St. Francis Hospital, and Southern California)

21 Alexandria Thesaurus: Example
canals
A feature type category for places such as the Erie Canal.
Used for: The category canals is used instead of any of the following.
    canal bends
    canalized streams
    ditch mouths
    ditches
    drainage canals
    drainage ditches
    ... more...
Broader Terms: Canals is a sub-type of hydrographic structures.

22 Alexandria Thesaurus: Example (continued)
canals (continued)
Related Terms: The following is a list of other categories related to canals (non-hierarchical relationships).
    channels
    locks
    transportation features
    tunnels
Scope Note: Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power. » Definition of canals.

23 Use of a Gazetteer Answers the "Where is" question; for example, "Where is Santa Barbara?" Translates between geographic names and locations. A user can find objects by matching the footprint of a geographic name to the footprints of the collection objects. Locates particular types of geographic features in a designated area. For example, a user can draw a box around an area on a map and find the schools, hospitals, lakes, or volcanoes in the area.
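Matching the footprint of a name to the footprints of collection objects can be sketched with bounding boxes. This is an illustration under stated assumptions: the coordinates are made-up values, not real ADL footprints; the names `BBox`, `GAZETTEER`, `COLLECTION`, and `find_objects` are hypothetical; and boxes that cross the 180-degree meridian are ignored.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """A footprint as a bounding box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float


def overlaps(a, b):
    """True if the two boxes share any area or touch at an edge."""
    return (a.west <= b.east and b.west <= a.east and
            a.south <= b.north and b.south <= a.north)


# Illustrative, made-up footprints.
GAZETTEER = {
    "santa barbara": BBox(-119.9, 34.3, -119.6, 34.5),
}
COLLECTION = {
    "aerial photo 17": BBox(-119.8, 34.4, -119.7, 34.45),
    "map of Tulsa": BBox(-96.1, 36.0, -95.8, 36.3),
}


def find_objects(place_name):
    """Translate a name to a footprint, then return overlapping objects."""
    footprint = GAZETTEER.get(place_name.lower())
    if footprint is None:
        return []
    return sorted(title for title, box in COLLECTION.items()
                  if overlaps(footprint, box))


print(find_objects("Santa Barbara"))
```

The second use on the slide, drawing a box on a map, is the same `overlaps` test with the user's box in place of the gazetteer footprint.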

24 Alexandria Gazetteer: Example from a search on "Tulsa"
Feature name                             State  County  Type     Latitude  Longitude
Tulsa                                    OK     Tulsa   pop pl   360914N   W
Tulsa Country Club                       OK     Osage   locale   360958N   W
Tulsa County                             OK     Tulsa   civil    360600N   W
Tulsa Helicopters Incorporated Heliport  OK     Tulsa   airport  360500N   W
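The latitudes in this example are compact strings such as 360914N. To compare them with footprints held in decimal degrees they must be converted. The sketch below assumes the layout is degrees, then two digits of minutes, two digits of seconds, and a hemisphere letter; that layout is an assumption read off the example values, and the function name is hypothetical. The longitude used in the demonstration is a made-up value, since the longitudes are not legible in the example.

```python
def dms_to_decimal(value):
    """Convert a string like '360914N' (DDMMSS + hemisphere) to degrees."""
    hemisphere = value[-1:].upper()
    digits = value[:-1]
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"missing hemisphere letter: {value!r}")
    if not digits.isdigit() or len(digits) < 5:
        raise ValueError(f"expected degrees, minutes, seconds: {value!r}")
    degrees = int(digits[:-4])       # everything before MMSS
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by the usual convention.
    return -decimal if hemisphere in ("S", "W") else decimal


print(round(dms_to_decimal("360914N"), 6))   # latitude of Tulsa from the table
print(dms_to_decimal("0953000W"))            # made-up longitude for illustration
```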

25 Challenges for the Alexandria Gazetteer
Content standard: A standard conceptual schema for gazetteer information.
Feature types: A type scheme to categorize individual features that is rich in term variants and extensible.
Temporal aspects: Geographic names and attributes change through time.
"Fuzzy" footprints: Extent of a geographic feature is often approximate or ill-defined (e.g., Southern California).

26 Challenges for the Alexandria Gazetteer (continued)
Quality aspects: (a) Indicate the accuracy of latitude and longitude data. (b) Ensure that the reported coordinates agree with the other elements of the description.
Spatial extents: (a) Points do not represent the extent of the geographic locations and are therefore only minimally useful. (b) Bounding boxes often include too much territory (e.g., the bounding box for California also includes Nevada).
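The bounding-box problem can be shown with a small planar example. The sketch below uses a triangle as a hypothetical stand-in for an irregular region (it is not real geographic data) and a standard ray-casting test for point-in-polygon; all function names are assumptions made for illustration.

```python
def bounding_box(polygon):
    """Smallest axis-aligned box (west, south, east, north) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(point, box):
    x, y = point
    west, south, east, north = box
    return west <= x <= east and south <= y <= north


def in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):          # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical triangular region; its box covers twice its area.
region = [(0, 0), (4, 0), (0, 4)]
box = bounding_box(region)

outside_point = (3, 3)
print(in_bbox(outside_point, box), in_polygon(outside_point, region))
```

The point (3, 3) falls inside the box but outside the region, which is the same situation as a place in Nevada matching the bounding box for California.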

27 Examples of Gazetteers
Alexandria Digital Library
    Linda L. Hill, James Frew, and Qi Zheng, Geographic Names: The Implementation of a Gazetteer in a Georeferenced Digital Library. D-Lib Magazine, 5: 1, January
Getty Thesaurus of Geographic Names