Published by Gavin McLaughlin. Modified over 9 years ago.
1 CS 430: Information Discovery Lecture 21 Thesauruses and Gazetteers
2 Course Administration
3 Lexicon and Thesaurus
Lexicon contains information about words, their morphological variants, and their grammatical usage.
Thesaurus relates words by meaning:
  ship, vessel, sail; craft, navy, marine, fleet, flotilla
  book, writing, work, volume, tome, tract, codex
  search, discovery, detection, find, revelation
(From Roget's Thesaurus, 1911)
4 Thesaurus in Information Retrieval
Use of a thesaurus in indexing (precoordination)
A. Manual
Used to guide a human indexer in assigning standard terms and associations.
  computer-aided instruction
    see also education
    UF teaching machines
    BT educational computing
    TT computer applications
    RT education
    RT teaching
From: INSPEC Thesaurus
5 Thesaurus in Information Retrieval
Use of a thesaurus in indexing (precoordination)
B. Automatic
Divide terms into thesaurus classes. Replace similar terms by a thesaurus class.
  408  dislocation         409  blast-cooled
       junction                 heat-flow
       minority-carrier         heat-transfer
       n-p-n
       p-n-p               410  anneal
       point-contact            strain
       recombine
       transition
       unijunction
From: Salton and McGill
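The replacement step above can be sketched in a few lines of Python. This is only an illustration, not Salton and McGill's implementation: the class numbers and terms are taken from the excerpt on the slide (as read from its two-column layout), while the names THESAURUS_CLASSES, TERM_TO_CLASS, and index_terms are hypothetical.

```python
# Thesaurus classes as read from the Salton and McGill excerpt on the slide.
THESAURUS_CLASSES = {
    408: ["dislocation", "junction", "minority-carrier", "n-p-n", "p-n-p",
          "point-contact", "recombine", "transition", "unijunction"],
    409: ["blast-cooled", "heat-flow", "heat-transfer"],
    410: ["anneal", "strain"],
}

# Invert the table into a term -> class-number lookup.
TERM_TO_CLASS = {
    term: class_id
    for class_id, terms in THESAURUS_CLASSES.items()
    for term in terms
}


def index_terms(tokens):
    """Replace every token found in a thesaurus class by its class number.

    Tokens that belong to no class are passed through unchanged.
    """
    return [TERM_TO_CLASS.get(token.lower(), token) for token in tokens]
```

With this lookup, documents that mention "heat-flow" and documents that mention "heat-transfer" are both indexed under class 409, so a query using either term can match both.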
6 Desirable Properties for Information Retrieval
Thesaurus is specific to a subject area.
Contains only terms of interest for identification within that subject area.
Ambiguous terms are coded only for the senses important for that field.
Target is that each thesaurus class should include terms of moderate frequency.
Ideally the classes should have similar frequency.
7 Art and Architecture Thesaurus
Controlled vocabulary for describing and retrieving information: fine art, architecture, decorative art, and material culture.
Almost 120,000 terms for objects, textual materials, images, architecture and culture from all periods and all cultures.
Used by archives, museums, and libraries to describe items in their collections.
Used to search for materials.
Used by computer programs, for information retrieval, and natural language processing.
A project of the J. Paul Getty Trust
8 Art and Architecture Thesaurus
Provides the terminology for objects, and the vocabulary necessary to describe them, such as style, period, shape, color, construction, or use, and scholarly concepts, such as theories, or criticism.
Concept: a cluster of terms, one of which is established as the preferred term, or descriptor.
Categories: associated concepts, physical attributes, styles and periods, agents, activities, materials, and objects.
9 Art and Architecture Thesaurus: Sample Record
Record ID: 198841
Descriptor: rhyta
Note: Refers to vessels from Ancient Greece, eastern Europe, or the Middle East that typically have a closed form with two openings, one at the top for filling and one at the base so that liquid could stream out. They are often in the shape of a horn or an animal's head, and were typically used as a drinking cup or for pouring wine into another vessel.
Hierarchy: Containers [TQ] ...
10 Art and Architecture Thesaurus: Sample Record (continued)
Terms:
  rhyta
  rhyton (alternate, singular)
  protomai
  protome
  rhea
  rheon
  rheons
Related concepts:
  stirrup cups
  sturzbechers
  drinking vessels
  ceremonial vessels
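The idea of a concept as a cluster of terms with one preferred descriptor maps naturally onto a lookup table. The sketch below is a hypothetical illustration, not the Getty data model or API: it uses the record ID, descriptor, and terms from the sample record, and the names CONCEPTS, build_lookup, and preferred_term are invented for the example.

```python
# One concept from the sample record: a cluster of terms with a preferred
# descriptor. The dictionary layout is an assumption made for illustration.
CONCEPTS = {
    198841: {
        "descriptor": "rhyta",
        "terms": ["rhyta", "rhyton", "protomai", "protome",
                  "rhea", "rheon", "rheons"],
    },
}


def build_lookup(concepts):
    """Map every term variant (lowercased) to its concept's descriptor."""
    lookup = {}
    for record in concepts.values():
        for term in record["terms"]:
            lookup[term.lower()] = record["descriptor"]
    return lookup


def preferred_term(term, lookup):
    """Return the descriptor for a term variant, or None if it is unknown."""
    return lookup.get(term.strip().lower())
```

An indexer or a search front end could call preferred_term on whatever variant the cataloguer or user types, so that "rhyton" and "protome" both resolve to the descriptor "rhyta".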
11 MeSH -- Medical Subject Headings
Controlled vocabulary for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE.
About 19,000 primary subject headings.
Thesaurus of 110,000 chemical terms.
Total vocabulary over 300,000 terms.
National Library of Medicine provides MeSH subject headings for each of the 400,000 articles that it indexes every year.
"MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts."
12 MeSH -- Medical Subject Headings
MeSH hierarchy:
  general terms, e.g., anatomy, organisms, diseases, biological sciences;
  anatomy is divided into sixteen topics, e.g., body regions and musculoskeletal system;
  body regions is divided into sections, e.g., abdomen, axilla, back, etc.
13 Example of MeSH hierarchy
Biological Sciences [G]
  Biological Sciences [G01] +
  Health Occupations [G02] +
  Environment and Public Health [G03] +
  Biological Phenomena, Cell Phenomena, and Immunity [G04] +
  Genetics [G05] +
  Biochemical Phenomena, Metabolism, and Nutrition [G06] +
  Physiological Processes [G07] +
  Reproductive and Urinary Physiology [G08] +
  Circulatory and Respiratory Physiology [G09] +
  Digestive, Oral, and Skin Physiology [G10] +
  Musculoskeletal, Neural, and Ocular Physiology [G11] +
  Chemical and Pharmacologic Phenomena [G12] +
14 Example of MeSH hierarchy (continued)
Physiological Processes [G07]
  Adaptation, Physiological [G07.062] +
  Aging [G07.168] +
  Body Constitution [G07.265] +
  Body Temperature [G07.315]
    Body Temperature Regulation [G07.315.232] +
    Skin Temperature [G07.315.753]
  Chronobiology [G07.450] +
  Electrophysiology [G07.453] +
  Fluid Shifts [G07.503]
  Growth and Embryonic Development [G07.553] +
  Homeostasis [G07.621] +
  Tensile Strength [G07.900]
  Tropism [G07.950] +
15 Example of MeSH hierarchy (continued)
  MeSH Heading          Body Temperature
  Tree Number           E01.370.600.120
  Tree Number           G07.315
  Entry Term            Organ Temperature
  See Also              Fever
  See Also              Thermography
  See Also              Thermometers
  Allowable Qualifiers  DE GE IM PH RE
  Unique ID             D001831
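The tree numbers in these examples encode the hierarchy directly: each dot-separated segment adds one level, so G07.315.232 sits under G07.315, which sits under G07. A minimal sketch of how a program might exploit that, assuming only the dotted notation shown on the slides (the function names are hypothetical and this is not an NLM API):

```python
def ancestors(tree_number):
    """Return the broader tree numbers of a MeSH tree number, nearest last.

    Only dotted prefixes are returned; the single-letter top category
    (e.g. "G") is not included.
    """
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_narrower(candidate, heading):
    """True if candidate lies strictly below heading in the tree."""
    return candidate.startswith(heading + ".")


def expand_query(heading, all_tree_numbers):
    """Return the heading plus every narrower tree number in the list."""
    return [t for t in all_tree_numbers
            if t == heading or is_narrower(t, heading)]
```

A search on Body Temperature [G07.315] could use expand_query to include the narrower headings as well. Note that a heading may have more than one tree number (Body Temperature also appears at E01.370.600.120), so a full implementation would expand each of them.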
16 Observations about Manually Maintained Thesauruses
Permit a very rich structure of relationships.
Most effective when the user of the search system is skilled in the discipline and trained in the use of the thesaurus (e.g., a medical librarian).
Need continual updating as a field develops new terminology.
Expensive to create and maintain.
17 Gazetteers
The Alexandria Digital Library (ADL): geolibrary at University of California at Santa Barbara where a primary attribute of objects is location on Earth (e.g., map, satellite photograph).
Geographic footprint: latitude and longitude values that represent a point, a bounding box, a linear feature, or a complete polygonal boundary.
Gazetteer: list of geographic names, with geographic locations and other descriptive information.
Geographic name: proper name for a geographic place or feature (e.g., Santa Barbara County, Mount Washington, St. Francis Hospital, and Southern California).
18 Alexandria Thesaurus: Example
canals
A feature type category for places such as the Erie Canal.
Used for: The category canals is used instead of any of the following.
  canal bends
  canalized streams
  ditch mouths
  ditches
  drainage canals
  drainage ditches
  ... more...
Broader Terms: Canals is a sub-type of hydrographic structures.
19 Alexandria Thesaurus: Example (continued)
canals (continued)
Related Terms: The following is a list of other categories related to canals (non-hierarchical relationships).
  channels
  locks
  transportation features
  tunnels
Scope Note: Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power.
» Definition of canals.
20 Use of a Gazetteer
Answers the "Where is" question; for example, "Where is Santa Barbara?"
Translates between geographic names and locations. A user can find objects by matching the footprint of a geographic name to the footprints of the collection objects.
Locates particular types of geographic features in a designated area. For example, a user can draw a box around an area on a map and find the schools, hospitals, lakes, or volcanoes in the area.
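Matching the footprint of a name to the footprints of collection objects can be sketched as a bounding-box overlap test. Everything below is a hypothetical illustration and not the ADL implementation: the Footprint class, the GAZETTEER and COLLECTION tables, and the coordinates (rough placeholder boxes, not surveyed values) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Bounding-box footprint in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "Footprint") -> bool:
        # Two boxes overlap when they overlap on both axes.
        # Boxes that cross the 180-degree meridian are not handled.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


# Name -> footprint. Placeholder coordinates for illustration only.
GAZETTEER = {"santa barbara": Footprint(-120.0, 34.3, -119.5, 34.6)}

# Collection object id -> footprint. Hypothetical objects.
COLLECTION = {
    "map-001": Footprint(-119.9, 34.35, -119.6, 34.5),
    "photo-002": Footprint(-96.1, 36.0, -95.8, 36.3),
}


def find_objects(name):
    """Return ids of collection objects whose footprint overlaps the name's."""
    footprint = GAZETTEER.get(name.lower())
    if footprint is None:
        return []
    return sorted(obj_id for obj_id, fp in COLLECTION.items()
                  if fp.intersects(footprint))
```

The same intersects test serves the "draw a box on a map" use: the user's box becomes a Footprint and is compared against the footprints of features of the requested type.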
21 Alexandria Gazetteer: Example from a search on "Tulsa"
  Feature name                             State  County  Type     Latitude  Longitude
  Tulsa                                    OK     Tulsa   pop pl   360914N   0955933W
  Tulsa Country Club                       OK     Osage   locale   360958N   0960012W
  Tulsa County                             OK     Tulsa   civil    360600N   0955400W
  Tulsa Helicopters Incorporated Heliport  OK     Tulsa   airport  360500N   0955205W
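The coordinates in this example appear to be packed degrees-minutes-seconds strings: DDMMSS plus N or S for latitude, DDDMMSS plus E or W for longitude. That reading is an assumption based on the values shown. Under it, a conversion to signed decimal degrees might look like this (the function name is hypothetical):

```python
def dms_to_decimal(value):
    """Convert a packed DMS string such as '360914N' to decimal degrees.

    Assumes the last character is the hemisphere (N, S, E, or W), the two
    digits before it are seconds, the two before those are minutes, and the
    remaining two or three digits are degrees. South and west are negative.
    """
    value = value.strip().upper()
    if len(value) < 2:
        raise ValueError(f"not a packed DMS coordinate: {value!r}")
    hemisphere, digits = value[-1], value[:-1]
    if hemisphere not in "NSEW" or not digits.isdigit() or len(digits) not in (6, 7):
        raise ValueError(f"not a packed DMS coordinate: {value!r}")
    degrees = int(digits[:-4])
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if hemisphere in "SW" else decimal
```

For the first row, 360914N is 36 degrees 9 minutes 14 seconds north, about 36.1539, and 0955933W is 95 degrees 59 minutes 33 seconds west, which is -95.9925.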
22 Challenges for the Alexandria Gazetteer
Content standard: A standard conceptual schema for gazetteer information.
Feature types: A type scheme to categorize individual features that is rich in term variants and extensible.
Temporal aspects: Geographic names and attributes change through time.
"Fuzzy" footprints: Extent of a geographic feature is often approximate or ill-defined (e.g., Southern California).
23 Challenges for the Alexandria Gazetteer (continued)
Quality aspects:
  (a) Indicate the accuracy of latitude and longitude data.
  (b) Ensure that the reported coordinates agree with the other elements of the description.
Spatial extents:
  (a) Points do not represent the extent of the geographic locations and are therefore only minimally useful.
  (b) Bounding boxes often include too much territory (e.g., the bounding box for California also includes Nevada).
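The bounding-box problem can be shown with a small geometric sketch. The L-shaped polygon below is a made-up stand-in for an irregular region (it is not real boundary data), coordinates are treated as planar, and the function names are hypothetical. The polygon test is the standard ray-casting method.

```python
def bounding_box(polygon):
    """Return (west, south, east, north) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(point, bbox):
    """True if the point lies inside or on the bounding box."""
    x, y = point
    west, south, east, north = bbox
    return west <= x <= east and south <= y <= north


def in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped region: its bounding box covers the empty notch too.
REGION = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
```

The point (3, 3) falls in the notch of the L: in_bbox reports a match while in_polygon does not. That is the same kind of false match as finding Nevada features inside the bounding box for California, and it is why polygonal footprints give more precise retrieval at a higher storage and computation cost.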
24 Examples of Gazetteers
Alexandria Digital Library
  Linda L. Hill, James Frew, and Qi Zheng, "Geographic Names: The Implementation of a Gazetteer in a Georeferenced Digital Library." D-Lib Magazine, 5(1), January 1999.
  http://www.dlib.org/dlib/january99/hill/01hill.html
Getty Thesaurus of Geographic Names
  http://www.getty.edu/research/tools/vocabulary/tgn/