1
Standards for Controlled Vocabularies
1. U.S. Standard (NISO Z39.19)
2. British Standard (BS 8723)
3. IFLA Guidelines
Marcia Lei Zeng, Kent State University
7th NKOS Workshop, JCDL 2005, Denver
2
I. U.S. Standard for Controlled Vocabularies – NISO Z39.19
NISO Z39.19-200x: Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
Some of the slides are based on Emily Fayen's SLA presentation and Margie Hlava's talk at the 2005 Data Harmony User Group meeting
3
A little bit of history…
• 1993: ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri
  • The most frequently requested NISO Standard
  • In spite of its age, the Standard is still relevant
• 1999: NISO Workshop on Electronic Thesauri
• 2002: NISO initiates revision of Z39.19
4
Scope
• Expand beyond thesaurus
• Make more user-friendly
• Explain important concepts
• Explain principles of vocabulary control
• Include electronic information environment
• Include additional user search methods:
  • Browse
  • Navigate
  • Keyword searching
• Expand beyond A & I services
• Include Web applications
5
The Team:
• Vivian Bliss – Microsoft
• Carol Brent – ProQuest
• John Dickert – DTIC
• Lynn El-Hoshy – Library of Congress
• Marjorie Hlava – Access Innovations
• Stephen Hearn – ALA
• Sabine Kuhn – Chemical Abstracts Service
• Pat Kuhr – H.W. Wilson Company
• Diane McKerlie – DMA Consulting
• Peter Morville – Semantic Studios
• Stuart Nelson – National Library of Medicine
• Allan Savage – National Library of Medicine
• Diane Vizine-Goetz – OCLC
• Marcia Lei Zeng – Special Libraries Association
6
Z39.19 Chapters
Chapter  Content
1   Introduction
2   Scope
3   Referenced Standards
4   Definitions, Abbreviations, and Acronyms
5   Controlled Vocabularies – Purpose, Concepts, Principles, and Structure
6   Term Choice, Scope, and Form
7   Compound Terms
8   Relationships
9   Displaying Controlled Vocabularies
10  Interoperability
11  Construction, Testing, Maintenance, and Management Systems
7
What's new?
Previously:
• Coverage: documents
• Types of vocabularies: thesauri
• Post-coordinated
• Printed formats
• Monolingual vocabularies
New:
• Coverage: content objects
• Types of vocabularies: lists, synonym rings, taxonomy
• Pre-coordinated
• Web format
• Multilingual vocabularies (general)
• Interoperability
• Facet analysis
A content object is any item that can be described for inclusion in an information retrieval system:
• Documents
• Electronic documents and their metadata
• Maps
• Music
• Paintings
• Sculpture
• . . .
8
Principles of Controlled Vocabularies
Four important principles of vocabulary control guide the design and development of controlled vocabularies:
• eliminating ambiguity
• controlling synonyms
• establishing relationships among terms where appropriate
• testing and validation of terms
9
Type of vocabulary control
10
Lists
A list is a simple group of terms
Example:
• Alabama
• Alaska
• Arkansas
• California
• Colorado
Frequently used in Web site pick lists and pull-down menus
12
Source: The J. Paul Getty Museum's implementation of The Museum System
software by Gallery Systems
13
Synonym Rings A synonym ring is a list of synonyms or near synonyms that are used interchangeably for retrieval purposes
14
Synonym Rings -- Examples
Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms.
e.g., cholesterol:
• Cholesterol
• Blood Cholesterol
• Serum Cholesterol
• Good Cholesterol
• Bad Cholesterol
• LDL
Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled
15
An example from International SEMATECH;
a search for Silicon would look like this: Your search was submitted as “SILICON” or “SI”
16
Synonym Rings are used--
Synonym rings are used to expand queries for content objects. If a user enters any one of these terms as a query to the system, all items are retrieved that contain any of the terms in the cluster.
Synonym rings are often used in systems where the underlying content objects are left in their unstructured natural language format; control is achieved through the interface by drawing together similar terms into these clusters.
Synonym rings are used in conjunction with search engines and provide a minimal amount of control over the diversity of the language found in the texts of the underlying documents.
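The query expansion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ring contents come from the cholesterol and silicon slides, while the function names (`build_index`, `expand_query`, `search`), the sample documents, and the whole-word matching rule are hypothetical choices, not part of Z39.19.

```python
import re

# Synonym rings taken from the slides; every member of a ring is
# treated as equivalent for retrieval purposes.
RINGS = [
    frozenset({"cholesterol", "blood cholesterol", "serum cholesterol",
               "good cholesterol", "bad cholesterol", "ldl"}),
    frozenset({"silicon", "si"}),
]


def build_index(rings):
    """Map each term to the ring that contains it."""
    return {term: ring for ring in rings for term in ring}


INDEX = build_index(RINGS)


def expand_query(query, index=INDEX):
    """Return every term in the query's ring, or just the query itself."""
    key = query.strip().lower()
    return sorted(index.get(key, {key}))


def search(query, documents, index=INDEX):
    """Retrieve documents containing any term of the ring as a whole word or phrase."""
    patterns = [re.compile(r"\b" + re.escape(term) + r"\b")
                for term in expand_query(query, index)]
    return [doc for doc in documents
            if any(p.search(doc.lower()) for p in patterns)]


# Hypothetical free-text documents, left in unstructured natural language.
DOCS = ["Serum cholesterol levels fell.", "Silicon wafers", "A basic overview"]
```

Here `expand_query("Silicon")` yields both "si" and "silicon", which mirrors the SEMATECH example where the search was submitted as “SILICON” or “SI”. The documents themselves are never indexed; all control sits in the query step.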
17
Taxonomies
A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy
Example:
Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon
Frequently used in web navigation systems
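As a rough sketch, a taxonomy can be stored as a map from each term to its broader terms; a polyhierarchy is then simply a term with more than one parent. The chain Chemistry > Organic chemistry > Polymer chemistry > Nylon is from the slide. The second parent, "Synthetic fibers", and the function name `paths_to_root` are hypothetical, added only to show a polyhierarchy.

```python
# Each term maps to its broader (parent) terms.
BROADER = {
    "Chemistry": [],
    "Organic chemistry": ["Chemistry"],
    "Polymer chemistry": ["Organic chemistry"],
    # "Synthetic fibers" is a hypothetical second parent, not from the slide.
    "Nylon": ["Polymer chemistry", "Synthetic fibers"],
    "Synthetic fibers": [],
}


def paths_to_root(term, broader=BROADER):
    """Return every navigation path from a top term down to `term`."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    return [path + [term]
            for parent in parents
            for path in paths_to_root(parent, broader)]


for path in paths_to_root("Nylon"):
    print(" > ".join(path))
```

In a web navigation system, each path would correspond to one breadcrumb trail under which the term can be reached.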
18
Thesauri
A thesaurus is a controlled vocabulary with multiple types of relationships
Example:
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw
19
Thesauri (cont.) Relationship types:
• Use/Used For – indicates preferred term
• Hierarchy – indicates broader and narrower terms
• Associative – almost unlimited types of relationships may be used
The thesaurus is the most complex format for controlled vocabularies and is widely used.
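A minimal sketch of how the three relationship types might be held in memory, using the Rice entry from the previous slide. The `Thesaurus` class and its method names are hypothetical; the only assumption about the relationships themselves is that each one has a reciprocal (BT/NT, UF/USE, RT/RT), which is the usual thesaurus convention.

```python
from collections import defaultdict

# Every relationship is stored together with its reciprocal.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


class Thesaurus:
    def __init__(self):
        # term -> relationship tag -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, term, rel, target):
        """Record `term REL target` and the reciprocal on `target`."""
        self.relations[term][rel].add(target)
        self.relations[target][RECIPROCAL[rel]].add(term)

    def preferred(self, term):
        """Follow a USE reference to the preferred term, if there is one."""
        use = self.relations.get(term, {}).get("USE", set())
        return next(iter(use), term)


# The Rice entry from the slide.
t = Thesaurus()
t.add("Rice", "UF", "paddy")
t.add("Rice", "BT", "Cereals")
t.add("Rice", "BT", "Plant products")
t.add("Rice", "NT", "Brown rice")
t.add("Rice", "RT", "Rice straw")

print(t.preferred("paddy"))
print(sorted(t.relations["Cereals"]["NT"]))
```

Storing reciprocals at insertion time keeps the entry for "Cereals" consistent with the entry for "Rice" without a separate validation pass.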
20
Interoperability
One of the most important issues from the 1999 workshop
Question: How to
• compare indexes
• perform searches
• merge databases
that have been developed using different controlled vocabularies?
21
Interoperability (CONT.)
• Factors Affecting Interoperability
• Multilingual Controlled Vocabularies
• Searching
• Indexing
• Merging Databases
• Merging Controlled Vocabularies
• Achieving Interoperability
• Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies
22
Review and Comments
http://www.niso.org
Ballot period: April 11 to May 25, 2005
Current voting status (as of June 5, 2005): YES: 40, NO: 0, ABSTAIN: 4
23
II. The British Standard
BS 8723: Structured Vocabularies for Information Retrieval – Guide
Slides based on the presentation by Stella G Dextre Clarke, Alan Gilchrist, and Leonard Will at ISKO 2004, London
24
Existing thesaurus standards
• ISO 2788, Guidelines for the establishment and development of monolingual thesauri = BS 5723:1987
• ISO 5964, Guidelines for the establishment and development of multilingual thesauri = BS 6723:1985
25
What needs updating?
• Printed versus electronic application
• Guidance on management software
• Interoperability:
  • Mapping between thesauri and other types of vocabulary
  • Formats/protocols for data exchange with downstream applications
• Applicability to end-user applications, not just those for information professionals
26
Outline of new standard
BS 8723: Structured vocabularies for information retrieval – Guide
• Part 1 – Definitions, symbols and abbreviations
• Part 2 – Thesauri
• Part 3 – Vocabularies other than thesauri
• Part 4 – Interoperability between vocabularies
• Part 5 – Interoperation between vocabularies and other components of information storage and retrieval systems
27
Part 3 chapters
• Classification schemes
• Subject heading lists
• Taxonomies
• Ontologies
• Semantic nets (?)
• Search thesauri
28
Issues for Part 3
• How much guidance is needed on how to build other sorts of vocabulary?
• Should we describe the idiosyncrasies of existing schemes, even where we judge there is a ‘better’ way?
• To provide a basis for Part 4, Part 3 should pick out the characteristics of different vocabulary types that govern when and how you can map them. But some of the observable characteristics might not be what we’d recommend. What to do?
29
Part 4: Interoperability between vocabularies
• Huge demand for accessing information that has been indexed with another language and/or vocabulary.
• The buzzword is ‘Mapping’. The Semantic Web is just one application.
• Part 4 to include multilingual thesauri as a special case of mapping between vocabularies.
• Part 4 applies to situations in which more than one language or vocabulary is in use, but access to all resources is needed through the one vocabulary chosen by the user.
30
Part 4: Interoperability between vocabularies (cont.)
BS 8723 Part 4 has a wider scope than BS 6723, which was concerned only with multilingual thesauri (that is, thesauri presented in more than one natural language) and which required all the language versions to have equal status. Part 4 covers all of the previous ground and extends the scope to:
• thesauri in different dialects of one language
• different thesauri in a single language
• situations where a thesaurus interoperates with one or more different types of structured vocabulary, such as classification schemes
• situations where not all the interoperating vocabularies have the same status and/or function
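The mapping idea behind Part 4 (search everything through the one vocabulary the user chose) can be sketched as a lookup table from the user's vocabulary into the others. Everything in this example is a hypothetical illustration: the vocabulary names, the terms, the notation "AGR-12", the document identifiers, and the function names. BS 8723 does not prescribe this data structure.

```python
# Mappings from the user's vocabulary to other vocabularies.
# All names, terms, and codes here are made up for illustration.
MAPPINGS = {
    "en-thesaurus": {
        "Rice": {"fr-thesaurus": ["Riz"], "class-scheme": ["AGR-12"]},
    },
}

# One index per collection, each built with a different vocabulary.
INDEXES = {
    "en-thesaurus": {"Rice": ["doc1"]},
    "fr-thesaurus": {"Riz": ["doc2"]},
    "class-scheme": {"AGR-12": ["doc3"]},
}


def translate(term, source):
    """Return the term in its own vocabulary plus any mapped equivalents."""
    targets = {source: [term]}
    targets.update(MAPPINGS.get(source, {}).get(term, {}))
    return targets


def search_all(term, source):
    """Search every collection using a single term from the user's vocabulary."""
    hits = set()
    for vocab, terms in translate(term, source).items():
        for mapped in terms:
            hits.update(INDEXES.get(vocab, {}).get(mapped, []))
    return sorted(hits)


print(search_all("Rice", "en-thesaurus"))
```

The mappings here run in one direction only, from the user's vocabulary outward, which fits the case where the interoperating vocabularies do not all have the same status.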
31
Part 5: Interoperability with applications
• Vocabularies must work with:
  • Search engines
  • Content Management Systems
  • Web publishing software, etc.
• Build on existing formats and protocols for data exchange, e.g. Z39.50 and Zthes
• XML schema? DTD? MARC? SKOS Core Schema? Topic Map? ADL gazetteer protocol? Anything else?
32
Review and Comments
Request a copy of Parts 1 and 2:
Parts 1 and 2 are numbered 04/ DC and 04/ DC. The documents may be ordered from BSI Customer Services, tel +44(0) or
Comment period ends in Dec.
33
III. IFLA Guidelines for Multilingual Thesauri
IFLA Classification and Indexing Section
Released for comments April 2005
34
IFLA Classification and Indexing Section WG on Guidelines for Multilingual Thesauri
Chair: Gerhard J.A. Riesthuis (Netherlands)
Members: Lois Mai Chan (USA), Patrice Landry (Switzerland), Pia Leth (Sweden), Ia McIlwaine (United Kingdom), Martin Kunz (Germany), Dorothy McGarry (USA), Max Naudi (France), Marcia Lei Zeng (USA)
35
Three approaches in the development of multilingual thesauri:
1. building a new thesaurus from the bottom up
  • starting with one language and adding another language or languages
  • starting with more than one language simultaneously
2. combining existing thesauri
  • merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
  • linking existing thesauri and subject heading languages to each other; using the existing thesauri and/or subject heading languages both in indexing and retrieval
3. translating a thesaurus into one or more other languages
36
Semantic problems Semantic problems pertain to equivalence relations between terms used as preferred and non-preferred terms in information retrieval languages. Equivalence relations exist not only within each separate language involved, but also between the languages (intra-language equivalence and inter-language equivalence). Intra-language homonymy and inter-language homonymy are also considered semantic questions. Additional problems pertaining to semantics involve the scope, form and choice of thesaurus terms.
37
Structural problems Structural problems involve hierarchical and associative relations between the terms. An important question in this respect is whether the structure should be the same or different for each language. In most if not all cases of linking, the structure will most probably not be the same in all the information retrieval languages involved. In the other approaches mentioned it is possible in principle to apply the same structure to all languages.
38
Contents covered by the guidelines
• Building multilingual thesauri starting from scratch
  • Structure
  • Morphology and Semantics
• Starting from existing thesauri
  • Merging
  • Linking
• Glossary
• Appendix: An example of a non-symmetrical thesaurus
39
Examples are in multiple languages
That cranes is a homograph in English does not necessarily mean that equivalent terms in other languages are also homographs. The Dutch term kranen is a homograph too, but with the meanings cranes (lifting equipment) and taps.
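One way to make the cranes/kranen point concrete is to model concepts separately from their labels in each language and then look for labels shared by more than one concept. The sketch below is hypothetical throughout: the concept identifiers and the function name are invented, and the assumption that the second English meaning of cranes is the bird is not stated on the slide. No Dutch label is given for that concept because the slide does not supply one.

```python
from collections import defaultdict

# Concept identifiers are hypothetical; labels follow the slide where given.
CONCEPTS = {
    "crane_bird": {"en": "cranes"},           # assumed second English meaning
    "crane_lifting": {"en": "cranes", "nl": "kranen"},
    "tap": {"en": "taps", "nl": "kranen"},
}


def homographs(lang, concepts=CONCEPTS):
    """Return labels in `lang` that are shared by more than one concept."""
    by_label = defaultdict(list)
    for concept_id, labels in concepts.items():
        if lang in labels:
            by_label[labels[lang]].append(concept_id)
    return {label: sorted(ids)
            for label, ids in by_label.items() if len(ids) > 1}


print(homographs("en"))
print(homographs("nl"))
```

Both languages have a homograph, but each one groups a different pair of concepts, so a qualifier that disambiguates the English term cannot simply be translated to disambiguate the Dutch one.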
40
World-Wide Review
Invitation to: World-Wide Review of IFLA Guidelines for Multilingual Thesauri
Comments due by July 31, 2005
URL:
Contact me at: