Published by Ethan Willis. Modified over 10 years ago.
1
ISO 25964: the new standard for thesauri and interoperability with other vocabularies
Stella G Dextre Clarke, Project Leader, ISO NP 25964
2
Overview
What is ISO 25964?
Outline of Part 1
Outline of Part 2
More detail on some of the issues dealt with in the standard
Comment on the need for a standard
3
What is ISO 25964?
ISO 25964: Thesauri and interoperability with other vocabularies
  Part 1: Thesauri for information retrieval
  Part 2: Interoperability with other vocabularies
It updates ISO 2788 and ISO 5964, with some input from BS 8723
Information retrieval (indexing/searching) is the overall context
Part 1 covers monolingual and multilingual thesauri (= ISO 2788 + ISO 5964)
Part 2 covers mapping between thesauri and other types of vocabulary
4
What distinguishes ISO 25964-1 from ISO 2788/5964?
Clearer differentiation between terms and concepts
Clearer guidance on applying facet analysis to thesauri
Some changes to the ‘rules’ for compound terms
More guidance on managing thesaurus development and maintenance
Requirements for software to manage thesauri
Data model and XML schema for data exchange
General overhaul in all areas, e.g. sweeping update of multilingual examples
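The distinction between terms and concepts can be made concrete with a small sketch. This is not the ISO 25964 data model or its XML schema; the class names, field names, identifier and sample labels below are all hypothetical and chosen only to show a concept carrying one preferred term per language plus any number of non-preferred entry terms.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A label in one language (hypothetical structure, not the ISO schema)."""
    text: str
    lang: str


@dataclass
class Concept:
    """A unit of thought, identified independently of the words that label it."""
    concept_id: str
    preferred: dict[str, Term] = field(default_factory=dict)   # one per language
    non_preferred: list[Term] = field(default_factory=list)    # entry terms

    def add_preferred(self, term: Term) -> None:
        # Assumption for this sketch: at most one preferred term per language.
        if term.lang in self.preferred:
            raise ValueError(
                f"concept {self.concept_id} already has a preferred term in {term.lang!r}"
            )
        self.preferred[term.lang] = term

    def labels(self, lang: str) -> list[str]:
        """All labels in one language, preferred term first."""
        out = [self.preferred[lang].text] if lang in self.preferred else []
        out += [t.text for t in self.non_preferred if t.lang == lang]
        return out


# Illustrative data only.
aubergine = Concept("c001")
aubergine.add_preferred(Term("aubergines", "en"))
aubergine.add_preferred(Term("aubergines", "fr"))
aubergine.non_preferred.append(Term("egg-plants", "en"))
print(aubergine.labels("en"))
```

Keeping the identifier on the concept rather than on any term means relationships and mappings can point at the concept, so a change of preferred term does not break them.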
6
Is there a need for ISO 25964-1?
“The thesaurus is dead. Long live Google!”
But look how many thesauri we see today – alive and growing
“Nobody has time to do indexing nowadays”
Did anyone ever follow ISO 2788 rigorously? Look at the lack of standardization in today’s thesauri.
The ideal thesaurus responds to the special needs of its own users.
Consider the demand for networked applications which draw upon multiple heterogeneous resources
Consider the diversity and evolution of languages/terminology in today’s full text
Don’t forget the challenge of searching for images without text
Successful automated networking depends on standards, or at least predictability in the tools and resources
ISO 25964-1 compliance should enhance predictability in search tools
And ISO 25964-2?
7
Content of ISO 25964-2 “Interoperability with other vocabularies”
No normative statements about building vocabularies other than thesauri
However, comparisons are made and key features described.
Emphasis is on interoperability, especially mapping between different vocabularies
  Structural models for mapping
  Recommended mapping types
  How to handle pre-coordination
  Practical aspects of mapping
8
Which “other vocabularies”?
Classification schemes
Business classification schemes for records management (aka file plans)
Taxonomies
Subject heading schemes
Ontologies
Terminologies/Term banks
Name authority lists
Synonym rings
9
Structural models for mapping across vocabularies
[Figure: diagram with elements labelled H, E, G, P, Q, R, S]
10
The dangers of chain mapping
buses → coaches
coaches → trainers
trainers → training shoes

job vacancies → jobs
jobs → posts
posts → post
post → mail

timber → wood
wood → woods
woods → forests

firewood → logs
logs → records
records → archives

Any one of the mappings could be OK in one context, but not when chained.
Most howlers can be avoided, but only if you check carefully
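A minimal sketch of why chaining goes wrong: if software blindly follows one mapping after another, each hop looks plausible but the end-to-end pair does not. The function name `follow_chain` and the single flat dictionary are assumptions made for illustration; in practice each hop would be a mapping between a different pair of vocabularies.

```python
def follow_chain(mappings: dict[str, str], start: str) -> list[str]:
    """Follow single-step mappings from `start` until none is left."""
    path = [start]
    seen = {start}
    while path[-1] in mappings:
        nxt = mappings[path[-1]]
        if nxt in seen:  # guard against cycles
            break
        path.append(nxt)
        seen.add(nxt)
    return path


# One of the chains from the slide, flattened into a single lookup table.
step_mappings = {
    "job vacancies": "jobs",
    "jobs": "posts",
    "posts": "post",
    "post": "mail",
}

chain = follow_chain(step_mappings, "job vacancies")
print(" → ".join(chain))
print(f"derived mapping to check by hand: {chain[0]} → {chain[-1]}")
```

Printing the derived end-to-end pair is the point of the sketch: it is a candidate for human review, not a mapping to accept automatically.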
11
The dangers of two-way mappings
[Figure: three vocabularies (Vocabulary 1, Vocabulary 2, Vocabulary 3) containing the terms Poultry, Chickens, Ducks, Geese, Birds, Parrots, Canaries, Budgies]
12
ISO 25964-2 mapping types
Basic mapping types:
  Equivalence
  Hierarchical
  Associative
Equivalence mappings can also be marked as “Exact” or “Inexact”
13
ISO 25964-2 mapping types with examples
Basic mapping types:
  Equivalence
    Laptop computers EQ Notebook computers
  Hierarchical
    Roads NM Streets; Streets BM Roads
  Associative
    e-Learning RM Distance education
“Exact” or “Inexact” equivalence
  Aubergines =EQ Egg-plants
  Horticulture ~EQ Gardening
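The mapping statements on this slide follow a simple "source TAG target" shape, which can be sketched as a small parser. The tags are exactly those that appear in the examples (EQ, =EQ, ~EQ, NM, BM, RM); the `Mapping` tuple and the function names are hypothetical, and only the NM/BM inversion is taken from the slide's own pair "Roads NM Streets; Streets BM Roads".

```python
import re
from typing import NamedTuple


class Mapping(NamedTuple):
    source: str
    kind: str    # one of the tags used in the slide examples
    target: str


# Longer tags come first so that "=EQ" and "~EQ" win over plain "EQ".
TAGS = ["=EQ", "~EQ", "EQ", "NM", "BM", "RM"]
_PATTERN = re.compile(
    r"^(.+?)\s+(" + "|".join(re.escape(t) for t in TAGS) + r")\s+(.+)$"
)


def parse_mapping(statement: str) -> Mapping:
    """Split a statement such as 'Roads NM Streets' into its three parts."""
    match = _PATTERN.match(statement.strip())
    if match is None:
        raise ValueError(f"not a recognised mapping statement: {statement!r}")
    return Mapping(match.group(1), match.group(2), match.group(3))


def invert_hierarchical(mapping: Mapping) -> Mapping:
    """Turn an NM mapping into the corresponding BM mapping, and vice versa."""
    swap = {"NM": "BM", "BM": "NM"}
    if mapping.kind not in swap:
        raise ValueError(f"{mapping.kind} is not a hierarchical tag")
    return Mapping(mapping.target, swap[mapping.kind], mapping.source)


roads = parse_mapping("Roads NM Streets")
print(roads)
print(invert_hierarchical(roads))
print(parse_mapping("Aubergines =EQ Egg-plants"))
```

The sketch deliberately offers no automatic inverse for equivalence or associative mappings; whether a reverse mapping holds is a judgement to make per pair.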
14
Subdivisions of ISO 25964-2 mapping types
Basic mapping types:
  Equivalence
    Simple
    Compound
      Intersecting compound equivalence
      Cumulative compound equivalence
  Hierarchical
    Broader
    Narrower
  Associative
“Exact” or “Inexact” applies to simple but not compound equivalence
15
Equivalence subdivisions with examples
Simple
  Laptop computers EQ Notebook computers
Compound
  Intersecting compound equivalence
    Women executives EQ Women + Executives
  Cumulative compound equivalence
    Inland waterways EQ rivers | canals
16
Intersecting versus cumulative equivalence
Women executives EQ Women + Executives
Inland waterways EQ rivers | canals
[Figure: two diagrams, one relating "women executives" to "women" and "executives", the other relating "inland waterways" to "rivers" and "canals"]
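At search time the two kinds of compound equivalence behave like different set operations: "+" narrows to items indexed with all the target concepts, while "|" widens to items indexed with any of them. The sketch below assumes a toy index (`postings`) with made-up document identifiers; the function name and data are hypothetical.

```python
def expand_compound(
    operator: str, postings: dict[str, set[str]], targets: list[str]
) -> set[str]:
    """Resolve a compound equivalence against a toy index.

    "+" (intersecting): items indexed with every target concept.
    "|" (cumulative):   items indexed with at least one target concept.
    """
    sets = [postings.get(t, set()) for t in targets]
    if not sets:
        return set()
    if operator == "+":
        return set.intersection(*sets)
    if operator == "|":
        return set.union(*sets)
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical postings: concept label -> ids of documents indexed with it.
postings = {
    "women": {"d1", "d2", "d3"},
    "executives": {"d2", "d3", "d4"},
    "rivers": {"d5", "d6"},
    "canals": {"d6", "d7"},
}

women_executives = expand_compound("+", postings, ["women", "executives"])
inland_waterways = expand_compound("|", postings, ["rivers", "canals"])
print(sorted(women_executives))
print(sorted(inland_waterways))
```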
17
Pre-coordination adds complexity
If only we could ignore classification schemes and subject heading schemes!
For example:
  The UDC class :51 (mathematics curriculum in primary schools)
  The LCSH heading Automobiles--Air conditioning--Maintenance and repair--Periodicals
18
Example: “academic library labor unions in Germany” (from Marcia Lei Zeng/FRSAD report)
DDC: " “
   – labor unions in industries and occupations other than extractive, manufacturing, construction
   – academic libraries
  -0943 – Germany
LCSH:
  "Library employees--Labor unions--Germany"
  "Universities and colleges--Employees--Labor unions--Germany"
  "Collective bargaining--Academic librarians--Germany"
  "Libraries and labor unions--Germany"
UNESCO Thesaurus: “Trade unions” “Academic libraries” “Germany”
ILO Thesaurus: “Trade union” “library” “educational institution” “Germany”
19
How to map to and from pre-coordinated classes and synthesized notations?
For vocabularies using post-coordination (esp. thesauri), mappings between them look feasible
Mapping from a pre-coordinated or synthesized class to a thesaurus looks feasible. Mapping to a pre-coordinated class looks more problematic!
The same applies to mapping from a synthesized class in one scheme to a differently synthesized class in another scheme
Comparing subject headings with classification schemes, pre-coordination works in slightly different ways. Can we find common solutions?
In any case, should the aim be to map between schemes, or between the indexes of collections indexed/catalogued with the schemes?
20
In the real world, mapping perfection is elusive…
Mapping projects are labour intensive, and often under-resourced
Exact equivalence is all too rare
Even when exact equivalence seems likely, it is often hard to be sure
Some managers assume that mappings can be found by computers without human guidance
Often the vocabularies to be mapped are poorly constructed
Compound equivalence is needed commonly, but often unavailable
Inclusion of pre-coordinate schemes makes it much harder
Some systems allow only one mapping per concept
While preparing mappings, you can’t make assumptions about capabilities of the search software
21
Is there a need for ISO 25964-2?
Consider the demand for networked applications which draw upon multiple heterogeneous resources
Finding equivalent concepts cannot rely on comparison of text words alone
Bear in mind the challenges listed above
Practical experience of mapping is not widespread
ISO 25964-2 provides guidance on good practice, mostly on the intellectual processes but also on the potential for automation
22
Want a copy of ISO 25964-2?
A draft is due to appear in early 2011, “ISO DIS 25964-2”, with the hope of attracting comments from potential users
The official way to get it is through your national standards body (e.g. BSI, DIN)
Distribution policies vary from one country to another; last time round we found a way to make the draft available online free of charge and free of passwords, on the BSI site.
Send me an email and I’ll alert you when the DIS is released.
23
Want to get involved?
Contact your national standards body, specifically the committee corresponding to ISO TC 46/SC 9/WG8
17 countries already participate: Belgium, Bulgaria, Canada, China, Denmark, France, Germany, Finland, Korea, New Zealand, Russia, South Africa, Spain, Sweden, UK, Ukraine, USA
While Part 1 of the standard will be published in 2011, Part 2 is still in draft. There is time for you to contribute ideas on interoperability!