Overview of technological solutions to terminology services
Doug Tudhope
Hypermedia Research Unit, University of Glamorgan
JISC Terminology Workshop, London, February 2004
Presentation
Networked Knowledge Organisation Systems/Services
Broad review of technological approaches
NKOS lifecycle
Introduce workshop demonstrations
Critical issues and possible gaps
References
Taxonomy of Knowledge Organisation Systems (Hodg00)
Term lists
  Authority files, glossaries, gazetteers, dictionaries
Classification and categorization
  Subject headings
  Classification schemes and taxonomies, e.g. DDC, scientific taxonomies
Relationship schemes
  Thesauri
  Semantic networks (e.g. WordNet)
  (Ontologies)
KOS ctd.
Thesauri
  Three standard relationships between concepts (Aitc00): equivalence, hierarchical, associative
  Inherent domain lexicon (lead-in vocabulary)
  Concept definitions and warrant (scope notes)
Ontologies
  Higher-level conceptualisation (McGu02; Noy)
  Formal definition of relationships
  Inference rules and (sometimes) definition of roles
  KOS as an element of ontologies and schemas
Jaco03: Ontologies and the Semantic Web. ASIST Bulletin, April/May 2003, Special Issue on Semantic Web
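To make the three standard relationships concrete, here is a minimal Python sketch of an in-memory thesaurus. The class and method names are hypothetical (not taken from Z39.19, BS 5723 or any product) and the terms are illustrative; the point is that hierarchical (BT/NT) and associative (RT) links are stored reciprocally, and lead-in entry terms resolve to preferred terms.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal in-memory thesaurus holding the three standard relationship types."""

    def __init__(self):
        self.broader = defaultdict(set)   # hierarchical: term -> broader terms (BT)
        self.narrower = defaultdict(set)  # hierarchical inverse: term -> narrower terms (NT)
        self.related = defaultdict(set)   # associative: related terms (RT), kept symmetric
        self.use = {}                     # equivalence: entry term -> preferred term (USE/UF)
        self.scope_notes = {}             # preferred term -> scope note text

    def add_bt(self, term, broader_term):
        # Store both directions so BT and NT can never drift out of step.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_rt(self, a, b):
        # Associative links are reciprocal by definition.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_uf(self, preferred, entry_term):
        # Lead-in vocabulary: a non-preferred entry term points to its preferred term.
        self.use[entry_term] = preferred

    def add_scope_note(self, term, note):
        self.scope_notes[term] = note

    def preferred(self, term):
        """Resolve an entry term to its preferred term (unknown terms are returned unchanged)."""
        return self.use.get(term, term)


th = Thesaurus()
th.add_bt("steam engines", "engines")
th.add_rt("steam engines", "boilers")
th.add_uf("steam engines", "steam motors")
print(th.preferred("steam motors"))    # steam engines
print(sorted(th.narrower["engines"]))  # ['steam engines']
print(sorted(th.related["boilers"]))   # ['steam engines']
```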
Recent sources
NKOS: Networked Knowledge Organization Systems/Services
  NKOS JoDI Special Issue
  MultiTes Conference
  JCDL and ECDL workshops
  SEMKOS IP Proposal Resources
Semantic Web: RDF/XML, RDF Schema, Metalog, OWL
  W3C Semantic Web Activity
  OntoWeb
  SWAD-Europe Thesaurus index
Semantic Grid: Semantic Web, Web service, eScience, GRID links
  W3C Web Services Activity
  Gardner's Intro to Web Services
JISC application area
Search/retrieval for educational purposes (?)
  Students, teachers, possibly researchers
Generalised search, possibly integrated into applications
  Triggered to take account of context (e.g. Brow02)
  Link to eScience applications?
Current operational systems (e.g. RDN) lack terminology services
  Some browsing categories, but not integrated into search
Technologies
Information Science
  Controlled terminology
  Information retrieval (probabilistic, full text)
  Intellectual/automatic indexing
  Search/browse, user interfaces
  Facet analysis
Ontology Engineering (AI knowledge representation)
  Formal (finer-grained) representation, description logics
  Automated reasoning, Semantic Web
Distributed Systems
  Z39.50, Web Services, Semantic Grid
Language Engineering
Social Engineering
Enriching / formalising KOS
KOS legacy: large (multilingual) vocabularies and indexed multimedia (and print) collections
  Products of peer review that follow standards
However
  Not utilised to full potential in some applications
  Designed for human inspection; semantic structure not explicitly represented
  May have evolved inconsistently from various sources
Opportunity to formalise / enrich
  Partly a matter of representation in RDF/XML
  But there may be inconsistencies in logical structure
    --> deconstruction and ontological formalisation
    --> mutually exclusive concept structures
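One small, mechanical part of formalisation is checking a legacy vocabulary for logical inconsistencies before migrating it. The Python sketch below flags two kinds: cycles in the BT hierarchy, and RT links joining a term to one of its own ancestors. The second check encodes an assumed rule (many thesaurus guidelines discourage associative links within a single hierarchy); the function names and data layout are hypothetical.

```python
def ancestors(broader, term):
    """All terms reachable upwards from `term` via BT links (iterative depth-first search)."""
    seen, stack = set(), list(broader.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(broader.get(t, ()))
    return seen


def check_consistency(broader, related):
    """Report structural problems in a thesaurus.

    broader: {term: set of BT terms}; related: {term: set of RT terms}.
    - "cycle": a term is (indirectly) its own broader term
    - "rt_within_hierarchy": an RT link joins a term to one of its ancestors
      (assumed rule for this sketch)
    """
    problems = []
    for term in broader:
        if term in ancestors(broader, term):
            problems.append(("cycle", term))
    for a, others in related.items():
        for b in sorted(others):
            if b in ancestors(broader, a) or a in ancestors(broader, b):
                problems.append(("rt_within_hierarchy", a, b))
    return problems


broader = {"a": {"b"}, "b": {"a"}, "c": {"d"}}
related = {"c": {"d"}}
print(check_consistency(broader, related))
# [('cycle', 'a'), ('cycle', 'b'), ('rt_within_hierarchy', 'c', 'd')]
```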
Facet analysis (a link between technologies)
Fundamental categories / foundational concepts
  e.g. CRG: Entity, Part, Property, Material, Process, Operation, Product, Agent, Space, Time, ...
  Mapped to facets for a particular KOS
Basis of several scientific and industrial KOS
Synthesis rules for principled combination of concepts
  Rules for combining base concepts when indexing/querying
Browsing and searching applications
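Synthesis rules can be illustrated with a tiny Python sketch that builds a compound heading from concepts drawn from different facets. The citation order below simply reuses the category list above, and the two rules (one concept per facet, fixed citation order) are assumptions for illustration, not the CRG's or any particular scheme's actual rules.

```python
# Hypothetical citation order, loosely based on the fundamental categories listed above.
CITATION_ORDER = ["Entity", "Part", "Property", "Material", "Process",
                  "Operation", "Product", "Agent", "Space", "Time"]


def synthesise(components):
    """Combine (facet, concept) pairs into one compound heading.

    Assumed synthesis rules for this sketch: every facet must be a known
    fundamental category, each facet contributes at most one concept, and
    components are cited in CITATION_ORDER regardless of input order.
    """
    by_facet = {}
    for facet, concept in components:
        if facet not in CITATION_ORDER:
            raise ValueError(f"unknown facet: {facet}")
        if facet in by_facet:
            raise ValueError(f"facet used twice: {facet}")
        by_facet[facet] = concept
    return " : ".join(by_facet[f] for f in CITATION_ORDER if f in by_facet)


print(synthesise([("Time", "19th century"), ("Entity", "locomotives"), ("Material", "steel")]))
# locomotives : steel : 19th century
```

Because the order is fixed, an indexer and a searcher who pick the same concepts always produce the same compound, whatever order they chose them in.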
KOS integration into DL services (from Hill02 research agenda)
KOS/DL
  Taxonomy of KOS, with KOS types linked to DL service protocols
  Registries of KOS, and KOS-level metadata to represent them
  Customisable XML/RDF KOS representations
  Core set of relationship types across all KOS
  General KOS service protocol from which protocols for specific types of KOS can be derived
  Robust linking model in which DL entities (collections, objects and services) can refer to KOS entities (concepts, labels and relationships)
  Visualization tools that fully use and display the rich semantics embedded in KOS
=> Move towards a model of search service flow?
  How do semantic search services combine?
Terminology services (from Koch04, Structured overview: activities to advance the powerful use of vocabularies)
Searching for
  Concept schemes in registries
  Concepts/terms in taxonomy servers
Search support for queries
Collection finding
Cross-searching, cross-browsing, mapping services
KOS browsing and user interface/visualisation
Query expansion, disambiguation
Automatic indexing and classification
Extraction/mining of terms
Translation support using vocabularies
Workshop demonstrations
… in the context of the NKOS information lifecycle
NKOS information lifecycle
KOS creation and maintenance
  Mapping, merging vocabularies
Document creation and maintenance
  Indexing, classification, annotation (intellectual, automatic)
Discovery of services and databases/collections
Searching for concepts
  --> controlled terminology, auto-disambiguation
Querying and result display
  Cross-searching, cross-browsing, mapping services
  KOS browsing and user interface/visualisation
  Query expansion
  Extraction/mining of terms
  Translation support using vocabularies
Content integration and mediation
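Auto-disambiguation when searching for concepts can be sketched as choosing among homographs in the lead-in vocabulary. The Python below uses a simplified overlap heuristic (count shared cue words between the user's other query words and each sense), swapped in purely for illustration; the data layout, cue-word sets and function name are hypothetical.

```python
def disambiguate(entry_vocabulary, user_input, context_words):
    """Choose among homographs by overlap between the user's other words and each sense's cues.

    entry_vocabulary: {entry term: [{"concept": preferred term, "context": set of cue words}]}
    Returns the chosen preferred term, or None if the input is not in the lead-in vocabulary.
    """
    senses = entry_vocabulary.get(user_input.lower(), [])
    if not senses:
        return None
    context = {w.lower() for w in context_words}
    # max() keeps the first sense on ties, so list the most common sense first.
    return max(senses, key=lambda s: len(context & s["context"]))["concept"]


entry_vocabulary = {
    "mercury": [
        {"concept": "Mercury (planet)", "context": {"planet", "orbit", "solar"}},
        {"concept": "Mercury (element)", "context": {"metal", "toxic", "thermometer"}},
    ],
}
print(disambiguate(entry_vocabulary, "Mercury", ["broken", "thermometer"]))  # Mercury (element)
```

In a real service the cue words might come from scope notes, broader terms or corpus statistics; the heuristic itself is only a placeholder.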
High Level Thesaurus (HILT) - Information Science
Pilot terminology service
  HILT team, Wordmap software, OCLC
  Discovery of collections
  Cross-searching JISC collections
  Mapping from terminologies to a DDC spine: DDC, LCSH, UNESCO, MeSH, AAT
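The advantage of a spine is that each vocabulary needs only one mapping table (to the spine), rather than up to n(n-1)/2 pairwise tables for n vocabularies. The Python sketch below shows the idea: invert the per-vocabulary mappings into a spine index, then translate a term by following its spine classes. The data structures, function names and the placeholder classes "spine:A"/"spine:B" are hypothetical and are not HILT's actual mappings or DDC class numbers.

```python
from collections import defaultdict


def build_spine_index(mappings):
    """Invert {vocabulary: {term: set of spine classes}} into {spine class: {(vocabulary, term)}}."""
    index = defaultdict(set)
    for vocab, terms in mappings.items():
        for term, classes in terms.items():
            for cls in classes:
                index[cls].add((vocab, term))
    return index


def crosswalk(mappings, index, source_vocab, term, target_vocab):
    """Map a term between two vocabularies through the spine classes they share."""
    result = set()
    for cls in mappings[source_vocab].get(term, ()):
        result |= {t for v, t in index[cls] if v == target_vocab}
    return result


# Illustrative mappings only; "spine:A" and "spine:B" stand in for real DDC class numbers.
mappings = {
    "LCSH": {"Steam-engines": {"spine:A"}},
    "UNESCO": {"Steam engines": {"spine:A"}, "Engines": {"spine:B"}},
}
index = build_spine_index(mappings)
print(crosswalk(mappings, index, "LCSH", "Steam-engines", "UNESCO"))  # {'Steam engines'}
```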
geoXwalk - Geographic Information Science
Geo-spatial gazetteer service
  EDINA, Data Archive, CIE
  Feature (concept) searching
  Geographic searching, spatial operators
  Spatial result visualisation, flexible footprint
  Geoparser: automated geographic indexing
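Gazetteer footprints and spatial operators can be sketched with simple bounding boxes. The Python below is a hedged illustration, not geoXwalk's interface: the class, the two operators ("within", "overlaps") and the approximate coordinates are assumptions, and real footprints are often polygons rather than rectangles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Axis-aligned bounding box in decimal degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def overlaps(self, other):
        # Boxes overlap unless one lies entirely to one side of the other.
        return not (self.max_lon < other.min_lon or other.max_lon < self.min_lon or
                    self.max_lat < other.min_lat or other.max_lat < self.min_lat)

    def within(self, other):
        return (other.min_lon <= self.min_lon and self.max_lon <= other.max_lon and
                other.min_lat <= self.min_lat and self.max_lat <= other.max_lat)


# Approximate, illustrative footprints (not taken from any real gazetteer).
GAZETTEER = {
    "Pontypridd": Footprint(-3.37, 51.59, -3.31, 51.62),
    "Cardiff": Footprint(-3.28, 51.44, -3.10, 51.55),
    "London": Footprint(-0.51, 51.28, 0.33, 51.69),
}


def search(query_area, operator="within"):
    """Return names of gazetteer features satisfying the spatial operator against query_area."""
    test = {"within": Footprint.within, "overlaps": Footprint.overlaps}[operator]
    return sorted(name for name, fp in GAZETTEER.items() if test(fp, query_area))


wales = Footprint(-5.35, 51.35, -2.65, 53.45)
print(search(wales))  # ['Cardiff', 'Pontypridd']
```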
Renardus - Information Science
Cross-browsing service
  NetLab, UKOLN, ILRT, SUB, …
  Classification mapping via DDC
  Cross-searching EU subject gateways (multilingual)
  User interface for browsing in large classifications
Learning and Teaching Portal & SSL - Information Science
Systems Simulations Ltd, Index+
  Learning and Teaching Support Network
  Web-based thesaurus service
  Vocabulary management: "Suggest a Term"
  Data entry, browse and search
CIE Health Demonstrator - Information Science, facet analysis
Adiuri Systems Ltd (from IDEA Project)
  Waypoint health information search demonstrator
  Faceted, multi-concept query via browsing
  Non-zero match, with postings displayed
  Faceted browsing user interface
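"Non-zero match, with postings displayed" can be read as: while the user builds a multi-concept query by browsing, only offer refinements that still return documents, and show how many. The Python sketch below computes those counts per facet from a toy concept index; the facets, concepts, documents and function name are all hypothetical, not Waypoint's data or implementation.

```python
def refine_options(index, selected, facets):
    """Postings counts for each candidate concept that keeps the AND-query non-empty.

    index: {doc_id: set of concepts}; selected: set of concepts already in the query;
    facets: {facet name: list of candidate concepts}.
    """
    matching = [concepts for concepts in index.values() if selected <= concepts]
    options = {}
    for facet, candidates in facets.items():
        counts = {c: sum(1 for concepts in matching if c in concepts) for c in candidates}
        # Non-zero match: drop refinements that would return nothing.
        options[facet] = {c: n for c, n in counts.items() if n > 0}
    return options


index = {
    1: {"asthma", "children", "treatment"},
    2: {"asthma", "adults", "treatment"},
    3: {"asthma", "children", "diagnosis"},
    4: {"diabetes", "children", "treatment"},
}
facets = {"Population": ["children", "adults", "elderly"], "Intervention": ["treatment", "diagnosis"]}
print(refine_options(index, {"asthma"}, facets))
# {'Population': {'children': 2, 'adults': 1}, 'Intervention': {'treatment': 2, 'diagnosis': 1}}
```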
COHSE Conceptual Open Hypermedia - Ontology, description logic, hypertext navigation
Link navigation using ontologies
  Manchester and Southampton Universities
  Open Hypermedia System (Soton DLS)
  Open-source downloadable tools for ontology and annotation services, e.g. OilEd, a lightweight ontology editor for DAML+OIL
OpenGALEN - Ontology, GRAIL logic, facet analysis
OpenGALEN Common Reference Model: medical coding and classification systems
  Manchester University
  Faceted; compositional rather than traditional enumerative medical codes
  Multilingual
  GALEN-in-use Project
  OpenKnoME, GALEN Case Env toolsets
Co-ODE: Collaborative Open Ontology Development Environment - Ontology management
Manchester University, new project
  Develop ontology management tools as plugins for Protégé (Stanford)
  Building on earlier experience with OilEd
  Concern with usability
FACET: faceted knowledge organisation for semantic retrieval - Information Science, facet analysis
University of Glamorgan, Science Museum
  Faceted, multi-concept best-match search
  Semantic expansion as a browsing service
  Faceted thesaurus search interface
  Standalone and Web demonstrators
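Semantic expansion with best-match ranking can be sketched as follows: expand each query concept through thesaurus links, with a weight that shrinks at every hop, then score documents by their closest matching concepts instead of requiring exact matches. The per-relationship weights, threshold and function names below are assumptions for illustration; FACET's actual semantic distance measure may differ.

```python
import heapq

# Hypothetical per-relationship weights (all < 1, so weights shrink with every hop).
WEIGHTS = {"BT": 0.5, "NT": 0.7, "RT": 0.4}


def expand(graph, concept, threshold=0.3):
    """Return {concept: weight} for concepts reachable from `concept`.

    graph: {concept: [(relationship, target concept)]}. A path's weight is the
    product of its link weights; each concept keeps its best (highest) weight.
    """
    best = {concept: 1.0}
    heap = [(-1.0, concept)]  # max-heap via negated weights
    while heap:
        neg_w, c = heapq.heappop(heap)
        w = -neg_w
        if w < best.get(c, 0.0):
            continue  # stale entry: a better path was already found
        for rel, target in graph.get(c, ()):
            nw = w * WEIGHTS[rel]
            if nw >= threshold and nw > best.get(target, 0.0):
                best[target] = nw
                heapq.heappush(heap, (-nw, target))
    return best


def best_match(docs, graph, query_concepts):
    """Rank documents by the summed best expansion weight for each query concept."""
    expansions = [expand(graph, q) for q in query_concepts]
    scores = {}
    for doc_id, concepts in docs.items():
        score = sum(max((e.get(c, 0.0) for c in concepts), default=0.0) for e in expansions)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: -kv[1])


graph = {
    "steam engines": [("BT", "engines"), ("RT", "boilers")],
    "engines": [("NT", "steam engines"), ("NT", "diesel engines")],
}
docs = {"d1": {"diesel engines"}, "d2": {"steam engines", "boilers"}, "d3": {"textiles"}}
print(best_match(docs, graph, ["steam engines"]))
```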
E-Biosci: EC platform for e-publishing and information integration in the Life Sciences - Information Science
European Molecular Biology Organisation, Collexis B.V.
  Technology: semantic matching, conceptual fingerprints
  Links genomic data and life sciences research literature
  Multilingual
  Integrated search: full text / data / researchers
  Peer-reviewed, different publishing models
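A conceptual fingerprint can be pictured as a weighted list of concepts describing a document, query or researcher. The Python sketch below compares two fingerprints with cosine similarity. This is a generic measure swapped in for illustration; Collexis's own matching method is not described here, and the concepts and weights are made up.

```python
import math


def cosine(fp_a, fp_b):
    """Cosine similarity between two fingerprints given as {concept: weight}."""
    dot = sum(w * fp_b[c] for c, w in fp_a.items() if c in fp_b)
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = {"apoptosis": 0.8, "p53": 0.6}
paper = {"p53": 0.9, "tumour suppression": 0.4}
print(round(cosine(query, paper), 3))  # 0.548
```

Because fingerprints are keyed by concepts rather than words, the same comparison would in principle work across languages once text is mapped to concept identifiers.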
SKOS: Simple Knowledge Organisation for the Semantic Web - Information Science, Ontology
CCLRC, SWAD-Europe project
  Migrate existing KOS to the Semantic Web via a common RDF schema for thesauri and for inter-thesaurus mapping (formal OWL specification planned)
  Use cases for thesaurus services
  Lightweight RDF service demonstrators using the Jena RDF API toolkit
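As a hedged illustration of migrating a thesaurus record to RDF, the Python sketch below emits N-Triples using the SKOS namespace and properties as published by W3C (skos:Concept, prefLabel, altLabel, broader, related, scopeNote). It uses only the standard library rather than Jena or any RDF toolkit. The base URI and the simple input record format are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace for concept URIs


def uri(term):
    return "<" + BASE + term.replace(" ", "_") + ">"


def literal(text, lang="en"):
    # Escape backslashes and quotes as N-Triples requires.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(concepts):
    """concepts: {preferred term: {"alt": [...], "broader": [...], "related": [...], "note": str}}"""
    lines = []
    for pref, rec in concepts.items():
        s = uri(pref)
        lines.append(f"{s} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"{s} <{SKOS}prefLabel> {literal(pref)} .")
        for alt in rec.get("alt", []):      # UF entry terms
            lines.append(f"{s} <{SKOS}altLabel> {literal(alt)} .")
        for bt in rec.get("broader", []):   # BT links
            lines.append(f"{s} <{SKOS}broader> {uri(bt)} .")
        for rt in rec.get("related", []):   # RT links
            lines.append(f"{s} <{SKOS}related> {uri(rt)} .")
        if "note" in rec:                   # scope note
            lines.append(f"{s} <{SKOS}scopeNote> {literal(rec['note'])} .")
    return "\n".join(lines)


out = to_ntriples({"steam engines": {"alt": ["steam motors"], "broader": ["engines"], "related": ["boilers"]}})
print(out)
```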
Some critical issues
Standards
User interface
Gaps?
Critical issues (1): Standards
Ongoing initiatives to revise thesaurus standards
  ANSI/NISO Z39.19
  BS 5723 and BS 8723 (Dext03): BSI public draft soon, extended scope, interoperability
Thesaurus representations
  RDF (SWAD03); Topic Maps (Ligh03); various XML
  Possibilities to extend current relationships by specialisation, enriching standards while maintaining compatibility
KOS service protocols (Bind04)
  Service-oriented approach with composite service provision, not based on atomic elements of data structures and relationships
  Expansion service provision
NKOS Registry (Vizi01); MEG Registry Project
Cost/benefit issues
Thesaurus: a long-lived, pragmatic and useful tool
  Cost-effective granularity of relationships for some search applications
  Domain lexicon (UF/ALTs, scope notes)
Cost/benefit issues in KOS formalisation
  Application-dependent level of precision in concept use
  Some applications use concepts very precisely (medical?)
  Others may vary in how concepts are applied (humanities?)
  Indexer-searcher variation
  Results based on probable relevance judgements
Critical issues (2): User interface
The user interface is critical, given the demands of controlled terminology
  Offer different options
  Move beyond the minimal assumptions current web search engines make about users, query structure and collections
Link with service protocol issues
  What kinds of interface are easily afforded
Accessibility issues
Critical issues (3): Gaps?
Language Engineering
  Related standards (Shre03)
  POS tagging tools
  Large statistical corpora --> source of context data for disambiguation, annotation, proactive search
  JISC-specific corpora?
  Collect portal use data --> taxonomies, synonyms
  Time-varying synonyms (BBCi04)
Probabilistic IR
  Term frequency information, automatic weighting
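To show what term frequency information and automatic weighting look like in practice, here is a Python sketch of Okapi BM25, one widely used probabilistic weighting scheme. It is swapped in as an example; the slide does not name a particular model. k1 = 1.2 and b = 0.75 are common default values, the idf uses the non-negative "+1" variant, and the toy documents are made up.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for a bag-of-words query.

    docs: {doc_id: list of tokens}; query: list of terms.
    """
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    df = Counter(t for toks in docs.values() for t in set(toks))  # document frequency
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturating term frequency, normalised by document length.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores[doc_id] = score
    return scores


docs = {
    "d1": "steam engine boiler steam".split(),
    "d2": "diesel engine".split(),
    "d3": "textile mill".split(),
}
scores = bm25_scores(docs, ["steam", "engine"])
print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd2', 'd3']
```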
Social Engineering?
What do users really want?
Problems of introducing new technologies
  Sometimes a matter of both reflecting and shaping user needs
  Done implicitly by successful projects, but there is also an extant literature on the sociology/philosophy of innovation
Lessons from participatory design and rapid application development (Tudh00)
  Evolving network: prototypes, user expectations, requirements and working practices
  Lead / ambassador users: training, tailoring and advocacy / motivation
Contact information
Doug Tudhope
School of Computing
University of Glamorgan
Pontypridd CF37 1DL
Wales, UK
References
Aitchison J., Gilchrist A., Bawden D. Thesaurus construction and use: a practical manual (4th edition). London: ASLIB.
BBCi. A day in the life of BBCi search.
Binding C., Tudhope D. KOS at your service: programmatic access to Knowledge Organisation Systems. JoDI 4(4).
Brown P. From information retrieval to hypertext linking. New Review of Hypermedia and Multimedia, 8.
Dextre Clarke S. BS 8723: a new British Standard for structured vocabularies.
Hill et al. Integration of Knowledge Organization Systems into Digital Library Architectures. ASIST SigCR.
Hodge G. Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. CLIR Pub91, April.
Jacob E. Ontologies and the Semantic Web. ASIST Bulletin, April/May 2003, Special Issue on Semantic Web.
Koch T. Activities to advance the powerful use of vocabularies in the digital environment: structured overview.
Light R. XML (and Topic Maps).
McGuinness D. Ontologies come of age. In: Fensel et al. (eds.) Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press.
MultiTes Conference on Thesauri and Taxonomies.
NKOS: Networked Knowledge Organization Systems/Services.
NKOS Workshop, ECDL.
NKOS. New Applications of Knowledge Organization Systems. NKOS Special Issue, JoDI.
Noy N., McGuinness D. Ontology Development 101: A Guide to Creating Your First Ontology.
Shreve G. Terminology Standards. workshop%20Folder/Shreve.ppt
Soergel D. The representation of Knowledge Organization Structure (KOS) data: a multiplicity of standards.
SWAD-Europe Thesaurus Activity.
Tudhope D., Beynon-Davies P., Mackay H. Prototyping praxis: constructing computer systems and building belief. Human-Computer Interaction, 15(4).
Vizine-Goetz D. NKOS Registry: draft proposal for KOS-level metadata.