Follow the Fox to Renardus: an Academic Subject Gateway Service for Europe
Cross-browsing and Cross-searching in a Distributed Network of Subject Gateways: Architecture, Data Model and Classification

Dr. Heike Neuroth & Traugott Koch
State Library of Lower Saxony and the University Library of Göttingen, Germany
NetLab, Lund University Library Development Department, Sweden

ELAG 2001, Prague 6-8 June 2001 Neuroth & Koch

Content
- Renardus (aim, partners, etc.)
- Subject Gateway (definition, elements)
- Renardus Application Profile (working steps, metadata core set, data model, etc.)
- Renardus Collection Level Description
- Renardus Technical Approach
- DDC Mapping for Cross-Browsing (methods, mapping relationships, etc.)
- Outlook

What is Renardus?
- EU-funded project:
  - EC: 1.7 million EUR, including non costs: 2.3 million EUR
  - 1 January June 2002
  - under “Information Society Technologies” (IST), “Promoting a User-friendly Information Society”, a major theme of the European Union's 5th Framework Programme
- Partners drawn from 7 countries:
  - Project Management: National Library Den Haag (NL)
  - Denmark, Finland, Sweden, France, United Kingdom, Germany

Objectives
- to provide access to distributed quality-controlled subject gateways (high-quality metadata collections) across Europe via a single interface:
  - cross-search
  - cross-browse
- and to develop and define:
  - metadata solutions
    - Renardus Application Profile, Renardus Namespaces, Renardus Collection Level Description
  - technical solutions
  - organizational/business models

Member Subject Gateways
- DAINet: German Agricultural Information System Document Server
- DEPOSIT: Deposit of German Online Dissertations
- DutchESS: Dutch Electronic Subject Service
- EELS: Engineering Electronic Library, Sweden
- FVL: The Finnish Virtual Library
- NOVAGate: Libraries of Nordic Agricultural & Veterinary Univ.
- SSG-FI: MathGuide, Geo-Guide, History Guide, Anglistik Guide
- RDN hubs: Resource Discovery Network (EEVL, SOSIG, OMNI, ...)
- Danish Electronic Research Library (future partner)
- Les Signets: Collection of Internet Resources (future partner)

Subject Gateway
“Quality-controlled subject gateways are Internet services which apply a rich set of quality measures to support systematic resource discovery. Considerable manual effort is used to secure a selection of resources which meet quality criteria and to display a rich description of these resources with standards-based metadata. Regular checking and updating ensure good collection management. A main goal is to provide a high quality of subject access through indexing resources using controlled vocabularies and by offering a deep classification structure for advanced searching and browsing.”

Subject Gateway cont.
- Elements:
  - creation: manual/intellectual, experts, etc.
  - selection and collection development: policy, selection criteria, etc.
  - collection management: maintenance of collection, etc.
  - resource description/metadata: rich set of metadata, formalized content description, etc.
  - subject classification/subject access: controlled vocabularies, etc.
  - standards: allow interoperability, etc.
  - value-adding features: display, usage features, etc.

Working Steps - General
- selection of necessary/meaningful elements:
  - for a service like Renardus: “Meta-Subject Gateway”, European service (multilingual access, search, browse)
  - for search, filter, sort, and display options
  - for browse, subject access
- selection of common metadata format (exchange format):
  - Dublin Core Metadata Element Set v1.1
  - Dublin Core Qualifiers
  - others
  - home-grown

Working Steps - Analysis
- first survey of partners' metadata formats and detailed description of each subject gateway
  - GENERAL
    - name of SG, acronym, responsible organization, source of funding, time for record creation, general description, etc.
  - COLLECTION/SELECTION
    - target user group, common primary language of target audience, collection scope, geographical and language coverage, selection criteria, granularity, resource types, resource formats, etc.
  - CONTENT - METADATA
    - metadata scheme, metadata set, crosswalks, interoperability, cataloging rules, authority files, etc.

Working Steps - Analysis cont.
  - CONTENT - OTHERS
    - metadata browsable, searchable
    - language(s) of descriptions, thesauri, interface, translation support, keywords, classification systems, etc.
  - INDEX TYPE/TECHNICAL NOTES
    - search engine, indexing system, structure of data storage, etc.
  - INTELLECTUAL PROPERTY RIGHTS (IPR)
    - copyright, branding
  - VARIOUS
    - (quality) control, link checking, record checking/update, etc.
    - backlinks of the gateway, statistical analysis of log files, etc.
  - etc.

First Results
definition of 8 metadata elements without detailed semantics, syntax based on Dublin Core:
- DC.Title
- DC.Creator
- DC.Description
- DC.Subject
- DC.Identifier
- DC.Language
- DC.Type
- Country

Renardus Data Model
detailed investigations of each element about:
- semantics and syntax of each element
- qualifiers (refinements, encoding schemes)
- cataloging rules (creator, description, keywords)
- namespace
- repeatability of each element
- form of obligation (mandatory, strongly recommended, optional)
- language qualifier (for title, description, subject)
and:
- administrative elements
- future elements (rights, publisher), additional elements (format, etc.)
- common browsing structure via classification system (home-grown, reuse of an existing system, which one)

Renardus Data Model cont.
[Figure: Renardus Data Model]

Renardus Application Profile
- Renardus Application Profile based on four namespaces, to be encoded in RDF/XML:
  - Dublin Core Namespace: [DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description
  - Dublin Core Qualifiers Namespace: [DCMES Qualifiers] Dublin Core Qualifiers
  - Renardus Namespace: [RMES version 0.1] Renardus Metadata Element Set
  - Renardus Namespace Qualifiers: [RMES Qualifiers version 0.1] Renardus Metadata Element Set

Renardus AP cont.
“content metadata”
Title and Title.Alternative
  title: DCMES: mandatory, not repeatable, language tag
  title.alternative: DCMES Qualifiers: optional, repeatable, language tag
Creator
  DCMES: strongly recommended, repeatable
  RMES Qualifiers (LastName, FirstName): strongly recommended, repeatable
Description
  DCMES: mandatory in text version, repeatable, language tag

Renardus AP cont.
Subject
  DCMES: mandatory, repeatable, language tag
  DCMES Qualifiers: strongly recommended, repeatable, language tag
  RMES Qualifiers (all other encoding schemes): mandatory, repeatable, language tag
  RMES Qualifiers (Ren-DDC): mandatory, repeatable
Identifier
  DCMES Qualifiers: mandatory, repeatable (probably in the pilot system)
  RMES Qualifiers: “Operational System” with qualifiers “Archive”, “Mirror”, ...
Language
  DCMES Qualifiers: strongly recommended, repeatable

Renardus AP cont.
Type
  DCMES: strongly recommended, repeatable
  DCMES Qualifiers (DCT1): strongly recommended, repeatable
  DCMES Qualifiers (DCT2): “Operational System”
Country
  RMES Qualifiers: strongly recommended, not repeatable
“administrative metadata”
Full Record URL
  RMES Qualifiers: strongly recommended, not repeatable
SBIG ID
  RMES Qualifiers: mandatory, not repeatable
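
The obligation and repeatability rules of the application profile lend themselves to a mechanical check. The following is a minimal sketch, not the project's normalizing toolkit: it assumes a record is a plain dictionary mapping element names to lists of values, and the key names (for example `fullRecordURL` and `sbigID`) are hypothetical. For brevity it collapses each element to one rule and ignores the qualifier level and the "mandatory in text version" condition on Description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    obligation: str   # "mandatory", "strongly recommended", or "optional"
    repeatable: bool


# One simplified rule per element, taken from the AP slides.
# Key names are hypothetical; qualifier-level rules are not modelled.
PROFILE = {
    "title": Rule("mandatory", False),
    "title.alternative": Rule("optional", True),
    "creator": Rule("strongly recommended", True),
    "description": Rule("mandatory", True),
    "subject": Rule("mandatory", True),
    "identifier": Rule("mandatory", True),
    "language": Rule("strongly recommended", True),
    "type": Rule("strongly recommended", True),
    "country": Rule("strongly recommended", False),
    "fullRecordURL": Rule("strongly recommended", False),
    "sbigID": Rule("mandatory", False),
}


def check_record(record):
    """Return (errors, warnings) for a record given as {element: [values]}."""
    errors, warnings = [], []
    for name, rule in PROFILE.items():
        values = record.get(name, [])
        if not values:
            if rule.obligation == "mandatory":
                errors.append(f"missing mandatory element: {name}")
            elif rule.obligation == "strongly recommended":
                warnings.append(f"missing recommended element: {name}")
        elif len(values) > 1 and not rule.repeatable:
            errors.append(f"element not repeatable: {name}")
    # Elements outside the profile are reported but not rejected.
    for name in record:
        if name not in PROFILE:
            warnings.append(f"unknown element: {name}")
    return errors, warnings
```

Calling `check_record` on a gateway record before it is loaded into the broker would separate hard failures (a missing SBIG ID, two titles) from softer gaps such as a missing Country.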

Renardus CLD Schema
- Collection Level Description: simple description of collections, locations and related people or organizations
- in Renardus: to provide information about participating Subject Gateways:
  - users choose Subject Gateways for thematic search (semi-automatic selection by subject)
  - well-structured background information (human and machine readable)
  - promotion
  - registry of Subject Gateways

Renardus CLD Format
- Format: based on RSLP Collection Description (UKOLN):
  - Dublin Core metadata elements (e.g. DC.Title, DC.Description, DC.Subject)
  - RSLP metadata elements (cld.country)
  - Renardus-specific metadata elements (e.g. rencld:acronym, rencld:subjectNotation, rencld:resourceLanguage, etc.)

Renardus CLD Tool
- WWW-based form
- RDF, RDF/XML, and text encoding
- file is saved locally; each partner can update its description at any time
- Renardus broker gathers all Subject Gateway descriptions

Renardus Technical Approach
- PREPARATION
  - investigation:
    - of available standards and technologies
    - of functional and user requirements
    - of service provider requirements
    - formulation of use cases in UML
  - development of data model
    - data model
  - choosing architecture (decentralized vs. centralized)
    - architectural diagram
  - search/retrieval protocol
    - common profile (map data model to the protocol Z39.50)
    - Z39.50 profile, Bath compliant

Renardus Technical Approach cont.
- IMPLEMENTATION
  - data normalization
    - encoding RDF/XML (RDF normalizing toolkit)
    - classification mapping (mapping tool adapted from CARMENx)
    - CLDs (CLD tool adapted from RSLP)
  - creation of participants' Renardus servers (Z39.50, Z'mbol)
  - implementation of broker software and functionality
    - cross-searching (Zebril and modified EUROPAGATE simultaneous gateway)
    - cross-browsing (browsing tool, SQL)
  - user interface implementation (with use cases)
    - screen layout (Zebril and HTML, Javascript)

DDC Mapping for Cross-Browsing
- why subject cross-browsing and classification?
  - why switching language?
  - browsing/mapping from DDC to the local systems/browsing structures
- why DDC?
  - comparison to alternatives
  - research license, allowed changes
- analysis of partners' classification systems
  - types, adaptations, number of levels and classes, subject overlap

DDC Mapping for Cross-Browsing cont.
- mapping approaches and issues
  - mapping methods
  - mapping between classes, not between individual resources
  - priorities: e.g. only well-used classes are mapped
  - recommendations for local improvements
- mapping relationships
  - fully equivalent, narrower and broader equivalent, major and minor overlap
  - reuse for retrieval result clustering
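
The class-to-class mapping with its five relationship types can be pictured as a small data structure. This is a sketch under stated assumptions, not the syntax used by the Renardus mapping tool: the `ClassMapping` fields and both helper functions are hypothetical, and only the five relationship names come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The five mapping relationships named on the slide.
    FULLY_EQUIVALENT = "fully equivalent"
    NARROWER_EQUIVALENT = "narrower equivalent"
    BROADER_EQUIVALENT = "broader equivalent"
    MAJOR_OVERLAP = "major overlap"
    MINOR_OVERLAP = "minor overlap"


@dataclass(frozen=True)
class ClassMapping:
    """One mapping between a DDC class and a class in a local scheme."""
    ddc_class: str      # DDC notation, e.g. "510"
    gateway: str        # name of the subject gateway
    local_class: str    # class label or notation in the gateway's own scheme
    relation: Relation


def index_by_ddc(mappings):
    """Group mappings by DDC class so a browse page can list its targets."""
    index = defaultdict(list)
    for mapping in mappings:
        index[mapping.ddc_class].append(mapping)
    return dict(index)


def jump_targets(index, ddc_class):
    """Return (gateway, local class, relation name) for one DDC class."""
    return [
        (m.gateway, m.local_class, m.relation.value)
        for m in index.get(ddc_class, [])
    ]
```

Because mappings connect classes rather than individual resources, the index stays small: one entry per mapped class pair, regardless of how many records each gateway holds.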

DDC Mapping for Cross-Browsing cont.
- technical solution
  - sources: local classifications, CORC Web Dewey
  - mapping tool adapted from CARMENx (MySQL, PHP, Javascript)
  - syntax of the mapping information
  - creation of the browsing pages
- usage of the DDC mapping in Renardus
  - “browse and jump”
  - why not virtual browsing?
  - DDC classification search (in advanced search)
  - user interface solutions

[Figure; labels: MSC 2000, DDC]

DDC Mapping for Cross-Browsing cont.
- future
  - recommendations for subject access efforts in gateways and brokers
  - multilingual access to the DDC top levels
  - automatic mapping (and classification) as support
  - owners should take over for sustainable mapping
- documentation
  - DDC mapping report (D7.4)
  - practical mapping guidelines (D7.4)
  - paper at IFLA Satellite Conf., August 2001

Outlook
- June 2001: Public Deliverable WP 6, D6.5
  - Renardus Application Profile
  - Renardus Namespaces
  - Renardus Collection Level Description
  - DDC Mapping
- June 2001: Beta version of Renardus broker
  - first DDC mapping results
  - first evaluations of broker will start
- November 2001
  - Renardus Workshop for future participating Subject Gateways

URLs & References
- Renardus
- SUB Renardus (also with D7.4)
- News Digest SIGN-UP Form
- Evaluation of existing data models (D6.1)
- DCMI Dublin Core Metadata Initiative
- Dublin Core Metadata Element Set, Version 1.1: Reference Description
- Dublin Core Qualifiers - qualifiers/
- DCMI Agents Working Group
- DCMI Type Working Group

URLs & References
- RSLP Collection Description
- CLD Collection Level Description
- RSLP Collection Description Tool
- Subject Gateways (Traugott Koch): Online Information Review, Vol. 24, Number 1, 2000

Cross-Search
- basic index: Title, Description, Subject
- field search:
  - Title
  - Creator (in DC Simple and later on in RMES Qualifiers)
  - Description
  - DDC Captions (also cross-browsable!)
  - Subject (in future: several encoding schemes for keyword and classification systems of partners)
  - Type
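
The difference between the basic index and a field search can be illustrated with a toy in-memory matcher. This is only a sketch: the real service searches through Z39.50, and the record layout (a dictionary of element name to list of strings) and the function names are assumptions made for the example.

```python
# Elements that make up the basic index, as listed on the slide.
BASIC_INDEX = ("title", "description", "subject")


def matches(record, term, field=None):
    """True if `term` occurs in the given field, or in any basic-index field."""
    fields = (field,) if field else BASIC_INDEX
    needle = term.lower()
    return any(
        needle in value.lower()
        for name in fields
        for value in record.get(name, [])
    )


def search(records, term, field=None):
    """Return the records matching `term`, keeping their original order."""
    return [record for record in records if matches(record, term, field)]
```

A query without a field name looks at Title, Description and Subject together, while `search(records, "koch", field="creator")` restricts the match to Creator, which is not part of the basic index.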

Filter Options
- Type
  - DCMI Type 1 (mapping of partners' document types to Dublin Core Type 1)
  - in future also meaningful: mapping to Sub Type List of DCMI?
  - probably no Renardus-specific type list
- Language (of resources and languages of metadata = Language Tag)
- Country

Sorting
- Title (alphabetic sorting)
- in future: Type, Language, Country? (central architecture)
- Subject: Ren-DDC Classification
  - mapping relation (fully equivalent, narrower equivalent, broader equivalent, major overlap, minor overlap)
- in discussion:
  - Subject - Keywords: sorting by subject indexing group: controlled vocabulary versus free keywords, but problematic!
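
Sorting by mapping relation and then by title could look like the sketch below. It assumes that the order in which the slide lists the five relations is also the intended ranking, which the slides do not state explicitly, and that each hit is a dictionary with hypothetical `relation` and `title` keys.

```python
# Assumed ranking: the order in which the slide lists the relations.
RELATION_ORDER = [
    "fully equivalent",
    "narrower equivalent",
    "broader equivalent",
    "major overlap",
    "minor overlap",
]
RANK = {name: position for position, name in enumerate(RELATION_ORDER)}


def sort_hits(hits):
    """Sort by mapping relation first, then alphabetically by title."""
    return sorted(
        hits,
        key=lambda hit: (RANK[hit["relation"]], hit["title"].lower()),
    )
```

Lower-casing the title keeps the alphabetic order independent of capitalization; a hit with a relation outside the five listed values raises `KeyError` rather than being silently ranked.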