Multilingual thesaurus
Controlled vocabularies
Taina Jääskeläinen
CESSDA Expert Seminar, 9-10 November 2009

Role of the multilingual thesaurus ELSST
– Main aim is to function as a search tool
– Aids data retrieval in the CESSDA portal, including free-text search
– Overcomes language barriers
– Works with both DDI2 and DDI3
– Using the thesaurus for study descriptions is part of metadata standardisation
– Open question in DDI3: which element should hold thesaurus terms, particularly at variable level?

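The claim that ELSST overcomes language barriers rests on every language version labelling the same underlying concepts. A minimal Python sketch of that idea, assuming a hypothetical in-memory mapping from concept IDs to per-language terms (the concept IDs, labels and study IDs below are invented for illustration and are not real ELSST or archive content): a query term in any language is resolved to a concept, and studies indexed with that concept are returned regardless of the language they were catalogued in.

```python
from collections import defaultdict

# Hypothetical thesaurus fragment: concept ID -> {language code: preferred term}.
# IDs and labels are illustrative only, not real ELSST content.
THESAURUS = {
    "C001": {"en": "UNEMPLOYMENT", "fi": "TYÖTTÖMYYS", "de": "ARBEITSLOSIGKEIT"},
    "C002": {"en": "ELECTIONS", "fi": "VAALIT", "de": "WAHLEN"},
}

# Reverse index: lower-cased term in any language -> concept ID.
TERM_TO_CONCEPT = {
    label.lower(): concept_id
    for concept_id, labels in THESAURUS.items()
    for label in labels.values()
}

# Hypothetical study records, each indexed with concept IDs by its archive.
STUDIES = {
    "FSD0001": {"C001"},
    "GESIS0002": {"C001", "C002"},
    "NSD0003": {"C002"},
}


def build_concept_index(studies):
    """Invert study -> concepts into concept -> set of studies."""
    index = defaultdict(set)
    for study_id, concepts in studies.items():
        for concept_id in concepts:
            index[concept_id].add(study_id)
    return index


CONCEPT_INDEX = build_concept_index(STUDIES)


def search(term):
    """Return studies indexed with the concept the term denotes, in any language."""
    concept_id = TERM_TO_CONCEPT.get(term.strip().lower())
    if concept_id is None:
        return set()  # not a thesaurus term; a portal would fall back to free-text search
    return set(CONCEPT_INDEX.get(concept_id, set()))


if __name__ == "__main__":
    print(sorted(search("työttömyys")))  # Finnish query -> ['FSD0001', 'GESIS0002']
    print(sorted(search("Wahlen")))      # German query -> ['GESIS0002', 'NSD0003']
```

Indexing by concept ID rather than by term string is the design choice that makes the lookup language-neutral: adding a new language version only adds labels, and no study needs to be re-indexed.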
Thesaurus: Annual maintenance cycle
– Months 1-3: the Management Team reviews suggestions for candidate terms and changes, and creates drafts.
– Month 4: the Management Team's drafts are circulated to CESSDA members for comment.
– Months 5-6: the Management Team reviews the comments and makes final decisions; changes are implemented in the source thesaurus.
– Months 7-9: CESSDA member organisations translate the changes and new concepts.
– Months 10-11: documentation and IT tasks, carried out centrally by UKDA.
– 30 November: official release of the new version.

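The cycle can be expressed as a small lookup table, which makes it easy to check which activity is due in a given month of the cycle. This Python sketch treats the numbers as months of the cycle only; mapping them to calendar months is not stated on the slide and is not assumed here. The helper names `phase_for_month` and `translation_window_open` are hypothetical.

```python
from typing import Optional

# Annual ELSST maintenance cycle as listed on the slide: (first month, last month, activity).
# Month numbers are positions in the cycle, not calendar months.
CYCLE = [
    (1, 3, "Management Team reviews suggestions and creates drafts"),
    (4, 4, "Drafts circulated to CESSDA members for comment"),
    (5, 6, "Management Team reviews comments and makes final decisions"),
    (7, 9, "Member organisations translate changes and new concepts"),
    (10, 11, "Documentation and IT tasks, carried out centrally by UKDA"),
]


def phase_for_month(month: int) -> Optional[str]:
    """Return the activity scheduled for a cycle month, or None if the slide lists none."""
    if not 1 <= month <= 12:
        raise ValueError("cycle month must be between 1 and 12")
    for first, last, activity in CYCLE:
        if first <= month <= last:
            return activity
    return None  # month 12 has no activity listed on the slide


def translation_window_open(month: int) -> bool:
    """Translations of changes and new concepts are due within months 7-9."""
    return phase_for_month(month) == CYCLE[3][2]
```

For example, `phase_for_month(8)` returns the translation activity, and `translation_window_open(10)` is `False` because month 10 belongs to the central documentation phase.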
Obligations of cessda-ERIC members regarding the thesaurus
– One member is responsible for each language version
– Recommended translation deadlines: Full Members within 9 months of entering full membership; Designate Members within 5 years at most
– Responsible organisations ensure that:
    – the changes are translated annually within Months 7-9
    – changes made during MADIERA and the CESSDA PPP are also translated
    – copyright is assigned to CESSDA
    – the translation follows the guidelines
– A representative joins the Management Team if requested
– If a language is spoken in more than one country, the organisations negotiate which of them will be responsible

Multilingual thesaurus: WP4 recommendation for solving the IPR issue within cessda-ERIC
– cessda-ERIC will hold the copyright of all language versions
– Only cessda-ERIC issues licences
– Only one product is licensed, containing all language versions
– CESSDA members can use it freely for indexing and as a search tool in their local systems
– Available for browsing by the general public within the Data Portal
– Draft licences for non-CESSDA users (separate ones for non-commercial and commercial use)
– Licensees can use ELSST but are not allowed to pass copies on to third parties
– So far, uncleared IPR has prevented licensing the multilingual ELSST to libraries etc.

Progress towards clearing the IPR
– Problematic because initially so few members will be part of cessda-ERIC
– Same IPR strategy for the present CESSDA and for cessda-ERIC? Otherwise licensing is impossible.
– Each CESSDA member that has already translated the thesaurus should find out about the IP legislation on translations in its country
– If the translator(s), not the data archive, hold the copyright of an ELSST translation: transfer the copyright to the archive by a separate agreement with the translator(s)
– Send information about the clearance to WP4 (taina.jaaskelainen at uta.fi)

WP4 recommendations for cessda-ERIC regarding the thesaurus
– Members use ELSST to index data published in the CESSDA portal (part of standards development)
– Members update keywords in data descriptions to match each new thesaurus version
– ELSST translators attend a workshop before starting
– Members responsible for a language spoken in more than one country: collaboration and peer review of translations with the other country/countries
– Resource implications for management and maintenance
– The Management Board has not yet made any decisions on these recommendations

Thesaurus software
– Ready to go live; tested over the autumn
– CESSDA members and outside users can suggest candidate terms for the thesaurus via the thesaurus database
– Software used for:
    1) suggesting new terms for the thesaurus
    2) accepting or rejecting candidate terms (Management Team)
    3) maintaining the English source thesaurus (UKDA)
    4) maintaining other language versions
    5) maintaining local extensions

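Steps 1) and 2) amount to a small state machine: anyone may suggest a term, and only the Management Team moves it to accepted or rejected. The Python sketch below models that rule under stated assumptions; the class `CandidateTerm`, the `role` strings and the example term are hypothetical and do not describe the actual thesaurus database or its interface.

```python
from enum import Enum


class Status(Enum):
    SUGGESTED = "suggested"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


MANAGEMENT_TEAM = "Management Team"  # hypothetical role name


class CandidateTerm:
    """Hypothetical model of a term suggested via the thesaurus database."""

    def __init__(self, label: str, suggested_by: str):
        self.label = label
        self.suggested_by = suggested_by  # a CESSDA member or an outside user
        self.status = Status.SUGGESTED

    def decide(self, accept: bool, role: str) -> Status:
        # Only the Management Team may accept or reject (step 2 on the slide).
        if role != MANAGEMENT_TEAM:
            raise PermissionError("only the Management Team can accept or reject terms")
        # A decision is final: a term cannot be decided twice.
        if self.status is not Status.SUGGESTED:
            raise ValueError(f"term already {self.status.value}")
        self.status = Status.ACCEPTED if accept else Status.REJECTED
        return self.status


def accepted_labels(terms):
    """Labels ready to be added to the English source thesaurus (step 3)."""
    return [t.label for t in terms if t.status is Status.ACCEPTED]
```

Keeping the decision in one method with an explicit role check mirrors the division of labour on the slide: suggestion is open, acceptance is centralised, and only accepted terms flow on to the source thesaurus and from there to translation.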
DDI Alliance Controlled Vocabularies Group
– Multinational working group
– Videoconference every 2 weeks
– DDI is an international standard
– Challenges of the work:
    – lack of deep DDI3 knowledge, as DDI3 is not yet widely used
    – what each DDI3 element is used for; the documentation is not always clear
    – what is covered by other elements
    – heterogeneity of DDI users
    – finding definitions for CV terms

Members of DDI CVG
– Atle Alvheim, NSD
– Sanda Ionescu, ICPSR
– Taina Jääskeläinen, FSD (chair)
– Chryssa Kappi, EKKE
– Fredy Kuhn, FORS
– Meinhard Moschner, GESIS
New members are welcome! Contact taina.jaaskelainen at uta.fi

Why use CVs within CESSDA?
– Precision
– Consistency
– Temporal, spatial and topical comparability
– Semantic and technical interoperability
– Multilingual access and documentation
– Efficiency
– Harmonisation
– Authentication and authorisation procedures may also require CVs
– Experience of the CV group: members often found their own archive-specific CVs lacking

Recommendations for cessda-ERIC
– Members translate the DDI3 controlled vocabularies into their local language
– Translation within six months of the start of cessda-ERIC
– Members use controlled vocabularies for agreed DDI elements in data published in the portal, at least for all new data from the start of the ERIC. (This lies in the future: at present there is no agreement yet on which DDI3 elements to use.)
– CVs are used as much as possible for other DDI elements as well
– To be discussed within each archive: whether to move towards DDI3 by starting to use the proposed CVs once they have been reviewed (most proposed CVs can also be used with DDI2)

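Tracking whether a local-language translation is complete is a simple check over each CV's codes. The Python sketch below assumes a hypothetical representation of a multilingual CV as a mapping from code to per-language labels (the codes and the Finnish labels are illustrative, not an agreed DDI CV); it lists the codes still lacking a label in a given language and reports the share already translated.

```python
# Hypothetical multilingual CV: code -> {language code: label}.
# Codes are illustrative only; English is assumed to be the complete source.
CV_LABELS = {
    "Interview": {"en": "Interview", "fi": "Haastattelu"},
    "Observation": {"en": "Observation"},
    "Other": {"en": "Other", "fi": "Muu"},
}


def missing_translations(cv_labels, language):
    """Codes that still lack a non-empty label in the given language, sorted."""
    return sorted(
        code
        for code, labels in cv_labels.items()
        if not labels.get(language, "").strip()
    )


def coverage(cv_labels, language):
    """Share of codes (0.0-1.0) that already have a label in the given language."""
    total = len(cv_labels)
    if total == 0:
        return 1.0  # an empty CV has nothing left to translate
    return (total - len(missing_translations(cv_labels, language))) / total
```

Here `missing_translations(CV_LABELS, "fi")` returns `["Observation"]`, which is the kind of list a responsible member could work through before the translation deadline.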
Review plan for draft CVs in CESSDA
– Systematic review: do the CVs work in practice for describing the archive's own data holdings?
– Each archive names one person responsible for the review and sends the name to Taina, who will then send instructions
– Record what seems to be missing from the CVs
– A provisional translation probably helps, e.g. to see whether more definitions are needed
– Deadline: 30 December 2009
– Motivation: use of CVs may become obligatory within cessda-ERIC, and their use is recommended for standardisation and harmonisation. CVs will be part of the DDI3 standard.

Future plans for maintenance of CVs in DDI3
– Most CVs include "Other", so the recommendation is to use the "OtherValue" element to specify what "Other" means
– Could the CESSDA portal capture what is entered in "OtherValue" for the DDI CVG to consider?
– At present, many DDI3 elements for which CVs are planned are of type xs:string; the TIC will review whether to change their type to CodedType to better accommodate CVs
– Genericode for easy maintenance and translation
– DDI3 and qualitative data: the CV group expects this work to produce more terms for the CVs, and possibly new CVs

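The "Other" plus "OtherValue" recommendation can be stated as a validation rule: a coded value must come from the CV, "Other" must be specified in free text, and free text is only meaningful alongside "Other". The Python sketch below assumes a hypothetical CV (the codes are illustrative, not an official DDI CV) and a hypothetical `CodedValue` record pairing a code with optional OtherValue text; it does not reproduce DDI3 XML. The `harvest_other_values` helper shows how frequent OtherValue entries could be counted, the kind of input the slide suggests the portal might pass to the DDI CVG.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical CV for a mode-of-collection style element; codes are illustrative only.
MODE_OF_COLLECTION_CV = {"Interview", "SelfAdministeredQuestionnaire", "Observation", "Other"}


@dataclass
class CodedValue:
    code: str
    other_value: Optional[str] = None  # free text, meaningful only when code == "Other"


def validate(value, cv):
    """Return a list of problems; an empty list means the value follows the CV rule."""
    problems = []
    if value.code not in cv:
        problems.append(f"code {value.code!r} is not in the controlled vocabulary")
    if value.code == "Other" and not (value.other_value and value.other_value.strip()):
        problems.append("'Other' used without specifying it in OtherValue")
    if value.code != "Other" and value.other_value:
        problems.append("OtherValue given although the code is not 'Other'")
    return problems


def harvest_other_values(values):
    """Count normalised OtherValue texts so frequent ones can be proposed as new CV terms."""
    return Counter(
        v.other_value.strip().lower()
        for v in values
        if v.code == "Other" and v.other_value and v.other_value.strip()
    )
```

Normalising case and whitespace before counting keeps "Web survey" and "web survey " together, so a recurring free-text answer surfaces as a single candidate term rather than several near-duplicates.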
Many open issues for CESSDA/cessda-ERIC
– Metadata model/template still to be planned:
    – to what extent should DDI3 be used?
    – will indexing of variables be needed (for QDB), etc.?
    – combining requirements from different sections
– cessda-ERIC: the Statutes say members are to publish all their data in the portal, but what about cross-national survey series? How to avoid duplicates?