CHARMCATS: Harmonisation demands for source metadata and output management. CESSDA Expert Seminar: Towards the CESSDA-ERIC common Metadata Model and DDI3.

Presentation transcript:

CHARMCATS: Harmonisation demands for source metadata and output management. CESSDA Expert Seminar: Towards the CESSDA-ERIC common Metadata Model and DDI3. Ljubljana, Alex Agache

Question Database & Harmonisation Platform
CHARMCATS: Cessda HARMonization of CATegories and Scales
Markus Quandt (Team leader), Martin Friedrichs (R&D, programming), and the CESSDA PPP WP9 team (last slide)
- Current status: Prototype/desktop
- Future: Online workbench

Aims of this presentation
1. Functional requirements: CHARMCATS
2. Demands for source metadata (Portal & QDB)
3. Scenarios for feeding back enhanced metadata on comparability

1. Functional requirements: CHARMCATS
Elements of the Metadata Model

Harmonisation: Basic Scenario
A researcher wants to create comparative statistics on employment across European countries for the year 2008.
(Hypothetical) classification of employment (ex-post)
Harmonisation: making data from different sources comparable

Targeted contributing users/research knowledge: Harmonisation
How to proceed? Output = conversion syntax (e.g., SPSS, SAS)
What coding decisions were made?
Why were these decisions made? - Hydra
- Experts in data issues
- Experts in comparative measurement [+ question(naire) development]
- Experts in conceptual issues of measurement

Core of Functional Requirements
Publishing (ex-post) harmonisation = metadata on 3 working steps => Harmonisation Project

Core Elements of a Harmonisation Project
A. Conceptual Step: What to measure/harmonize?
B. Operational Step: How to measure/harmonize?
C. Data Coding Step: How to find and recode data?
1. Concept = Define Employment
2. For what universes?
3. Dimensions? Employment status; Employment regulation; Cross country/Time universal
4. Define a (universal) Typology of Employment
5. Ideally = Country specific Indicators/Questions, functionally equivalent
6. Reality = Country and Dataset specific Variables/Questions
7. Classification = Harmonized Variable: 1 Employed full time, 2 Employed half time, 3 Self employed, 4 Unemployed
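The data coding step (elements 6 and 7) can be pictured as a recode table per country. The following is a minimal sketch only: the four target categories come from the slide, while the country codes, source codes, and the function name `harmonize` are hypothetical and do not describe the actual CHARMCATS data model or output.

```python
# Target classification from the slide (harmonized variable).
HARMONIZED_EMPLOYMENT = {
    1: "Employed full time",
    2: "Employed half time",
    3: "Self employed",
    4: "Unemployed",
}

# Hypothetical recode rules: country -> {source code -> harmonized code}.
# Real rules would be documented per dataset in a harmonisation project.
RECODE_RULES = {
    "DE": {10: 1, 11: 2, 20: 3, 30: 4},
    "NO": {1: 1, 2: 2, 3: 3, 4: 4, 5: 4},
}


def harmonize(country, source_code):
    """Return the harmonized code, or None if no rule covers the source code."""
    return RECODE_RULES.get(country, {}).get(source_code)


def harmonize_records(records):
    """Add a harmonized employment code to each record (a dict)."""
    return [
        {**record, "employment_h": harmonize(record["country"], record["source_code"])}
        for record in records
    ]


if __name__ == "__main__":
    sample = [
        {"country": "DE", "source_code": 11},
        {"country": "NO", "source_code": 5},
    ]
    for row in harmonize_records(sample):
        label = HARMONIZED_EMPLOYMENT.get(row["employment_h"], "not covered")
        print(row["country"], row["source_code"], "->", label)
```

Returning None for uncovered codes keeps undocumented coding decisions visible instead of silently assigning a category.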

CHARMCATS: (3) Data-Coding Step

Summary: Metadata in CHARMCATS (1)
Type of retrieved metadata:
- HP components: harmonized Classifications, Scales, Indexes
- Study components: Variables, Questions, Universes, etc.
- Type of HP: depending on completeness
Functionality:
- Ex-ante output & ex-post harmonisation
- Support creation of harmonisation routines
- Support data users in understanding datasets

Summary: Metadata in CHARMCATS (2)
Standard format:
- Sources (expected): DDI2/3
- CHARMCATS: DDI3
Location of sources:
- CESSDA Portal
- Question Database (QDB)
- User's studies in DDI2/3 .xml

2. Demands for source metadata
CESSDA Portal (Studies incl. Variables/Questions)
- Studies: 3383
- Variables: (incl. duplicates)
- Variables with question text: ca. 85%
- Variables with labels or frequencies: ca. 95%

Required Input Elements for CHARMCATS
- Questions and variables connected to concepts
- Metadata from studies that are comparative by design: identification of variables and questions measured ex-ante as part of harmonized measurement instruments within a study
- Context information attached to variables
- Elements linked/tagged via Thesaurus (ELSST)
- Contextual databases (aggregate level)
- Bias: conceptual, methodological (and data)
- Validity of specific source variables/questions (e.g., psychometric information; cognitive interviews)
Required but not necessary

Summary: Required QDB Metadata
- Literal question text + answer categories
- English translation
- Multiple questions (question batteries)
- Position in + link to original questionnaire
- Study context
Nice to have:
- Concept tagging
- Methodological information
- 'Proven standard' scales/questions (e.g., life satisfaction, post-materialism)

Vision: CHARMCATS_QDB Services
- Search/online access
- Questions used in both applications -> question (questionnaire) development -> ex-ante harmonisation
- CHARMCATS: users offer information on comparability of questions
- QDB: supports comparability analysis
- QDB: similarity matching -> commonality weights

3. Scenarios for feeding back enhanced metadata on comparability
- Starting points for discussion -

- First phase: CHARMCATS will 'read' metadata from CESSDA holdings but not write back
- Subsequent stages: write material into the CESSDA infrastructure, either to other servers or exposed for searches through standardized interfaces
What material?

Core metadata on comparability
- Groups of harmonized variables
- Harmonized variables in the form of partial datasets
- Coding routines
- Functionally equivalent questions/variables (Universe/Concepts-Dimensions)
- International standard classifications and scales
- Degrees of comparability (CHARMCATS) + ? Commonality weights (QDB)

Thought experiments: Proposal for group discussions (Tuesday)
1. Additional metadata created in CHARMCATS on quality of harmonized measures
- Use case ISCED-97: working steps
- Re-use/impact of information on data coding (measurement error) in data analysis
- Additional metadata = prior information in Bayesian analysis
- DB on quality of measurements
2. Interim solution for using DDI2/3 via a linking shell <- CHARMCATS/QDB

Additional Information
Web: and of PPP
Docs:
- Bourmpos, Michael; Linardis, Tolis (with Alexandru Agache, Martin Friedrichs, and Markus Quandt) (2009, September): D9.2 Functional and Technical Specifications of 3CDB.
- Hoogerwerf, M. (2009): Evaluation of the WP9 QDB Tender Report.
- Krejci, Jindrich; Orten, Hilde and Quandt, Markus (2008): Strategy for collecting conversion keys for the infrastructure for data harmonisation.
- Quandt, M., Agache, A., & Friedrichs, M. (2009, June). How to make the unpublishable public. The approach of the CESSDA survey data harmonisation platform. Paper presented at the NCESS 5th International Conference on e-Social Science, 24th-26th June 2009, Cologne. Accessible at:
Forthcoming:
- Friedrichs, M., Quandt, M., Agache, A. The case of CHARMCATS: Use of DDI3 for publishing harmonisation routines. 1st Annual European DDI Users Group Meeting: DDI - The Basis of Managing the Data Life Cycle, 4th December 2009.

WP 9, CESSDA-PPP
- Nanna Floor Clausen (DDA)
- Maarten Hoogerwerf (DANS)
- Annick Kieffer (Réseau Quetelet)
- Jindrich Krejci (SDA)
- Laurent Lesnard (CDSP)
- Tolis Linardis (EKKE)
- Hilde Orten (NSD)


Thought experiments:
1. Harmonisation Platform: (Additional) Metadata, Quality of harmonized measures
Group discussion: Harmonisation/Comparable Data
Use case ISCED-97: working steps:
a. Additional metadata = measurement error
- Re-use/impact of information on data coding (measurement error) in data analysis
b. Additional metadata = priors in Bayesian analysis
c. DB on quality of measurements

Example of harmonisation on education: ISCED-97 with ESS Round 3 data
Scenario:
- Data: ESS Round 3 (2006), 10 European country samples
- Same source variables: country-specific education degrees
- Two variants of reclassification into ISCED-97:
A. ESS team harmonized variable: EDULVL
B. WP9.2 harmonized variable
Other coding into ISCED of the same data (not considered here): Schneider, 2008

Classification on Education: ISCED
1. Conceptual Step
- Concept of education: broad definition
- Dimensions: level of education, orientation of the educational program (general-vocational), position in the national degree structure
- Universe: initially developed for OECD countries
- New variant of ISCED: 1997
Typology resulting in 7 classes of education:
0. Pre-primary education
1. Primary education
2. Lower secondary
3. Upper secondary
4. Intermediate level
5. Tertiary education
6. Advanced training
Source: OECD (2004)

2. Operational Step
Guidelines on measurement in survey research? (proposals year 2000<)
Problems in coding:
- Codings for respondents with educational certificates received before 1997/data collection: little information on coding procedures
- The hydra is not visible here: how does a specific educational certificate measure the multiple and interrelated dimensions?
ISCED-97 Mapping (ISCED Manual):

3. Data Coding: Result of Mapping/Coding

Storing the 2 harmonized variables within a database
- 2 different target variables, same classification
- Both target variables use the same source variables and the same operational and conceptual steps
Calculated "agreement" between the two coding outputs (same classification, 2 different harmonized variables): Kappa = 0.67
Other measures for quality of coding/reliability? (e.g., ICCs)
Ignore or consider when using one of the harmonized variables?
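For readers unfamiliar with the agreement statistic, Cohen's kappa for two codings of the same cases can be computed as sketched below. This is an illustration only; the slide does not say how its value was calculated, and the function name and toy data are assumptions.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two codings of the same cases into the same classes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both codings must be non-empty and of equal length")
    n = len(codes_a)
    # Observed agreement: share of cases coded identically.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: from the marginal distributions of both codings.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both codings use a single identical category
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy ISCED codes for eight respondents under two hypothetical codings.
    variant_a = [2, 3, 3, 5, 1, 3, 5, 2]
    variant_b = [2, 3, 4, 5, 1, 3, 5, 3]
    print(round(cohens_kappa(variant_a, variant_b), 3))
```

Kappa treats the classes as nominal; for an ordered classification such as ISCED a weighted variant may be more appropriate.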

Next slides: Impact on data analysis of the "quality" of coding (reliability)
2 basic status attainment models:
- Without measurement error
- With measurement error
Quality of coding: how to relate to validity?

Harmonisation metadata: active reuse in data analysis
ESS data 2006: Respondents aged
Model specification: ISCED error = 0 / error = .33 (1 - reliability)
[Figure: two path diagrams, Norway and Germany, each with the variables Household Income, ISCED R's, ISCED father, Age, Gender. Estimates shown: without error: .354 (.03), with error: .581 (.04); with error: .44 (.04), without error: .33 (.03); without error: (29.08), with error: (39.42); without error: (28.387), with error: (41.93)]
SEM notation: covariances and residuals not shown (unstandardized estimates)
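The idea of fixing the error term at 1 - reliability can be illustrated with the textbook single-predictor case under classical measurement error, where the observed slope is attenuated by the reliability. The sketch below rests on that simplifying assumption; it is not the SEM on the slide and is not expected to reproduce the slide's estimates. Function names are hypothetical.

```python
def error_variance(observed_variance, reliability):
    """Error variance implied by a reliability: (1 - reliability) * variance."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return (1.0 - reliability) * observed_variance


def disattenuate_slope(observed_slope, reliability):
    """Correct a bivariate regression slope for error in the predictor.

    Assumes classical measurement error in a single predictor, where
    observed slope = true slope * reliability.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_slope / reliability


if __name__ == "__main__":
    reliability = 0.67  # the agreement value reported earlier, used as a stand-in
    print(error_variance(1.0, reliability))
    print(disattenuate_slope(0.30, reliability))
```

In a multi-predictor SEM the correction depends on the full covariance structure, so corrected coefficients can move in either direction.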

Example: Bayesian SEM Analysis with ISCED
ESS data, 2006: Norway, Respondents aged:
R's Education -> Income
Mean = Posterior p = .50 MCMC samples =
Bayesian approach:
- Test of hypothesis (probability of a hypothesis being true given the data)
- Use of previously published/expert knowledge in the field for specifying informative priors on specific parameters of a model
- Few but rising applications with cross-national data
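How an informative prior combines with a new estimate can be shown with the closed-form normal-normal update. This is a deliberately simplified sketch: the slide's analysis uses MCMC on a full SEM, whereas the code below assumes a single parameter with a normal prior and a normally distributed estimate. All names and numbers are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Posterior mean and sd for one parameter under a normal-normal model.

    The posterior mean is a precision-weighted average of the prior mean
    (e.g. from earlier published results) and the new estimate.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * estimate) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


if __name__ == "__main__":
    # Hypothetical prior from earlier studies and a hypothetical new estimate.
    mean, sd = normal_posterior(prior_mean=0.40, prior_sd=0.10, estimate=0.55, std_error=0.05)
    print(round(mean, 3), round(sd, 3))
```

A database of published measurement-quality results would supply values such as `prior_mean` and `prior_sd` in this picture.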

New DB on quality of comparative measurements
- DB Harmonisation: reliability, aggregated across similar harmonizations/different country data sets
- DB Quality of measurements: validity of harmonized/latent variables
- DB Expert knowledge: guidelines; comparability, measurement equivalence
- Analysis results; model specification; priors on specific parameters

DB on Quality of measurements: User likeability
Currently:
- Low incentives for researchers to publish new findings on validity of measurements in an 'open access' database (before and after publication in journals)
- Mostly likelihood-based statistical methods employed
- Ioannidis (2005), Why most published research findings are false: "The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true"
- New regulations for registering data (e.g., International Standard Randomised Controlled Trial Number in the UK)
Future potential:
- Self contributions from research groups (e.g., ESS, EVS, ISSP)
- Meta-analysts (avoiding publication bias)
- Contribution of Bayesians

Questions?
Thanks to: CESSDA, WP9 team
ISCED-97 coding: Annick Kieffer; Vanita Matta

Thought experiments: Proposal for group discussions (Tuesday)
Interim solution for using DDI2/3 via a linking shell <- CHARMCATS/QDB

Thought experiment 2: DDI2/3 linking Shell
[Diagram: CESSDA Holdings (Repository A, Repository B, Repository C; DDI2 only, DDI3, DDI2 & DDI3); T-Shell holding variables V1, V2, V3, V4, ... V999; Registry; CHARMCATS Application, QDB Application, Other App, CESSDA Portal. Flows shown: request for V1 + V4; use of V1 for harmonization purposes]

Commonality weights (c.w.)
Scenario:
- Example c.w. = (weight for similarity, or probability of belonging to ad hoc comparability group x)
- Search by different criteria (XXX)
- Similarity matching algorithm provides c.w. (example: Levenshtein algorithm)
- Learning algorithm!
- Bayesian prediction
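One way to picture a similarity-based commonality weight is a normalized Levenshtein (edit) distance between two question texts. The sketch below is an assumption about how such a weight could be derived; the slide names the algorithm only as an example, and the normalization and function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def commonality_weight(text_a, text_b):
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    a, b = text_a.strip().lower(), text_b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    q1 = "How satisfied are you with your life as a whole?"
    q2 = "How satisfied are you with your life these days?"
    print(round(commonality_weight(q1, q2), 2))
```

Character-level distance captures wording overlap only; it says nothing about functional equivalence across languages, which is where expert input and the learning step mentioned on the slide would come in.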

Conclusions
Contributors and source data are required for initial implementation -> QDB
Any