GEODE - Glasgow DCC, Nov 2006 Data curation standards and the messy world of social science occupational information resources Paper presented to the 2nd.


GEODE - Glasgow DCC, Nov 2006
Data curation standards and the messy world of social science occupational information resources
Paper presented to the 2nd International Digital Curation Conference, November 2006, Glasgow.
Paul Lambert, Larry Tan, Ken Turner, & Vernon Gayle (University of Stirling); Richard Sinnott (University of Glasgow); Ken Prandy (Cardiff University)

GEODE – Grid Enabled Occupational Data Environment
- Operate as a ‘portal’: user-friendly access to occupational data; high-volume use
- Support a community of occupational data providers: depository of occupational information resources; limited-volume use
- Experiment with / promote ‘e-Social Science’

(Part 1) Occupational analyses in the social sciences
(Quotes as reproduced in Coxon and Jones 1978; Crompton 1998)
“A man’s work is as good a clue as any to the course of his life and to his social being and identity” (Hughes, 1958)
“The backbone of the class structure, and indeed of the entire reward system of modern Western society, is the occupational order” (Parkin, 1972)
“Nothing stamps a man as much as his occupation. Daily work determines the mode of life.. It constrains our ideas, feelings and tastes” (Goblot, 1961)

Why is occupational research ‘messy’?
Two-stage process:
1. Collect & preserve ‘source occupational data’
2. Summary / translation of source data
- This model is a ‘scientific’ approach: published documentation (at both stages); replicable; validation exercises
- But social researchers have not been good at using it… (Bechhofer 1969; Marsh 1986; Rose and Pevalin 2003)

{Stage 1 - Collecting Occupational Data – Examples}
Example 1: BHPS
Occ description | Employment status | SOC-2000 | EMPST
Miner (coal) | Employee | 8122 | 7
Police officer (Serg.) | Supervisor | 3312 | 6
Electrical engineer | Employee | 2123 | 7
Retail dealer (cars) | Self-employed w/e | 1234 | 2
Example 2: European Social Survey, parent’s data
Occ description | SOC-2000 | EMPST
Miner | ?8122 | ?6/7
Police officer | ?3312 | ?6/7
Engineer | ? | ?
Self employed businessman | ?? | ?1/2

{Stage 1 - Collecting occupational data – summary}
- All methods lead eventually to coding to an occupational index scheme:
  – Occupational Unit Groups
  – Standardised Industrial Classifications
  – Standardised employment status classifications
- Occupational index schemes are the point of departure for GEODE

Stage 2: Summary / translation of source occ. data
a) Published ‘occupational information resources’ used to link source data, via an index scheme, with substantively meaningful measures
- Social class schemes
- Stratification scales
- Gender segregation statistics
- Labour process statistics
b) Coding by fiat (allocation by ‘expert’ social scientist)
- Lack of documentation / replicability / consistency
- Unscientific…

What’s the problem?
But…
- Low uptake of existing occupational information resources
- Strict security constraints on users’ micro-social survey data
- Problems in the formatting / distribution of occupational information resources (Part 2)
[Diagram: External user (micro-social data; columns id, oug, sex); Occ information (index file, aggregate; columns oug, CS-M, CS-F, EGP, with EGP values such as I, II, VIIa); User’s output (micro-social data; columns id, oug, CS)]
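The linkage this slide describes, matching a user's micro-data to an aggregate index file on the occupational unit group (oug), might be sketched in Python as below. This is a minimal illustration under stated assumptions, not the GEODE implementation: the function name, the dictionary layout, and every score and class assignment are invented placeholders (they are not real CAMSIS or EGP values). Only the column names id, oug, sex, CS-M, CS-F, EGP and CS come from the slide.

```python
from typing import Dict, List, Optional

# Hypothetical index file keyed by occupational unit group (oug).
# Scores and classes are made-up placeholders, NOT real CAMSIS/EGP values.
INDEX_FILE: Dict[str, Dict[str, object]] = {
    "8122": {"CS-M": 10.0, "CS-F": 20.0, "EGP": "VIIa"},
    "2123": {"CS-M": 30.0, "CS-F": 40.0, "EGP": "I"},
}


def link_records(records: List[dict], index: Dict[str, Dict[str, object]]) -> List[dict]:
    """Attach a gender-specific score (CS) to each micro-data record via its oug."""
    output = []
    for rec in records:
        entry = index.get(rec["oug"])  # None when the oug is not in the index file
        column = "CS-M" if rec["sex"] == "M" else "CS-F"
        cs: Optional[object] = entry[column] if entry is not None else None
        output.append({"id": rec["id"], "oug": rec["oug"], "CS": cs})
    return output


# Hypothetical micro-social data held by the external user.
micro_data = [
    {"id": 1, "oug": "8122", "sex": "M"},
    {"id": 2, "oug": "2123", "sex": "F"},
    {"id": 3, "oug": "9999", "sex": "F"},  # no match in the index file
]
linked = link_records(micro_data, INDEX_FILE)
```

Unmatched codes are kept with CS set to None rather than dropped, so the user can see which records the index file failed to cover.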

Handling Occupational Information
Messy because:
– Large volume of occupational information resources
– Limited coordination between resources
– Inconsistencies in access and exploitation processes
Occupational information resources are used to interpret occupational records

Some illustrative occupational information resources
Resource | Index units | # distinct files (average size kb) | Updates?
CAMSIS | Local OUG*(e.s.) | 200 (100) | y
CAMSIS value labels | Local OUG | 50 (50) | n
ISEI tools, home.fsw.vu.nl/~ganzeboom | Int. OUG | 20 (50) | y
E-Sec matrices | Int. OUG*(e.s.) | 20 (200) | n
Hakim gender seg codes (Hakim 1998) | Local OUG | 2 (paper) | n

Occupational information resources
- Large volumes of occupational information resources
  Coverage across countries and time periods
  Different research fields / topics
  Dynamic: updates to occupational information resources
  Internet-based distributions lead to duplication and expansion, e.g. ISEI - ISCO translation files at:
  – PISA webpages (Ganzeboom)
  – IDEAS/Repec webpages (Hendrickx)
  – CAMSIS occupational data webpage
  Some maths: 100+ alternative index schemes (OUGs; others) x 500+ alternative output measures (class schemes, etc), i.e. 50,000+ possible pairings

Occupational information resources: limited coordination
- Varying metadata practices
  Coordinated structure, e.g. ISEI at IDEAS/Repec [rare]
  Natural language, e.g. CAMSIS [common]
  No documentation
- Varying data file formats: SPSS, Stata, plain text
- One-way distribution: Internet download; text publications
- Gaps between NSIs and academic researchers: NSIs make regular changes to favoured resources

Occupational information resources: limited coordination (ctd)
- Varying translation rules
  One file for all occupations (‘universal’)
  Multiple files for different contexts (‘specific’)
- Different occupational index requirements
  ISEI {status scale}: Occ title
  CAMSIS {stratification scale}: Occ title; e.s.; gender
  EGP {class scheme}: Occ title; e.s.
  Wright {class scheme}: Occ conditions
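One practical consequence of these differing index requirements is that a given survey can only be translated into the schemes whose inputs it actually holds. A rough Python sketch of that check follows. The requirement sets mirror the slide (with e.s. read as employment status), while the snake_case variable names and the function name are hypothetical labels chosen for this example.

```python
from typing import Dict, List, Set

# Index variables each scheme needs, as listed on the slide.
# The snake_case variable names are hypothetical labels for this sketch.
REQUIREMENTS: Dict[str, Set[str]] = {
    "ISEI": {"occ_title"},
    "CAMSIS": {"occ_title", "employment_status", "gender"},
    "EGP": {"occ_title", "employment_status"},
    "Wright": {"occ_conditions"},
}


def usable_schemes(available: Set[str]) -> List[str]:
    """Return the schemes whose required index variables are all available."""
    return sorted(name for name, needed in REQUIREMENTS.items() if needed <= available)


# A survey holding occupation title and employment status, but no gender variable.
survey_vars = {"occ_title", "employment_status"}
print(usable_schemes(survey_vars))  # ['EGP', 'ISEI']
```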

Occupational information resources: inconsistencies in access / exploitation
- Occupational Unit Group schemes’ variants
  Decennial updates / international variations
  Localised adaptations [e.g. HESA] / survey variations [e.g. GHS]
- Numeric or string format preservation
- Hierarchical organisations
  E.g. ISCO  123  12  = 0110  11  1  0
- Focus for application of occupational data
  Individual level measures
  Household / career contexts
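The interaction between format preservation and hierarchical coding can be illustrated with a short Python sketch. It assumes a hierarchy read by prefix truncation, as in ISCO-88, where four-digit unit groups nest within three-, two- and one-digit groups. The function name is hypothetical, and the example is an illustration of the general issue rather than a reconstruction of the example on the slide.

```python
from typing import List


def coarser_groups(code: str) -> List[str]:
    """Return successively coarser groups by dropping trailing digits."""
    return [code[:n] for n in range(len(code) - 1, 0, -1)]


as_string = "0110"
as_number = str(int(as_string))  # numeric storage drops the leading zero

print(coarser_groups(as_string))  # ['011', '01', '0']
print(coarser_groups(as_number))  # ['11', '1']
```

Stored as a string, the code resolves to major group 0. Stored as a number, the same code loses its leading zero and appears to belong to major group 1, which is why the choice between numeric and string storage matters.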

Returning to the occupational research model
Two-stage process:
1. Collection & preservation of ‘source occupational data’
2. Summary / translation of source data via occupational information resources
- Critically, stage (2) places responsibility for reviewing occupational information resources with the social scientist
- The volume of variants / inconsistencies isn’t huge, but is enough to impede easy application

(Part 2) Curating Occupational Data
GEODE – Grid Enabled Occupational Data Environment
Core provision: support the management of and access to occupational information resources
- ‘Occupational information depository’
- Easy access to occupational data (portal for occupational data)

Metadata - Occupational information depository
How to facilitate searching, registering, accessing index service?
- Establish a ‘GEODE-M’ metadata subset (.xml)
  Founded on Michigan Data Documentation Initiative
  Semantic curation of occupational information: release date; country; time period; author; format; missing data
  Data extensions: to differentiate index and output variable groups; to reference variable definitions
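To give a feel for what a small XML metadata record of this kind could look like, the Python sketch below builds one with the standard library. It is only an illustration: the root tag `geodeRecord`, the child tag names, and all values are hypothetical placeholders, and they are not actual DDI or GEODE-M element names. Only the list of fields (release date, country, time period, author, format, missing data) is taken from the slide.

```python
import xml.etree.ElementTree as ET


def build_record(fields: dict) -> ET.Element:
    """Build a flat XML record; tag names are hypothetical, not real DDI tags."""
    root = ET.Element("geodeRecord")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = str(value)
    return root


record = build_record({
    "releaseDate": "2006-11-01",   # placeholder value
    "country": "GB",               # placeholder value
    "timePeriod": "1991",          # placeholder value
    "author": "Example Author",    # placeholder value
    "format": "plain text",        # placeholder value
    "missingData": "-9",           # placeholder value
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because each field is its own element, a tag can be repeated where a resource has several authors or files, which is the repeatability that makes XML convenient for this kind of record.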

Benefits of DDI-XML curation
XML suits:
- OGSA-DAI (data access & integration)
  Supports data indexing / preservation / management
  Supports secure data matching programme
  Could facilitate analytical queries
- ‘Gridsphere’ search programmes
- Data curation standards
  – DDI widely deployed in social science resources
  – XML accessibility / transferability
  – Repeatability of tags very helpful
  – E.g. data files; index measures; contexts; authors

Implementing ‘GEODE-M’ metadata
Critical entries:
- Context of data [country, time period]
- Index scheme: GEODE database of known index schemes
- Source URI for resource
2-stage curation process (…?)
1) Web proforma for supply of occupational data: author; context; index units (Gridsphere ‘portlet’)
2) Manual updating of XML resource by depositor / GEODE members (Gridsphere ‘portlet’)
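A first-stage check that a submitted record carries the critical entries named on this slide might look like the Python sketch below. Everything in it is an assumption made for illustration: the field names, the function name, the example URI, and the small set standing in for the GEODE database of known index schemes (its members are scheme names that appear elsewhere in these slides).

```python
from typing import Dict, List

# Hypothetical stand-in for the GEODE database of known index schemes.
KNOWN_INDEX_SCHEMES = {"ISCO88", "SOC90", "SOC-2000"}
# Hypothetical field names for the critical entries listed on the slide.
CRITICAL_FIELDS = ("country", "time_period", "index_scheme", "source_uri")


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in CRITICAL_FIELDS if not record.get(name)]
    scheme = record.get("index_scheme")
    if scheme and scheme not in KNOWN_INDEX_SCHEMES:
        problems.append(f"unknown index scheme {scheme}")
    return problems


good = {"country": "GB", "time_period": "1991", "index_scheme": "SOC90",
        "source_uri": "http://example.org/resource"}  # placeholder URI
bad = {"country": "GB", "index_scheme": "MYCODES"}

print(check_record(good))
print(check_record(bad))
```

Records that fail such a check could be passed to the manual second stage instead of being rejected outright.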

Example issues
[Variant implementations indexed translation files]
[cross-country resources]
role=“formatting” [caters to multiple author roles]
[caters to multiple files]
ISEI | CAMSIS | EGP | Wright
Occ title | Occ title; e.s.; gender | Occ title; e.s. | Occ conditions
(from ISCO88 | SOC90; ukempst; gdr | SOC90; ukempst | SIC92; SUPVIS;..
: 10 [all]; allGB; GB; [all];

Management of GEODE-M curation
Metadata considerations:
- ‘GEODE-M’ as {flexible} recommended components of DDI
- GEODE-M template webpages at GEODE
- Other facilities?
Data considerations:
- Stored at GEODE vs linkage to external data
- Proprietary software (plain text / SPSS / Stata)
At present:
- Stage 1 – automated curation (allows external linkage, any file format)
- Stage 2 – extended manual curation (requires GEODE server copy of data, translation to plain text rectangular format)
Premised upon small commitment from depositors & GEODE

GEODE – user uptake
- High potential demand
  Numerous queries on occupational data management
  Numerous researchers wishing to distribute occupational data
- Prototype GEODE services not yet user-friendly
Carrots – High demand for easier access and review
Sticks – Poor standards in much previous research, which neglects good review of occupational information
Hurdles – Change research cultures in social science disciplines(?)

Conclusions
Occupational data curation and the Grid
- Grid facilitates management / access via xml formats (OGSA-DAI)
- Current models require moderate specialist input (manual curation)
- Grid offers new level of service not previously available: dynamic coordinated file storage; file matching [security]
Occupational data as case study for focused DDI xml curation
- Complex but finite range of occupational information resources
- High user demand
- Uptake will require combination of motivation and instigation

App 1: e-Social Science
‘The Grid’ and ‘e-Science’:
1. Online coordination of electronic resources and collaborations (distributed computing)
   - Large scale
   - Collaborative
   - Heterogeneous
2. Standard protocols / information management systems
UK eSocial Science:
1) Investment in assessing / implementing technology
2) Computationally demanding data analysis
3) Qualitative and quantitative data collection technologies
4) **Data sharing, processing and access**

App 2: GEODE architecture

App 3: {Collecting occupational data}
a) Follow a recommended process:
- ONS good practice: industry description / occupation description / size of organisation / employment status / supervisory status
- Occupation descriptions -> standardised numeric index
- Text coding tools, e.g. CASCOT - www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/
b) Do your own thing:
- European Social Survey parental occupational questions
- Free text description of parental occupations

App 4: Summary data: what is the best class scheme?
a) Published ‘occupational information resources’ link source data, via index scheme, with substantively meaningful measures
‘Occupation-based social classifications’
– Social class schemes
  Registrar General’s Social Class Scheme ( ) [skill / prestige]
  National Statistics Socio-Economic Classification (2002-) [employment relations]
  Goldthorpe / CASMIN / EGP [employment relations]
  Wright [ownership and authority]
  W.E.S. [female occupational groupings]
– Stratification scales
  SIOPS [prestige]
  ISEI [socio-economic status – education and income average]
  CAMSIS [social interaction]
{CAMSIS is the best…}