GEODE, 16 Jan 2007 Curating Occupational Information GEODE – Grid Enabled Occupational Data Environment – www.geode.stir.ac.uk

Presentation transcript:

GEODE, 16 Jan 2007
Curating Occupational Information
GEODE – Grid Enabled Occupational Data Environment
Session 4 of GEODE Project workshop, 16th January 2007
Paul Lambert, Larry Tan, Ken Turner, & Vernon Gayle, University of Stirling
Ken Prandy, Cardiff University
Richard Sinnott, University of Glasgow

GEODE – Curating occupational information
• Assigning structure to ‘messy’ occupational information resources
  – Metadata on occupational information resources
  – Collating and defining occupational standard classifications
  – Lambert, P.S., Tan, K.L.T., Turner, K.J., Gayle, V., Sinnott, R.O. and Prandy, K. 'Data curation standards and the messy world of social science occupational information resources'. Second International Digital Curation Conference, Glasgow.
• Offering facilities for comparative occupational information
  – Lambert, P.S., Tan, K.L.T., Gayle, V., Prandy, K. and Turner, K.J. forthcoming. 'The importance of specificity in occupation-based social classifications'. International Journal of Sociology and Social Policy.

Why is data on occupations ‘messy’?
Messiness at both stages of the process:
1. Collect & preserve ‘source occupational data’
2. Summary / translation of source data
• This model offers a ‘scientific’ approach
  – Published documentation (at both stages)
  – Replicable
  – Validation exercises
• But social researchers have not been good at using it… (Bechhofer 1969; Marsh 1986; Rose and Pevalin 2003)

{Stage 1 - Collecting Occupational Data and making a mess}

Example 1: BHPS
Occ description           Employment status    SOC-2000   EMPST
Miner (coal)              Employee             8122       7
Police officer (Serg.)    Supervisor           3312       6
Electrical engineer       Employee             2123       7
Retail dealer (cars)      Self-employed w/e    1234       2

Example 2: European Social Survey, parent’s data
Occ description              SOC-2000   EMPST
Miner                        ?8122      ?6/7
Police officer               ?3312      ?6/7
Engineer                     ?          ?
Self employed businessman    ??         ?1/2

{Stage 1 - Collecting occupational data – summary}
• All methods lead eventually to coding to an occupational index scheme:
  – Occupational Unit Groups
  – Standardised Industrial Classifications
  – Standardised employment status classifications
  – Somewhat less standardised occupational schemes
  – Not really at all standardised occupational index schemes
• Occupational index schemes are the point of departure for GEODE

Stage 2 – using Occupational Information
• Messy because:
  – Large volume of occupational information resources
  – Limited coordination between resources
  – Inconsistencies in access and exploitation processes
• Occupational information resources are used to interpret occupational records

Occupational information resources
• Large volumes of occupational information resources
  – Coverage across countries and time periods
  – Different research fields / topics
  – Dynamic: updates to occupational information resources
  – Internet based distributions lead to duplication and expansion, e.g. ISEI - ISCO translation files at:
    – PISA webpages (Ganzeboom)
    – IDEAS/Repec webpages (Hendrickx)
    – CAMSIS occupational data webpage
  – Some maths: 100+ alternative index schemes (OUGs; others) × 500+ alternative output measures (class schemes, etc), i.e. potentially 50,000+ index-to-output translations

Occupational information resources
Limited coordination
• Varying metadata practices
  – Coordinated structure, e.g. ISEI at IDEAS/Repec [rare]
  – Natural language, e.g. CAMSIS [common]
  – No documentation
• Varying data file formats
  – SPSS, Stata, Plain text
• One-way distribution
  – Internet download; text publications
• Gaps between NSIs and academic researchers
  – NSIs make regular changes to favoured resources

Occupational information resources
Limited coordination (ctd)
• Varying translation rules
  – One file for all occupations (‘universal’)
  – Multiple files for different contexts (‘specific’)
• Different occupational index requirements

Scheme    Type                      Index requirements
ISEI      {status scale}            Occ title
CAMSIS    {stratification scale}    Occ title; e.s.; gender
EGP       {class scheme}            Occ title; e.s.
Wright    {class scheme}            Occ conditions
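
The contrast between a ‘universal’ and a ‘specific’ translation can be sketched as a lookup keyed on different index variables. This is a minimal Python illustration, not GEODE's implementation: the function names are hypothetical and every score is an invented placeholder rather than a published ISEI or CAMSIS value. Only the index requirements (occupational title alone, versus title plus employment status and gender) come from the slide.

```python
# Illustrative only: the codes look like SOC-style unit groups, but every
# score below is a made-up placeholder, not a published scale value.

# 'Universal' translation: one file for all contexts, keyed on occupation only.
UNIVERSAL_SCALE = {"8122": 30.0, "3312": 55.0}

# 'Specific' translation: one file per context (here country and gender),
# each keyed on occupation code and employment status.
SPECIFIC_SCALES = {
    ("GB", "male"): {("8122", "employee"): 28.5, ("3312", "supervisor"): 57.0},
    ("GB", "female"): {("8122", "employee"): 31.0},
}


def translate_universal(occ_code):
    """Return the score for an occupation code, or None if it is not indexed."""
    return UNIVERSAL_SCALE.get(occ_code)


def translate_specific(occ_code, emp_status, gender, country):
    """Return the context-specific score, or None if file or key is missing."""
    scale = SPECIFIC_SCALES.get((country, gender))
    if scale is None:
        return None  # no translation file exists for this context
    return scale.get((occ_code, emp_status))
```

The specific lookup needs more index variables per record and more files per study, which is one way to picture the coordination problem the slide describes.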

Occupational information resources
Inconsistencies in access / exploitation
• Occupational Unit Group schemes’ variants
  – Decennial updates / International variations
  – Localised adaptations [e.g. HESA] / Survey variations [e.g. GHS]
• Numeric or string format preservation
• Hierarchical organisations
  – E.g. ISCO → 123 → 12 → = 0110 → 11 → 1 → 0
• Focus for application of occupational data
  – Individual level measures
  – Household / career contexts
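
The hierarchy and format-preservation points can be illustrated together. The following is a minimal Python sketch, assuming ISCO-88-style 4-digit unit group codes in which each shorter prefix names the parent group. The function name `isco_ancestors` is hypothetical and is not part of GEODE.

```python
def isco_ancestors(code):
    """Return a 4-digit code followed by its 3-, 2- and 1-digit parent groups."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit string, got {code!r}")
    return [code[:n] for n in (4, 3, 2, 1)]


# String storage keeps the leading zero of major group 0.
as_string = isco_ancestors("0110")

# Numeric storage drops the leading zero, so prefix truncation would no
# longer lead back to major group 0.
as_number = str(int("0110"))

# Repair is only possible if we can assume that 4 digits were intended.
repaired = as_number.zfill(4)
```

Storing codes as strings avoids the repair step altogether. Padding a numeric code back to 4 digits is only safe when the level of the code (unit group rather than minor or sub-major group) is known.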

Returning to the occupational research model
Two stage process:
1. Collection & preservation of ‘source occupational data’
2. Summary / translation of source data via occupational information resources
• Critically, stage (2) places responsibility for reviewing and treating occupational information resources with individual social scientists
• GEODE – alternative facility for managing stage (2)

Metadata - Occupational information information
How to facilitate searching, registering, accessing OIRs?
• Establish a ‘GEODE-M’ metadata subset (.xml)
  – Founded on Michigan Data Documentation Initiative
  – Semantic curation of occupational information
  – XML: convenient engagement with OGSA-DAI, Gridsphere, Java
• Fields: Release date; Country; Time period; Author; Format; Missing data
• Data extensions
  – to differentiate index and output variable groups
  – to reference variable definitions
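
A record carrying the fields listed on this slide could be sketched as follows. This is a minimal Python illustration using the standard library's `xml.etree.ElementTree`. The element names (`oirRecord`, `releaseDate` and so on) and all values are hypothetical placeholders; they are not actual GEODE-M or DDI element names.

```python
import xml.etree.ElementTree as ET


def build_record(fields):
    """Build a flat XML metadata record from a dict of field name -> value."""
    root = ET.Element("oirRecord")  # hypothetical root element name
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value
    return root


# Placeholder values throughout; the keys mirror the fields on the slide.
record = build_record({
    "releaseDate": "2007-01-16",
    "country": "GB",
    "timePeriod": "1991-2001",
    "author": "Example Author",
    "format": "plain text",
    "missingData": "-9",
})

# Serialise to a string for storage or searching.
xml_text = ET.tostring(record, encoding="unicode")
```

A real GEODE-M record would follow the DDI schema rather than this flat layout; the sketch only shows how a small, fixed set of descriptive fields makes resources searchable.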

Example issues
[Variant implementations indexed translation files]
[cross-country resources]
role=“formatting” [caters to multiple author roles]
[caters to multiple files]

Scheme    Index requirements         From
ISEI      Occ title                  ISCO88
CAMSIS    Occ title; e.s.; gender    SOC90; ukempst; gdr
EGP       Occ title; e.s.            SOC90; ukempst
Wright    Occ conditions             SIC92; SUPVIS;..

: 10 [all]; allGB; GB; [all];

Management of GEODE-M curation
• Metadata considerations
  – ‘GEODE-M’ as {flexible} recommended components of DDI
  – GEODE-M templates webpages at GEODE
  – Other facilities?
• Data considerations:
  – Stored at GEODE vs linkage to external data
• At present:
  – Stage 1 – automated curation (allows external linkage, any file format)
  – Stage 2 – extended manual curation (requires GEODE server copy of data, translation to plain text rectangular format)
• Premised upon small commitment from depositors & GEODE
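
The ‘plain text rectangular format’ required at Stage 2 can be pictured as a delimited file in which every line has the same fields. The following is a minimal Python sketch, not GEODE's conversion routine: the tab delimiter, the function name and the column names are assumptions, and the scores are placeholders.

```python
import csv
import io


def to_rectangular_text(rows, columns):
    """Write dict rows as tab-delimited text with one header line.

    Raises ValueError if any row lacks a column, so every output line
    is guaranteed to have the same number of fields.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for i, row in enumerate(rows):
        missing = [c for c in columns if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
        writer.writerow([row[c] for c in columns])
    return buffer.getvalue()


# Hypothetical two-column translation file; codes are kept as strings.
text = to_rectangular_text(
    [{"occ_code": "8122", "score": "30.0"}, {"occ_code": "3312", "score": "55.0"}],
    ["occ_code", "score"],
)
```

Rejecting incomplete rows up front is a design choice for the sketch; a curation tool might instead write an explicit missing-data code.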

Searching – uncurated resources

Searching – curated resources

Managing and modifying ‘uncurated’ resources

Managing and modifying ‘G1’ resources

Summary – assigning a structure to occupational information resources
• Metadata xml format
• DDI standard
• 2-stage curation process

Comparative occupational information
• GEODE Occupational Information Depository
  – Collecting large volumes of OIRs from across countries, time periods
  – Facilitating VO communication between occupational information resources
• Opportunity for evaluations of comparative occupational research

Universality and Specificity in social classifications
“Occupations are ranked in the same order in most nations and over time... Hout referred to the pattern of invariance as the “Treiman constant”... the Treiman constant may be the only universal sociologists have discovered.” (Hout and DiPrete, 2006:2-3)
“the idea of indexing a person’s origin and destination by occupation is weakened if the meaning of being, say, a manual worker is not the same at origin and destination. Historical comparisons become unreliable” (Payne, 1992: 220, cited in Bottero, 2005:65)

Arguments for specificity
• Theoretical
  – Theories of change (over time, countries, gender)
  – Theories of the minutiae of occupational differences
• Widening scope of social science research
  – More countries, time periods
  – More micro-data resources
• Empirical
  – Small increments to specific approaches
  – Broad equivalence across contexts

Universality
• Comparative occupational research methods remain trenchantly universalist in principle:
  – Forcing equivalent data collection / treatment across contexts
  – ‘The categories are different and it’s not comparable’
• Why? Substantial pragmatic hurdles to any other approach
• E.g. Cross National Equivalence File model
  – Model 1 (universal – ISEI): CNEF data plus 1 file download; approx. 1.5k lines in Stata; approx. 6 hours development
  – Model 2 (specific – CAMSIS): CNEF data, plus original BHPS, PSID and GSOEP, plus 6 further file downloads; approx. 3k lines in Stata; approx. 40 hours development / estimation

Universality vs Specificity
• Limits of universality…
  – Loss of the technological excuse…?
  – Sustainability of specific approaches
  – Need to engage with specific expectations
  – Contextuality of importance of specificity…
• GEODE contribution:
  – Offers opportunity for specific approaches
  – Potential generalisability for comparative research – education; geography

Conclusions
• Occupational data curation and the Grid
  – Grid facilitates management / access of occupational records via xml formats (OGSA-DAI)
  – Current models require moderate specialist input (manual curation)
  – Grid offers new level of service not previously available
    – Dynamic coordinated file storage
    – File matching [security]
• Comparative occupational analysis
  – New opportunities in occupational comparisons