Guidelines for data preparation Social Science Data Archives: creating, depositing and using data Loughborough, 21 January 2005 Louise Corti.

ESRC Datasets Policy – what is expected of award holders?
to preserve and share data from ESRC-funded research
funding allowed to prepare data for archiving
all award holders must offer data for deposit to the ESDS within 3 months of the end of the award
any potential problems should be notified to the ESDS at the earliest opportunity
final payment will be withheld if the dataset has not been deposited within 3 months of the end of the award, except where a waiver has been agreed in advance

Depositing data
data should be deposited to a standard that would enable them to be used by a third party, including the provision of adequate documentation
good housekeeping = good research = long-term shareable data
any potential problems in archiving the data should be discussed with ESDS acquisitions as soon as possible
issues of consent and confidentiality allowing archiving should be included in the project management plan and addressed before data collection starts
unless a waiver on deposition has been agreed with ESDS and the ESRC, researchers should not make commitments to informants which preclude archiving their data

Data creation and deposit: best practice
early advice to data creators:
  – high quality data and documentation
  – consent and ethical issues are taken on board
  – longer-term rights management in place
  – IPR issues considered
promoting standards in:
  – research design
  – transcription techniques
  – data and project management
  – documenting data collection and analysis

Characteristics of a ‘good’ archived research collection
rich content
accurate data, well organised and labelled files
appropriate measurement of key concepts
supporting documentation created
  – major stages of research recorded
  – research/measurement instruments documented
data that can be stored in user-friendly ‘dissemination’ formats, but can also be archived in a future-proof ‘preservation’ format
consent, confidentiality and copyright resolved

Intellectual content
builds on previous research
addresses new issues
comparative potential
topics not too specific or narrowly focused
tried and trusted measures/scales used
innovative approach to discipline and methodology

Extensive raw data
types of research data assembled
  – survey data
  – in-depth interviews
  – focus groups
  – field notes/participant observation
  – case study notes
images and audio-visual materials (supports textual transcripts)
range of material – broad focus

Supporting documentation
to produce catalogue record and user guide
  – funding application
  – questionnaire/interview schedules
  – description of methodology (details of sample design, response rate, etc.)
  – “codebook” (variable names, variable descriptions, code names and variable formatting information)
  – technical report describing the research project
  – communication with informants on confidentiality
  – coding schemes/themes
  – end of award report
  – software description/versions used
  – bibliographies, resulting publications
  – code used to create derived variables or check data (e.g. SPSS, STATA or SAS “command files”)
anything that adds insight or aids understanding and secondary usage

Survey data

Survey data - variables

Labelling of survey data
all variables should be named. Variable names should not exceed 8 characters where possible, as the most common format for disseminating data is SPSS
all variables should be labelled. Labels should be brief (preferably < 80 characters) but precise, and should always make explicit the unit of measurement for continuous (interval) variables. Where possible, all variable labels should reference the question number (and if necessary the questionnaire). For example, the variable q11bhexc might have the label “q11b: hours spent taking physical exercise in a typical week”. This gives the unit of measurement and a reference to the question number (q11b), so the user can quickly and easily cross-reference to it

Labelling of survey data II
for categorical variables, all codes (values) should be given a brief label (preferably < 60 characters). For example, p1sex (gender of person 1) might have these value labels: 1 = male, 2 = female, -8 = don’t know, -9 = not answered
where possible, all such labelling should be created and supplied to the UKDA as part of the data file itself. This is the expectation with data supplied in one of the three major statistical packages: SPSS, STATA or SAS.
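The labelling rules on the two slides above can be checked mechanically before deposit. The following is a minimal sketch in Python, not a UKDA tool: the codebook dictionary layout and the function name `check_labelling` are assumptions made for illustration, and the third variable is an invented bad example. Only the limits (8, 80 and 60 characters) and the `q11bhexc` and `p1sex` examples come from the slides.

```python
# Limits quoted in the guidelines: name length, variable label, value label.
MAX_NAME, MAX_VAR_LABEL, MAX_VALUE_LABEL = 8, 80, 60


def check_labelling(codebook):
    """Return a list of labelling problems found in a codebook.

    `codebook` is a hypothetical structure: variable name -> dict with a
    "label" string and, for categorical variables, a "values" mapping.
    """
    problems = []
    for name, meta in codebook.items():
        label = meta.get("label", "")
        if len(name) > MAX_NAME:
            problems.append(f"{name}: name longer than {MAX_NAME} characters")
        if not label:
            problems.append(f"{name}: missing variable label")
        elif len(label) >= MAX_VAR_LABEL:
            problems.append(f"{name}: variable label not under {MAX_VAR_LABEL} characters")
        for code, value_label in meta.get("values", {}).items():
            if not value_label:
                problems.append(f"{name}: code {code} has no value label")
            elif len(value_label) >= MAX_VALUE_LABEL:
                problems.append(f"{name}: label for code {code} not under {MAX_VALUE_LABEL} characters")
    return problems


codebook = {
    "q11bhexc": {"label": "q11b: hours spent taking physical exercise in a typical week"},
    "p1sex": {
        "label": "gender of person 1",
        "values": {1: "male", 2: "female", -8: "don't know", -9: "not answered"},
    },
    # Invented example that breaks two rules: long name, no label.
    "householdincome": {"label": ""},
}

for problem in check_labelling(codebook):
    print(problem)
```

In practice the same checks would more likely be run inside SPSS, STATA or SAS, since the labels are expected to travel with the data file itself.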

Accuracy of data: validation checks
computer-aided surveys (CAPI, CATI or CAWI)
these are the most accurate way of gathering survey data, but the software (e.g. Blaise) and hardware (e.g. a laptop for every interviewer) may be beyond project resources
computer-aided surveys allow one to build in as many logical checks, on question routing and responses, as possible at the point of data creation
non computer-aided surveys
less control over initial responses, but checks can be performed:
  – at the point of data entry/transcription if “data entry” software is used. However, there are few cheap data entry packages around
  – the only feasible option may be to enter data without checks directly into a spreadsheet-style interface (e.g. Excel worksheet, SPSS data view), and perform validation checks afterwards, via command files in statistical packages or Visual Basic code in Excel or Access
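As an illustration of checks performed after data entry, here is a minimal Python sketch of one range check and one routing check. The slide names command files and Visual Basic as the usual tools, so Python is a substitution made here. The variables `employed` and `hourswk`, the "not applicable" code -1 and the 0 to 168 hour range are all hypothetical; only the missing-value codes -8 and -9 follow the value-label example given earlier.

```python
NOT_APPLICABLE = -1            # hypothetical code for a question that was routed past
MISSING = {-8, -9}             # don't know / not answered
VALID_EMPLOYED = {1, 2} | MISSING   # 1 = yes, 2 = no (hypothetical coding)


def validate(record):
    """Return a list of validation errors for one survey record."""
    errors = []
    employed = record["employed"]
    hours = record["hourswk"]

    # Range check: only documented codes are allowed.
    if employed not in VALID_EMPLOYED:
        errors.append("employed: code out of range")

    if employed == 1:
        # Range check on a continuous variable: a week has 168 hours.
        if hours not in MISSING and not 0 <= hours <= 168:
            errors.append("hourswk: outside 0-168 hours per week")
    elif employed == 2 and hours != NOT_APPLICABLE:
        # Routing check: people not in work should have skipped the question.
        errors.append("hourswk: answered although routing should have skipped it")
    return errors


records = [
    {"id": 1, "employed": 1, "hourswk": 37},
    {"id": 2, "employed": 2, "hourswk": 40},
    {"id": 3, "employed": 1, "hourswk": 400},
]
for record in records:
    for error in validate(record):
        print(record["id"], error)
```

A computer-aided interviewing package would apply the same rules at the point of data creation, which is why it is described above as the more accurate route.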

An example of data seemingly untouched by the human eye
Originating error in text variables:
  Occupation       Description of Occupation
  ‘sole trader’    ‘purveyor of seafood’
Propagated error in derived numeric variables: the respondent was coded under the Standard Industrial Classification (SIC) code relating to food retailers: 52.2 Retail sale of food, beverages and tobacco in specialised stores

Online access to data
NESSTAR:
browse detailed information (metadata) about these data sources, including links to other sources
do simple data analysis and visualisation on microdata
bookmark analyses
download the appropriate subset of data in one of a number of formats (e.g. SPSS, Excel)
Data must be ‘perfect’: 100% labelled

Identifiers
‘Direct’ and ‘indirect’ identifiers may threaten confidentiality
Direct identifiers may have been collected as part of the survey administration process and include names, addresses (including postcode information), telephone numbers, etc.
Indirect identifiers are variables which include information that, when linked with other publicly available sources, could result in a breach of confidentiality. This could include geographical information, workplace/organisation, education institution or occupation

Quantitative data
Remove the identifier from the dataset
Aggregate/reduce the precision of a variable
  – record the year of birth rather than the day, month and year; record postcode sectors (first 3 or 4 characters) rather than full postcode
Bracket a coded (categorical) variable
  – aggregate SOC up to ‘minor group’ codes by removing the terminal digit
Generalise the meaning of a nominal (string) variable
Restrict the upper or lower ranges of a continuous variable
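Several of the techniques listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than an ESDS procedure: the field names, the income cap of 100,000 and the example values are hypothetical, postcodes are assumed to be well formed with a single space, and the sector is taken as the outward code plus the first character of the inward code.

```python
from datetime import date


def anonymise(record, income_cap=100_000):
    """Return a reduced-disclosure copy of one record.

    Direct identifiers (here, "name") are dropped simply by not copying them.
    """
    outward, inward = record["postcode"].split()
    return {
        # Reduce precision: keep year of birth only.
        "birth_year": record["date_of_birth"].year,
        # Reduce precision: postcode sector instead of full postcode.
        "postcode_sector": f"{outward} {inward[0]}",
        # Bracket a coded variable: drop the terminal digit of the SOC code.
        "soc_minor_group": record["soc_unit_group"][:-1],
        # Restrict the upper range of a continuous variable (top-coding).
        "income": min(record["income"], income_cap),
    }


raw = {
    "name": "A. Respondent",
    "date_of_birth": date(1961, 5, 17),
    "postcode": "CO4 3SQ",
    "soc_unit_group": "2311",
    "income": 250_000,
}
print(anonymise(raw))
```

Whether these reductions are enough depends on which other variables remain in the file, since indirect identifiers act in combination.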

Derived and aggregated products
Permission to share and IPR are the main issues
Range of potential parties with interest:
  – owners, funders, data gatherers, employers, other stakeholders, etc.
All original source information must be recorded

Qualitative data

Transcribing
Research integrated into the ongoing research – budget accordingly
full transcriptions or summaries
costs and benefits
  – self transcription
  – internal team transcription
  – external transcription
full transcriptions
  – consistent layout
  – speaker tags
  – line breaks
  – header with identifier/other details
  – checked for errors

Labelling and listing qualitative data
e.g. set of in-depth interviews
Data list:
list of contents of research collection
acts as a point of entry for secondary user
qualitative data: Excel template
interviewee/case study characteristics
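A data list of this kind can also be produced as a plain CSV file that opens in Excel. The sketch below uses Python's standard `csv` module; the column names and the two example rows are invented for illustration and are not the ESDS Qualidata template.

```python
import csv
import io

# Hypothetical columns: one row per interview, with basic characteristics.
FIELDS = ["interview_id", "pseudonym", "age_group", "occupation",
          "interview_date", "file_name"]


def write_data_list(rows, handle):
    """Write the data list for a collection to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"interview_id": "int001", "pseudonym": "Anne", "age_group": "30-39",
     "occupation": "teacher", "interview_date": "2004-03-12",
     "file_name": "int001.rtf"},
    {"interview_id": "int002", "pseudonym": "Brian", "age_group": "50-59",
     "occupation": "farmer", "interview_date": "2004-03-19",
     "file_name": "int002.rtf"},
]

buffer = io.StringIO()
write_data_list(rows, buffer)
print(buffer.getvalue())
```

Writing to a file instead only requires passing a handle opened with `open("datalist.csv", "w", newline="")`.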

Online access to qualitative data
new emphasis on providing direct access to collection content
  – supports more powerful resource discovery
  – greater scope for searching and browsing content of data (supplementary to higher level study-related metadata)
  – since users can search and explore content directly, they can retrieve data immediately
providing access to qualitative data via common interface (ESDS Qualidata Online)
supporting tools for searching, retrieval, and analysis across different datasets
Means that data must be accurate and standardised

Identifiers removed
scheme devised – different for each dataset
ideally should reflect any pseudonyms used in publications
confidentiality respected
anonymisation?
problems of anonymisation
  – applied too weakly
  – applied too strongly
  – timing
  – potential for distortion
  – examples
user undertakings
appropriate and sympathetic

User Guides - qualitative studies
to provide a deeper understanding of the study and research methods
providing guidance on data resources and how to re-use them
enhanced user guides: detailed notes on study methodology; ‘behind the scenes’ interviews with depositors; FAQs
thematic pages – combining interviews
exemplars and case studies of re-use, including full bibliographies
user support and training activities to support secondary analysis of data + online resources

Back up and security
Digital, paper and audio media are fragile. Digital media are even easier to change/copy/delete!
a good backup procedure will protect against a range of mishaps such as:
  – accidental changes to data
  – accidental deletion of data
  – loss of data due to media or software faults
  – virus infections & hackers
  – catastrophic events (such as fire or flood)
Back up frequently, retain off-site copies
Consider storage conditions, fireproofing, etc.
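One way to guard against accidental changes is to verify each backup copy against its original. The sketch below copies a file and compares SHA-256 checksums using only the Python standard library. Checksum comparison is an assumption added here as an illustration; the slides do not prescribe a particular backup method, and the file and directory names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy `source` into `backup_dir` and confirm the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return sha256(source) == sha256(target)


# Demonstration in a temporary directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "interview01.txt"
    original.write_text("Interviewer: Could you tell me about your work?")
    print(backup_and_verify(original, Path(tmp) / "offsite"))
```

Storing the checksums alongside the backup would also let later changes to either copy be detected.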

ESDS in-house processing
in-house data processing
  – ‘cleaning up’ research data
  – collating documentation received from depositor
  – repairing minor errors
  – meeting users’ expectations
  – cannot engage in major processing tasks unless destined for publishing into online systems

Ethics and legal issues: up front
issues of consent and confidentiality allowing archiving should be included in the project management plan & addressed before data collection starts
longer-term rights management in place and IPR issues considered
unless a waiver on deposition has been agreed, researchers should not make commitments to informants which preclude archiving their data

Consent for archiving
anonymity and privacy of research participants should be respected
explicit ‘informed’ consent gained
information for research participants should be clear and coherent and include:
  – purpose of research
  – what is involved in participation
  – benefits and risks
  – storage and access to data
  – usage of data (current and future uses)
  – withdrawal of consent at any time
  – Data Protection and Copyright Acts
N.B. Additional measures are needed when participants are unable to consent through incapacity or age
reflect needs and views of all
works in practice

Legal issues in data preparation
‘Duty of confidentiality’
Law of Defamation
Data Protection Act 1998 and EU Directive
Copyright Act 1988
Freedom of Information

Duty of Confidentiality
disclosure of information may constitute a breach of confidentiality and possibly a breach of contract
not governed by an Act of Parliament
not necessarily in writing
can be a legal contractual
exemptions are:
  – relevant police investigations or proceedings
  – disclosure by court order
  – ‘public interest’ - defined by the courts
  – ethical obligations in cases of disclosure of child abuse

Law of Defamation
a defamatory statement is one which may injure the reputation of another person, company or business

Data Protection Act 1998
eight principles:
  – fairly and lawfully processed
  – processed for limited purposes
  – adequate, relevant and not excessive
  – accurate
  – not kept longer than necessary
  – processed in accordance with the data subject's rights
  – secure
  – not transferred to countries without adequate protection
allows for secondary use of data for research purposes under certain conditions

Options for preserving confidentiality
anonymisation
consent to archive at the time of field work
researcher contacts informants retrospectively
user undertakings
in exceptional circumstances - permission to use or closure of material

Copyright Act 1988
developed for the broadcasting industry, not research!
protection of author’s rights
multiple copyrights apply:
  – automatically assigned to the speaker
  – researcher holds the copyright in the sound recording of an interview: obtain written assignment of copyright from interviewee, or oral agreement (licence) to use
  – employer holds the copyright in research data: obtain copyright clearance from employer
copyright lasts for 70 years after the end of the year in which the author dies
copying work is an infringement unless it is for the purposes of research, private study, criticism or review or reporting current events, and if the use can be regarded as being in the context of ‘fair dealing’
seek legal advice on problem issues
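The duration rule stated above is simple year arithmetic: the term runs to the end of the 70th year after the year of the author's death. A minimal Python sketch follows, with the function name `in_copyright` invented for illustration. It covers only the author's-life rule quoted on the slide and is no substitute for legal advice on problem cases.

```python
from datetime import date


def in_copyright(year_of_death, on_date, term_years=70):
    """True if `on_date` falls within the term described on the slide.

    The term ends on 31 December of (year_of_death + term_years),
    so only the calendar year of `on_date` matters.
    """
    return on_date.year <= year_of_death + term_years


# An author who died in 1950 is protected to the end of 2020.
print(in_copyright(1950, date(2020, 12, 31)))
print(in_copyright(1950, date(2021, 1, 1)))
```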

Freedom of Information
Freedom of Information Act 2000: a statutory right for individuals and organisations to request information held by public authorities. FOI specifically excludes environmental information, which is covered by …
Environmental Information Regulations 2004: enables individuals and organisations to obtain environmental information held by public authorities

What is the legislation?
Statutory rights of access to information
Apply to public authorities – BBSRC, ESRC, NERC and the universities are public authorities
Anyone, anywhere can request a copy of any information you hold – includes data sets
Not all information has to be released
Must respond to most requests within 20 working days
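The response deadline can be worked out by counting forward from the date of receipt. The sketch below reads the 20-day limit as 20 working days following the day the request is received, which is how the Freedom of Information Act frames the period. It treats Monday to Friday as working days and leaves bank holidays to the caller through a hypothetical `holidays` parameter, so it is an approximation rather than a compliance tool.

```python
from datetime import date, timedelta


def response_deadline(received, working_days=20, holidays=frozenset()):
    """Return the last date for responding to a request received on `received`."""
    current = received
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0-4 for Monday-Friday; skip weekends and listed holidays.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# A request received on Monday 10 January 2005.
print(response_deadline(date(2005, 1, 10)))
```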

Exemptions – information protected by law
Don’t panic: not all information has to be made available under FOI & EIRs
FOI & EIRs provide a number of exemptions that can be applied to the release of information
The presumption is that information will be made available unless there is good reason not to (a public interest test).
Exemptions protect scientific output, commercial business and personal information (through the Data Protection Act)
Exemptions can be complex and difficult to apply. If in doubt, ask….

Conclusion: archivable research
suitable for electronic dissemination
suitable formats for re-use and long-term preservation
in-house data processing
  – ‘cleaning up’ research/documenting
  – repairing minor errors
  – meeting user expectations
  – cannot engage in major processing tasks unless destined for publishing into online systems
meeting users’ needs
  – building an expansive and varied data portfolio
  – creating online exploratory/data browsing systems
good housekeeping = good research = good archives

Depositing data with ESDS
provide details of all data collected, together with three samples of qualitative data, if applicable
to do this, complete the Data Submission form on the ‘Deposit’ pages on the ESDS web site
the dataset will then be formally reviewed for archiving by the UKDA Acquisitions Review Committee
if accepted, complete the UKDA Deposit and Licence forms, and send the data, documentation and forms to the UK Data Archive within the required time-scales
you will be notified when your data are being released via the UKDA online catalogue
access to data will be granted to registered bona fide researchers only via Athens authentication

Creating or depositing data Tel: /872974