PART 2: DATA READINESS CASRAI CONFERENCE RECONNECT BIG DATA: THE ADVANCE OF DATA-DRIVEN DISCOVERY OCTOBER 16, 2013 JANE FRY Research Data Management: planning.

PART 2: DATA READINESS
CASRAI CONFERENCE RECONNECT BIG DATA: THE ADVANCE OF DATA-DRIVEN DISCOVERY
OCTOBER 16, 2013
JANE FRY
Research Data Management: planning and implementation

Agenda
Before data collection and processing
  - Planning and organizing
Data collection and processing
After data collection and processing
  - Metadata
Your turn
  - No data expertise needed!
Moore and Fry, CASRAI 2013 (October 16, 2013)

Before: why?
Why an RDMP (research data management plan)?
  - Essential
  - For any type of data
Why plan & organize?
  - Journal requirements: be proactive
  - Safety: protect your data
  - Efficiency: easier to write up analyses and reports
  - Quality: ensures high quality when guidelines are laid out at the beginning
Make a checklist or a template

If no RDMP
Potential problems
  - Each type of data has its own ‘peculiarities’
      - Will you remember them after 1, 2, 3, … years?
      - What about other researchers?
  - Loss of information
  - Inability to share
  - Inability to replicate
  - May not receive all monies from the grant
  - Not as much analysis can be conducted
  - Cannot submit to journals

Before: plan and organize
What type of data?
How will the data be collected and processed?
Where and how will they be stored?
How will they be secured?
Where will the back-up be kept?
How will confidentiality be maintained?
What metadata will be recorded?
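
The planning questions on this slide can be turned into the checklist or template the deck recommends. The following is a minimal sketch only: the question wording is adapted from the slide, while the function name `unanswered` and the idea of storing answers in a dictionary are assumptions made for illustration.

```python
# Planning questions adapted from the "plan and organize" slide.
PLAN_QUESTIONS = [
    "What type of data will be collected?",
    "How will the data be collected and processed?",
    "Where and how will the data be stored?",
    "How will the data be secured?",
    "Where will the back-up be kept?",
    "How will confidentiality be maintained?",
    "What metadata will be recorded?",
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the planning questions that still have no (non-blank) answer."""
    return [q for q in PLAN_QUESTIONS if not answers.get(q, "").strip()]


if __name__ == "__main__":
    # Hypothetical draft plan with a single answer filled in.
    draft = {PLAN_QUESTIONS[0]: "Survey microdata"}
    for question in unanswered(draft):
        print("TODO:", question)
```

Running the script prints every question that still needs a decision, which gives the team a concrete list to settle before data collection starts.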

Before: type of data?
The type chosen will determine the format to be used for analysis
  - Quantitative
      - Microdata (.sav)
      - Aggregate data (.xls)
  - Qualitative (NVivo)
  - Geospatial (vector and raster data)
  - Digital images (.jpeg)
  - Digital audio (.wav)
  - Digital video (.mp4)
  - Documentation, scripts (.doc)

Before: collection methods?
Depends on type of data
  - Questionnaires
  - Interviews
  - Focus groups
  - Observations
  - Transcripts
  - Newspaper articles
  - Journals
  - Diaries
  - …

Before: collection methods (cont’d)
Partially determined by type of data
  - Paper
  - Face-to-face
  - Web
  - Telephone
  - Snail mail
  - Audio
  - Video
  - …

Before: storage?
Where will it be stored?
  - Your laptop, PC, smartphone
  - Your researchers' laptops, PCs, smartphones
  - The shared drive in the office
  - A Dropbox
      - Controlled by what country?
What format will be used for storage?
  - Proprietary?
Preservation
  - How: repository
  - Where: your institution

Before: storage strategies
Two different locations
Two copies (at least)
Keep original data with no manipulations
  - 2 copies
What to keep
  - Everything!
Use meaningful file names
  - Set out format to be used
  - Everyone has to use this format
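
One way to make sure "everyone has to use this format" is to check file names automatically. The sketch below is illustrative only: the convention `project_content_YYYYMMDD_vN.ext` and the function name `check_file_name` are assumptions, not something the slides prescribe, and a project would substitute its own agreed pattern.

```python
import re
from datetime import datetime

# Hypothetical convention (not from the slides): project_content_YYYYMMDD_vN.ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<content>[a-z0-9-]+)"
    r"_(?P<date>\d{8})_v(?P<version>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def check_file_name(name: str) -> bool:
    """Return True if `name` follows the convention and contains a real date."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return False
    try:
        # Rejects impossible dates such as 30 February.
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ["petsurvey_interviews_20131016_v1.csv", "final FINAL data.xls"]:
        verdict = "ok" if check_file_name(candidate) else "rename"
        print(f"{candidate}: {verdict}")
```

A check like this could run before files are copied to the shared drive, so that non-conforming names are caught while the person who created the file still remembers what it contains.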

Before: security issues?
How to secure data
  - Determine beforehand
To prevent unauthorized access
  - Intentional
  - Unintentional
Remote access – yes or no
  - Off-site investigators
  - Off-site research team members
Personal or sensitive data
  - Separate location from the main dataset
  - Limited, controlled access
  - Encrypted

Before: back-up?
Where will all information be backed up?
  - If at your institution
      - How often do they back up?
      - What are their policies for data retention?
How often will you back up?
  - When the project is over
  - After a year
  - Monthly
  - Weekly

Before: confidentiality?
What procedures will be taken to ensure confidentiality?
Data must be anonymised (unless permission has been granted)
  - Not possible to identify any individual
  - Aggregate certain variables
      - e.g., no low levels of geography
  - Hide outliers by recoding
Record all decisions made
  - Why this decision was made
  - How the variable has been recoded

Before: confidentiality (cont’d)
Disclosure processing
  - At what point in the data collection/processing
  - Remove direct identifiers
      - Names
      - Addresses
      - Telephone numbers
  - Remove indirect identifiers
      - Detailed geographic information
      - Exact occupations
      - Exact dates of events
          - Birth
          - Marriage
      - Income
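
The disclosure-processing steps can be sketched in a few lines of code. This is a hedged illustration, not a complete anonymisation procedure: the field names (`name`, `address`, `telephone`, `birth_date`, `postal_code`, `income`), the ISO date format, the income cap, and the function name `anonymise` are all assumptions. The function also returns a log of what it did, in line with the advice to record all decisions made.

```python
from typing import Any

# Hypothetical field names chosen for illustration; real datasets will differ.
DIRECT_IDENTIFIERS = {"name", "address", "telephone"}


def anonymise(record: dict[str, Any], income_cap: int = 150_000) -> tuple[dict[str, Any], list[str]]:
    """Return an anonymised copy of `record` plus a log of the decisions applied."""
    decisions: list[str] = []

    # Remove direct identifiers entirely.
    removed = sorted(k for k in record if k in DIRECT_IDENTIFIERS)
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if removed:
        decisions.append(f"removed direct identifiers: {', '.join(removed)}")

    # Exact date of birth -> year only (assumes an ISO 'YYYY-MM-DD' string).
    if "birth_date" in clean:
        clean["birth_year"] = int(str(clean.pop("birth_date"))[:4])
        decisions.append("birth_date reduced to birth_year")

    # Detailed geography -> first character of the postal code only.
    if "postal_code" in clean:
        clean["postal_region"] = str(clean.pop("postal_code"))[:1]
        decisions.append("postal_code reduced to first character")

    # Hide income outliers by top-coding at the cap.
    if "income" in clean and clean["income"] > income_cap:
        clean["income"] = income_cap
        decisions.append(f"income top-coded at {income_cap}")

    return clean, decisions


if __name__ == "__main__":
    respondent = {"name": "A. Person", "telephone": "555-0100",
                  "birth_date": "1970-03-15", "postal_code": "K1S 5B6",
                  "income": 900_000, "q1": "Yes"}
    anonymised, log = anonymise(respondent)
    print(anonymised)
    print(log)
```

Whether these particular reductions are sufficient depends on the dataset; small populations or rare combinations of values can still identify an individual, so the choices should be reviewed against the ethics approval for the study.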

Before: confidentiality (cont’d)
Legal and ethical obligations for managing and sharing data
  - Ethics approval of your institution
  - National Data Policy (regarding sharing of data)
      - Canada (FIPPA)
      - UK (ESRC)
  - How will confidentiality be maintained?
      - How to protect the privacy of the respondents
      - How will the confidential information be handled and managed
      - How to store respondents’ identification, if necessary
  - Disclosure: only if agreed to by respondent

Before: metadata?
Why keep metadata
  - Researchers re-use data
      - Secondary analysis
      - Comparative research
      - Teaching
      - Replicate a study
  - Requirement of our funders
  - Good research practice
Start documenting at the very beginning of the project
End goal
  - For this data to be replicated, if needed

Before: metadata (cont’d)
What to keep - everything!
  - Research design
  - Data collection
  - Data preparation
  - Questionnaires
  - Interviewer instructions
  - Meeting notes among researchers
  - Details of decisions made
      - Why certain decisions were made
      - e.g., if data collection is not to be done on a certain date (Easter)

Before: metadata (cont’d)
Processes
  - What worked
  - What didn’t work
  - Changes made after pilots were conducted
      - Why they were made
      - Was another pilot conducted after the changes were made?
Any and all changes that were made or not made

Before: metadata (cont’d)
Consent of participant (if needed)
Disclosure processing
Names of everyone involved in the project
Source of all funding
  - Monetary
  - In kind
Source of any data used that is not from this data collection
  - e.g., postal code conversion file

Before: a tip
If contracting out data processing
  - Specify deliverables
  - User Guide
      - Date work performed
      - Methodology of data cleaning, input, …
      - Details of any new variables
      - Reasons for making them
      - Procedures, …
      - Name and contact information
      - Copy of questionnaire (if applicable)
  - Raw data
      - Questionnaires, interviews, …
  - Example of incomplete deliverable

Data collection and processing
Some of the steps are
  - Transcribe
  - Code
  - Enter
  - Check
  - Validate
  - Clean
  - Anonymise
The steps vary depending on the type of data collected
One element in common with all types of data
  - Must record metadata

And next
All the decisions have been made
Your checklist/template has been made
The data have been collected and processed
What now?
  - Complete metadata on
      - the data
      - the documentation

After: data
Metadata on data: must be well organized
  - How they were created
  - How they were digitized
  - How they were anonymised
  - Explanation of codes used
  - Explanation of classification scheme(s) used
      - e.g., occupation
  - Any and all changes that were made
  - Access conditions
      - e.g., member of your institution
  - Terms of use
      - e.g., academic or teaching purposes
      - e.g., non-profit

After: data (cont’d)
Data metadata
  - File names
      - Meaningful
      - Set up a system beforehand
      - Make sure everyone sticks to it
  - Versioning
      - Set up a system beforehand
      - What changes necessitate a new version number
          - Version 1 to Version 2
          - e.g., one of the variables was coded incorrectly, therefore the dataset was replaced
      - What changes do not necessitate a new version number
          - Version 1 to Version 1.1
          - e.g., something small like a spelling mistake
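
The versioning rule on this slide (a substantive change moves Version 1 to Version 2, a small fix moves Version 1 to Version 1.1) can be written down so that everyone applies it the same way. This is a minimal sketch under stated assumptions: the change categories in `MAJOR_CHANGES` and `MINOR_CHANGES` and the function name `next_version` are hypothetical, and each project would agree its own list beforehand.

```python
# Hypothetical change categories; a project would agree its own list up front.
MAJOR_CHANGES = {"variable_recoded", "variable_added", "cases_added", "cases_removed"}
MINOR_CHANGES = {"label_typo_fixed", "documentation_updated"}


def next_version(current: str, change: str) -> str:
    """Return the version label that follows `current` for the given kind of change."""
    major_str, _, minor_str = current.partition(".")
    major, minor = int(major_str), int(minor_str or 0)
    if change in MAJOR_CHANGES:
        # The dataset itself was replaced: start a new whole-number version.
        return str(major + 1)
    if change in MINOR_CHANGES:
        # Small correction: keep the major number, bump the minor one.
        return f"{major}.{minor + 1}"
    raise ValueError(f"unclassified change type: {change!r}")


if __name__ == "__main__":
    print(next_version("1", "variable_recoded"))   # a variable was coded incorrectly
    print(next_version("1", "label_typo_fixed"))   # a spelling mistake was fixed
```

Raising an error for unclassified changes is a deliberate choice in this sketch: it forces the team to decide, and record, how a new kind of change should be treated.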

After: data (cont’d)
Transcribing: guidelines set up beforehand
  - Transcribing conventions
  - Instructions
  - Guidelines
Variables
  - Names
  - Labels
      - Comprehensible
      - Unique
  - Description
  - Value labels
      - Comprehensible
      - Complete
  - Associated question
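
The variable-level requirements above (a label, an associated question, complete value labels, unique names) lend themselves to an automated check. The sketch below is an assumption-laden illustration: the `Variable` structure, the function `check_variables`, and the numeric codes in the example are invented for this purpose and do not correspond to any particular statistical package.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical codebook entry for one variable."""
    name: str
    label: str = ""
    question: str = ""
    value_labels: dict[int, str] = field(default_factory=dict)


def check_variables(variables: list[Variable], observed: dict[str, set[int]]) -> list[str]:
    """List metadata problems; `observed` maps variable names to codes found in the data."""
    problems: list[str] = []
    seen: set[str] = set()
    for var in variables:
        if var.name in seen:
            problems.append(f"{var.name}: duplicate variable name")
        seen.add(var.name)
        if not var.label.strip():
            problems.append(f"{var.name}: missing variable label")
        if not var.question.strip():
            problems.append(f"{var.name}: no associated question recorded")
        # Value labels are complete only if every code in the data has one.
        unlabelled = sorted(observed.get(var.name, set()) - set(var.value_labels))
        if unlabelled:
            problems.append(f"{var.name}: codes without value labels: {unlabelled}")
    return problems


if __name__ == "__main__":
    codebook = [Variable("V35", value_labels={1: "Yes", 2: "No"})]
    for problem in check_variables(codebook, {"V35": {1, 2, 9}}):
        print(problem)
```

In the example, a variable named only `V35` with an unexplained code is flagged on three counts, which is the kind of gap a depositor would want to find before handing the file to an archive.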

After: data (cont’d)
Recoded variables
  - Why they were needed (e.g., geographic location)
  - Why they were done the way they were (e.g., age)
  - Everything listed above under Variables
Derived variables
  - Derived from what
      - Be specific
  - Why was it done
  - Everything listed above under Variables
Missing values
  - Codes used
      - Should be consistent
  - Reasons for missing values
Weighting variable(s)
  - Description
  - Formula(s)
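
For missing values, keeping the codes consistent and documenting the reason behind each can be as simple as one shared lookup table. The following is a sketch only: the codes (-7, -8, -9), their reasons, and the function name `summarise` are hypothetical examples rather than a recommended standard.

```python
from collections import Counter

# Hypothetical project-wide missing-value codes and their documented reasons.
MISSING_CODES = {
    -7: "refused",
    -8: "don't know",
    -9: "not asked (skip pattern)",
}


def summarise(values: list[int]) -> tuple[dict[int, int], dict[str, int]]:
    """Split a variable's frequency counts into valid codes and documented missing reasons."""
    counts = Counter(values)
    valid = {code: n for code, n in counts.items() if code not in MISSING_CODES}
    missing = {MISSING_CODES[code]: n for code, n in counts.items() if code in MISSING_CODES}
    return valid, missing


if __name__ == "__main__":
    valid, missing = summarise([1, 2, 2, -9, -7, -9])
    print("valid:", valid)
    print("missing:", missing)
```

Because the same table is used for every variable, a secondary user only has to learn the missing-value scheme once, and frequencies that do not add up to 100% can be traced to a stated reason.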

After: documentation
What to put in?
  - Information for a researcher looking at your dataset for the first time with no prior knowledge
  - As specific as possible
  - All associated documentation about the research

After: documentation (cont’d)
Study background
  - Purpose
  - Time frame
  - Geographic location
  - Creator, principal investigator(s), other investigator(s)
  - Funders
  - Sampling design
      - Description
      - Size
      - Any changes that were made

After: documentation (cont’d)
Study description
  - Describes all aspects of the data collection and processing
  - Data collection methodology
  - Data preparation procedure
  - Data validation protocols
  - Instruments used
  - Geographic coverage
  - Temporal coverage
  - Date of file creation
  - Description of codes and classifications used

After: documentation (cont’d)
Codebook or user guide
  - Original questionnaire/data collection instrument
  - All interviewer instructions
  - Any documentation describing variables
      - Original ones
      - Recoded
      - Derived
      - Weight
  - Include formulas used to construct variables

A tip:
Much of the information in the previous slides may seem like common sense
  - You will be tempted not to follow it
      - No time
      - No facilities to record it
      - Will do it later
      - Minor change, therefore not important enough to mark down
      - Of course, I will remember it!
  - What if?
      - You forget to mark it down
      - You forget to tell the rest of the research team
If you follow a checklist, neither you nor your team will be caught short!

In sum
In this section you have learned
  - What to do before data collection
      - Plan and organize: data type, data collection and processing, storage, security, back-up, confidentiality, metadata
  - To make a checklist or template
  - About data collection and processing (in brief)
  - What to do after data collection and processing
      - Metadata: data, documentation

Research Data Management
Exercise #2: Data Readiness
Is this data set ready for deposit? Why? Why not?
Dataset Title: Attitudes of Pets towards their Owners (October 1998)
Documentation available: The following text file:
“This survey was conducted by the Pet Researchers of Canada and was analysed by the Acme Research Company. There is no documentation available for this survey. Use basic survey methodology if necessary. There are some interesting results in this survey.”
Data available: A microdata file with some variable and value labels.
Example 1:
  - Name of variable: V35
  - Frequency: Yes = 35%, No = 47%
Example 2:
  - Name of variable: Region of Country
  - Frequency: 1 = 12%; 2 = 32%; 3 = 35%; 4 = 15%; 5 = 4%

Contact Information
Pat Moore, Associate University Librarian: Research, Scholarship and Technology, Carleton University, x2745
Jane Fry, Data Specialist, Carleton University, x1121