1 PART 2: DATA READINESS
CASRAI CONFERENCE RECONNECT
BIG DATA: THE ADVANCE OF DATA-DRIVEN DISCOVERY
OCTOBER 16, 2013
JANE FRY
Research Data Management: planning and implementation

2 Agenda
Before data collection and processing
  - Planning and organizing
Data collection and processing
After data collection and processing
  - Metadata
Your turn
  - No data expertise needed!
Moore and Fry, CASRAI 2013 (October 16, 2013)

3 Before: why?
Why an RDMP (research data management plan)?
  - Essential
  - For any type of data
Why plan & organize?
  - Journal requirements: be proactive
  - Safety: protect your data
  - Efficiency: easier to write up analyses and reports
  - Quality: ensures high quality when guidelines are laid out at the beginning
Make a checklist or a template

4 If no RDMP
Potential problems
  - Each type of data has its own ‘peculiarities’
    - Will you remember them after 1, 2, 3, … years?
    - What about other researchers?
  - Loss of information
  - Inability to share
  - Inability to replicate
  - May not receive all monies from the grant
  - Not as much analysis can be conducted
  - Cannot submit to journals

5 Before: plan and organize
What type of data
How the data will be collected and processed
Where and how will they be stored
How will they be secured
Where will the back-up be kept
How will confidentiality be maintained
What metadata to record

6 Before: type of data?
The type chosen will determine the format to be used for analysis
  - Quantitative
    - Microdata (.sav)
    - Aggregate data (.xls)
  - Qualitative (NVivo)
  - Geospatial (vector and raster data)
  - Digital images (.jpeg)
  - Digital audio (.wav)
  - Digital video (.mp4)
  - Documentation, scripts (.doc)

7 Before: collection methods?
Depends on type of data
  - Questionnaires
  - Interviews
  - Focus groups
  - Observations
  - Transcripts
  - Newspaper articles
  - Journals
  - Diaries
  - …

8 Before: collection methods (cont’d)
Partially determined by type of data
  - Paper
  - Face-to-face
  - Web
  - Telephone
  - Snail mail
  - E-mail
  - Audio
  - Video
  - …

9 Before: storage?
Where will it be stored?
  - Your laptop, PC, smartphone
  - Your researchers' laptop, PC, smartphone
  - The shared drive in the office
  - A Dropbox
    - Controlled by what country?
What format will be used for storage?
  - Proprietary?
Preservation
  - How: repository
  - Where: your institution

10 Before: storage strategies
Two different locations
Two copies (at least)
Keep original data with no manipulations
  - 2 copies
What to keep
  - Everything!
Use meaningful file names
  - Set out format to be used
  - Everyone has to use this format
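
The "two copies, two different locations" strategy can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names `sha256_of` and `store_copies` and the folder layout are assumptions. Each copy is compared against the untouched original by checksum, so a damaged copy is noticed at once.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_copies(original: Path, locations: list[Path]) -> dict[str, str]:
    """Copy the unmodified original to each location and verify every copy.

    Hypothetical helper: returns a mapping of copy path to checksum,
    which can be kept with the project metadata.
    """
    if len(locations) < 2:
        raise ValueError("keep at least two copies in two different locations")
    expected = sha256_of(original)
    verified = {}
    for folder in locations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / original.name
        shutil.copy2(original, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"copy at {target} does not match the original")
        verified[str(target)] = expected
    return verified
```

In practice the two locations would be on different devices or sites (for example an institutional share and an external drive); two folders on the same disk do not protect against hardware failure.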

11 Before: security issues?
How to secure data
  - Determine beforehand
To prevent unauthorized access
  - Intentional
  - Unintentional
Remote access – yes or no
  - Off-site investigators
  - Off-site research team members
Personal or sensitive data
  - Separate location from the main dataset
  - Limited, controlled access
  - Encrypted

12 Before: back-up?
Where will all information be backed up?
  - If at your institution
    - How often do they back up?
    - What are their policies for data retention?
How often will you back up?
  - When the project is over
  - After a year
  - Monthly
  - Weekly

13 Before: confidentiality?
What procedures will be taken to ensure confidentiality?
Data must be anonymised (unless permission has been granted)
  - Not possible to identify any individual
  - Aggregate certain variables
    - e.g., no low levels of geography
  - Hide outliers by recoding
Record all decisions made
  - Why this decision was made
  - How the variable has been recoded

14 Before: confidentiality (cont’d)
Disclosure processing
  - At what point in the data collection/processing
  - Remove direct identifiers
    - Names
    - Addresses
    - Telephone numbers
  - Remove indirect identifiers
    - Detailed geographic information
    - Exact occupations
    - Exact dates of events
      - Birth
      - Marriage
      - Income
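
The disclosure-processing steps on these slides (remove direct identifiers, coarsen indirect identifiers, hide outliers by recoding, record every decision) might look like the following Python sketch. Everything specific here is an assumption made for illustration: the field names (`name`, `birth_date`, `postal_code`, `income`), the income cap, and the choice to keep only the first character of a postal code. A real project would set these rules in its own plan and ethics approval.

```python
# Hypothetical field names; a real dataset will have its own.
DIRECT_IDENTIFIERS = {"name", "address", "telephone"}


def anonymise(record: dict, income_cap: int = 150_000) -> tuple[dict, list[str]]:
    """Return an anonymised copy of one record plus a log of decisions made.

    The input record is not modified. `birth_date` is expected to be a
    datetime.date; `postal_code` a string.
    """
    safe = {}
    decisions = []
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            # Direct identifiers are dropped outright.
            decisions.append(f"{key}: removed (direct identifier)")
        elif key == "birth_date":
            # Exact dates of events are indirect identifiers: keep the year only.
            safe["birth_year"] = value.year
            decisions.append("birth_date: recoded to birth_year")
        elif key == "postal_code":
            # Detailed geography is aggregated to a coarser level.
            safe["region"] = value[:1]
            decisions.append("postal_code: aggregated to first character (region)")
        elif key == "income" and value > income_cap:
            # Outliers are hidden by top-coding.
            safe["income"] = income_cap
            decisions.append(f"income: top-coded at {income_cap}")
        else:
            safe[key] = value
    return safe, decisions
```

The returned decision log is the part that is easiest to skip and hardest to reconstruct later, so it is worth storing alongside the anonymised file.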

15 Before: confidentiality (cont’d)
Legal and ethical obligations for managing and sharing data
  - Ethics approval of your institution
  - National Data Policy (regarding sharing of data)
    - Canada (FIPPA)
    - UK (ESRC)
  - How will confidentiality be maintained?
    - How to protect the privacy of the respondents
    - How will the confidential information be handled and managed?
    - How to store respondents’ identification, if necessary
  - Disclosure: only if agreed to by respondent

16 Before: metadata?
Why keep metadata
  - Researchers re-use data
    - Secondary analysis
    - Comparative research
    - Teaching
    - Replicate a study
  - Requirement of our funders
  - Good research practice
Start documenting at the very beginning of the project
End goal
  - For this data to be replicated, if needed

17 Before: metadata (cont’d)
What to keep: everything!
  - Research design
  - Data collection
  - Data preparation
  - Questionnaires
  - Interviewer instructions
  - Meeting notes among researchers
  - Details of decisions made
    - Why certain decisions were made
    - e.g., if data collection is not to be done on a certain date (Easter)

18 Before: metadata (cont’d)
Processes
  - What worked
  - What didn’t work
  - Changes made after pilots conducted
    - Why they were made
    - Was another pilot conducted after changes made
Any and all changes that were made or not made

19 Before: metadata (cont’d)
Consent of participant (if needed)
Disclosure processing
Names of everyone involved in the project
Source of all funding
  - Monetary
  - In kind
Source of any data used that is not from this data collection
  - e.g., postal code conversion file

20 Before: a tip
If contracting out data processing
  - Specify deliverables
    - User Guide
      - Date work performed
      - Methodology of data cleaning, input, …
      - Details of any new variables
      - Reasons for making them
      - Procedures, …
      - Name and contact information
      - Copy of questionnaire (if applicable)
    - Raw data
      - Questionnaires, interviews, …
  - Example of incomplete deliverable

21 Data collection and processing
Some of the steps are
  - Transcribe
  - Code
  - Enter
  - Check
  - Validate
  - Clean
  - Anonymise
Vary depending on the type of data collected
One element in common with all types of data
  - Must record metadata

22 And next
All the decisions have been made
Your checklist/template has been made
The data have been collected and processed
What now?
  - Complete metadata on
    - the data
    - the documentation

23 After: data
Metadata on data: must be well organized
  - How they were created
  - How they were digitized
  - How they were anonymised
  - Explanation of codes used
  - Explanation of classification scheme(s) used
    - e.g., occupation
  - Any and all changes that were made
  - Access conditions
    - e.g., member of your institution
  - Terms of use
    - e.g., academic or teaching purposes
    - e.g., non-profit

24 After: data (cont’d)
Data metadata
  - File names
    - Meaningful
    - Set up a system beforehand
    - Make sure everyone sticks to it
  - Versioning
    - Set up a system beforehand
    - What changes necessitate a new version number
      - Version 1 to Version 2
      - e.g., one of the variables was coded incorrectly, therefore the dataset was replaced
    - What changes do not necessitate a new version number
      - Version 1 to Version 1.1
      - e.g., something small like a spelling mistake
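
A file-naming system and a versioning rule are easier to stick to when they can be checked automatically. The Python sketch below is one possible way to do that. The naming pattern (`<project>_<content>_<YYYY-MM-DD>_v<major>.<minor>.<ext>`) is an assumption chosen for illustration, not a convention given in the slides; the version rule follows the slide: a change to the data itself moves Version 1 to Version 2, a small fix such as a spelling mistake moves Version 1 to Version 1.1.

```python
import re

# Hypothetical convention: <project>_<content>_<YYYY-MM-DD>_v<major>.<minor>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<content>[a-z0-9-]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_v(?P<major>\d+)\.(?P<minor>\d+)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def follows_convention(filename: str) -> bool:
    """Return True if the file name matches the agreed pattern."""
    return NAME_PATTERN.match(filename) is not None


def next_version(major: int, minor: int, data_changed: bool) -> tuple[int, int]:
    """Return the next (major, minor) version number.

    data_changed=True  -> the dataset was replaced (e.g. a variable was recoded)
    data_changed=False -> a small fix that leaves the data untouched
    """
    if data_changed:
        return major + 1, 0
    return major, minor + 1
```

For example, `follows_convention("pets_survey-microdata_1998-10-01_v1.0.sav")` passes, while a name such as `final_FINAL2.sav` does not.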

25 After: data (cont’d)
Transcribing: guidelines set up beforehand
  - Transcribing conventions
  - Instructions
  - Guidelines
Variables
  - Names
  - Labels
    - Comprehensible
    - Unique
  - Description
  - Value labels
    - Comprehensible
    - Complete
  - Associated question
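
The requirements for variables (a label that is present and unique, complete value labels, an associated question) can be checked mechanically before deposit. The sketch below is an assumption-laden illustration: the `Variable` structure and the `codebook_problems` function are hypothetical, and "comprehensible" cannot be tested by code, so only presence, uniqueness and completeness are checked.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook entry (hypothetical structure)."""
    name: str
    label: str
    question: str
    value_labels: dict = field(default_factory=dict)


def codebook_problems(variables: list, observed_values: dict) -> list:
    """List gaps in the variable metadata.

    observed_values maps a variable name to the codes found in the data file.
    """
    problems = []
    seen_labels = set()
    for var in variables:
        if not var.label:
            problems.append(f"{var.name}: missing label")
        elif var.label in seen_labels:
            problems.append(f"{var.name}: label is not unique")
        else:
            seen_labels.add(var.label)
        if not var.question:
            problems.append(f"{var.name}: no associated question")
        # Value labels are complete only if every observed code has one.
        unlabelled = set(observed_values.get(var.name, ())) - set(var.value_labels)
        if unlabelled:
            problems.append(f"{var.name}: no value label for {sorted(unlabelled)}")
    return problems
```

An empty list means the checks passed; it does not mean the labels are understandable to someone outside the project, which still needs a human reader.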

26 After: data (cont’d)
Recoded variables
  - Why they were needed (e.g., geographic location)
  - Why they were done the way they were (e.g., age)
  - All of the items listed above under variables
Derived variables
  - Derived from what
    - Be specific
  - Why was it done
  - All of the items listed above under variables
Missing values
  - Codes used
    - Should be consistent
  - Reasons for missing values
Weighting variable(s)
  - Description
  - Formula(s)
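
Consistent missing-value codes and a documented weighting formula can be illustrated with a short Python sketch. The codes used here (-7, -8, -9) and their meanings are assumptions for the example only; what matters is that one set of codes is used throughout the dataset and that each code's reason is written down. The weighted mean shown is the standard formula, sum(w_i * x_i) / sum(w_i), taken over the non-missing values.

```python
from collections import Counter

# Hypothetical codes; the point is that the same codes are used for every variable.
MISSING_CODES = {-7: "not applicable", -8: "don't know", -9: "refused"}


def missing_summary(values: list) -> dict:
    """Count missing values by documented reason."""
    return dict(Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES))


def weighted_mean(values: list, weights: list) -> float:
    """Weighted mean over non-missing values: sum(w * x) / sum(w)."""
    pairs = [(v, w) for v, w in zip(values, weights) if v not in MISSING_CODES]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no non-missing values with positive weight")
    return sum(v * w for v, w in pairs) / total_weight
```

If a missing code were left in the calculation by mistake, a value such as -9 would be averaged in as if it were a real answer, which is one reason the codes and the formula both belong in the documentation.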

27 After: documentation
What to put in?
  - Information for a researcher looking at your dataset for the first time with no prior knowledge
  - As specific as possible
  - All associated documentation about the research

29 After: documentation (cont’d)
Study background
  - Purpose
  - Time frame
  - Geographic location
  - Creator, principal investigator(s), other investigator(s)
  - Funders
  - Sampling design
    - Description
    - Size
    - Any changes that were made

30 After: documentation (cont’d)
Study description
  - Describes all aspects of the data collection and processing
  - Data collection methodology
  - Data preparation procedure
  - Data validation protocols
  - Instruments used
  - Geographic coverage
  - Temporal coverage
  - Date of file creation
  - Description of codes and classifications used

31 After: documentation (cont’d)
Codebook or user guide
  - Original questionnaire/data collection instrument
  - All interviewer instructions
  - Any documentation describing variables
    - Original ones
    - Recoded
    - Derived
    - Weight
  - Include formulas used to construct variables

32 A tip:
Much of the information in the previous slides may seem like common sense
  - You will be tempted not to follow it
    - No time
    - No facilities to record it
    - Will do it later
    - Minor change, therefore not important enough to mark down
    - Of course, I will remember it!
  - What if?
    - You forget to mark it down
    - You forget to tell the rest of the research team
If you follow a checklist, neither you nor your team will be caught short!

33 In sum
In this section you have learned
  - What to do before data collection
    - Plan and organize
      - Data type, data collection and processing, storage, security, back-up, confidentiality, metadata
  - To make a checklist or template
  - About data collection and processing (in brief)
  - After data collection and processing
    - Metadata: data, documentation

34 Research Data Management
Exercise #2: Data Readiness
Is this data set ready for deposit? Why? Why not?
Dataset Title: Attitudes of Pets towards their Owners (October 1998)
Documentation available: The following text file:
  “This survey was conducted by the Pet Researchers of Canada and was analysed by the Acme Research Company. There is no documentation available for this survey. Use basic survey methodology if necessary. There are some interesting results in this survey.”
Data available: A microdata file with some variable and value labels.
Example 1:
  Name of variable: V35
  Frequency: Yes = 35%, No = 47%
Example 2:
  Name of variable: Region of Country
  Frequency: 1 = 12%; 2 = 32%; 3 = 35%; 4 = 15%; 5 = 4%

35 Contact Information
Pat Moore
  Associate University Librarian: Research, Scholarship and Technology
  Carleton University
  613.520.2600 x2745
  pat.moore@carleton.ca
Jane Fry
  Data Specialist
  Carleton University
  613.520.2600 x1121
  jane.fry@carleton.ca

36 References
Corti, L. “Managing qualitative data”. Datum Workshop, Newcastle, 26 May 2011. Retrieved 7 October 2013 from http://www.library.carleton.ca/sites/default/files/find/data/surveys/pdf_files/corti_dataforlife_20110526.pdf
Fry, J. and Edwards, A.M. (2009). “Protocols for accepting data”. Retrieved 7 October 2013 from http://spotdocs.scholarsportal.info/display/odesi/protocols
UK Data Archive. “Create & manage data: Research Data lifecycle”. Retrieved 13 October 2013 from http://data-archive.ac.uk/create-manage/life-cycle
Stephenson, L. “Data management for advanced research”. Presentation given 28 March 2008. UCLA Social Science Data Archive, Unpublished.

