Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle.

1 Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

2 We Are Here Today: Review & Processing

3 http://weknowmemes.com/2011/12/this-is-my-room-what-i-think-it-looks-like-what-my-mom-thinks-it-looks-like/

4 A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users. Do no harm.

5 http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf

6 Review: Documentation; Data; [Disclosure Review]

7 Is the data collection complete, accurate, and well-documented?

8 Documentation http://dx.doi.org/10.3886/ICPSR31521.v1

9 Essential Descriptive Elements: Basic front matter; Variable-level details; Methodology

10 Documentation: Front Matter Title Principal Investigator(s) http://dx.doi.org/10.3886/ICPSR31521.v1

11 Description Documentation: Front Matter Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009. Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O'Malley, and John E. Schulenberg. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009 [Computer file]. ICPSR28401-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-10-27. doi:10.3886/ICPSR28401.v1

12 Documentation: Variable-level Details National Longitudinal Study of Adolescent Health (Add Health), 1994-1995. Wave I School Administrator Codebook. http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html

13 Documentation: Variable-level Details: Variable Name

14 Documentation: Variable-level Details: Variable Label

15 Documentation: Variable-level Details: Variable Type

16 Documentation: Variable-level Details: Question Text

17 Documentation: Variable-level Details: Values

18 Documentation: Variable-level Details: Value Labels

19 Documentation: Variable-level Details: Missing Data

20 Documentation: Variable-level Details: Summary Statistics

21 Documentation: Variable-level Details: Constructed Variables

22 Skip Patterns Notes

23 Documentation: Variable-level Details (examples) American National Election Study, 2008-2009 Panel Study Frequency codebook, version 20090903. http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_fcodebook.txt

24 Documentation: Variable-level Details (examples) Davis, James A., Tom W. Smith, and Peter V. Marsden. General Social Surveys, 1972-2008 [Cumulative File] [Computer file]. ICPSR25962-v2. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-02-08. doi:10.3886/ICPSR25962

25 Documentation: Variable-level Details (examples) United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2009 [Computer file]. ICPSR29621-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-16. doi:10.3886/ICPSR29621

26 Documentation: Variable-level Details (examples) United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. Capital Punishment in the United States, 1973-2008 [Computer file]. ICPSR27982-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-09-07. doi:10.3886/ICPSR27982

27 Documentation: Methodology. Sample design: A description of how the cases that appear in the study were selected, including details about target populations, sampling frames, sample sizes, sampling errors, and sampling methods. Data collection procedures: The methods used to collect the data (e.g., telephone, mail, computer-assisted). Where applicable, this includes the exact instructions and protocols used by interviewers when they collected the data. Data processing: The activities and quality checks performed on the data collection to generate the final data products from the raw collected data. If files were merged, a full description of the process should be provided.

28 Documentation: Methodology. Weighting: Where applicable, a description of the criteria for using weights in the analysis of a data collection, including how the weights were created, all weighting formulae or coefficients, a definition of their elements, and an indication of how the formulae are applied to the data. Confidentiality issues: Where applicable, a discussion of any confidentiality issues in the data, as well as the steps taken to mitigate disclosure risk.

29 Other Documentation Questionnaire User Guide Handbook Manual Report Table User Agreement Errata

30 Useful Resources: Description. ICPSR, “What is a codebook?” http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook; Institute for Health and Care Research Quality Handbook http://www.emgo.nl/kc/preparation/data%20collection/3%20Codebook.html; Princeton University Data and Statistical Services, “How to Use a Codebook” http://dss.princeton.edu/online_help/analysis/codebook.htm; UCLA Social Science Data Archive, “Codebooks” http://dataarchives.ss.ucla.edu/tutor/tutcode.htm

31 Data

32 Data Labels Does each variable have a variable name and label? Do all categorical variables have value labels? Are labels consistent?
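The label questions on this slide can be checked mechanically once the codebook is in machine-readable form. The following is a minimal sketch, assuming a hypothetical dictionary layout for the codebook (the variable names, the `type` field, and the `find_label_problems` helper are all illustrative, not part of any ICPSR tool):

```python
# Hypothetical codebook: variable name -> metadata. The layout is an assumption
# made for this sketch, not a standard format.
codebook = {
    "Q1": {"label": "Respondent's age", "type": "numeric", "value_labels": {}},
    "Q2": {"label": "", "type": "categorical", "value_labels": {1: "Yes", 2: "No"}},
    "Q3": {"label": "Employment status", "type": "categorical", "value_labels": {}},
}


def find_label_problems(codebook):
    """Return (variable, problem) pairs for missing variable or value labels."""
    problems = []
    for name, meta in codebook.items():
        # Every variable should carry a non-empty variable label.
        if not meta.get("label", "").strip():
            problems.append((name, "missing variable label"))
        # Every categorical variable should carry value labels.
        if meta.get("type") == "categorical" and not meta.get("value_labels"):
            problems.append((name, "categorical variable has no value labels"))
    return problems


label_problems = find_label_problems(codebook)
for name, problem in label_problems:
    print(f"{name}: {problem}")
```

Label consistency (the third question) usually still needs a human read, since two labels can be formally present yet phrased in conflicting ways.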

33 Naming Conventions: Variables. Variable Names: One-up numbers (V1, V2); Question numbers (Q1, Q2); Mnemonic names (age, race); Prefix, root, suffix systems (FAED, MOED)

34 Naming Conventions: Variables. Variable Labels: Item/Question number; Indicate variable content; Indicate if variable constructed. Example: Q14: Assessment of R’s Health

35 Naming Conventions: Values. Value Labels: Mutually exclusive, exhaustive, and defined; Preserve original information; Retain original coding scheme. Example: Respondent’s Employment Status: Self-employed (1); Somewhere-else (2); No answer (9); Not applicable (BK)
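One way to test whether a set of value labels is exhaustive and mutually exclusive is sketched below. It reuses the employment-status codes from the slide; the observed data (including the stray code "3") and the `check_value_labels` helper are made up for illustration. Codes are kept as strings so that a non-numeric code such as BK survives unchanged, in line with retaining the original coding scheme.

```python
# Value labels taken from the slide's employment-status example.
value_labels = {
    "1": "Self-employed",
    "2": "Somewhere-else",
    "9": "No answer",
    "BK": "Not applicable",
}

# Hypothetical observed codes; "3" is deliberately undocumented.
observed = ["1", "2", "2", "9", "BK", "3"]


def check_value_labels(value_labels, observed):
    """Return (codes with no label, labels shared by more than one code)."""
    # Exhaustive: every code that occurs in the data must have a label.
    unlabeled = sorted(set(observed) - set(value_labels))
    # Mutually exclusive: no two codes should carry the same label text.
    codes_by_label = {}
    for code, label in value_labels.items():
        codes_by_label.setdefault(label.strip().lower(), []).append(code)
    shared = {lab: codes for lab, codes in codes_by_label.items() if len(codes) > 1}
    return unlabeled, shared


unlabeled, shared = check_value_labels(value_labels, observed)
print("Codes without labels:", unlabeled)
print("Labels used by several codes:", shared)
```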

36 Missing Data. Are there missing data? Are missing data labeled? Example codes: 77 = Inapplicable; 88 = Don’t Know; 99 = No Answer
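A short sketch of how labeled missing-data codes might be separated from valid responses follows. The codes 77, 88, and 99 come from the slide; the response list and the `split_missing` helper are invented for illustration.

```python
from collections import Counter

# Missing-data codes from the slide.
MISSING_CODES = {77: "Inapplicable", 88: "Don't Know", 99: "No Answer"}

# Hypothetical responses for a single survey item.
responses = [1, 2, 99, 3, 77, 88, 2, 99]


def split_missing(values, missing_codes):
    """Separate valid values from missing codes and tally each kind of missing."""
    valid = [v for v in values if v not in missing_codes]
    missing_counts = Counter(missing_codes[v] for v in values if v in missing_codes)
    return valid, missing_counts


valid, missing_counts = split_missing(responses, MISSING_CODES)
print("Valid responses:", valid)
for reason, count in missing_counts.items():
    print(f"{reason}: {count}")
```

Missing codes should be applied per variable rather than globally: a value of 77 is a missing code for this item but could be a legitimate value for a variable such as age.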

37 Values Are the values reasonable (for example, date variables contain dates, gender variables don't have 10 categories, variables aren't all system missing)? Are there weight variables? If so, are they well documented?
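The three plausibility checks named in the slide (dates that are really dates, category counts that make sense, variables that are not entirely missing) can be sketched as follows. The column names, the sample values, the ISO date format, and the category threshold are all assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical columns; None stands in for a system-missing value.
columns = {
    "interview_date": ["2009-03-01", "2009-02-30", "2009-04-15"],
    "gender": [1, 2, 1],
    "income": [None, None, None],
}


def invalid_dates(values, fmt="%Y-%m-%d"):
    """Return the values that cannot be parsed as dates in the given format."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except (TypeError, ValueError):
            bad.append(v)
    return bad


def too_many_categories(values, max_categories):
    """True if the variable has more distinct non-missing values than expected."""
    return len({v for v in values if v is not None}) > max_categories


def all_missing(values):
    """True if every value in the variable is missing."""
    return all(v is None for v in values)


print("Bad dates:", invalid_dates(columns["interview_date"]))
print("Gender has too many categories:", too_many_categories(columns["gender"], 3))
print("Income is entirely missing:", all_missing(columns["income"]))
```

The threshold passed to `too_many_categories` is a judgment call that depends on the codebook; the point is only to flag variables for a closer look.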

38 Matching Data & Documentation Do the data match the documentation? Are values and/or labels listed in one but not in the other? Are all codes in the data valid (documented) according to the data collection instrument or PI's codebook? Are there duplicate records? Does the spelling look OK?
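Two of these checks, variables listed in the data but not the documentation (or the reverse) and duplicate records, lend themselves to a quick script. This is a minimal sketch with invented variable names and records; real files would be read from disk first.

```python
from collections import Counter

# Hypothetical set of variables described in the codebook.
documented = {"CASEID", "AGE", "SEX"}

# Hypothetical data records; the third repeats the first.
records = [
    {"CASEID": 1, "AGE": 34, "RACE": 2},
    {"CASEID": 2, "AGE": 51, "RACE": 1},
    {"CASEID": 1, "AGE": 34, "RACE": 2},
]


def compare_variables(records, documented):
    """Return (variables only in the data, variables only in the documentation)."""
    in_data = set()
    for record in records:
        in_data.update(record)
    return sorted(in_data - documented), sorted(documented - in_data)


def duplicate_records(records):
    """Return each record that appears more than once, listed a single time."""
    counts = Counter(tuple(sorted(r.items())) for r in records)
    return [dict(key) for key, n in counts.items() if n > 1]


undocumented, not_in_data = compare_variables(records, documented)
duplicates = duplicate_records(records)
print("In data but not documented:", undocumented)
print("Documented but not in data:", not_in_data)
print("Duplicate records:", duplicates)
```

Whether an exact duplicate is an error or a legitimate repeated observation depends on the study design, so flagged records are candidates for review rather than automatic deletion.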

39 Processing History

40 Useful Resources: Data UK Data Archive, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.data-archive.ac.uk/create-manage/document/data-level?index=1 ICPSR Guide to Social Science Data Preparation and Archiving: Phase 3: Data Collection and File Creation, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3quant.html

41 Activity Review the following data output and report any issues you find.

42 Examples of What to Look For:

43 Examples of What to Look For:

44 Examples of What to Look For:

45 Examples of What to Look For:

46 Examples of What to Look For:

47 Examples of What to Look For:

48 [Disclosure Review]

49 Discussion How much cleaning do you do to a data collection? When is it appropriate to change the ‘original order’ of a data collection? How many processing details do you include in the study documentation?

50 Example: Review @ICPSR

51 A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users. Do no harm.

52 We Are Here Today: Review & Processing

