Download presentation
Presentation is loading. Please wait.
1
Data Management – Documentation
Claire Osgood November 2017
2
Children’s Environmental Health Initiative
Documentation
3
Children’s Environmental Health Initiative
Documentation – The Documentation Gap Standard, good quality, comprehensive, up-to-date, easily accessible. Little or non-existent, low quality, out of date Documentation is typically sorely underrepresented and underestimated when working on a project. However, it is greatly appreciated when one is on the receiving end. It is even more appreciated when a project spans years, involves rotating/changing staff, is audited, or is worked on intermittently. When planning your time, include time for documentation. THINK. Consider what you would want/need if you came back to the data after not working with it for a year, two years, three years. What would a person who has never worked with the data need? What would they want to know? This is particularly important for the UDP project since the expectation is that researchers who were not part of the process to curate the data will be using the end product. Obscurity is NOT job security, quite the opposite. Typical State Desired State
4
Children’s Environmental Health Initiative
Documentation – Types of Documentation Programs tabular processing GIS spatial processing Source Codebook / Variable list Processing Metadata
5
Children’s Environmental Health Initiative
Documentation – Programs – Program Header (SAS) **********************************************************************************; ** PROGRAM NAME: ** PROJECT: ** ** PURPOSE: ** ; ** DATA: ** RESTRICTIONS: ** OUTLINE OF PROGRAM/ANALYSIS: ** PROGRAMMER: DATE ORIGINALLY WRITTEN: ** REVISIONS (BY/DATE): ** INPUT FILES: ** FNAME FTYPE CREATED BY ** OUTPUT FILES: ** FNAME FTYPE NOTES Data extent For processing programs: Geographic coverage, years, etc. For analysis programs: Source of data, spatial resolution, years, and restrictions (for example: no missing covariates, age<45, etc.)
6
Children’s Environmental Health Initiative
Documentation – Programs – Program Header (R) # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # Script name: # Project name: # # Program description: # Data: # Restrictions: # Outline of program/analysis: # Author: Date: # Revisions (by/date): # Input files: Fname Ftype Created by # Output files: Fname Ftype Notes
7
Children’s Environmental Health Initiative
Documentation – Programs – Comments What to document: For complicated/long programs, include an outline. In the program header, or at top of program Show section comments within the program for each part of the outline. Any time data fixes/changes are made, cite the source on which you base the change. For example, fixing code based on a telephone conversation or an . Explain tricky code in plain English.
8
Children’s Environmental Health Initiative
Documentation – Programs – Comments ** ; ** OUTLINE OF PROGRAM/ANALYSIS: ** Read in raw data, and standardize variable names and formats. ** Create coded variables. ** Construct indicator variables: xx, yy, and zz. ** Add geocode information ** Save final file(s) ** (1) Read in raw data, and standardize variable names and formats. [Code for this section] ** (2) Create coded variables. ** A. Create Race Code from …. ** B. Create …. ** C. Create… etc. [Code for creating coded vars….] ** (3) Construct indicator variables: xx, yy, and zz. ** More comments about these…. [Code for constructing indicator variables….]
9
Children’s Environmental Health Initiative
Documentation – Processing, Codebook/Variable List Every published dataset should be accompanied by a… …Processing document, and… …a Codebook or Variable List Think about who the audience is. This document, along with the codebook, is what will be shared with analysts who want to use the published data. If it is too long and detailed, they will not read it. If it is too short and general, it is not useful. This document is also used for audit trail purposes, and for data management, which require detail. For complicated data processing, it might be better to have multiple documents – one being more general for analysts, and it can reference documents with more detail. Or, consider including appendices that contain more detail. Example_Processing_Document.docx Example_Codebook1.xlsx Example_Codebook2.xlsx
10
Children’s Environmental Health Initiative
Documentation – Metadata Every file published that was created through GIS processing must include Metadata. For UDP, use ISO format for the standard Metadata. Example Metadata
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.