Published by Lee Bridges. Modified over 8 years ago.
1
DATA PREPARATION: PROCESSING & MANAGEMENT
  Lu Ann Aday, Ph.D.
  The University of Texas School of Public Health
2
CODING THE DATA: Definition
  Translating the information that survey respondents provide into numerical or other symbols that can be processed by a computer
3
CODING THE DATA: Types of Questions
  Closed-end questions
    Assign numbers to the response categories
  Open-ended & other (specify) questions
    Review a selected number of cases
    Develop codes for responses provided
    Test-code selected cases
    Check coder inter-rater reliability
    Revise codes if needed
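The inter-rater reliability check can be sketched as follows. The slide does not name a statistic; this example assumes simple percent agreement plus Cohen's kappa, and the two coders' code lists are made-up illustration data.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who coded the same cases."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of cases")
    n = len(coder_a)
    # Observed proportion of cases on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to ten open-ended responses.
coder_a = [1, 2, 2, 3, 1, 1, 2, 3, 3, 1]
coder_b = [1, 2, 3, 3, 1, 2, 2, 3, 3, 1]
agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
kappa = cohen_kappa(coder_a, coder_b)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

A low kappa on the test-coded cases would be one signal that the code categories need revising before full coding begins.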
4
CODING THE DATA: Missing Data
  Develop uniform conventions for coding different types of missing data:
    Respondent refused to answer: 7, 97, 997
    Respondent did not know answer: 8, 98, 998
    Question skipped (in error): 9, 99, 999
    Question skipped (legitimate): blank
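A minimal sketch of how these conventions might be applied when reading raw entries. The function name `classify` and the idea of passing the field width are assumptions made for illustration; only the codes themselves come from the slide.

```python
# Last digit of the missing code gives the reason; the code is padded with 9s
# to the width of the field (7, 97, 997 and so on).
MISSING_REASONS = {"7": "refused", "8": "don't know", "9": "skipped in error"}

def classify(raw, width):
    """Return (value, missing_reason) for one raw entry in a field `width` digits wide."""
    text = raw.strip()
    if text == "":
        return None, "legitimate skip"
    for digit, reason in MISSING_REASONS.items():
        if text == "9" * (width - 1) + digit:
            return None, reason
    return int(text), None

print(classify("97", 2))   # a refusal in a two-digit field
print(classify("45", 2))   # a substantive answer
print(classify("", 2))     # a legitimate skip
```

Keeping the reason alongside the value lets later steps treat refusals, "don't know" answers, and skips differently.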
5
CODING THE DATA: Coding Conventions
  Assign: an I.D. number for each case
  Use: numeric, not alphabetic, codes for response categories in general
  Develop: procedures for systematically verifying coding and data entry
6
CODING THE DATA: Codebook
  For each question, document:
    variable name
    valid (allowable) range of values
    any specific coding instructions, e.g., whether to re-contact the respondent (R) if data are missing
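One way to hold a codebook entry in code is sketched below. The class `CodebookEntry`, its field names, and the age question are hypothetical; they simply mirror the items the slide says to document.

```python
from dataclasses import dataclass, field

@dataclass
class CodebookEntry:
    variable: str              # variable name in the data file
    question: str              # question number and wording
    valid_values: range        # allowable range of substantive values
    missing_codes: dict = field(default_factory=dict)  # code -> meaning
    instructions: str = ""     # any specific coding instructions

    def is_allowable(self, value):
        """True if value is a valid response or a documented missing code."""
        return value in self.valid_values or value in self.missing_codes

# Hypothetical entry for an age question.
age = CodebookEntry(
    variable="AGE",
    question="Q1. How old were you on your last birthday?",
    valid_values=range(18, 111),
    missing_codes={997: "refused", 998: "don't know", 999: "skipped in error"},
    instructions="Re-contact R if age is missing.",
)
print(age.is_allowable(45), age.is_allowable(150))
```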
7
ENTERING THE DATA
  Transcriptive Data Entry (quex transcribed into database)
    Spreadsheets (e.g., Excel)
    Databases (e.g., Access)
    Data entry software (e.g., SPSS)
  Source Data Entry (quex = database)
    Optical scanning of forms
    Computer-assisted data collection (CATI, CAPI, CASI)
8
CLEANING THE DATA: Types
  Range checking: verify that only valid values are used for responses within a question
  Contingency checking: verify that responses to different questions that should be consistent are in fact consistent
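The two kinds of checks can be sketched as below. The variables `smoker` and `cigs_per_day`, their codes, and the consistency rule are invented for illustration.

```python
def range_errors(records, variable, valid):
    """IDs of cases whose value for `variable` is outside the allowable set."""
    return [r["id"] for r in records if r[variable] not in valid]

def contingency_errors(records, rule):
    """IDs of cases whose answers across questions violate a consistency rule."""
    return [r["id"] for r in records if not rule(r)]

# Hypothetical data: smoker is 1=yes, 2=no; cigs_per_day is None when skipped.
records = [
    {"id": 1, "smoker": 1, "cigs_per_day": 10},
    {"id": 2, "smoker": 2, "cigs_per_day": None},
    {"id": 3, "smoker": 2, "cigs_per_day": 5},     # non-smoker reports cigarettes
    {"id": 4, "smoker": 3, "cigs_per_day": None},  # 3 is not a valid code
]

def non_smokers_skip_cigs(r):
    # Non-smokers should have skipped the cigarettes-per-day question.
    return r["smoker"] != 2 or r["cigs_per_day"] is None

out_of_range = range_errors(records, "smoker", valid={1, 2})
inconsistent = contingency_errors(records, non_smokers_skip_cigs)
print(out_of_range, inconsistent)
```

The checks only flag cases; how each flagged case is resolved still depends on the study's decision rules.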
9
CLEANING THE DATA: Procedures
  Develop decision rules for reconciling errors
  Enter revised codes in data file based on decision rules
  Document questions for which data were revised in the data file
10
IMPUTING MISSING DATA
  Deductive imputation
    Fill in information for Qs with missing data (e.g., gender) from other Qs (e.g., name)
  Cold-deck imputation
    Fill in group estimates, e.g., means, for Qs with missing data
      Overall mean: study sample mean
      Class mean: subgroup mean
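The overall-mean and class-mean options can be illustrated with a short sketch. The function `impute_mean`, the `group` and `visits` variables, and the data are assumptions for the example.

```python
from statistics import mean

def impute_mean(records, variable, by=None):
    """Return copies of records with missing `variable` filled by a mean.

    by=None uses the overall sample mean; by="x" uses the mean of the
    case's own subgroup on x (the class mean).
    """
    def key(r):
        return r[by] if by else None

    observed = {}
    for r in records:
        if r[variable] is not None:
            observed.setdefault(key(r), []).append(r[variable])
    fills = {k: mean(v) for k, v in observed.items()}
    # A class with no observed values has no mean, so its cases stay missing.
    return [
        {**r, variable: fills.get(key(r))} if r[variable] is None else dict(r)
        for r in records
    ]

# Hypothetical data: number of physician visits, missing for two cases.
records = [
    {"id": 1, "group": "A", "visits": 2},
    {"id": 2, "group": "A", "visits": 4},
    {"id": 3, "group": "A", "visits": None},
    {"id": 4, "group": "B", "visits": 10},
    {"id": 5, "group": "B", "visits": None},
]
overall = impute_mean(records, "visits")
by_class = impute_mean(records, "visits", by="group")
print(overall[2]["visits"], by_class[2]["visits"], by_class[4]["visits"])
```

In this made-up data the class mean differs sharply between groups, which is the situation where the overall mean would be a poor fill.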
11
IMPUTING MISSING DATA
  Hot-deck imputation
    Fill in actual data from another related case on the data file for which information is available for Qs with missing data
  Statistical imputation
    Derive imputed value based on regression or statistically derived distance function for “nearest” matching case
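A rough sketch of donor-based imputation with a distance function. It assumes an unscaled squared-difference distance on numeric matching variables, which is a simplification; the variable names and data are hypothetical.

```python
def hot_deck(records, variable, match_on):
    """Fill missing `variable` from the nearest complete case (the donor)."""
    donors = [r for r in records if r[variable] is not None]

    def distance(a, b):
        # Squared difference summed over the matching variables (unscaled).
        return sum((a[m] - b[m]) ** 2 for m in match_on)

    completed = []
    for r in records:
        if r[variable] is None and donors:
            donor = min(donors, key=lambda d: distance(r, d))
            completed.append({**r, variable: donor[variable], "donor_id": donor["id"]})
        else:
            completed.append(dict(r))
    return completed

# Hypothetical data: income (in thousands) is missing for case 3.
records = [
    {"id": 1, "age": 30, "educ": 12, "income": 40},
    {"id": 2, "age": 60, "educ": 16, "income": 90},
    {"id": 3, "age": 32, "educ": 12, "income": None},
]
completed = hot_deck(records, "income", match_on=["age", "educ"])
print(completed[2])
```

Recording `donor_id` keeps the imputation documented, so imputed values can be told apart from reported ones later.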
12
IMPUTING MISSING DATA
  Multiple imputation
    Generates more than one acceptable value for the items that are missing, creates different complete data sets using the imputed values, and then combines the estimates resulting from the multiple iterations
    Attempts to reduce both bias and variance resulting from imputing only one value
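The combining step can be sketched with the pooling rules commonly attributed to Rubin. The slide does not say which combining rule is meant, so treat this as an assumption; the estimates and variances are invented.

```python
from statistics import mean, variance

def pool(estimates, variances):
    """Combine m per-data-set estimates and their variances (needs m >= 2)."""
    m = len(estimates)
    pooled = mean(estimates)       # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical mean estimates from three completed data sets.
pooled, total_variance = pool([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
print(pooled, total_variance)
```

The between-imputation term is what a single imputed value cannot supply: it reflects how much the estimate moves across plausible fills.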
13
ESTIMATING SELECTED DATA
  Estimation methods use data external to the survey, e.g., average charges for selected outpatient procedures from AHA, to construct analysis variables not directly available in the survey, e.g., total charges for outpatient services used
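As a small illustration of this kind of estimate, the sketch below multiplies reported service counts by external average charges. The procedure names and dollar amounts are invented placeholders, not AHA figures.

```python
# Hypothetical external average charges per procedure (placeholder values).
AVERAGE_CHARGE = {"xray": 120.0, "blood_test": 45.0, "office_visit": 80.0}

def estimated_total_charges(services_used):
    """Sum of (reported count of each service) x (external average charge)."""
    return sum(AVERAGE_CHARGE[proc] * count for proc, count in services_used.items())

# One respondent's reported outpatient service use.
total = estimated_total_charges({"xray": 1, "office_visit": 3})
print(total)
```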
14
ANTICIPATING DATA ANALYSIS: Generate descriptive frequencies
  Check for:
    Item non-response, i.e., missing values
      Decide: if imputation is needed
    Number of cases per response category
      Decide: whether categories may need to be collapsed for analysis
    Outliers
      Decide: whether to exclude outliers or assign an allowable “maximum” value
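These three checks can be sketched in a few lines. The helper names, the health-status codes, the recode mapping, and the cap of 20 visits are all assumptions chosen for the example.

```python
from collections import Counter

def frequencies(values, missing_codes=()):
    """Return (counts per code, item non-response rate)."""
    counts = Counter(values)
    n_missing = sum(counts[c] for c in missing_codes)
    return counts, n_missing / len(values)

def collapse(values, mapping):
    """Recode sparse categories into broader ones; unmapped codes pass through."""
    return [mapping.get(v, v) for v in values]

def cap(values, maximum):
    """Assign an allowable maximum to outliers instead of excluding them."""
    return [min(v, maximum) for v in values]

# Hypothetical self-rated health item, 1-5, with 9 = missing.
health = [1, 1, 2, 3, 4, 5, 9, 9, 2, 1]
counts, nonresponse = frequencies(health, missing_codes=(9,))
collapsed = collapse(health, {1: 1, 2: 1, 3: 2, 4: 3, 5: 3})
visits = cap([2, 5, 60], maximum=20)
print(counts, nonresponse, collapsed, visits)
```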
15
ANTICIPATING DATA ANALYSIS: Analyze non-response bias
  Compare respondents with the original target population on characteristics for which corresponding data are available
  Assign non-response or post-stratification weights to adjust if needed (see Aday & Cornelius, 2006, Chapter 7)
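A simple form of post-stratification weighting is sketched below: each stratum's weight is its population share divided by its sample share. The age strata and shares are hypothetical, and this is only one of several weighting approaches.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per stratum = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        stratum: population_shares[stratum] / (count / n)
        for stratum, count in sample_counts.items()
    }

# Hypothetical respondents by age group and known population proportions.
sample_counts = {"18-44": 30, "45-64": 50, "65+": 20}
population_shares = {"18-44": 0.5, "45-64": 0.3, "65+": 0.2}
weights = poststratification_weights(sample_counts, population_shares)
print(weights)
```

Groups that are under-represented among respondents get weights above 1, and the weighted counts still sum to the sample size.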
16
ANTICIPATING DATA ANALYSIS: Develop & evaluate summary scales
  Conduct reliability and validity testing of items to be included in summary scales (see Aday & Cornelius, 2006, Chapter 3)
  Decide whether to drop items from the final summary scale or not based on this reliability and validity testing
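One common internal-consistency check is Cronbach's alpha, sketched here along with alpha recomputed after dropping each item. The slide does not name a specific reliability statistic, so this choice is an assumption, and the item scores are made up.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale score per respondent
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return {i: cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))}

# Hypothetical scores of five respondents on three scale items.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 4],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)
dropped = alpha_if_deleted(items)
print(round(alpha, 3), dropped)
```

An item whose removal raises alpha noticeably is a candidate for dropping, subject to the validity evidence for that item.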
17
ANTICIPATING DATA ANALYSIS: Transform data if needed
  Assess the normality (skewness and kurtosis) of the distribution of major study variables
  Transform the data, e.g., compute logarithmic or other arithmetic transformation of the variables, to make them fit a more “normal” distribution
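The assessment and transformation can be sketched with moment-based skewness and excess kurtosis. The population-moment formulas used here are one common definition (statistical packages may apply small-sample corrections), and the charge values are invented.

```python
import math

def central_moment(values, order):
    m = sum(values) / len(values)
    return sum((v - m) ** order for v in values) / len(values)

def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical right-skewed charges; the log requires strictly positive values.
charges = [10, 12, 15, 20, 30, 60, 250]
logged = [math.log(c) for c in charges]
print(skewness(charges), skewness(logged))
print(excess_kurtosis(charges), excess_kurtosis(logged))
```

For data like these, the log pulls in the long right tail, so the logged variable shows less skew than the raw one.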
18
ANTICIPATING DATA ANALYSIS: Create dummy variables
  original variable:
    RACE: 1=White; 2=African-Amer; 3=Hisp
  dummy variables:
    RACE1: 1=White; 0=African-Amer or Hisp
    RACE2: 1=African-Amer; 0=White or Hisp
    RACE3: 1=Hispanic; 0=White or African-Amer
  referent group (omitted variable):
    RACE1: 1=White; 0=African-Amer or Hisp
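The dummy coding on this slide can be written as a short function. The function name `dummy_code` is an assumption; the codes and the choice of RACE1 as the omitted referent follow the slide.

```python
RACE_LABELS = {1: "White", 2: "African-Amer", 3: "Hisp"}

def dummy_code(race, referent=1):
    """Return 0/1 dummies for every category except the referent group."""
    return {
        f"RACE{code}": int(race == code)
        for code in RACE_LABELS
        if code != referent
    }

print(dummy_code(1))  # referent group: all dummies are 0
print(dummy_code(2))
print(dummy_code(3))
```

Omitting the referent's dummy avoids perfect collinearity in a regression; each remaining coefficient is then read relative to the referent group.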
19
AN APPLICATION: EpiData Software
  You can install EpiData software and related notes and manuals to demonstrate survey data entry and documentation: http://www.epidata.dk/index.htm
20
SURVEY ERRORS: Preparing the Data for Analysis
  Systematic Errors: imputation/estimation errors
    Solutions to errors:
      Compare the estimates based on alternative imputation procedures.
      Compare the estimates based on imputed and nonimputed data.
  Variable Errors: data coding, editing, or data entry errors
    Solutions to errors:
      Develop and implement quality control monitoring systems.
      Develop a decision logic model for reducing potential inconsistencies in the coding of the data.
      Reenter the data to identify variable errors in data entry.
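Re-entering the data to find entry errors amounts to comparing two independent keyings of the same cases. A minimal sketch follows; the function name and the data layout (case ID mapped to a dictionary of variables) are assumptions.

```python
def entry_discrepancies(first_pass, second_pass):
    """List (case_id, variable, first_value, second_value) where two keyings differ."""
    problems = []
    for case_id, row1 in first_pass.items():
        row2 = second_pass.get(case_id)
        if row2 is None:
            # Case was keyed the first time but not the second.
            problems.append((case_id, None, row1, None))
            continue
        for variable, value1 in row1.items():
            if row2.get(variable) != value1:
                problems.append((case_id, variable, value1, row2.get(variable)))
    return problems

# Hypothetical first and second keying of two cases.
first = {1: {"age": 34, "sex": 2}, 2: {"age": 51, "sex": 1}}
second = {1: {"age": 43, "sex": 2}, 2: {"age": 51, "sex": 1}}
discrepancies = entry_discrepancies(first, second)
print(discrepancies)
```

Each discrepancy is then resolved against the original questionnaire rather than by trusting either keying.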