DATA PREPARATION: PROCESSING & MANAGEMENT Lu Ann Aday, Ph.D. The University of Texas School of Public Health.


1 DATA PREPARATION: PROCESSING & MANAGEMENT Lu Ann Aday, Ph.D. The University of Texas School of Public Health

2 CODING THE DATA
• Definition
  - Translating the information that survey respondents provide into numerical or other symbols that can be processed by a computer

3 CODING THE DATA
• Types of Questions
  - Closed-end questions
    - Assign numbers to the response categories
  - Open-ended & other (specify) questions
    - Review a selected number of cases
    - Develop codes for responses provided
    - Test-code selected cases
    - Check coder inter-rater reliability
    - Revise codes if needed
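The inter-rater reliability check for coders can be sketched as follows. The slide does not name a statistic; Cohen's kappa is assumed here as one common choice for two coders, and the example codes are made up.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per case."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of cases")
    n = len(coder_a)
    # Proportion of cases on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical test-coded cases: codes assigned to eight open-ended responses.
codes_a = [1, 1, 2, 2, 3, 3, 1, 2]
codes_b = [1, 1, 2, 3, 3, 3, 1, 1]
kappa = cohens_kappa(codes_a, codes_b)
```

A low value would suggest revising the code definitions before coding the full file.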

4 CODING THE DATA
• Missing Data
  - Develop uniform conventions for coding different types of missing data
    - Respondent refused to answer: 7, 97, 997
    - Respondent did not know answer: 8, 98, 998
    - Question skipped (in error): 9, 99, 999
    - Question skipped (legitimate): blank
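A minimal sketch of how these conventions could be applied when reading a data file. It assumes values arrive as strings and that the missing code fills the field width (7 for one-digit fields, 97 for two-digit fields, 997 for three-digit fields); the function name is hypothetical.

```python
MISSING_KINDS = {"7": "refused", "8": "don't know", "9": "skipped in error"}


def classify(raw, width):
    """Classify a raw coded value for a field that is `width` digits wide."""
    text = raw.strip()
    if text == "":
        return "legitimate skip"
    # A missing code fills the field: leading 9s followed by 7, 8, or 9.
    fills_field = len(text) == width and text[:-1] == "9" * (width - 1)
    if fills_field and text[-1] in MISSING_KINDS:
        return MISSING_KINDS[text[-1]]
    return "valid"
```

For example, classify("97", 2) is treated as a refusal, while classify("7", 2) is an ordinary value in a two-digit field.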

5 CODING THE DATA
• Coding Conventions
  - Assign: an I.D. number for each case
  - Use: numeric, not alphabetic codes, for response categories in general
  - Develop: procedures for systematically verifying coding and data entry

6 CODING THE DATA
• Codebook
  - For each question, document:
    - variable name
    - valid (allowable) range of values
    - any specific coding instructions, e.g., whether to re-contact the respondent (R) if data are missing
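One possible way to keep a codebook in machine-readable form is sketched below. The variable names, question wording, and value ranges are hypothetical illustrations, not taken from any actual survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    variable: str            # variable name
    question: str            # question wording
    valid_values: frozenset  # allowable codes, including missing-data codes
    instructions: str = ""   # any specific coding instructions

    def allows(self, value):
        """Return True if `value` is an allowable code for this variable."""
        return value in self.valid_values


# Hypothetical entries for two questions.
CODEBOOK = {
    "SEX": CodebookEntry(
        "SEX", "What is your sex?", frozenset({1, 2}),
        "Re-contact R if missing.",
    ),
    "AGE": CodebookEntry(
        "AGE", "How old were you on your last birthday?",
        frozenset(range(18, 97)) | {97, 98, 99},
        "97=refused, 98=don't know, 99=skipped in error.",
    ),
}
```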

7 ENTERING THE DATA
• Transcriptive Data Entry (quex ≠ database)
  - Spreadsheets (e.g., EXCEL)
  - Databases (e.g., ACCESS)
  - Data entry software (e.g., SPSS)
• Source Data Entry (quex = database)
  - Optical scanning of forms
  - Computer-assisted data collection (CATI, CAPI, CASI)

8 CLEANING THE DATA
• Types:
  - Range checking: verify that only valid values are used for responses within a question
  - Contingency checking: verify that responses to different questions that should be consistent with each other are in fact consistent
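Both kinds of check can be sketched in a few lines. The variables, codes, and the single consistency rule below are hypothetical; a real study would take them from its own codebook and skip patterns.

```python
# Hypothetical valid ranges: SEX 1=male, 2=female; PREGNANT 1=yes, 2=no.
VALID_VALUES = {
    "SEX": {1, 2},
    "PREGNANT": {1, 2},
    "AGE": set(range(18, 100)),
}


def range_errors(record):
    """Range check: list variables whose value is outside the valid set."""
    return sorted(
        var for var, valid in VALID_VALUES.items()
        if record.get(var) is not None and record[var] not in valid
    )


def contingency_errors(record):
    """Contingency check: list inconsistencies between related questions."""
    errors = []
    if record.get("SEX") == 1 and record.get("PREGNANT") == 1:
        errors.append("SEX=1 (male) conflicts with PREGNANT=1 (yes)")
    return errors


case = {"ID": 101, "SEX": 1, "PREGNANT": 1, "AGE": 130}
```

Here the hypothetical case fails the range check on AGE and the contingency check on SEX and PREGNANT.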

9 CLEANING THE DATA
• Procedures:
  - Develop decision rules for reconciling errors
  - Enter revised codes in data file based on decision rules
  - Document questions for which data were revised in the data file

10 IMPUTING MISSING DATA
• Deductive imputation
  - Fill in information for Qs with missing data (e.g., gender) from other Qs (e.g., name)
• Cold-deck imputation
  - Fill in group estimates, e.g., means for Qs with missing data
    - Overall mean: study sample mean
    - Class mean: subgroup mean
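The overall-mean and class-mean options can be sketched as below. The record layout, the variable names, and the added flag variable are assumptions for illustration; missing values are represented as None.

```python
from statistics import mean


def impute_mean(records, var, class_var=None):
    """Fill missing `var` with the class mean if available, else the overall mean."""
    observed = [r for r in records if r[var] is not None]
    overall = mean(r[var] for r in observed)
    by_class = {}
    for r in observed:
        by_class.setdefault(r.get(class_var), []).append(r[var])
    completed = []
    for r in records:
        new = dict(r)
        if new[var] is None:
            donors = by_class.get(new.get(class_var)) if class_var else None
            new[var] = mean(donors) if donors else overall
            new[var + "_IMPUTED"] = 1  # flag imputed values for later comparison
        else:
            new[var + "_IMPUTED"] = 0
        completed.append(new)
    return completed


# Hypothetical records with income missing for two cases.
records = [
    {"REGION": "A", "INCOME": 10}, {"REGION": "A", "INCOME": 20},
    {"REGION": "A", "INCOME": None}, {"REGION": "B", "INCOME": 40},
    {"REGION": "B", "INCOME": None},
]
by_overall = impute_mean(records, "INCOME")
by_class = impute_mean(records, "INCOME", class_var="REGION")
```

Keeping the flag makes it possible to compare estimates from imputed and nonimputed data afterwards.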

11 IMPUTING MISSING DATA
• Hot-deck imputation
  - Fill in actual data from another related case on the data file for which information is available for Qs with missing data
• Statistical imputation
  - Derive imputed value based on regression or statistically derived distance function for “nearest” matching case
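A minimal hot-deck sketch, assuming that a "related case" means a case in the same imputation class (here, matching values on chosen variables) and that a donor is drawn at random within the class. Other donor-selection rules are possible; the names are hypothetical.

```python
import random


def hot_deck(records, var, class_vars, rng):
    """Fill missing `var` with an observed value from a donor in the same class."""
    donors = {}
    for r in records:
        if r[var] is not None:
            key = tuple(r[v] for v in class_vars)
            donors.setdefault(key, []).append(r[var])
    completed = []
    for r in records:
        new = dict(r)
        if new[var] is None:
            pool = donors.get(tuple(new[v] for v in class_vars))
            if pool:  # leave the value missing if the class has no donor
                new[var] = rng.choice(pool)
        completed.append(new)
    return completed


# Hypothetical records; the third case is missing the number of visits.
cases = [
    {"SEX": 1, "AGEGRP": 2, "VISITS": 3},
    {"SEX": 1, "AGEGRP": 2, "VISITS": 5},
    {"SEX": 1, "AGEGRP": 2, "VISITS": None},
    {"SEX": 2, "AGEGRP": 1, "VISITS": None},
]
filled = hot_deck(cases, "VISITS", ["SEX", "AGEGRP"], random.Random(0))
```

Unlike mean imputation, the filled-in value is always one that some respondent actually reported.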

12 IMPUTING MISSING DATA
• Multiple imputation
  - Generates more than one acceptable value for the items that are missing, creates different complete data sets using the imputed values, and then combines the estimates resulting from the multiple iterations
  - Attempts to reduce both bias and variance resulting from imputing only one value
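The combining step can be illustrated with the pooling formulas commonly known as Rubin's rules. The slide does not say which combining method is intended, so this is an assumption, and the estimates below are made-up numbers.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine one estimate and its variance from each completed data set."""
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical mean estimates and their variances from three completed data sets.
pooled, total_var = pool_estimates([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

The between-imputation term is what a single imputed value cannot capture: it reflects the uncertainty about the missing values themselves.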

13 ESTIMATING SELECTED DATA
• Estimation methods use data external to the survey, e.g., average charges for selected outpatient procedures from AHA, to construct analysis variables not directly available in the survey, e.g., total charges for outpatient services used
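As a sketch of the arithmetic, total charges can be estimated by multiplying each respondent's reported use by an external average charge. The procedure names and dollar amounts below are invented placeholders, not AHA figures.

```python
# Hypothetical external average charges per procedure (placeholder values).
AVERAGE_CHARGE = {"xray": 120.0, "lab_test": 45.0}


def estimated_total_charges(use_counts):
    """Estimate total charges from reported counts of each procedure."""
    return sum(AVERAGE_CHARGE[proc] * n for proc, n in use_counts.items())


# A respondent who reported two x-rays and three lab tests.
total = estimated_total_charges({"xray": 2, "lab_test": 3})
```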

14 ANTICIPATING DATA ANALYSIS
• Generate descriptive frequencies
  - Check for:
    - Item non-response, i.e., missing values
      - Decide: if imputation is needed
    - Number of cases per response category
      - Decide: whether categories may need to be collapsed for analysis
    - Outliers
      - Decide: whether to exclude outliers or assign an allowable “maximum” value
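These checks might look like the following for a single variable. The missing-data codes, the variable, and the chosen maximum are hypothetical.

```python
from collections import Counter

MISSING = {97, 98, 99}  # hypothetical missing-data codes for a two-digit field


def frequencies(values):
    """Frequency of each code, sorted by code."""
    return dict(sorted(Counter(values).items()))


def item_nonresponse_rate(values):
    """Share of cases carrying a missing-data code."""
    return sum(v in MISSING for v in values) / len(values)


def top_code(values, maximum):
    """Assign an allowable maximum to outliers, leaving missing codes untouched."""
    return [v if v in MISSING else min(v, maximum) for v in values]


# Hypothetical number of physician visits reported by eight respondents.
visits = [0, 1, 1, 2, 3, 60, 98, 99]
capped = top_code(visits, 12)
```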

15 ANTICIPATING DATA ANALYSIS
• Analyze non-response bias
  - Compare respondents with the original target population on characteristics for which corresponding data are available
  - Assign non-response or post-stratification weights to adjust if needed (see Aday & Cornelius, 2006, Chapter 7)
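A simple post-stratification weight is the ratio of a stratum's population share to its share of respondents. The sketch below assumes known population shares; the strata and figures are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight for each stratum: population share divided by sample share."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return {s: population_shares[s] / (counts[s] / n) for s in counts}


# Hypothetical sample that under-represents the younger age group.
sample = ["18-44"] * 30 + ["45+"] * 70
weights = poststratification_weights(sample, {"18-44": 0.5, "45+": 0.5})
```

Respondents in the under-represented group receive weights above 1, and the weights sum to the sample size.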

16 ANTICIPATING DATA ANALYSIS
• Develop & evaluate summary scales
  - Conduct reliability and validity testing of items to be included in summary scales (see Aday & Cornelius, 2006, Chapter 3)
  - Decide whether to drop items from the final summary scale or not based on this reliability and validity testing
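One widely used internal-consistency measure is Cronbach's alpha, sketched below. The slide does not specify which reliability statistic to use, so this choice is an assumption, and the item scores are invented.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one respondent's scores on k items."""
    k = len(rows[0])
    items = list(zip(*rows))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical scores of five respondents on a three-item scale.
scores = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(scores)
```

Recomputing alpha with each item left out in turn is one way to judge whether an item should be dropped.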

17 ANTICIPATING DATA ANALYSIS
• Transform data if needed
  - Assess the normality (skewness and kurtosis) of the distribution of major study variables
  - Transform the data, e.g., compute logarithmic or other arithmetic transformation of the variables, to make them fit a more “normal” distribution
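A minimal sketch of assessing skewness before and after a logarithmic transformation. It uses the moment-based skewness coefficient and made-up, strictly positive data; kurtosis could be checked in the same way.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance to the 3/2."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed variable, e.g., charges in hundreds of dollars.
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(v) for v in raw]  # the log requires strictly positive values

raw_skew = skewness(raw)
logged_skew = skewness(logged)
```

In this constructed example the logged values are evenly spaced, so their skewness is essentially zero.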

18 ANTICIPATING DATA ANALYSIS
• Create dummy variables
  - original variable:
    - RACE: 1=White; 2=African-Amer; 3=Hisp
  - dummy variables:
    - RACE1: 1=White; 0=African-Amer or Hisp
    - RACE2: 1=African-Amer; 0=White or Hisp
    - RACE3: 1=Hisp; 0=White or African-Amer
  - referent group (omitted variable):
    - RACE1: 1=White; 0=African-Amer or Hisp
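The dummy coding above can be sketched as a small function that omits the referent group's variable. The function name and the default referent argument are illustrative choices.

```python
RACE_LABELS = {1: "White", 2: "African-Amer", 3: "Hisp"}


def dummy_code(race, referent=1):
    """Return 0/1 dummy variables for RACE, omitting the referent group."""
    return {
        f"RACE{code}": int(race == code)
        for code in RACE_LABELS
        if code != referent
    }


# With White (RACE1) as the referent, only RACE2 and RACE3 enter the model.
example = dummy_code(2)
```

A case in the referent group has 0 on every remaining dummy variable.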

19 AN APPLICATION
• EpiData Software
  - You can install EpiData software and related notes and manuals to demonstrate survey data entry and documentation: http://www.epidata.dk/index.htm

20 SURVEY ERRORS: Preparing the Data for Analysis
• Systematic Errors: imputation/estimation errors
  - Solutions to errors
    - Compare the estimates based on alternative imputation procedures.
    - Compare the estimates based on imputed and nonimputed data.
• Variable Errors: data coding, editing, or data entry errors
  - Solutions to errors
    - Develop and implement quality control monitoring systems.
    - Develop a decision logic model for reducing potential inconsistencies in the coding of the data.
    - Reenter the data to identify variable errors in data entry.
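Reentering the data to find entry errors amounts to comparing two independent entry passes field by field. The sketch below assumes each pass is stored as a dictionary keyed by case I.D.; the data are hypothetical.

```python
def entry_discrepancies(first_pass, second_pass):
    """List (case_id, variable, first value, second value) where passes differ."""
    diffs = []
    for case_id in sorted(first_pass.keys() | second_pass.keys()):
        a = first_pass.get(case_id, {})
        b = second_pass.get(case_id, {})
        for var in sorted(a.keys() | b.keys()):
            if a.get(var) != b.get(var):
                diffs.append((case_id, var, a.get(var), b.get(var)))
    return diffs


# Hypothetical first and second entry of the same two questionnaires.
entry_1 = {101: {"SEX": 1, "AGE": 34}, 102: {"SEX": 2, "AGE": 51}}
entry_2 = {101: {"SEX": 1, "AGE": 43}, 102: {"SEX": 2, "AGE": 51}}
mismatches = entry_discrepancies(entry_1, entry_2)
```

Each mismatch is then resolved by checking the original questionnaire.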

