Data Analysis: Preliminary Steps Chapter 19
Stages in the Research Process
  Formulate Problem
  Determine Research Design
  Determine Data Collection Method
  Design Data Collection Forms
  Design Sample and Collect Data
  Analyze and Interpret the Data
  Prepare the Research Report
SLIDE 19-1
Editing Editing involves the inspection and, if necessary, correction of each questionnaire or observation form. FIELD EDIT CENTRAL-OFFICE EDIT SLIDE 19-2
Coding The process of transforming raw data into symbols (usually numbers) that can be utilized for analysis. SLIDE 19-3
Coding Closed-ended Items
What is your overall opinion of TARGET department stores?
  unfavorable   favorable
  (Typical coding: 1 2 3 4 5 6 7)
SLIDE 19-4
Coding Closed-ended Items: Check All That Apply
How did you learn about Brown Furniture Company? (check all that apply)
  newspaper advertising
  radio advertising
  billboard advertising
  recommended by others
  drove by store
  other: _______________
Typical coding: 6 different variables (1 if checked; 0 if not)
SLIDE 19-5
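Illustration (not from the original slides): a minimal Python sketch of the check-all-that-apply rule, where each option becomes its own 1/0 variable. The variable names and the `code_check_all` helper are hypothetical; the slide specifies only the six-variable, 1/0 scheme.

```python
# Hypothetical variable names, one per answer option on the Brown Furniture item.
OPTIONS = ["NEWSPAPER", "RADIO", "BILLBOARD", "RECOMMENDED", "DROVE_BY", "OTHER"]

def code_check_all(checked):
    """Return one 1/0 variable per option: 1 if checked, 0 if not."""
    checked = set(checked)
    unknown = checked - set(OPTIONS)
    if unknown:
        raise ValueError(f"unrecognized options: {sorted(unknown)}")
    return {name: int(name in checked) for name in OPTIONS}

# A respondent who checked "radio advertising" and "drove by store".
row = code_check_all(["RADIO", "DROVE_BY"])
print(row)
```

Storing six separate variables, rather than one variable holding a list, lets each source be tabulated on its own.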
Coding Open-ended Items
Open-ended items seeking concrete, or factual, responses are relatively easy to code: numeric answers are typically recorded as given by the respondent, while other types of responses are given a specific code number.
  (1) In what year were you born? (code year)
  (2) How many times have you eaten at Streeter’s Grill in the last month? (code number)
  (3) Name the first 3 coffee shops located in Tampa that come to mind. (code as 3 separate variables; assign numbers to represent each coffee shop mentioned)
SLIDE 19-6
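Illustration (not from the original slides): one way the coffee-shop item might be coded as three separate variables. The shop names are placeholders rather than survey data, and the helper names are hypothetical. Assigning code numbers in order of first appearance is an assumption; any consistent numbering would serve.

```python
def build_code_map(responses):
    """Assign code numbers 1, 2, 3, ... to distinct answers in order of first appearance."""
    code_map = {}
    for answers in responses:
        for answer in answers:
            key = answer.strip().lower()  # treat "Shop B" and "shop b" as the same shop
            if key and key not in code_map:
                code_map[key] = len(code_map) + 1
    return code_map

def code_mentions(answers, code_map, n_vars=3):
    """Code up to n_vars mentions as separate variables; None marks an unused slot."""
    codes = [code_map[a.strip().lower()] for a in answers if a.strip()][:n_vars]
    return codes + [None] * (n_vars - len(codes))

# Placeholder shop names, not real survey responses.
responses = [["Shop A", "Shop B", "Shop C"], ["shop b", "Shop D"]]
code_map = build_code_map(responses)
coded = [code_mentions(r, code_map) for r in responses]
print(code_map)
print(coded)
```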
Coding Open-ended Items Open-ended items seeking less structured responses are much more difficult to code. In your own words, give us two or three reasons why you prefer to leave the state after graduation. SLIDE 19-7
Process for Coding (Abstract) Open-ended Questions
  1. Develop initial response categories (before reading responses)
  2. Identify usable responses
  3. Review responses; add, delete, revise categories
  4. Sort responses into categories, using multiple coders; compare results
  5. Repeat #3 and #4 if one or more categories are too broad
  6. Assign code numbers for each category; use these codes to represent responses in the data file
  7. Assess interrater reliability (the degree of agreement between coders); low interrater reliability suggests that the categories are not well-defined, and #3-6 should be repeated
SLIDE 19-8
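Illustration (not from the original slides): a minimal sketch of two ways interrater reliability might be quantified for two coders. Simple percent agreement follows the slide's definition directly. Cohen's kappa, which adjusts for chance agreement, is a commonly used statistic that the slides do not name, so it is included here as an assumption. The category codes below are made up.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of responses that both coders placed in the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same non-empty set of responses")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal share, summed over categories.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        # Kappa is undefined here; returning 1.0 is a simplifying choice.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)

# Made-up category codes assigned by two coders to the same ten responses.
coder_a = [1, 1, 2, 3, 3, 4, 1, 2, 7, 7]
coder_b = [1, 1, 2, 3, 4, 4, 1, 3, 7, 7]
agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(agreement, kappa)
```

What counts as "low" reliability is a judgment call that the slides leave open; the sketch only computes the figures.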
Developing a Codebook
SPORTING GOODS SURVEY
Please answer the following questions about buying sporting goods over the internet:
  1. During the past year, what percentage of the sporting goods you purchased were ordered through the internet? ________ percent
  2. How willing are you to purchase merchandise offered through the Avery Sporting Goods web site?
     Not at all willing   Somewhat willing   Very willing
  3. Please provide some reasons why someone might not want to purchase sporting goods over the internet:
SLIDE 19-9
Developing a Codebook
Avery Sporting Goods – CODEBOOK (partial)
Col.   Var. Name   Description
1-3    ID          questionnaire identification number
4-6    PERCENT     % products purchased through internet (record response)
       WILLING     willingness to purchase through web site
                     1=not at all willing
                     2=somewhat willing
                     3=very willing
       REASON1     first reason for not purchasing over internet (open ended)
                     1=security issues
                     2=no internet access
                     3=can’t examine goods
                     4=difficult to return
                     5=don’t want to wait
                     6=prior bad exper. w/internet
                     7=other
       REASON2     second reason SAME
       REASON3     third reason SAME
SLIDE 19-10
Building the Data File: First Two Records in Avery Sporting Goods Data File
  001010231 10011024344
  0020001415 10000012121
SLIDE 19-11
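Illustration (not from the original slides): a sketch of how a fixed-column record might be read back using the codebook. The positions for ID (columns 1-3) and PERCENT (columns 4-6) come from the codebook. The single-column positions 7-10 for WILLING and REASON1-REASON3 are assumptions, since the partial codebook does not list them. Reading the blank in the first record as a missing third reason is also an assumption.

```python
# (name, first column, last column), 1-indexed and inclusive as in the codebook.
LAYOUT = [
    ("ID", 1, 3),         # from the codebook
    ("PERCENT", 4, 6),    # from the codebook
    ("WILLING", 7, 7),    # assumed position
    ("REASON1", 8, 8),    # assumed position
    ("REASON2", 9, 9),    # assumed position
    ("REASON3", 10, 10),  # assumed position
]

def parse_record(line):
    """Slice one fixed-width record into integer fields; blank fields become None."""
    record = {}
    for name, first, last in LAYOUT:
        field = line[first - 1:last].strip()
        record[name] = int(field) if field else None
    return record

# Leading columns of the two records shown on the slide.
first = parse_record("001010231 ")
second = parse_record("0020001415")
print(first)
print(second)
```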
Cleaning the Data
A BLUNDER is an error that occurs during editing, coding, or data entry. Blunders are usually due to researcher carelessness in coding or data entry.
How to identify blunders:
  Run frequency analysis on all variables
  Check a sample of questionnaires against the data file
  Double-entry of data (preferred)
SLIDE 19-12
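Illustration (not from the original slides): a minimal sketch of two of the checks listed above. A frequency table exposes codes that the codebook does not allow, and a double-entry comparison flags positions where two independent keyings of the same data disagree. The function names and data values are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Count how often each code occurs, sorted by code."""
    return dict(sorted(Counter(values).items()))

def find_blunders(values, valid_codes):
    """Return (row index, value) pairs whose code is not allowed by the codebook."""
    return [(i, v) for i, v in enumerate(values) if v not in valid_codes]

def compare_entries(entry_1, entry_2):
    """Double entry: return row indexes where two independent keyings disagree."""
    if len(entry_1) != len(entry_2):
        raise ValueError("both entries must cover the same records")
    return [i for i, (a, b) in enumerate(zip(entry_1, entry_2)) if a != b]

# Made-up WILLING values; the codebook allows only 1, 2, or 3.
willing = [1, 2, 3, 2, 5, 1, 3, 22]
table = frequency_table(willing)
suspects = find_blunders(willing, valid_codes={1, 2, 3})
mismatches = compare_entries([1, 2, 3, 2], [1, 2, 2, 2])
print(table)
print(suspects)
print(mismatches)
```

A frequency check catches only impossible codes. A wrong but valid code (a 2 keyed as a 3) passes it unnoticed, which is one reason double entry is marked as preferred.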
Handling Missing Data If a particular case has a significant amount of missing data (unanswered questions), it should probably be eliminated during the editing process. SLIDE 19-13
Strategies for Handling Missing Data
  Report “missing” as separate category
  Eliminate cases with missing data from all analyses
  Eliminate cases with missing data from analyses involving the variables on which data are missing, but keep cases for other analyses
  Substitute values for the missing data based on (a) responses to other variables given by the same individual, or (b) responses to the same variable given by other individuals
  Eliminate the variable (for that respondent) from a multi-item scale
  Recontact the respondent
SLIDE 19-14
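Illustration (not from the original slides): a sketch of three of the strategies above, using made-up cases with `None` for unanswered items. The sketch covers dropping a case from all analyses, dropping it only from analyses of the affected variable, and substituting a value. Using the mean of other respondents' answers is only one simple instance of option (b); the slides do not prescribe a particular substitution rule.

```python
from statistics import mean

# Made-up cases; None marks a missing answer.
cases = [
    {"ID": 1, "PERCENT": 10, "WILLING": 2},
    {"ID": 2, "PERCENT": None, "WILLING": 1},
    {"ID": 3, "PERCENT": 40, "WILLING": None},
    {"ID": 4, "PERCENT": 25, "WILLING": 3},
]

def drop_from_all(cases, variables):
    """Eliminate any case that is missing a value on any listed variable."""
    return [c for c in cases if all(c[v] is not None for v in variables)]

def available_values(cases, variable):
    """Use every case that answered this variable; the case stays in other analyses."""
    return [c[variable] for c in cases if c[variable] is not None]

def mean_substitute(cases, variable):
    """Replace missing values with the mean of the other respondents' answers."""
    fill = mean(available_values(cases, variable))
    return [c[variable] if c[variable] is not None else fill for c in cases]

complete = drop_from_all(cases, ["PERCENT", "WILLING"])
percent_answered = available_values(cases, "PERCENT")
percent_filled = mean_substitute(cases, "PERCENT")
print([c["ID"] for c in complete])
print(percent_answered)
print(percent_filled)
```

Dropping cases from all analyses leaves two of the four cases here, while the per-variable approach keeps three answers for PERCENT. That trade-off between consistency and sample size is the main design choice among these strategies.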