Presentation is loading. Please wait.

Presentation is loading. Please wait.

TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Data Preparation: An Introduction to Getting Data Ready for Analysis.

Similar presentations


Presentation on theme: "TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Data Preparation: An Introduction to Getting Data Ready for Analysis."— Presentation transcript:

1 TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Data Preparation: An Introduction to Getting Data Ready for Analysis

2 Data Preparation Overview Getting Started Organizing Data Cleaning Data  Data Entry Verification  Checking for Errors and Consistency  Formatting for Analysis: Data Transformations

3 Getting Started

4 Bridging the Gap Research Methodology Analysis

5 What Is Data Preparation? Creating and preparing a dataset to be analyzed Data can come from one or more data sources  Survey data (phone, web, mail)  Data tracked by your organization (internal reports)  Customer/client databases  Program data  Quality control data  Coded interviews

6 What Is Data Preparation? Before you start analysis, your data should be:  Organized  Consistently recorded  Error-free  Formatted for analysis

7 Allow for Ample Time Data Preparation can take up 50% or more of the time you dedicate to analysis Rushing/skipping data preparation Data errors  Low confidence  Starting analysis over

8 Document, Document, Document! Data collection and compilation process should be replicable  Where did you get the data?  How did you obtain it? Document problems  Collecting  Recording  Extracting

9 Document, Document, Document! Methodology report  Survey/Instrument/Experiment/Process Design  Sampling procedures (if applicable)  Response rates (if applicable)  Limitations

10 Organizing Data

11 Compile Your Data Start by figuring out all of the data components you will need for analysis.  What are the sources? Do you have access to all data that you need?  How do you get access to data that you need?  Track contact/retrieval information and date expectations  How much time do you need to build into your timeline?

12 Create Codes Code = A number or set of letters that stands for something else Question codes provide a way to reference a question Response codes provide a way to easily record results or answers

13 Create Codes

14

15 Create a Codebook

16 Enter Data Into a Database Two Methods:  Export from existing database  SurveyMonkey  Excel  Data entry  Paper survey data  Excel or SPSS  Create a codebook  Verification

17 Data Entry Enter information into your database one record at time Use your codebook to determine what you should enter into your database  Statistics program (i.e., SPSS)  Enter answer codes for data and define the codes in the program (i.e., Male = 1; Female = 2)  Excel or other spreadsheet program  Enter answer labels into spreadsheet (i.e., Male or Female)

18 Data Entry Considerations Create a unique ID for each record Illegible handwriting or unclear markings Missing data  How much can you tolerate?  Key questions?

19 Response Codes or Response Labels?

20 Data Cleaning

21 Data Entry Validation GIGO: Garbage In, Garbage Out Double user verification Double entry verification  Enter exact same data into two Excel tabs  Use a formula on a third tab to check the first two tabs  http://www.stattutorials.com/EXCEL/EXCEL_DOU BLE_DATA_ENTRY.html http://www.stattutorials.com/EXCEL/EXCEL_DOU BLE_DATA_ENTRY.html Data entry software / SPSS (using Compare Datasets)

22 Checking the Data: Ranges Are all answers within the accepted minimum and maximum values?  Age values of 150 or 16  Value of “7” on a 5-point scale

23 Checking the Data: Data Types Is data formatted in the correct way?  Age entered as thirty instead of 30  Date entered 29-10-2015 instead of 10-29-2015

24 Checking the Data: Data length Do all of your data entries have the correct number of digits or letters? Examples  Zip code with four digits instead of five or nine  Phone number with 11 digits instead of 10

25 Checking the Data: Fixing Errors Can the data be clarified with assistance?  Trace data back to point of origin  Review original data/database/instrument/source  If someone answered a survey, can you contact that person for clarification?

26 Checking the Data: Fixing Errors Is it reasonably correctable on your own?  If valid values are 1 – 5 and you have “11”, entering a “1” might be considered reasonable  If the valid values are between 0 and 100 and you have “232”, you can not make a reasonable determination between 23 and 32. Do not guess or choose a value at random; make it a missing value

27 Checking the Data: Missing Data Data can be missing for a variety of reasons:  Unanswered questions (forgot/declined to answer, illegible handwriting)  Data point was not applicable for a portion of records  Errors in recording data  Manually removed because you lacked confidence in the data

28 Checking the Data: Missing Data Create a unique code for missing values  Missing responses: -9998  Not applicable fields: -9999  Don’t know responses: 8888, 888, 88  Declined to answer responses: 9999, 999, 99

29 Checking the Data: Missing Data Statistical programs can remove your missing data from analysis

30 Statistical Analysis in Excel Can do statistical analysis in Excel Excel 2010 and 2013 (Windows):  File  Options  Add-Ins  New Data Analysis option appears under the Data tab

31 Statistical Analysis in Excel

32 Data Transformations in SPSS

33 Questions?

34 TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH TJS32@PSU.EDU 717-948-4312 HTTPS://CSR.HBG.PSU.EDU Thank you!


Download ppt "TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Data Preparation: An Introduction to Getting Data Ready for Analysis."

Similar presentations


Ads by Google