Accuracy Chapter 5.1 Data Screening. Data Screening So, I’ve got all this data…what now? – Please note this is going to deviate from the book a bit and.


1 Accuracy Chapter 5.1 Data Screening

2 Data Screening So, I’ve got all this data…what now? – Please note this is going to deviate from the book a bit and is based on Tabachnick & Fidell’s data screening chapter, which is fantastic but terribly technical and can cure insomnia.

3 Why? Data screening – important to check for errors, outliers, and assumptions. What’s the most important? – Always check for errors, outliers, missing data. – For assumptions, it depends on the type of test because they have different assumptions.

4 Big Important Rule Hypothesis testing: – We set alpha (the Type I error rate) to .05 – Therefore, we use p < .05 as the criterion for statistical significance

5 Big Important Rule Data screening: – In data screening, we want to use a stricter criterion. – We want things to be really screwy before we start to change them. – Therefore, we are going to use p <.001 to denote things are bad.

6 The List – In Order Accuracy Missing Data Outliers It Depends (we’ll come back to these): – Additivity – Normality – Linearity – Homogeneity – Homoscedasticity

7 The List – In Order Why this order? – Because if you fix something (accuracy) – Or replace missing data – Or take out outliers – ALL THE REST OF THE ANALYSES CHANGE.

8 Dataset Example This dataset was collected to assess the resiliency of teenagers after a natural disaster. – The dataset contains the RS14 scale that measures resiliency, as well as several demographic variables. Modeled after a real paper from the Schulenberg lab, not the real data.

9 Dataset Example Import the dataset to get started – call it master for master dataset.

10 Accuracy Check for typos and other issues with the dataset. – Typos are less common with software that collects data for you (SurveyMonkey, Qualtrics). This is also a good time to reverse code any items.
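Reverse coding on a 1-7 scale can be sketched as follows. The toy data frame and the choice of RS2 as the reverse-keyed item are assumptions for illustration only; the slides do not say which items, if any, need reversing.

```r
# Hypothetical toy data: three participants, two items on a 1-7 scale.
toy <- data.frame(RS1 = c(1, 4, 7), RS2 = c(2, 4, 6))

# Reverse code a 1-7 item as (max + min) - score, so 1 becomes 7 and 7 becomes 1.
# Treating RS2 as reverse-keyed is an assumption made for this sketch.
toy$RS2_rev <- (7 + 1) - toy$RS2

toy
```

Saving the reversed scores in a new column keeps the original values available for checking.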

11 Accuracy Check for typos with the table function. – table(column name or dataset name) – This example: table(master$Sex) – Good for checking data that should be categorical.
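A minimal sketch of the table check, using a small made-up stand-in for master (the real dataset is not shown in the transcript, and the category labels here are assumptions):

```r
# Hypothetical stand-in for the master dataset; "Wmn" is a deliberate typo.
master <- data.frame(Sex = c("Men", "Women", "Women", "Wmn", "Men"))

# Count how often each distinct value appears.
# Typos show up as unexpected categories with very small counts.
sex_counts <- table(master$Sex)
sex_counts

# useNA = "ifany" also reports missing values, which table() leaves out by default.
table(master$Sex, useNA = "ifany")
```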

12 Accuracy First, save the data as a new dataset: notypos = master Fix those problems: – You can use the factor function to drop that one bad data point, and give the labels at the same time. – Lines 16 and 19, checking the code in line 22, 23 to make sure they were fixed correctly.
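The slide's own code is not included in the transcript, so the following is only a sketch of the factor approach under assumed values: numeric codes 1 and 2 are treated as valid, 3 as the typo, and the labels "Men" and "Women" are hypothetical.

```r
# Hypothetical stand-in for master: Sex stored as numeric codes, 3 is a typo.
master <- data.frame(Sex = c(1, 2, 2, 3, 1))

# Work on a copy so the original import stays untouched.
notypos <- master

# levels lists only the valid codes; any other value (here, 3) becomes NA.
# labels attaches readable names to those codes in the same call.
notypos$Sex <- factor(notypos$Sex,
                      levels = c(1, 2),
                      labels = c("Men", "Women"))

# Re-run table to confirm the bad value is gone (it now counts as NA).
table(notypos$Sex, useNA = "ifany")
```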

13 Accuracy Check for typos with the summary function: – summary(dataset name) – This example: summary(notypos) – Useful for continuous variables with a specific range.

14 Accuracy Interpret the output: – Check for high and low values in minimum and maximum – Grade/Absences – high value of 35 – RS1, 3, 7, 13 have values out of range Note: the range of the values would be something you would know because you made the dataset.
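The minimum and maximum check might look like this on a toy version of notypos. The columns and values are made up to mimic the problems described above; the use of sapply with range is an addition for compactness, not something from the slides.

```r
# Hypothetical toy version of notypos with a few out-of-range values.
notypos <- data.frame(
  Grade    = c(9, 10, 35, 11),
  Absences = c(0, 2, 1, 35),
  RS1      = c(1, 7, 77, 4)   # RS items should fall between 1 and 7
)

# summary() prints min, quartiles, mean, and max for every column.
summary(notypos)

# range() applied to each column gives only the min (row 1) and max (row 2).
col_ranges <- sapply(notypos, range, na.rm = TRUE)
col_ranges
```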

15 Accuracy Fix those problems: – Find the original data and figure out what the point should be. – Or delete that data point. Do not delete the whole person, just the wrong data point.

16 Accuracy How to set a range of scores to a value: – Use the logical operators to find those values. – Then set those values to a specific number. – Remember matrix form [ ROW, COLUMN ]

17 Accuracy First, find the bad data: – I checked what was going on with grades and absences by running the table function. – It appears that they just have two typos. – Let’s get rid of those.

18 Accuracy Find those typos: – We look for typos BY ROW (i.e. you are scanning each participant to check for the bad score). – Therefore, the code to check for the typo goes in the ROW spot. – notypos[ notypos$Grade > 34, ]

19 Accuracy Now, we want to fix ONLY that bad score. – You will want to say fix only that column. – Column information goes in the COLUMN section. – notypos[ notypos$Grade > 34, "Grade" ] See how the two match?

20 Accuracy Now, let’s set it to something – usually this value will be NA. – notypos[ notypos$Grade > 34, "Grade" ] = NA Do the same thing for absences: – notypos[ notypos$Absences > 34, "Absences" ] = NA
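The find-then-fix steps for Grade and Absences can be sketched end to end on made-up data. One deviation from the slide code is labeled in the comments: the assignment is wrapped in which(), because assigning into a data frame with a logical row index that contains NA raises an error, and which() drops those NA comparisons.

```r
# Hypothetical toy version of notypos: one bad Grade, one bad Absences value.
notypos <- data.frame(
  Sex      = c("Men", "Women", "Women", "Men"),
  Grade    = c(9, 35, 11, 10),
  Absences = c(2, 0, 35, 1)
)

# Step 1: the logical test in the ROW slot returns whole rows with a bad Grade.
notypos[notypos$Grade > 34, ]

# Step 2: adding the column name narrows the result to the bad cell only.
notypos[notypos$Grade > 34, "Grade"]

# Step 3: overwrite only that cell with NA; the rest of the person's data stays.
# which() is an addition to the slide code: it keeps the assignment working
# even when the column already contains NA values.
notypos[which(notypos$Grade > 34), "Grade"] <- NA
notypos[which(notypos$Absences > 34), "Absences"] <- NA

notypos
```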

21 Accuracy Since the RS columns all have the same rules, we can fix those all at the same time. – Figure out what columns you want to replace using numbers. – notypos[, 6:19]

22 Accuracy Then, figure out the logical rules: – notypos[, 6:19] > 7 Now, put those two together – notypos[, 6:19][ notypos[, 6:19] > 7 ] Now, set them equal to something – notypos[, 6:19][ notypos[, 6:19] > 7 ] = NA Note: just do the last step, I did it one part at a time to show you what was happening.
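The same idea applied to several columns at once, as a runnable sketch. The toy data has only three RS items in columns 2 to 4, so the column positions differ from the 6:19 used on the slides.

```r
# Hypothetical toy data: an ID column plus three RS items (valid range 1-7).
notypos <- data.frame(
  ID  = 1:4,
  RS1 = c(1, 7, 77, 4),
  RS2 = c(3, 5, 6, 2),
  RS3 = c(8, 2, 4, 7)
)

# Column positions of the RS items in this toy data (the slides use 6:19).
rs_cols <- 2:4

# A logical matrix the same shape as the RS block: TRUE where a score exceeds 7.
too_high <- notypos[, rs_cols] > 7
too_high

# Replace only the flagged cells with NA, leaving every other score alone.
notypos[, rs_cols][too_high] <- NA

notypos
```

A matching rule for scores below the scale minimum (for example, less than 1) could be added the same way if the summary output shows low values.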

23 Accuracy-ish Means and SDs are useful to think about. – You want to make sure it’s the data you expect. If a mean is 1.2 on a 1-7 scale, that’s a good sign nearly everyone picked 1. – SDs indicate the spread of the data – very large spreads (lots of error) and very small spreads (no variance) can be bad for you. Remember that depends on the scale of the data.

24 Accuracy Let’s use apply! (note: not tapply). The apply function is very similar to tapply, but it calculates functions by row or column in a data frame. – Remember that tapply calculates ONLY on one column at a time with a grouping variable.

25 Accuracy apply(data set name, 1 OR 2, FUN) – Data set name is the data set you are interested in calculating on. – 1 is for rows, 2 is for columns – FUN = function, mean/sd/etc. apply(notypos, 2, mean) apply(notypos, 2, sd)

26 Accuracy Why is it sad? – First, we tried to take an average of factor variables. Which is no good. – Let’s drop the factor columns. The minus sign says everything BUT these columns. – notypos[, -c(1,3)]

27 Accuracy apply(notypos[, -c(1,3)], 2, mean) Second problem! All those NA values. apply passes any extra arguments through to the function it calls, so we can add na.rm. – apply(notypos[, -c(1,3)], 2, mean, na.rm = TRUE) – apply(notypos[, -c(1,3)], 2, sd, na.rm = TRUE)
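Putting the apply steps together on a small made-up data frame. Here only column 1 is a factor, so the sketch drops -1 rather than the -c(1,3) used on the slides.

```r
# Hypothetical toy data: one factor column and two numeric columns with NAs.
notypos <- data.frame(
  Sex   = factor(c("Men", "Women", "Women", "Men")),
  Grade = c(9, NA, 11, 10),
  RS1   = c(1, 7, NA, 4)
)

# Drop the factor column by position; the minus sign means "everything but".
numeric_only <- notypos[, -1]

# Without na.rm, any column that contains an NA returns NA.
apply(numeric_only, 2, mean)

# Extra arguments after FUN are handed to mean() and sd() for every column.
col_means <- apply(numeric_only, 2, mean, na.rm = TRUE)
col_sds   <- apply(numeric_only, 2, sd, na.rm = TRUE)

col_means
col_sds
```

For column means specifically, base R's colMeans(numeric_only, na.rm = TRUE) gives the same result; apply is the more general tool because it accepts any function.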

