SFB stats workshop Bodo Winter
usefulness
learning outcomes
Two learning curves (feel free to ask me questions about BOTH R and stats)
Plan for today: Introductory remarks; Describing data; Introduction to R; Inferential stats (t-test, chi-square test). Wednesday: Linear models
Course website http://bodowinter.com/SFB/ Email?
Math-assisted thinking “Statistics, more than most other areas of mathematics, is just formalized common sense, quantified straight thinking.” Paulos (1992: 58) Paulos, J. A. (1992). Beyond numeracy: Ruminations of a numbers man. New York: Vintage Books.
The Research Cycle: Theory/Hypothesis → Data collection → Preprocessing/Data Preparation → Statistical Analysis → Write-up → Publish paper, data and scripts. ALL OF THAT IS STATISTICS
Statistics = “getting meaning from data” (Michael Starbird). Descriptive Statistics, Inferential Statistics. Statistics is a COGNITIVE TOOL, a way to assist thinking. Certain things follow from this perspective, chiefly that if you do some type of stats that you don’t fully understand, you are essentially working against the purpose of doing statistics.
201 214 198 223 126 109 117 130
Descriptive Stats: male 109 117 126 130 (M = 120.5 Hz, SD = 9.39 Hz); female 198 201 214 223 (M = 209 Hz, SD = 11.63 Hz). Inferential Stats: p < 0.001
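The numbers on this slide can be reproduced with a few lines of R. This is a minimal sketch: the eight pitch values come from the slide, but the slide does not say which test produced p < 0.001, so the Welch two-sample t-test used here (R's default for `t.test`) is an assumption, and the variable names are made up for illustration.

```r
# Voice pitch values (Hz) from the slide, split by speaker sex
male   <- c(109, 117, 126, 130)
female <- c(198, 201, 214, 223)

# Descriptive stats: summarise each sample by its mean and standard deviation
mean_male   <- mean(male)
sd_male     <- sd(male)
mean_female <- mean(female)
sd_female   <- sd(female)

# Inferential stats: Welch two-sample t-test
# (assumption: the slide does not name the test behind its p-value)
pitch_test <- t.test(female, male)
p_value    <- pitch_test$p.value

print(c(mean_male = mean_male, sd_male = sd_male))
print(c(mean_female = mean_female, sd_female = sd_female))
print(p_value < 0.001)
```

The means and standard deviations should agree with the slide up to rounding in the second decimal place.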
Describing distributions: 1 throw, 2 throws, 3 throws, 4 throws, 5 throws, 6 throws, 7 throws, 8 throws, 30 throws. Frequency distribution, probability distribution
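The step from a frequency distribution to a probability distribution can be sketched in R. The slides do not say what is being thrown, so a fair six-sided die is assumed here, and the seed is arbitrary.

```r
# Assumption: each "throw" is one roll of a fair six-sided die
set.seed(1)
n_throws <- 30
throws <- sample(1:6, size = n_throws, replace = TRUE)

# Frequency distribution: how often each face was observed
# (factor() keeps faces that never came up, with a count of 0)
freq <- table(factor(throws, levels = 1:6))

# Relative frequencies: observed counts divided by the number of throws
rel_freq <- prop.table(freq)

# Probability distribution: what we expect in the long run for a fair die
prob <- rep(1 / 6, 6)

print(freq)
print(round(rel_freq, 2))
print(round(prob, 2))
```

With only 30 throws the relative frequencies will usually be uneven; increasing `n_throws` should bring them closer to 1/6 each.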
inspired by the Cartoon Guide to statistics
Uniform Distribution
“Gaussian” Normal Distribution
Ways continuous distributions differ: Location (mean), Spread (standard deviation), Shape
200 Response Times: Mean = 304.79, 404.79, 504.79, 604.79, 704.79
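Changing the mean moves a distribution along the axis without changing its spread. The following sketch illustrates this with simulated response times; the data are hypothetical (drawn from a normal distribution with an arbitrary seed), not the 200 values shown on the slides.

```r
# Hypothetical response times: 200 simulated values, NOT the slide's data
set.seed(42)
rt <- rnorm(200, mean = 300, sd = 50)

# Shift every value by the same constant (100 ms)
rt_shifted <- rt + 100

# The mean moves by exactly the size of the shift ...
mean_change <- mean(rt_shifted) - mean(rt)

# ... while the standard deviation (spread) stays the same
sd_original <- sd(rt)
sd_shifted  <- sd(rt_shifted)

print(mean_change)
print(c(sd_original = sd_original, sd_shifted = sd_shifted))
```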
Voice pitch of 100 men Voice pitch of 100 women
The mean is a balance point The median is a half-way point
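A small made-up example shows the difference between the two measures. The values below are invented for illustration; the point is that deviations from the mean always sum to zero (the balance point), whereas the median only cares about which value sits in the middle.

```r
# Made-up values for illustration
x         <- c(2, 3, 4, 5, 6)
x_outlier <- c(2, 3, 4, 5, 60)   # same data, but the largest value is extreme

# Mean as balance point: deviations above and below the mean cancel out
balance <- sum(x_outlier - mean(x_outlier))

# The extreme value pulls the mean upwards ...
mean_x         <- mean(x)
mean_x_outlier <- mean(x_outlier)

# ... but the median (half-way point) does not move
median_x         <- median(x)
median_x_outlier <- median(x_outlier)

print(c(mean = mean_x, median = median_x))
print(c(mean = mean_x_outlier, median = median_x_outlier))
```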
Ways continuous distributions differ: Location (mean), Spread (standard deviation), Shape
Voice pitch of 100 men: Variance = 97.08. Voice pitch of 100 women: Variance = 386.22
How to calculate the variance
Raw Data: 6 3 2 5 4
Mean of Data: 4
Differences: 2 -1 -2 1 0
Squared Differences: 4 1 4 1 0
Sum this and divide by N-1 to get the variance
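The steps on this slide can be followed one by one in R, using the slide's raw data. The variable names are made up for illustration; the last line compares the hand calculation with R's built-in `var()`.

```r
# Raw data from the slide
x <- c(6, 3, 2, 5, 4)

# Step 1: mean of the data
x_mean <- mean(x)

# Step 2: difference of each value from the mean
diffs <- x - x_mean

# Step 3: squared differences
sq_diffs <- diffs^2

# Step 4: sum of squares, divided by N - 1
sum_of_squares <- sum(sq_diffs)
variance <- sum_of_squares / (length(x) - 1)

print(diffs)
print(sq_diffs)
print(variance)

# The built-in function should give the same result
print(all.equal(variance, var(x)))
```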
Formula for the variance: $s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$, that is, take the sum of the squared differences from the mean (the “sum of squares”) and divide by the total number of values minus one
Voice pitch of 100 men: SD = 9.85. Voice pitch of 100 women: SD = 19.65
Variance and standard deviation: the standard deviation is the square root of the variance
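The relationship between the two measures can be checked against the voice pitch figures from the earlier slides. Only the two variances (97.08 and 386.22) are taken from the slides; the small vector at the end is made up to show that `sd()` and `sqrt(var())` agree.

```r
# Variances reported on the voice pitch slides
var_men   <- 97.08
var_women <- 386.22

# The standard deviation is the square root of the variance
sd_men   <- sqrt(var_men)
sd_women <- sqrt(var_women)

print(round(c(sd_men = sd_men, sd_women = sd_women), 2))

# Same relationship on a small made-up data set
x <- c(6, 3, 2, 5, 4)
print(all.equal(sd(x), sqrt(var(x))))
```

Taking the square root brings the measure back to the original unit (Hz rather than squared Hz).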
For normally distributed data, about 68% of the data lie within 1 standard deviation of the mean
Next time you read a paper... Between what values do you expect 68% of the data? What about 95% of the data?
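One way to answer these questions for a reported mean and standard deviation is sketched below. It assumes the data are approximately normally distributed, and it reuses M = 120.5 Hz and SD = 9.39 Hz from the earlier voice pitch slide purely as an example.

```r
# Proportion of a normal distribution within 1 and 2 SDs of the mean
p_within_1sd <- pnorm(1) - pnorm(-1)
p_within_2sd <- pnorm(2) - pnorm(-2)

# Exact multiplier for the central 95% (close to 2)
z95 <- qnorm(0.975)

# Example values, taken from the male voice pitch slide
m <- 120.5
s <- 9.39

# Expected ranges, ASSUMING the data are roughly normal
range68 <- c(lower = m - s,       upper = m + s)
range95 <- c(lower = m - z95 * s, upper = m + z95 * s)

print(round(c(p_within_1sd, p_within_2sd), 3))
print(round(range68, 2))
print(round(range95, 2))
```

As a rule of thumb when reading a paper: mean plus or minus 1 SD for roughly 68% of the data, mean plus or minus 2 SDs for roughly 95%.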
The normal distribution family http://en.wikipedia.org/wiki/Normal_distribution
Ways continuous distributions differ: Location (mean), Spread (standard deviation), Shape
Normal Distribution
A distribution with positive skew
A distribution with negative skew
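Skew can be explored with simulated data. The sketch below uses an exponential distribution as a convenient example of positive skew; this choice, the sample size, and the seed are assumptions for illustration and do not come from the slides.

```r
# Assumption: exponential data as an example of a positively skewed distribution
set.seed(123)
pos_skew <- rexp(1000, rate = 1)

# Mirroring the values produces a negatively skewed distribution
neg_skew <- -pos_skew

# With positive skew, the long right tail pulls the mean above the median
pos_mean_above_median <- mean(pos_skew) > median(pos_skew)

# With negative skew, the long left tail pulls the mean below the median
neg_mean_below_median <- mean(neg_skew) < median(neg_skew)

print(c(mean = mean(pos_skew), median = median(pos_skew)))
print(c(mean = mean(neg_skew), median = median(neg_skew)))
```

Calling `hist(pos_skew)` and `hist(neg_skew)` should show the two shapes side by side.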
reproducible, open research; free, open-source, platform-independent; ever-growing community; fully-fledged programming language. STUDENTS!!! You would think SPSS is easier for students. I beg to differ: I myself had no programming experience and found SPSS highly unintuitive. Plus, restricting the software to university computers raises the threshold to actually use it; students are MUCH more likely to use something that they can run on their own computer. And although they find R daunting at first, they quickly feel like they are doing something super fancy, because what they do LOOKS and FEELS like programming (it isn’t). If you don’t believe me: I’ve taught R in content classes (a class on gesture) and even to literature students, and it worked.
R packages on CRAN
base package versus tidyverse (Hadley Wickham)
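To give a feel for the two styles, here is the same group-wise mean computed both ways. This is only a sketch: the data frame reuses the voice pitch values from the earlier slide, the column names are made up, and the tidyverse part runs only if the dplyr package happens to be installed.

```r
# Voice pitch values from the earlier slide, in a data frame
# (column names 'sex' and 'hz' are made up for this example)
pitch <- data.frame(
  sex = rep(c("male", "female"), each = 4),
  hz  = c(109, 117, 126, 130, 198, 201, 214, 223)
)

# Base R: mean pitch per group with tapply()
base_means <- tapply(pitch$hz, pitch$sex, mean)
print(base_means)

# tidyverse (dplyr): the same summary, skipped if dplyr is not installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  tidy_means <- dplyr::summarise(
    dplyr::group_by(pitch, sex),
    mean_hz = mean(hz)
  )
  print(tidy_means)
}
```

Both approaches give the same numbers; they differ mainly in syntax and in the shape of the result (a named vector versus a data frame).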
Approaching R: Having the right attitude “I have been writing R code for years, and every day I still write code that doesn’t work!” Wickham & Grolemund (2017: 7) Wickham, H. & Grolemund, G. (2017). R for Data Science. Sebastopol, CA: O’Reilly.