DCAL Stats Workshop Bodo Winter
Outline Two learning curves Friday Jan 19 Saturday Jan 20
What is statistics? “Math-assisted thinking” “Statistics, more than most other areas of mathematics, is just formalized common sense, quantified straight thinking.” Paulos (1992: 58) Paulos, J. A. (1992). Beyond numeracy: Ruminations of a numbers man. New York: Vintage Books.
Statistics is part of the entire research cycle: Theory/Hypothesis → Data collection → Preprocessing/Data Preparation → Statistical Analysis → Write-up → Publish paper, data and scripts. ALL OF THAT IS STATISTICS.
“Confirmatory statistics” = hypothesis-testing. “Exploratory statistics” = hypothesis-generating.
“Getting meaning from data” (Michael Starbird): Descriptive Statistics; Inferential Statistics
Word        Emotional Valence
minty       +1.52
juicy       +1.56
smelly      -1.87
sweet       +2.12
putrid      -1.78
delicious   +1.82
stinky      -1.49
rancid      -2.11
Winter (2016), Language, Cognition and Neuroscience
The same words sorted by valence, with the mean (M) of each group:
Word        Emotional Valence
sweet       +2.12
delicious   +1.82
juicy       +1.56
minty       +1.52
            M = 1.8
stinky      -1.49
putrid      -1.78
smelly      -1.87
rancid      -2.11
            M = -1.8
Winter (2016), Language, Cognition and Neuroscience
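The two group means can be reproduced in R. This is only a sketch: the values are copied from the table, while the variable names and the idea of grouping by the sign of the score are assumptions made for illustration, not something taken from Winter (2016).

```r
# Valence values copied from the table (Winter, 2016)
valence <- c(sweet = 2.12, delicious = 1.82, juicy = 1.56, minty = 1.52,
             stinky = -1.49, putrid = -1.78, smelly = -1.87, rancid = -2.11)

# Hypothetical grouping variable: the sign of the valence score
group <- ifelse(valence > 0, "positive", "negative")

# Mean per group, rounded to one decimal as on the slide
group_means <- round(tapply(valence, group, mean), 1)
print(group_means)
```

Two numbers (M = 1.8 and M = -1.8) now summarise eight data points, which is the basic job of descriptive statistics.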
Everything is grounded in the notion of a “distribution”
Examples: “uniform distribution”; “normal distribution” (also called “Gaussian distribution”); “distribution with positive skew”. Inspired by Cartoon Guide to Statistics
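To get a feel for these shapes, one option is to simulate them in R. The sketch below is not from the slides: the seed, the sample size, and the choice of an exponential distribution as an example of positive skew are all assumptions.

```r
set.seed(1)                               # arbitrary seed, only for reproducibility
n <- 10000                                # arbitrary sample size

uniform_x <- runif(n, min = 0, max = 1)   # "uniform distribution"
normal_x  <- rnorm(n, mean = 0, sd = 1)   # "normal" / "Gaussian distribution"
skewed_x  <- rexp(n, rate = 1)            # one possible example of positive skew

# Histograms show the three shapes side by side
par(mfrow = c(1, 3))
hist(uniform_x, main = "uniform")
hist(normal_x, main = "normal (Gaussian)")
hist(skewed_x, main = "positive skew")
```

In the positively skewed sample the long right tail pulls the mean above the median, which you can check with mean(skewed_x) and median(skewed_x).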
Ways continuous distributions differ: Location (Mean, Median, Mode); Spread (Range, Variance, Standard deviation, Inter-Quartile Range); Shape
Differences in location: two distributions plotted on an axis from -4 to +4, one with M = 0.2 and one with M = -0.6. Warriner et al. (2013), Behavior Research Methods
Differences in location: the mean is the sum of all the numbers (from the first number to the nth number) divided by how many numbers you have. Warriner et al. (2013), Behavior Research Methods
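Written as a formula, the verbal description of the mean looks as follows. The symbols are the conventional ones and are an assumption here, since the slide text does not preserve its own notation: x_i is the i-th number, N is how many numbers you have, and x̄ is the mean.

```latex
\bar{x} = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i
```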
Example: the mean of three response times: 300 ms, 200 ms, 400 ms. Sum: 300 + 200 + 400 = 900. Divided by N: 900 / 3 = 300 ms.
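The same arithmetic can be checked in R. The variable names below are made up for illustration.

```r
rts <- c(300, 200, 400)   # the three response times in ms

total <- sum(rts)         # 300 + 200 + 400 = 900
n <- length(rts)          # how many numbers: 3
total / n                 # the mean by hand: 300
mean(rts)                 # the built-in function gives the same value
```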
The mean is a “balance point”. The median is a “half-way point”.
The median splits the data in half: 50% of the values lie below it and 50% above it.
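A small sketch of why the distinction between “balance point” and “half-way point” matters. The response times are hypothetical, chosen so that one extreme value is easy to spot.

```r
rts <- c(200, 300, 400)        # hypothetical response times in ms
rts_outlier <- c(rts, 2000)    # same data plus one very slow response

mean(rts)                      # 300
median(rts)                    # 300
mean(rts_outlier)              # 725: the balance point is pulled towards the extreme value
median(rts_outlier)            # 350: the half-way point moves only slightly
```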
Differences in spread: range, here from -2.11 to +1.56 (axis from -4 to +4). Warriner et al. (2013), Behavior Research Methods
Differences in spread: standard deviation, here SD = 1.21 (axis from -4 to +4). Warriner et al. (2013), Behavior Research Methods
Building up the standard deviation step by step (Warriner et al. (2013), Behavior Research Methods):
1. the mean
2. differences from the mean
3. squared differences from the mean
4. sum of squared differences from the mean
5. conceptually: “average” of the squared differences from the mean
6. conceptually: “undoing” the squaring (taking the square root)
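The steps above can be followed one at a time in R. This is a sketch with hypothetical numbers (the three response times from the earlier example). One assumption to flag: the “average” step divides by N - 1 rather than N, because that is what R's built-in sd() does.

```r
x <- c(300, 200, 400)               # hypothetical numbers

m <- mean(x)                        # the mean
diffs <- x - m                      # differences from the mean
sq_diffs <- diffs^2                 # squared differences from the mean
ss <- sum(sq_diffs)                 # sum of squared differences
avg_sq <- ss / (length(x) - 1)      # "average" (dividing by N - 1, as sd() does)
sd_by_hand <- sqrt(avg_sq)          # "undoing" the squaring

sd_by_hand                          # 100
sd(x)                               # the built-in function gives the same value
```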
You can think of the standard deviation conceptually as the “average deviation” from the mean.* (* It is not technically the average deviation, but the basic idea is right.) Warriner et al. (2013), Behavior Research Methods
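For reference, one common way to write the standard deviation as a formula is the sample standard deviation below. The notation is an assumption, since the slide text does not preserve its own: x_i is the i-th number, x̄ is the mean, and N is how many numbers you have. The division by N - 1 matches what R's sd() computes.

```latex
s = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2}
```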
Differences in spread: SD = 1.21 versus SD = 0.41 (axis from -4 to +4). Warriner et al. (2013), Behavior Research Methods
The 68%-95% rule of thumb
If the distribution is approximately normal:
68% of the data fall within the interval [ mean - SD, mean + SD ]
95% of the data fall within the interval [ mean - 2 * SD, mean + 2 * SD ]
The 68%-95% rule of thumb, worked example: Imagine a paper reports these two numbers: M = 600 ms, SD = 50 ms.
Between which two numbers do you expect 68% of the data? 550 ms – 650 ms
Between which two numbers do you expect 95% of the data? 500 ms – 700 ms
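The two intervals, and the percentages behind the rule of thumb, can be checked in R. The variable names are made up for illustration. pnorm() gives the proportion of a normal distribution below a value, so the calculation assumes an exactly normal distribution, which real data only approximate.

```r
m <- 600                                  # reported mean in ms
s <- 50                                   # reported SD in ms

int68 <- c(m - s, m + s)                  # 550 650
int95 <- c(m - 2 * s, m + 2 * s)          # 500 700

# Proportion of a normal distribution that falls inside each interval
p68 <- pnorm(int68[2], mean = m, sd = s) - pnorm(int68[1], mean = m, sd = s)
p95 <- pnorm(int95[2], mean = m, sd = s) - pnorm(int95[1], mean = m, sd = s)
round(c(p68, p95), 3)                     # 0.683 0.954
```

The exact values are slightly different from 68% and 95%, which is why this is a rule of thumb.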
In R, computing all of this is easy...
mean(yournumbers)
sd(yournumbers)
median(yournumbers)
range(yournumbers)
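A runnable version needs some numbers to work with. The sketch below fills yournumbers with the eight valence values from the word table earlier in the slides. The last two lines go beyond the slide: var() and IQR() are base R functions for the variance and the inter-quartile range listed among the measures of spread.

```r
# Valence values from the word table (Winter, 2016)
yournumbers <- c(1.52, 1.56, -1.87, 2.12, -1.78, 1.82, -1.49, -2.11)

mean(yournumbers)     # location: the "balance point"
sd(yournumbers)       # spread: standard deviation
median(yournumbers)   # location: the "half-way point"
range(yournumbers)    # smallest and largest value

var(yournumbers)      # variance (the squared standard deviation)
IQR(yournumbers)      # inter-quartile range
```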
Approaching R: Having the right attitude “I have been writing R code for years, and every day I still write code that doesn’t work!” Wickham & Grolemund (2017: 7) Wickham, H. & Grolemund, G. (2017). R for Data Science. Sebastopol, CA: O’Reilly.