Introduction to Statistics for the Social Sciences SBS200 - Lecture Section 001, Fall 2018 Room 150 Harvill Building 10:00 - 10:50 Mondays, Wednesdays & Fridays. Welcome 10/15/18 http://www.youtube.com/watch?v=oSQJP40PcGI
Seating chart for Harvill 150 (renumbered): Rows A through P, with the screen, lecturer's desk, projection booth, table, and left-handed desks marked. Names on the chart: Alyson, Chris, Flo, Jun, Trey.
The Green Sheets
Schedule of readings
Before next exam (November 16th):
Please read chapters 1 - 11 in OpenStax textbook
Please read Chapters 2, 3, and 4 in Plous
Chapter 2: Cognitive Dissonance
Chapter 3: Memory and Hindsight Bias
Chapter 4: Context Dependence
Lab sessions Presentations Project 2
Confidence Intervals: What are they for?
We are estimating a value by providing two scores between which we believe the true value lies. We can be 95% confident that the true mean falls between these two scores.
Building towards confidence intervals: combining these three skills
1) Knowing what confidence intervals are most useful for
2) Finding scores that border the middle 95% of the curve, using standardized scores (z scores) to find the raw score values that border the middle part of the curve
3) Using what we know about the Central Limit Theorem to use the standard error of the mean rather than the standard deviation. The Central Limit Theorem highlights the importance of the standard error of the mean (which is built on the standard deviation).
Central Limit Theorem
Central Limit Theorem
Proposition 1: If sample size (n) is large enough (e.g. 100), the mean of the sampling distribution will approach the mean of the population. As n ↑, x̄ will approach µ.
Proposition 2: If sample size (n) is large enough (e.g. 100), the sampling distribution of means will be approximately normal, regardless of the shape of the population. As n ↑, the curve will approach a normal shape.
Proposition 3: The standard deviation of the sampling distribution (the standard error of the mean, SEM) equals the standard deviation of the population divided by the square root of the sample size: SEM = σ / √n. As n ↑, SEM decreases and the curve's variability gets smaller.
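Proposition 3 can be checked with a few lines of arithmetic. This is only a minimal sketch: the population standard deviation of 10 and the sample sizes are made-up illustration values, and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return sigma / math.sqrt(n)


# Assumed population SD of 10, with growing sample sizes.
sems = {n: standard_error(10, n) for n in (4, 25, 100, 400)}

for n, sem in sems.items():
    print(f"n = {n:>3}  SEM = {sem}")
```

Each time n is quadrupled, the SEM is cut in half, which is why the sampling distribution gets narrower as samples get larger.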
Sampling distribution: a theoretical probability distribution of the possible values of some sample statistic that would occur if we were to draw an infinite number of same-sized samples from a population.
Frequency distributions of individual scores: derived empirically; we are plotting raw data; this is a single sample (each X is one person's score, e.g. Eugene, Melvin).
Sampling distributions of sample means: theoretical distribution; we are plotting means of samples (each X is one sample's mean, e.g. the 2nd sample, the 23rd sample).
Central Limit Theorem
Distribution of raw scores: an empirical probability distribution of the values from a sample of raw scores from a population.
Frequency distributions of individual scores: derived empirically; we are plotting raw data; this is a single sample.
Take a single score from the population (e.g. Eugene, Melvin) and plot it as an X. Repeat over and over.
Central Limit Theorem
Sampling distribution: a theoretical probability distribution of the possible values of some sample statistic that would occur if we were to draw an infinite number of same-sized samples from a population.
Important note: same-sized sample = "fixed n"
Sampling distributions of sample means: theoretical distribution; we are plotting means of samples.
Take a sample from the population and get its mean. Repeat over and over. The result is the distribution of means of samples.
Central Limit Theorem: If random samples of a fixed N are drawn from any population (regardless of the shape of the population distribution), as N becomes larger, the distribution of sample means approaches normality, with the overall mean approaching the theoretical population mean.
Figure: Distribution of Raw Scores (each X is one individual, e.g. Melvin, Eugene) beside the Sampling Distribution of Sample Means (each X is one sample, e.g. the 2nd sample, the 23rd sample).
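The "take sample, get mean, repeat over and over" procedure can be imitated with a short simulation. This is a rough sketch, not part of the lecture: the right-skewed population (exponential scores with a mean near 100), the seed, and the number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed skewed population: 10,000 exponential scores with mean near 100.
population = [random.expovariate(1 / 100) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 100              # fixed sample size ("fixed n")
num_samples = 2_000  # stands in for "repeat over and over"

# Take a sample (with replacement), get its mean, repeat.
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should sit near mu
sd_of_means = statistics.pstdev(sample_means)   # should sit near sigma / sqrt(n)
predicted_sem = sigma / n ** 0.5

# Share of sample means within 1.96 SEM of mu; about 0.95 if roughly normal.
share_within = sum(
    abs(m - mu) <= 1.96 * predicted_sem for m in sample_means
) / num_samples

print(f"mu = {mu:.2f}, mean of sample means = {mean_of_means:.2f}")
print(f"predicted SEM = {predicted_sem:.2f}, SD of sample means = {sd_of_means:.2f}")
print(f"share within 1.96 SEM of mu = {share_within:.3f}")
```

Even though the raw scores are strongly skewed, the sample means should cluster symmetrically around the population mean, with a spread close to σ / √n.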
Central Limit Theorem
Population: µ = 100. Take sample, get mean. Repeat over and over.
Question: What if we made our sample absurdly large, as large as possible? What if our sample size was HUGE, as large as the whole population?
What is the variability? It is zero! Every sample would contain the whole population, so every sample mean would equal µ = 100, and the distribution of means of samples would collapse to the single value 100.
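The zero-variability answer can be seen directly by letting each "sample" be the entire population. This is a small sketch under stated assumptions: the population of 500 whole-number scores is invented, and each sample is drawn without replacement so that it contains every score exactly once.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Assumed small population of whole-number scores.
population = [random.randint(60, 140) for _ in range(500)]
mu = statistics.mean(population)

# Each "sample" is the whole population, drawn without replacement,
# so only the order of the scores changes from sample to sample.
sample_means = [
    statistics.mean(random.sample(population, k=len(population)))
    for _ in range(50)
]

spread = statistics.pstdev(sample_means)
print(f"population mean = {mu}, SD of the sample means = {spread}")
```

Every sample mean equals the population mean, so the standard deviation of the sample means is zero.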
Calculating Two Scores that Border Middle 95% of curve
Step 1: Find mean (expected value); x̄ is mean of sample
Step 2: Find standard deviation ("σ" if population)
Step 3: Decide on area of interest
- Middle 95%: α = .05, z = 1.96
- Middle 99%: α = .01, z = 2.58
Step 4: Calculations for confidence intervals
Step 5: Conclusion: tie findings with statement about what you are estimating
Find the scores that border the middle 95%
Mean = 50, Standard deviation = 10
x = mean ± (z)(standard deviation)
The middle .9500 of the curve splits into .4750 on each side of the mean.
1) Go to z table and find the z score for area .4750: z = 1.96
2) x = mean + (z)(standard deviation); x = 50 + (-1.96)(10); x = 30.4
3) x = mean + (z)(standard deviation); x = 50 + (1.96)(10); x = 69.6
Scores 30.4 - 69.6 capture the middle 95% of the curve.
Please note: We will be using this same logic for "confidence intervals".
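The z-table lookup and the two calculations above can be reproduced with Python's standard library. This is a sketch, not the method used in class: `NormalDist().inv_cdf` stands in for the printed z table, and `middle_area_bounds` is a hypothetical helper name.

```python
from statistics import NormalDist


def middle_area_bounds(mean, sd, area=0.95):
    """Return (low, high, z): the raw scores bordering the middle `area`."""
    # z score that leaves (1 - area) / 2 in each tail of the standard normal
    z = NormalDist().inv_cdf(0.5 + area / 2)
    return mean - z * sd, mean + z * sd, z


low, high, z = middle_area_bounds(50, 10)
print(round(z, 2), round(low, 1), round(high, 1))  # prints: 1.96 30.4 69.6
```

Passing `area=0.99` gives a z score that rounds to 2.58, matching the value listed for the middle 99%.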
Calculating Confidence Intervals
Step 1: Find mean (expected value); x̄ is mean of sample
Step 2: Find standard deviation and standard error of the mean ("σ" if population)
Step 3: Decide on level of confidence
- 95% Confident: α = .05, z = 1.96
- 99% Confident: α = .01, z = 2.58
Step 4: Calculations for confidence intervals
Step 5: Conclusion: tie findings with statement about what you are estimating
Construct a 95% confidence interval
Mean = 50, Standard deviation = 10, n = 100
For "confidence intervals": same logic, same z score. But we'll replace the standard deviation with the standard error of the mean.
standard error of the mean = σ / √n = 10 / √100 = 1
x = mean ± (z)(s.e.m.)
x = 50 + (-1.96)(1); x = 48.04
x = 50 + (1.96)(1); x = 51.96
The 95% Confidence Interval is captured by the scores 48.04 - 51.96.
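The same calculation can be written as a small function. This is a minimal sketch that assumes the population standard deviation is known; `confidence_interval` is a hypothetical helper name, and its default z of 1.96 is the 95% value from the steps above.

```python
import math


def confidence_interval(sample_mean, sigma, n, z=1.96):
    """Interval of mean ± (z)(s.e.m.), where s.e.m. = sigma / sqrt(n)."""
    sem = sigma / math.sqrt(n)
    return sample_mean - z * sem, sample_mean + z * sem


low, high = confidence_interval(50, 10, 100)
print(round(low, 2), round(high, 2))  # prints: 48.04 51.96

# 99% confidence uses z = 2.58 and gives a wider interval.
low99, high99 = confidence_interval(50, 10, 100, z=2.58)
print(round(low99, 2), round(high99, 2))  # prints: 47.42 52.58
```

Compared with the 30.4 - 69.6 range for raw scores, the interval for the mean is much narrower because the s.e.m. (1) is much smaller than the standard deviation (10).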
Construct a 95 percent confidence interval around the mean
Hint: always draw a picture!
raw score = mean ± (z score)(s.d.)
We used this one when finding raw scores associated with an area under the curve. We had all population info. Not really a "confidence interval" because we know the mean of the population, so there is nothing to estimate or be "confident about".
raw score = mean ± (z score)(s.e.m.)
Similar, but uses the standard error of the mean based on the population s.d. We used this to provide an interval within which we believe the mean falls. We have some level of confidence about our guess. We know the population standard deviation.
Construct a 95 percent confidence interval around the mean
The curve has its own mean (x̄) and its own variability (σx̄). Choose a confidence level (z).
raw score = mean ± (z score)(s.e.m.)
We used this one when finding an interval within which we believe the mean falls. We have some level of confidence about our guess. We know the population standard deviation.
Thank you! See you next time!!