GE 5 Tutorial 5
Rules of engagement:
- no computer or no power → no lesson
- no SPSS → no lesson
- no homework done → no lesson

Topics
- samples and populations
- significance for the correlation coefficient
- standard error
- SPSS

Content tutorial 5
1. Quiz
2. Discussion homework
3. Sampling and samples
4. Significance
5. Standard error
6. SPSS workshop
7. Discussion homework next week

1. Quiz

2. Discussion homework

Quick refresher - correlation - measures of association

Scatter plot
When we look at this scatterplot, what can we say about the correlation?

Levels of measurement and measures of association
(table with rows and columns: nominal, ordinal, interval or ratio, dichotomy; cell entries listed per row in slide order)
- nominal: Cramér's V or eta; Cramér's V
- ordinal: Cramér's V; Spearman rho; Spearman rho; Spearman rho
- interval or ratio: Cramér's V or eta; Spearman rho; Pearson r
- dichotomy: Cramér's V; Spearman rho; phi

3. Chapters 9, 10, 11 of Howitt & Cramer

Chapter 9: Samples & Populations
We talked about large and small correlations:
- 0.3 is small
- 0.8 is large
- -0.8 is just as large
But how large is an “interesting” correlation?
Statistical answer: when it is a “significant” correlation.
Significant = unlikely to have occurred by chance.

Example “This is not an organized demonstration, we just all happened to be wearing the same thing today” Probably not true, because unlikely to have happened by chance.

Chapter 9: Samples & Populations
The same question can be asked about samples and populations:
- we measure the enthusiasm about a TV show in a sample of 100 people
- we find much higher ratings than we expected
- how unlikely is it that we found this by chance?
- if it is unlikely enough, we conclude that the TV show is more popular than we thought.
Inferential statistics: how can we use a sample to draw conclusions that apply to the population?

DEFINITIONS
We are not necessarily talking about people, but about scores:
- age of husband/wife: each couple is a score
- “phone brands owned over lifetime”: multiple scores per person
Population: the entire set of scores that we are interested in
- so only the target group
Sample: the set of scores that we select for research
- because we simply cannot measure every score
- a good sample is a random sample

Samples
- Statistical inference generally assumes that samples are drawn randomly from the population.
- Random does not mean haphazard ("just doing something").
- Often you have to take special measures to give every score the same chance: if you are asking people in the street, don't only do this in the mornings.
- Random sample: “each score in the population has an equal chance of being selected”

Samples can approximate the population
Table 9.1 of the book has a population of 100 scores.
- Mean = 5.52
Table 9.2 has the means of 40 small random samples (size 5).
- The mean of sample means is 5.20, min: 2.00, max: 8.80
- SD: 1.60
- Sample means from small samples are not such good estimates of the population mean.

Larger samples
Table 9.3 has the means of 40 large samples (size 20) drawn from the same population. What can we observe?
- The mean of sample means is 5.33, min: 5.25, max: 6.85
- SD: 0.60
- A larger sample size gives a much better estimate of the population mean.

Conclusions
If we draw a random sample, the outcome will not be the same as the outcome in the population.
The mean of the sample will not be the same as the mean of the population.
But the larger the sample, the larger the chance that the sample outcome will be close to the population outcome.
As we saw:
- very small samples are not reliable
- larger samples can get close to the population values
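The effect of sample size can be tried out with a small simulation. This is only a sketch in Python, not the book's data: the population below is a hypothetical set of 100 scores, and the function name `sample_means` is made up for this illustration.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples and return the mean of each sample."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

rng = random.Random(123456)  # fixed seed so the run can be repeated

# Hypothetical population: 100 whole-number scores between 1 and 10.
population = [rng.randint(1, 10) for _ in range(100)]

small = sample_means(population, sample_size=5, n_samples=40, rng=rng)
large = sample_means(population, sample_size=20, n_samples=40, rng=rng)

sd_small = statistics.stdev(small)  # spread of the 40 small-sample means
sd_large = statistics.stdev(large)  # spread of the 40 large-sample means

print("population mean:", statistics.mean(population))
print("size 5  -> mean of means:", round(statistics.mean(small), 2), "SD:", round(sd_small, 2))
print("size 20 -> mean of means:", round(statistics.mean(large), 2), "SD:", round(sd_large, 2))
```

The means of the larger samples should scatter less around the population mean than the means of the small samples.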

SPSS random samples 10%
OPEN THE BIG SMARTIE SURVEY!
Clean up the results first: negative values and values of 100 or more need to be deleted.
- use Data; Select Cases; IF; weight > 0 & weight < 100
- choose “remove unselected cases” and press OK
Take a random sample (10% of the cases):
- Data; Select Cases; Random sample; 10 %
- choose “filter unselected cases”
Calculate descriptives.

Aside about computers and randomness
Computers cannot be really random (or creative).
We all got the same “random” sample.
True random numbers can be found at random.org.
You can “seed” the SPSS random numbers with your own starting number (I use 123456, use your birthday):
1) Go to Transform -> Random Number Generators, choose the Mersenne Twister and set a starting point (for example 123456)
2) OR open a syntax window (New; Syntax) and type and run this command: SET SEED 123456.
Then reselect all cases (Data; Select Cases; use all; OK)
And select another random 10% (Data; Select Cases; ...)
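The same idea (clean the data, then draw a seeded random sample) can be sketched outside SPSS. The Python below is an illustration only: the weights are hypothetical, the helper names `clean` and `ten_percent_sample` are made up, and it draws exactly 10% of the cases, which may differ from how SPSS selects its sample.

```python
import random

def clean(weights):
    """Keep only plausible values: greater than 0 and less than 100."""
    return [w for w in weights if 0 < w < 100]

def ten_percent_sample(scores, seed):
    """Return a random 10% of the scores; the same seed gives the same sample."""
    rng = random.Random(seed)
    k = max(1, round(len(scores) * 0.10))
    return rng.sample(scores, k)

# Hypothetical weights, including two invalid entries (-5 and 250).
weights = [-5, 250] + list(range(40, 90))

valid = clean(weights)                             # the 50 values from 40 to 89
first = ten_percent_sample(valid, seed=123456)
again = ten_percent_sample(valid, seed=123456)     # same seed as before
other = ten_percent_sample(valid, seed=19990412)   # a "birthday" seed

print("same seed, same sample:", first == again)
print("sample with seed 123456:", first)
print("sample with birthday seed:", other)
```

Using the same seed reproduces the sample, which is why everyone in the room got the same “random” cases.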

Distribution of means
In this (unrealistic) case, we can take 40 samples from the same population.
We can plot the means we obtained and we find:
- a normal distribution of sample means
- around the mean of means
- with a standard deviation
- which we call: standard error

Standard deviation: for cases and one sample
- the average distance from the cases to the mean of the sample
Standard error: for a number of samples and a mean of means
- the average distance from a sample's mean to the mean of means
But: we can hardly ever obtain more than one sample.
We also have a formula to estimate the standard error from ONE sample only.
Standard error: for one sample and a population mean (via formula, population mean unknown)
- the estimated 'average' distance from this sample mean to the population mean

The error in the measurement is called the standard error.
Every measurement has a standard error.
The smaller the standard error, the better the measurement.
A smaller standard error means a better estimate of the mean of the population!
So we could report: the sample mean of age in IMEM year 1 is 20 with a standard error of 2.
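As an illustration, the estimated standard error of a mean can be computed from a single sample as sd / √n, the formula this tutorial uses in its exam-mark example. The ages below are hypothetical and the function name is made up for this sketch.

```python
import math
import statistics

def estimated_standard_error(scores):
    """Estimate the standard error of the mean from ONE sample: sd / sqrt(n)."""
    sd = statistics.stdev(scores)  # sample standard deviation
    return sd / math.sqrt(len(scores))

# Hypothetical ages of nine students.
ages = [18, 19, 19, 20, 20, 20, 21, 21, 22]

mean_age = statistics.mean(ages)
se_age = estimated_standard_error(ages)

print("sample mean:", mean_age)
print("estimated standard error:", round(se_age, 2))
```

With more cases in the sample, √n grows and the estimated standard error shrinks.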

Confidence interval
What is a confidence interval?
- The confidence interval is a better way to report a mean and standard error.
- The confidence interval is a good way to understand significance.

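Confidence Interval (CI): report a sample mean
The sample mean of age in IMEM year 1 is 20 with a standard error of 2.
- The 68% CI of age was 18 – 22 years
- The 95% CI of age was 16 – 24 years ← most common
- The 99% CI of age was 14 – 26 years
These intervals use the rule of thumb of 1, 2 and 3 standard errors on either side of the sample mean.
Meaning of the 95% confidence interval (95% CI): if we drew many samples and computed the interval each time, about 95% of those intervals would contain the population mean. Informally: we are 95% confident that the population mean is between 16 and 24.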
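The three intervals on this slide follow from the mean and the standard error. A minimal Python sketch, assuming the rounded multipliers 1, 2 and 3 that the slide's numbers imply (the exact normal-distribution multipliers for 95% and 99% are about 1.96 and 2.58); the names are made up for this illustration.

```python
def confidence_interval(mean, standard_error, multiplier):
    """Return (lower, upper): mean minus/plus multiplier times the standard error."""
    margin = multiplier * standard_error
    return (mean - margin, mean + margin)

# Rounded multipliers per confidence level, as implied by the slide's numbers.
RULE_OF_THUMB = {68: 1, 95: 2, 99: 3}

intervals = {
    level: confidence_interval(mean=20, standard_error=2, multiplier=k)
    for level, k in RULE_OF_THUMB.items()
}

for level, (lower, upper) in intervals.items():
    print(f"{level}% CI of age: {lower} to {upper} years")
```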

Chapter 10: Statistical significance for the correlation coefficient
Remember what the null hypothesis (H0) and the alternative hypothesis (H1) are?
E.g.:
H0: there is no relationship between brain size and intelligence
H1: there is a relationship between brain size and intelligence
- What are the two variables in the H0 above?
- What does this imply for the correlation between them?
- What would a positive correlation imply?

Under H0
We measured brain size and IQ in 120 volunteers.
We compute the Pearson r and the standard error. Assume that the SE is 0.2.
Under H0 there is no association, so we should find r = 0.
Under H0, the 95% CI for r = [-0.4 ; +0.4]
If we find a value outside the 95% CI, we reject H0:
- sample r = 0.1 → H0 ok
- sample r = -0.3 → H0 ok
- sample r = 0.5 → reject H0, accept H1, there is a relationship
- sample r = -0.5 → reject H0, accept H1, there is an inverse relationship
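The decision rule on this slide can be written out as a small function. This is a sketch only: it assumes the slide's SE of 0.2 and the rounded multiplier of 2 for the 95% CI, and the function name `decide` and the wording of its return values are made up.

```python
def decide(sample_r, standard_error, multiplier=2):
    """Compare a sample correlation with the 95% CI around r = 0 under H0."""
    lower = -multiplier * standard_error
    upper = multiplier * standard_error
    if lower <= sample_r <= upper:
        return "H0 ok"
    if sample_r > upper:
        return "reject H0: there is a relationship"
    return "reject H0: there is an inverse relationship"

# The four sample correlations from the slide, with SE = 0.2.
for r in (0.1, -0.3, 0.5, -0.5):
    print(f"sample r = {r:+.1f} -> {decide(r, standard_error=0.2)}")
```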

ASIDE: How to compute the estimated standard error
Left: estimated SE for a mean
- computed from the standard deviation
- here they used 's' for the standard deviation of the sample; we have used sigma (σ) before.
Right: estimated SE for a correlation coefficient
Neither formula is on the exam.

Another example
H1: students in this year are smarter than students from last year
H0: this year is just as smart as last year
Last year’s average exam mark was 7.2
- H0: the average exam mark will again be 7.2
We take a sample of 64 students and find mean 7.7, standard deviation 1.6
- the average student is within 1.6 marks of 7.7
- estimated standard error = sd / √n = 1.6 / 8 = 0.2
Under H0, mean = 7.2, 95% CI = [6.8 ; 7.6]
The observed sample mean is outside the 95% CI.
We reject H0: this year is smarter!
What if we had found mean 7.5, standard deviation 1.6?
What if we had found mean 7.7, standard deviation 2.4?
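The arithmetic of this example, and of the two "what if" variants, can be checked with a short Python sketch. It assumes the same rule of thumb as the slide (95% CI = H0 mean ± 2 standard errors); the function name `check_against_h0` is made up for this illustration.

```python
import math

def check_against_h0(sample_mean, sd, n, h0_mean, multiplier=2):
    """Return (standard error, CI under H0, whether H0 is rejected)."""
    se = sd / math.sqrt(n)                 # estimated standard error
    lower = h0_mean - multiplier * se
    upper = h0_mean + multiplier * se
    reject = not (lower <= sample_mean <= upper)
    return se, (lower, upper), reject

cases = [
    (7.7, 1.6),  # the example on the slide
    (7.5, 1.6),  # what if the mean had been 7.5?
    (7.7, 2.4),  # what if the standard deviation had been 2.4?
]

results = [check_against_h0(mean, sd, n=64, h0_mean=7.2) for mean, sd in cases]

for (mean, sd), (se, (lower, upper), reject) in zip(cases, results):
    verdict = "reject H0" if reject else "keep H0"
    print(f"mean {mean}, sd {sd}: SE = {se:.1f}, "
          f"95% CI = [{lower:.1f} ; {upper:.1f}] -> {verdict}")
```

A smaller difference from 7.2 or a larger standard deviation can both bring the sample mean back inside the interval.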

Type I & II Errors
Finding a significant result is usually a good thing.
But keep this in mind: we could be making a Type I error.
- Type I: decide that H0 is false when it is true
- Type II: fail to reject H0 when it is false
Powerful statistical tests are those with a small chance of a Type II error.
You will never know whether you have made a Type I or Type II error.
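How often a Type I error happens can be estimated by simulation: draw many samples from a population in which H0 really is true and count how often the test still rejects H0. The Python below is a sketch under assumptions: normally distributed marks with mean 7.2 and standard deviation 1.6, samples of 64, and the rule of thumb of 2 standard errors; the function name is made up.

```python
import math
import random
import statistics

def type_one_error_rate(h0_mean, sd, n, trials, seed):
    """Share of samples that reject H0 although H0 is true for the population."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(trials):
        # H0 is true here: every score comes from a population with mean h0_mean.
        sample = [rng.gauss(h0_mean, sd) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        if abs(statistics.mean(sample) - h0_mean) > 2 * se:
            false_alarms += 1  # sample mean fell outside the 95% CI by chance
    return false_alarms / trials

rate = type_one_error_rate(h0_mean=7.2, sd=1.6, n=64, trials=2000, seed=123456)
print(f"H0 rejected in {rate:.1%} of the samples although H0 is true")
```

The rate should come out at roughly 5%, which is the chance of a Type I error that we accept when we work with a 95% CI.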

4. SPSS workshop

SPSS Topics
- Open the Big Smartie survey
- Use two scale type variables
- Use the correct correlation measure
- Define how strong the relationship is
- Check if the relationship is significant

SPSS Topics
- Take a random sample of 20% of the population
- Do the same measurement again
- Calculate your measurements again
- Repeat this for a 50% sample

SPSS Topics
- Open the Big Smartie survey
- Use two ordinal type variables
- Use the correct correlation measure
- Define how strong the relationship is
- Check if the relationship is significant

6. Homework
