1 Faculty of Social Sciences Induction Block: Maths & Statistics Lecture 5: Generalisability of Social Research and the Role of Inference Dr Gwilym Pryce.

2 Coin tossing experiment: is this a fair coin?

4 Coin tossing example:

5 Implications of fair coin experiment:
• If we want to survey a sample of people as a means of saying something about the relative size of a particular group in the population
  – e.g. the true proportion of owner-occupier (OO) households facing repayment difficulties
• it might take a large sample before the true value of a proportion emerges
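The point about needing a large sample can be illustrated with a small simulation. This is only a sketch: the function name, the seed and the number of tosses are illustrative choices, not part of the lecture.

```python
import random

def running_proportion_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin and record the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        proportions.append(heads / toss)
    return proportions

props = running_proportion_of_heads(10_000)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} tosses: proportion of heads = {props[n - 1]:.3f}")
```

With only a handful of tosses the proportion can sit well away from 0.5; it tends to settle near 0.5 only as the number of tosses grows.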

6 Social Research:
• We usually only have a sample
• from which we want to infer something about the population
  – proportion (e.g. % with mortgage payment protection insurance, MPPI)
  – mean (e.g. average income)
• I.e. we want to be able to ‘generalise’

7 Statistical Inference:
• allows us to generalise from our sample to the population in a systematic way:
  – Assumes that each member of the ‘population’ has an equal chance of entering our sample
  – If this assumption holds, Statistical Inference takes into account random variation from sample to sample.
  – Allows us to derive a ‘confidence interval’ for the population mean or proportion

8 CIs allow us to make the following types of statement:
• E.g. CI for a mean:
  – 95% sure that the average age of a homeless person in Glasgow is between 37 and 45 years.
• E.g. CI for a proportion:
  – 95% sure that the proportion of one-adult households with no children that have MPPI is between 25% and 28%

9
• We usually have different numbers of observations in different groups:
  ⇒ confidence intervals for the proportions & means based on those different groups will vary purely due to sample size
• we may have a very large overall sample
  – but may end up with very wide confidence intervals for the population mean or proportion for a particular group.
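A rough sketch of why group size matters: under the usual normal approximation, the half-width of a 95% CI for a mean is 1.96 × s / √n, so it shrinks only with the square root of the group size. The standard deviation and the group sizes below are hypothetical values chosen for illustration.

```python
import math

def ci_half_width(sample_sd, n, z=1.96):
    """Half-width of an approximate 95% CI for a mean (normal approximation)."""
    return z * sample_sd / math.sqrt(n)

assumed_sd = 12.0  # hypothetical standard deviation of age, in years
group_sizes = {"whole sample": 5000, "large subgroup": 800, "small subgroup": 40}

for label, n in group_sizes.items():
    width = ci_half_width(assumed_sd, n)
    print(f"{label:<15} n = {n:>5}: mean ± {width:.2f} years")
```

Quadrupling the group size only halves the width, which is why a small subgroup inside a very large survey can still give a wide interval.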

13
• Q: Given a sample of 111 households (HHs) with 1 adult and 3 children, of which 15.4% have MPPI, what do you think the 95% confidence interval would be for the population % with MPPI?

15 Why not use intuition?
• Q: in a group of 25 people, what is the probability that at least two of them will have the same birthday?
• Q: what’s the probability in a group of 60?
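The two birthday questions can be checked directly. The sketch below assumes 365 equally likely birthdays and ignores leap years (a simplifying assumption, not something stated on the slide); the function name is illustrative.

```python
def prob_shared_birthday(group_size, days=365):
    """Probability that at least two people in the group share a birthday."""
    p_all_different = 1.0
    for i in range(group_size):
        # the (i+1)-th person must avoid the i birthdays already taken
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

for size in (25, 60):
    print(f"group of {size}: P(shared birthday) = {prob_shared_birthday(size):.3f}")
```

Under these assumptions the probability is about 57% for 25 people and above 99% for 60, which is much higher than most people guess.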

17 How does inference work?
• Central Limit Theorem:
  – if we were able to take repeated samples, we would find that the means from each of those samples would be approximately normally distributed, provided the samples are large.
  – Similarly, if we take repeated samples and compute the sample proportion for each, we would find that the sample proportions would have an approximately normal distribution.

18 CLT: Distribution of means from repeated samples is approximately normal if n is large

19 E.g. Even though GDP per capita is not normally distributed, means from repeated samples are (approximately).

20 Samples: non-normal. Population: non-normal.

21 Sampling distribution of the mean: normal (i.e. means from repeated samples will be approximately normally distributed)
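A simulation sketch of the idea on slides 17 to 21. A skewed exponential population with mean 1 stands in for something like GDP per capita; that choice, the sample size of 50 and the 5,000 repetitions are all assumptions made for illustration.

```python
import math
import random
import statistics

def skewness(values):
    """Simple moment-based skewness: 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(1)  # arbitrary seed for repeatability
sample_size = 50
n_samples = 5_000

# Individual draws from the skewed population (exponential, mean 1, sd 1)
raw_draws = [rng.expovariate(1.0) for _ in range(n_samples)]

# Means of repeated samples drawn from the same population
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(n_samples)
]

print(f"skewness of individual values: {skewness(raw_draws):.2f}")
print(f"skewness of sample means:      {skewness(sample_means):.2f}")
print(f"mean of sample means:          {statistics.fmean(sample_means):.3f} (population mean is 1)")
print(f"sd of sample means:            {statistics.pstdev(sample_means):.3f} "
      f"(theory: {1 / math.sqrt(sample_size):.3f})")
```

The individual values are strongly skewed, while the sample means are far more symmetric and centre on the population mean.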

22 Mean of the sampling distribution of means = population mean (μ_x̄ = μ)

23 Normal Distribution and CLT:
• CLT ⇒ the distribution of sample means is normal
• Also: population mean = mean of all sample means
• These two properties allow us to compute confidence intervals because:
  – Statisticians have worked out the probabilities associated with the normal curve

24
• Suppose we know the sampling distribution of the mean:
  – we can then say where 95% of sample means lie: e.g. 95% of sample mean LTYs lie between 1.2 and 3.3
  – That is, about 1.1 either side of the population mean LTY, μ = 2.3

25 But, to say that the sample mean lies within 1.1 of μ is the same as saying that μ is within 1.1 of the sample mean.
  – So 95% of all samples will capture the true population mean in the interval (sample mean ± 1.1)
• Put another way, there are only 2 possibilities:
  – Either the interval (sample mean ± 1.1) contains μ
  – Or our sample was one of the few samples (i.e. one of the 5%) for which the sample mean is not within 1.1 of μ
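The claim that about 95% of samples capture the population mean can be checked by simulation. In this sketch the population mean of 2.3 echoes the LTY example, but the standard deviation, the sample size and the normal shape of the population are hypothetical, and the margin is computed as 1.96 × σ / √n with σ treated as known.

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, sample_size, n_samples, seed=0):
    """Share of repeated samples whose interval (sample mean ± margin) contains mu."""
    rng = random.Random(seed)
    margin = 1.96 * sigma / math.sqrt(sample_size)  # fixed margin, sigma assumed known
    captured = 0
    for _ in range(n_samples):
        sample_mean = statistics.fmean(
            rng.gauss(mu, sigma) for _ in range(sample_size)
        )
        # "sample mean within margin of mu" is the same event as
        # "mu within margin of sample mean"
        if abs(sample_mean - mu) <= margin:
            captured += 1
    return captured / n_samples

rate = coverage_rate(mu=2.3, sigma=1.5, sample_size=30, n_samples=4_000)
print(f"share of intervals containing the population mean: {rate:.3f}")
```

The share should come out close to 0.95; the remaining samples are the "one of the 5%" cases on the slide.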

26 CLT applies also to proportions:
• MPPI example:
  – For single parent HHs with 3 children, 95% sure that the population proportion for MPPI take-up lies between 8.4% and 22.1%
  – Either the interval 8.4% to 22.1% contains the population proportion
  – Or our sample was one of the few samples (i.e. one of the 5%) for which the interval around the sample proportion does not contain the population proportion
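A sketch of how an interval like this can be computed from the sample of 111 households with 15.4% MPPI take-up, using the standard normal-approximation formula p ± 1.96 × √(p(1 − p)/n). The transcript does not say which method produced the slide's figures, so this is an assumption and small differences from 8.4% to 22.1% are to be expected.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% CI for a population proportion (normal approximation)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

low, high = proportion_ci(p_hat=0.154, n=111)
print(f"95% CI for MPPI take-up: {low:.1%} to {high:.1%}")
```

This gives roughly 8.7% to 22.1%: the upper bound matches the slide and the lower bound is slightly higher, presumably because of rounding or a different method.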

27 Testing hypotheses
• Sometimes we want to use our sample to test a particular hypothesis about the population:
  – Average age that Glaswegians first have sex is below 15 years.
  – MPPI take-up has now reached the government target of 50% of all mortgage borrowers
  – On average men earn more than women
  – A higher proportion of smokers get lung cancer than non-smokers

28 The procedure for hypothesis testing
• First establish a null hypothesis, H0:
  – This usually says that something is equal to something (sometimes this is the opposite of the hypothesis we’d like to prove but not always):
    H0: Average age at first sex = 15 years
    H0: MPPI take-up = 50%
    H0: Ave. male wage = Ave. female wage
    H0: % smokers that get lung cancer = % non-smokers that get lung cancer

29 Then state the Alternative Hypothesis, H1:
• H1 usually says how we think the outcome will go (but not always) and has to be a statement that includes “not”, “>”, “<” or “≠”
    H1: Average age at first sex < 15 years
    H1: MPPI take-up ≠ 50%
    H1: Ave. male wage > Ave. female wage
    H1: % smokers that get lung cancer > % non-smokers that get lung cancer

30 We usually write the alternative hypothesis under the null:
    H0: Average age at first sex = 15 years
    H1: Average age at first sex < 15 years
    H0: MPPI take-up = 50%
    H1: MPPI take-up ≠ 50%
    H0: Ave. male wage = Ave. female wage
    H1: Ave. male wage > Ave. female wage
    H0: % smokers that get lung cancer = % non-smokers that get lung cancer
    H1: % smokers that get lung cancer > % non-smokers that get lung cancer

31 And then…
• Find the probability of false rejection of H0:
  – I.e. if we reject H0, what are the chances that we have done so incorrectly?
• This particular probability has a special name: “significance level”.
• If we say that our alternative hypothesis is “statistically significant” we mean that the chances of false rejection of the null hypothesis are small.
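As a rough illustration of the procedure, the sketch below tests H0: MPPI take-up = 50% against H1: MPPI take-up ≠ 50% with a one-sample z-test for a proportion. The sample (180 of 400 borrowers with MPPI) and the 5% cut-off are hypothetical, and the z-test itself is one common choice rather than necessarily the method used in the lecture.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_proportion_test(successes, n, p_null):
    """z statistic and two-sided p-value for H0: population proportion = p_null."""
    p_hat = successes / n
    se_under_null = math.sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / se_under_null
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 180 of 400 mortgage borrowers have MPPI
z, p_value = two_sided_proportion_test(successes=180, n=400, p_null=0.5)
alpha = 0.05  # chosen significance level (an assumption)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The p-value is the chance of seeing a sample proportion at least this far from 50% if H0 were true; when it falls below the chosen cut-off, H0 is rejected and the result is called statistically significant.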

32 Summary:
• Social Research is usually based on samples
• We usually want to use our sample to say something about the population
  – I.e. we want to be able to generalise
• How precisely we can estimate the population mean or proportion depends on our sample size and the variation within the sample
• Using the CLT, statistical inference offers a systematic way of establishing:
  – the range of values in which the population mean or proportion is likely to lie (‘a confidence interval’).
  – whether a hypothesis about a mean or a proportion is likely to hold in the population.

