Night 1. INFERENTIAL STATISTICS: USING THE SAMPLE STATISTICS TO INFER (TO) POPULATION PARAMETERS. Modular Course 5 Summary or Descriptive Statistics:


1 Night 1

2 Modular Course 5. Summary or Descriptive Statistics: Numerical and graphical summaries of data. Inferential Statistics: Using the sample statistics to infer (to) population parameters.

3

4 Making Decisions based on the Empirical Rule (Standard Normal Curve) 68% 95% 99.7%

5 Empirical Rule 68% 95% 99.7%

6 Empirical Rule 68% 95% 99.7%

7 Most Important for Inferential Stats on our Syllabus: 95%. Approximately 95% of normal data lies within 2 standard deviations of the mean.
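The 68%, 95% and 99.7% figures can be checked numerically. This is a minimal sketch using Python's standard library; the function name `proportion_within` is an illustrative choice and not part of the course material. It also accepts values such as 1·5 or 0·8 standard deviations.

```python
from statistics import NormalDist


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.1%}")
```

The exact value for 2 standard deviations is slightly above 95%, which is why the rule is described as approximate.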

8 68% 95% 99.7% Example 1

9 68% 95% 99.7% Example 2

10 Your Turn

11 Question. Race-Week B&B prices per room (€): 56 75 60 70 80 70 50 90 80 75 50 75 50 70 60 65 60 50 70 84 70 60 70 40 60 70 80 60 65 55 50 70 80 50 55 [Empirical rule diagram: 68%, 95%, 99.7%]

12 68% 95% 99.7% Solution

13 Inferential Statistics: Sampling. For Leaving Cert we deal with two types of sampling: 1. Sample Proportion (Ordinary Level and Higher Level); 2. Sample Means (Higher Level).

14 Sampling. We are usually unable to collect information about a total population. The aim of sampling is to draw reasonable conclusions about a population by obtaining information from a relatively small sample of that population. When a sample from a population is selected, we hope that the data we get represents the population as a whole. To ensure this: 1. The sample must be random; 2. Every member of the population must have an equal chance of being included.

15 Population Proportions and Margin of Error. However, we could say with a certain degree of confidence that, if the sample was large enough and representative, the proportion in the sample would be approximately the same as the proportion in the population.

16 Population Proportions and Margin of Error. How confident we are is usually expressed as a percentage. We already saw (from the empirical rule) that approximately 95% of the area of a normal curve lies within ±2 standard deviations of the mean. This means that we are 95% confident that the population proportion is within ±2 standard deviations of the sample proportion. ±2 standard deviations is our margin of error, and the percentage margin of error that this represents depends on the sample size. If n = 1000, the percentage margin of error is ±3%. 95% is the confidence level we are working with, but other confidence levels also exist (e.g. 90% and 99%), for which a different margin of error applies, again depending on sample size.

17 Confidence interval for population proportion using Margin of Error. We are 95% confident that the population proportion is inside the 95% confidence interval.

18 20 different 95% confidence intervals. Question: A sample of 25 students in a school were asked if they spent over €5 on mobile phone calls over the last week. 10 students had spent over €5. Show a 95% confidence interval. For about 95% of samples, the interval made from the sample proportion and the margin of error contains the true population proportion.

19 Some Notes on Margin of Error. As the sample size increases, the margin of error decreases. A sample of about 50 has a margin of error of about 14% at the 95% level of confidence. A sample of about 1000 has a margin of error of about 3% at the 95% level of confidence. The size of the population does not matter. If we double the sample size (1000 to 2000), we do not halve the margin of error. Margin of error estimates how accurately the results of a poll reflect the “true” feelings of the population.

20 Sample Size    Margin of Error
   25             20%
   64             12.5%
   100            10%
   256            6.25%
   400            5%
   625            4%
   1111           3%
   1600           2.5%
   2500           2%
   10000          1%
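Every row of the table above agrees with a margin of error of 1/√n at the 95% level of confidence. The sketch below assumes that rule, since the formula itself does not survive in the transcript. The function names are illustrative, and the interval is applied to the question about 25 students, 10 of whom spent over €5.

```python
import math


def margin_of_error(n: int) -> float:
    """Approximate 95% margin of error for a sample proportion, assuming 1/sqrt(n)."""
    return 1 / math.sqrt(n)


def confidence_interval(successes: int, n: int) -> tuple[float, float]:
    """Sample proportion plus or minus the margin of error, clipped to [0, 1]."""
    p_hat = successes / n
    e = margin_of_error(n)
    return max(0.0, p_hat - e), min(1.0, p_hat + e)


for n in (25, 64, 100, 256, 400, 625, 1111, 1600, 2500, 10000):
    print(f"n = {n:>5}: margin of error = {margin_of_error(n):.2%}")

low, high = confidence_interval(10, 25)
print(f"25 students, 10 over €5: {low:.0%} to {high:.0%}")
```

Because the margin depends on √n, doubling the sample size from 1000 to 2000 only reduces the margin from about 3.2% to about 2.2%.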

21 Example 1 [diagram values: 30%; 16%, 24%]

22

23 Your Turn

24 Question

25 Question: Solution [diagram values: 10%; 7.5%, 16.5%]

26 Recognising the Concept of a Hypothesis Test. Testing claims about a population. Null Hypothesis: The null hypothesis, denoted by H0, is a claim or statement about a population. We assume this statement is true until proven otherwise (the null hypothesis means that nothing is wrong with the claim or statement). Alternative Hypothesis: The alternative hypothesis, denoted by H1, is a claim or statement which opposes the original statement about a population.

27 Courtroom Analogy to Teach Formal Language. At the start of a trial it is assumed the defendant is not guilty. Then the evidence is presented to the judge and jury. The null hypothesis is that the defendant is not guilty (H0). If the jury reject the null hypothesis (H0), this means that they find the defendant guilty. If the jury fail to reject the null hypothesis (H0), this means that they find the defendant not guilty.

28 Often we need to make a decision about a population based on a sample.
1. Is a coin which is tossed biased if we get a run of 8 heads in 10 tosses? Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0). Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1).
2. During a 5 minute period a new machine produces fewer faulty parts than an old machine. Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0). Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1).
3. Does a new drug for Hay-Fever work effectively? Assuming that the new drug does not work effectively is called a NULL HYPOTHESIS (H0). Assuming that the new drug does work effectively is called an ALTERNATIVE HYPOTHESIS (H1).

29 Hypothesis test on a population proportion using Margin of Error. If the claimed % (H0) is inside the 95% confidence interval: Fail to Reject H0. If the claimed % (H0) is outside the 95% confidence interval: Reject H0.

30 Go Fast Airlines provides internal flights in Ireland, short haul flights to Europe and long haul flights to America and Asia. Each month the company carries out a survey among 1000 passengers. The company repeatedly advertises that 70% of their customers are satisfied with their overall service. 664 of the sample stated they were satisfied with the overall service. Go Fast Airlines Example 1

31 Go Fast Airlines provides internal flights in Ireland, short haul flights to Europe and long haul flights to America and Asia. Each month the company carries out a survey among 1000 passengers. The company repeatedly advertises that 70% of their customers are satisfied with their overall service. 664 of the sample stated they were satisfied with the overall service. Would you say that the company were correct in saying that 70% of their customers were satisfied? State the null hypothesis and state your conclusions clearly. Solution: the 95% confidence interval from the sample is 63.24% to 69.56%. The claimed 70% is outside this interval, so we Reject H0.
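The Go Fast Airlines decision can be reproduced with a short sketch. It assumes the 1/√n margin of error implied by the earlier sample-size table, and the function name `check_claim` is hypothetical.

```python
import math


def check_claim(claimed: float, successes: int, n: int) -> tuple[float, float, str]:
    """Test a claimed population proportion against a 95% interval built with 1/sqrt(n)."""
    p_hat = successes / n          # sample proportion
    e = 1 / math.sqrt(n)           # assumed 95% margin of error
    low, high = p_hat - e, p_hat + e
    # H0: the population proportion equals the claimed value
    decision = "Fail to Reject H0" if low <= claimed <= high else "Reject H0"
    return low, high, decision


low, high, decision = check_claim(0.70, 664, 1000)
print(f"Interval: {low:.2%} to {high:.2%}; claim 70%: {decision}")
```

The same function applies to any of the margin-of-error questions, for example a claim of 60% tested against 180 out of 400 viewers.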

32 Your Turn

33 Question 1

34 Question 1: Solution. Claim: 40%. 95% confidence interval: 39% to 45%. The claim is inside the interval: Fail to Reject H0.

35 Question 2. RTÉ claim that 60% of all viewers watch the Late Late Show every Friday night. An independent survey was carried out on 400 randomly selected viewers to see if the claim was true. The result of the survey was that 180 were watching the Late Late Show. I. Calculate the margin of error. II. State the Null and Alternative Hypotheses. III. Would you reject or fail to reject the Null Hypothesis according to this survey? Give a reason for your conclusion.

36 Question 2: Solution. Claim: 60%. 95% confidence interval: 40% to 50%. The claim is outside the interval: Reject H0.

37 Empirical Rule 68% 95% 99.7% What about 1·5 std devs or 0·8 std devs?

38 Night 2

39 Normal Distribution to Standard Normal Distribution. Different sets of data have different means and standard deviations, but any that are normally distributed have the same bell-shaped type of curve. In order to avoid unnecessary calculations and graphing, the scale of a Normal Distribution curve is converted to a standard scale called the z-score or standard unit scale. [Diagram: Normal Distributions with axes 4, 7, 10, 13, 16, 19, 22 and 242, 254, 266, 278, 290, 302, 314, both converted to the Standard Normal Distribution axis –3, –2, –1, 0, 1, 2, 3]

40

41 z – scores define the position of a score in relation to the mean using the standard deviation as a unit of measurement. z – scores are very useful for comparing data points in different distributions. The z – score is the number of standard deviations by which the score departs from the mean. This standardises the distribution. Standard Units (z – scores)
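As a small illustration, the z-score is the distance from the mean divided by the standard deviation. The mean of 53 and standard deviation of 15 used below are read from the axis of a later solution slide in this transcript, so treat them as inferred values; the function name is illustrative.

```python
def z_score(x: float, mean: float, standard_deviation: float) -> float:
    """Number of standard deviations by which x departs from the mean."""
    return (x - mean) / standard_deviation


# Values inferred from a later slide: mean 53, standard deviation 15
for x in (74, 47):
    print(f"x = {x}: z = {z_score(x, 53, 15):.2f}")
```

Because z-scores are unit-free, a score from one distribution can be compared directly with a score from another.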

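42 Reading z-values From Tables: Example 1 [standard normal curve, axis –3 to 3, with z = 1.31 marked; Tables Pg. 36, Pg. 37]

43 Example 2 [standard normal curve, axis –3 to 3, with 0 and z = 1.32 marked; Tables Pg. 36, Pg. 37]

44 Example 3 [standard normal curve, axis –3 to 3, with –0.74 and 0 marked, using –z and z; Tables Pg. 36, Pg. 37]

45 Example 4 [standard normal curves, axis –3 to 3, with 1.29 and –1.32 marked; Tables Pg. 36, Pg. 37]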

46 Your Turn

47 Question 1

48 Question 1: Solution [axis: x = 8, 23, 38, 53, 68, 83, 98 corresponds to z = –3, –2, –1, 0, 1, 2, 3; x = 74 gives z = 1.4; x = 47 gives z = –0.4]

49 Question 2 [axis: x = 30, 40, 50, 60, 70, 80, 90 corresponds to z = –3, –2, –1, 0, 1, 2, 3; x = 45 gives z = –1.5]

50 Question 2: Solution [axis: x = 30, 40, 50, 60, 70, 80, 90 corresponds to z = –3, –2, –1, 0, 1, 2, 3; x = 45 gives z = –1.5]

51 Question 2: Solution [axis: x = 30, 40, 50, 60, 70, 80, 90 corresponds to z = –3, –2, –1, 0, 1, 2, 3; x = 75 gives z = 1.5; x = 72.8 gives z = 1.28]

52 95% 2.5% Pg. 37 Pg. 36 For Higher Level Leaving Cert use z scores

53 Confidence interval for population proportion. We are 95% confident that the population proportion is inside the 95% confidence interval.

54 Example 1. Skygo provides Wifi in the Galway area. In March the company carries out a survey among 625 of its customers. The company advertises that 60% of their customers were satisfied with their download speeds. 370 of the sample stated they were satisfied with their download speed. Create a 95% confidence interval based on your sample. Solution: the 95% confidence interval is 55.36% to 63.04%.
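The slide's formula does not survive in the transcript, so the sketch below assumes the standard Higher Level form: sample proportion ± z × √(p̂(1 − p̂)/n), with z ≈ 1·96 for 95%. The function name is illustrative.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a population proportion using z-scores (assumed standard form)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


low, high = proportion_ci(370, 625)
print(f"Skygo 95% confidence interval: {low:.2%} to {high:.2%}")
```

The result can differ from the slide's figures in the second decimal place, depending on how intermediate values are rounded.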

55 Your Turn

56 Question 1. The Sunday Independent reports that the government's approval rating is at 65%. The paper states that the poll is based on a random sample of 972 voters and that the margin of error is 3%. Show that the pollsters used a 95% level of confidence.

57 Question 1: Solution. The Sunday Independent reports that the government's approval rating is at 65%. The paper states that the poll is based on a random sample of 972 voters and that the margin of error is 3%. Show that the pollsters used a 95% level of confidence.
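The worked solution is not preserved in the transcript. One possible approach, sketched here under the assumption that margin of error = z × √(p̂(1 − p̂)/n), is to solve for z and convert it to a level of confidence. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def implied_confidence(p_hat: float, n: int, margin: float) -> tuple[float, float]:
    """Return the z multiplier and level of confidence implied by a reported margin of error."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    z = margin / standard_error
    level = 2 * NormalDist().cdf(z) - 1   # area between -z and z
    return z, level


z, level = implied_confidence(0.65, 972, 0.03)
print(f"z = {z:.2f}, level of confidence = {level:.1%}")
```

A z multiplier close to 1·96 corresponds to the 95% level of confidence.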

58 Question 2. It is known that 30% of a certain kind of apple seed will germinate. In an experiment 85 out of 300 seeds germinated. Construct a 95% confidence interval for the population proportion based on this sample.

59 Sample Means Sample means

60 The data below are the heights, in cm, of a population of 100 15-year-old students: 170 174 164 152 155 160 172 156 163 182 167 158 154 167 140 143 167 178 165 176 165 166 177 148 147 166 184 165 162 185 171 168 173 175 160 167 172 179 153 180 172 164 178 153 152 167 165 174 145 155 150 158 162 166 163 159 148 170 154 181 155 165 180 168 158 175 176 166 165 170 175 158 160 177 166 180 165 166 168 180 157 153 150 179 157 161 152 161 144 174 172 165 157 174 159

61 It does not matter what the original distribution is: for large enough samples, the distribution of the sample means will be approximately normally distributed. Use Java Applets.
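In place of an applet, the behaviour of sample means can be simulated. This is a rough sketch only: the skewed population, the sample size of 36 and the 2000 repetitions are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (with replacement) and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# A deliberately non-normal (right-skewed) population, generated for illustration
rng = random.Random(1)
population = [rng.expovariate(1.0) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

means = simulate_sample_means(population, sample_size=36, num_samples=2000)
print(f"population mean {mu:.3f}, mean of sample means {statistics.fmean(means):.3f}")
print(f"sigma/sqrt(n) {sigma / 6:.3f}, standard deviation of sample means {statistics.stdev(means):.3f}")
```

The sample means centre on the population mean, and their spread is close to the population standard deviation divided by √n. A histogram of `means` would look roughly bell-shaped even though the population is skewed.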

62 A single sample of 5 data points. The black arrows are the data points. The mean of the sample is the red dot A single sample of 10 data points.

63

64

65

66 ICT - Shape Have the Summary Sheet open during the next few slides

67 ICT - Centre

68 ICT - Spread

69 [Table: columns Population, Large Sample, Sample Means; rows Mean, Standard Deviation]

70 Summary. KEY IDEA: click the link http://onlinestatbook.com/stat_sim/sampling_dist/index.html [Table: columns Population, Large Sample, Sample Means; rows Mean, Standard Deviation]

71 In the Standard Normal Distribution we want the value of z1 such that 95% of the population lies in the interval –z1 ≤ z ≤ z1. This value is z1 = 1∙96. Therefore in a Normal Distribution 95% of the population lies within 1∙96 standard deviations of the mean, that is, within 1∙96σ of μ (the population mean).
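The value 1∙96 can be recovered from the inverse of the standard normal cumulative distribution: if 95% lies between –z1 and z1, then 97.5% lies below z1. This is a minimal sketch with an illustrative function name.

```python
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """z1 such that the given central proportion of a standard normal lies in [-z1, z1]."""
    upper_tail_point = 0.5 + confidence / 2   # e.g. 0.975 for 95%
    return NormalDist().inv_cdf(upper_tail_point)


for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: z1 = {critical_z(confidence):.3f}")
```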

72 Example 1

73 Example 2

74

75 A study addressed the issue of whether pregnant women can correctly guess the sex of their baby. Among 104 recruited subjects, 57 correctly guessed the sex of the baby. Use these sample data to test the claim that the success rate of such guesses is no different from the 50% success rate expected with random chance guesses. Use a 5% significance level. (Based on data from “Are Women Carrying ‘Basketballs’ Really Having Boys? Testing Pregnancy Folklore,” by Perry, DiPietro, and Costigan, Birth, Vol. 26, No. 3.) Solution: The original claim is that the success rate is no different from 50%. There is not sufficient evidence to warrant rejection of the claim that women who guess the sex of their babies have a success rate equal to 50%.
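The working behind this conclusion is not shown in the transcript. A common way to reach it, sketched here as an assumption, is a two-tailed z-test that uses the claimed proportion in the standard error; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Two-tailed z-test of H0: population proportion = p0. Returns (z, reject_h0)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)    # computed under H0
    z = (p_hat - p0) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 at the 5% level
    return z, abs(z) > critical


z, reject = proportion_z_test(57, 104, 0.50)
print(f"z = {z:.2f}; {'Reject H0' if reject else 'Fail to Reject H0'}")
```

The test statistic falls between the critical values –1∙96 and 1∙96, which matches the stated conclusion that there is not sufficient evidence to reject the claim.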

76 Your Turn

77

78 Night 3

79 Slide79

80 Often we need to make a decision about a population based on a sample. In a trial you are presumed innocent until proven otherwise. Assuming that an accused person is innocent (nothing has happened) is called a NULL HYPOTHESIS (H0). Assuming that an accused person is not innocent is called an ALTERNATIVE HYPOTHESIS (H1).
1. Is a coin which is tossed biased if we get a run of 8 heads in 10 tosses? Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0). Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1).
2. During a 5 minute period a new machine produces fewer faulty parts than an old machine. Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0). Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1).
3. Does a new drug for Hay-Fever work effectively? Assuming that there is no difference between the new drug and the current drug is called a NULL HYPOTHESIS (H0). Assuming that the new drug is better than the current most popular drug is called an ALTERNATIVE HYPOTHESIS (H1).

81 Testing the Null Hypothesis using z-values. A Two Tailed Test. The critical values for a 5% level of significance are z = –1∙96 and z = 1∙96. Reject H0 if z is beyond either critical value; otherwise Fail to Reject H0.

82 Testing the Null Hypothesis using z-values. The statistical method used to decide whether or not to reject H0 is called HYPOTHESIS TESTING. Statisticians speak of “not accepting or accepting H0 at a certain level”. This level is called the LEVEL OF SIGNIFICANCE (the 5% level of significance is on the syllabus). If the value of z lies outside the range –1∙96 < z < 1∙96 (that is, in the critical region), we reject H0.

83

84

85 Review of the steps involved in Hypothesis Testing:
1. Write down the null hypothesis H0 and the alternative hypothesis H1.
2. Convert the observed results into z units (calculate the test statistic).
3. Write down the critical values (a sketch also helps).
4. Reject H0 if z is in the critical regions, otherwise fail to reject H0.
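The four steps can be sketched for a test on a sample mean. The test statistic z = (x̄ − μ) / (σ/√n) is the standard form and is assumed here because the example slides' working is not preserved. The numbers in the usage lines are hypothetical, chosen only to show one outcome of each kind.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean: float, mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Two-tailed z-test of H0: population mean = mu0 (step 1 is the choice of mu0)."""
    # Step 2: convert the observed result into z units
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 3: critical values are -critical and +critical
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: reject H0 only if z falls in a critical region
    decision = "Reject H0" if abs(z) > critical else "Fail to Reject H0"
    return z, critical, decision


# Hypothetical data: H0 says the mean is 50, with sigma = 8 and n = 64
print(z_test_mean(52, 50, 8, 64))
print(z_test_mean(51, 50, 8, 64))
```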

86 Example 1

87 Example 1: –2.7 is in the Reject Region.

88 Example 2

89 Example 2: –10.87 is in the Reject Region.

90 Your Turn

91 Question 1

92 Question 1: Solution

93 Question 1: Solution. –4.5 is in the Reject Region.

94 Example 2

95 Example 2: –1.24 is in the Fail to Reject Region.

96 Example 3

97 Example 3: –3.48 is in the Reject Region.

98 p-value at the 5% Significance Level

99 Example 1

100

101 1.13 is in the Fail to Reject Region.

102 Two things to note: 1. The p-value answers the question: if the null hypothesis is true, what is the probability that the observed value (130) is at least this far away from the value I expected to get (128) because of sheer randomness? So a p-value of 0·26 means in this case that there is a 26% chance that the sample mean blood pressure will be 2 or more units (130 – 128 = 2) away from the population mean for a sample of this size, just because of random variation in sampling. This is not enough evidence to reject the null hypothesis: at the 5% level of significance we only reject the null hypothesis if that probability is less than 5%. At 26%, a difference this large is too likely to arise from random variation alone. 2. The tail probability found from the z-score is doubled to get the p-value because we are doing a two-tailed test.
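The doubling step can be shown directly: find the area in one tail beyond the z-score, then double it. This is a minimal sketch using the z-values quoted on the slides; the function name is illustrative.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """p-value for a two-tailed test: twice the area in the tail beyond |z|."""
    one_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * one_tail


for z in (1.13, -2.53):
    p = two_tailed_p_value(z)
    verdict = "Reject H0" if p < 0.05 else "Fail to Reject H0"
    print(f"z = {z}: p-value = {p:.3f}, {verdict} at the 5% level")
```

Comparing the p-value with 5% always gives the same decision as comparing z with the critical values ±1∙96.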

103 Example 2

104

105 Example 2: –2.53 is in the Reject Region.

106 “The p-value is very small – there is only a 1.2% chance that the deviation from the 4 kg stated is due to sampling variability. This is very strong evidence for rejecting the company’s claim.” Example 2

107 Your Turn

108 Question 1

109 Question 1: Solution

110

111 Question 1: Solution. 1.466 is in the Fail to Reject Region.

112

113 Question 2

114 Question 2: Solution

115 Question 2: Solution. –2.83 is in the Reject Region. We can conclude that there is evidence to suggest that the mean is different from the expected mean.

