1
Class Handout #3 (Sections 1.8, 1.9)
Definitions
$z_{\text{AREA}}$: the z-score above which lies an area under the normal curve equal to the subscript AREA
1. Find each of the z-scores listed by using Table 2 of the Statistical Tables.
$z_{0.05} = 1.645$
$z_{0.025} = 1.960$
$z_{0.01} = 2.326$
$z_{0.005} = 2.576$
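The table lookups in Exercise 1 can be cross-checked numerically. The sketch below is not part of the handout: it assumes Python's standard-library `statistics.NormalDist`, and the helper name `z_upper` is made up for illustration.

```python
from statistics import NormalDist

def z_upper(area):
    """Return the z-score above which lies the given area under the
    standard normal curve (hypothetical helper, not from the handout)."""
    # An upper-tail area of `area` leaves 1 - area to the left of the z-score.
    return NormalDist().inv_cdf(1 - area)

for area in (0.05, 0.025, 0.01, 0.005):
    print(f"z_{area} = {z_upper(area):.3f}")
```

Rounded to three decimals, the values agree with the Table 2 entries quoted in Exercise 1.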
2
2. When a measurement is randomly selected from a population having a normal distribution, what is the probability that the z-score for this measurement will be less than $z_{0.10}$?
$1 - 0.10 = 0.90$
3. For each of the normal probability distribution curves, find the indicated areas under the curve.
Curve cut at $-z_{0.05} = -1.645$ and $z_{0.05} = 1.645$: lower-tail area = 0.05, central area = 0.90, upper-tail area = 0.05
Curve cut at $-z_{0.025} = -1.960$ and $z_{0.025} = 1.960$: lower-tail area = 0.025, central area = 0.95, upper-tail area = 0.025
3
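Curve cut at $-z_{0.005} = -2.576$ and $z_{0.005} = 2.576$: lower-tail area = 0.005, central area = 0.99, upper-tail area = 0.005
Curve cut at $-z_{0.01} = -2.326$ and $z_{0.01} = 2.326$: lower-tail area = 0.01, central area = 0.98, upper-tail area = 0.01
In general, for a curve cut at $-z_{\alpha/2}$ and $z_{\alpha/2}$: lower-tail area = $\alpha/2$, central area = $1 - \alpha$, upper-tail area = $\alpha/2$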
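The general pattern in Exercise 3 (two tails of area α/2 around a central area of 1 − α) can be checked with a short sketch. It assumes the standard-library `statistics.NormalDist`; the function name `central_area` is illustrative only.

```python
from statistics import NormalDist

def central_area(alpha):
    """Area under the standard normal curve between -z_{alpha/2} and
    z_{alpha/2} (hypothetical helper, not from the handout)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)  # upper cut point z_{alpha/2}
    return std_normal.cdf(z) - std_normal.cdf(-z)

for alpha in (0.10, 0.05, 0.02, 0.01):
    print(f"alpha = {alpha}: each tail = {alpha / 2}, central area = {central_area(alpha):.2f}")
```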
4
Class Handout #3 Definitions
$z_{\text{AREA}}$: the z-score above which lies an area under the normal curve equal to the subscript AREA
point estimation: the use of the value of a statistic to estimate a parameter. Examples are (1) using $\bar{x}$ to estimate $\mu$, (2) using $s$ to estimate $\sigma$. If the mean of the sampling distribution for a statistic is equal to the parameter being estimated, then the statistic is called unbiased; otherwise the statistic is called biased. With random sampling, $\bar{x}$ is an unbiased estimator of $\mu$.
interval estimation: the use of an interval of values (often based on a statistic and a standard error) to estimate a parameter
confidence interval: an interval estimate together with a corresponding probability; the probability represents the chance that the interval actually contains the parameter being estimated and is called the confidence level or confidence coefficient. The most commonly chosen confidence levels are 90%, 95%, and 99%.
5
role of the Central Limit Theorem in finding a confidence interval for $\mu$
The Central Limit Theorem tells us that when a random sample of size $n$ is taken from a population that has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{x}$ has a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, called the standard error of estimate or standard error of the mean. This is approximately true even when the population does not have a normal distribution, as long as the sample size $n$ is sufficiently large.
There is a 95% chance that $\bar{x}$ will be within 2 (more precisely, 1.960) standard errors of $\mu$.
There is a $(1 - \alpha)100\%$ chance that $\bar{x}$ will be within $z_{\alpha/2}$ standard errors of $\mu$. That is, we can be $(1 - \alpha)100\%$ confident that the population mean $\mu$ will lie between $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$.
Not knowing the value for $\sigma$, we estimate the standard error with $s/\sqrt{n}$. With this estimated standard error of the mean, we must use a t distribution in place of a normal or z distribution.
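The claim about the sampling distribution of the sample mean can be illustrated by simulation. The sketch below is not from the handout. It assumes an exponential population with mean 1 and standard deviation 1 (a deliberately non-normal choice), a sample size of 30, and 5000 repeated samples; all of these numbers and the function name are arbitrary illustration choices.

```python
import math
import random
import statistics

def simulate_sample_means(n, replicates, seed=0):
    """Draw `replicates` random samples of size n from an exponential
    population (mean 1, standard deviation 1) and return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]

n = 30
means = simulate_sample_means(n=n, replicates=5000)

print("mean of sample means:", statistics.fmean(means))       # should be near mu = 1
print("std dev of sample means:", statistics.stdev(means))    # should be near sigma / sqrt(n)
print("sigma / sqrt(n):", 1 / math.sqrt(n))
```

The standard deviation of the simulated sample means should land close to the standard error, even though the population itself is skewed.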
6
Student’s t distribution
a distribution based on sample standard deviation $s$, similar to the way the standard normal distribution is based on population standard deviation $\sigma$
The t distributions
(1) depend on degrees of freedom df (= $n - 1$ for the one sample t test statistic);
(2) are symmetric and bell-shaped but flatter than the standard normal distribution;
(3) become more like the standard normal distribution as df increase.
Table 3 of the Statistical Tables displays values from various t distributions.
The concept of “degrees of freedom” is not easy to explain completely, but one intuitive explanation is to think of “degrees of freedom” as representing the “number of pieces of data observed” minus “the number of parameters being estimated”. When using a confidence interval to estimate a population mean $\mu$, we observed $n$ pieces of data (i.e., the $n$ measurements in the selected random sample), and we are estimating 1 parameter (i.e., the population mean $\mu$).
$t_{\text{AREA}}$: the t-score above which lies an area under a t curve equal to the subscript AREA; if the corresponding degrees of freedom (df) is not clear, the t-score can be represented as $t_{df;\,\text{AREA}}$
7
4. Use Table 3 of the Statistical Tables to obtain each of the following:
t distribution with df = 1: $t_{0.10} = 3.078$
t distribution with df = 8: $t_{0.05} = 1.860$
t distribution with df = 1: $t_{0.025} = 12.706$
t distribution with df = 2: $t_{0.025} = 4.303$
t distribution with df = 3: $t_{0.025} = 3.182$
t distribution with df = 15: $t_{0.025} = 2.131$
t distribution with df = 30: $t_{0.025} = 2.042$
t distribution with df = $\infty$: $t_{0.025} = 1.960 = z_{0.025}$
The t-scores in the df = $\infty$ row are exactly the same as z-scores.
8
confidence interval for $\mu$: We can be $(1 - \alpha)100\%$ confident that the population mean $\mu$ is between $\bar{x} - t_{\alpha/2}\,s/\sqrt{n}$ and $\bar{x} + t_{\alpha/2}\,s/\sqrt{n}$.
For a given sample size, increasing the confidence level increases the confidence interval length, and decreasing the confidence level decreases the confidence interval length.
For a given confidence level, increasing the sample size tends to decrease the confidence interval length, and decreasing the sample size tends to increase the confidence interval length.
9
5. (a) Forbes magazine published data on the best small firms (Forbes, November 8, 1993, "America's Best Small Companies"); these were firms with annual sales of more than $5 million and less than $350 million. The ages (in years) of the chief executive officer (CEO) for the first 20 firms listed are as follows: (This data is stored in the worksheet CEO_Data of the Excel file M214_Data.) Treating these 20 ages as a random sample of ages from the population of ages of chief executive officers for small companies, find a 95% confidence interval for the mean age of chief executive officers for small companies.
$n = 20$, $\bar{x} = 48.25$, $s = 8.6382$ (These statistics can all be verified by using the Excel spreadsheet named Summary_Statistics.)
$1 - \alpha = 0.95$, so $\alpha/2 = 0.025$ in each tail; df = 19 and $t_{0.025} = 2.093$.
The interval runs from $48.25 - (2.093)(8.6382/\sqrt{20})$ to $48.25 + (2.093)(8.6382/\sqrt{20})$.
We can be 95% confident that the mean age of chief executive officers for small companies is between 44.21 and 52.29 years.
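The arithmetic in part (a) can be reproduced with a few lines of Python. This is a minimal sketch using the summary statistics quoted in the exercise (n = 20, sample mean 48.25, s = 8.6382) and the Table 3 value 2.093 for df = 19. The critical value is typed in by hand rather than computed, and the function name `t_interval` is illustrative only.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval x_bar +/- t_crit * s / sqrt(n).
    t_crit must be looked up separately (here: from the t table)."""
    margin = t_crit * s / math.sqrt(n)  # critical value times estimated standard error
    return x_bar - margin, x_bar + margin

# Summary statistics from Exercise 5(a); 2.093 is t_{0.025} with df = 19.
lower, upper = t_interval(x_bar=48.25, s=8.6382, n=20, t_crit=2.093)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f} years")
```

Swapping in a different table value for `t_crit` gives the intervals for other confidence levels discussed in the later parts of the exercise.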
10
5.-continued
(b) What must we assume in order for the confidence interval in part (a) to be appropriate?
We assume that either the ages are normally distributed, at least approximately, or the sample size 20 is sufficiently large so that the sampling distribution of $\bar{x}$ is approximately normal.
(c) How would the confidence interval in part (a) have been different if a 90% confidence level were chosen?
The 90% confidence interval would have shorter length than the 95% confidence interval in part (a).
(d) How would the confidence interval in part (a) have been different if a 99% confidence level were chosen?
The 99% confidence interval would have longer length than the 95% confidence interval in part (a).
11
(e) How would the confidence interval in part (a) have been different if the sample size were 40 instead of 20?
A 95% confidence interval based on a sample size of 40 would tend to have shorter length than a 95% confidence interval based on a sample size of 20.
(f) If we are willing to assume that the ages are normally distributed (at least approximately), how could we estimate an interval between which lie 95% of the ages of chief executive officers for small companies?
We know that about 95% of the ages are within 2 (or more precisely 1.96) standard deviations of the mean. If we estimate the population mean and standard deviation with the sample mean and standard deviation, then we estimate that 95% of the ages of chief executive officers for small companies lie between $48.25 - (2)(8.6382)$ and $48.25 + (2)(8.6382)$, that is, between 30.97 and 65.53 years. (This type of interval can be called a prediction interval; notice how much wider this interval is than the confidence interval in part (a).)
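The part (f) calculation, and the comparison of its width with the part (a) confidence interval, can be sketched as follows. The summary statistics and the table value 2.093 come from Exercise 5; the variable and function names are illustrative only.

```python
import math

x_bar, s, n = 48.25, 8.6382, 20   # summary statistics from Exercise 5
t_crit = 2.093                    # t_{0.025} with df = 19, typed in from the t table

def spread_of_individuals(mean, sd, k=2.0):
    """Range mean +/- k * sd, estimated to hold about 95% of individual ages."""
    return mean - k * sd, mean + k * sd

low, high = spread_of_individuals(x_bar, s)
ci_width = 2 * t_crit * s / math.sqrt(n)   # length of the interval for the mean in part (a)

print(f"about 95% of ages: {low:.2f} to {high:.2f} years (width {high - low:.2f})")
print(f"95% confidence interval for the mean has width {ci_width:.2f}")
```

The first interval describes individual ages and uses the standard deviation. The second describes the mean and uses the much smaller standard error, which is why it is narrower.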
12
After considering how to estimate a mean with a confidence interval, we now consider how to perform a hypothesis test about a mean. A hypothesis test is used when we have some hypothesized value for the mean prior to any data collection. Return to the definitions in Class Handout #3:
13
hypothesis testing: an inferential statistical analysis used to decide which of two competing hypotheses should be believed (analogous to a court trial)
Confidence intervals are a method of inferential statistics used when no hypothesized value about a parameter to be estimated exists prior to any data analysis; however, when such a hypothesized value exists, hypothesis testing is a popular method of inferential statistics to decide if a statistically significant difference exists. (A hypothesis test can also tell us whether or not a relationship is statistically significant.)
null hypothesis ($H_0$): a statement assumed to be true at the outset of a hypothesis test; often, a statement that a parameter is equal to a specific hypothesized value (comparable to “innocence” in a court trial)
alternative (research) hypothesis ($H_1$): a statement for which sufficient evidence is required before it will be believed; often, a statement that the parameter is not equal to the hypothesized value (comparable to “guilt” in a court trial)
Now let us go to Class Exercise 6(a).
14
6. It is believed that the mean right hand grip strength of men between 20 and 40 years of age in the USA is 86.3 lbs. It is now of interest to perform a hypothesis test concerning the mean grip strength of men between 20 and 40 years of age in the country of Techavia.
(a) If we are looking for evidence that the mean grip strength in Techavia is different from 86.3 lbs., state the null and alternative hypotheses for the hypothesis test.
$H_0: \mu = 86.3$ (The mean grip strength is 86.3 lbs.)
$H_1: \mu \neq 86.3$ (The mean grip strength is different from 86.3 lbs.)
(b) Is the hypothesis test one-sided or two-sided?
Now look at the definitions for one-sided and two-sided tests.
(c) Describe what it would mean to make a Type I error in this hypothesis test and what it would mean to make a Type II error in this hypothesis test.
15
one-sided hypothesis test: a test designed to identify a difference from a hypothesized value in only one direction
two-sided hypothesis test: a test designed to identify a difference from a hypothesized value in either direction
Even though hypothesis tests may be one-sided or two-sided, confidence intervals are generally two-sided (except for rare occasions).
16
6.-continued
(b) Is the hypothesis test one-sided or two-sided?
Since we are looking for evidence that the population mean is different from the hypothesized value 86.3 in either direction, the test is two-sided.
Now look at the definitions for Type I and Type II error.
17
Type I error: believing $H_1$ (the alternative hypothesis) when in reality $H_0$ (the null hypothesis) is true (in a court trial, saying that the defendant is guilty when the defendant is really innocent)
Type II error: believing $H_0$ (the null hypothesis) when in reality $H_1$ (the alternative hypothesis) is true (in a court trial, saying that the defendant is innocent when the defendant is really guilty)
Now let us go to Class Exercise 6(c).
18
6.-continued
(c) Describe what it would mean to make a Type I error in this hypothesis test and what it would mean to make a Type II error in this hypothesis test.
Making a Type I error means the mean grip strength is actually 86.3 lbs., but we mistakenly conclude that it is different from 86.3 lbs.
Making a Type II error means the mean grip strength is actually different from 86.3 lbs., but we mistakenly conclude that it is equal to 86.3 lbs.
19
test statistic: a statistic which is used to decide whether to believe $H_0$ or to believe $H_1$
It is the test statistic which provides us with evidence to make our decision in a hypothesis test.
Now let us go to Class Exercise 6(d).
20
(d) Suppose we plan to measure each right hand grip strength in a random sample of 16 men from Techavia. If we assume that either the grip strengths are normally distributed or the sample size 16 is sufficiently large so that the sampling distribution of $\bar{x}$ is approximately normal, what test statistic would be appropriate for us to use to decide whether to believe $H_0$ or to believe $H_1$?
If $H_0$ were true, then $\dfrac{\bar{x} - 86.3}{s/\sqrt{16}}$ would be the t-score for $\bar{x}$, where df = 15, and we expect this t-score to be within the bounds of random variation. If $H_0$ were not true, then we would expect the t-score to be outside the bounds of random variation. Consequently, we can use this t-score as a test statistic to decide whether to believe $H_0$ or to believe $H_1$, but we need to choose specific bounds for what should be considered random variation.
21
significance level ($\alpha$): the highest probability of making a Type I error that we are willing to tolerate, commonly chosen to be 0.10, 0.05, or 0.01
With a given sample size $n$, the probability of making a Type II error increases as we decrease $\alpha$ (the probability of making a Type I error).
rejection (critical) region: a set of test statistic values which lead to rejecting $H_0$ in favor of $H_1$
22
6.-continued
(e) Find the rejection region for the hypothesis test if
(i) a 0.05 significance level were chosen.
With $\alpha = 0.05$, the t distribution with df = 15 has area $\alpha/2 = 0.025$ in each tail and $1 - \alpha = 0.95$ in the middle, cut at $-t_{0.025} = -2.131$ and $t_{0.025} = 2.131$. The rejection region is defined to be all test statistic values $t > 2.131$ or $t < -2.131$.
(ii) a 0.01 significance level were chosen.
With $\alpha = 0.01$, the t distribution with df = 15 has area $\alpha/2 = 0.005$ in each tail and $1 - \alpha = 0.99$ in the middle, cut at $-t_{0.005} = -2.947$ and $t_{0.005} = 2.947$. The rejection region is defined to be all test statistic values $t > 2.947$ or $t < -2.947$.
23
(f) Suppose we actually measure each right hand grip strength in a random sample of 16 men from Techavia, and we find that $\bar{x} = 91.0$ lbs. and $s = 7.8$ lbs. Find the test statistic value, and find the p-value for the hypothesis test.
The observed test statistic value is $t$ (or $t_{15}$) $= \dfrac{\bar{x} - 86.3}{s/\sqrt{16}} = \dfrac{91.0 - 86.3}{7.8/\sqrt{16}} = 2.410$
Note that this observed test statistic
provides us with sufficient evidence against $H_0$ (that is, $t = 2.410$ is in the rejection region) with $\alpha = 0.05$;
does not provide us with sufficient evidence against $H_0$ (that is, $t = 2.410$ is not in the rejection region) with $\alpha = 0.01$.
Next class, we shall define and calculate the p-value.
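The test statistic in part (f) and the comparison against the two rejection regions from part (e) can be sketched in Python. The critical values 2.131 and 2.947 are typed in from the t table (df = 15) rather than computed, and the function names are illustrative only.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """t-score of the sample mean under H0: mu = mu_0."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

def in_two_sided_rejection_region(t, t_crit):
    """True when t > t_crit or t < -t_crit."""
    return abs(t) > t_crit

# Data from Exercise 6(f): n = 16, sample mean 91.0 lbs., s = 7.8 lbs.
t_obs = one_sample_t(x_bar=91.0, mu_0=86.3, s=7.8, n=16)

# Table values for df = 15: t_{0.025} and t_{0.005}.
critical_values = {0.05: 2.131, 0.01: 2.947}

print(f"observed t = {t_obs:.3f}")
for alpha, t_crit in critical_values.items():
    decision = "reject H0" if in_two_sided_rejection_region(t_obs, t_crit) else "do not reject H0"
    print(f"alpha = {alpha}: rejection region is |t| > {t_crit}, so {decision}")
```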
24
Before submitting Homework #3, check some of the answers (if you haven’t done so already) from the link on the course schedule: