Chapter 4 Statistical Inferences Estimation
Estimation
- Confidence interval estimation for mean and proportion
- Determining sample size
Hypothesis Testing
- Test for one and two means
- Test for one and two proportions
Statistical Inference
Statistical inference is the process of drawing conclusions about the characteristics of a population based on information contained in a sample. Since populations are characterized by numerical descriptive measures called parameters, statistical inference is concerned with making inferences about population parameters.
ESTIMATION
Two terms should be understood first: estimator and estimate. An estimator is the rule or statistic used to estimate a population parameter, and an estimate is the value it takes for a particular sample. An estimate of a population parameter may be expressed in two ways: as a point estimate or as an interval estimate.
Point Estimate
A point estimate of a population parameter is a single value of a statistic. For example, the sample mean x̄ is a point estimate of the population mean μ. Similarly, the sample proportion p̂ is a point estimate of the population proportion p.
Interval Estimate
An interval estimate is defined by two numbers, between which a population parameter is said to lie. For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.
Point Estimators
Choosing the right point estimator for a parameter depends on the properties of the estimator itself. There are four properties that an estimator should satisfy for it to be regarded as a best estimator (for example, a best linear unbiased estimator):
- Unbiased
- Consistent
- Efficient
- Sufficient
Confidence Interval
A confidence interval is a range of values constructed from the sample data so that the population parameter is likely to lie within that range at a specified probability. The specified probability is called the level of confidence; it states how much confidence we have that the interval contains the true population parameter. The confidence level is denoted by (1 − α) × 100%.
Example: A 95% level of confidence means that if 100 confidence intervals were constructed, each based on a different sample from the same population, we would expect about 95 of the intervals to contain the population mean.
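The interpretation of the confidence level can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 50, standard deviation 10), the sample size and the function name `coverage` are assumptions, not part of the chapter. It draws many samples, builds a 95% interval from each, and reports the fraction of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, mean


def coverage(n_intervals=2000, n=30, mu=50.0, sigma=10.0, level=0.95, seed=1):
    """Fraction of simulated intervals that contain the true mean mu.

    mu, sigma and n are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Critical value z_{alpha/2}, about 1.96 for a 95% level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / n_intervals


print(coverage())  # a value close to 0.95
```

The printed fraction should be close to, but not exactly, 0.95, which is the sense in which "about 95 out of 100" intervals contain the population mean.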
To compute a confidence interval for μ, we will consider two situations:
i. We use the sample mean x̄ to estimate μ, and the population standard deviation σ is known.
ii. We use the sample mean x̄ to estimate μ, and the population standard deviation is unknown. In this case, we substitute the sample standard deviation s for the population standard deviation σ.
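The two situations can be sketched with a single helper. This is a minimal sketch under stated assumptions: the function name `mean_ci` and the numbers in the usage line are hypothetical, and the interval has the usual form "sample mean ± critical value × standard deviation / √n". By default the critical value is taken from the standard normal distribution; for a small sample with unknown σ, a t-table value can be passed in through `crit` instead.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(x_bar, sd, n, level=0.95, crit=None):
    """Confidence interval for a mean: x_bar +/- crit * sd / sqrt(n).

    sd is sigma when it is known, otherwise the sample standard deviation s.
    crit defaults to z_{alpha/2}; pass a t-table value for small samples.
    """
    if crit is None:
        crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = crit * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical data: x_bar = 50, sigma = 10 (known), n = 100
lo, hi = mean_ci(50.0, 10.0, 100)
print(round(lo, 2), round(hi, 2))  # 48.04 51.96
```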
Example 2.1: Find the 95% confidence interval for a population mean for these values: a) b)
a) 1st step: 2nd step: Find the critical value from the table.
3rd step: Use the formula.
4th step: Conclusion: The 95% confidence interval for the mean lies between and
Example 2.3: The brightness of a television picture tube can be evaluated by measuring the amount of current required to achieve a particular brightness level. A random sample of 10 tubes gave a sample mean of x̄ microamps and a sample standard deviation of 15.7 microamps. Find (in microamps) a 99% confidence interval estimate for the mean current required to achieve a particular brightness level.
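Solution: For a 99% CI: 1 − α = 0.99, so α = 0.01 and α/2 = 0.005. Since n = 10 < 30 and σ is unknown, use the t distribution with n − 1 = 9 degrees of freedom. From the t distribution table: $t_{0.005,\,9} = 3.250$.
Hence the 99% CI is $\bar{x} \pm t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}} = \bar{x} \pm 3.250\left(\dfrac{15.7}{\sqrt{10}}\right) = \bar{x} \pm 16.14$.
Thus, we are 99% confident that the mean current required to achieve a particular brightness level is between $\bar{x} - 16.14$ and $\bar{x} + 16.14$ microamps.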
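The margin of error for Example 2.3 can be checked numerically. The sketch below assumes the t-table value 3.250 (upper 0.005 point, 9 degrees of freedom) and uses only s = 15.7 and n = 10 from the example; the variable names are illustrative.

```python
from math import sqrt

t_crit = 3.250   # t-table value for alpha/2 = 0.005 with 9 degrees of freedom
s = 15.7         # sample standard deviation (microamps)
n = 10           # sample size

# Half-width of the 99% confidence interval: t * s / sqrt(n)
margin = t_crit * s / sqrt(n)
print(round(margin, 2))  # 16.14
```

The interval is then the sample mean minus and plus this margin.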
Exercise 2.1: Taking a random sample of 35 individuals waiting to be serviced by the teller, we find that the mean waiting time was 22.0 min and the standard deviation was 8.0 min. Using a 90% confidence level, estimate the mean waiting time for all individuals waiting in the service line. Answer : [ , ]
Exercise: The mean and standard deviation of the maximum loads supported by a sample of 60 cables are x̄ tons and 0.73 tons, respectively. Find the 95% confidence interval for the mean of the maximum loads of all cables produced by the company.
Example 2.5: According to a poll, 40% of working women say that they feel stress at work. The poll was based on a randomly selected sample of 1502 working women aged 30 and above. Construct a 95% confidence interval for the corresponding population proportion.
Solution: Let p be the proportion of all working women aged 30 and above who feel stress at work, and let p̂ be the corresponding sample proportion. From the given information, n = 1502, p̂ = 0.40, q̂ = 1 − p̂ = 1 − 0.40 = 0.60.
Hence, the 95% CI is $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} = 0.40 \pm 1.96\sqrt{\dfrac{(0.40)(0.60)}{1502}} = 0.40 \pm 0.0248 = (0.3752,\ 0.4248)$.
Thus, we can state with 95% confidence that the proportion of all working women aged 30 and above who feel stress at work is between 37.52% and 42.48%.
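The interval in Example 2.5 can be reproduced with a short helper. This is a sketch, assuming the large-sample normal approximation for a proportion; the function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example 2.5: p_hat = 0.40, n = 1502, 95% confidence
lo, hi = proportion_ci(0.40, 1502)
print(round(lo, 4), round(hi, 4))  # 0.3752 0.4248
```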
Exercise 2.3 In a random sample of 70 automobiles registered in a certain state, 28 of them were found to have emission levels that exceed a state standard. Find a 95% confidence interval for the proportion of automobiles in the state whose emission levels exceed the standard. Answer : [0.2852, ]
Error of estimation and choosing the sample size
When we estimate a parameter, all we have is the estimate computed from the n measurements contained in the sample. Two questions usually arise:
(i) How far will our estimate lie from the true value of the parameter?
(ii) How many measurements should be included in the sample?
The distance between an estimate and the estimated parameter is called the error of estimation. For example if most estimates are within 1.96 standard deviations of the true value of the parameter, then we would expect the error of estimation to be less than 1.96 standard deviations of the estimator, with the probability approximately equal to 0.95.
In the process of determining the sample size, we first have to determine the parameter to be estimated and the standard error of its point estimator. Then choose the bound E on the margin of error and the confidence coefficient (1 − α). Finally, use the following equation to find the suitable sample size n for estimating a mean, rounding up to the next whole number:
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^{2}$, where E = margin of error.
Example 2.6: The college president asks the statistics teacher to estimate the average age of the students at their college. The statistics teacher would like to be 99% confident that the estimate is accurate to within 1 year. From a previous study, the standard deviation of the ages is known to be 3 years. How large a sample is necessary?
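Solution: Here σ = 3, E = 1 and 1 − α = 0.99, so α/2 = 0.005. From the table, $z_{0.005} = 2.576$.
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^{2} = \left(\dfrac{(2.576)(3)}{1}\right)^{2} = 59.72$
Rounding up, a sample of at least 60 students is necessary.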
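The sample size calculation can be sketched as follows, assuming the formula n = (z σ / E)² for estimating a mean, rounded up to the next whole number. The function name `sample_size_for_mean` is illustrative. The two calls use the figures from Example 2.6 and Exercise 2.5.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, level):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / margin) ** 2)


print(sample_size_for_mean(3, 1, 0.99))    # 60  (Example 2.6)
print(sample_size_for_mean(8, 1.5, 0.95))  # 110 (Exercise 2.5)
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the chosen bound.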
Exercise 2.5: The diameter of a two-year-old Sentang tree is normally distributed with a standard deviation of 8 cm. How many trees should be sampled if it is required to estimate the mean diameter to within ± 1.5 cm with 95% confidence? Answer : 110 trees
EXERCISES
Exercise 2.6: A tire manufacturer wishes to investigate the tread life of its tires. A sample of 10 tires driven 50,000 miles revealed a sample mean of 0.32 inches of tread remaining with a standard deviation of 0.09 inches. Construct a 95 percent confidence interval for the population mean. Would it be reasonable for the manufacturer to conclude that after 50,000 miles the population mean amount of tread remaining is 0.30 inches? Answer : [0.2556, ]
Exercise 2.8: The wedding ceremony for a couple, Jamie and Robbin, will be held at Menara Kuala Lumpur. A survey has been carried out to determine the proportion of people who will come to the ceremony. Of 250 people invited, only 180 agreed to attend the ceremony. Find a 90% confidence interval estimate for the proportion of all invited people who will attend the ceremony. Answer : [0.6733, ]
Chapter 4 Statistical Inferences Hypothesis Testing
WHY DO WE TEST HYPOTHESES?
We test hypotheses to make decisions about populations based on sample information. Example: we wish to know whether a medicine is really effective in curing a disease, so we take a sample of patients, record the effect of the medicine on them, and make a decision from the data. To reach decisions, it is useful to make assumptions about the populations. Such assumptions, which may or may not be true, are called statistical hypotheses.
Definitions
Hypothesis Test: A process of using sample data and statistical procedures to decide whether or not to reject a hypothesis (statement) about a population parameter value (or about its distribution characteristics).
Null Hypothesis, H0: Generally, a statement that a population parameter has a specific value. The null hypothesis is initially assumed to be true. Therefore, it is the hypothesis to be tested.
Alternative Hypothesis, H1: A statement about the same population parameter that is used in the null hypothesis. Generally, it specifies that the population parameter has a value different, in some way, from the value given in the null hypothesis. Rejection of the null hypothesis implies acceptance of the alternative hypothesis.
Test Statistic: A function of the sample data on which the decision is to be based.
Critical/Rejection Region: The set of values of the test statistic for which the null hypothesis will be rejected.
Critical Point: The first (or boundary) value in the critical region.
P-value: The probability, calculated under the assumption that H0 is true, of obtaining a test statistic at least as extreme as the one observed. The smaller the p-value, the more the data contradict H0.
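To make the p-value definition concrete, the sketch below computes it for a test statistic that follows the standard normal distribution under H0. The function name `p_value` and the tail labels "left", "right" and "two" are assumptions made for illustration.

```python
from statistics import NormalDist


def p_value(z, tail):
    """P-value of a z statistic for a left-, right- or two-tailed test."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)                     # P(Z <= z)
    if tail == "right":
        return 1 - cdf(z)                 # P(Z >= z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))      # P(|Z| >= |z|)
    raise ValueError("tail must be 'left', 'right' or 'two'")


print(round(p_value(1.96, "two"), 3))  # 0.05
```

A p-value smaller than the significance level α leads to rejecting H0, which is equivalent to the test statistic falling in the critical region.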
Procedure for hypothesis testing
1. Define the question to be tested and formulate the hypotheses that state the problem.
2. Choose the appropriate test statistic and calculate its value from the sample. The choice of test statistic depends on the probability distribution of the random variable involved in the hypothesis.
3. Establish the test criterion by determining the critical value and critical region.
4. Draw a conclusion: reject or do not reject the null hypothesis.
Hypothesis tests for a normal population mean, μ
Tails of a Test

                    Two-Tailed Test    Left-Tailed Test    Right-Tailed Test
Sign in H1          ≠                  <                   >
Rejection region    In both tails      In the left tail    In the right tail
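The link between the tail of the test and the rejection region can be sketched for a large-sample test of a mean. This is a minimal sketch, assuming the statistic z = (x̄ − μ0) / (s / √n); the function name `z_test_mean`, the tail labels and the numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, s, n, alpha, tail):
    """Return (z, reject) for a large-sample test of H0 about a mean.

    tail is 'two', 'left' or 'right', matching the sign in H1.
    """
    z = (x_bar - mu0) / (s / sqrt(n))
    inv = NormalDist().inv_cdf
    if tail == "two":
        reject = abs(z) > inv(1 - alpha / 2)   # both tails, alpha split in two
    elif tail == "left":
        reject = z < inv(alpha)                # left tail only
    elif tail == "right":
        reject = z > inv(1 - alpha)            # right tail only
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject


# Hypothetical data: x_bar = 52, mu0 = 50, s = 10, n = 100, alpha = 0.05
print(z_test_mean(52, 50, 10, 100, 0.05, "two"))   # (2.0, True)
print(z_test_mean(52, 50, 10, 100, 0.05, "left"))  # (2.0, False)
```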
Example 4.7: A sample of 50 Internet shoppers were asked how much they spent per year on the Internet. From this sample, the mean expenditure per year on the Internet is x̄ and the sample standard deviation is s. It is desired to test whether or not the mean expenditure is RM32500 per year. Test at α = 0.05.
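Solution: The hypotheses tested are: H0: μ = 32500 versus H1: μ ≠ 32500.
Test statistic: Since n = 50 ≥ 30, $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $\mu_0 = 32500$.
Critical value: The test is two-tailed, so α has to be divided by two: $z_{\alpha/2} = z_{0.025} = 1.96$.
Rejection region: Reject H0 if z < −1.96 or z > 1.96.
Conclusion: The test statistic does not fall in the rejection region, so H0 is not rejected. At α = 0.05 there is not enough evidence to conclude that the mean expenditure of Internet shoppers differs from RM32500 per year.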
Example 4.8: A random sample of 10 individuals who listen to radio was selected, and the hours per week that each listens to radio was determined. The data are as follows: Test the hypothesis that the mean hours individuals listen to radio is less than 8 hours at significance level α.
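Solution: The hypotheses tested are: H0: μ ≥ 8 versus H1: μ < 8.
Test statistic: Since n = 10 < 30 and σ is unknown, $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with n − 1 = 9 degrees of freedom.
Critical value: $-t_{\alpha,\,9}$ from the t distribution table.
Rejection region: Reject H0 if $t < -t_{\alpha,\,9}$.
Conclusion: H0 is not rejected. There is not enough evidence that the mean hours individuals listen to radio is less than 8 hours.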
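The small-sample calculation can be sketched as follows. The ten data values below are hypothetical, since the original data are not reproduced here, and the significance level α = 0.05 with table value t(0.05, 9) = 1.833 is likewise an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical listening hours for 10 individuals (not the original data)
hours = [9, 7, 11, 8, 6, 10, 9, 7, 8, 4]
mu0 = 8          # value of the mean under H0
t_crit = 1.833   # t-table value for alpha = 0.05 (assumed), 9 degrees of freedom

n = len(hours)
x_bar = mean(hours)
s = stdev(hours)                     # sample standard deviation (divisor n - 1)
t = (x_bar - mu0) / (s / sqrt(n))    # one-sample t statistic

reject = t < -t_crit                 # left-tailed test: reject if t < -t_crit
print(round(t, 3), reject)           # -0.156 False
```

With these illustrative values the statistic lies outside the rejection region, so H0 is not rejected.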
Exercise 4.1: A paint manufacturing company claims that the mean drying time for its paint is at most 45 minutes. A random sample of 35 trials is tested. It is found that the sample mean drying time is x̄ minutes with a standard deviation of 3 minutes. Assume that the drying times follow a normal distribution. At the 1% significance level, is there sufficient evidence to support the company's claim? (Ans: , Reject)
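Hypothesis tests for a population proportion, p
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where $q_0 = 1 - p_0$

Null hypothesis    Alternative hypothesis    Rejection region
H0: p = p0         H1: p ≠ p0                z < −z(α/2) or z > z(α/2)
H0: p ≤ p0         H1: p > p0                z > z(α)
H0: p ≥ p0         H1: p < p0                z < −z(α)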
Example: When working properly, a machine that is used to make chips for calculators produces 4% defective chips. Whenever the machine produces more than 4% defective chips, it needs an adjustment. To check whether the machine is working properly, the quality control department at the company often takes a sample of chips and inspects them to determine whether they are good or defective. One such random sample of 200 chips taken recently from the production line contained 14 defective chips. Test at the 5% significance level whether or not the machine needs an adjustment.
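Solution: The hypotheses tested are: H0: p ≤ 0.04 versus H1: p > 0.04.
Test statistic: p̂ = 14/200 = 0.07, so $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{0.07 - 0.04}{\sqrt{(0.04)(0.96)/200}} = 2.17$.
Critical value: $z_{0.05} = 1.645$.
Rejection region: Reject H0 if z > 1.645.
Conclusion: Since 2.17 > 1.645, H0 is rejected. The machine needs an adjustment.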
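The machine example can be verified numerically. The sketch below assumes the large-sample z statistic for a proportion, z = (p̂ − p0) / √(p0 q0 / n); the function name `proportion_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(x, n, p0):
    """z statistic for testing a proportion: x successes out of n, H0 value p0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Machine example: 14 defective chips out of 200, p0 = 0.04, alpha = 0.05
z = proportion_z(14, 200, 0.04)
z_crit = NormalDist().inv_cdf(1 - 0.05)    # right-tailed critical value
print(round(z, 2), round(z_crit, 3), z > z_crit)  # 2.17 1.645 True
```

Because the statistic exceeds the critical value, the result agrees with the conclusion that the machine needs an adjustment.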
Exercise 4.3: A manufacturer of a detergent claimed that his detergent is at least 95% effective in removing tough stains. In a sample of 300 people who had used the detergent, 279 claimed that they were satisfied with the result. Determine whether the manufacturer's claim is true at the 1% significance level. Answer: Do not Reject
EXERCISES
1. A paint manufacturing company claims that the mean drying time for its paint is at most 45 minutes. A random sample of 20 trials is tested. It is found that the sample mean drying time is x̄ minutes with a standard deviation of 3 minutes. Assume that the drying times follow a normal distribution.
(a) Construct a 99% confidence interval for the mean drying time of the paint.
(b) At the 5% significance level, is there sufficient evidence to support the company's claim?
(c) Suppose that another manufacturing company wants to estimate the mean drying time for its paints at the 95% confidence level. Given σ, what is the sample size of trials required in order to obtain an estimate that is within a maximum error of 3 minutes?
2. A truck loaded with 8000 electronic circuit boards has just pulled into a firm's receiving dock. The supplier claims that no more than 4% of the electronic circuit boards fall outside the most rigid level of industry performance specifications. In a simple random sample of 300 electronic circuit boards from the shipment, 15 fall outside these specifications.
(a) Construct the 95% confidence interval for the percentage of all boards in this shipment that fall outside the specifications.
(b) Test whether the supplier's claim would appear to be correct at the 10% significance level.