QBM117 Business Statistics Statistical Inference Sampling 1
Objectives To give an overview of the next topic, statistical inference. To understand the importance of correct sampling techniques. To introduce different sampling techniques. 2
Populations and Samples A population is the entire collection of items about which information is desired. A sample is a subset of the population that we collect data from. 3
Parameters and Statistics A parameter is a number that describes a population. -A parameter is a fixed number. A statistic is a number that describes a sample. -A statistic is a random variable whose value changes from sample to sample. 4
Statistical Inference Population parameters are almost always unknown. We take a random sample from the population of interest and calculate the sample statistic. We then use the sample statistic as an estimate of the population parameter. Statistical Inference involves drawing conclusions about a population based on sample information. 5
Sampling Distributions Sample statistics are random variables. The probability distribution of a sample statistic is called its sampling distribution. We use the sampling distribution to make inferences about the population parameters. 6
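The idea that a statistic varies from sample to sample while the parameter stays fixed can be illustrated with a small simulation. This is only a sketch: the income population, its size, the sample size of 25, and the helper name `sample_mean` are all invented for illustration and are not course data.

```python
import random
import statistics

random.seed(117)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 incomes (made-up values, not course data).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter: one fixed number


def sample_mean(n):
    """Draw a simple random sample of size n and return its mean (a statistic)."""
    return statistics.mean(random.sample(population, n))


# Repeating the sampling many times approximates the sampling distribution
# of the sample mean for samples of size 25.
sample_means = [sample_mean(25) for _ in range(2_000)]

print(f"population mean (parameter): {mu:.0f}")
print(f"first three sample means:    {[round(m) for m in sample_means[:3]]}")
print(f"mean of the sample means:    {statistics.mean(sample_means):.0f}")
print(f"std dev of the sample means: {statistics.stdev(sample_means):.0f}")
```

Each run of `sample_mean(25)` gives a different value, and the spread of those values is what the sampling distribution describes.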
Estimation and Hypothesis Testing There are two types of statistical inference: estimation and hypothesis testing. Estimation is appropriate when we want to estimate a population parameter. Hypothesis testing is appropriate when we want to assess some claim about a population based on the evidence provided by a sample. 7
Sampling Sampling is the process of selecting a sample from a population. Samples may be selected in a variety of ways. The sample should be representative of the population. This is best achieved by random sampling. 8
Random Sampling A sample is random if every member of the population has an equal chance of being selected in the sample. Most statistical techniques assume that random samples are used. We will look at three types of random sampling. 9
Simple Random Sampling A simple random sample is a sample selected so that every possible sample of the same size is equally likely to be chosen; as a result, each member of the population is equally likely to be included. The easiest way to generate a simple random sample is to use a random number generator. 10
Example: Generating a Simple Random Sample A government income-tax auditor is responsible for 1000 tax returns. The auditor wants to randomly select 40 tax returns to audit. Each tax return in the population of 1000 is given a number from 1 to 1000. We then use Excel’s random number generator to select the random sample of 40 tax returns. 11
The procedure: (1) generate 50 numbers uniformly distributed between 0 and 1; (2) multiply each by 1000, giving 50 random numbers between 0 and 1000; (3) round up, giving 50 uniformly distributed random integers between 1 and 1000, each with a probability of 1/1000 of being selected. The auditor will select returns numbered 383, 101, 597, ... 12
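The same steps (uniform number, multiply by 1000, round up) can be sketched in Python instead of Excel. Two assumptions are made here that the slide does not state: repeated numbers are discarded, and drawing continues until 40 distinct returns are chosen (the slide instead generates 50 numbers, presumably to leave spares). The variable names are illustrative only.

```python
import math
import random

random.seed(117)  # fixed seed so the sketch is repeatable

POPULATION_SIZE = 1000  # tax returns numbered 1 to 1000
SAMPLE_SIZE = 40        # returns the auditor wants to audit

# Slide procedure: uniform number -> multiply by 1000 -> round up.
# Assumption: a repeated number is skipped and another one is drawn.
selected = []
while len(selected) < SAMPLE_SIZE:
    u = random.random()  # uniform on [0, 1)
    # max(1, ...) guards the rare case u == 0.0, which would round up to 0.
    number = max(1, math.ceil(u * POPULATION_SIZE))
    if number not in selected:
        selected.append(number)

# Shortcut: random.sample draws without replacement in a single call.
shortcut = random.sample(range(1, POPULATION_SIZE + 1), SAMPLE_SIZE)

print("slide procedure:", sorted(selected))
print("random.sample:  ", sorted(shortcut))
```

Both lists are simple random samples of 40 return numbers; the shortcut avoids the manual duplicate check.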
Stratified Random Sampling A stratified random sample is obtained by dividing the population into homogeneous groups and drawing a simple random sample from each group. The homogeneous groups are called strata. Not only can we acquire information about the whole population, we can also make inferences within each stratum or compare strata. 13
Example: Generating a Stratified Random Sample Suppose the Internal Revenue Service wants to estimate the median amounts of deductions taxpayers claim in different categories, e.g. property taxes, charitable donations, etc. These amounts vary greatly over the taxpayer population. Therefore a simple random sample will not be very efficient. 14
The taxpayers can be divided into strata based on their adjusted gross incomes, and a separate simple random sample (SRS) can be drawn from each stratum. Because deductions generally increase with income, the resulting stratified random sample would require a much smaller total sample size to provide equally precise estimates. 15
There are several ways to build the stratified random sample. One of them is to keep each stratum's share of the sample equal to its share of the population. A sample of size 1000 is to be drawn. 16
Stratum   Income             Population proportion   Stratum size
1         under $15,000      25%                     250
2         $15,000-29,999     40%                     400
3         $30,000-50,000     30%                     300
4         over $50,000       5%                      50
Total                                                1000
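The proportional allocation above is just each stratum's population proportion multiplied by the total sample size. A minimal sketch follows; the sampling frame of 100,000 taxpayer IDs is hypothetical, as are the variable names, and only the four proportions and the total of 1000 come from the slide.

```python
import random

random.seed(117)  # fixed seed so the sketch is repeatable

# Population proportions of the four income strata, as given on the slide.
proportions = {"stratum 1": 0.25, "stratum 2": 0.40, "stratum 3": 0.30, "stratum 4": 0.05}
TOTAL_SAMPLE = 1000

# Hypothetical sampling frame: 100,000 taxpayer IDs split in the same proportions.
FRAME_SIZE = 100_000
frame = {}
next_id = 0
for name, share in proportions.items():
    count = round(share * FRAME_SIZE)
    frame[name] = list(range(next_id, next_id + count))
    next_id += count

# Proportional allocation: stratum sample size = proportion x total sample size.
allocation = {name: round(share * TOTAL_SAMPLE) for name, share in proportions.items()}

# Draw a separate simple random sample within each stratum.
stratified_sample = {
    name: random.sample(frame[name], allocation[name]) for name in frame
}

for name, size in allocation.items():
    print(f"{name}: {size} taxpayers sampled")
```

Other allocation schemes exist (the slide says there are several ways); this sketch covers only the proportional one.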
Cluster Sampling Cluster sampling groups the population into small clusters, draws a simple random sample of clusters, and observes everything in the sampled clusters. It is useful when it is difficult or costly to develop a complete list of the population members. It is also useful whenever the population elements are widely dispersed geographically. 17
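The two steps of cluster sampling (sample the clusters, then observe everything inside them) can be sketched as follows. The population of 200 city blocks with 5 to 15 households each is made up for illustration, as is the choice of 10 sampled blocks.

```python
import random

random.seed(117)  # fixed seed so the sketch is repeatable

# Hypothetical population: 200 city blocks (clusters), each with 5 to 15 households.
clusters = {
    block: [f"block{block}-house{h}" for h in range(random.randint(5, 15))]
    for block in range(200)
}

# Step 1: draw a simple random sample of clusters.
# Only a list of blocks is needed, not a list of every household.
sampled_blocks = random.sample(sorted(clusters), 10)

# Step 2: observe every member of each sampled cluster.
cluster_sample = [house for block in sampled_blocks for house in clusters[block]]

print("sampled blocks:", sorted(sampled_blocks))
print("households observed:", len(cluster_sample))
```

Note that the final sample size is not fixed in advance, because it depends on how large the sampled clusters happen to be.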
Errors Involved in Sampling Two types of errors occur when sampling from a population: sampling error and non-sampling error. 18
Sampling Error Sampling error is the error that arises because the data are collected from part of the population rather than the whole. Whenever we make inferences about a population based on information from a sample, there will naturally be some degree of error. The larger the sample, the smaller the sampling error tends to be. 19
Sampling Error [Figure: sampling error shown as the difference between the sample mean income and the population mean income.] 20
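To see how sampling error shrinks as the sample grows, one can measure the typical gap between the sample mean and the population mean at several sample sizes. This is a rough sketch under invented assumptions: the income population, the sample sizes 10, 100 and 1000, and the helper `typical_error` are all hypothetical.

```python
import random
import statistics

random.seed(117)  # fixed seed so the sketch is repeatable

# Hypothetical population of 50,000 incomes (illustrative values only).
population = [random.lognormvariate(10.8, 0.5) for _ in range(50_000)]
mu = statistics.mean(population)  # population mean income


def typical_error(n, repeats=300):
    """Average absolute gap between the sample mean and the population mean."""
    gaps = [
        abs(statistics.mean(random.sample(population, n)) - mu)
        for _ in range(repeats)
    ]
    return statistics.mean(gaps)


errors = {n: typical_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n = {n:>4}: typical sampling error = {err:,.0f}")
```

Any single small sample can still land close to the population mean by chance, which is why the sketch averages over many repeated samples.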
Non-Sampling Error Non-sampling errors are due to errors in data acquisition, non-response error and selection bias. These types of errors are more serious than sampling errors, as increasing the sample size will not help to reduce them. 21
Errors in Data Acquisition These types of errors occur during data collection and processing. –Faulty equipment may lead to incorrect measurements being taken. –Data may be recorded incorrectly. –Processing errors may occur. 22
Data Acquisition Error [Figure: population and sample. If an observation in the sample is wrongly recorded, the sample mean is affected, so the total error is sampling error + data acquisition error.] 23
Non-Response Error Non-response error is the error introduced when responses are not obtained from some members of the sample. The sample observations that are collected may not be representative of the population. This leads to biased results. 24
Non-Response Error [Figure: population and sample. When part of the sample gives no response, the results may be biased.] 25
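A small simulation can show how non-response biases a result even when the original sample is random. Everything in this sketch is assumed for illustration: the population of weekly working hours, the rule that people who work longer hours are less likely to respond, and the helper name `responds`.

```python
import random
import statistics

random.seed(117)  # fixed seed so the sketch is repeatable

# Hypothetical population: hours worked per week for 20,000 people.
population = [random.gauss(40, 8) for _ in range(20_000)]

# A proper simple random sample of 2,000 people.
sample = random.sample(population, 2_000)


def responds(hours):
    """Assumed response rule: the chance of answering falls as hours rise."""
    p = min(1.0, max(0.1, 1.2 - hours / 50))
    return random.random() < p


# Only some sampled people actually answer the survey.
respondents = [h for h in sample if responds(h)]

full_mean = statistics.mean(sample)
respondent_mean = statistics.mean(respondents)

print(f"responses obtained:           {len(respondents)} of {len(sample)}")
print(f"mean if everyone responded:   {full_mean:.1f} hours")
print(f"mean of actual respondents:   {respondent_mean:.1f} hours")
```

Because the busiest people drop out, the respondents understate the average hours worked, and drawing a larger sample would not remove that gap.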
Selection Bias Selection bias occurs when some members of the population cannot possibly be selected for inclusion in the sample. For example, surveying voters by randomly selecting telephone numbers is biased, as voters who do not have a telephone cannot possibly be selected in the sample. 26
Selection Bias [Figure: population and sample. When parts of the population cannot be selected, the sample cannot represent the whole population.] 27
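The telephone survey example can be sketched as a simulation. The figures are invented purely for illustration: 80% of voters are assumed to have a telephone, with 60% support for a candidate among telephone owners and 20% among everyone else.

```python
import random

random.seed(117)  # fixed seed so the sketch is repeatable

# Hypothetical voters stored as (has_phone, supports_candidate) pairs.
# Assumed rates: 80% have a phone; support is 60% with a phone, 20% without.
voters = []
for _ in range(50_000):
    has_phone = random.random() < 0.80
    supports = random.random() < (0.60 if has_phone else 0.20)
    voters.append((has_phone, supports))

true_support = sum(s for _, s in voters) / len(voters)

# Sampling by telephone number: voters without a phone can never be selected.
phone_frame = [v for v in voters if v[0]]
phone_sample = random.sample(phone_frame, 2_000)
phone_estimate = sum(s for _, s in phone_sample) / len(phone_sample)

print(f"true support in the population: {true_support:.1%}")
print(f"estimate from telephone sample: {phone_estimate:.1%}")
```

The telephone sample is random within its frame, yet it overstates support because the excluded voters differ from the included ones.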
Reading for next lecture Chapter 7, Section 7.5 Exercises