1
Statistics Sampling
2
STATISTICS in PRACTICE
Cinergy, formerly Cincinnati Gas & Electric Company (CG&E), is a public utility that provides gas and electric power to customers in the Greater Cincinnati area. To improve service to its customers, Cinergy continually strives to stay up-to-date with its customers’ needs.
3
STATISTICS in PRACTICE
Cinergy is using the survey results to improve the forecasts of energy demand and to improve service to its commercial customers.
4
Contents
Terminology Used in Sample Surveys
Types of Surveys and Sampling Methods
Survey Errors
Simple Random Sampling
Stratified Simple Random Sampling
Cluster Sampling
Systematic Sampling
5
Terminology Used in Sample Surveys
An element is the entity on which data are collected. A population is the collection of all elements of interest. A sample is a subset of the population. The target population is the population we want to make inferences about.
6
Terminology Used in Sample Surveys
The sampled population is the population from which the sample is actually selected. These two populations are not always the same. If inferences from a sample are to be valid, the sampled population must be representative of the target population.
7
Terminology Used in Sample Surveys
Example: Dunning Microsystems, Inc. (DMI), a manufacturer of personal computers and peripherals, would like to collect data about the characteristics of individuals who purchased a DMI personal computer. A sample survey of DMI personal computer owners could be conducted.
8
Terminology Used in Sample Surveys
The elements in this sample survey would be individuals who purchased a DMI personal computer. The population would be the collection of all people who purchased a DMI personal computer. The sample would be the subset of DMI personal computer owners who are surveyed.
9
Terminology Used in Sample Surveys
The target population consists of all people who purchased a DMI personal computer. The sampled population, however, might be all owners who sent warranty registration cards back to DMI. Not every person who buys a DMI personal computer sends in the warranty card, so the sampled population would differ from the target population.
10
Terminology Used in Sample Surveys
The population is divided into sampling units which are groups of elements or the elements themselves. A list of the sampling units for a particular study is called a frame.
11
Terminology Used in Sample Surveys
The choice of a particular frame is often determined by the availability and reliability of a list. The development of a frame can be one of the most difficult and important steps in conducting a sample survey.
12
Terminology Used in Sample Surveys
Example: Suppose we want to survey certified professional engineers who are involved in the design of heating and air conditioning systems for commercial buildings. If a list of all professional engineers were available, the sampling units would be the professional engineers we want to survey.
13
Terminology Used in Sample Surveys
If such a list is not available, a business telephone directory might provide a list of all engineering firms. We could select a sample of the engineering firms to survey; then, for each firm surveyed, we might interview all the professional engineers.
14
Types of Surveys Surveys Involving Questionnaires
Three common types are mail surveys, telephone surveys, and personal interview surveys. Survey costs are lower for mail and telephone surveys. With well-trained interviewers, higher response rates and longer questionnaires are possible with personal interviews. The design of the questionnaire is critical.
15
Types of Surveys Surveys Not Involving Questionnaires
Often, someone simply counts or measures the sampled items and records the results. An example is sampling a company’s inventory of parts to estimate the total inventory value.
16
Sampling Methods Nonprobabilistic Sampling Probabilistic Sampling
17
Non-probabilistic Sampling Methods
The probability of obtaining each possible sample cannot be computed. Statistically valid statements cannot be made about the precision of the estimates. Sampling cost is lower and implementation is easier. Methods include convenience and judgment sampling.
18
Non-probabilistic Sampling Methods
Convenience Sampling The units included in the sample are chosen because of accessibility. In some cases, convenience sampling is the only practical approach.
19
Non-probabilistic Sampling Methods
Judgment Sampling A knowledgeable person selects sampling units that he/she feels are most representative of the population. The quality of the result is dependent on the person selecting the sample. Generally, no statistical statement should be made about the precision of the result.
20
Nonprobabilistic Sampling Methods
Example of convenience sampling: A professor conducting a research study at a university may ask student volunteers to participate in the study simply because they are in the professor’s class.
21
Probabilistic Sampling Methods
The probability of obtaining each possible sample can be computed. Confidence intervals can be developed which provide bounds on the sampling error. Methods include simple random, stratified simple random, cluster, and systematic sampling.
22
Survey Errors Two types of errors can occur in conducting a survey: sampling error and nonsampling error.
23
Survey Errors Sampling Error
It is defined as the magnitude of the difference between the point estimate, developed from the sample, and the population parameter. It occurs because not every element in the population is surveyed. It cannot occur in a census. In a sample survey it cannot be avoided, but it can be controlled.
24
Survey Errors Nonsampling Error
It can occur in both a census and a sample survey. Examples include: measurement error, errors due to nonresponse, errors due to lack of respondent knowledge, selection error, and processing error.
25
Survey Errors Nonsampling Error Measurement Error
Measuring instruments are not properly calibrated. People taking the measurements are not properly trained.
26
Survey Errors Nonsampling Error Errors Due to Nonresponse
They occur when no data can be obtained, or only partial data are obtained, for some of the units surveyed. The problem is most serious when a bias is created.
27
Survey Errors Nonsampling Error
Errors Due to Lack of Respondent Knowledge These errors are common in technical surveys. Some respondents might be more capable than others of answering technical questions.
28
Survey Errors Nonsampling Error Selection Error
An inappropriate item is included in the survey. For example, in a survey of “small truck owners” some interviewers include SUV owners while other interviewers do not.
29
Survey Errors Nonsampling Error Processing Error
Data are incorrectly recorded. Data are incorrectly transferred from recording forms to computer files.
30
Simple Random Sampling
A simple random sample of size n from a finite population of size N is a sample selected such that every possible sample of size n has the same probability of being selected. We begin by developing a frame or list of all elements in the population. Then a selection procedure, based on the use of random numbers, is used to ensure that each element in the sampled population has the same probability of being selected.
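The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the slides: the frame of 200 element IDs and the helper name `simple_random_sample` are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical frame: IDs 1..200 standing in for the list of population elements
frame = list(range(1, 201))
sample = simple_random_sample(frame, 40, seed=1)
print(len(sample), len(set(sample)))
```

Sampling without replacement is what makes every possible sample of size n equally likely; drawing with replacement would allow the same element to appear twice.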
31
Simple Random Sampling
We will see in the upcoming slides how to estimate the following population parameters: population mean, population total, and population proportion. We will also see how to determine the appropriate sample size.
32
Simple Random Sampling
In a sample survey it is common practice to provide an approximate 95% confidence interval estimate of the population parameter. Assuming the sampling distribution of the point estimator can be approximated by a normal probability distribution, we use a value of t = 2 for a 95% confidence interval.
33
Simple Random Sampling
The interval estimate is: Point Estimator +/- 2(Estimate of the Standard Error of the Point Estimator) The bound on the sampling error is: 2(Estimate of the Standard Error of the Point Estimator)
34
Simple Random Sampling
Population Mean Point Estimator Estimate of the Standard Error of the Mean
35
Simple Random Sampling
Population Mean Interval Estimate Approximate 95% Confidence Interval Estimate
36
Simple Random Sampling
Population Total Point Estimator Estimate of the Standard Error of the Total
37
Simple Random Sampling
Population Total Interval Estimate Approximate 95% Confidence Interval Estimate
38
Simple Random Sampling
Population Proportion Point Estimator Estimate of the Standard Error of the Proportion
39
Simple Random Sampling
Population Proportion Interval Estimate Approximate 95% Confidence Interval Estimate
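The formulas on the preceding slides were images and did not survive in this text. As a hedged reconstruction, the standard finite-population forms for simple random sampling are sketched below; they reproduce the standard error and interval reported for the Steddy Investments total. The symbols \(s\) (sample standard deviation), \(\bar{x}\) (sample mean), and \(\bar{p}\) (sample proportion) are assumed notation, and the \(n-1\) in the proportion formula is one common textbook convention.

```latex
\begin{align*}
\text{Mean:}\quad & \bar{x} \pm 2 s_{\bar{x}}, &
  s_{\bar{x}} &= \sqrt{\frac{N-n}{N}}\,\frac{s}{\sqrt{n}} \\
\text{Total:}\quad & N\bar{x} \pm 2 s_{\hat{X}}, &
  s_{\hat{X}} &= N s_{\bar{x}} \\
\text{Proportion:}\quad & \bar{p} \pm 2 s_{\bar{p}}, &
  s_{\bar{p}} &= \sqrt{\frac{N-n}{N}\cdot\frac{\bar{p}(1-\bar{p})}{n-1}}
\end{align*}
```

The factor \(\sqrt{(N-n)/N}\) is the finite population correction: it shrinks the standard error as the sample covers a larger share of the population.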
40
Determining the Sample Size
An important consideration is choice of sample size. The best choice usually involves a tradeoff between cost and precision (size of the confidence interval). Larger samples provide greater precision, but are more costly. A budget might dictate how large the sample can be. A specified level of precision might dictate how small a sample can be.
41
Simple Random Sampling
Smaller confidence intervals provide more precision. The size of the approximate confidence interval depends on the bound B on the sampling error. Choosing a level of precision amounts to choosing a value for B. Given a desired level of precision, we can solve for the value of n.
42
Simple Random Sampling
Necessary Sample Size for Estimating the Population Mean
43
Simple Random Sampling
Necessary Sample Size for Estimating the Population Total
44
Simple Random Sampling
Necessary Sample Size for Estimating the Population Proportion
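The sample size formulas themselves were lost with the slide images. A sketch of the usual derivation, assuming the bound is set equal to twice the standard error and solved for \(n\), is given below. Here \(s\) and \(\bar{p}\) are planning values (for example, from an earlier or pilot sample), which is an assumption of this sketch, and the proportion formula uses \(n\) rather than \(n-1\) in the variance for simplicity.

```latex
\begin{align*}
B = 2\sqrt{\frac{N-n}{N}}\,\frac{s}{\sqrt{n}}
  \;&\Longrightarrow\;
  n = \frac{N s^{2}}{N\left(\dfrac{B^{2}}{4}\right) + s^{2}}
  && \text{(population mean)} \\[4pt]
B = 2N\sqrt{\frac{N-n}{N}}\,\frac{s}{\sqrt{n}}
  \;&\Longrightarrow\;
  n = \frac{N s^{2}}{\dfrac{B^{2}}{4N} + s^{2}}
  && \text{(population total)} \\[4pt]
B = 2\sqrt{\frac{N-n}{N}\cdot\frac{\bar{p}(1-\bar{p})}{n}}
  \;&\Longrightarrow\;
  n = \frac{N \bar{p}(1-\bar{p})}{N\left(\dfrac{B^{2}}{4}\right) + \bar{p}(1-\bar{p})}
  && \text{(population proportion)}
\end{align*}
```

In each case the result is rounded up to the next whole number.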
45
Simple Random Sampling
Example: Steddy Investments Ben Steddy is a financial advisor for 200 clients. A sample of 40 clients has been taken to obtain various demographic data and information about the clients’ investment objectives. Statistics of particular interest are the clients’ net worth and the proportion favoring fixed income investments.
46
Simple Random Sampling
Example: Steddy Investments For the sample, the mean net worth was $480,000 (with a standard deviation of $120,000), and the proportion favoring fixed-income investments was .30.
47
Simple Random Sampling
Point Estimate of Total Net Worth (TNW): \(\hat{X} = N\bar{x} = 200(480) = 96{,}000\) thousand = $96,000,000. Estimate of Standard Error of TNW = $3,394,113. Approximate 95% Confidence Interval for TNW = $89,211,774 to $102,788,226.
48
Simple Random Sampling
Point Estimate of Population Proportion Favoring Fixed-Income Investments Estimate of Standard Error of Proportion Approximate 95% Confidence Interval
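The Steddy Investments calculations can be checked numerically. The sketch below uses only the figures given in the example (N = 200, n = 40, mean 480 and standard deviation 120 in thousands of dollars, proportion .30). The formulas are the standard simple random sampling forms, and the use of n - 1 in the proportion's standard error is an assumption, since the slide's own formula was not preserved.

```python
import math

N, n = 200, 40            # population and sample sizes from the example
x_bar, s = 480.0, 120.0   # sample mean and standard deviation, in $ thousands
p_bar = 0.30              # sample proportion favoring fixed-income investments

fpc = math.sqrt((N - n) / N)   # finite population correction

# Total net worth: point estimate, standard error, approximate 95% interval
se_mean = fpc * s / math.sqrt(n)
total = N * x_bar
se_total = N * se_mean
total_ci = (total - 2 * se_total, total + 2 * se_total)

# Proportion: n - 1 in the denominator is an assumed convention
se_prop = fpc * math.sqrt(p_bar * (1 - p_bar) / (n - 1))
prop_ci = (p_bar - 2 * se_prop, p_bar + 2 * se_prop)

print(f"total = {total:,.0f} thousand, SE = {se_total:,.3f} thousand")
print(f"total CI = {total_ci[0]:,.3f} to {total_ci[1]:,.3f} thousand")
print(f"proportion SE = {se_prop:.4f}, CI = {prop_ci[0]:.4f} to {prop_ci[1]:.4f}")
```

The total's standard error and interval agree with the $3,394,113 and $89,211,774 to $102,788,226 reported for total net worth.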
49
Simple Random Sampling
Example: Steddy Investments One year later Steddy wants to again survey his clients. He now has 250 clients and wants to set a bound of $30,000 on the error of the estimate of their mean net worth. What is the necessary sample size?
50
Simple Random Sampling
Necessary Sample Size Steddy will need a sample size of 51.
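A short sketch of how the sample size of 51 can be reproduced. It assumes the standard formula n = N s² / (N B²/4 + s²) and assumes that the previous year's standard deviation of $120,000 serves as the planning value for s; neither is stated explicitly in the surviving slide text. The function name is hypothetical.

```python
import math

def srs_sample_size_for_mean(N, s, B):
    """Smallest n whose bound 2 * SE(mean) does not exceed B, given planning value s."""
    n = (N * s**2) / (N * (B**2 / 4) + s**2)
    return math.ceil(n)   # round up so the bound is met

# Steddy example in $ thousands: N = 250 clients, bound B = 30,
# s = 120 taken from the earlier sample (an assumption of this sketch)
n_needed = srs_sample_size_for_mean(N=250, s=120, B=30)
print(n_needed)  # 51
```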
51
Stratified Simple Random Sampling
The population is first divided into H groups, called strata. Then for stratum h, a simple random sample of size nh is selected. The data from the H simple random samples are combined to develop an estimate of a population parameter.
52
Stratified Simple Random Sampling
If the variability within each stratum is smaller than the variability across the strata, a stratified simple random sample can lead to greater precision. The basis for forming the various strata depends on the judgment of the designer of the sample.
53
Stratified Simple Random Sampling
Example: The College of Business at Lakeland College wants to conduct a survey of this year’s graduating class to learn about their starting salaries. There are five majors in the college. Of the N = 1500 students who graduated this year, N1 = 500 were accounting majors, N2 = 350 finance majors, N3 = 200 information systems majors, N4 = 300 marketing majors, and N5 = 150 operations management majors.
54
Stratified Simple Random Sampling
A stratified simple random sample of n = 180 students is selected: n1 = 45 majored in accounting, n2 = 40 in finance, n3 = 30 in information systems, n4 = 35 in marketing, and n5 = 30 in operations management.
55
Stratified Simple Random Sampling
Example: ChemTech International ChemTech International has used stratified simple random sampling to obtain demographic information and preferences regarding health care coverage for its employees and their families. The population of employees has been divided into 3 strata on the basis of age: under 30, 30-49, and 50 or over. Some of the sample data are shown on the next slide.
56
Stratified Simple Random Sampling
Demographic Data (table values not recovered). Columns: Stratum (Under 30, 30-49, 50 or Over), Nh, nh, Annual Family Dental Expense (Mean, St.Dev.), Proportion Married.
57
Stratified Simple Random Sampling
Population Mean Point Estimator: \(\bar{x}_{st} = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right) \bar{x}_h\), where: H = number of strata; \(\bar{x}_h\) = sample mean for stratum h; \(N_h\) = number of elements in the population in stratum h; N = total number of elements in the population (all strata).
58
Stratified Simple Random Sampling
Population Mean Estimate of the Standard Error of the Mean
59
Stratified Simple Random Sampling
Population Mean Interval Estimate Approximate 95% Confidence Interval Estimate
60
Stratified Simple Random Sampling
The College of Business at Lakeland College provided the sample means for each major, or stratum: $35,000 for accounting, $33,500 for finance, $41,500 for information systems, $32,000 for marketing, and $36,000 for operations management.
61
Stratified Simple Random Sampling
Point estimate of the population mean: 35,017. The standard error: 138.
62
Stratified Simple Random Sampling
A 95% confidence interval estimate of the population mean is 35,017 ± 2(138) = 35,017 ± 276, or $34,741 to $35,293.
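The point estimate behind this interval can be reproduced from the stratum sizes and sample means given for Lakeland College. The sketch below computes the weighted average of the stratum means with weights N_h / N; the variable names are illustrative. The standard error is not recomputed here because the stratum standard deviations are not given in the text.

```python
# Stratum sizes N_h and sample means (in dollars) from the Lakeland College example
N_h = {"accounting": 500, "finance": 350, "information systems": 200,
       "marketing": 300, "operations management": 150}
x_bar_h = {"accounting": 35000, "finance": 33500, "information systems": 41500,
           "marketing": 32000, "operations management": 36000}

N = sum(N_h.values())   # 1500 graduates in total

# Stratified estimate: stratum means weighted by their share of the population
x_bar_st = sum(N_h[h] / N * x_bar_h[h] for h in N_h)
print(round(x_bar_st))  # 35017
```

The result, about $35,017, is the midpoint of the interval $34,741 to $35,293.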
63
Stratified Simple Random Sampling
Point Estimate of Mean Annual Dental Expense = $375 Estimate of Standard Error of Mean = $9.27
64
Stratified Simple Random Sampling
Approximate 95% Confidence Interval for Mean Annual Dental Expense: 375 ± 2(9.27) = 375 ± 18.54, or $356.46 to $393.54.
65
Stratified Simple Random Sampling
Population Total Point Estimator Estimate of the Standard Error of the Total
66
Stratified Simple Random Sampling
Population Total Interval Estimate Approximate 95% Confidence Interval Estimate
67
Stratified Simple Random Sampling
Point Estimate of Total Family Expense for All Employees. Approximate 95% Confidence Interval = $169,318 to $186,932.
68
Stratified Simple Random Sampling
Population Proportion Point Estimator: \(\bar{p}_{st} = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right) \bar{p}_h\), where: H = number of strata; \(\bar{p}_h\) = sample proportion for stratum h; \(N_h\) = number of elements in the population in stratum h; N = total number of elements in the population (all strata).
69
Stratified Simple Random Sampling
Population Proportion Estimate of the Standard Error of the Proportion
70
Stratified Simple Random Sampling
Population Proportion Interval Estimate Approximate 95% Confidence Interval Estimate
71
Stratified Simple Random Sampling
Example: In the Lakeland College survey, the college wants to know the proportion of graduates receiving a starting salary of $36,000 or more.
72
Stratified Simple Random Sampling
The results of the sample survey of 180 graduates show that 63 received starting salaries of $36,000 or more. Of the 63, 16 majored in accounting, 3 majored in finance, 29 majored in information systems, 0 majored in marketing, and 15 majored in operations management.
73
Stratified Simple Random Sampling
The point estimate of the proportion receiving starting salaries of $36,000 or more, and its standard error.
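The computed values for this slide were not preserved, but they can be worked out from the counts given in the example. The sketch below weights each stratum's sample proportion by N_h / N. The standard error uses the common textbook form with n_h - 1 in the denominator, which is an assumption here because the slide's formula was lost.

```python
import math

# Lakeland College data, in the order: accounting, finance,
# information systems, marketing, operations management
N_h = [500, 350, 200, 300, 150]   # stratum sizes
n_h = [45, 40, 30, 35, 30]        # stratum sample sizes
a_h = [16, 3, 29, 0, 15]          # sampled graduates with salary of $36,000 or more

N = sum(N_h)
p_h = [a / n for a, n in zip(a_h, n_h)]   # stratum sample proportions

# Point estimate: stratum proportions weighted by N_h / N
p_st = sum(Nh / N * ph for Nh, ph in zip(N_h, p_h))

# Standard error (assumed form with n_h - 1 in the denominator)
var = sum(Nh * (Nh - nh) * ph * (1 - ph) / (nh - 1)
          for Nh, nh, ph in zip(N_h, n_h, p_h)) / N**2
se_st = math.sqrt(var)

print(f"p_st = {p_st:.4f}, SE = {se_st:.4f}")
```

Note that the stratified estimate differs from the raw sample fraction 63/180 = .35, because the sample was not allocated to the majors in proportion to their sizes.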
74
Stratified Simple Random Sampling
Point Estimate of Proportion Married. Estimate of Standard Error of Proportion = .0417. Approximate 95% Confidence Interval for Proportion = .5903 to .7571.
75
Stratified Simple Random Sampling
Sample Size When Estimating Population Mean
76
Stratified Simple Random Sampling
Sample Size When Estimating Population Total
77
Stratified Simple Random Sampling
Sample Size When Estimating Population Proportion
78
Stratified Simple Random Sampling
Proportional Allocation of Sample n to the Strata
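The allocation formula on this slide was not preserved. Under the usual proportional allocation rule, each stratum receives n_h = n (N_h / N). A minimal sketch, assuming that rule and using the Lakeland College stratum sizes for illustration:

```python
def proportional_allocation(n, stratum_sizes):
    """Split a total sample of n across strata in proportion to N_h / N."""
    N = sum(stratum_sizes.values())
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}

# Lakeland College strata with a total sample of n = 180
sizes = {"accounting": 500, "finance": 350, "information systems": 200,
         "marketing": 300, "operations management": 150}
alloc = proportional_allocation(180, sizes)
print(alloc)
```

For these sizes the allocation comes out to whole numbers (60, 42, 24, 36, 18). In general the results are fractional and must be rounded, and they need not match the sample sizes actually used in a survey.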
79
Cluster Sampling Cluster sampling requires that the population be divided into N groups of elements called clusters. We would define the frame as the list of N clusters. We then select a simple random sample of n clusters. We would then collect data for all elements in each of the n clusters.
80
Cluster Sampling Cluster sampling tends to provide better results than stratified sampling when the elements within the clusters are heterogeneous. A primary application of cluster sampling involves area sampling, where the clusters are counties, city blocks, or other well-defined geographic sections.
81
Cluster Sampling Example: We want to survey registered voters in the state of Ohio. One approach is to develop a frame consisting of all registered voters in the state of Ohio and then select a simple random sample of voters from this frame.
82
Cluster Sampling Alternatively, in cluster sampling, we might choose to define the frame as the list of the N = 88 counties in the state. Each county or cluster would consist of a group of registered voters, and each registered voter in the state would belong to one and only one cluster.
83
Cluster Sampling Counties of The State of Ohio Used As Clusters of Registered Voters
84
Cluster Sampling Notation
N = number of clusters in the population
n = number of clusters selected in the sample
\(M_i\) = number of elements in cluster i
M = number of elements in the population
\(\bar{M}\) = M/N = average number of elements in a cluster
\(x_i\) = total of all observations in cluster i
\(a_i\) = number of observations in cluster i with a certain characteristic
85
Cluster Sampling Population Mean Point Estimator
Estimate of Standard Error of the Mean
86
Cluster Sampling Population Mean Interval Estimate
Approximate 95% Confidence Interval Estimate
87
Cluster Sampling Population Total Point Estimator
Estimate of the Standard Error of the Total
88
Cluster Sampling Population Total Interval Estimate
Approximate 95% Confidence Interval Estimate
89
Cluster Sampling Population Proportion Point Estimator
90
Cluster Sampling Population Proportion
Estimate of the Standard Error of the Proportion
91
Cluster Sampling Population Proportion Interval Estimate
Approximate 95% Confidence Interval Estimate
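The cluster sampling formulas were images and are missing from this text. The sketch below implements the ratio-type estimators commonly given in textbooks for single-stage cluster sampling, using the notation defined earlier (N, n, M_i, M, x_i). Both the formulas and the small data set are assumptions for illustration; the data are hypothetical and do not come from the slides.

```python
import math

def cluster_estimates(N, M, M_i, x_i):
    """Cluster-sample estimates of the population mean and total.

    N   : number of clusters in the population
    M   : number of elements in the population
    M_i : number of elements in each sampled cluster
    x_i : total of the observations in each sampled cluster
    """
    n = len(M_i)
    M_bar = M / N                              # average cluster size
    x_bar_c = sum(x_i) / sum(M_i)              # point estimate of the mean
    ss = sum((x - x_bar_c * m) ** 2 for x, m in zip(x_i, M_i))
    se_mean = math.sqrt((N - n) / (N * n * M_bar**2) * ss / (n - 1))
    return {"mean": x_bar_c, "se_mean": se_mean,
            "total": M * x_bar_c, "se_total": M * se_mean}

# Hypothetical data: 3 clusters sampled out of N = 20, M = 200 elements overall
est = cluster_estimates(N=20, M=200, M_i=[8, 10, 12], x_i=[80, 110, 108])
lower = est["mean"] - 2 * est["se_mean"]
upper = est["mean"] + 2 * est["se_mean"]
print(est, (lower, upper))
```

Passing the counts a_i in place of the totals x_i gives the corresponding estimate of a population proportion and its standard error.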
92
Cluster Sampling Example: A survey was conducted by the CPA (certified public accountant) Society of the 12,000 practicing CPAs in a particular state. The CPA Society used a cluster sample to minimize the total travel and interviewing cost.
93
Cluster Sampling The frame consisted of all CPA firms that were registered to practice accounting in the state. Suppose there are N = 1000 clusters, or CPA firms. A simple random sample of n = 10 CPA firms is to be selected.
94
Cluster Sampling Results of CPA Sample Survey
We have N = 1000, n = 10, M = 12,000, \(\bar{M}\) = 12,000/1000 = 12
95
Cluster Sampling An estimate of the mean salary is $51,250, with a standard error of $1,979.
A 95% confidence interval estimate for the mean annual salary is 51,250 ± 2(1979) = 51,250 ± 3958, or $47,292 to $55,208.
96
Cluster Sampling Example: Cooper County Schools
There are 40 high schools in Cooper County. School officials are interested in the effect of participation in athletics on academic preparation for college. A cluster sample of 5 schools has been taken and a questionnaire administered to all the seniors on the basketball teams at those schools. There are a total of 1200 high school seniors in the county playing basketball.
97
Cluster Sampling Data Obtained From the Questionnaire (table values not recovered). Columns: School, Number of Players, Average SAT Score, Number Planning to Attend College. Surviving values: 173, 84.
98
Cluster Sampling Point Estimate of Population Mean SAT Score
99
Cluster Sampling Estimate of Standard Error of the
Point Estimator of Population Mean
100
Cluster Sampling Approximate 95% Confidence Interval Estimate of the Population Mean SAT Score
101
Cluster Sampling Point Estimator of Population Total SAT Score
Estimate of Standard Error of the Point Estimator of Population Total
102
Cluster Sampling Approximate 95% Confidence Interval Estimate of the Population Total SAT Score = 1,075, to 1,099,834.72
103
Cluster Sampling Point Estimate of Population Proportion
Planning to Attend College
104
Cluster Sampling Estimate of Standard Error of the Point Estimator of the Population Proportion
105
Cluster Sampling Approximate 95% Confidence Interval Estimate of the Population Proportion Planning College = to
106
Systematic Sampling Systematic Sampling is often used as an alternative to simple random sampling which can be time-consuming if a large population is involved. If a sample size of n from a population of size N is desired, we might sample one element for every N/n elements in the population. We would randomly select one of the first N/n elements and then select every (N/n)th element thereafter. Since the first element selected is a random choice, a systematic sample is often assumed to have the properties of a simple random sample.
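The procedure just described can be sketched as follows. This is a minimal illustration: the frame of 200 IDs and the function name are hypothetical, and the interval is taken as the integer part of N/n, which is one common way to handle the case where N/n is not a whole number.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k elements, then every k-th element."""
    N = len(frame)
    k = N // n                    # sampling interval (assumes N >= n)
    rng = random.Random(seed)
    start = rng.randrange(k)      # random index among the first k elements
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 201))       # hypothetical frame of 200 element IDs
sample = systematic_sample(frame, 40, seed=7)
print(sample[:5])
```

Only the starting point is random, so the method is cheap to apply to a long list. If the frame has a periodic pattern that lines up with the interval k, the sample can be unrepresentative, which is why the simple random sample properties are assumed rather than guaranteed.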