Sampling Design and Analysis MTH 494 Lecture-21 Ossam Chohan Assistant Professor CIIT Abbottabad.



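Estimators
Ratio estimator of a population mean $\mu_y$: $\hat{\mu}_y = r\mu_x = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}\,\mu_x$
Estimated variance of $\hat{\mu}_y$: $\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r) = \left(\frac{N-n}{nN}\right)\frac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$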

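Bound on the error of estimation: $2\sqrt{\hat{V}(\hat{\mu}_y)} = 2\sqrt{\left(\frac{N-n}{nN}\right)\frac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}}$
Note that we do not need to know $\tau_x$ or $N$ to estimate $\mu_y$ when using the ratio procedure; however, we must know $\mu_x$.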
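As a rough illustration, the ratio estimate of the mean, its estimated variance, and the bound on the error of estimation can be computed as sketched below. The sketch assumes the standard simple random sampling forms $\hat{\mu}_y = r\mu_x$ and $\hat{V}(\hat{\mu}_y) = \left(\frac{N-n}{nN}\right)\frac{\sum (y_i - r x_i)^2}{n-1}$. The function name and the data are hypothetical and are not taken from the lecture.

```python
from math import sqrt

def ratio_mean_estimate(y, x, mu_x, N):
    """Ratio estimate of mu_y, its estimated variance, and the error bound."""
    n = len(y)
    r = sum(y) / sum(x)                      # sample ratio r
    mu_y_hat = r * mu_x                      # ratio estimator of the mean
    # residual variance about the line through the origin with slope r
    s_r2 = sum((yi - r * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    var_hat = (N - n) / (n * N) * s_r2       # includes the finite population correction
    bound = 2 * sqrt(var_hat)                # two estimated standard deviations
    return mu_y_hat, var_hat, bound

# Hypothetical sample of n = 5 units from a population of N = 200 (made-up values)
y = [12.0, 15.0, 9.0, 20.0, 14.0]
x = [10.0, 13.0, 8.0, 17.0, 12.0]
est, var, bound = ratio_mean_estimate(y, x, mu_x=11.0, N=200)
print(est, var, bound)
```

Note that `N` enters only through the variance, which matches the remark that the point estimate itself needs $\mu_x$ but not $N$ or $\tau_x$.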

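Association
To remember the formulas for ratio estimation of a population mean, total, or ratio, we make the following association. The sample ratio $r$ is given by the formula $r = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$. The estimators of $R$, $\tau_y$, and $\mu_y$ are then $\hat{R} = r$, $\hat{\tau}_y = r\tau_x$, and $\hat{\mu}_y = r\mu_x$.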

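Thus we need to know only the formula for $r$ and its relation to $\hat{\tau}_y$ and $\hat{\mu}_y$. Approximate variances can be obtained if you remember the basic formula $\hat{V}(r) = \left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,\frac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$. Then $\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r)$ and $\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r)$.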

Selecting the Sample Size
We stated previously that the amount of information contained in the sample depends on the variation in the data (which is frequently controlled by the sample survey design) and the number of observations $n$ included in the sample. Once the sampling procedure (design) has been chosen, the investigator must determine the number of elements to be drawn. We will consider the sample size required to estimate a population parameter $R$, $\mu_y$, or $\tau_y$ to within $B$ units for simple random sampling using ratio estimators.

Note that the procedure for choosing the sample size $n$ is identical to that presented in unit 2. The number of observations required to estimate $R$, a population ratio, with a bound on the error of estimation of magnitude $B$ is determined by setting two standard deviations of the ratio estimator $r$ equal to $B$ and solving this expression for $n$. That is, we must solve $2\sqrt{V(r)} = B$ for $n$.

Although we have not discussed the form of $V(r)$, recall that $\hat{V}(r)$, the estimated variance of $r$, is given by the formula $\hat{V}(r) = \left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,\frac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$ (3.19)

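Equation 3.19 can be rewritten as $\hat{V}(r) = \left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,s^2$. In this instance we define $s^2 = \frac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$.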

An approximate population variance, $V(r)$, can be obtained from $\hat{V}(r)$ by replacing $s^2$ with the corresponding population variance $\sigma^2$. Thus the number of observations required to estimate $R$ with a bound $B$ on the error of estimation is determined by solving the following equation for $n$: $2\sqrt{V(r)} = 2\sqrt{\left(\frac{N-n}{nN}\right)\frac{\sigma^2}{\mu_x^2}} = B$

Sample size required to estimate $R$ with a bound on the error of estimation $B$: $n = \frac{N\sigma^2}{ND + \sigma^2}$, where $D = \frac{B^2\mu_x^2}{4}$ (3.22)

In a practical situation we are faced with a problem in determining the appropriate sample size because we do not know $\sigma^2$. If no past information is available to calculate $s^2$ as an estimate of $\sigma^2$, we take a preliminary sample of size $n'$ and compute $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n'}(y_i - r x_i)^2}{n' - 1}$

Then we substitute this quantity for $\sigma^2$ in equation 3.22 and find an approximate sample size. If $\mu_x$ is also unknown, it can be replaced by the sample mean $\bar{x}$, calculated from the $n'$ preliminary observations.
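The preliminary-sample procedure can be sketched in a few lines. This is a minimal sketch that assumes the sample size formula $n = N\sigma^2/(ND + \sigma^2)$ with $D = B^2\mu_x^2/4$, and rounds up to the next integer. The function names and the pilot data are hypothetical.

```python
from math import ceil

def pilot_sigma2(y, x):
    """Estimate sigma^2 from a preliminary sample of n' pairs (y_i, x_i)."""
    n_prime = len(y)
    r = sum(y) / sum(x)                      # sample ratio from the pilot data
    return sum((yi - r * xi) ** 2 for yi, xi in zip(y, x)) / (n_prime - 1)

def sample_size_for_ratio(N, sigma2, B, mu_x):
    """Smallest integer n that meets the bound B when estimating R."""
    D = B ** 2 * mu_x ** 2 / 4
    return ceil(N * sigma2 / (N * D + sigma2))

# Hypothetical pilot sample of n' = 5 units (illustrative values only)
y_pilot = [12.0, 15.0, 9.0, 20.0, 14.0]
x_pilot = [10.0, 13.0, 8.0, 17.0, 12.0]
sigma2_hat = pilot_sigma2(y_pilot, x_pilot)
x_bar = sum(x_pilot) / len(x_pilot)          # stands in for mu_x when it is unknown
n_needed = sample_size_for_ratio(N=1000, sigma2=sigma2_hat, B=0.01, mu_x=x_bar)
print(sigma2_hat, n_needed)
```

Rounding up rather than to the nearest integer is a design choice made here so that the bound is not exceeded.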

Similarly, we can determine the number of observations $n$ needed to estimate a population mean $\mu_y$ with a bound on the error of estimation of magnitude $B$. The required sample size is found by solving the following equation for $n$: $2\sqrt{V(\hat{\mu}_y)} = B$. Stated differently, $2\sqrt{\left(\frac{N-n}{nN}\right)\sigma^2} = B$

Sample size required to estimate $\mu_y$ with a bound on the error of estimation $B$: $n = \frac{N\sigma^2}{ND + \sigma^2}$, where $D = \frac{B^2}{4}$ (3.24)

Note that we need not know the value of $\mu_x$ to determine $n$ in equation 3.24; however, we do need an estimate of $\sigma^2$, either from prior information if it is available or from information obtained in a preliminary study.

The sample size required to estimate $\tau_y$ with a bound on the error of estimation of magnitude $B$ can be found by solving the following expression for $n$: $2\sqrt{V(\hat{\tau}_y)} = B$. Or equivalently, from equation

Sample size required to estimate $\tau_y$ with a bound on the error of estimation $B$: $n = \frac{N\sigma^2}{ND + \sigma^2}$, where $D = \frac{B^2}{4N^2}$ (3.26)
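The three sample size rules share one form, $n = N\sigma^2/(ND + \sigma^2)$, and differ only in $D$. The sketch below makes that explicit, assuming $D = B^2\mu_x^2/4$ for $R$, $D = B^2/4$ for $\mu_y$, and $D = B^2/(4N^2)$ for $\tau_y$. All function names and numbers are hypothetical.

```python
from math import ceil

def required_n(N, sigma2, D):
    """Common sample size formula; D depends on the parameter being estimated."""
    return ceil(N * sigma2 / (N * D + sigma2))

def d_for_ratio(B, mu_x):
    return B ** 2 * mu_x ** 2 / 4            # estimating R

def d_for_mean(B):
    return B ** 2 / 4                        # estimating mu_y

def d_for_total(B, N):
    return B ** 2 / (4 * N ** 2)             # estimating tau_y

# Made-up planning values
N, sigma2 = 1000, 4.0
n_ratio = required_n(N, sigma2, d_for_ratio(B=0.1, mu_x=10.0))
n_mean = required_n(N, sigma2, d_for_mean(B=0.5))
n_total = required_n(N, sigma2, d_for_total(B=500.0, N=N))
print(n_ratio, n_mean, n_total)
```

A bound of $B$ on the mean and a bound of $NB$ on the total lead to the same $n$, since $\hat{\tau}_y = N\hat{\mu}_y$.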

When to use ratio estimation
Use of the ratio estimator is most effective when the relationship between the response $y$ and a subsidiary variable $x$ is linear through the origin and the variance of $y$ is proportional to $x$. To understand this point, let us take an example.

Understanding example
An automobile tire distributor wishes to estimate the average cash receipts for his 1570 stores ($N = 1570$) during a particular sales period. From a simple random sample of $n = 50$ stores, the corresponding cash receipts $y_i$ ($i = 1, 2, 3, \ldots, 50$) are observed. One possible estimator of $\mu_y$, the average cash receipts for the company, is $\bar{y}$, the sample mean.

In addition to obtaining cash receipts $y_i$, suppose the distributor can obtain $x_i$ ($i = 1, 2, \ldots, 50$), the number of customers who made purchases in store $i$ during the sales period. To determine the relationship between $y$ and $x$, he can plot the sales and customer data for the $n = 50$ sampled stores.

If the plot is similar to the one presented in the next slide, we can assume that the cash receipts $y$ are linearly related to the number of customers purchasing goods, $x$. In fact, we could depict this relationship with a straight line passing through the intersection of the $x$ and $y$ axes, and hence we can say it is linear through the origin. In addition, you will note from the figure that the scatter of $y$ values widens as $x$ increases. Hence we can say that the variance of $y$ is proportional to $x$. Under these conditions the ratio estimator of $\mu_y$, the average amount of cash receipts per store, should have a smaller variance and, hence, be more precise than $\bar{y}$.

Figure: positive scatter plot of cash receipts $y$ versus number of customers $x$

Sometimes, a plot of $y$ versus $x$ does not clearly indicate that ratio estimation should be used. The strength of the correlation $\rho$ between $y$ and $x$ is another good indicator of the effectiveness of the ratio estimator. For $\rho > 1/2$, the ratio estimator should provide a more precise estimate of $\mu_y$ or $\tau_y$ than would $\bar{y}$ or $N\bar{y}$.
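When the plot is inconclusive, the correlation check can be done numerically. The sketch below computes the usual sample correlation coefficient as a stand-in for $\rho$ and compares it with the 1/2 threshold stated above. The helper names and the data are hypothetical.

```python
from math import sqrt

def sample_correlation(x, y):
    """Pearson sample correlation between paired observations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

def ratio_estimation_suggested(x, y, threshold=0.5):
    """Rule of thumb from the slide: prefer the ratio estimator when rho > 1/2."""
    return sample_correlation(x, y) > threshold

# Made-up store data: customers (x) and cash receipts (y)
customers = [10.0, 13.0, 8.0, 17.0, 12.0]
receipts = [12.0, 15.0, 9.0, 20.0, 14.0]
rho_hat = sample_correlation(customers, receipts)
print(rho_hat, ratio_estimation_suggested(customers, receipts))
```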

Unlike the estimation procedures discussed previously, ratio estimation usually leads to biased estimators. Thus we must consider the magnitude of the bias to decide which estimation procedure to use. Although there are no exact formulas to determine the bias of these estimators, it can be shown that the absolute value of the bias is less than or equal to the product of the standard deviation of the sample mean of the subsidiary variable $x$ and the standard deviation of the ratio estimator, all divided by $\mu_x$. That is, $|E(\hat{\theta}) - \theta| \le \frac{\sigma_{\bar{x}}\,\sigma_{\hat{\theta}}}{\mu_x}$ (3.27)

where $\hat{\theta}$ can be the ratio estimator $r$, $\hat{\mu}_y$, or $\hat{\tau}_y$, and $\theta$ is the corresponding parameter estimated. If estimates of $\sigma_{\bar{x}}$, $\sigma_{\hat{\theta}}$, and $\mu_x$ are known from prior experimentation, we can estimate the maximum bias for a given physical situation by using equation (3.27).

Generally, for large sample sizes ($n > 30$) and for $\sigma_{\bar{x}}/\mu_x \le 0.10$, the bias is negligible. Note also that ratio estimators are unbiased when the relationship between $y$ and $x$ is linear through the origin. Finally, we must consider the cost of obtaining information on the subsidiary variable $x$. If the physical situation suggests the use of ratio estimation, the experimenter must decide whether the increased precision of the ratio estimator justifies the additional cost.
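The bias bound and the rule of thumb can be checked with a short calculation. The sketch below takes the bound as stated in words above (standard deviation of $\bar{x}$ times standard deviation of the estimator, divided by $\mu_x$) and the $n > 30$, ratio at most 0.10 guideline. The function names and input values are hypothetical.

```python
def max_abs_bias(sigma_x_bar, sigma_theta_hat, mu_x):
    """Upper bound on |bias| of a ratio-type estimator."""
    return sigma_x_bar * sigma_theta_hat / mu_x

def bias_negligible(n, sigma_x_bar, mu_x):
    """Guideline from the slide: n > 30 and sigma_x_bar / mu_x <= 0.10."""
    return n > 30 and sigma_x_bar / mu_x <= 0.10

# Made-up prior estimates
bias_bound = max_abs_bias(sigma_x_bar=0.5, sigma_theta_hat=0.2, mu_x=10.0)
ok = bias_negligible(n=50, sigma_x_bar=0.5, mu_x=10.0)
print(bias_bound, ok)
```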

Ratio Estimation in Stratified Random Sampling
For the same reasons indicated in the previous unit, stratifying the population before using a ratio estimator is sometimes advantageous. We will assume that we can take a large enough sample of both x’s and y’s in each stratum for the variance approximations to work fairly well. There are two different methods for constructing estimators of a ratio in stratified sampling. One is to estimate the ratio of $\mu_y$ to $\mu_x$ within each stratum and then form a weighted average of these separate estimates as a single estimate of the population ratio. The result of this procedure is called the separate ratio estimator.

The other method involves first estimating $\mu_y$ by the usual $\bar{y}_{st}$ and similarly estimating $\mu_x$ by $\bar{x}_{st}$. Then $\bar{y}_{st}/\bar{x}_{st}$ can be used as an estimator of $R = \mu_y/\mu_x$. This estimator is called a combined ratio estimator. We will not introduce a general (and cumbersome) notation for these estimators but will illustrate their use by a numerical example.
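The two constructions can be compared side by side in code. This is a sketch under stated assumptions: the separate estimator of $\mu_y$ is taken as $\sum_h (N_h/N)\, r_h\, \mu_{xh}$ with $r_h = \bar{y}_h/\bar{x}_h$, and the combined estimator as $(\bar{y}_{st}/\bar{x}_{st})\,\mu_x$ with $\mu_x = \sum_h (N_h/N)\,\mu_{xh}$. The data structure, function names, and numbers are hypothetical and are not the data of the lecture's examples.

```python
def separate_ratio_mean(strata, N):
    """Weighted average of within-stratum ratio estimates of the mean."""
    return sum(
        s["N_h"] / N * (sum(s["y"]) / sum(s["x"])) * s["mu_x"]
        for s in strata
    )

def combined_ratio_mean(strata, N):
    """Ratio of stratified means, scaled by the overall mean of x."""
    y_st = sum(s["N_h"] / N * sum(s["y"]) / len(s["y"]) for s in strata)
    x_st = sum(s["N_h"] / N * sum(s["x"]) / len(s["x"]) for s in strata)
    mu_x = sum(s["N_h"] / N * s["mu_x"] for s in strata)
    return y_st / x_st * mu_x

# Two made-up strata; mu_x is the known stratum mean of x
strata = [
    {"N_h": 1000, "mu_x": 10.0, "x": [8.0, 10.0, 12.0], "y": [16.0, 21.0, 23.0]},
    {"N_h": 1500, "mu_x": 8.0, "x": [6.0, 8.0, 11.0], "y": [9.0, 13.0, 16.0]},
]
N = sum(s["N_h"] for s in strata)
print(separate_ratio_mean(strata, N), combined_ratio_mean(strata, N))
```

When every stratum has the same ratio of $y$ to $x$, the two estimators coincide, which is consistent with the later remark that the combined estimator is a reasonable choice when within-stratum ratios are approximately equal.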

Example 3.7
Refer to example 3.4. Treat the 10 observations given there on man-hours lost due to sickness as a simple random sample from company A. Thus A simple random sample of $n_B = 10$ measurements was taken from company B within the same industry.

(Assume companies A and B together form the population of workers of interest in this problem.) The data are given in the accompanying table. It is known that $N_B = 1500$ employees and $\tau_{xB} = 12{,}800$. Find the separate ratio estimate of $\mu_y$ and its estimated variance.

Table for data
Employee | Man-hours lost in previous year, x | Man-hours lost in current year, y
Totals | 78 | 46

Solution

Example 3.8
Refer to the data of example 3.7 and find a combined ratio estimate of $\mu_y$.

Solution

Comparison between 3.7 and 3.8
On comparing 3.7 and 3.8, we see that the combined ratio estimator gives the larger estimated variance. This result is generally the case, and so we should employ the separate ratio estimator most of the time. However, the separate ratio estimator may have a larger bias since each stratum ratio estimate contributes to that bias. In summary, if the stratum sample sizes are large enough (say 20 or so) so that the separate ratios do not have large biases and so that the variance approximations work adequately, then use the separate ratio estimator.

If stratum sample sizes are very small, or if the within-stratum ratios are all approximately equal, then the combined ratio estimator may perform better. Of course, an estimator of the population total can be found by multiplying either of the estimators above by the population size $N$, and the variances can be adjusted accordingly. Thus we might use the notation

Regression Estimation
We observed that the ratio estimator is most appropriate when the relationship between $y$ and $x$ is linear through the origin. If there is evidence of a linear relationship between the observed y’s and x’s, but not necessarily one that would pass through the origin, then this extra information provided by the auxiliary variable $x$ may be taken into account through a regression estimator of the mean $\mu_y$.

One must still have knowledge of $\mu_x$ before the estimator can be employed, as it was in the case of ratio estimation of $\mu_y$. The underlying line that shows the basic relationship between y’s and x’s is sometimes referred to as the regression line of $y$ upon $x$. Thus the subscript $L$ in the ensuing formulas is used to denote linear regression.

The estimator given in the next section assumes the x’s to be fixed in advance and the y’s to be random variables. We can think of the $x$ values as something that has already been observed, like last year’s first quarter earnings, and the $y$ response as a random variable yet to be observed, such as the current quarterly earnings of a company for which $x$ is already known. The probabilistic properties of the estimator then depend only on $y$ for a given set of x’s.

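Estimators
Regression estimator of the population mean $\mu_y$: $\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$, where $b = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ (3.28)
Estimated variance of $\hat{\mu}_{yL}$: $\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\frac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$ (3.29)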

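Estimator
Bound on the error of estimation: $2\sqrt{\hat{V}(\hat{\mu}_{yL})}$ (3.30)
When calculating $b$ from observed pairs $(y_1, x_1), \ldots, (y_n, x_n)$, we may use the fact that $\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$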
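The regression estimator can be computed along the same lines as the ratio estimator. The sketch below assumes the standard forms $\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$ with least-squares slope $b$, and $\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\frac{1}{n-2}\left[\sum (y_i - \bar{y})^2 - b^2 \sum (x_i - \bar{x})^2\right]$. The function name and the data are hypothetical.

```python
from math import sqrt

def regression_mean_estimate(y, x, mu_x, N):
    """Regression estimate of mu_y, its estimated variance, the bound, and the slope b."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # shortcut form of the cross-product sum: sum(x*y) - n * x_bar * y_bar
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx                                  # least-squares slope
    mu_hat = y_bar + b * (mu_x - x_bar)
    # clamp at zero to guard against tiny negative values from rounding
    resid_ss = max(0.0, s_yy - b ** 2 * s_xx)
    var_hat = (N - n) / (N * n) * resid_ss / (n - 2)
    bound = 2 * sqrt(var_hat)
    return mu_hat, var_hat, bound, b

# Made-up sample of n = 5 units from a population of N = 100
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 7.5, 8.5, 11.0, 13.0]
mu_hat, var_hat, bound, b = regression_mean_estimate(y, x, mu_x=3.5, N=100)
print(mu_hat, var_hat, bound, b)
```

Unlike the ratio estimator, the fitted line here is not forced through the origin, so the estimate adjusts $\bar{y}$ by the slope times the gap between $\mu_x$ and $\bar{x}$.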