Published by Ashley Joseph. Modified over 9 years ago.
1
Sampling Error
2
When we take a sample, our results will not exactly equal the true values for the whole population; they are subject to error. Sampling error: A sample is a subset of a population, so results obtained from it cannot reflect the full range of variation found in the population. This type of error, arising from the sampling process itself, is called sampling error and is a form of random error. Sampling error can be reduced by increasing the size of the sample. When n = N (the sample is the whole population), sampling error = 0.
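The relationship between sample size and sampling error can be illustrated with a small simulation. This is only a sketch: the population values, the function name `mean_abs_sampling_error` and the seeds are assumptions made for illustration, not part of the original slides.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=200, seed=0):
    """Average |sample mean - population mean| over repeated samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        # Simple random sample of n units, drawn without replacement.
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

# Hypothetical population of N = 1000 values (illustrative only).
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(1000)]

# The average error shrinks as n grows and vanishes when n = N.
for n in (10, 100, 1000):
    print(n, round(mean_abs_sampling_error(population, n), 3))
```

With n = N every sample is the whole population, so the sample mean equals the population mean and the error is zero.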
3
Non-sampling error (bias): a systematic error in the design or conduct of a sampling procedure that distorts the sample, so that it is no longer representative of the reference population. Non-sampling error can be eliminated or reduced by careful design of the sampling procedure, not by increasing the sample size.
4
Sources of non-sampling error: accessibility bias, volunteer bias, etc. The best-known source of bias is non-response: the failure to obtain information on some of the subjects included in the sample to be studied. Non-response results in significant bias when both of the following conditions are fulfilled:
1. Non-respondents constitute a significant proportion of the sample (about 15% or more).
2. Non-respondents differ significantly from respondents.
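Why both conditions matter can be seen from simple arithmetic. If a fraction w of the sample does not respond, the full-sample mean is (1 − w) × (respondent mean) + w × (non-respondent mean), so the respondent mean is off by w × (respondent mean − non-respondent mean). The sketch below uses that identity; the function name and all figures are hypothetical.

```python
def nonresponse_bias(nonresponse_rate, mean_respondents, mean_nonrespondents):
    """Respondent mean minus the mean of the full intended sample."""
    return nonresponse_rate * (mean_respondents - mean_nonrespondents)

# Hypothetical figures: respondents average 120, non-respondents 110.
for rate in (0.05, 0.15, 0.30):
    print(rate, nonresponse_bias(rate, 120.0, 110.0))

# If the two groups do not differ, a high non-response rate causes no bias.
print(nonresponse_bias(0.30, 120.0, 120.0))
```

The bias is a product of the two conditions, so it is small whenever either the non-response rate or the difference between the groups is small.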
5
There are several ways to deal with this problem and reduce the possibility of bias:
1. Pre-test the data collection tools (questionnaire).
2. If non-response is due to absence of the subjects, make repeated attempts to contact study subjects who were absent at the time of the initial visit.
3. Include additional people in the sample, so that non-respondents who were absent during data collection can be replaced (make sure that their absence is not related to the topic being studied).
6
ESTIMATION
7
A sample from a population is used to provide estimates of the population parameters. A parameter is a numerical descriptive measure of a population (μ is an example of a parameter). A statistic is a numerical descriptive measure of a sample (X̄ is an example of a statistic). To each sample statistic there corresponds a population parameter. We use X̄, S², S, p, etc. to estimate μ, σ², σ, P (or π), etc.
8
Sample statistic                 Corresponding population parameter
X̄ (sample mean)                  μ (population mean)
S² (sample variance)             σ² (population variance)
S (sample standard deviation)    σ (population standard deviation)
p (sample proportion)            P or π (population proportion)
9
Sampling Distribution of Means
10
A sampling distribution is a frequency distribution of a sample statistic, and it has its own mean and standard deviation. Steps to construct the sampling distribution of means:
1. Obtain a sample of n observations selected completely at random from a large population. Determine their mean and then replace the observations in the population.
2. Obtain another random sample of n observations from the population, determine their mean and again replace the observations.
11
3. Repeat the sampling procedure indefinitely, calculating the mean of the random sample of n each time and subsequently replacing the observations in the population.
4. The result is a series of means of samples of size n. If each mean in the series is now treated as an individual observation and arrayed in a frequency distribution, one obtains the sampling distribution of means of samples of size n.
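The procedure can be sketched in a few lines of Python. "Indefinitely" is approximated here by 1000 repetitions, and the population, sample size and bin width are assumptions chosen only for illustration. Each call to `rng.sample` leaves the population list unchanged, which plays the role of replacing the observations after each sample.

```python
import random
import statistics
from collections import Counter

def sampling_distribution_of_means(population, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)       # draw n observations at random
        means.append(statistics.fmean(sample))   # record the sample mean
    return means

# Hypothetical large population of integer scores from 1 to 100.
rng = random.Random(7)
population = [rng.randint(1, 100) for _ in range(10_000)]
means = sampling_distribution_of_means(population, n=30, repetitions=1000)

# Array the means in a frequency distribution (bins of width 5).
frequency = Counter(round(m / 5) * 5 for m in means)
for value in sorted(frequency):
    print(f"{value:>4} {'#' * (frequency[value] // 10)}")
```

The printed bars give a rough text histogram of the sample means, which cluster around the population mean.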
12
Because the scores in the sampling distribution of means are themselves means (of individual samples), we shall use the notation σ_X̄ for the standard deviation of the distribution. Standard error of the mean (SEM): the standard deviation of the sampling distribution of means is called the standard error of the mean. Formula: σ_X̄ = √( Σ (X̄ᵢ − μ)² / N ), where X̄ᵢ is the mean of the i-th sample and N is the number of sample means in the distribution.
13
Properties of the sampling distribution of means:
1. The mean of the sampling distribution of means is the same as the population mean, μ.
2. The SD of the sampling distribution of means is σ/√n.
3. The shape of the sampling distribution of means is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (central limit theorem).
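These three properties can be checked empirically. The sketch below draws samples from a deliberately skewed (exponential) population; the population, n = 50 and the number of repetitions are assumptions for illustration, and the normal-shape check is only a rough one (the share of means within 1.96 standard errors of μ).

```python
import math
import random
import statistics

rng = random.Random(3)

# Hypothetical skewed population (exponential, far from normal).
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50
means = [statistics.fmean(rng.sample(population, n)) for _ in range(3000)]

mean_of_means = statistics.fmean(means)    # property 1: close to mu
sd_of_means = statistics.pstdev(means)     # property 2: close to sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# Property 3 (rough check): for a normal curve about 95% of values
# lie within 1.96 standard deviations of the mean.
within = sum(abs(m - mu) <= 1.96 * standard_error for m in means) / len(means)

print(round(mu, 3), round(mean_of_means, 3))
print(round(standard_error, 3), round(sd_of_means, 3))
print(round(within, 3))
```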
14
Confidence interval
15
Interval Estimation (large samples): A point estimate gives no indication of how far it may lie from the parameter. A more useful method of estimation is to compute an interval which has a high probability of containing the parameter. An interval estimate is a statement that a population parameter has a value lying between two specified limits.
16
Confidence interval: A confidence interval provides an indication of how close the sample estimate is likely to be to the true population value. It gives an estimated range of values, calculated from a given set of sample data, which is likely to include the true value of the unknown population parameter with a certain confidence (probability). Consider the standard normal distribution and the statement Pr(−1.96 ≤ Z ≤ 1.96) = 0.95. It means that 95% of the area under the standard normal curve lies between −1.96 and +1.96.
17
Formula: Pr( X̄ − 1.96(σ/√n) ≤ μ ≤ X̄ + 1.96(σ/√n) ) = 0.95. The range X̄ − 1.96(σ/√n) to X̄ + 1.96(σ/√n) is called the 95% confidence interval; X̄ − 1.96(σ/√n) is the lower confidence limit and X̄ + 1.96(σ/√n) is the upper confidence limit.
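The formula translates directly into a small helper. This is a minimal sketch: the function name `mean_confidence_interval` and the example figures are hypothetical, and it assumes a large sample so that the normal (Z) multiplier applies.

```python
import math

def mean_confidence_interval(sample_mean, sd, n, z=1.96):
    """Return (lower, upper) limits: sample_mean ± z * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Hypothetical data: mean 120, SD 15, n = 36, so the SE is 15 / 6 = 2.5.
lower, upper = mean_confidence_interval(120, 15, 36)
print(round(lower, 2), round(upper, 2))
```

Passing a different `z` gives intervals at other confidence levels.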
18
A few things to remember:
At 90%, the corresponding Z score to be used in the formula is 1.64 (more precisely, 1.645).
At 95%, the corresponding Z score to be used in the formula is 1.96.
At 99%, the corresponding Z score to be used in the formula is 2.58.
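These multipliers need not be memorized; they are quantiles of the standard normal distribution. A short sketch using Python's standard library (the helper name `z_for_confidence` is an assumption for illustration):

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided Z multiplier for a confidence level such as 0.95."""
    # A central area of `level` leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```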
20
Problem 1: Suppose X̄ = 50, SD = 10 and n = 100. What is the 99% confidence interval? SE = σ/√n = 10/√100 = 1. CI lower: X̄ − 2.58(σ/√n) = 50 − 2.58 = 47.42. CI upper: X̄ + 2.58(σ/√n) = 50 + 2.58 = 52.58.
21
Example 1: The mean fasting blood sugar of a group of 70 individuals was found to be 115 with an SD of 12.56. Find the 95% and 99% CIs for the population mean. Solution: SE (mean) = 12.56/√70 = 1.50. 95% CI = 115 ± 1.96(1.50) = (112.06, 117.94). 99% CI = 115 ± 2.58(1.50) = (111.13, 118.87).
22
Interpretation of the 95% CI. Probabilistic interpretation: in repeated sampling, approximately 95 percent of the intervals constructed will include the population mean. Practical interpretation: one can say with 95 percent confidence that the population mean fasting blood sugar is between 112.06 and 117.94.
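The probabilistic interpretation can be demonstrated by simulation. In the sketch below the "true" population is assumed to be normal with mean 115 and SD 12.56 (the fasting blood sugar figures are borrowed purely as illustrative parameters), and the function name `coverage` is hypothetical.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, z=1.96, repetitions=4000, seed=1):
    """Fraction of intervals xbar ± z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repetitions

# Assumed population: normal, mean 115, SD 12.56; samples of n = 70.
print(coverage(115, 12.56, 70))
```

The returned fraction should be close to 0.95. Any single interval either contains μ or it does not; the 95% refers to the long-run behaviour of the procedure.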
23
Example 2: In a study, it was found that 129 out of 150 patients with carcinoma of the lung were smokers. Find the 95% and 99% CIs for the proportion of smokers among lung cancer patients. Solution: p = 129/150 = 0.86. SE (proportion) = √((0.86)(0.14)/150) = 0.028. 95% CI = 0.86 ± 1.96(0.028) = (0.805, 0.915). Similarly, 99% CI = 0.86 ± 2.58(0.028) = (0.788, 0.932).
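The same calculation as a sketch in Python, using the counts from this example (129 smokers out of 150). The function name is an assumption; because the code keeps the unrounded standard error, its limits can differ in the third decimal place from a hand calculation that rounds the SE first.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Large-sample CI for a proportion: p ± z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error

for z in (1.96, 2.58):
    lower, upper = proportion_confidence_interval(129, 150, z)
    print(z, round(lower, 3), round(upper, 3))
```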
24
Example 3: In a study to assess the effect of anabolic steroids on weight gain, the following data were observed. Find the 95% CI for the difference in mean weight gain.

Group      n    Mean weight gain    SD of weight gain
Study      50   3.7                 21.2
Control    50   3.1                 9

Solution: SE (diff. in means) = √(21.2²/50 + 9²/50) = 3.257
95% CI = (3.7 − 3.1) ± 1.96 × 3.257 = 0.6 ± 6.38 = (−5.78, 6.98)
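A sketch of the same computation, using the group summaries from this example (study: n = 50, mean 3.7, SD 21.2; control: n = 50, mean 3.1, SD 9). The function name is hypothetical, and the large-sample Z multiplier is assumed.

```python
import math

def diff_means_confidence_interval(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """CI for mean1 - mean2 with SE = sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    difference = mean1 - mean2
    return difference - z * standard_error, difference + z * standard_error

lower, upper = diff_means_confidence_interval(3.7, 21.2, 50, 3.1, 9, 50)
print(round(lower, 2), round(upper, 2))
```

An interval that includes 0 means the data are compatible with no difference between the two groups.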
25
Example 4: In a study to assess the effect of BCG, the following data were observed. Find the 99% CI for the difference in proportions.

BCG            n      TB developed   Disease rate
Vaccinated     2500   22             0.88%
Unvaccinated   3000   90             3.00%

Solution: SE (diff. in proportions) = √((0.0088)(0.9912)/2500 + (0.03)(0.97)/3000) = 0.00363, i.e. 0.363 percentage points
Actual difference between proportions = 3.00 − 0.88 = 2.12 percentage points
99% CI = 2.12 ± 2.58 × 0.363 = (1.18, 3.06) percentage points, i.e. (0.012, 0.031) as proportions
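A sketch of the calculation from the raw counts in this example (22 cases among 2500 vaccinated, 90 among 3000 unvaccinated). The function name is an assumption, and the result is expressed as proportions (unvaccinated minus vaccinated), not percentages.

```python
import math

def diff_proportions_confidence_interval(x1, n1, x2, n2, z=1.96):
    """CI for p1 - p2 with SE = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    difference = p1 - p2
    return difference - z * standard_error, difference + z * standard_error

# Unvaccinated minus vaccinated, at the 99% level (Z = 2.58).
lower, upper = diff_proportions_confidence_interval(90, 3000, 22, 2500, z=2.58)
print(round(lower, 4), round(upper, 4))
```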
26
Factors affecting the width of the confidence interval:
Variation in the data (standard deviation): the larger the SD, the wider the confidence interval.
Sample size: as n increases, the confidence interval becomes narrower.
Level of confidence: the higher the confidence level (90%, 95%, 99%), the wider the confidence interval.
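All three effects follow from the width of the interval, 2 × Z × σ/√n. A brief sketch with hypothetical figures (the function name `ci_width` is an assumption):

```python
import math

def ci_width(sd, n, z):
    """Total width of the interval: 2 * z * sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)

# Baseline (hypothetical): SD = 10, n = 100, 95% confidence.
print(ci_width(10, 100, 1.96))
print(ci_width(20, 100, 1.96))   # larger SD: wider
print(ci_width(10, 400, 1.96))   # larger n: narrower
print(ci_width(10, 100, 2.58))   # higher confidence: wider
```

Because n enters through a square root, quadrupling the sample size only halves the width.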
27
THANK YOU