1
Topic 7: Sampling and Sampling Distributions
2
The term Population represents everything we want to study, bearing in mind that the population is ever changing and hence a dynamic concept. A Census is a snapshot of the population at a single point in time. For example, the last UK census was taken on the 29th of April 2001: the Office for National Statistics (ONS) attempted to get a picture of everything relevant on that specific day.
3
A sample is usually a part or a fraction of the population, not the whole of it. The act of collecting samples is called sampling.
Descriptive Statistics: using the sample data to describe and draw conclusions about the sample only.
Inferential Statistics: using the sample data to draw conclusions about the population.
4
The statistician uses the information from the sample(s) to estimate population characteristics. Getting access to the entire population may be prohibitively costly, so a sample is taken instead.
5
Errors therefore occur naturally. A parameter is a defining characteristic of a population that can be quantified. The mean and standard deviation of the normal distribution are examples of parameters of the distribution.
6
The difference between the actual population characteristic and the corresponding estimate is called a sampling error. The process can be visualised as below:
7
[Diagram, slides 7–11: a sample is drawn from the Population; the summary Statistic computed from the sample is used to estimate the unknown population Parameter.]
12
Learning Objectives
Determine when to use sampling instead of a census.
Distinguish between random and nonrandom sampling.
Be aware of the different types of error that can occur in a study.
Understand the impact of the Central Limit Theorem on statistical analysis.
Use the sampling distributions of x̄ (the sample mean) and p (the sample proportion).
13
Reasons for Sampling
Sampling can save money.
Sampling can save time.
For given resources, sampling can broaden the scope of the data set.
Because the research process is sometimes destructive, sampling can save product.
If accessing the entire population is impossible, sampling is the only option.
14
Reasons for Taking a Census
Eliminate the possibility that a random sample is not representative of the population.
The person authorizing the study is uncomfortable with sample information.
15
Random vs Nonrandom Sampling
Random sampling:
Every unit of the population has the same probability of being included in the sample.
A chance mechanism is used in the selection process.
Eliminates bias in the selection process.
Also known as probability sampling.
16
Random vs Nonrandom Sampling
Nonrandom sampling:
Not every unit of the population has the same probability of being included in the sample.
Opens the selection process to bias.
Not an appropriate data-collection method for most statistical methods.
Also known as nonprobability sampling.
17
Errors
Data from nonrandom samples are not appropriate for analysis by inferential statistical methods.
Sampling error occurs when the sample is not representative of the population.
18
Nonsampling Errors
Missing data, recording, data entry, and analysis errors.
Poorly conceived concepts, unclear definitions, and defective questionnaires.
Response errors occur when people do not know, will not say, or overstate in their answers.
19
The Central Limit Theorem (CLT): whatever the population distribution, the distribution of the sample mean x̄ is approximately N(μ, σ²/n) as long as n is 'large'.
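As a quick illustration (not from the slides), the sketch below simulates the CLT with NumPy: samples are drawn from a deliberately non-normal population, and for each sample size the mean and spread of the sample means are compared with μ and σ/√n. The population shape, the number of repetitions and the seed are illustrative assumptions; only the sample sizes (2, 5, 30) come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# An assumed, clearly non-normal ("U-shaped") population: Beta(0.5, 0.5) scaled to [0, 10].
population = 10 * rng.beta(0.5, 0.5, size=1_000_000)
mu, sigma = population.mean(), population.std()

for n in (2, 5, 30):                           # sample sizes used on the slides
    sample_means = rng.choice(population, size=(20_000, n)).mean(axis=1)
    # CLT: for 'large' n, sample_means behave like N(mu, sigma**2 / n).
    print(f"n={n:2d}  mean of x_bar={sample_means.mean():.3f} (mu={mu:.3f})  "
          f"sd of x_bar={sample_means.std():.3f}  sigma/sqrt(n)={sigma/np.sqrt(n):.3f}")
```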
20
Sampling Distribution of the Sample Mean (the process of inferential statistics)
Proper analysis and interpretation of a sample statistic requires knowledge of its distribution.
21
[Figure: distribution of sample means for various sample sizes (n = 2, 5, 30), for a U-shaped population and for a normal population.]
22
Topic 8: Point and Confidence Interval Estimation
23
The methodology we follow is known as Parametric Analysis
24
A parameter is a defining characteristic of a population that can be quantified. For example, the mean and standard deviation of the normal distribution are parameters of the distribution
25
[Diagram: the summary Statistic computed from the sample is used to estimate the unknown population Parameter.]
26
[Diagram: the parameter to be estimated, shown with a point estimate for the parameter (a single value) and a confidence interval [ ] for the parameter (an interval around it).]
27
Three Properties of Point Estimators
1. Unbiasedness
2. Consistency
3. Efficiency
28
[Diagram: several estimates scattered around the parameter.] Although each estimate is way off target, together they may well give a good estimate.
29
[Diagram: estimates scattered along the real line around the unknown parameter, with positive (+) and negative (−) deviations.] This method of estimation is unbiased if and only if the algebraic sum of all 'errors' is zero. Each deviation from the parameter is called an error, and the 'average' size of these errors is called the standard error of estimation.
30
Question: given the five-point dataset shown, which single point best represents it as a summary statistic? Answer: the sample mean is the best of all the available options.
31
The Sampling Distribution of the Sample Mean (x̄)
Suppose that the population mean μ = 20, and consider the following statistical process:

Sample Number   Value of x̄
1               18
2               24
3               21
...             ...
100             22
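The 'statistical process' behind this table can be mimicked with a short simulation. The population mean of 20 comes from the slide; the population shape, standard deviation and sample size below are assumptions made purely so the sketch runs.

```python
import numpy as np

# Population mean 20 as on the slide; sigma and n are illustrative assumptions.
rng = np.random.default_rng(1)
mu, sigma, n = 20, 5, 25

print("Sample Number   Value of x_bar")
for i in range(1, 101):                      # 100 repeated samples, as in the table
    x_bar = rng.normal(mu, sigma, size=n).mean()
    if i <= 3 or i == 100:                   # show a few rows, like the slide
        print(f"{i:>13}   {x_bar:.0f}")
```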
32
[Figure: the sampling distribution of x̄ for 'large' samples.]
33
[Figure: three sampling distributions relative to the location of the parameter: a negatively biased estimator, an unbiased estimator, and a positively biased estimator.]
34
Estimate Number   Error (first set, red)
1                 +6
2                 +8
3                 −10
4                 +2
5                 −6
Sum of errors      0

Although the first set of estimates (in red) has errors averaging zero, it is probably not as good as the second set (in green), whose errors also sum to zero.
35
[Figure: an example of an unbiased yet inefficient estimator.]
36
[Figure: the sampling distribution of an estimator under increasing available resources R1 < R2 < R3 (i.e. larger samples). The estimator is consistent if its distribution becomes more concentrated around the parameter as the resources increase from R1 to R3.]
37
Formally, an estimator b of a parameter θ is unbiased if and only if the average of the b values is exactly θ; that is, E(b) = θ. If E(b) ≠ θ, then the estimator is biased and the difference E(b) − θ is the bias of estimation.
38
An estimator b of a parameter θ is efficient if and only if it has the smallest standard error of all unbiased estimators. The standard error of estimation for estimator b is given by se_b = √( E[(θ − b)²] ).
39
An estimator b of a parameter θ is consistent if and only if its standard error gets smaller as the sample size n gets larger.
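A rough simulation sketch of all three properties, comparing the sample mean with the sample median as estimators of an assumed normal population mean: both come out approximately unbiased, the mean has the smaller standard error (efficiency), and both standard errors shrink as n grows (consistency). All numerical settings are assumptions, not course values.

```python
import numpy as np

# Assumed population: Normal(mu=50, sigma=10); reps = number of repeated samples.
rng = np.random.default_rng(2)
mu, sigma, reps = 50, 10, 5_000

for n in (10, 100, 1000):
    samples = rng.normal(mu, sigma, size=(reps, n))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)
    # Unbiasedness: E(b) - mu is ~0 for both estimators.
    # Efficiency:   the sample mean has the smaller standard error.
    # Consistency:  both standard errors shrink as n grows.
    print(f"n={n:4d}  bias(mean)={means.mean()-mu:+.3f}  bias(median)={medians.mean()-mu:+.3f}"
          f"  se(mean)={means.std():.3f}  se(median)={medians.std():.3f}")
```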
40
[Figure: distribution of sample means for various sample sizes (n = 2, 5, 30), for a U-shaped population and for a normal population.]
41
Z Formula for Sample Means: z = (x̄ − μ) / (σ/√n).
42
The standard error (s.e.) of estimation for x̄ is given by s.e. = σ/√n, where σ is the population standard deviation and n is the sample size.
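For a quick numeric check of this formula, using the values that appear later in Example 1 (σ = 16, n = 36):

```python
import math

sigma, n = 16, 36
print(sigma / math.sqrt(n))   # standard error of x_bar = 16/6 ≈ 2.67
```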
43
[Figure sequence, slides 43–52: the density of the sample mean x̄ plotted against x̄. For a 'small' value of n the distribution is spread out; as n gets larger and larger, the distribution becomes more compact around the mean value μ.]
53
The distribution of the sample mean (x̄) for three sample sizes n1 < n2 < n3. [Figure: three densities of x̄, one per sample size.]
54
Summary
1. x̄ is an unbiased estimator of the population mean μ: E(x̄) = μ.
2. The standard error of x̄ is given by s.e. = σ/√n.
55
3. x̄ is an efficient estimator of the population mean μ: it has the smallest s.e. of all unbiased estimators.
4. x̄ is a consistent estimator of the population mean μ: the s.e. becomes smaller as the sample gets larger.
56
The Central Limit Theorem (CLT): whatever the population distribution, the distribution of the sample mean x̄ is approximately N(μ, σ²/n) as long as n is 'large'.
57
[Figure: frequency density of estimator E plotted against the estimator value. E is an unbiased estimator.]
58
[Figure: frequency density of estimator E plotted against the estimator value. E is a negatively biased estimator.]
59
[Figure: frequency density of estimator E plotted against the estimator value. E is a positively biased estimator.]
60
[Figure: frequency densities of estimators E1, E2, E3.] Describe these estimators. Answer: all three are unbiased; E1 is the most efficient and E3 the least.
61
[Figure: frequency densities of estimators E1, E2, E3.] Describe these estimators. Answer: E2 and E3 are both unbiased but less efficient than E1; E1 is the most efficient, but it is positively biased.
62
[Figure: frequency densities of estimators E1, E2, E3.] Describe these estimators. Answer: each is a negatively biased estimator; E1 is the most efficient of the three and E3 the least.
63
Confidence Interval (CI)
Sometimes it is possible, and convenient, to predict with a certain amount of confidence that the true value of the parameter lies within a specified interval. Such an interval is called a Confidence Interval (CI).
64
The statement '[L, H] is the 95% CI of μ' is to be interpreted as: with 95% chance the population mean μ lies within the specified interval, and with 5% chance it lies outside it.
65
[Figure: standard normal density. The area shaded orange, between z = −2.33 and z = +2.33, is approximately 98% of the whole.]
66
[Figure: standard normal density. The area shaded orange, between z = −1.96 and z = +1.96, is approximately 95% of the whole.]
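Both critical values quoted on these two slides can be recovered from the standard normal quantile function; a small sketch using SciPy (an assumed tool, not one specified by the slides):

```python
from scipy.stats import norm

print(norm.ppf(0.99))    # ≈ 2.33: the central 98% of z lies between -2.33 and +2.33
print(norm.ppf(0.975))   # ≈ 1.96: the central 95% of z lies between -1.96 and +1.96
```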
67
Example 1 (confidence interval for μ based on the sample mean): suppose that sampling yields x̄ = 25 with n = 36. Use this information to construct a 95% CI for μ, given that σ = 16.
68
Since n > 24, we can say that x̄ is approximately N(μ, σ²/36). Standardisation means that (x̄ − μ)/(σ/6) is approximately standard normal (z). Now find the two symmetric points around 0 in the z table between which the area is 0.95: the answer is z = ±1.96.
69
Now solve (x̄ − μ)/(σ/6) = ±1.96, i.e. (25 − μ)/(16/6) = ±1.96, to get the two values μ = 19.77 and μ = 30.23. Thus the 95% CI for μ is [19.77, 30.23].
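A minimal check of Example 1's arithmetic (x̄ = 25, σ = 16, n = 36, z = 1.96):

```python
import math

x_bar, sigma, n, z = 25, 16, 36, 1.96
se = sigma / math.sqrt(n)                  # 16/6 ≈ 2.667
low, high = x_bar - z * se, x_bar + z * se
print(round(low, 2), round(high, 2))       # 19.77 30.23
```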