
Topic 7 Sampling And Sampling Distributions. The term Population represents everything we want to study, bearing in mind that the population is ever changing.


1 Topic 7 Sampling And Sampling Distributions

2 The term Population represents everything we want to study, bearing in mind that the population is ever changing and is hence a dynamic concept. A Census is a snapshot of the population at a single point in time. For example, the last UK census was taken on the 29th of April 2001. The Office for National Statistics (ONS) attempted to get a picture of everything that was relevant on that specific day.

3 A sample is usually a part or a fraction of the population, not the whole of it. The act of collecting samples is called sampling. Descriptive Statistics: using the sample data to describe and draw conclusions about the sample only. Inferential Statistics: using the sample data to draw conclusions about the population.

4 The statistician uses the information from the sample(s) to estimate population characteristics. Getting access to the entire population may be prohibitively costly, and a sample is therefore taken.

5 Errors therefore occur naturally. A parameter is a defining characteristic of a population that can be quantified. The mean μ and standard deviation σ of the normal distribution are examples of parameters of the distribution.

6 The difference between the actual population characteristic and the corresponding estimate is called a sampling error. The process can be visualised as below:
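
A minimal sketch of the idea of sampling error, assuming a made-up population of 10,000 normally distributed values (the population, the seed and the sample size of 100 are all illustrative choices, not from the slides). In practice the parameter is unknown; it is computed here only so the error can be seen.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values in arbitrary units.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)      # the parameter (normally unknown)

sample = rng.sample(population, 100)   # simple random sample, without replacement
x_bar = statistics.fmean(sample)       # the statistic used as the estimate

sampling_error = x_bar - mu            # estimate minus actual population value
print(f"parameter = {mu:.2f}, estimate = {x_bar:.2f}, "
      f"sampling error = {sampling_error:+.2f}")
```

Re-running with a different seed gives a different sample and therefore a different sampling error, which is why the error is treated as a random quantity.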

7-11 [Diagram: the Population and its Parameter; the sample and its Summary Statistic]

12 Learning Objectives: Determine when to use sampling instead of a census. Distinguish between random and nonrandom sampling. Be aware of the different types of error that can occur in a study. Understand the impact of the Central Limit Theorem on statistical analysis. Use the sampling distributions of x̄ (the sample mean) and p (the sample proportion).

13 Reasons for Sampling: Sampling can save money. Sampling can save time. For given resources, sampling can broaden the scope of the data set. Because the research process is sometimes destructive, sampling can save product. If accessing the population is impossible, sampling is the only option.

14 Reasons for Taking a Census: Eliminate the possibility that a random sample is not representative of the population. The person authorizing the study is uncomfortable with sample information.

15 Random vs Nonrandom Sampling. Random sampling: Every unit of the population has the same probability of being included in the sample. A chance mechanism is used in the selection process. Eliminates bias in the selection process. Also known as probability sampling.

16 Random vs Nonrandom Sampling. Nonrandom sampling: Not every unit of the population has the same probability of being included in the sample. Open to selection bias. Not an appropriate data collection method for most statistical methods. Also known as nonprobability sampling.
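
The contrast between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions: the population is a hypothetical list of 5,000 values stored in sorted order, and the "nonrandom" scheme is a convenience sample that simply takes the first 200 entries.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for repeatability

# Hypothetical population, stored in sorted order (e.g. a list sorted by size).
population = sorted(rng.uniform(0, 100) for _ in range(5_000))
mu = statistics.fmean(population)

# Random (probability) sampling: every unit has the same chance of inclusion.
random_sample = rng.sample(population, 200)

# Nonrandom (convenience) sampling: take whatever comes first on the list.
convenience_sample = population[:200]

random_error = statistics.fmean(random_sample) - mu
convenience_error = statistics.fmean(convenience_sample) - mu
print(f"random sample error:      {random_error:+.2f}")
print(f"convenience sample error: {convenience_error:+.2f}")
```

Because the list is sorted, the convenience sample contains only the smallest values and badly underestimates the mean, while the random sample lands close to it.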

17 Errors: Data from nonrandom samples are not appropriate for analysis by inferential statistical methods. Sampling error occurs when the sample is not representative of the population.

18 Nonsampling Errors: Missing data, recording, data entry, and analysis errors. Poorly conceived concepts, unclear definitions, and defective questionnaires. Response errors occur when people do not know, will not say, or overstate in their answers.

19 The Central Limit Theorem (CLT): Whatever the population distribution (provided its variance is finite), the distribution of the sample mean is approximately normal, N(μ, σ²/n), as long as n is ‘large’.
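
The theorem can be checked by simulation. The sketch below assumes an exponential population with mean 1 and variance 1 (chosen only because it is clearly not normal), a sample size of n = 30 and 20,000 repeated samples; none of these values come from the slides.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for repeatability

# A clearly non-normal population: exponential with mean 1 and variance 1.
mu, sigma2 = 1.0, 1.0
n = 30                  # size of each sample
replications = 20_000   # number of repeated samples

sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)
print(f"mean of the sample means     = {mean_of_means:.3f} (theory: {mu})")
print(f"variance of the sample means = {var_of_means:.4f} (theory: {sigma2 / n:.4f})")
```

The simulated mean and variance of x̄ should sit close to μ and σ²/n; plotting a histogram of `sample_means` would show the roughly bell-shaped curve.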

20 Sampling Distribution of the Sample Mean: Proper analysis and interpretation of a sample statistic requires knowledge of its distribution. [Diagram: Process of Inferential Statistics]

21 Distribution of Sample Means for Various Sample Sizes [Figure: two rows of panels, one for a U-shaped population and one for a normal population, each showing the distribution of sample means for n = 2, n = 5 and n = 30]

22 Topic 8 Point and Confidence Interval Estimation

23 The methodology we follow is known as Parametric Analysis

24 A parameter is a defining characteristic of a population that can be quantified. For example, the mean μ and standard deviation σ of the normal distribution are parameters of the distribution.

25 [Diagram: Parameter and Summary Statistic]

26 [Diagram: the parameter to be estimated, shown with a point estimate for the parameter (a single point) and a confidence interval for the parameter (a bracketed range)]

27 Three Properties of Point Estimators 1. Unbiasedness 2. Consistency 3. Efficiency

28 [Diagram: estimates scattered around the parameter] Although each estimate is way off target, together they may well give a good estimation.

29 [Diagram: estimates on the real line around the (unknown) parameter, each marked + or - according to the sign of its error] Each deviation from the parameter is called an error. This method of estimation is unbiased if and only if the algebraic sum of all ‘errors’ is zero. The ‘average’ size of these errors is called the standard error of estimation.

30 Question: Given the five-point dataset, which point represents the summary statistic? Answer: The sample mean is the best of all available options.

31 The Sampling Distribution of the Sample Mean (x̄). Suppose that the population mean μ = 20 and consider the following statistical process:

Sample Number    Value of x̄
1                18
2                24
3                21
...              ...
100              22

32 The Sampling Distribution of x̄ for ‘large’ samples [Figure: density curve of x̄ with μ marked on the axis]

33 [Diagram: scatters of estimates relative to the location of the parameter, labelled negatively biased estimator, positively biased estimator and unbiased estimator]

34 Errors of the estimates:

Estimate Number    Error
1                  +6
2                  +8
3                  -10
4                  +2
5                  -6
Error              0    0

Although the first set of estimates (in red) has an average error of zero, it is probably not as good as the second one (in green).

35 [Figure: scatter of estimates] An example of an unbiased yet inefficient estimator.

36 [Figure: three scatters of estimates, labelled Available Resources: R1, Available Resources: R2 and Available Resources: R3] This estimator is consistent if R1 < R2 < R3.

37 Formally, an estimator b of a parameter β is unbiased if and only if the average of the b values is exactly β. That is, E(b) = β. If E(b) ≠ β then the estimator is biased, and the difference E(b) - β is the bias of estimation.
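
Bias can be approximated by averaging an estimator over many repeated samples. The sketch below assumes a normal population with mean 20 and standard deviation 4 and compares three candidate estimators of the mean: the sample mean, the sample minimum and the sample maximum. The last two are hypothetical choices picked only to show negative and positive bias.

```python
import random
import statistics

rng = random.Random(3)         # fixed seed for repeatability
mu, sigma, n = 20.0, 4.0, 10   # assumed population N(20, 4^2), sample size 10
replications = 20_000

estimates = {"mean": [], "min": [], "max": []}
for _ in range(replications):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    estimates["mean"].append(statistics.fmean(sample))
    estimates["min"].append(min(sample))
    estimates["max"].append(max(sample))

# Bias = E(b) - parameter, approximated by the average over many samples.
bias = {name: statistics.fmean(values) - mu for name, values in estimates.items()}
for name, value in bias.items():
    print(f"bias of sample {name}: {value:+.3f}")
```

The sample mean shows a bias close to zero, the minimum is negatively biased and the maximum is positively biased.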

38 An estimator b of a parameter β is efficient if and only if it has the smallest standard error of all unbiased estimators of β. The standard error of estimation for estimator b (se_b) is given by se_b = √E[(β - b)²].
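
As a rough illustration of efficiency, the sketch below compares the sample mean with the sample median as estimators of the centre of a normal population. The population N(0, 1), the sample size of 25 and the helper name `standard_error` are assumptions made for this example. Both estimators are unbiased here, so the one with the smaller standard error is the more efficient of the two.

```python
import math
import random
import statistics

rng = random.Random(11)        # fixed seed for repeatability
mu, sigma, n = 0.0, 1.0, 25    # assumed population N(0, 1), sample size 25
replications = 20_000

def standard_error(estimates, parameter):
    """Square root of the mean squared deviation of the estimates from the parameter."""
    return math.sqrt(statistics.fmean((parameter - b) ** 2 for b in estimates))

means, medians = [], []
for _ in range(replications):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

se_mean = standard_error(means, mu)
se_median = standard_error(medians, mu)
print(f"s.e. of the sample mean   = {se_mean:.4f}")
print(f"s.e. of the sample median = {se_median:.4f}")
```

For a normal population the mean comes out with the smaller standard error, close to σ/√n = 0.2.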

39 An estimator b of a parameter β is consistent if and only if its standard error gets smaller, shrinking towards zero, as the sample size n gets larger.

40 Distribution of Sample Means for Various Sample Sizes [Figure: two rows of panels, one for a U-shaped population and one for a normal population, each showing the distribution of sample means for n = 2, n = 5 and n = 30]

41 Z Formula for Sample Means: z = (x̄ - μ)/(σ/√n)

42 The standard error (s.e.) of estimation for x̄ is given by s.e. = σ/√n, where σ is the population standard deviation and n is the sample size.
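
A small sketch of how the standard error shrinks as n grows. The function name `standard_error` and the chosen sample sizes are illustrative; the value 16 for the population standard deviation is borrowed from the worked example later in the transcript.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

sigma = 16.0  # population standard deviation (assumed known)
for n in (4, 16, 36, 64, 144):
    print(f"n = {n:3d}  s.e. = {standard_error(sigma, n):.3f}")
```

Because n enters through a square root, quadrupling the sample size only halves the standard error.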

43-52 [Figure sequence: density of the sample mean x̄ around μ. The first frames show the distribution for a ‘small’ value of n. As n gets larger and larger, the distribution gets more compact around the mean value (μ).]

53 The distribution of the sample mean (x̄) for three sample sizes: n1 < n2 < n3 [Figure: three density curves of x̄, labelled Sample Size: n1, Sample Size: n2 and Sample Size: n3]

54 Summary: 1. x̄ is an unbiased estimator of the population mean μ: E(x̄) = μ. 2. The standard error of x̄ (s.e.) is given by s.e. = σ/√n.

55 3. x̄ is an efficient estimator of the population mean μ. It has the smallest of all s.e. values. 4. x̄ is a consistent estimator of the population mean μ. The s.e. value becomes smaller as the sample gets larger.

56 The Central Limit Theorem (CLT): Whatever the population distribution (provided its variance is finite), the distribution of the sample mean is approximately normal, N(μ, σ²/n), as long as n is ‘large’.

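57 [Figure: frequency density of estimator E against estimator value, with the parameter marked] E is an unbiased estimator.

58 [Figure: frequency density of estimator E against estimator value, with the parameter marked] E is a negatively biased estimator.

59 [Figure: frequency density of estimator E against estimator value, with the parameter marked] E is a positively biased estimator.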

60 Describe these estimators E1, E2, E3. [Figure: frequency densities of Estimator E1, Estimator E2 and Estimator E3] All three are unbiased. E1 is the most efficient, E3 is the least.

61 Describe these estimators E1, E2, E3. [Figure: frequency densities of Estimator E1, Estimator E2 and Estimator E3] Both E2 and E3 are unbiased but less efficient than E1. E1 is the most efficient, but it is positively biased.

62 Describe these estimators E1, E2, E3. [Figure: frequency densities of Estimator E1, Estimator E2 and Estimator E3] Each is a negatively biased estimator. E1 is the most efficient of the three and E3 the least.

63 Confidence Interval (CI): Sometimes, it is possible and convenient to predict, with a certain amount of confidence in the prediction, that the true value of the parameter lies within a specified interval. Such an interval is called a Confidence Interval (CI).

64 The statement ‘[μ_L, μ_H] is the 95% CI of μ’ is to be interpreted as follows: with 95% chance an interval constructed in this way contains the population mean, and with 5% chance it does not.
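
One way to make the 95% figure concrete is to repeat the sampling many times and count how often the resulting interval captures the population mean. The sketch below assumes a normal population with μ = 20 and σ = 16, samples of size 36, and the usual cut-off of 1.96; these values are illustrative choices, not results from the slides.

```python
import math
import random
import statistics

rng = random.Random(5)           # fixed seed for repeatability
mu, sigma, n = 20.0, 16.0, 36    # assumed population values and sample size
z_crit = 1.96                    # cut-off for a 95% interval
margin = z_crit * sigma / math.sqrt(n)

trials = 10_000
hits = 0
for _ in range(trials):
    x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1   # this interval contains the population mean

coverage = hits / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should come out close to 0.95: the population mean stays fixed, and it is the interval that varies from sample to sample.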

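65 [Figure: standard normal (z) density] The area shaded orange, between z = -2.33 and z = +2.33, is approximately 98% of the whole.

66 [Figure: standard normal (z) density] The area shaded orange, between z = -1.96 and z = +1.96, is approximately 95% of the whole.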
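
These areas can be checked without a printed z table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `central_area` is introduced here for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_area(c: float) -> float:
    """Area under the standard normal curve between -c and +c."""
    return z.cdf(c) - z.cdf(-c)

print(f"area between -1.96 and +1.96 = {central_area(1.96):.4f}")
print(f"area between -2.33 and +2.33 = {central_area(2.33):.4f}")

# Going the other way: the cut-off that leaves a central area of 0.95
# puts 0.025 in each tail, so it is the 0.975 quantile.
print(f"cut-off for a 95% central area = {z.inv_cdf(0.975):.3f}")
```

The inverse lookup at the end is the same operation as searching the z table for the two symmetric points that enclose a given area.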

67 Example 1 (Confidence Interval for the population mean): Suppose that the result of sampling yields the following: x̄ = 25; n = 36. Use this information to construct a 95% CI for μ, given that σ = 16.

68 Since n > 24, we can say that x̄ is approximately N(μ, σ²/36). Standardisation means that (x̄ - μ)/(σ/6) is approximately z. Now find the two symmetric points around 0 in the z table such that the area between them is 0.95. The answer is z = ±1.96.

69 Now solve (x̄ - μ)/(σ/6) = ±1.96, that is, (25 - μ)/(16/6) = ±1.96, to get the two values μ = 19.77 and μ = 30.23. Thus the 95% CI for μ is [19.77, 30.23].
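
The same calculation can be sketched as a small function. The name `z_confidence_interval` and its parameters are hypothetical, and the sketch assumes, as in the example, that the population standard deviation is known and that n is large enough for the normal approximation.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """CI for the population mean when sigma is known and n is 'large'."""
    # Cut-off leaving `confidence` in the middle (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * sigma / math.sqrt(n)   # cut-off times the standard error
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=25, sigma=16, n=36)
print(f"95% CI: [{low:.2f}, {high:.2f}]")   # prints 95% CI: [19.77, 30.23]
```

Passing `confidence=0.98` instead would use a cut-off of about 2.33 and give a wider interval.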

