Sampling and Sampling Distributions


1 Sampling and Sampling Distributions
Prof. Dr Hamit ACEMOĞLU

2 The Aim
By the end of this lecture, students will be familiar with sampling and sampling distributions.

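3 The Goals
- Explain why we sample.
- List the factors affecting sample size.
- Explain the types of sampling.
- Write the formulas for the standard error of the mean (SEM) and the standard error of the proportion (SEP).
- Explain when to use the standard deviation (SD) and when to use the SEM.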

4 In statistical analysis we generally want to collect information about, and draw conclusions on, an entire population. However, obtaining data from the entire population is often not feasible in terms of time or cost. We therefore collect data from a sample that represents the population, and use those data to make inferences about the population.

5 When we take a sample from the population, we cannot expect it to represent the population entirely. By examining only a part of the population we make a sampling error. In this lecture we will learn how to quantify this error using theoretical distributions.

6 Factors affecting the sample size
- Data type
  - Categorical: percentage or proportion
  - Numerical: mean and spread
- Significance level, alpha (α)
- Power of the test (1 - β)
- Effect size (Δ): the smallest change we want to be able to detect at the end of the hypothesis test; in other words, the difference between the values specified in the null hypothesis and the alternative hypothesis.
- Population size (N)

7 Sample size to estimate a population proportion
Two cases are distinguished: population size N unknown, and population size N known. The symbols used are:
- n: number of individuals to be sampled
- p: incidence (proportion) of the event under study
- t: t-table value for the chosen error level and degrees of freedom
- d: desired ± deviation from the incidence of the event
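The formulas themselves did not survive in this transcript. A commonly used pair that fits the symbols defined on this slide is sketched below. The first form reproduces the figure of 196 in the worked example that follows; the finite-population form is an assumption about what the slide showed, not a quotation from it.

```latex
% Population size N unknown
n = \frac{t^{2}\, p\,(1-p)}{d^{2}}

% Population size N known (finite-population form, assumed)
n = \frac{N\, t^{2}\, p\,(1-p)}{d^{2}\,(N-1) + t^{2}\, p\,(1-p)}
```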

8 How many people should be included in this research?
Example: Suppose a previous study found a malnutrition rate of p = 0.15. An investigator wants to estimate this value within limits of d = ±0.05 (that is, between 0.10 and 0.20) at an error level of 0.05, in other words with 95% confidence.

9 Result: To estimate an event whose rate in the population is 0.15 to within the limits 0.10 to 0.20 with 95% confidence, at least 196 individuals must be studied.
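The figure of 196 can be checked with a short calculation. This is a minimal sketch, assuming the usual formula n = t² p (1 - p) / d² for an unknown population size and the large-sample value t = 1.96 for 95% confidence; the function name and the finite-population branch are illustrative and do not come from the slides.

```python
import math

def sample_size_for_proportion(p, d, t=1.96, N=None):
    """Minimum sample size to estimate a proportion p within +/- d.

    Assumed formulas (not quoted from the slides):
      N unknown: n = t^2 * p * (1 - p) / d^2
      N known:   n = N * t^2 * p * (1 - p) / (d^2 * (N - 1) + t^2 * p * (1 - p))
    """
    numerator = t ** 2 * p * (1 - p)
    if N is None:
        n = numerator / d ** 2
    else:
        n = N * numerator / (d ** 2 * (N - 1) + numerator)
    # Round up: a fraction of a person cannot be sampled.
    return math.ceil(n)

# Values from the malnutrition example: p = 0.15, d = 0.05, 95% confidence.
n_required = sample_size_for_proportion(p=0.15, d=0.05)
print(n_required)
```

With these inputs the raw value is about 195.9, which rounds up to the 196 individuals quoted in the result.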

10 Sample size to estimate a population mean
Two cases are distinguished: population size N unknown, and population size N known. The symbols used are:
- σ: population standard deviation
- d: desired ± deviation from the mean
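The formulas for this slide are also missing from the transcript. The sketch below assumes the commonly used forms n = t² σ² / d² (N unknown) and its finite-population counterpart; the function name, t = 1.96 and the input values σ = 10 and d = 2 are hypothetical and chosen only for illustration.

```python
import math

def sample_size_for_mean(sigma, d, t=1.96, N=None):
    """Minimum sample size to estimate a mean within +/- d.

    Assumed formulas (not quoted from the slides):
      N unknown: n = t^2 * sigma^2 / d^2
      N known:   n = N * t^2 * sigma^2 / (d^2 * (N - 1) + t^2 * sigma^2)
    """
    numerator = t ** 2 * sigma ** 2
    if N is None:
        n = numerator / d ** 2
    else:
        n = N * numerator / (d ** 2 * (N - 1) + numerator)
    return math.ceil(n)

# Hypothetical inputs: SD of 10 units, desired precision of +/- 2 units.
n_unknown_population = sample_size_for_mean(sigma=10, d=2)
n_known_population = sample_size_for_mean(sigma=10, d=2, N=500)
print(n_unknown_population, n_known_population)
```

Knowing that the population is small reduces the required sample, which is why the two cases are treated separately.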

11 Appropriate sampling method
Randomness in sampling: every subject must be given an equal chance of selection. If the chances are not equal, the results will be biased, because the sampling errors will no longer be random. To achieve randomness, the conditions for random selection must be met.

12 Sampling Methods
- Probability sampling
  - Simple random sampling
  - Stratified sampling
  - Cluster sampling
- Non-probability sampling
  - Quota sampling
  - Snowball sampling

13 Probability sampling methods
In probability sampling methods, every sampling unit must be given an equal chance of being selected. Giving the units an equal chance preserves the variability of the population in the sample and thereby improves the sample's ability to represent the population. To give each unit an equal chance, units are selected at random from the population, using a table of random numbers or random-number-generating software.

14 Simple random sampling
A simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage of the sampling process. In this method, once the appropriate sample size has been determined, subjects are selected by simple random sampling. Population parameters are then estimated from the sample statistics.
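Simple random sampling with random-number software can be sketched as follows. This is an illustrative example only: the population is a hypothetical list of 1000 subject IDs, and the sample size of 196 is borrowed from the earlier worked example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units, each with the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical sampling frame: subject IDs 1..1000.
population_ids = list(range(1, 1001))
sample_ids = simple_random_sample(population_ids, 196, seed=42)
print(len(sample_ids), sample_ids[:5])
```

Sampling without replacement guarantees that no subject is selected twice.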

15 Stratified sampling
Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then simple random sampling or systematic sampling is applied within each stratum. To get the best results from stratified sampling:
- Strata must be homogeneous within themselves
- Strata must be heterogeneous between themselves
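A stratified draw can be sketched as a simple random sample taken inside each stratum. The example below assumes proportional allocation (the same sampling fraction in every stratum), which the slide does not specify; the data, the stratum labels and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Simple random sample of the same fraction within every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        # Each unit lands in exactly one stratum (mutually exclusive,
        # collectively exhaustive).
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 adults and 400 children, as (id, group) pairs.
units = [(i, "adult") for i in range(600)] + [(i, "child") for i in range(600, 1000)]
stratified = stratified_sample(units, stratum_of=lambda u: u[1], fraction=0.1, seed=1)
print(len(stratified))
```

With a 10% fraction the sample keeps the 60:40 split of the population, so neither group is over- or under-represented by chance.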

16 Cluster sampling
This method is used when the subjects in the population cannot be listed, so individual subjects cannot be reached directly. Cluster sampling is a sampling technique used when "natural" but relatively heterogeneous groupings are evident in a statistical population. In this technique, the total population is divided into these groups (or clusters) and a simple random sample of the groups is selected. The elements in each selected cluster are then sampled. In this method, the sample is formed by selecting clusters rather than individual subjects.
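The idea of selecting clusters rather than subjects can be sketched as below. This is a one-stage version, in which every member of a selected cluster is included; that choice, the village data and the function name are assumptions made for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; return the chosen clusters and members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random draw of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: 10 villages with 20 residents each. Only the list of
# villages is needed up front, not a list of every resident in the population.
villages = {
    f"village_{v}": [f"v{v}_resident_{r}" for r in range(20)]
    for v in range(10)
}
selected = cluster_sample(villages, n_clusters=3, seed=7)
subjects = [person for members in selected.values() for person in members]
print(sorted(selected), len(subjects))
```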

17 Sampling variation
If we were to take repeated samples of the same size from a population, it is unlikely that the estimates of the population parameter would be exactly the same in each sample. However, our estimates should all be close to the true value of the parameter in the population, and the estimates themselves should be similar to each other. By quantifying the variability of these estimates, we obtain information on the precision of our estimate and can thereby assess the sampling error. In reality, we usually only take one sample from the population. However, we still make use of our knowledge of the theoretical distribution of sample estimates to draw inferences about the population parameter.

18 Sampling distribution of the mean
Suppose we are interested in estimating the population mean; we could take many repeated samples of size n from the population, and estimate the mean in each sample. A histogram of the estimates of these means would show their distribution. This is the sampling distribution of the mean.

19 Figure: Changes in the distribution of the number of various samples from the same population.

20 If the sample size is reasonably large, the estimates of the mean follow a Normal distribution, whatever the distribution of the original data in the population (Central Limit Theorem). If the sample size is small, the estimates of the mean follow a Normal distribution provided the data in the population follow a Normal distribution. The mean of the estimates is an unbiased estimate of the true mean in the population, i.e. the mean of the estimates equals the true population mean. The variability of the distribution is measured by the standard deviation of the estimates; this is known as the standard error of the mean (SEM). If we know the population standard deviation (σ), then the standard error of the mean is given by SEM = σ / √n
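These properties can be illustrated by simulation. The sketch below is not part of the lecture: it assumes a deliberately skewed population (exponential, with mean 1 and σ = 1), draws many samples of size 50, and compares the standard deviation of the sample means with σ / √n.

```python
import math
import random
import statistics

def simulate_sample_means(n, repetitions, seed=0):
    """Means of repeated samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(repetitions)
    ]

n = 50
means = simulate_sample_means(n, repetitions=5000)

mean_of_means = statistics.fmean(means)   # should be close to the true mean, 1
sd_of_means = statistics.stdev(means)     # empirical standard error of the mean
theoretical_sem = 1.0 / math.sqrt(n)      # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {theoretical_sem:.3f}")
```

A histogram of `means` would look roughly bell-shaped even though the underlying population is strongly skewed, which is the Central Limit Theorem at work.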

21 When we only have one sample, as is customary, our best estimate of the population mean is the sample mean, and because we rarely know the standard deviation in the population, we estimate the standard error of the mean by SEM = s / √n, where s is the standard deviation of the observations in the sample. The SEM provides a measure of the precision of our estimate.
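The estimate SEM = s / √n is straightforward to compute from a single sample. A minimal sketch follows; the five measurements are hypothetical and the function name is illustrative.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Estimate the SEM as s / sqrt(n) from one sample."""
    s = statistics.stdev(values)  # sample SD, using the n - 1 denominator
    return s / math.sqrt(len(values))

# Hypothetical sample of five measurements.
measurements = [80, 85, 90, 95, 100]
sem = standard_error_of_mean(measurements)
print(f"mean = {statistics.fmean(measurements):.1f}, SEM = {sem:.2f}")
```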

22 Interpreting standard errors
A large standard error indicates that the estimate is imprecise. A small standard error indicates that the estimate is precise. The standard error is reduced, i.e. we obtain a more precise estimate, if:
- the size of the sample is increased.
- the data are less variable.

23 Standard deviation or standard error?
Although these two quantities seem similar, they are used for different purposes. The standard deviation describes the variation in the data values and should be quoted if you wish to illustrate variability in the data. In contrast, the standard error describes the precision of the sample mean, and should be quoted if you are interested in the mean of a set of data values.

24 Sampling distribution of the proportion
We may be interested in the proportion of individuals in a population who possess some characteristic. Having taken a sample of size n from the population, our best estimate, p, of the population proportion, π, is given by p = r / n, where:
- π: population proportion
- p: sample proportion (our estimate of π)
- n: sample size taken from the population
- r: number of individuals in the sample with the characteristic
If we were to take repeated samples of size n from our population and plot the estimates of the proportion as a histogram, the resulting sampling distribution of the proportion would approximate a Normal distribution with mean value π. The standard deviation of this distribution of estimated proportions is the standard error of the proportion.

25 When we take only a single sample, the standard error of the proportion is estimated by SEP = √( p(1 - p) / n ). This provides a measure of the precision of our estimate of π; a small standard error indicates a precise estimate.
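The single-sample estimate can be sketched in a few lines, assuming the usual formula √( p(1 - p) / n ) for the standard error of the proportion. The counts r = 30 and n = 200 are hypothetical and the function name is illustrative.

```python
import math

def standard_error_of_proportion(r, n):
    """Return the sample proportion p = r / n and its estimated standard error."""
    p = r / n
    sep = math.sqrt(p * (1 - p) / n)
    return p, sep

# Hypothetical sample: 30 of 200 individuals have the characteristic.
p_hat, sep = standard_error_of_proportion(r=30, n=200)
print(f"p = {p_hat:.2f}, SEP = {sep:.4f}")
```

As with the SEM, a larger n shrinks the standard error, so the same proportion observed in a bigger sample is a more precise estimate of π.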

26 Examples
In a study, biochemical analysis of blood samples taken from 250 people found a mean fasting blood glucose of 85.7 mg/dl with a standard deviation of 25.4 mg/dl. In the same study, diabetes was detected in 15% of the participants. Of the survey respondents, 20% rated their knowledge of diabetes as "good", while 15% stated that they had "no knowledge at all".
- Discuss the data types mentioned in the paragraph.
- Calculate and interpret the SEM of the fasting blood glucose.
- Calculate and interpret the SEP of those with diabetes.
- Should we report the SD or the SEM together with the mean blood glucose? Why?

27 Answers
Data types:
- Mean fasting blood glucose = numerical
- Number of people with diabetes = nominal
- Respondents' knowledge about diabetes = ordinal
Because only the mean fasting blood glucose is given and no comparison between groups is made within the sample, the SEM should be reported in this example.
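The answers slide does not give numerical values for the SEM and SEP questions. The sketch below works them out from the figures in the example (n = 250, SD = 25.4 mg/dl, proportion with diabetes = 0.15), assuming the standard formulas SEM = s / √n and SEP = √( p(1 - p) / n ); the variable names are illustrative.

```python
import math

# Figures taken from the example.
n = 250
sd_glucose = 25.4      # mg/dl
p_diabetes = 0.15

sem_glucose = sd_glucose / math.sqrt(n)
sep_diabetes = math.sqrt(p_diabetes * (1 - p_diabetes) / n)

print(f"SEM of fasting blood glucose: {sem_glucose:.2f} mg/dl")
print(f"SEP of diabetes proportion:   {sep_diabetes:.4f}")
```

The SEM comes out at about 1.6 mg/dl and the SEP at about 0.023 (2.3 percentage points). Both are small relative to the estimates they accompany, which suggests that the sample mean and the sample proportion are fairly precise estimates of the population values.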

