Md. Ashraful Islam Khan, PhD


1 Data Collection Techniques
Md. Ashraful Islam Khan, PhD, Department of Population Science and Human Resource Development, Rajshahi University. 29 July 2018

2 What is sampling?
A shortcut method for investigating a whole population. Data are gathered on a small part of the whole parent population or sampling frame and used to infer what the whole picture is like.

3 Sampling
[Diagram: target population, study population, sample]

4 Target population: the population to be studied, to which the investigator wants to generalize the results.
Sampling unit: the smallest unit from which a sample can be selected.
Sampling frame: the list of all the sampling units from which the sample is drawn.
Sampling scheme: the method of selecting sampling units from the sampling frame.

5 Why Sample the Population?
To contact the whole population would be time consuming. The cost of studying all the items in a population may be prohibitive. The physical impossibility of checking all items in the population. The destructive nature of some tests. The sample results are adequate.

6 Process
The sampling process comprises several stages:
1. Defining the population of concern
2. Specifying a sampling frame, a set of items or events possible to measure
3. Specifying a sampling method for selecting items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and data collecting
7. Reviewing the sampling process

7 Population definition
A population can be defined as including all people or items with the characteristic one wishes to understand. Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.

8 Types of Samples
Probability (random) samples: simple random sample, systematic random sample, stratified random sample, multistage sample, multiphase sample, cluster sample.
Non-probability samples: convenience sample, purposive sample, quota sample.
Two general approaches to sampling are used in social science research. With probability sampling, all elements (e.g., persons, households) in the population have some opportunity of being included in the sample, and the mathematical probability that any one of them will be selected can be calculated. With nonprobability sampling, in contrast, population elements are selected on the basis of their availability (e.g., because they volunteered) or because of the researcher's personal judgment that they are representative. The consequence is that an unknown portion of the population is excluded (e.g., those who did not volunteer).
One of the most common types of nonprobability sample is called a convenience sample, not because such samples are necessarily easy to recruit, but because the researcher uses whatever individuals are available rather than selecting from the entire population. Because some members of the population have no chance of being sampled, the extent to which a convenience sample, regardless of its size, actually represents the entire population cannot be known.

9 Probability versus Nonprobability
Probability samples: each member of the population has a known, non-zero probability of being selected. Methods include random sampling, systematic sampling, and stratified sampling.
Nonprobability samples: members are selected from the population in some nonrandom manner. Methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling.

10 Simple Random Sampling
The purest form of probability sampling. It assures that each element in the population has an equal chance of being included in the sample. Random number generators can be used to make the selection. When the population is very large, it is often difficult to identify every member of the population, so the pool of available subjects becomes biased.
$$\text{Probability of selection} = \frac{\text{Sample size}}{\text{Population size}}$$
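
As a minimal sketch of the idea on this slide, the Python snippet below draws a simple random sample without replacement and computes the selection probability as sample size divided by population size. The frame of 500 numbered units, the sample size of 50, and the function names are assumptions made for illustration; they are not from the presentation.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


def selection_probability(n, population_size):
    """Probability of selection = sample size / population size."""
    return n / population_size


# Hypothetical sampling frame: 500 units numbered 1..500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 50, seed=1)
probability = selection_probability(len(sample), len(frame))
print(len(sample), probability)
```

The approach only works when a complete sampling frame exists, which is the practical limitation the slide points out for very large populations.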

11 Simple random sampling

12 Table of random numbers

13 Advantages and Disadvantages
Advantages: minimal knowledge of population needed; external validity high; internal validity high; statistical estimation of error; easy to analyze data.
Disadvantages: high cost; low frequency of use; requires sampling frame; does not use researchers' expertise; larger risk of random error than stratified.

14 Systematic Sampling
Systematic sampling is often used instead of random sampling. It is also called an nth name selection technique. After the required sample size has been calculated, every nth record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity.

15 Systematic Sampling
An initial starting point is selected by a random process, and then every nth unit on the list is selected, where n is the sampling interval: the number of population elements between the units selected for the sample.
Source of error: periodicity, which arises when the original list has a systematic pattern.
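
The procedure described here can be sketched in a few lines of Python: compute the sampling interval, pick a random start inside the first interval, then take every nth unit. The list of 100 numbered records and the function name are hypothetical, and the sketch assumes the list has no hidden periodic order.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Select every k-th unit after a random start, k = sampling interval."""
    interval = len(frame) // sample_size
    if interval < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start within the first interval
    return [frame[start + i * interval] for i in range(sample_size)]


# Hypothetical list of 100 numbered records; the interval is 100 // 10 = 10.
records = list(range(1, 101))
chosen = systematic_sample(records, 10, seed=7)
print(chosen)
```

If the list repeated a pattern every 10 records (for example, one household head followed by nine dependants), every selected unit could fall on the same position in the pattern, which is the periodicity error noted above.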

16 Systematic sampling

17 Advantages and Disadvantages
Advantages: moderate cost; moderate usage; external validity high; internal validity high; statistical estimation of error; simple to draw sample; easy to verify.
Disadvantages: periodic ordering; requires sampling frame.

18 Stratified Sampling
A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum. Sub-samples are randomly drawn within different strata whose members are more or less equal on some characteristic.
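
A minimal Python sketch of stratified sampling follows: units are grouped into strata, and a simple random sample is drawn inside each stratum. It assumes proportional allocation (the same sampling fraction in every stratum), which is one common choice rather than something the slide specifies; the urban/rural population and all names are invented for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # mutually exclusive groups
    sample = {}
    for name, members in strata.items():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample[name] = rng.sample(members, size)
    return sample


# Hypothetical population: 60 urban and 40 rural households.
households = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
by_stratum = stratified_sample(households, lambda h: h[0], fraction=0.1, seed=3)
print({name: len(chosen) for name, chosen in by_stratum.items()})
```

Because every stratum contributes to the sample, estimates can be made both for the whole population and within each stratum.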

19 Why Stratified Sampling?
Can reduce random error.
More accurately reflects the population through more proportional representation.

20 STRATIFIED SAMPLING
Draw a sample from each stratum.
We can acquire information about the total population, make inferences within a stratum, or make comparisons across strata.

21 Advantages and Disadvantages
Advantages: assures representation of all groups in the sample population; characteristics of each stratum can be estimated and comparisons made; reduces variability compared with systematic sampling.
Disadvantages: requires accurate information on proportions of each stratum; stratified lists costly to prepare.

22 Cluster Sampling
A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects). This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically. Used more in the "old days". Cluster sampling may increase sampling error due to similarities among cluster members.

23 Cluster Sampling
The primary sampling unit is not the individual element but a large cluster of elements. Either the cluster is randomly selected or the elements within it are randomly selected.
Why? It is frequently used when no list of the population is available, or because of cost.
Ask: is the cluster as heterogeneous as the population? Can we assume it is representative?
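
The following Python sketch illustrates one-stage cluster sampling under simple assumptions: whole clusters are chosen at random, and every element inside a chosen cluster is included. The five sections of 20 students each and the function name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every element inside them."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen_names}


# Hypothetical population: 5 class sections with 20 students each.
sections = {
    f"Section {s}": [f"S{s}-{i:02d}" for i in range(1, 21)] for s in range(1, 6)
}
picked = cluster_sample(sections, 2, seed=5)
print({name: len(students) for name, students in picked.items()})
```

Only a list of clusters is needed up front, not a list of every individual. The price is that members of one cluster tend to resemble each other, so the sampling error is usually larger than for a simple random sample of the same size.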

24 Cluster sampling
[Diagram: population divided into Sections 1 to 5]

25 Advantages and Disadvantages
Advantages: low cost; high frequency of use; requires list of all clusters, but only of individuals within chosen clusters; can estimate characteristics of both cluster and population; for multistage, has strengths of the methods used.
Disadvantages: larger error for comparable size than other probability methods; multistage very expensive, and validity depends on the other methods used.

26 Difference Between Strata and Clusters
Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways. All strata are represented in the sample, but only a subset of clusters is in the sample. With stratified sampling, the best survey results occur when elements within strata are internally homogeneous. However, with cluster sampling, the best results occur when elements within clusters are internally heterogeneous.

27 Non-Probability Sampling Methods
Convenience sample: the sampling procedure used to obtain those units or people most conveniently available.
Why? Speed and cost.
External validity? Internal validity? Is it ever justified?

28 CONVENIENCE SAMPLING
Use results that are easy to get.

29 Advantages and Disadvantages
Advantages: very low cost; extensively used and understood; no need for list of population elements.
Disadvantages: variability and bias cannot be measured or controlled; projecting data beyond sample not justified.

30 Judgment or Purposive Sampling
Judgment sampling is a common nonprobability method. The sample is selected based upon judgment. It is an extension of convenience sampling. When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.

31 Judgment or Purposive Sample
The sampling procedure in which an experienced researcher selects the sample based on some appropriate characteristic of sample members, to serve a purpose.

32 Advantages and Disadvantages
Advantages: moderate cost; commonly used and understood; sample will meet a specific objective.
Disadvantages: bias; projecting data beyond sample not justified.

33 Quota Sampling
Quota sampling is the nonprobability equivalent of stratified sampling. First, identify the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum.
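
The two steps just described can be sketched in Python as follows. Quotas are set from the population proportions, and each quota is then filled by convenience, here modelled as taking people in the order they happen to arrive. The 50/50 proportions, the arrival list, and the function names are assumptions for illustration only.

```python
def proportional_quotas(proportions, sample_size):
    """Turn population shares into the number of subjects needed per stratum."""
    return {stratum: round(share * sample_size) for stratum, share in proportions.items()}


def quota_sample(arrivals, stratum_of, quotas):
    """Fill each quota with the first available units (no random selection)."""
    remaining = dict(quotas)
    selected = []
    for unit in arrivals:
        stratum = stratum_of(unit)
        if remaining.get(stratum, 0) > 0:
            selected.append(unit)
            remaining[stratum] -= 1
        if not any(remaining.values()):
            break  # every quota is full
    return selected


# Hypothetical shares and a hypothetical stream of passers-by (id, sex).
quotas = proportional_quotas({"female": 0.5, "male": 0.5}, sample_size=10)
arrivals = [(i, "male" if i % 3 else "female") for i in range(60)]
selected = quota_sample(arrivals, lambda person: person[1], quotas)
print(quotas, len(selected))
```

The sample matches the population on the quota characteristic, but because selection within each stratum is not random, its bias cannot be measured.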

34 Quota Sampling
The sampling procedure that ensures that a certain characteristic of a population sample will be represented to the exact extent that the investigator desires.

35 Advantages and Disadvantages
Advantages: moderate cost; very extensively used and understood; no need for list of population elements; introduces some elements of stratification.
Disadvantages: variability and bias cannot be measured or controlled (classification of subjects); projecting data beyond sample not justified.

36 Snowball Sampling
Snowball sampling is a special nonprobability method used when the desired sample characteristic is rare. It may be extremely difficult or cost prohibitive to locate respondents in these situations. This technique relies on referrals from initial subjects to generate additional subjects. It lowers search costs; however, it introduces bias because the technique itself reduces the likelihood that the sample will represent a good cross section from the population.

37 Snowball sampling
The sampling procedure in which the initial respondents are chosen by probability or non-probability methods, and then additional respondents are obtained from information provided by the initial respondents.

38 Advantages and Disadvantages
Advantages: low cost; useful in specific circumstances; useful for locating rare populations.
Disadvantages: bias because sampling units not independent; projecting data beyond sample not justified.

39 MULTISTAGE SAMPLING
A complex form of cluster sampling in which two or more levels of units are embedded one in the other. In the first stage, a random number of districts is chosen in all states. This is followed by a random number of thanas and villages. The third-stage units will then be houses. All ultimate units (houses, for instance) selected at the last step are surveyed.
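
A simplified Python sketch of the nesting is given below. It uses three stages (districts, then villages, then houses) and a fixed number of units per stage; the thana level is left out to keep the example short. The population structure, the stage sizes, and all names are hypothetical.

```python
import random


def multistage_sample(population, n_districts, n_villages, n_houses, seed=None):
    """Sample districts, then villages within them, then houses within those."""
    rng = random.Random(seed)
    surveyed = []
    for district in rng.sample(sorted(population), n_districts):      # stage 1
        villages = population[district]
        for village in rng.sample(sorted(villages), n_villages):      # stage 2
            for house in rng.sample(villages[village], n_houses):     # stage 3
                surveyed.append((district, village, house))
    return surveyed


# Hypothetical population: 8 districts x 10 villages x 30 houses.
population = {
    f"D{d}": {f"V{v}": [f"H{h}" for h in range(30)] for v in range(10)}
    for d in range(8)
}
surveyed = multistage_sample(population, n_districts=2, n_villages=3, n_houses=5, seed=11)
print(len(surveyed))
```

Every house returned by the last stage would be surveyed, so the fieldwork is concentrated in a few districts and villages instead of being spread over the whole country.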

40 Sampling considerations
Larger sample sizes are more accurate representations of the whole.
The sample size chosen is a balance between obtaining a statistically valid representation and the time, energy, money, labour, equipment and access available.
A sampling strategy made with the minimum of bias is the most statistically valid.

41 Sampling considerations
Most approaches assume that the parent population has a normal distribution, where most items or individuals are clustered close to the mean, with few extremes. A 95% probability or confidence level is usually assumed; for example, 95% of items or individuals will be within plus or minus two standard deviations of the mean. This also means that up to five per cent may lie outside this range. Sampling, no matter how good, can only ever be claimed to be a very close estimate.
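
The "two standard deviations" figure is a rounding of the exact normal-distribution value, which can be checked with Python's standard library. The short sketch below computes the share of a normal population within plus or minus two standard deviations and the exact multiplier that covers 95%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal population lying within +/- 2 standard deviations.
within_two_sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Number of standard deviations that covers exactly the central 95%.
z_95 = std_normal.inv_cdf(0.975)

print(round(within_two_sd, 4), round(z_95, 2))  # 0.9545 1.96
```

So plus or minus two standard deviations covers slightly more than 95%, and the exact 95% multiplier is about 1.96, the Z value used in sample-size formulas.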

42 Determining Sample Size
What data do you need to consider?
Variance or heterogeneity of the population
The degree of acceptable error (confidence interval)
Confidence level
Generally, we need to make judgments on all these variables.

43 Determining Sample Size
Variance or heterogeneity of population: Previous studies? Industry expectations? Pilot study? Sequential sampling.
Rule of thumb: the standard deviation is expected to be about 1/6 of the range.

44 Determining Sample Size
Formulas:
Means: $n = (ZS/E)^2$
Proportions: $n = Z^2 pq / E^2$
Percentiles: $n = p_c (100 - p_c) Z^2 / E^2$
Z at 95% confidence = 1.96; Z at 99% confidence = 2.58
where n = sample size; S = standard deviation; Z = standard normal value for the chosen confidence level (e.g., 1.96 at 95% confidence); E = range of possible random error (how much error you are willing to accept); p = estimated proportion of successes; q = 1 − p, the estimated proportion of failures; $p_c$ = percentage.
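
As a worked illustration, the three formulas can be written as small Python functions. The input values (a standard deviation of 15, an acceptable error of 2, a proportion of 0.5 with a 0.05 margin) are made up for the example, and rounding up to the next whole respondent is an assumption of this sketch, not something stated on the slide.

```python
import math


def n_for_mean(z, s, e):
    """n = (Z * S / E)^2, rounded up to a whole unit."""
    return math.ceil((z * s / e) ** 2)


def n_for_proportion(z, p, e):
    """n = Z^2 * p * q / E^2 with q = 1 - p, rounded up."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)


def n_for_percentage(z, pc, e):
    """n = pc * (100 - pc) * Z^2 / E^2, with pc and E in percentage points."""
    return math.ceil(pc * (100 - pc) * z ** 2 / e ** 2)


Z_95 = 1.96  # Z at 95% confidence

print(n_for_mean(Z_95, s=15, e=2))            # hypothetical S = 15, E = 2
print(n_for_proportion(Z_95, p=0.5, e=0.05))  # hypothetical p = 0.5, E = 0.05
print(n_for_percentage(Z_95, pc=50, e=5))     # same case in percentage points
```

The proportion and percentage versions give the same answer for the same inputs because they are the same formula on different scales. Using p = 0.5 gives the largest required sample, so it is a cautious choice when no prior estimate is available.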

45 Sample Size Determination
The size of the group to be surveyed generally determines the size of the sample. Standard sampling practice is to include all members of a particular group if the number in the group is 100 or under. So there is no sampling. We can use only descriptive statistics but no inferential statistics in this situation. If the group size is , about 50% should be chosen through the application of a sampling technique. For larger groups, 20% of the total number of the group is an appropriate size. With 1,500 or more in the group, a sample size of 300 is considered adequate.

46 For a nation-wide survey, a very small percentage of the population (say 0.001%) could produce a big sample size.
A serious consideration in determining the sample size is the number of non-respondents. Selecting 300 students from a group of 1,500 may not be effective if only 50 parents return the survey forms. The sample-size guidelines discussed so far are not backed by any standard theory; they come from the experience of experimenters who are often involved in sample surveys.
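
The arithmetic behind the non-response warning can be sketched as follows. The snippet computes the response rate in the 300-students example and then scales the number of forms sent so that a target number come back. Inflating the sample by the expected response rate is a common planning adjustment, not a rule given in the presentation, and it assumes the same response rate would hold again.

```python
import math


def response_rate(returned, sent):
    """Share of distributed forms that came back."""
    return returned / sent


def invitations_needed(target_returns, returned, sent):
    """Forms to send so that target_returns come back at the observed rate."""
    return math.ceil(target_returns * sent / returned)


rate = response_rate(50, 300)            # 50 of 300 forms returned
needed_for_100 = invitations_needed(100, 50, 300)
needed_for_300 = invitations_needed(300, 50, 300)
print(round(rate, 3), needed_for_100, needed_for_300)
```

At this response rate, getting 300 completed forms would require sending more forms than there are students in the group of 1,500, which shows why non-response has to be considered before the sample size is fixed.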

47 Randomization: Pepsi Challenge 1975
At malls, shopping centers and other public locations, a Pepsi representative would set up a table with two unmarked cups: one containing Pepsi and one containing Coca-Cola. Shoppers were encouraged to taste both colas and then select which drink they preferred. The representative then revealed the two bottles so the taster could see whether they had preferred Coke or Pepsi. The results of the test leaned toward a consensus that Pepsi was preferred by more Americans. Despite this claim, the market showed a different picture: Americans bought much more Coke than Pepsi. In the Pepsi Challenge, Coca-Cola was always served first and slightly warmer than Pepsi. In general, people remember best the last thing they taste, and most Americans like their cola chilled. Although the shoppers were blind to which cola they tasted, the whole process lacked randomization and had a clear bias towards Pepsi.
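
One way the serving order could have been randomized is sketched below in Python: for each taster, the order of the two colas is shuffled, so neither brand is systematically tasted last. This is an illustration of randomization in general, not a description of any procedure actually used; the function name and the number of tasters are hypothetical, and the sketch assumes both drinks are served at the same temperature.

```python
import random


def assign_serving_orders(n_tasters, seed=None):
    """Give each taster a randomly ordered pair of colas."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_tasters):
        order = ["Pepsi", "Coca-Cola"]
        rng.shuffle(order)  # each order is equally likely
        orders.append(tuple(order))
    return orders


orders = assign_serving_orders(1000, seed=42)
pepsi_first_share = sum(1 for first, _ in orders if first == "Pepsi") / len(orders)
print(len(orders))
```

With the order decided by chance, any advantage of being tasted last is spread roughly evenly between the two brands instead of always favouring one of them.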

48 Why Is Random Sampling Important?
The myth: "A random sample will be representative of the population". A slightly better explanation that is partly true but partly urban legend: "Random sampling eliminates bias by giving all individuals an equal chance to be chosen." The real reason: the mathematical theorems which justify most frequentist statistical procedures apply only to random samples.

49 Thank You

