Published by Della Holt. Modified over 9 years ago.
Data Collection and Sampling (ST 511)
Methods of Collecting Data
The validity of the results of a statistical analysis depends on the reliability and accuracy of the data, which in turn depend on the method of collection. Three of the most popular sources of statistical data are:
– Published data
– Observational studies
– Experimental studies
Published Data
– Published data is often the preferred source of data because of its low cost and convenience.
– Published data can be found as printed material, on tapes and disks, and on the Internet.
– Data published by the organization that collected it is called PRIMARY DATA. For example: data published by the US Bureau of the Census.
– Data published by an organization other than the one that collected it is called SECONDARY DATA. For example:
  – The Statistical Abstract of the United States, which compiles data from primary sources.
  – Compustat, which sells a variety of financial data tapes compiled from primary sources.
Observational and Experimental Studies
When published data is unavailable, one needs to conduct a study to generate the data.
– An observational study is one in which measurements representing a variable of interest are observed and recorded without controlling any factor that might influence their values.
– An experimental study is one in which measurements representing a variable of interest are observed and recorded while controlling factors that might influence their values.
Surveys
Surveys solicit information from people. Surveys can be conducted by means of:
– personal interview
– telephone interview
– self-administered questionnaire
Surveys
A good questionnaire must be well designed:
– Keep the questionnaire as short as possible.
– Ask short, simple, and clearly worded questions.
– Start with demographic questions to help respondents get started comfortably.
– Use dichotomous and multiple-choice questions.
– Use open-ended questions cautiously.
– Avoid leading questions.
– Pretest the questionnaire on a small number of people.
– Think about how you intend to use the collected data when preparing the questionnaire.
Sampling and Sampling Plans
Motivation for conducting a sampling procedure:
– Cost.
– Population size.
– The possibly destructive nature of the sampling process.
The sampled population and the target population should be similar to one another.
Sampling Plans
We introduce four different sampling plans:
– Simple random samples
– Stratified random samples
– Cluster samples
– Systematic samples
Simple Random Samples
In simple random sampling, all possible samples of the same size are equally likely to be chosen.
– A consequence of this definition is that each individual in the population has an equal chance of being chosen.
An SRS is the standard against which we measure other sampling methods, and it is the sampling method on which the theory of working with sampled data is based.
To conduct simple random sampling:
– assign a number to each element of the chosen population (or use numbers already assigned),
– randomly select the sample numbers (members), using a random numbers table or a software package.
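The two steps above can be sketched in Python. This is a minimal illustration rather than the course's method: it assumes the population elements are already numbered 1 to N, and the function name `simple_random_sample` and the seed value are hypothetical. Python's `random.sample` draws without replacement and makes every subset of size n equally likely, which is exactly the SRS property.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n, without replacement.

    random.sample makes every subset of size n equally likely,
    which is the defining property of an SRS.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# Hypothetical population: elements numbered 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=511)
print(sorted(sample))
```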
Simple Random Samples (cont.)
To select a sample at random, we first need to define where the sample will come from.
– The sampling frame is a list of individuals from which the sample is drawn.
– E.g., to select a random sample of students from a college, we might obtain a list of all registered full-time students.
– When defining the sampling frame, we must deal with the details of defining the population: are part-time students included? What about students currently studying abroad?
Once we have our sampling frame, the easiest way to choose an SRS is with random numbers.
Simple Random Sampling
Example
– A government income-tax auditor is responsible for 1,000 tax returns.
– The auditor will randomly select 40 returns to audit.
– Use Excel's random number generator to select the returns.
Solution
We generate 50 numbers between 1 and 1000. We need only 40 numbers, but the extra ones can be used if duplicate numbers are generated.
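Simple Random Sampling
The spreadsheet solution proceeds in three columns:
– 50 numbers uniformly distributed between 0 and 1.
– Column "X(1000)": each number multiplied by 1000, giving 50 random numbers between 0 and 1000.
– Column "Round-up": each value rounded up, giving 50 random, uniformly distributed whole numbers between 1 and 1000; each whole number has a probability of 1/1000 of being selected.
The rounded-up values begin 383, 101, 597, 900, 885, 959, 15, 408, 864, 139, 246, ...
The auditor should select the 40 files numbered 383, 101, ...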
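The auditor's procedure (draw 50 whole numbers between 1 and 1000, then keep the first 40 distinct ones) can be sketched in Python instead of Excel. This is an illustrative sketch: `select_returns` and its parameters are hypothetical names, and `randint(1, 1000)` stands in for the spreadsheet's "multiply by 1000 and round up" step, since both give each whole number from 1 to 1000 a probability of 1/1000.

```python
import random

def select_returns(n_returns=1000, n_needed=40, n_drawn=50, seed=None):
    """Draw n_drawn whole numbers uniformly from 1..n_returns and keep
    the first n_needed distinct ones; the extra draws absorb duplicates."""
    rng = random.Random(seed)
    draws = [rng.randint(1, n_returns) for _ in range(n_drawn)]
    selected, seen = [], set()
    for number in draws:
        if number not in seen:  # skip duplicate return numbers
            seen.add(number)
            selected.append(number)
        if len(selected) == n_needed:
            break
    if len(selected) < n_needed:
        # Very unlikely with 50 draws from 1000, but possible in principle
        raise ValueError("too many duplicates; draw more numbers")
    return selected

returns_to_audit = select_returns(seed=2024)
print(returns_to_audit)
```

Discarding repeats and keeping the first 40 distinct numbers is equivalent to sampling without replacement, so the result is still a simple random sample of 40 returns.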
Stratified Random Sampling
This sampling procedure separates the population into mutually exclusive sets (strata) and then selects a simple random sample from each stratum. Examples of variables that can define strata:
– Sex: male, female
– Age: under 20, 20-30, 31-40, 41-50
– Occupation: professional, clerical, blue-collar
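A minimal Python sketch of the procedure described above: group the population into strata, then take a separate SRS inside each stratum. The population, the `stratum_of` function, and the per-stratum sample sizes are hypothetical; how the sizes are chosen is a separate decision.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sizes, seed=None):
    """Split population into strata using stratum_of, then draw an SRS
    of sizes[label] elements from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        # each element lands in exactly one stratum (mutually exclusive sets)
        strata[stratum_of(element)].append(element)
    return {label: rng.sample(strata[label], n) for label, n in sizes.items()}

# Hypothetical population of 200 people, stratified by sex
people = [(pid, "male" if pid % 2 else "female") for pid in range(1, 201)]
sample = stratified_sample(people, lambda p: p[1], {"male": 10, "female": 10}, seed=7)
for label, members in sample.items():
    print(label, [pid for pid, _ in members])
```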
Stratified Random Sampling
With this procedure we can acquire information about:
– the whole population
– each stratum
– the relationships among strata.
Stratified Random Sampling
There are several ways to build the stratified sample. For example, keep the proportion of each stratum in the population. A sample of size 1,000 is to be drawn:
Stratum | Income | Population proportion | Stratum size
1 | under $15,000 | 25% | 250
2 | $15,000-29,999 | 40% | 400
3 | $30,000-50,000 | 30% | 300
4 | over $50,000 | 5% | 50
Total | | 100% | 1,000
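Proportional allocation multiplies the total sample size by each stratum's population share (n_h = n × share of stratum h). A small Python sketch reproduces the table. When n × share is not a whole number some rounding rule is needed; the largest-remainder rule used here is an assumption, not something the slides specify, and `proportional_allocation` is a hypothetical name.

```python
def proportional_allocation(n, percents):
    """Allocate a total sample of n across strata in proportion to each
    stratum's population share, given in whole percent (must sum to 100).
    Fractional quotas are resolved by the largest-remainder rule."""
    if sum(percents.values()) != 100:
        raise ValueError("population shares must sum to 100%")
    # Integer part of each quota n * pct / 100 (exact integer arithmetic)
    alloc = {h: n * p // 100 for h, p in percents.items()}
    # Hand any leftover units to the strata with the largest remainders
    leftover = n - sum(alloc.values())
    order = sorted(percents, key=lambda h: (n * percents[h]) % 100, reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc

income_strata = {
    "under $15,000": 25,
    "$15,000-29,999": 40,
    "$30,000-50,000": 30,
    "over $50,000": 5,
}
allocation = proportional_allocation(1000, income_strata)
print(allocation)
```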
Cluster Sampling
Cluster sampling is a simple random sample of groups, or clusters, of elements. This procedure is useful when:
– it is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure);
– the population members are widely dispersed geographically.
Cluster sampling may increase sampling error because members of the same cluster are likely to be similar to one another.
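A sketch of one-stage cluster sampling in Python: choose clusters by SRS, then include every member of each chosen cluster. Only a list of clusters is needed, not a list of individuals. The clusters here (city blocks of households) and all names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """clusters maps a cluster id to its list of members. Select
    n_clusters clusters by SRS and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster ids
    return {cid: list(clusters[cid]) for cid in chosen}

# Hypothetical frame: 20 city blocks, 5 households each
blocks = {f"block-{b}": [f"household-{b}-{h}" for h in range(1, 6)] for b in range(1, 21)}
selected = cluster_sample(blocks, 4, seed=3)
for block, households in selected.items():
    print(block, households)
```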
Systematic Samples
Sometimes we draw a sample by selecting individuals systematically.
– For example, you might survey every 10th person on an alphabetical list of students.
To make it random, you must still start the systematic selection from a randomly selected individual (for every 10th person, a random one of the first 10).
When there is no reason to believe that the order of the list could be associated in any way with the responses sought, systematic sampling can give a representative sample.
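The every-10th-person scheme with a random start can be sketched as follows, assuming the frame is already a Python list in the desired (e.g. alphabetical) order; the function name and the student roster are hypothetical. Drawing the start from the first k positions gives every individual the same 1/k chance of selection.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th element of frame, starting from a random
    position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start in 0..k-1
    return frame[start::k]

# Hypothetical alphabetical list of 200 students
students = [f"student-{i:03d}" for i in range(200)]
picked = systematic_sample(students, 10, seed=42)
print(len(picked), picked[:3])
```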
Systematic Samples (cont.)
Systematic sampling can be much less expensive than true random sampling.
When you use a systematic sample, you need to justify the assumption that the systematic method is not associated with any of the measured variables.
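To see why that justification matters, consider a hypothetical list whose order repeats with the same period as the sampling interval: 20 teams of 10, each listed manager first. Every possible systematic sample with k = 10 contains either only managers or no managers, even though managers make up 10% of the list. The roster is invented purely for illustration.

```python
# Hypothetical roster: 20 teams of 10 people, each team listed manager first,
# so the role pattern repeats every 10 positions.
roster = ["manager" if i % 10 == 0 else "staff" for i in range(200)]
k = 10

population_share = roster.count("manager") / len(roster)
print(f"managers in the population: {population_share:.0%}")

# Enumerate every possible random start and the manager share it produces.
for start in range(k):
    picked = roster[start::k]
    share = picked.count("manager") / len(picked)
    print(f"start {start}: managers in sample = {share:.0%}")
```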
What Can Go Wrong? Or, How to Sample Badly
Sample Badly with Volunteers:
– In a voluntary response sample, a large group of individuals is invited to respond, and all who do respond are counted. Voluntary response samples are almost always biased, and so conclusions drawn from them are almost always wrong.
– Voluntary response samples are often biased toward those with strong opinions or those who are strongly motivated.
– Since the sample is not representative, the resulting voluntary response bias invalidates the survey.
What Can Go Wrong? Or, How to Sample Badly (cont.)
Sample Badly, but Conveniently:
– In convenience sampling, we simply include the individuals who are convenient. Unfortunately, this group may not be representative of the population.
– Convenience sampling is not only a problem for students or other beginning samplers. In fact, it is a widespread problem in the business world: the easiest people for a company to sample are its own customers.
What Can Go Wrong? Or, How to Sample Badly (cont.)
Sample from a Bad Sampling Frame:
– An SRS from an incomplete sampling frame introduces bias because the individuals included may differ from the ones not in the frame.
Undercoverage:
– Many of these bad survey designs suffer from undercoverage, in which some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population.
– Undercoverage can arise for a number of reasons, but it's always a potential source of bias.
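A small Python illustration of why sampling from an incomplete frame is biased, using made-up numbers: suppose a college has 800 full-time students who each spend 15 hours a week on campus and 200 part-time students who each spend 6, and the frame lists only full-time students. Even a perfect SRS from that frame estimates the frame's mean, not the population's, and a larger sample does not fix it.

```python
import random
from statistics import mean

# Hypothetical population: (is_part_time, weekly hours on campus)
population = [(False, 15)] * 800 + [(True, 6)] * 200
# Incomplete sampling frame: part-time students are missing
frame = [s for s in population if not s[0]]

rng = random.Random(1)
sample = rng.sample(frame, 50)  # a genuine SRS, but drawn from the wrong list

print("population mean:", mean(h for _, h in population))
print("sample mean:    ", mean(h for _, h in sample))
```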
What Else Can Go Wrong?
Watch out for nonrespondents.
– A common and serious potential source of bias for most surveys is nonresponse bias.
– No survey succeeds in getting responses from everyone. The problem is that those who don't respond may differ from those who do. And they may differ on just the variables we care about.
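A worked example with hypothetical numbers shows how nonresponse alone can distort a result. Suppose 1,000 customers are surveyed, 400 of them dissatisfied, and dissatisfied customers respond at 50% while satisfied customers respond at 20% (both rates are assumptions for illustration). Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical survey: counts and response rates are made up for illustration
dissatisfied, satisfied = 400, 600
rate_dissatisfied, rate_satisfied = Fraction(1, 2), Fraction(1, 5)

resp_dissatisfied = dissatisfied * rate_dissatisfied  # 200 respondents
resp_satisfied = satisfied * rate_satisfied           # 120 respondents

true_share = Fraction(dissatisfied, dissatisfied + satisfied)
observed_share = resp_dissatisfied / (resp_dissatisfied + resp_satisfied)

print(f"dissatisfied in population:   {float(true_share):.1%}")
print(f"dissatisfied among responses: {float(observed_share):.1%}")
```

Here the respondents overstate dissatisfaction (62.5% versus 40%) even though nobody answered untruthfully.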
What Else Can Go Wrong? (cont.)
Don't bore respondents with surveys that go on and on and on and on…
– Surveys that are too long are more likely to be refused, reducing the response rate and biasing all the results.
What Else Can Go Wrong? (cont.)
Work hard to avoid influencing responses.
– Response bias refers to anything in the survey design that influences the responses.
– For example, the wording of a question can influence the responses: