1
213Sampling.pdf
When one is attempting to study a variable of a population, whether the variable is qualitative or quantitative, there are two methods of data collection that can be employed:
1. Conduct a census.
2. Take a sample.
2
Some population characteristics:
1. µ = population average (mean)
2. p = population proportion
Some sample characteristics:
1. x̄ = sample average
2. p̂ = sample proportion
3
Taking a Census:
A census is taken when every element (or individual, if the population consists of people) in the population of interest is inspected with regard to the population variable of interest.
4
Problems with a Census
1. Can each member of the population be accessed? (A cause of bias.)
2. Control of information gathering is very difficult for large populations. (A cause of bias.)
3. A census is very expensive and time consuming.
5
Sample
Take only part of the population. We hope that the characteristics of the sample reflect those of the population.
6
Suppose the mean (or average) number of courses taken by the 100 students in the sample was 4.2, or x̄ = 4.2. What does this statistic represent? Does it indicate that the average number of courses taken in the Fall 2005 semester by ALL undergraduate students is 4.2, or µ = 4.2? No, it does not. The value x̄ = 4.2 describes only the sample; it is an estimate of µ, not µ itself.
7
Taking a Random Sample
There are four ways to select a random sample:
1. Simple random sampling (SRS).
2. Stratified sampling.
3. Cluster sampling.
4. Systematic sampling (or 1-in-k sampling).
8
1. Simple Random Sampling - SRS
A simple random sample is taken when every conceivable subgroup of size n has the same “chance” of being selected as the sample.
9
In order to take a simple random sample, you require:
1. A sampling frame: a complete list of all the elements in the population of interest.
2. A method of numerically differentiating between the elements in the sampling frame. This is done by assigning each element in the sampling frame a unique number, one that “belongs” to that element only.
3. A random number generator, such as the table on pages 847 and 848 of your text.
10
Example
Consider a regular-sized lecture section of Statistics 213 (having 115 students, or N = 115) as the population of interest.
Do you have at least one part-time job? (Yes = 1, No = 0)
How many part-time jobs do you have? (Answer = 0, 1, 2, …)
11
1 and 2:
Below is a condensed version of the class list, or the sampling frame.
1. Student A
2. Student B
3. Student C
…
115. Student XYZ
12
3. Using the random number generator
A simple random sample of 5 will be taken. The first three-digit number (row 1, columns 1 - 3) is “104”. If the next three-digit number is not between 001 and 115, continue until you find a three-digit number that is. It must also be different from any previously selected number. The next four choices are: 094, 103, 071, 023. The SRS will therefore consist of the 23rd, 71st, 94th, 103rd and 104th students.
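The table lookup above can also be done with a software random number generator. The following is a minimal sketch, assuming the students are labelled 1 to 115 as in the sampling frame; the function name `simple_random_sample` and the seed value are illustrative choices, not part of the original slides.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw labels from 1..population_size without replacement.

    random.Random.sample gives every subset of the requested size
    the same chance of being selected, which is the SRS criterion.
    """
    rng = random.Random(seed)                 # seeded only so the draw can be repeated
    labels = range(1, population_size + 1)    # the numbered sampling frame
    return sorted(rng.sample(labels, sample_size))

# N = 115 students, n = 5 selected (seed value is arbitrary)
srs = simple_random_sample(115, 5, seed=213)
print(srs)
```

Because sampling is done without replacement, a repeated label can never occur, which mirrors the rule of skipping a number that has already been selected.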
13
Suppose that, of the five students selected, two have at least one part-time job. The sample proportion, p̂, is then p̂ = 2/5 = 0.40.
14
Advantages of SRS
A simple random sample is the purest method of random selection. The simple random sample criterion allows the sample to be selected in a completely objective manner. There are no issues with selection bias.
15
2. Stratified Sampling
A stratified sample is taken when the population of interest is subdivided into k different groups, or k strata. Once this is done, a simple random sample (SRS) is taken from each stratum. The simple random samples taken from the strata are then put together and constitute the overall random sample of size n.
How is a population stratified?
16
The population is stratified according to some other population variable:
1. Geographic: stratify according to some ‘location’ variable of the underlying population: province, region (West, Central, East, Maritimes), quadrant (NW, NE, SW, SE), rural vs. urban, etc.
2. Non-geographic: stratify according to some ‘non-location’ variable of the population: gender (male, female), income level/tax bracket (lower, middle, upper), age level (18 to under 30, 30 to under 40, 40 to under 55, 55 and up), education level, etc.
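As a rough illustration of the procedure, the sketch below takes an SRS from each stratum and then pools the results. The strata names, their contents, and the per-stratum sample sizes are all made up for the example; only the procedure itself comes from the slides.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample from each stratum.

    strata: dict mapping stratum name -> list of elements
    sizes:  dict mapping stratum name -> number of elements to draw
    """
    rng = random.Random(seed)
    return {name: rng.sample(elements, sizes[name])
            for name, elements in strata.items()}

# Hypothetical population split into k = 2 strata
strata = {
    "rural": [f"rural-{i}" for i in range(1, 41)],   # 40 elements
    "urban": [f"urban-{i}" for i in range(1, 61)],   # 60 elements
}
sizes = {"rural": 4, "urban": 6}

picked = stratified_sample(strata, sizes, seed=1)
# The per-stratum samples are pooled to form the overall sample of size n = 10
combined = [element for group in picked.values() for element in group]
print(combined)
```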
17
Example
Consider the population of Canadian voters, where the variable of interest is “whether or not a politician can be trusted”. To measure this, suppose a random sample of 1000 Canadian voters is to be selected using stratified sampling, and the population will be stratified into 4 strata (k = 4).
18
Stratifying according to region of the country, we have:

Stratum #1    Stratum #2    Stratum #3    Stratum #4
The West      Ontario       Quebec        Atlantic
n_W = 250     n_O = 250     n_Q = 250     n_M = 250
p̂ = 58%       p̂ = 52%       p̂ = 61%       p̂ = 68%

Since the samples are “equally weighted”, the sample proportion is simply the average of the individual sample proportions:
p̂ = (0.58 + 0.52 + 0.61 + 0.68) / 4 = 0.5975
19
Proportionally Stratified Sampling

Stratum #1    Stratum #2    Stratum #3    Stratum #4
West          Ontario       Quebec        Atlantic
30.3 %        37.9 %        24.0 %        7.8 %
n_W = 303     n_O = 379     n_Q = 240     n_M = 78
p̂ = 58%       p̂ = 52%       p̂ = 61%       p̂ = 68%
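The stratum sample sizes follow from applying each percentage to the total sample of 1000. A minimal sketch of that allocation, assuming the percentages are each region's share of the voter population (the variable names are illustrative):

```python
# Share of the population in each stratum, taken from the table above
shares = {"West": 0.303, "Ontario": 0.379, "Quebec": 0.240, "Atlantic": 0.078}
n = 1000  # total sample size

# Proportional allocation: each stratum gets its share of the total sample
allocation = {region: round(n * share) for region, share in shares.items()}
print(allocation)
# {'West': 303, 'Ontario': 379, 'Quebec': 240, 'Atlantic': 78}
```

With other shares or sample sizes, rounding can leave the stratum sizes summing to slightly more or less than n, so the total is worth checking.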
20
Stratum #1    Stratum #2    Stratum #3    Stratum #4
West          Ontario       Quebec        Atlantic
30.3 %        37.9 %        24.0 %        7.8 %
n_W = 303     n_O = 379     n_Q = 240     n_M = 78
p̂ = 58%       p̂ = 52%       p̂ = 61%       p̂ = 68%

Because the stratified sample has been conducted proportionally, we “weight” the individual percentages and calculate the weighted average:
p̂ = (303/1000)(0.58) + (379/1000)(0.52) + (240/1000)(0.61) + (78/1000)(0.68) = 0.5723
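The weighted average can be checked with a few lines of code. This sketch uses only the stratum sizes and proportions from the slides; the variable names are illustrative. The equally weighted average from the earlier version of the example is computed alongside for comparison.

```python
# Stratum sample sizes and sample proportions from the slides
sizes = {"West": 303, "Ontario": 379, "Quebec": 240, "Atlantic": 78}
props = {"West": 0.58, "Ontario": 0.52, "Quebec": 0.61, "Atlantic": 0.68}

n = sum(sizes.values())  # 1000

# Weighted average: each stratum proportion is weighted by n_stratum / n
weighted = sum(sizes[r] * props[r] for r in sizes) / n

# Equally weighted average, appropriate when every stratum has the same sample size
unweighted = sum(props.values()) / len(props)

print(round(weighted, 4))    # 0.5723
print(round(unweighted, 4))  # 0.5975
```

The weighted figure is lower because Ontario, the largest stratum, has the lowest proportion, while Atlantic Canada, the smallest stratum, has the highest.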
21
1. Approximately 58% of voters in the West and Territories believe politicians cannot be trusted.
2. Approximately 52% of voters in Ontario believe politicians cannot be trusted.
3. Approximately 61% of voters in Quebec believe politicians cannot be trusted.
4. Approximately 68% of voters in Atlantic Canada believe politicians cannot be trusted.
22
3. Cluster Sampling
Often one does not have the luxury of a large budget or time frame to complete a study on a large population. In such cases one can sample from the population using a method that resembles stratified sampling but is much easier to carry out. For example, if one wishes to study the annual income of households in Calgary, it would clearly be difficult to obtain a complete list of all the households in Calgary. Instead, identify “clusters” of the population: the non-overlapping groups that elements of the population naturally fall within.
23
Once this is done, the researcher can either:
1. Randomly select, using SRS, one cluster (or many clusters) and then inspect every element falling within the randomly selected cluster(s).
2. Randomly select, using SRS, one cluster (or many clusters) and then take a simple random sample of elements from the randomly selected cluster(s).
Option #1 is deemed a single-stage cluster sample. Option #2 is called a double-stage (or multi-stage) cluster sample.
24
The distinction is the following: In a single-stage cluster sample, random selection occurs only once: the cluster (or clusters) are RANDOMLY selected. In a double-stage (or multi-stage) cluster sample, random selection occurs twice (or more than twice): first the clusters are randomly selected, and then elements are randomly selected from within them.
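The difference between the two options can be sketched in code. The clusters below (five hypothetical communities of 20 households each) and the function names are invented for the illustration; only the one-stage versus two-stage logic comes from the slides.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select clusters, then inspect every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # the only random step
    return [element for name in chosen for element in clusters[name]]

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly select clusters, then take an SRS inside each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # first random step
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))  # second random step
    return sample

# Hypothetical clusters: 5 communities with 20 households each
clusters = {
    f"Community {c}": [f"{c}-household-{i}" for i in range(1, 21)]
    for c in "ABCDE"
}

single = single_stage_cluster_sample(clusters, 2, seed=7)
double = two_stage_cluster_sample(clusters, 2, 5, seed=7)
print(len(single), len(double))  # 40 10
```

Note that neither function needs a list of every household in the population up front; a list of clusters is enough to start, which is the practical appeal of the method.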
25
The sampling error is simply the difference between the sample and the population. There will be a difference, albeit slight, between what is happening in the sample and what is really happening in the population. Both the sample mean (x̄) and the sample proportion (p̂) have a sampling error. The sampling error of p̂ is approximated by the following:
Error = [formula not preserved in this transcript]
Clearly, the larger the sample size, the smaller the sampling error.
26
A random sample of 500 Calgarians indicated that 45% of Calgarians think that Highway #2 between Calgary and Edmonton should be a toll highway. In this example the population of interest consists of Calgarians. The variable is an opinion about whether Highway #2 between Calgary and Edmonton should be a toll highway, a qualitative variable that can be measured using a nominal scale (why?). The sample size, or n, is 500. The sample proportion is p̂ = 0.45. What is the error of this sample?
27
Error = [value not preserved in this transcript]
We get an interval estimate: 0.45 ± Error. That is, from 0.45 - Error to 0.45 + Error.
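The formula used on the original slide is not available here, so the following is only a sketch under a stated assumption: it uses the common conservative rule of thumb Error ≈ 1/√n for a sample proportion (roughly a 95% margin of error). The function name `conservative_margin` is hypothetical, and the course's own formula may differ.

```python
import math

def conservative_margin(n):
    """ASSUMED approximation: margin of error for a proportion is about 1/sqrt(n)."""
    return 1 / math.sqrt(n)

n = 500        # sample size from the example
p_hat = 0.45   # sample proportion from the example

error = conservative_margin(n)
low, high = p_hat - error, p_hat + error
print(f"error = {error:.4f}, interval = ({low:.4f}, {high:.4f})")
```

Under this assumption the error is about 0.045, giving an interval from roughly 0.405 to 0.495. Whatever formula is used, the same pattern holds: quadrupling the sample size to 2000 would halve this error.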
28
There are other errors in sampling that can occur, which are often called “biases”. Preventative measures can be taken to reduce, and in some cases eliminate, such sampling bias. These biases, or errors, can be classified into:
1. Errors of Non-Inclusion: errors due to a member of the population having no “chance” of appearing in the sample. (Nonrandom sampling)
2. Errors of Non-Observation: errors that arise due to problems with sampling. (Non-response)
3. Errors of Observation: errors related to the collecting of the data. Such errors can occur even when the sample is selected using random methods. (Incorrect answer, or measurement bias)
29
Summarizing the Data
Four types of graphical methods will be discussed. These four methods are used for displaying data on a population variable that is quantitative. They are:
1. Dotplot.
2. Stem-and-Leaf plot (or stemplot).
3. Histogram.
4. Boxplot.