Download presentation
Presentation is loading. Please wait.
Published byEmery Wheeler Modified over 9 years ago
1
SAMPLING DESIGN AND PROCEDURES
2
Sampling Terminology Sample A subset, or some part, of a larger population. Population (universe) Any complete group of entities that share some common set of characteristics. Population Element An individual member of a population. Census An investigation of all the individual elements that make up a population.
3
Sample Survey A survey which is carried out using a sampling method, i.e., in which a portion only, and not the whole population, is surveyed. One of the units into which an aggregate is divided for the purposes of sampling, each unit being regarded as individual and indivisible when the selection is made. The definition of unit may be made on some natural basis, for example, households, persons etc.
4
PARAMETER & STATISTIC PARAMETER(S): A characteristic of a population STATISTIC(S):A characteristic of a sample (estimation of a parameter from a statistic is the prime objective of sampling analysis).
5
A list, map or other specification of the units which constitute the available information relating to the population designated for a particular sampling scheme. There is corresponding to each state of sampling in a multi-stage sampling scheme. The frame may or may not contain information about the size or other supplementary information of the units, but it should have enough details so that a unit, if included in the sample, may be located and taken up for inquiry.
6
that part of the difference between a population value and an estimate thereof, derived from a random sample, which is due to the fact that only a sample of values is observed; as distinct from errors due to imperfect selection, bias in response or estimation, errors of observation and recording, etc. the totality of sampling errors in all possible samples of the same size generates the sampling distribution of the statistic which is being used to estimate the parent value.
7
Why Sample? Budget and time constraints. Limited access to total population. Accurate and Reliable Results Destruction of Test Units Sampling reduces the costs of research in finite populations.
8
Sample Vs. Census
9
Sampling Techniques Nonprobability Sampling Techniques Probability Sampling Techniques Convenience Sampling Judgmental Sampling Quota Sampling Snowball Sampling Systematic Sampling Stratified Sampling Cluster Sampling Other Sampling Techniques Simple Random Sampling
10
Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time. use of students, and members of social organizations mall intercept interviews without qualifying the respondents department stores using charge account lists “people on the street” interviews
11
ABCDE 1611 16 21 2712 17 22 3813 18 23 4914 19 24 51015 20 25 Group D happens to assemble at a convenient time and place. So all the elements in this Group are selected. The resulting sample consists of elements 16, 17, 18, 19 and 20. Note, no elements are selected from group A, B, C and E.
12
Judgmental sampling is a form of convenience sampling in which the population elements are selected based on the judgment of the researcher. test markets purchase engineers selected in industrial marketing research bellwether precincts selected in voting behavior research expert witnesses used in court
13
ABCDE 16 11 1621 27121722 3 813 1823 491419 24 5 10 152025 The researcher considers groups B, C and E to be typical and convenient. Within each of these groups one or two elements are selected based on typicality and convenience. The resulting sample consists of elements 8, 10, 11, 13, and 24. Note, no elements are selected from groups A and D.
14
Quota sampling may be viewed as two-stage restricted judgmental sampling. The first stage consists of developing control categories, or quotas, of population elements. In the second stage, sample elements are selected based on convenience or judgment. PopulationSample compositioncomposition Control CharacteristicPercentagePercentageNumber Gender Male4848480 Female5252520 ____________ 1001001000
15
ABCDE 1 6 111621 271217 22 3 8 13 1823 49141924 51015 20 25 A quota of one element from each group, A to E, is imposed. Within each group, one element is selected based on judgment or convenience. The resulting sample consists of elements 3, 6, 13, 20 and 22. Note, one element is selected from each column or group.
16
o Each element in the population has a known and equal probability of selection. o Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected. o This implies that every element is selected independently of every other element.
17
ABCDE 1611 16 21 2 7 121722 3 8131823 4 9 1419 24 510152025 Select five random numbers from 1 to 25. The resulting sample consists of population elements 3, 7, 9, 16, and 24. Note, there is no element from Group C.
18
The sample is chosen by selecting a random starting point and then picking every ith element in succession from the sampling frame. The sampling interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer. When the ordering of the elements is related to the characteristic of interest, systematic sampling increases the representativeness of the sample.
19
If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample. For example, there are 100,000 elements in the population and a sample of 1,000 is desired. In this case the sampling interval, i, is 100. A random number between 1 and 100 is selected. If, for example, this number is 23, the sample consists of elements 23, 123, 223, 323, 423, 523, and so on.
20
ABCDE 16111621 27121722 38131823 49141924 510152025 Select a random number between 1 to 5, say 2. The resulting sample consists of population 2, (2+5=) 7, (2+5x2=) 12, (2+5x3=)17, and (2+5x4=) 22. Note, all the elements are selected from a single row.
21
A two-step process in which the population is partitioned into subpopulations, or strata. The strata should be mutually exclusive and collectively exhaustive in that every population element should be assigned to one and only one stratum and no population elements should be omitted. Next, elements are selected from each stratum by a random procedure, usually SRS. A major objective of stratified sampling is to increase precision without increasing cost.
22
The elements within a stratum should be as homogeneous as possible, but the elements in different strata should be as heterogeneous as possible. The stratification variables should also be closely related to the characteristic of interest. Finally, the variables should decrease the cost of the stratification process by being easy to measure and apply.
23
ABCDE 161116 21 2 7 121722 38 13 1823 4 914 19 24 510152025 Randomly select a number from 1 to 5 for each stratum, A to E. The resulting sample consists of population elements 4, 7, 13, 19 and 21. Note, one element is selected from each column.
25
HYPOTHESIS …??? is formally stated expectation about how a behavior operates. … is a proposition that a researcher wants to verify. A hypothesis is an assumption about the population parameter.
26
Formulate a Null Hypothesis (H 0 ). Formulate an Alternative Hypothesis (H 1 ) Select a suitable Test Statistic Specify a Level of Significance ( ) Define a suitable Decision Criterion based on and Test Statistic Make necessary Assumptions if required Experiment and Calculation of Test Statistic Conclusion or Decision
27
As the sample size gets large enough…the sampling distribution becomes almost normal regardless of shape of population Central Limit Theorem
28
The Null Hypothesis, H 0 Always contains the ‘ = ‘ sign It is a statement about the hypothesized value of population parameter. States the Assumption (numerical) to be tested for possible rejection under the assumption that the null hypothesis is TRUE. The average sale of showroom is at least 3.0 lakh (H 0 : μ≥ 3.0)
29
Is the opposite of the null hypothesis e.g. The average sale of a showroom is less than 3.0 (H 1 : μ < 3.0) Never contains the ‘=‘ sign The Alternative Hypothesis may or may not be accepted Is generally the hypothesis that is believed to be true by the researcher The Alternative Hypothesis, H 1
30
Level of Significance, a Typical values are 0.01, 0.05, 0.10 Defines Unlikely Values of Sample Statistic if Null Hypothesis Is True. If we assume that hypothesis is correct, then the significance level will indicate the percentage of sample statistics is outside certain limits. 0
31
Level of Significance, and the Rejection Region H 0 : 3 H 1 : < 3 0 0 0 H 0 : 3 H 1 : > 3 H 0 : 3 H 1 : 3 /2 Critical Value(s) Rejection Regions
32
One-Tailed Hypothesis Test The term one-tailed signifies that all values that would cause to reject H 0, are in just one tail of the sampling distribution Two-Tailed Hypothesis Test Two-tailed test is one in which values of the test statistic leading to rejectioin of the null hypothesis fall in both tails of the sampling distribution curve
33
Summary of Errors Involved in Hypothesis Testing
34
Reduce probability of one error and the other one goes up. & Have an Inverse Relationship
35
How to choose between Type I and Type II errors Reworking cost is low----Type I error Reworking cost is high---Type II
36
TOSH of means when the population Standard deviation is known Zcalc = (X - 0 )/( / n) H 0 : = 0 vs. H A : ≠ > < 0 0
37
Example Bajaj Company claims that the length of life of its electric bulb is 1000 hours with standard deviation of 30 hours. A random sample of 25 checked an average life of 960 hours. At 5 % level of significance can we conclude that the sample has come from a population with mean life of 1000 hours? Table value = 1.96
38
t –test, Standard deviation is unknown and small sample H 0 : = 0 vs. H A : > < 0 Testing a Hypothesis About a Mean; We Do Not Know Which Must be Estimated by S.. Calculate t calc = (X - 0 )/(s/ n )
39
Example The weight of a canned food product is specified as 500 grm. For a sample of 8 cans the weight were observed as 480, 475, 510, 500, 505, 495, 504 and 515 grm. Test at 5% level of significance, whether on an average the weight is as per specification. Table value = 2.365
40
Two independent samples were collected. For the first sample of 42 items, the mean was 32.3 and the variance 9. The second sample of 57 items had a mean of 34 and a variance of 16. Using 0.05level of significance, test whether there is sufficient evidence to show the second population has a larger mean.
41
H 0 : = vs. H A : ≠ > < n1 = ______, n2=______ = _______ Testing a Hypothesis About two Mean; Process Performance Measure is Approximately Normally Distributed; We “Know” Therefore this is a “Z-test” - Use the Normal Distribution. Calculate test statistic (x 1 - x 2 ) - ( 1 - 2 ) Zcalc = ------------------------------- 1 2 /n 1 + 2 2 /n 2 DR: (≠ in H A ) Reject H 0 in favor of H A if Z calc +Z /2. Otherwise, FTR H 0. DR: (> in H A ) Reject H 0 in favor of H A iff Z calc > +Z . Otherwise, FTR H 0. DR: (< in H A ) Reject H 0 in favor of H A iff Z calc < -Z . Otherwise, FTR H 0.
42
Z-test to test two population mean( )When population standard deviation is unknown & n is large H 0 : = vs. H A : ≠ > < n1 = ______, n2=______ = _______ Testing a Hypothesis About two Mean; Process Performance Measure is Approximately Normally Distributed; We “Know” S S Therefore this is a “Z-test” - Use the Normal Distribution. Calculate test statistic (x 1 - x 2 ) - ( 1 - 2 ) Zcalc = ------------------------------- S 2 /n 1 + S 2 /n 2
43
H 0 : = 2 vs. H A : > < 2 n = _______ = _______ Testing a Hypothesis About a Mean; Process Performance Measure is Approximately Normally Distributed or We Have a “small” Samples; We Do Not Know Which Must be Estimated by S. Therefore this is a “t-test” - Use Student’s T Distribution. Calculate (x 1 - x 2 ) - ( 1 - 2 ) t = ------------------------- s * ( 1/n 1 + 1/n 2 ) with d.f. = n 1 + n 2 - 2. In this expression, s * is the pooled standard deviation, given by s 2 = [ (n 1 – 1)s 1 2 + (n 2 – 1)s 2 2 ] / (n1+n2-2) = --------------------------------- n 1 + n 2 - 2 t-test,To test two population mean
44
Paired Samples The difference in these cases is examined by a paired samples t test. To compute t for paired samples, the paired difference variable, denoted by D, is formed and its mean and variance calculated. Then the t statistic is computed. The degrees of freedom are n - 1, where n is the number of pairs. The relevant formulas are:
45
The difference in these cases is examined by a paired samples t test. To compute t for paired samples, the paired difference variable, denoted by D, is formed and its mean and variance calculated. Then the t statistic is computed. The degrees of freedom are n - 1, where n is the number of pairs. The relevant formulas are:
46
Cross-Tabulations: Chi-square Test Technique used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables
47
Telecommunications Company Marketing manager of a telecommunications company is reviewing the results of a study of potential users of a new cell phone Random sample of 200 respondents A cross-tabulation of data on whether target consumers would buy the phone (Yes or No) and whether the cell phone had Bluetooth wireless technology (Yes or No) Question Can the marketing manager infer that an association exists between Bluetooth technology and buying the cell phone?
48
Two-Way Tabulation of Bluetooth Technology and Whether Customers Would Buy Cell Phone
49
Cross Tabulations -Hypotheses H 0 : There is no association between wireless technology and buying the cell phone (the two variables are independent of each other). H a : There is some association between the Bluetooth feature and buying the cell phone (the two variables are not independent of each other).
50
Conducting the Test Test involves comparing the actual, or observed, cell frequencies in the cross- tabulation with a corresponding set of expected cell frequencies (E ij ) Expected Values n i n j E ij = ----- n Where n i and n j are the marginal frequencies, that is, the total number of sample units in category i of the row variable and category j of the column variable, respectively
51
Computing Expected Values The expected frequency for the first-row, first- column cell is given by 100 100 E 11 =------------ = 50 200
52
Observed and Expected Cell Frequencies
53
Chi-square Test Statistic = 72.00 Where r and c are the number of rows and columns, respectively, in the contingency table. The number of degrees of freedom associated with this chi ‑ square statistic are given by the product (r - 1)(c - 1).
54
Chi-square Test Statistic in a Contingency Test For d.f. = 1, Assuming =.05, from Appendix 2, the critical chi ‑ square value ( 2 c ) = 3.84. Decision rule is: “Reject H 0 if 2 3.84.” Computed 2 = 72.00 Since the computed Chi-square value is greater than the critical value of 3.84, reject H 0. The apparent relationship between “Bluetooth technology"and "would buy the cellular phone" revealed by the sample data is unlikely to have occurred because of chance
55
EXAMPLE In a management institute, the A+, A and B grades allocated to students in there final examination, were as follows. Using 5% level of significance, determine whether the grading scale is independent of the specialization. Table value = 9.488 Specialization GradeFinanceMarketingOperations A+20 15 05 A25 20 15 B 10 08 07
56
Univariate Hypothesis: Papa John’s restaurants are more likely to be located in a stand-alone location or in a shopping center. Bivariate Hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.