Introduction to Marketing Research CHAPTERS 13 : SAMPLE SIZE SELECTION AND BASIC MEASURES OF CENTRAL TENDENCY Idil Yaveroglu Lecture Notes
What do Statistics Mean?
- Descriptive statistics: number of people, trends in employment, data
- Inferential statistics: make an inference about a population from a sample
Determining Sample Size
Measures of Central Tendency
- Mean: arithmetic average
- Median: midpoint of the sorted distribution
- Mode: the value that occurs most often
Measures of Central Tendency

Salesperson    Number of sales calls
Mike           4
Patty          3
Billie         2
Bob            5
John           3
Frank          3
Chuck          1
Samantha       5
Total          26
In the previous example, the measures of central tendency are calculated as follows:
- Sample mean: \(\bar{X} = \frac{\sum X_i}{n}\), where \(\sum X_i = X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 = 4 + 3 + 2 + 5 + 3 + 3 + 1 + 5 = 26\), so \(\bar{X} = \frac{26}{8} = 3.25\)
- Sample median = 3
- Mode = 3
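The three measures can be checked with a minimal Python sketch. It uses only the standard-library `statistics` module and the sales-call figures from the table in these notes; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean, median, mode

# Sales calls per salesperson, copied from the table in these notes.
sales_calls = {
    "Mike": 4, "Patty": 3, "Billie": 2, "Bob": 5,
    "John": 3, "Frank": 3, "Chuck": 1, "Samantha": 5,
}
values = list(sales_calls.values())

sample_mean = mean(values)      # sum of the values divided by n
sample_median = median(values)  # midpoint of the sorted values
sample_mode = mode(values)      # value that occurs most often

print(sample_mean, sample_median, sample_mode)
```

With an even number of observations, `median` averages the two middle values of the sorted list, which here are both 3.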
In the previous example, the measures of dispersion are calculated as follows:
- Range = largest value - smallest value = 5 - 1 = 4
- Interquartile range = range of the middle 50 percent of observations = 4.5 - 2.5 = 2
- Deviation scores:
  - Variance: \(S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = 1.9286\)
  - Standard deviation: \(S = \sqrt{S^2} = \sqrt{1.9286} = 1.3887\)
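The dispersion figures can be reproduced with a short sketch along the following lines. One assumption is labeled in the code: the quartiles are taken as the medians of the lower and upper halves of the sorted data, which matches the 2.5 and 4.5 used in these notes. Other quartile conventions give slightly different values.

```python
from statistics import median, stdev, variance

values = [4, 3, 2, 5, 3, 3, 1, 5]  # sales calls from the example
ordered = sorted(values)

data_range = max(values) - min(values)

# Assumption: quartiles are the medians of the lower and upper halves.
half = len(ordered) // 2
q1 = median(ordered[:half])
q3 = median(ordered[-half:])
iqr = q3 - q1

s2 = variance(values)  # sample variance, divides by n - 1
s = stdev(values)      # square root of the sample variance

print(data_range, q1, q3, iqr, round(s2, 4), round(s, 4))
```

`variance` and `stdev` use the n - 1 denominator, the same one that appears in the formula for S squared.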
Calculating the Standard Deviation and Variance
[Figure: Low dispersion. Frequency (vertical axis, 1 to 5) plotted against value on variable (horizontal axis).]
[Figure: High dispersion. Frequency (vertical axis, 1 to 5) plotted against value on variable (horizontal axis).]
The Normal Distribution
- Normal curve
- Bell-shaped
- Almost all of its values are within plus or minus 3 standard deviations of the mean
- IQ is an example
Standardized Normal Distribution
- Symmetrical about its mean
- Mean identifies the highest point
- Infinite number of cases: a continuous distribution
- Total area under the curve (total probability) = 1.0
- Mean of zero, standard deviation of 1
[Figure: Standardized normal distribution. Pr(Z) plotted against Z, with Z running from -3 to 3.]
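An Example of the Distribution of Intelligence Quotient (IQ) Scores

IQ range                                              Share of scores
Just below 70 (2 to 3 standard deviations below mean)  2.14%
70 to 85                                               13.59%
85 to 100                                              34.13%
100 to 115                                             34.13%
115 to 130                                             13.59%
Just above 130 (2 to 3 standard deviations above mean) 2.14%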
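The percentages in the IQ example can be recomputed from the normal curve. The sketch below assumes a mean of 100 and a standard deviation of 15, read off the 85, 100 and 115 axis labels rather than stated in the notes. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(mean=100, sd=15), inferred from the axis labels.
iq = NormalDist(mu=100, sigma=15)

def area_between(low, high):
    """Share of the distribution that lies between low and high."""
    return iq.cdf(high) - iq.cdf(low)

bands = [(55, 70), (70, 85), (85, 100), (100, 115), (115, 130), (130, 145)]
for low, high in bands:
    print(f"{low} to {high}: {area_between(low, high):.2%}")
```

The 55 to 70 and 130 to 145 bands are the regions between 2 and 3 standard deviations from the mean, which is where the 2.14% figure comes from.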
Definitions and Symbols
- Population distribution: the frequency distribution of the elements of the population.
  - Parameter: a summary description of a fixed characteristic or measure of the target population. A parameter denotes the true value that would be obtained if a census rather than a sample were undertaken.
- Sample distribution: a frequency distribution of the sample.
  - Statistic: a summary description of a characteristic or measure of the sample. The sample statistic is used as an estimate of the population parameter.
  - Random sampling error: the error that arises when the sample selected is an imperfect representation of the population of interest.
- Sampling distribution: a theoretical probability distribution of sample means for all possible samples of a certain size drawn from a particular population.
  - Standard error of the mean: the standard deviation of the sampling distribution.
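Symbols for Population and Sample Variables

Variable                            Population                   Sample
Mean                                \(\mu\)                      \(\bar{X}\)
Proportion                          \(\pi\)                      \(p\)
Variance                            \(\sigma^2\)                 \(s^2\)
Standard deviation                  \(\sigma\)                   \(s\)
Size                                \(N\)                        \(n\)
Standard error of the mean          \(\sigma_{\bar{X}}\)         \(s_{\bar{X}}\)
Standard error of the proportion    \(\sigma_p\)                 \(s_p\)
Standardized variate (z)            \(\frac{X - \mu}{\sigma}\)   \(\frac{X - \bar{X}}{s}\)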
Central Limit Theorem
The central limit theorem states that as the sample size n increases, the distribution of the means of randomly selected samples of size n approaches a normal distribution.
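A small simulation can illustrate the theorem. The sketch below is only an illustration under stated assumptions: the population is exponential (strongly right-skewed), 5,000 samples are drawn for each sample size, and skewness is used as a rough indicator of how far the distribution of sample means is from the symmetric normal shape. None of these choices come from the lecture.

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Third standardized moment: close to 0 for a symmetric distribution."""
    m = mean(xs)
    sd = pstdev(xs)
    return mean(((x - m) / sd) ** 3 for x in xs)

def sample_means(n, draws=5000, seed=0):
    """Means of `draws` random samples of size n from an exponential population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(draws)]

skew_by_n = {n: skewness(sample_means(n)) for n in (2, 10, 50)}
for n, value in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {value:.2f}")
```

The skewness of the sample means should shrink toward zero as n grows, even though the population itself is far from normal.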
Definitions and Symbols
- Precision level: when estimating a population parameter by using a sample statistic, the precision level is the desired size of the estimating interval. This is the maximum permissible difference between the sample statistic and the population parameter.
- Confidence interval: the range into which the true population parameter will fall, assuming a given level of confidence.
- Confidence level: the probability that a confidence interval will include the population parameter.
The Confidence Interval Approach
95% Confidence Interval
Area between the mean and z on each side for 95% confidence = 0.5 - (0.05/2) = 0.475, which corresponds to z = 1.96.
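The z value for a given confidence level can also be looked up numerically instead of from a table. The following sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_confidence(level):
    """Two-sided z value for a confidence level such as 0.95."""
    tail = (1 - level) / 2  # area left in each tail
    return standard_normal.inv_cdf(1 - tail)

# Area between the mean and z = 1.96 on one side of the curve.
area_one_side = standard_normal.cdf(1.96) - 0.5

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
print(f"area between 0 and 1.96 = {area_one_side:.3f}")
```

For 99% the computed value rounds to 2.576; tables that interpolate between entries often quote 2.575 instead.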
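The Confidence Interval Approach (Cont.)
The confidence interval is given by \(\bar{X} \pm z\sigma_{\bar{X}}\), where \(\sigma_{\bar{X}} = \sigma / \sqrt{n}\) is the standard error of the mean.
We can now set a 95% confidence interval around the sample mean of $182. Assume that the sample size is 300 and σ = 55, so that \(\sigma_{\bar{X}} = 55 / \sqrt{300} = 3.18\). The 95% confidence interval is given by
\(\bar{X} \pm 1.96\sigma_{\bar{X}} = 182.00 \pm 1.96(3.18) = 182.00 \pm 6.23\)
Thus the 95% confidence interval ranges from $175.77 to $188.23.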
Confidence Interval
Suppose you plan to open a sporting goods store to cater to working women who golf. In a survey of 100 women you find that:
Construct a 95% confidence interval. (The z-value for 95% is 1.96.)
Example: Determining CI for Population Mean
We are interested in the mean starting income of Koc University MBAs, so we obtain a simple random sample (SRS) of 100 MBAs. The mean starting income in our sample is TL115,000, with a standard deviation (s.d.) of TL20,000. Construct a 95% confidence interval (CI) for the true population mean starting income of all Koc MBAs.
Answer: \(\bar{X} = 115{,}000\), \(s = 20{,}000\), \(n = 100\), \(z = 1.96\)
\(\bar{X} \pm z \frac{s}{\sqrt{n}} = 115{,}000 \pm 1.96 \times \frac{20{,}000}{\sqrt{100}} = 115{,}000 \pm 3{,}920\), that is, (TL111,080, TL118,920)
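Both mean examples above (the $182 sample mean and the Koc MBA starting income) follow the same arithmetic, which can be sketched in Python as below. The function name `mean_confidence_interval` is an illustrative choice, not something defined in the lecture.

```python
from math import sqrt

def mean_confidence_interval(sample_mean, sd, n, z=1.96):
    """Return (lower, upper) for sample_mean +/- z * sd / sqrt(n)."""
    standard_error = sd / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Example with sample mean 182, sigma = 55, n = 300.
spend_low, spend_high = mean_confidence_interval(182.00, 55, 300)

# Koc MBA example: mean 115,000, s.d. 20,000, n = 100.
mba_low, mba_high = mean_confidence_interval(115_000, 20_000, 100)

print(f"{spend_low:.2f} to {spend_high:.2f}")
print(f"{mba_low:,.0f} to {mba_high:,.0f}")
```

The first interval can differ from the figures in the notes by about a cent, because the notes round the standard error to 3.18 before multiplying by 1.96.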
Confidence Interval for Proportion p
A proportion is a mean where each observation is a 0 or 1, so we can compute a CI for a population proportion in the same way.
We can construct a 100(1 - q)% CI for the population proportion p by computing
\(\hat{p} \pm z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\)
where
- \(\hat{p}\) = the sample proportion
- n = the sample size
- z = 1.645, 1.96, 2.575 for a 90%, 95% and 99% confidence interval
Example: Determining CI for Proportion
We are interested in the proportion of Koc students who have tried a new soup. We obtain an SRS of 400 students and ask them whether they have tried the new soup. Suppose 20% of the 400 students say they have tried it. Construct a 90% confidence interval (CI) for the true population proportion (market share).
Answer: \(0.20 \pm 1.645 \sqrt{\frac{0.20 \times 0.80}{400}} = 0.20 \pm 1.645(0.02) = 0.20 \pm 0.0329\), that is, (0.1671, 0.2329)
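The soup example can be checked with a few lines of Python. This is a minimal sketch of the usual normal-approximation interval for a proportion; the function name is an illustrative choice.

```python
from math import sqrt

def proportion_confidence_interval(p_hat, n, z):
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# 20% of 400 students tried the soup; z = 1.645 for 90% confidence.
soup_low, soup_high = proportion_confidence_interval(0.20, 400, 1.645)
print(f"({soup_low:.4f}, {soup_high:.4f})")
```

Passing z = 1.96 instead of 1.645 would widen the interval to the 95% level with no other change.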
Sample Size
Three factors are required to determine sample size:
- Variance (standard deviation)
- Magnitude of error
- Confidence level
The Confidence Interval Approach and Determining Sample Size Means Proportions
The Size of the Sample (using Mean)
What sample size do I need so that, if I construct a 100(1 - q)% confidence interval, it will be an interval with width D, that is, \(\bar{X} \pm D\)?
Compute Size of the Sample
1. Determine q, the confidence level you want, e.g., z = 1.645 for a 90% CI.
2. Determine the interval width D you want.
3. Compute an estimate of the standard deviation of the population of interest, denoted by s.
4. Compute the necessary sample size: \(n = \frac{z^2 s^2}{D^2}\)
Where do you get ‘s’ from?
- Historical data
- Predicted worst case scenario (largest possible)
- Pre-tests / previous surveys
Ex.: Size of the sample (using Mean)
Suppose you want a 95% CI for mean MBA starting salaries such that the interval will have a width of $5000 (the sample mean plus or minus $5000). Using the results of the previous example, where s = $20,000, how many MBAs do I need in my SRS to obtain this desired width?
Answer: \(n = \left(\frac{1.96 \times 20{,}000}{5{,}000}\right)^2 \approx 61.5\), so a sample of 62 MBAs is needed.
Ideally, you need to do this for every question in your survey and then take the maximum.
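The calculation, together with the advice to take the maximum across survey questions, can be sketched as follows. The starting-salary figures come from the example; the second question (`signing_bonus`) and its numbers are purely hypothetical and are there only to show the "take the maximum" step. The answer of 61.5 corresponds to treating $5000 as the distance from the sample mean to each end of the interval, and the code makes the same assumption.

```python
from math import ceil

def sample_size_for_mean(s, d, z=1.96):
    """Unrounded n such that z * s / sqrt(n) equals the margin d."""
    return (z * s / d) ** 2

# From the example: s = 20,000 and a margin of 5,000 at 95% confidence.
raw_n = sample_size_for_mean(20_000, 5_000)
needed = ceil(raw_n)  # always round up to a whole respondent

# Hypothetical survey: (s, d) per question; only starting_salary is from the notes.
questions = {
    "starting_salary": (20_000, 5_000),
    "signing_bonus": (6_000, 1_000),  # made-up values for illustration
}
survey_n = max(ceil(sample_size_for_mean(s, d)) for s, d in questions.values())

print(round(raw_n, 1), needed, survey_n)
```

Because n grows with the square of s / d, halving the margin roughly quadruples the required sample.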
Sample Size Determination for Means and Proportions
1. Specify the level of precision, D.
2. Specify the confidence level (CL). Means and proportions: CL = 95%.
3. Determine the z value associated with the CL. Means and proportions: z = 1.96.
4. Determine the standard deviation of the population.
   - Means: estimate σ (σ = 55)
   - Proportions: estimate π (π = 0.64)
5. Determine the sample size using the formula for the standard error.
   - Means: \(n = \frac{\sigma^2 z^2}{D^2}\)
   - Proportions: \(n = \frac{\pi(1 - \pi) z^2}{D^2}\)
6. If necessary, reestimate the confidence interval by employing s to estimate σ.
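The steps can be put together in a short sketch. The values σ = 55, π = 0.64 and z = 1.96 come from the notes. The precision levels did not survive in these notes, so the D values used below (5.00 for the mean, 0.05 for the proportion) are assumptions chosen only to make the example run.

```python
from math import ceil

Z_95 = 1.96  # z value for a 95% confidence level

def n_for_mean(sigma, d, z=Z_95):
    """Sample size for estimating a mean to within +/- d."""
    return ceil((sigma * z / d) ** 2)

def n_for_proportion(pi, d, z=Z_95):
    """Sample size for estimating a proportion to within +/- d."""
    return ceil(pi * (1 - pi) * z ** 2 / d ** 2)

# Assumed precision levels (not given in the notes): D = 5.00 and D = 0.05.
n_mean = n_for_mean(sigma=55, d=5.00)
n_prop = n_for_proportion(pi=0.64, d=0.05)
print(n_mean, n_prop)
```

When no estimate of π is available, π = 0.5 gives the largest value of π(1 - π) and therefore the most conservative sample size.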
A sample size of 400 is enough to represent China’s more than 1.3 billion people or the more than 300 million American people. The sample size is independent of the population size for large populations.
Adjusting the Statistically Determined Sample Size
Incidence rate refers to the rate of occurrence, or the percentage of persons eligible to participate in the study. In general, if there are c qualifying factors with incidences of \(Q_1, Q_2, Q_3, \dots, Q_c\), each expressed as a proportion:
\(\text{Incidence rate} = Q_1 \times Q_2 \times Q_3 \times \dots \times Q_c\)
\(\text{Initial sample size} = \frac{\text{Final sample size}}{\text{Incidence rate} \times \text{Completion rate}}\)
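The adjustment can be sketched in a few lines of Python. All of the numbers below (a final sample of 400, two qualifying factors of 0.75 and 0.80, and a completion rate of 0.50) are hypothetical and chosen only for illustration. `math.prod` requires Python 3.8 or later.

```python
from math import ceil, prod

def initial_sample_size(final_n, qualifying_rates, completion_rate):
    """Number of people to contact so that final_n completed interviews remain."""
    incidence_rate = prod(qualifying_rates)  # Q1 x Q2 x ... x Qc
    return ceil(final_n / (incidence_rate * completion_rate))

# Hypothetical inputs: none of these figures appear in the notes.
contacts = initial_sample_size(
    final_n=400,
    qualifying_rates=[0.75, 0.80],
    completion_rate=0.50,
)
print(contacts)
```

With these inputs the incidence rate is 0.60, so only about 30% of contacts end as completed interviews and the initial sample must be a little over three times the final one.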