Introduction to Marketing Research
CHAPTER 16 DATA ANALYSIS: Frequency Distribution, Hypothesis Testing, Cross-Tabulations
Idil Yaveroglu, Lecture Notes
Frequency Distribution
In a frequency distribution, one variable is considered at a time. A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.
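To make the three columns of a frequency table concrete, here is a minimal Python sketch. The ratings list is made up purely for illustration (it is not the Nike data set), and the function name frequency_table is a hypothetical helper, not an SPSS feature.

```python
from collections import Counter

# Hypothetical 7-point attitude ratings, made up for illustration only.
ratings = [7, 5, 5, 6, 7, 3, 5, 6, 7, 7]


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append(
            (value, counts[value], 100 * counts[value] / n, 100 * cumulative / n)
        )
    return rows


print("value count percent cum_pct")
for value, count, pct, cum_pct in frequency_table(ratings):
    print(f"{value:>5} {count:>5} {pct:>7.1f} {cum_pct:>7.1f}")
```

The cumulative percentage in the last row is always 100, which is a quick sanity check on any frequency table.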
Usage and Attitude Toward Nike Shoes
Next: Hypothesis Testing
- Hypotheses are based on ideas about the research results and objectives
- A hypothesis is an unproven statement that provides an explanation for certain facts or phenomena
- A hypothesis is a possible answer to a marketing research question
- Statistical techniques are used to determine whether proposed hypotheses can be confirmed by the empirical evidence (collected data)
Hypothesis Testing
Examples include:
- Explaining relationships among variables
- Comparing an observed value (e.g., mean or proportion) across multiple groups of respondents
- Comparing an observed value (e.g., mean or proportion) against a given value
- Comparing an observed value (e.g., mean or proportion) at multiple times
- Comparing an observed value (e.g., mean or proportion) across multiple variables
- Predicting the value of one variable with a set of other variables
General Procedure for Hypothesis Testing
1. Formulate the hypotheses
- Null hypothesis: predicts no difference or no effect
- Alternative hypothesis: predicts a difference or an effect
- One-tail or two-tail test? Depends on whether the alternative hypothesis is expressed directionally (<, >) or not (≠)
2. Select an appropriate statistical technique and test statistic
- Test statistics measure how close the observation from the sample has come to the null hypothesis
- Test statistics often follow a well-known distribution such as the normal (z), F, t, or chi-square (χ²)
General Procedure for Hypothesis Testing
3. Choose a level of significance
- Consider Type I error: occurs when the sample results lead to the rejection of the null hypothesis when it is in fact true
- The probability of a Type I error is also called the level of significance (α)
- α is most often set at 0.05; sometimes it is set at 0.10 or 0.01
- Consider Type II error: occurs when the sample results lead to the null hypothesis not being rejected when it is in fact false
4. Collect data and calculate the test statistic
Type I and Type II Errors
General Procedure for Hypothesis Testing
5. Determine the critical value for the test statistic
- The area to the right of the critical value is equal to α for a one-tail test and α/2 for a two-tail test
General Procedure for Hypothesis Testing
6. Compare the critical value to the observed value of the test statistic and decide whether or not to reject the null hypothesis
- Equivalently, compare the probability associated with the observed value of the test statistic to the level of significance (α)
- If that probability is less than α, the null hypothesis is rejected and the alternative hypothesis is accepted
- If that probability is greater than α, the null hypothesis cannot be rejected
General Procedure for Hypothesis Testing
7. Summarize conclusions and key findings based on the results of the hypothesis testing
- Conclusions must be expressed in reference to the specific marketing research question you are attempting to answer
What Hypothesis Tests Will We Learn?
- Test associations between two discrete variables: crosstabs and chi-square test (covered today)
- Test observed values of means or proportions of a single variable against a standard: one-sample t-test and one-sample z-test
- Test observed values of means across two variables: paired-samples t-test
- Test differences across two or more groups of respondents: independent-samples t-test and ANOVA
- Test associations between two scaled variables: correlation
- Test the significance of regression coefficients that correspond to predictors in a regression model: regression
Understand relationships between two discrete variables Cross-Tabulation Analysis “Cross-tabs”
Cross-Tabulations (aka “Cross-tabs”)
- Describe the relationship between two discrete variables
- Discrete = nominal or ordinal scaled
- A contingency table contains a cell for every combination of the categories of the two variables under investigation
- Joint frequency counts and percentages are computed for each cell
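As a rough illustration of how joint frequency counts are built, the sketch below tallies (sex, user group) pairs with Python's standard library. The respondent-level records are an assumption: they are expanded from the cell counts used in the worked Nike example in these notes, and their order is arbitrary.

```python
from collections import Counter

# Respondent-level (sex, user group) pairs, expanded from the cell counts
# used in the worked example of these notes; respondent order is arbitrary.
records = (
    [("female", "light")] * 14 + [("male", "light")] * 5
    + [("female", "medium")] * 5 + [("male", "medium")] * 5
    + [("female", "heavy")] * 5 + [("male", "heavy")] * 11
)

cell_counts = Counter(records)  # joint frequency for each (sex, group) cell
n = len(records)                # total sample size

for group in ("light", "medium", "heavy"):
    for sex in ("female", "male"):
        count = cell_counts[(sex, group)]
        share = 100 * count / n
        print(f"{group:<7} {sex:<7} count={count:>2}  share of sample={share:.1f}%")
```

Each key of cell_counts corresponds to one cell of the contingency table, so a 3 x 2 table yields six keys.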
Cross-Tabulations (aka “Cross-tabs”)
Examples include:
- Is coupon use significantly related to annual household income? If so, what is the nature of this relationship?
  - Coupon use measured dichotomously as coupon user vs. non-user
  - Income measured using a multiple-choice question with income categories as response options
- Is gender significantly related to usage of Nike shoes? If so, what is the nature of this relationship?
  - Gender measured dichotomously as male vs. female
  - Usage of Nike shoes measured as a multiple-choice question with 1 = light user, 2 = medium user, 3 = heavy user
Cross-Tabulation Analysis
- In cross-tabulation analysis, we test whether the relationship between the two variables is significantly greater than chance
- For example, we can test whether there is a statistically significant relationship between “Usage level of Nike shoes” and “Gender”
- The null hypothesis is the hypothesis of no relationship / no differences: No, there is no significant relationship between gender and Nike shoe usage.
- The alternative hypothesis is the statement of a relationship between the variables: Yes, there is a significant relationship between gender and Nike shoe usage; in other words, usage varies by gender.
- A chi-square statistic (χ²) is used to test the significance of an association between the two discrete variables
Steps for Conducting a Cross-tabs Analysis
1. Write out the null and alternative hypotheses
2. Construct the contingency table using SPSS
3. Calculate the chi-square statistic
4. Use the significance level corresponding to the chi-square statistic to determine whether or not there is support for a significant relationship between the two variables
- If the significance level is < .05, you can reject the null hypothesis and conclude that there is a significant relationship
- If the significance level is > .05, you are unable to reject the null hypothesis and cannot conclude that there is a significant relationship
5. Interpret the relationship by looking at the percentages (or counts) reported in the contingency table
SPSS EXAMPLE: Cross-Tabulation Analysis
RQ: Is gender significantly associated with the level of usage of Nike shoes?
H0: Gender is not significantly associated with usage of Nike shoes
H1: Gender is significantly associated with usage of Nike shoes
Variables in the data set:
- Case number (respondent ID)
- User group: 1 = light user, 2 = medium user, 3 = heavy user
- Sex: 1 = female, 2 = male
- Attitude toward Nike shoes: measured using a 7-point semantic differential scale where 1 = very unfavorable and 7 = very favorable
SPSS EXAMPLE: Cross-Tabulation Analysis
1. Select ANALYZE
2. Click on DESCRIPTIVE STATISTICS
3. Select CROSSTABS
4. Choose the variables for the rows and columns
- Select the “dependent variable” or “effect variable” for the row
- Select the “independent variable” or “causal variable” for the column
- In this case, gender would “cause” shoe usage (shoe usage does not cause gender)
- Click on “user group” and move it to the row box
- Click on “sex” and move it to the column box
5. Click on STATISTICS
6. Select “Chi-square”
7. Click on CONTINUE
8. Click on “Cells”
9. Select “observed,” “expected,” and “column percentages”
10. Click on OK
SPSS EXAMPLE: Cross-Tabulation Analysis
Is there a significant difference in shoe usage between genders?
Usage of Nike Shoes by Gender (column percentages)

                Female     Male
Light Users      58.3%    23.8%
Medium Users     20.8%    23.8%
Heavy Users      20.8%    52.4%
Column Total    100.0%   100.0%

Gender by Usage of Nike Shoes (row percentages)

                Female     Male   Row Total
Light Users      73.7%    26.3%    100.0%
Medium Users     50.0%    50.0%    100.0%
Heavy Users      31.2%    68.8%    100.0%
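The two tables answer different questions: column percentages show how usage is distributed within each gender, while row percentages show the gender mix within each usage group. A minimal sketch of both calculations follows, using the observed counts from the worked example; the helper name pct and the dictionary layout are illustrative assumptions.

```python
# Observed counts from the worked example: each row is a user group,
# and the two entries are (female, male).
observed = {
    "Light users": (14, 5),
    "Medium users": (5, 5),
    "Heavy users": (5, 11),
}

# Column totals: number of females and number of males in the sample.
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Column percentages: each cell divided by its column (gender) total.
column_pct = {
    group: tuple(pct(count, col_totals[j]) for j, count in enumerate(row))
    for group, row in observed.items()
}

# Row percentages: each cell divided by its row (user group) total.
row_pct = {
    group: tuple(pct(count, sum(row)) for count in row)
    for group, row in observed.items()
}

for group in observed:
    print(group, "column %:", column_pct[group], "row %:", row_pct[group])
```

Because gender is treated as the independent variable here, the column percentages are the ones to compare across genders.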
Statistics Associated with Cross-Tabulation: Chi-Square (Cont.)
The chi-square statistic (χ²) is used to test the statistical significance of the observed association in a cross-tabulation. The expected frequency for each cell can be calculated by using a simple formula:
$$f_e = \frac{n_r \, n_c}{n}$$
where
$n_r$ = total number in the row
$n_c$ = total number in the column
$n$ = total sample size
Expected Frequency
For the data in Table 16.3, the expected frequencies for the six cells, from left to right and top to bottom, are:
(24 × 19)/45 = 10.1     (21 × 19)/45 = 8.9
(24 × 10)/45 = 5.3      (21 × 10)/45 = 4.7
(24 × 16)/45 = 8.5      (21 × 16)/45 = 7.5
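The same arithmetic can be sketched in a few lines of Python. The function name expected_frequencies is a hypothetical helper; the observed counts are those of the worked example.

```python
# Observed counts: rows are user groups, columns are (female, male).
observed = [
    [14, 5],   # light users
    [5, 5],    # medium users
    [5, 11],   # heavy users
]


def expected_frequencies(table):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The expected counts keep the same row and column totals as the observed table, so they also sum to the sample size of 45.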
$$\chi^2 = \frac{(14 - 10.1)^2}{10.1} + \frac{(5 - 8.9)^2}{8.9} + \frac{(5 - 5.3)^2}{5.3} + \frac{(5 - 4.7)^2}{4.7} + \frac{(5 - 8.5)^2}{8.5} + \frac{(11 - 7.5)^2}{7.5}$$
$$= 1.51 + 1.71 + 0.02 + 0.02 + 1.44 + 1.63 = 6.33$$
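A short sketch of the same sum in Python, assuming the observed counts of the worked example. It uses unrounded expected frequencies, so it gives about 6.341 (the value SPSS reports) rather than the 6.33 obtained by hand with expected frequencies rounded to one decimal place.

```python
# Observed counts: rows are user groups, columns are (female, male).
observed = [
    [14, 5],   # light users
    [5, 5],    # medium users
    [5, 11],   # heavy users
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all six cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n  # unrounded expected count
        chi_square += (f_o - f_e) ** 2 / f_e

print(f"chi-square = {chi_square:.3f}")
```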
Is There a Difference in Shoe Usage between Genders?
O = observed data; E = expected data (if there is no difference in usage between genders). For example, the expected count of female light users is (24 × 19)/45 = 10.1.
SPSS EXAMPLE: Cross-Tabulation Analysis
Are the observed values far enough away from the expected values to support a significant relationship between the two variables? To test the null hypothesis of independence (no relationship), use the chi-square statistic.
Statistics Associated with Cross-Tabulation: Chi-Square
To determine whether a systematic association exists, the probability of obtaining a value of chi-square as large as or larger than the one calculated from the cross-tabulation is estimated. An important characteristic of the chi-square statistic is the number of degrees of freedom (df) associated with it: df = (r − 1) × (c − 1), where r is the number of rows and c is the number of columns. The null hypothesis (H0) of no association between the two variables will be rejected only when the calculated value of the test statistic is greater than the critical value of the chi-square distribution with the appropriate degrees of freedom.
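For the 3 × 2 Nike table, df = 2, and in that special case the chi-square upper-tail probability has the simple closed form exp(−x/2). The sketch below relies on that shortcut, which is an assumption that holds only for df = 2; other degrees of freedom require a chi-square table or a statistics package. The statistic 6.341 is the value reported in the SPSS output for this example.

```python
import math

rows, cols = 3, 2                 # three user groups, two sexes
df = (rows - 1) * (cols - 1)      # degrees of freedom

chi_square = 6.341                # value reported in the SPSS output
alpha = 0.05                      # level of significance

# The closed form below is valid for df = 2 only.
assert df == 2
p_value = math.exp(-chi_square / 2)        # P(chi-square >= observed value)
critical_value = -2 * math.log(alpha)      # value with area alpha to its right

reject_null = chi_square > critical_value
print(f"df = {df}")
print(f"critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.3f}")
print("reject H0" if reject_null else "cannot reject H0")
```

Comparing the statistic to the critical value and comparing the p-value to α always lead to the same decision.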
SPSS EXAMPLE: Cross-Tabulation Analysis
Expected cell counts should be at least 5. If they are not, the p-value may be biased. Here it is acceptable because only one cell has an expected count below 5, and that count (4.67) is close enough to be rounded up to 5. With df = 2, the critical value at α = .05 is 5.99, so χ² > 5.99 means p < .05.
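The expected-count check described above can be sketched as follows. The threshold of 5 is the rule of thumb stated in these notes; the variable names are illustrative assumptions.

```python
# Observed counts: rows are user groups, columns are (female, male).
observed = [
    [14, 5],   # light users
    [5, 5],    # medium users
    [5, 11],   # heavy users
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Flatten the expected counts (row total * column total / n) into one list.
expected = [r * c / n for r in row_totals for c in col_totals]

threshold = 5  # rule of thumb from the notes
small_cells = [e for e in expected if e < threshold]

print(f"smallest expected count = {min(expected):.2f}")
print(f"cells with expected count below {threshold}: {len(small_cells)} of {len(expected)}")
```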
SPSS EXAMPLE: Cross-Tabulation Analysis
The chi-square statistic is equal to 6.341 with a significance level equal to .042. (The hand calculation gives 6.33 because it uses expected frequencies rounded to one decimal place.) Because 6.341 is above the critical value of 5.99 and the significance level is < .05, we can reject the null hypothesis and conclude that there is a significant relationship between gender and usage level of Nike shoes. Specifically, in our sample of 45 respondents, a larger share of males (52.4%) than females (20.8%) indicated that they were heavy users. In contrast, a larger share of females (58.3%) than males (23.8%) indicated that they were light users. Interestingly, the same number of females (5) and males (5) indicated that they were medium users.
Math: χ² test
Step 1: State the null and alternative hypotheses
Step 2: Assume the null hypothesis is true
Step 3: Compute a relevant test statistic
Step 4: Compute the degrees of freedom, df = (r − 1) × (c − 1)
Step 5: Compare the observed χ² to the value in a χ² table for that df and the appropriate significance level
SPSS Windows: Frequencies
The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics. If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used. The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.
SPSS Windows: Frequencies
To select these frequencies procedures, click one of the following:
- Analyze > Descriptive Statistics > Frequencies . . .
- Analyze > Descriptive Statistics > Descriptives . . .
- Analyze > Descriptive Statistics > Explore . . .
We illustrate the detailed steps using the data of Table 16.1.
SPSS Detailed Steps: Frequencies
1. Select ANALYZE on the SPSS menu bar.
2. Click DESCRIPTIVE STATISTICS, and select FREQUENCIES.
3. Move the variable "Attitude toward Nike <attitude>" to the VARIABLE(S) box.
4. Click STATISTICS.
5. Select MEAN, MEDIAN, MODE, STD. DEVIATION, VARIANCE, and RANGE.
6. Click CONTINUE.
7. Click CHARTS.
8. Click HISTOGRAMS, then click CONTINUE.
9. Click OK.
SPSS Windows: Cross-tabulations
The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed. To select these procedures, click the following:
Analyze > Descriptive Statistics > Crosstabs
SPSS Detailed Steps: Cross-Tabulations
1. Select ANALYZE on the SPSS menu bar.
2. Click DESCRIPTIVE STATISTICS, and select CROSSTABS.
3. Move the variable "User Group <usergr>" to the ROW(S) box.
4. Move the variable "Sex <sex>" to the COLUMN(S) box.
5. Click CELLS.
6. Select OBSERVED under COUNTS, and select COLUMN under PERCENTAGES.
7. Click CONTINUE.
8. Click STATISTICS.
9. Click CHI-SQUARE, PHI, and CRAMER'S V.
10. Click OK.
Excel: Cross-Tabulations
The Insert > Pivot Table function performs cross-tabulations in Excel. To do additional analysis or customize data, select a different summary function, such as maximum, minimum, average, or standard deviation. In addition, a custom calculation can be selected to analyze values based on other cells in the data plane. The chi-square test can be accessed under Formulas > Insert Function > CHITEST (named CHISQ.TEST in newer versions of Excel). We illustrate the detailed steps using the data of Table 16.3.
Excel Detailed Steps: Cross-Tabulations
1. Select Insert (Alt + N).
2. Click on Pivot Table. The Pivot Table window pops up.
3. Select columns A to D and rows 1 to 46. “$A$1:$D$46” should appear in the range box.
4. Select NEW WORKSHEET in the CREATE PIVOT TABLE window.
5. Click OK.
Excel Detailed Steps: Cross-Tabulations (Cont.)
6. Drag variables into the layout on the left in the following format: SEX as the column variable, USERGR as the row variable, and CASENO in the data area (double-click CASENO and select Count).