Published by Jonah Hart. Modified over 9 years ago.
1
Marketing Research, 2nd Edition, Alan T. Shao. Copyright © 2002 by South-Western. PPT-1
CHAPTER 16 BIVARIATE STATISTICS: PARAMETRIC TESTS
2
What The Experts Say
What’s the point in doing surveys if you can’t analyze the data? Converting and reducing data into meaningful results is a marketing researcher’s key responsibility.
--SPSS Web Page, “Analysis,” http://www.spss.com/spssmr/solutions/analysis.htm, February 19, 2001.
3
Learning Objectives
– Discuss the importance of parametric statistics
– Describe the difference between tests of differences and tests of associations
– Explain how to use z- and t-tests to compare two groups
– Describe and calculate the F-test
– Discuss the meaning and use of analysis of variance
– Describe correlation and regression analyses
– Calculate and interpret correlation and regression statistics
– Compute one-way analysis of variance manually and by computer
4
Get This! My Name Is Important to Me
In the book How to Win Friends and Influence People, Dale Carnegie wrote, “Remember that a person’s name is to that person the sweetest and most important sound in any language.”
A professor classified students into three groups: names (those whose names he could remember), no-names (those whose names he could not remember), and neutral-names (those whose names he never referred to during the conversations).
At the end of a meeting with each student, the professor would state, “Oh, I have to ask you something else. My wife is selling cookies for the church. If you want any, they’re only 25 cents.”
This offer was made to examine whether remembering a student’s name made a difference in whether or not he or she would comply with a request (that is, purchase the cookies).
5
Get This! My Name Is Important to Me – cont’d
The results were analyzed using several different statistical techniques, one being analysis of variance. He found:
–Not being able to remember a student’s name produced compliance results (that is, purchasing the cookies) no different from those of a condition in which the issue of a student’s name was never raised.
–The higher purchasing rate for those students whose names were remembered indicates that name remembrance facilitates compliance.
6
Now Ask Yourself
– Based on your knowledge of statistics, do you have faith in the findings since various statistical tools were used to analyze the data?
– Was it really necessary for the researchers to run statistical tests to generate their findings?
– What was meant by, “The professor decided to use this method [analysis of variance] since it tests whether there are statistically significant differences among the means of each of the student groups”?
– Were the results surprising to you? If so, what did you expect? If not, why not?
7
Parametric Tests
The hypothesis tests assume that the variables under investigation are measured using either interval or ratio scales. Furthermore, it is necessary to make some additional assumptions:
– The sample data should be randomly drawn from a normally distributed population.
– The sample observations must be independent of each other.
– When examining central tendency for which two or more samples are drawn, the populations should have equal variances.
8
Tests of Difference
Can be used whenever a researcher is interested in comparing some characteristic of one group with a characteristic of another and determining whether or not a significant difference exists between the two groups.
– The first population and its samples are identified by subscript 1.
– The second population and its samples are identified by subscript 2.
– $\bar{X}_1$ represents the mean of the sample drawn from population 1.
– $\bar{X}_2$ represents the mean of the sample drawn from population 2.
9
Z-test: Difference Between Means
Used to determine whether two population means differ from each other. This can be determined by using either the z-test or the t-test, depending on the sample size and whether or not the population standard deviation is known for either group. If the sample size is at least 30 and the population standard deviations are known, the z-test should be used.

[Formula 16-1]
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{\bar{X}_1 - \bar{X}_2}}$$
where
$\bar{X}_1 - \bar{X}_2$ = the difference between sample means
$\mu_1 - \mu_2$ = the difference between population means
$\bar{X}_1$ and $\bar{X}_2$ = sample means for the two variables
$S_{\bar{X}_1 - \bar{X}_2}$ = standard error of the difference between the means
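The z statistic for the difference between two means can be sketched in a few lines of Python. This is only an illustration: the standard error is computed with the usual formula for independent samples, sqrt(sigma1^2/n1 + sigma2^2/n2), which the slide does not spell out, and the function name and the numbers are made up for the example.

```python
import math

def z_two_means(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two independent sample means."""
    # Assumption: population standard deviations are known and samples are independent,
    # so the standard error of the difference is sqrt(sigma1^2/n1 + sigma2^2/n2).
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    # Numerator: observed difference minus the difference stated in the null hypothesis.
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Hypothetical data: two groups with means 52 and 50.
z = z_two_means(mean1=52.0, mean2=50.0, sigma1=4.0, sigma2=3.0, n1=64, n2=36)
print(round(z, 3))
```

The resulting z is compared with a critical value from the standard normal distribution (for example, 1.96 for a two-tailed test at the 0.05 level).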
10
t-Test: Difference Between Means
When the sample size is less than 30 and the population standard deviations are unknown, the t-test can be used to determine whether or not a significant difference exists between two means (or whether the two population means are equal).
[Formula not preserved.]
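One common form of the two-sample t-test is the pooled-variance version, sketched below. The slide's own formula is not available, so treating it as the pooled form is an assumption, as are the function name and the sample values.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    # Assumption: both populations share one variance, estimated by pooling
    # the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical small samples.
t_stat, df = pooled_t([5.0, 7.0, 9.0], [1.0, 3.0, 5.0])
print(round(t_stat, 3), df)
```

The statistic is compared with the critical t value for the returned degrees of freedom.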
11
Difference Between Two Proportions and Independent Samples
Let $p_1$ and $p_2$ be the proportions of two samples drawn from respective populations with proportions $P_1$ and $P_2$.
The null hypothesis is that there is no difference between the two population proportions; that is, $P_1 = P_2$ or, stated another way, $P_1 - P_2 = 0$. If the null hypothesis is true ($P_1 = P_2$), the two populations are really the same population.
The basic concept concerning the difference between two sample proportions is analogous to that concerning the difference between two sample means.
1. The mean of the sampling distribution of $(p_1 - p_2)$ is equal to the difference between the two population proportions, $P_1$ and $P_2$, or $\mu_{p_1 - p_2} = P_1 - P_2$.
12
Difference Between Two Proportions and Independent Samples – cont’d
2. The variance of the difference between two sample proportions is the sum of the variances of the two sample proportions,
$$\sigma^2_{p_1 - p_2} = \sigma^2_{p_1} + \sigma^2_{p_2}$$
where
$$\sigma^2_{p_1} = \frac{P_1 Q_1}{n_1}, \qquad \sigma^2_{p_2} = \frac{P_2 Q_2}{n_2}$$
When the sampling distributions of $p_1$ and $p_2$ are normal, the distribution of the differences between $p_1$ and $p_2$ is also normal. Since the mean of the sampling distribution of $p_1 - p_2$ is equal to the difference between the two population proportions, the distribution that follows is normal:
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sigma_{p_1 - p_2}}$$
13
Difference Between Two Proportions and Independent Samples – cont’d
$p_1$ = sample proportion of successes in first group
$p_2$ = sample proportion of successes in second group
$P_1$ = population proportion of first group
$P_2$ = population proportion of second group
$\sigma^2_{p_1 - p_2}$ = variance of the difference between two sample proportions
When $P_1 = P_2$, $P_1 - P_2 = 0$ and $P_1 Q_1 = P_2 Q_2 = PQ$, where $Q = 1 - P$. Thus
$$\sigma^2_{p_1 - p_2} = PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$
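The test for two proportions under the null hypothesis P1 = P2 can be sketched as follows. The common proportion P is unknown in practice; the sketch assumes the usual pooled estimate (total successes divided by total sample size), which is not stated on the slide. The function name and the counts are hypothetical.

```python
import math

def z_two_proportions(successes1, n1, successes2, n2):
    """Z statistic for the difference between two independent sample proportions."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Assumption: under H0 (P1 = P2) the common P is estimated by pooling both samples.
    pooled_p = (successes1 + successes2) / (n1 + n2)
    pooled_q = 1 - pooled_p
    # Variance of the difference: PQ(1/n1 + 1/n2).
    standard_error = math.sqrt(pooled_p * pooled_q * (1 / n1 + 1 / n2))
    # Under H0 the population difference P1 - P2 is zero.
    return (p1 - p2) / standard_error

# Hypothetical counts: 60 of 100 versus 40 of 100.
z = z_two_proportions(60, 100, 40, 100)
print(round(z, 3))
```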
14
Analysis of Variance
The two tests (z-tests and t-tests) are useful when testing a null hypothesis when only two samples are involved.
Analysis of Variance (ANOVA) is often the preferred method to test whether there is a significant difference among the means of two or more independent samples. It is applicable whenever a study involves an interval- or ratio-scaled dependent variable.
One-Way Analysis of Variance is discussed in this chapter. It is a bivariate statistical technique that involves only one independent variable, although there may be multiple levels of that variable.
The null hypothesis for ANOVA is that the means of normally distributed populations, such as three populations a, b, and c, are equal, or $\mu_a = \mu_b = \mu_c$.
15
Analysis of Variance – cont’d
If we take a random sample from each of the three original populations, we may consider the three samples as subsets of a single large sample drawn from the single large population.
$$\text{Grand mean} = \bar{\bar{X}} = \frac{\sum_i \sum_j X_{ij}}{N}$$
The unbiased estimate of the large population variance ($\sigma^2$) based on the preceding samples may be obtained by calculating the variance between groups (MSA) and the variance within groups (MSE).
16
Variance Between Groups
The variance between groups (or between samples) is also referred to as the “mean sum of squares between (among) groups.” It is sometimes denoted as MSA. It is written in a general form:
$$MSA = \frac{\sum_i n_i (\bar{X}_i - \bar{\bar{X}})^2}{r - 1}$$
where
i = individual groups or samples a, b, c, …
$n_i$ = size of group i, or size of sample drawn from population i, such as $n_a$, $n_b$, $n_c$ in the preceding illustration
$\bar{X}_i$ = mean of the items in group or sample i
$\bar{\bar{X}}$ = grand mean, or mean of all items in the single large sample
$(\bar{X}_i - \bar{\bar{X}})$ = deviation of group mean from grand mean
$(\bar{X}_i - \bar{\bar{X}})^2$ = variation, or squared deviation (The term “variation” has been used loosely in previous discussions. Here, the term is limited to represent the squared deviation.)
r = number of groups or samples, such as three groups in the above illustration
17
Variance Between Groups – cont’d
Note that the deviation $(\bar{X}_i - \bar{\bar{X}})$ is called the effect, and the nature of the sample i is called the treatment. Furthermore, whenever ANOVA is used, the independent variables are called factors, so the different levels (or categories) of a factor are the treatments.
18
Variance Within Groups
The variance within groups (or within individual samples) is also referred to as the mean square error (MSE), since it is an estimate of the random error existing in the data. It is written in a general form:
$$MSE = \frac{\sum_i \sum_j (X_{ij} - \bar{X}_i)^2}{N - r}$$
where
$X_{ij}$ = individual items in group i
N = number of items in the single large sample
19
F-Test
The F-statistic is the variance between groups divided by the variance within groups:
$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}} = \frac{MSA}{MSE}$$
It is used to test for group differences and compares one sample variance with another sample variance. It represents the variance ratio in showing the relationship between the two independently estimated population variances. It can be presented this way:
$$F = \frac{s_1^2}{s_2^2}$$
where the subscripts 1 (in the numerator) and 2 (in the denominator) indicate the sample numbers, and each $s^2$ represents the estimate of the population variance based on the sample.
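The manual one-way ANOVA computation (grand mean, MSA, MSE, and F) can be sketched as below. The function name and the three small groups are hypothetical; the divisors r - 1 and N - r follow the definitions of r and N given for the between-group and within-group variances.

```python
def one_way_anova(groups):
    """Return (MSA, MSE, F) for a list of groups of numeric observations."""
    n_total = sum(len(g) for g in groups)        # N: items in the single large sample
    r = len(groups)                              # r: number of groups
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: squared effects weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared deviations of items from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    msa = ss_between / (r - 1)
    mse = ss_within / (n_total - r)
    return msa, mse, msa / mse

# Hypothetical data for three groups a, b, c.
msa, mse, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(msa, mse, f_stat)
```

The F value is compared with the critical F for (r - 1, N - r) degrees of freedom; a large ratio suggests that at least one group mean differs from the others.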
20
Tests of Associations
Examine associations between two or more variables. When two groups are studied, there will always be a variable that predicts the actions of another variable. The predictor variable is the independent variable, and the criterion variable is the dependent variable.
Tests to measure statistical relationships between variables are:
–Regression Analysis
–Correlation Analysis
21
Scatter Diagrams
When two related variables, called bivariate data, are plotted as points on a graph, the graph is called a scatter diagram. A scatter diagram indicates whether the relationship between the two variables is positive or negative.
22
Regression Analysis
Refers to statistical techniques for measuring the linear or curvilinear relationship between a dependent variable and one or more independent variables. The relationship between two variables is characterized by how they vary together.
Given pairs of X and Y variables, regression analysis measures the direction (positive or negative) and rate of change (slope) in Y as X changes, or vice versa. Using the values of the independent variable, it attempts to predict the values of an interval- or ratio-scaled dependent variable.
Regression analysis requires two operations:
(1) Derive an equation, called the regression equation, and a line representing the equation to describe the shape of the relationship between the variables.
(2) Estimate the dependent variable (Y) from the independent variable (X), based on the relationship described by the regression equation.
The regression line is the line drawn through a scatter diagram that “best fits” the data points and most accurately describes the relationship between the two variables.
23
Regression Equation and Regression Line
While all shapes are informative, a straight line is especially useful, because it is the easiest to deal with in regression analysis to describe the shape of the average relationship between two variables. The straight line can be expressed by the linear equation:
$$\hat{Y} = a + bX$$
where
$\hat{Y}$ = computed value of the dependent variable
a = Y-intercept where X equals zero
b = slope of the regression line, which is the increase or decrease in Y for each change of one unit of X
X = a given value of the independent variable
24
Regression Equation and Regression Line – cont’d
To create a regression model, researchers estimate the regression line using the following equation:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
$\beta_0$ = Y-intercept where X equals zero
$\beta_1$ = slope of the regression line, which is the increase or decrease in Y for each change of one unit of X
$X_i$ = a given value of the independent variable
i = observation number
$\varepsilon_i$ = error term associated with the ith observation
25
Least-Squares Method
A statistical technique that fits a straight line to a scatter diagram by finding the smallest sum of the vertical distances squared (i.e., $\sum (Y - \hat{Y})^2$) of all the points from the straight line. The equation derived by this method will yield a regression line that best fits the data.
To calculate the straight line by the least-squares method, the equation $\hat{Y} = a + bX$ is used. We must first determine the constants, a and b, which are called regression coefficients.
Regression coefficients are the values that represent the effect of the individual independent variables on the dependent variable.
26
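Least-Squares Method – cont’d
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}$$
or
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$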
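The regression coefficients a and b can be computed directly from the sums of X, Y, XY, and X squared. The sketch below uses the standard computational formulas for simple least squares; the function name and the five data points are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*SUM(XY) - SUM(X)*SUM(Y)) / (n*SUM(X^2) - (SUM(X))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the line passes through the point (mean of X, mean of Y).
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical bivariate data.
a, b = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 3), round(b, 3))
```

With these values the fitted line is Y-hat = 2.2 + 0.6X, so each one-unit increase in X is associated with an increase of 0.6 in the predicted Y.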
27
Standard Deviation of Regression
The standard deviation of the Y values from the regression line ($s_{yx}$) is called the standard deviation of regression. It is also popularly called the standard error of estimate, since it can be used to measure the error of the estimates of individual Y values based on the regression line. Thus
$s_y$ = the standard deviation of Y values from the mean $\bar{Y}$
$s_x$ = the standard deviation of X values from the mean $\bar{X}$
$s_{yx}$ = the standard deviation of regression of Y values from $\hat{Y}$
$s_{xy}$ = the standard deviation of regression of X values from $\hat{X}$
28
Standard Deviation of Regression – cont’d
The standard deviation of Y values from the regression line is based on the points representing Y values scattered around the least-squares line. The closer the points are to the line, the smaller the value of the standard deviation of regression, and the more reliable the estimates of Y values based on the line. On the other hand, the more widely the points are scattered around the least-squares line, the larger the standard deviation of regression and the smaller the reliability of the estimates based on the line or the regression equation.
The general formula for the standard deviation of regression of Y values on X is
$$s_{yx} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k}}$$
where k = number of total (dependent and independent) variables.
However, a simpler method of computing $s_{yx}$ in the bivariate case (k = 2) is to use the following formula:
$$s_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$
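The standard deviation of regression can be computed from the residuals once a and b are known. In the sketch below the function name, the data, and the coefficients a = 2.2 and b = 0.6 (the least-squares fit for this particular made-up data set) are illustrative assumptions; k defaults to 2 for one dependent and one independent variable.

```python
import math

def standard_error_of_estimate(xs, ys, a, b, k=2):
    """Standard deviation of Y values around the regression line Y-hat = a + b*X."""
    # Sum of squared vertical distances between observed and predicted Y values.
    squared_residuals = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - k, where k counts dependent plus independent variables.
    return math.sqrt(squared_residuals / (len(xs) - k))

# Hypothetical data with its least-squares coefficients.
s_yx = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], a=2.2, b=0.6)
print(round(s_yx, 4))
```

A smaller value means the points lie closer to the line, so predictions based on the regression equation are more reliable.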
29
Correlation Analysis
Correlation Analysis: Refers to the statistical techniques for measuring the closeness of the relationship between two metric (interval- or ratio-scaled) variables. It measures the degree to which changes in one variable are associated with changes in another.
The computation concerning the degree of closeness is based on regression statistics.
30
Total Deviation, Coefficient of Determination, and Correlation Coefficient
Total Deviation ($Y - \bar{Y}$). Assume there are two variables, X and Y. The mean of Y values, $\bar{Y} = (\sum Y)/n$, is obtained without referring to X values. $\hat{Y}$, representing the regression line of Y values ($\hat{Y} = a + bX$), is obtained with the influence of X values. If Y values are related to X values to some degree, the deviations of Y values from $\bar{Y}$ must be reduced somewhat by the introduction of X values in computing $\hat{Y}$ values.
The total deviation of Y from the mean is divided into two parts:
Total deviation = Unexplained deviation + Explained deviation
$$(Y - \bar{Y}) = (Y - \hat{Y}) + (\hat{Y} - \bar{Y})$$
31
Total Deviation, Coefficient of Determination, and Correlation Coefficient – cont’d
The explained variation may also be referred to as the regression sum of squares (RSS). The unexplained variation is called the error sum of squares (ESS). This relationship may be expressed as
Total variation = Unexplained variation + Explained variation
TSS = ESS + RSS
$$\sum (Y - \bar{Y})^2 = \sum (Y - \hat{Y})^2 + \sum (\hat{Y} - \bar{Y})^2$$
32
Coefficient of Determination ($r^2$)
The coefficient of determination ($r^2$) is the strength of association or degree of closeness of the relationship between two variables measured by a relative value. It demonstrates how well the regression line fits the scattered points.
It may be defined as the ratio of the explained variation to the total variation:
Coefficient of determination = Explained variation / Total variation = RSS / TSS
or symbolically,
$$r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$
The range of the $r^2$ value is therefore from 0 to 1. When $r^2$ is close to 1, the Y values are very close to the regression line. When $r^2$ is close to 0, the Y values are not close to the regression line.
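The split of total variation into unexplained and explained parts, and the resulting coefficient of determination, can be checked numerically. The sketch below is illustrative only: the function name and data are made up, and a = 2.2 and b = 0.6 are the least-squares coefficients for this particular data set (the identity TSS = ESS + RSS holds only for the least-squares line).

```python
def variation_decomposition(xs, ys, a, b):
    """Return (TSS, ESS, RSS) for the regression line Y-hat = a + b*X."""
    y_bar = sum(ys) / len(ys)
    predictions = [a + b * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)                      # total variation
    ess = sum((y - p) ** 2 for y, p in zip(ys, predictions))     # unexplained variation
    rss = sum((p - y_bar) ** 2 for p in predictions)             # explained variation
    return tss, ess, rss

# Hypothetical data with its least-squares coefficients.
tss, ess, rss = variation_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], a=2.2, b=0.6)
r_squared = rss / tss
print(round(tss, 3), round(ess, 3), round(rss, 3), round(r_squared, 3))
```

Here 60 percent of the total variation in Y is explained by the regression on X.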
33
Correlation Coefficient
The correlation coefficient, the square root of $r^2$, or $r = \pm\sqrt{r^2}$, is frequently computed to indicate the direction of the relationship in addition to indicating the degree of the relationship. Its sign is the same as the sign of the slope b of the regression line.
In absolute value, it is the correlation between the observed and predicted values of the dependent variable.
Since the range of $r^2$ is from 0 to 1, the coefficient of correlation r will vary within the range of $\pm\sqrt{0}$ to $\pm\sqrt{1}$, or from 0 to $\pm 1$; that is, between -1 and +1.
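A short sketch shows how r carries the direction of the relationship while r squared does not. It uses the standard Pearson product-moment formula, which for two variables gives the same magnitude as the square root of the coefficient of determination; the function name and both data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Co-variation of X and Y, and the variation of each variable on its own.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: one rising and one falling relationship.
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_negative = pearson_r([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])
print(round(r_positive, 4), round(r_negative, 4))
```

Both relationships have the same r squared (0.6), but the sign of r separates the positive association from the negative one.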
34
Decision Time!
As a marketing manager, you want information from marketing researchers that can enhance your decision-making abilities. If correlation analysis is a popular and informative statistical method, why should researchers bother using more complex, somewhat intimidating bivariate statistical techniques? Do you feel that there is really that much to gain from these methods? Why or why not?
35
Net Impact
The Internet can be a valuable tool to learn about bivariate statistical techniques. Using almost any search engine, you can find a variety of discussions about the topic. These discussions may be available on the Internet as part of a company’s promotion of its statistical services, a university professor’s statistical seminar notes, or PowerPoint slides that were used in a seminar presentation.
36
Chapter 16
End of Presentation