Multinomial Logistic Regression Basic Relationships


Multinomial Logistic Regression Basic Relationships Describing Relationships Classification Accuracy Sample Problem Steps in Solving Problems

Multinomial logistic regression Multinomial logistic regression is used to analyze relationships between a non-metric dependent variable and metric or dichotomous independent variables. Multinomial logistic regression compares multiple groups through a combination of binary logistic regressions. The group comparisons are equivalent to the comparisons for a dummy-coded dependent variable, with the group with the highest numeric score used as the reference group. For example, if we wanted to study differences in BSW, MSW, and PhD students using multinomial logistic regression, the analysis would compare BSW students to PhD students and MSW students to PhD students. For each independent variable, there would be two comparisons.

What multinomial logistic regression predicts Multinomial logistic regression provides a set of coefficients for each of the two comparisons. The coefficients for the reference group are all zeros, similar to the coefficients for the reference group for a dummy-coded variable. Thus, there are three equations, one for each of the groups defined by the dependent variable. The three equations can be used to compute the probability that a subject is a member of each of the three groups. A case is predicted to belong to the group associated with the highest probability. Predicted group membership can be compared to actual group membership to obtain a measure of classification accuracy.
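As a rough illustration of how the three equations turn into probabilities, the sketch below scores each group, treats the reference group as having all-zero coefficients, and converts the scores to probabilities. The coefficient values, the intercepts, and the function name `group_probabilities` are hypothetical and do not come from the slides.

```python
import math

def group_probabilities(coefficients, intercepts, x):
    """Return the probability of membership in each group.

    coefficients: dict mapping group name -> list of b coefficients
    intercepts:   dict mapping group name -> intercept
    The reference group is passed in with all-zero values.
    """
    # Linear score for each group (the logit relative to the reference group)
    scores = {
        g: intercepts[g] + sum(b * v for b, v in zip(coefficients[g], x))
        for g in coefficients
    }
    # Each probability is exp(score) divided by the sum of exp(score) over
    # all groups; the reference group contributes exp(0) = 1 to that sum.
    denom = sum(math.exp(s) for s in scores.values())
    return {g: math.exp(s) / denom for g, s in scores.items()}

# Hypothetical values for a single independent variable (illustration only)
coefs = {"BSW": [0.8], "MSW": [0.3], "PhD": [0.0]}   # PhD is the reference group
consts = {"BSW": -1.0, "MSW": 0.5, "PhD": 0.0}

probs = group_probabilities(coefs, consts, x=[2.0])
# A case is predicted to belong to the group with the highest probability
predicted = max(probs, key=probs.get)
print(probs, predicted)
```

Because the reference group's score is always zero, only two sets of coefficients need to be estimated for a three-group dependent variable.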

Level of measurement requirements Multinomial logistic regression analysis requires that the dependent variable be non-metric. Dichotomous, nominal, and ordinal variables satisfy the level of measurement requirement. Multinomial logistic regression analysis requires that the independent variables be metric or dichotomous. Nominal level variables can also be included, because SPSS automatically dummy-codes them so that they enter the analysis as dichotomous variables. In SPSS, non-metric independent variables are included as “factors,” and metric independent variables are included as “covariates.” If an independent variable is ordinal, we will attach the usual caution.

Assumptions and outliers Multinomial logistic regression does not make any assumptions of normality, linearity, and homogeneity of variance for the independent variables. Because it does not impose these requirements, it is preferred to discriminant analysis when the data does not satisfy these assumptions. SPSS does not compute any diagnostic statistics for outliers. To evaluate outliers, the advice is to run multiple binary logistic regressions and use those results to test the exclusion of outliers.

Sample size requirements The minimum number of cases per independent variable is 10, using a guideline provided by Hosmer and Lemeshow, authors of Applied Logistic Regression, one of the main resources for Logistic Regression. For preferred case-to-variable ratios, we will use 20 to 1.

Methods for including variables Beginning with version 13, SPSS supports stepwise entry of variables, as well as simultaneous or direct entry. In previous versions, the only method for selecting independent variables in SPSS was simultaneous or direct entry.

Overall test of relationship - 1 The overall test of relationship among the independent variables and groups defined by the dependent is based on the reduction in the likelihood values for a model which does not contain any independent variables and the model that contains the independent variables. This difference in likelihood follows a chi-square distribution, and is referred to as the model chi-square. The significance test for the final model chi-square (after the independent variables have been added) is our statistical evidence of the presence of a relationship between the dependent variable and the combination of the independent variables.

Overall test of relationship - 2 The presence of a relationship between the dependent variable and combination of independent variables is based on the statistical significance of the final model chi-square in the SPSS table titled "Model Fitting Information". In this analysis, the probability of the model chi-square (18.457) was 0.005, less than or equal to the level of significance of 0.05. The null hypothesis that there was no difference between the model without independent variables and the model with independent variables was rejected. The existence of a relationship between the independent variables and the dependent variable was supported.

Strength of multinomial logistic regression relationship While multinomial logistic regression does compute correlation measures to estimate the strength of the relationship (pseudo R square measures, such as Nagelkerke's R²), these correlation measures do not really tell us much about the accuracy or errors associated with the model. A more useful measure to assess the utility of a multinomial logistic regression model is classification accuracy, which compares predicted group membership based on the logistic model to the actual, known group membership, which is the value for the dependent variable.

Evaluating usefulness for logistic models The benchmark that we will use to characterize a multinomial logistic regression model as useful is a 25% improvement over the rate of accuracy achievable by chance alone. Even if the independent variables had no relationship to the groups defined by the dependent variable, we would still expect to be correct in our predictions of group membership some percentage of the time. This is referred to as by chance accuracy. The estimate of by chance accuracy that we will use is the proportional by chance accuracy rate, computed by summing the squared proportion of cases in each group. The only difference between by chance accuracy for binary logistic models and by chance accuracy for multinomial logistic models is the number of groups defined by the dependent variable.

Computing by chance accuracy The number of cases in each group defined by the dependent variable is found in the ‘Case Processing Summary’ table. The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group based on the number of cases in each group in the 'Case Processing Summary', and then squaring and summing the proportion of cases in each group (0.371² + 0.557² + 0.072² = 0.453). The proportional by chance accuracy criterion is 56.6% (1.25 x 45.3% = 56.6%).

Comparing accuracy rates To characterize our model as useful, we compare the overall percentage accuracy rate produced by SPSS at the last step in which variables are entered to 25% more than the proportional by chance accuracy. (Note: SPSS does not compute a cross-validated accuracy rate for multinomial logistic regression.) The classification accuracy rate was 60.5%, which was greater than or equal to the proportional by chance accuracy criterion of 56.6% (1.25 x 45.3% = 56.6%). The criterion for classification accuracy is satisfied in this example.
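The arithmetic behind the by chance accuracy criterion can be checked with a few lines of Python. This is only a sketch of the calculation described above; the group proportions and the 60.5% accuracy rate are the values quoted in the slides, while the function and variable names are made up for the illustration.

```python
def proportional_by_chance_accuracy(proportions):
    """Sum of the squared proportion of cases in each group."""
    return sum(p ** 2 for p in proportions)

# Proportion of cases in each dependent variable group (from the slides)
group_proportions = [0.371, 0.557, 0.072]

by_chance = proportional_by_chance_accuracy(group_proportions)
criterion = 1.25 * by_chance      # 25% improvement over chance
model_accuracy = 0.605            # overall accuracy rate reported by SPSS

# The model is characterized as useful when its accuracy meets the criterion
is_useful = model_accuracy >= criterion
print(f"by chance: {by_chance:.3f}, criterion: {criterion:.3f}, useful: {is_useful}")
```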

Numerical problems The maximum likelihood method used to calculate multinomial logistic regression is an iterative fitting process that attempts to cycle through repetitions to find an answer. Sometimes, the method will break down and not be able to converge or find an answer. Sometimes the method will produce wildly improbable results, reporting that a one-unit change in an independent variable increases the odds of the modeled event by hundreds of thousands or millions. These implausible results can be produced by multicollinearity, categories of predictors having no cases or zero cells, and complete separation whereby the two groups are perfectly separated by the scores on one or more independent variables. The clue that we have numerical problems and should not interpret the results is a standard error larger than 2.0 for one or more independent variables.

Relationship of individual independent variables and the dependent variable There are two types of tests for individual independent variables. The likelihood ratio test evaluates the overall relationship between an independent variable and the dependent variable. The Wald test evaluates whether or not the independent variable is statistically significant in differentiating between the two groups in each of the embedded binary logistic comparisons. If an independent variable has an overall relationship to the dependent variable, it might or might not be statistically significant in differentiating between pairs of groups defined by the dependent variable.

Relationship of individual independent variables and the dependent variable The interpretation for an independent variable focuses on its ability to distinguish between pairs of groups and the contribution which it makes to changing the odds of being in one dependent variable group rather than the other. We should not interpret the significance of an independent variable’s role in distinguishing between pairs of groups unless the independent variable also has an overall relationship to the dependent variable in the likelihood ratio test. The interpretation of an independent variable’s role in differentiating dependent variable groups is the same as we used in binary logistic regression. The difference in multinomial logistic regression is that we can have multiple interpretations for an independent variable in relation to different pairs of groups.

Relationship of individual independent variables and the dependent variable SPSS identifies the comparisons it makes for groups defined by the dependent variable in the table of ‘Parameter Estimates,’ using either the value codes or the value labels, depending on the options settings for pivot table labeling. The reference category is identified in the footnote to the table. In this analysis, two comparisons will be made: the TOO LITTLE group (coded 1, shaded blue) will be compared to the TOO MUCH group (coded 3, shaded purple), and the ABOUT RIGHT group (coded 2, shaded orange) will be compared to the TOO MUCH group (coded 3, shaded purple). The reference category plays the same role in multinomial logistic regression that it plays in the dummy-coding of a nominal variable: it is the category that would be coded with zeros for all of the dummy-coded variables, and all other categories are interpreted against it.

Relationship of individual independent variables and the dependent variable In this example, there is a statistically significant relationship between the independent variable CONLEGIS and the dependent variable. (0.010 < 0.05) As well, the independent variable CONLEGIS is significant in distinguishing category 1 of the dependent variable from category 3 of the dependent variable. (0.027 < 0.05) And the independent variable CONLEGIS is significant in distinguishing category 2 of the dependent variable from category 3 of the dependent variable. (0.007 < 0.05)

Interpreting relationship of individual independent variables to the dependent variable Survey respondents who had greater confidence in congress (higher values correspond to greater confidence) were less likely to be in the group of survey respondents who thought we spend too little money on highways and bridges (DV category 1), rather than the group of survey respondents who thought we spend too much money on highways and bridges (DV category 3). For each unit increase in confidence in Congress, the odds of being in the group of survey respondents who thought we spend too little money on highways and bridges decreased by 74.7%. (0.253 – 1.0 = -0.747)

Interpreting relationship of individual independent variables to the dependent variable Survey respondents who had greater confidence in Congress (higher values correspond to greater confidence) were less likely to be in the group of survey respondents who thought we spend about the right amount of money on highways and bridges (DV category 2), rather than the group of survey respondents who thought we spend too much money on highways and bridges (DV category 3). For each unit increase in confidence in Congress, the odds of being in the group of survey respondents who thought we spend about the right amount of money on highways and bridges decreased by 80.9%. (0.191 – 1.0 = -0.809)

Relationship of individual independent variables and the dependent variable In this example, there is a statistically significant relationship between SEX and the dependent variable, spending on childcare assistance. As well, SEX plays a statistically significant role in differentiating the TOO LITTLE group from the TOO MUCH (reference) group. (0.007 < 0.05) However, SEX does not differentiate the ABOUT RIGHT group from the TOO MUCH (reference) group. (0.51 > 0.05)

Interpreting relationship of individual independent variables and the dependent variable Survey respondents who were male (code 1 for sex) were less likely to be in the group of survey respondents who thought we spend too little money on childcare assistance (DV category 1), rather than the group of survey respondents who thought we spend too much money on childcare assistance (DV category 3). Survey respondents who were male were 88.5% less likely (0.115 – 1.0 = -0.885) to be in the group of survey respondents who thought we spend too little money on childcare assistance.
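The percentage change in odds quoted in these interpretations follows directly from Exp(B). The short sketch below applies that conversion to the three Exp(B) values mentioned above; the function name is an illustrative choice, not something SPSS provides.

```python
def percent_change_in_odds(exp_b):
    """Convert an odds ratio, Exp(B), to a percentage change in the odds.

    Values below 1.0 give a negative result (a decrease in the odds);
    values above 1.0 give a positive result (an increase in the odds).
    """
    return (exp_b - 1.0) * 100.0

# Exp(B) values quoted in the slides for the comparisons to the reference group
examples = {
    "conlegis, too little vs. too much": 0.253,
    "conlegis, about right vs. too much": 0.191,
    "sex, too little vs. too much": 0.115,
}

changes = {label: percent_change_in_odds(v) for label, v in examples.items()}
for label, change in changes.items():
    print(f"{label}: {change:+.1f}% change in odds")
```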

Interpreting relationships for independent variables in problems In the multinomial logistic regression problems, the problem statement will ask about only one of the independent variables. The answer will be true or false based on only the stated relationship between the specified independent variable and the dependent variable. The individual relationships between other independent variables and the dependent variable are not incorporated in the determination of whether or not the answer is true or false.

Level of Measurement - question The first question requires us to examine the level of measurement requirements for multinomial regression. Multinomial logistic regression requires that the dependent variable be non-metric and the independent variables be metric or dichotomous.

Level of Measurement – evidence and answer True with caution is the correct answer, since we satisfy the level of measurement requirements, but include ordinal level variables in the analysis.

Sample Size - question The second question asks about the sample size requirements for the multinomial regression. To answer this question, we will run a baseline logistic regression to obtain some basic data about the problem and solution. The phrase “simultaneous entry” dictates the method for including variables in the model.

Request multinomial logistic regression Select the Regression | Multinomial Logistic… command from the Analyze menu.

Selecting the dependent variable First, highlight the dependent variable natroad in the list of variables. Second, click on the right arrow button to move the dependent variable to the Dependent text box.

Selecting metric independent variables Metric independent variables are specified as covariates in multinomial logistic regression. Metric variables can be either interval or, by convention, ordinal. Move the metric independent variables, age, educ and conlegis to the Covariate(s) list box. In this analysis, there are no non-metric independent variables. Non-metric independent variables would be moved to the Factor(s) list box.

Specifying statistics to include in the output While we will accept most of the SPSS defaults for the analysis, we need to specifically request the classification table. Click on the Statistics… button to make a request.

Requesting the classification table First, keep the SPSS defaults for Model and Parameters. Second, mark the checkbox for the Classification table. Third, click on the Continue button to complete the request.

Completing the multinomial logistic regression request Click on the OK button to request the output for the multinomial logistic regression. The multinomial logistic procedure supports additional commands to specify the model computed for the relationships (we will use the default main effects model), additional specifications for computing the regression, and saving classification results. We will not make use of these options.

Sample size – ratio of cases to variables evidence and answer Multinomial logistic regression requires that the minimum ratio of valid cases to independent variables be at least 10 to 1. The ratio of valid cases (167) to number of independent variables (3) was 55.7 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied. The preferred ratio of valid cases to independent variables is 20 to 1. The ratio of 55.7 to 1 was equal to or greater than the preferred ratio. The preferred ratio of cases to independent variables was satisfied. The answer to this question is true.
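The sample size check amounts to one division and two comparisons. The sketch below reproduces it with the 167 valid cases and 3 independent variables from this problem; the function names and the wording of the returned answers are assumptions made for the illustration, modeled on the answers used in these problems.

```python
def case_to_variable_ratio(valid_cases, n_independent_variables):
    """Ratio of valid cases to independent variables."""
    return valid_cases / n_independent_variables

def sample_size_answer(ratio, minimum=10.0, preferred=20.0):
    """Classify the ratio against the minimum and preferred guidelines."""
    if ratio < minimum:
        return "inappropriate application of a statistic"
    if ratio < preferred:
        return "true with caution"
    return "true"

ratio = case_to_variable_ratio(valid_cases=167, n_independent_variables=3)
answer = sample_size_answer(ratio)
print(f"{ratio:.1f} to 1 -> {answer}")
```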

Multicollinearity and Numerical Problems - question Multicollinearity in the logistic regression solution is detected by examining the standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical problems, such as multicollinearity among the independent variables, cells with a zero count for a dummy-coded independent variable because all of the subjects have the same value for the variable, and 'complete separation' whereby the two groups in the dependent event variable can be perfectly separated by scores on one of the independent variables. Analyses that indicate numerical problems should not be interpreted.

Multicollinearity and Numerical Problems – evidence and answer None of the independent variables in this analysis had a standard error larger than 2.0. (We are not interested in the standard errors associated with the intercept.) The answer to this question is true.

Overall Relationship - question The presence of a relationship between the dependent variable and combination of independent variables is based on the statistical significance of the final model chi-square in the SPSS table titled 'Model Fitting Information'.

Overall Relationship – evidence and answer In this analysis, the probability of the model chi-square (18.457) was p=0.005, less than or equal to the level of significance of 0.05. The null hypothesis that there was no difference between the model without independent variables and the model with independent variables was rejected. The existence of a relationship between the independent variables and the dependent variable was supported. The answer to this question is true with caution. Caution in interpreting the relationship should be exercised because the ordinal level variable "confidence in Congress" [conlegis] was treated as metric.
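The reported probability can be approximately reproduced from the model chi-square. The sketch below assumes 6 degrees of freedom (three covariates times two group comparisons); the slides do not state the degrees of freedom, so treat that number as an assumption. It uses the closed-form upper-tail probability that holds for a chi-square distribution with an even number of degrees of freedom.

```python
import math

def chi_square_p_value_even_df(chi_square, df):
    """Upper-tail probability of a chi-square statistic for even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = chi_square / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series

model_chi_square = 18.457
assumed_df = 6  # assumption: 3 independent variables x 2 comparisons

p_value = chi_square_p_value_even_df(model_chi_square, assumed_df)
reject_null = p_value <= 0.05
print(f"p = {p_value:.3f}, reject null hypothesis: {reject_null}")
```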

Individual Relationships – Age question The statistical significance of the relationship between age and opinion about spending on highways and bridges is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests" and the interpretation of the odds ratio.

Individual Relationships – Age evidence and answer The statistical significance of the relationship between age and opinion about spending on highways and bridges is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests". The likelihood ratio test of the relationship between "age" and "opinion about spending on highways and bridges" did not support the existence of a relationship (chi-square=2.652, p=0.265). False is the correct answer to this question.

Individual Relationships – highest year of school question The statistical significance of the relationship between highest year of school completed and opinion about spending on highways and bridges is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests" and the interpretation of the odds ratio.

Individual Relationships – highest year of school evidence and answer The statistical significance of the relationship between highest year of school completed and opinion about spending on highways and bridges is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests". The likelihood ratio test of the relationship between "highest year of school completed" and "opinion about spending on highways and bridges" did not support the existence of a relationship (chi-square=4.423, p=0.110). False is the correct answer to this question.

Individual Relationships – confidence in Congress question The statistical significance of the relationship between confidence in Congress and opinion about spending on highways and bridges is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests" and the interpretation of the odds ratio.

Individual Relationships – confidence in Congress evidence and answer - 1 The statistical significance of the relationship between confidence in Congress and opinion about spending on highways and bridges is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests". For this relationship, the probability of the chi-square statistic (9.221) was 0.010, less than or equal to the level of significance of 0.05. The null hypothesis that all of the b coefficients associated with confidence in Congress were equal to zero was rejected. The existence of a relationship between confidence in Congress and opinion about spending on highways and bridges was supported.
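The three likelihood ratio tests in this problem can be compared side by side. The sketch below assumes each test has 2 degrees of freedom (one b coefficient per comparison, two comparisons), which the slides do not state; with 2 degrees of freedom the chi-square upper-tail probability reduces to exp(-x/2). The results agree with the reported probabilities to within rounding of the printed chi-square values.

```python
import math

def lr_test_p_value_df2(chi_square):
    """Upper-tail chi-square probability, assuming 2 degrees of freedom."""
    return math.exp(-chi_square / 2.0)

# Likelihood ratio chi-square statistics quoted in the slides
lr_chi_squares = {"age": 2.652, "educ": 4.423, "conlegis": 9.221}

alpha = 0.05
lr_results = {}
for variable, chi_square in lr_chi_squares.items():
    p = lr_test_p_value_df2(chi_square)
    # Only variables with an overall relationship go on to the Wald tests
    lr_results[variable] = (p, p <= alpha)
    print(f"{variable}: p = {p:.3f}, overall relationship: {p <= alpha}")
```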

Individual Relationships – confidence in Congress evidence and answer - 2 In the comparison of survey respondents who thought we spend too little money on highways and bridges to survey respondents who thought we spend too much money on highways and bridges, the probability of the Wald statistic (4.913) for the variable confidence in Congress [conlegis] was 0.027. Since the probability was less than or equal to the level of significance of 0.05, the null hypothesis that the b coefficient for confidence in Congress was equal to zero for this comparison was rejected.

Individual Relationships – confidence in Congress evidence and answer - 3 The value of Exp(B) was 3.948 which implies that for each unit increase in confidence in Congress the odds increased by approximately four times. The relationship stated in the problem is supported. Survey respondents who had more confidence in congress were more likely to be in the group of survey respondents who thought we spend too little money on highways and bridges, rather than the group of survey respondents who thought we spend too much money on highways and bridges. For each unit increase in confidence in Congress, the odds of being in the group of survey respondents who thought we spend too little money on highways and bridges increased by approximately four times.

Individual Relationships – confidence in Congress evidence and answer - 4 In the comparison of survey respondents who thought we spend about the right amount of money on highways and bridges to survey respondents who thought we spend too much money on highways and bridges, the probability of the Wald statistic (7.298) for the variable confidence in Congress [conlegis] was 0.007. Since the probability was less than or equal to the level of significance of 0.05, the null hypothesis that the b coefficient for confidence in Congress was equal to zero for this comparison was rejected.
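The Wald probabilities for the two comparisons can be approximated in the same spirit. The sketch below assumes the Wald statistic is referred to a chi-square distribution with 1 degree of freedom, in which case the upper-tail probability equals erfc(sqrt(W / 2)); the function name is illustrative.

```python
import math

def wald_p_value_df1(wald):
    """Upper-tail chi-square probability, assuming 1 degree of freedom.

    A chi-square variable with 1 df is a squared standard normal, so the
    tail probability is the two-sided normal probability for sqrt(wald).
    """
    return math.erfc(math.sqrt(wald / 2.0))

# Wald statistics for conlegis quoted in the slides
wald_statistics = {
    "too little vs. too much": 4.913,
    "about right vs. too much": 7.298,
}

wald_p_values = {k: wald_p_value_df1(w) for k, w in wald_statistics.items()}
for comparison, p in wald_p_values.items():
    print(f"{comparison}: p = {p:.3f}, significant: {p <= 0.05}")
```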

Individual Relationships – confidence in Congress evidence and answer - 5 The value of Exp(B) was 5.242 which implies that for each unit increase in confidence in Congress the odds increased by approximately five and a quarter times. The relationship stated in the problem is supported. Survey respondents who had more confidence in congress were more likely to be in the group of survey respondents who thought we spend about the right amount of money on highways and bridges, rather than the group of survey respondents who thought we spend too much money on highways and bridges. For each unit increase in confidence in Congress, the odds of being in the group of survey respondents who thought we spend about the right amount of money on highways and bridges increased by approximately five and a quarter times.

Individual Relationships – confidence in Congress evidence and answer - 6 True with caution is the correct answer to this question. Caution in interpreting the relationship should be exercised because the ordinal level variable "confidence in Congress" [conlegis] was treated as metric.

Classification Accuracy - question The independent variables could be characterized as useful predictors distinguishing survey respondents who thought we spend too little money on highways and bridges, survey respondents who thought we spend about the right amount of money on highways and bridges and survey respondents who thought we spend too much money on highways and bridges if the classification accuracy rate was substantially higher than the accuracy attainable by chance alone.

Classification Accuracy – evidence and answer 1 The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group based on the number of cases in each group in the 'Case Processing Summary', and then squaring and summing the proportion of cases in each group (0.371² + 0.557² + 0.072² = 0.453).

Classification Accuracy – evidence and answer 2 The classification accuracy rate was 60.5%, which was greater than or equal to the proportional by chance accuracy criterion of 56.6% (1.25 x 45.3% = 56.6%). The criterion for classification accuracy is satisfied. True is the correct answer to this question.

Steps in solving multinomial logistic regression problems: level of measurement
Question: Do the variables included in the analysis satisfy the level of measurement requirements?
1. Is the dependent variable non-metric, and are the independent variables metric or dichotomous?
   No: Inappropriate application of a statistic.
   Yes: Go to step 2.
2. Is an ordinal independent variable included in the analysis?
   Yes: True with caution.
   No: True.

Steps in solving multinomial logistic regression problems: sample size
Question: Do the number of variables and cases satisfy the sample size requirements?
1. Run the multinomial logistic regression.
2. Is the ratio of cases to independent variables at least 10 to 1?
   No: Inappropriate application of a statistic.
   Yes: Go to step 3.
3. Is the ratio of cases to independent variables at least 20 to 1?
   No: True with caution.
   Yes: True.

Steps in solving multinomial logistic regression problems: multicollinearity/numerical problems
Question: Is there no evidence of multicollinearity or numerical problems?
1. Do the standard errors of the coefficients indicate the presence of numerical problems (s.e. > 2.0)?
   Yes: False. If a numerical problem is found, halt the analysis until the problem is resolved.
   No: True.

Steps in solving multinomial logistic regression problems: overall relationship
Question: Is there an overall relationship between the independent variables and the dependent variable?
1. Is the overall relationship statistically significant (model chi-square test)?
   No: False.
   Yes: Go to step 2.
2. Is caution needed for an ordinal variable or for a sample size not meeting the preferred requirements?
   Yes: True with caution.
   No: True.

Steps in solving multinomial logistic regression problems: relationships between IVs and DV
Question: Is the interpretation of the relationship between the independent variable and the dependent variable groups correct?
1. Is the overall relationship between the specific IV and the DV statistically significant (likelihood ratio test)?
   No: False.
   Yes: Go to step 2.
2. Is the role of the specific IV in differentiating the DV groups statistically significant and interpreted correctly (Wald test and Exp(B))?
   No: False.
   Yes: Go to step 3.
3. Is there an ordinal independent variable, or a sample size less than the preferred requirements?
   Yes: True with caution.
   No: True.

Steps in solving multinomial logistic regression problems: classification accuracy
Question: Is the classification accuracy sufficient to characterize the model as useful?
1. Is the overall accuracy rate at least 25% higher than the proportional by chance accuracy rate?
   No: False.
   Yes: True.