Forecasting Choices
Types of Variable
Variable
–Quantitative: Continuous, Discrete (counting)
–Qualitative: Ordinal, Nominal
Nominal or Ordinal Dependent Variable
The dependent variable indicates the “choices” of a decision maker (d.m.), say a consumer.
Response categories:
–Mutually exclusive
–Collectively exhaustive
–Finite in number
Desired regression outputs:
–Probability that the d.m. chooses each category
–Coefficient of each independent variable
Generalized Linear Models (GLM)
Regression model for a continuous Y:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with $e$ following $N(0, \sigma)$
GLM formulation:
1. Model for Y: Y is $N(\mu, \sigma)$
2. Link function (model for the predictors): $\mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Estimation of Parameters of GLM
Maximum Likelihood Estimation (MLE)
–For normal Y, MLE is the least squares (LS) estimation
Maximize:
–The sum $\sum_i L_i$, where $L_i$ is the log of the likelihood function of the i-th observation
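MLE for Regression Model
Y is $N(\mu, \sigma)$
MLE: Maximize
$$\sum_i L_i = \sum_i \left[ -\ln\left(\sigma\sqrt{2\pi}\right) - \frac{(Y_i - \mu_i)^2}{2\sigma^2} \right], \quad \mu_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$
For a fixed $\sigma$, maximizing this sum is the same as minimizing $\sum_i (Y_i - \mu_i)^2$, which is why MLE coincides with LS for normal Y.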
GLM for Binary Dependent Variable, Y
Model for response: Y is $B(n, \pi)$
Model for predictors (link function):
$\mathrm{logit}(\pi) = \ln\frac{\pi}{1 - \pi} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K = g$
Probability:
$\pi = \exp(g) / (1 + \exp(g))$
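The step from the link function to the probability can be sketched in a few lines of Python. The coefficient and covariate values below are hypothetical and serve only to illustrate the arithmetic; the function names `logit` and `inv_logit` are choices made for this sketch.

```python
import math

def logit(p):
    """Log-odds ln(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(g):
    """Probability exp(g) / (1 + exp(g)), arranged to avoid overflow."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)

# Hypothetical coefficients (beta0, beta1, beta2) and covariates (X1, X2).
beta = [-1.0, 0.5, 0.25]
x = [2.0, 4.0]

# Link function g, then the probability implied by it.
g = beta[0] + beta[1] * x[0] + beta[2] * x[1]
pi = inv_logit(g)
print(f"g = {g:.3f}, pi = {pi:.4f}")
```

Applying `logit` to the resulting probability recovers g, which is a quick check that the two functions are inverses of each other.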
X : Covariates
Independent variables are often referred to as “covariates.”
Examples:
–SPSS binary logistic regression routine
–SPSS multinomial logistic regression routine
A. Logistic Regression for Ungrouped Data ($n_i = 1$)
Model of observation for the i-th observation:
–$Y_i = 1$: Choose category 1, with probability $\pi_i$
–$Y_i = 0$: Choose category 2, with probability $1 - \pi_i$
Log-likelihood function for the i-th observation:
$L_i = Y_i \ln \pi_i + (1 - Y_i) \ln(1 - \pi_i)$
MLE
Maximize: $\sum_i L_i = \sum_i \left[ Y_i \ln \pi_i + (1 - Y_i) \ln(1 - \pi_i) \right]$
Setting Up a Worksheet for MLE
1. Define an array for storing the parameters of the link function, and enter an initial estimate for each parameter.
2. Then, for each observation, compute:
–The link function, $g_i$
–The parameter of the likelihood, $\pi_i$
–ln(Likelihood), $L_i$
3. Sum the log-likelihoods and invoke the solver to maximize the sum by changing the parameters.
4. Multiply the maximized value by -2 for the test of significance of the regression.
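The worksheet procedure can be mimicked outside a spreadsheet. The following is a minimal sketch, assuming a single covariate and a small made-up data set; Newton-Raphson steps stand in for the spreadsheet solver, which is a substitution made for this sketch rather than the method the slides prescribe.

```python
import math

# Hypothetical ungrouped data: one covariate X and a 0/1 response Y.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Y = [0, 0, 1, 0, 1, 1, 0, 1]

def row(b0, b1, x, y):
    """One worksheet row: link g_i, probability pi_i, log-likelihood L_i."""
    g = b0 + b1 * x
    pi = 1.0 / (1.0 + math.exp(-g))
    L = y * math.log(pi) + (1 - y) * math.log(1.0 - pi)
    return g, pi, L

def total_loglik(b0, b1):
    """Sum of L_i over all observations (the cell the solver maximizes)."""
    return sum(row(b0, b1, x, y)[2] for x, y in zip(X, Y))

def fit(b0=0.0, b1=0.0, iterations=25):
    """Newton-Raphson steps standing in for the spreadsheet solver."""
    for _ in range(iterations):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for x, y in zip(X, Y):
            _, pi, _ = row(b0, b1, x, y)
            w = pi * (1.0 - pi)
            s0 += y - pi            # gradient with respect to b0
            s1 += (y - pi) * x      # gradient with respect to b1
            i00 += w                # information matrix entries
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

b0_hat, b1_hat = fit()
max_loglik = total_loglik(b0_hat, b1_hat)
minus_2_loglik = -2.0 * max_loglik  # input to the test of significance
print(b0_hat, b1_hat, max_loglik, minus_2_loglik)
```

Any general-purpose optimizer could replace `fit`; the rows and the summed log-likelihood are the parts that correspond directly to the worksheet.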
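Test of Significance
Hypotheses:
–$H_0$: $\beta_1 = \beta_2 = \dots = \beta_K = 0$
–$H_1$: At least one $\beta_j \neq 0$
Test statistic:
$\chi^2 = -2\,[\ln L(\text{intercept-only model}) - \ln L(\text{fitted model})]$, where each $\ln L$ is the maximized sum $\sum_i L_i$
Distribution under $H_0$: $\chi^2$ with DF = K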
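As a rough illustration of this test, the sketch below computes the likelihood-ratio statistic for a model with one covariate (K = 1). The response vector and the fitted log-likelihood are hypothetical numbers, and the critical values are the usual upper 5% points of the chi-square distribution taken from a table.

```python
import math

def null_loglik(y):
    """Maximized log-likelihood of the intercept-only model,
    where every pi_i equals the observed share of Y = 1."""
    n, s = len(y), sum(y)
    p = s / n
    return s * math.log(p) + (n - s) * math.log(1.0 - p)

def lr_statistic(loglik_null, loglik_fitted):
    """-2 times the difference of the two maximized log-likelihoods."""
    return -2.0 * (loglik_null - loglik_fitted)

# Upper 5% points of the chi-square distribution for DF = 1, 2, 3.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

Y = [0, 0, 1, 0, 1, 1, 0, 1]   # hypothetical responses
loglik_fitted = -3.2           # hypothetical maximized log-likelihood
K = 1                          # number of covariates in the fitted model

stat = lr_statistic(null_loglik(Y), loglik_fitted)
reject_h0 = stat > CHI2_CRIT_05[K]
print(f"chi-square = {stat:.3f}, reject H0 at 5%: {reject_h0}")
```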
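Standard Errors of Logistic Regression Coefficients (optional)
Estimate of the information matrix, I (K = 2):
$$I = \sum_i \hat\pi_i (1 - \hat\pi_i) \begin{pmatrix} 1 & X_{1i} & X_{2i} \\ X_{1i} & X_{1i}^2 & X_{1i} X_{2i} \\ X_{2i} & X_{1i} X_{2i} & X_{2i}^2 \end{pmatrix}$$
The standard error of each coefficient is the square root of the corresponding diagonal element of $I^{-1}$.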
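Deviance Residuals and Deviance for Logistic Regression (Optional)
Deviance (corresponds to SSE):
$D = -2 \sum_i L_i = \sum_i d_i^2$
Deviance residual:
$d_i = \mathrm{sign}(Y_i - \hat\pi_i) \sqrt{-2 L_i}$
Here $L_i = Y_i \ln \hat\pi_i + (1 - Y_i) \ln(1 - \hat\pi_i)$ is the log-likelihood of the i-th ungrouped observation at the fitted probability $\hat\pi_i$.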
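A short sketch of how deviance residuals and the deviance could be computed for ungrouped data follows. The fitted probabilities are hypothetical values chosen for illustration, not the output of any particular fit.

```python
import math

def deviance_residual(y, pi_hat):
    """Signed square root of -2 L_i for one ungrouped 0/1 observation."""
    L = y * math.log(pi_hat) + (1 - y) * math.log(1.0 - pi_hat)
    sign = 1.0 if y == 1 else -1.0   # sign of (y - pi_hat) for 0 < pi_hat < 1
    return sign * math.sqrt(-2.0 * L)

# Hypothetical responses and fitted probabilities.
Y = [0, 0, 1, 1]
PI_HAT = [0.2, 0.6, 0.5, 0.9]

residuals = [deviance_residual(y, p) for y, p in zip(Y, PI_HAT)]
deviance = sum(d * d for d in residuals)
print(residuals, deviance)
```

Large residuals in absolute value flag observations that the fitted model predicts poorly, in the same way large residuals do in least squares.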
B. Logistic Regression for Grouped Data Using WLS
The observation for the i-th group: $Y_i$ choices of category 1 out of $n_i$ decision makers -> observed proportion $p_i = Y_i / n_i$
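WLS for Logistic Regression
Regress: $\ln\frac{p_i}{1 - p_i}$ on $X_{1i}, \dots, X_{Ki}$ with weight $w_i = n_i p_i (1 - p_i)$
Here $p_i$ is the observed proportion choosing category 1 in group i, and $n_i$ is the group size. The variance of the observed logit is approximately $1 / (n_i p_i (1 - p_i))$, so $w_i$ is its reciprocal.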
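The grouped-data regression can be sketched with a closed-form weighted least squares fit for a single covariate. The group sizes and counts below are made up for illustration, and the sketch assumes every observed proportion lies strictly between 0 and 1 so that the logit is defined.

```python
import math

# Hypothetical grouped data: covariate value, group size, number choosing category 1.
X = [1.0, 2.0, 3.0, 4.0]
N = [20, 20, 20, 20]
COUNT = [4, 8, 12, 16]

# Observed proportions, observed logits, and WLS weights n_i * p_i * (1 - p_i).
p = [c / n for c, n in zip(COUNT, N)]
z = [math.log(pi / (1.0 - pi)) for pi in p]
w = [n * pi * (1.0 - pi) for n, pi in zip(N, p)]

# Weighted means of X and of the observed logits.
w_sum = sum(w)
x_bar = sum(wi * xi for wi, xi in zip(w, X)) / w_sum
z_bar = sum(wi * zi for wi, zi in zip(w, z)) / w_sum

# Weighted least squares slope and intercept for one covariate.
slope = (sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, X, z))
         / sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, X)))
intercept = z_bar - slope * x_bar
print(intercept, slope)
```

With more than one covariate the same weights would be passed to a general weighted regression routine instead of these closed-form expressions.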
WLS for Unequal Variance Data
[Figure: scatter plot of Y against X, with observations 1 and 2 marked]
Observation 2 is subject to a larger variance than observation 1, so it makes sense to give it a lower weight. In WLS, the weight is proportional to 1/variance.
Modeling of Forecasting Choices - GLM
1. Model for Observation of the Dependent Variable: a probability distribution
2. Link Function (Model for Independent Variables): a mathematical function
Forecasting Choices
# of Choices
–2: Binomial Distr.
–More than 2 (> 2): Multinomial Distr., Unordered or Ordered
Multinomial Logit Regression
Multinomial Choice (m = 3), Ungrouped Data:
–$Y_1 = 1$: Choose category 1, with probability $\pi_1$
–$Y_1 = 0$: Choose category 2 or 3, with probability $1 - \pi_1$
–$Y_2 = 1$: Choose category 2, with probability $\pi_2$
–$Y_2 = 0$: Choose category 1 or 3, with probability $1 - \pi_2$
–$Y_3 = 1$: Choose category 3, with probability $\pi_3$
–$Y_3 = 0$: Choose category 1 or 2, with probability $1 - \pi_3$
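Log Likelihood Function
Log-likelihood function of the i-th ungrouped observation (subscript i added to each Y and $\pi$):
$L_i = Y_{1i} \ln \pi_{1i} + Y_{2i} \ln \pi_{2i} + Y_{3i} \ln \pi_{3i}$
MLE: Maximize $\sum_i L_i$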
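$Y_3$ and $\pi_3$ can be omitted
Because exactly one category is chosen, $Y_3 = 1 - Y_1 - Y_2$ and $\pi_3 = 1 - \pi_1 - \pi_2$.
Multinomial Choice (m = 3), Ungrouped Data:
–$Y_1 = 1$: Choose category 1, with probability $\pi_1$
–$Y_1 = 0$: Choose category 2 or 3, with probability $1 - \pi_1$
–$Y_2 = 1$: Choose category 2, with probability $\pi_2$
–$Y_2 = 0$: Choose category 1 or 3, with probability $1 - \pi_2$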
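Log Likelihood Function
Log-likelihood function of the i-th (ungrouped) observation:
$L_i = Y_{1i} \ln \pi_{1i} + Y_{2i} \ln \pi_{2i} + (1 - Y_{1i} - Y_{2i}) \ln(1 - \pi_{1i} - \pi_{2i})$
MLE: Maximize $\sum_i L_i$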
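1: Formulating “Link” Functions: Unordered Choice Categories
Category 3 as the baseline category:
$g_1 = \ln(\pi_1 / \pi_3) = \beta_{01} + \beta_{11} X_1 + \dots + \beta_{K1} X_K$
$g_2 = \ln(\pi_2 / \pi_3) = \beta_{02} + \beta_{12} X_1 + \dots + \beta_{K2} X_K$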
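From Link Functions to Probabilities
With $g_1 = \ln(\pi_1 / \pi_3)$ and $g_2 = \ln(\pi_2 / \pi_3)$:
$\pi_1 = \frac{\exp(g_1)}{1 + \exp(g_1) + \exp(g_2)}$
$\pi_2 = \frac{\exp(g_2)}{1 + \exp(g_1) + \exp(g_2)}$
$\pi_3 = \frac{1}{1 + \exp(g_1) + \exp(g_2)}$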
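The conversion from the two link functions to the three category probabilities, with category 3 as the baseline, can be sketched as follows. The link values passed in are hypothetical, and the function name is a choice made for this sketch.

```python
import math

def multinomial_probs(g1, g2):
    """Probabilities of categories 1, 2, 3 from the two baseline-category
    links g1 = ln(pi1 / pi3) and g2 = ln(pi2 / pi3)."""
    denom = 1.0 + math.exp(g1) + math.exp(g2)
    return math.exp(g1) / denom, math.exp(g2) / denom, 1.0 / denom

# Hypothetical link values for one decision maker.
pi1, pi2, pi3 = multinomial_probs(0.4, -0.7)
print(pi1, pi2, pi3, pi1 + pi2 + pi3)
```

The three probabilities always sum to 1, and the log of each ratio to the baseline probability returns the corresponding link value.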
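Test of Significance
Hypotheses:
–$H_0$: $\beta_{11} = \beta_{21} = \dots = \beta_{K1} = \beta_{12} = \beta_{22} = \dots = \beta_{K2} = 0$
–$H_1$: At least one $\beta_{jk} \neq 0$
Test statistic:
$\chi^2 = -2\,[\ln L(\text{intercept-only model}) - \ln L(\text{fitted model})]$, where each $\ln L$ is the maximized sum $\sum_i L_i$
Distribution under $H_0$: $\chi^2$ with DF = 2K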
Interpreting Coefficients
Not easy: a change in the probability of one category also changes the probabilities of the other two categories.
2: Formulating Link Functions: Ordered Choice Categories
Underlying variable defining the categories
[Figure: axis of the underlying variable, split by two cut points into Category 1, Category 2, and Category 3]
Choices for Probability Distribution of U
a. Ordered Probit Model for the i-th DM: $U_i$ follows $N(\mu_i, \sigma = 1)$
b. Ordered Logit Model for the i-th DM: $U_i$ follows Logistic Distribution($\mu_i$)
$\mu_i = \beta_1 X_{1i} + \beta_2 X_{2i}$ (no constant)
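a. Ordered Probit Model
With the two cut points written here as $\alpha_1 < \alpha_2$ and $\Phi$ the standard normal CDF:
–P(Category 1) = $\Phi(\alpha_1 - \mu_i)$
–P(Category 2) = $\Phi(\alpha_2 - \mu_i) - \Phi(\alpha_1 - \mu_i)$
–P(Category 3) = $1 - \Phi(\alpha_2 - \mu_i)$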
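b. Ordered Logit Model
With the two cut points written here as $\alpha_1 < \alpha_2$ and $F(z) = 1 / (1 + \exp(-z))$ the logistic CDF:
–P(Category 1) = $F(\alpha_1 - \mu_i)$
–P(Category 2) = $F(\alpha_2 - \mu_i) - F(\alpha_1 - \mu_i)$
–P(Category 3) = $1 - F(\alpha_2 - \mu_i)$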
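To make the ordered models concrete, here is a minimal sketch that turns an underlying mean and two cut points into three category probabilities under either distribution. The coefficient values, covariates, and cut points are hypothetical, and the names `cut1` and `cut2` are labels introduced for this sketch.

```python
import math

def normal_cdf(z):
    """Standard normal CDF (ordered probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Standard logistic CDF (ordered logit)."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_probs(mu, cut1, cut2, cdf):
    """Probabilities of categories 1, 2, 3 when U has location mu
    and the cut points satisfy cut1 < cut2."""
    below_cut1 = cdf(cut1 - mu)
    below_cut2 = cdf(cut2 - mu)
    return below_cut1, below_cut2 - below_cut1, 1.0 - below_cut2

# Hypothetical coefficients (no constant), covariates, and cut points.
beta1, beta2 = 0.8, -0.3
x1, x2 = 1.5, 2.0
mu = beta1 * x1 + beta2 * x2
cut1, cut2 = 0.0, 1.0

probit_probs = ordered_probs(mu, cut1, cut2, normal_cdf)
logit_probs = ordered_probs(mu, cut1, cut2, logistic_cdf)
print(probit_probs, logit_probs)
```

Raising `mu` shifts probability from category 1 toward category 3 in both models, which is what makes the coefficients easier to read than in the unordered case.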
Types of Variable
Variable
–Quantitative: Continuous, Discrete (counting)
–Qualitative: Ordinal, Nominal
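Poisson Regression for Counting
Model of observations for Y: $Y_i$ is Poisson($\lambda_i$)
Link function: $\ln \lambda_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki}$
Log-likelihood function: $L_i = Y_i \ln \lambda_i - \lambda_i - \ln(Y_i!)$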
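The three pieces of the Poisson model can be sketched as below, assuming a log link and writing the Poisson mean as `lam`. The coefficients and data are hypothetical; maximizing the summed log-likelihood over the coefficients would proceed as in the logistic case.

```python
import math

def poisson_rate(beta, x):
    """Mean of Y from the log link: ln(lam) = beta0 + beta1*x1 + ..."""
    return math.exp(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))

def poisson_loglik(y, lam):
    """Log-likelihood of one count y given the Poisson mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)  # lgamma(y + 1) = ln(y!)

# Hypothetical coefficients and data (one covariate per observation).
beta = [0.2, 0.1]
X = [[1.0], [2.0], [3.0]]
Y = [1, 2, 1]

total_loglik = sum(poisson_loglik(y, poisson_rate(beta, x)) for x, y in zip(X, Y))
print(total_loglik)
```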