Predicting Individual Responses Using Multinomial Logit Analysis


Modeling an individual's response to marketing effort: the BookBinders Book Club case.

The Logit Model

The objective of the model is to predict the probabilities that an individual will choose each of several choice alternatives (e.g., buy versus not buy; select from among three brands A, B, and C). The model has the following properties:

- The probabilities lie between 0 and 1, and sum to 1.
- The model is consistent with the proposition that customers pick the choice alternative that offers them the highest utility on a purchase occasion, where utility has a random component that varies from one purchase occasion to the next.
- The model has the proportional-draw property: each choice alternative draws share from the other alternatives in proportion to their attractiveness.

Technical Specification of the Multinomial Logit Model

Individual i's probability of choosing brand 1 (Pi1) is given by:

    Pi1 = exp(Ai1) / [exp(Ai1) + exp(Ai2) + ... + exp(Aim)]

where Aij is the "attractiveness" of alternative j to customer i:

    Aij = Σk wk bijk

Here bijk is the value (observed or measured) of variable k (e.g., price) for alternative j when customer i made a purchase, and wk is the importance weight associated with variable k (estimated by the model). Similar equations can be specified for the probabilities that customer i will choose the other alternatives.
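The choice probabilities above can be sketched in a few lines of Python. This is a minimal illustration of the formula, not part of the case software; the attractiveness values are the ones computed in the three-brand example later in the deck.

```python
import math

def mnl_probabilities(attractiveness):
    """Multinomial logit: Pij = exp(Aij) / sum over j' of exp(Aij')."""
    exps = [math.exp(a) for a in attractiveness]
    total = sum(exps)
    return [e / total for e in exps]

# Attractiveness scores for brands A, B, C (from the example computations slide)
probs = mnl_probabilities([4.70, 3.30, 4.35])
# The probabilities lie in (0, 1) and sum to 1, as the model requires.
```

Note that the exponential transform guarantees every probability is strictly positive, however negative an attractiveness score might be.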

Technical Specification of the Multinomial Logit Model

On each purchase occasion, the (unobserved) utility that customer i gets from alternative j is given by:

    Uij = Aij + εij

where εij is an error term. Notice that utility is the sum of an observable term (Aij) and an unobservable term (εij).

Example: Choosing Among Three Brands

Attribute values bijk:

    Brand     Performance  Quality  Variety  Value
    A         0.7          0.5      0.7      0.7
    B         0.3          0.4      0.2      0.8
    C         0.6          0.8      0.7      0.4
    D (new)   0.6          0.4      0.8      0.5

    Estimated importance
    weight (wk):
              2.0          1.7      1.3      2.2

Example Computations

                             (c) Share     (d) Share    (e) Draw
    Brand  (a) Aij  (b) exp(Aij)  estimate      estimate     (c) – (d)
           = Σk wk bijk           without       with
                                  new brand     new brand
    A      4.70     109.9         0.512         0.407        0.105
    B      3.30     27.1          0.126         0.100        0.026
    C      4.35     77.5          0.362         0.287        0.075
    D      4.02     55.7          —             0.206        —
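The whole worked example can be reproduced directly from the attribute table and importance weights. A minimal Python sketch (brand B's Value attribute, truncated on the original slide, is back-solved here from AB = 3.30):

```python
import math

weights = [2.0, 1.7, 1.3, 2.2]  # performance, quality, variety, value
attrs = {
    "A": [0.7, 0.5, 0.7, 0.7],
    "B": [0.3, 0.4, 0.2, 0.8],  # Value = 0.8 back-solved from AB = 3.30
    "C": [0.6, 0.8, 0.7, 0.4],
    "D": [0.6, 0.4, 0.8, 0.5],  # the new brand
}

# Attractiveness Aij = sum over k of wk * bijk
A = {b: sum(w * x for w, x in zip(weights, xs)) for b, xs in attrs.items()}
exp_A = {b: math.exp(a) for b, a in A.items()}

def shares(brands):
    """Logit market shares among the given choice set."""
    total = sum(exp_A[b] for b in brands)
    return {b: exp_A[b] / total for b in brands}

before = shares(["A", "B", "C"])           # shares without the new brand
after = shares(["A", "B", "C", "D"])       # shares with the new brand
draw = {b: before[b] - after[b] for b in ["A", "B", "C"]}
```

Running this reproduces the table: each incumbent loses share to D roughly in proportion to its pre-entry share, which is the proportional-draw property in action.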

An Important Logit Model Implication

[Figure: the marginal impact of a marketing action plotted against the probability of choosing alternative 1. The marginal impact is low when the choice probability is near 0 or 1, and highest when the probability is near 0.5.]

Quote for the Day You will lose money sending a terrific piece of mail to a lousy list, but make money sending a lousy piece of mail to a terrific list! -- Direct mail lore

MNL Model of Response to Direct Mail

    Probability of responding
    to direct mail solicitation  =  f(past response behavior, marketing effort,
                                      characteristics of customers)

BookBinders Book Club Case

Predict response to a mailing for the "Art History of Florence" based on the following variables:

- Gender
- Amount purchased
- Months since first purchase
- Months since last purchase
- Frequency of purchase
- Past purchases of art books
- Past purchases of children's books
- Past purchases of cook books
- Past purchases of DIY books
- Past purchases of youth books

Scoring Using Current Industry Practice

The dominant "scoring rule" used in the industry is the RFM (Recency, Frequency, and Monetary) model. Example recency scoring:

    Last purchased in the past 3 months       25 points
    Last purchased 3–6 months ago             20
    Last purchased 6–9 months ago             10
    Last purchased 12–18 months ago            5
    Last purchased more than 18 months ago     0

Come up with similar scoring rules for Frequency and Monetary. For each customer, add up his/her score on each of the three components (recency, frequency, and monetary) to compute an overall score.
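The recency bands above translate directly into a lookup function. A minimal sketch (the slide's bands jump from 6–9 months to 12–18 months; months 9–12 are scored 5 here by extending the adjacent band, which is an assumption, not part of the slide):

```python
def recency_score(months_since_last_purchase):
    """Recency component of an RFM score, per the bands on the slide."""
    if months_since_last_purchase <= 3:
        return 25
    if months_since_last_purchase <= 6:
        return 20
    if months_since_last_purchase <= 9:
        return 10
    if months_since_last_purchase <= 18:
        # The slide leaves the 9-12 month band unspecified; treated here
        # as part of the 12-18 band (an assumption).
        return 5
    return 0
```

Analogous step functions for frequency and monetary value would be summed with this one to produce each customer's overall RFM score.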

Scoring Based on Regression

Regression model:

    Pij = w0 + Σk wk bijk + εij

where Pij is the probability that individual i will choose alternative j, the wk are the regression coefficients, and the bijk are the independent variables described earlier. Note that Pij computed this way need not lie between 0 and 1.
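The caveat at the end is easy to demonstrate: a linear score has no mechanism to stay inside [0, 1]. A tiny sketch with illustrative (made-up) coefficients and inputs:

```python
# Linear probability score: P = w0 + sum over k of wk * bk.
# All coefficient and input values below are illustrative, not from the case.
w0 = 0.05
w = [0.4, 0.3]

def linear_score(b):
    """Regression-style score; unlike the logit, it is not bounded to [0, 1]."""
    return w0 + sum(wk * bk for wk, bk in zip(w, b))

score = linear_score([2.0, 1.5])  # exceeds 1, so it is not a valid probability
```

This is the key structural difference from the logit model, whose exponential form guarantees valid probabilities for any inputs.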

Scoring Model Using Artificial Neural Networks

- What is a neural network?
- Determinants of network properties
- Description of a feed-forward network with back-propagation
- Potential value of neural networks

Artificial Neural Networks

An artificial neural network is a general response model that relates inputs (e.g., advertising) to outputs (e.g., product awareness). The modeler need not specify the functional form of this relationship. A neural net attempts to mimic how the human brain processes input information, and consists of a richly interlinked set of simple processing mechanisms (nodes).

Characteristics of Biological Neural Networks

- Massively parallel
- Distributed representation and computation
- Learning ability
- Generalization ability
- Adaptivity
- Inherent contextual information
- Fault tolerance
- Low energy consumption

An Example Artificial Neural Network

[Diagram: inputs feed through "synapses" (weighted connections) into neurons, which produce outputs. Inputs: in humans, sensory data; in 4Thought, advertising, selling effort, price, etc. Outputs: in humans, muscular reflexes; in 4Thought, a sales model.]

Determinants of the Behavior of an Artificial Neural Network

- Network properties (whether the network is feedforward or feedback; the number of nodes, the number of layers, and the order of connections between nodes).
- Node properties (threshold, activation range, transfer function).
- System dynamics (initial weights, learning rule).

Processing Mechanism of Individual Neurons

Each neuron converts its input signals into an overall signal value by weighting and summing the incoming signals:

    Z = Σi Wi Xi

It then transforms the overall signal value into an output signal (Y) using a "transfer function."

Transfer Function Formulations

Hard limiter (Y = 1 if Z ≥ T; else Y = 0)

Sigmoidal (0 ≤ Y ≤ 1):

    Y = g(Z) = 1 / (1 + e^–(Z–T))

Tanh (–1 ≤ Y ≤ 1):

    Y = g(Z) = tanh(Z – T)
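The three transfer functions above are one-liners. A minimal sketch, with the threshold T passed as a parameter exactly as in the formulas:

```python
import math

def hard_limiter(z, t=0.0):
    """Y = 1 if Z >= T, else 0."""
    return 1.0 if z >= t else 0.0

def sigmoid(z, t=0.0):
    """Y = 1 / (1 + e^-(Z - T)); output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(z - t)))

def tanh_unit(z, t=0.0):
    """Y = tanh(Z - T); output in (-1, 1)."""
    return math.tanh(z - t)
```

The sigmoid and tanh are smooth, differentiable approximations of the hard limiter, which is what makes gradient-based training (back-propagation) possible.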

Role of Hidden Units in a Two-Dimensional Input Space

    Structure       Decision regions
    Single layer    Half plane bounded by a hyperplane
    Two layers      Arbitrary (complexity limited by the number of hidden units)
    Three layers    Arbitrary (complexity limited by the number of hidden units)

(Example problems illustrated on the slide: the exclusive-or problem, classes with meshed regions, and general region shapes.)

System Dynamics (Learning Mechanism)

Supervised learning using back-propagation of errors. The goal of this process is to reduce the total error at the output nodes:

    EP = Σk (tPk – OPk)²

where EP is the error to be minimized, tPk is the target value associated with the kth input values at the output nodes, and OPk is the output of the neural net as calculated from the current set of weights.

Error Propagation

The error is calculated at each node for each input set k. The error at an output node is

    δiL = g′(ZiL)[tiL – YiL]

where tiL is the target value at the ith output node (layer L of the network), δiL is the error to be back-propagated from node i in layer L, and g′ is the gradient (derivative) of the transfer function.

Error Propagation

The error is propagated back as follows:

    δil = g′(Zil) [ Σj wij,l+1 δj,l+1 ]    for l = L–1, ..., 1 (the Lth layer is the output layer)

The weights are then adjusted using an optimality rule (in conjunction with a learning rate) to minimize the overall error EP.
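The two delta formulas and the weight update can be traced through by hand for one tiny network. A minimal sketch with one output node fed by two hidden nodes; all weight and input values are illustrative, and the threshold is folded into the net input for brevity:

```python
import math

def g(z):
    """Sigmoid transfer function (threshold folded into the net input)."""
    return 1.0 / (1.0 + math.exp(-z))

def g_prime(z):
    """Derivative of the sigmoid: g'(z) = g(z) * (1 - g(z))."""
    y = g(z)
    return y * (1.0 - y)

# Illustrative values: weights into the output node and hidden net inputs.
w_out = [0.5, -0.3]
z_hidden = [0.2, 0.7]
y_hidden = [g(z) for z in z_hidden]

# Forward pass through the output node (layer L).
z_out = sum(w * y for w, y in zip(w_out, y_hidden))
y_out = g(z_out)

# Output-layer error: delta_iL = g'(Z_iL) * [t_iL - Y_iL]
target = 1.0
delta_out = g_prime(z_out) * (target - y_out)

# Back-propagated hidden-layer errors:
# delta_il = g'(Z_il) * sum over j of w_ij,l+1 * delta_j,l+1
delta_hidden = [g_prime(z) * w * delta_out
                for z, w in zip(z_hidden, w_out)]

# Gradient-descent weight update with learning rate eta.
eta = 0.5
w_out = [w + eta * delta_out * y for w, y in zip(w_out, y_hidden)]
```

Because the target (1.0) exceeds the current output, the output delta is positive and both output weights are nudged upward, reducing EP on this input.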

So, What’s the Big Deal?

With a sigmoidal transfer function and back-propagation, a neural network can “learn” to represent any sampled function to any required degree of accuracy, given a sufficient number of nodes and hidden layers. This lets us capture underlying relationships without knowing the form of the relationship in advance.

Some Successful Applications

- Recognizing handwritten characters (e.g., zip codes)
- Recognizing speech (e.g., Dragon’s Naturally Speaking software)
- Estimating response to direct mail operations

Predictions of Probability of Purchase

- RFM model: use the computed score as a measure of the probability of purchase.
- Regression: score each customer with the fitted linear model described earlier.
- MNL: compute each customer's choice probability with the logit formula.

The RFM and regression models can be implemented in Excel; in fact, all three of these scoring procedures for "probability of purchase" can be implemented in Excel.

Predictions of Probability of Purchase

Neural net: use the 4Thought software to compute the “choice probability.” Note that, as in regression, these predictions need not lie between 0 and 1. Follow the tutorial closely in doing this exercise.

Scoring Customers for Their Potential Profitability

               (A)           (B)          (C)        (D)
               Purchase      Purchase     Average    Score
    Customer   probability   volume       margin     = A × B × C
    1          30%           $31.00       0.70       6.51
    2           2%           $143.00      0.60       1.72
    3          10%           $54.00       0.67       3.62
    4           5%           $88.00       0.62       2.73
    5          60%           $20.00       0.58       6.96
    6          22%           $60.00       0.47       6.20
    7          11%           $77.00       0.38       3.22
    8          13%           $39.00       0.66       3.35
    9           1%           $184.00      0.56       1.03
    10          4%           $72.00       0.65       1.87

    Average expected score per customer = 3.72
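Column D is simply the product of the other three columns, i.e., each customer's expected margin if mailed. The full table can be reproduced with:

```python
# (purchase probability, purchase volume in $, margin) per customer,
# taken from the scoring table above.
customers = [
    (0.30, 31.00, 0.70), (0.02, 143.00, 0.60), (0.10, 54.00, 0.67),
    (0.05, 88.00, 0.62), (0.60, 20.00, 0.58), (0.22, 60.00, 0.47),
    (0.11, 77.00, 0.38), (0.13, 39.00, 0.66), (0.01, 184.00, 0.56),
    (0.04, 72.00, 0.65),
]

# Expected-profitability score: probability * volume * margin.
scores = [p * v * m for p, v, m in customers]
avg_score = sum(scores) / len(scores)  # average expected margin per customer
```

Sorting customers by this score (highest first) and mailing down the list until the score falls below the cost of a mailing is the economic logic behind the "mail to the top X%" tables that follow.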

Develop Tables Such as the Following (Example Shown for Mailing to the Top 60%)

Summary of Coefficients

Economics of Mailings

Note: If we mailed to everyone on the list, we could expect a response rate of 8.9%.