Consumer Behavior Prediction using Parametric and Nonparametric Methods Elena Eneva CALD Masters Presentation 19 August 2002 Advisors: Alan Montgomery, Rich Caruana, Christos Faloutsos

Outline
– Introduction
– Data
– Economics Overview
– Baseline Models
– New Hybrid Models
– Results
– Conclusions and Future Work

Background
– Retail chains are aiming to customize prices in individual stores
– Pricing strategies should adapt to the neighborhood demand
– Stores can increase operating profit margins by 33% to 83%

Price Elasticity
A consumer’s response to a price change:
E = (ΔQ/Q) / (ΔP/P)
where Q is the quantity purchased and P is the price of the product. Demand is inelastic when |E| < 1 and elastic when |E| > 1.

Data Example

Data Example – Log Space

Assumptions
– Independence
  – Substitutes: fresh fruit, other juices
  – Other stores
– Stationarity
  – Change over time
  – Holidays

“The” Model
A predictor (“I know your customers”) takes the prices of Products 1…N in a category (converted to ln space) and outputs the predicted quantity bought of each of Products 1…N (converted back to original space). This needs to be multiplied across many stores and many categories.

Converting to Original Space

Existing Methods
– Traditionally: parametric models (linear regression)
– Recently: non-parametric models (neural networks)

Our Goal
– Advantage of LR: known functional form (linear in log space), extrapolation ability
– Advantage of NN: flexibility, accuracy
– The new hybrids aim to combine the robustness of LR with the accuracy of NN
– Take advantage of both: use the known functional form to bias the NN
– Build hybrid models from the baseline models

Datasets
– weekly store-level cash register data at the product level
– Chilled Orange Juice category
– 2 years
– 12 products
– 10 randomly selected stores

Evaluation Measure: Root Mean Squared Error (RMS), the average deviation between the predicted quantity and the true quantity.
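A minimal sketch of this measure in Python (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def rms_error(q_true, q_pred):
    """Average deviation between predicted and true quantities."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return np.sqrt(np.mean((q_pred - q_true) ** 2))
```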

Models
Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Baselines
– Linear Regression
– Neural Networks

Baselines
– Linear Regression
– Neural Networks

Linear Regression
ln q = a + Σᵢ bᵢ ln pᵢ   (i = 1, …, K)
where q is the quantity demanded, pᵢ is the price of the i-th product, and there are K products overall. The coefficients a and bᵢ are determined by the condition that the sum of the squared residuals is as small as possible.
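A sketch of fitting this model by ordinary least squares, assuming weekly prices and quantities are held in numpy arrays (names are illustrative):

```python
import numpy as np

def fit_log_log_lr(prices, quantities):
    """Fit ln q = a + sum_i b_i ln p_i by least squares.

    prices:     (n_weeks, K) positive prices of the K products
    quantities: (n_weeks,)   positive quantities sold of the target product
    """
    X = np.column_stack([np.ones(len(quantities)), np.log(prices)])
    coef, *_ = np.linalg.lstsq(X, np.log(quantities), rcond=None)
    return coef[0], coef[1:]        # intercept a, coefficients b_i

def predict_log_q(a, b, prices):
    return a + np.log(prices) @ b   # prediction in ln space
```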

Results RMS

Neural Networks
– generic nonlinear function approximators
– a collection of basic units (neurons), each computing a (non)linear function of its input
– trained with backpropagation

Neural Networks: 1 hidden layer, 100 units, sigmoid activation function.
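A sketch of this architecture in PyTorch; the layer sizes and activation come from the slide, while the training loop and its hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

class BaselineNN(nn.Module):
    """One hidden layer of 100 sigmoid units, one linear output."""
    def __init__(self, n_products):
        super().__init__()
        self.hidden = nn.Linear(n_products, 100)
        self.out = nn.Linear(100, 1)

    def forward(self, log_prices):
        return self.out(torch.sigmoid(self.hidden(log_prices)))

def train(net, X, y, epochs=500, lr=1e-2):
    """Plain backpropagation on squared error (hyperparameters assumed)."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
```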

Results RMS

Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections

Smart Prior
Idea: start the NN at a “good” set of weights; help it start from a “smart” prior, taken from the known linearity.
– The NN is first trained on synthetic data generated by the LR model
– The NN is then trained on the real data
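Continuing the sketches above (reusing BaselineNN, train, and the LR coefficients a, b), the two training phases might look like this, with toy data standing in for the real inputs:

```python
import torch

torch.manual_seed(0)
K = 12
log_prices = torch.randn(500, K)                   # toy ln-price inputs

# Phase 1: pretrain on synthetic targets generated by the fitted LR
# model, so the NN starts from a "smart" (linear) prior.
b_t = torch.as_tensor(b, dtype=torch.float32).view(K, 1)
synthetic_log_q = float(a) + log_prices @ b_t

net = BaselineNN(K)
train(net, log_prices, synthetic_log_q)

# Phase 2: continue from those weights on the real data
# (toy targets here; in practice the observed ln quantities).
real_log_q = synthetic_log_q + 0.1 * torch.randn(500, 1)
train(net, log_prices, real_log_q)
```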

Smart Prior

Results RMS

Multitask Learning
Idea: learn an additional related task in parallel, using a shared representation.
– Add the output of the LR model (built over the same inputs) as an extra output of the NN
– Make the net share its hidden nodes between both tasks
– Custom halting function
– Custom RMS function
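A sketch of the shared-representation net in PyTorch; the auxiliary-task weight (and the custom halting and RMS functions the slide mentions) are not specified in the slides, so the loss below is an assumption:

```python
import torch
import torch.nn as nn

class MultiTaskNN(nn.Module):
    """Shared hidden layer with two outputs: the true quantity and,
    as an extra task, the LR model's prediction over the same inputs."""
    def __init__(self, n_products, n_hidden=100):
        super().__init__()
        self.hidden = nn.Linear(n_products, n_hidden)
        self.main = nn.Linear(n_hidden, 1)   # task: true ln quantity
        self.aux = nn.Linear(n_hidden, 1)    # extra task: LR output

    def forward(self, log_prices):
        h = torch.sigmoid(self.hidden(log_prices))
        return self.main(h), self.aux(h)

def mtl_loss(main_pred, aux_pred, y_true, y_lr, aux_weight=0.5):
    """Squared error on both tasks; only the main head is used at test time."""
    mse = nn.functional.mse_loss
    return mse(main_pred, y_true) + aux_weight * mse(aux_pred, y_lr)
```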

MultiTask Learning

Results RMS

Jumping Connections
Idea: fuse LR and NN by changing the architecture: add connections which “jump” over the hidden layer. This gives the effect of simulating an LR and an NN together.
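A sketch of the architecture in PyTorch; the shortcut layer below stands in for the “jumping” connections, so the output is the sum of a linear term and the usual hidden-layer term:

```python
import torch
import torch.nn as nn

class JumpingNN(nn.Module):
    """Inputs feed the hidden layer and also "jump" straight to the
    output, so the net behaves like an LR and an NN added together."""
    def __init__(self, n_products, n_hidden=100):
        super().__init__()
        self.hidden = nn.Linear(n_products, n_hidden)
        self.out = nn.Linear(n_hidden, 1)
        self.jump = nn.Linear(n_products, 1)   # linear shortcut over the hidden layer

    def forward(self, log_prices):
        nonlinear = self.out(torch.sigmoid(self.hidden(log_prices)))
        return nonlinear + self.jump(log_prices)
```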

Jumping Connections

Results RMS

Frozen Jumping Connections
Idea: you have the linearity, now use it! Same architecture as Jumping Connections, but really emphasizing the linearity: freeze the weights of the jumping layer, so the network can’t “forget” about the linearity.
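Continuing the sketch, one plausible reading: set the jumping layer to the fitted LR coefficients a, b and freeze it, so only the nonlinear part keeps learning:

```python
import torch

net = JumpingNN(n_products=12)
with torch.no_grad():
    net.jump.weight.copy_(torch.as_tensor(b, dtype=torch.float32).view(1, -1))
    net.jump.bias.fill_(float(a))
for p in net.jump.parameters():
    p.requires_grad = False   # the net cannot "forget" the linearity

# Train only the unfrozen parameters:
trainable = [p for p in net.parameters() if p.requires_grad]
opt = torch.optim.SGD(trainable, lr=1e-2)
```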

Frozen Jumping Connections

Results RMS

Models
Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Baselines
– Linear Regression
– Neural Networks
Combinations
– Voting
– Weighted Average

Combining Models
Idea: ensemble learning.
– Committee Voting: equal weights for each model’s prediction
– Weighted Average: optimal weights determined by a linear regression model
Combines the 2 baseline models and 3 of the hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections).

Committee Voting: average the predictions of the models.
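A minimal sketch of the committee (names illustrative):

```python
import numpy as np

def committee_vote(predictions):
    """Equal-weight average; predictions is (n_models, n_samples)."""
    return predictions.mean(axis=0)
```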

Results RMS

Weighted Average – Model Regression: linear regression on the baseline and hybrid models’ predictions to determine the vote weights.
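A sketch of learning the vote weights by regressing the true quantities on the models’ predictions, stacking-style (held-out predictions would be used in practice; names illustrative):

```python
import numpy as np

def fit_vote_weights(predictions, q_true):
    """predictions: (n_samples, n_models); returns (intercept, weights)."""
    X = np.column_stack([np.ones(len(q_true)), predictions])
    coef, *_ = np.linalg.lstsq(X, q_true, rcond=None)
    return coef[0], coef[1:]
```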

Results RMS

Normalized RMS Error
– Compare model performance across stores of different sizes, ages, locations, etc.; this requires normalization
– Compare to baselines: take the error of the LR benchmark as the unit error
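In code the normalization is a one-liner (a sketch):

```python
def normalized_rms(rms_model, rms_lr):
    """Error in units of the LR benchmark's error; < 1.0 beats it."""
    return rms_model / rms_lr
```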

Normalized RMS Error

Conclusions
– Clearly improved models for customer choice prediction
– Will allow stores to price products more strategically and optimize profits
– Maintain better inventories
– Understand product interaction

Future Work Ideas
– analyze the Weighted Average model
– compare the extrapolation ability of the new models
– use other domain knowledge
  – shrinkage model: a “super” store model with data pooled across all stores

Acknowledgements I would like to thank my advisors and my CALDling friends and colleagues

The Most Important Slide for this presentation and the paper:

References
– Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.
– West, P., Brockett, P. and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
– Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
– Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.

Error Measure – Unbiased Estimator (Model Details)
The models predict ŷ = ln q, and exp(ŷ) is a biased estimator for q. We correct the bias by computing the integral over the (lognormal) distribution, which gives exp(ŷ + σ²/2), an unbiased estimator for q.
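A sketch of this correction, assuming Gaussian residuals in ln space; sigma2 would be estimated from the training residuals (names illustrative):

```python
import numpy as np

def unbiased_quantity(log_q_pred, sigma2):
    """exp(y) is biased low for q when y predicts ln q with Gaussian
    noise of variance sigma2; exp(y + sigma2/2) removes the bias."""
    return np.exp(np.asarray(log_q_pred) + sigma2 / 2.0)
```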

On one hand… in log space, the price-quantity relationship is fairly linear.

On the other hand… nonparametric methods allow the derivation of consumers’ demand responses to price changes without the need to write down and rely upon a particular mathematical model of demand.