Published by Lambert Bryan. Modified over 9 years ago.
1
Model Building and Validation: An overview using the discriminant analysis technique
2
Assumption for this lecture
There are several types of models, but this lecture assumes we are building one with a two-valued dependent variable.
–e.g. We want to predict who will respond to a mailing. The dependent variable has two values: responders/non-responders.
–e.g. We want to predict who is at risk for a heart attack. The dependent variable is: had a heart attack/did not have a heart attack.
3
What will it tell us?
The model is built using past data to generate a score that predicts the likelihood of something occurring or not.
–(What is the probability that this person will respond to the mailing?)
4
The Modeling Process
–Sample design
–Data collection and cleaning
–Sample selection
–Data aggregation
–Build the model
–Test the model
5
Sample Design
–What data do you need?
–Where is it?
–How much is needed?
–What is the dependent variable?
6
Data Collection and Cleaning
–Read and validate the data.
–Deal with missing values.
–Delete unwanted records and variables.
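The three cleaning steps can be sketched in plain Python. This is only an illustration: the records, the field names, the 18 to 110 age range, and the choice to fill a missing age with the median are all assumptions, not part of the lecture.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 22, "trades": 5, "responded": 0},
    {"id": 2, "age": None, "trades": 8, "responded": 1},   # missing age
    {"id": 3, "age": 450, "trades": 7, "responded": 1},    # fails validation
    {"id": 4, "age": 69, "trades": 4, "responded": None},  # no outcome recorded
]

def is_valid_age(age):
    # Assumed validation rule: ages outside 18..110 are data errors.
    return 18 <= age <= 110

def clean(records, keep=("age", "trades", "responded")):
    """Validate records, fill missing ages, drop unwanted records and variables."""
    valid_ages = [r["age"] for r in records
                  if r["age"] is not None and is_valid_age(r["age"])]
    fill_age = median(valid_ages)  # assumed strategy for missing ages
    cleaned = []
    for r in records:
        if r["responded"] is None:
            continue  # a record without the dependent variable cannot be used
        age = fill_age if r["age"] is None else r["age"]
        if not is_valid_age(age):
            continue  # delete records that fail validation
        row = dict(r, age=age)
        cleaned.append({k: row[k] for k in keep})  # keep only wanted variables
    return cleaned

cleaned = clean(raw)
```

In practice the validation rules and the treatment of missing values depend on the data source; the point here is only the order of operations.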
7
Selecting a sample
–Choose a sample to analyze.
–For 0/1 regression (the discriminant analysis equivalent), use approximately equal numbers of records of each type.
–Select twice the number of records you need to build the model, so you can set aside 50% of the data for validation.
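A minimal sketch of this selection step, assuming records are dictionaries with a 0/1 field named `y` (a hypothetical name). It draws the same number of 0s and 1s, then puts half of each class in the analysis sample and the other half in the validation sample.

```python
import random

def balanced_split(records, target="y", seed=0):
    """Draw equal numbers of 0s and 1s, then split each class 50/50."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    zeros = [r for r in records if r[target] == 0]
    ones = [r for r in records if r[target] == 1]
    n = min(len(zeros), len(ones))  # the rarer class limits the sample size
    zeros, ones = rng.sample(zeros, n), rng.sample(ones, n)
    half = n // 2
    analysis = zeros[:half] + ones[:half]
    validation = zeros[half:] + ones[half:]
    return analysis, validation

# Hypothetical population: 100 records, 20 of them responders.
records = [{"id": i, "y": 1 if i < 20 else 0} for i in range(100)]
analysis, validation = balanced_split(records)
```

Splitting within each class, rather than splitting the pooled sample, keeps both halves balanced.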
8
Data Aggregation
Data from multiple sources are merged.
–This may occur as a first step before data cleaning, depending on the situation.
New variables are defined.
–(e.g. ratio of satisfactory trades to total trades).
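As an illustration of both steps, the sketch below merges two hypothetical sources keyed by an account id and derives the ratio of satisfactory trades to total trades. The source names, field names, and values are invented for the example.

```python
# Hypothetical source 1: demographic data keyed by account id.
accounts = {101: {"age": 22}, 102: {"age": 36}, 103: {"age": 45}}

# Hypothetical source 2: trade counts keyed by the same id.
trade_summary = {
    101: {"sat_trades": 3, "total_trades": 4},
    102: {"sat_trades": 0, "total_trades": 0},
}

def merge_and_derive(accounts, trade_summary):
    """Inner-join the two sources and add the derived ratio variable."""
    merged = {}
    for key, person in accounts.items():
        if key not in trade_summary:
            continue  # keep only ids present in both sources
        row = {**person, **trade_summary[key]}
        total = row["total_trades"]
        # The ratio is undefined when there are no trades at all.
        row["sat_ratio"] = row["sat_trades"] / total if total else None
        merged[key] = row
    return merged

merged = merge_and_derive(accounts, trade_summary)
```

Whether unmatched records should be dropped or kept with missing values is a design decision; the inner join here is just one option.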
9
Model Building
–Break up each independent variable into classes. Each class should have roughly 2 to 10% of the observations.
–Run crosstabs of each variable with the dependent variable.
–Redefine the independent variable as multiple dummy (0/1) variables.
–Run the regression with the dummies.
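The crosstab step can be sketched as follows. The class labels and outcomes are made-up toy data, far too few to meet the 2 to 10% guideline; the function simply reports, for each class, the counts of 0s and 1s, the class's share of all observations, and the proportion of 1s.

```python
from collections import Counter

def crosstab(classes, y):
    """Cross-tabulate a classed independent variable against a 0/1 outcome."""
    counts = Counter(zip(classes, y))
    table = {}
    for c in sorted(set(classes)):
        n0, n1 = counts[(c, 0)], counts[(c, 1)]
        total = n0 + n1
        table[c] = {
            "n0": n0,
            "n1": n1,
            "share": total / len(y),  # fraction of all observations in the class
            "rate": n1 / total,       # proportion of 1s within the class
        }
    return table

# Toy data, for illustration only.
age_class = ["18-30", "18-30", "31-55", "31-55", "56+", "56+"]
outcome = [0, 0, 1, 1, 0, 1]
table = crosstab(age_class, outcome)
```

Classes whose `rate` values are close to each other are natural candidates for merging before the dummies are defined.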
10
Example: Data looks like this
Bad/Good (Y)   Age (X1)   # Trades (X2)   Ratio of Sat. Trades to Total Trades (X3)
0              22         5               35.5%
1              36         8               86.8%
1              45         7               100%
0              69         4               56.4%
11
It is transformed to look like this:
Bad/Good (Y)   Age01 (18 to 30) (X1)   Age02 (40 to 55) (X2)   Age03 (56+) (X3)
0              1                       0                       0
1              0                       0                       0
1              0                       1                       0
0              0                       0                       1
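The transformation for the age variable can be sketched as below, using the class boundaries printed on the slide. The function and constant names are hypothetical. An age that falls in none of the listed classes gets 0 in every dummy, which is what the slide shows for age 36.

```python
# Class boundaries as printed on the slide; None means no upper limit.
AGE_CLASSES = [("Age01", 18, 30), ("Age02", 40, 55), ("Age03", 56, None)]

def age_dummies(age):
    """Return one 0/1 dummy per age class for a single age value."""
    row = {}
    for name, low, high in AGE_CLASSES:
        in_class = age >= low and (high is None or age <= high)
        row[name] = int(in_class)
    return row

ages = [22, 36, 45, 69]  # the four ages from the example data
rows = [age_dummies(a) for a in ages]
```

The same pattern applies to the other independent variables once their classes have been chosen.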
12
Model Building, contd.
Eliminate variables that are not significant, until you have a model whose variables are all significant and intuitively meaningful.
13
Testing the model
Perform the Kolmogorov-Smirnov (K-S) test to see how well the model performs on:
–The analysis sample
–The validation sample
–The total sample
If it separates the 0s and the 1s well in each of the three cases, you have a good model.
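One common way to quantify this separation is the two-sample K-S statistic: the largest gap between the cumulative score distributions of the 0s and the 1s. The sketch below assumes that definition; the scores are made up, and the lecture does not say what value counts as "separating well".

```python
def ks_statistic(scores, y):
    """Largest gap between the cumulative score distributions of 0s and 1s."""
    s0 = [s for s, t in zip(scores, y) if t == 0]
    s1 = [s for s, t in zip(scores, y) if t == 1]
    best = 0.0
    for cut in sorted(set(scores)):
        f0 = sum(s <= cut for s in s0) / len(s0)  # share of 0s at or below cut
        f1 = sum(s <= cut for s in s1) / len(s1)  # share of 1s at or below cut
        best = max(best, abs(f0 - f1))
    return best

# Made-up model scores and actual outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
actual = [0, 0, 0, 1, 0, 1, 1, 1]
ks = ks_statistic(scores, actual)
```

The statistic ranges from 0 (the two groups have identical score distributions) to 1 (the scores separate the groups completely). Computing it separately on the analysis, validation, and total samples shows whether the separation holds up on data the model was not built on.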