Model Building and Validation An overview using the discriminant analysis technique.


1 Model Building and Validation An overview using the discriminant analysis technique

2 Assumption for this lecture
There are several types of models, but this lecture assumes we are building one with a two-valued (binary) dependent variable.
– e.g. We want to predict who will respond to a mailing: the dependent variable has two values, responders/non-responders.
– e.g. We want to predict who is at risk for a heart attack: the dependent variable is had a heart attack/did not have a heart attack.

3 What will it tell us?
The model is built from past data and generates a score that predicts how likely an event is to occur.
– (What is the probability that this person will respond to the mailing?)

4 The Modeling Process
– Sample design
– Data collection and cleaning
– Sample selection
– Data aggregation
– Build the model
– Test the model

5 Sample Design
– What data do you need?
– Where is it?
– How much is needed?
– What is the dependent variable?

6 Data Collection and Cleaning
– Read and validate the data.
– Deal with missing values.
– Delete unwanted records and variables.
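The cleaning steps on this slide could be sketched roughly as follows. This is only an illustration: the field names, the fill values, and the valid ranges are all made-up assumptions, and the slide does not say how missing values should be handled (filling with a default is just one option).

```python
# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"id": 1, "age": 22, "trades": 5, "responded": 0},
    {"id": 2, "age": None, "trades": 8, "responded": 1},
    {"id": 3, "age": 45, "trades": 7, "responded": None},
    {"id": 4, "age": 230, "trades": 4, "responded": 0},
]


def clean(records, keep_fields, fill_values, valid_ranges, target):
    """Validate records, fill missing predictors, drop unusable rows and fields."""
    cleaned = []
    for rec in records:
        # A record with no dependent variable cannot be used for modelling.
        if rec.get(target) is None:
            continue
        row = {}
        valid = True
        for field in keep_fields:
            value = rec.get(field)
            if value is None:
                # Assumed policy: replace a missing value with a default.
                value = fill_values[field]
            low, high = valid_ranges[field]
            if not (low <= value <= high):
                # Out-of-range value: treat the whole record as invalid.
                valid = False
                break
            row[field] = value
        if valid:
            row[target] = rec[target]
            cleaned.append(row)
    return cleaned


cleaned = clean(
    raw_records,
    keep_fields=["age", "trades"],           # "id" is dropped as unwanted
    fill_values={"age": 40, "trades": 0},    # assumed defaults
    valid_ranges={"age": (18, 110), "trades": (0, 500)},
    target="responded",
)
for row in cleaned:
    print(row)
```

Here record 3 is dropped because its outcome is missing, record 4 because its age fails validation, and record 2 keeps its place with the assumed default age.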

7 Selecting a sample
– Choose a sample to analyze.
– For 0/1 regression (the discriminant analysis equivalent), use approximately equal numbers of records of each type.
– Select twice the number of records you need to build the model, so that you can set aside 50% of the data for validation.
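One way to draw such a sample is sketched below. It assumes the records are dictionaries with a 0/1 field; the function name `balanced_split`, the field name `responded`, and the toy population are hypothetical and only serve the illustration.

```python
import random


def balanced_split(records, target, n_per_class, seed=0):
    """Draw 2 * n_per_class records of each type, half for analysis, half for validation."""
    rng = random.Random(seed)
    analysis, validation = [], []
    for value in (0, 1):
        group = [r for r in records if r[target] == value]
        if len(group) < 2 * n_per_class:
            raise ValueError(f"not enough records with {target} == {value}")
        # Sample without replacement so the two halves never share a record.
        chosen = rng.sample(group, 2 * n_per_class)
        analysis.extend(chosen[:n_per_class])
        validation.extend(chosen[n_per_class:])
    return analysis, validation


# Toy population: 100 responders and 900 non-responders.
population = [{"id": i, "responded": 1 if i % 10 == 0 else 0} for i in range(1000)]
analysis, validation = balanced_split(population, "responded", n_per_class=40)

print(len(analysis), sum(r["responded"] for r in analysis))
print(len(validation), sum(r["responded"] for r in validation))
```

Both halves end up with the same number of 0s and 1s, even though responders are only 10% of the toy population.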

8 Data Aggregation
Data from multiple sources are merged.
– This may occur as a first step before data cleaning, depending on the situation.
New variables are defined.
– (e.g. the ratio of satisfactory trades to total trades).
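A minimal sketch of merging two sources and deriving the ratio variable is shown here. The two lookup tables, the customer IDs, and the field names are invented for the example; a real project would read them from files or a database.

```python
# Two hypothetical sources keyed by customer ID.
demographics = {101: {"age": 22}, 102: {"age": 36}, 103: {"age": 45}}
trades = {
    101: {"total_trades": 4, "sat_trades": 1},
    102: {"total_trades": 8, "sat_trades": 6},
    104: {"total_trades": 3, "sat_trades": 3},
}


def merge_and_derive(demographics, trades):
    """Join the sources on customer ID and add the satisfactory-trade ratio."""
    merged = []
    for cust_id, demo in demographics.items():
        trade = trades.get(cust_id)
        if trade is None:
            # No trade history for this customer: skip (an assumed policy).
            continue
        total = trade["total_trades"]
        # Guard against division by zero for customers with no trades.
        ratio = trade["sat_trades"] / total if total else None
        merged.append({"id": cust_id, **demo, **trade, "sat_ratio": ratio})
    return merged


merged = merge_and_derive(demographics, trades)
for row in merged:
    print(row)
```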

9 Model Building
– Break up each independent variable into classes. Each class should have roughly 2 to 10% of the observations.
– Run crosstabs of each variable with the dependent variable.
– Redefine the independent variable as multiple dummy (0/1) variables.
– Run the regression with the dummies.
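The first two steps, classing a variable and cross-tabulating it against the dependent variable, might look like the sketch below. The ages, outcomes, and class boundaries are made up, and with only eight records the classes are far larger than the 2 to 10% the slide recommends; the point is only the mechanics.

```python
from collections import defaultdict


def assign_class(value, upper_bounds):
    """Return the index of the first class whose upper bound covers the value."""
    for index, bound in enumerate(upper_bounds):
        if value <= bound:
            return index
    # Anything above the last bound falls in the final, open-ended class.
    return len(upper_bounds)


def crosstab(values, outcomes, upper_bounds):
    """Count 0s and 1s of the dependent variable within each class."""
    counts = defaultdict(lambda: [0, 0])
    for value, outcome in zip(values, outcomes):
        counts[assign_class(value, upper_bounds)][outcome] += 1
    return {k: tuple(counts[k]) for k in sorted(counts)}


# Illustrative data only.
ages = [22, 25, 36, 38, 45, 50, 69, 72]
outcome = [0, 0, 1, 0, 1, 1, 0, 1]
table = crosstab(ages, outcome, upper_bounds=[30, 39, 55])

for cls, (n0, n1) in table.items():
    print(f"class {cls}: zeros={n0} ones={n1} rate_of_ones={n1 / (n0 + n1):.2f}")
```

Classes whose rate of 1s differs clearly from the overall rate are the natural candidates to become dummy variables.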

10 Example: Data looks like this

Bad/Good (Y)   Age (X1)   # Trades (X2)   Ratio of Sat. Trades to Total Trades (X3)
0              22         5               35.5%
1              36         8               86.8%
1              45         7               100%
0              69         4               56.4%

11 It is transformed to look like this:

Bad/Good (Y)   Age01 (18 to 30) (X1)   Age02 (40 to 55) (X2)   Age03 (56+) (X3)
0              1                       0                       0
1              0                       0                       0
1              0                       1                       0
0              0                       0                       1
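The transformation can be sketched in a few lines. The class boundaries are those named in the column headers, and the four ages (22, 36, 45, 69) are as read from the example data; the names `AGE_CLASSES` and `age_dummies` are hypothetical. Note that an age in none of the listed classes, such as 36, gets 0 in every dummy.

```python
# Class boundaries from the slide's column headers; None means no upper limit.
AGE_CLASSES = {"Age01": (18, 30), "Age02": (40, 55), "Age03": (56, None)}


def age_dummies(age):
    """Return one 0/1 indicator per age class."""
    row = {}
    for name, (low, high) in AGE_CLASSES.items():
        in_class = age >= low and (high is None or age <= high)
        row[name] = int(in_class)
    return row


# Ages and outcomes of the four example records.
ages = [22, 36, 45, 69]
outcomes = [0, 1, 1, 0]

transformed = [{"Y": y, **age_dummies(a)} for a, y in zip(ages, outcomes)]
for row in transformed:
    print(row)
```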

12 Model Building, contd. Eliminate variables that are not significant, until you have a model with variables that are significant and intuitively meaningful.

13 Testing the model
Perform the Kolmogorov-Smirnov (K-S) test to check how well the model performs on:
– The analysis sample
– The validation sample
– The total sample
If it separates the 0s and the 1s well in each of the three cases, you have a good model.
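As a rough sketch, the two-sample K-S statistic is the largest gap between the cumulative score distributions of the 0s and the 1s: 0 means no separation and 1 means perfect separation. The function below computes only the statistic, not a significance level, and the scores are invented for illustration.

```python
def ks_statistic(scores, outcomes):
    """Largest gap between the cumulative score distributions of the 0s and the 1s."""
    zeros = [s for s, y in zip(scores, outcomes) if y == 0]
    ones = [s for s, y in zip(scores, outcomes) if y == 1]
    if not zeros or not ones:
        raise ValueError("both outcome classes must be present")
    largest_gap = 0.0
    for cutoff in sorted(set(scores)):
        # Share of each class scoring at or below the cutoff.
        cdf_zeros = sum(1 for s in zeros if s <= cutoff) / len(zeros)
        cdf_ones = sum(1 for s in ones if s <= cutoff) / len(ones)
        largest_gap = max(largest_gap, abs(cdf_zeros - cdf_ones))
    return largest_gap


# Made-up model scores for six records, three of each type.
outcomes = [0, 0, 0, 1, 1, 1]
perfect_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
mixed_scores = [0.1, 0.4, 0.6, 0.3, 0.7, 0.9]

ks_perfect = ks_statistic(perfect_scores, outcomes)
ks_mixed = ks_statistic(mixed_scores, outcomes)
print(ks_perfect, round(ks_mixed, 3))
```

Running the same function on the analysis, validation, and total samples gives three values to compare; a large drop from the analysis sample to the validation sample would suggest the model does not generalize.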

