1
Is My Model Valid? Using Simulation to Understand Your Model and If It Can Accurately Predict Events
Brad Foulkes, JMP Discovery Summit 2016
2
All Rights Reserved. No part of this document may be reproduced, transmitted, stored in a retrieval system nor translated into any human or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual, or otherwise, without the prior written permission of the General Electric Company.
3
Agenda
- Types of models we’re talking about
- How can we usually tell if a model is valid?
- Using survival models to predict an event
- How does simulation help?
- Building a simple script to simulate
- Let’s do an example
4
Types of models we’re talking about
Discrete events, i.e., will it or won’t it happen?
- Will a part fail at XYZ time?
- Will I roll a 7?
- Will I get a ticket?
- Will someone leave the company?
- Will the student graduate?
- Will I survive a heart attack?
5
Types of models we’re talking about
- Weibull plot
- Survival, Reliability, Weibull
- Logistic regression
- Neural Net
- Decision tree / Bootstrap Forest
- Anything with a probability of occurrence
6
Types of models we’re talking about
Event prediction, including cases where there are distinct groups
- Anything with a probability of occurrence: survival models, logistic regression, bootstrap forest, etc.
[Figure: probability of occurrence, P(t), of an event]
7
How can we usually tell if a model is valid?
- Review goodness of fit: R-squared, AICc
- ROC curves
- Confusion matrix
What is wrong with these? For some data sets:
- Results can be misleading
- It is unclear how far off the model is
- They can be off for unbalanced data sets (i.e., a low number of events)
[Figure: example with AUC = 0.987, accuracy 92.7% and accuracy 91.2%] … but the model missed most of the actual events
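To see how accuracy can look excellent while most events are missed, here is a minimal sketch on made-up, deliberately unbalanced data (the counts below are hypothetical, not from the slide): only 5% of units have an event, and the hypothetical model catches just a few of them.

```python
# Hypothetical unbalanced data: 1 = event, 0 = no event.
# 1,000 units, only 50 actual events (5%).
actual = [1] * 50 + [0] * 950

# Hypothetical model output: flags only 5 of the 50 events
# and raises 10 false alarms among the 950 non-events.
predicted = [1] * 5 + [0] * 45 + [1] * 10 + [0] * 940


def confusion_counts(actual, predicted):
    """Return (true_pos, false_pos, false_neg, true_neg)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)  # share of all predictions that were right
recall = tp / (tp + fn)             # share of actual events the model caught

print(f"accuracy = {accuracy:.1%}, events caught = {recall:.0%}")
# → accuracy = 94.5%, events caught = 10%
```

Because the non-events dominate, correctly calling them "no event" drives accuracy up even though 45 of the 50 real events were missed.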
8
Using survival models to predict events
- Start with a Weibull model
- Look at the risk at each event
[Figure: risk vs. event time]
With no clear difference, is the model valid?
- For some groups the model is good; others fall outside the confidence bounds (CBs)
[Figure: event count by grouping of data]
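"The risk at each event" can be read as the model's cumulative probability of failure at each observed event time. A minimal sketch, assuming a two-parameter Weibull model, F(t) = 1 − exp(−(t/η)^β), with hypothetical fitted values for the shape (BETA) and scale (ETA) and hypothetical event times:

```python
import math


def weibull_cdf(t, beta, eta):
    """Probability of failure by time t for a 2-parameter Weibull
    with shape beta and scale eta: F(t) = 1 - exp(-(t/eta)**beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


# Hypothetical fitted parameters and observed event times (hours).
BETA, ETA = 2.0, 1000.0
event_times = [250.0, 600.0, 1000.0, 1800.0]

for t in event_times:
    print(f"t = {t:6.0f} h   risk F(t) = {weibull_cdf(t, BETA, ETA):.3f}")
```

At t = η the risk is always 1 − e⁻¹ ≈ 0.632, whatever the shape; that value is a quick sanity check on any fitted Weibull.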
9
What does simulating an event mean?
- Start with the probability of an event occurring
- Draw a random number; when the random draw is less than the risk of the event happening, it counts as an event
- Add up the number of events over several runs to find the average and standard deviation of the probability of an event
Random chance!
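The rule above can be written as a short worked equation, assuming each run uses an independent uniform random draw U_i and the model's risk p is fixed across N runs. Because a uniform draw falls below p with probability exactly p, the share of runs counted as events estimates p, and its spread shrinks with more runs:

```latex
\[
I_i = \begin{cases} 1, & U_i < p \\ 0, & U_i \ge p \end{cases},
\qquad U_i \sim \mathrm{Uniform}(0,1),
\qquad \Pr(I_i = 1) = \Pr(U_i < p) = p
\]
\[
\hat{p} = \frac{1}{N}\sum_{i=1}^{N} I_i,
\qquad \mathbb{E}[\hat{p}] = p,
\qquad \operatorname{SD}(\hat{p}) = \sqrt{\frac{p(1-p)}{N}}
\]
```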
10
Reliability simulation modeling
- Given a certain probability (0.25), count the number of times the random value was less than that probability
- Then divide by the total number of iterations
- Example: SimProb.jmp from the JMP sample data sets
This works well for predicting and understanding one event, but what about multiple events? Or the same event on multiple units?
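A minimal Python sketch of the same counting procedure (not the SimProb.jmp file itself): draw a uniform random number each iteration, count the draws below the probability, and divide by the number of iterations. The function name and seed are illustrative assumptions.

```python
import random


def simulate_probability(p, iterations, seed=None):
    """Estimate p by counting how often a uniform draw in [0, 1) falls below p."""
    rng = random.Random(seed)
    events = sum(1 for _ in range(iterations) if rng.random() < p)
    return events / iterations


estimate = simulate_probability(0.25, 100_000, seed=1)
print(f"simulated probability = {estimate:.3f}")  # should land close to 0.25
```

With 100,000 iterations the standard deviation of the estimate is about sqrt(0.25 × 0.75 / 100,000) ≈ 0.0014, so repeated runs should agree to roughly two decimal places.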
11
Simulating many units
- If 10 units with different probabilities are simulated, the same principles can be used to determine the number of failures
- A mean and standard deviation can then be found for the group
What about running multiple groups?
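A sketch of the many-unit case, assuming 10 independent units with hypothetical failure probabilities: each run counts how many units "fail", and the mean and standard deviation are taken over runs. For independent units the exact answers are also known (sum of p, and the square root of the sum of p(1 − p)), which makes a handy check on the simulation.

```python
import random
import statistics

# Hypothetical failure probabilities for 10 units over the period of interest.
unit_probs = [0.05, 0.10, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]


def simulate_failures(probs, runs, seed=None):
    """Return a list with the number of failed units in each simulated run."""
    rng = random.Random(seed)
    counts = []
    for _ in range(runs):
        # Each unit fails independently when its draw is below its probability.
        counts.append(sum(1 for p in probs if rng.random() < p))
    return counts


counts = simulate_failures(unit_probs, 20_000, seed=7)
mean = statistics.mean(counts)
sd = statistics.stdev(counts)

# Exact values for independent units, for comparison.
exact_mean = sum(unit_probs)
exact_sd = sum(p * (1 - p) for p in unit_probs) ** 0.5
print(f"mean {mean:.2f} (exact {exact_mean:.2f}), sd {sd:.2f} (exact {exact_sd:.2f})")
```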
12
Interpreting the results
Using the Western Electric (WeCo) rules for statistical process control (SPC), you can determine whether the data fall outside the confidence limits of the model.
Example using Blenders.jmp, showing the WeCo rules
Ref:
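A sketch of the four classic Western Electric rules in Python, assuming the plotted points are observed event counts per group and that the center line and sigma come from the simulated mean and standard deviation. This is an illustrative implementation, not JMP's control chart code, and the data are hypothetical.

```python
def weco_violations(values, center, sigma):
    """Return indices where any of the four classic Western Electric rules fires.

    Rule 1: one point beyond 3 sigma from the center line.
    Rule 2: 2 of 3 consecutive points beyond 2 sigma, same side.
    Rule 3: 4 of 5 consecutive points beyond 1 sigma, same side.
    Rule 4: 8 consecutive points on the same side of the center line.
    The index reported is the last point of the offending window.
    """
    z = [(v - center) / sigma for v in values]
    flagged = set()
    for i, zi in enumerate(z):
        if abs(zi) > 3:  # rule 1
            flagged.add(i)
        for side in (1, -1):  # check above the center line, then below
            last3 = z[max(0, i - 2):i + 1]
            last5 = z[max(0, i - 4):i + 1]
            last8 = z[max(0, i - 7):i + 1]
            if len(last3) == 3 and sum(side * x > 2 for x in last3) >= 2:  # rule 2
                flagged.add(i)
            if len(last5) == 5 and sum(side * x > 1 for x in last5) >= 4:  # rule 3
                flagged.add(i)
            if len(last8) == 8 and all(side * x > 0 for x in last8):  # rule 4
                flagged.add(i)
    return sorted(flagged)


# Hypothetical observed event counts per group, judged against a simulated
# mean of 2.65 events and a standard deviation of 1.28 events.
observed = [3, 2, 4, 1, 3, 9, 2, 3]
print(weco_violations(observed, center=2.65, sigma=1.28))  # → [5]
```

Only the sixth group (index 5, nine events) is flagged: it sits about 5 sigma above what the model predicts, which suggests the model does not describe that group well.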
13
Running the simulation - Normally
For loops can be used to re-evaluate the number of events that occur:
1. Set up a comparison column to identify “events”
2. Iterate to count the number of events each time
3. Append each trial to a summary table
4. Calculate the statistics on the entire data set
For 10K iterations, this code creates 10K summary tables and runs the formulas 10K times.
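The four loop-based steps can be sketched in Python, with plain lists standing in for the JMP comparison column and the growing summary table. The group labels and probabilities are hypothetical, and this mirrors the structure of the steps rather than the author's JSL script.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical units: (group label, probability of an event in the period).
units = [("A", 0.10), ("A", 0.20), ("A", 0.30),
         ("B", 0.05), ("B", 0.05), ("B", 0.40)]


def simulate_with_loops(units, iterations, seed=None):
    """Loop-based simulation: one pass over all units per trial."""
    rng = random.Random(seed)
    groups = sorted({group for group, _ in units})
    summary_rows = []  # stands in for the summary table that grows every trial
    for trial in range(iterations):
        # 1. Comparison "column": True where the random draw is below the risk.
        is_event = [(group, rng.random() < p) for group, p in units]
        # 2. Count the events in each group for this trial.
        counts = defaultdict(int)
        for group, hit in is_event:
            counts[group] += hit
        # 3. Append this trial's results to the summary table.
        for group in groups:
            summary_rows.append((trial, group, counts[group]))
    # 4. Statistics on the entire data set, one (mean, SD) pair per group.
    stats = {}
    for group in groups:
        values = [n for _, g, n in summary_rows if g == group]
        stats[group] = (statistics.mean(values), statistics.stdev(values))
    return stats


stats = simulate_with_loops(units, 10_000, seed=3)
print(stats)  # means should land near 0.6 for "A" and 0.5 for "B"
```

Every trial does its work one unit at a time and adds more rows to the summary, which is the overhead the vectorized version aims to remove.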
14
Vectorization – i.e. using matrices to speed up your code
[Figure: side-by-side code, "From this…" (loop-based) "… to this" (vectorized)]
- Vectorization means that instead of doing one at a time, you run whole groups at the same time
- It moves the computationally heavy tasks onto a group, so they happen less frequently
- In short, it’s just linear algebra
A few references on the topic
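A minimal NumPy sketch of vectorization, assuming NumPy is available (it is not part of the original JMP workflow). Instead of looping over trials, it draws an iterations × units matrix of uniform random numbers in one call, compares the whole matrix against the probability vector by broadcasting, and sums each row. The unit probabilities are hypothetical.

```python
import numpy as np

# Hypothetical failure probabilities for 10 units.
unit_probs = np.array([0.05, 0.10, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60])


def simulate_vectorized(probs, iterations, seed=None):
    """Draw every random number at once: rows are trials, columns are units."""
    rng = np.random.default_rng(seed)
    draws = rng.random((iterations, probs.size))  # iterations x units matrix
    events = draws < probs                         # broadcast comparison against each row
    return events.sum(axis=1)                      # number of events in each trial


counts = simulate_vectorized(unit_probs, 100_000, seed=11)
print(counts.mean(), counts.std(ddof=1))
```

The per-trial Python loop disappears: the comparison and the sum each run once over the whole matrix, at the cost of holding all the random draws in memory at the same time.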
15
Saving a vector into a column
To store a vector in a column, the column's data type needs to change to “Expression”
- Change the column type in the column viewer to either:
  - “Vector” type, or
  - “None” type
- Either type will work in this situation
- The Vector type is available in JMP 13
An expression like this can then be used
16
The new code…
1. Set up a comparison column to identify “events”
2. Create a new summary table
3. Create a subset of each group
4. Calculate and store the statistics on each group
Roughly the same length of code, but much more efficient: for 10K iterations, this code creates 1 table and runs the formulas once.
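A NumPy sketch of the grouped, single-pass version, assuming NumPy is available and using hypothetical units and group assignments. A units × groups membership matrix turns the per-group counting into one matrix multiplication, producing a trials × groups table (one table, computed once) from which each group's mean and SD are taken. The membership-matrix approach is an illustrative choice, not necessarily how the JSL script does it.

```python
import numpy as np

# Hypothetical units: group index for each unit and its event probability.
group_of_unit = np.array([0, 0, 0, 1, 1, 1])  # group 0 = "A", group 1 = "B"
unit_probs = np.array([0.10, 0.20, 0.30, 0.05, 0.05, 0.40])


def simulate_groups(probs, groups, iterations, seed=None):
    """Return per-group mean and SD of event counts from one matrix pass."""
    rng = np.random.default_rng(seed)
    events = rng.random((iterations, probs.size)) < probs     # trials x units
    # membership[u, g] = 1 when unit u belongs to group g.
    membership = np.eye(groups.max() + 1, dtype=int)[groups]  # units x groups
    counts = events.astype(int) @ membership                  # trials x groups
    return counts.mean(axis=0), counts.std(axis=0, ddof=1)


means, sds = simulate_groups(unit_probs, group_of_unit, 50_000, seed=5)
print(means, sds)  # means should land near [0.6, 0.5]
```

Adding another grouping only means building another membership matrix; the random draws and the event matrix can be reused.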
17
Running the final script
All sorts of options can be added:
- Subsetting of the data
- Running multiple different groupings
- Using only data after a certain date
- Entering a model or selecting a formula
- Conditional risk/reliability
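For the conditional risk/reliability option, one common formulation is the probability of failing within the next Δ hours given survival to the current age t: 1 − R(t + Δ) / R(t). A minimal sketch, assuming a two-parameter Weibull with hypothetical shape and scale; the function names are illustrative.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)**beta): probability of surviving past time t."""
    return math.exp(-((t / eta) ** beta))


def conditional_failure_prob(age, horizon, beta, eta):
    """Probability of failing within `horizon` more hours, given survival to `age`."""
    survive_more = (weibull_reliability(age + horizon, beta, eta)
                    / weibull_reliability(age, beta, eta))
    return 1.0 - survive_more


# Hypothetical parameters: shape 2.0, scale 1000 h; unit already at 800 h.
print(round(conditional_failure_prob(800.0, 200.0, 2.0, 1000.0), 4))
```

This conditional probability can replace the unconditional risk in the simulation for units that have already been in service for a while. With shape 1 (exponential) the result does not depend on age, which is a quick way to check the function.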
18
An example: the Worcester Heart Attack Study (WHAS)
If you came into the hospital with a heart attack, would you survive?
19
WHAS – Bootstrap Forest model
[Figure: ROC curves, AUC = 0.9 and AUC = 0.81]
The confusion matrix seems off in predicting survivals.
20
WHAS - Simulation
21
A few references
- Benbow and Broome, The Certified Reliability Engineer Handbook, ASQ Quality Press, 11/28/2008, pages
22
Summary
- Predicting events is tough
- Depending on the data set, it can be hard to know whether the model is any good
- Probability simulation can help, or at least provide another path to try
- Vectorization can speed up the simulation
23
Questions?