Industrial Data Modeling with DataModeler
Mark Kotanchek
Evolved Analytics
mark@evolved-analytics.com
Wolfram Tech Conf 2006, Evolved Analytics LLC

Nonlinear Data Modeling: The Bottom Line
The world is nonlinear.
People time is expensive.
Computing is cheap.
Life doesn't have to be hard.
Success has been demonstrated in the real world. The caveat is that we are (mostly) looking at response surface analysis and modeling of numerical data.
Symbolic Regression
Algorithmic advances in recent years have yielded a speed improvement of more than three orders of magnitude in symbolic regression via genetic programming, relative to conventional GP.
This has been coupled with continuing improvements in compute hardware.
Furthermore, symbolic regression is naturally parallelizable.
Symbolic regression provides most of the capabilities that are unique to nonlinear modeling.
The net result is that symbolic regression has moved to the forefront of nonlinear modeling technologies for us.
Symbolic regression searches for both the expression structure and the associated coefficients that capture the data behavior.
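To make "searching for both structure and coefficients" concrete, here is a minimal Python sketch. It is not genetic programming and not DataModeler's algorithm: it enumerates a small, hypothetical library of candidate structures y ≈ a + b*g(x), fits the coefficients a and b for each by ordinary least squares, and keeps the structure with the lowest RMSE. A GP-based engine would instead evolve expression trees, so its structure space is open-ended rather than a fixed list.

```python
import math

# Hypothetical candidate structures g(x). A GP engine would evolve
# expression trees instead of enumerating a fixed list like this.
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
    "sqrt(x)": lambda x: math.sqrt(x),
    "log(x)": lambda x: math.log(x),
    "exp(x)": lambda x: math.exp(x),
    "1/x": lambda x: 1.0 / x,
}


def fit_linear(z, y):
    """Ordinary least squares for y ~ a + b*z; returns (a, b)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    if szz == 0:
        return my, 0.0
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    return my - b * mz, b


def search(xs, ys):
    """Return (structure, a, b, rmse) for the best-fitting candidate structure."""
    best = None
    for name, g in CANDIDATES.items():
        z = [g(x) for x in xs]
        a, b = fit_linear(z, ys)  # coefficients for this structure
        rmse = math.sqrt(sum((a + b * zi - yi) ** 2 for zi, yi in zip(z, ys)) / len(ys))
        if best is None or rmse < best[3]:
            best = (name, a, b, rmse)
    return best


# Synthetic data from y = 3 + 2*x^2 on x > 0 (so sqrt, log and 1/x are defined).
xs = [0.5 + 0.25 * i for i in range(20)]
ys = [3.0 + 2.0 * x * x for x in xs]
print(search(xs, ys))  # best structure should be "x^2" with a close to 3, b close to 2
```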
What Makes DataModeler Special?
Goal: To Dazzle & Delight
Dazzle
– Extract value out of data
– Robustness of models
– Provide insight & understanding in the process
Delight
– Ease & efficiency of model development
– Model lifecycle management
Automatic variable selection & variable transform identification
Ability to handle ill-conditioned data sets
System insight
Problem insight
Robust & accurate models
Trust metric
Modeling lifecycle tools
Package Case Studies
Distillation Column Quality Predictor
– Large data set (skinny array: 6929 x 23 variables)
– Multiple data sets (test, train, validate)
– Ensemble of models
– Potential pathologies
Emissions Inferential Sensor
– Handling correlated data sets (train/test: 251/107 x 8)
– Looking at extrapolation
Process Optimization Emulator
– Working against designed data (320/275 x 10 + 5 responses)
– Goal is to replace a 24-hour optimization
Blown Film Process Effects
– Interpreting research data
– 20 x 9 inputs, 21 responses
– Applications to combinatorial chemistry
Balancing Service Price
– Fat array (298 x 48)
– Handle correlated data
– Identify driving variables
Getting the Zen of the Data
Context-free analysis leads to confidently wrong answers.
Evolving Models
Models may be automatically archived.
For convenience, default option sets are defined.
Progress may be monitored at several levels.
Pareto Front & Modeling Potential
[Figure: Pareto front from an exploratory run, annotated "Hard but potential", "Useful???" and "Where is the knee?"]
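The Pareto front trades model complexity against model error. Below is a minimal sketch of how such a front and its "knee" might be computed, assuming each model is summarized as a hypothetical (complexity, error) pair with both objectives minimized. The knee rule used here (the interior point farthest below the chord joining the front's end points, after scaling both axes to [0, 1]) is one common heuristic, not necessarily the one DataModeler uses.

```python
def pareto_front(models):
    """Non-dominated (complexity, error) pairs, sorted by complexity.

    Both objectives are minimized. After sorting by (complexity, error), a
    point is kept only if its error is strictly lower than every point kept
    so far, i.e. no simpler-or-equal model is at least as accurate.
    """
    front = []
    for complexity, error in sorted(set(models)):
        if not front or error < front[-1][1]:
            front.append((complexity, error))
    return front


def knee(front):
    """Heuristic knee of a front sorted by complexity.

    Both axes are scaled to [0, 1] so the end points map to (0, 1) and (1, 0);
    the knee is the interior point farthest below the chord x + y = 1.
    With fewer than three points there is no interior, so the simplest model
    is returned.
    """
    if len(front) < 3:
        return front[0]
    (c0, e0), (c1, e1) = front[0], front[-1]
    c_span = (c1 - c0) or 1.0
    e_span = (e0 - e1) or 1.0

    def below_chord(point):
        x = (point[0] - c0) / c_span
        y = (point[1] - e1) / e_span
        return 1.0 - x - y

    return max(front[1:-1], key=below_chord)


# Hypothetical (complexity, error) summaries from an exploratory run.
models = [(1, 0.90), (3, 0.40), (4, 0.50), (5, 0.15), (6, 0.30),
          (7, 0.12), (8, 0.20), (9, 0.11), (12, 0.105)]
front = pareto_front(models)
print(front)        # [(1, 0.9), (3, 0.4), (5, 0.15), (7, 0.12), (9, 0.11), (12, 0.105)]
print(knee(front))  # (5, 0.15)
```

Past the knee, extra complexity buys very little accuracy, which is why models near it are usually the most interesting candidates.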
Driving Variables
Notice the natural variable selection.
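One way to quantify this natural variable selection is to count how often each input appears across a set of good models, for example those near the knee of the Pareto front. The sketch below assumes each model has already been reduced to the set of variable names it uses (a hypothetical representation). Variables with high presence are candidate driving variables.

```python
from collections import Counter


def variable_presence(models):
    """Fraction of models that use each variable, most frequent first.

    `models` is a list of sets of variable names, one set per model
    (a hypothetical representation of already-parsed expressions).
    Ties are broken alphabetically.
    """
    counts = Counter(v for used in models for v in used)
    n = len(models)
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: counts[v] / n for v in ranked}


models = [{"x1", "x3"}, {"x1", "x3", "x7"}, {"x1"}, {"x1", "x2", "x3"}]
print(variable_presence(models))
# {'x1': 1.0, 'x3': 0.75, 'x2': 0.25, 'x7': 0.25}
```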
Selecting Models (ad hoc)
Potential Pathologies?
Model Performance
Visually, our goal is minimal error, with an even distribution of errors and no structural error as a function of variable value.
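The visual checks described above can also be approximated numerically. This sketch assumes a hypothetical `predict(row)` callable and data held in plain Python lists. It reports the overall RMSE plus the mean residual in equal-count bins of each input; bin means that drift systematically away from zero indicate structural error as a function of that variable.

```python
import math


def residual_report(X, y, predict, n_bins=4):
    """Summarize model residuals.

    X: list of rows (lists of floats); y: observed responses;
    predict: callable mapping a row to a prediction (hypothetical interface).
    Returns (rmse, bin_means), where bin_means[j] lists the mean residual in
    each of n_bins equal-count bins of input variable j, ordered by value.
    """
    resid = [yi - predict(row) for row, yi in zip(X, y)]
    rmse = math.sqrt(sum(r * r for r in resid) / len(resid))
    bin_means = []
    for j in range(len(X[0])):
        # Row indices sorted by the value of variable j.
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        size = math.ceil(len(order) / n_bins)
        means = []
        for start in range(0, len(order), size):
            chunk = order[start:start + size]
            means.append(sum(resid[i] for i in chunk) / len(chunk))
        bin_means.append(means)
    return rmse, bin_means


# A straight-line fit (7x - 7) to data from y = x^2 on x = 0..7.
X = [[x] for x in range(8)]
y = [x * x for x in range(8)]
rmse, bins = residual_report(X, y, lambda row: 7 * row[0] - 7)
print(round(rmse, 3), bins)  # 4.583 [[4.0, -4.0, -4.0, 4.0]]
```

The overall RMSE alone says little here, but the U-shaped bin means (4, -4, -4, 4) expose the missing curvature.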
Trust via Ensembles
Models with independent error structures may be "stacked", with their consensus forming a trust metric.
Note that the models generally won't be on the Pareto front.
Also note that for large data sets the error residuals will be highly correlated, so we need a relaxed definition of "uncorrelated".
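A minimal sketch of the two ideas on this slide, under stated assumptions: models are hypothetical Python callables, the consensus is the mean prediction, and the trust indicator is the population standard deviation of the members' predictions. The `max_corr=0.9` threshold in `select_diverse` is an illustrative stand-in for a relaxed definition of uncorrelated residuals, not a value from the source.

```python
import math
from statistics import mean, pstdev


def ensemble_predict(models, row):
    """Consensus (mean) prediction and spread (population std dev) for one row.

    A large spread means the members disagree, which this sketch treats as a
    sign that the row lies outside the region the models were trained on.
    """
    preds = [m(row) for m in models]
    return mean(preds), pstdev(preds)


def pearson(a, b):
    """Pearson correlation of two residual vectors (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm else 0.0


def select_diverse(residuals, max_corr=0.9):
    """Greedily keep models whose residuals have |corr| < max_corr with every
    model kept so far (a relaxed notion of 'uncorrelated')."""
    chosen = []
    for i, r in enumerate(residuals):
        if all(abs(pearson(r, residuals[j])) < max_corr for j in chosen):
            chosen.append(i)
    return chosen


# Toy ensemble: three models that agree near x in [0, 1] and diverge beyond it.
members = [
    lambda x: 2 * x,
    lambda x: 2 * x + 0.05 * x * (x - 1),
    lambda x: 2 * x - 0.05 * x * (x - 1),
]
print(ensemble_predict(members, 0.5))   # small spread
print(ensemble_predict(members, 10.0))  # much larger spread
print(select_diverse([[1, -1, 1, -1], [1.1, -0.9, 1.0, -1.2], [1, 1, -1, -1]]))  # [0, 2]
```

The three toy models agree closely at x = 0.5 but diverge at x = 10, so the spread grows sharply under extrapolation. In the selection example, the second residual vector tracks the first almost exactly and is dropped.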
On the (near!) Horizon
Implement a ConvertModelForExcel[ ] function
Complete documentation & release the package for sale
Symbolic Regression: Summary
Benefits
Compact Nonlinear Models
– Compact empirical models can be suitable for online implementation
– Model(s) can be used as an emulator for coarse system optimization
Driving Variable Selection & Identification
– Identified driving variables may be used as inputs into other modeling tools
Models from Pathological Data Sets
– Appropriate models may be developed from poorly structured data sets (too many variables & not enough measurements)
Metasensor (Variable Transform) Identification
– Identifying variable couplings can give insight into underlying physical mechanisms
– Identified metavariables can enable linearizing transforms to meld symbolic regression with more traditional statistical analysis
– Metavariables can also be used as inputs into other modeling tools
Rapid Data Content Assessment
– Examining the shape of the Pareto front allows us to quickly assess whether viable models can be developed from the available data
Diverse Model Ensembles
– Independent evolutions will produce independent models. Independent (but comparable) models may be stacked into ensembles whose divergence in prediction may be an indicator of extrapolation & model trustworthiness. Extrapolation is a particular issue in high-dimensional parameter spaces.
Human Insight
– The transparency of the evolved models, as well as the explicit identification of the model complexity-accuracy trade-off, is very compelling
– Examining an expression can be viewed as a visualization technique for high-dimensional data
Rapid Modeling
– Exploitation of the Pareto front has improved symbolic regression performance by several orders of magnitude relative to more traditional GP. This greatly increases the range of possible applications.
There are many benefits to symbolic regression. These are enhanced when it is coupled with other analysis tools and techniques.