Regularization of Evolving Polynomial Models
IWIM 2007 workshop, Sept. 23-26, 2007, Prague
Pavel Kordík, CTU Prague, Faculty of Electrical Engineering, Department of Computer Science
External criteria in GMDH theory
GAME model
Encoding into chromosomes
Polynomial units encoded into neurons
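Putting the two slides above together: in a GAME-style network each unit can be represented by a chromosome holding the indices of the inputs it connects to and the coefficients of its polynomial transfer function. The sketch below is an illustrative reconstruction under that assumption, not the exact GAME encoding; the second-order polynomial form and all names are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class PolynomialUnit:
    """Chromosome-like encoding of one polynomial unit (illustrative)."""
    inputs: tuple                                  # indices of the selected input features
    coeffs: list = field(default_factory=list)     # a0..a5, fitted during learning

    def output(self, x):
        """Second-order polynomial of two selected inputs:
        y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1^2 + a5*x2^2
        (the polynomial form actually used in GAME units may differ)."""
        x1, x2 = x[self.inputs[0]], x[self.inputs[1]]
        a0, a1, a2, a3, a4, a5 = self.coeffs
        return a0 + a1 * x1 + a2 * x2 + a3 * x1 * x2 + a4 * x1**2 + a5 * x2**2
```

Genetic operators (crossover, mutation) would then act directly on the `inputs` and `coeffs` fields of such chromosomes.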
Data set division
Adaptive division?
Training data (A, 2/3) – optimize coefficients (learning of units)
Validation data (B, 1/3) – select surviving units
Testing data – check whether the model overfits the data
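A minimal sketch of this division, assuming a simple shuffled 2/3 : 1/3 split of the learning data into the training set A and the validation set B (the adaptive variant hinted at above is not shown, and the testing set is assumed to be held out beforehand). Function and variable names are illustrative.

```python
import numpy as np

def split_learning_data(X, y, seed=0):
    """Shuffle the learning data and split it 2/3 : 1/3 into the
    training set A (coefficient optimization) and the validation
    set B (selection of surviving units)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = 2 * len(X) // 3
    a, b = idx[:cut], idx[cut:]
    return (X[a], y[a]), (X[b], y[b])
```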
Fitness function – which units survive?
RMS error of the unit on the training data – feedback for the optimization methods
RMS error of the unit on the validation data – used to compute the fitness of units
External criterion
Computation of the error on the validation set
Fitness = 1/CR
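A minimal sketch of this selection step under the assumptions above: the training RMSE is returned as feedback for the coefficient optimizer, the external criterion CR is the RMSE on the validation set, and fitness is its inverse. Names are illustrative; the regularized variant of CR is sketched further below.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def unit_fitness(y_train, y_pred_train, y_val, y_pred_val):
    """Training RMSE = feedback for the optimization method;
    validation RMSE = external criterion CR; fitness = 1/CR."""
    train_error = rmse(y_train, y_pred_train)   # drives coefficient learning
    cr = rmse(y_val, y_pred_val)                # computed on the validation set
    return train_error, 1.0 / cr
```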
CRrms-r-val criterion on real data
The optimal value of R is 300 on the Antro data set.
CRrms-r-val criterion on real data
The optimal value of R is 725 on the Building data set.
CR should be sensitive to noise
How to estimate the penalization strength (1/R)? Variance of the output variable?
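A sketch of how the regularized criterion and a variance-based estimate of R might look. Both the penalty term (here a generic complexity measure weighted by 1/R) and the heuristic tying R to the variance of the output variable are assumptions for illustration; the slides leave the exact relation open.

```python
import numpy as np

def cr_rms_r_val(y_val, y_pred_val, complexity, R):
    """Regularized external criterion (assumed form): validation RMSE
    plus a complexity penalty weighted by the penalization strength 1/R."""
    diff = np.asarray(y_val) - np.asarray(y_pred_val)
    rmse_val = np.sqrt(np.mean(diff ** 2))
    return float(rmse_val + complexity / R)

def estimate_R(y, scale=1.0):
    """Illustrative heuristic: derive R from the variance of the output
    variable. The scale constant (and even the direction of the relation)
    would have to be validated experimentally."""
    return scale * float(np.var(y))
```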
Experiments with synthetic data
Validate only on the validation set!
Regularization works, but the difference between R = 300 and the p-n criterion is not significant.
Theoretical and experimental aspects of regularization
So which criterion is the best?
Regularized polynomial models on the Antro data set
It is evident that the optimal value of R lies between 100 and 1000 – the same result as in our previous experiments with the Antro data set (Ropt = 300).
Linear models are still better than the best polynomial!
Conclusion
Experiments with regularization of polynomial models
Every data set requires a different level of penalization for complexity
It can be partially derived from the variance of the output variable
The regularization is still not sufficient: linear models perform better on highly noisy data sets!
Thank you!