ERROR ENTROPY, CORRENTROPY AND M-ESTIMATION
Weifeng Liu, P. P. Pokharel, J. C. Principe, CNEL, University of Florida (weifeng@cnel.ufl.edu)
Acknowledgment: this work was partially supported by NSF grants ECS-0300340 and ECS-0601271.
Outline: maximization of the correntropy criterion (MCC); minimization of error entropy (MEE); the relation between MEE and MCC; minimization of error entropy with fiducial points; experiments.
Supervised learning: desired signal D, system output Y, error signal E = D − Y.
Supervised learning: the goal of supervised training is to bring the system output ‘close’ to the desired signal. The concept of ‘close’ implicitly or explicitly employs a distance function or similarity measure; equivalently, the error is minimized in some sense. For instance, MSE minimizes E[E²], estimated by (1/N) Σ_i e_i².
Maximization of the correntropy criterion (MCC): the correntropy between the desired signal and the system output, V(D,Y) = E[κ_σ(D − Y)], is estimated by V̂(D,Y) = (1/N) Σ_i κ_σ(d_i − y_i), where κ_σ(x) = exp(−x²/(2σ²)) / (√(2π) σ) is the Gaussian kernel with kernel size σ.
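A minimal numeric sketch of this estimator (the function names, the default kernel size, and the use of NumPy are illustrative choices, not taken from the talk):

import numpy as np

def gaussian_kernel(x, sigma):
    # kappa_sigma(x) = exp(-x^2 / (2*sigma^2)) / (sqrt(2*pi) * sigma)
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def correntropy(d, y, sigma=1.0):
    # Sample estimate of V(D, Y): the kernel averaged over the errors e_i = d_i - y_i.
    e = np.asarray(d, dtype=float) - np.asarray(y, dtype=float)
    return np.mean(gaussian_kernel(e, sigma))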
Correntropy induced metric (CIM): define CIM(X,Y) = (κ_σ(0) − V(X,Y))^(1/2). CIM satisfies the following properties: non-negativity, identity of indiscernibles, symmetry, and the triangle inequality.
CIM contours: contours of CIM(E,0) in the 2D sample space. When the errors are close to the origin, CIM behaves like the L2 norm; at intermediate distances, like the L1 norm; far apart, it saturates for large-value elements (and becomes direction sensitive).
MCC is minimization of CIM: since CIM²(D,Y) = κ_σ(0) − V(D,Y), maximizing the correntropy V(D,Y) is equivalent to minimizing the distance CIM(D,Y).
MCC is M-estimation: maximizing correntropy is equivalent to M-estimation with ρ(e) = κ_σ(0) − κ_σ(e), in which large errors receive vanishing weight (see the derivation below).
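Spelled out in LaTeX under the estimator reconstructed above (the slide only names the equivalence; the weight function below is the standard form for a Gaussian kernel):

\[
\max_{\mathbf w}\ \frac{1}{N}\sum_{i=1}^{N}\kappa_\sigma(e_i)
\;\Longleftrightarrow\;
\min_{\mathbf w}\ \sum_{i=1}^{N}\rho(e_i),
\qquad
\rho(e)=\kappa_\sigma(0)-\kappa_\sigma(e)
       =\frac{1}{\sqrt{2\pi}\,\sigma}\Bigl(1-e^{-e^{2}/(2\sigma^{2})}\Bigr),
\]
\[
w(e)=\frac{\rho'(e)}{e}=\frac{1}{\sqrt{2\pi}\,\sigma^{3}}\,e^{-e^{2}/(2\sigma^{2})},
\]

so the weight assigned to an error decays exponentially with its magnitude, i.e., a redescending M-estimator that is insensitive to outliers.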
Minimization of error entropy (MEE): Renyi’s quadratic error entropy H₂(E) = −log ∫ p_E²(e) de is estimated by Ĥ₂(E) = −log V̂(E), where V̂(E) = (1/N²) Σ_i Σ_j κ_σ(e_i − e_j) is the information potential (IP).
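A matching sketch of the IP estimator (same illustrative Gaussian kernel; a strict Parzen derivation would use kernel size σ√2, which is folded into σ here for simplicity):

import numpy as np

def gaussian_kernel(x, sigma):
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def information_potential(e, sigma=1.0):
    # V(E) = (1/N^2) * sum_i sum_j kappa_sigma(e_i - e_j)
    e = np.asarray(e, dtype=float)
    pairwise = e[:, None] - e[None, :]
    return np.mean(gaussian_kernel(pairwise, sigma))

def renyi_quadratic_entropy(e, sigma=1.0):
    # H_2(E) = -log V(E); minimizing the entropy is maximizing the IP.
    return -np.log(information_potential(e, sigma))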
Relation between MEE and MCC: define the pairwise error differences ε_ij = e_i − e_j and construct from them the N²-dimensional difference vector. The information potential of E is then exactly the correntropy estimate between this difference vector and zero (the identity is spelled out below), so MEE can be read as MCC applied to the pairwise error differences.
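The identity behind this reading, in LaTeX, using the estimators as reconstructed above:

\[
\hat V(E)
=\frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\kappa_\sigma(e_i-e_j)
=\frac{1}{N^{2}}\sum_{i,j}\kappa_\sigma(\varepsilon_{ij}-0),
\qquad \varepsilon_{ij}:=e_i-e_j,
\]

i.e. the information potential of E is the sample correntropy between the pairwise error differences and zero; maximizing it (MEE) is MCC applied to the differences, which is why the cost is blind to a constant shift of the errors.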
IP induced metric (IPM): define IPM(X,Y) = (κ_σ(0) − V̂(X − Y))^(1/2), the CIM-like quantity built from the information potential of the difference vector. IPM is only a pseudo-metric: it does NOT satisfy the identity of indiscernibles, since IPM(X,Y) = 0 whenever all components of X − Y are equal, even when X ≠ Y.
IPM contours: contours of IPM(E,0) in the 2D sample space show a valley along e1 = e2; the metric is not sensitive to the error mean and saturates for points far from the valley.
MEE and its equivalences: minimizing Renyi’s quadratic error entropy is equivalent to maximizing the information potential V̂(E), which in turn is equivalent to minimizing the induced distance IPM(E,0).
MEE is M-estimation: assume the errors are samples from a PDF p_E; since the IP is the correntropy of the pairwise differences, MEE is equivalent to M-estimation applied to the differences e_i − e_j with the same redescending ρ-function, ρ(ε) = κ_σ(0) − κ_σ(ε).
Nuisance of conventional MEE: because the cost is shift-invariant, the location of the error PDF must be fixed separately. Conventionally this is done by forcing the error mean to zero. When the error PDF is non-symmetric or has heavy tails, the estimation of the error mean is not robust. Fixing the main peak of the error PDF at the origin is clearly better than the conventional zero-mean shift.
ERROR ENTROPY WITH FIDUCIAL POINTS: in supervised training we want most of the errors to equal zero, i.e., to minimize the error entropy with respect to 0. Denote by E the error vector; e₀ = 0 serves as a point of reference (a fiducial point) that fixes the location of the error PDF.
ERROR ENTROPY WITH FIDUCIAL POINTS: in general, the cost to be maximized is J(E) = λ · (1/N) Σ_i κ_σ(e_i − 0) + (1 − λ) · (1/N²) Σ_i Σ_j κ_σ(e_i − e_j).
ERROR ENTROPY WITH FIDUCIAL POINTS: λ is a weighting constant between 0 and 1 that controls how many fiducial points are placed at the origin. λ = 0 recovers MEE; λ = 1 recovers MCC; 0 < λ < 1 gives minimization of error entropy with fiducial points (MEEF).
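A sketch of the combined cost under this reconstruction (the function name, default λ, and the exact normalization of the two terms are assumptions; the talk may weight them differently):

import numpy as np

def gaussian_kernel(x, sigma):
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def meef_cost(e, sigma=1.0, lam=0.5):
    # lam * (MCC term: errors vs. the fiducial point at 0)
    #   + (1 - lam) * (MEE term: information potential of the pairwise differences).
    # lam = 1 recovers MCC, lam = 0 recovers MEE; training maximizes this value.
    e = np.asarray(e, dtype=float)
    mcc_term = np.mean(gaussian_kernel(e, sigma))
    mee_term = np.mean(gaussian_kernel(e[:, None] - e[None, :], sigma))
    return lam * mcc_term + (1.0 - lam) * mee_term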
ERROR ENTROPY WITH FIDUCIAL POINTS: the MCC term locates the main peak of the error PDF and fixes it at the origin, even in cases where estimation of the error mean is not robust. Unifying the two cost functions retains the merits of both: resistance to outliers and resilience to the choice of kernel size.
Metric induced by MEEF: a well-defined metric (the fiducial points restore the identity of indiscernibles) that is direction sensitive: it favors errors with the same sign and penalizes errors with different signs.
Experiment 1: robust regression. X: input variable; f: unknown function; N: additive noise; Y = f(X) + N: observation. [Figure: noise PDF]
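A hedged sketch of this kind of experiment: the true function, the outlier model, the kernel size, and the use of SciPy's Nelder-Mead optimizer are all stand-ins (the slide does not specify them); the point is only that a correntropy-based fit ignores the outliers that pull the MSE fit.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Stand-in data: a line observed through Gaussian noise plus a fraction of large outliers.
x = rng.uniform(-1.0, 1.0, 200)
noise = 0.1 * rng.standard_normal(200)
noise[rng.random(200) < 0.1] += 5.0            # impulsive outliers
y = 2.0 * x + 1.0 + noise

def gaussian_kernel(u, sigma):
    return np.exp(-u**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def neg_correntropy(theta, sigma=0.5):
    # Negative MCC cost for a linear model y = a*x + b (minimized below).
    a, b = theta
    e = y - (a * x + b)
    return -np.mean(gaussian_kernel(e, sigma))

theta_mse = np.polyfit(x, y, 1)                # least-squares fit, pulled by the outliers
theta_mcc = minimize(neg_correntropy, x0=theta_mse, method="Nelder-Mead").x
print("MSE fit (slope, intercept):", theta_mse)
print("MCC fit (slope, intercept):", theta_mcc)

Starting the robust fit from the MSE solution is a common choice for redescending costs, since the correntropy surface is flat far from the data.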
Regression results
Experiment 2: chaotic signal prediction. Mackey-Glass chaotic time series with delay parameter τ = 30; a time-delay neural network (TDNN) with 7 inputs, 14 hidden PEs (tanh nonlinearity), and 1 linear output.
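A sketch of the data preparation such an experiment implies (the Mackey-Glass parameters other than τ = 30, the Euler step, and the series length are commonly used defaults, not values taken from the slide; only the series generation and the 7-tap delay embedding that feeds the TDNN are shown, not the network training):

import numpy as np

def mackey_glass(n_samples=3000, tau=30, dt=1.0, a=0.2, b=0.1, n=10, x0=1.2):
    # Euler integration of dx/dt = a*x(t - tau) / (1 + x(t - tau)^n) - b*x(t).
    delay = int(round(tau / dt))
    x = np.empty(n_samples + delay)
    x[:delay + 1] = x0                         # constant initial history
    for t in range(delay, n_samples + delay - 1):
        x_tau = x[t - delay]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau**n) - b * x[t])
    return x[delay:]

def delay_embed(series, order=7, horizon=1):
    # Build the 7-tap TDNN input vectors and the one-step-ahead desired samples.
    X = np.array([series[i:i + order] for i in range(len(series) - order - horizon + 1)])
    d = series[order + horizon - 1:]
    return X, d

series = mackey_glass(tau=30)
X, d = delay_embed(series, order=7)            # X: TDNN inputs, d: desired signal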
Training error PDF
Conclusions: we establish connections between MEE, distance functions, and M-estimation, which theoretically explains the robustness of this family of cost functions; we unify MEE and MCC within the information-theoretic framework; and we propose a new cost function, minimization of error entropy with fiducial points (MEEF), which solves the shift-invariance problem of MEE in an elegant and robust way.