ERROR ENTROPY, CORRENTROPY AND M-ESTIMATION

ERROR ENTROPY, CORRENTROPY AND M-ESTIMATION
Weifeng Liu, P. P. Pokharel, J. C. Principe
CNEL, University of Florida
weifeng@cnel.ufl.edu
Acknowledgment: This work was partially supported by NSF grants ECS-0300340 and ECS-0601271.

Outline
- Maximization of the correntropy criterion (MCC)
- Minimization of error entropy (MEE)
- Relation between MEE and MCC
- Minimization of error entropy with fiducial points
- Experiments

Supervised learning: desired signal D, system output Y, error signal E = D − Y.

Supervised learning: the goal in supervised training is to bring the system output "close" to the desired signal. The notion of "close" implicitly or explicitly employs a distance function or similarity measure; equivalently, the goal is to minimize the error in some sense, for instance the mean squared error MSE = (1/N) Σ_i e_i².

Maximization of Correntropy Criterion: the correntropy of the desired signal and the system output, V(D, Y) = E[κ_σ(D − Y)], is estimated from samples by V̂(D, Y) = (1/N) Σ_i κ_σ(d_i − y_i), where κ_σ(x) = exp(−x²/(2σ²)) / (√(2π) σ) is a Gaussian kernel with kernel size σ. MCC adapts the system so as to maximize V̂(D, Y).
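
To make the estimator concrete, here is a minimal NumPy sketch; the kernel size sigma and the toy data are illustrative choices, not values from the slides.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gaussian kernel: k_sigma(x) = exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def correntropy(d, y, sigma=1.0):
    """Sample estimate V(D, Y) = (1/N) sum_i k_sigma(d_i - y_i)."""
    e = np.asarray(d) - np.asarray(y)
    return gaussian_kernel(e, sigma).mean()

# MCC: prefer the system whose outputs maximize correntropy with the desired signal.
d = np.array([1.0, 2.0, 3.0, 100.0])   # desired signal; the last sample is an outlier
y = np.array([1.1, 1.9, 3.2, 3.0])     # system output
print(correntropy(d, y, sigma=1.0))    # the outlier contributes almost nothing
```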

Correntropy induced metric: define CIM(X, Y) = (κ_σ(0) − V(X, Y))^(1/2). CIM satisfies the four properties of a metric: non-negativity, identity of indiscernibles, symmetry, and the triangle inequality.

CIM contours: the contours of CIM(E, 0) in the 2D sample space behave like the L2 norm when the errors are close to the origin, like the L1 norm at intermediate distances, and saturate when the errors are far apart (large-valued elements; the metric is direction sensitive).
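
A small sketch illustrating this saturation, assuming the CIM definition above with a Gaussian kernel of size 1; the values are illustrative only.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def cim(x, y, sigma=1.0):
    """Correntropy induced metric: CIM(X, Y) = sqrt(k_sigma(0) - V(X, Y))."""
    e = np.asarray(x) - np.asarray(y)
    v = gaussian_kernel(e, sigma).mean()
    return np.sqrt(gaussian_kernel(0.0, sigma) - v)

for scale in [0.1, 1.0, 10.0, 100.0]:
    e = np.array([scale, scale])        # a 2D error sample at distance `scale` from the origin
    print(scale, cim(e, np.zeros(2)))   # grows roughly linearly at first, then saturates
```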

MCC is minimization of CIM: maximizing the correntropy V̂(D, Y) is equivalent to minimizing CIM(D, Y), since CIM(D, Y)² = κ_σ(0) − V̂(D, Y) and κ_σ(0) is a constant.

MCC is M-estimation: maximizing V̂(D, Y) is equivalent to minimizing Σ_i ρ(e_i), where ρ(e) = κ_σ(0) − κ_σ(e) = (1 − exp(−e²/(2σ²))) / (√(2π) σ) is a bounded robust loss with a redescending influence function.

Minimization of Error Entropy: Renyi's quadratic error entropy H₂(E) = −log ∫ p_E²(e) de is estimated by Ĥ₂(E) = −log V̂(E), where the Information Potential (IP) V̂(E) = (1/N²) Σ_i Σ_j κ_{σ√2}(e_j − e_i) follows from a Parzen estimate of the error PDF with kernel size σ. Minimizing the error entropy is therefore equivalent to maximizing the IP.
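
A minimal sketch of the IP and entropy estimators, assuming the Parzen construction above (the factor √2 in the pairwise kernel size comes from convolving two Gaussian kernels of size sigma); the data are illustrative.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def information_potential(e, sigma=1.0):
    """IP estimate: V(E) = (1/N^2) sum_i sum_j k_{sigma*sqrt(2)}(e_j - e_i)."""
    e = np.asarray(e)
    diff = e[:, None] - e[None, :]                    # all pairwise error differences
    return gaussian_kernel(diff, sigma * np.sqrt(2)).mean()

def renyi_quadratic_entropy(e, sigma=1.0):
    """H_2(E) = -log V(E); minimizing the entropy is maximizing the IP."""
    return -np.log(information_potential(e, sigma))

e = np.array([0.1, -0.2, 0.05, 5.0])                  # error samples, one outlier
print(renyi_quadratic_entropy(e))
```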

Relation between MEE and MCC: define the pairwise error differences e_ij = e_i − e_j and construct from them an auxiliary error vector; the information potential of E is then the correntropy estimate of this auxiliary vector with respect to zero.

Relation between MEE and MCC: consequently, minimizing the error entropy of E is equivalent to applying MCC to the pairwise error differences.

IP induced metric: define IPM(X, Y) analogously to CIM, replacing the correntropy of the difference X − Y with its information potential. IPM is only a pseudo-metric: it has NO identity of indiscernibles, since any error vector whose components are all equal (not only the zero vector) is at IPM distance zero from the origin.

IPM contours: the contours of IPM(E, 0) in the 2D sample space have a valley along the line e1 = e2, so the metric is not sensitive to the error mean, and they saturate for points far from the valley.
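
A short sketch demonstrating the pseudo-metric behaviour, assuming the IPM form sketched above; the helper names and values are illustrative.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def ipm(x, y, sigma=1.0):
    """IP induced (pseudo-)metric, built from the pairwise differences of x - y."""
    d = np.asarray(x) - np.asarray(y)
    diff = d[:, None] - d[None, :]
    v = gaussian_kernel(diff, sigma * np.sqrt(2)).mean()
    return np.sqrt(gaussian_kernel(0.0, sigma * np.sqrt(2)) - v)

e = np.array([3.0, 3.0, 3.0])       # every error equals 3: far from zero, yet...
print(ipm(e, np.zeros(3)))          # ...IPM(E, 0) = 0: identity of indiscernibles fails
print(ipm(e + 5.0, np.zeros(3)))    # shifting all errors by a constant changes nothing
```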

MEE and its equivalences: minimizing the error entropy is equivalent to maximizing the information potential, which in turn is equivalent to minimizing IPM(E, 0) and to applying MCC to the pairwise error differences.

MEE is M-estimation: assuming the error samples are drawn from a common error PDF, MEE can likewise be written as an M-estimation problem, with a bounded robust loss induced by the kernel applied to the pairwise error differences e_i − e_j.

Nuisance of conventional MEE: since the cost is shift-invariant, the location of the error PDF must be fixed separately. Conventionally this is done by forcing the error mean to zero, but when the error PDF is non-symmetric or heavy-tailed the estimate of the error mean is problematic. Fixing the peak of the error PDF at the origin is clearly better than the conventional zero-mean shift.

ERROR ENTROPY WITH FIDUCIAL POINTS: in supervised training we want most of the errors to equal zero, i.e., we want to minimize the error entropy with the error PDF anchored at 0. Denote by E the error vector and let e0 = 0 serve as a fiducial point of reference.

ERROR ENTROPY WITH FIDUCIAL POINTS: in general, the cost to be maximized is J(E) = λ (1/N) Σ_i κ_σ(e_i) + (1 − λ) V̂(E), a convex combination of the correntropy of the errors with respect to the fiducial point 0 (the MCC term) and the information potential of the errors (the MEE term).

ERROR ENTROPY WITH FIDUCIAL POINTS: λ is a weighting constant between 0 and 1 that determines how many fiducial points are placed at the origin: λ = 0 recovers MEE, λ = 1 recovers MCC, and 0 < λ < 1 gives the Minimization of Error Entropy with Fiducial points (MEEF).
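
A hedged sketch of the MEEF objective, assuming the convex-combination form given above; the weight lam, the kernel size sigma, and the toy errors are free, illustrative choices.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def meef_cost(e, lam=0.5, sigma=1.0):
    """MEEF objective (maximized in training): lam * MCC term + (1 - lam) * MEE (IP) term.
    lam = 1 recovers MCC, lam = 0 recovers MEE."""
    e = np.asarray(e)
    mcc_term = gaussian_kernel(e, sigma).mean()                   # correntropy of the errors w.r.t. the fiducial point 0
    diff = e[:, None] - e[None, :]
    mee_term = gaussian_kernel(diff, sigma * np.sqrt(2)).mean()   # information potential of the errors
    return lam * mcc_term + (1 - lam) * mee_term

e = np.array([0.0, 0.1, -0.1, 8.0])   # errors with one outlier
print(meef_cost(e, lam=0.5))
```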

ERROR ENTROPY WITH FIDUCIAL POINTS: the MCC term locates the main peak of the error PDF and fixes it at the origin, even in cases where estimation of the error mean is not robust. Unifying the two cost functions retains the merits of both: outlier resistance and resilience to the choice of kernel size.

Metric induced by MEEF: a well-defined metric (unlike the IPM pseudo-metric) that is direction sensitive: it favors errors with the same sign and penalizes errors with different signs.

Experiment 1: Robust regression. X is the input variable, f the unknown function, N additive noise, and Y = f(X) + N the observation. Noise PDF: [figure].
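
The slides do not reproduce the exact regression setup, so the following is only a sketch under assumed conditions: a linear-in-parameters model fitted by batch gradient ascent on the correntropy (MCC) cost, with synthetic data containing outliers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = f(x) + noise, where the noise contains large outliers.
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 200)
y[rng.choice(200, 20, replace=False)] += rng.normal(0, 10, 20)   # inject outliers

def gaussian_kernel(e, sigma):
    # Unnormalized Gaussian kernel; the constant is absorbed into the learning rate.
    return np.exp(-e**2 / (2 * sigma**2))

# Fit y = w*x + b by batch gradient ascent on correntropy (MCC).
w, b, sigma, lr = 0.0, 0.0, 1.0, 0.5
for epoch in range(200):
    e = y - (w * x + b)                    # error signal
    g = gaussian_kernel(e, sigma) * e      # Gaussian weight * error: outliers contribute ~0 (M-estimation view)
    w += lr * np.mean(g * x) / sigma**2
    b += lr * np.mean(g) / sigma**2

print(w, b)   # should end up close to (2, 1) despite the outliers
```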

Regression results

Experiment 2: Chaotic signal prediction. Mackey-Glass chaotic time series with delay parameter t = 30, predicted with a time-delay neural network (TDNN): 7 inputs, 14 hidden PEs with tanh nonlinearity, and 1 linear output.
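
A minimal PyTorch sketch of the described architecture; the Mackey-Glass data generation and the exact training settings from the slides are not reproduced, and the tensors below are placeholders.

```python
import torch
import torch.nn as nn

class TDNN(nn.Module):
    """TDNN matching the slide's description: 7 delayed inputs -> 14 tanh PEs -> 1 linear output."""
    def __init__(self, n_taps=7, n_hidden=14):
        super().__init__()
        self.hidden = nn.Linear(n_taps, n_hidden)
        self.out = nn.Linear(n_hidden, 1)

    def forward(self, x):
        return self.out(torch.tanh(self.hidden(x)))

def mcc_loss(d, y, sigma=1.0):
    """Negative correntropy of the error, so that minimizing it maximizes correntropy."""
    e = d - y
    return -torch.exp(-e**2 / (2 * sigma**2)).mean()

# Placeholders: x would hold windows of 7 past Mackey-Glass samples, d the next sample.
x = torch.randn(100, 7)
d = torch.randn(100, 1)
model = TDNN()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(100):
    opt.zero_grad()
    loss = mcc_loss(d, model(x))
    loss.backward()
    opt.step()
```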

Training error PDF

Conclusions
- Established connections between MEE, distance functions, and M-estimation, which theoretically explains the robustness of this family of cost functions.
- Unified MEE and MCC in the framework of information-theoretic models.
- Proposed a new cost function, minimization of error entropy with fiducial points (MEEF), which solves the problem of MEE being shift-invariant in an elegant and robust way.