Total Variation and Euler's Elastica for Supervised Learning
Tong Lin, Hanlin Xue, Ling Wang, Hongbin Zha
Key Lab of Machine Perception, School of EECS, Peking University, China
Contact: tonglin123@gmail.com
June 29, 2012
Background
Supervised learning: predict u: x → y given training data (x_1, y_1), …, (x_N, y_N); two tasks, classification and regression.
Prior work:
- SVM, trained with the hinge loss $\max(0,\, 1 - y\,u(x))$
- RLS (Regularized Least Squares; Rifkin, 2002), trained with the squared loss $(u(x) - y)^2$
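A minimal numerical illustration of the two losses (our own sketch, not from the slides; NumPy assumed):

    import numpy as np

    def hinge_loss(u, y):
        """SVM hinge loss max(0, 1 - y*u), with labels y in {-1, +1}."""
        return np.maximum(0.0, 1.0 - y * u)

    def squared_loss(u, y):
        """RLS squared loss (u - y)^2."""
        return (u - y) ** 2

    u = np.linspace(-2.0, 2.0, 5)    # classifier outputs
    y = np.ones_like(u)              # true label +1
    print(hinge_loss(u, y))          # zero once the margin y*u >= 1
    print(squared_loss(u, y))        # penalizes even confident correct outputs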
Background
Prior work (cont.):
- Laplacian energy: "Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples," Belkin et al., JMLR 7:2399-2434, 2006
- Hessian energy: "Semi-supervised Regression using Hessian Energy with an Application to Semi-supervised Dimensionality Reduction," K. I. Kim, F. Steinke, M. Hein, NIPS 2009
- GLS: "Classification using Geometric Level Sets," Varshney & Willsky, JMLR 11:491-516, 2010
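For context on the Laplacian-energy line of work, a sketch of the discrete (graph) Laplacian energy $u^\top L u = \frac{1}{2}\sum_{ij} W_{ij}(u_i - u_j)^2$ over the training points; the Gaussian affinity and its bandwidth are illustrative assumptions:

    import numpy as np

    def graph_laplacian_energy(X, u, sigma=1.0):
        """Discrete Laplacian energy u^T L u with Gaussian affinities
        W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * sigma**2))
        L = np.diag(W.sum(1)) - W        # unnormalized graph Laplacian
        return u @ L @ u                 # small when u varies smoothly over neighbors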
Motivation: decision boundaries produced by SVM vs. our proposed method (figure)
3D display of the output classification function u(x) learned by the proposed EE model (figure). Large margin should not be the sole criterion; we argue that sharper edges and smoother decision boundaries can also play significant roles.
Models
General form: $\min_u \sum_{i=1}^{N} (u(x_i) - y_i)^2 + \lambda\, S(u)$
- Laplacian Regularization (LR): $S(u) = \int_\Omega |\nabla u|^2 \, dx$
- Total Variation (TV): $S(u) = \int_\Omega |\nabla u| \, dx$
- Euler's Elastica (EE): $S(u) = \int_\Omega (a + b\,\kappa^2)\,|\nabla u| \, dx$, with curvature $\kappa = \nabla \cdot \big(\nabla u / |\nabla u|\big)$
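A rough numerical reading of the three regularizers, discretized on a 2D grid with central differences (a sketch under assumptions: unit grid spacing, an eps guard against division by zero, and illustrative values of a and b):

    import numpy as np

    def regularizers(u, a=1.0, b=1.0, eps=1e-8):
        """Discretized Laplacian, TV, and elastica energies of a 2D array u."""
        ux, uy = np.gradient(u)                  # partial derivatives
        g = np.sqrt(ux**2 + uy**2) + eps         # |grad u|
        kappa = (np.gradient(ux / g, axis=0)     # curvature: div(grad u / |grad u|)
                 + np.gradient(uy / g, axis=1))
        S_lr = np.sum(ux**2 + uy**2)             # integral of |grad u|^2
        S_tv = np.sum(g)                         # integral of |grad u|
        S_ee = np.sum((a + b * kappa**2) * g)    # integral of (a + b*kappa^2)|grad u|
        return S_lr, S_tv, S_ee

    # a sharp step vs. a smooth ramp between the same endpoint values
    xx, yy = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
    print(regularizers(np.sign(xx)))   # TV comparable to the ramp's; the |grad u|^2 term explodes
    print(regularizers(xx))            # the quadratic penalty strongly prefers the ramp

This previews why TV tolerates sharp transitions while the quadratic (Laplacian) penalty smears them out.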
TV & EE in Image Processing
TV: a measure of the total quantity of value change; used for image denoising (Rudin, Osher, Fatemi, 1992).
The elastica was introduced by Euler in 1744 to model torsion-free elastic rods; used for image inpainting (Chan et al., 2002).
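To make the ROF connection concrete, a minimal explicit gradient-descent scheme for TV denoising, $\min_u \frac{\lambda}{2}\|u - f\|^2 + \int_\Omega |\nabla u|\, dx$ (our own sketch, not the original ROF algorithm; lam, tau, iters, and the eps smoothing are illustrative choices):

    import numpy as np

    def tv_denoise(f, lam=0.5, tau=0.02, iters=200, eps=1e-2):
        """Denoise a 2D image f by gradient descent on the smoothed TV energy."""
        u = f.copy()
        for _ in range(iters):
            ux = np.gradient(u, axis=0)
            uy = np.gradient(u, axis=1)
            g = np.sqrt(ux**2 + uy**2 + eps)     # smoothed |grad u|
            div = (np.gradient(ux / g, axis=0)   # curvature term div(grad u / |grad u|)
                   + np.gradient(uy / g, axis=1))
            u += tau * (div - lam * (u - f))     # descend the energy
        return u

    # usage: a noisy step edge; the edge survives while the noise is removed
    f = np.sign(np.linspace(-1, 1, 64))[None, :] * np.ones((64, 1))
    f += 0.3 * np.random.default_rng(0).normal(size=f.shape)
    u = tv_denoise(f)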
TV can preserve sharp edges, while EE can produce smooth boundaries For details, see T. Chan & J. Shen’s textbook: Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods, SIAM, 2005
Decision boundary: the zero level set of the learned function u. In a d-dimensional input space, the mean curvature $\kappa$ of a level set has the same expression up to the constant factor $1/(d-1)$.
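For reference, the standard level-set formula for the mean curvature in d dimensions (a textbook identity, stated here for completeness):

\[
  \kappa \;=\; \frac{1}{d-1}\; \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right)
\]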
Framework (figure)
Energy Functional Minimization
Applying the calculus of variations to the energy functional yields an Euler-Lagrange PDE, which we solve numerically.
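As an illustration, the Euler-Lagrange equation for the TV model, obtained by a standard calculus-of-variations computation (our own sketch, not transcribed from the slides):

\[
  2\sum_{i=1}^{N} \big(u(x_i) - y_i\big)\,\delta(x - x_i)
  \;-\; \lambda\, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) \;=\; 0
\]

The data term contributes Dirac deltas at the training points; the regularizer contributes the same curvature operator that drives TV denoising.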
Solutions
a. Laplacian Regularization (LR): radial basis function (RBF) approximation.
b. TV & EE: we develop two solutions:
- Gradient descent time marching (GD)
- Lagged linear equation iteration (LagLE)
A sketch of the RBF parameterization with GD-style training follows.
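A minimal sketch of the RBF expansion $u(x) = \sum_j w_j\,\varphi(\|x - x_j\|)$ trained by gradient descent on the TV energy. Assumptions: Gaussian RBFs centered at the training points, the TV integral crudely approximated by a sum over the samples, and a numerical forward-difference gradient in place of the paper's analytic GD/LagLE schemes:

    import numpy as np

    rng = np.random.default_rng(0)

    def rbf(X, C, sigma=1.0):
        """Design matrix Phi[i, j] = exp(-||X_i - C_j||^2 / (2 sigma^2))."""
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))

    def grad_u(X, C, w, sigma=1.0):
        """Gradient of u(x) = sum_j w_j phi_j(x) at each row of X."""
        diff = X[:, None, :] - C[None, :, :]
        phi = rbf(X, C, sigma)
        return -((phi * w)[:, :, None] * diff).sum(1) / sigma**2

    def energy(w, X, y, C, lam=0.1, sigma=1.0):
        """Squared loss + TV approximated by sum_i |grad u(x_i)| (crude quadrature)."""
        u = rbf(X, C, sigma) @ w
        tv = np.linalg.norm(grad_u(X, C, w, sigma), axis=1).sum()
        return ((u - y) ** 2).sum() + lam * tv

    def fit(X, y, lam=0.1, sigma=1.0, lr=1e-3, iters=500, h=1e-5):
        """Gradient-descent time marching on the coefficients w."""
        C, w = X.copy(), np.zeros(len(X))       # one RBF per training point
        for _ in range(iters):
            E0 = energy(w, X, y, C, lam, sigma)
            g = np.zeros_like(w)
            for j in range(len(w)):             # forward-difference gradient
                wp = w.copy(); wp[j] += h
                g[j] = (energy(wp, X, y, C, lam, sigma) - E0) / h
            w -= lr * g
        return C, w

    # toy usage: two Gaussian clusters with labels -1 / +1
    X = np.vstack([rng.normal(-1, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
    y = np.hstack([-np.ones(20), np.ones(20)])
    C, w = fit(X, y)
    print((np.sign(rbf(X, C) @ w) == y).mean())   # training accuracy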
Experiments: Two-Moon Data
SVM vs. EE (figures). Both methods achieve 100% accuracy, each under different parameter combinations.
Experiments: Binary Classification
Experiments: Multi-class Classification
Experiments: Multi-class Classification (cont.) Note: results of TV and EE are computed by the LagLE method.
Experiments: Regression
Conclusions
Contributions:
- Introduce TV & EE to the ML community
- Demonstrate empirically the significance of curvature and gradient
- Achieve superior performance on classification and regression
Future work:
- Hinge loss; other basis functions
- Extension to the semi-supervised setting
- Existence and uniqueness of the PDE solutions
- Fast algorithms to reduce running time
End, thank you!