Quadratic Perceptron Learning with Applications Tonghua Su National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences Beijing, PR China Dec 2, 2010
Outline: Introduction; Motivations; Quadratic Perceptron Algorithm (previous works, theoretical perspective, practical perspective, open issues); Conclusions
1 Introduction Notation, binary classification, multi-class classification, large-scale learning vs. large-category learning
Introduction Domain set, label set, training data; binary classification, e.g. a linear model f(x) = sign(w^T x + b)
Introduction Multi-class classification. Learning strategies: one-vs-one, one-vs-all, or a single machine (e.g. a linear model). Large-category classification: Chinese character recognition (3,755 classes), with more confusions between classes.
Introduction Large-scale learning: large numbers of data points and high dimensions, a challenge for computational resources. Large category vs. large scale: almost certainly, a large-category problem is also a large-scale one. Tradeoff: efficiency vs. accuracy.
2 Motivations MQDF HMMs
Modified Quadratic Discriminant Function (MQDF) [Kimura et al ’87]: using SVD of the class covariance, truncate the small eigenvalues.
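For reference, a standard form of the class-i MQDF discriminant (following Kimura et al. ’87; the notation below is mine, with μ_i the class mean, λ_{ij}, φ_{ij} the leading eigenvalues and eigenvectors of the class covariance, k the number of retained eigenvectors, δ_i the constant replacing the truncated eigenvalues, and d the feature dimension):

g_i(x) = \sum_{j=1}^{k} \frac{1}{\lambda_{ij}} \bigl[\phi_{ij}^\top (x - \mu_i)\bigr]^2
       + \frac{1}{\delta_i} \Bigl( \lVert x - \mu_i \rVert^2 - \sum_{j=1}^{k} \bigl[\phi_{ij}^\top (x - \mu_i)\bigr]^2 \Bigr)
       + \sum_{j=1}^{k} \log \lambda_{ij} + (d - k) \log \delta_i .

A test sample x is assigned to the class with the smallest g_i(x).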
Modified Quadratic Discriminant Function (MQDF) MQDF+MCE+Synthetic samples [Chen et al ‘2010] Building block: discriminative learning of MQDF
Hidden Markov Models (HMMs) Markovian transitions + a state-specific generator. Continuous-density HMMs: each state emits from a GMM. Usable in, e.g., handwritten Chinese text recognition [Su ‘2007]. [Figure: HMM state-transition diagram with states S, F, L, E and transition probabilities 0.05 / 0.95]
Hidden Markov Models (HMMs) Perceptron training of HMMs [Cheng et al ’2009]: joint distribution; discriminant function log p(s, x); perceptron training under a nonnegative-definite constraint; lack of theoretical foundation.
3 Quadratic Perceptron Algorithm Previous works Theoretical perspective Practical perspective Open issues
Previous Works Rosenblatt’s Perceptron [Rosenblatt ’58]. Updating rule: on a mistake (y_t w^T x_t <= 0), set w <- w + y_t x_t.
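A minimal Python sketch of the mistake-driven rule above (variable names are mine; the weights are updated only when y_t w^T x_t <= 0):

import numpy as np

def perceptron_train(X, y, epochs=10):
    """Rosenblatt's perceptron: X has shape (n, d), labels y are in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            if y_t * np.dot(w, x_t) <= 0:   # mistake (or zero margin): correct w
                w += y_t * x_t              # additive update toward y_t x_t
    return w

def perceptron_predict(w, X):
    return np.sign(X @ w)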
Previous Works Rosenblatt’s Perceptron [Figure: geometric illustration of the updates w0 -> w1 -> ... -> w4; each mistake adds x_t y_t to w and moves it toward the solution region bounded by the hyperplanes w^T x_t y_t = 0]
Previous Works Rosenblatt’s Perceptron [Rosenblatt ’58], viewed as minimizing a batch loss by stochastic gradient descent (SGD), as spelled out below.
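In my notation, the batch loss in question is the perceptron criterion

L(w) = \sum_i \max\bigl(0,\; -y_i\, w^\top x_i\bigr),

and a stochastic (sub)gradient step on a violated example i (one with y_i w^\top x_i \le 0) is

w \leftarrow w - \eta\, \nabla_w\bigl(-y_i w^\top x_i\bigr) = w + \eta\, y_i x_i,

which for \eta = 1 is exactly Rosenblatt's update.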
Previous Works Convergence Theorem [Block ’62, Novikoff ’62]: on linearly separable data, the perceptron stops after at most (R/γ)^2 update steps.
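The standard statement, in my wording: if \lVert x_i \rVert \le R for all i and there is a unit vector u with y_i\, u^\top x_i \ge \gamma > 0 for every training example (linear separability with margin \gamma), then the perceptron makes at most (R/\gamma)^2 mistakes, whatever the presentation order.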
Previous Works Voted Perceptron [Freund ’99]: training algorithm and prediction rule; a sketch follows.
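A minimal sketch of the voted perceptron, assuming the usual formulation from Freund and Schapire ’99 (keep every intermediate weight vector w_k together with its survival count c_k; names are mine):

import numpy as np

def voted_perceptron_train(X, y, epochs=10):
    """Returns a list of (weight vector, survival count) pairs."""
    d = X.shape[1]
    w, c = np.zeros(d), 0
    history = []
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            if y_t * np.dot(w, x_t) <= 0:      # mistake: retire the current w
                history.append((w.copy(), c))
                w = w + y_t * x_t
                c = 1
            else:                              # correct: the current w survives one more example
                c += 1
    history.append((w.copy(), c))
    return history

def voted_predict(history, x):
    # Prediction is a weighted majority vote over all intermediate perceptrons.
    return np.sign(sum(c * np.sign(np.dot(w, x)) for w, c in history))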
Previous Works Voted Perceptron Generalization bound
Previous Works Perceptron with Margin [Krauth ’87, Li ’2002]
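Roughly, the defining change relative to the plain perceptron (τ is my symbol for the target margin): update not only on sign errors but whenever the functional margin is too small,

y_t\, w^\top x_t \le \tau \;(\tau > 0) \quad\Longrightarrow\quad w \leftarrow w + y_t x_t ,

so the final w separates the data with a guaranteed margin; Li et al. ’2002 further allow different thresholds for the two classes (uneven margins).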
Previous Works Ballseptron [Shalev-Shwartz ’2005]
Previous Works Perceptron with Unlearning [Panagiotakopoulos ’2010]
Theoretical Perspective Prediction rule Learning
Theoretical Perspective Algorithm (online version)
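A minimal sketch of what the online version could look like, assuming the binary quadratic discriminant f(x) = x^T A x + b^T x + c with perceptron-style additive updates on mistakes; this is my reconstruction from the slides, not necessarily the exact algorithm:

import numpy as np

def quadratic_perceptron_train(X, y, epochs=10):
    """Online quadratic perceptron sketch: f(x) = x'Ax + b'x + c, labels in {-1, +1}."""
    d = X.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    c = 0.0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            f = x_t @ A @ x_t + b @ x_t + c
            if y_t * f <= 0:                    # mistake on the quadratic score
                A += y_t * np.outer(x_t, x_t)   # update the quadratic term
                b += y_t * x_t                  # update the linear term
                c += y_t                        # update the bias term
    return A, b, c

def quadratic_predict(A, b, c, x):
    return np.sign(x @ A @ x + b @ x + c)

The nonnegative-definite constraint discussed later is not enforced in this sketch.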
Theoretical Perspective Convergence Theorem of Quadratic Perceptron (quadratically separable case)
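One standard route to such a theorem (my sketch, not necessarily the authors' proof): the quadratic discriminant is linear in the lifted features \phi(x) = (\operatorname{vec}(x x^\top),\, x,\, 1), so quadratic separability of the data is exactly linear separability of \{\phi(x_i)\}, and the Block-Novikoff argument bounds the number of mistakes by (R_\phi / \gamma_\phi)^2, with R_\phi and \gamma_\phi the radius and margin measured in the lifted space.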
Theoretical Perspective Convergence Theorem of Quadratic Perceptron with Margin (quadratically separable case)
Theoretical Perspective Bounds for the quadratically inseparable case
Theoretical Perspective Generalization Bound
Theoretical Perspective Nonnegative-definite constraints: handled either by projection onto the valid space or by restricting the updates; convergence still holds.
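One common way to realize the projection step, offered as an illustration rather than the authors' exact procedure: project the (symmetrized) quadratic term onto the nonnegative-definite cone by clipping its negative eigenvalues at zero, which is the Frobenius-norm projection.

import numpy as np

def project_to_psd(A):
    """Frobenius-norm projection of a square matrix onto the PSD cone."""
    A_sym = 0.5 * (A + A.T)                   # symmetrize first
    eigvals, eigvecs = np.linalg.eigh(A_sym)  # eigendecomposition of the symmetric part
    eigvals = np.clip(eigvals, 0.0, None)     # zero out negative eigenvalues
    return (eigvecs * eigvals) @ eigvecs.T    # rebuild V diag(max(lambda, 0)) V^T

After each update of the quadratic term, A can be replaced by project_to_psd(A); the slide's alternative ("restriction on updating") would instead accept only those updates that keep A nonnegative definite.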
Theoretical Perspective Toy problem: Lithuanian dataset, 4,000 training instances, 2,000 test instances
Theoretical Perspective Perceptron learning (toy problem)
Theoretical Perspective Extension to Multi-class QDF
Theoretical Perspective Extension to Multi-class QDF: the theoretical properties hold as in the binary QDF case; the proof can be completed using Kesler’s construction (sketched below).
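For completeness, a rough statement of Kesler's construction (my notation): with per-class discriminants g_j(x) = w_j^\top \phi(x), the example (x, y) is correctly classified iff w_y^\top \phi(x) - w_j^\top \phi(x) > 0 for every j \ne y. Stacking W = (w_1, \dots, w_K) and building, for each j \ne y, an expanded vector with \phi(x) in block y, -\phi(x) in block j, and zeros elsewhere, turns each multi-class constraint into a single binary constraint on the stacked weight vector, so the binary perceptron convergence argument transfers directly.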
Practical Perspective Perceptron batch loss, minimized by SGD (a plausible form is given below).
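A plausible form of the batch loss here, in my notation (ρ ≥ 0 is a margin parameter, ρ = 0 recovers the plain perceptron loss, and f_\theta is the quadratic discriminant):

L(\theta) = \sum_i \max\bigl(0,\; \rho - y_i f_\theta(x_i)\bigr),

with an SGD step taken on example i whenever y_i f_\theta(x_i) \le \rho.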
Practical Perspective Constant margin vs. dynamic margin
Practical Perspective Experiments Benchmark on digit databases
Practical Perspective Experiments Benchmark on digit databases grg on MNIST
Practical Perspective Experiments Benchmark on digit databases grg on USPS
Practical Perspective Experiments Effects of training size (grg on MNIST)
Practical Perspective Experiments Benchmark on CASIA-HWDB1.1
Practical Perspective Experiments Benchmark on CASIA-HWDB1.1
Open Issues Convergence on GMM/MQDF? The error reduction on CASIA-HWDB1.1 is small; would adding more data help? Can label permutation help? Speed up the training process; evaluate on more datasets.
4 Conclusions
Conclusions A theoretical foundation for QDF perceptron learning of MQDF (convergence theorem, generalization bound); perceptron learning of MQDF in practice (a margin is needed for good generalization; more data may help).
Thank you!
References
[Chen et al ‘2010] Xia Chen, Tong-Hua Su, Tian-Wen Zhang. Discriminative Training of MQDF Classifier on Synthetic Chinese String Samples, CCPR, 2010.
[Cheng et al ‘2009] C. Cheng, F. Sha, L. Saul. Matrix updates for perceptron training of continuous density hidden Markov models, ICML, 2009.
[Kimura ‘87] F. Kimura, K. Takashina, S. Tsuruoka, Y. Miyake. Modified quadratic discriminant functions and the application to Chinese character recognition, IEEE TPAMI, 9(1): 149-153, 1987.
[Panagiotakopoulos ‘2010] C. Panagiotakopoulos, P. Tsampouka. The Margin Perceptron with Unlearning, ICML, 2010.
[Krauth ‘87] W. Krauth, M. Mezard. Learning algorithms with optimal stability in neural networks, Journal of Physics A, 20: 745-752, 1987.
[Li ‘2002] Yaoyong Li, Hugo Zaragoza, Ralf Herbrich, John Shawe-Taylor, Jaz Kandola. The Perceptron Algorithm with Uneven Margins, ICML, 2002.
References
[Freund ‘99] Y. Freund, R. E. Schapire. Large margin classification using the perceptron algorithm, Machine Learning, 37(3): 277-296, 1999.
[Shalev-Shwartz ’2005] Shai Shalev-Shwartz, Yoram Singer. A New Perspective on an Old Perceptron Algorithm, COLT, 2005.
[Novikoff ‘62] A. B. J. Novikoff. On convergence proofs on perceptrons, Proc. Symp. Math. Theory Automata, Vol. 12, pp. 615-622, 1962.
[Rosenblatt ‘58] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, 65(6): 386-408, 1958.
[Block ‘62] H. D. Block. The perceptron: A model for brain functioning, Reviews of Modern Physics, 34: 123-135, 1962.