
Quadratic Perceptron Learning with Applications


1 Quadratic Perceptron Learning with Applications
Tonghua Su National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences Beijing, PR China Dec 2, 2010

2 Outline: Introduction; Motivations; Quadratic Perceptron Algorithm (previous works, theory perspective, practical perspective, open issues); Conclusions

3 1 Introduction Notation, binary classification, multi-class classification, large scale learning vs large category learning

4 Introduction: domain set, label set, training data. Binary classification, e.g., a linear model.

5 Introduction Multi-class Classification Learning strategy
One vs one; one vs all; single machine (e.g., a linear model). Large-category classification: Chinese character recognition (3,755 classes), with more confusion between classes.
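To make the large-category cost concrete, here is a small arithmetic sketch counting how many models each learning strategy trains for the 3,755-class Chinese character task. The helper name is hypothetical; the counts follow directly from the definitions (one classifier per class pair, one per class, or one joint model).

```python
def num_models(num_classes: int) -> dict:
    """Number of models each multi-class strategy has to train."""
    return {
        "one_vs_one": num_classes * (num_classes - 1) // 2,  # one binary model per class pair
        "one_vs_all": num_classes,                           # one binary model per class
        "single_machine": 1,                                 # one joint multi-class model
    }

counts = num_models(3755)
print(counts)  # {'one_vs_one': 7048135, 'one_vs_all': 3755, 'single_machine': 1}
```

The quadratic growth of one vs one is one reason a single-machine formulation is attractive when the category count is in the thousands.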

6 Introduction
Large scale learning: large numbers of data points and high dimensions, which challenge computational resources. Large category vs large scale: a large-category problem is almost certainly also a large-scale one; the trade-off is efficiency vs accuracy.

7 2 Motivations MQDF HMMs

8 Modified Quadratic Discriminant Function (MQDF)
MQDF [Kimura ’87]: using SVD of each class covariance matrix, truncate the small eigenvalues.
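A minimal sketch of the MQDF in the MQDF2 form usually attributed to Kimura et al., assuming the k leading eigenpairs of a class covariance are already computed and the remaining d-k minor eigenvalues are all replaced by one constant `delta`. The function and argument names are illustrative, not taken from the slides. Smaller values mean a better match, so a classifier picks the class with the minimum value.

```python
import math

def mqdf_distance(x, mean, eigvecs, eigvals, delta):
    """MQDF2-style class distance (smaller = more likely).

    eigvecs/eigvals: the k leading (unit) eigenvectors and eigenvalues of the
    class covariance, assumed precomputed; delta replaces the d-k minor ones.
    """
    d, k = len(x), len(eigvals)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    sq_norm = sum(v * v for v in diff)
    # Projections of (x - mean) onto the principal axes.
    projs = [sum(p * v for p, v in zip(phi, diff)) for phi in eigvecs]
    major = sum(p * p / lam for p, lam in zip(projs, eigvals))
    # Residual energy outside the principal subspace, scaled by 1/delta.
    minor = (sq_norm - sum(p * p for p in projs)) / delta
    log_det = sum(math.log(lam) for lam in eigvals) + (d - k) * math.log(delta)
    return major + minor + log_det
```

With an identity covariance (eigenvalues and `delta` all 1) the value reduces to the squared Euclidean distance to the mean.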

9 Modified Quadratic Discriminant Function (MQDF)
MQDF+MCE+Synthetic samples [Chen et al ‘2010] Building block: discriminative learning of MQDF

10 Hidden Markov Models (HMMs)
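Markovian transitions + state-specific generators. Continuous-density HMMs: each state emits through a GMM. E.g., usable in handwritten Chinese text recognition [Su ’2007].
[Figure: HMM state diagram with states F and L, start state S and end state E, and transition probabilities 0.05 and 0.95]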

11 Hidden Markov Models (HMMs)
Perceptron training of HMMs [Cheng et al. ’2009]: joint distribution; discriminant function log p(s, x); perceptron training; nonnegative-definite constraint. Lack of theoretical foundation.
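The discriminant log p(s, x) of a state path s and observation sequence x can be illustrated with a toy sketch. To stay short it assumes one 1-D Gaussian per state instead of a GMM, and all names and parameter values are hypothetical.

```python
import math

def gaussian_logpdf(x, mu, var):
    """Log density of a 1-D Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def hmm_joint_logprob(states, obs, log_init, log_trans, emit):
    """log p(s, x) = log P(s_1) + sum_t log P(s_t | s_t-1) + sum_t log p(x_t | s_t).

    log_init[i]: log initial probability of state i.
    log_trans[i][j]: log transition probability from state i to state j.
    emit[i]: (mean, variance) of the single Gaussian emitted by state i.
    """
    total = log_init[states[0]]
    for t, (s, x) in enumerate(zip(states, obs)):
        if t > 0:
            total += log_trans[states[t - 1]][s]
        mu, var = emit[s]
        total += gaussian_logpdf(x, mu, var)
    return total

# Hypothetical two-state model with sticky transitions.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.95), math.log(0.05)], [math.log(0.05), math.log(0.95)]]
emit = [(0.0, 1.0), (3.0, 1.0)]
score = hmm_joint_logprob([0, 0, 1], [0.0, 0.0, 3.0], log_init, log_trans, emit)
```

In a perceptron-style trainer this score would be compared between the reference path and the best competing path; that training loop is not shown here.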

12 3 Quadratic Perceptron Algorithm
Related works Theoretical considerations Practical considerations Open issues

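13 Previous Works Rosenblatt’s Perceptron [Rosenblatt ’58] Updating rule: if $y_t \mathbf{w}^\top \mathbf{x}_t \le 0$, then $\mathbf{w} \leftarrow \mathbf{w} + y_t \mathbf{x}_t$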
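A minimal sketch of Rosenblatt's updating rule in plain Python, assuming labels in {-1, +1} and a constant feature 1 appended to each input so the bias is learned as part of w. The function name and the AND-style toy data are illustrative.

```python
def perceptron(data, epochs=100):
    """Rosenblatt's perceptron on (x, y) pairs with y in {-1, +1}.

    Returns the weights (last entry = bias) and the number of updates.
    """
    w = [0.0] * (len(data[0][0]) + 1)
    mistakes = 0
    for _ in range(epochs):
        clean = True
        for x, y in data:
            xa = list(x) + [1.0]  # append constant feature for the bias
            if y * sum(wi * xi for wi, xi in zip(w, xa)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, xa)]  # w <- w + y x
                mistakes += 1
                clean = False
        if clean:  # a full pass without mistakes: training data separated
            break
    return w, mistakes

and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, mistakes = perceptron(and_data)
```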

14 Previous Works Rosenblatt’s Perceptron
[Figure: the solution region bounded by the hyperplanes $\mathbf{w}^\top\mathbf{x}_i y_i = 0$ (i = 1, ..., 4), and the sequence of weight vectors w0, ..., w4 generated by successive perceptron updates]

15 Previous Works Rosenblatt’s Perceptron [Rosenblatt ’58]
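View from batch loss: $L(\mathbf{w}) = \sum_i \ell(\mathbf{w}; \mathbf{x}_i, y_i)$, where $\ell(\mathbf{w}; \mathbf{x}, y) = \max(0, -y\,\mathbf{w}^\top\mathbf{x})$. Applying stochastic gradient descent (SGD) to this loss yields the updating rule.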
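A small sketch of the batch-loss view, assuming the perceptron criterion max(0, -y w.x) summed over examples (no bias term, names illustrative). One SGD step with learning rate 1 on a misclassified example is exactly the update w <- w + y x.

```python
def perceptron_loss(w, data):
    """Batch perceptron criterion: sum over examples of max(0, -y * w.x)."""
    return sum(max(0.0, -y * sum(wi * xi for wi, xi in zip(w, x))) for x, y in data)

def sgd_step(w, x, y, lr=1.0):
    """One SGD step on the single-example loss max(0, -y * w.x).

    A subgradient is -y x when y * w.x <= 0 (taking this branch at the kink)
    and 0 otherwise, so with lr = 1 this is the perceptron update.
    """
    if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    return list(w)

data = [((1.0, 2.0), 1)]
w0 = [1.0, -1.0]               # misclassifies the point: w.x = -1
w1 = sgd_step(w0, *data[0])    # becomes [2.0, 1.0]
```

Note that this criterion is trivially minimized by w = 0, which is one motivation for the margin variants discussed later.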

16 Previous Works Convergence Theorem [Block ’62,Novikoff ’62]
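Linearly separable data: if every $\|\mathbf{x}_i\| \le R$ and some unit vector separates the data with margin $\gamma$, the perceptron stops after at most $(R/\gamma)^2$ updates.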
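The theorem can be checked numerically on a toy case. This sketch assumes a homogeneous perceptron (no bias) and a hand-picked separating direction u; any separating u gives a valid, if possibly loose, bound. The data and names are illustrative.

```python
import math

def novikoff_bound(data, u):
    """(R / gamma)^2 for data separated through the origin by direction u."""
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    R = max(math.sqrt(sum(xi * xi for xi in x)) for x, _ in data)
    gamma = min(y * sum(ui * xi for ui, xi in zip(u, x)) / norm_u for x, y in data)
    if gamma <= 0:
        raise ValueError("u does not separate the data")
    return (R / gamma) ** 2

def count_mistakes(data, max_epochs=1000):
    """Homogeneous perceptron; returns the total number of updates made."""
    w = [0.0] * len(data[0][0])
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:
            break
    return mistakes

data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-3.0, -1.0), -1)]
bound = novikoff_bound(data, u=(1.0, 1.0))   # R^2 = 10, gamma = 3 / sqrt(2)
print(count_mistakes(data), "<=", bound)
```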

17 Previous Works Voted Perceptron [Freund ’99] Training algorithm
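Prediction: $\hat{y} = \operatorname{sign}\big(\sum_k c_k \operatorname{sign}(\mathbf{w}_k^\top \mathbf{x})\big)$, where $c_k$ is the number of rounds the weight vector $\mathbf{w}_k$ survived without a mistake.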
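A minimal sketch of the voted perceptron as described by Freund and Schapire: every intermediate weight vector is kept together with its survival count, and prediction is a weighted vote of their signs. It assumes no bias term (append a constant feature if one is needed); the names are illustrative.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sign(v):
    return 1 if v > 0 else -1

def voted_perceptron_train(data, epochs=10):
    """Return a list of (w_k, c_k): weight vectors and their survival counts."""
    w = [0.0] * len(data[0][0])
    c = 0
    history = []
    for _ in range(epochs):
        for x, y in data:
            if y * dot(w, x) <= 0:
                history.append((w, c))                      # retire the current vector
                w = [wi + y * xi for wi, xi in zip(w, x)]   # usual perceptron update
                c = 1
            else:
                c += 1                                      # current vector survives
    history.append((w, c))
    return history

def voted_predict(history, x):
    """sign( sum_k c_k * sign(w_k . x) )."""
    return sign(sum(c * sign(dot(w, x)) for w, c in history))

data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-3.0, -1.0), -1)]
history = voted_perceptron_train(data)
```

Storing every vector is what makes the method memory-hungry; averaging the vectors instead is a common cheaper approximation.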

18 Previous Works Voted Perceptron Generalization bound

19 Previous Works Perceptron with Margin [Krauth ’87, Li ’2002]
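As an illustration only, here is a common constant-margin variant: it also updates on points that are correctly classified but with functional margin at most tau. The cited papers refine this idea (for example, uneven margins per class in Li et al.), so this is not their exact algorithm; `tau` and the toy data are assumptions.

```python
def margin_perceptron(data, tau=1.0, epochs=100):
    """Perceptron with a constant margin tau (tau = 0 gives the plain perceptron).

    Updates w <- w + y x whenever y * w.x <= tau. Returns the final weights.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        updated = False
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= tau:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:  # every point now has margin above tau
            break
    return w

data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-3.0, -1.0), -1)]
w = margin_perceptron(data, tau=5.0)
```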

20 Previous Works Ballseptron [Shalev-Shwartz ’2005]

21 Previous Works Perceptron with Unlearning [Panagiotakopoulos ’2010]

22 Theoretical Perspective
Prediction rule; learning. Example: the Lithuanian dataset.

23 Theoretical Perspective
Algorithm online version
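The slides do not spell out the online algorithm, so the following is only a sketch under an assumption: the QDF is taken to be g(x) = x^T A x + b^T x + c, and the update is the ordinary perceptron step in the lifted feature space (x x^T, x, 1), i.e. A <- A + y x x^T, b <- b + y x, c <- c + y on each mistake. The author's exact update may differ, and the ring-shaped toy data are hypothetical.

```python
def quad_score(A, b, c, x):
    """g(x) = x^T A x + b^T x + c."""
    d = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(d) for j in range(d))
    return quad + sum(bi * xi for bi, xi in zip(b, x)) + c

def quadratic_perceptron(data, max_epochs=1000):
    """Online perceptron on (x, y), y in {-1, +1}, with a quadratic discriminant."""
    d = len(data[0][0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    c = 0.0
    for _ in range(max_epochs):
        clean = True
        for x, y in data:
            if y * quad_score(A, b, c, x) <= 0:
                for i in range(d):
                    for j in range(d):
                        A[i][j] += y * x[i] * x[j]  # A <- A + y x x^T (stays symmetric)
                b = [bi + y * xi for bi, xi in zip(b, x)]
                c += y
                clean = False
        if clean:
            break
    return A, b, c

# Points inside a circle (+1) vs outside (-1): not linearly separable, but quadratic separable.
data = [((0.0, 0.0), 1), ((0.5, 0.0), 1), ((0.0, 0.5), 1), ((2.0, 0.0), -1),
        ((0.0, 2.0), -1), ((-2.0, 0.0), -1), ((0.0, -2.0), -1), ((1.5, 1.5), -1)]
A, b, c = quadratic_perceptron(data)
```

Because the update is a linear perceptron step on lifted features, the linear convergence argument applies unchanged in the lifted space.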

24 Theoretical Perspective
Convergence Theorem of Quadratic Perceptron (quadratic separable)
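One way to see why a Novikoff-style bound carries over is sketched below. It assumes the QDF form $g(\mathbf{x}) = \mathbf{x}^\top A\mathbf{x} + \mathbf{b}^\top\mathbf{x} + c$ and takes "quadratic separable" to mean that some parameter vector $\theta^*$ with $\|\theta^*\| = 1$ satisfies $y_i\,\theta^{*\top}\phi(\mathbf{x}_i) \ge \gamma$ for all $i$; the theorem on the slide may be stated with different constants.

```latex
% Lifted features: the QDF is linear in theta
\phi(\mathbf{x}) = \big(\operatorname{vec}(\mathbf{x}\mathbf{x}^\top),\ \mathbf{x},\ 1\big),
\qquad \theta = \big(\operatorname{vec}(A),\ \mathbf{b},\ c\big),
\qquad g(\mathbf{x}) = \theta^\top \phi(\mathbf{x})

% Radius in the lifted space, using \|\mathbf{x}\mathbf{x}^\top\|_F = \|\mathbf{x}\|^2
\|\phi(\mathbf{x}_i)\|^2 = \|\mathbf{x}_i\|^4 + \|\mathbf{x}_i\|^2 + 1 \le R^4 + R^2 + 1,
\qquad R = \max_i \|\mathbf{x}_i\|

% Novikoff's argument applied to phi
\#\text{updates} \le \frac{R^4 + R^2 + 1}{\gamma^2}
```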

25 Theoretical Perspective
Convergence Theorem of Quadratic Perceptron with Margin (quadratic separable)

26 Theoretical Perspective
Bounds for quadratic inseparable case

27 Theoretical Perspective
Generalization Bound

28 Theoretical Perspective
Nonnegative-definite constraints: projection to the valid space; restriction on updating; convergence holds.
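One standard way to realize "projection to the valid space" is the Frobenius-norm projection of a symmetric matrix onto the nonnegative-definite cone: eigendecompose and clip negative eigenvalues to zero. Whether the author projects this way is an assumption. The sketch stays dependency-free by using cyclic Jacobi rotations for the eigendecomposition, which is fine for small matrices only.

```python
import math

def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) with eigenvectors stored as the columns of V.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def project_psd(a):
    """Nearest nonnegative-definite matrix in Frobenius norm: clip eigenvalues at 0."""
    lams, v = jacobi_eigh(a)
    n = len(a)
    return [[sum(max(lam, 0.0) * v[i][k] * v[j][k] for k, lam in enumerate(lams))
             for j in range(n)] for i in range(n)]
```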

29 Theoretical Perspective
Toy problem: Lithuanian dataset, 4,000 training instances and 2,000 test instances

30 Theoretical Perspective
Perceptron learning (toy problem)

31 Theoretical Perspective
Extension to Multi-class QDF

32 Theoretical Perspective
Extension to Multi-class QDF: the theoretical properties hold as for the binary QDF; the proof can be completed using Kesler’s construction.
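Kesler's construction turns one c-class example into c-1 binary examples in a (c*d)-dimensional space, so a multi-class perceptron mistake becomes a binary perceptron mistake and the binary theory transfers. The sketch below uses plain linear scores per class for brevity; with a QDF one would apply it to lifted quadratic features. Function names and the toy data are illustrative.

```python
def kesler_vectors(x, y, num_classes):
    """One (c*d)-dim vector per wrong class j: block y = +x, block j = -x, rest 0.

    For stacked weights W, W . z = w_y . x - w_j . x, so the multi-class
    constraint "class y beats class j" becomes a binary constraint W . z > 0.
    """
    d = len(x)
    out = []
    for j in range(num_classes):
        if j == y:
            continue
        z = [0.0] * (num_classes * d)
        z[y * d:(y + 1) * d] = list(x)
        z[j * d:(j + 1) * d] = [-xi for xi in x]
        out.append(z)
    return out

def multiclass_perceptron(data, num_classes, max_epochs=100):
    """On a mistake, w_y += x and w_rival -= x: a binary step on a Kesler vector."""
    d = len(data[0][0])
    W = [[0.0] * d for _ in range(num_classes)]
    for _ in range(max_epochs):
        clean = True
        for x, y in data:
            scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
            rival = max((k for k in range(num_classes) if k != y), key=lambda k: scores[k])
            if scores[y] <= scores[rival]:
                W[y] = [wi + xi for wi, xi in zip(W[y], x)]
                W[rival] = [wi - xi for wi, xi in zip(W[rival], x)]
                clean = False
        if clean:
            break
    return W

def predict(W, x):
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return max(range(len(W)), key=lambda k: scores[k])

# Three classes; the last feature is a constant 1 acting as a bias.
data = [((2, 0, 1), 0), ((3, 1, 1), 0), ((0, 2, 1), 1),
        ((1, 3, 1), 1), ((-2, -2, 1), 2), ((-3, -1, 1), 2)]
W = multiclass_perceptron(data, num_classes=3)
```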

33 Practical Perspective
Perceptron batch loss, minimized with SGD

34 Practical Perspective
Constant margin Dynamic margin

35 Practical Perspective
Experiments Benchmark on digit databases

36 Practical Perspective
Experiments Benchmark on digit databases grg on MNIST

37 Practical Perspective
Experiments Benchmark on digit databases grg on USPS

38 Practical Perspective
Experiments Effects of training size (grg on MNIST)

39 Practical Perspective
Experiments Benchmark on CASIA-HWDB1.1

40 Practical Perspective
Experiments Benchmark on CASIA-HWDB1.1

41 Open Issues Convergence on GMM/MQDF?
The error reduction on CASIA-HWDB1.1 is small: would adding more data help? Can label permutation help? Speed up the training process. Evaluate on more datasets.

42 4 Conclusions

43 Conclusions
Theoretical foundation for QDF: convergence theorem; generalization bound. Perceptron learning of MQDF: a margin is needed for good generalization; more data may help.

44 Thank you!

45 References
[Chen et al. ’2010] Xia Chen, Tong-Hua Su, Tian-Wen Zhang. Discriminative Training of MQDF Classifier on Synthetic Chinese String Samples. CCPR, 2010.
[Cheng et al. ’2009] C. Cheng, F. Sha, L. Saul. Matrix updates for perceptron training of continuous density hidden Markov models. ICML, 2009.
[Kimura ’87] F. Kimura, K. Takashina, S. Tsuruoka, Y. Miyake. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE TPAMI, 9(1), 1987.
[Panagiotakopoulos ’2010] C. Panagiotakopoulos, P. Tsampouka. The Margin Perceptron with Unlearning. ICML, 2010.
[Krauth ’87] W. Krauth, M. Mezard. Learning algorithms with optimal stability in neural networks. Journal of Physics A, 20, 1987.
[Li ’2002] Yaoyong Li, Hugo Zaragoza, Ralf Herbrich, John Shawe-Taylor, Jaz Kandola. The Perceptron Algorithm with Uneven Margins. ICML, 2002.

46 References
[Freund ’99] Y. Freund, R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 1999.
[Shalev-Shwartz ’2005] Shai Shalev-Shwartz, Yoram Singer. A New Perspective on an Old Perceptron Algorithm. COLT, 2005.
[Novikoff ’62] A. B. J. Novikoff. On convergence proofs on perceptrons. In Proc. Symp. Math. Theory Automata, Vol. 12, pp. 615–622, 1962.
[Rosenblatt ’58] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.
[Block ’62] H. D. Block. The perceptron: A model for brain functioning. Reviews of Modern Physics, 34, 1962.

