Published by Cameron Dawson. Modified over 9 years ago.
1 Statistical Mechanics of Online Learning for Ensemble Teachers
Seiji Miyoshi (Kobe City College of Tech.), Masato Okada (Univ. of Tokyo, RIKEN BSI)
2 SUMMARY
We analyze the generalization performance of a student in a model composed of linear perceptrons: a true teacher, K teachers, and the student. Calculating the generalization error of the student analytically using statistical mechanics in the framework of on-line learning, we prove that when the learning rate satisfies η < 1, the larger the number K is and the richer the variety of the K teachers is, the smaller the generalization error is. When η > 1, the properties are completely reversed. If the variety of the K teachers is rich enough, the direction cosine between the true teacher and the student becomes unity in the limit of η→0 and K→∞.
3 BACKGROUND (1/2)
Batch learning
– given examples are used more than once
– the student comes to give correct answers for all examples
– a long time and a large memory are required
On-line learning
– examples once used are discarded
– the student cannot give correct answers for all examples used in training
– a large memory is not necessary
– it is possible to follow a time-variant teacher
4 BACKGROUND (2/2)
In most cases in actual human society, a student can observe examples from two or more teachers who differ from each other.
PURPOSE
– To analyze the generalization performance of a model composed of a student, a true teacher, and K teachers (ensemble teachers) who exist around the true teacher
– To discuss the relationship between the number and variety of the ensemble teachers and the generalization error
5 MODEL (1/4)
[Figure: the true teacher A, the ensemble teachers B_1, B_2, ..., and the student J]
– The student J learns B_1, B_2, ... in turn.
– J cannot learn A directly.
– A, B_1, B_2, ..., and J are linear perceptrons with noise.
6 MODEL (2/4)
– Output of true teacher: linear perceptron with Gaussian noise
– Outputs of ensemble teachers: linear perceptrons with Gaussian noises
– Output of student: linear perceptron with Gaussian noise
7 MODEL (3/4)
[Equations: inputs, initial value of student, true teacher, ensemble teachers]
N→∞ (thermodynamic limit)
Order parameters
– Length of student
– Direction cosines
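The two kinds of order parameter named on this slide can be computed directly from weight vectors. The following is a minimal sketch: the helper names are hypothetical, and the normalization l = |J| / sqrt(N) is an assumption, since the slide's own definitions were not preserved.

```python
import math

def direction_cosine(a, b):
    """Cosine of the angle between two weight vectors a and b."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

def student_length(J):
    """Normalized length l = |J| / sqrt(N) (assumed normalization)."""
    return math.sqrt(sum(j * j for j in J) / len(J))
```

With these helpers, R would be `direction_cosine(A, J)`, R_B would be `direction_cosine(A, B_k)`, and q would be the direction cosine between two different ensemble teachers.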
8 MODEL (4/4)
– Update function f_k^m: gradient method on the squared errors
– The student learns the K ensemble teachers in turn.
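The learning scheme on this slide can be tried numerically. The sketch below rests on several assumptions that the transcript does not confirm: the update is J ← J + η (v_k − u) x, where v_k and u are the noisy outputs of teacher k and of the student; inputs are Gaussian with variance 1/N per component; the student starts from a random vector of length about sqrt(N); and each ensemble teacher is built as R_B·A plus an independent random component orthogonal to A, which gives q ≈ R_B² for large N. The function name `simulate` and its parameters are illustrative.

```python
import numpy as np

def simulate(N=1000, K=3, eta=0.3, R_B=0.7, var_B=0.1, var_J=0.2,
             t_max=20.0, seed=0):
    """On-line learning of a student J from K ensemble teachers used in turn.

    Returns (R, l): the direction cosine between the true teacher A and J,
    and the normalized student length |J| / sqrt(N).
    """
    rng = np.random.default_rng(seed)

    # True teacher with |A| = sqrt(N).
    A = rng.standard_normal(N)
    A *= np.sqrt(N) / np.linalg.norm(A)

    # Ensemble teachers: direction cosine R_B with A, |B_k| = sqrt(N) (assumed construction).
    B = []
    for _ in range(K):
        g = rng.standard_normal(N)
        g -= (g @ A) / N * A                  # remove the component along A (|A|^2 = N)
        g *= np.sqrt(N) / np.linalg.norm(g)
        B.append(R_B * A + np.sqrt(1.0 - R_B ** 2) * g)
    B = np.array(B)

    J = rng.standard_normal(N)                # assumed initial student
    for m in range(int(t_max * N)):           # time t = m / N
        k = m % K                             # teachers are used in turn
        x = rng.standard_normal(N) / np.sqrt(N)
        v = B[k] @ x + np.sqrt(var_B) * rng.standard_normal()   # teacher output
        u = J @ x + np.sqrt(var_J) * rng.standard_normal()      # student output
        J += eta * (v - u) * x                # gradient step on the squared error

    l = np.linalg.norm(J) / np.sqrt(N)
    R = (A @ J) / (np.linalg.norm(A) * np.linalg.norm(J))
    return R, l
```

Calling `simulate()` with the default values corresponds to the parameter set η = 0.3, K = 3, R_B = 0.7, σ_B² = 0.1, σ_J² = 0.2. Under the assumptions above, the returned R should end up above R_B, but the exact value depends on the seed and on N.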
9 GENERALIZATION ERROR
A goal of statistical learning theory is to obtain the generalization error theoretically.
Generalization error = mean of the error over the distribution of new inputs
[Equations: the error, and its average over a multivariate Gaussian distribution]
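As a sketch of what this average looks like in closed form, assume the error is half the squared difference between the noisy outputs of the true teacher and the student, inputs have independent components of variance 1/N, and |A| = sqrt(N). None of these conventions is preserved in the transcript, so both function names and the factor 1/2 are assumptions. The Gaussian average then depends only on R, l, and the noise variances.

```python
def generalization_error(R, l, var_A, var_J):
    """eps_g = (1 - 2*l*R + l^2 + var_A + var_J) / 2 under the assumed conventions."""
    return 0.5 * (1.0 - 2.0 * l * R + l * l + var_A + var_J)

def generalization_error_from_vectors(A, J, var_A, var_J):
    """Same average written with the weight vectors: (|A - J|^2 / N + var_A + var_J) / 2."""
    N = len(A)
    dist2 = sum((a - j) ** 2 for a, j in zip(A, J))
    return 0.5 * (dist2 / N + var_A + var_J)
```

The two forms agree because |A − J|² / N = 1 − 2 l R + l² when |A| = sqrt(N) and |J| = l·sqrt(N).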
10 Differential equations that describe the dynamical behaviors of the order parameters have been obtained, based on self-averaging in the thermodynamic limit, as follows:
Both sides of the update rule $J^{m+1} = J^m + f_k^m x^m$ are multiplied by A, and the result is written for $N\,dt$ successive inputs:
$N r_J^{m+1} = N r_J^m + f_k^m y^m$
$N r_J^{m+2} = N r_J^{m+1} + f_k^{m+1} y^{m+1}$
$\vdots$
$N r_J^{m+N dt} = N r_J^{m+N dt-1} + f_k^{m+N dt-1} y^{m+N dt-1}$
These $N\,dt$ equations are then added together.
1. To simplify the analysis, the following auxiliary order parameters are introduced:
[Equations for items 1 to 3]
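The summation step can be written out as follows. This is a sketch under assumptions not stated in the transcript: continuous time is t = m/N, the teacher index k cycles through 1, ..., K as the inputs arrive, and ⟨·⟩ denotes the average over inputs and noise that self-averaging substitutes for the sum.

```latex
\begin{align}
N r_J^{m+N dt} &= N r_J^{m} + \sum_{i=0}^{N dt-1} f_k^{m+i}\, y^{m+i} \\
N (r_J + dr_J) &= N r_J + N\,dt\, \frac{1}{K}\sum_{k=1}^{K} \langle f_k\, y \rangle \\
\frac{dr_J}{dt} &= \frac{1}{K}\sum_{k=1}^{K} \langle f_k\, y \rangle
\end{align}
```

The intermediate terms cancel in the first line because each left-hand side reappears on the next right-hand side; the second line replaces the sum of N dt terms by N dt times its mean.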
11 Simultaneous differential equations in deterministic forms, which describe dynamical behaviors of order parameters
12 Analytical solutions of order parameters
13 Dynamical behaviors of the generalization error, R, and l (η = 0.3, K = 3, R_B = 0.7, σ_A^2 = 0.0, σ_B^2 = 0.1, σ_J^2 = 0.2)
[Figure: curves for the student and the ensemble teachers]
– The student becomes cleverer than a member of the ensemble teachers.
– The larger the variety of the ensemble teachers is, the nearer the student is to the true teacher.
14 Steady state analysis (t → ∞)
・ If η < 0 or η > 2: the generalization error and the length of the student diverge.
・ If 0 < η < 2:
– If η < 1, the more teachers exist or the richer the variety of teachers is, the cleverer the student can become.
– If η > 1, the fewer teachers exist or the poorer the variety of teachers is, the cleverer the student can become.
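The reversal at η = 1 can be checked with a small calculation. The slide's own steady-state expressions were not preserved, so the formulas below are a reconstruction, not the authors' stated result. They assume the update J ← J + η (v_k − u) x with noisy outputs, inputs of variance 1/N per component, |A| = |B_k| = sqrt(N), and an error of half the squared output difference. Under those assumptions the student length satisfies l² = [η(1 + σ_B² + σ_J²) + 2(1 − η)(q + (1 − q)/K)] / (2 − η) and R = R_B / l. The function name `steady_state` is hypothetical.

```python
import math

def steady_state(eta, K, q, R_B, var_A, var_B, var_J):
    """Steady-state (eps_g, R, l) under the assumptions stated in the lead-in."""
    if not 0.0 < eta < 2.0:
        raise ValueError("no steady state: the student diverges unless 0 < eta < 2")
    q_bar = q + (1.0 - q) / K      # normalized squared length of the mean teacher
    l2 = (eta * (1.0 + var_B + var_J) + 2.0 * (1.0 - eta) * q_bar) / (2.0 - eta)
    l = math.sqrt(l2)
    R = R_B / l                    # overlap with A relaxes to R_B in this sketch
    eps_g = 0.5 * (1.0 - 2.0 * R_B + l2 + var_A + var_J)
    return eps_g, R, l
```

The coefficient of q + (1 − q)/K is 2(1 − η)/(2 − η), which changes sign at η = 1. That is why more teachers (larger K) or richer variety (smaller q) lower the error for η < 1 and raise it for η > 1. In the limit η → 0 and K → ∞ with q = R_B², the sketch gives R → 1, in line with the summary.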
15 Steady value of the generalization error, R, and l (K = 3, R_B = 0.7, σ_A^2 = 0.0, σ_B^2 = 0.1, σ_J^2 = 0.2)
[Figure annotations: "Rich variety is good!" and "Poor variety is good!"]
16 Steady value of the generalization error, R, and l (q = 0.49, R_B = 0.7, σ_A^2 = 0.0, σ_B^2 = 0.1, σ_J^2 = 0.2)
[Figure annotations: "Many teachers are good!" and "Few teachers are good!"]