1 A Statistical Mechanical Analysis of Online Learning: Many Teachers or Few Teachers? Seiji MIYOSHI, Kobe City College of Technology

2 Background (1)
Batch Learning:
- Examples are used repeatedly.
- Correct answers are given for all examples.
- Requires a long time and a large memory.

Online Learning:
- Each example is used once and then discarded.
- Correct answers cannot be given for all examples.
- A large memory is not necessary.
- Can follow a time-variant teacher.

3 A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher? Seiji MIYOSHI, Kobe City College of Technology (Jan. 2006)

4 [Diagram from the Jan. 2006 talk: a student learning from a moving teacher that moves around the true teacher A.]

5 A Statistical Mechanical Analysis of Online Learning: Many Teachers or Few Teachers? Seiji MIYOSHI, Kobe City College of Technology

6 [Diagram: the student and the ensemble teachers positioned around the true teacher.]

7 PURPOSE
- To analyze the generalization performance of a model composed of a student, a true teacher, and K teachers (ensemble teachers) that exist around the true teacher.
- To discuss the relationship between the number and diversity of the ensemble teachers and the generalization error.

8 MODEL (1/4): A true teacher A, ensemble teachers B_1, B_2, ..., B_K, and a student J. The student J learns from B_1, B_2, ... in turn; J cannot learn A directly. A, B_1, B_2, ..., B_K, and J are all linear perceptrons with noise.

9 Simple Perceptron [Diagram: inputs, connection weights, and a binary output of +1 or -1.]

10 Simple Perceptron vs. Linear Perceptron [Diagram: the same inputs and connection weights; the linear perceptron outputs the weighted sum itself rather than its sign.]

11 MODEL (2/4): Linear perceptrons with noise. [Equations omitted from the transcript: each output is the weighted sum of the inputs plus independent Gaussian noise, with variances σ_A², σ_B², and σ_J² for the true teacher, the ensemble teachers, and the student, respectively.]

12 MODEL (3/4): The inputs, the initial value of the student, the true teacher, and the ensemble teachers are drawn at random (equations omitted from the transcript); N → ∞ (thermodynamic limit). Order parameters: the length l of the student and the direction cosines (R_J between the true teacher and the student, R_B between the true teacher and each ensemble teacher, q between different ensemble teachers).
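This setup can be made concrete with a small simulation at finite N. The following Python sketch is not from the slides; in particular, constructing the ensemble teachers with a common direction cosine R_B to the true teacher and mutually independent orthogonal parts is an assumption, chosen because it reproduces the mutual overlap q = R_B² = 0.49 quoted on slide 23.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                       # finite-N stand-in for the thermodynamic limit

def random_on_sphere(n):
    """Random vector scaled to length sqrt(n), matching |A| = |J| = sqrt(N)."""
    v = rng.standard_normal(n)
    return v * np.sqrt(n) / np.linalg.norm(v)

A = random_on_sphere(N)        # true teacher

def make_teachers(K, R_B):
    """K ensemble teachers with direction cosine R_B to A; independent
    orthogonal parts give mutual overlap q = R_B**2 between teachers."""
    teachers = []
    for _ in range(K):
        e = rng.standard_normal(N)
        e -= (e @ A) / (A @ A) * A                     # part orthogonal to A
        e *= np.sqrt(N * (1 - R_B**2)) / np.linalg.norm(e)
        teachers.append(R_B * A + e)
    return teachers

B = make_teachers(K=3, R_B=0.7)      # q = 0.7**2 = 0.49, as on the slides
J = random_on_sphere(N)              # initial student, length l = 1

# Order parameters at t = 0:
l = np.linalg.norm(J) / np.sqrt(N)
R_J = (A @ J) / (np.linalg.norm(A) * np.linalg.norm(J))
q = (B[0] @ B[1]) / (np.linalg.norm(B[0]) * np.linalg.norm(B[1]))
print(f"l = {l:.2f}, R_J = {R_J:.2f}, q = {q:.2f}")   # about 1, 0, 0.49
```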

13 [The model diagram again: the student and the ensemble teachers around the true teacher.]

14 MODEL (4/4): Gradient method on the squared error. The student learns from the K ensemble teachers in turn; at each step m, one teacher B_k supplies the target output, and J is moved along the input in proportion to the output difference f_k^m.
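A minimal sketch of this learning step, continuing the code above. The update is the gradient step on the squared error (1/2)(y_B - y_J)² with respect to J; that the noise enters both the teacher's and the student's outputs during the update is an assumption, made to match the σ_B² and σ_J² parameters quoted on the later slides.

```python
def learn(J, teachers, eta, steps, sigma_B2=0.1, sigma_J2=0.2, rng=rng):
    """Online gradient descent on the squared error between the noisy
    teacher output and the noisy student output; the K teachers are used
    in turn, one per example, and each example is then discarded."""
    N, K = len(J), len(teachers)
    for m in range(steps):
        x = rng.standard_normal(N) / np.sqrt(N)        # fresh input, |x| ~ 1
        k = m % K                                      # teachers taken in turn
        y_B = teachers[k] @ x + rng.normal(0.0, np.sqrt(sigma_B2))
        y_J = J @ x + rng.normal(0.0, np.sqrt(sigma_J2))
        J = J + eta * (y_B - y_J) * x    # gradient step on (1/2)(y_B - y_J)**2
    return J

J = learn(J, B, eta=0.3, steps=20 * N)   # eta and noise values from slide 19
R_J = (A @ J) / (np.linalg.norm(A) * np.linalg.norm(J))
print(f"after learning: R_J = {R_J:.3f}, l = {np.linalg.norm(J)/np.sqrt(N):.3f}")
```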

15 GENERALIZATION ERROR: A goal of statistical learning theory is to obtain the generalization error theoretically. Generalization error = the mean of the error over the distribution of new inputs.
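This definition can be checked by Monte Carlo, again continuing the sketch above. Taking the error to be half the squared difference between the noisy outputs of the true teacher and the student is an assumption, consistent with the squared-error training criterion but not spelled out in the transcript.

```python
def gen_error(J, A, sigma_A2=0.0, sigma_J2=0.2, samples=5_000, rng=rng):
    """Monte Carlo estimate of the generalization error: the mean of the
    (half) squared error over fresh inputs drawn from the input distribution."""
    N = len(J)
    x = rng.standard_normal((samples, N)) / np.sqrt(N)
    y_A = x @ A + rng.normal(0.0, np.sqrt(sigma_A2), samples)
    y_J = x @ J + rng.normal(0.0, np.sqrt(sigma_J2), samples)
    return 0.5 * np.mean((y_A - y_J) ** 2)

print(f"generalization error ~ {gen_error(J, A):.3f}")
```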

16 Simultaneous differential equations in deterministic form that describe the dynamical behavior of the order parameters. In the thermodynamic limit N → ∞ the order parameters self-average, so their evolution in the rescaled time t = m/N becomes deterministic. [Equations omitted from the transcript.]
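Although the equations themselves are missing from the transcript, their deterministic character can be seen empirically: at large N, the trajectories of the order parameters plotted against t = m/N barely fluctuate between runs. A sketch that records them during learning, reusing the functions above:

```python
def learn_history(J, A, teachers, eta, steps, sigma_B2=0.1, sigma_J2=0.2, rng=rng):
    """As learn(), but records (t, R_J, l) once per N updates, where
    t = m / N is the rescaled time in which the ODEs are written."""
    N = len(J)
    hist = []
    for m in range(steps):
        if m % N == 0:
            R_J = (A @ J) / (np.linalg.norm(A) * np.linalg.norm(J))
            hist.append((m / N, R_J, np.linalg.norm(J) / np.sqrt(N)))
        x = rng.standard_normal(N) / np.sqrt(N)
        y_B = teachers[m % len(teachers)] @ x + rng.normal(0.0, np.sqrt(sigma_B2))
        y_J = J @ x + rng.normal(0.0, np.sqrt(sigma_J2))
        J = J + eta * (y_B - y_J) * x
    return J, hist

_, hist = learn_history(random_on_sphere(N), A, B, eta=0.3, steps=10 * N)
for t, R_J, l in hist:
    print(f"t = {t:4.0f}: R_J = {R_J:.3f}, l = {l:.3f}")
```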

17 Analytical solutions of the order parameters. [Equations omitted from the transcript.]

18 GENERALIZATION ERROR: A goal of statistical learning theory is to obtain the generalization error theoretically. Generalization error = the mean of the error over the distribution of new inputs.

19 [Plots: dynamical behavior of the generalization error, R_J, and l for the student and the ensemble teachers (η = 0.3, K = 3, R_B = 0.7, σ_A² = 0.0, σ_B² = 0.1, σ_J² = 0.2).]

20 Analytical solutions of the order parameters. [Equations omitted from the transcript.]

21 Steady-state analysis (t → ∞)
- If η < 0 or η > 2: the generalization error and the length of the student diverge.
- If 0 < η < 2: they converge to finite steady values.
  - If η < 1, the more teachers there are, or the richer their diversity, the cleverer the student can become.
  - If η > 1, the fewer teachers there are, or the poorer their diversity, the cleverer the student can become.
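This reversal around η = 1 can be probed with the sketches above by varying the number of teachers K on both sides of η = 1 at fixed diversity. Under the claim, R_J should grow with K for η < 1 and shrink with K for η > 1; since this is a finite-N simulation, the trend is approximate.

```python
# Vary the number of teachers K on both sides of eta = 1 (fixed R_B).
for eta in (0.3, 1.5):                 # eta < 1 vs. eta > 1, both inside (0, 2)
    for K in (1, 10):
        Js = learn(random_on_sphere(N), make_teachers(K, R_B=0.7),
                   eta=eta, steps=100 * N)
        R_J = (A @ Js) / (np.linalg.norm(A) * np.linalg.norm(Js))
        print(f"eta = {eta}, K = {K}: R_J = {R_J:.3f}")
```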

22 [Plots: steady values of the generalization error, R_J, and l (K = 3, R_B = 0.7, σ_A² = 0.0, σ_B² = 0.1, σ_J² = 0.2).]

23 [Plots: steady values of the generalization error, R_J, and l (q = 0.49, R_B = 0.7, σ_A² = 0.0, σ_B² = 0.1, σ_J² = 0.2).]

24 CONCLUSIONS: We have analyzed the generalization performance of a student in a model composed of linear perceptrons: a true teacher, K ensemble teachers, and the student. Calculating the generalization error of the student analytically, using statistical mechanics in the framework of on-line learning, we have shown that when the learning rate satisfies η < 1, the more teachers there are or the richer their diversity is, the cleverer the student can become; when η > 1, the properties are completely reversed. If the diversity of the K teachers is rich enough, the direction cosine between the true teacher and the student approaches unity in the limit η → 0 and K → ∞.