A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher?
Seiji MIYOSHI, Kobe City College of Technology

2 Background (1)
Batch Learning
– Examples are used repeatedly
– Correct answers can be given for all examples
– Learning takes a long time
– Requires a large memory
Online Learning
– Each example is used once and then discarded
– Correct answers cannot be guaranteed for all examples
– A large memory isn't necessary
– Can follow a time-variant teacher

3 Background (2)
[Figure: Teacher and Student networks]

4 Simple Perceptron
[Figure: inputs, connection weights, and a ±1 output of a simple perceptron]
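For reference, the input–output relation sketched on this slide is the standard one (the symbol J for the weight vector is my notation, not from the slide):

$$ s = \mathrm{sgn}\!\left(\sum_{i=1}^{N} J_i x_i\right) = \mathrm{sgn}(J \cdot x) \in \{+1, -1\} $$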

5 Background (2): Learnable Case
[Figure: Teacher and Student networks]

6 Background (3): Unlearnable Case
[Figure: Teacher and Student networks]
(Inoue & Nishimori, Phys. Rev. E, 1997)
(Inoue, Nishimori & Kabashima, TANC-97, cond-mat/, 1997)

7 Background (4)
– Hebbian Learning
– Perceptron Learning
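For reference, the two classical rules named here are usually written as follows for a student J learning from a teacher's binary output $s_B$ on input $x$ (learning rate $\eta$; this notation is mine):

Hebbian learning: $J^{m+1} = J^m + \eta\, s_B^m\, x^m$

Perceptron learning: $J^{m+1} = J^m + \eta\, s_B^m\, x^m\, \Theta(-s_B^m u^m)$, where $u^m = J^m \cdot x^m$ and $\Theta$ is the step function, i.e. the weights are updated only when the student misclassifies the example.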

8 Model (1)
[Figure: True Teacher A, Moving Teacher, Student]

9 Model (2)
[Figure: A, B, J; the lengths of the moving teacher and of the student]
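The figure is lost, but in this family of analyses the geometry of A, B and J is summarized by lengths and overlaps. A plausible reconstruction of the definitions behind "length of moving teacher" and "length of student" (the normalization by $\sqrt{N}$ is my assumption and should be checked against the original paper):

$$ l_B \equiv \frac{|B|}{\sqrt{N}}, \qquad l_J \equiv \frac{|J|}{\sqrt{N}}, \qquad R \equiv \frac{A \cdot B}{|A|\,|B|} $$

with analogous direction cosines for the other pairs of vectors.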

10 Model (3)
[Figure: A, B, J]

11 Simple Perceptron / Linear Perceptron
[Figure: inputs, connection weights, and output for a simple perceptron and a linear perceptron]
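The distinction drawn on this slide, written out (standard definitions):

Simple perceptron: $s = \mathrm{sgn}(J \cdot x)$, a binary ±1 output.

Linear perceptron: $s = J \cdot x$, the weighted sum itself is the output.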

12 Model (3): Linear Perceptrons with Noise
[Figure: A, B, J]
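A plausible reading of "linear perceptrons with noise", to be checked against the paper (the additive Gaussian form and the symbols $n_A, n_B, n_J$ are my assumptions): each machine outputs its weighted sum plus independent noise,

$$ y_A = A \cdot x + n_A, \qquad y_B = B \cdot x + n_B, \qquad y_J = J \cdot x + n_J, \qquad n_k \sim \mathcal{N}(0, \sigma_k^2). $$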

13 Model (4): Squared Errors, Gradient Method
[Figure: update functions f and g; A, B, J]
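Reading f and g as the update amounts of the student and the moving teacher under gradient descent on the squared errors $(y_A - y_B)^2/2$ and $(y_B - y_J)^2/2$ (the pairing of f with J and g with B is my assumption, though the B-update below matches slide 17):

$$ g^m = \eta_B\,(y_A^m - y_B^m), \qquad B^{m+1} = B^m + g^m x^m $$
$$ f^m = \eta_J\,(y_B^m - y_J^m), \qquad J^{m+1} = J^m + f^m x^m $$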

14 Generalization Error
[Figure: A, B, J; the error is averaged over the Gaussian input distribution]
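Presumably the generalization errors are the squared errors measured against the true teacher, averaged over the Gaussian inputs and the noise (this definition is my assumption, consistent with the conclusions slide):

$$ \epsilon_B \equiv \left\langle \tfrac{1}{2}\,(y_A - y_B)^2 \right\rangle, \qquad \epsilon_J \equiv \left\langle \tfrac{1}{2}\,(y_A - y_J)^2 \right\rangle $$

Because the local fields $A\cdot x$, $B\cdot x$, $J\cdot x$ are jointly Gaussian, both averages reduce to functions of the order parameters.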

15 Differential equations for order parameters

16 Model (4): Squared Errors, Gradient Method
[Figure: update functions f and g; A, B, J]

17 Derivation of the differential equation for $r_B$:

$$ B^{m+1} = B^m + g^m x^m $$

Taking the inner product with $A$ on both sides (writing $N r_B \equiv A \cdot B$ and $y \equiv A \cdot x$):

$$ N r_B^{m+1} = N r_B^m + g^m y^m $$
$$ N r_B^{m+2} = N r_B^{m+1} + g^{m+1} y^{m+1} $$
$$ \vdots $$
$$ N r_B^{m+N dt} = N r_B^{m+N dt-1} + g^{m+N dt-1} y^{m+N dt-1} $$

Summing these $N dt$ updates and replacing the sum by its average (self-averaging in the limit $N \to \infty$):

$$ N r_B^{m+N dt} = N r_B^m + N dt\, \langle g y \rangle $$
$$ N (r_B + dr_B) = N r_B + N dt\, \langle g y \rangle $$
$$ \frac{dr_B}{dt} = \langle g y \rangle $$
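The same procedure applied to the squared norm gives the companion equation for the length (writing $N l_B^2 \equiv |B|^2$, $u_B \equiv B \cdot x$, and using $|x|^2 \simeq 1$; this normalization is my assumption):

$$ |B^{m+1}|^2 = |B^m|^2 + 2 g^m u_B^m + (g^m)^2 $$
$$ N (l_B + dl_B)^2 = N l_B^2 + N dt \left( 2 \langle g\, u_B \rangle + \langle g^2 \rangle \right) \;\Longrightarrow\; \frac{dl_B}{dt} = \frac{\langle g\, u_B \rangle}{l_B} + \frac{\langle g^2 \rangle}{2 l_B} $$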

18 Differential equations for order parameters

19 Sample Averages
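As a concrete example of such a sample average: under the assumptions made above ($|A|^2 = N$, input components of variance $1/N$, noise independent of $x$), the field $y \equiv A \cdot x$ satisfies $\langle y^2 \rangle = 1$ and $\langle u_B\, y \rangle = A \cdot B / N = r_B$, so

$$ \langle g\, y \rangle = \eta_B \left\langle (y_A - y_B)\, y \right\rangle = \eta_B \left( \langle y^2 \rangle - \langle u_B\, y \rangle \right) = \eta_B (1 - r_B). $$

All other averages appearing in the differential equations reduce to second moments of the jointly Gaussian local fields in the same way.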

20 Differential equations for order parameters

21 Analytical Solutions of Order Parameters
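As an illustration of the kind of closed-form solution meant here, combining $dr_B/dt = \langle g y \rangle$ with the sample average above (all under the stated assumptions) gives a linear ODE with an exponential solution:

$$ \frac{dr_B}{dt} = \eta_B (1 - r_B) \quad\Longrightarrow\quad r_B(t) = 1 - \left(1 - r_B(0)\right) e^{-\eta_B t}. $$

The full set of order parameters solves similarly, which is what makes the dynamics of the generalization errors analytically accessible.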

22 Differential equations for order parameters

23 Generalization Error
[Figure: A, B, J; the error is averaged over the Gaussian input distribution]

24 Dynamical Behaviors of Generalization Errors
[Plots: generalization errors vs. time for η_J = 1.2 and η_J = 0.3]
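The plotted curves themselves are not recoverable from the transcript, but a minimal Monte Carlo sketch of the model as reconstructed above shows how such trajectories can be generated and compared with the theory. The noise level sigma, the learning rate eta_B, the zero initializations, and the measurement interval are illustrative choices of mine, not values from the talk:

```python
# Monte Carlo sketch of the assumed model: a fixed true teacher A, a moving
# teacher B that learns from A, and a student J that learns only from B.
# All three are noisy linear perceptrons trained by gradient descent on the
# squared error.
import numpy as np

rng = np.random.default_rng(0)
N = 2000                    # input dimension (large, to mimic N -> infinity)
eta_B, eta_J = 0.3, 1.2     # learning rates of moving teacher and student
sigma = 0.3                 # std of the additive output noise (assumed)
steps = 20 * N              # simulated time t = steps / N

A = rng.normal(0.0, 1.0, N)     # true teacher, |A| ~ sqrt(N)
B = np.zeros(N)                  # moving teacher
J = np.zeros(N)                  # student

for m in range(steps):
    x = rng.normal(0.0, 1.0 / np.sqrt(N), N)   # input with |x| ~ 1
    y_A = A @ x + sigma * rng.normal()
    y_B = B @ x + sigma * rng.normal()
    y_J = J @ x + sigma * rng.normal()
    B += eta_B * (y_A - y_B) * x    # gradient step on (y_A - y_B)^2 / 2
    J += eta_J * (y_B - y_J) * x    # student only sees the moving teacher
    if m % N == 0:
        # Weight-dependent part of the generalization errors w.r.t. the true
        # teacher: for these Gaussian inputs <((A - W) . x)^2>/2 = |A - W|^2
        # / (2N); the output noise only adds a constant offset.
        eps_B = np.sum((A - B) ** 2) / (2 * N)
        eps_J = np.sum((A - J) ** 2) / (2 * N)
        print(f"t={m / N:5.1f}  eps_B={eps_B:.4f}  eps_J={eps_J:.4f}")
```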

25 Dynamical Behaviors of R and l
[Plots: the overlap R and the length l vs. time for η_J = 1.2 and η_J = 0.3]

26 Analytical Solutions of Order Parameters

27 Steady State
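For example, under the assumed equation above, the steady state follows by setting the time derivative to zero:

$$ \frac{dr_B}{dt} = \eta_B (1 - r_B) = 0 \quad\Longrightarrow\quad r_B \to 1 \quad (t \to \infty), $$

and the steady-state lengths and generalization errors follow by setting the remaining derivatives to zero in the same way.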

28 [Plot: steady-state behavior as a function of η_J]

29 Conclusions
– The generalization errors of a model composed of a true teacher, a moving teacher, and a student, all of which are linear perceptrons with noise, have been obtained analytically using statistical mechanics.
– The generalization error of the student can be smaller than that of the moving teacher, even though the student uses only examples from the moving teacher.