
1 Analysis of Ensemble Learning using Simple Perceptrons based on Online Learning Theory. Seiji MIYOSHI 1, Kazuyuki HARA 2, Masato OKADA 3,4,5. 1 Kobe City College of Tech., 2 Tokyo Metropolitan College of Tech., 3 University of Tokyo, 4 RIKEN BSI, 5 Intelligent Cooperation and Control, PRESTO, JST

2 ABSTRACT Ensemble learning of K simple perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to theoretically obtain the generalization error. We show that the ensemble generalization error can be calculated by using two order parameters: the similarity between the teacher and a student, and the similarity among students. The differential equations that describe the dynamical behaviors of these order parameters are derived for general learning rules. Their concrete forms are derived analytically for three well-known rules: Hebbian learning, perceptron learning and AdaTron learning. The ensemble generalization errors of the three rules are calculated from the solutions of these differential equations. As a result, the three rules show different characteristics in their affinity for ensemble learning, that is, in "maintaining variety among students". The results show that AdaTron learning is superior to the other two rules with respect to this affinity.

3 BACKGROUND Ensemble learning has recently attracted the attention of many researchers. Ensemble learning means combining many rules or learning machines (hereafter called students) that individually perform poorly. Theoretical studies analyzing the generalization performance by means of statistical mechanics have been performed vigorously. Hara and Okada theoretically analyzed the case in which the students are linear perceptrons. Hebbian learning, perceptron learning and AdaTron learning are well known as learning rules for a nonlinear perceptron, which decides its output by a sign function. Determining the differences among ensemble learning with Hebbian learning, perceptron learning and AdaTron learning is an attractive problem, but one that has never been analyzed. OBJECTIVE We discuss ensemble learning of K simple perceptrons within the framework of online learning, for finite K.

4 MODEL A common input x is presented to the teacher and to all students in the same order. An input x, once used for an update, is then abandoned (online learning). The updates of the students are independent of each other. Two methods are treated for deciding the ensemble output: one is the majority vote (MV) of the students, and the other is the weight mean (WM). [Figure: a teacher and K students (1, 2, ..., K) receiving the common input x; l denotes the length of a student vector.]
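For concreteness, the model can be written out as follows. This is a minimal sketch in the standard statistical-mechanics notation, assuming the usual normalizations (N-dimensional vectors, input components with mean 0 and variance 1/N), which are not spelled out in the transcript:

\[
  o_B = \mathrm{sgn}(v), \quad v = \boldsymbol{B}\cdot\boldsymbol{x}, \qquad
  o_k = \mathrm{sgn}(l_k u_k), \quad l_k u_k = \boldsymbol{J}_k\cdot\boldsymbol{x}, \quad l_k = \lVert\boldsymbol{J}_k\rVert,
\]
\[
  \boldsymbol{J}_k^{m+1} = \boldsymbol{J}_k^{m} + f_k\,\boldsymbol{x}^{m},
\]

where f_k is the update function of the learning rule (Hebbian, perceptron or AdaTron), and v and the u_k become correlated Gaussian variables of unit variance in the large-N limit.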

5 THEORY Generalization error ε_g: the probability that the ensemble output disagrees with that of the teacher for a new input x. Two order parameters: R, the similarity between the teacher and a student, and q, the similarity among students.
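In this notation the two similarities are direction cosines. The following definitions and the single-student error formula are reconstructed from the standard theory (the slide's own equations do not survive in the transcript):

\[
  R_k = \frac{\boldsymbol{B}\cdot\boldsymbol{J}_k}{\lVert\boldsymbol{B}\rVert\,\lVert\boldsymbol{J}_k\rVert}, \qquad
  q_{kk'} = \frac{\boldsymbol{J}_k\cdot\boldsymbol{J}_{k'}}{\lVert\boldsymbol{J}_k\rVert\,\lVert\boldsymbol{J}_{k'}\rVert}.
\]

For a single student (K = 1), \varepsilon_g = \frac{1}{\pi}\arccos R. For the ensemble, \varepsilon_g is the average of the disagreement indicator over the correlated Gaussian fields (v, u_1, ..., u_K), whose covariances are exactly the R_k and q_{kk'}; this is why the two order parameters suffice to determine the ensemble generalization error.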

6 Differential equations describing l and R (known results), and the differential equation describing q (new result). [The equations themselves appear only as images on the slide.]
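A sketch of the generic form these equations take, derived from the update J_k^{m+1} = J_k^m + f_k x^m with continuous time t = m/N in the large-N limit; this reconstruction follows the standard online-learning formalism and may differ from the slide in notation:

\begin{align}
  \frac{dl_k}{dt} &= \langle f_k u_k\rangle + \frac{\langle f_k^2\rangle}{2 l_k}, \\
  \frac{dR_k}{dt} &= \frac{\langle f_k v\rangle - R_k\langle f_k u_k\rangle}{l_k} - \frac{R_k\langle f_k^2\rangle}{2 l_k^2}, \\
  \frac{dq_{kk'}}{dt} &= \frac{\langle f_k u_{k'}\rangle - q_{kk'}\langle f_k u_k\rangle}{l_k}
    + \frac{\langle f_{k'} u_k\rangle - q_{kk'}\langle f_{k'} u_{k'}\rangle}{l_{k'}}
    + \frac{\langle f_k f_{k'}\rangle}{l_k l_{k'}}
    - q_{kk'}\left(\frac{\langle f_k^2\rangle}{2 l_k^2} + \frac{\langle f_{k'}^2\rangle}{2 l_{k'}^2}\right).
\end{align}

The brackets denote averages over the Gaussian fields (v, u_k, u_{k'}); all dependence on the learning rule enters through these averages.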

7 RESULTS Hebbian learning: equations for l and R (known result) and for q (new result). [Equations shown on the slide.]
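For Hebbian learning, f_k = sgn(v), and the averages entering the equations above have elementary closed forms (a reconstruction from the standard theory, not copied from the slide):

\[
  \langle f_k v\rangle = \sqrt{\tfrac{2}{\pi}}, \quad
  \langle f_k u_k\rangle = \sqrt{\tfrac{2}{\pi}}\,R_k, \quad
  \langle f_k^2\rangle = 1, \quad
  \langle f_k f_{k'}\rangle = 1, \quad
  \langle f_k u_{k'}\rangle = \sqrt{\tfrac{2}{\pi}}\,R_{k'}.
\]

Note that f_k = sgn(v) is identical for every student, so the students differ only through their initial conditions, and q inevitably grows toward 1 as the common updates dominate.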

8 Perceptron learning: equations for l and R (known result) and for q (new result). [Equations shown on the slide.]
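For perceptron learning, f_k = sgn(v) Θ(−u_k v), i.e., an update occurs only when student k errs; the single-student averages are classical (again a reconstruction):

\[
  \langle f_k v\rangle = \frac{1 - R_k}{\sqrt{2\pi}}, \qquad
  \langle f_k u_k\rangle = \frac{R_k - 1}{\sqrt{2\pi}}, \qquad
  \langle f_k^2\rangle = \frac{1}{\pi}\arccos R_k .
\]

The cross terms \langle f_k f_{k'}\rangle and \langle f_k u_{k'}\rangle require three-dimensional Gaussian integrals over (v, u_k, u_{k'}) with covariances R_k, R_{k'} and q_{kk'}. Unlike the Hebbian case, the updates differ from student to student, since each student updates only on its own mistakes.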

9 AdaTron learning: equations for l and R (known result) and for q (new result). [Equations shown on the slide.]
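AdaTron learning corrects a misclassified example with a step proportional to the student's own field; in the notation above, f_k = −l_k u_k Θ(−u_k v). The corresponding averages again reduce to Gaussian integrals. As a point of reference from single-student online learning theory, the asymptotic decay of the generalization error is known to be fastest for AdaTron (\varepsilon_g \propto t^{-1}) and slower for Hebbian (t^{-1/2}) and perceptron (t^{-1/3}) learning.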

10 Generalization Error [Figure: ensemble generalization error ε_g for Hebbian, perceptron and AdaTron learning; curves from K = 1 to K = ∞.]
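Theoretical curves of this kind can be checked by direct simulation. Below is a minimal Monte Carlo sketch (not the authors' code); the normalization conventions, the K = 3 majority vote and all parameter values are illustrative assumptions:

import numpy as np

# Monte Carlo check of ensemble online learning with K simple perceptrons.
rng = np.random.default_rng(0)
N, K, T = 500, 3, 20          # input dimension, ensemble size, learning time t

def run(rule):
    B = rng.standard_normal(N)
    B *= np.sqrt(N) / np.linalg.norm(B)           # ||B|| = sqrt(N) so v = B.x ~ N(0, 1)
    J = rng.standard_normal((K, N))               # students, independent initial conditions
    for _ in range(T * N):                        # t = (number of examples) / N
        x = rng.standard_normal(N) / np.sqrt(N)   # <x_i> = 0, <x_i^2> = 1/N
        v = B @ x                                 # teacher field
        h = J @ x                                 # student fields (= l_k u_k)
        wrong = np.sign(h) != np.sign(v)          # which students misclassify x
        if rule == "hebbian":
            f = np.full(K, np.sign(v))            # identical update for all students
        elif rule == "perceptron":
            f = np.sign(v) * wrong                # update only the students that err
        else:                                     # adatron
            f = -h * wrong                        # step proportional to the student field
        J += np.outer(f, x)
    l = np.linalg.norm(J, axis=1)
    R = (J @ B) / (np.linalg.norm(B) * l)         # teacher-student similarities
    q = (J @ J.T) / np.outer(l, l)                # student-student similarities
    X = rng.standard_normal((10000, N)) / np.sqrt(N)   # fresh test inputs
    mv = np.sign(np.sign(X @ J.T).sum(axis=1))    # majority vote (K odd, so no ties)
    eps_g = np.mean(mv != np.sign(X @ B))
    return R.mean(), q[np.triu_indices(K, 1)].mean(), eps_g

for rule in ("hebbian", "perceptron", "adatron"):
    R, q, eps_g = run(rule)
    print(f"{rule:10s}  R = {R:.3f}  q = {q:.3f}  eps_g(MV) = {eps_g:.4f}")

Under these assumptions the run should reproduce the qualitative picture of the slide: all three rules drive R upward, while q grows fastest for Hebbian learning, whose updates are common to all students.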

11 DISCUSSION [Figures: dynamical behaviors of the similarity R between teacher and student and of the similarity q among students.]

12 Maintaining the variety of students is important in ensemble learning, so the relationship between R and q is essential. [Diagram: teacher B and students J_k, J_k' with overlap q_kk'.] When q is small, the effect of the ensemble is strong; when q is large, the effect of the ensemble is small.

13 Dynamical behaviors of R and q. [Figures: time evolution of R and q, and the relationship between R and q, for Hebbian, perceptron and AdaTron learning.]