Asymptotic Behavior of Stochastic Complexity of Complete Bipartite Graph-Type Boltzmann Machines
Yu Nishiyama and Sumio Watanabe
Tokyo Institute of Technology, Japan

Background
Learning machines such as mixture models, hidden Markov models, and Bayesian networks are used in pattern recognition, natural language processing, gene analysis, and information systems. Mathematically, they are singular statistical models, for which Bayes learning is effective.

Problem: Calculations that involve the Bayes posterior require a huge computational cost.
Mean field approximation: the Bayes posterior is approximated by a trial distribution.
Mean field stochastic complexity: it measures the accuracy of the approximation and the difference from regular statistical models, and it is used for model selection.

The asymptotic behavior of the mean field stochastic complexity has been studied for:
Mixture models [K. Watanabe, et al.]
Reduced rank regressions [Nakajima, et al.]
Hidden Markov models [Hosino, et al.]
Stochastic context-free grammars [Hosino, et al.]
Neural networks [Nakano, et al.]

Purpose
We derive an upper bound on the mean field stochastic complexity of complete bipartite graph-type Boltzmann machines.
Boltzmann machines are graphical models that originate from spin systems in statistical physics.

Table of Contents
Review: Bayes Learning, Mean Field Approximation, Boltzmann Machines (Complete Bipartite Graph-type)
Main Theorem
Outline of the Proof
Discussion and Conclusion

Bayes Learning
True distribution: $q(x)$; model: $p(x \mid w)$; prior: $\varphi(w)$.
Bayes posterior: $p(w \mid X^n) = \frac{1}{Z_n}\,\varphi(w)\prod_{i=1}^{n} p(x_i \mid w)$, where $Z_n = \int \varphi(w)\prod_{i=1}^{n} p(x_i \mid w)\,dw$.
Bayes predictive distribution: $p(x \mid X^n) = \int p(x \mid w)\, p(w \mid X^n)\,dw$.
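As a concrete illustration of these definitions (not part of the original slides), the following Python sketch computes the Bayes posterior and the Bayes predictive distribution on a parameter grid for a toy Bernoulli model; the model, the uniform prior, and the simulated data are assumptions chosen for illustration, not the Boltzmann machine studied in this talk.

```python
import numpy as np

# Toy Bayes learning: Bernoulli model p(x|w) = w^x (1-w)^(1-x), uniform prior on a grid.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=50)           # training data X^n from the "true" distribution
w_grid = np.linspace(1e-3, 1 - 1e-3, 1000)  # discretized parameter space
prior = np.full_like(w_grid, 1.0 / len(w_grid))

# log prod_i p(x_i | w) for every grid point
log_lik = x.sum() * np.log(w_grid) + (len(x) - x.sum()) * np.log(1.0 - w_grid)

# Bayes posterior p(w | X^n) proportional to phi(w) * prod_i p(x_i | w)
log_post = np.log(prior) + log_lik
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()

# Bayes predictive distribution p(x=1 | X^n) = integral of p(x=1 | w) p(w | X^n) dw
print("P(x=1 | X^n) =", np.sum(w_grid * posterior))
```

Discretizing the parameter space turns the posterior normalization and the predictive integral into explicit sums, mirroring the formulas above.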

Mean Field Approximation (1)
The Bayes posterior can be rewritten as $p(w \mid X^n) = \frac{1}{Z_n}\,\varphi(w)\exp\Big(\sum_{i=1}^{n}\log p(x_i \mid w)\Big)$.
We consider the Kullback distance from a trial distribution $q(w)$ to the Bayes posterior,
$K(q) = \int q(w)\log\frac{q(w)}{p(w \mid X^n)}\,dw$.

Mean Field Approximation (2)
When we restrict the trial distribution to the factorized form $q(w) = \prod_{j} q_j(w_j)$, the distribution $\hat{q}(w)$ that minimizes the functional is called the mean field approximation.
The minimum value,
$\overline{F}(X^n) = \min_{q}\int q(w)\log\frac{q(w)}{\varphi(w)\prod_{i=1}^{n} p(x_i \mid w)}\,dw$,
is called the mean field stochastic complexity.
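The next sketch makes these definitions concrete on a toy problem that is not from the slides: a two-parameter posterior (a correlated Gaussian discretized on a grid) is approximated by a factorized trial distribution $q_1(w_1)q_2(w_2)$ via coordinate-ascent updates, and the minimized value plays the role of the mean field stochastic complexity. The grid, the correlation value 0.8, and the number of updates are illustrative assumptions.

```python
import numpy as np

# Toy mean field approximation for a two-parameter "posterior" on a grid.
grid = np.linspace(-4.0, 4.0, 200)
W1, W2 = np.meshgrid(grid, grid, indexing="ij")

# Unnormalized log posterior log( phi(w) * prod_i p(x_i|w) ): a correlated Gaussian here.
rho = 0.8
log_ptilde = -(W1**2 - 2 * rho * W1 * W2 + W2**2) / (2 * (1 - rho**2))

# Exact (discrete) Bayes stochastic complexity: -log Z_n
log_Z = np.log(np.exp(log_ptilde).sum())

# Coordinate-ascent mean field updates for q1(w1) and q2(w2).
q1 = np.full(len(grid), 1.0 / len(grid))
q2 = np.full(len(grid), 1.0 / len(grid))
for _ in range(100):
    log_q1 = log_ptilde @ q2              # E_{q2}[ log ptilde(w1, .) ]
    q1 = np.exp(log_q1 - log_q1.max()); q1 /= q1.sum()
    log_q2 = q1 @ log_ptilde              # E_{q1}[ log ptilde(., w2) ]
    q2 = np.exp(log_q2 - log_q2.max()); q2 /= q2.sum()

# Mean field stochastic complexity: E_q[ log q - log ptilde ] for the factorized q.
Q = np.outer(q1, q2)
F_mf = np.sum(Q * (np.log(Q + 1e-300) - log_ptilde))
print("Bayes stochastic complexity  -log Z :", -log_Z)
print("Mean field stochastic complexity F  :", F_mf)   # F >= -log Z always
```

Because the factorized family cannot capture the correlation, the mean field value exceeds the Bayes value $-\log Z_n$; the gap measures the accuracy of the approximation mentioned on the previous slide.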

Complete Bipartite Graph-type Boltzmann Machines
The machine has $M$ input and output units $x = (x_1,\dots,x_M)$ and $H$ hidden units $y = (y_1,\dots,y_H)$, where each unit takes values in $\{-1,+1\}$.
The parametric model is
$p(x \mid w) = \frac{1}{Z(w)}\sum_{y}\exp\Big(\sum_{i=1}^{M}\sum_{j=1}^{H} w_{ij}\,x_i y_j\Big)$,
where $Z(w)$ is the normalizing constant and $w = (w_{ij})$ are the weights on the complete bipartite graph.
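To make the parametric model concrete, here is a minimal Python sketch of this distribution for a tiny machine; the $\pm 1$ unit values and the bilinear energy follow the model form as reconstructed above (itself a standard assumption for this model class), and the sizes and random weights are assumptions for illustration.

```python
import itertools
import numpy as np

# Complete bipartite graph-type Boltzmann machine:
# M visible units x, H hidden units y, all in {-1,+1}, energy x^T W y.

def bm_distribution(W):
    """Return every visible configuration x and p(x | W), marginalizing the hidden units."""
    M, H = W.shape
    xs = np.array(list(itertools.product([-1, 1], repeat=M)))
    ys = np.array(list(itertools.product([-1, 1], repeat=H)))
    # Unnormalized marginal: sum_y exp(x^T W y)
    scores = np.exp(xs @ W @ ys.T).sum(axis=1)
    return xs, scores / scores.sum()

rng = np.random.default_rng(1)
M, H = 3, 2
W = 0.5 * rng.standard_normal((M, H))
xs, p = bm_distribution(W)
for x, px in zip(xs, p):
    print(x, f"{px:.4f}")
```

Enumerating all $2^M \cdot 2^H$ configurations keeps the sketch exact but is feasible only for very small $M$ and $H$.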

True Distribution
We assume that the true distribution is included in the parametric model and that its number of hidden units is $H_0$ ($0 \le H_0 \le H$).
That is, the true distribution is $q(x) = p(x \mid w_0)$ for a true parameter $w_0$ corresponding to a complete bipartite graph with $H_0$ hidden units.

Main Theorem
The mean field stochastic complexity of complete bipartite graph-type Boltzmann machines has an upper bound of the form
$\overline{F}(X^n) \le \lambda\,\log n + C$,
where the coefficient $\lambda$ is determined by $M$, $H$, and $H_0$:
$M$ : the number of input and output units
$H$ : the number of hidden units (learning machine)
$H_0$ : the number of hidden units (true distribution)
$C$ : constant

Outline of the Proof (Methods)
The upper bound is obtained by restricting the trial distribution to a normal distribution family and by using a prior that depends on the Boltzmann machine.

Outline of the Proof [Lemma]
For the Kullback information $K(w)$ with Hessian matrix $J(w) = \nabla\nabla K(w)$: if there exists a value $w^{*}$ of the parameter such that $K(w^{*}) = 0$ and the number of non-zero diagonal elements of $J(w^{*})$ is less than or equal to $\nu$, then the mean field stochastic complexity has the following upper bound:
$\overline{F}(X^n) \le \frac{\nu}{2}\log n + \mathrm{const.}$

We apply this lemma to the Boltzmann machines. The Kullback information is given by
$K(w) = \sum_{x} q(x)\log\frac{q(x)}{p(x \mid w)}$.
The second-order differential, the Hessian matrix $J(w)$, is obtained by differentiating this expression twice with respect to the weights $w_{ij}$.
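The lemma's hypotheses can be checked numerically on a tiny machine. The sketch below is an illustration only: the sizes, the choice of the zero-padded true parameter $w^{*}$, and the finite-difference second derivatives are assumptions, not the analytic computation in the talk.

```python
import itertools
import numpy as np

# Numerical check of the lemma's hypotheses on a tiny complete bipartite BM.
M, H, H0 = 3, 3, 1
xs = np.array(list(itertools.product([-1, 1], repeat=M)))

def bm_marginal(W):
    """p(x | W): marginalize the hidden units of exp(x^T W y)."""
    ys = np.array(list(itertools.product([-1, 1], repeat=W.shape[1])))
    scores = np.exp(xs @ W @ ys.T).sum(axis=1)
    return scores / scores.sum()

rng = np.random.default_rng(2)
W0 = rng.standard_normal((M, H0))      # true machine with H0 hidden units
q = bm_marginal(W0)                    # true distribution q(x)

def K(w_flat):
    """Kullback information K(w) = sum_x q(x) log q(x)/p(x|w)."""
    p = bm_marginal(w_flat.reshape(M, H))
    return np.sum(q * np.log(q / p))

# True parameter w*: copy W0 into the first H0 columns, zero elsewhere.
w_star = np.zeros((M, H))
w_star[:, :H0] = W0
w0 = w_star.ravel()
print("K(w*) =", K(w0))                # ~ 0: the true distribution is realized

# Diagonal Hessian elements of K at w*, by central finite differences,
# and the count of those that are non-zero (the lemma's quantity).
h = 1e-3
diag = []
for k in range(len(w0)):
    e = np.zeros_like(w0)
    e[k] = h
    diag.append((K(w0 + e) - 2 * K(w0) + K(w0 - e)) / h**2)
print("non-zero diagonal Hessian elements:", int(np.sum(np.abs(np.array(diag)) > 1e-3)))
```

In this toy setting, perturbing a weight attached to one of the redundant hidden units leaves $p(x \mid w)$ unchanged, so the corresponding diagonal Hessian entries vanish and the non-zero count stays well below the full parameter count $MH$, suggesting how the bound can fall below the regular-model value $\frac{MH}{2}\log n$.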

We take the parameter $w^{*}$ to be a true parameter. Then $K(w^{*}) = 0$ holds, and the non-zero diagonal elements of the Hessian $J(w^{*})$ can be counted. By using the lemma, we obtain the upper bound stated in the main theorem.

Discussion: Comparison with other studies
(Figure: stochastic complexity versus the number of training data $n$ in the asymptotic region, comparing the regular statistical model, the upper bound for Bayes learning obtained by algebraic geometry [Yamazaki], and the derived upper bound for the mean field approximation.)

Conclusion
We derived an upper bound on the mean field stochastic complexity of complete bipartite graph-type Boltzmann machines.
Future works: deriving a lower bound, and comparison with experimental results.