Asymptotic Behavior of Stochastic Complexity of Complete Bipartite Graph-Type Boltzmann Machines
Yu Nishiyama and Sumio Watanabe
Tokyo Institute of Technology, Japan
Background
Learning machines such as mixture models, hidden Markov models, and Bayesian networks are used in pattern recognition, natural language processing, gene analysis, and information systems. Mathematically, these machines are singular statistical models, for which Bayes learning is effective.
Problem: Calculations that involve the Bayes posterior require huge computational cost. Mean field approximation replaces the Bayes posterior with a tractable trial distribution. Its stochastic complexity measures the accuracy of the approximation, quantifies the difference from regular statistical models, and can be used for model selection.
The asymptotic behavior of the mean field stochastic complexity has been studied for:
- Mixture models [K. Watanabe, et al.]
- Reduced rank regressions [Nakajima, et al.]
- Hidden Markov models [Hosino, et al.]
- Stochastic context-free grammars [Hosino, et al.]
- Neural networks [Nakano, et al.]
Purpose
We derive an upper bound of the mean field stochastic complexity of complete bipartite graph-type Boltzmann machines. Boltzmann machines are used both as graphical models and as spin systems.
Table of Contents
- Review: Bayes Learning / Mean Field Approximation / Boltzmann Machines
- Main Theorem (Complete Bipartite Graph-type)
- Outline of the Proof
- Discussion and Conclusion
Bayes Learning
True distribution: q(x). Model: p(x|w). Prior: \varphi(w).
Bayes posterior:
  p(w|X^n) = \frac{1}{Z_n} \varphi(w) \prod_{i=1}^n p(x_i|w), \qquad Z_n = \int \varphi(w) \prod_{i=1}^n p(x_i|w)\, dw,
where X^n = (x_1, \dots, x_n) is the set of training data.
Bayes predictive distribution:
  p(x|X^n) = \int p(x|w)\, p(w|X^n)\, dw.
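For reference, the quantity that the mean field value approximates is the Bayes stochastic complexity (free energy), used throughout the following slides; in standard notation (a reconstruction, not verbatim from the original) it is

  F(n) = -\log Z_n = -\log \int \varphi(w) \prod_{i=1}^n p(x_i|w)\, dw.

Because the trial distribution is restricted, the mean field stochastic complexity satisfies \bar{F}(n) \ge F(n).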
Mean Field Approximation (1)
The Bayes posterior can be rewritten as
  p(w|X^n) = \frac{1}{Z_n} \exp(-n H_n(w))\, \varphi(w), \qquad H_n(w) = -\frac{1}{n} \sum_{i=1}^n \log p(x_i|w).
We consider the Kullback distance from a trial distribution r(w) to the Bayes posterior:
  K(r) = \int r(w) \log \frac{r(w)}{p(w|X^n)}\, dw \;\ge\; 0.
Mean Field Approximation (2)
We restrict the trial distribution r(w) to the factorized form
  r(w) = \prod_k r_k(w_k).
The distribution r that minimizes K(r) is called the mean field approximation, and the minimum value of the functional
  F[r] = \int r(w) \log \frac{r(w)}{\varphi(w) \prod_{i=1}^n p(x_i|w)}\, dw
is called the mean field stochastic complexity \bar{F}(n).
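As a concrete illustration of this minimization (a minimal numerical sketch, not from the slides; the toy posterior, grid discretization, and coordinate-descent updates are all illustrative assumptions), the following Python code minimizes K(r) over factorized trial distributions on a grid:

import numpy as np

# Toy 2-parameter "posterior" on a grid (illustrative choice: a correlated Gaussian).
grid = np.linspace(-4.0, 4.0, 201)
W1, W2 = np.meshgrid(grid, grid, indexing="ij")
log_p = -(W1**2 + W2**2 + 1.2 * W1 * W2)   # unnormalized log-density

def normalize(log_q):
    q = np.exp(log_q - log_q.max())
    return q / q.sum()

# Factorized trial distribution r(w1, w2) = r1(w1) r2(w2), initialized uniform.
r1 = np.full(grid.size, 1.0 / grid.size)
r2 = np.full(grid.size, 1.0 / grid.size)

# Coordinate descent on K(r): r1 ∝ exp(E_{r2}[log p]), then r2 ∝ exp(E_{r1}[log p]).
for _ in range(100):
    r1 = normalize(log_p @ r2)
    r2 = normalize(log_p.T @ r1)

# Kullback distance K(r) from the trial distribution to the (grid-normalized) posterior.
p = normalize(log_p)
r = np.outer(r1, r2)
K = float(np.sum(r * (np.log(r) - np.log(p))))
print(f"K(r) after convergence: {K:.4f}")

Each update sets one factor proportional to the exponential of the expected log-posterior under the other factor, which is the stationarity condition of K(r) under the factorization constraint.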
Complete Bipartite Graph-type Boltzmann Machines
The machine has M input and output units x = (x_1, \dots, x_M) and H hidden units y = (y_1, \dots, y_H), where each unit takes values in \{-1, +1\}. The parametric model is
  p(x|w) = \frac{1}{Z(w)} \sum_{y} \exp\Big( \sum_{i=1}^{M} \sum_{j=1}^{H} w_{ij} x_i y_j \Big),
where w = \{w_{ij}\} are the weights on the edges of the complete bipartite graph and Z(w) is the normalizing constant.
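A minimal numerical sketch of this model (illustrative sizes and random weights, not from the slides); it marginalizes the hidden units using the identity \sum_{y_j = \pm 1} \exp(y_j a_j) = 2\cosh(a_j) with a_j = \sum_i w_{ij} x_i, and cross-checks against brute-force enumeration:

import itertools
import numpy as np

M, H = 3, 2                       # input/output units, hidden units (illustrative sizes)
rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=(M, H))   # weights w_ij on the complete bipartite graph

states = [np.array(s) for s in itertools.product([-1, 1], repeat=M)]

def unnormalized(x, w):
    # Marginalize hidden units: sum_y exp(sum_ij w_ij x_i y_j) = prod_j 2 cosh(sum_i w_ij x_i)
    return np.prod(2.0 * np.cosh(x @ w))

Z = sum(unnormalized(x, w) for x in states)
p = {tuple(x): unnormalized(x, w) / Z for x in states}

# Cross-check against explicit enumeration over hidden states.
def brute(x, w):
    return sum(np.exp(x @ w @ np.array(y))
               for y in itertools.product([-1, 1], repeat=H))

assert all(np.isclose(unnormalized(x, w), brute(x, w)) for x in states)
print(sum(p.values()))  # -> 1.0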
True Distribution
We assume that the true distribution is included in the parametric model and that its number of hidden units is H_0 (0 \le H_0 \le H). That is, the true distribution is q(x) = p(x|w^*) for a true parameter w^* realizable with H_0 hidden units.
Main Theorem
The mean field stochastic complexity of complete bipartite graph-type Boltzmann machines has an upper bound of the form
  \bar{F}(n) \le \lambda \log n + C,
where the coefficient \lambda is determined by
  M : the number of input and output units,
  H : the number of hidden units (learning machine),
  H_0 : the number of hidden units (true distribution),
and C is a constant independent of n.
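For scale, a standard fact (a known general result, not a result of these slides): if the model were a regular statistical model, the stochastic complexity would grow as half the parameter count times \log n, here with d = MH weights:

  F_{regular}(n) = \frac{MH}{2} \log n + O(1).

A bound with \lambda < MH/2 therefore exhibits the effect of the singularities of the Boltzmann machine.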
Outline of the Proof (Methods)
To obtain the upper bound, we restrict the trial distribution to a normal distribution family, and the prior is chosen depending on the Boltzmann machine.
Outline of the Proof
[Lemma] For the Kullback information K(w), if there exists a value w^* of the parameter such that K(w^*) = 0 and the number of non-zero diagonal elements of the Hessian matrix \nabla^2 K(w^*) is less than or equal to k, then the mean field stochastic complexity has the following upper bound:
  \bar{F}(n) \le \frac{k}{2} \log n + C.
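To see where each non-degenerate parameter's (1/2) log n comes from, here is a one-dimensional sketch (a standard computation under added assumptions: a quadratic Kullback term K(w) = \alpha w^2 / 2 with \alpha > 0, a Gaussian trial distribution, and a smooth prior; not taken verbatim from the slides):

  F[r] \approx n \int r(w) K(w)\, dw + \int r(w) \log \frac{r(w)}{\varphi(w)}\, dw, \qquad r(w) = \mathcal{N}(0, \sigma^2),

  F[r] \approx \frac{n \alpha \sigma^2}{2} - \frac{1}{2} \log(2\pi e \sigma^2) + O(1) \;\xrightarrow{\;\sigma^2 = 1/(n\alpha)\;}\; \frac{1}{2} \log n + O(1).

If instead \alpha = 0 (a degenerate direction, as for the redundant hidden units below), taking \sigma^2 = O(1) keeps the contribution at O(1), which is why only the non-zero elements of the Hessian enter the bound.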
We apply this lemma to the Boltzmann machines. The Kullback information is given by
  K(w) = \sum_{x} q(x) \log \frac{q(x)}{p(x|w)},
and we evaluate its second-order differentials \partial^2 K / \partial w_{ij} \partial w_{kl} at a parameter w^* with K(w^*) = 0.
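As a numerical illustration of this computation (not from the slides; sizes, random true weights, and the finite-difference scheme are illustrative assumptions), one can embed a true distribution with H_0 hidden units by zero-padding the weights and check that K(w^*) = 0 while the Hessian diagonal vanishes in the redundant directions:

import itertools
import numpy as np

M, H, H0 = 3, 2, 1                 # illustrative sizes with H0 < H
rng = np.random.default_rng(1)
w_true = np.zeros((M, H))
w_true[:, :H0] = rng.normal(scale=0.5, size=(M, H0))  # redundant units get zero weights

states = [np.array(s) for s in itertools.product([-1, 1], repeat=M)]

def model(w):
    u = np.array([np.prod(2.0 * np.cosh(x @ w)) for x in states])
    return u / u.sum()             # p(x|w) after marginalizing hidden units

q = model(w_true)                   # true distribution, included in the model

def K(w_flat):
    p = model(w_flat.reshape(M, H))
    return float(np.sum(q * np.log(q / p)))   # Kullback information

# Finite-difference diagonal of the Hessian at the true parameter.
w0, eps = w_true.ravel(), 1e-4
diag = np.empty(w0.size)
for k in range(w0.size):
    e = np.zeros_like(w0)
    e[k] = eps
    diag[k] = (K(w0 + e) - 2.0 * K(w0) + K(w0 - e)) / eps**2

print("K(w*) =", K(w0))                        # -> 0 (up to round-off)
print("non-zero diagonal elements:", int(np.sum(np.abs(diag) > 1e-6)))

For generic true weights only the directions attached to the active hidden units show non-zero curvature, which is the mechanism the proof exploits.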
We take the parameter w^* to be a true parameter in which the weights attached to the H - H_0 redundant hidden units are zero. Then K(w^*) = 0 holds, and the diagonal elements of the Hessian matrix corresponding to the redundant weights vanish, so the number of non-zero elements can be bounded. By using the lemma, we obtain the upper bound of the main theorem.
Discussion
Comparison with other studies.
[Figure: Stochastic complexity versus the number n of training data in the asymptotic regime, comparing the regular statistical model, the Bayes learning upper bound obtained by algebraic geometry [Yamazaki], and the derived mean field upper bound.]
Conclusion
We derived an upper bound of the mean field stochastic complexity of complete bipartite graph-type Boltzmann machines.
Future works:
- Lower bound
- Comparison with experimental results