Toric Modification on Machine Learning Keisuke Yamazaki & Sumio Watanabe Tokyo Institute of Technology

Agenda Learning theory and algebraic geometry Two forms of the Kullback divergence Toric modification Application to a binomial mixture model Summary

What is the generalization error? [Diagram: the true model generates i.i.d. data; a learning model is fitted to the data by MLE, MAP, or Bayes estimation, yielding a predictive model; the generalization error measures the gap between the true model and the predictive model.]

Algebraic geometry is connected to learning theory through the Bayes method. The formal definition of the generalization error: [Diagram: the true model, data, learning model, and predictive model.]
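The definition on the slide can be written out as follows (a reconstruction in the notation standard to this framework, not copied from the slide; q is the true distribution, p(x|w) the learning model with prior φ(w), and p(x|X^n) the Bayes predictive distribution built from the n i.i.d. samples X^n):

\[
p(x \mid X^n) = \int p(x \mid w)\, p(w \mid X^n)\, dw ,
\qquad
G(n) = \int q(x)\, \log \frac{q(x)}{p(x \mid X^n)}\, dx .
\]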

Algebraic geometry is connected to learning theory through the Bayes method. Besides the formal definition of the generalization error, another Kullback divergence appears: the divergence between the true model and the learning model at a fixed parameter w. [Diagram: the true model, data, learning model, and predictive model.]

Algebraic geometry is connected to learning theory through the Bayes method: the generalization error belongs to machine learning / learning theory, while the other Kullback divergence is the object handled by algebraic geometry. [Diagram: the true model, data, learning model, and predictive model.]
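Written out (again a reconstruction in the standard notation), the second divergence is a function of the parameter w alone, and its zero set is where algebraic geometry enters:

\[
K(w) = \int q(x)\, \log \frac{q(x)}{p(x \mid w)}\, dx , \qquad K(w) \ge 0 .
\]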

The zeta function plays an important role in the connection. Zeta function: ζ(z) = ∫ H(w)^z φ(w) dw, a holomorphic function of z, built from an analytic function H(w) (the algebraic geometry side) and a C-infinity function φ(w) with compact support.

The zeta function plays an important role in the connection. On the machine learning / learning theory side, H(w) is the Kullback divergence and φ(w) is the prior distribution.
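Concretely (a standard statement of the construction, reconstructed rather than copied from the slide), the learning-theory zeta function is

\[
\zeta(z) = \int K(w)^{z}\, \varphi(w)\, dw ,
\]

which is holomorphic for Re(z) > 0 and extends to a meromorphic function on the whole complex plane whose poles are negative rational numbers.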

The largest pole of the zeta function determines the generalization error: it gives the asymptotic Bayes generalization error [Watanabe 2001].
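A sketch of the cited result, in its standard form (−λ denotes the pole of ζ(z) closest to the origin and m its order):

\[
\mathbb{E}[G(n)] \;=\; \frac{\lambda}{n} \;-\; \frac{m-1}{n \log n} \;+\; o\!\left(\frac{1}{n \log n}\right)
\qquad (n \to \infty).
\]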

Agenda Learning theory and algebraic geometry Two forms of the Kullback divergence Toric modification Application to a binomial mixture model Summary

Calculation of the zeta function requires a well-formed H(w): the polynomial form of H(w) must be converted into a factorized form, and the conversion is carried out by a resolution of singularities.
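In the usual normal-crossing notation (a sketch; the map g and the exponents k_j, h_j are the standard symbols of this construction, not taken from the transcript), a resolution of singularities w = g(u) produces, in each local chart,

\[
H(g(u)) = u_1^{2k_1} \cdots u_d^{2k_d},
\qquad
|g'(u)| = b(u)\, u_1^{h_1} \cdots u_d^{h_d}, \quad b(u) > 0 ,
\]

and the largest pole of ζ(z) can then be read off as −λ with λ = min_j (h_j + 1)/(2 k_j), minimized over the local charts.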

The resolution of singularities with blow-ups is an iterative method. [Figure: blow-ups]

The resolution of singularities with blow-ups is an iterative method, but the Kullback divergence in machine learning is complicated and its parameter w is high-dimensional. [Figure: blow-ups]

The bottleneck is the iterative method.

                       Blow-ups
Applicable function    any function
Computational cost     large

Toric modification is a systematic method for the resolution of singularities.

                       Blow-ups        Toric modification
Applicable function    any function    ?
Computational cost     large           ?

Agenda Learning theory and algebraic geometry Two forms of the Kullback divergence Toric modification Application to a binomial mixture model Summary

The Newton diagram is a convex hull in the exponent space. [Figure: toric modification]

The borders of the diagram determine a set of vectors in the dual space. [Figure: toric modification]

Add some vectors subdividing the spanned area. [Figure: toric modification]

The selected vectors construct the resolution map; a small code sketch of these steps follows. [Figure: toric modification]
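A minimal sketch of the first two steps in two dimensions (the function names and the example exponents are illustrative, not from the talk): given the exponent vectors of the monomials of H(w), compute the vertices of the Newton diagram and the primitive inward normal vectors of its edges; these normals, together with the coordinate axes, span the fan that the toric modification then subdivides into unimodular cones.

```python
from math import gcd

def minimal_points(exponents):
    """Drop exponent vectors dominated componentwise by another one;
    only minimal points can be vertices of the Newton diagram."""
    best_y = float("inf")
    keep = []
    for x, y in sorted(set(exponents)):
        if y < best_y:          # not dominated by any earlier point
            keep.append((x, y))
            best_y = y
    return keep

def newton_diagram(exponents):
    """Vertices of the Newton diagram: the lower-left convex chain of
    the minimal exponent points (Andrew's monotone-chain lower hull)."""
    hull = []
    for p in minimal_points(exponents):
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            # cross <= 0: hull[-1] is not a strict convex vertex, drop it
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def dual_fan_rays(diagram):
    """Primitive inward normals of the diagram's edges, bracketed by the
    coordinate axes: the 1-dimensional cones of the dual fan.  The toric
    modification then inserts further vectors until every cone is
    unimodular (the 'subdividing' step on the slides)."""
    rays = [(1, 0)]
    for (x1, y1), (x2, y2) in zip(diagram, diagram[1:]):
        a, b = y1 - y2, x2 - x1   # orthogonal to the edge, pointing inward
        g = gcd(a, b)
        rays.append((a // g, b // g))
    rays.append((0, 1))
    return rays

# Illustrative example: monomial exponents (0,4), (1,1), (3,0).
diagram = newton_diagram([(0, 4), (1, 1), (3, 0)])
print(diagram)                 # [(0, 4), (1, 1), (3, 0)]
print(dual_fan_rays(diagram))  # [(1, 0), (3, 1), (1, 2), (0, 1)]
```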

Non-degenerate Kullback divergence: the condition under which the toric modification can be applied to the Kullback divergence. For every face i of the Newton diagram, a non-degeneracy condition must hold. [Figure: toric modification]
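The standard form of this condition (Newton non-degeneracy; a reconstruction, since the slide's formula is not preserved in the transcript): writing H_{γ_i} for the part of H supported on a face γ_i of the Newton diagram,

\[
H_{\gamma_i}(w) = \sum_{\alpha \in \gamma_i} c_\alpha\, w^{\alpha},
\qquad
\left\{\, w \in (\mathbb{C}\setminus\{0\})^d \;:\;
\frac{\partial H_{\gamma_i}}{\partial w_1} = \cdots = \frac{\partial H_{\gamma_i}}{\partial w_d} = 0 \,\right\} = \emptyset
\quad \text{for all } i .
\]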

Toric modification is “systematic”.
Blow-ups: the search space will be large, and we cannot know how many iterations we need.
Toric modification: the number of candidate vectors is limited.

Toric modification can be an effective plug-in method.

                       Blow-ups        Toric modification
Applicable function    any function    non-degenerate function
Computational cost     large           limited

Agenda Learning theory and algebraic geometry Two forms of the Kullback divergence Toric modification Application to a binomial mixture model Summary

An application to a mixture model: a mixture of binomial distributions, with a given true model and learning model.
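The slide's exact definitions are not preserved; as a generic sketch of the setup (K components, mixing weights a_k, and success probabilities b_k are illustrative symbols), a learning model of this type is

\[
p(x \mid w) = \sum_{k=1}^{K} a_k \binom{N}{x}\, b_k^{\,x} (1 - b_k)^{N - x},
\qquad
w = (a_1, \dots, a_K, b_1, \dots, b_K), \quad \sum_{k} a_k = 1 ,
\]

with the true model taken to be of the same form with fewer effective components, which is what makes the learning model singular.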

The Newton diagram of the mixture

The resolution map based on the toric modification

The generalization error of the mixture of binomial distributions. [Slide shows, in turn: the polynomial form of the Kullback divergence, its factorized form obtained from the resolution map, the resulting zeta function, and the generalization error.]

Agenda Learning theory and algebraic geometry Two forms of the Kullback divergence Toric modification Application to a binomial mixture model Summary

The Bayesian generalization error is derived on the basis of the zeta function. Calculating its coefficients requires the factorized form of the Kullback divergence. Toric modification is an effective method for finding the factorized form. The generalization error of a binomial mixture is derived as an application.

Thank you !!! Toric Modification on Machine Learning Keisuke Yamazaki & Sumio Watanabe Tokyo Institute of Technology

The asymptotic form of the generalization error has been derived for regular models. Essential assumption: the Fisher information matrix is positive definite.
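For reference, the regular-model result this slide alludes to (a standard fact, stated as a sketch; d is the dimension of the parameter w):

\[
\mathbb{E}[G(n)] = \frac{d}{2n} + o\!\left(\frac{1}{n}\right),
\]

so regular models correspond to λ = d/2 and m = 1, whereas singular models can have strictly smaller λ.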