1 Weak Convergence of Random Free Energy in Information Theory
Sumio Watanabe, Tokyo Institute of Technology


2 Contents
1. Background
2. Main Theorem
3. Outline of Proof
4. Applications and Future Study
Identification Problem ≡ Mathematical Physics with Random Hamiltonian

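3 Example: Classical Spin System [Background (1)]
[Figure: an unknown spin system and a learner, each with visible units $x$ and hidden units $s_i$, $s_j$ coupled by weights $w_{ij}$; the learner learns from samples of the visible units.]
$$p(x|w) = \frac{1}{Z(w)} \sum_{\text{hidden}} \exp\Bigl( -\sum_{(i,j)} w_{ij}\, s_i s_j \Bigr)$$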

4 Identification Problem [Background (2)]
Unknown information source (classical): $q(x)$
Observation: $X_1, X_2, \ldots, X_n$
Learning system: $p(x|w)$, $\varphi(w)$
Estimated distribution: $p(x \mid X_1, X_2, \ldots, X_n)$
Relative entropy:
$$D(q\|p) \equiv \int dx\; q\,[\log q - \log p] = \;?$$

5 Random Free Energy and Relative Entropy [Background (3)]
Definition. Random Free Energy = log-likelihood of the system:
$$F(X_1, X_2, \ldots, X_n) \equiv -\log \int p(X_1|w)\, p(X_2|w) \cdots p(X_n|w)\, \varphi(dw) + \sum_{i=1}^{n} \log q(X_i)$$
Relation between $F$ and $D(q\|p)$:
$$D\bigl( q(X_{n+1}) \,\|\, p(X_{n+1} \mid X_1, X_2, \ldots, X_n) \bigr) = F(X_1, X_2, \ldots, X_{n+1}) - F(X_1, X_2, \ldots, X_n)$$
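The relation between $F$ and $D(q\|p)$ can be checked in a few lines. The sketch below assumes that the estimated distribution is the Bayes predictive distribution (an assumption; the slide does not spell out its form) and writes $F_n$ for $F(X_1, \ldots, X_n)$, a shorthand introduced here.

```latex
\begin{align*}
p(X_{n+1} \mid X_1, \ldots, X_n)
  &= \frac{\int \prod_{i=1}^{n+1} p(X_i|w)\, \varphi(dw)}
          {\int \prod_{i=1}^{n} p(X_i|w)\, \varphi(dw)}, \\
\log q(X_{n+1}) - \log p(X_{n+1} \mid X_1, \ldots, X_n)
  &= F_{n+1} - F_n, \\
D\bigl( q \,\|\, p(\cdot \mid X_1, \ldots, X_n) \bigr)
  &= E_{X_{n+1}}\bigl[ F_{n+1} \bigr] - F_n .
\end{align*}
```

Under this reading, the difference of free energies equals the log-ratio pointwise, and the relative entropy is obtained after averaging over $X_{n+1}$ drawn from $q$.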

6 Identifiability and Singularities [Background (4)]
A learning system $p(x|w)$ is called identifiable if
$$p(x|w_1) = p(x|w_2) \ (\forall x) \ \Rightarrow \ w_1 = w_2 .$$
A system which identifies the structure is non-identifiable.
Remark. Let $W = \{w\}$ and $w_1 \sim w_2 \Leftrightarrow$ "$p(x|w_1) = p(x|w_2) \ (\forall x)$". Then $W/\!\sim$ is not a manifold, because $\{ w ; p(x|w) = p(x|w_0) \}$ is an analytic set with singularities.

7 Mathematical Definitions [Main Theorem (1)]
$X$: a random variable on $\mathbb{R}^N$ with p.d.f. $q(x)$.
$L^2(q) = \{ f ; \int f(x)^2 q(x)\, dx < \infty \}$: a real Hilbert space.
$W$: a real $d$-dimensional manifold.
$\varphi(w)$: a p.d.f. on $W$, a function of class $C_0^\infty$.
$\varphi(w)\, dw$: a probability distribution on $W$.

8 Mathematical Definitions [Main Theorem (2)]
Given $X_1, X_2, \ldots, X_n$: i.i.d.
$H(\cdot, w)$: an $L^2(q)$-valued real analytic function on $W$ such that
$$E_X[e^{-H(X,w)}] = 1 \ (\forall w) \quad [\ \Rightarrow K(w) \equiv E_X[H(X,w)] \geq 0\ ],$$
e.g. $H(x,w) = \log q(x) - \log p(x|w)$.
$$W_0 \equiv \{ w \in \operatorname{supp} \varphi ; K(w) = 0 \} \neq \emptyset$$
Random Free Energy:
$$F = -\log \int \exp\Bigl( -\sum_{i=1}^{n} H(X_i, w) \Bigr) \varphi(w)\, dw$$
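As a concrete illustration of the definition of $F$, here is a minimal Python sketch for a toy case that is not on the slides: $q = N(0,1)$, $p(x|w) = N(w,1)$, and a uniform prior on $[-1, 1]$ (which is not of class $C_0^\infty$, so it only approximates the setting above). All function names are hypothetical. In this case $H(x,w) = w^2/2 - xw$, which satisfies $E_X[e^{-H(X,w)}] = 1$.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def random_free_energy(samples, grid_size=2001):
    """F = -log of the integral of exp(-sum_i H(X_i, w)) * phi(w) dw.

    Toy assumptions (not from the slides): q = N(0, 1), p(.|w) = N(w, 1),
    phi = uniform density 1/2 on [-1, 1], integrated by the midpoint rule.
    Here sum_i H(X_i, w) = n * w^2 / 2 - w * sum_i X_i.
    """
    n = len(samples)
    sum_x = sum(samples)
    step = 2.0 / grid_size
    log_terms = []
    for j in range(grid_size):
        w = -1.0 + (j + 0.5) * step
        total_h = 0.5 * n * w * w - w * sum_x
        log_terms.append(-total_h + math.log(0.5 * step))
    return -log_sum_exp(log_terms)
```

With no samples the integral is just the prior mass, so `random_free_energy([])` is zero up to rounding; for data drawn from $q$, $F$ grows roughly like $\tfrac{1}{2} \log n$ in this regular one-parameter toy model.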

9 Gel'fand's Zeta Function [Main Theorem (3)]
The zeta function
$$\zeta(z) = \int K(w)^z \varphi(w)\, dw$$
is holomorphic in $\mathrm{Re}(z) > 0$.
Difficulty: $\{ w ; K(w) = 0 \}$ is an analytic set with singularities.
Theorem (Atiyah, Sato, Bernstein, Björk, Kashiwara, 1970-1980)
(1) $\zeta(z)$ can be analytically continued to a meromorphic function on the entire complex plane.
(2) All poles are real, negative, and rational numbers.
Poles: $0 > -\lambda_1 > -\lambda_2 > -\lambda_3 > \cdots$. Orders: $m_1, m_2, m_3, \ldots$
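A small worked example may help fix the meaning of $\lambda_1$ and $m_1$. It is not from the slides and assumes $d = 2$, $K(w) = w_1^2 w_2^2$, and a uniform density on $[0,1]^2$ in place of a smooth $\varphi$.

```latex
\begin{align*}
\zeta(z) &= \int_0^1 \!\! \int_0^1 w_1^{2z}\, w_2^{2z}\, dw_1\, dw_2
          = \Bigl( \int_0^1 w^{2z}\, dw \Bigr)^{2}
          = \frac{1}{(2z+1)^2}
          \qquad (\mathrm{Re}(z) > 0), \\
\text{pole: } z &= -\tfrac12 \text{ of order } 2
          \quad \Rightarrow \quad \lambda_1 = \tfrac12, \quad m_1 = 2 .
\end{align*}
```

The right-hand side is already meromorphic on the whole plane, so the continuation is immediate here. The zero set of $K$ is the union of the two axes, which is singular at the origin, and $\lambda_1 = 1/2$ is smaller than $d/2 = 1$.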

10 Main Theorem [Main Theorem (4)]
$$F - \lambda_1 \log n + (m_1 - 1) \log\log n \to F^* \quad (n \to \infty)$$
The convergence holds in law, where $F^*$ can be represented by a limit process of an empirical process on $W_0$.
Corollary. If $E[D(q\|p)]$ has an asymptotic expansion, then
$$E[D(q\|p)] = \frac{\lambda_1}{n} + o\Bigl( \frac{1}{n} \Bigr).$$

11 Hironaka Resolution Theorem [Proof Outline (1)]
[Figure: the map $g$ from $U$ to $W$, with $U_0$ mapped onto $W_0$, the zero set of $K(w)$.]
Locally,
$$K(g(u)) = a(u)\, u_1^{2s_1} u_2^{2s_2} \cdots u_d^{2s_d} .$$

12 Resolution Theorem [Proof Outline (2)]
H. Hironaka (1964), M. F. Atiyah (1970)
Let $K(w) \geq 0$ be a real analytic function defined in a neighborhood of $0 \in W \subset \mathbb{R}^d$. Then there exist an open set $W$, a real analytic manifold $U$, and a proper analytic map $g : U \to W$ such that
(1) $g : U - U_0 \to W - W_0$ is an isomorphism.
(2) For each $P \in U$, there are local coordinates $(u_1, u_2, \ldots, u_d)$ centered at $P$ so that locally near $P$
$$K(g(u)) = a(u)\, u_1^{2s_1} u_2^{2s_2} \cdots u_d^{2s_d},$$
where $a(u) > 0$ is an analytic function and each $s_i \geq 0$ is an integer.

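13 Division of Partition Function [Proof Outline (3)]
Because $\operatorname{supp} \varphi$ is compact and $g$ is a proper map, we can assume $W = \bigcup_\alpha U_\alpha$ (a finite union whose overlaps have measure zero).
In each $U_\alpha$,
$$K(g_\alpha(u_\alpha)) = a(u)\, u_1^{2s_1} u_2^{2s_2} \cdots u_d^{2s_d},$$
$$\varphi_\alpha(u_\alpha) = \sum b_\alpha(u)\, u_1^{k_1} u_2^{k_2} \cdots u_d^{k_d},$$
where both $s_i$ and $k_i$ depend on $\alpha$.
$$\exp(-F) = \sum_\alpha \int_{U_\alpha} \exp\Bigl[ -\sum_{i=1}^{n} H(X_i, g_\alpha(u_\alpha)) \Bigr] \varphi_\alpha(u_\alpha)\, du_\alpha$$
Hereafter, $\alpha$ is omitted and $K(u) \equiv K(g(u))$ is used.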

14 B-function [Proof Outline (4)]
The zeta function:
$$\zeta(z) = \int K(w)^z \varphi(w)\, dw$$
$$\exists P(w, \partial_w, z),\ \exists b(z) \ \text{s.t.} \ P(w, \partial_w, z)\, K(w)^{z+1} = b(z)\, K(w)^z$$
Analytic continuation is carried out using the b-function.
If $K(w)$ is a polynomial, then there exists an algorithm to calculate $b(z)$ (Oaku, 1997).
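The simplest instance shows how the b-function drives the continuation. The example below is not from the slides; it assumes $d = 1$, $K(w) = w^2$, and $\varphi$ smooth with compact support, so that integration by parts leaves no boundary terms.

```latex
\begin{align*}
K(w) &= w^2, \qquad P = \tfrac14\, \partial_w^2, \\
P\, K(w)^{z+1} &= \tfrac14 (2z+2)(2z+1)\, w^{2z}
               = (z+1)\bigl( z + \tfrac12 \bigr) K(w)^z, \\
b(z) &= (z+1)\bigl( z + \tfrac12 \bigr), \\
\zeta(z) &= \frac{1}{b(z)} \int \bigl( P\, K(w)^{z+1} \bigr) \varphi(w)\, dw
          = \frac{1}{b(z)} \int K(w)^{z+1}\, \tfrac14\, \varphi''(w)\, dw .
\end{align*}
```

The last integral converges for $\mathrm{Re}(z) > -3/2$, so $\zeta$ extends one step to the left, and the only candidate poles in that strip are the zeros of $b(z)$. The largest, $z = -1/2$, gives $\lambda_1 = 1/2 = d/2$, the value expected for a regular one-parameter model. Repeating the step extends $\zeta$ further.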

15 Ideals of Local Analytic Functions [Proof Outline (5)]
Lemma 1. Let $u \mapsto H(\cdot, u)$ be a real analytic function in $U$. There exist an open set $U_\alpha \subset U$ and a finite set of analytic functions $\{ g_j(u), h_j(\cdot, u) ; j = 1, 2, \ldots, J \}$ in $U_\alpha$ such that
(1) $T(u) \geq I \ (\forall u \in U_\alpha)$, where $T_{jk}(u) \equiv \int h_j(x,u)\, h_k(x,u)\, q(x)\, dx$,
(2) $H(x,u) = \sum_{j=1}^{J} g_j(u)\, h_j(x,u)$.

16 Decomposition of Hamiltonian [Proof Outline (6)]
Random Hamiltonian:
$$\sum_{i=1}^{n} H(X_i, u) = nK(u) + (nK(u))^{1/2} \sigma_n(u)$$
$$\sigma_n(u) \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^{n} r(X_i, u), \qquad r(x,u) \equiv \frac{H(x,u) - K(u)}{K(u)^{1/2}}$$
By Lemma 1 and $K(u) = \int \{ H(x,u) + e^{-H(x,u)} - 1 \}\, q(x)\, dx$, $r(x,u)$ is well defined even if $K(u) = 0$.
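The decomposition can be verified numerically in a toy case that is not on the slides: $q = N(0,1)$ and $p(x|w) = N(w,1)$, with the parameter written as `w`. There $H(x,w) = w^2/2 - xw$, $K(w) = w^2/2$, and the ratio $r$ simplifies to $-\sqrt{2}\,\mathrm{sign}(w)\,x$, which stays bounded as $w \to 0$. The function names are hypothetical, and the value of $r$ at exactly $w = 0$ is a convention chosen here.

```python
import math

def hamiltonian(x, w):
    """H(x, w) = log q(x) - log p(x|w) for q = N(0, 1), p(.|w) = N(w, 1)."""
    return 0.5 * w * w - x * w

def k_mean(w):
    """K(w) = E_X[H(X, w)] under q = N(0, 1)."""
    return 0.5 * w * w

def r_normalized(x, w):
    """r = (H - K) / sqrt(K), simplified so that no division by zero occurs."""
    return -math.sqrt(2.0) * math.copysign(1.0, w) * x

def sigma_n(samples, w):
    """Empirical process: (1 / sqrt(n)) * sum_i r(X_i, w)."""
    return sum(r_normalized(x, w) for x in samples) / math.sqrt(len(samples))

def decomposition_gap(samples, w):
    """|sum_i H(X_i, w) - (n K + sqrt(n K) * sigma_n)|; should be near 0."""
    n = len(samples)
    lhs = sum(hamiltonian(x, w) for x in samples)
    rhs = n * k_mean(w) + math.sqrt(n * k_mean(w)) * sigma_n(samples, w)
    return abs(lhs - rhs)
```

The gap is zero up to floating-point rounding for any sample and any `w`, including values very close to the zero of $K$.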

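17 Donsker's Empirical Process [Proof Outline (7)]
Central limit theorem in Banach space:
$$\sigma_n(\cdot) \to \sigma(\cdot) \qquad \text{(empirical process} \to \text{tight Gaussian process)}$$
$$\sigma_n(u) \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^{n} r(X_i, u)$$
$$E_{X_1, X_2, \ldots, X_n}[ f(\sigma_n) ] \to E_\sigma[ f(\sigma) ]$$
($\forall f$: a bounded continuous functional on $L^\infty(\operatorname{supp} \varphi)$)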

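18 Poles of Zeta Function [Proof Outline (8)]
$$K(u) = a(u)\, u_1^{2s_1} u_2^{2s_2} \cdots u_d^{2s_d}, \qquad \varphi(u) = \sum b(u)\, u_1^{k_1} u_2^{k_2} \cdots u_d^{k_d}$$
$$\zeta(z) = \sum \int K(u)^z \varphi(u)\, du$$
$$\lambda = \min_j \frac{k_j + 1}{2 s_j}, \qquad m = \#\Bigl\{ j ; \lambda = \frac{k_j + 1}{2 s_j} \Bigr\}$$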
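The rule for $\lambda$ and $m$ in a single coordinate chart is easy to mechanize. The following Python sketch uses exact rational arithmetic; the function name `largest_pole` is hypothetical, and skipping coordinates with $s_j = 0$ (which contribute no pole) is an assumption made here so that the ratio is always defined.

```python
from fractions import Fraction

def largest_pole(s, k):
    """Return (lambda, m) for K(u) = a(u) * prod_j u_j^(2 s_j) and a prior
    term proportional to prod_j u_j^(k_j) in one chart.

    lambda = min_j (k_j + 1) / (2 s_j); m = number of j attaining the minimum.
    Coordinates with s_j == 0 are skipped (assumption of this sketch).
    """
    candidates = [Fraction(kj + 1, 2 * sj) for sj, kj in zip(s, k) if sj > 0]
    if not candidates:
        raise ValueError("K(u) does not vanish in this chart; no pole arises")
    lam = min(candidates)
    return lam, candidates.count(lam)
```

For example, `largest_pole((1, 1), (0, 0))` corresponds to $K(u) = u_1^2 u_2^2$ with a constant prior term and yields $\lambda = 1/2$ with $m = 2$. Over several charts, the overall $\lambda_1$ is the smallest chart value.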

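19 Zeta Function and State Density [Proof Outline (9)]
The coordinates are split as $\mathbf{u} = (u, v)$, where $u = (u_j)_{j \in J}$ and $J$ is the set of indices that attain the minimum.
$$L(u) \equiv \prod_{j \in J} u_j^{2s_j}$$
Partial zeta function, with pole $-\lambda$ of order $m$:
$$\iint L(u)^z\, \varphi(u, v)\, du\, dv$$
The inverse Mellin transform gives the state density:
$$\iint \delta(t - L(u))\, \varphi(u, v)\, du\, dv = t^{\lambda - 1} (-\log t)^{m-1} \int \varphi(0, v)\, dv \qquad (t \to 0)$$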
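The link between a pole of order $m$ and the factor $(-\log t)^{m-1}$ can be seen in a case that is computable by hand. The example is not from the slides; it assumes $L(u) = u_1^2 u_2^2$, no $v$ coordinates, and a uniform density on $[0,1]^2$, and it writes $\rho(t)$ for the state density.

```latex
\begin{align*}
\rho(t) &= \int_0^1 \!\! \int_0^1 \delta\bigl( t - u_1^2 u_2^2 \bigr)\, du_1\, du_2
         = \int_{\sqrt{t}}^{1} \frac{du_2}{2\, u_2 \sqrt{t}}
         = \tfrac14\, t^{-1/2} (-\log t)
         \qquad (0 < t < 1), \\
\int_0^1 t^{z} \rho(t)\, dt
        &= \tfrac14 \cdot \frac{1}{(z + \tfrac12)^2}
         = \frac{1}{(2z+1)^2}
         = \int_0^1 \!\! \int_0^1 \bigl( u_1^2 u_2^2 \bigr)^{z}\, du_1\, du_2 .
\end{align*}
```

Here $\lambda = 1/2$ and $m = 2$, so $\rho(t)$ has the form $t^{\lambda - 1} (-\log t)^{m-1}$ up to the constant factor $1/4$, and its Mellin transform reproduces the zeta function with a double pole at $z = -1/2$.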

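20 Partition Function and Empirical Process [Proof Outline (10)]
Partition function ← State Density ← Zeta function
$$Z = \iint \exp\bigl( -nK(u,v) - (nK(u,v))^{1/2} \sigma_n(u,v) \bigr)\, \varphi(u,v)\, du\, dv$$
$$\to \iint \Bigl( \frac{t}{n} \Bigr)^{\lambda - 1} \Bigl( -\log \frac{t}{n} \Bigr)^{m-1} \varphi(0,v)\, dv \times \exp\bigl[ -tK(0,v) - (tK(0,v))^{1/2} \sigma(0,v) \bigr]\, \frac{dt}{n} \qquad (n \to \infty)$$
Characteristic function of $F$, for sufficiently small $\varepsilon > 0$:
$$E\Bigl[ \Bigl\{ Z \Big/ \frac{(\log n)^{m-1}}{n^{\lambda}} \Bigr\}^{i\varepsilon} \Bigr] \to \text{const.}$$
Q.E.D.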

21 Information Science & Mathematical Physics [Applications and Future Study (1)]
Identification of Unknown Information Source = Statistical Physics with Random Hamiltonian
Identification of Hidden Structure = Hamiltonian has Singularities
⇒ Singularities make the state density singular.

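22 Model Identification [Applications and Future Study (2)]
For each model $p(x|w)$, $\varphi(w)$, compute from the samples
$$F = F(p, \varphi, X_1, X_2, \ldots, X_n);$$
then the true distribution is identified.
[Figure: values of $F$, with the true model marked "True".]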

23 Poles and Orders of Zeta Function [Applications and Future Study (3)]
1. If $\varphi(w) > 0$ at $W_0$, then $0 < \lambda \leq d/2$.
2. $1 \leq m \leq d$.
3. If $\varphi(w)$ is Jeffreys' prior, then $\lambda \geq d/2$.
4. If $\zeta(z)$ has a pole $-\lambda'$, then $\lambda \leq \lambda'$.

24 Concrete Learning Systems [Applications and Future Study (4)]
1. Neural Networks, True $H_0$, Model $H$.
$$p(y|x,w) = \frac{1}{(2\pi)^{1/2}} \exp\Bigl( -\frac{\| y - \sum_h a_h f(b_h \cdot x + c_h) \|^2}{2} \Bigr)$$
$$2\lambda \leq H_0 (M + N + 1) + (H - H_0) \min(M + 1, N)$$
2. Gaussian Mixtures, True $H_0$, Model $H$.
$$p(x|w) = \sum_h a_h \exp\Bigl( -\frac{\| x - b_h \|^2}{2} \Bigr)$$
$$2\lambda \leq H_0 + (M - 1)H/2 + (M - 3)/2$$
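To see how far the neural network bound falls below the regular-model value, it can be evaluated next to the parameter count. The Python sketch below is a direct transcription of the bound as stated on the slide. It assumes that $M$ and $N$ are the input and output dimensions and that the model has $d = H(M + N + 1)$ parameters ($a_h \in \mathbb{R}^N$, $b_h \in \mathbb{R}^M$, $c_h \in \mathbb{R}$); the slide does not state either. The function names are hypothetical.

```python
def nn_two_lambda_bound(h0, h, m, n):
    """Upper bound on 2*lambda from the slide:
    H0 * (M + N + 1) + (H - H0) * min(M + 1, N).

    h0: hidden units of the true network, h: hidden units of the model,
    m, n: assumed input and output dimensions.
    """
    if not 0 <= h0 <= h:
        raise ValueError("expected 0 <= h0 <= h")
    return h0 * (m + n + 1) + (h - h0) * min(m + 1, n)

def nn_parameter_count(h, m, n):
    """Assumed parameter count d = H * (M + N + 1)."""
    return h * (m + n + 1)
```

For instance, with $H_0 = 1$, $H = 3$, $M = 2$, $N = 1$ the bound gives $2\lambda \leq 6$, while $d = 12$. The bound reaches $d$ only when $H = H_0$, that is, when the model has no redundant hidden units.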

25 Future Study [Applications and Future Study (5)]
1. Testing hypothesis ⇒ $q(x) = p(x|w_0)$; $w_0$ near singularity.
2. Large System: thermodynamic limit.
3. Replica Method: $f(z) = E[\exp(zF)]$.
4. Generalization to Non-commutative System.

