Entropy of Hidden Markov Processes
Or Zuk¹, Ido Kanter², Eytan Domany¹
¹ Weizmann Inst.  ² Bar-Ilan Univ.

2 Overview
- Introduction
- Problem Definition
- Statistical Mechanics approach
- Cover & Thomas Upper Bounds
- Radius of Convergence
- Related subjects
- Future Directions

3 HMP - Definitions
Markov Process:
- X – Markov Process
- M – Transition Matrix: M_{ij} = Pr(X_{n+1} = j | X_n = i)
Hidden Markov Process:
- Y – Noisy Observation of X
- N – Noise/Emission Matrix: N_{ij} = Pr(Y_n = j | X_n = i)
[Diagram: the hidden chain X_n → X_{n+1} evolves via M; each X_n emits Y_n via N.]

4 Example: Binary HMP
[Diagram: two two-state graphs over {0, 1}. Transition: arrows labeled p(0|0), p(1|0), p(0|1), p(1|1). Emission: arrows labeled q(0|0), q(1|0), q(0|1), q(1|1).]

5 Example: Binary HMP (Cont.)
- For simplicity, we will concentrate on the Symmetric Binary HMP:
  M = [[1-p, p], [p, 1-p]],   N = [[1-ε, ε], [ε, 1-ε]]
- So all properties of the process depend on two parameters, p and ε. Assume (w.l.o.g.) p, ε < ½.
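For concreteness, here is a minimal Python sampling sketch (illustrative, not from the slides; the helper name sample_binary_hmp and its parameters are my own, and later sketches reuse it):

```python
import numpy as np

def sample_binary_hmp(p, eps, n, rng=None):
    """Sample n steps of the symmetric binary HMP: the hidden bit X flips
    with probability p at each step, and Y copies X with flip probability eps."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)                       # stationary start: uniform on {0, 1}
    flips = (rng.random(n - 1) < p).astype(int)  # Markov-chain flips
    for i in range(1, n):
        x[i] = x[i - 1] ^ flips[i - 1]
    y = x ^ (rng.random(n) < eps).astype(int)    # symmetric channel noise
    return x, y

x, y = sample_binary_hmp(p=0.2, eps=0.05, n=10_000)
```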

6 HMP Entropy Rate
- Definition: H = lim_{n→∞} (1/n) H(Y_1, ..., Y_n) = lim_{n→∞} H(Y_n | Y_{n-1}, ..., Y_1)
- H is difficult to compute: it is given as a Lyapunov exponent, which is hard to compute in general [Jacquet et al. 04].
- What to do? Calculate H in different regimes.
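Although H has no closed form, it can be estimated from one long sample path: by the AEP, −(1/n) log₂ P(Y_1, ..., Y_n) → H almost surely. A minimal sketch of this estimator, reusing the sample_binary_hmp helper above; the forward recursion is normalized at each step for numerical stability:

```python
import numpy as np

def entropy_rate_mc(y, p, eps):
    """Estimate H in bits as -(1/n) log2 P(y), via the normalized forward recursion."""
    M = np.array([[1 - p, p], [p, 1 - p]])          # M[i, j] = Pr(X_{n+1}=j | X_n=i)
    N = np.array([[1 - eps, eps], [eps, 1 - eps]])  # N[i, j] = Pr(Y_n=j | X_n=i)
    alpha = np.array([0.5, 0.5]) * N[:, y[0]]       # alpha[i] = P(x_1 = i, y_1)
    log2p = 0.0
    for obs in y[1:]:
        c = alpha.sum()                             # P(y_1..y_t | previous symbols)
        log2p += np.log2(c)
        alpha = ((alpha / c) @ M) * N[:, obs]       # normalize, step the chain, emit
    log2p += np.log2(alpha.sum())                   # last conditional term
    return -log2p / len(y)

x, y = sample_binary_hmp(p=0.2, eps=0.05, n=200_000)
print(entropy_rate_mc(y, 0.2, 0.05))                # approaches H for large n
```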

7 Different Regimes
- p → 0, p → ½ (ε fixed)
- ε → 0, ε → ½ (p fixed)
- [Ordentlich & Weissman 04] study several regimes. We concentrate on the 'small noise regime' ε → 0, where the solution can be given as a power series in ε:
  H(p, ε) = Σ_{k≥0} H_k(p) ε^k

8 Statistical Mechanics
First, observe the Markovian property: the joint probability factorizes over the chain,
  Pr(X, Y) = Pr(x_1) Π_n M_{x_n x_{n+1}} Π_n N_{x_n y_n}
Perform a change of variables to spin variables: σ_n = 2x_n − 1, τ_n = 2y_n − 1.

9 Statistical Mechanics (cont.)
Ising Model: σ, τ ∈ {-1, 1}, as in spin glasses.
[Diagram: a one-dimensional chain of hidden spins σ_1, σ_2, ..., σ_n with nearest-neighbor couplings J, each σ_i tied to its observed spin τ_i by a coupling K.]

10 Statistical Mechanics (cont.)
Summing over the hidden spins σ, we get Pr(Y) as the partition function of a one-dimensional Ising model in the observed spins τ, with couplings J and K determined by p and ε.

11 Statistical Mechanics (cont.)
Computing the entropy via a low-temperature/high-field expansion gives the orders H_k of the series in ε.

12 Cover & Thomas Bounds
It is known (Cover & Thomas 1991) that for every n:
  H(Y_n | Y_{n-1}, ..., Y_1, X_1) ≤ H ≤ H(Y_n | Y_{n-1}, ..., Y_1)
- We will use the upper bounds C(n) = H(Y_n | Y_{n-1}, ..., Y_1) and derive their orders in ε.
- Qu: Do the orders 'saturate'?
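Both block entropies are computable exactly for small n. A brute-force sketch of the upper bound C(n) = H(Y_1..Y_n) − H(Y_1..Y_{n−1}) (an illustrative check, not the authors' derivation):

```python
from itertools import product
import numpy as np

def seq_prob(y, p, eps):
    """P(Y_1..Y_n = y) for the symmetric binary HMP (unnormalized forward recursion)."""
    M = np.array([[1 - p, p], [p, 1 - p]])
    N = np.array([[1 - eps, eps], [eps, 1 - eps]])
    alpha = np.array([0.5, 0.5]) * N[:, y[0]]
    for obs in y[1:]:
        alpha = (alpha @ M) * N[:, obs]
    return alpha.sum()

def block_entropy(n, p, eps):
    """H(Y_1..Y_n) in bits, by enumerating all 2^n observation strings."""
    probs = np.array([seq_prob(y, p, eps) for y in product((0, 1), repeat=n)])
    probs = probs[probs > 0]            # 0 * log 0 = 0 (relevant when eps = 0)
    return -(probs * np.log2(probs)).sum()

def ct_upper_bound(n, p, eps):
    """C(n) = H(Y_n | Y_1..Y_{n-1}); decreases toward H as n grows."""
    return block_entropy(n, p, eps) - block_entropy(n - 1, p, eps)

print(ct_upper_bound(4, p=0.2, eps=0.05))
```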

13 Cover & Thomas Bounds (cont.)
[Figure: the bound C(n) for n = 4.]

14 Cover & Thomas Bounds (cont.)
- Ans: Yes. In fact they 'saturate' sooner than would have been expected! For n ≥ (k+3)/2 they become constant. We therefore have:
- Conjecture 1 (proven for k = 1): the k-th order of C(n) is constant for all n ≥ (k+3)/2 and equals H_k.
- How do the orders look? Their expression is simpler when expressed using λ = 1 − 2p, which is the 2nd eigenvalue of M.
- Conjecture 2: [expression for the orders H_k in terms of λ]
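The k = 1 case of this saturation can be checked numerically: a finite-difference estimate of the first-order coefficient of C(n) in ε should come out the same for every n ≥ 2. A small sketch reusing ct_upper_bound from the previous block:

```python
eps = 1e-6   # small enough for a one-sided finite difference
for n in (2, 3, 4, 5, 6):
    slope = (ct_upper_bound(n, 0.2, eps) - ct_upper_bound(n, 0.2, 0.0)) / eps
    print(n, slope)   # first-order coefficient of C(n); constant from n = 2 on
```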

15 First Few Orders:
[Explicit expressions for the orders H_0, H_1, H_2, ...]
- Note: H_0–H_2 proven. The rest are conjectures from the upper bounds.
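The truncated series can be compared with the exact upper bound numerically. A rough sketch, assuming the zeroth-order term H_0 = −p log₂ p − (1−p) log₂(1−p) (the entropy rate of the noiseless chain) and taking H_1 = 2(1−2p) log₂((1−p)/p) as the first-order term reported in this literature (an assumption on my part, not read off the slides), together with ct_upper_bound from above:

```python
import numpy as np

def series_truncation(p, eps):
    """First-order truncation H0 + H1*eps of the entropy-rate series (bits)."""
    H0 = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))  # entropy rate at eps = 0
    H1 = 2 * (1 - 2 * p) * np.log2((1 - p) / p)        # assumed first-order term
    return H0 + H1 * eps

for eps in (0.001, 0.01, 0.05):
    # the gap between series and bound should shrink as eps -> 0
    print(eps, series_truncation(0.2, eps), ct_upper_bound(8, 0.2, eps))
```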

16-17 First Few Orders (Cont.):
[Further explicit expressions for the higher orders.]

18 Radius of Convergence
- When is our approximation good? The power series is useful only inside its radius of convergence, which is set by the growth rate of the coefficients H_k.
- Instructive: compare to the i.i.d. model.
- For the HMP the limit is unknown, so we used a fit: [fitted form of the radius of convergence]

19-20 Radius of Convergence (cont.):
[Plots of the estimated radius of convergence.]

21 Relative Entropy Rate
- Relative entropy rate: D(P || Q) = lim_{n→∞} (1/n) Σ_{y_1..y_n} P(y_1, ..., y_n) log [P(y_1, ..., y_n) / Q(y_1, ..., y_n)]
- We get: [the leading orders of D in ε]
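Like the entropy rate, D can be estimated by Monte Carlo: for Y drawn from P, (1/n) log [P(Y_1^n) / Q(Y_1^n)] converges to D(P || Q). A minimal sketch reusing the earlier helpers (sample_binary_hmp, entropy_rate_mc):

```python
def relative_entropy_rate_mc(p1, eps1, p2, eps2, n=200_000):
    """Estimate D(P||Q) in bits for two symmetric binary HMPs P=(p1,eps1), Q=(p2,eps2)."""
    _, y = sample_binary_hmp(p1, eps1, n)
    # entropy_rate_mc(y, .) returns -(1/n) log2 of that model's likelihood of y,
    # so the difference of the two per-symbol log-likelihoods estimates D.
    return entropy_rate_mc(y, p2, eps2) - entropy_rate_mc(y, p1, eps1)

print(relative_entropy_rate_mc(0.2, 0.05, 0.3, 0.05))
```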

22 Index of Coincidence
- Take two realizations Y, Y' (each of length n) of the same HMP. What is the probability that they are equal? It decays exponentially with n.
- We get: [the exponential decay rate]
- Similarly, we can solve for three and four (but not five) realizations. These rates can give bounds on the entropy rate.
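The coincidence probability is exactly computable for any n: the pair (X, X') of two independent hidden chains is itself a Markov chain on four states, and weighting each step by the probability that the two emissions agree yields a 4×4 transfer matrix. A sketch of this computation (illustrative, not the authors' closed form):

```python
import numpy as np

def index_of_coincidence(n, p, eps):
    """Pr(Y_1..Y_n = Y'_1..Y'_n) for two independent realizations of the HMP."""
    M = np.array([[1 - p, p], [p, 1 - p]])
    N = np.array([[1 - eps, eps], [eps, 1 - eps]])
    r = (N @ N.T).ravel()          # r[2i+i'] = Pr(Y = Y' | X = i, X' = i')
    T = np.kron(M, M)              # transition matrix of the pair chain (X, X')
    v = np.full(4, 0.25) * r       # stationary start, weighted by agreement
    for _ in range(n - 1):
        v = (v @ T) * r            # step the pair chain, weight by agreement
    return v.sum()

# Exponential decay with n; the per-symbol decay rate is the Renyi entropy rate
# of order 2, which lower-bounds the Shannon entropy rate.
for n in (5, 10, 20):
    print(n, index_of_coincidence(n, 0.2, 0.05))
```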

23 Future Directions
- Proving the conjectures
- Generalizations (e.g. general alphabets, the continuous case)
- Other regimes
- Relative entropy of two HMPs

Thank You