Capacity of Finite-State Channels: Lyapunov Exponents and Shannon Entropy
Tim Holliday, Peter Glynn, Andrea Goldsmith
Stanford University

Introduction

We show that the entropies H(X), H(Y), H(X,Y), and H(Y|X) of finite-state Markov channels are Lyapunov exponents.
This result provides an explicit connection between dynamical systems theory and information theory.
It also clarifies information-theoretic connections to hidden Markov models.
This allows novel proof techniques from other fields to be applied to information theory problems.

Finite-State Channels

The channel state Zn ∈ {c0, c1, …, cd} is a Markov chain with transition matrix R(cj, ck).
States correspond to distributions on the input/output symbols: P(Xn = x, Yn = y) = q(x, y | zn, zn+1).
Commonly used to model ISI channels, magnetic recording channels, etc.
[State diagram: states c0, c1, c2, c3 with labeled transitions such as R(c0, c2) and R(c1, c3)]
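As a concrete illustration, the channel model above can be sketched as a small simulation. This is a minimal two-state example in the Gilbert-Elliott style (state 0 = "good", state 1 = "bad"); the transition matrix R, the per-state crossover probabilities, and the i.i.d. uniform input are all illustrative assumptions, not parameters from the talk.

```python
import random

R = [[0.9, 0.1],   # transition matrix R(c_j, c_k), hypothetical values
     [0.2, 0.8]]
ERR = [0.01, 0.2]  # BSC crossover probability in each channel state

def simulate(n, seed=0):
    """Draw n (input, output) pairs; the state Z_n evolves as a Markov chain."""
    rng = random.Random(seed)
    z = 0
    pairs = []
    for _ in range(n):
        x = rng.randint(0, 1)                   # i.i.d. uniform input X_n
        y = x ^ (rng.random() < ERR[z])         # Y_n = X_n flipped w.p. ERR[z]
        pairs.append((x, int(y)))
        z = 0 if rng.random() < R[z][0] else 1  # Z_{n+1} ~ R(z, .)
    return pairs

pairs = simulate(1000)
```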

Time-Varying Channels with Memory

We consider finite-state Markov channels with no channel state information.
Time-varying channels with finite memory induce infinite memory in the channel output.
Capacity for time-varying infinite-memory channels is therefore defined in terms of a limit.

Previous Research

Mutual information for the Gilbert-Elliott channel [Mushkin, Bar-David, 1989].
Finite-state Markov channels with i.i.d. inputs [Goldsmith, Varaiya, 1996].
Recent research on simulation-based computation of mutual information for finite-state channels [Arnold, Vontobel, Loeliger, Kavčić, 2001, 2002, 2003], [Pfister, Siegel, 2001, 2003].

Symbol Matrices

For each symbol pair (x, y) ∈ X × Y, define a |Z| × |Z| matrix G(x,y), where (c0, c1) are the channel states at times (n, n+1).
Each element corresponds to the joint probability of the symbols and the channel transition:
G(x,y)(c0, c1) = R(c0, c1) q(x, y | c0, c1), for all (c0, c1) ∈ Z × Z
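A sketch of the symbol matrices for the hypothetical two-state channel used above. For simplicity the symbol distribution q is taken to depend only on the state at time n (a simplification of the q(x, y | zn, zn+1) on the slide), and all parameter values are illustrative.

```python
R = [[0.9, 0.1],
     [0.2, 0.8]]
ERR = [0.01, 0.2]  # BSC crossover probability per state

def q(x, y, c0):
    """Joint probability of (x, y) in state c0: uniform input, BSC output."""
    return 0.5 * (ERR[c0] if x != y else 1.0 - ERR[c0])

def G(x, y):
    """Symbol matrix: G(x,y)(c0,c1) = R(c0,c1) * q(x,y | c0)."""
    return [[R[c0][c1] * q(x, y, c0) for c1 in range(2)] for c0 in range(2)]

# Consistency check: summing G(x,y) over all symbol pairs recovers R,
# because q(., . | c0) sums to 1 in each state.
total = [[sum(G(x, y)[i][j] for x in (0, 1) for y in (0, 1))
          for j in range(2)] for i in range(2)]
```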

Probabilities as Matrix Products

Let μ be the stationary distribution of the channel. The probability of a joint symbol sequence is a product of symbol matrices:
P(X1 = x1, Y1 = y1, …, Xn = xn, Yn = yn) = μ G(x1,y1) G(x2,y2) ⋯ G(xn,yn) 1
The matrices G are deterministic functions of the random pair (X, Y).
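The matrix-product formula can be checked numerically against a brute-force sum over channel-state paths. This sketch reuses the hypothetical two-state parameters from the earlier examples (μ here is the stationary distribution of R, computed by hand for this 2×2 case).

```python
import itertools

R = [[0.9, 0.1],
     [0.2, 0.8]]
ERR = [0.01, 0.2]
mu = [2/3, 1/3]  # stationary distribution of R (mu R = mu)

def q(x, y, c0):
    return 0.5 * (ERR[c0] if x != y else 1.0 - ERR[c0])

def G(x, y):
    return [[R[c0][c1] * q(x, y, c0) for c1 in range(2)] for c0 in range(2)]

def vecmat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]

seq = [(0, 0), (1, 1), (0, 1)]
v = mu[:]
for x, y in seq:
    v = vecmat(v, G(x, y))
prob = sum(v)  # right-multiplication by the all-ones vector

# Brute force: sum over all channel-state paths z_1, ..., z_{n+1}.
brute = 0.0
for path in itertools.product((0, 1), repeat=len(seq) + 1):
    w = mu[path[0]]
    for k, (x, y) in enumerate(seq):
        w *= R[path[k]][path[k + 1]] * q(x, y, path[k])
    brute += w
```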

Entropy as a Lyapunov Exponent

The Shannon entropy is equivalent to the Lyapunov exponent of the random matrix product of the G(X,Y):
λ(X,Y) = lim (1/n) log || G(X1,Y1) G(X2,Y2) ⋯ G(Xn,Yn) ||
Similar expressions exist for H(X), H(Y), and H(X,Y).
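A Monte Carlo sketch of this identity for the illustrative two-state channel: run the normalized matrix product along a simulated symbol sequence and average the log of the normalizers. With natural logs, the entropy rate of the pair process is then −λ (in nats). All parameters are assumptions carried over from the earlier examples.

```python
import math
import random

R = [[0.9, 0.1],
     [0.2, 0.8]]
ERR = [0.01, 0.2]

def q(x, y, c0):
    return 0.5 * (ERR[c0] if x != y else 1.0 - ERR[c0])

def G(x, y):
    return [[R[c0][c1] * q(x, y, c0) for c1 in range(2)] for c0 in range(2)]

def lyapunov_estimate(n=20000, seed=1):
    """Estimate lambda(X,Y) = lim (1/n) log ||G(X1,Y1)...G(Xn,Yn)||."""
    rng = random.Random(seed)
    z = 0
    v = [2/3, 1/3]        # start from the stationary distribution of R
    log_norm = 0.0
    for _ in range(n):
        x = rng.randint(0, 1)
        y = x ^ (rng.random() < ERR[z])
        M = G(x, int(y))
        v = [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]
        s = sum(v)        # L1 norm of the running product
        log_norm += math.log(s)
        v = [vi / s for vi in v]   # renormalize to avoid underflow
        z = 0 if rng.random() < R[z][0] else 1
    return log_norm / n

lam = lyapunov_estimate()
H_xy = -lam  # entropy rate of the (X, Y) pair process, in nats
```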

Growth Rate Interpretation

The typical set An is the set of sequences x1, …, xn satisfying
2^(−n(H+ε)) ≤ P(x1, …, xn) ≤ 2^(−n(H−ε)).
By the AEP, P(An) > 1 − ε for sufficiently large n.
The Lyapunov exponent is the average rate of growth of the probability of a typical sequence.
In order to compute λ(X) we need information about the "direction" of the system.
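The AEP statement above can be illustrated in its simplest setting, an i.i.d. Bernoulli source (not the Markov case): the normalized negative log-probability of a sampled sequence concentrates near the entropy H. The source parameter 0.3 is an arbitrary choice for the demo.

```python
import math
import random

p = 0.3
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))  # entropy in nats

rng = random.Random(0)

def empirical_rate(n=5000):
    """-(1/n) log-probability of one sampled length-n sequence."""
    ll = 0.0
    for _ in range(n):
        ll += math.log(p) if rng.random() < p else math.log(1 - p)
    return -ll / n

# Each sample's rate should be close to H, i.e. the sequence is typical.
rates = [empirical_rate() for _ in range(20)]
```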

Lyapunov Direction Vector

The vector pn is the "direction" associated with λ(X) for any μ. It also defines the conditional channel state probability:
pn = μ G(X1) G(X2) ⋯ G(Xn) / || μ G(X1) G(X2) ⋯ G(Xn) || = P(Zn+1 | X1, …, Xn)
The vector has a number of interesting properties:
It is the standard prediction filter in hidden Markov models.
pn is a Markov chain if μ is the stationary distribution for the channel.
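The identification of the normalized direction vector with the HMM prediction filter can be verified directly for a short sequence, using the illustrative two-state parameters from before: the normalized matrix product is compared against brute-force conditioning over all state paths.

```python
import itertools

R = [[0.9, 0.1],
     [0.2, 0.8]]
ERR = [0.01, 0.2]
mu = [2/3, 1/3]

def q(x, y, c0):
    return 0.5 * (ERR[c0] if x != y else 1.0 - ERR[c0])

def G(x, y):
    return [[R[c0][c1] * q(x, y, c0) for c1 in range(2)] for c0 in range(2)]

seq = [(0, 1), (1, 1), (0, 0), (1, 0)]
v = mu[:]
for x, y in seq:
    M = G(x, y)
    v = [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]
s = sum(v)
p_filter = [vi / s for vi in v]   # normalized direction vector p_n

# Brute force: P(Z_{n+1} = c | symbols) by summing over state paths.
num = [0.0, 0.0]
for path in itertools.product((0, 1), repeat=len(seq) + 1):
    w = mu[path[0]]
    for k, (x, y) in enumerate(seq):
        w *= R[path[k]][path[k + 1]] * q(x, y, path[k])
    num[path[-1]] += w
p_brute = [a / sum(num) for a in num]
```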

Random Perron-Frobenius Theory

The vector p is the random Perron-Frobenius eigenvector associated with the random matrix GX.
For all n, the direction vector satisfies the recursion pn+1 = pn G(Xn+1) / || pn G(Xn+1) ||.
For the stationary version of p, the same relation holds with p distributed according to its stationary law.
The Lyapunov exponent we wish to compute is λ(X) = E[ log || p G(X) || ].

Technical Difficulties

The Markov chain pn is not irreducible if the input/output symbols are discrete!
Standard existence and uniqueness results cannot be applied in this setting.
We have shown that pn possesses a unique stationary distribution if the matrices GX are irreducible and aperiodic.
The proof exploits the contraction property of positive matrices.
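For the illustrative parameters used in the earlier sketches, the hypothesis of the uniqueness result is easy to check numerically: every symbol matrix G(x,y) is entrywise positive, which is a sufficient condition for irreducibility and aperiodicity.

```python
R = [[0.9, 0.1],
     [0.2, 0.8]]
ERR = [0.01, 0.2]

def q(x, y, c0):
    return 0.5 * (ERR[c0] if x != y else 1.0 - ERR[c0])

def G(x, y):
    return [[R[c0][c1] * q(x, y, c0) for c1 in range(2)] for c0 in range(2)]

# Entrywise positivity of every G(x,y) implies each matrix is
# irreducible and aperiodic (primitive).
all_positive = all(
    G(x, y)[i][j] > 0
    for x in (0, 1) for y in (0, 1) for i in range(2) for j in range(2)
)
```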

Computing Mutual Information

Compute the Lyapunov exponents λ(X), λ(Y), and λ(X,Y) as expectations (a deterministic computation).
Mutual information can then be expressed as
I(X;Y) = H(X) + H(Y) − H(X,Y),
with each entropy given by the corresponding Lyapunov exponent.
We also prove continuity of the Lyapunov exponents in the parameters (q, R); hence the mutual information is continuous in the channel parameters.
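A sketch of the decomposition I(X;Y) = H(X) + H(Y) − H(X,Y) for the illustrative channel. For this particular symmetric toy model the marginal input and output processes are both i.i.d. uniform bits (the uniform input symmetrizes the BSC output), so H(X) = H(Y) = ln 2 exactly and only H(X,Y) needs the Lyapunov-exponent estimate; for a general channel all three exponents would be computed.

```python
import math
import random

R = [[0.9, 0.1],
     [0.2, 0.8]]
ERR = [0.01, 0.2]

def q(x, y, c0):
    return 0.5 * (ERR[c0] if x != y else 1.0 - ERR[c0])

def G(x, y):
    return [[R[c0][c1] * q(x, y, c0) for c1 in range(2)] for c0 in range(2)]

def entropy_rate_xy(n=20000, seed=2):
    """-(1/n) log P(symbols) via the normalized matrix product."""
    rng = random.Random(seed)
    z, v, log_norm = 0, [2/3, 1/3], 0.0
    for _ in range(n):
        x = rng.randint(0, 1)
        y = x ^ (rng.random() < ERR[z])
        M = G(x, int(y))
        v = [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]
        s = sum(v)
        log_norm += math.log(s)
        v = [vi / s for vi in v]
        z = 0 if rng.random() < R[z][0] else 1
    return -log_norm / n

H_xy = entropy_rate_xy()
I = math.log(2) + math.log(2) - H_xy  # nats per channel use
```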

Simulation-Based Computation (Previous Work)

Step 1: Simulate a long sequence of input/output symbols.
Step 2: Estimate the entropy using the sample entropy Hn = −(1/n) log P(x1, …, xn).
Step 3: For sufficiently large n, assume that the sample-based entropy has converged.
Problems with this approach:
Need to characterize the initialization bias and confidence intervals.
Standard theory does not apply for discrete symbols.
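The three steps above can be sketched by running the sample-entropy estimator at increasing sequence lengths and watching the estimates settle; the slide's caveat is that "sufficiently large n" is chosen heuristically here, with no error bars. Parameters are the same illustrative two-state channel as before.

```python
import math
import random

R = [[0.9, 0.1],
     [0.2, 0.8]]
ERR = [0.01, 0.2]

def q(x, y, c0):
    return 0.5 * (ERR[c0] if x != y else 1.0 - ERR[c0])

def G(x, y):
    return [[R[c0][c1] * q(x, y, c0) for c1 in range(2)] for c0 in range(2)]

def sample_entropy(n, seed=3):
    """H_hat_n = -(1/n) log P(x_1^n, y_1^n) via the forward recursion."""
    rng = random.Random(seed)
    z, v, log_norm = 0, [2/3, 1/3], 0.0
    for _ in range(n):
        x = rng.randint(0, 1)
        y = x ^ (rng.random() < ERR[z])
        M = G(x, int(y))
        v = [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]
        s = sum(v)
        log_norm += math.log(s)
        v = [vi / s for vi in v]
        z = 0 if rng.random() < R[z][0] else 1
    return -log_norm / n

# Step 3 in practice: eyeball convergence across growing n.
estimates = [sample_entropy(n) for n in (1000, 4000, 16000)]
```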

Simulation Traces for Computation of H(X,Y)

Rigorous Simulation Methodology

We prove a new functional central limit theorem for sample entropy with discrete symbols.
A new confidence-interval methodology for simulated estimates of entropy: how good is our estimate?
A method for bounding the initialization bias in sample-entropy simulations: how long do we have to run the simulation?
Proofs involve techniques from stochastic processes and random matrix theory.

Computational Complexity of Lyapunov Exponents

Lyapunov exponents are notoriously difficult to compute, regardless of the computation method: an NP-complete problem [Tsitsiklis 1998].
Dynamic systems driven by random matrices typically possess poor convergence properties.
Initial transients in simulations can linger for extremely long periods of time.

Conclusions

Lyapunov exponents are a powerful new tool for computing the mutual information of finite-state channels.
Our results permit rigorous computation, even in the case of discrete inputs and outputs.
Computational complexity is high, but multiple computation methods are available.
The new connection between information theory and dynamical systems provides information theorists with a new set of tools to apply to challenging problems.