More on complexity measures: statistical complexity
J. P. Crutchfield. The calculi of emergence. Physica D, 1994.

Slide 2: complex ≠ random
- Entropy and algorithmic complexity associate maximum complexity with randomness
  - but pure order and pure noise are not "complex"
- complex systems have intricate structure on multiple scales, repeating patterns, continual variation, …
- complexity lies between order and chaos
  - Wolfram's class 4 CAs
  - Langton's "edge of chaos"
- Mutual information shows complexity
  - RBN transition example (also k-SAT)
  - are there other measures like this?
[figure: complexity C, and mutual information I, plotted against randomness H]

HIDDEN: statistical complexity

Slide 4: when randomness = noise
- the measures so far assume that randomness is information
  - even logical depth: randomness is not very "deep" information
- sometimes, the "randomness" actually is information
  - the output of good compression algorithms is highly "random": else the remaining structure could be used to compress it more
  - "any sufficiently advanced communication is indistinguishable from noise"
  - crypto functions output "random" strings: else the remaining structure could be used to break the code

Slide 5: randomness and noise
- in the real world, (some) randomness is just "noise": of no interest, carrying no "information"
- [figure: several noise images] these pictures are all different microscopically, but all just "white noise" macroscopically
  - the differences are not important
  - information measures "overfit" noise as data
- this kind of noisy randomness is intuitively simple
  - a small change to the noise is just the same noise

Slide 6: to model a coin toss …
- how would you create an ensemble of random bit strings?
  - … just toss a coin!
  - in other words, use a stochastic automaton
- that's quite a short description
  - conforming to our intuition that random strings are not very complex
[figure: a single-state stochastic automaton emitting H | ½ and T | ½]
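As an illustration, a minimal Python sketch of that single-state stochastic automaton; the function name, word length and ensemble size are illustrative choices, not from the slides.

import random

def coin_toss_automaton(length, rng=random):
    # A one-state stochastic automaton: at each step, emit H or T with probability 1/2.
    return "".join(rng.choice("HT") for _ in range(length))

# An ensemble of observed words of length 4.
ensemble = [coin_toss_automaton(4) for _ in range(10000)]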

Slide 7: Statistical complexity
- In certain circumstances, we can use the theory of discrete computation and statistics to create equivalent models
  - needs a discrete stochastic process that is conditionally stable: future states do not depend on time, but only on previous states
- Complexity C is the size of a minimal model that yields a finite description and sits at the least computationally powerful level
  - infer the machine from a data ensemble: the collection of observed strings generated by the process of interest
- Statistical complexity ignores the "computational resource"
  - so pure randomness, and completely ordered behaviour, have zero complexity
J. P. Crutchfield. The calculi of emergence. Physica D, 1994.

Slide 8:
- the inferred minimal model is called an ε-machine
  - minimal model: the size of the minimal stochastic machine
  - finite description: the size of the machine does not grow unboundedly with the size of the state
  - least computationally powerful level: e.g. finite state automaton, stack machine, UTM
- Intuition:
  - each observation represents a state, which incorporates an indirect indication of the hidden environment
  - states that lead to the same next state help to predict the environment: causal states
  - an ε-machine captures a minimal sequence of causal states
J. P. Crutchfield. The calculi of emergence. Physica D, 1994.

Slide 9: Consider a simple process
- The process is a simple automaton: a system with a two-symbol alphabet, α = {0,1}
  - two recurrent states, A and B
  - state A can, with equal probability, emit a 0 and return to itself, or emit a 1 and go to state B
  - state B always emits 1 and goes to A
- But all we have is a black-box process
- This is Weiss's "even process": between two successive 0s there is always an even number of 1s
C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series.
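A Python sketch of the two-state automaton just described, usable to generate the black-box data for the next slides; the state names follow the slide, everything else is an illustrative choice.

import random

def even_process(length, rng=random):
    # Two recurrent states: A emits 0 (stay in A) or 1 (go to B) with equal probability;
    # B always emits 1 and returns to A.
    state, out = "A", []
    for _ in range(length):
        if state == "A":
            if rng.random() < 0.5:
                out.append("0")
            else:
                out.append("1")
                state = "B"
        else:
            out.append("1")
            state = "A"
    return "".join(out)

# e.g. an ensemble of observed words of length 4
words = [even_process(4) for _ in range(10000)]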

Slide 10: Record the process output
- We need to deduce the automaton from data observations
- Run the process many times
  - to get statistically useful data
  - e.g. runs out to word length 4
C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series.

Slide 11: example: the "even" process (1)
- Work out probabilities and infer a machine
  - "homogenisation", because homogeneous states are merged
  - merging is the main source of error: it needs a lot of observations
For the full calculation, see: C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series.
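A toy Python sketch of the idea behind this step: estimate next-symbol probabilities for fixed-length histories, then merge histories whose distributions agree. This is only a simplified illustration; the CSSR algorithm in the cited paper grows history lengths adaptively and uses proper statistical tests rather than the fixed tolerance used here.

from collections import Counter, defaultdict

def next_symbol_distributions(words, hist_len=2):
    # Estimate P(next symbol | preceding hist_len symbols) from an ensemble of words.
    counts = defaultdict(Counter)
    for w in words:
        for i in range(hist_len, len(w)):
            counts[w[i - hist_len:i]][w[i]] += 1
    return {h: {s: n / sum(c.values()) for s, n in c.items()} for h, c in counts.items()}

def merge_homogeneous(dists, tol=0.05):
    # "Homogenisation": group histories whose next-symbol distributions agree within tol.
    groups = []
    for hist, dist in dists.items():
        for group in groups:
            ref = dists[group[0]]
            if all(abs(dist.get(s, 0) - ref.get(s, 0)) <= tol for s in set(dist) | set(ref)):
                group.append(hist)
                break
        else:
            groups.append([hist])
    return groups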

Slide 12: example: the "even" process (2)
- Check that all states have incoming transitions (reachability)
- Remove transient states
  - A and B form a transient cycle; the only exit is to produce a 0 and go to C
  - every history in state C goes to C (adding 0) or to D (adding 1)
  - every history in state D goes to C (adding 1)
  - "determinisation"
- The final ε-machine has states C and D only
C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series.

Slide 13: ε-machines and stability
- Replicating a process in an ε-machine requires stability
  - previous states aren't always "causal" in unstable systems
- Stability is related to temporal scale
  - recall flocking: at the level of birds, apparently arbitrary motion and few patterns; at the level of the flock, coherent, apparently co-ordinated motion
- So we can change level (scale) to one where there is stability
  - a bit like choosing the level to represent in differential equations
- We can tell a system is not suitably stable if the inferred ε-machine changes with the word length
  - that is, as the process runs over time, the ε-machine has to change to express its statistical behaviour

Slide 14 (HIDDEN): inferring ε-machines
- start at the lowest level of the computational hierarchy, and infer a model (a stochastic finite automaton: an ε-machine) from an ensemble
  - there are efficient algorithms to do this
- investigate how the machine size varies with the length L of the strings in the ensemble
- if the machines continue to increase in size as L increases, then increase the computational level of the machines
- why might the size increase…?

Slide 15: ε-machines and continuous systems
- Most natural systems are continuous
- Symbolic dynamics is used to extract discrete-time systems
  - partition the state space and label each partition with a symbol
  - over time, each point in the state space has a sequence of symbols: its symbol at each observation point in its past and future
  - this loses information: often a deterministic continuous system gives a stochastic discrete system
http://vserver1.cscs.lsa.umich.edu/~crshalizi/notabene/symbolic-dynamics.html (and citations)
[figure: a state space partitioned into regions labelled Ґ, Ж, Ц, Ђ, Ϡ; point a is in region Ж at time t; over a series of discrete time observations, a moves through different regions, giving the sequence … Ж Ж Ж Ϡ Ϡ Ђ Ђ Ђ Ђ Ж Ж …]

Slide 16: Symbolic dynamics (1)
- recast a continuous (space/time) dynamical system into a discrete one
- partition the continuous phase space U into a finite number of sets U_i, each labelled with a unique element from a finite alphabet
- observe the system at discretised time intervals, and note the label of the set U_i it occupies, to give a sequence of symbols: d c a a b d d a a …
- rationale: sequences represent "results" of "measurements" of the underlying system
[figure: a phase space partitioned into regions labelled a, b, c, d]
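A Python sketch of the labelling map, with the partition supplied as predicate/label pairs; the two-set partition in the example is illustrative, not one from the slides.

def to_symbols(trajectory, partition):
    # partition: list of (predicate, label) pairs that together cover the phase space.
    symbols = []
    for x in trajectory:
        for contains, label in partition:
            if contains(x):
                symbols.append(label)
                break
    return "".join(symbols)

# e.g. a two-set partition of the interval [0, 1]
halves = [(lambda x: x < 0.5, "a"), (lambda x: x >= 0.5, "b")]
print(to_symbols([0.1, 0.7, 0.3, 0.9], halves))   # "abab"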

Slide 17: Symbolic dynamics (2)
- the symbolic dynamics of the system is the set of all sequences that can be produced (from different initial conditions, etc.): it defines a language
- analyse the dynamics of these sequences using entropy, mutual information, ε-machines, etc.
- e.g. Crutchfield's analysis of the complexity and entropy of the logistic map: see J. P. Crutchfield. The calculi of emergence. Physica D, 1994.
[figure: the partition a, b, c, d applied to the logistic map for 3.5 < λ < 4; λ = …]

Slide 18 (HIDDEN): example: the logistic map
- a simple iterated equation: x_{n+1} = λ x_n (1 - x_n)
- Bifurcation diagram of the logistic map:
  - plot, as a function of λ, a series of values for x_n obtained by starting with a random value x_0, iterating many times, and discarding points before the iterates converge to the attractor
  - i.e. the set of fixed points of x_n corresponding to a value of λ, plotted for increasing values of λ
[figure: bifurcation diagram for 1 < λ < 4; λ = …]
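A Python sketch of how the bifurcation diagram data can be generated, assuming the standard logistic map above; the burn-in length, sample count and λ grid are illustrative.

def logistic_attractor(lam, x0=0.4, discard=1000, keep=200):
    # Iterate x -> lam * x * (1 - x), discard the transient, return points on the attractor.
    x = x0
    for _ in range(discard):
        x = lam * x * (1 - x)
    points = []
    for _ in range(keep):
        x = lam * x * (1 - x)
        points.append(x)
    return points

# Bifurcation diagram data: attractor points for a sweep of lambda values in (1, 4].
diagram = {l / 1000: logistic_attractor(l / 1000) for l in range(1005, 4001, 5)}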

Slide 19 (HIDDEN): Symbolic dynamics to analyse the logistic map
- discretise the continuous logistic trajectory x_0 x_1 x_2 x_3 x_4 … into a bit string b_0 b_1 b_2 b_3 b_4 …
  - partition the x space [0,1] into [0, ½), labelled 0, and [½, 1], labelled 1
  - so: b_n = if x_n ∈ [0, ½) then 0 else 1
[figure: the unit interval split into two halves labelled 0 and 1]
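A Python sketch of this discretisation; the λ value and initial condition in the example are illustrative.

def logistic_bits(lam, length, x0=0.3):
    # b_n = "0" if x_n is in [0, 1/2), else "1"
    x, bits = x0, []
    for _ in range(length):
        bits.append("0" if x < 0.5 else "1")
        x = lam * x * (1 - x)
    return "".join(bits)

print(logistic_bits(3.7, 16))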

Slide 20 (HIDDEN): logistic map (3)
- for each λ, for each L:
  - produce an ensemble of bit strings of length L from the discretised logistic process
  - infer the model (finite state automaton, ε-machine) that describes this ensemble
  - calculate the statistical complexity C (the size of the ε-machine) and the entropy H (of the ensemble)
[Crutchfield 1994]
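A partial Python sketch of this experiment: it builds the ensemble for each λ and computes the ensemble entropy H. Inferring the ε-machine, and hence the statistical complexity C, is not shown; the history-merging sketch on the "even" process slides gestures at how that step would start. The discretisation is repeated so the block runs on its own, and all parameter values are illustrative.

from collections import Counter
from math import log2

def logistic_bits(lam, length, x0, discard=500):
    # Discretisation from the previous slide, with a burn-in so bits are sampled on the attractor.
    x = x0
    for _ in range(discard):
        x = lam * x * (1 - x)
    bits = []
    for _ in range(length):
        bits.append("0" if x < 0.5 else "1")
        x = lam * x * (1 - x)
    return "".join(bits)

def word_entropy(words):
    # Shannon entropy (bits) of the empirical distribution over observed words.
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

def survey(lams, L=16, runs=2000):
    # For each lambda: build an ensemble of length-L words and measure its entropy H.
    return {lam: word_entropy([logistic_bits(lam, L, 0.1 + 0.8 * i / runs) for i in range(runs)])
            for lam in lams}

print(survey([3.5, 3.7, 4.0]))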

Slide 21 (HIDDEN): logistic map (4)
- example: a 47-state machine constructed from an L = 16 ensemble with λ = … (the first period-doubling onset of chaos)
[Crutchfield 1994, fig 7a]

Slide 22 (HIDDEN): logistic map (5)
- results for L = 16, with 193 different values of λ
- C grows without bound at λc = …: need to move to a higher-level computational machine (a stack machine)
[figure: C and H for periodic and chaotic values of λ; Crutchfield 1994, fig 6]

Slide 23: Analysis results for the logistic map
- periodic behaviour: small H, small C
  - automaton size = the period
- chaotic behaviour: large H, small C
  - a small automaton captures the random behaviour ("coin toss")
- complex behaviour: mid H, large C
  - near the transition from periodic to chaotic behaviour ("edge of chaos") there is structure "on all scales"
[figure: complexity C plotted against randomness H]
J. P. Crutchfield. The calculi of emergence. Physica D, 1994.

HIDDEN: multi-information: hierarchical complexity

Slide 25: Another complexity measure: multi-information
- recall the mutual information between two systems:
  I(X;Y) = H(X) + H(Y) - H(X,Y)
  - where H(X) is the entropy of system X
  - H(X,Y) is the joint entropy of the systems X and Y
  - I = 0 if X and Y are independent
- for subsystems X_1, X_2 and the overall system X_{1,2}, this gives:
  I = H(X_1) + H(X_2) - H(X_{1,2})
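A Python sketch of this formula, computing I(X;Y) from a joint distribution given as a dictionary; the representation and the example distributions are illustrative.

from math import log2

def entropy(p):
    # Shannon entropy (bits) of a probability distribution given as a dict.
    return -sum(q * log2(q) for q in p.values() if q > 0)

def mutual_information(joint):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), with joint[(x, y)] = P(x, y).
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0) + q
        py[y] = py.get(y, 0) + q
    return entropy(px) + entropy(py) - entropy(joint)

# Two perfectly correlated bits give I = 1 bit; two independent bits give I = 0.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))                       # 1.0
print(mutual_information({(a, b): 0.25 for a in (0, 1) for b in (0, 1)}))   # 0.0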

Slide 26: multi-information (1)
- multi-information generalises this to n subsystems of an overall system
  - system X_{1,2,…,n}, with subsystems X_1, X_2, …, X_n:
  MI = H(X_1) + H(X_2) + … + H(X_n) - H(X_{1,2,…,n})
  - where MI = 0 if all the subsystems are independent
M. Studeny, J. Vejnarova. The multiinformation function as a tool for measuring stochastic dependence. In Learning in Graphical Models. Kluwer, 1998.
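A Python sketch of the multi-information of n subsystems whose joint distribution is given over n-tuples of states; the representation and the example are illustrative.

from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def multi_information(joint, n):
    # MI = sum_i H(X_i) - H(X_1,...,X_n); joint maps an n-tuple of states to its probability.
    marginals = [dict() for _ in range(n)]
    for state, q in joint.items():
        for i in range(n):
            marginals[i][state[i]] = marginals[i].get(state[i], 0) + q
    return sum(entropy(m) for m in marginals) - entropy(joint)

# Three perfectly correlated bits: each H(X_i) = 1 and H(X_1,X_2,X_3) = 1, so MI = 2 bits.
print(multi_information({(0, 0, 0): 0.5, (1, 1, 1): 0.5}, 3))   # 2.0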

Slide 27: multi-information (2)
- now consider partitioning the top-level system into two subcomponents: X_a, comprising subsystems X_1, …, X_k, and X_b, comprising subsystems X_{k+1}, …, X_n
- the relationship between the multi-information of the whole system and that of its two big components is
  MI(X_{1,…,n}) = MI(X_a) + MI(X_b) + I(X_a;X_b)
- rearranging and substituting, and since I(X_a;X_b) ≥ 0, the MI of the whole is bigger than the sum of the MIs of the parts (unless the subsystems X_a and X_b are independent)
[figure: subsystems X_1 … X_k grouped into X_a, and X_{k+1} … X_n grouped into X_b, inside the whole system]

Slide 28: multi-information (3)
- instead of considering one big subcomponent comprising k subsystems, now consider all possible such big subcomponents of k subsystems, each comprising subsystems X_{i_1}, …, X_{i_k}
- consider the average multi-information ⟨MI_k⟩ of these
- given that the MI of the whole is bigger than the sum of the MIs of the parts, this average MI increases with the size k of the subcomponents considered
[figure: a three-subsystem example, showing the whole system and all of its size-2 subcomponents]
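A Python sketch of the average multi-information over all size-k subcomponents; the entropy helper is repeated so the block runs on its own, and the three-bit example is illustrative.

from itertools import combinations
from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(joint, indices):
    # Marginal distribution of the subsystems at the given indices.
    out = {}
    for state, q in joint.items():
        key = tuple(state[i] for i in indices)
        out[key] = out.get(key, 0) + q
    return out

def subset_mi(joint, indices):
    # Multi-information of the subcomponent made of the listed subsystems.
    return (sum(entropy(marginal(joint, (i,))) for i in indices)
            - entropy(marginal(joint, indices)))

def average_mi(joint, n, k):
    # Average multi-information over all size-k subcomponents.
    subsets = list(combinations(range(n), k))
    return sum(subset_mi(joint, s) for s in subsets) / len(subsets)

joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}             # three perfectly correlated bits
print([average_mi(joint, 3, k) for k in (1, 2, 3)])  # [0.0, 1.0, 2.0]: increases with k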

Slide 29: multi-information = complexity
- complexity is the difference between the actual increase of this average and a linear increase: C ≥ 0
- C is low if the system is random: all subsystems are independent, and so MI = 0
- C is low if the system is homogeneously structured: the average MI increases linearly
- C is high in the intermediate case of inhomogeneous groupings and clumpings: high, non-linearly increasing, average MIs
G. Tononi, et al. A measure for brain complexity: relating functional segregation and integration in the nervous system. PNAS 91, 1994.
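A Python sketch of this complexity measure. The slide does not spell out the formula, so this assumes the form used by Tononi et al., C = sum over k of [(k/n) * MI(whole) - ⟨MI_k⟩], which matches the verbal description above; the helpers are repeated from the previous sketch so the block runs on its own.

from itertools import combinations
from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(joint, indices):
    out = {}
    for state, q in joint.items():
        key = tuple(state[i] for i in indices)
        out[key] = out.get(key, 0) + q
    return out

def subset_mi(joint, indices):
    return (sum(entropy(marginal(joint, (i,))) for i in indices)
            - entropy(marginal(joint, indices)))

def average_mi(joint, n, k):
    subsets = list(combinations(range(n), k))
    return sum(subset_mi(joint, s) for s in subsets) / len(subsets)

def complexity(joint, n):
    # Deviation of the average MI from a linear increase with subset size k:
    # C = sum_k [ (k/n) * MI(whole) - <MI_k> ]
    total = subset_mi(joint, tuple(range(n)))
    return sum((k / n) * total - average_mi(joint, n, k) for k in range(1, n + 1))

independent = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(complexity(independent, 3))   # 0.0: independent subsystems give zero complexity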

HIDDEN: which complexity?

Slide 31: which complexity measure?
- unconditional entropy is probably not appropriate
  - counts randomness as maximally "complex"
  - entropy variance readily calculated, between different space/time parts of itself
- algorithmic complexity K
  - useful for theoretical analyses, but not for analysing practical results
- conditional entropy / mutual information / multi-information
  - between two systems, which can be different space/time parts of itself, or different hierarchical levels of a system
  - appears to be maximised around interesting transitions
- statistical complexity C
  - of a single system; appears to be maximised at the "edge of chaos"

Slide 32: Some general sources
R. Badii, A. Politi. Complexity. Cambridge University Press.
J. P. Sethna. Statistical mechanics. Oxford University Press, 2006.