Relative complexity measures
See also: R. Badii, A. Politi. Complexity. Cambridge University Press, 1997.

Slide 2: relative measures

- Information is rarely absolute; it is usually relative to some other information:
  - the state of another, coupled, system
    - e.g. the beacon signalling the fall of Troy and the return of Agamemnon (~1200 BC)
    - a one-bit signal carries a lot of "information" in the coupled system
    - that one bit in a different context would not carry the same message
  - the state of the same system at another place or time
    - the flow of information through space/time: computation
    - e.g. the earlier CA examples used entropy variance
- relative measures: joint entropy; conditional entropy; mutual information

Slide 3: Useful concepts: joint probability (1)

- p(x,y) = probability that a pair of elements drawn at random from X, Y will have the values x, y
  - finite sets X, Y of sizes N_X, N_Y, with elements x_i, y_j
- p(x,y) = p(x) p(y) ⇔ X and Y are independent
  - e.g. toss a coin and throw a die
  - p_cd(H,5) = probability of tossing a head and throwing a 5
  - p_cd(H,5) = p(H) p(5) = 1/2 × 1/6 = 1/12

Slide 4: Useful concepts: joint probability (2)

[figure: grid of outcomes for the first and second die]

- c.f. if "first die is 1" and "total of both is 6" were independent events, the joint probability would be p_1st(1) p_sum(6) = 1/6 × 5/36 = 5/216
- the actual probability of the first die being 1 and the total of both being 6 is p_12(1,6) = 1/6 × 1/6 = 1/36 (first die 1, second die 5)
- the two differ: these are dependent events
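A quick enumeration (my own sketch, not from the slides) that confirms these numbers:

```python
from itertools import product

# enumerate the 36 equally likely outcomes of throwing two dice
outcomes = list(product(range(1, 7), repeat=2))

p_first_1 = sum(1 for a, b in outcomes if a == 1) / 36               # 1/6
p_total_6 = sum(1 for a, b in outcomes if a + b == 6) / 36           # 5/36
p_both = sum(1 for a, b in outcomes if a == 1 and a + b == 6) / 36   # 1/36

# 1/36 != 5/216, so "first die is 1" and "total is 6" are dependent
print(p_both, p_first_1 * p_total_6)
```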

Slide 5: Useful concepts: joint probability (3)

- p(x,y) = p(y) ⇔ X is determined completely by Y
  - e.g. the probability of throwing an even number (E) and throwing a 6
  - 6 is even, so if you throw a 6 you throw an even number
  - p_d(E,6) = p(6) = 1/6

Slide 6: joint entropy: independent systems

- the joint entropy of systems X and Y uses the relevant joint probability: H(X,Y) = -Σ_{x,y} p(x,y) log₂ p(x,y)
- H(X,Y) = H(X) + H(Y) ⇔ X and Y are independent
  - H(coin, die) = H(coin) + H(die) = log₂ 2 + log₂ 6
- H(X,Y) = H(Y) ⇔ X is determined completely by Y
  - H(parity, die) = H(die) = log₂ 6
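A numeric check of both cases (a minimal sketch; the entropy helper and variable names are mine, not from the slides):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# coin: 2 equally likely faces; die: 6 equally likely faces
coin = {c: 1/2 for c in "HT"}
die = {n: 1/6 for n in range(1, 7)}

# joint distribution of the independent coin-and-die experiment
coin_die = {(c, n): pc * pn for c, pc in coin.items() for n, pn in die.items()}

# parity is determined completely by the die, so H(parity, die) = H(die)
parity_die = {("even" if n % 2 == 0 else "odd", n): p for n, p in die.items()}

print(entropy(coin.values()) + entropy(die.values()))  # 1 + log2(6) ~ 3.585
print(entropy(coin_die.values()))                      # ~3.585: additive
print(entropy(parity_die.values()), log2(6))           # both ~2.585
```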

Slide 7: Entropy of independent systems is additive

- H(X,Y) = H(X) + H(Y) ⇔ X and Y are independent
- consider a string S of N_S characters, each of N_C bits
- if the characters of the string are independent, the entropy of the string viewed as bits equals the entropy of the string viewed as characters:
  - string of characters of bits: 2^N_C possible different characters, so H_C = log₂ 2^N_C = N_C, and H = H_C,1 + … + H_C,N_S = N_S H_C = N_S N_C
  - string of bits: total number of bits = N_S N_C, so 2^(N_S N_C) possible strings, and H = log₂ 2^(N_S N_C) = N_S N_C
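The same counting argument written as one chain of equalities (the labels H_chars and H_bits are mine):

```latex
H_{\text{chars}} \;=\; \sum_{i=1}^{N_S} \log_2 2^{N_C}
             \;=\; N_S N_C
             \;=\; \log_2 2^{\,N_S N_C}
             \;=\; H_{\text{bits}}
```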

Slide 8: joint entropy: summary

[diagram: H(X) and H(Y) shown as regions within H(X,Y)]

- H(X,Y) = H(X) + H(Y): X and Y are independent; independent entropies are additive
- H(X,Y) = H(Y): X is determined by Y

Slide 9: example: spatial CA states (1)

- random: each site randomly on or off
- system X: 4 possible states, equal probabilities; system Y: 4 possible states, equal probabilities
  - p(x_i) = ¼, p(y_j) = ¼
  - H(X) = 2, H(Y) = 2: 2 bits of information in system X, 2 in Y
- all 16 possible states for (X, Y), all equally probable: p(x_i, y_j) = 1/16
  - H(X,Y) = 4: 4 bits of information in the joint system (X, Y)
- H(X,Y) = H(X) + H(Y): independent systems

Slide 10: example: spatial CA states (2)

- semi-random: upper sites oscillate, lower sites random
- system X: 2 possible states, equal probabilities; system Y: 4 possible states, equal probabilities
  - p(x_i) = ½, p(y_j) = ¼
  - H(X) = 1, H(Y) = 2: 1 bit of information in system X, 2 in Y
- all 8 possible states for (X, Y), all equally probable: p(x_i, y_j) = 1/8
  - H(X,Y) = 3: 3 bits of information in the joint system (X, Y)
- H(X,Y) = H(X) + H(Y): independent systems

Slide 11: example: spatial CA states (3)

- semi-random: upper sites oscillate, lower random, but select different X and Y:
- system X: 4 possible states, equal probabilities; system Y: 4 possible states, equal probabilities
  - H(X) = 2, H(Y) = 2: 2 bits of information in system X, 2 in Y
- only 8 possible states for (X, Y), all equally probable: p(x_i, y_j) = 1/8
  - H(X,Y) = 3: 3 bits of information in the joint system (X, Y)
- H(X,Y) ≠ H(X) + H(Y): not independent systems

Slide 12: example: spatial CA states (4)

- oscillating sites
- system X: 2 possible states, equal probabilities; system Y: 2 possible states, equal probabilities
  - H(X) = 1, H(Y) = 1: 1 bit of information in system X, 1 in Y
- only 2 possible states for (X, Y), equally probable: p(x_i, y_j) = ½
  - H(X,Y) = 1: 1 bit of information in the joint system (X, Y)
- H(X,Y) ≠ H(X) + H(Y): not independent systems
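A small sketch that checks these numbers; the 8-pair joint distribution below is a hypothetical one chosen only to match the counts on slide 11 (4 equiprobable states each for X and Y, 8 equiprobable joint states), not the actual CA tiles:

```python
from collections import Counter
from math import log2

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# hypothetical joint distribution: 8 equally probable (x, y) pairs,
# with each of the 4 x-values and 4 y-values occurring in exactly 2 pairs
pairs = [(0, 0), (0, 1), (1, 2), (1, 3), (2, 0), (2, 1), (3, 2), (3, 3)]
p_xy = {pair: 1 / len(pairs) for pair in pairs}

p_x, p_y = Counter(), Counter()
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

H_X, H_Y, H_XY = entropy(p_x.values()), entropy(p_y.values()), entropy(p_xy.values())
print(H_X, H_Y, H_XY)   # 2, 2, 3: H(X,Y) < H(X) + H(Y), so not independent
```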

Slide 13: Another example: temporal CA states (1)

- random: each site randomly on or off
- system X = tile at time t: 16 possible states, equal probabilities; system Y = tile at time t + 1: 16 possible states, equal probabilities
  - p(x_i) = 1/16, p(y_j) = 1/16
  - H(X) = 4, H(Y) = 4: 4 bits of information in system X, 4 in Y
- all 16^2 = 256 possible states for (X, Y), all equally probable: p(x_i, y_j) = 1/256
  - H(X,Y) = 8: 8 bits of information in the joint system (X, Y)
- H(X,Y) = H(X) + H(Y), so independent systems

Slide 14: Another example: temporal CA states (2)

- semi-random: e.g. a rule that causes upper sites to oscillate, lower random
- system X = tile at time t: 8 possible states, equal probabilities; system Y = tile at time t + 1: 8 possible states, equal probabilities
  - p(x_i) = 1/8, p(y_j) = 1/8
  - H(X) = 3, H(Y) = 3: 3 bits of information in system X, 3 in Y
- 8 × 4 = 32 possible states for (X, Y), all equally probable: p(x_i, y_j) = 1/2^5
  - H(X,Y) = 5: 5 bits of information in the joint system (X, Y)
- H(X,Y) ≠ H(X) + H(Y): not independent systems

Slide 15: Another example: temporal CA states (3)

- oscillating (rule)
- system X = tile at time t: 2 possible states, equal probabilities; system Y = tile at time t + 1: 2 possible states, equal probabilities
  - H(X) = 1, H(Y) = 1: 1 bit of information in system X, 1 in Y
- only 2 possible states for (X, Y), equally probable: p(x_i, y_j) = ½
  - H(X,Y) = 1: 1 bit of information in the joint system (X, Y)
- H(X,Y) ≠ H(X) + H(Y): not independent systems

Slide 16: Useful concepts: conditional probability

- p(x|y) = probability that an element drawn from X has value x, given that y occurs
- p(x|y) = p(x) ⇔ X and Y are independent
  - p_cd(H|6) = probability of tossing a head given a 6 is thrown on the die
  - p_cd(H|6) = p(H) = 1/2
- useful identity: p(x|y) = p(x,y) / p(y)
  - p_12(1|6) = p_12(1,6) / p_sum(6) = (1/36) / (5/36) = 1/5
  - c.f. the unconditional probability of the first die being 1: p_1st(1) = 1/6
  - the probability of the first die being 1 given the total of both is 6 is p_12(1|6) = 1/5: not independent
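The same enumeration approach (a sketch of mine) gives the conditional probability directly, by restricting to the outcomes where the condition holds:

```python
from itertools import product

# the 36 equally likely outcomes of throwing two dice
outcomes = list(product(range(1, 7), repeat=2))

# condition on "total of both is 6" by restricting the sample space
total_is_6 = [(a, b) for a, b in outcomes if a + b == 6]
p_first1_given_total6 = sum(1 for a, b in total_is_6 if a == 1) / len(total_is_6)

print(p_first1_given_total6)                       # 1/5 = 0.2
print(sum(1 for a, b in outcomes if a == 1) / 36)  # unconditional: 1/6
```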

Slide 17: conditional entropy

- conditional entropy is the entropy due to X, given that we know Y
  - H(X|Y) = H(X) ⇔ X and Y are independent
  - H(X|Y) = 0 ⇔ X is determined completely by Y
- equivalently: the joint entropy of X and Y is the entropy of Y plus whatever entropy is left in X once we know Y: H(X,Y) = H(Y) + H(X|Y)
- substituting for H(X,Y) and H(Y), after a little algebra: H(X|Y) = -Σ_{x,y} p(x,y) log₂ p(x|y)
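The "little algebra" step, written out (a standard derivation, using p(y) = Σ_x p(x,y)):

```latex
\begin{aligned}
H(X \mid Y) &= H(X,Y) - H(Y) \\
            &= -\sum_{x,y} p(x,y)\,\log_2 p(x,y) \;+\; \sum_{x,y} p(x,y)\,\log_2 p(y) \\
            &= -\sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(y)}
             \;=\; -\sum_{x,y} p(x,y)\,\log_2 p(x \mid y)
\end{aligned}
```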

Slide 18: mutual information

- the mutual information in two systems is I(X;Y) = H(X) + H(Y) - H(X,Y)
  - I(X;Y) = 0 ⇔ X and Y are independent
- in terms of conditional entropy: I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
  - I(X;Y) = H(X) ⇔ X is determined completely by Y
- in terms of probabilities: I(X;Y) = Σ_{x,y} p(x,y) log₂ [ p(x,y) / (p(x) p(y)) ]
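A small helper (my own sketch) that computes I(X;Y) from a joint distribution, checked on the coin-and-die and parity-and-die examples used earlier:

```python
from collections import defaultdict
from math import log2

def mutual_information(p_xy):
    """I(X;Y) in bits, from a joint distribution {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_x[x] += p
        p_y[y] += p
    return sum(p * log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)

# independent coin and die: I = 0
coin_die = {(c, n): (1/2) * (1/6) for c in "HT" for n in range(1, 7)}
# parity is determined by the die: I = H(parity) = 1 bit
parity_die = {("even" if n % 2 == 0 else "odd", n): 1/6 for n in range(1, 7)}

print(mutual_information(coin_die))    # ~0
print(mutual_information(parity_die))  # ~1 bit
```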

Slide 19: conditional entropy, mutual information: summary

[diagram: H(X) and H(Y) as overlapping regions, partitioned into H(X|Y), I(X;Y) and H(Y|X)]

- X and Y are independent: H(X|Y) = H(X), I(X;Y) = 0
- X is determined by Y: H(X|Y) = 0, I(X;Y) = H(X)

Slide 20: example: spatial CA states (1)

- random: each site randomly on or off
  - H(X) = 2, H(Y) = 2, H(X,Y) = 4
  - H(X,Y) = H(X) + H(Y): independent systems
  - I(X;Y) = H(X) + H(Y) - H(X,Y) = 0: no mutual information

Slide 21: example: spatial CA states (2)

- semi-random: upper sites oscillating, lower random
  - H(X) = 1, H(Y) = 2, H(X,Y) = 3
  - H(X,Y) = H(X) + H(Y): independent systems
  - I(X;Y) = H(X) + H(Y) - H(X,Y) = 0: no mutual information

Slide 22: example: spatial CA states (3)

- semi-random: upper sites oscillating, lower random
  - H(X) = 2, H(Y) = 2, H(X,Y) = 3
  - H(X,Y) ≠ H(X) + H(Y): not independent systems
  - I(X;Y) = H(X) + H(Y) - H(X,Y) = 1: one bit of mutual information

Slide 23: example: spatial CA states (4)

- oscillating
  - H(X) = 1, H(Y) = 1, H(X,Y) = 1
  - H(X,Y) ≠ H(X) + H(Y): not independent systems
  - I(X;Y) = H(X) + H(Y) - H(X,Y) = 1: one bit of mutual information

Slide 24: Another example: temporal CA states (1)

- random: each site randomly on or off
  - H(X) = 4, H(Y) = 4, H(X,Y) = 8
  - I(X;Y) = H(X) + H(Y) - H(X,Y) = 0: no mutual information
- semi-random: upper sites oscillating, lower random
  - H(X) = 3, H(Y) = 3, H(X,Y) = 5
  - I(X;Y) = H(X) + H(Y) - H(X,Y) = 1: one bit of mutual information
- oscillating
  - H(X) = 1, H(Y) = 1, H(X,Y) = 1
  - I(X;Y) = H(X) + H(Y) - H(X,Y) = 1: one bit of mutual information

Slide 25: temporal CA states and Langton's λ

- randomly generated 8-state, 2D CAs
- I(A;B) is the mutual information between a cell, A, and itself at the next time step, B
- relationship between Langton's λ and mutual information, I:
  - I is low at extreme λs (corresponding to orderly class 1 and 2 CAs, and to chaotic class 3 CAs)
  - I is highest at intermediate λ (class 4 CAs)
- interesting behaviour depends on the transmission of information
- H. A. Gutowitz and C. G. Langton. Methods for Designing Cellular Automata with "Interesting" Behavior.

Slide 26: Mutual information in RBNs

- N = 50 (nodes); K = 3 (inputs per node)
- p = average proportion of 1s in the randomly generated Boolean functions at the nodes
- mutual information reveals a transition
- [plots: H(t+1), H(t+1|t) and I against p]
- B. Luque, A. Ferrera. Measuring Mutual Information in Random Boolean Networks. adap-org/
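The cited paper's exact estimator is not reproduced on the slide; the sketch below shows one plausible way to estimate the average per-node mutual information between successive RBN states, using the N, K and bias p named above. The run lengths, the single-network time average (the paper averages over many realisations) and all function names are my assumptions:

```python
import random
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy in bits from a dict of occurrence counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def rbn_mutual_information(N=50, K=3, p=0.5, steps=2000, transient=500, seed=0):
    """Average over nodes of I(node state at t; node state at t+1) for one
    random Boolean network: each node has K inputs chosen uniformly at random
    and a random truth table whose entries are 1 with probability p."""
    rng = random.Random(seed)
    inputs = [[rng.randrange(N) for _ in range(K)] for _ in range(N)]
    tables = [[1 if rng.random() < p else 0 for _ in range(2 ** K)] for _ in range(N)]
    state = [rng.randrange(2) for _ in range(N)]

    def step(s):
        return [tables[i][sum(s[j] << b for b, j in enumerate(inputs[i]))]
                for i in range(N)]

    pair_counts = [Counter() for _ in range(N)]
    for t in range(transient + steps):
        nxt = step(state)
        if t >= transient:
            for i in range(N):
                pair_counts[i][(state[i], nxt[i])] += 1
        state = nxt

    mi = 0.0
    for counts in pair_counts:
        h_now = entropy({v: sum(c for (a, _), c in counts.items() if a == v) for v in (0, 1)})
        h_next = entropy({v: sum(c for (_, b), c in counts.items() if b == v) for v in (0, 1)})
        mi += h_now + h_next - entropy(counts)
    return mi / N

# sweep the bias p to look for the transition the slide refers to
for p in (0.05, 0.15, 0.25, 0.5, 0.75):
    print(p, rbn_mutual_information(p=p))
```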

Slide 27: low mutual information

MI can be low because:
1. the correlation is low
2. or, the entropy is low

Slide 28: Mutual information in the Ising model

- e.g. the Ising model: mutual information (MI) between time steps
1. low temperature: low MI, because low entropy
2. mid temperature: high MI (around the phase transition)
3. high temperature: low MI, because low correlation
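As a hedged illustration only (the lattice size, temperatures, sweep counts and the pooled-over-sites estimator are my assumptions, not taken from the slide), one way to see this numerically is to estimate the MI between a site's spin before and after one Metropolis sweep:

```python
import random
from collections import Counter
from math import exp, log2

def entropy(counts):
    """Shannon entropy in bits from a dict of occurrence counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def ising_step_mi(T, L=16, sweeps=1000, transient=200, seed=0):
    """Estimate I(spin before; spin after) across one Metropolis sweep of a 2D
    Ising model (J = 1, no external field), pooling (before, after) pairs over
    all sites and all sampled sweeps."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]

    def sweep():
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
                  spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            dE = 2 * spins[i][j] * nb
            if dE <= 0 or rng.random() < exp(-dE / T):
                spins[i][j] *= -1

    pairs = Counter()
    for t in range(transient + sweeps):
        before = [row[:] for row in spins]
        sweep()
        if t >= transient:
            for i in range(L):
                for j in range(L):
                    pairs[(before[i][j], spins[i][j])] += 1

    h_before = entropy({s: sum(c for (a, _), c in pairs.items() if a == s) for s in (-1, 1)})
    h_after = entropy({s: sum(c for (_, b), c in pairs.items() if b == s) for s in (-1, 1)})
    return h_before + h_after - entropy(pairs)

# expect low MI at low and high T, and a peak near the critical temperature (~2.27)
for T in (1.0, 2.3, 4.0):
    print(T, ising_step_mi(T))
```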

Slide 29: Mutual information and evolution

- consider information in a genome (G) in the context of information in the environment (E)
  - the same genome would be less fit in a different environment
- evolution increases the mutual information between E and G
  - fitter organisms exploit the environment better, so must contain more information about their environment
  - the total information in a genome can change, as the genome changes size, etc.
- [diagram: genomes G1, G2, G3 overlapping Env E increasingly with increasing fitness; H(G1|E) and I(G3;E) marked]
- C. Adami. What is complexity? BioEssays, 24:1085–1094, 2002

Slide 30: example: emergence

- consider information in the high-level description (S) in the context of information in the low-level description (E)
  - the same high-level model in a different low-level environment wouldn't be as good
- [diagram: models S1, S2, S3 of increasing fit to Env E; H(S1|E) and I(S3;E) marked]
- modelling / engineering as increasing mutual information
  - small H(S|E) ⇒ good model
  - large H(E|S) ⇒ redundancy
  - could use MI as a fitness function to search for better models
- A. Weeks, S. Stepney, F. A. C. Polack. Neutral Emergence: a proposal. Symposium on Complex Systems Engineering, RAND Corporation, Santa Monica, CA, USA, January 2007