
Lecture 3, CS567

Slide 1: Information theory
Uncertainty
– Can we measure it?
– Can we work with it?
Information (Uncertainty == Information?)
Related concepts
Surprise, Surprise!

Slide 2: Uncertainty
Quantum mechanics
– Heisenberg uncertainty principle
– Is everything still at absolute zero?
– What is the temperature of a black hole?
Mathematical uncertainty (Gödel)
– Some propositions are not amenable to mathematical proof
Can you guarantee that a given computer program will ever terminate? (Turing)
– The halting problem
Intractable problems
– NP-complete, NP-hard
Chaos theory
– Weather forecasting ("guess casting")

Slide 3: Can we measure/work with uncertainty?
Quantum mechanics
– Planck's constant sets the lower bound on uncertainty in quantum mechanics
– Satisfactory explanation of numerous observations that defy classical physics
Undecidability
– Many problems worth deciding upon can still be decided upon
Computational intractability
– Important to correctly classify a problem (P, NP, NP-complete, NP-hard)
– Work with small n
– Find heuristic and locally optimal solutions
Chaos theory
– Still allows prediction over short time (or other parameter) ranges
– "Weather forecaster makes or breaks viewer ratings"

Slide 4: Information
Common interpretation
– Data
Information as capacity (see the sketch after this slide)
– 1 bit for Boolean data, 8 bits for a byte
– 2 bits per nucleic acid character / 6 bits per codon / 4.3 bits per amino acid character
– 8-bit-wide channel transmission capacity
– 8-bit ASCII
Information as information gained
– "Received" the sequence ATGC: got 8 bits
– "Received" the sequence A?GC: got 6 bits
– "Received" the sequence "NO CLASS TODAY": got 112 bits and a bonus surge of joy!
Information as additional information gained
– I know she'll be at the party, in a red or blue dress
– Seen at the party, but too far off to see the color of the dress => no information gained
– Seen in the red or blue dress => 1 bit of information gained
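
A minimal Python sketch of the capacity view above: the capacity of one symbol is log2 of the alphabet size, and the "NO CLASS TODAY" figure assumes the slide's 8-bit ASCII encoding (the helper name is illustrative, not from the lecture).

    import math

    def bits_per_symbol(alphabet_size):
        # Capacity of one symbol drawn from an alphabet of the given size
        return math.log2(alphabet_size)

    print(bits_per_symbol(2))     # Boolean: 1.0 bit
    print(bits_per_symbol(4))     # nucleic acid base: 2.0 bits
    print(bits_per_symbol(4**3))  # codon: 6.0 bits
    print(bits_per_symbol(20))    # amino acid: ~4.32 bits
    # 8-bit ASCII message: "NO CLASS TODAY" has 14 characters, 14 * 8 = 112 bits
    print(len("NO CLASS TODAY") * 8)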

Slide 5: Information == Uncertainty
Information as uncertainty
– The higher the uncertainty, the higher the potential information to be gained
– A sequence of alphabetic characters implies an uncertainty of 5.7 bits/character
– A sequence of amino acids implies an uncertainty of 4.3 bits/character
The higher the noise, the less the information (gained)
The higher the noise, the less the uncertainty!

Slide 6: Related concepts
Uncertainty
Information
Complexity
– The more bits needed to specify something, the higher the complexity
Probability
– If all messages are equally probable, the information (gained) is maximal => a uniform probability distribution has the highest information (is most uncertain)
– If a particular message is received most of the time, the information (gained) is low => a biased distribution has lower information
Entropy
– Degree of disorder / number of possible states
Surprise

Slide 7: Surprise, Surprise!
Response to information received
Degrees of surprise
– "The instructor is in FH302 right now"
  I already know that. Yawn... (Certainty, foregone conclusion)
– "The instructor is going to leave the room between 1:45 and 2:00 pm"
  That's about usual (Likely)
– "The instructor's research will change the world and he's getting the award for best teaching this semester"
  Wow! (Unlikely, but not impossible. Probability = 10^-1000000000000000000)
– "The instructor is actually a robotic machine that teaches machine learning"
  No way!! (Impossible, disbelief)

Slide 8: Measuring surprise
Measures: level of adrenaline, muscular activity, or volume of voice
The lower the P(x_i), the higher the surprise
Surprise = 1/P(x_i)?
– Magnitude OK
– Not defined for impossible events (by the conventional interpretation of surprise)
– But Surprise = 1 for certain events, 2 for half-likely things?
Surprise = log P(x_i)?
– 0 for certain events
– But negative for most events
Surprise = -log P(x_i)
– 0 for certain events
– Positive value
– Proportional to the degree of surprise
– If a base-2 logarithm is used, expressed in bits
(A small numeric comparison follows this slide.)
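
A minimal Python sketch comparing the three candidate measures above for a few probabilities (the function name is mine, not the lecture's):

    import math

    def surprisal(p):
        # Surprise in bits: log2(1/P(x_i)) = -log2 P(x_i); 0 for certain events, grows as p shrinks
        return math.log2(1 / p)

    for p in (1.0, 0.5, 0.25, 1e-6):
        print(p, 1 / p, math.log2(p), surprisal(p))
    # p = 1.0  -> 1/p = 1, log2 p =  0, -log2 p = 0 bits (certain event)
    # p = 0.5  -> 1/p = 2, log2 p = -1, -log2 p = 1 bit
    # p = 0.25 -> 1/p = 4, log2 p = -2, -log2 p = 2 bits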

Slide 9: Information theory
Surprise/surprisal
Entropy
Relative entropy
– Versus differences in information content
– Versus expectation
Mutual information
Conditional entropy

Slide 10: Surprise
Surprise = -log P(x_i)
"Average" surprise = Expectation(Surprise) = Σ_i P(x_i)(-log P(x_i)) = -Σ_i P(x_i) log P(x_i) = Uncertainty = Entropy = H(P)
Uncertainty of a coin toss = 1 bit
Uncertainty of a double-headed coin toss = 0 bits
For a uniform distribution, entropy is maximal
For a distribution where only one particular event ever occurs and the others never do, entropy is zero
All other distributions lie between these two extremes
(A short sketch follows this slide.)
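
A minimal Python sketch of the entropy defined above (the 0 log 0 = 0 convention is assumed; the helper name is mine):

    import math

    def entropy(probs):
        # H(P) = sum_i P(x_i) * log2(1/P(x_i)) = -sum_i P(x_i) * log2 P(x_i), i.e. the average surprisal in bits
        return sum(p * math.log2(1 / p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))    # fair coin toss: 1.0 bit
    print(entropy([1.0, 0.0]))    # double-headed coin: 0.0 bits
    print(entropy([0.75, 0.25]))  # biased coin: ~0.81 bits, between the two extremes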

Slide 11: Entropy
For uniform probability distributions, entropy increases monotonically with the number of possible outcomes (see the check after this slide)
– Entropy for a coin toss, a nucleic acid base and an amino acid is 1, 2 and 4.3 bits respectively
– Which is why we win small lucky draws but not the grand sweepstakes
Can entropy be negative? Zero?
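
A quick check of the monotonic growth: for a uniform distribution over n outcomes the entropy is simply log2(n).

    import math

    # Uniform distribution over n outcomes: H = log2(n), which grows with n
    for n in (2, 4, 20, 1_000_000):
        print(n, math.log2(n))
    # 2 -> 1.0 bit (coin), 4 -> 2.0 bits (DNA base),
    # 20 -> ~4.32 bits (amino acid), 1,000,000 -> ~19.9 bits (sweepstakes)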

Slide 12: Relative entropy
H(P,Q) = Σ_i P(x_i) log(P(x_i)/Q(x_i))
Kullback-Leibler (KL) divergence, often treated as a 'distance'
Difference in entropy between two distributions
– Early poll ratings of 2 candidates, P: (75%, 25%)
– Later poll ratings of 2 candidates, Q: (50%, 50%)
– H(P,Q) = 0.19 bit; H(Q,P) = 0.21 bit
"One-way" asymmetric distance along the "axis of uncertainty"
(The sketch after this slide reproduces these numbers.)
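
A minimal Python sketch (helper name mine) that reproduces the poll example and shows the asymmetry:

    import math

    def relative_entropy(p, q):
        # H(P,Q) = sum_i P(x_i) * log2(P(x_i)/Q(x_i)), in bits
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    P = [0.75, 0.25]  # early poll
    Q = [0.50, 0.50]  # later poll
    print(round(relative_entropy(P, Q), 2))  # 0.19 bit
    print(round(relative_entropy(Q, P), 2))  # 0.21 bit: asymmetric, a "one-way" distance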

Slide 13: Relative entropy == Difference in information content?
Information may be gained or lost between two uncertain states
H(Q) - H(P) = 1 - 0.81 ≈ 0.19 bit = H(P,Q)
Difference in information content equals H(P,Q) only if Q is uniform
If Q = (0.4, 0.6), then H(P,Q) ≈ 0.36 bit while H(Q) - H(P) = 0.97 - 0.81 ≈ 0.16 bit ≠ H(P,Q)
Is information gained always positive? Can it be zero or negative?
Is relative entropy always positive? Can it be zero or negative?
(A numeric check follows this slide.)
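
A numeric check of the point above: when Q is uniform over n outcomes, H(P,Q) = log2(n) - H(P) = H(Q) - H(P); for a non-uniform Q the two quantities differ. Helper names are mine.

    import math

    def entropy(p):
        return -sum(x * math.log2(x) for x in p if x > 0)

    def relative_entropy(p, q):
        return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

    P = [0.75, 0.25]
    for Q in ([0.5, 0.5], [0.4, 0.6]):
        # Compare the relative entropy H(P,Q) with the entropy difference H(Q) - H(P)
        print(Q, round(relative_entropy(P, Q), 2), round(entropy(Q) - entropy(P), 2))
    # Q uniform:      0.19 vs 0.19 -> equal
    # Q = (0.4, 0.6): 0.36 vs 0.16 -> not equal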

Slide 14: Relative entropy == Expectation?
For sequence alignment scores expressed as log-odds ratios
– Random variable: how unusual is this amino acid?
– Distribution P: domain-specific model (biased probabilities of amino acid occurrence)
– Distribution Q: null model (uniform probabilities of amino acid occurrence)
Expected score for the occurrence of an amino acid = Σ_a P(a) log(P(a)/Q(a))
Generally applicable to all log-odds representations
(A toy sketch follows this slide.)
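
A toy Python sketch of the expected log-odds score above; the four-symbol "domain" frequencies are invented for illustration and are not taken from any real substitution model.

    import math

    # Hypothetical domain-specific model P and uniform null model Q over 4 symbols
    P = {"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1}
    Q = {s: 0.25 for s in P}

    # Expected per-symbol log-odds score = sum_a P(a) * log2(P(a)/Q(a)),
    # which is exactly the relative entropy H(P,Q)
    expected_score = sum(P[a] * math.log2(P[a] / Q[a]) for a in P)
    print(round(expected_score, 3))  # ~0.154 bit per symbol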

Slide 15: Mutual information
Related to the independence of variables
– Does P(a_i, b_i) equal P(a_i)P(b_i)?
Given knowledge of variable b, does the uncertainty of variable a change?
– Do I have a better idea of what value a will take, or am I still in the dark?
Mutual information = relative entropy between P(a_i, b_i) and P(a_i)P(b_i)
M(a,b) = Σ_i P(a_i, b_i) log(P(a_i, b_i) / [P(a_i)P(b_i)])
(A sketch follows this slide.)
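
A minimal Python sketch of the mutual information formula above for a 2x2 joint distribution; the joint tables are invented for illustration.

    import math

    def mutual_information(joint):
        # M(a,b) = sum_{i,j} P(a_i,b_j) * log2( P(a_i,b_j) / (P(a_i) * P(b_j)) ), in bits
        pa = [sum(row) for row in joint]        # marginal P(a)
        pb = [sum(col) for col in zip(*joint)]  # marginal P(b)
        return sum(p * math.log2(p / (pa[i] * pb[j]))
                   for i, row in enumerate(joint)
                   for j, p in enumerate(row) if p > 0)

    independent = [[0.25, 0.25], [0.25, 0.25]]      # P(a,b) = P(a)P(b)
    dependent   = [[0.45, 0.05], [0.05, 0.45]]      # knowing b says a lot about a
    print(mutual_information(independent))          # 0.0 bits: still in the dark
    print(round(mutual_information(dependent), 2))  # ~0.53 bits of shared information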

Slide 16: Conditional entropy
Conditional entropy of successive positions in a random sequence = 2 bits for DNA, 4.3 bits for protein
Conditional entropy of DNA base pairs = 0 (the paired base is fully determined by its complement)
What is the probability that a student in the room is taking this course, given that the student is present? (Conditional probability in terms of information content: there is still some residual uncertainty)
(A sketch follows this slide.)
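
A minimal Python sketch of conditional entropy, H(A|B) = -Σ P(a,b) log2 P(a|b), contrasting independent successive DNA positions (2 bits) with Watson-Crick base pairing (0 bits); the helper name is mine.

    import math

    def conditional_entropy(joint):
        # H(A|B) = sum_{a,b} P(a,b) * log2( P(b) / P(a,b) ) = -sum_{a,b} P(a,b) * log2 P(a|b), in bits
        pb = [sum(col) for col in zip(*joint)]
        return sum(p * math.log2(pb[j] / p)
                   for row in joint
                   for j, p in enumerate(row) if p > 0)

    n = 4  # A, C, G, T
    # Successive positions in a random DNA sequence: independent and uniform
    independent = [[1 / (n * n)] * n for _ in range(n)]
    # Paired bases in double-stranded DNA: the partner is always the complement
    paired = [[0.25 if j == n - 1 - i else 0.0 for j in range(n)] for i in range(n)]
    print(conditional_entropy(independent))  # 2.0 bits
    print(conditional_entropy(paired))       # 0.0 bits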

