Lecture 3, CS567, Slide 1: Information theory
Uncertainty
– Can we measure it?
– Can we work with it?
Information (Uncertainty == Information)?
Related concepts
Surprise, Surprise!

Lecture 3, CS567, Slide 2: Uncertainty
Quantum mechanics
– Heisenberg uncertainty principle
– Is everything still at absolute zero?
– What is the temperature of a black hole?
Mathematical uncertainty (Gödel)
– Some propositions are not amenable to mathematical proof
Can you guarantee that a given computer program will ever terminate? (Turing)
– The Halting problem
Intractable problems
– NP-complete, NP-hard
Chaos theory
– Weather forecasting ("guess casting")

Lecture 3, CS567, Slide 3: Can we measure/work with uncertainty?
Quantum mechanics
– Planck's constant represents the lower bound on uncertainty in quantum mechanics
– Satisfactory explanation of numerous observations that defy classical physics
Undecidability
– Many problems worth deciding upon can still be decided upon
Computational intractability
– Important to correctly classify a problem (P, NP, NP-complete, NP-hard)
– Work with small n
– Find heuristic and locally optimal solutions
Chaos theory
– Still allows for prediction over short time (or other parameter) domains
– "Weather forecaster makes or breaks viewer ratings"

Lecture 3, CS567, Slide 4: Information
Common interpretation
– Data
Information as capacity
– 1 bit for Boolean data, 8 bits for a word
– 2 bits per nucleic acid character / 6 bits per codon / 4.3 bits per amino acid character
– 8-bit-wide channel transmission capacity
– 8-bit ASCII
Information as information gained
– "Received" the sequence ATGC: got 8 bits
– "Received" the sequence A?GC: got 6 bits
– "Received" the sequence "NO CLASS TODAY": got 112 bits and a bonus surge of joy!
Information as additional information gained
– I know she'll be at the party, in a red or blue dress
  Seen at the party, but too far off to see the color of the dress => no information gained
  Seen in a red or blue dress => 1 bit of information gained
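As an aside (not on the slide), these capacity figures can be reproduced with a few lines of Python, assuming equiprobable symbols:

import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Capacity in bits per symbol for an alphabet of equally likely symbols."""
    return math.log2(alphabet_size)

print(bits_per_symbol(2))    # Boolean / coin toss: 1.0 bit
print(bits_per_symbol(4))    # nucleic acid base: 2.0 bits
print(bits_per_symbol(64))   # codon: 6.0 bits
print(bits_per_symbol(20))   # amino acid: ~4.32 bits

# Information gained from fully received messages, under the same assumption:
print(4 * bits_per_symbol(4))     # "ATGC": 8.0 bits
print(3 * bits_per_symbol(4))     # "A?GC" (one base unreadable): 6.0 bits
print(14 * bits_per_symbol(256))  # "NO CLASS TODAY" in 8-bit ASCII: 112.0 bits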

Lecture 3, CS567, Slide 5: Information == Uncertainty
Information as uncertainty
– Higher the uncertainty, higher the potential information to be gained
– A sequence of alphabetical characters implies uncertainty of 5.7 bits/character
– A sequence of amino acids implies uncertainty of 4.3 bits/character
Higher the noise, lesser the information (gained)
Higher the noise, lesser the uncertainty!

Lecture 3, CS567, Slide 6: Related concepts
Uncertainty
Information
Complexity
– The more bits needed to specify something, the higher the complexity
Probability
– If all messages are equally probable, information (gained) is maximal => the uniform probability distribution has the highest information (is most uncertain)
– If a particular message is received most of the time, information (gained) is low => a biased distribution has lower information
Entropy
– Degree of disorder / number of possible states
Surprise

Lecture 3, CS567, Slide 7: Surprise, Surprise!
Response to information received
Degrees of surprise
– "The instructor is in FH302 right now"
  I already know that. Yawn... (Certainty, foregone conclusion)
– "The instructor is going to leave the room between 1:45 and 2:00 pm"
  That's about usual (Likely)
– "The instructor's research will change the world and he's getting the award for best teaching this semester"
  Wow! (Unlikely, but not impossible. Probability = )
– "The instructor is actually a robotic machine that teaches machine learning"
  No way!! (Impossible, disbelief)

Lecture 3, CS567, Slide 8: Measuring surprise
Measures: level of adrenaline, muscular activity, or volume of voice
The lower P(x_i), the higher the surprise
Surprise = 1/P(x_i)?
– Magnitude OK
– Not defined for impossible events (by the conventional interpretation of surprise)
– But Surprise = 1 for certain events, 2 for half-likely events?
Surprise = log P(x_i)
– 0 for certain events
– But negative for most events
Surprise = -log P(x_i)
– 0 for certain events
– Positive value
– Proportional to degree of surprise
– If the base-2 logarithm is used, expressed in bits
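A minimal Python sketch of the -log2 P(x) definition of surprisal (the probabilities below are chosen only for illustration):

import math

def surprisal(p: float) -> float:
    """Surprise of an event with probability p, in bits: -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

print(surprisal(1.0))   # certain event: 0.0 bits of surprise
print(surprisal(0.5))   # half-likely event: 1.0 bit
print(surprisal(0.25))  # 1-in-4 event: 2.0 bits
# surprisal(0.0) is rejected: an impossible event has no finite surprise, matching the slide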

Lecture 3, CS567, Slide 9: Information Theory
Surprise/Surprisal
Entropy
Relative entropy
– Versus differences in information content
– Versus expectation
Mutual information
Conditional entropy

Lecture 3, CS567, Slide 10: Surprise
Surprise = -log P(x_i)
"Average" surprise = Expectation(Surprise) = Σ_i P(x_i) (-log P(x_i)) = -Σ_i P(x_i) log P(x_i) = Uncertainty = Entropy = H(P)
Uncertainty of a fair coin toss = 1 bit
Uncertainty of a double-headed coin toss = 0 bits
For a uniform distribution, entropy is maximal
For a distribution where only one particular event ever occurs and the others never do, entropy is zero
All other distributions lie between these two extremes
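A short Python sketch of this entropy calculation; the biased four-outcome distribution at the end is made up for illustration:

import math

def entropy(probs) -> float:
    """Shannon entropy H(P) = -sum_i p_i log2 p_i, in bits. Terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))            # fair coin: 1.0 bit
print(entropy([1.0, 0.0]))            # double-headed coin: 0.0 bits
print(entropy([0.25] * 4))            # uniform over 4 outcomes (DNA base): 2.0 bits
print(entropy([0.7, 0.1, 0.1, 0.1]))  # biased 4-outcome distribution: ~1.36 bits (< 2)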

Lecture 3, CS567, Slide 11: Entropy
For uniform probability distributions, entropy increases monotonically with the number of possible outcomes
– Entropy for a coin toss, a nucleic acid base, and an amino acid is 1, 2 and 4.3 bits respectively
– Which is why we win small lucky draws but not the grand sweepstakes
Can entropy be negative? Zero?
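For a uniform distribution over n outcomes the entropy is simply log2(n) bits; a quick illustration (the draw sizes below are hypothetical):

import math

# Entropy of a uniform distribution over n outcomes is log2(n) bits,
# growing slowly but monotonically with n.
for n in (2, 4, 20, 1_000, 100_000_000):
    print(n, round(math.log2(n), 2))
# 2 -> 1.0, 4 -> 2.0, 20 -> 4.32, 1000 -> 9.97, 100000000 -> 26.58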

Lecture 3, CS567, Slide 12: Relative Entropy
H(P,Q) = Σ_i P(x_i) log (P(x_i)/Q(x_i))
Cross-entropy/Kullback-Leibler 'distance'
Difference in entropy between two distributions
– Early poll ratings of 2 candidates, P: (75%, 25%)
– Later poll ratings of 2 candidates, Q: (50%, 50%)
– H(P,Q) = 0.19 bit; H(Q,P) = 0.21 bit
A "one-way" asymmetric distance along the "axis of uncertainty"
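The poll example can be checked with a small Python sketch of the relative entropy (Q must be non-zero wherever P is non-zero):

import math

def kl_divergence(p, q) -> float:
    """Relative entropy D(P||Q) = sum_i p_i log2(p_i / q_i), in bits.
    Assumes q_i > 0 wherever p_i > 0; zero-probability terms of P contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.75, 0.25]  # early poll
Q = [0.50, 0.50]  # later poll
print(round(kl_divergence(P, Q), 2))  # 0.19 bit
print(round(kl_divergence(Q, P), 2))  # 0.21 bit -- note the asymmetry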

Lecture 3, CS567, Slide 13: Relative Entropy == Difference in information content?
Information may be gained or lost between two uncertain states
H(Q) - H(P) ≈ 1 - 0.8 = 0.2 bit ≈ H(P,Q)
Difference in information content equals H(P,Q) only if Q is uniform
If Q = (0.4, 0.6) then H(P,Q) = 0.35 bit and H(Q) - H(P) = 0.97 - 0.8 ≈ 0.2 bit ≠ H(P,Q)
Is information gained always positive? Can it be zero or negative?
Is relative entropy always positive? Can it be zero or negative?
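A quick numerical check of this claim, using the same P and the two choices of Q (the entropy and relative entropy helpers are redefined so the snippet stands alone):

import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl(p, q):
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

P = [0.75, 0.25]
Q_uniform = [0.5, 0.5]
Q_biased = [0.4, 0.6]

# When Q is uniform, D(P||Q) equals H(Q) - H(P) exactly
print(round(kl(P, Q_uniform), 3), round(entropy(Q_uniform) - entropy(P), 3))  # 0.189 0.189
# When Q is biased, the two quantities differ
print(round(kl(P, Q_biased), 3), round(entropy(Q_biased) - entropy(P), 3))    # 0.364 0.16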

Lecture 3, CS567, Slide 14: Relative Entropy == Expectation?
For sequence alignment scores expressed as log-odds ratios
– Random variable: How unusual is this amino acid?
– Distribution P: domain-specific distribution model
  Biased probability of amino acid occurrences
– Distribution Q: null model
  Uniform probability of amino acid occurrences
Expected score for the occurrence of an amino acid = Σ_a P(a) log (P(a)/Q(a))
Generally applicable to all log-odds representations
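A Python sketch of this expected log-odds score; the four-letter alphabet and the biased frequencies are invented purely for illustration, not taken from any real substitution model:

import math

# Expected log-odds score sum_a P(a) * log2(P(a)/Q(a)), i.e. the relative entropy
# of a biased model P against a uniform null model Q.
domain_model = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}          # hypothetical biased model P
null_model = {a: 1 / len(domain_model) for a in domain_model}    # uniform null model Q

expected_score = sum(
    p * math.log2(p / null_model[a]) for a, p in domain_model.items()
)
print(round(expected_score, 3))  # ~0.278 bits per position under these made-up frequencies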

Lecture 3, CS567, Slide 15: Mutual Information
Related to the independence of variables
– Does P(a_i, b_i) equal P(a_i)P(b_i)?
Given knowledge of variable b, does it change the uncertainty of variable a?
– Do I have a better idea of what value a will take, or am I still in the dark?
Mutual information = relative entropy between P(a_i, b_i) and P(a_i)P(b_i)
M(a,b) = Σ_i P(a_i, b_i) log (P(a_i, b_i) / [P(a_i)P(b_i)])
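A small Python sketch of mutual information computed from a joint distribution; the two 2x2 joint tables below are hypothetical:

import math

def mutual_information(joint) -> float:
    """M(a,b) = sum_{i,j} P(a_i,b_j) log2( P(a_i,b_j) / (P(a_i)P(b_j)) ), in bits.
    `joint` is a 2-D list of joint probabilities; the marginals are derived from it."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        joint[i][j] * math.log2(joint[i][j] / (p_a[i] * p_b[j]))
        for i in range(len(joint))
        for j in range(len(joint[i]))
        if joint[i][j] > 0
    )

independent = [[0.25, 0.25], [0.25, 0.25]]  # P(a,b) = P(a)P(b) everywhere
dependent = [[0.45, 0.05], [0.05, 0.45]]    # a and b usually agree

print(mutual_information(independent))          # 0.0 bits: knowing b tells nothing about a
print(round(mutual_information(dependent), 2))  # ~0.53 bits: b reduces the uncertainty of a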

Lecture 3, CS567, Slide 16: Conditional Entropy
Conditional entropy of successive positions in a random sequence = 2 bits for DNA, 4.3 bits for protein
Conditional entropy of DNA base pairs (one base given its Watson-Crick partner) = 0
Probability that a student is present in the room and is taking this course, given that the student is present? (Conditional probability in terms of information content: still some residual uncertainty)
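A short Python sketch of conditional entropy via H(A|B) = H(A,B) - H(B), contrasting Watson-Crick base pairing with independent successive positions in a random DNA sequence:

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(A|B) = H(A,B) - H(B), in bits, from a 2-D joint distribution P(a,b)."""
    h_joint = entropy([p for row in joint for p in row])
    h_b = entropy([sum(col) for col in zip(*joint)])
    return h_joint - h_b

# Watson-Crick pairing: the paired base is fully determined, so H(A|B) = 0
paired = [[0.00, 0.00, 0.00, 0.25],   # A pairs with T
          [0.00, 0.00, 0.25, 0.00],   # C pairs with G
          [0.00, 0.25, 0.00, 0.00],   # G pairs with C
          [0.25, 0.00, 0.00, 0.00]]   # T pairs with A

# Successive positions in a random sequence: independent, so H(A|B) = H(A) = 2 bits
independent = [[1 / 16] * 4 for _ in range(4)]

print(conditional_entropy(paired))       # 0.0 bits
print(conditional_entropy(independent))  # 2.0 bits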