Basics of Entropy. CS 621 Artificial Intelligence, Lecture 13 (02/09/05). Prof. Pushpak Bhattacharyya, IIT Bombay.


Slide 1: Basics of Entropy. CS 621 Artificial Intelligence, Lecture 13 (02/09/05). Prof. Pushpak Bhattacharyya, IIT Bombay.

Slide 2: Entropy. Entropy is a measure of the uncertainty (or lack of structure) in data, and it is the quantity used for classification.

Slide 3: Tabular data.

A1    A2    A3    A4    ...    D
V11   V21   V31   ...          Y
V12   V22   V32   ...          N

Concentrate on one attribute at a time and partition the data on the values of that attribute.

Slide 4: Information theory, through entropy, paves the way for classification.

E(S) = - P+ log(P+) - P- log(P-)

where P+ = proportion of positive examples (D = Y) and P- = proportion of negative examples (D = N).
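As a quick illustration of this formula, here is a minimal Python sketch (the function name and example labels are my own, not from the lecture) that computes the two-class entropy E(S) from a list of Y/N decision labels:

import math

def two_class_entropy(labels):
    """E(S) = -P+ log2(P+) - P- log2(P-) for a list of 'Y'/'N' labels."""
    n = len(labels)
    p_pos = sum(1 for d in labels if d == "Y") / n
    p_neg = 1.0 - p_pos
    # Convention: a term with probability 0 contributes 0 (see the limit discussed later).
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(two_class_entropy(["Y", "N", "Y", "N"]))  # 1.0 -- a perfectly mixed sample
print(two_class_entropy(["Y", "Y", "Y", "Y"]))  # 0.0 -- a pure sample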

Slide 5: Focus on attribute A_i.

Gain(S, A_i) = E(S) - Σ_{v ∈ Values(A_i)} (|S_v| / |S|) E(S_v)

where S_v is the subset of S on which A_i takes value v. The decision tree is constructed from this gain by the ID3 algorithm: at each node, split on the attribute with the highest gain.
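The gain computation can be sketched as follows. This is my own illustrative Python (the toy records and column names are invented, not the lecture's data), assuming each example is a dict of attribute values plus a decision field D:

import math
from collections import Counter

def entropy(labels):
    """E(S) = sum over labels of -p_i log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute, decision="D"):
    """Gain(S, A) = E(S) - sum_v (|S_v|/|S|) * E(S_v)."""
    labels = [ex[decision] for ex in examples]
    total, n, remainder = entropy(labels), len(examples), 0.0
    for v in set(ex[attribute] for ex in examples):
        subset = [ex[decision] for ex in examples if ex[attribute] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# Hypothetical toy table: ID3 would split on the attribute with the largest gain.
data = [
    {"A1": "x", "A2": "p", "D": "Y"},
    {"A1": "x", "A2": "q", "D": "Y"},
    {"A1": "y", "A2": "p", "D": "N"},
    {"A1": "y", "A2": "q", "D": "N"},
]
print(gain(data, "A1"))  # 1.0 -- A1 separates Y from N perfectly
print(gain(data, "A2"))  # 0.0 -- A2 carries no information about D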

Slide 6: Entropy: terminology. S = {s_1, s_2, ..., s_q} is the set of symbols emitted by the source. P(s_i) = emission probability of s_i = P_i.

Slide 7: Notion of "information" from S. If P(s_i) = P_i = 1, no information is conveyed: there is no surprise. Convention: if P_i = 0, then I = ∞.

Slide 8: I is the "information" function. Desired properties: (1) I(s_i) = 0 if P(s_i) = P_i = 1; (2) I(s_i s_j) = I(s_i) + I(s_j), assuming the emissions of s_i and s_j are independent events, so that P(s_i s_j) = P(s_i) * P(s_j).

Slide 9: The form of I is I(s_i) = log(1/P_i), where P_i = P(s_i); I(s_i) = ∞ for P_i = 0. This form satisfies both requirements, since log(1/(P_i P_j)) = log(1/P_i) + log(1/P_j) under independence. Information can be read as the amount of surprise, or as the length of the code needed to send the message.

Slide 10: Average information = expected value of the information = E[I(s_i)] = Σ_i P_i log(1/P_i). This quantity is the entropy of S, written E(S).
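To make the "expected surprise" reading concrete, here is a small sketch (the source distribution is my own example, not from the slides) that computes each symbol's information I(s_i) = log2(1/P_i) and their probability-weighted average, the entropy:

import math

probs = {"s1": 0.5, "s2": 0.25, "s3": 0.125, "s4": 0.125}  # an assumed source

surprisal = {s: math.log2(1.0 / p) for s, p in probs.items()}  # I(s_i) in bits
entropy = sum(p * surprisal[s] for s, p in probs.items())      # E(S) = E[I]

print(surprisal)  # {'s1': 1.0, 's2': 2.0, 's3': 3.0, 's4': 3.0}
print(entropy)    # 1.75 bits: rarer symbols are more "surprising"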

Slide 11: Properties of entropy: minimum value. If any P_i = 1 (so P_j = 0 for all j ≠ i), then E(S) = 0, which is the minimum value.

Slide 12: Maximum value. Before deriving it, note the convention: a term with P_i = 0 contributes 0 to E(S), since lim_{P_i → 0} P_i log(1/P_i) = 0.

Slide 13: Example: tossing a coin. S = {s_1, s_2}, with P_1 + P_2 = 1. E(S) = -[P_1 log P_1 + P_2 log P_2]. With P_1 = P_2 = 0.5 (a fair coin), E(S) = 1.0 bit (using base-2 logarithms).
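As a worked check of this example (the biased-coin value is my own added illustration, not from the slides), with base-2 logs:

\begin{align*}
E(S)\big|_{P_1=P_2=0.5} &= -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = \tfrac{1}{2} + \tfrac{1}{2} = 1 \text{ bit},\\
E(S)\big|_{P_1=0.9,\,P_2=0.1} &= -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.137 + 0.332 \approx 0.47 \text{ bits}.
\end{align*}

A biased coin is less uncertain, consistent with the next slide's claim that the uniform case maximizes entropy.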

Slide 14: If all events are equally likely, entropy should be maximal: for S = {s_1, s_2, ..., s_q}, we expect E(S) to be maximum when each P_i = 1/q.

Slide 15: Theorem. E(S) is maximum when each P_i = 1/q, for S = {s_1, s_2, ..., s_q}. Lemma: ln(x) = log_e(x) <= x - 1. Consider f(x) = x - 1 - ln(x); note that f(1) = 0.

Slide 16: df(x)/dx = 1 - 1/x; equating to 0 gives x = 1, so f(x) has an extremum at x = 1. Since d^2 f(x)/dx^2 = 1/x^2 > 0 for x > 0, x = 1 is a minimum.

Slide 17: Since f(x) = x - 1 - ln(x) has its minimum at x = 1 and f(1) = 0, we have f(x) >= 0 for all x > 0, i.e. ln(x) <= x - 1.

Slide 18: Corollary. Let x_i >= 0, y_i > 0 with Σ_{i=1}^{m} x_i = 1 and Σ_{i=1}^{m} y_i = 1, i.e. the x_i and y_i are probability distributions. Then Σ_{i=1}^{m} x_i ln(1/x_i) <= Σ_{i=1}^{m} x_i ln(1/y_i).

Slide 19: Proof. Σ_{i=1}^{m} x_i ln(1/x_i) - Σ_{i=1}^{m} x_i ln(1/y_i) = Σ_{i=1}^{m} x_i ln(y_i/x_i) <= Σ_{i=1}^{m} x_i (y_i/x_i - 1) [by the lemma, with x = y_i/x_i] = Σ_{i=1}^{m} y_i - Σ_{i=1}^{m} x_i = 0.

Slide 20: This proves that Σ_{i=1}^{m} x_i ln(1/x_i) <= Σ_{i=1}^{m} x_i ln(1/y_i). The proof of the theorem follows from this corollary.

Slide 21: For S = {s_1, s_2, ..., s_q} with P(s_i) = P_i, i = 1, ..., q, choose x_i = P_i and y_i = 1/q. Then Σ_{i=1}^{q} x_i ln(1/x_i) <= Σ_{i=1}^{q} x_i ln(1/y_i), i.e. Σ_{i=1}^{q} P_i ln(1/P_i) <= Σ_{i=1}^{q} P_i ln(q).

Slide 22: So E(S) <= Σ_{i=1}^{q} P_i ln(q) = ln(q) Σ_{i=1}^{q} P_i = ln(q). Thus E(S) is upper bounded by ln(q), and this value is reached when each P_i = 1/q. This establishes the maximum value of the entropy.
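As an informal numeric check of the theorem (my own sketch, with arbitrary example distributions), the uniform distribution over q symbols does attain the bound ln(q):

import math

def entropy_nats(probs):
    """E(S) = sum_i p_i ln(1/p_i), with a zero-probability term taken as 0."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

q = 4
print(entropy_nats([1.0 / q] * q), math.log(q))  # both ~1.386: the bound ln q is attained
print(entropy_nats([0.7, 0.1, 0.1, 0.1]))        # ~0.940, strictly below ln q
print(entropy_nats([1.0, 0.0, 0.0, 0.0]))        # 0.0, the minimum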

Slide 23: E(S) is defined as Σ_{i=1}^{q} P_i log_r(1/P_i). Changing from log_r to ln is only a constant multiplying factor: log_r(x) = log_r(e) * log_e(x). For example, log_2(x) = log_2(e) * ln(x) ≈ 1.4427 ln(x). Hence the maximum entropy in base r is log_r(q).

Slide 24: Summary so far.
- Established the intuition for the information function I(s_i); it is related to "surprise".
- Average information is called entropy.
- Minimum value of E is 0.
- Maximum of E: via the lemma ln(x) <= x - 1 and the corollary Σ x_i ln(1/x_i) <= Σ x_i ln(1/y_i) for Σ x_i = Σ y_i = 1.
- Max E is ln(q) times the base-conversion constant, i.e. log_r(q), and is reached when P_i = 1/q for each i.

Slide 25: Shannon asked: what is the "entropy of the English language"? Take S = {a, b, c, ..., ',', ':', ...}, with P(a) = relative frequency of 'a' in a large corpus, P(b) = ..., and so on; this gives the P_i's. Then E(English) = Σ_i P_i log(1/P_i) ≈ 4.08.
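This kind of estimate can be reproduced in outline as follows. A minimal sketch, assuming a plain-text corpus file (the filename and the choice of symbol set are my own assumptions, so the exact value depends on the corpus and on which symbols are counted):

import math
from collections import Counter

ALPHABET = set("abcdefghijklmnopqrstuvwxyz,:")   # assumed symbol set

with open("corpus.txt", encoding="utf-8") as f:  # hypothetical corpus file
    text = [c for c in f.read().lower() if c in ALPHABET]

counts = Counter(text)
n = len(text)
# Unigram (single-symbol) entropy: E = sum_i P_i log2(1/P_i)
entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
print(f"Estimated unigram entropy: {entropy:.2f} bits per symbol")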

Slide 26: The maximum entropy for the toss of a coin is 1.0 bit, attained when the coin is unbiased. Note that "interest", unlike entropy, depends on the reader/listener: a novel is more interesting than a scientific paper for some people.

Slide 27: Summary. Application of the noisy channel model to ASR, formulated as Bayesian decision making. Studied phonetic problems. Why is a probability model needed?