Bioinformatics lectures at Rice University Lecture 4: Shannon entropy and mutual information

(Figure from Science, 16 December 2011.)

The definition of Shannon entropy In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits. Here a 'message' means a specific realization of the random variable. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable. The concept was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication".
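
For reference, the standard discrete form of this definition (following Shannon's 1948 paper; choosing log base 2 gives units of bits) can be written as:

```latex
% Shannon entropy of a discrete random variable X with outcomes x_i
% occurring with probabilities p_i = P(X = x_i); log base 2 gives bits.
H(X) = -\sum_{i} p_i \log_2 p_i
```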

How do we measure information in a message? Definition: a message is a string of symbols. The following is Shannon's argument, from 'A mathematical theory of communication'.

Shannon entropy was established in the context of telegraph communication

Shannon’s argument to name H as the entropy:
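
The argument itself is an image in the original slides and is not reproduced in this transcript. As a hedged paraphrase, assuming the slide follows the reasoning in Shannon's 1948 paper, the requirements and their consequence are roughly:

```latex
% Shannon's requirements for an uncertainty measure H(p_1, ..., p_n):
%  1. H is continuous in the p_i.
%  2. For equally likely outcomes (p_i = 1/n), H increases with n.
%  3. If a choice is broken into successive choices, H is the weighted
%     sum of the individual H values (the grouping / composition rule).
% The only functions satisfying these requirements have the form
H = -K \sum_{i=1}^{n} p_i \log p_i, \qquad K > 0 .
```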

Some Properties of H: The amount of entropy is not always an integer number of bits. Many data bits may not convey information; for example, data structures often store information redundantly, or have identical sections regardless of the information in the data structure. For a message of n characters, H is larger when a larger character set is used; thus a 12-character message drawn from only a few distinct characters carries less information than 'qwertasdfg12'. However, if a character is rarely used, its contribution to H is small, since p·log(p) → 0 as p → 0. Likewise, if one character constitutes the vast majority of the message (e.g., a binary string that is almost all 1s), the contribution of the 1s to H is small, since p·log(p) → 0 as p → 1. For a random variable with n outcomes, H reaches its maximum when the probabilities of the outcomes are all equal, i.e., 1/n, and then H = log(n). When x is a continuous variable with a fixed variance, Shannon proved that H reaches its maximum when x follows a Gaussian distribution.
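
A minimal Python sketch (not from the lecture; the example strings are illustrative choices) that computes H for a few messages and reproduces several of the properties above:

```python
# Empirical Shannon entropy of the character distribution of a message, in bits.
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Entropy of the empirical character distribution of `message`, in bits."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("aaaaaaaaaaaa"))  # single repeated character: 0.0 bits
print(shannon_entropy("qwertasdfg12"))  # 12 distinct characters: log2(12) ~ 3.58 bits
print(shannon_entropy("aaaaaaaaaaab"))  # one rare character adds little: ~0.41 bits
```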

Shannon’s proof:

Summary Or simply: H = -Σ_i p_i log2(p_i). Note that: H ≥ 0; Σ_i p_i = 1; H is larger when there are more (equally probable) possible states; H can be computed for any probability distribution p.

Mutual information Mutual information is a measure of dependence between two random variables. The concept was introduced by Shannon in 1948 and has become widely used in many different fields.

Formulation of mutual information
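
The formula on this slide is an image and is not reproduced in the transcript; the standard discrete formulation, written here for reference, is:

```latex
% Mutual information between discrete random variables X and Y with
% joint distribution p(x,y) and marginal distributions p(x), p(y);
% log base 2 gives the result in bits.
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
```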

Meaning of MI

MI and H
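
The relations on this slide are likewise shown as an image; the standard identities connecting MI to entropy are:

```latex
% Mutual information expressed in terms of entropies:
I(X;Y) = H(X) - H(X \mid Y)
       = H(Y) - H(Y \mid X)
       = H(X) + H(Y) - H(X,Y)
```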

Properties of MI Connection between correlation and MI.
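
One concrete connection between correlation and MI: for a bivariate Gaussian with correlation coefficient rho, I(X;Y) = -0.5·ln(1 - rho^2) (in nats, using the natural log). The sketch below (not from the lecture; rho, the sample size, and the bin count are arbitrary choices) compares this closed form to a simple histogram-based MI estimate:

```python
# Compare a plug-in (histogram) MI estimate with the Gaussian closed form.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 100_000

# Draw correlated standard-normal pairs.
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Plug-in MI estimate from a 2-D histogram (in nats).
counts, _, _ = np.histogram2d(x, y, bins=30)
pxy = counts / counts.sum()
px = pxy.sum(axis=1, keepdims=True)   # marginal of X (column vector)
py = pxy.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
mask = pxy > 0
mi_est = np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))

mi_gauss = -0.5 * np.log(1.0 - rho**2)
print(f"histogram estimate: {mi_est:.3f} nats; Gaussian closed form: {mi_gauss:.3f} nats")
```

The plug-in estimate is biased upward for finite samples and depends on the binning, so the two numbers agree only approximately.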

Example of MI application Estimated MI between the number of goals a team scored in a game and whether the team was playing at home or away, i.e., a measure of home advantage. The heights of the grey bars give the approximate 95% points of the null distribution. The Canada value is drawn below the line because it would otherwise have been hidden by the grey shading above.

Reference reading A correlation for the 21st century. Terry Speed. Science, 16 December 2011 (commentary on MIC). Detecting novel associations in large data sets. Reshef et al. Science, 16 December 2011. Some data analyses using mutual information. David Brillinger.