Statistical methods in NLP Course 2 Diana Trandabăț

Quick Recap Out of three prisoners, one, randomly selected and without their knowledge, will be executed, and the other two will be released. One of the prisoners asks the guard to show him which of the other two will be released (at least one of them will be released anyway). If the guard answers, will the prisoner have more information than before?

Quick Recap No. The guard can always name one of the other two prisoners who will be released, so his answer does not change the asking prisoner's probability of being executed (it remains 1/3): the prisoner gains no information about his own fate.

Essential Information Theory Developed by Claude Shannon in the 1940s. It asks how to maximize the amount of information that can be transmitted over an imperfect communication channel, and covers data compression (entropy) and transmission rate (channel capacity).

Probability mass function The probability mass function gives the probability that a random variable X takes each of its numeric values: p(x) = P(X = x) = P(A_x), where A_x = {ω ∈ Ω : X(ω) = x}. Example: the probability of the number of heads when flipping 2 fair coins is p(nr_heads = 0) = ¼, p(nr_heads = 1) = ½, p(nr_heads = 2) = ¼.

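The two-coin p.m.f. above can be recovered by brute-force enumeration of the sample space; a minimal Python sketch (the variable names are illustrative, not from the slides):

from itertools import product
from collections import Counter
from fractions import Fraction

# Sample space of two fair coin flips; the random variable is the number of heads.
outcomes = list(product("HT", repeat=2))          # ('H','H'), ('H','T'), ('T','H'), ('T','T')
counts = Counter(o.count("H") for o in outcomes)  # X=2 once, X=1 twice, X=0 once
pmf = {x: Fraction(n, len(outcomes)) for x, n in counts.items()}
print(pmf)  # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
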
Expectation The expectation is the mean or average of a random variable: E(X) = Σ_x x · p(x). Example: if Y is the value shown when rolling one die, E(Y) = (1+2+3+4+5+6)/6 = 3.5. Properties: E(X+Y) = E(X) + E(Y), and E(XY) = E(X)E(Y) if X and Y are independent.

Variance The variance of a random variable is a measure of whether the values of the variable tend to be consistent over trials or to vary a lot: Var(X) = E((X − E(X))²) = E(X²) − (E(X))². The commonly used standard deviation σ is the square root of the variance.

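Both formulas can be checked with exact arithmetic for the die example above; a minimal sketch using Python fractions:

from fractions import Fraction

# Fair six-sided die: p(y) = 1/6 for each face y = 1..6.
p = {y: Fraction(1, 6) for y in range(1, 7)}

mean = sum(y * p[y] for y in p)                 # E(Y) = 7/2 = 3.5
second_moment = sum(y * y * p[y] for y in p)    # E(Y^2) = 91/6
variance = second_moment - mean ** 2            # Var(Y) = E(Y^2) - E(Y)^2 = 35/12
print(mean, variance)                           # 7/2 35/12
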
Entropy X: discrete random variable; p(x): the probability mass function of X. The entropy (or self-information) of X is H(X) = −Σ_x p(x) log₂ p(x). Entropy measures the amount of information in a random variable: it is the average length of the message needed to transmit an outcome of that variable using the optimal code (in bits).

Entropy (cont) Entropy is non-negative, H(X) ≥ 0, and H(X) = 0 only when some value of X has probability 1, i.e. when the value of X is determinate, hence providing no new information.

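A small helper that computes H(X) directly from a probability mass function illustrates both points: the number-of-heads distribution from earlier carries 1.5 bits, while a determinate value carries none. A minimal sketch:

import math

def entropy(pmf):
    # Entropy in bits of a distribution given as {value: probability}.
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

print(entropy({0: 0.25, 1: 0.5, 2: 0.25}))  # 1.5 (number of heads in two coin flips)
print(entropy({"heads": 1.0}))              # -0.0 (determinate value, no information)
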
Exercise 1 Compute the entropy of tossing a coin.

Exercise 1 (solution) For a fair coin, H(X) = −½ log₂ ½ − ½ log₂ ½ = 1 bit. More generally, for a coin with P(heads) = p, H(X) = −p log₂ p − (1 − p) log₂ (1 − p), which reaches its maximum of 1 bit at p = ½ and is 0 when p = 0 or p = 1.

Exercise 2 Compute the entropy of rolling an 8-sided die.

Exercise 2 (solution) All 8 faces are equally likely, so H(X) = −Σ (1/8) log₂ (1/8) = log₂ 8 = 3 bits.

Exercise 3 Compute the entropy of a biased die with P(X=1)=1/2, P(X=2)=1/4, P(X=3)=0, P(X=4)=0, P(X=5)=1/8, P(X=6)=1/8.

Exercise 3 (solution) H(X) = (1/2)·log₂ 2 + (1/4)·log₂ 4 + (1/8)·log₂ 8 + (1/8)·log₂ 8 = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 bits (the zero-probability outcomes contribute nothing).

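The 1.75-bit result can be verified numerically with the same kind of entropy helper; a minimal sketch:

import math

def entropy(probs):
    # Entropy in bits; zero-probability outcomes contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

biased_die = [1/2, 1/4, 0, 0, 1/8, 1/8]  # P(X=1), ..., P(X=6) from the exercise
print(entropy(biased_die))               # 1.75
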
Simplified Polynesian – letter frequencies – per-letter entropy – coding. Simplified Polynesian has six letters with the following unigram probabilities: p: 1/8, t: 1/4, k: 1/8, a: 1/4, i: 1/8, u: 1/8.

Simplified Polynesian (cont) Per-letter entropy: H(P) = −Σ p(ℓ) log₂ p(ℓ) = 4 · (1/8) · 3 + 2 · (1/4) · 2 = 2.5 bits. An optimal code therefore uses 2-bit codewords for the frequent letters and 3-bit codewords for the rest, for example t = 00, a = 01, p = 100, k = 101, i = 110, u = 111, with an expected length of 2.5 bits per letter.

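Both the 2.5-bit entropy and the expected length of the example prefix code can be checked in a few lines; a minimal sketch:

import math

# Unigram letter probabilities of Simplified Polynesian.
P = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/4, "i": 1/8, "u": 1/8}

# The example prefix code: 2 bits for the frequent letters, 3 bits for the rest.
code = {"t": "00", "a": "01", "p": "100", "k": "101", "i": "110", "u": "111"}

H = -sum(p * math.log2(p) for p in P.values())
expected_length = sum(P[letter] * len(code[letter]) for letter in P)
print(H, expected_length)  # 2.5 2.5 -- this code meets the entropy bound exactly
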
Joint Entropy The joint entropy of 2 random variables X, Y is the amount of information needed on average to specify both their values: H(X,Y) = −Σ_x Σ_y p(x,y) log₂ p(x,y).

Conditional Entropy The conditional entropy of a random variable Y given another X expresses how much extra information one still needs to supply on average to communicate Y, given that the other party knows X: H(Y|X) = Σ_x p(x) H(Y|X=x) = −Σ_x Σ_y p(x,y) log₂ p(y|x).

Chain Rule H(X,Y) = H(X) + H(Y|X), and more generally H(X₁,…,Xₙ) = H(X₁) + H(X₂|X₁) + … + H(Xₙ|X₁,…,Xₙ₋₁).

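The chain rule is easy to verify numerically on a small joint distribution; a minimal sketch (the joint table below is made up purely for illustration):

import math

def H(probs):
    # Entropy in bits of an iterable of probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y); illustrative numbers only.
joint = {(0, "a"): 0.4, (0, "b"): 0.1, (1, "a"): 0.2, (1, "b"): 0.3}

# Marginal p(x) and conditional entropy H(Y|X) = sum_x p(x) H(Y|X=x).
px = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p

H_Y_given_X = 0.0
for x, p_x in px.items():
    conditional = [p / p_x for (xx, y), p in joint.items() if xx == x]  # p(y|x)
    H_Y_given_X += p_x * H(conditional)

print(H(joint.values()))             # H(X,Y)
print(H(px.values()) + H_Y_given_X)  # H(X) + H(Y|X): the same value, up to rounding
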
Simplified Polynesian Revisited – syllable structure: all words consist of sequences of CV syllables, where C is a consonant and V is a vowel.

More on entropy Entropy rate (per-word/per-letter entropy): H_rate = (1/n) H(X₁,…,Xₙ). Entropy of a language L: H(L) = lim_{n→∞} (1/n) H(X₁,…,Xₙ), the per-symbol entropy of arbitrarily long sequences.

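As a rough illustration, the per-letter entropy of a text can be estimated from unigram letter frequencies (a zeroth-order estimate that ignores all context); the sample string here is arbitrary:

import math
from collections import Counter

text = "pat papa taia pipi aku"  # arbitrary toy text
counts = Counter(text.replace(" ", ""))
total = sum(counts.values())

# Zeroth-order (unigram) estimate of the per-letter entropy.
H_per_letter = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(round(H_per_letter, 3))    # 2.288 bits per letter for this toy string
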
Mutual Information I(X;Y) is the mutual information between X and Y. It is the measure of dependence between two random variables, or the amount of information one random variable contains about the other: I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = Σ_x Σ_y p(x,y) log₂ [p(x,y) / (p(x) p(y))].

Mutual Information (cont) I(X;Y) is 0 exactly when X and Y are independent, since then H(X|Y) = H(X). Also H(X) = H(X) − H(X|X) = I(X;X), so entropy can be seen as self-information.

More on Mutual Information Conditional mutual information: I(X;Y|Z) = H(X|Z) − H(X|Y,Z). Chain rule: I(X₁,…,Xₙ;Y) = Σᵢ I(Xᵢ;Y|X₁,…,Xᵢ₋₁). Pointwise mutual information between two particular outcomes x and y: pmi(x,y) = log₂ [p(x,y) / (p(x) p(y))].

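In NLP, pointwise mutual information is typically computed for word pairs from corpus counts; a minimal sketch with hypothetical counts (the corpus size, the word pair and all numbers are made up):

import math

# Hypothetical corpus statistics for the word pair ("new", "york").
N = 1_000_000     # total number of tokens
count_x = 2_000   # occurrences of "new"
count_y = 1_500   # occurrences of "york"
count_xy = 1_200  # occurrences of the bigram "new york"

p_x, p_y, p_xy = count_x / N, count_y / N, count_xy / N
pmi = math.log2(p_xy / (p_x * p_y))
print(round(pmi, 2))  # 8.64 -- the pair co-occurs far more often than chance predicts
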
Exercise 4 Let p(x,y) be given by the table below. Find: (a) H(X), H(Y) (b) H(X|Y), H(Y|X) (c) H(X,Y) (d) I(X;Y)
X \ Y     0      1
  0      1/3    1/3
  1       0     1/3

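A sketch that computes all four quantities, assuming the joint table given above, i.e. p(0,0) = p(0,1) = p(1,1) = 1/3 and p(1,0) = 0:

import math

def H(probs):
    # Entropy in bits of an iterable of probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}

# Marginal distributions of X and Y.
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(px.values()), H(py.values()), H(joint.values())
H_X_given_Y = H_XY - H_Y   # chain rule
H_Y_given_X = H_XY - H_X
I_XY = H_X + H_Y - H_XY

print(H_X, H_Y)                  # (a) both about 0.918 bits
print(H_X_given_Y, H_Y_given_X)  # (b) both 2/3 bits
print(H_XY)                      # (c) log2(3), about 1.585 bits
print(I_XY)                      # (d) about 0.252 bits
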
Entropy and Linguistics Entropy is a measure of uncertainty: the more we know about something, the lower its entropy. If a language model captures more of the structure of the language, its entropy should be lower, so we can use entropy as a measure of the quality of our models.

Entropy and Linguistics: relative entropy (Kullback-Leibler divergence) D(p || q) = Σ_x p(x) log₂ [p(x)/q(x)] is a measure of how different two probability distributions are: the average number of bits that are wasted by encoding events from a distribution p with a code based on a not-quite-right distribution q. Noisy channel => next class!

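A minimal sketch of that quantity, using two made-up distributions over the same three outcomes:

import math

def kl_divergence(p, q):
    # D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0.
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # "true" distribution (made up)
q = {"a": 0.25, "b": 0.25, "c": 0.5}  # model distribution (made up)
print(kl_divergence(p, q))  # 0.25 extra bits per event when coding p with q's code
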
Great! See you next time!