Introduction to information theory


Introduction to information theory LING 572 Fei Xia, Dan Jinguji Week 1: 1/10/06

Today Information theory Hw #1 Exam #1

Information theory

Information theory Reading: M&S 2.2. Information theory is the use of probability theory to quantify and measure "information". Basic concepts: entropy; cross entropy and relative entropy; joint entropy and conditional entropy; entropy of a language and perplexity; mutual information.

Entropy Entropy is a measure of the uncertainty associated with a distribution. It gives the lower bound on the number of bits it takes to transmit messages drawn from that distribution. An example: displaying the results of horse races, where the goal is to minimize the number of bits needed to encode the results.

An example Uniform distribution: p_i = 1/8 for each of the 8 horses. Non-uniform distribution: (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64), with a matching variable-length code (0, 10, 110, 1110, 111100, 111101, 111110, 111111). The uniform distribution has higher entropy. MaxEnt: make the distribution as "uniform" as possible.
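For concreteness, the entropy of each distribution (log base 2) works out as follows; the non-uniform value equals the average length of the code shown above, while the uniform case needs a full 3 bits per result:
H_{uniform} = - \sum_{i=1}^{8} \frac{1}{8} \log_2 \frac{1}{8} = 3 \text{ bits}
H_{non-uniform} = \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{16}\cdot 4 + 4\cdot\frac{1}{64}\cdot 6 = 2 \text{ bits}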

Cross Entropy Cross entropy is a distance measure between p(x) and q(x), where p(x) is the true distribution and q(x) is our estimate of p(x).
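The standard definitions (M&S 2.2, log base 2) of entropy and cross entropy are:
H(X) = - \sum_x p(x) \log_2 p(x)
H(X, q) = - \sum_x p(x) \log_2 q(x)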

Relative Entropy Relative entropy, also called the Kullback-Leibler (KL) divergence, is another "distance" measure between probability distributions p and q. KL divergence is asymmetric, so it is not a true distance.
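The standard definition, and its relation to cross entropy:
D(p || q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)} = H(X, q) - H(X)
D(p || q) \ne D(q || p) \text{ in general}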

Reading assignment #1 Read M&S 2.2: Essential Information Theory. Question: for a random variable X, p(x) and q(x) are two distributions, with p the true distribution: p(X=a) = p(X=b) = 1/8, p(X=c) = 1/4, p(X=d) = 1/2; q(X=a) = q(X=b) = q(X=c) = q(X=d) = 1/4. (a) What is H(X)? What is H(X, q)? What is the KL divergence D(p||q)? What is D(q||p)?

H(X) and H(X, q)
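One way to work these out for the distributions in the reading assignment (log base 2):
H(X) = -\left( \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{2}\log_2\frac{1}{2} \right) = \frac{3}{8} + \frac{3}{8} + \frac{1}{2} + \frac{1}{2} = 1.75 \text{ bits}
H(X, q) = - \sum_x p(x) \log_2 q(x) = - \log_2 \frac{1}{4} = 2 \text{ bits, since q is uniform}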

D(p||q)
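Continuing the same example:
D(p || q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)} = \frac{1}{8}\log_2\frac{1/8}{1/4} + \frac{1}{8}\log_2\frac{1/8}{1/4} + \frac{1}{4}\log_2\frac{1/4}{1/4} + \frac{1}{2}\log_2\frac{1/2}{1/4} = -\frac{1}{8} - \frac{1}{8} + 0 + \frac{1}{2} = 0.25 \text{ bits}
Equivalently, D(p || q) = H(X, q) - H(X) = 2 - 1.75 = 0.25 bits.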

D(q||p)
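And in the other direction:
D(q || p) = \sum_x q(x) \log_2 \frac{q(x)}{p(x)} = \frac{1}{4}\log_2 2 + \frac{1}{4}\log_2 2 + \frac{1}{4}\log_2 1 + \frac{1}{4}\log_2 \frac{1}{2} = \frac{1}{4} + \frac{1}{4} + 0 - \frac{1}{4} = 0.25 \text{ bits}
The two values happen to coincide in this example; in general D(p||q) and D(q||p) differ.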

Joint and conditional entropy Joint entropy H(X, Y) and conditional entropy H(Y | X) are defined as shown below.
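The standard definitions (M&S 2.2), together with the chain rule that links them:
H(X, Y) = - \sum_x \sum_y p(x, y) \log_2 p(x, y)
H(Y | X) = - \sum_x \sum_y p(x, y) \log_2 p(y | x)
H(X, Y) = H(X) + H(Y | X)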

Entropy of a language (per-word entropy) The entropy of a language L is defined as a per-word limit over increasingly long sequences. If we assume the language is "nice" (stationary and ergodic), then the cross entropy can be calculated from a single long sample, as sketched below.
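A sketch of the formulas this slide refers to, following the usual presentation in M&S 2.2:
H(L) = \lim_{n \to \infty} \frac{1}{n} H(x_1 \ldots x_n)
H(L, m) = \lim_{n \to \infty} -\frac{1}{n} \log_2 m(x_1 \ldots x_n) \approx -\frac{1}{n} \log_2 m(x_1 \ldots x_n) \text{ for a single long sample}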

Per-word entropy (cont) p(x_1 ... x_n) can be calculated with n-gram models. Ex: the unigram model, shown below.
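Under a unigram model the sequence probability factors into word probabilities, so the per-word cross entropy is just an average of word log probabilities:
\log_2 p(x_1 \ldots x_n) = \sum_{i=1}^{n} \log_2 p(x_i)
H \approx -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i)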

Perplexity Perplexity is 2^H. Perplexity is the weighted average number of choices a random variable has to make. => We learned how to calculate perplexity in LING570.
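Written out, with H the per-word cross entropy estimated on a sample x_1 ... x_n:
PP = 2^{H} = p(x_1 \ldots x_n)^{-1/n}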

Mutual information Mutual information measures how much information X and Y have in common: I(X;Y) = D(p(x,y) || p(x)p(y)). It is symmetric: I(X;Y) = I(Y;X).
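Expanded, and related to the entropies above:
I(X; Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)} = H(X) - H(X | Y) = H(Y) - H(Y | X)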

Summary of information theory Reading: M&S 2.2. Information theory is the use of probability theory to quantify and measure "information". Basic concepts: entropy; cross entropy and relative entropy; joint entropy and conditional entropy; entropy of a language and perplexity; mutual information.

Hw1

Hw1 Q1-Q5: information theory. Q6: condor submission. Q7: Hw10 from LING570. You are not required to turn in anything for Q7; if you want feedback on it, you can choose to turn it in, but it won't be graded. You get 30 points for free.

Q6: condor submission See http://staff.washington.edu/brodbd/orientation.pdf, especially slides #22-#28.

For a command we can run as: mycommand -a -n < mycommand.in > mycommand.out, the submit file might look like this (save it to *.cmd):
Executable = mycommand            <- the command to run
Universe = vanilla
getenv = true
input = mycommand.in              <- STDIN
output = mycommand.out            <- STDOUT
error = mycommand.error           <- STDERR
Log = /tmp/brodbd/mycommand.log   <- a log file that stores the results of the condor submission
arguments = "-a -n"               <- the arguments for the command
transfer_executable = false
Queue

Submitting and monitoring jobs on condor condor_submit mycommand.cmd => you get a job number. List the job queue with condor_q. The job status changes from "I" (idle) to "R" (running); a status of "H" (held) means the job has failed, so look at the log file specified in *.cmd. When the job has disappeared from the queue, you will receive an email. Use "man condor_q" etc. to learn more about these commands.

The path names for files in *.cmd In the *.cmd file (e.g., Executable = aa194.exec, input = file1), the environment (e.g., ~/.bash_profile) might not be set up properly, and condor assumes that the files are in the current directory (the directory where the job is submitted) => use full path names if needed, as in the sketch below.
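For example, the same two lines with full path names might look like this (the paths here are placeholders, not the actual course directories):
Executable = /full/path/to/aa194.exec   <- replace with the real location of the executable
input = /full/path/to/file1             <- replace with the real location of the input file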