Computational Linguistics Seminar LING-696G Week 5

Administrivia
Guest lecturer:
– Bryan Heidorn, School of Information
– ISTA 455/555: Applied NLP
– March 2nd

Chapters 3 and 4
Statistical Machine Translation by Koehn, P.
– Chapter 3: Probability Theory
– Chapter 4: Machine Translation by Translating Words

Chapter 3 errata
p. 70: Text at the very top should read "This rule defines a conditional probability distribution p(x|y) (also called the posterior) in terms of its inverse p(y|x) and the two elementary probability distributions p(x) (called the prior) and p(y)..." instead of "This rule a conditional probability distribution p(x|y) expresses in terms of its inverse p(y|x) (called the posterior) and the two elementary probability distributions p(x) (called the prior) and p(y)..."
p. 71: Text before Equation 3.17 should read "variance... is computed as the *arithmetic* mean of the *squared* difference between..." instead of "variance... is computed as the geometric mean of the difference between..."
p. 71: Final number in Equation 3.18 should read 2.04 instead of 2.4.
p. 72: Part of paragraph 2 should read "one-in-a-million chance" instead of "one-in-a-million change".
p. 73: In the caption to Figure 3.4, entropy should be H(X) instead of E(X).
p. 75: In the caption to Figure 3.5, joint entropy should be H(X,Y) and conditional entropy should be H(Y|X). The figure itself is correct.
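As a quick sanity check on the two corrected definitions (variance as the arithmetic mean of squared differences from the mean, and entropy written H(X)), here is a minimal Python sketch; the values are made up for illustration and are not the book's examples:

```python
import math

def variance(xs):
    # variance = arithmetic mean of the squared differences from the mean
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def entropy(p):
    # H(X) = -sum_x p(x) * log2 p(x), skipping zero-probability outcomes
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

# Illustrative values only (not taken from the book)
print(variance([1, 2, 3, 4, 5, 6]))        # outcomes of a fair die
print(entropy({"H": 0.5, "T": 0.5}))       # fair coin: 1.0 bit
```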

4.1.4 Alignment
p. 85: re-ordering, 1 → 2

4.1.4 Alignment
flavoring particle (pleonastic?)

4.1.4 Alignment
NULL (word 0): do I (2nd choice) Subjunctive
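To make the alignment function concrete, here is a minimal Python sketch with a made-up German–English pair (loosely echoing the book's running examples, not copied from it); a maps each output position j to an input position i, with i = 0 meaning the word is aligned to the NULL token:

```python
# Hypothetical sentence pair, purely for illustration
src = ["NULL", "ich", "gehe", "nicht", "zum", "haus"]    # word 0 is the NULL token
tgt = ["i", "do", "not", "go", "to", "the", "house"]

# Alignment a: j -> i for target positions j (1-based);
# i = 0 means the target word has no source counterpart (NULL),
# and "go" <- "gehe" illustrates re-ordering
a = {1: 1, 2: 0, 3: 3, 4: 2, 5: 4, 6: 4, 7: 5}

for j, i in a.items():
    print(f"{tgt[j - 1]:>6s}  <-  {src[i]}")
```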

Chapter 4 errata
p. 84: Text before Equation 4.5 should read "at position j to a German output at position i". The indexes are erroneously flipped in the book.
p. 90: Text before Equation 4.13 should read "the same simplifications as in Equation (4.10)". The book makes an erroneous reference to Equation
p. 91: The two "end for" on lines 12 and 13 should be one indentation to the left.
p. 91: Left-hand side of Equation 4.14 should be instead of.
p. 93: The first sum should be a product.
p. 93: The result of the computation of Equation 4.20 should be instead of.
p. 93: The text should clarify that in Equation 4.21 perplexity is computed over the whole corpus, and not the average per sentence.
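For the last erratum, a hedged Python sketch of corpus-level perplexity, assuming the usual definition log2 PP = −Σ_s log2 p(e_s|f_s) summed over all sentence pairs rather than averaged per sentence:

```python
import math

def corpus_perplexity(sentence_probs):
    # sentence_probs: p(e_s | f_s) for each sentence pair s in the corpus.
    # Perplexity is accumulated over the whole corpus:
    #   log2 PP = -sum_s log2 p(e_s | f_s)
    # (not an average per sentence, per the erratum for Equation 4.21)
    log2_pp = -sum(math.log2(p) for p in sentence_probs)
    return 2 ** log2_pp

# Illustrative probabilities only
print(corpus_perplexity([0.25, 0.5, 0.125]))   # 2 ** (2 + 1 + 3) = 64
```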

4.1.5 IBM Model 1
f = f_1 f_2 ... f_{l_f} (input/foreign sentence)
e = e_1 e_2 ... e_{l_e} (output sentence)
a: j → i (alignment function from output position j to input position i)
ε = normalization constant
l_f + 1 input words because word 0 = NULL
corpus not typically word aligned
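A minimal sketch of the Model 1 translation probability in this notation, p(e, a|f) = ε / (l_f + 1)^{l_e} · Π_j t(e_j | f_{a(j)}); the lexical translation table t below is a toy for illustration, not trained values:

```python
def model1_prob(e, f, a, t, eps=1.0):
    """p(e, a | f) = eps / (l_f + 1)^{l_e} * prod_j t(e_j | f_{a(j)}).

    f includes the NULL token at position 0, so each output word can
    align to any of l_f + 1 input positions.  e holds the output words
    e_1..e_{l_e}; a maps output position j (1-based) to input position
    i (index into f, where 0 = NULL).
    """
    l_f = len(f) - 1            # f[0] is NULL
    l_e = len(e)
    prob = eps / (l_f + 1) ** l_e
    for j in range(1, l_e + 1):
        prob *= t[(e[j - 1], f[a[j]])]
    return prob

# Toy lexical translation probabilities t(e|f), purely illustrative
t = {("the", "das"): 0.7, ("house", "haus"): 0.8}
f = ["NULL", "das", "haus"]
e = ["the", "house"]
a = {1: 1, 2: 2}
print(model1_prob(e, f, a, t))   # 1.0 / 3**2 * 0.7 * 0.8 ≈ 0.0622
```

Since the corpus is not typically word aligned, these t parameters are what EM has to estimate by summing over possible alignments.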