
1 Introduction to Natural Language Processing (600.465)
Statistical Translation: Alignment and Parameter Estimation
Dr. Jan Hajič, CS Dept., Johns Hopkins University
hajic@cs.jhu.edu, www.cs.jhu.edu/~hajic
(JHU CS 600.465, 12/08/1999)

2 Alignment
Available corpus assumed:
– parallel text (translation E ↔ F)
No alignment present (day marks only)!
Sentence alignment:
– sentence boundary detection
– sentence alignment
Word alignment:
– tokenization
– word alignment (with restrictions)

3 Sentence Boundary Detection
Rules, lists:
– sentence breaks: paragraphs (if marked); certain characters: ?, !, ; (... almost sure)
The problem: the period.
– could be the end of a sentence (... left yesterday. He was heading to ...)
– decimal point: 3.6 (three point six)
– thousands separator: 3.200 (three thousand two hundred)
– abbreviations that never end a sentence: cf., e.g., Calif., Mt., Mr.
– ellipsis: ...
– other languages: ordinal number indication (2nd ~ 2.)
– initials: A. B. Smith
Statistical methods: e.g., Maximum Entropy
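A rule-based splitter along these lines can be sketched in a few lines of Python. The abbreviation list, the regular expressions, and the test sentence below are illustrative assumptions, not rules from the lecture:

```python
import re

# Illustrative abbreviation list; a real system would use a much larger one.
ABBREVIATIONS = {"cf.", "e.g.", "Calif.", "Mt.", "Mr."}

def split_sentences(text):
    """Break on ?, !, ; and on periods, unless the period belongs to a
    known abbreviation, an initial (A. B. Smith), or a number (3.6, 3.200)."""
    sentences, current = [], []
    for tok in text.split():
        current.append(tok)
        if tok.endswith(("?", "!", ";")):             # almost-sure breaks
            sentences.append(" ".join(current)); current = []
        elif tok.endswith("."):
            if tok in ABBREVIATIONS:                  # abbreviation: no break
                continue
            if re.fullmatch(r"[A-Z]\.", tok):         # initial: no break
                continue
            if re.fullmatch(r"\d+(\.\d+)*\.?", tok):  # number: ambiguous, keep
                continue
            sentences.append(" ".join(current)); current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("He left yesterday. He was heading to Mt. Rainier, cf. map 3."))
# -> ['He left yesterday.', 'He was heading to Mt. Rainier, cf. map 3.']
```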

4 Sentence Alignment
The problem: only the sentences have been detected so far, in E and in F separately.
Desired output: a segmentation with an equal number of segments on both sides, spanning the whole text continuously, with the original sentence boundaries kept.
(diagram: detected sentence boundaries in E and F, before and after alignment)
Alignments obtained in the example: 2-1, 1-1, 1-1, 2-2, 2-1, 0-1
The new segments are called "sentences" from now on.

5 Alignment Methods
Several methods (probabilistic and not):
– character-length based
– word-length based
– "cognates" (word identity used):
  – using an existing dictionary (F: prendre ~ E: make, take)
  – using word "distance" (similarity): names, numbers, borrowed words, words of Latin origin, ...
Best performing:
– statistical, word- or character-length based (perhaps with some word cues added)

6 Length-based Alignment
First, define the problem probabilistically:
argmax_A P(A|E,F) = argmax_A P(A,E,F)   (E, F fixed)
Define a "bead": a pair of aligned segments spanning p sentences of E and q sentences of F (the slide's diagram shows a 2:2 bead).
Approximate: P(A,E,F) ≅ ∏_{i=1..n} P(B_i), where B_i is a bead; P(B_i) is assumed not to depend on the rest of E, F.
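Under this approximation, scoring a candidate alignment is just a sum of per-bead log probabilities. A minimal sketch, assuming a `bead_logprob` callback that scores a single bead (a concrete length-based version is sketched after slide 9):

```python
# P(A,E,F) ≅ Π_i P(B_i), computed in log space to avoid underflow.
def alignment_logprob(beads, bead_logprob):
    """beads: list of (e_segment, f_segment) pairs that partition (E, F)."""
    return sum(bead_logprob(e_seg, f_seg) for e_seg, f_seg in beads)
```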

7 The Alignment Task
Given the model definition P(A,E,F) ≅ ∏_{i=1..n} P(B_i), find the partitioning of (E,F) into n beads B_{i=1..n} that maximizes P(A,E,F) over the training data.
Define B_i = p:q_i, where p:q ∈ {0:1, 1:0, 1:1, 1:2, 2:1, 2:2}
– p:q describes the type of the alignment (bead)
Use dynamic programming:
– Define Pref(i,j) ... the probability of the best alignment from the start of the (E,F) data, position (1,1), up to (i,j).

8 Recursive Definition
Initialize: Pref(0,0) = 1 (the empty prefix is certain; equivalently 0 if log probabilities are used, as in the sketch below).
Pref(i,j) = max( Pref(i,j−1) · P(0:1_k),
                 Pref(i−1,j) · P(1:0_k),
                 Pref(i−1,j−1) · P(1:1_k),
                 Pref(i−1,j−2) · P(1:2_k),
                 Pref(i−2,j−1) · P(2:1_k),
                 Pref(i−2,j−2) · P(2:2_k) )
where p:q_k denotes the type of the new (k-th) bead. This is enough for a Viterbi-like search.
(diagram: the six predecessor cells of (i,j) in the E × F sentence grid, one per bead type)
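A minimal Python sketch of this Viterbi-like search, working in log space (so the empty prefix scores log 1 = 0) and assuming a `bead_logprob(p, q, e_lens, f_lens, i, j)` callback such as the length-based one sketched after the next slide. Names and signatures are illustrative, not from the lecture:

```python
# Bead types p:q and the Pref(i, j) dynamic program over sentence counts.
BEAD_TYPES = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]

def align(e_lens, f_lens, bead_logprob):
    """e_lens, f_lens: sentence lengths of the two texts.
    Returns the best bead sequence as a list of (p, q) types."""
    n, m = len(e_lens), len(f_lens)
    NEG = float("-inf")
    pref = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    pref[0][0] = 0.0                               # log 1: empty prefix
    for i in range(n + 1):
        for j in range(m + 1):
            for p, q in BEAD_TYPES:
                if i < p or j < q or pref[i - p][j - q] == NEG:
                    continue
                cand = pref[i - p][j - q] + bead_logprob(p, q, e_lens, f_lens, i, j)
                if cand > pref[i][j]:
                    pref[i][j], back[i][j] = cand, (p, q)
    beads, i, j = [], n, m                         # trace back from (n, m)
    while (i, j) != (0, 0):
        p, q = back[i][j]
        beads.append((p, q))
        i, j = i - p, j - q
    return list(reversed(beads))
```

The table has (n+1)(m+1) cells with six predecessors each, which is the quadratic cost mentioned on slide 10.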

9 Probability of a Bead
It remains to define P(p:q_k):
– k refers to the "next" bead, with segments of p and q sentences, of lengths l_{k,e} and l_{k,f}.
Use a normal distribution for the length variation:
P(p:q_k) = P(δ(l_{k,e}, l_{k,f}, μ, σ²), p:q) ≅ P(δ(l_{k,e}, l_{k,f}, μ, σ²)) · P(p:q)
where δ(l_{k,e}, l_{k,f}, μ, σ²) = (l_{k,f} − μ·l_{k,e}) / √(l_{k,e}·σ²)
Estimate P(p:q) from a small amount of data, or even guess and re-estimate after aligning some data.
Words etc. might be used as better clues in the definition of P(p:q_k).
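The following sketch fills in the `bead_logprob` callback used above, in the spirit of Gale & Church's length-based model. The values of μ, σ² and the P(p:q) priors are guessed placeholders of the kind the slide says to re-estimate later, not numbers from the lecture:

```python
import math

MU, SIGMA2 = 1.0, 6.8   # assumed mean length ratio and variance, not given here
PRIOR = {(1, 1): 0.89, (0, 1): 0.005, (1, 0): 0.005,
         (1, 2): 0.045, (2, 1): 0.045, (2, 2): 0.01}   # guessed P(p:q)

def bead_logprob(p, q, e_lens, f_lens, i, j):
    """Score the bead covering e_lens[i-p:i] and f_lens[j-q:j]."""
    l_e = sum(e_lens[i - p:i]) or 1
    l_f = sum(f_lens[j - q:j]) or 1
    d = (l_f - MU * l_e) / math.sqrt(l_e * SIGMA2)    # the delta defined above
    log_p_delta = -0.5 * d * d - 0.5 * math.log(2 * math.pi)  # normal density
    return log_p_delta + math.log(PRIOR[(p, q)])
```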

10 Saving Time
For long texts (> 10⁴ sentences), even Viterbi (in the version needed here) is not effective (O(S²) time).
Go paragraph by paragraph if the paragraphs are aligned 1:1.
What if they are not? Apply the same method to the paragraphs first!
– identify paragraphs roughly in both languages
– run the algorithm to get aligned paragraph-like segments
– then run it on the sentences within the aligned paragraphs (see the sketch below)
This performs well if there are not many consecutive 1:0 or 0:1 beads.
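The two-pass idea reuses the same `align` routine from the earlier sketch: first on paragraph lengths, then on the sentences inside each group of aligned paragraphs. All names are assumed from the sketches above:

```python
def align_hierarchically(e_pars, f_pars, bead_logprob):
    """e_pars, f_pars: lists of paragraphs, each a list of sentence lengths."""
    e_plens = [sum(par) for par in e_pars]       # paragraph "lengths"
    f_plens = [sum(par) for par in f_pars]
    result, i, j = [], 0, 0
    for p, q in align(e_plens, f_plens, bead_logprob):        # paragraph pass
        e_sents = [l for par in e_pars[i:i + p] for l in par]
        f_sents = [l for par in f_pars[j:j + q] for l in par]
        result.append(align(e_sents, f_sents, bead_logprob))  # sentence pass
        i, j = i + p, j + q
    return result
```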

11 Word Alignment
Length alone does not help any more:
– mainly because words can be swapped, and mutual translations often have vastly different lengths.
... but at least we have "sentences" (sentence-like segments) aligned; that will be exploited heavily.
Idea:
– Assume some (simple) translation model (such as Model 1).
– Find its parameters by considering virtually all alignments.
– Once we have the parameters, find the best alignment given those parameters.

12 Word Alignment Algorithm
Start with the sentence-aligned corpus. Let (E,F) be a pair of sentences (actually, a bead).
Initialize p(f|e) randomly (e.g., uniformly), for f ∈ F, e ∈ E.
Compute expected counts over the corpus:
c(f,e) = Σ_{(E,F); e∈E, f∈F} p(f|e)
– for each aligned pair (E,F), check whether e occurs in E and f in F; if so, add p(f|e).
Reestimate: p(f|e) = c(f,e) / c(e), where c(e) = Σ_f c(f,e).
Iterate until the change in p(f|e) is small.
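A compact sketch of this EM loop. One hedge: the count update below includes the usual per-position normalization over all e ∈ E (dividing p(f|e) by Σ_{e'∈E} p(f|e')), which the slide's shorthand "add p(f|e)" leaves implicit; the NULL word is omitted for brevity:

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """corpus: list of (E, F) sentence pairs, each a list of word tokens."""
    f_vocab = {f for _, F in corpus for f in F}
    uniform = 1.0 / len(f_vocab)
    p = defaultdict(lambda: uniform)       # p(f|e), initialized uniformly
    for _ in range(iterations):            # or: until the change is small
        c_fe = defaultdict(float)          # expected counts c(f, e)
        c_e = defaultdict(float)           # c(e) = sum_f c(f, e)
        for E, F in corpus:
            for f in F:
                z = sum(p[(f, e)] for e in E)     # normalizer over e in E
                for e in E:
                    w = p[(f, e)] / z
                    c_fe[(f, e)] += w
                    c_e[e] += w
        # Reestimate: p(f|e) = c(f, e) / c(e)
        p = defaultdict(float, {(f, e): c / c_e[e] for (f, e), c in c_fe.items()})
    return p
```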

13 Best Alignment
Select, for each (E,F):
A = argmax_A P(A|F,E)
  = argmax_A P(F,A|E) / P(F)
  = argmax_A P(F,A|E)
  = argmax_A ( ε / (l+1)^m · ∏_{j=1..m} p(f_j|e_{a_j}) )
  = argmax_A ∏_{j=1..m} p(f_j|e_{a_j})
Again, use dynamic programming, a Viterbi-like algorithm.
Recompute p(f|e) based on the best alignment (only if you are so inclined; the "original" summed-over-all distribution might perform better).
Note: we have also obtained all Model 1 parameters.
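Because the Model 1 score factorizes over positions j, the Viterbi search degenerates here to an independent argmax per position. A minimal sketch using the p(f|e) table from the EM sketch above:

```python
def best_alignment(E, F, p):
    """Return a_1..a_m (0-based indices into E) maximizing p(f_j | e_{a_j})."""
    return [max(range(len(E)), key=lambda i: p[(f, E[i])]) for f in F]
```

For models with distortion or fertility the full dynamic program would be needed; for Model 1 this per-position maximization is exact.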

