
1 Computational Linguistics Seminar LING-696G Week 10

2 Today's Topics
– Rewrote code for IBM Models 1, 2 and 3a
– Chaucer Project

3 Resources

Code:

ibm1pre.py
  Description: IBM Model 1, accepting formatted training data only
  Example usage: python3 ibm1pre.py 0.1 ibm2x2align.txt -o ibm2x2align_model1.txt

ibm2pre.py
  Description: IBM Model 2, accepting formatted training data plus Model 1 data
  Example usage: python3 ibm2pre.py 0.1 ibm2x2align.txt ibm2x2align_model1.txt -o ibm2x2align_model2.txt

ibm3apre.py
  Description: IBM Model 3a, accepting formatted training data plus Model 2 data
  Example usage: python3 ibm3apre.py 0.1 ibm2x2align.txt ibm2x2align_model2.txt

4 Resources

Data transformation:

Source file: ibm_fertility.py
Example usage: python ibm_fertility.py -n 1 -f 1 ibm2x1.txt ibm2x1nonealign.txt

Parameters:
  -n (--none): 0 (don't generate n0ne), 1 (generate n0ne)
  -f (--fertility): 0, 1, 2, 3, etc. (maximum fertility)
  filename: input raw training data, e.g. ibm2x1.txt
  filename: output formatted training data, e.g. ibm2x1nonealign.txt
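
As a rough illustration of that command-line surface, here is a minimal argparse sketch that accepts the same flags and positional arguments; it only parses arguments and is not the actual ibm_fertility.py.

```python
import argparse

# Hedged sketch: a CLI matching the documented flags of ibm_fertility.py.
# Only the argument parsing is shown; the expansion logic itself is not.
parser = argparse.ArgumentParser(
    description="Expand raw training pairs into pre-aligned training pairs")
parser.add_argument("-n", "--none", type=int, choices=[0, 1], default=0,
                    help="1 = generate the n0ne token, 0 = don't")
parser.add_argument("-f", "--fertility", type=int, default=1,
                    help="maximum fertility (0, 1, 2, 3, ...)")
parser.add_argument("infile", help="input raw training data, e.g. ibm2x1.txt")
parser.add_argument("outfile", help="output formatted training data, e.g. ibm2x1nonealign.txt")

if __name__ == "__main__":
    args = parser.parse_args()
    print(args.none, args.fertility, args.infile, args.outfile)
```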

5 Resources

Training data:

Raw training data: ibm2x2.txt
  Formatted: ibm2x2align.txt (--none 0 --fertility 2)
  Formatted: ibm2x2nonealign.txt (--none 1 --fertility 2)
Raw training data: ibm2x2r.txt
  Formatted: ibm2x2ralign.txt (--none 0 --fertility 2)
  Formatted: ibm2x2rnonealign.txt (--none 1 --fertility 2)
Raw training data: ibm2x1.txt
  Formatted: ibm2x1align.txt (--none 0 --fertility 1)
  Formatted: ibm2x1nonealign.txt (--none 1 --fertility 1)

6 Probabilities

IBM Models 1, 2 and 3a form a cascading architecture:

Unaligned sentence pairs (e.g. "the house" / "das Haus") are expanded by ibm_fertility.py into aligned sentence pairs with their alignment/fertility possibilities (e.g. "the house n0ne" / "das das Haus" / "1 1 1 2 2 0"). These, together with the training data, feed the cascade:
– Model 1 (ibm1pre.py): compute t(e|f)
– Model 2 (ibm2pre.py): compute t(e|f) and a(i|j,l_e,l_f), starting from Model 1's t(e|f)
– Model 3a (ibm3apre.py): compute t(e|f), a(i|j,l_e,l_f) and n(ɸ|f), starting from Model 2's t(e|f) and a(i|j,l_e,l_f)
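
For concreteness, the whole cascade can be driven with the documented example commands; a minimal sketch, assuming the three scripts and the formatted training file sit in the current directory:

```python
import subprocess

# Hedged sketch: run the cascade end to end using the example commands
# from the Resources slide (thresholds and file names as given there).
subprocess.run(["python3", "ibm1pre.py", "0.1", "ibm2x2align.txt",
                "-o", "ibm2x2align_model1.txt"], check=True)       # Model 1: t(e|f)
subprocess.run(["python3", "ibm2pre.py", "0.1", "ibm2x2align.txt",
                "ibm2x2align_model1.txt", "-o", "ibm2x2align_model2.txt"],
               check=True)                                         # Model 2: t(e|f), a(i|j,le,lf)
subprocess.run(["python3", "ibm3apre.py", "0.1", "ibm2x2align.txt",
                "ibm2x2align_model2.txt"], check=True)             # Model 3a: adds n(phi|f)
```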

7 Training Data

Assumptions:
– no ordering implied
– every word has a translation

Example:
1. the house
2. das Haus

Let t(e|f) be the probability that foreign word f translates into English word e, and let E = the set of all English words in the corpus. Initially (uniform distribution): t(e_i|f) = 1/|E|, here = 0.50.

We will estimate:
1. t(the|das)
2. t(house|das)
3. t(the|Haus)
4. t(house|Haus)

with ∑_i t(e_i|f) = 1 for all f, e.g. t(the|das) = 0.99, t(house|das) = 0.01, t(the|Haus) = 0.05, t(house|Haus) = 0.95.

Further assumption: not all words are translated.

Example:
1. n0ne small
2. ja klein

Add t(n0ne|f_j) for all f_j; initially t(e_i|f) = 1/(|E|+1).
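
A minimal sketch of this uniform initialization (illustrative names, not the actual ibm1pre.py code), assuming the corpus has already been read into (English words, foreign words) pairs:

```python
from collections import defaultdict

def init_uniform_t(pairs, use_none=False):
    """Uniform t(e|f): 1/|E|, or 1/(|E|+1) when the n0ne token is added."""
    english, foreign = set(), set()
    for e_words, f_words in pairs:
        english.update(e_words)
        foreign.update(f_words)
    if use_none:
        english.add("n0ne")               # allow untranslated foreign words
    p = 1.0 / len(english)                # uniform share for each English word
    t = defaultdict(dict)
    for f in foreign:
        for e in english:
            t[f][e] = p                   # so that sum_i t(e_i|f) = 1 for every f
    return t

# Slide example: E = {the, house}, so every t(e|f) starts at 0.50.
t = init_uniform_t([(["the", "house"], ["das", "Haus"])])
```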

8 Expectation/Maximization (EM)

Notation:
– e = some English word, f = some foreign word
– t(e|f) = probability that f translates as e
– occ(e,f) = # of occurrences of training pair e and f

The cycle: start from an initial t(e|f); over the training data (pairs e,f), collect weighted counts c(e|f) += t(e|f)*occ(e,f); the updated t(e|f) is then t(e|f)*occ(e,f) rescaled wrt. f (so that the t(e|f) again sum to 1 for each f); repeat with the updated t(e|f).
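
A minimal sketch of this cycle on pre-aligned pairs, following the count-and-renormalize scheme described on these slides; function and variable names are illustrative, not the actual ibm1pre.py code.

```python
from collections import defaultdict

def em_step(pairs, t):
    """One EM pass: accumulate weighted counts c(e|f), then renormalize t(e|f)."""
    c = defaultdict(lambda: defaultdict(float))   # c[f][e]
    c_f = defaultdict(float)                      # c[f], summed over e
    for e_words, f_words in pairs:                # pre-aligned: position k of e pairs with position k of f
        for e, f in zip(e_words, f_words):
            k = t[f][e]                           # weight of this datum under the current t(e|f)
            c[f][e] += k                          # c(e|f) += t(e|f)*occ(e,f); occ handled by repetition
            c_f[f] += k
    for f in t:                                   # maximization: rescale wrt. f
        if c_f[f] == 0:
            continue                              # f not seen this pass; leave its row unchanged
        for e in t[f]:
            t[f][e] = c[f][e] / c_f[f]            # t(e|f) = c(e|f)/c(f)
    return t

def train(pairs, t, threshold=0.1):
    """Repeat EM passes until no t(e|f) changes by more than the threshold."""
    while True:
        old = {f: dict(row) for f, row in t.items()}
        t = em_step(pairs, t)
        if max(abs(t[f][e] - old[f][e]) for f in t for e in t[f]) < threshold:
            return t

# Usage with a uniform start (cf. slide 7), on a few pre-aligned pairs:
pairs = [(["the", "house"], ["das", "Haus"]),
         (["the", "house"], ["Haus", "das"]),
         (["the", "book"], ["das", "Buch"])]
t0 = {f: {e: 1 / 3 for e in ("the", "house", "book")} for f in ("das", "Haus", "Buch")}
t = train(pairs, t0, threshold=0.01)
```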

9 Training

Recall IBM Model 1:
– no assumption about alignment, i.e. all alignments are possible

1. the house
2. das Haus

See training datum #2:
1. the house
2. das Haus

Initially, set (uniform distribution):
– t(the|Haus) = 0.5
– t(house|Haus) = 0.5

Compute a weighted sum of the # of times we see a datum:
k = t(house|Haus)
c(house|Haus) += k
c(Haus) += k

Normalize: update t(house|Haus) = c(house|Haus)/c(Haus)

This is the Model 1 we've played with so far…

10 Training

IBM Model 1:
– no assumptions about alignment
– suppose it's also possible some f isn't translated at all

1. the house n0ne
2. das Haus

Let's line the words up and rewrite this as:
1. the house n0ne
   ⇕    ⇕    ⇕
2. das das Haus

So the pair
1. the house
2. das Haus
is equivalent to the 4 pairs below (with e and f in 1-to-1 (⇕) correspondence), produced by ibm_fertility.py:

1. the house
2. das Haus
3. the house
4. Haus das
5. the house n0ne
6. das das Haus
7. n0ne the house
8. das Haus Haus
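
Here is a rough, self-contained sketch of the kind of expansion ibm_fertility.py performs, reconstructed from the examples on these slides (the real script may differ in details such as token ordering and file format). It enumerates every way to map each English word to one foreign word, caps each foreign word's fertility, and pairs leftover foreign words with n0ne.

```python
from itertools import product

def expand_pair(e_words, f_words, max_fertility=2, use_none=True):
    """Enumerate pre-aligned versions of one raw sentence pair."""
    results = []
    # Each assignment maps English position j to a foreign position assignment[j].
    for assignment in product(range(len(f_words)), repeat=len(e_words)):
        fertility = [assignment.count(i) for i in range(len(f_words))]
        if max(fertility) > max_fertility:
            continue                                  # fertility cap exceeded
        unaligned = [i for i, count in enumerate(fertility) if count == 0]
        if unaligned and not use_none:
            continue                                  # no n0ne token available
        # English side: original order, then one n0ne per unaligned foreign word.
        e_out = list(e_words) + ["n0ne"] * len(unaligned)
        # Foreign side: reordered so that position k pairs with e_out[k].
        f_out = [f_words[i] for i in assignment] + [f_words[i] for i in unaligned]
        # Alignment line: "i j" pairs (1-based), with j = 0 for n0ne.
        align = [(i + 1, j + 1) for j, i in enumerate(assignment)]
        align += [(i + 1, 0) for i in unaligned]
        results.append((e_out, f_out, align))
    return results

# Slide example: "the house" / "das Haus" with n0ne and maximum fertility 2
# yields the 4 pre-aligned pairs shown above (plus their alignment lines).
for e_out, f_out, align in expand_pair(["the", "house"], ["das", "Haus"]):
    print(" ".join(e_out), "|", " ".join(f_out), "|",
          " ".join(f"{i} {j}" for i, j in align))
```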

11 2x2 training data with n0ne (ibm2x2nonealign.txt)

Expansion of:
1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the book
6. das Buch
7. a book
8. ein Buch

with fertility possibilities 0, 1, 2:

1. none the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the house
6. Haus das
7. the house none
8. das das Haus
9. none the house
10. das Haus Haus
11. the book
12. das Buch
13. the book
14. Buch das
15. the book none
16. das das Buch
17. none the book
18. das Buch Buch
19. a book
20. ein Buch
21. a book
22. Buch ein
23. a book none
24. ein ein Buch
25. none a book
26. ein Buch Buch

3 pairs yield 12 pre-aligned pairs.

Assume training sentence pairs are now pre-formatted in 1-to-1 correspondence, i.e. pre-aligned. Example:
5. the house none
   ⇕    ⇕    ⇕
6. das das Haus

Note: for simplicity, alignment data is not shown here.

12 2x2 training data with n0ne Iteration threshold: 0.1 Iteration 2 t(none |das ) = 0.18 t(the |das ) = 0.51 t(book |das ) = 0.16 t(house|das ) = 0.16 t(none |ein ) = 0.15 t(a |ein ) = 0.48 t(book |ein ) = 0.37 t(none |Buch ) = 0.18 t(the |Buch ) = 0.16 t(a |Buch ) = 0.16 t(book |Buch ) = 0.51 t(none |Haus ) = 0.15 t(the |Haus ) = 0.37 t(house|Haus ) = 0.48 Iteration threshold: 0.01 Iteration 11 t(none |das ) = 0.27 t(the |das ) = 0.71 t(book |das ) = 0.01 t(house|das ) = 0.01 t(none |ein ) = 0.05 t(a |ein ) = 0.74 t(book |ein ) = 0.21 t(none |Buch ) = 0.27 t(the |Buch ) = 0.01 t(a |Buch ) = 0.01 t(book |Buch ) = 0.71 t(none |Haus ) = 0.05 t(the |Haus ) = 0.21 t(house|Haus ) = 0.74 Iteration threshold: 0.001 Iteration 26 t(none |das ) = 0.33 t(the |das ) = 0.67 t(a |ein ) = 0.76 t(book |ein ) = 0.24 t(none |Buch ) = 0.33 t(book |Buch ) = 0.67 t(the |Haus ) = 0.24 t(house|Haus ) = 0.76 Iteration threshold: 1e-08 Iteration 68 t(none |das ) = 0.33 t(the |das ) = 0.67 t(a |ein ) = 0.76 t(book |ein ) = 0.24 t(none |Buch ) = 0.33 t(book |Buch ) = 0.67 t(the |Haus ) = 0.24 t(house|Haus ) = 0.76 IBM Model 1 Best that can be done! python3 ibm1pre.py 0.1 ibm2x2nonealign.txt

13 2x2 training data (ibm2x2align.txt)

Same dataset but no n0ne:
1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the house
6. Haus das
7. the book
8. das Buch
9. the book
10. Buch das
11. a book
12. ein Buch
13. a book
14. Buch ein

python3 ibm1pre.py 0.1 ibm2x2align.txt

Iteration threshold: 0.1
Iteration 4
t(the |das ) = 0.83 t(book |das ) = 0.08 t(house|das ) = 0.09 t(a |ein ) = 0.72 t(book |ein ) = 0.28 t(the |Buch ) = 0.08 t(a |Buch ) = 0.09 t(book |Buch ) = 0.83 t(the |Haus ) = 0.28 t(house|Haus ) = 0.72

14 2x2 training data Iteration threshold: 0.1 Iteration 4 t(the |das ) = 0.83 t(book |das ) = 0.08 t(house|das ) = 0.09 t(a |ein ) = 0.72 t(book |ein ) = 0.28 t(the |Buch ) = 0.08 t(a |Buch ) = 0.09 t(book |Buch ) = 0.83 t(the |Haus ) = 0.28 t(house|Haus ) = 0.72 Iteration threshold: 0.0001 Iteration 76 t(the |das ) = 1.00 t(a |ein ) = 0.99 t(book |ein ) = 0.01 t(book |Buch ) = 1.00 t(the |Haus ) = 0.01 t(house|Haus ) = 0.99 Iteration threshold: 1e-05 Iteration 229 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 Iteration threshold: 0.01 Iteration 12 t(the |das ) = 1.00 t(a |ein ) = 0.94 t(book |ein ) = 0.06 t(book |Buch ) = 1.00 t(the |Haus ) = 0.06 t(house|Haus ) = 0.94 Iteration threshold: 0.001 Iteration 28 t(the |das ) = 1.00 t(a |ein ) = 0.98 t(book |ein ) = 0.02 t(book |Buch ) = 1.00 t(the |Haus ) = 0.02 t(house|Haus ) = 0.98 Converges nicely!

15 2x2 training data: Model 1 summary

Training data:
1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the book
6. das Buch
7. a book
8. ein Buch

No n0ne:
Iteration threshold: 1e-05
Iteration 229
t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00

With n0ne:
Iteration threshold: 0.001
Iteration 26
t(none |das ) = 0.33 t(the |das ) = 0.67 t(a |ein ) = 0.76 t(book |ein ) = 0.24 t(none |Buch ) = 0.33 t(book |Buch ) = 0.67 t(the |Haus ) = 0.24 t(house|Haus ) = 0.76

16 2x1 training data with n0ne (ibm2x1nonealign.txt)

Suppose we provide training data about a flavoring particle ja:
1. small big
2. ja klein groß
3. small
4. ja klein
5. small
6. klein
7. big
8. ja groß
9. big
10. groß

Expands into pre-aligned:
1. none small big
2. ja klein groß
3. small none
4. ja klein
5. small none
6. klein ja
7. small
8. klein
9. big none
10. ja groß
11. big none
12. groß ja
13. big
14. groß

4 pairs yield 6 pre-aligned pairs.

17 2x1 training data with n0ne Iteration threshold: 0.1 Iteration 4 t(none |ja ) = 0.92 t(small|ja ) = 0.04 t(big |ja ) = 0.04 t(none |klein) = 0.05 t(small|klein) = 0.95 t(none |groß ) = 0.05 t(big |groß ) = 0.95 Iteration threshold: 0.01 Iteration 7 t(none |ja ) = 0.99 t(none |klein) = 0.01 t(small|klein) = 0.99 t(none |groß ) = 0.01 t(big |groß ) = 0.99 IBM Model 1 Iteration threshold: 0.001 Iteration 11 t(none |ja ) = 1.00 t(small|klein) = 1.00 t(big |groß ) = 1.00 Converges nicely! python3 ibm1pre.py 0.1 ibm2x1nonealign.txt

18 2x1 training data (ibm2x1align.txt)

Same dataset but no n0ne:
1. small
2. ja klein
3. small
4. klein
5. big
6. ja groß
7. big
8. groß

python3 ibm1pre.py 0.1 ibm2x1align.txt

Iteration threshold: 0.1
Iteration 2
t(small|ja ) = 0.50 t(big |ja ) = 0.50 t(small|klein) = 1.00 t(big |groß ) = 1.00

Best we can do given the limitation of no n0ne!

19 With n0ne Summary Model 1: Iteration threshold: 0.1 Iteration 2 t(small|ja ) = 0.50 t(big |ja ) = 0.50 t(small|klein) = 1.00 t(big |groß ) = 1.00 Iteration threshold: 0.001 Iteration 11 t(none |ja ) = 1.00 t(small|klein) = 1.00 t(big |groß ) = 1.00 Iteration threshold: 1e-05 Iteration 229 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 Iteration threshold: 0.001 Iteration 26 t(none |das ) = 0.33 t(the |das ) = 0.67 t(a |ein ) = 0.76 t(book |ein ) = 0.24 t(none |Buch ) = 0.33 t(book |Buch ) = 0.67 t(the |Haus ) = 0.24 t(house|Haus ) = 0.76 Best!

20 Training

Recall IBM Model 2:
– t(e|f) from Model 1
– alignment probability distribution: a(i|j,l_e,l_f)
  l_e, l_f = length of the English and foreign sentences (resp.)
  i, j = index into the foreign and English sentences (resp.)

Assume training pair:
1. the house
2. das Haus
is equivalent to (e and f in 1-to-1 (⇕) correspondence):
1. the house
2. das Haus
3. 1 1 2 2
4. the house
5. Haus das
6. 2 1 1 2
7. the house none
8. das das Haus
9. 1 1 1 2 2 0
10. the house none
11. Haus Haus das
12. 2 1 2 2 1 0

Alignment line: i j i j … (first pair, second pair, …), with j = 0 if the English word is n0ne.
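
A small sketch of reading one pre-aligned training record in this format (English line, foreign line, alignment line of "i j" pairs); the parsing details are an assumption, since the exact file layout isn't fully specified here.

```python
def parse_record(e_line, f_line, a_line):
    """Split a pre-aligned record into tokens and (i, j) alignment pairs.

    i indexes the foreign sentence, j the English sentence; j = 0 marks n0ne.
    """
    e_words = e_line.split()
    f_words = f_line.split()
    nums = [int(x) for x in a_line.split()]
    align = list(zip(nums[0::2], nums[1::2]))     # consecutive "i j" pairs
    assert len(align) == len(f_words)             # one alignment pair per foreign token
    return e_words, f_words, align

# Example from this slide:
print(parse_record("the house none", "das das Haus", "1 1 1 2 2 0"))
# (['the', 'house', 'none'], ['das', 'das', 'Haus'], [(1, 1), (1, 2), (2, 0)])
```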

21 Training

Model 2 training data:
7. the house n0ne
   ⇕    ⇕    ⇕
8. das das Haus
9. 1 1 1 2 2 0

See training datum #2:
7. house
8. das
9. 1 2

Initially, set (uniform distribution):
– a(1|1,2,2) = 0.50
– a(2|1,2,2) = 0.50

Compute a weighted sum of the # of times we see a datum:
k = t(house|das)*a(1|2,2,2)
c(house|das) += k
c(das) += k
c(1|2,2,2) += k
c(2,2,2) += k

Normalize:
new t(house|das) = c(house|das)/c(das)
new a(1|2,2,2) = c(1|2,2,2)/c(2,2,2)
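
A minimal sketch of this Model 2 count update over pre-aligned records with alignment lines, following the scheme above; names are illustrative, not the actual ibm2pre.py code, and t is assumed to come from a Model 1 run.

```python
from collections import defaultdict

def model2_step(records, t, a):
    """One Model 2 EM pass: weight each aligned datum by t(e|f)*a(i|j,le,lf)."""
    c_t = defaultdict(lambda: defaultdict(float))
    c_f = defaultdict(float)
    c_a = defaultdict(float)
    c_jl = defaultdict(float)
    for e_words, f_words, align in records:       # records with e and f in 1-to-1 correspondence
        le = max(j for _, j in align)             # original English length (n0ne tokens have j = 0)
        lf = max(i for i, _ in align)             # original foreign length
        for e, f, (i, j) in zip(e_words, f_words, align):
            k = t[f][e] * a[(i, j, le, lf)]       # weight of this datum
            c_t[f][e] += k
            c_f[f] += k
            c_a[(i, j, le, lf)] += k
            c_jl[(j, le, lf)] += k
    for f in c_t:                                 # normalize: new t(e|f) = c(e|f)/c(f)
        for e in c_t[f]:
            t[f][e] = c_t[f][e] / c_f[f]
    for key in c_a:                               # normalize: new a(i|j,le,lf) = c(i|j,le,lf)/c(j,le,lf)
        i, j, le, lf = key
        a[key] = c_a[key] / c_jl[(j, le, lf)]
    return t, a

# Tiny usage with the slide's record; t from a Model 1 run, a initialized uniformly (1/lf).
recs = [(["the", "house", "n0ne"], ["das", "das", "Haus"], [(1, 1), (1, 2), (2, 0)])]
t = {"das": {"the": 0.5, "house": 0.5}, "Haus": {"n0ne": 1.0}}
a = {(i, j, 2, 2): 0.5 for i in (1, 2) for j in (0, 1, 2)}
t, a = model2_step(recs, t, a)
```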

22 2x2 training data with n0ne (ibm2x2nonealign.txt)

Expansion of:
1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the book
6. das Buch
7. a book
8. ein Buch

Assume fertility 0, 1, 2. Assume n0ne.

1. n0ne the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. 1 1 2 2
6. the house
7. Haus das
8. 2 1 1 2
9. the house none
10. das das Haus
11. 1 1 1 2 2 0
12. the house none
13. Haus Haus das
14. 2 1 2 2 1 0
15. the book
16. das Buch
17. 1 1 2 2
18. the book
19. Buch das
20. 2 1 1 2
21. the book none
22. das das Buch
23. 1 1 1 2 2 0
24. the book none
25. Buch Buch das
26. 2 1 2 2 1 0
27. a book
28. ein Buch
29. 1 1 2 2
30. a book
31. Buch ein
32. 2 1 1 2
33. a book none
34. ein ein Buch
35. 1 1 1 2 2 0
36. a book none
37. Buch Buch ein
38. 2 1 2 2 1 0

3 pairs yield 12 pre-aligned pairs.

23 Probabilities

Recall the cascading architecture for IBM Models 1 and 2:

Unaligned sentence pairs (e.g. "the house" / "das Haus") are expanded into aligned sentence pairs with their alignment/fertility possibilities (e.g. "the house n0ne" / "das das Haus" / "1 1 1 2 2 0"). These, together with the training data, feed the cascade:
– Model 1: compute t(e|f)
– Model 2: compute t(e|f) and a(i|j,l_e,l_f), starting from Model 1's t(e|f)

24 2x2 training data with n0ne

Stage 1: python3 ibm1pre.py 0.1 ibm2x2nonealign.txt -o ibm2x2nonealign_model1.txt

Iteration threshold: 0.1
Training data: ibm2x2nonealign.txt
t(e|f) output: ibm2x2nonealign_model1.txt
Number of pairs read: 12
Iteration 2
t(none |das ) = 0.18 t(the |das ) = 0.51 t(book |das ) = 0.16 t(house|das ) = 0.16 t(none |ein ) = 0.15 t(a |ein ) = 0.48 t(book |ein ) = 0.37 t(none |Buch ) = 0.18 t(the |Buch ) = 0.16 t(a |Buch ) = 0.16 t(book |Buch ) = 0.51 t(none |Haus ) = 0.15 t(the |Haus ) = 0.37 t(house|Haus ) = 0.48

Stage 2: python3 ibm2pre.py 0.01 ibm2x2nonealign.txt ibm2x2nonealign_model1.txt

Read training data from file: ibm2x2nonealign.txt, pairs: 12
Read IBM model 1 from file: ibm2x2nonealign_model1.txt
Iteration 5
t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00
a(1|0,2,2)=0.50 a(2|0,2,2)=0.50 a(1|1,2,2)=1.00 a(2|2,2,2)=1.00

foreign -> English: 1 -> 1, 2 -> 2

25 2x2 training data (ibm2x2align.txt)

Same dataset:
1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the book
6. das Buch
7. a book
8. ein Buch

Assume fertility 1. Assume no n0ne.

1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. 1 1 2 2
6. the house
7. Haus das
8. 2 1 1 2
9. the book
10. das Buch
11. 1 1 2 2
12. the book
13. Buch das
14. 2 1 1 2
15. a book
16. ein Buch
17. 1 1 2 2
18. a book
19. Buch ein
20. 2 1 1 2

3 pairs yield 6 pre-aligned pairs.

26 2x2 training data

Stage 1: python3 ibm1pre.py 0.1 ibm2x2align.txt -o ibm2x2align_model1.txt

Iteration threshold: 0.1
Training data: ibm2x2align.txt
t(e|f) output: ibm2x2align_model1.txt
Number of pairs read: 6
Iteration 4
t(the |das ) = 0.83 t(book |das ) = 0.08 t(house|das ) = 0.09 t(a |ein ) = 0.72 t(book |ein ) = 0.28 t(the |Buch ) = 0.08 t(a |Buch ) = 0.09 t(book |Buch ) = 0.83 t(the |Haus ) = 0.28 t(house|Haus ) = 0.72

Stage 2: python3 ibm2pre.py 0.1 ibm2x2align.txt ibm2x2align_model1.txt

Iteration threshold: 0.1
Read training data from file: ibm2x2align.txt, pairs: 6
Read IBM model 1 from file: ibm2x2align_model1.txt
Iteration 1
t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00
a(1|1,2,2)=1.00 a(2|2,2,2)=1.00

foreign -> English: 1 -> 1, 2 -> 2

27 2x2 reverse training data (ibm2x2ralign.txt)

Word order reversed:
1. the a book house
2. das ein Buch Haus
3. the house
4. Haus das
5. the book
6. Buch das
7. a book
8. Buch ein

Assume fertility 1. Assume no n0ne.

1. the a book house
2. das ein Buch Haus
3. the house
4. Haus das
5. 1 1 2 2
6. the house
7. das Haus
8. 2 1 1 2
9. the book
10. Buch das
11. 1 1 2 2
12. the book
13. das Buch
14. 2 1 1 2
15. a book
16. Buch ein
17. 1 1 2 2
18. a book
19. ein Buch
20. 2 1 1 2

3 pairs yield 6 pre-aligned pairs.

28 2x2 reverse training data

Stage 1: python3 ibm1pre.py 0.1 ibm2x2ralign.txt -o ibm2x2ralign_model1.txt

Iteration threshold: 0.1
Training data: ibm2x2ralign.txt
t(e|f) output: ibm2x2ralign_model1.txt
Number of pairs read: 6
Iteration 4
t(the |das ) = 0.83 t(book |das ) = 0.08 t(house|das ) = 0.09 t(a |ein ) = 0.72 t(book |ein ) = 0.28 t(the |Buch ) = 0.08 t(a |Buch ) = 0.09 t(book |Buch ) = 0.83 t(the |Haus ) = 0.28 t(house|Haus ) = 0.72

Stage 2: python3 ibm2pre.py 0.01 ibm2x2ralign.txt ibm2x2ralign_model1.txt

Iteration threshold: 0.01
Read training data from file: ibm2x2ralign.txt, pairs: 6
Read IBM model 1 from file: ibm2x2ralign_model1.txt
Iteration 4
t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00
a(2|1,2,2)=1.00 a(1|2,2,2)=1.00

foreign -> English: 2 -> 1, 1 -> 2

29 Reverse With n0ne Summary: 2x2 training data Model 2: Iteration threshold: 0.01 Iteration 4 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 a(2|1,2,2)=1.00 a(1|2,2,2)=1.00 Iteration threshold: 0.01 Iteration 5 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 a(1|0,2,2)=0.50 a(2|0,2,2)=0.50 a(2|1,2,2)=1.00 a(1|2,2,2)=1.00 Iteration threshold: 0.1 Iteration 1 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 a(1|1,2,2)=1.00 a(2|2,2,2)=1.00 Iteration threshold: 0.01 Iteration 5 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 a(1|0,2,2)=0.50 a(2|0,2,2)=0.50 a(1|1,2,2)=1.00 a(2|2,2,2)=1.00

30 2x1 training data with n0ne (ibm2x1nonealign.txt)

From:
1. small big
2. ja klein groß
3. small
4. ja klein
5. small
6. klein
7. big
8. ja groß
9. big
10. groß

Assume n0ne. Assume fertility 0 and 1.

1. none small big
2. ja klein groß
3. small none
4. klein ja
5. 2 1 1 0
6. small none
7. ja klein
8. 1 1 2 0
9. small
10. klein
11. 1 1
12. big none
13. groß ja
14. 2 1 1 0
15. big none
16. ja groß
17. 1 1 2 0
18. big
19. groß
20. 1 1

4 pairs yield 6 pre-aligned pairs.

31 2x1 training data with n0ne

Stage 1: python3 ibm1pre.py 0.1 ibm2x1nonealign.txt -o ibm2x1nonealign_model1.txt

Iteration threshold: 0.1
Training data: ibm2x1nonealign.txt
t(e|f) output: ibm2x1nonealign_model1.txt
Number of pairs read: 6
Iteration 4
t(none |ja ) = 0.92 t(small|ja ) = 0.04 t(big |ja ) = 0.04 t(none |klein) = 0.05 t(small|klein) = 0.95 t(none |groß ) = 0.05 t(big |groß ) = 0.95

Stage 2: python3 ibm2pre.py 0.01 ibm2x1nonealign.txt ibm2x1nonealign_model1.txt

Iteration threshold: 0.01
Read training data from file: ibm2x1nonealign.txt, pairs: 6
Read IBM model 1 from file: ibm2x1nonealign_model1.txt
Iteration 3
t(none |ja ) = 1.00 t(small|klein) = 1.00 t(big |groß ) = 1.00
a(1|0,1,2)=1.00 a(2|1,1,2)=1.00 a(1|1,1,1)=1.00

foreign -> English: 2 -> 1, 1 -> 0 (when l_e = 1, l_f = 2); 1 -> 1 (when l_e = 1, l_f = 1)

32 2x1 training data (ibm2x1align.txt)

From:
1. small big
2. ja klein groß
3. small
4. ja klein
5. small
6. klein
7. big
8. ja groß
9. big
10. groß

Assume fertility 1. Assume no n0ne.

1. small big
2. ja klein groß
3. small
4. ja klein
5. 1 1 2 1
6. small
7. klein
8. 1 1
9. big
10. ja groß
11. 1 1 2 1
12. big
13. groß
14. 1 1

4 pairs yield 4 pre-aligned pairs.

33 2x1 training data

Stage 1: python3 ibm1pre.py 0.1 ibm2x1align.txt -o ibm2x1align_model1.txt

Iteration threshold: 0.1
Training data: ibm2x1align.txt
t(e|f) output: ibm2x1align_model1.txt
Number of pairs read: 6
Iteration 2
t(small|ja ) = 0.50 t(big |ja ) = 0.50 t(small|klein) = 1.00 t(big |groß ) = 1.00

Stage 2: python3 ibm2pre.py 0.01 ibm2x1align.txt ibm2x1align_model1.txt

Iteration threshold: 0.01
Read training data from file: ibm2x1align.txt, pairs: 6
Read IBM model 1 from file: ibm2x1align_model1.txt
Iteration 1
t(small|ja ) = 0.50 t(big |ja ) = 0.50 t(small|klein) = 1.00 t(big |groß ) = 1.00
a(1|1,1,2)=0.33 a(2|1,1,2)=0.67 a(1|1,1,1)=1.00

foreign -> English: 1 -> 1 (33%), 2 -> 1 (67%) when l_e = 1, l_f = 2; and 1 -> 1 when l_e = 1, l_f = 1

34 with n0ne Summary: 2x1 training data Model 2: Iteration threshold: 0.01 Iteration 1 t(small|ja ) = 0.50 t(big |ja ) = 0.50 t(small|klein) = 1.00 t(big |groß ) = 1.00 a(1|1,1,2)=0.33 a(2|1,1,2)=0.67 a(1|1,1,1)=1.00 Iteration threshold: 0.1 Iteration 3 t(none |ja ) = 1.00 t(small|klein) = 1.00 t(big |groß ) = 1.00 a(1|0,1,2)=1.00 a(2|1,1,2)=1.00 a(1|1,1,1)=1.00

35 Training

IBM Model "3a":
– t(e|f) from Model 1
– a(i|j,l_e,l_f) from Model 2
– fertility probability distribution: n(ɸ|f)
  ɸ = fertility (0, 1, 2, …), f = foreign word
– no sampling: "hill climbing"

Assume training pair with alignment (e and f in 1-to-1 (⇕) correspondence):
1. the house n0ne
   ⇕    ⇕    ⇕
2. das das Haus
3. 1 1 1 2 2 0

Infer the fertility mapping: 1 -> 2, 2 -> 0 (das has fertility 2, Haus has fertility 0).

k = t(n0ne|Haus)*a(2|0,2,2)*n(0|Haus)
c(n0ne|Haus) += k
c(Haus) += k
c(2|0,2,2) += k
c(0,2,2) += k
c_n(0|Haus) += k
c_n(Haus) += k

36 Training

Model 3 training data:
7. the house n0ne
   ⇕    ⇕    ⇕
8. das das Haus
9. 1 1 1 2 2 0

Initially, set (uniform distribution):
– n(0|Haus) = 0.33
– n(1|Haus) = 0.33
– n(2|Haus) = 0.33

k = t(n0ne|Haus)*a(2|0,2,2)*n(0|Haus)
c(n0ne|Haus) += k
c(Haus) += k
c(2|0,2,2) += k
c(0,2,2) += k
c_n(0|Haus) += k
c_n(Haus) += k

Normalize:
new t(n0ne|Haus) = c(n0ne|Haus)/c(Haus)
new a(2|0,2,2) = c(2|0,2,2)/c(0,2,2)
new n(0|Haus) = c_n(0|Haus)/c_n(Haus)
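
A minimal sketch of this fertility step: read the fertility of each foreign position off the alignment line, weight the datum by t*a*n, and renormalize n(ɸ|f). This is an illustrative reconstruction of the scheme above, not the actual ibm3apre.py code; the t and a counts would be accumulated alongside, as in the Model 2 sketch.

```python
from collections import defaultdict

def fertility_step(records, t, a, n):
    """Weighted fertility counts c_n(phi|f), then renormalize n(phi|f)."""
    c_n = defaultdict(lambda: defaultdict(float))   # c_n[f][phi]
    c_f = defaultdict(float)                        # c_n[f], summed over phi
    for e_words, f_words, align in records:         # records with e and f in 1-to-1 correspondence
        le = max(j for _, j in align)
        lf = max(i for i, _ in align)
        fert = defaultdict(int)                     # fertility of each foreign position
        for i, j in align:
            fert[i] += (j > 0)                      # a pair with j = 0 (n0ne) adds nothing
        for e, f, (i, j) in zip(e_words, f_words, align):
            k = t[f][e] * a[(i, j, le, lf)] * n[f][fert[i]]
            c_n[f][fert[i]] += k
            c_f[f] += k
    for f in n:                                     # normalize: new n(phi|f) = c_n(phi|f)/c_n(f)
        if c_f[f] == 0:
            continue
        for phi in n[f]:
            n[f][phi] = c_n[f][phi] / c_f[f]
    return n

# Tiny usage with the slide's record; t, a, n are assumed to come from the earlier stages.
recs = [(["the", "house", "n0ne"], ["das", "das", "Haus"], [(1, 1), (1, 2), (2, 0)])]
t = {"das": {"the": 0.5, "house": 0.5}, "Haus": {"n0ne": 1.0}}
a = {(i, j, 2, 2): 0.5 for i in (1, 2) for j in (0, 1, 2)}
n = {"das": {0: 1/3, 1: 1/3, 2: 1/3}, "Haus": {0: 1/3, 1: 1/3, 2: 1/3}}
n = fertility_step(recs, t, a, n)                   # das ends up with fertility 2, Haus with 0
```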

37 Probabilities

The cascading architecture for IBM Models 1, 2 and 3a:

Unaligned sentence pairs (e.g. "the house" / "das Haus") are expanded into aligned sentence pairs with their alignment/fertility possibilities (e.g. "the house n0ne" / "das das Haus" / "1 1 1 2 2 0"). These, together with the training data, feed the cascade:
– Model 1: compute t(e|f)
– Model 2: compute t(e|f) and a(i|j,l_e,l_f), starting from Model 1's t(e|f)
– Model 3a: compute t(e|f), a(i|j,l_e,l_f) and n(ɸ|f), starting from Model 2's t(e|f) and a(i|j,l_e,l_f)

38 2x1 training data with n0ne (ibm2x1nonealign.txt)

From original dataset:
1. small big
2. ja klein groß
3. small
4. ja klein
5. small
6. klein
7. big
8. ja groß
9. big
10. groß

Assume fertility 0 and 1. Assume n0ne.

Initially:
n(0|ja)=0.50 n(1|ja)=0.50 n(0|klein)=0.50 n(1|klein)=0.50 n(0|groß)=0.50 n(1|groß)=0.50
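
A small sketch of that uniform fertility initialization (illustrative, not the actual ibm3apre.py code): with maximum fertility 1 each n(ɸ|f) starts at 1/2, and with maximum fertility 2 (as in the 2x2 data) at 1/3.

```python
def init_uniform_n(foreign_words, max_fertility=1):
    """Uniform fertility distribution n(phi|f) over phi = 0..max_fertility."""
    p = 1.0 / (max_fertility + 1)
    return {f: {phi: p for phi in range(max_fertility + 1)} for f in foreign_words}

# Slide example: maximum fertility 1, so n(0|f) = n(1|f) = 0.50 for every f.
n = init_uniform_n(["ja", "klein", "groß"], max_fertility=1)
```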

39 2x1 training data with n0ne Stage 2: python3 ibm2pre.py 0.1 ibm2x1nonealign.txt ibm2x1nonealign_model1.txt -o ibm2x1nonealign_model2.txt Iteration threshold: 0.1 Read training data from file: ibm2x1nonealign.txt, pairs: 6 Read IBM model 1 from file: ibm2x1nonealign_model1.txt Iteration 1 t(none |ja ) = 0.96 t(small|ja ) = 0.02 t(big |ja ) = 0.02 t(none |klein) = 0.02 t(small|klein) = 0.98 t(none |groß ) = 0.02 t(big |groß ) = 0.98 a(1|0,1,2)=0.95 a(2|0,1,2)=0.05 a(1|1,1,2)=0.04 a(2|1,1,2)=0.96 a(1|1,1,1)=1.00 Stage 3: python3 ibm3apre.py 0.1 ibm2x1nonealign.txt ibm2x1nonealign_model2.txt Read training data from ibm2x1nonealign.txt, pairs: 6, none: True, max fertility: 1 Read IBM model 2 from ibm2x1nonealign_model2.txt Iteration 1 t(none |ja ) = 1.00 t(small|klein) = 1.00 t(big |groß ) = 1.00 a(1|0,1,2)=1.00 a(2|1,1,2)=1.00 a(1|1,1,1)=1.00 n(0|ja)=1.00 n(1|klein)=1.00 n(1|groß)=1.00 fertility converges nicely!

40 2x1 training data ibm2x1align.txt From original dataset: 1.small big 2.ja klein groß 3.small 4.ja klein 5.small 6.klein 7.big 8.ja groß 9.big 10.groß Assume fertility 1. Assume no n0ne. 1.small big 2.ja klein groß 3.small 4.ja klein 5.1 1 2 1 6.small 7.klein ja 8.2 1 1 1 9.small 10.klein 11.1 1 12.big 13.ja groß 14.1 1 2 1 15.big 16.groß ja 17.2 1 1 1 18.big 19.groß 20.1 1

41 2x1 training data Stage 2: python3 ibm2pre.py 0.1 ibm2x1align.txt ibm2x1align_model1.txt -o ibm2x1align_model2.txt Iteration threshold: 0.1 Read training data from file: ibm2x1align.txt, pairs: 6 Read IBM model 1 from file: ibm2x1align_model1.txt Iteration 1 t(small|ja ) = 0.50 t(big |ja ) = 0.50 t(small|klein) = 1.00 t(big |groß ) = 1.00 a(1|1,1,2)=0.33 a(2|1,1,2)=0.67 a(1|1,1,1)=1.00 Stage 3: python3 ibm3apre.py 0.1 ibm2x1align.txt ibm2x1align_model2.txt Read training data from ibm2x1align.txt, pairs: 6, none: False, max fertility: 1 Read IBM model 2 from ibm2x1align_model2.txt Iteration 1 t(small|ja ) = 0.50 t(big |ja ) = 0.50 t(small|klein) = 1.00 t(big |groß ) = 1.00 a(1|1,1,2)=0.20 a(2|1,1,2)=0.80 a(1|1,1,1)=1.00 n(1|ja)=1.00 n(1|klein)=1.00 n(1|groß)=1.00 Only fertility 1 is possible given n0ne not available

42 2x2 training data (ibm2x2.txt)

Data:
1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the book
6. das Buch
7. a book
8. ein Buch

Files:
no n0ne, max fertility 2: ibm2x2align.txt
  Model 1: ibm2x2align_model1.txt
  Model 2: ibm2x2align_model2.txt
with n0ne, max fertility 2: ibm2x2nonealign.txt
  Model 1: ibm2x2nonealign_model1.txt
  Model 2: ibm2x2nonealign_model2.txt

43 Stage 1: python3 ibm1pre.py 0.1 ibm2x2align.txt -o ibm2x2align_model1.txt Iteration threshold: 0.1 Training data: ibm2x2align.txt t(e|f) output: ibm2x2align_model1.txt Number of pairs read: 6 Iteration 4 t(the |das ) = 0.83 t(book |das ) = 0.08 t(house|das ) = 0.09 t(a |ein ) = 0.72 t(book |ein ) = 0.28 t(the |Buch ) = 0.08 t(a |Buch ) = 0.09 t(book |Buch ) = 0.83 t(the |Haus ) = 0.28 t(house|Haus ) = 0.72 2x2 training data Stage 2: python3 ibm2pre.py 0.1 ibm2x2align.txt ibm2x2align_model1.txt Iteration threshold: 0.1 Read training data from file: ibm2x2align.txt, pairs: 6 Read IBM model 1 from file: ibm2x2align_model1.txt Iteration 1 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 a(1|1,2,2)=1.00 a(2|2,2,2)=1.00 Stage 3: python3 ibm3apre.py 0.1 ibm2x2align.txt ibm2x2align_model2.txt Read training data from ibm2x2align.txt, pairs: 6, none: False, max fertility: 1 Read IBM model 2 from ibm2x2align_model2.txt Iteration 1 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 a(1|1,2,2)=1.00 a(2|2,2,2)=1.00 n(1|das)=1.00 n(1|ein)=1.00 n(1|Buch)=1.00 n(1|Haus)=1.00

44 2x2 training data with n0ne python3 ibm2pre.py 0.1 ibm2x2nonealign.txt ibm2x2nonealign_model1.txt Iteration threshold: 0.1 Read training data from file: ibm2x2nonealign.txt, pairs: 12 Read IBM model 1 from file: ibm2x2nonealign_model1.txt Iteration 4 t(the |das ) = 1.00 t(a |ein ) = 0.99 t(book |Buch ) = 1.00 t(house|Haus ) = 0.99 a(1|0,2,2)=0.50 a(2|0,2,2)=0.50 a(1|1,2,2)=1.00 a(2|2,2,2)=1.00 python3 ibm3apre.py 0.1 ibm2x2nonealign.txt ibm2x2nonealign_model2.txt Read training data from ibm2x2nonealign.txt, pairs: 12, none: True, max fertility: 2 Read IBM model 2 from ibm2x2nonealign_model2.txt Iteration 1 t(the |das ) = 1.00 t(a |ein ) = 1.00 t(book |Buch ) = 1.00 t(house|Haus ) = 1.00 a(1|0,2,2)=0.50 a(2|0,2,2)=0.50 a(1|1,2,2)=1.00 a(2|2,2,2)=1.00 n(1|das)=0.50 n(2|das)=0.50 n(1|ein)=0.50 n(2|ein)=0.50 n(1|Buch)=0.50 n(2|Buch)=0.50 n(1|Haus)=0.50 n(2|Haus)=0.50 Initially, fertility: n(0|das)=0.33 n(1|das)=0.33 n(2|das)=0.33 n(0|ein)=0.33 n(1|ein)=0.33 n(2|ein)=0.33 n(0|Buch)=0.33 n(1|Buch)=0.33 n(2|Buch)=0.33 n(0|Haus)=0.33 n(1|Haus)=0.33 n(2|Haus)=0.33 Fertility doesn't converge!

