Last Time
Statistical Machine Translation (SMT)
  – popular now
  – Language Weaver (Arabic, also French, etc.); newest one: Persian
"Translating is EU's new boom industry" (2004 article)
The market is there: opportunities for machine translation?
Statistical MT
Avoids the explicit construction of linguistically sophisticated models of grammar
Pioneered by IBM researchers (Brown et al., 1990)
  – Language Model Pr(S), estimated by n-grams
  – Translation Model Pr(T|S), estimated through alignment models
N-gram language model: P(sentence)
Idea:
  – collect statistics on the co-occurrence of adjacent words
Brown corpus (1 million words):
  – word w, frequency(w), probability(w)
  – the 69,
  – rabbit
Example:
  – Just then, the white ___
  – expectation is p(white rabbit) > p(white the)
  – but p(the) > p(rabbit)
Statistical Machine Translation
Machine Translation
  – Source sentence S
  – Target sentence T
  – Every pair (S,T) has a probability
  – P(T|S) = probability that the target is T given S
Statistical Machine Translation
The Language Model: P(S)
  – bigrams:
      w_1 w_2 w_3 w_4 w_5
      w_1 w_2, w_2 w_3, w_3 w_4, w_4 w_5
  – sequences of words
      S = w_1 … w_n
      P(S) = P(w_1) P(w_2 | w_1) … P(w_n | w_1 … w_{n-1})
  – product of the probability of each w_i given the preceding context for w_i
      problem: we need to know too many probabilities
  – bigram approximation
      limit the context to the previous word
      P(S) ≈ P(w_1) P(w_2 | w_1) … P(w_n | w_{n-1})
  – bigram probability estimation from corpora
      P(w_i | w_{i-1}) ≈ freq(w_{i-1} w_i) / freq(w_{i-1}) in a corpus
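The bigram estimate above can be tried out on a few sentences. The following is a minimal sketch, not the model used by Brown et al.: the three-sentence corpus is made up for illustration (it is not the Brown corpus), and the function names train_bigram, bigram_prob and sentence_prob are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count single words and adjacent word pairs in tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) ~ freq(prev word) / freq(prev); 0.0 if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(words, unigrams, bigrams):
    """P(S) ~ P(w_1) times the product of P(w_i | w_{i-1})."""
    total = sum(unigrams.values())
    p = unigrams[words[0]] / total
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word, unigrams, bigrams)
    return p

# Toy corpus, invented for this sketch.
corpus = [
    "just then the white rabbit ran".split(),
    "the white rabbit was late".split(),
    "the dog saw the white house".split(),
]
unigrams, bigrams = train_bigram(corpus)

# After "white", "rabbit" is more likely than "the" ...
p_rabbit_given_white = bigram_prob("white", "rabbit", unigrams, bigrams)
p_the_given_white = bigram_prob("white", "the", unigrams, bigrams)
# ... even though "the" is the more frequent word overall.
the_more_frequent = unigrams["the"] > unigrams["rabbit"]
```

Unseen bigrams get probability zero here, so any sentence containing one scores zero; a practical model would smooth the counts, which this sketch leaves out.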
Statistical Machine Translation
The Translation Model: P(T|S)
  – Alignment model:
      assume there is a transfer relationship between source and target words
      not necessarily 1-to-1
  – Example
      S = w_1 w_2 w_3 w_4 w_5 w_6 w_7
      T = u_1 u_2 u_3 u_4 u_5 u_6 u_7 u_8 u_9
      w_4 -> u_3 u_5 (fertility of w_4 = 2)
      distortion: w_5 -> u_9
Statistical Machine Translation
Alignment notation
  – use word positions in parentheses
  – no word position, no mapping
  – Example
      (Les propositions ne seront pas mises en application maintenant | The(1) proposal(2) will(4) not(3,5) now(9) be implemented(6,7,8))
      This particular alignment is not correct; it is an artifact of their algorithm
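To make the notation concrete, here is a small sketch that reads the annotated English side of the example and recovers each word's fertility (the number of French positions it maps to). The function name parse_alignment is hypothetical; the parsing rules are simply the two bullets above.

```python
import re

def parse_alignment(annotated):
    """Parse 'The(1) proposal(2) ...' into (word, [positions]) pairs.

    A word with no parenthesized positions maps to no French word.
    """
    pairs = []
    for token in annotated.split():
        m = re.fullmatch(r"([^()]+)(?:\(([\d,]+)\))?", token)
        if m is None:
            raise ValueError(f"cannot parse token: {token!r}")
        word, positions = m.group(1), m.group(2)
        nums = [int(p) for p in positions.split(",")] if positions else []
        pairs.append((word, nums))
    return pairs

french = "Les propositions ne seront pas mises en application maintenant".split()
english = "The(1) proposal(2) will(4) not(3,5) now(9) be implemented(6,7,8)"

alignment = parse_alignment(english)
# Fertility of a word = how many French positions it is linked to.
fertility = {word: len(positions) for word, positions in alignment}
# French words generated by each English word, e.g. not -> ne, pas.
generated = {word: [french[i - 1] for i in positions] for word, positions in alignment}
```

In this example every French position is used exactly once, so the fertilities add up to the length of the French sentence.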
Statistical Machine Translation
How to compute the probability of an alignment?
  – Need to estimate
      Fertility probabilities
        – P(fertility=n|w) = probability that word w has fertility n
      Distortion probabilities
        – P(i|j,l) = probability that the target word is at position i, given that the source word is at position j and l is the length of the target
  – Example
      (Le chien est battu par Jean | John(6) does beat(3,4) the(1) dog(2))
        – P(f=1|John)P(Jean|John) x
        – P(f=0|does) x
        – P(f=2|beat)P(est|beat)P(battu|beat) x
        – P(f=1|the)P(Le|the) x
        – P(f=1|dog)P(chien|dog) x
        – P(f=1|null)P(par|null) x
        – distortion probabilities…
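The product in the example can be written out as a short sketch. All probability values below are invented purely to make the arithmetic visible; they are not estimates from any corpus. The label "NULL" for the slot that generates the unaligned word par is an assumption of this sketch, the function name is hypothetical, and the distortion terms are deliberately left out.

```python
def fertility_translation_score(alignment, french, p_fert, p_trans):
    """Multiply the P(f=n|w) and P(u|w) terms for one alignment.

    alignment: list of (english_word, [1-based French positions]).
    Distortion probabilities are not included in this sketch.
    """
    score = 1.0
    for word, positions in alignment:
        score *= p_fert[(len(positions), word)]
        for pos in positions:
            score *= p_trans[(french[pos - 1], word)]
    return score

french = "Le chien est battu par Jean".split()
alignment = [
    ("John", [6]), ("does", []), ("beat", [3, 4]),
    ("the", [1]), ("dog", [2]),
    ("NULL", [5]),  # assumed label for the source of the unaligned word "par"
]

# Hypothetical probabilities, for illustration only.
p_fert = {
    (1, "John"): 0.9, (0, "does"): 0.8, (2, "beat"): 0.5,
    (1, "the"): 0.9, (1, "dog"): 0.9, (1, "NULL"): 0.5,
}
p_trans = {
    ("Jean", "John"): 0.8, ("est", "beat"): 0.2, ("battu", "beat"): 0.4,
    ("Le", "the"): 0.5, ("chien", "dog"): 0.9, ("par", "NULL"): 0.1,
}

score = fertility_translation_score(alignment, french, p_fert, p_trans)
```

A word with fertility 0, such as does, contributes only its fertility term, exactly as in the second line of the product.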
Statistical Machine Translation
Not done yet
  – Given T
  – the translation problem is to find the S that maximizes P(S)P(T|S)
  – can’t look at all possible S in the language
Idea (Search):
  – construct the best S incrementally
  – start with a highly likely word transfer
  – and find a valid alignment
  – extending the candidate S at each step
  – (Jean aime Marie | * )
  – (Jean aime Marie | John(1) * )
Failure?
  – best S is not a good translation
      language model failed or translation model failed
  – couldn’t find the best S
      search failure
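The decision rule itself (pick the S that maximizes P(S)P(T|S)) can be sketched over a handful of candidates. This is only the scoring step under made-up probabilities: it enumerates a fixed candidate list, which is exactly what a real decoder cannot do, and it is not the incremental search described above. The function name best_source and all numbers are hypothetical.

```python
import math

def best_source(candidates, lm_prob, tm_prob, target):
    """Return the candidate S with the highest log P(S) + log P(T|S)."""
    def score(s):
        return math.log(lm_prob[s]) + math.log(tm_prob[(target, s)])
    return max(candidates, key=score)

target = "Jean aime Marie"
candidates = ["John loves Mary", "Mary loves John", "John love Mary"]

# Hypothetical language model scores P(S): the ungrammatical string is unlikely.
lm_prob = {
    "John loves Mary": 0.010,
    "Mary loves John": 0.010,
    "John love Mary": 0.0001,
}
# Hypothetical translation model scores P(T|S): the swapped names align badly.
tm_prob = {
    (target, "John loves Mary"): 0.30,
    (target, "Mary loves John"): 0.05,
    (target, "John love Mary"): 0.30,
}

best = best_source(candidates, lm_prob, tm_prob, target)
```

The two models catch different errors here: the language model penalizes "John love Mary", while the translation model penalizes "Mary loves John".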
Statistical Machine Translation
Parameter Estimation
  – English/French from the Hansard corpus
  – 100 million words
  – bilingual Canadian parliamentary proceedings
  – unaligned corpus
  – Language Model
      P(S) from bigram model
  – Translation Model
      how to estimate this with an unaligned corpus?
      Used the EM (Expectation-Maximization) algorithm, an iterative algorithm for re-estimating probabilities
      Need
        – P(u|w) for words u in T and w in S
        – P(n|w) for fertility n and w in S
        – P(i|j,l) for target position i, source position j and target length l
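To give a feel for how EM can learn word translations from unaligned sentence pairs, here is a simplified sketch. It estimates only the lexical probabilities P(u|w), with no fertility or distortion parameters, in the style of what is usually called IBM Model 1; it is not the full model from these slides. The two-pair corpus and the function name em_translation_probs are made up for illustration.

```python
from collections import defaultdict

def em_translation_probs(pairs, iterations=10):
    """Estimate t[(u, w)] ~ P(u | w) from unaligned sentence pairs.

    pairs: list of (source_words, target_words).
    Simplified: lexical probabilities only, no fertility or distortion.
    """
    target_vocab = {u for _, tgt in pairs for u in tgt}
    # Initial guess with minimal assumptions: uniform over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (u, w) links
        total = defaultdict(float)  # expected number of links from w
        for src, tgt in pairs:
            for u in tgt:
                # E-step: split one unit of credit for u among the source words.
                norm = sum(t[(u, w)] for w in src)
                for w in src:
                    frac = t[(u, w)] / norm
                    count[(u, w)] += frac
                    total[w] += frac
        # M-step: turn expected counts back into probabilities.
        t = defaultdict(float, {(u, w): c / total[w] for (u, w), c in count.items()})
    return t

# Toy parallel corpus (source = English, target = French), invented for this sketch.
pairs = [
    ("the dog".split(), "le chien".split()),
    ("the cat".split(), "le chat".split()),
]
t = em_translation_probs(pairs)
```

Because le co-occurs with the in both pairs, each iteration shifts more of its credit to the, which in turn frees chien to be explained by dog and chat by cat, even though no alignment was ever given.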
Statistical Machine Translation
Experiment 1: Parameter Estimation for the Translation Model
  – Pick the 9,000 most common words for French and for English
  – 40,000 sentence pairs
  – 81,000,000 parameters (9,000 × 9,000 word pairs)
  – Initial guess: minimal assumptions
Statistical Machine Translation
Experiment 1: results
  – (English) Hear, hear!
  – (French) Bravo!
Statistical Machine Translation
Experiment 2: Translation from French to English
  – Make the task manageable
      English lexicon
        – 1,000 most frequent English words in the corpus
      French lexicon
        – 1,700 most frequent French words in translations completely covered by the selected English words
      117,000 sentence pairs with words covered by the lexicons
      17 million parameters estimated for the translation model
      bigram model of English
        – 570,000 sentences
        – 12 million words
  – 73 test sentences
      Categories: (exact, alternate, different), wrong, ungrammatical
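Results
  – 48% of the test sentences fell into the acceptable categories (exact, alternate, different)
  – Editing effort: 776 keystrokes to repair the system output vs. 1,916 keystrokes to type the Hansard translations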
English Grammar: DCG rules
    sbar(PA) --> np(X,wh), do(_,_), s_objectwh(_,S,P), {headof(X,O), PA =.. [P,S,O]}.
    sbar(S) --> s(S).
    s_objectwh(s(Y,Z),S,P) --> np(Y,_), vp_objectwh(Z), {headof(Y,S), headof(Z,P)}.
    s(PA) --> np(Y,_), vp(Z,_), {predarg(Y,Z,1,PA)}.
    np(np(Y),Q) --> pronoun(Y,Q).
    np(np(Y),notwh) --> proper_noun(Y).
    np(np(D,N),Q) --> det(D,Number), common_noun(N,Number,Q).
    vp(vp(v(died)),ed) --> [kicked,the,bucket].
    vp(vp(Y,Z),F) --> transitive(Y,F), np(Z,_).
    vp(vp(A,V),F) --> aux(A,F), transitive(V,en).
    vp_objectwh(vp(Y)) --> transitive(Y,root).
    det(det(the),_) --> [the].
    det(det(a),sg) --> [a].
    common_noun(n(bucket),sg,notwh) --> [bucket].
    common_noun(n(buckets),pl,notwh) --> [buckets].
    common_noun(n(apple),sg,notwh) --> [apple].
    common_noun(n(apples),pl,notwh) --> [apples].
    common_noun(n(man),sg,notwh) --> [man].
    common_noun(n(book),sg,notwh) --> [book].
    common_noun(n(books),pl,notwh) --> [books].
English Grammar:
    pronoun(who,wh) --> [who].
    pronoun(what,wh) --> [what].
    proper_noun(john) --> [john].
    transitive(v(eats),s) --> [eats].
    transitive(v(ate),ed) --> [ate].
    transitive(v(eaten),en) --> [eaten].
    transitive(v(buy),root) --> [buy].
    transitive(v(buys),s) --> [buys].
    transitive(v(bought),ed) --> [bought].
    transitive(v(bought),en) --> [bought].
    transitive(v(kicks),s) --> [kicks].
    transitive(v(kicked),ed) --> [kicked].
    transitive(v(kicked),en) --> [kicked].
    aux(aux(was),ed) --> [was].
    aux(aux(is),s) --> [is].
    do(aux(does),s) --> [does].
    do(aux(did),ed) --> [did].
Japanese Grammar: DCG Rules
    s(PA) --> np(Y,Q1), nomcase, vp(Z,Q2), sf(Q1,Q2), {predarg(Y,Z,2,PA)}.
    vp(vp(Z,Y),Q) --> np(Z,Q), acccase, transitive(Y).
    transitive(v(katta)) --> [katta].
    nomcase --> [ga].
    acccase --> [o].
    np(np(taroo),notwh) --> [taroo].
    np(np(hon),notwh) --> [hon].
    np(np(dare),wh) --> [dare].
    np(np(nani),wh) --> [nani].
    sf(wh,notwh) --> [ka].
    sf(notwh,wh) --> [ka].
    sf(notwh,notwh) --> [].
    sf(wh,wh) --> [ka].
    predarg(X,Y,Order,PA) :-
        headof(X,S), headof(Y,P),
        order(Order,Y,NP), headof(NP,O),
        PA =.. [P,S,O].
    predarg(X,Y,_,PA) :-
        headof(X,S), headof(Y,P),
        Y = vp(_),
        PA =.. [P,S].
    order(1,vp(_,NP),NP).
    order(2,vp(NP,_),NP).
    headof(np(_,n(N)),N).
    headof(vp(v(V),_),V).
    headof(vp(_,v(V)),V).
    headof(vp(v(V)),V).
    headof(np(N),N).
Note: the translator and the homework answers refer to the Japanese nonterminals with a j prefix (js, jvp, jnp, jtransitive).
Translator: Prolog translation code
    translate(E,J) :-               % Translator
        sbar(X,E,[]),               % English grammar
        mapPA(X,Xp),
        js(Xp,J,[]).                % Japanese grammar
    mapPA(E,J) :-                   % Map predicate-argument
        E =.. [P,S,O],
        je(PJ,P), je(SJ,S), je(OJ,O),
        J =.. [PJ,SJ,OJ].
    je(katta,bought).               % Bilingual dictionary
    je(hon,book).
    je(taroo,john).
    je(dare,who).
    je(nani,what).
    je(katta,buy).
Question 2: Tense
Homework Question
  – (A) English morphology and tense
  – (1) (1pt) Why does
      ?- translate(X,[taroo,ga,hon,o,katta]).
    return duplicate answers?
  – (2) (2pts) Fix the problem
  – (3) (1pt) Sentences like *John buy the book (cf. John buys the book) are accepted by the English grammar; fix the problem
  – submit both your grammar and relevant examples
Question 2: Tense
English Grammar
  – Code
      vp(vp(Y,Z),F) --> transitive(Y,F), {F \== en, F \== root}, np(Z,_).
  – replacing the unrestricted
      vp(vp(Y,Z),F) --> transitive(Y,F), np(Z,_).
  – blocks both
      *John bought the book (bought = -en form)
      *John buy the book (buy = root form)
Question 2: Tense
Homework Question
  – (B) Tense and predicate-argument structure
  – let’s expand the grammar slightly
  – assume kau (buy(s)) is the present tense form of katta (bought)
  – (3pts)
  – Modify the translator to respect tenses when translating between
      John buys a book      taroo-ga hon-o kau
      John bought a book    taroo-ga hon-o katta
  – submit both your code and all relevant translations, e.g.
      ?- translate([john,buys,a,book],X).
      ?- translate(X,[taroo,ga,hon,o,kau]).
Question 2: Tense
Translator
  – Code
      je(kau,buys).
Japanese grammar
  – Code
      jtransitive(v(kau)) --> [kau].
Question 3: Yes-No Questions
Homework Question
  – modify the English and Japanese grammars to incorporate yes-no questions
  – modify the translator to operate on yes-no questions
  – examples:
      Did John buy a book?       yesno(buy(john,book))
      Taroo-ga hon-o katta ka    yesno(katta(taroo,hon))
Question 3: Yes-No Questions
English grammar
  – Code
      sbar(yesno(PA)) --> do(_,_), s_rootv(PA).
      s_rootv(PA) --> np(Y,_), vp_rootv(Z,_), {predarg(Y,Z,1,PA)}.
      vp_rootv(vp(Y,Z),root) --> transitive(Y,root), np(Z,_).
Japanese grammar
  – Code
      js(yesno(PA)) --> jnp(Y,notwh), nomcase, jvp(Z,notwh), [ka], {predarg(Y,Z,2,PA)}.
Translator
  – Code
      mapPA(yesno(E),yesno(J)) :- mapPA(E,J).
Question 4: English Idiom
Complete the translator so that
  – John kicked the bucket has both a literal and an idiomatic translation
      Taroo-ga buketsu-o ketta
      Taroo-ga shinda
      buketsu = bucket
      shinda = died
      ketta = kicked
  – John kicked the buckets has only a literal translation
      Taroo-ga buketsu-o ketta
      (assuming Japanese does not distinguish number)
Question 4: English Idiom
Japanese grammar
  – Code
      jvp(vp(Y),notwh) --> jintransitive(Y).
      jintransitive(v(shinda)) --> [shinda].
      jtransitive(v(ketta)) --> [ketta].
      jnp(np(buketsu),notwh) --> [buketsu].
Translator
  – Code
      je(ketta,kicked).
      je(shinda,died).
      je(buketsu,bucket).
      je(buketsu,buckets).
      mapPA(E,J) :-               % for intransitive P(S) PJ(SJ)
          E =.. [P,S],
          je(PJ,P), je(SJ,S),
          J =.. [PJ,SJ].
Question 5: Japanese Idiom
Examples
  – taroo-ga sensei-ni goma-o sutta
  – taroo-nom teacher-dat sesame-acc ground
  – “John flattered the teacher”
  – taroo-ga Hanako-ni goma-o sutta
  – taroo-nom Hanako-dat sesame-acc ground
  – “John flattered Mary”
  – ni = dative (dat) Case marker
  – odateta is the Japanese counterpart of flattered
Question 5: Japanese Idiom
English grammar
  – Code
      common_noun(n(teacher),sg,notwh) --> [teacher].
      proper_noun(mary) --> [mary].
      transitive(v(flattered),ed) --> [flattered].
Japanese grammar
  – Code
      jvp(vp(Z,v(odateta)),Q) --> jnp(Z,Q), datcase, [goma], acccase, [sutta].
      jtransitive(v(odateta)) --> [odateta].
      jnp(np(hanako),notwh) --> [hanako].
      jnp(np(sensei),notwh) --> [sensei].
      datcase --> [ni].
Translator
  – Code
      je(odateta,flattered).
      je(hanako,mary).
      je(sensei,teacher).
