1 Semantic Representation and Formal Transformation. Kevin Knight, USC/ISI. MURI meeting, CMU, Nov 3-4, 2011.

2 Machine Translation
Phrase-based MT: source string → target string
Syntax-based MT: source string → source tree → target tree → target string
Meaning-based MT: source string → meaning representation → target string
(NIST 2009 c2e)

3 Meaning-based MT
Too big for this MURI:
– What content goes into the meaning representation? (linguistics, annotation)
– How are meaning representations probabilistically generated, transformed, scored, ranked? (automata theory, efficient algorithms)
– How can a full MT system be built and tested? (engineering, language modeling, features, training)

5 Meaning-based MT
Too big for this MURI:
– What content goes into the meaning representation? (linguistics, annotation)
– How are meaning representations probabilistically generated, transformed, scored, ranked? (automata theory, efficient algorithms)
– How can a full MT system be built and tested? (engineering, language modeling, features, training)
Language-independent theory, but driven by practical desires.

6 Automata Frameworks
How to represent and manipulate linguistic representations?
Linguistics, NLP, and automata theory used to be together (1960s, 70s):
– Context-free grammars were invented to model human language
– Tree transducers were invented to model transformational grammar
They drifted apart.
Renewed connections around MT (this century).
Role: greatly simplify systems!

7 Finite-State Transducer (FST)
Rules:
q k → q2 *e*
q2 n → q N
q i → q AY
q g → q3 *e*
q3 h → q4 *e*
q4 t → qfinal T
Original input: k n i g h t
Transformation: q k n i g h t
[FST diagram: states q, q2, q3, q4, qfinal; arcs k : *e*, n : N, i : AY, g : *e*, h : *e*, t : T]

8 Finite-State (String) Transducer
Transformation: q2 n i g h t

9 Finite-State (String) Transducer
Transformation: N q i g h t

10 Finite-State (String) Transducer
Transformation: N AY q g h t

11 Finite-State (String) Transducer
Transformation: N AY q3 h t

12 Finite-State (String) Transducer
Transformation: N AY q4 t

13 Finite-State (String) Transducer
Transformation: N AY T qfinal (final output: N AY T)
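Viewed as code, the walk above is just a loop over the input that consults a (state, symbol) → (next state, output) table. Below is a minimal sketch of my own, assuming the deterministic rule set from the slide; real FSTs can of course be nondeterministic and weighted.

```python
# Minimal sketch of the slide's finite-state transducer applied to "knight".
# Rules map (state, input symbol) -> (next state, output symbol); "*e*" is epsilon.
RULES = {
    ("q",  "k"): ("q2",     "*e*"),
    ("q2", "n"): ("q",      "N"),
    ("q",  "i"): ("q",      "AY"),
    ("q",  "g"): ("q3",     "*e*"),
    ("q3", "h"): ("q4",     "*e*"),
    ("q4", "t"): ("qfinal", "T"),
}

def transduce(symbols, start="q", final="qfinal"):
    state, output = start, []
    for sym in symbols:
        state, out = RULES[(state, sym)]   # deterministic here; real FSTs may branch
        if out != "*e*":
            output.append(out)
    assert state == final, "input not accepted"
    return output

print(transduce(list("knight")))           # ['N', 'AY', 'T']
```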

14 Transliteration
Angela Knight → a n ji ra na i to (transliteration)
Frequently occurring translation problem for languages with different sound systems and character sets (Japanese, Chinese, Arabic, Russian, English, …). Can’t be solved by dictionary lookup.

15 Transliteration
Angela Knight ↔ a n ji ra na i to (a single WFST)
7 input symbols, 13 output symbols

16 Transliteration
Cascade of automata: WFSA A → WFST B → WFST C → WFST D
Angela Knight → AE N J EH L UH N AY T → … → a n j i r a n a i t o

17 Decoding
Given the observed sequence a n j i r a n a i t o, DECODE back through the cascade of WFSA A and WFSTs B, C, D to candidate English sound sequences:
AE N J IH R UH N AY T
AH N J IH L UH N AY T OH
+ millions more

18 General-Purpose Algorithms for String Automata
N-best: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998)
EM training: forward-backward EM (Baum & Welch, 1971; Eisner, 2001)
Determinization: … of weighted string acceptors (Mohri, 1997)
Intersection: WFSA intersection
Application: a string applied to a WFST yields a WFSA
Transducer composition: WFST composition (Pereira & Riley, 1996)
General-purpose toolkits: Carmel (Graehl & Knight, 97), OpenFST (Google, via AT&T), ...
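To make the first row of this list concrete, here is a toy Viterbi-style best-path computation over a small weighted acceptor; the acceptor itself is made up for illustration (it is not from the slides), and weights are probabilities multiplied along a path.

```python
from functools import lru_cache

# Toy acyclic weighted finite-state acceptor (illustrative only, not from the slides).
# Arcs: state -> list of (symbol, weight, next state); weights multiply along a path.
ARCS = {
    "s0": [("a", 0.6, "s1"), ("a", 0.4, "s2")],
    "s1": [("b", 0.9, "s3"), ("c", 0.1, "s3")],
    "s2": [("b", 0.5, "s3")],
    "s3": [],
}
FINAL = {"s3"}

@lru_cache(maxsize=None)
def best_from(state):
    """Viterbi-style best (weight, symbol tuple) from `state` to a final state."""
    best = (1.0, ()) if state in FINAL else (0.0, None)
    for sym, w, nxt in ARCS[state]:
        sub_w, sub_syms = best_from(nxt)
        if sub_syms is not None and w * sub_w > best[0]:
            best = (w * sub_w, (sym,) + sub_syms)
    return best

print(best_from("s0"))   # (0.54, ('a', 'b')) -- the highest-weight accepted string
```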

19 Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Original input: the English parse tree of “he enjoys listening to music”:
S(NP(PRO he), VP(VBZ enjoys, NP(SBAR … listening to music …)))
Transformation: q applied at the root of the input tree.

20 Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Rule (weight 0.2):
q S(x0:NP, VP(x1:VBZ, x2:NP)) → s x0, wa, r x2, ga, q x1

21 Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Transformation: s NP(PRO he), wa, r NP(SBAR … listening to music …), ga, q VBZ(enjoys)

22 Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Rule (weight 0.7):
s NP(PRO(he)) → kare

23 Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Transformation: kare, wa, r NP(SBAR … listening to music …), ga, q VBZ(enjoys)

24 Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Final output: kare, wa, ongaku, o, kiku, no, ga, daisuki, desu
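Seen as code, the derivation above is a recursive, state-driven rewrite of the input tree. The sketch below is mine, not the speaker's: the two rules shown on the slides appear verbatim, while the rules for the remaining steps (the treatment of "enjoys" and of "listening to music") are stand-ins I invented so the example runs end to end.

```python
# Sketch (not the speaker's code) of applying the slides' top-down rules.
# Trees are nested tuples: (label, child, child, ...); leaves are strings.
TREE = ("S",
        ("NP", ("PRO", "he")),
        ("VP", ("VBZ", "enjoys"),
               ("NP", ("SBAR", "listening-to-music"))))   # inner structure abbreviated

def transduce(state, tree):
    """Rewrite (state, tree) top-down into a list of Japanese tokens."""
    label = tree[0]
    if state == "q" and label == "S":
        # q S(x0:NP, VP(x1:VBZ, x2:NP)) -> s x0, wa, r x2, ga, q x1   (weight 0.2)
        x0, vp = tree[1], tree[2]
        x1, x2 = vp[1], vp[2]
        return transduce("s", x0) + ["wa"] + transduce("r", x2) + ["ga"] + transduce("q", x1)
    if state == "s" and tree == ("NP", ("PRO", "he")):
        return ["kare"]                                    # s NP(PRO(he)) -> kare   (weight 0.7)
    if state == "q" and tree == ("VBZ", "enjoys"):
        return ["daisuki", "desu"]                         # stand-in rule, assumed for this sketch
    if state == "r" and label == "NP":
        return ["ongaku", "o", "kiku", "no"]               # stand-in rule covering the elided subtree
    raise ValueError(f"no rule for ({state}, {label})")

print(" ".join(transduce("q", TREE)))
# kare wa ongaku o kiku no ga daisuki desu
```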

25 Top-Down Tree Transducer
Introduced by Rounds (1970) & Thatcher (1970):
“Recent developments in the theory of automata have pointed to an extension of the domain of definition of automata from strings to trees … parts of mathematical linguistics can be formalized easily in a tree-automaton setting …” (Rounds 1970, “Mappings on Grammars and Trees”, Math. Systems Theory 4(3))
Large theory literature – e.g., Gécseg & Steinby (1984), Comon et al. (1997)
Once again re-connecting with NLP practice – e.g., Knight & Graehl (2005), Galley et al. (2004, 2006), May & Knight (2006, 2010), Maletti et al. (2009)

26 Tree Transducers Can be Extracted from Bilingual Data (Galley et al., 04)
i felt obliged to do my part
我 有 责任 尽 一份 力
TREE TRANSDUCER RULES:
VBD(felt) → 有
VBN(obliged) → 责任
VB(do) → 尽
NN(part) → 一份
NN(part) → 一份 力
VP-C(x0:VBN x1:SG-C) → x0 x1
VP(TO(to) x0:VP-C) → x0
…
S(x0:NP-C x1:VP) → x0 x1
[aligned to the English parse tree of “i felt obliged to do my part”, with nodes S, NP-C, VP, VP-C, VBD, SG-C, VBN, TO, VB, NPB, PRP, PRP$, NN]
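The slide lists the extracted rules but not the extraction procedure; as a rough sketch of the key step in Galley et al.'s method, the code below computes frontier nodes, the tree nodes at which rules may start and stop, for a tiny made-up (tree, alignment) pair of my own. A node counts as a frontier node here when it covers at least one foreign word and none of the foreign words in its span are also aligned to English words outside the node (skipping unaligned nodes is a simplification).

```python
# Sketch of the frontier-node computation behind Galley et al.-style rule extraction,
# on a made-up toy example (not the slide's actual sentence pair or alignment).
# Tree: nested tuples (label, children...); leaves are (english_index, word).
TREE = ("NP",
        ("NP", ("NNS", (0, "astronauts"))),
        ("PP", ("IN", (1, "from")), ("NP", ("NNP", (2, "France")))))
# Word alignment as (english_index, foreign_index) pairs; the foreign side has 3 words.
ALIGN = {(0, 2), (2, 0)}

def eng_indices(node):
    if isinstance(node[0], int):                 # leaf: (english_index, word)
        return {node[0]}
    return set().union(*(eng_indices(c) for c in node[1:]))

def walk(node, out):
    """Visit every internal node, recording whether it is a frontier node."""
    if isinstance(node[0], int):
        return out
    inside = eng_indices(node)
    f_span = {f for (e, f) in ALIGN if e in inside}
    ok = bool(f_span)                            # unaligned nodes skipped (simplification)
    if ok:
        for f in range(min(f_span), max(f_span) + 1):    # closure of the foreign span
            aligned_e = {e for (e, ff) in ALIGN if ff == f}
            if aligned_e and not aligned_e <= inside:    # claimed by outside English words
                ok = False
    out.append((node[0], sorted(inside), ok))
    for child in node[1:]:
        walk(child, out)
    return out

for label, span, ok in walk(TREE, []):
    print(label, span, "frontier" if ok else "non-frontier")   # only IN ("from") is non-frontier
```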

27 Syntax-Based Decoding
Input: 这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .

28 Syntax-Based Decoding
Lexical rules apply to the Chinese words:
RULE 1: DT(these) → 这
RULE 2: VBP(include) → 中包括
RULE 4: NNP(France) → 法国
RULE 5: CC(and) → 和
RULE 6: NNP(Russia) → 俄罗斯
RULE 8: NP(NNS(astronauts)) → 宇航, 员
RULE 9: PUNC(.) → .
Partial translations: “these” “include” “France” “and” “Russia” “astronauts” “.”

29 Syntax-Based Decoding
RULE 13: NP(x0:NNP, x1:CC, x2:NNP) → x0, x1, x2
→ “France and Russia”

30 Syntax-Based Decoding
RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) → 来自, x0
→ “coming from France and Russia”

31 Syntax-Based Decoding
RULE 16: NP(x0:NP, x1:VP) → x1, 的, x0
→ “astronauts coming from France and Russia”

32 Syntax-Based Decoding
RULE 14: VP(x0:VBP, x1:NP) → x0, x1
→ “include astronauts coming from France and Russia”

33 Syntax-Based Decoding
RULE 10: NP(x0:DT, CD(7), NNS(people)) → x0, 7 人
RULE 15: S(x0:NP, x1:VP, x2:PUNC) → x0, x1, x2
→ “these 7 people”, “These 7 people include astronauts coming from France and Russia”

34 Derived English Tree
[Parse tree of “These 7 people include astronauts coming from France and Russia .”, with preterminals DT, CD, NNS, VBP, NNS, VBG, IN, NNP, CC, NNP, PUNC and phrases NP, PP, VP, NP, NP, VP, S]
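One way to read slides 27-34 as code: the decoder's output is a nesting of the numbered rules, and evaluating each rule's English-side template bottom-up yields the translation. The derivation structure and the templates below paraphrase the rules on the slides; the lambda encoding itself is my own sketch, not the system's implementation.

```python
# Sketch: evaluating the English side of the derivation built on slides 27-33.
# Each rule's target side is a template over its child translations (x0, x1, ...).
RULES = {
    1:  lambda: "these",                 # RULE 1:  DT(these) → 这
    2:  lambda: "include",               # RULE 2:  VBP(include) → 中包括
    4:  lambda: "France",                # RULE 4:  NNP(France) → 法国
    5:  lambda: "and",                   # RULE 5:  CC(and) → 和
    6:  lambda: "Russia",                # RULE 6:  NNP(Russia) → 俄罗斯
    8:  lambda: "astronauts",            # RULE 8:  NP(NNS(astronauts)) → 宇航, 员
    9:  lambda: ".",                     # RULE 9:  PUNC(.) → .
    10: lambda x0: f"{x0} 7 people",     # RULE 10: NP(x0:DT, CD(7), NNS(people)) → x0, 7 人
    11: lambda x0: f"coming from {x0}",  # RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) → 来自, x0
    13: lambda x0, x1, x2: f"{x0} {x1} {x2}",   # RULE 13: NP(x0:NNP, x1:CC, x2:NNP)
    14: lambda x0, x1: f"{x0} {x1}",            # RULE 14: VP(x0:VBP, x1:NP)
    15: lambda x0, x1, x2: f"{x0} {x1} {x2}",   # RULE 15: S(x0:NP, x1:VP, x2:PUNC)
    16: lambda x0, x1: f"{x0} {x1}",            # RULE 16: NP(x0:NP, x1:VP) (Chinese side: x1 的 x0)
}

def evaluate(derivation):
    """Derivation node = (rule id, child derivations); returns the English string."""
    rule_id, children = derivation
    return RULES[rule_id](*(evaluate(c) for c in children))

# The derivation from the slides: RULE 15 at the root.
derivation = (15, [
    (10, [(1, [])]),
    (14, [(2, []), (16, [(8, []), (11, [(13, [(4, []), (5, []), (6, [])])])])]),
    (9, []),
])
print(evaluate(derivation))
# these 7 people include astronauts coming from France and Russia .
```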

35 General-Purpose Algorithms for Tree Automata
(string automata algorithms vs. tree automata algorithms)
N-best: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998) / … trees in a weighted forest (Jiménez & Marzal, 2000; Huang & Chiang, 2005)
EM training: forward-backward EM (Baum & Welch, 1971; Eisner, 2003) / tree transducer EM training (Graehl & Knight, 2004)
Determinization: … of weighted string acceptors (Mohri, 1997) / … of weighted tree acceptors (Borchardt & Vogler, 2003; May & Knight, 2005)
Intersection: WFSA intersection / tree acceptor intersection
Applying transducers: a string applied to a WFST yields a WFSA / a tree applied to a tree transducer yields a weighted tree acceptor
Transducer composition: WFST composition (Pereira & Riley, 1996) / many tree transducers not closed under composition (Maletti et al., 09)
General-purpose tools: Carmel, OpenFST / Tiburon (May & Knight, 10)
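As a tree-side counterpart to the string Viterbi sketch earlier, here is a toy "best tree in a weighted forest" computation, the problem the Jiménez & Marzal and Huang & Chiang entries address; the forest below is made up for illustration and is not from the talk.

```python
# Toy "best tree in a weighted forest" (illustrative; the forest is made up).
# A forest maps each nonterminal to its alternative ways of being built:
# (weight, label, list of children). Children not in the forest are terminal words.
FOREST = {
    "S":  [(1.0, "S",  ["NP", "VP"])],
    "NP": [(0.7, "NP", ["he"]), (0.3, "NP", ["him"])],
    "VP": [(0.6, "VP", ["sleeps"]), (0.4, "VP", ["snores"])],
}

def best_tree(symbol):
    """Viterbi over the hypergraph: best (weight, tree) derivable from `symbol`."""
    if symbol not in FOREST:                      # terminal word
        return 1.0, symbol
    best = (0.0, None)
    for weight, label, children in FOREST[symbol]:
        w, subtrees = weight, []
        for c in children:
            cw, ct = best_tree(c)
            w *= cw
            subtrees.append(ct)
        if w > best[0]:
            best = (w, (label, *subtrees))
    return best

print(best_tree("S"))
# (0.42, ('S', ('NP', 'he'), ('VP', 'sleeps')))
```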

36 Machine Translation
Phrase-based MT: source string → target string
Syntax-based MT: source string → source tree → target tree → target string
Meaning-based MT: source string → meaning representation → target string

37 Five Equivalent Meaning Representation Formats
“The boy wants to go.”
PENMAN:
(w / WANT :agent (b / BOY) :patient (g / GO :agent b))
LOGICAL FORM:
∃ w, b, g : instance(w, WANT) ∧ instance(g, GO) ∧ instance(b, BOY) ∧ agent(w, b) ∧ patient(w, g) ∧ agent(g, b)
PATH EQUATIONS:
(x0 instance) = WANT
(x1 instance) = BOY
(x2 instance) = GO
(x0 agent) = x1
(x0 patient) = x2
(x2 agent) = x1
FEATURE STRUCTURE:
[instance: WANT, agent: [1][instance: BOY], patient: [instance: GO, agent: [1]]]
DIRECTED ACYCLIC GRAPH:
[nodes WANT, BOY, GO; edges WANT -agent-> BOY, WANT -patient-> GO, GO -agent-> BOY, plus instance labels]
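To make the equivalence concrete, here is a small parser of my own (not part of the talk) that turns the PENMAN string above into the instance/role triples of the logical form; it handles only this fragment of the notation: variables, "/", ":role", nesting, and re-entrant variable references.

```python
import re

# Sketch: turn the PENMAN form above into instance/role triples (the logical form).
PENMAN = "(w / WANT :agent (b / BOY) :patient (g / GO :agent b))"

def parse(tokens):
    """Parse one (var / CONCEPT :role value ...) form; return its variable and triples."""
    assert tokens.pop(0) == "("
    var = tokens.pop(0)
    assert tokens.pop(0) == "/"
    triples = [("instance", var, tokens.pop(0))]
    while tokens[0] != ")":
        role = tokens.pop(0).lstrip(":")
        if tokens[0] == "(":
            child, child_triples = parse(tokens)
            triples += child_triples
        else:
            child = tokens.pop(0)          # re-entrant reference to an existing variable
        triples.append((role, var, child))
    tokens.pop(0)                          # consume ")"
    return var, triples

tokens = re.findall(r"[()]|[^\s()]+", PENMAN)
root, triples = parse(tokens)
for rel, a, b in triples:
    print(f"{rel}({a}, {b})")
# prints the six triples of the logical form above (instance, agent, patient relations)
```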

38 Example
“Government forces closed on rebel outposts on Thursday, showering the western mountain city of Zintan with missiles and attacking insurgents holed up near the Tunisian border, according to rebel sources.”
(s / say
  :agent (s2 / source :mod (r / rebel))
  :patient (a / and
    :op1 (c / close-on
      :agent (f / force :mod (g / government))
      :patient (o / outpost :mod (r2 / rebel))
      :temporal-locating (t / thursday))
    :op2 (s / shower
      :agent f
      :patient (c / city :mod (m / mountain) :mod (w / west) :name "Zintan")
      :instrument (m2 / missile))
    :op3 (a / attack
      :agent f
      :patient (i / insurgent
        :agent-of (h / hole-up
          :pp-near (b / border :gpi (c2 / country :name "Tunisia"]
Slogan: “more logical than a parse tree”

39 General-Purpose Algorithms for Feature Structures (Graphs)
String automata algorithms and tree automata algorithms (the table from slide 35), with a third column asking: graph automata algorithms?
For N-best, EM training, determinization, intersection, applying transducers, transducer composition, and general-purpose tools, the string and tree columns are filled in; the graph column is open.

40 Automata Frameworks
Hyperedge-replacement graph grammars (Drewes et al.)
DAG acceptors (Hart 75)
DAG-to-tree transducers (Kamimura & Slutski 82)

41 Mapping Between Meaning and Text
“the boy wants to see” ↔ graph: WANT (agent: BOY, patient: SEE (agent: BOY)) ↔ foreign text

42 Mapping Between Meaning and Text
“the boy wants to be seen” ↔ graph: WANT (agent: BOY, patient: SEE (patient: BOY)) ↔ foreign text

43 Mapping Between Meaning and Text
“the boy wants the girl to be seen” ↔ graph: WANT (agent: BOY, patient: SEE (patient: GIRL)) ↔ foreign text

44 Mapping Between Meaning and Text
“the boy wants to see the girl” ↔ graph: WANT (agent: BOY, patient: SEE (agent: BOY, patient: GIRL)) ↔ foreign text

45 Mapping Between Meaning and Text
“the boy wants to see himself” ↔ graph: WANT (agent: BOY, patient: SEE (agent: BOY, patient: BOY)) ↔ foreign text
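The five graphs above differ only in which agent/patient edges SEE carries and whether they point back at the BOY node. The toy sketch below, with hand-written rules of my own rather than anything from the speaker's system, shows that those structural differences alone are enough to pick out the five English realizations:

```python
# Sketch: selecting among the slides' English realizations purely from graph shape.
# Graph: node id -> {"concept": ..., role: target node id, ...}.
def realize(graph):
    want = graph["w"]                      # assumes the root WANT node is called "w"
    boy = want["agent"]
    see = graph[want["patient"]]
    agent, patient = see.get("agent"), see.get("patient")
    if agent == boy and patient == boy:
        return "the boy wants to see himself"
    if agent == boy and patient is None:
        return "the boy wants to see"
    if agent == boy:
        return "the boy wants to see the " + graph[patient]["concept"].lower()
    if patient == boy:
        return "the boy wants to be seen"
    return "the boy wants the " + graph[patient]["concept"].lower() + " to be seen"

BASE = {"w": {"concept": "WANT", "agent": "b", "patient": "g"},
        "b": {"concept": "BOY"}}

print(realize({**BASE, "g": {"concept": "SEE", "agent": "b"}}))                  # the boy wants to see
print(realize({**BASE, "g": {"concept": "SEE", "patient": "b"}}))                # the boy wants to be seen
print(realize({**BASE, "g": {"concept": "SEE", "patient": "x"}, "x": {"concept": "GIRL"}}))
print(realize({**BASE, "g": {"concept": "SEE", "agent": "b", "patient": "x"}, "x": {"concept": "GIRL"}}))
print(realize({**BASE, "g": {"concept": "SEE", "agent": "b", "patient": "b"}}))  # the boy wants to see himself
```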

46 Bottom-Up DAG-to-Tree Transduction (Kamimura & Slutski 82)
Input DAG: PROMISE with agent BOY, recipient GIRL, patient GO; GO's agent is the same BOY node (re-entrant).

47-49 The leaves are rewritten bottom-up: GIRL becomes state q.NN carrying the tree NN(girl); BOY becomes q.NN carrying NN(boy).

50-51 GO is rewritten to q.VBZ carrying VBZ(goes), or alternatively to q.VB carrying VB(go).

52 PROMISE is rewritten to q.VBD carrying VBD(promised).

53 The NN trees are extended to full noun phrases: NP(DT(the), NN(boy)) and NP(DT(the), NN(girl)).

54 The re-entrant BOY node feeds two edges, so its NP is copied: one copy in state q.subj (the future subject) and one in state q.agt (the agent of the embedded GO).

55-56 The patient GO combines with its agent copy into the infinitival tree VP(TO(to), VB(go)), ending in state qpatsubj.VP.

57 The recipient GIRL's NP moves to state q.rec.

58 The root rule assembles the final output tree:
S(NP(DT(the), NN(boy)), VP(VBD(promised), NP(DT(the), NN(girl)), VP(TO(to), VB(go))))
i.e. “the boy promised the girl to go.”

59 The input DAG and the final output tree, shown side by side.

60 The same DAG with PERSUADE in place of PROMISE, assembled with the infinitival complement VP(TO(to), VB(go)).

61 With PERSUADE, the patient is instead realized as a finite clause: “the boy persuaded the girl that he would go.”
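A rough sketch of the bottom-up idea in code, under toy assumptions of my own rather than the actual transducer rules: walk the DAG from the leaves, attach partial English trees to nodes, and use the re-entrant agent edge of the embedded GO node to decide between the infinitival complement ("to go") and the finite one ("that he would go") on the last two slides.

```python
# Sketch (toy assumptions) of a bottom-up DAG-to-tree mapping for the
# PROMISE / PERSUADE examples. DAG: node id -> {"concept": ..., role: node id, ...}.
def np(noun):
    return ("NP", ("DT", "the"), ("NN", noun))

def transduce(dag, node_id):
    """Build an English tree for the DAG rooted at node_id; re-entrant nodes get copied."""
    node = dag[node_id]
    concept = node["concept"]
    if concept in ("BOY", "GIRL"):
        return np(concept.lower())
    if concept in ("PROMISE", "PERSUADE"):
        verb = "promised" if concept == "PROMISE" else "persuaded"
        go = dag[node["patient"]]
        # The embedded GO node's agent edge points back at an existing node (re-entrancy).
        if concept == "PROMISE" and go.get("agent") == node["agent"]:
            complement = ("VP", ("TO", "to"), ("VB", "go"))   # subject control: "to go"
        else:
            complement = ("SBAR", "that he would go")         # cf. the persuade slide; "he" hardcoded
        return ("S", transduce(dag, node["agent"]),
                ("VP", ("VBD", verb),
                       transduce(dag, node["recipient"]),
                       complement))
    raise ValueError(f"no rule for {concept}")

DAG = {"p": {"concept": "PROMISE", "agent": "b", "recipient": "g2", "patient": "go"},
       "b": {"concept": "BOY"}, "g2": {"concept": "GIRL"},
       "go": {"concept": "GO", "agent": "b"}}
print(transduce(DAG, "p"))                                              # tree for "the boy promised the girl to go"
print(transduce({**DAG, "p": {**DAG["p"], "concept": "PERSUADE"}}, "p"))  # tree for "... persuaded ... that he would go"
```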

62 General-Purpose Algorithms for Feature Structures
(The same table as slide 39: string and tree automata algorithms are in place for N-best, EM training, determinization, intersection, applying transducers, transducer composition, and general-purpose tools; the graph automata column remains open.)

63 Automata for Statistical MT
[Timeline chart comparing when each automata framework was devised (1960s onward) with when it was used in statistical MT (1990s onward): FST, top-down tree transducers (TDTT), DAG automata.]

64 end

