Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint.

Similar presentations


Presentation on theme: "A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint."— Presentation transcript:

1 A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint Work with: Lori Levin, Jaime Carbonell, Katharina Probst, Erik Peterson, Stephan Vogel and Ariadna Font-Llitjos

2 Mar 03, 2004Haifa Univ CS Seminar2 Machine Translation: History MT started in 1950’s, first practical application of computers Failed miserably, recognized as an extremely difficult, “AI-complete” problem MT Revival started in earnest in 1980s Rule-based approaches, requiring 100s of K- years of manual development Economic incentive for developing MT systems for small number of language pairs (mostly European languages)

3 Mar 03, 2004Haifa Univ CS Seminar3 Machine Translation: Where are we today? Age of Internet and Globalization – great demand for MT: –multiple official languages of UN, EU, Canada, (Israel?), etc. –Documentation dissemination for large manufacturers (Microsoft, IBM, Caterpillar) Economic incentive is still primarily within a small number of language pairs Some fairly good commercial products in the market for these language pairs

4 Mar 03, 2004Haifa Univ CS Seminar4 Mi chiamo Alon LavieMy name is Alon Lavie Give-information+personal-data (name=alon_lavie) [ s [ vp accusative_pronoun “chiamare” proper_name]] [ s [ np [possessive_pronoun “name”]] [ vp “be” proper_name]] Direct Transfer Interlingua Analysis Generation Approaches to MT: Vaquois MT Triangle

5 Mar 03, 2004Haifa Univ CS Seminar5 Statistical MT (SMT) Proposed by IBM in early 1990s: a direct, purely statistical, model for MT Statistical translation models are trained on a sentence-aligned translation corpus Attractive: completely automatic, no manual rules, much reduced manual labor Main drawbacks: –Effective only with huge volumes (several mega- words) of parallel text –Very domain-sensitive –Still viable only for small number of language pairs! Impressive progress in last 3-4 years due to large DARPA funding program (TIDES)

6 Mar 03, 2004Haifa Univ CS Seminar6 Why Machine Translation for Minority and Indigenous Languages? Commercial MT economically feasible for only a handful of major languages with large resources (corpora, human developers) Is there hope for MT for languages with limited resources? Benefits include: –Better government access to indigenous communities (Epidemics, crop failures, etc.) –Better indigenous communities participation in information-rich activities (health care, education, government) without giving up their languages. –Language preservation –Civilian and military applications (disaster relief)

7 Mar 03, 2004Haifa Univ CS Seminar7 MT for Minority and Indigenous Languages: Challenges Minimal amount of parallel text Possibly competing standards for orthography/spelling Often relatively few trained linguists Access to native informants possible Need to minimize development time and cost

8 Mar 03, 2004Haifa Univ CS Seminar8 AVENUE Partners LanguageCountryInstitutions Mapudungun (in place) Chile Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education Quechua (discussion) Peru Ministry of Education Iñupiaq (discussion) US (Alaska) Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center Aymara (discussion) Bolivia, Peru Ministry of Education

9 Mar 03, 2004Haifa Univ CS Seminar9 AVENUE: Two Technical Approaches Generalized EBMT Parallel text 50K- 2MB (uncontrolled corpus) Rapid implementation Proven for major L’s with reduced data Transfer-rule learning Elicitation (controlled) corpus to extract grammatical properties Seeded version- space learning

10 Mar 03, 2004Haifa Univ CS Seminar10 Transfer with Strong Decoding Learning Module Transfer Rules {PP,4894} ;;Score:0.0470 PP::PP [NP POSTP] -> [PREP NP] ((X2::Y1) (X1::Y2)) Translation Lexicon Run Time Transfer System Lattice Decoder English Language Model Word-to-Word Translation Probabilities Word-aligned elicited data

11 Mar 03, 2004Haifa Univ CS Seminar11 Learning Transfer-Rules for Languages with Limited Resources Rationale: –Large bilingual corpora not available –Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using elicitation tool –Elicitation corpus designed to be typologically comprehensive and compositional –Transfer-rule engine and new learning approach support acquisition of generalized transfer-rules from the data

12 Mar 03, 2004Haifa Univ CS Seminar12 English-Chinese Example

13 Mar 03, 2004Haifa Univ CS Seminar13 English-Hindi Example

14 Mar 03, 2004Haifa Univ CS Seminar14 Spanish-Mapudungun Example

15 Mar 03, 2004Haifa Univ CS Seminar15 English-Arabic Example

16 Mar 03, 2004Haifa Univ CS Seminar16 The Elicitation Corpus Translated, aligned by bilingual informant Corpus consists of linguistically diverse constructions Based on elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992) Organized compositionally: elicit simple structures first, then use them as building blocks Goal: minimize size, maximize linguistic coverage

17 Mar 03, 2004Haifa Univ CS Seminar17 Transfer Rule Formalism Type information Part-of-speech/constituent information Alignments x-side constraints y-side constraints xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)) ; SL: the old man, TL: ha-ish ha-zaqen NP::NP [DET ADJ N] -> [DET N DET ADJ] ( (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) ((X1 AGR) = *3-SING) ((X1 DEF = *DEF) ((X3 AGR) = *3-SING) ((X3 COUNT) = +) ((Y1 DEF) = *DEF) ((Y3 DEF) = *DEF) ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y4 GENDER)) )

18 Mar 03, 2004Haifa Univ CS Seminar18 Transfer Rule Formalism (II) Value constraints Agreement constraints ;SL: the old man, TL: ha-ish ha-zaqen NP::NP [DET ADJ N] -> [DET N DET ADJ] ( (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) ((X1 AGR) = *3-SING) ((X1 DEF = *DEF) ((X3 AGR) = *3-SING) ((X3 COUNT) = +) ((Y1 DEF) = *DEF) ((Y3 DEF) = *DEF) ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y4 GENDER)) )

19 Mar 03, 2004Haifa Univ CS Seminar19 The Transfer Engine Analysis Source text is parsed into its grammatical structure. Determines transfer application ordering. Example: 他 看 书。 (he read book) S NP VP N V NP 他 看 书 Transfer A target language tree is created by reordering, insertion, and deletion. S NP VP N V NP he read DET N a book Article “a” is inserted into object NP. Source words translated with transfer lexicon. Generation Target language constraints are checked and final translation produced. E.g. “reads” is chosen over “read” to agree with “he”. Final translation: “He reads a book”

20 Mar 03, 2004Haifa Univ CS Seminar20 Rule Learning - Overview Goal: Acquire Syntactic Transfer Rules Use available knowledge from the source side (grammatical structure) Three steps: 1.Flat Seed Generation: first guesses at transfer rules; flat syntactic structure 2.Compositionality: use previously learned rules to add hierarchical structure 3.Seeded Version Space Learning: refine rules by learning appropriate feature constraints

21 Mar 03, 2004Haifa Univ CS Seminar21 Flat Seed Rule Generation Learning Example: NP Eng: the big apple Heb: ha-tapuax ha-gadol Generated Seed Rule: NP::NP [ART ADJ N]  [ART N ART ADJ] ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

22 Mar 03, 2004Haifa Univ CS Seminar22 Flat Seed Generation Create a transfer rule that is specific to the sentence pair, but abstracted to the POS level. No syntactic structure. ElementSource SL POS sequencef-structure TL POS sequenceTL dictionary, aligned SL words Type informationcorpus, same on SL and TL Alignmentsinformant x-side constraintsf-structure y-side constraintsTL dictionary, aligned SL words (list of projecting features)

23 Mar 03, 2004Haifa Univ CS Seminar23 Compositionality Initial Flat Rules: S::S [ART ADJ N V ART N]  [ART N ART ADJ V P ART N] ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8)) NP::NP [ART ADJ N]  [ART N ART ADJ] ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)) NP::NP [ART N]  [ART N] ((X1::Y1) (X2::Y2)) Generated Compositional Rule: S::S [NP V NP]  [NP V P NP] ((X1::Y1) (X2::Y2) (X3::Y4))

24 Mar 03, 2004Haifa Univ CS Seminar24 Compositionality - Overview Traverse the c-structure of the English sentence, add compositional structure for translatable chunks Adjust constituent sequences, alignments Remove unnecessary constraints, i.e. those that are contained in the lower- level rule

25 Mar 03, 2004Haifa Univ CS Seminar25 Seeded Version Space Learning Input: Rules and their Example Sets S::S [NP V NP]  [NP V P NP] {ex1,ex12,ex17,ex26} ((X1::Y1) (X2::Y2) (X3::Y4)) NP::NP [ART ADJ N]  [ART N ART ADJ] {ex2,ex3,ex13} ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)) NP::NP [ART N]  [ART N] {ex4,ex5,ex6,ex8,ex10,ex11} ((X1::Y1) (X2::Y2)) Output: Rules with Feature Constraints: S::S [NP V NP]  [NP V P NP] ((X1::Y1) (X2::Y2) (X3::Y4) (X1 NUM = X2 NUM) (Y1 NUM = Y2 NUM) (X1 NUM = Y1 NUM))

26 Mar 03, 2004Haifa Univ CS Seminar26 Seeded Version Space Learning: Overview Goal: add appropriate feature constraints to the acquired rules Methodology: –Preserve general structural transfer –Learn specific feature constraints from example set Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments) Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary The seed rules in a group form the specific boundary of a version space The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints

27 Mar 03, 2004Haifa Univ CS Seminar27 Seeded Version Space Learning: Generalization The partial order of the version space: Definition: A transfer rule tr 1 is strictly more general than another transfer rule tr 2 if all f- structures that are satisfied by tr 2 are also satisfied by tr 1. Generalize rules by merging them: –Deletion of constraint –Raising two value constraints to an agreement constraint, e.g. ((x1 num) = *pl), ((x3 num) = *pl)  ((x1 num) = (x3 num))

28 Mar 03, 2004Haifa Univ CS Seminar28 Seeded Version Space Learning NP v det nNP VP … 1.Group seed rules into version spaces as above. 2.Make use of partial order of rules in version space. Partial order is defined via the f-structures satisfying the constraints. 3.Generalize in the space by repeated merging of rules: 1.Deletion of constraint 2.Moving value constraints to agreement constraints, e.g. ((x1 num) = *pl), ((x3 num) = *pl)  ((x1 num) = (x3 num) 4. Check translation power of generalized rules against sentence pairs

29 Mar 03, 2004Haifa Univ CS Seminar29 Seeded Version Space Learning: The Search The Seeded Version Space algorithm itself is the repeated generalization of rules by merging A merge is successful if the set of sentences that can correctly be translated with the merged rule is a superset of the union of sets that can be translated with the unmerged rules, i.e. check power of rule Merge until no more successful merges

30 Mar 03, 2004Haifa Univ CS Seminar30 Seeded VSL: Some Open Issues Three types of constraints: –X-side constrain applicability of rule –Y-side assist in generation –X-Y transfer features from SL to TL Which of the three types improves translation performance? –Use rules without features to populate lattice, decoder will select the best translation… –Learn only X-Y constraints, based on list of universal projecting features Other notions of version-spaces of feature constraints: –Current feature learning is specific to rules that have identical transfer components –Important issue during transfer is to disambiguate among rules that have same SL side but different TL side – can we learn effective constraints for this?

31 Mar 03, 2004Haifa Univ CS Seminar31 Examples of Learned Rules (Hindi-to-English) {NP,14244} ;;Score:0.0429 NP::NP [N] -> [DET N] ( (X1::Y2) ) {NP,14434} ;;Score:0.0040 NP::NP [ADJ CONJ ADJ N] -> [ADJ CONJ ADJ N] ( (X1::Y1) (X2::Y2) (X3::Y3) (X4::Y4) ) {PP,4894} ;;Score:0.0470 PP::PP [NP POSTP] -> [PREP NP] ( (X2::Y1) (X1::Y2) )

32 Mar 03, 2004Haifa Univ CS Seminar32 Manual Transfer Rules: Hindi Example ;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB ;; passive of 43 (7b) {VP,28} VP::VP : [V V V] -> [Aux V] ( (X1::Y2) ((x1 form) = root) ((x2 type) =c light) ((x2 form) = part) ((x2 aspect) = perf) ((x3 lexwx) = 'jAnA') ((x3 form) = part) ((x3 aspect) = perf) (x0 = x1) ((y1 lex) = be) ((y1 tense) = past) ((y1 agr num) = (x3 agr num)) ((y1 agr pers) = (x3 agr pers)) ((y2 form) = part) )

33 Mar 03, 2004Haifa Univ CS Seminar33 Manual Transfer Rules: Example ; NP1 ke NP2 -> NP2 of NP1 ; Ex: jIvana ke eka aXyAya ; life of (one) chapter ; ==> a chapter of life ; {NP,12} NP::NP : [PP NP1] -> [NP1 PP] ( (X1::Y2) (X2::Y1) ; ((x2 lexwx) = 'kA') ) {NP,13} NP::NP : [NP1] -> [NP1] ( (X1::Y1) ) {PP,12} PP::PP : [NP Postp] -> [Prep NP] ( (X1::Y2) (X2::Y1) ) NP PP NP1 NP P Adj N N1 ke eka aXyAya N jIvana NP NP1 PP Adj N P NP one chapter of N1 N life

34 Mar 03, 2004Haifa Univ CS Seminar34 A Limited Data Scenario for Hindi-to-English Conducted during a DARPA “Surprise Language Exercise” (SLE) in June 2003 Put together a scenario with “miserly” data resources: –Elicited Data corpus: 17589 phrases –Cleaned portion (top 12%) of LDC dictionary: ~2725 Hindi words (23612 translation pairs) –Manually acquired resources during the SLE: 500 manual bigram translations 72 manually written phrase transfer rules 105 manually written postposition rules 48 manually written time expression rules No additional parallel text!!

35 Mar 03, 2004Haifa Univ CS Seminar35 Manual Grammar Development Covers mostly NPs, PPs and VPs (verb complexes) ~70 grammar rules, covering basic and recursive NPs and PPs, verb complexes of main tenses in Hindi (developed in two weeks)

36 Mar 03, 2004Haifa Univ CS Seminar36 Adding a “Strong” Decoder XFER system produces a full lattice of translation fragments, ranging from single words to long phrases or sentences Edges are scored using word-to-word translation probabilities, trained from the limited bilingual data Decoder uses an English LM (70m words) Decoder can also reorder words or phrases (up to 4 positions ahead) For XFER (strong), ONLY edges from basic XFER system are used!

37 Mar 03, 2004Haifa Univ CS Seminar37 Testing Conditions Tested on section of JHU provided data: 258 sentences with four reference translations –SMT system (stand-alone) –EBMT system (stand-alone) –XFER system (naïve decoding) –XFER system with “strong” decoder No grammar rules (baseline) Manually developed grammar rules Automatically learned grammar rules –XFER+SMT with strong decoder (MEMT)

38 Mar 03, 2004Haifa Univ CS Seminar38 Automatic MT Evaluation Metrics Intends to replace or complement human assessment of translation quality of MT produced translation Principle idea: compare how similar is the MT produced translation with human translation(s) of the same input Main metric in use today: IBM’s BLEU –Count n-gram (unigrams, bigrams, trigrams, etc) overlap between the MT output and several reference translations –Calculate a combined n-gram precision score NIST variant of BLEU used for official DARPA evaluations

39 Mar 03, 2004Haifa Univ CS Seminar39 Results on JHU Test Set SystemBLEUM-BLEUNIST EBMT0.0580.1654.22 SMT0.0930.1914.64 XFER (naïve) man grammar 0.0550.1774.46 XFER (strong) no grammar 0.1090.2245.29 XFER (strong) learned grammar 0.1160.2315.37 XFER (strong) man grammar 0.1350.2435.59 XFER+SMT0.1360.2435.65

40 Mar 03, 2004Haifa Univ CS Seminar40 Effect of Reordering in the Decoder

41 Mar 03, 2004Haifa Univ CS Seminar41 Observations and Lessons (I) XFER with strong decoder outperformed SMT even without any grammar rules in the miserly data scenario –SMT Trained on elicited phrases that are very short –SMT has insufficient data to train more discriminative translation probabilities –XFER takes advantage of Morphology Token coverage without morphology: 0.6989 Token coverage with morphology: 0.7892 Manual grammar currently somewhat better than automatically learned grammar –Learned rules did not yet use version-space learning –Large room for improvement on learning rules –Importance of effective well-founded scoring of learned rules

42 Mar 03, 2004Haifa Univ CS Seminar42 Observations and Lessons (II) MEMT (XFER and SMT) based on strong decoder produced best results in the miserly scenario. Reordering within the decoder provided very significant score improvements –Much room for more sophisticated grammar rules –Strong decoder can carry some of the reordering “burden”

43 Mar 03, 2004Haifa Univ CS Seminar43 XFER MT for Hebrew-to-English Two month intensive effort to apply our XFER approach to the development of a Hebrew-to- English MT system Challenges: –No large parallel corpus –Only limited coverage translation lexicon –Morphology: incomplete analyzer available Plan: –Collect available resources, establish methodology for processing Hebrew input –Translate and align Elicitation Corpus –Learn XFER rules –Develop (small) manual XFER grammar as a point of comparison –Evaluate performance on unseen test data using automatic evaluation metrics

44 Mar 03, 2004Haifa Univ CS Seminar44 Hebrew-to-English XFER System First end-to-end integration of system completed yesterday (March 2 nd ) No transfer rules yet, just word-to-word Hebrew-to-English translation No strong decoding yet Amusing Example: office brains the government crack H$BW& in committee the elections the central et the possibility conduct poll crowd about TWKNIT the NSIGH from goat

45 Mar 03, 2004Haifa Univ CS Seminar45 Conclusions Transfer rules (both manual and learned) offer significant contributions that can complement existing data-driven approaches –Also in medium and large data settings? Initial steps to development of a statistically grounded transfer-based MT system with: –Rules that are scored based on a well-founded probability model –Strong and effective decoding that incorporates the most advanced techniques used in SMT decoding Working from the “opposite” end of research on incorporating models of syntax into “standard” SMT systems [Knight et al] Our direction makes sense in the limited data scenario

46 Mar 03, 2004Haifa Univ CS Seminar46 Future Directions Continued work on automatic rule learning (especially Seeded Version Space Learning) Improved leveraging from manual grammar resources, interaction with bilingual speakers Developing a well-founded model for assigning scores (probabilities) to transfer rules Improving the strong decoder to better fit the specific characteristics of the XFER model MEMT with improved –Combination of output from different translation engines with different scorings – strong decoding capabilities


Download ppt "A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint."

Similar presentations


Ads by Google