A Trainable Transfer-based MT Approach for Languages with Limited Resources
Alon Lavie, Language Technologies Institute, Carnegie Mellon University
Joint work with: Lori Levin, Jaime Carbonell, Katharina Probst, Erik Peterson, Stephan Vogel and Ariadna Font-Llitjos

26 April, 2004, EAMT Meeting, Malta

Why Machine Translation for Languages with Limited Resources?
Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)
Statistical MT looks promising, but requires very large volumes of parallel text
Is there hope for MT for languages with limited electronic data resources?
Benefits include:
  – Better access to government information for indigenous communities
  – Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
  – Civilian and military applications (disaster relief)
  – Language preservation

MT for Languages with Limited Resources: Challenges
Minimal amount of parallel text
Possible lack of standards for orthography/spelling
Often relatively few trained linguists
Access to native bilingual informants is possible
No real economic incentive
Limited financial resources for developing MT
  – Need to minimize development time and cost

AVENUE Partners
Language (status)        Country         Institutions
Mapudungun (in place)    Chile           Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (discussion)     Peru            Ministry of Education
Aymara (discussion)      Bolivia, Peru   Ministry of Education

AVENUE: Two Technical Approaches
Generalized EBMT
  – Parallel text 50K-2MB (uncontrolled corpus)
  – Rapid implementation
  – Proven for major languages with reduced data
Transfer-rule learning
  – Elicitation (controlled) corpus to extract grammatical properties
  – Seeded version-space learning

Transfer with Strong Decoding
[Architecture diagram. Components: Learning Module, Transfer Rules, Translation Lexicon, Run Time Transfer System, Lattice Decoder, English Language Model, Word-to-Word Translation Probabilities. Inputs: word-aligned elicited data, SL input. Output: TL output.]
Sample transfer rule shown in the diagram:
  {PP,4894} ;;Score:0.0470
  PP::PP [NP POSTP] -> [PREP NP]
  ((X2::Y1) (X1::Y2))

Learning Transfer-Rules for Languages with Limited Resources
Rationale:
  – Large bilingual corpora are not available
  – Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using an elicitation tool
  – The elicitation corpus is designed to be typologically comprehensive and compositional
  – The transfer-rule engine and a new learning approach support acquisition of generalized transfer rules from the data

English-Hindi Example

Spanish-Mapudungun Example

English-Arabic Example

The Elicitation Corpus
Translated and aligned by a bilingual informant
Rich information about the sentences elicited
Corpus consists of linguistically diverse constructions
Based on elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992)
Organized compositionally: elicit simple structures first, then use them as building blocks
Goal: minimize size, maximize linguistic coverage
Typological EC currently contains about 1000 sentences
Work in progress:
  – Feature detection
  – Navigation control through the corpus during elicitation
  – Extensions to phenomena not currently covered
  – Experimenting with alternative types of elicited data

Transfer Rule Formalism
A rule consists of:
  – Type information
  – Part-of-speech/constituent information
  – Alignments
  – x-side constraints
  – y-side constraints
  – xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)

Transfer Rule Formalism (II)
Constraints come in two kinds:
  – Value constraints, e.g. ((X1 AGR) = *3-SING)
  – Agreement constraints, e.g. ((Y2 GENDER) = (Y4 GENDER))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)
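To make the role of the constituent sequences and alignments concrete, here is a minimal Python sketch of how the structural part of the NP rule above could be represented and applied. The `TransferRule` class, the `apply_rule` function and the three-entry lexicon are hypothetical names introduced for illustration; feature constraints are deliberately left out, and this is not the AVENUE transfer engine itself.

```python
from dataclasses import dataclass


@dataclass
class TransferRule:
    """Structural part of a transfer rule (hypothetical representation)."""
    rule_type: str    # e.g. "NP::NP"
    source: list      # x-side constituent sequence
    target: list      # y-side constituent sequence
    alignments: list  # (x, y) pairs, 1-based as in (X1::Y1)


def apply_rule(rule, source_words, lexicon):
    """Put the translation of each aligned source word into its target slot."""
    if len(source_words) != len(rule.source):
        raise ValueError("input does not match the x-side of the rule")
    output = [None] * len(rule.target)
    for x, y in rule.alignments:
        output[y - 1] = lexicon[source_words[x - 1]]
    # Unaligned target slots stay empty in this sketch and are dropped.
    return [word for word in output if word is not None]


np_rule = TransferRule(
    rule_type="NP::NP",
    source=["DET", "ADJ", "N"],
    target=["DET", "N", "DET", "ADJ"],
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
)
# Toy lexicon assumed for the slide's example only.
lexicon = {"the": "ha", "old": "zaqen", "man": "ish"}
print(apply_rule(np_rule, ["the", "old", "man"], lexicon))
```

Because X1 is aligned to both Y1 and Y3, the determiner is copied into both target positions, which is how the rule yields ha-ish ha-zaqen.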

The Transfer Engine
Analysis: Source text is parsed into its grammatical structure, which determines the order in which transfer rules are applied.
  Example: 他看书。 (he read book)
  [S [NP [N 他]] [VP [V 看] [NP 书]]]
Transfer: A target language tree is created by reordering, insertion, and deletion. The article “a” is inserted into the object NP. Source words are translated with the transfer lexicon.
  [S [NP [N he]] [VP [V read] [NP [DET a] [N book]]]]
Generation: Target language constraints are checked and the final translation is produced. E.g. “reads” is chosen over “read” to agree with “he”.
  Final translation: “He reads a book”

Rule Learning - Overview
Goal: Acquire syntactic transfer rules
Use available knowledge from the source side (grammatical structure)
Three steps:
  1. Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
  2. Compositionality: use previously learned rules to add hierarchical structure
  3. Seeded Version Space Learning: refine rules by learning appropriate feature constraints

Flat Seed Rule Generation
Learning Example: NP
  Eng: the big apple
  Heb: ha-tapuax ha-gadol
Generated Seed Rule:
  NP::NP [ART ADJ N] -> [ART N ART ADJ]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
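The seed rule above can be reproduced with a short Python sketch. It assumes that the target-side labels are obtained by projecting source part-of-speech tags through the word alignments; the slides do not say how the target side is labeled, so that projection, the function name `flat_seed_rule` and the `UNK` placeholder for unaligned target words are all assumptions.

```python
def flat_seed_rule(cat, src_pos, tgt_len, alignments):
    """Build a flat seed rule from source POS tags and 1-based word alignments.

    Target labels are projected from the aligned source word (an assumption);
    unaligned target positions get the placeholder label "UNK".
    """
    tgt_pos = ["UNK"] * tgt_len
    for x, y in alignments:
        tgt_pos[y - 1] = src_pos[x - 1]
    align = " ".join(f"(X{x}::Y{y})" for x, y in sorted(alignments))
    return f"{cat}::{cat} [{' '.join(src_pos)}] -> [{' '.join(tgt_pos)}] ({align})"


# "the big apple" aligned to "ha-tapuax ha-gadol", tokenized as ha tapuax ha gadol
rule = flat_seed_rule(
    cat="NP",
    src_pos=["ART", "ADJ", "N"],
    tgt_len=4,
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
)
print(rule)
```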

Compositionality
Initial Flat Rules:
  S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))

  NP::NP [ART ADJ N] -> [ART N ART ADJ]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

  NP::NP [ART N] -> [ART N]
  ((X1::Y1) (X2::Y2))
Generated Compositional Rule:
  S::S [NP V NP] -> [NP V P NP]
  ((X1::Y1) (X2::Y2) (X3::Y4))

Compositionality - Overview
Traverse the c-structure of the English sentence and add compositional structure for translatable chunks
Adjust the constituent sequences and alignments in the transfer rule accordingly
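The adjustment of constituent sequences and alignments can be sketched in Python as follows. The sketch assumes the source-side chunk spans are already known from the English c-structure and that each chunk maps to a contiguous block on the target side; `collapse` and `compose` are hypothetical helper names, and the real learner also checks that a chunk is translatable by a previously learned rule, which is skipped here.

```python
def collapse(seq, spans):
    """Replace each 1-based inclusive (label, lo, hi) span by its label.

    Returns the new sequence and a map from old to new positions.
    """
    starts = {lo: (label, hi) for label, lo, hi in spans}
    new_seq, pos_map, i = [], {}, 1
    while i <= len(seq):
        label, hi = starts.get(i, (seq[i - 1], i))
        new_seq.append(label)
        for old in range(i, hi + 1):
            pos_map[old] = len(new_seq)
        i = hi + 1
    return new_seq, pos_map


def compose(src, tgt, alignments, src_spans):
    """Turn a flat rule into a compositional one given source chunk spans."""
    tgt_spans = []
    for label, lo, hi in src_spans:
        ys = sorted({y for x, y in alignments if lo <= x <= hi})
        if not ys or ys != list(range(ys[0], ys[-1] + 1)):
            raise ValueError("chunk has no contiguous target side")
        tgt_spans.append((label, ys[0], ys[-1]))
    new_src, smap = collapse(src, src_spans)
    new_tgt, tmap = collapse(tgt, tgt_spans)
    new_align = sorted({(smap[x], tmap[y]) for x, y in alignments})
    return new_src, new_tgt, new_align


flat_src = ["ART", "ADJ", "N", "V", "ART", "N"]
flat_tgt = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
flat_align = [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)]
result = compose(flat_src, flat_tgt, flat_align, [("NP", 1, 3), ("NP", 5, 6)])
print(result)
```

Applied to the flat S rule from the Compositionality slide, this yields the sequences [NP V NP] and [NP V P NP] with alignments (X1::Y1) (X2::Y2) (X3::Y4).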

Seeded Version Space Learning
Input: Rules and their Example Sets
  S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
  ((X1::Y1) (X2::Y2) (X3::Y4))

  NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

  NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
  ((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints:
  S::S [NP V NP] -> [NP V P NP]
  ((X1::Y1) (X2::Y2) (X3::Y4)
   (X1 NUM = X2 NUM)
   (Y1 NUM = Y2 NUM)
   (X1 NUM = Y1 NUM))

Seeded Version Space Learning: Overview
Goal: add appropriate feature constraints to the acquired rules
Methodology:
  – Preserve general structural transfer
  – Learn specific feature constraints from the example set
Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
The seed rules in a group form the specific boundary of a version space
The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
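As a rough illustration of learning feature constraints from an example set, the Python sketch below keeps only those agreement constraints that hold in every example of a cluster. This intersection-style generalization is a much simpler stand-in for the version space search described above, not the authors' algorithm; the function name `agreement_constraints` and the feature values are invented for the example.

```python
from itertools import combinations


def agreement_constraints(examples):
    """Return agreement constraints that hold in every example.

    Each example maps (slot, feature) to a value, e.g. ("X1", "NUM") -> "sg".
    Two slots agree on a feature if their values match in all examples.
    """
    shared = set.intersection(*(set(example) for example in examples))
    constraints = []
    for a, b in combinations(sorted(shared), 2):
        same_feature = a[1] == b[1]
        if same_feature and all(ex[a] == ex[b] for ex in examples):
            constraints.append(f"({a[0]} {a[1]} = {b[0]} {b[1]})")
    return constraints


# Invented feature values for two examples of the rule S::S [NP V NP] -> [NP V P NP]
examples = [
    {("X1", "NUM"): "sg", ("X2", "NUM"): "sg", ("X3", "NUM"): "pl"},
    {("X1", "NUM"): "pl", ("X2", "NUM"): "pl", ("X3", "NUM"): "pl"},
]
print(agreement_constraints(examples))
```

In these toy examples the subject NP and the verb always share their number while the object NP does not, so only the subject-verb constraint survives.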

Examples of Automatically Learned Rules (Hindi-to-English)
  {NP,14244} ;;Score:0.0429
  NP::NP [N] -> [DET N]
  ( (X1::Y2) )

  {NP,14434} ;;Score:0.0040
  NP::NP [ADJ CONJ ADJ N] -> [ADJ CONJ ADJ N]
  ( (X1::Y1) (X2::Y2) (X3::Y3) (X4::Y4) )

  {PP,4894} ;;Score:0.0470
  PP::PP [NP POSTP] -> [PREP NP]
  ( (X2::Y1) (X1::Y2) )

Manual Transfer Rules: Hindi Example
  ;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
  ;; passive of 43 (7b)
  {VP,28}
  VP::VP : [V V V] -> [Aux V]
  (
    (X1::Y2)
    ((x1 form) = root)
    ((x2 type) =c light)
    ((x2 form) = part)
    ((x2 aspect) = perf)
    ((x3 lexwx) = 'jAnA')
    ((x3 form) = part)
    ((x3 aspect) = perf)
    (x0 = x1)
    ((y1 lex) = be)
    ((y1 tense) = past)
    ((y1 agr num) = (x3 agr num))
    ((y1 agr pers) = (x3 agr pers))
    ((y2 form) = part)
  )

Manual Transfer Rules: Example
  ; NP1 ke NP2 -> NP2 of NP1
  ; Ex: jIvana ke eka aXyAya
  ;     life of (one) chapter
  ; ==> a chapter of life
  ;
  {NP,12}
  NP::NP : [PP NP1] -> [NP1 PP]
  (
    (X1::Y2) (X2::Y1)
    ; ((x2 lexwx) = 'kA')
  )

  {NP,13}
  NP::NP : [NP1] -> [NP1]
  ( (X1::Y1) )

  {PP,12}
  PP::PP : [NP Postp] -> [Prep NP]
  ( (X1::Y2) (X2::Y1) )
[Tree diagrams: source parse of “jIvana ke eka aXyAya” and the corresponding target tree for “one chapter of life”]

A Limited Data Scenario for Hindi-to-English
Conducted during a DARPA “Surprise Language Exercise” (SLE) in June 2003
Put together a scenario with “miserly” data resources:
  – Elicited Data corpus: 17589 phrases
  – Cleaned portion (top 12%) of LDC dictionary: ~2725 Hindi words (23612 translation pairs)
  – Manually acquired resources during the SLE:
      500 manual bigram translations
      72 manually written phrase transfer rules
      105 manually written postposition rules
      48 manually written time expression rules
No additional parallel text!!

Manual Grammar Development
Covers mostly NPs, PPs and VPs (verb complexes)
~70 grammar rules, covering basic and recursive NPs and PPs and verb complexes of the main tenses in Hindi (developed in two weeks)

Adding a “Strong” Decoder
The XFER system produces a full lattice of translation fragments, ranging from single words to long phrases or sentences
Edges are scored using word-to-word translation probabilities, trained from the limited bilingual data
The decoder uses an English LM (70 million words)
The decoder can also reorder words or phrases (up to 4 positions ahead)
For XFER (strong), ONLY edges from the basic XFER system are used!
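The core of lattice decoding can be sketched as a dynamic program over source positions. The Python sketch below picks the highest-probability sequence of edges that covers the input left to right; it leaves out the English language model and the reordering window, so it is only a simplified stand-in for the decoder described above. The function name `best_path`, the edge format and the probabilities are assumptions made for illustration.

```python
import math


def best_path(n, edges):
    """Find the best monotone cover of source positions 0..n.

    edges: (start, end, text, prob) tuples, each translating source
    positions start..end-1. Returns (log probability, list of fragments).
    """
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start, edge_end, text, prob in edges:
            if edge_end != end or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [text])
    return best[n]


# Toy lattice for a three-word input; the probabilities are invented.
edges = [
    (0, 1, "he", 0.9),
    (1, 2, "read", 0.5),
    (2, 3, "book", 0.6),
    (1, 3, "reads a book", 0.4),
    (0, 3, "he book read", 0.1),
]
score, fragments = best_path(3, edges)
print(fragments, math.exp(score))
```

Here the two-edge path through the longer fragment wins (0.9 × 0.4 = 0.36) over the word-by-word path (0.9 × 0.5 × 0.6 = 0.27), which shows why a lattice containing phrase-level edges from transfer rules can help.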

Testing Conditions
Tested on a section of the JHU-provided data: 258 sentences with four reference translations
  – SMT system (stand-alone)
  – EBMT system (stand-alone)
  – XFER system (naïve decoding)
  – XFER system with “strong” decoder
      No grammar rules (baseline)
      Manually developed grammar rules
      Automatically learned grammar rules
  – XFER+SMT with strong decoder (MEMT)

Automatic MT Evaluation Metrics
Intended to replace or complement human assessment of the quality of MT-produced translations
Principal idea: measure how similar the MT-produced translation is to human reference translation(s) of the same input
Main metric in use today: IBM’s BLEU
  – Count n-gram (unigram, bigram, trigram, etc.) overlap between the MT output and several reference translations
  – Calculate a combined n-gram precision score
NIST variant of BLEU used for official DARPA evaluations
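The n-gram overlap at the heart of BLEU can be illustrated with a short Python sketch of clipped (modified) n-gram precision against multiple references. It covers only the per-order precision; the geometric mean over n-gram orders and the brevity penalty of full BLEU are omitted, and the function names and sample sentences are chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "he read book".split()
references = ["he reads a book".split()]
print(modified_precision(candidate, references, 1))
print(modified_precision(candidate, references, 2))
```

For this candidate two of the three unigrams appear in the reference, but neither bigram does, which illustrates how higher-order n-grams penalize output that gets the words right but the sequence wrong.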

Results on JHU Test Set
System                           BLEU    M-BLEU   NIST
EBMT                             0.058   0.165    4.22
SMT                              0.093   0.191    4.64
XFER (naïve) man grammar         0.055   0.177    4.46
XFER (strong) no grammar         0.109   0.224    5.29
XFER (strong) learned grammar    0.116   0.231    5.37
XFER (strong) man grammar        0.135   0.243    5.59
XFER+SMT                         0.136   0.243    5.65

Effect of Reordering in the Decoder

Observations and Lessons (I)
XFER with strong decoder outperformed SMT even without any grammar rules in the miserly data scenario
  – SMT was trained on elicited phrases that are very short
  – SMT has insufficient data to train more discriminative translation probabilities
  – XFER takes advantage of morphology
      Token coverage without morphology: 0.6989
      Token coverage with morphology: 0.7892
Manual grammar is currently somewhat better than the automatically learned grammar
  – Learned rules did not yet use version-space learning
  – Large room for improvement on learning rules
  – Importance of effective, well-founded scoring of learned rules

Observations and Lessons (II)
MEMT (XFER and SMT) based on the strong decoder produced the best results in the miserly scenario
Reordering within the decoder provided very significant score improvements
  – Much room for more sophisticated grammar rules
  – Strong decoder can carry some of the reordering “burden”

XFER MT for Hebrew-to-English
Two-month intensive effort to apply our XFER approach to the development of a Hebrew-to-English MT system
Challenges:
  – No large parallel corpus
  – Limited-coverage translation lexicon
  – Rich morphology: incomplete analyzer available
Plan:
  – Collect available resources, establish methodology for processing Hebrew input
  – Translate and align Elicitation Corpus
  – Learn XFER rules
  – Develop (small) manual XFER grammar as a point of comparison
  – System debugging and development
  – Evaluate performance on unseen test data using automatic evaluation metrics

Hebrew-to-English XFER System
Accomplished:
  – Baseline system in place
  – Good lexical coverage: 24634 translation pairs
  – Reasonable morphological coverage
  – Small manual grammar: 29 rules, mostly NPs
  – Translated and aligned elicitation corpora
  – Learning of automatic grammar
  – Testing and development on dev-test in progress
  – Results on unseen data within a couple of weeks…
Translation Example:
  in agreement with the interior ministry that copy fund will come to Haaretz agreed hotel homes to do all efforts to remove the african employees from Israel within days from the arrival of the new workers and to let people activities immigration police

Conclusions
Transfer rules (both manual and learned) offer significant contributions that can outperform existing data-driven approaches
  – Also in medium and large data settings?
Initial steps toward the development of a well-grounded transfer-based MT system with:
  – Translation segments that are scored based on a well-founded probability model
  – Strong and effective decoding that incorporates the most advanced techniques used in SMT decoding
Working from the “opposite” end of research on incorporating models of syntax into “standard” SMT systems [Knight et al]
Our direction makes sense in the limited data scenario

Future Directions
Continued work on automatic rule learning (especially Seeded Version Space Learning)
  – Use Hebrew and Hindi systems as test platforms for experimenting with advanced learning research
Correcting and refining transfer rules by interaction with native bilingual speakers
Developing a well-founded model for assigning scores (probabilities) to transfer rules
Improving the strong decoder to better fit the specific characteristics of the XFER model
Further improved MEMT with:
  – Combination of output from different translation engines with different scorings
  – Strong decoding capabilities
