Pre-processing Tasks for Rule-Based English-Korean Machine Translation System Sung-Dong Kim, Dept. of Computer Engineering, Hansung University, Seoul, Republic of Korea


1 Pre-processing Tasks for Rule-Based English-Korean Machine Translation System Sung-Dong Kim, Dept. of Computer Engineering, Hansung University, Seoul, Republic of Korea

2 Contents
  Abstract
  Introduction
  Related Works
  Pre-processing Tasks
  Experiments
  Conclusion
  Hansung University, Seoul, Korea

3 Abstract
  English-Korean Machine Translation (EKMT)
    Language differences between English and Korean
  Rule-based EKMT
    Lexical, syntactic, and transfer rules → limitations
  Pre-processing tasks
    Pre-processing problem + solution
    Classification of pre-processing tasks
  Experiments
    Usefulness of the defined pre-processing tasks

4 1. Introduction (1)
  Difficult problems in English-Korean machine translation
    Long sentences
    Sentences with special patterns
    Sentences including non-word elements (parentheses, quotation marks, list markers, …)
  EKMT System (SmarTran)
    Rule-based translation
    Idiom approach
    Sentence segmentation

5 1. Introduction (2)
  Pre-processing problems
    Sentence with pre-processing problem → Solution → Sentence appropriate to MT system

6 2. Related Works (1)
  Definition of the functions of a pre-processor for EKMT (Yuh et al., 1996)
    Hyphenated words
    Abbreviations
    Special symbols
    Composition words (multi-word numeric expressions, geographical names, organization names, …)
  All for word-level pre-processing → before lexical analysis

7 2. Related Works (2)
  For long sentence translation
    Sentence segmentation (Kim et al., 2001)
    Comma rewriting (Kim et al., 2008)
    Phrase/sentence-level pre-processing
  This paper
    Searches for the problems that need pre-processing at the word, phrase, and sentence levels
    Designs solutions for the problems
    Rearranges the pre-processing tasks

8 3. Pre-processing Tasks (1)
  [Figure: SmarTran EKMT system]

9 3. Pre-processing Tasks (2)
  Tasks before lexical analysis
    I have two books : Computer Graphics and Architecture
      → I have two books / Computer Graphics and Architecture
    “Next year, I may evaluate it”, said Stan Guest.
      → Next year, I may evaluate it / Q1, said Stan Guest
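The two examples on this slide can be approximated with regular expressions. The following is a minimal sketch, not the SmarTran implementation: the function names `split_at_colon` and `replace_quotes` are hypothetical, and the patterns cover only the cases shown on the slide.

```python
import re

def split_at_colon(sentence):
    """Split a sentence into translation units at a colon followed by whitespace."""
    parts = [part.strip() for part in re.split(r"\s*:\s+", sentence)]
    return [part for part in parts if part]

def replace_quotes(sentence):
    """Replace each quoted span with a placeholder (Q1, Q2, ...).

    Returns the rewritten sentence and a dict mapping placeholders to the
    quoted text, so each quotation can be translated as its own unit.
    """
    quotes = {}

    def stash(match):
        key = f"Q{len(quotes) + 1}"
        quotes[key] = match.group(1)
        return key

    rewritten = re.sub(r'[“"]([^”"]+)[”"]', stash, sentence)
    return rewritten, quotes
```

Requiring whitespace after the colon keeps expressions such as 10:30 intact, which is a design choice of this sketch rather than something stated on the slide.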

10 3. Pre-processing Tasks (3)
  Tasks before lexical analysis (cont.)
    I. Computer architecture  II. Artificial intelligence
      → Computer architecture / Artificial intelligence
    My balance was $ 10,000 last year.
      → My balance was [$ 10,000] last year.
    He says as if he had been there.
      → He says [as if] he had been there.
    Similarly: [as long as], [as well as], [even if], [even though], [because of], [in front of], …
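List-marker deletion and the bracketing of multi-word units might look like the sketch below. It assumes simple regular-expression patterns; `LIST_MARKER`, `MONEY`, `strip_list_marker`, and `bracket_units` are illustrative names, and the connective list holds only the phrases named on the slide.

```python
import re

# Multi-word connectives listed on the slide; a real system would need more.
COMPOUNDS = ["as if", "as long as", "as well as", "even if",
             "even though", "because of", "in front of"]

# Assumed patterns: a leading Roman/Arabic/single-letter list marker,
# and a dollar amount with optional thousands separators.
LIST_MARKER = re.compile(r"^\s*(?:[IVXLC]+|\d+|[A-Za-z])[.)]\s*")
MONEY = re.compile(r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?")
COMPOUND = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(COMPOUNDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def strip_list_marker(line):
    """Delete a list marker such as 'I.' or '2)' from the start of a line."""
    return LIST_MARKER.sub("", line, count=1)

def bracket_units(sentence):
    """Wrap money expressions and multi-word connectives in [ ] so that
    later stages can treat each one as a single word."""
    sentence = MONEY.sub(lambda m: f"[{m.group(0)}]", sentence)
    return COMPOUND.sub(lambda m: f"[{m.group(0)}]", sentence)
```

A single-letter marker pattern will also strip initials such as "C." at the start of a line, so a production rule set would need extra context checks.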

11 3. Pre-processing Tasks (4)
  Tasks before lexical analysis (cont.)
    Specify the keywords so that only one address space is selected.
      → Specify the keywords [, so] only one address space is selected.
    He will join the board as a nonexecutive director Nov. 29.
      → He will join the board as a nonexecutive director [Nov. 29].
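The word conversion and date grouping shown here can be sketched as two small rewrites. This is an illustration under assumed patterns: `rewrite_so_that` and `bracket_dates` are hypothetical names, and the date pattern handles only abbreviated month names followed by a day number.

```python
import re

# Assumed date shape: abbreviated month with a period, then a one- or two-digit day.
MONTH = r"(?:Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sept|Sep|Oct|Nov|Dec)\."
DATE = re.compile(rf"\b{MONTH}\s\d{{1,2}}\b")

def rewrite_so_that(sentence):
    """Convert 'so that' into the bracketed unit '[, so]' as on the slide."""
    return re.sub(r"\s+so that\b", " [, so]", sentence)

def bracket_dates(sentence):
    """Group a month-day expression into one bracketed unit."""
    return DATE.sub(lambda m: f"[{m.group(0)}]", sentence)
```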

12 3. Pre-processing Tasks (5)
  Tasks after lexical analysis
    Pierre Vinken, 61 years old, will join the board
      → [Pierre Vinken, 61 years old,] will join the board
    I live in Brynmawr, PA.
      → I live in [Brynmawr, PA.]

13 3. Pre-processing Tasks (6)
  Tasks after lexical analysis (cont.)
    But for now, they're looking forward to their winter meeting
      → they're looking forward to their winter meeting But for now
    He hopes not only to make certain tasks easier but also to transform the way
      → He hopes to make certain tasks easier and to transform the way
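A rough sketch of the two rewrites on this slide follows. The slide places these tasks after lexical analysis, so the real system presumably relies on part-of-speech information; the version below substitutes plain string patterns, and both function names and the `phrases` parameter are hypothetical.

```python
import re

NOT_ONLY = re.compile(r"\bnot only\s+(.+?)\s+but also\s+", re.IGNORECASE)

def rewrite_not_only(sentence):
    """Rewrite 'not only A but also B' as 'A and B'."""
    return NOT_ONLY.sub(lambda m: f"{m.group(1)} and ", sentence)

def move_leading_phrase(sentence, phrases=("But for now",)):
    """Move a known sentence-initial phrase (followed by a comma) to the end.

    `phrases` is an illustrative stand-in for whatever pattern table the
    real system consults.
    """
    for phrase in phrases:
        prefix = phrase + ", "
        if sentence.startswith(prefix):
            return sentence[len(prefix):] + " " + phrase
    return sentence
```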

14 3. Pre-processing Tasks (7)
  Tasks after lexical analysis (cont.)
    I need small, fast computer
      → I need small and fast computer
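Comma rewriting needs to know that the words on both sides of the comma are adjectives, which is why it runs after lexical analysis. The sketch below assumes a tiny hand-written adjective set in place of a real lexicon; `ADJECTIVES` and `comma_to_and` are hypothetical.

```python
import re

# Stand-in for part-of-speech information from lexical analysis.
ADJECTIVES = {"small", "fast"}

def comma_to_and(sentence, adjectives=ADJECTIVES):
    """Rewrite 'ADJ, ADJ' as 'ADJ and ADJ'; leave every other comma alone."""
    def rewrite(match):
        left, right = match.group(1), match.group(2)
        if left.lower() in adjectives and right.lower() in adjectives:
            return f"{left} and {right}"
        return match.group(0)

    return re.sub(r"\b(\w+),\s+(\w+)\b", rewrite, sentence)
```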

15 3. Pre-processing Tasks (8)
  Tasks after sentence segmentation
    It was just another one of the risk factors that led to the company's decision to withdraw from the bidding
      → It was just another one of the risk factors led to the company's decision to withdraw from the bidding

16 3. Pre-processing Tasks (9)
  Tasks after sentence segmentation (cont.)
    “We continue to improve …”, a Morgan Stanley official said
      → We continue to improve… / Q1, a Morgan Stanley official said
      → A Morgan Stanley official said Q1
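The final repositioning step on this slide, moving the quotation placeholder behind the reporting clause, could be sketched as follows. It assumes the quotation has already been replaced by a placeholder of the form Q1; `move_placeholder_to_end` is a hypothetical name.

```python
import re

def move_placeholder_to_end(sentence):
    """Turn 'Q1, <reporting clause>' into '<Reporting clause> Q1'."""
    match = re.match(r"^(Q\d+),\s+(.+?)\.?$", sentence)
    if not match:
        return sentence
    placeholder, clause = match.groups()
    # Capitalize the clause because it now starts the sentence.
    return f"{clause[0].upper()}{clause[1:]} {placeholder}"
```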

17 3. Pre-processing Tasks (10)
  Tasks after sentence segmentation (cont.)
    Radio Free Europe, in fact, is in danger of suffering from its success
      → Radio Free Europe is in danger of suffering from its success
      Insertion modifier: in fact
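Removing an insertion modifier while remembering it for later can be sketched like this. Only "in fact" comes from the slide; the other entries in `INSERTIONS` are assumptions added for illustration, as is the function name.

```python
import re

# "in fact" is the slide's example; the remaining entries are illustrative guesses.
INSERTIONS = ["in fact", "however", "of course"]
INSERTION = re.compile(
    r",\s*(" + "|".join(re.escape(p) for p in INSERTIONS) + r")\s*,",
    re.IGNORECASE,
)

def remove_insertions(sentence):
    """Delete comma-delimited insertion modifiers and return them separately
    so that post-processing can put their translations back."""
    removed = []

    def drop(match):
        removed.append(match.group(1))
        return ""

    return INSERTION.sub(drop, sentence), removed
```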

18 3. Pre-processing Tasks (11)
  Corresponding post-processing
    Combination of translation results for each translation unit → split by non-word elements
    Appending of translation results for translation units enclosed by quotation marks or parentheses
    Translation of combined date and name-age words using translation patterns
    Combination of partial parsing results using split information provided by split pattern
    …
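The first two post-processing items can be illustrated with a sketch that joins per-unit translations and restores quotation placeholders. It assumes placeholders of the form Q1, Q2, … and that translation of each unit happens elsewhere; both function names are hypothetical.

```python
import re

def combine_units(translated_units, separator=" "):
    """Join the translation results of the units produced by sentence splitting."""
    return separator.join(u.strip() for u in translated_units if u.strip())

def restore_quotes(translated_main, translated_quotes):
    """Put each translated quotation back where its placeholder sits."""
    def fill(match):
        key = match.group(0)
        # Placeholders without a stored translation are left untouched.
        if key in translated_quotes:
            return f'"{translated_quotes[key]}"'
        return key

    return re.sub(r"Q\d+", fill, translated_main)
```

Matching the whole run of digits after Q prevents a placeholder such as Q1 from being substituted inside Q10.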

19 3. Pre-processing Tasks (12)
  [Figure: SmarTran EKMT system]

20 4. Experiments (1)
  Search for sentences with the defined pre-processing problems in the Penn Treebank
    WSJ: 53,838
    Brown: 50,440
    ECTB: 3,825
    IBM: 4,404

21 4. Experiments (2)
  Statistics for sentences with pre-processing problems (%)

                                   WSJ     Brown   ECTB    IBM
    Before lexical analysis        51.8    40.6    34.2    11.7
    After lexical analysis         17.7    13      12.7    3
    After sentence segmentation    29.1    16.9    15.3    1.4
    Total*                         64.4    48.2    53.1    22.2

  Total*: % of sentences with one or more pre-processing problems
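Percentages of this kind can be computed from per-sentence annotations. The sketch below is an assumption about the bookkeeping, not the authors' procedure: each sentence is represented by the set of stages at which it has a problem, and the stage labels are hypothetical.

```python
STAGES = ("before_lexical", "after_lexical", "after_segmentation")

def problem_statistics(sentence_stages):
    """Return the percentage of sentences with a problem at each stage,
    plus 'total': the percentage with a problem at one or more stages."""
    n = len(sentence_stages)
    if n == 0:
        return {stage: 0.0 for stage in STAGES + ("total",)}
    stats = {
        stage: 100.0 * sum(stage in found for found in sentence_stages) / n
        for stage in STAGES
    }
    # A sentence with problems at several stages is counted once here.
    stats["total"] = 100.0 * sum(bool(found) for found in sentence_stages) / n
    return stats
```

Because a sentence can have problems at more than one stage, a total computed this way is normally smaller than the sum of the three per-stage rows.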

22 5. Conclusion (1)
  This paper
    Pre-processing tasks in English-Korean machine translation
    Classification of tasks based on the time of pre-processing
  Pre-processing solutions
    Sentence split
    Symbol/word deletion
    Word conversion
    Combination of words
    Rewriting (words, phrases, commas)
    Segment removal
    Sentence element repositioning

23 5. Conclusion (2)
  Future work
    Representation of patterns and other information
    Verification of the proposed solutions
    Degree of translation quality improvement achieved by the pre-processing

24 Q/A

