Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task Kyoto University Toshiaki Nakazawa Sadao Kurohashi.


1 Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task
Kyoto University
Toshiaki Nakazawa, Sadao Kurohashi

2 Overview of Kyoto-U System
Translation Examples
J: 図書館で新聞を読む
E: I read a newspaper in the library
J: 政治の本が売れ残っている
E: A book in politics was left on the shelf
・・・・・

3 Overview of Kyoto-U System
Translation Examples (aligned dependency trees)
図書館 で / 新聞 を / 読む (library in / newspaper ACC / read) ⇔ I read a newspaper in the library
政治 の / 本 が / 売れ残って いる (politics in / book NOM / left unsold) ⇔ a book in politics was left on the shelf

4 Overview of Kyoto-U System
Input: 図書館で政治の本を読む。
Output: I read a book in politics in the library
Translation Examples
政治 の / 本 が / 売れ残って いる ⇔ a book in politics was left on the shelf
図書館 で / 新聞 を / 読む ⇔ I read a newspaper in the library
・・・・・
Input tree: 図書館 で / 政治 の / 本 を / 読む (library in / politics in / book ACC / read)
Output tree: I read / a book in politics / in the library

6 Alignment

7 J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.

8 Alignment
J: 交差 点 で 、 突然 あの 車 が 飛び出して 来た のです
E: the car came at me from the side at the intersection
1. Transformation into dependency structure
   J: JUMAN/KNP
   E: Charniak’s nlparser → Dependency tree

9 Alignment
J: 交差 点 で 、 突然 あの 車 が 飛び出して 来た のです
E: the car came at me from the side at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences

10 Finding Correspondences
Bilingual dictionaries (500K entries)
Substring co-occurrence (Cromieres 2006)
Numeral normalization
  二百十六万 → 2,160,000 ← 2.16 million
Transliteration (Katakana words, NEs)
  ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
  新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
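The numeral normalization step could look roughly like the following. This is a minimal sketch, not the system's code: the function names `kanji_to_int` and `english_to_int` and the lookup tables are assumptions made for illustration. The idea is that both sides are mapped to the same integer, so that 二百十六万 and "2.16 million" can be matched.

```python
from decimal import Decimal

# Lookup tables written out for illustration (hypothetical, not from the slides).
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
SCALE_WORDS = {"thousand": 10**3, "million": 10**6,
               "billion": 10**9, "trillion": 10**12}


def kanji_to_int(text: str) -> int:
    """Convert a kanji numeral such as 二百十六万 to an integer."""
    total = 0  # value of the completed 万/億/兆 groups
    group = 0  # value of the current group (below 10,000)
    digit = 0  # digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十/百/千 means "one" of that unit.
            group += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((group + digit) or 1) * LARGE_UNITS[ch]
            group = 0
            digit = 0
        else:
            raise ValueError(f"not a kanji numeral character: {ch!r}")
    return total + group + digit


def english_to_int(text: str) -> int:
    """Convert strings such as '2,160,000' or '2.16 million' to an integer."""
    parts = text.replace(",", "").split()
    value = Decimal(parts[0])  # Decimal avoids float rounding on 2.16
    for word in parts[1:]:
        value *= SCALE_WORDS[word.lower()]
    return int(value)


# Both sides normalize to the same integer and can therefore be aligned.
same = kanji_to_int("二百十六万") == english_to_int("2.16 million")
```

Comparing normalized integers rather than surface strings keeps the matching independent of how each language groups digits (by 10,000 in Japanese, by 1,000 in English).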

11 Alignment
J: 交差 点 で 、 突然 あの 車 が 飛び出して 来た のです
E: the car came at me from the side at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences
3. Disambiguation of correspondences

12 Alignment
J: 交差 点 で 、 突然 あの 車 が 飛び出して 来た のです
E: the car came at me from the side at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences
3. Disambiguation of correspondences
4. Handling of remaining phrases
   Extension to leaf-nodes

13 Alignment
J: 交差 点 で 、 突然 あの 車 が 飛び出して 来た のです
E: the car came at me from the side at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences
3. Disambiguation of correspondences
4. Handling of remaining phrases
5. Registration to translation example database

14 Alignment Ambiguities
E: you will have to file an insurance claim with the insurance office in Japan
J: 日本 で 保険 会社 に 対して 保険 請求 の 申し立て が 可能です よ
Glosses of the Japanese phrases: [in Japan] [insurance] [of claim] [to the company] [file] [be able to]

15 Alignment: Consistency
[Figure labels: Near, Far]

16 For each pair of candidates a_i and a_j, calculate the J-side distance d_J and the E-side distance d_E
Give a consistency score to the pair based on d_J and d_E
Calculate consistency scores for all the pairs in a possible set of alignment candidates

17 Baseline
Distance of each branch: 1
Consistency score: 1/1 + 1/2 = 1.5
… … …
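The baseline can be sketched as follows, under two assumptions that the transcript does not confirm: that the slide's 1/1 + 1/2 = 1.5 is to be read as 1/d_J + 1/d_E for one pair of candidates, and that the score of a candidate set is the sum over all its pairs. The trees are represented as child-to-parent maps, and all names (`tree_distance`, `pair_score`, `consistency`) are hypothetical.

```python
from itertools import combinations


def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of branches between a and b, counting each branch as 1."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first shared ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def pair_score(d_j, d_e):
    """Assumed baseline: near pairs on both sides score high."""
    return 1 / d_j + 1 / d_e


def consistency(candidates, j_parent, e_parent):
    """Sum the pair scores over all pairs of (J node, E node) candidates."""
    total = 0.0
    for (j1, e1), (j2, e2) in combinations(candidates, 2):
        d_j = tree_distance(j1, j2, j_parent)
        d_e = tree_distance(e1, e2, e_parent)
        if d_j == 0 or d_e == 0:
            continue  # two candidates sharing a node: skipped in this sketch
        total += pair_score(d_j, d_e)
    return total


# Toy trees (child -> parent); the root maps to None.
j_parent = {"読む": None, "新聞を": "読む", "図書館で": "読む"}
e_parent = {"read": None, "newspaper": "read", "in": "read", "library": "in"}

# d_J = 1 and d_E = 2, so the score is 1/1 + 1/2.
score = consistency([("読む", "read"), ("図書館で", "library")],
                    j_parent, e_parent)
```

A candidate set whose members are close to each other on both the J side and the E side receives a higher total than one that pairs near J nodes with far E nodes.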

18 Consistency Score
The frequency of each distance pair in gold-standard alignment data (Mainichi newspaper 40K sentence pairs) [Uchimoto04]
[Figure axes: Frequency (log), Dist of J-Side, Dist of E-Side]

19 Distance based on Dependency Type
E: you will have to file an insurance claim with the insurance office in Japan
J: 日本 で 保険 会社 に 対して 保険 請求 の 申し立て が 可能です よ
J-side dependency types: デ格 (de-case), 文節内 (bunsetsu-internal), 連用 (ren'yō, predicate-modifying), ノ格 (no-case), ガ格 (ga-case)
E-side dependency types: NP, NN, PP
Branch distances (figure labels): 3 1 1 3 2 3 3 3 3 3 1 1
Glosses of the Japanese phrases: [in Japan] [insurance] [of claim] [to the company] [file] [be able to]

20 Distance based on Dependency Type
[Same figure as slide 19, next animation step]

21 Distance based on Dependency Type
[Same figure as slide 19, next animation step]

22 Example of Alignment Improvement
[Figure panels: Proposed model, Word-base alignment]

23 Translation

24 Translation
Input: 図書館で政治の本を読む。
Output: I read a book in politics in the library
Translation Examples
政治 の / 本 が / 売れ残って いる ⇔ a book in politics was left on the shelf
図書館 で / 新聞 を / 読む ⇔ I read a newspaper in the library
・・・・・
Input tree: 図書館 で / 政治 の / 本 を / 読む (library in / politics in / book ACC / read)
Output tree: I read / a book in politics / in the library

25 Selection of Translation Examples
Score for an example
1. Size of an example
2. Similarity of neighboring nodes
3. Translation probability
Beam search from the root of the input [Sato 91]
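The selection step can be illustrated with a small beam search. The slide names the three score components but not how they are combined, so the weighted sum in `example_score`, the `Example` fields, the numeric values, and the covering strategy below are all assumptions made for this sketch. Input nodes are listed top-down from the root, and each hypothesis is extended with an example that covers its first uncovered node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    name: str                   # label for the source translation example
    nodes: frozenset            # input nodes this example covers
    neighbor_similarity: float  # hypothetical value in [0, 1]
    translation_prob: float     # hypothetical value in [0, 1]


def example_score(ex, w_size=1.0, w_sim=1.0, w_prob=1.0):
    """Assumed combination: weighted sum of the three listed components."""
    return (w_size * len(ex.nodes)
            + w_sim * ex.neighbor_similarity
            + w_prob * ex.translation_prob)


def beam_search(top_down_nodes, examples, beam_width=2):
    """Cover the input nodes with non-overlapping examples, root first."""
    beam = [(0.0, frozenset(), ())]  # (score, covered nodes, chosen names)
    for _ in top_down_nodes:         # each step covers at least one new node
        extended = []
        for score, covered, chosen in beam:
            todo = [n for n in top_down_nodes if n not in covered]
            if not todo:             # already complete: carry it over
                extended.append((score, covered, chosen))
                continue
            target = todo[0]         # uncovered node closest to the root
            for ex in examples:
                if target in ex.nodes and not (ex.nodes & covered):
                    extended.append((score + example_score(ex),
                                     covered | ex.nodes,
                                     chosen + (ex.name,)))
        if not extended:             # nothing fits: return the partial result
            break
        beam = sorted(extended, key=lambda h: h[0], reverse=True)[:beam_width]
    return beam[0]


nodes = ["読む", "本を", "政治の", "図書館で"]  # root first
examples = [
    Example("read_in_library", frozenset({"読む", "図書館で"}), 0.7, 0.5),
    Example("book_in_politics", frozenset({"本を", "政治の"}), 0.5, 0.5),
    Example("read_only", frozenset({"読む"}), 0.1, 0.3),
    Example("library_only", frozenset({"図書館で"}), 0.2, 0.4),
    Example("book_only", frozenset({"本を"}), 0.1, 0.2),
    Example("politics_only", frozenset({"政治の"}), 0.1, 0.2),
]
best = beam_search(nodes, examples)
```

In this toy run the best hypothesis combines the two larger examples. Note that the first of them covers 読む and 図書館で, which are not adjacent in the input sentence, so examples need not be contiguous.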

26 Input: 図書館 で / 政治 の / 本 を / 読む (library in / politics in / book ACC / read)
Translation example: 図書館 で / 新聞 を / 読む ⇔ I read a newspaper in the library
[Other figure labels: I study in the library; 0.7]

27 Combination of TMs
Input: 図書館で政治の本を読む。
Translation Examples
政治 の / 本 が / 売れ残って いる ⇔ a book in politics was left on the shelf
図書館 で / 新聞 を / 読む ⇔ I read a newspaper in the library
・・・・・
Input tree: 図書館 で / 政治 の / 本 を / 読む (library in / politics in / book ACC / read)
Output tree: I read / a book in politics / in the library

28 Input: 記録領域での変形形状と,記録特性の関係を調べた。
Output: The relationship between deformation shape in the recording region and recording characteristics was examined.
[Figure: dependency trees; nodes are listed below in tree-layout order, which for English is not sentence order]
Input Dependency Tree: 記録 / 領域 で の / 変形 / 形状 と , / 記録 / 特性 の / 関係 を / 調べた 。
Output Dependency Tree: the relationship / deformation / shape and / recording / in the region / recording / between characteristics / was examined
Translation Examples
  状況 を / 調べた 。 ⇔ the situation / was examined
  相互 / 作用 と / 記録 / 特性 の / 関係 を / 調べた 。 ⇔ the relationship / interaction and / recording / between characteristics / was investigated
  大変 / 形 / 領域 で の / 断面 / 形状 を / 模擬 した ⇔ cross-sectional / shape / large / deformation / in the region / was simulated
  記録 / 領域 の ⇔ recording / of the areas
  変形 / パターン を ⇔ deformation / the pattern

29 Evaluation Results and Discussion

30 Intrinsic J-E Evaluation Result
BLEU              Adequacy          Fluency           Average
27.20 NTT         3.81 tsbmt        4.02 Japio        3.88 tsbmt
27.14 moses       3.71 Japio        3.94 tsbmt        3.86 Japio
27.14 MIT         3.15 MIT          3.66 MIT          3.40 MIT
25.48 NAIST-NTT   2.96 NTT          3.65 NTT          3.30 NTT
24.79 NICT-ATR    2.85 Kyoto-U      3.55 moses        3.18 moses
24.49 KLE         2.81 moses        3.44 tori         3.10 Kyoto-U
23.10 tsbmt       2.66 NAIST-NTT    3.43 NAIST-NTT    3.04 NAIST-NTT
22.29 tori        2.59 KLE          3.35 Kyoto-U      3.01 tori
21.57 Kyoto-U     2.58 tori         3.28 HIT2         2.94 KLE
19.93 mibel       2.47 NICT-ATR     3.28 KLE          2.86 HIT2
19.48 HIT2        2.44 HIT2         3.09 mibel        2.78 NICT-ATR
19.46 Japio       2.38 mibel        3.08 NICT-ATR     2.74 mibel
15.90 TH          1.87 TH           2.42 FDU-MCandWI  2.13 TH
 9.55 FDU-MCandWI 1.75 FDU-MCandWI  2.39 TH           2.08 FDU-MCandWI
 1.41 NTNU        1.08 NTNU         1.04 NTNU         1.06 NTNU

31 Intrinsic E-J Evaluation Result
BLEU              Adequacy          Fluency           Average
30.58 moses       3.53 tsbmt        3.69 moses        3.60 tsbmt
29.15 NICT-ATR    2.90 moses        3.67 tsbmt        3.30 moses
28.07 NTT         2.74 NTT          3.54 NTT          3.14 NTT
22.65 Kyoto-U     2.59 NICT-ATR     3.20 NICT-ATR     2.89 NICT-ATR
17.46 tsbmt       2.42 Kyoto-U      2.54 Kyoto-U      2.48 Kyoto-U

32 Critical Defect in EJ Translation
Not caring whether a child node is a pre-child or post-child
  Resulting target structure goes wrong
After resolving this defect, BLEU score in EJ translation rose to 24.02 from 22.65
[The E-J result table from slide 31 is repeated, annotated with the new BLEU score 24.02 and question marks for the human evaluation columns]
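Why the pre-child/post-child distinction matters can be shown with a toy linearization. The transcript does not include the system's code, so the `Node` class and both functions below are hypothetical, and an English tree is used only because it makes the effect easy to read. A pre-child precedes its head in the sentence and a post-child follows it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    pre: list = field(default_factory=list)   # children placed before the head
    post: list = field(default_factory=list)  # children placed after the head


def linearize(node):
    """Emit pre-children, then the head, then post-children."""
    words = []
    for child in node.pre:
        words.extend(linearize(child))
    words.append(node.word)
    for child in node.post:
        words.extend(linearize(child))
    return words


def linearize_ignoring_side(node):
    """Defective variant: treats every child as a pre-child."""
    words = []
    for child in node.pre + node.post:
        words.extend(linearize_ignoring_side(child))
    words.append(node.word)
    return words


# Dependency tree for "I read a newspaper in the library".
tree = Node("read",
            pre=[Node("I")],
            post=[Node("newspaper", pre=[Node("a")]),
                  Node("in", post=[Node("library", pre=[Node("the")])])])

correct = " ".join(linearize(tree))
broken = " ".join(linearize_ignoring_side(tree))
```

With the side information the tree reads back as the original sentence. Without it the same tree comes out as "I a newspaper the library in read", which shows how the structure goes wrong when the distinction is dropped.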

33 Conclusion
Kyoto-U Fully Syntactic EBMT system:
1. Alignment: Consistency
2. Alignment: Extension
3. Translation: Discontinuous example
4. Translation: Easy combination
By using syntactic information, we could achieve reasonably high quality translation
For patent translation, we may need some preprocessing to handle special expressions that cause parsing errors

