Published by Walter Evans. Modified over 9 years ago.
1
Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation
Eiji Aramaki (Kyoto-U), Sadao Kurohashi (U-Tokyo), Satoshi Sato (Kyoto-U), Hideo Watanabe (IBM Japan)
2
Introduction
Parallel corpus → translation examples (share of the corpus covered):
– Statistical approach (co-occurrence information): 1-2%
– Our method (syntactic information + translation dictionary): 50%
3
Goal
Find phrasal correspondences between parallel sentences, e.g.:
English: This paper shows great contributions of TFP ...
Japanese: ... 全要素生産性 が (TFP, case-marker) / 大きく (great) / 寄与して いること が (contribution, case-marker) / 示されている (show)
4
Problems
Finding many correspondences with a translation dictionary raises two problems:
1: Some words cannot be found in the dictionary.
2: Dictionary lookup is ambiguous, so the ambiguity must be resolved.
5
Overview
Introduction
Method
Experiments
Conclusion
6
Method
Step 1: Detection of Phrasal Dependency Structures
Step 2: Detection of Basic Phrasal Correspondences by Consulting a Dictionary
Step 3: Discovery of New Correspondences by Handling Remaining Phrases
7
Step 1: Phrasal Dependency Structures
Example: "I bought this car by monthly installments."
The sentence is parsed with ESG (an English parser), and the parse is converted into a phrasal dependency structure by rules.
8
Step 1: Phrasal Dependency Structures
Rules:
– Function words are grouped together with the following content word.
– A compound noun is treated as one phrase.
– Auxiliary verbs are grouped together with the following verb (is playing, was tired, ...).
– A parallel-relation word is treated as one phrase (and, or, ...).
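The four grouping rules can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it assumes a flat list of (word, tag) pairs with hypothetical coarse tags (FUNC, AUX, COORD, NOUN, VERB, ...), whereas the original system applies its rules to ESG output, i.e. a dependency structure in which the "following" content word is determined by the parse. The function name group_phrases is hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical coarse tags standing in for real parser output:
# FUNC = function word (determiner, preposition, ...), AUX = auxiliary verb,
# COORD = parallel-relation word (and, or, ...); anything else is a content word.
ATTACH_FORWARD = {"FUNC", "AUX"}


def group_phrases(tokens: List[Tuple[str, str]]) -> List[str]:
    """Group (word, tag) tokens into phrases using the four Step 1 rules."""
    phrases: List[List[str]] = []
    pending: List[str] = []        # function words / auxiliaries waiting for a content word
    prev_tag: Optional[str] = None  # tag of the word that closed the previous phrase
    for word, tag in tokens:
        if tag == "COORD":
            # Rule 4: a parallel-relation word forms its own phrase.
            if pending:
                phrases.append(pending)
                pending = []
            phrases.append([word])
            prev_tag = tag
        elif tag in ATTACH_FORWARD:
            # Rules 1 and 3: hold the word until the following content word arrives.
            pending.append(word)
        elif tag == "NOUN" and prev_tag == "NOUN" and not pending:
            # Rule 2: consecutive nouns are merged into one compound-noun phrase.
            phrases[-1].append(word)
        else:
            # A content word closes a phrase together with any pending words.
            phrases.append(pending + [word])
            pending = []
            prev_tag = tag
    if pending:  # trailing function words with no following content word
        phrases.append(pending)
    return [" ".join(p) for p in phrases]
```

For example, tagging "The team is developing information technology and services" as FUNC, NOUN, AUX, VERB, NOUN, NOUN, COORD, NOUN yields `["The team", "is developing", "information technology", "and", "services"]`.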
9
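Step 2: Detection of Phrasal Correspondences
Example: "information technology in science technology"
Japanese phrases: 科学 技術 に おける (science technology + "in") / 情報 技術 (information technology)
Dictionary lookup links content words (science ↔ 科学, information ↔ 情報, technology ↔ 技術); "technology" matches 技術 in both Japanese phrases, so the correspondences "in science technology" ↔ 科学 技術 に おける and "information technology" ↔ 情報 技術 must be chosen among candidates.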
14
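Step 2: Detection of Phrasal Correspondences
Criteria to choose phrasal correspondences:
– Correspondences of content words
– Correspondences of neighboring phrases
Content-word score (J = Japanese, E = English):
$$\text{score} = \frac{2 \times (\#\text{ of word links})}{(\#\text{ of J content words}) + (\#\text{ of E content words})}$$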
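A minimal sketch of the content-word score, applied to the "information technology in science technology" example. Assumptions: the toy dictionary, the content-word lists and the function names are hypothetical; a word link is counted for every (J, E) content-word pair found in the dictionary; and the neighboring-phrase criterion is not modeled, so this is only one part of the selection.

```python
from typing import Dict, List, Set

# Hypothetical toy J->E dictionary; 技術 is deliberately given two translations.
DICTIONARY: Dict[str, Set[str]] = {
    "科学": {"science"},
    "技術": {"technology", "technique"},
    "情報": {"information"},
}


def content_word_score(j_words: List[str], e_words: List[str],
                       dictionary: Dict[str, Set[str]]) -> float:
    """2 * (# word links) / (# J content words + # E content words)."""
    links = sum(1 for j in j_words for e in e_words if e in dictionary.get(j, set()))
    total = len(j_words) + len(e_words)
    return 2 * links / total if total else 0.0


def best_japanese_phrase(e_words: List[str], j_phrases: List[List[str]],
                         dictionary: Dict[str, Set[str]]) -> List[str]:
    """Pick the Japanese phrase whose content words best match the English phrase."""
    return max(j_phrases, key=lambda j: content_word_score(j, e_words, dictionary))


# Content words of the Japanese phrases 科学 技術 に おける and 情報 技術.
J_PHRASES = [["科学", "技術"], ["情報", "技術"]]
```

Here "information technology" scores 2 × 2 / 4 = 1.0 against 情報 技術 but only 2 × 1 / 4 = 0.5 against 科学 技術, so the shared link technology ↔ 技術 no longer causes ambiguity.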
16
Step 3: Discovery of New Correspondences by Handling Remaining Phrases
New correspondence: "in post Cold war years" ↔ 冷戦 終結 後 に (cold war / end / after / case-marker)
Merge: "goods", "and", "services" ↔ 物 や (object) / サービス の (service)
17
Step 3: Discovery of New Correspondences by Handling Remaining Phrases
Criteria to discover new correspondences:
– Local and global support
   Local support: other phrasal correspondences within a two-phrase distance in the dependency structure.
   Global support: phrasal correspondences in the parallel sentences.
– POS consistency
– Inner sufficiency
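Local support can be sketched as a breadth-first search over each dependency tree, here using the 日本 は / 役割 を / 果たす example. Assumptions not stated on the slide: trees are undirected adjacency lists of phrase labels, and an existing correspondence (j', e') supports a candidate (j, e) only if j' is within two phrases of j and e' is within two phrases of e. Function names are hypothetical; global support, POS consistency and inner sufficiency are not modeled.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # undirected adjacency list of phrase labels


def within(graph: Graph, start: str, max_dist: int) -> Set[str]:
    """Phrases reachable from start in 1..max_dist edges (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[start]
    return set(dist)


def local_support(j: str, e: str, j_tree: Graph, e_tree: Graph,
                  aligned: List[Tuple[str, str]], max_dist: int = 2) -> int:
    """Count existing correspondences near both sides of the candidate pair (j, e)."""
    j_near = within(j_tree, j, max_dist)
    e_near = within(e_tree, e, max_dist)
    return sum(1 for jj, ee in aligned if jj in j_near and ee in e_near)


J_TREE: Graph = {"果たす": ["日本 は", "役割 を"], "日本 は": ["果たす"], "役割 を": ["果たす"]}
E_TREE: Graph = {"play": ["Japan", "the role"], "Japan": ["play"], "the role": ["play"]}
ALIGNED = [("日本 は", "Japan"), ("役割 を", "the role")]
```

With Japan ↔ 日本 は and the role ↔ 役割 を already found, the remaining pair 果たす ↔ play receives a local support of 2.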
18
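Step 3: Discovery of New Correspondences by Handling Remaining Phrases
Example: "Japan ... play the role"
Japanese: 日本 は (Japan, case-marker) / 役割 を (role, case-marker) / 果たす (achieve)
The remaining phrases 果たす and "play" are linked, supported by the neighboring correspondences 日本 は ↔ Japan and 役割 を ↔ the role.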
19
Step 3: Discovery of New Correspondences by Handling Remaining Phrases
Example: "... technology has become important ..."
Japanese: ... 技術 が (technology, case-marker) / 重要 と (important) / なっている (become) ...
20
Experiments
Evaluation data: 200 sentence pairs from White Papers and from example sentences in a Japanese-English dictionary.
Gold-standard data: correct correspondences were tagged manually on these sentences.
– Correct: exactly matches a gold-standard correspondence.
– Near-correct: partly matches a gold-standard correspondence.
– Wrong: matches no gold-standard correspondence (neither Correct nor Near-correct).
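The three judgment categories can be sketched as follows. Assumptions: each correspondence is represented as a pair of word-position sets (Japanese, English), and "partly matches" is interpreted as overlapping a gold-standard correspondence on both sides. The representation and the name judge are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# (Japanese word positions, English word positions)
Corr = Tuple[FrozenSet[int], FrozenSet[int]]


def judge(output: Corr, gold: List[Corr]) -> str:
    """Classify a system correspondence as Correct, Near-correct or Wrong."""
    if output in gold:
        return "Correct"  # exactly equal to a gold-standard correspondence
    j_out, e_out = output
    for j_gold, e_gold in gold:
        if j_out & j_gold and e_out & e_gold:
            return "Near-correct"  # overlaps a gold correspondence on both sides
    return "Wrong"
```

For instance, with gold `[(frozenset({0, 1}), frozenset({0, 1}))]`, an output covering only Japanese position 1 and English positions 0-1 is judged Near-correct.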
21
Output Examples
English | Japanese (gloss) | Score
is being pursued | 行われている (is doing by) | 2.75
of G7 nations | 先進 7 カ国の (advanced 7 countries) | 2.6
geographical proximity | 地理的に近い (near in geography) | 2.0
tree (become) | その木は (that tree is) | 1.2
went [to bed] | 寝る (go to bed) | 1.0
She (held) | 彼女は (she is) | 0.5
Labels on the slide: Near-correct, Correct
22
[Figure: precision-recall curves, counting (a) Correct only and (b) Correct + 0.5 × Near-correct]
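The weighting in curve (b) can be sketched as below. Assumptions: precision divides by the number of system correspondences, recall divides by the number of gold-standard correspondences, and each Near-correct output counts as half a hit. The function name and the counts in the example are hypothetical, not results from the slides.

```python
from typing import Tuple


def weighted_pr(n_correct: int, n_near: int, n_output: int, n_gold: int,
                near_weight: float = 0.5) -> Tuple[float, float]:
    """Precision and recall where Near-correct outputs count near_weight each."""
    hits = n_correct + near_weight * n_near
    precision = hits / n_output if n_output else 0.0
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall
```

With illustrative counts of 6 Correct and 2 Near-correct out of 10 outputs against 14 gold correspondences, `weighted_pr(6, 2, 10, 14)` returns `(0.7, 0.5)`; passing `near_weight=0.0` gives the Correct-only curve.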
23
Conclusion
Our method finds more correspondences than a statistical approach. On comparable corpora a statistical approach seems effective; on parallel corpora, however, our approach is more effective at obtaining a large number of translation examples.
– Statistical approach: 1-2% of the input corpus
– Our system: 51-68% of the input corpus
25
Future Directions
Do the correspondences found by this system work effectively in translation? This must be tested in a translation system.