Structural Phrase Alignment Based on Consistency Criteria Toshiaki Nakazawa, Kun Yu, Sadao Kurohashi (Graduate School of Informatics, Kyoto University)

Presentation transcript:

Flow of Our EBMT System

Input: 交差点に入る時私の信号は青でした。
  交差 点 に 入る 時 私 の 信号 は 青 でした 。
  (cross) (point) (enter) (when) (my) (signal) (blue) (was)

Translation Examples:
  - 交差 点 で 、 突然 飛び出して 来た のです 。 / came at me from the side at the intersection
  - 家 に 入る 時 脱ぐ / to remove when entering a house
  - 私 の サイン / my signature
  - 信号 は 青 でした 。 / The light was green
  Glosses shown on the examples: (suddenly) (rush out) (house) (put off) (signal) (enter) (when) (cross) (point)
  Fragments selected from the examples: my / traffic / The light was green / when entering the intersection

Language Models

Output: My traffic light was green when entering the intersection.

Core Steps of Alignment

Searching Correspondence Candidates
  - Fine-grained alignment is effective for translation
  - Search for as many candidates as possible using a variety of linguistic information:
      Bilingual dictionaries
      Transliteration (Katakana words, NEs)
        ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
        新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
      Numeral normalization
        二百十六万 → 2,160,000 ← 2.16 million
      Japanese flexible matching (Odani et al. 2007)
      Substring co-occurrence measure (Cromieres 2006)

Selecting Correspondence Candidates
  - More candidates lead to more ambiguities and improper alignments
  - A robust alignment method is needed, one that aligns parallel sentences consistently by selecting an adequate set of candidates

Selecting Correspondence Candidates Using Consistency Score and Dependency Type

Example sentence pair:
  you will have to file an insurance claim with the insurance office in Japan
  日本 で 保険 会社 に 対して 保険 請求 の 申し立て が 可能です よ
  (in Japan) (insurance) (to company) (claim) (instance) (you can)
  Ambiguities! Improper alignments!

[Figure: Distribution of the distance of alignment pairs in hand-annotated data (Mainichi newspaper, 40K sentence pairs) [Uchimoto04]. Axes: J-side distance, E-side distance, frequency (log).]
[Figure: Consistency score as a function of J-side distance and E-side distance.]

Consistency Score Function
  - "Near-Near" pair → Positive Score
  - "Far-Far" pair → 0
  - "Near-Far" pair → Negative Score
  Baseline: 1/1 + 1/2 = 1.5

Dependency Type Distance

  Japanese                        Distance
  predicate: level C              6
  predicate: level B+/B           5
  predicate: level B-/A           4
  predicate: level A- / Others    3
  case no / rentai                2
  Inside clause                   1

  English                                              Distance
  S / SBAR / SQ …                                      5
  VP / WHADVP / WHADJP                                 4
  ADVP / ADJP / NP / PP / INTJ / QP / PRT / PRN        3
  Others                                               1

How to reflect the inconsistency?
  Dependency types on the example trees:
    Japanese: デ格 [case "de"], ガ格 [case "ga"], 連用 [renyou], ノ格 [case "no"], 文節内 [inside clause]
    English: NP, NN, PP
  Pair 1: (Ds, Dt) = (1, 1): Near-Near, Positive Score
  Pair 2: (Ds, Dt) = (1, 7): Near-Far, Negative Score

Experimental Result
  - 500 test sentences from the Mainichi newspaper parallel corpus
  - Bilingual dictionary: KENKYUSYA J-E/J-E, 500K entries
  - Evaluation criteria: Precision / Recall / F-measure
  - Character-based for Japanese, word-based for English

[Table: Pre / Rec / F for Baseline, Consistency Score, Proposed (+CS, +DpndType), Filtering (80%), Moses (SMT Toolkit)*, Manual (upper bound); cell values not preserved.]
  * Using 300K newspaper-domain bi-sentences for training

Quality of Other Language Pairs (AER)
[Table: English-French, English-Romanian, English-Korean; sources HLT-NAACL, ACL (Gildea, 2003); system GIZA; surviving cell text: "- - 32".]

Conclusion
  - Proposed a new phrase alignment method using consistency criteria.
  - Achieved sufficient alignment accuracy compared with other language pairs.
  - We need to acquire the parameters automatically by machine learning.
  - We are planning to develop a framework that revises the parse result.

(There is a translation demo in the exhibition corner by NICT that uses our system!)
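
The consistency score and dependency-type distances described in this transcript could be sketched roughly as follows. This is only an illustration, not the authors' function: the poster gives the sign of the score for each case but not its formula, so the near/far threshold (`NEAR_MAX`), the score magnitudes (+1, 0, -1), and the idea that a distance is the sum of per-edge values along a dependency path are all assumptions. The names `path_distance` and `consistency_score` are hypothetical. The per-type values are those legible in the Dependency Type Distance table.

```python
# Illustrative sketch only. NEAR_MAX, the score magnitudes, and summing
# per-edge distances along a path are assumptions, not taken from the poster.

# Per-type distances legible in the transcript's table.
JAPANESE_TYPE_DISTANCE = {
    "predicate: level C": 6,
    "predicate: level B+/B": 5,
    "predicate: level B-/A": 4,
    "case no / rentai": 2,
    "inside clause": 1,
}
JAPANESE_OTHERS = 3

ENGLISH_TYPE_DISTANCE = {
    "S": 5, "SBAR": 5, "SQ": 5,
    "VP": 4, "WHADVP": 4, "WHADJP": 4,
    "ADVP": 3, "ADJP": 3, "NP": 3, "PP": 3,
    "INTJ": 3, "QP": 3, "PRT": 3, "PRN": 3,
}
ENGLISH_OTHERS = 1

NEAR_MAX = 3  # assumed threshold: distances up to this value count as "near"


def path_distance(edge_types, table, others):
    """Sum the per-type distances of the dependency edges on a path (assumed)."""
    return sum(table.get(edge_type, others) for edge_type in edge_types)


def consistency_score(ds, dt, near_max=NEAR_MAX):
    """Return +1 for a near-near pair, 0 for far-far, -1 for near-far."""
    source_near = ds <= near_max
    target_near = dt <= near_max
    if source_near and target_near:
        return 1.0
    if not source_near and not target_near:
        return 0.0
    return -1.0


if __name__ == "__main__":
    # The two pairs from the poster: (Ds, Dt) = (1, 1) and (1, 7).
    print(consistency_score(1, 1))
    print(consistency_score(1, 7))
```

With these assumed values, the two pairs from the poster get a positive and a negative score respectively, matching the signs stated there. A learned or smooth function could replace the threshold, which is in line with the stated plan to acquire parameters by machine learning.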
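
The numeral normalization step (二百十六万 → 2,160,000 ← 2.16 million) can be illustrated with a small sketch. The poster does not say how the system implements it; the functions `kanji_to_int` and `english_to_int` below are hypothetical and cover only simple cases (kanji digits with 十/百/千 and 万/億/兆, and English "number + scale word").

```python
from decimal import Decimal

# Hypothetical helpers; the poster does not describe the actual implementation.
KANJI_DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
ENGLISH_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text):
    """Convert a kanji numeral such as 二百十六万 to an integer."""
    total = 0    # sum of completed 万/億/兆 groups
    section = 0  # value accumulated inside the current group
    digit = 0    # digits read since the last unit character
    for ch in text:
        if ch in KANJI_DIGITS:
            digit = digit * 10 + KANJI_DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


def english_to_int(text):
    """Convert strings such as '2.16 million' or '2,160,000' to an integer."""
    parts = text.replace(",", "").split()
    value = Decimal(parts[0])  # Decimal avoids binary floating-point rounding
    if len(parts) > 1:
        value *= ENGLISH_SCALES[parts[1].lower()]
    return int(value)


if __name__ == "__main__":
    # Both sides of the poster's example normalize to the same integer.
    print(kanji_to_int("二百十六万"), english_to_int("2.16 million"))
```

Once both sides map to the same integer, the two expressions can be proposed as a correspondence candidate even though their surface forms share nothing.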
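
The evaluation criteria named in the experiment (Precision / Recall / F-measure) can be computed over sets of alignment links as in the sketch below. Representing a link as a (Japanese character index, English word index) pair is an assumption based on the "character-based for Japanese, word-based for English" note; `precision_recall_f` is a hypothetical helper and the link sets in the example are made up for illustration, not taken from the experiment.

```python
def precision_recall_f(predicted_links, gold_links):
    """Compute precision, recall and balanced F-measure over alignment links.

    Each link is assumed to be a (japanese_char_index, english_word_index) pair.
    """
    predicted = set(predicted_links)
    gold = set(gold_links)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up links for illustration only.
    predicted = [(0, 0), (1, 1), (2, 3)]
    gold = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(precision_recall_f(predicted, gold))
```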