An Improved Hierarchical Word Sequence Language Model Using Word Association. 2015.11.26, Nara Institute of Science and Technology. Xiaoyi Wu, Yuji Matsumoto.

Presentation transcript:

An Improved Hierarchical Word Sequence Language Model Using Word Association. Nara Institute of Science and Technology. Xiaoyi Wu, Yuji Matsumoto, Kevin Duh, Hiroyuki Shindo.

Motivation: A continuous language model learns only continuous sequences. Having learned "a selfish man" from the training data, it still treats the shorter sequence "a man" as unseen. This is the data sparsity problem.

Motivation: Even when the training data contains "a selfish man", the discontinuous sequence "a man" remains unseen, so P("a man") must be approximated by smoothing techniques. Sparsity is severe: even with 30 years' worth of newswire text, 1/3 of trigrams are unseen (Allison et al., 2005).

HWS language model: For "as soon as possible", an n-gram model is continuous and utterance-oriented, whereas the HWS model is discontinuous and pattern-oriented.

Basic Idea of HWS:
– Patterns are discontinuous (a sentence is divided into several sections by patterns), e.g. "x is a y of z".
– Patterns are hierarchical: "x is a y of z" → "x is y of z" → "x is z".
– Words are generated from certain positions of patterns (words depend on patterns).

Basic Idea of HWS: For "Tom is a boy of nine", the structure is discontinuous and hierarchical, and each word depends on a pattern.

Proposed Approach (Frequency-based HWS Model): Given the corpus sentence "Mrs. Allen is a senior editor of Insight magazine", the most frequent word, "of", becomes the root and splits the sentence into "Mrs. Allen is a senior editor" and "Insight magazine".

Proposed Approach (Frequency-based HWS Model): The same rule is applied recursively inside each part, so the most frequent remaining words become the roots of the subspans.
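A minimal sketch of this construction in Python (hypothetical helper names; the authors' released code is the reference implementation): the most frequent word in the current span becomes the root, and the left and right remainders are built into subtrees the same way.

```python
from collections import Counter

def build_hws_tree(words, freq):
    """Recursively build a frequency-based HWS tree: the most frequent
    word in the current span becomes the root; the words on its left
    and right are built into subtrees the same way.

    Returns (word, left_subtree, right_subtree), or None for an empty span.
    """
    if not words:
        return None
    i = max(range(len(words)), key=lambda k: freq[words[k]])
    return (words[i],
            build_hws_tree(words[:i], freq),
            build_hws_tree(words[i + 1:], freq))

# Toy corpus counts, chosen so the splits reproduce the slide's example;
# in practice freq comes from the whole training corpus.
freq = Counter({"of": 300, "a": 250, "is": 200, "magazine": 40, "mrs.": 30,
                "senior": 20, "editor": 15, "allen": 5, "insight": 2})
tree = build_hws_tree("mrs. allen is a senior editor of insight magazine".split(), freq)
```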

Proposed Approach (Original HWS Model): Recursion continues until every word of "Mrs. Allen is a senior editor of Insight magazine" has become a node, yielding a complete hierarchical word-sequence tree.

Proposed Approach (Original HWS Model): HWS n-grams are then extracted along the tree edges. For this sentence the bigrams are ($, of), (of, a), (a, is), (is, Mrs.), (Mrs., Allen), (a, senior), (senior, editor), (of, magazine), (magazine, insight).
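Continuing the sketch above, bigram extraction is simply a walk over the tree that pairs each node with its parent:

```python
def extract_bigrams(tree, parent="$"):
    """Walk an HWS tree and emit (parent, child) pairs: HWS bigrams are
    tree edges, not adjacent words in the surface string."""
    if tree is None:
        return
    word, left, right = tree
    yield (parent, word)
    yield from extract_bigrams(left, word)
    yield from extract_bigrams(right, word)

# With the toy counts above this yields exactly the slide's list:
# ($, of), (of, a), (a, is), (is, mrs.), (mrs., allen),
# (a, senior), (senior, editor), (of, magazine), (magazine, insight)
bigrams = list(extract_bigrams(tree))
```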

Advantage of HWS: discontinuity. Compared on "as soon as possible", n-gram contexts are strictly adjacent, whereas HWS contexts can be discontinuous.

Word Association Based HWS: For "too much to handle", the frequency-based HWS chooses split words by raw frequency, whereas the word association based HWS chooses them by a word association score, producing a different tree.
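The experiments slide below names the Dice coefficient as the association score. Below is a minimal sketch of that score, plus one plausible splitting rule that keeps strongly associated adjacent pairs such as "too much" and "to handle" in the same subtree; `choose_split` is a hypothetical illustration, not necessarily the paper's exact rule.

```python
def dice(x, y, unigram, cooc):
    """Dice coefficient: 2*C(x,y) / (C(x)+C(y)), from corpus counts."""
    denom = unigram.get(x, 0) + unigram.get(y, 0)
    return 2.0 * cooc.get((x, y), 0) / denom if denom else 0.0

def choose_split(words, unigram, cooc):
    """Hypothetical rule: split between the adjacent pair with the weakest
    association, so tightly bound pairs stay together in one subtree."""
    if len(words) < 2:
        return 0
    scores = [dice(words[i], words[i + 1], unigram, cooc)
              for i in range(len(words) - 1)]
    return scores.index(min(scores)) + 1  # first index of the right-hand part
```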

Extra Techniques 1/2: Directionalization. For "as soon as possible", an n-gram model uses one-side generation: ($, $, as), ($, as, soon), (as, soon, as), (soon, as, possible). HWS uses double-side generation: ($, $, as), ($, as, as), (as, as, soon), (as, as, possible).
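A sketch of this double-side generation over the tuple-based tree used in the earlier sketches: both children of a node inherit the same ancestor context.

```python
def hws_trigrams(node, context=("$", "$")):
    """Emit (grandparent, parent, word) tuples from an HWS tree.
    Both children of a node share the same context, so generation
    proceeds on both sides of each word (double-side generation)."""
    if node is None:
        return
    word, left, right = node
    yield (*context, word)
    child_ctx = (context[1], word)
    yield from hws_trigrams(left, child_ctx)
    yield from hws_trigrams(right, child_ctx)

# HWS tree for "as soon as possible", e.g. from build_hws_tree above:
tree = ("as", None, ("as", ("soon", None, None), ("possible", None, None)))
# list(hws_trigrams(tree)) yields the slide's HWS trigrams:
# ($, $, as), ($, as, as), (as, as, soon), (as, as, possible)
```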

Extra Techniques 1/2: Directionalization. Tagging each context word with the side (L or R) on which it generates the next node turns ($, as, as), (as, as, soon), (as, as, possible) into ($-R, as-R, as), (as-R, as-L, soon), (as-R, as-R, possible).
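The same traversal with direction tags added to the context (again assuming the tuple-based tree from the earlier sketches):

```python
def directional_trigrams(node, context=("$-R", "$-R")):
    """Directionalized HWS trigrams: each ancestor in the context carries
    an -L/-R tag for the side on which it generated the next node, so
    left and right children no longer share identical histories."""
    if node is None:
        return
    word, left, right = node
    yield (*context, word)
    grand = context[1]  # this word's parent, already direction-tagged
    yield from directional_trigrams(left, (grand, word + "-L"))
    yield from directional_trigrams(right, (grand, word + "-R"))

# On the "as soon as possible" tree this yields, after the initial
# ($-R, $-R, as), exactly the slide's directionalized trigrams:
# ($-R, as-R, as), (as-R, as-L, soon), (as-R, as-R, possible)
```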

Extra Techniques 2/2: Unification. When constructing an HWS structure, each word in a sentence is counted only once, so a frequent word repeated within one sentence (e.g. "the") does not dominate the structure.
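A minimal sketch of one plausible reading of this rule, counting each word type at most once per sentence (an assumption on my part; the released code defines the exact behaviour):

```python
from collections import Counter

def unified_counts(sentences):
    """Unification: when collecting the counts used to build HWS
    structures, count each word type at most once per sentence, so a
    sentence full of "the" does not inflate that word's score."""
    counts = Counter()
    for sent in sentences:
        counts.update(set(sent))  # set() collapses repeats within the sentence
    return counts
```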

Intrinsic Experiments
Training data – British National Corpus (449,755 sentences, 10 million words)
Test data – English Gigaword Corpus (44,702 sentences, 1 million words)
Preprocessing – NLTK tokenizer, lowercasing
Word association score – Dice coefficient
Smoothing methods – MKN (Modified Kneser-Ney) (Chen & Goodman, 1999); GLM (Generalized Language Model) (Pickhardt et al., 2014)
Evaluation measures – Perplexity; Coverage (|TR∩TE| / |TE|); Usage (|TR∩TE| / |TR|)
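Coverage and usage follow directly from those set definitions; a small sketch:

```python
def coverage_and_usage(train_ngrams, test_ngrams):
    """Coverage = |TR ∩ TE| / |TE|, Usage = |TR ∩ TE| / |TR|, over the
    sets of distinct n-grams in training (TR) and test (TE) data."""
    tr, te = set(train_ngrams), set(test_ngrams)
    inter = len(tr & te)
    return inter / len(te), inter / len(tr)
```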

Evaluation

Extrinsic Experimental Settings
Training data – TED talks French-English parallel corpus (… sentence pairs)
Test data – TED talks French-English parallel corpus (1,617 sentence pairs)
Translation toolkit – the Moses system
Evaluation measures – BLEU (Papineni et al., 2002); METEOR (Banerjee & Lavie, 2005); TER (Snover et al., 2006)
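For illustration only, here is how corpus BLEU can be computed with the sacrebleu package; this is a modern stand-in, not the scoring setup of the paper, which would have used the standard Moses-era scripts.

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["the cat sat on the mat"]    # system translations, one per line
references = [["the cat sat on the mat"]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # corpus-level BLEU on a 0-100 scale
```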

Extrinsic Evaluation

Conclusions
– We proposed an improved hierarchical word sequence language model using word association, plus two extra techniques (directionalization and unification).
– The proposed model models natural language more precisely than the original FB-HWS.
– The proposed model performs better in both intrinsic and extrinsic experiments.
– Source code can be downloaded at –

Thank you for your attention!