Compact WFSA-based Language Model and Its Application in Statistical Machine Translation
Xiaoyin Fu, Wei Wei, Shixiang Lu, Dengfeng Ke, Bo Xu
Interactive Digital Media Technology Research Center, CASIA
Outline
– Task
– Problems
– Solution
– Our Approach
– Results
– Conclusion
Task: N-gram Language Model
– Assigns probabilities to strings of words or tokens
– Let w_1^L denote a string of L tokens over a fixed vocabulary
– Smoothing techniques: back-off
– Back-off definition: P(w_i | w_{i-n+1}..w_{i-1}) = P*(w_i | w_{i-n+1}..w_{i-1}) if the n-gram was seen in training; otherwise b(w_{i-n+1}..w_{i-1}) · P(w_i | w_{i-n+2}..w_{i-1})
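The back-off scheme above can be sketched in a few lines of Python. The dictionary layout (`probs`, `backoffs` keyed by n-gram tuples) and the log10 scores are illustrative assumptions in the style of an ARPA-format LM, not the authors' implementation:

```python
# Minimal sketch of back-off n-gram scoring (illustrative data layout).
# `probs` maps n-gram tuples to log10 probabilities; `backoffs` maps
# context tuples to log10 back-off weights, as in an ARPA-format LM.

def backoff_logprob(ngram, probs, backoffs):
    """Return log10 P(w | context) with recursive back-off."""
    if ngram in probs:
        return probs[ngram]          # the full n-gram was seen in training
    if len(ngram) == 1:
        return float("-inf")         # OOV word: no unigram entry
    context = ngram[:-1]
    alpha = backoffs.get(context, 0.0)   # back-off weight b(context)
    return alpha + backoff_logprob(ngram[1:], probs, backoffs)

# Toy model
probs = {("the",): -1.0, ("cat",): -2.0, ("dog",): -2.5, ("the", "cat"): -0.5}
backoffs = {("the",): -0.3}

print(backoff_logprob(("the", "cat"), probs, backoffs))  # seen bigram: -0.5
print(backoff_logprob(("the", "dog"), probs, backoffs))  # backs off: b(the) + P(dog)
```

Note how the unseen bigram "the dog" pays the back-off weight of its context and then falls back to the unigram probability of "dog".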
Problems
– Queries in the trie structure
– Useless queries
– Problems in forward queries
– Problems in back-off queries
Solution
– Another point of view: treat the LM query as a random procedure, a continuous process
– Benefits: speeds up both forward queries and back-off queries
– Goals: fast and compact
Our Approach: FAST WFSA
Definition: a 5-tuple M = (Q, Σ, I, F, δ)
– Q: a set of states
– I: a set of initial states
– F: a set of final states
– Σ: an alphabet representing the input and output labels
– δ ⊆ Q × (Σ ∪ {ε}) × Q: a transition relation
Our Approach: Compact Trie
– Sorted-array representation
– Link indexes connecting each node to its children
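The sorted-array trie with link indexes can be sketched as follows. The class and field names are illustrative assumptions, not the paper's implementation; the key idea is that each layer is one flat array, and a node addresses its children via a contiguous range in the next layer:

```python
import bisect

# Sketch of one trie layer stored as sorted arrays (illustrative names).
# All nodes of a layer sit in a single array; the children of a given
# node occupy a contiguous range [begin, end) in the next layer,
# addressed by a link index stored in the parent.

class TrieLayer:
    def __init__(self, word_ids, child_begin, child_end):
        self.word_ids = word_ids        # sorted inside each parent's range
        self.child_begin = child_begin  # link index into the next layer
        self.child_end = child_end

def find_child(layer, begin, end, word_id):
    """Binary-search word_id among the children stored in [begin, end)."""
    i = bisect.bisect_left(layer.word_ids, word_id, begin, end)
    if i < end and layer.word_ids[i] == word_id:
        return i                        # node position inside this layer
    return -1                           # no such extension: back off

# Toy layer holding the children of two parents:
# parent A owns range [0, 3), parent B owns range [3, 5).
layer = TrieLayer([2, 5, 9, 1, 7], [0] * 5, [0] * 5)
print(find_child(layer, 0, 3, 5))   # found under parent A at position 1
print(find_child(layer, 0, 3, 7))   # 7 exists only under parent B: -1
```

Because each parent's children are sorted and contiguous, a child lookup costs one binary search over a small range rather than a hash probe per n-gram order.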
Our Approach: WFSA-based LM (built on the trie structure)
– Q: the nodes of the trie
– I: the root of the trie
– F: every node of the trie except the root
– Σ: the alphabet of the input sentences
– δ: forward transitions T_f and roll-back transitions T_b
Note:
– T_f fires on a forward query, consuming the next input word
– T_b fires spontaneously, without any input, once the query reaches a leaf, and carries out the back-off query
Our Approach: WFSA-based LM (node layout)
– Each trie node stores a probability, a back-off weight and a child index
– Each WFSA node additionally stores a roll-back index
– Roll-back indexes may point across layers (cross-layer transitions)
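The node layout above might look like this in code. The field names are assumptions for illustration, not the paper's exact record format:

```python
from dataclasses import dataclass

# Illustrative node record for the WFSA-based LM (field names assumed).
# A plain trie node keeps a probability, a back-off weight and a
# child-link index; the WFSA node adds a precomputed roll-back index, so
# a failed forward query can jump straight to its back-off state, which
# may live in another (lower) layer of the trie.

@dataclass
class WfsaNode:
    word_id: int      # last word of the context this node represents
    logprob: float    # log P(word | preceding context)
    backoff: float    # log back-off weight of this context
    child_begin: int  # link index: first child in the next layer
    child_end: int    # link index: one past the last child
    rollback: int     # roll-back index: back-off state (may cross layers)

node = WfsaNode(word_id=3, logprob=-0.5, backoff=-0.3,
                child_begin=10, child_end=12, rollback=4)
```

The roll-back index is what makes the structure an automaton rather than a plain trie: back-off no longer requires a fresh top-down search.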
Our Approach: Query Method (step-by-step example; figures omitted)
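The query method can be sketched as a small state machine: forward transitions (T_f) consume the next input word, and roll-back transitions (T_b) fire without input whenever no forward transition matches. The dictionary-based transition tables below are a simplification for illustration, not the authors' code:

```python
# Sketch of the forward/roll-back query (illustrative state machine).
# forward[state][word] -> (next_state, logprob) models a T_f transition;
# rollback[state]      -> (backoff_state, backoff_cost) models a T_b
# transition, which fires spontaneously when no forward transition
# matches the next input word. (OOV handling is omitted for brevity.)

def score_word(state, word, forward, rollback):
    """Consume one word; return (new_state, log probability added)."""
    cost = 0.0
    while True:
        trans = forward.get(state, {})
        if word in trans:
            next_state, logprob = trans[word]   # T_f: forward transition
            return next_state, cost + logprob
        # T_b: roll back without input, accumulating the back-off weight
        state, backoff = rollback[state]
        cost += backoff

# Toy automaton: root state "" and one bigram context "the".
forward = {
    "": {"the": ("the", -1.0), "dog": ("dog", -2.5)},
    "the": {"cat": ("the cat", -0.5)},
}
rollback = {"the": ("", -0.3)}

print(score_word("the", "cat", forward, rollback))  # seen bigram
print(score_word("the", "dog", forward, rollback))  # rolls back first
```

Scoring "dog" in the context "the" finds no forward transition, rolls back to the root while adding the back-off weight, and only then matches "dog"; no separate top-down back-off search is needed.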
Our Approach: State Transitions (figure omitted)
Our Approach: Query LM (figure omitted)
Our Approach: Application in HPB SMT
– Decoding a single source sentence issues a huge number of LM queries, on the order of tens of millions
– Most of these queries are repetitive
– Remedy: a hash cache
Our Approach: Hash Cache for HPB SMT
– Small and fast: a 24-bit hash, i.e. 16M entries
– Simple operations: only additive and bitwise operations
– Hash clear: the cache is cleared for each sentence
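A per-sentence hash cache along these lines can be sketched as follows. The slides only specify a 24-bit hash (16M slots), additive and bitwise operations, and a clear after every sentence; the mixing function and class layout below are assumptions:

```python
# Sketch of the per-sentence LM query cache (illustrative; the mixing
# function is assumed, only the 24-bit size and the additive/bitwise
# operations come from the slides).

class QueryCache:
    def __init__(self, bits=24):        # 24 bits -> 16M slots
        self.size = 1 << bits
        self.keys = [None] * self.size
        self.vals = [0.0] * self.size

    def _hash(self, ngram):
        h = 0
        for w in ngram:                 # additive and bitwise ops only
            h = ((h << 5) + h + w) & (self.size - 1)
        return h

    def lookup(self, ngram):
        i = self._hash(ngram)
        return self.vals[i] if self.keys[i] == ngram else None

    def store(self, ngram, score):
        i = self._hash(ngram)
        self.keys[i] = ngram            # overwrite on collision:
        self.vals[i] = score            # it is a cache, not an exact map

    def clear(self):                    # called once per source sentence
        self.keys = [None] * self.size

# Small table for the demo; a paper-scale cache would use bits=24.
cache = QueryCache(bits=12)
cache.store((7, 42, 3), -1.25)
print(cache.lookup((7, 42, 3)))   # -1.25
print(cache.lookup((7, 42, 4)))   # None: not cached
```

Overwriting on collision keeps every operation O(1) with no chaining; since most queries within a sentence repeat, even this lossy cache absorbs the bulk of them.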
Results: Setup
– LM toolkit: SRILM
– Decoder: hierarchical phrase-based translation system
– Test data: IWSLT-07 (489 sentences) and NIST-06 (1,664 sentences)
– Training data:

Task      Model   Sentences  Chinese words  English words
IWSLT-07  TM [1]  0.38M      3.0M           3.1M
IWSLT-07  LM [2]  1.3M       ——             15.2M
NIST-06   TM [3]  3.4M       64M            70M
NIST-06   LM [4]  14.3M      ——             377M

[1] The parallel corpus of BTEC (Basic Traveling Expression Corpus) and CJK (China-Japan-Korea corpus)
[2] The English corpus of BTEC + CJK + CWMT2008
[3] LDC2002E18, LDC2002T01, LDC2003E07, LDC2003E14, LDC2003T17, LDC2004T07, LDC2004T08, LDC2005T06, LDC2005T10, LDC2005T34, LDC2006T04, LDC2007T09
[4] LDC2007T07
Results: Storage Space
– Storage size increases by about 35% over SRILM
– The increase is linearly dependent on the number of trie nodes
– An acceptable overhead
(Table omitted: comparison of LM sizes, in Mb, between SRILM and WFSA for the IWSLT and NIST tasks.)
Results: Query Speed
– WFSA: query time reduced by 60% with 4-grams and 70% with 5-grams
– WFSA + cache: speeds up queries by 75%
(Table omitted: query times in seconds for SRILM, WFSA and WFSA+cache on IWSLT-07 and NIST-06 with 4-gram and 5-gram LMs.)
Results: Analysis
– Repetitive queries and back-off queries in SMT (4-gram LM):

Task   Back-off  Repetitive
IWSLT  —         95.5%
NIST   —         96.4%

– Back-off queries occur widely, and most queries are repetitive
– The WFSA-based LM can therefore speed up queries effectively
Conclusion
– A faster WFSA-based LM: faster forward queries and faster back-off queries
– A compact WFSA-based LM: built on a trie structure
– A simple caching technique for the SMT system
– Potentially applicable to other fields: speech recognition, information retrieval
Thanks!