An Improved Hierarchical Word Sequence Language Model Using Word Association. 2015.11.26, Nara Institute of Science and Technology. Xiaoyi Wu, Yuji Matsumoto.


1 An Improved Hierarchical Word Sequence Language Model Using Word Association. 2015.11.26, Nara Institute of Science and Technology. Xiaoyi Wu, Yuji Matsumoto, Kevin Duh, Hiroyuki Shindo

2 Motivation: Continuous language models suffer from data sparsity. Because they learn only continuous sequences, statistics from a learned sequence such as "a man" cannot help estimate an unseen but closely related sequence such as "a selfish man".

3 Motivation: The training data "a selfish man" contains the discontinuous sequence "a ... man", but a continuous model cannot exploit it, so estimating P(a man) requires smoothing techniques. Even with 30 years' worth of newswire text, 1/3 of trigrams remain unseen (Allison et al., 2005).

4 HWS language model: For "as soon as possible", an n-gram model treats the phrase as a continuous, utterance-oriented sequence, while the HWS model treats it as a discontinuous, pattern-oriented structure.

5 Basic Idea of HWS: Patterns are discontinuous (a sentence is divided into several sections by patterns) – e.g. "x is a y of z". Patterns are hierarchical – "x is a y of z" → "x is y of z" → "x is z". Words are generated from certain positions of patterns (words depend on patterns).

6 Basic Idea of HWS: For the sentence "Tom is a boy of nine", the HWS structure is discontinuous and hierarchical, and each word depends on its pattern.

7 Proposed Approach (Frequency-based HWS Model): For the corpus sentence "Mrs. Allen is a senior editor of Insight magazine", the most frequent word, "of", splits the sentence into "Mrs. Allen is a senior editor" and "Insight magazine".

8 Proposed Approach (Frequency-based HWS Model): Each sub-sequence is then split recursively at its own most frequent word ("is" on the left side, "magazine" on the right side), leaving "Mrs. Allen", "a senior editor", and "Insight".

9 Proposed Approach (Original HWS Model): The same sentence is decomposed recursively into an HWS tree with "of" at the root and the remaining words ("is", "magazine", "Mrs.", "a", "Insight", "Allen", "editor", "senior") at lower levels.

10 Proposed Approach (Original HWS Model): From the HWS tree of "Mrs. Allen is a senior editor of Insight magazine", the following bigram tuples are extracted: ($, of), (of, a), (a, is), (is, Mrs.), (Mrs., Allen), (a, senior), (senior, editor), (of, magazine), (magazine, insight)
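The recursive split-at-the-most-frequent-word construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the corpus frequencies in `freq` are hypothetical values chosen only so that the toy run reproduces the tuples listed on the slide.

```python
def hws_bigrams(words, freq, parent="$"):
    """Extract HWS bigram tuples (parent, child) from one sentence.

    The most frequent word in a span becomes the root of that span;
    the span is then split at the root and both halves are processed
    recursively, with the root as their parent.
    """
    if not words:
        return []
    i = max(range(len(words)), key=lambda j: freq[words[j]])
    root = words[i]
    return ([(parent, root)]
            + hws_bigrams(words[:i], freq, parent=root)
            + hws_bigrams(words[i + 1:], freq, parent=root))

# Hypothetical corpus frequencies (illustration only).
freq = {"of": 100, "a": 90, "is": 80, "magazine": 40, "insight": 30,
        "mrs.": 20, "senior": 15, "editor": 12, "allen": 10}
sentence = "mrs. allen is a senior editor of insight magazine".split()
print(hws_bigrams(sentence, freq))
# → [('$', 'of'), ('of', 'a'), ('a', 'is'), ('is', 'mrs.'), ('mrs.', 'allen'),
#    ('a', 'senior'), ('senior', 'editor'), ('of', 'magazine'), ('magazine', 'insight')]
```

Note that the extracted pairs are exactly the (parent, child) edges of the HWS tree, so an n-size HWS model conditions each word on its ancestors rather than on its linear neighbors.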

11 Advantage of HWS: Discontinuity. For "as soon as possible", an n-gram model captures only the continuous word sequence, while HWS captures the discontinuous pattern "as ... as".

12 Word Association Based HWS: For "too much to handle", the frequency-based HWS selects root words by raw frequency, while the word-association-based HWS selects them by a word association score.
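The association score used in the experiments later in the talk is the Dice coefficient, Dice(x, y) = 2·C(x, y) / (C(x) + C(y)). A minimal sketch, with made-up counts purely for illustration:

```python
from collections import Counter

def dice(x, y, unigram, pair):
    """Dice coefficient between words x and y:
    2 * C(x, y) / (C(x) + C(y))."""
    denom = unigram[x] + unigram[y]
    return 2.0 * pair[(x, y)] / denom if denom else 0.0

# Hypothetical counts (illustration only).
unigram = Counter({"too": 50, "much": 40, "to": 200, "handle": 10})
pair = Counter({("too", "much"): 30, ("to", "handle"): 8})
print(round(dice("too", "much", unigram, pair), 3))  # → 0.667
```

Unlike raw frequency, the score is high only when the two words co-occur much more often than chance, which is why it favors collocations such as "too much" over merely frequent words such as "to".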

13 Extra Techniques 1/2: Directionalization. For "as soon as possible", an n-gram model uses one-side generation: ($, $, as), ($, as, soon), (as, soon, as), (soon, as, possible). HWS uses double-side generation: ($, $, as), ($, as, as), (as, as, soon), (as, as, possible).
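The double-side HWS trigrams above fall out of the tree construction by conditioning each node on its two nearest ancestors. This is a sketch under the same hypothetical-frequency assumption as before (ties at the maximum go to the leftmost word):

```python
def hws_trigrams(words, freq, ctx=("$", "$")):
    """Double-side HWS generation: each node is predicted from its
    two nearest ancestors in the HWS tree; the root's context is ($, $)."""
    if not words:
        return []
    i = max(range(len(words)), key=lambda j: freq[words[j]])
    root = words[i]
    child_ctx = (ctx[1], root)  # shift context: grandparent drops out
    return ([ctx + (root,)]
            + hws_trigrams(words[:i], freq, child_ctx)
            + hws_trigrams(words[i + 1:], freq, child_ctx))

# Hypothetical frequencies; "as" is the most frequent word (illustration only).
freq = {"as": 10, "soon": 2, "possible": 1}
print(hws_trigrams("as soon as possible".split(), freq))
# → [('$', '$', 'as'), ('$', 'as', 'as'), ('as', 'as', 'soon'), ('as', 'as', 'possible')]
```

Note how both "soon" and "possible" share the context (as, as): the discontinuous pattern "as ... as" is preserved even though the words between its two halves vary.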

14 Extra Techniques 1/2: Directionalization. Since HWS generates words on both sides of a root, each context word is labeled with the direction of generation (L or R): ($, as, as), (as, as, soon), (as, as, possible) becomes ($-R, as-R, as), (as-R, as-L, soon), (as-R, as-R, possible).

15 Extra Techniques 2/2: Unification. When constructing an HWS structure, each word in a sentence (e.g. a repeated "the") is counted only once.

16 Intrinsic Experiments
Training data – British National Corpus (449,755 sentences, 10 million words)
Test data – English Gigaword Corpus (44,702 sentences, 1 million words)
Preprocessing – NLTK tokenizer, lowercasing
Word association score – Dice coefficient
Smoothing methods – MKN (Modified Kneser-Ney) (Chen & Goodman, 1999); GLM (Generalized Language Model) (Pickhardt et al., 2014)
Evaluation measures – Perplexity; Coverage (|TR∩TE| / |TE|); Usage (|TR∩TE| / |TR|)
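The coverage and usage measures above compare the set of sequence types extracted from the training data (TR) with those extracted from the test data (TE). A minimal sketch with made-up trigram types (illustration only):

```python
def coverage_and_usage(train_seqs, test_seqs):
    """Coverage = |TR ∩ TE| / |TE|; Usage = |TR ∩ TE| / |TR|,
    where TR and TE are the sets of distinct sequences (types)
    extracted from the training and test data respectively."""
    tr, te = set(train_seqs), set(test_seqs)
    shared = tr & te
    return len(shared) / len(te), len(shared) / len(tr)

# Toy example with hypothetical trigram types.
tr = [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "e"), ("x", "y", "z")]
te = [("b", "c", "d"), ("c", "d", "e"), ("p", "q", "r")]
coverage, usage = coverage_and_usage(tr, te)
print(coverage, usage)  # 2 of 3 test types covered; 2 of 4 training types used
```

High coverage means few unseen test sequences; high usage means the model's learned sequences are not wasted, so together they indicate how well the extracted sequence types generalize.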

17 Evaluation

18 Extrinsic Experimental Settings
Training data – TED Talks French-English parallel corpus (139,761 sentence pairs)
Test data – TED Talks French-English parallel corpus (1,617 sentence pairs)
Translation toolkit – Moses
Evaluation measures – BLEU (Papineni et al., 2002); METEOR (Banerjee & Lavie, 2005); TER (Snover et al., 2006)

19 Extrinsic Evaluation

20 Conclusions: We proposed an improved hierarchical word sequence language model using word association, together with two extra techniques (directionalization and unification). The proposed model models natural language more precisely than the original FB-HWS and performs better in both intrinsic and extrinsic experiments. Source code is available at https://github.com/aisophie/HWS

21 Thank you for your attention!

