
Topic-independent Speaking-Style Transformation of Language Model for Spontaneous Speech Recognition
Yuya Akita, Tatsuya Kawahara

Introduction
- Spoken style vs. written style
  - Combination of document and spontaneous corpora
  - Irrelevant linguistic expressions
- Model transformation approaches
  - Simulated spoken-style text obtained by randomly inserting fillers
  - Weighted finite-state transducer framework
  - Statistical machine translation framework
- Problem with model transformation methods
  - Small parallel corpus, data sparseness
  - One solution: POS tags

Statistical Transformation of Language Model
- Posterior formulation
  - X: source language model (document style)
  - Y: target language model (spoken style)
  - P(X|Y) and P(Y|X) are the transformation models
- Transformation models can be estimated from a parallel corpus
- Spoken-style n-gram counts are obtained by transforming document-style n-gram counts (see the sketch below)
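A minimal restatement of the count transformation the slide appears to refer to, assuming the spoken-style n-gram count is a transformation-probability-weighted sum of document-style counts (the exact formulation in the paper may differ):

    N_Y(y) = \sum_{x} P(y \mid x)\, N_X(x)

where N_X and N_Y are n-gram counts under the document-style and spoken-style models, and P(y|x) is the transformation probability estimated from the parallel corpus.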

Statistical Transformation of Language Model (cont.)
- Data sparseness problem for the parallel corpus
- Remedies:
  - POS information
  - Linear interpolation
  - Maximum entropy

Training
- Use an aligned corpus
- Word-based transformation probability
- POS-based transformation probability
- P_word(x|y) and P_POS(x|y) are estimated accordingly (a sketch follows)
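A minimal Python sketch of how word- and POS-based transformation probabilities could be estimated by relative frequency from aligned word pairs; the pair format, variable names, and conditioning direction are illustrative assumptions rather than the authors' implementation:

    from collections import Counter, defaultdict

    def estimate_transformation_probs(aligned_pairs):
        """aligned_pairs: iterable of (doc_word, doc_pos, spoken_word, spoken_pos)
        tuples extracted from the word-aligned parallel corpus (format assumed)."""
        word_counts = defaultdict(Counter)   # word_counts[x][y] = C(x -> y)
        pos_counts = defaultdict(Counter)    # pos_counts[x_pos][y_pos] = C(x_pos -> y_pos)
        for doc_w, doc_pos, spk_w, spk_pos in aligned_pairs:
            word_counts[doc_w][spk_w] += 1
            pos_counts[doc_pos][spk_pos] += 1
        # Relative-frequency estimates, here conditioned on the document-style side
        p_word = {x: {y: c / sum(ys.values()) for y, c in ys.items()}
                  for x, ys in word_counts.items()}
        p_pos = {x: {y: c / sum(ys.values()) for y, c in ys.items()}
                 for x, ys in pos_counts.items()}
        return p_word, p_pos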

Training (cont.)
- Combination of the word-based and POS-based probabilities:
  - Back-off scheme
  - Linear interpolation scheme
  - Maximum entropy (ME) scheme
- The ME model is applied to every n-gram entry of the document-style model
- A spoken-style n-gram is generated if its transformation probability exceeds a threshold (see the sketch below)
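A sketch of how the back-off and linear-interpolation combinations and the threshold-based generation step might look; the interpolation weight, the threshold, and the candidate-generation helper are illustrative assumptions:

    def transform_prob_backoff(x_word, x_pos, y_word, y_pos, p_word, p_pos):
        """Back-off: use the word-based probability when the word pair was observed,
        otherwise fall back to the POS-based probability."""
        if x_word in p_word and y_word in p_word[x_word]:
            return p_word[x_word][y_word]
        return p_pos.get(x_pos, {}).get(y_pos, 0.0)

    def transform_prob_interp(x_word, x_pos, y_word, y_pos, p_word, p_pos, lam=0.7):
        """Linear interpolation of the word-based and POS-based probabilities
        (lam would be tuned on held-out data)."""
        pw = p_word.get(x_word, {}).get(y_word, 0.0)
        pp = p_pos.get(x_pos, {}).get(y_pos, 0.0)
        return lam * pw + (1.0 - lam) * pp

    def generate_spoken_ngrams(doc_ngrams, candidates, prob_fn, threshold=0.01):
        """Apply the transformation to every document-style n-gram entry and keep a
        spoken-style candidate only if its transformation probability exceeds the threshold."""
        spoken = {}
        for ngram, count in doc_ngrams.items():
            for cand in candidates(ngram):      # candidate spoken-style n-grams (assumed helper)
                p = prob_fn(ngram, cand)
                if p > threshold:
                    spoken[cand] = spoken.get(cand, 0.0) + p * count
        return spoken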

Experiments
- Training corpora:
  - Baseline corpus: National Congress of Japan, 71M words
  - Parallel corpus: Budget Committee meetings in 2003, 666K words
  - Corpus of Spontaneous Japanese, 2.9M words
- Test corpus:
  - Another Budget Committee meeting in 2003, 63K words

Experiments (cont.)
- Evaluation of the generality of the transformation model (LM results shown in a table on the original slide)

Experiments (cont.)
(results shown as a figure on the original slide)

Conclusions
- Proposed a novel statistical transformation model approach for speaking-style transformation

Non-stationary n-gram model

Concept
- Probability of a sentence (chain rule)
- In practice, an n-gram LM is used
- Long-distance and word-position information is lost when the Markov assumption is applied
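A short restatement of the standard formulas behind these bullets, plus an assumed form for the position-dependent (non-stationary) n-gram; the exact conditioning used in the paper may differ:

    P(W) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})
         \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})   (n-gram / Markov assumption)

    Non-stationary n-gram:  P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}, t_i)

where t_i is the position (or position bin) of word w_i in the sentence.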

Concept (cont.)

Training (cont.)
- ML estimation
- Smoothing
  - Use lower order
  - Use small bins
  - Transform with a smoothed normal n-gram
- Combination
  - Linear interpolation
  - Back-off

Smoothing with lower order (cont.)
- Additive smoothing
- Back-off smoothing
- Linear interpolation
(a sketch of these schemes for positional counts follows)
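A minimal sketch of the three smoothing schemes applied to a position-dependent n-gram estimate, falling back to the ordinary position-free probability; the discounting details and parameter values are illustrative assumptions:

    def additive(count_pos, ctx_count_pos, vocab_size, delta=0.5):
        """Additive (add-delta) smoothing of a positional n-gram estimate."""
        return (count_pos + delta) / (ctx_count_pos + delta * vocab_size)

    def backoff(count_pos, ctx_count_pos, p_positionless, alpha=0.4):
        """Back off to the position-free n-gram probability when the positional
        count is zero (proper discounting/normalization omitted for brevity)."""
        if count_pos > 0:
            return count_pos / ctx_count_pos
        return alpha * p_positionless

    def interpolate(count_pos, ctx_count_pos, p_positionless, lam=0.6):
        """Linearly interpolate the positional ML estimate with the position-free
        probability (lam tuned on held-out data)."""
        p_ml = count_pos / ctx_count_pos if ctx_count_pos > 0 else 0.0
        return lam * p_ml + (1.0 - lam) * p_positionless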

Smoothing with small bins (k=1) (cont.)
- Back-off smoothing
- Linear interpolation
- Hybrid smoothing
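A brief illustration of the idea of pooling word positions into a small number of bins before collecting positional counts; the proportional binning rule shown here is an assumption, since the slide does not define the bins:

    def position_bin(position, sentence_length, num_bins=1):
        """Map an absolute word position to a coarse bin index; with num_bins=1
        (k=1 on the slide) all positions share a single bin."""
        if num_bins <= 1:
            return 0
        return min(int(num_bins * position / max(sentence_length, 1)), num_bins - 1)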

Transformation with a smoothed n-gram
- Novel method
- If t-mean(w) decreases, the word is more important
- Var(w) is used to balance t-mean(w) for active words
  - Active word: a word that can appear at any position in a sentence
- Combined with back-off smoothing and linear interpolation
(a sketch of the positional statistics follows)
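A small sketch of how the per-word position statistics t-mean(w) and Var(w) could be computed from a corpus; how they are then folded into the transformation weight is not spelled out on the slide, so only the statistics are shown, and the length normalization is an assumption:

    from collections import defaultdict

    def position_statistics(sentences):
        """Compute t-mean(w) and Var(w) for each word, using positions
        normalized by sentence length."""
        positions = defaultdict(list)
        for sent in sentences:                      # sent: list of tokens
            for i, w in enumerate(sent):
                positions[w].append(i / max(len(sent) - 1, 1))
        stats = {}
        for w, pos in positions.items():
            mean = sum(pos) / len(pos)
            var = sum((p - mean) ** 2 for p in pos) / len(pos)
            stats[w] = (mean, var)                  # (t-mean(w), Var(w))
        return stats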

Experiments
- Observation: marginal positions vs. middle positions

Experiments (cont.)
- NS (non-stationary) bigram results (figure on the original slide)

Experiments (cont.)
- Comparison of the three smoothing techniques

Experiments (cont.)
- Error rate with different numbers of bins

Conclusions
The traditional n-gram model is enhanced by relaxing its stationarity assumption and exploiting word-position information in language modeling.

Two-way Poisson Mixture model

The essential Poisson distribution and the Poisson mixture model: each class k mixes R_k multivariate Poisson components (dimension p = lexicon size) with weights π_k1, ..., π_kR_k over a document's word-count vector x = (x_1, ..., x_p). Word clustering reduces the Poisson dimension, which yields the two-way mixture. (diagram on the original slide)
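A hedged sketch of the class-conditional density the diagram appears to describe, with word clustering tying Poisson parameters within word clusters; the notation is assumed and may differ from the paper's:

    P(x | class k) = \sum_{r=1}^{R_k} \pi_{kr} \prod_{j=1}^{p} \mathrm{Poisson}\!\left(x_j ; \lambda_{kr, c(j)}\right)

where x_j is the count of word j in the document and c(j) maps word j to its word cluster, so the model clusters documents (via classes and mixture components) and words (via shared Poisson rates) simultaneously.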