A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation
Frank Wood and Yee Whye Teh, AISTATS 2009
Presented by: Mingyuan Zhou, Duke University, ECE, December 18, 2009

Outline
- Background: language modeling and n-gram models
- Pitman-Yor Process
- Hierarchical Pitman-Yor Process Language Models
- Doubly Hierarchical Pitman-Yor Process Language Model
- Inference
- Experimental results
- Summary

Background: Language modeling and n-gram models
"A language model is usually formulated as a probability distribution p(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence."
An n-gram model conditions each word on only the preceding n-1 words (n=2: bigram, n=3: trigram): p(s) ≈ ∏_i p(w_i | w_{i-n+1}, ..., w_{i-1}).
Smoothing redistributes probability mass from observed n-grams to unseen ones, so that word sequences absent from the training corpus do not receive zero probability.
Reference: S. F. Chen and J. T. Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University, 1998.
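A minimal sketch (not from the slides) of what a bigram model with additive smoothing looks like in practice; the helper names and toy corpus are made up for illustration:

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams from tokenized sentences (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size, alpha=1.0):
    """Add-alpha smoothed p(w | w_prev): unseen bigrams keep nonzero mass."""
    return (bigrams[(w_prev, w)] + alpha) / (unigrams[w_prev] + alpha * vocab_size)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
V = len(set(w for s in corpus for w in s) | {"</s>"})  # predicted event space
print(bigram_prob("the", "cat", uni, bi, V))  # seen bigram
print(bigram_prob("cat", "dog", uni, bi, V))  # unseen bigram, still > 0
```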

Example: Smoothing
Reference: S. F. Chen and J. T. Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University, 1998.
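The worked example on this slide did not survive the transcript. As a stand-in, this is the interpolated Kneser-Ney estimator from Chen and Goodman, the smoothing scheme that the hierarchical Pitman-Yor language model recovers as a special case (notation mine; u' denotes the context u with its earliest word dropped):

```latex
% c(uw): count of the n-gram uw; d: a fixed discount;
% N_{1+}(u\,\cdot): number of distinct word types observed after context u.
P_{\mathrm{KN}}(w \mid u) = \frac{\max(c(uw) - d,\ 0)}{c(u)}
  + \frac{d\, N_{1+}(u\,\cdot)}{c(u)}\; P_{\mathrm{KN}}(w \mid u')
```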

Evaluation
Train the n-gram model on a training corpus, then calculate how well it predicts a held-out test set T containing W_T words:
Cross-entropy: H(T) = -(1/W_T) log2 p(T)
Perplexity: PPL(T) = 2^{H(T)} (lower is better)
Reference: S. F. Chen and J. T. Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University, 1998.
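A small sketch (mine, not from the slides) of how these two quantities are computed for any conditional bigram model; prob_fn is a hypothetical stand-in for a smoothed model such as the one above:

```python
import math

def perplexity(test_sentences, prob_fn):
    """Cross-entropy H = -(1/W) * sum_i log2 p(w_i | context); perplexity = 2^H."""
    log2_sum, n_words = 0.0, 0
    for words in test_sentences:
        padded = ["<s>"] + words + ["</s>"]
        for prev, w in zip(padded, padded[1:]):
            log2_sum += math.log2(prob_fn(prev, w))
            n_words += 1
    return 2 ** (-log2_sum / n_words)

# Toy check: a uniform model over 10 outcomes has perplexity 10.
print(perplexity([["the", "cat", "sat"]], lambda prev, w: 0.1))
```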

Dirichlet Process and Pitman-Yor Process
- Dirichlet process DP(θ, G0): the number of unique words grows as O(θ log n).
- Pitman-Yor process PY(d, θ, G0), with discount 0 ≤ d < 1: the number of unique words grows as O(θ n^d).
- When d = 0, the Pitman-Yor process reduces to the DP.
- Both can be understood through the Chinese restaurant process. With n customers seated, c_k customers at table k, and t occupied tables:

  Sitting at table k:      DP: c_k / (θ + n)    Pitman-Yor: (c_k − d) / (θ + n)
  Sitting at a new table:  DP: θ / (θ + n)      Pitman-Yor: (θ + d·t) / (θ + n)
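A quick simulation sketch (not the authors' code) of the two-parameter Chinese restaurant process; setting the discount d = 0 recovers the ordinary DP version, and the resulting table counts illustrate the two growth rates:

```python
import random

def pitman_yor_crp(n, d, theta, seed=0):
    """Seat n customers under a Pitman-Yor CRP with discount d and
    concentration theta; returns the list of table occupancy counts."""
    rng = random.Random(seed)
    tables = []  # tables[k] = customers at table k
    for i in range(n):
        # Existing table k has weight c_k - d; a new table has theta + d*t.
        weights = [c - d for c in tables] + [theta + d * len(tables)]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)   # open a new table
        else:
            tables[k] += 1     # join existing table k
    return tables

print(len(pitman_yor_crp(10000, d=0.0, theta=10)))  # ~ theta * log(n) tables
print(len(pitman_yor_crp(10000, d=0.5, theta=10)))  # ~ n^d: far more tables
```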

Power-law properties of the Pitman-Yor Process
[Plots: number of unique words vs. number of words, and proportion of words appearing exactly once vs. number of words.]

Hierarchical Pitman-Yor Process Language Models
In the HPYLM, the distribution over words following a context u is drawn from a Pitman-Yor process whose base distribution is the distribution for the shortened context π(u) obtained by dropping the earliest word: G_u ~ PY(d_|u|, θ_|u|, G_π(u)), recursing down to the empty context. Long contexts thereby back off smoothly to shorter ones, and the model recovers interpolated Kneser-Ney as a special case.
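The slide's equations are not preserved in the transcript. For reference, this is the HPYLM predictive probability under the Chinese restaurant franchise representation (following Teh's 2006 construction; the notation below is mine):

```latex
% c_{uw}: customers (token counts) for word w in restaurant (context) u;
% t_{uw}: tables in u serving w; a dot denotes summation over that index.
P(w \mid u) = \frac{c_{uw} - d_{|u|}\, t_{uw}}{\theta_{|u|} + c_{u\cdot}}
  + \frac{\theta_{|u|} + d_{|u|}\, t_{u\cdot}}{\theta_{|u|} + c_{u\cdot}}\; P(w \mid \pi(u))
```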

Doubly Hierarchical Pitman-Yor Process Language Model
In outline: the DHPYLM couples per-domain HPYLMs through a shared latent general-domain HPYLM, so a domain-specific context distribution can borrow statistical strength both from shorter contexts within its own domain and from the same context in the general domain.

Inference: each model has a restaurant representation used for sampling.
- Dirichlet process → Chinese restaurant process
- Hierarchical Dirichlet process → Chinese restaurant franchise
- Pitman-Yor process → Chinese restaurant process
- Hierarchical Pitman-Yor process → Chinese restaurant franchise
- Doubly hierarchical Pitman-Yor language model (graphical Pitman-Yor process) → multi-floor Chinese restaurant process / multi-floor Chinese restaurant franchise
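To make the franchise recursion concrete, here is a minimal sketch (not the authors' code) of the HPYLM predictive probability from the earlier formula, evaluated for fixed seating statistics; c and t hold toy per-context customer and table counts, and a single discount and concentration are shared across levels for simplicity (the model lets them depend on context length):

```python
def hpylm_predict(w, u, c, t, d, theta, vocab_size):
    """Chinese-restaurant-franchise predictive P(w | u) for context tuple u.
    c[u][w]: customers for word w in restaurant u; t[u][w]: tables serving w.
    Restaurant u backs off to u[1:] (earliest word dropped); the empty
    context backs off to a uniform base distribution over the vocabulary."""
    parent = (1.0 / vocab_size) if len(u) == 0 else \
        hpylm_predict(w, u[1:], c, t, d, theta, vocab_size)
    cu = sum(c.get(u, {}).values())   # total customers in restaurant u
    if cu == 0:
        return parent                 # unseen context: defer to parent
    tu = sum(t.get(u, {}).values())   # total tables in restaurant u
    cw = c.get(u, {}).get(w, 0)
    tw = t.get(u, {}).get(w, 0)
    return (cw - d * tw) / (theta + cu) + (theta + d * tu) / (theta + cu) * parent

# Toy seating statistics for contexts () and ("the",):
c = {(): {"cat": 2}, ("the",): {"cat": 1}}
t = {(): {"cat": 1}, ("the",): {"cat": 1}}
print(hpylm_predict("cat", ("the",), c, t, d=0.5, theta=1.0, vocab_size=10))
```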

Experimental results (HPYLM)

Experimental results (DHPYLM)

Summary
- The DHPYLM achieves encouraging domain adaptation results.
- A graphical Pitman-Yor process is constructed, and a multi-floor Chinese restaurant representation is proposed for sampling.
- The DHPYLM may be integrated into topic models to move beyond the "bag-of-words" assumption.