A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation Frank Wood and Yee Whye Teh AISTATS 2009 Presented by: Mingyuan Zhou Duke University, ECE December 18, 2009
Outline: Background; Pitman-Yor Process; Hierarchical Pitman-Yor Process Language Models; Doubly Hierarchical Pitman-Yor Process Language Model; Inference; Experimental results; Summary
Background: Language modeling and n-gram models. “A language model is usually formulated as a probability distribution p(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence.” n-gram (n=2: bigram, n=3: trigram). Smoothing. Reference: S.F. Chen and J.T. Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR , Computer Science Group, Harvard University.
Example Smoothing
Evaluation: Train the n-gram model, then calculate the cross-entropy and the perplexity on test data.
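The evaluation quantities named on this slide can be sketched in a few lines. The slide's own formulas did not survive, so this assumes the usual definitions: cross-entropy H = -(1/W) log2 p(T) over a test set T of W tokens, and perplexity 2^H. The bigram model with additive smoothing, the `<s>` and `</s>` padding tokens, the `delta` parameter, and all function names are illustrative assumptions, not the presenter's setup. The sketch also assumes every test word appears in the training vocabulary.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens[1:])  # every token that can be predicted
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab

def prob(word, prev, model, delta=1.0):
    """Additively smoothed bigram probability p(word | prev); delta is an assumed choice."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + delta) / (context_counts[prev] + delta * len(vocab))

def cross_entropy(test, model, delta=1.0):
    """Average negative log2 probability per predicted token (</s> counted as a token)."""
    log_prob, n_tokens = 0.0, 0
    for sent in test:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log2(prob(word, prev, model, delta))
            n_tokens += 1
    return -log_prob / n_tokens

def perplexity(test, model, delta=1.0):
    """Perplexity is 2 raised to the cross-entropy."""
    return 2.0 ** cross_entropy(test, model, delta)

model = train_bigram([["a", "b"], ["a", "c"]])
print(perplexity([["a", "b"]], model))
```

Lower perplexity means the model assigns higher probability to the test data, which is why it serves as the comparison metric between language models.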
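Dirichlet Process and Pitman-Yor Process. Dirichlet process: the number of unique words grows as O(log n) in the number of words n. Pitman-Yor process: the number of unique words grows as O(n^d), a power law. When d=0, the Pitman-Yor process reduces to the DP. Both can be understood through the Chinese restaurant process, which specifies, for the DP and for the Pitman-Yor process, the probability of sitting at existing table k and the probability of sitting at a new table.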
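The Chinese restaurant view can be simulated directly. In the standard construction, with n customers already seated at t tables and c_k customers at table k, a new customer joins table k with probability (c_k - d)/(θ + n) and opens a new table with probability (θ + d·t)/(θ + n). Setting d=0 gives the DP case. The sketch below assumes that construction. The symbol θ for the concentration parameter and the function name `sample_crp` are illustrative choices, not taken from the slides.

```python
import random

def sample_crp(n_customers, theta, d=0.0, seed=0):
    """Seat customers one at a time; return the list of table sizes.

    theta > 0 is the concentration, 0 <= d < 1 the discount (d=0 is the DP case).
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for _ in range(n_customers):
        t = len(tables)
        # Unnormalised weights: (c_k - d) per existing table, (theta + d*t) for a new one.
        # They sum to theta + n, so random.choices normalises them correctly.
        weights = [c - d for c in tables] + [theta + d * t]
        k = rng.choices(range(t + 1), weights=weights)[0]
        if k == t:
            tables.append(1)   # open a new table
        else:
            tables[k] += 1     # join existing table k
    return tables

dp_tables = sample_crp(2000, theta=1.0, d=0.0)
py_tables = sample_crp(2000, theta=1.0, d=0.8)
print(len(dp_tables), len(py_tables))
```

Reading tables as unique words and customers as word tokens, the table count with d=0.8 should be far larger than with d=0, in line with the growth rates stated on the slide.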
Power-law properties of the Pitman-Yor Process [Figure, two panels: number of unique words vs. number of words; proportion of words appearing once vs. number of words]
Hierarchical Pitman-Yor Process Language Models
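The slide body did not survive extraction. As a reminder of what this model usually looks like, the block below writes out the standard formulation of the hierarchical Pitman-Yor language model, not a recovery of the slide's own equations. All symbols are assumptions: u is a context of preceding words, π(u) is u with its earliest word dropped, d and θ are the discount and concentration shared by contexts of the same length, c counts customers, and t counts tables in the Chinese restaurant franchise.

```latex
% Prior: each context's word distribution is drawn around its back-off context's distribution.
\begin{align*}
G_{\emptyset} &\sim \mathrm{PY}(d_0, \theta_0, G_0), \qquad G_0 = \text{uniform over the vocabulary} \\
G_{u} &\sim \mathrm{PY}\bigl(d_{|u|}, \theta_{|u|}, G_{\pi(u)}\bigr) \\
w \mid u &\sim G_{u}
\end{align*}

% Predictive probability given a seating arrangement
% (c_{uw}: customers for word w in context u, t_{uw}: tables serving w in context u).
\begin{equation*}
P(w \mid u) = \frac{c_{uw} - d_{|u|}\, t_{uw}}{\theta_{|u|} + c_{u\cdot}}
  + \frac{\theta_{|u|} + d_{|u|}\, t_{u\cdot}}{\theta_{|u|} + c_{u\cdot}}\, P\bigl(w \mid \pi(u)\bigr)
\end{equation*}
```

The second term is the back-off weight. It grows with the number of tables in the context, which is how the model interpolates with shorter contexts.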
Doubly Hierarchical Pitman-Yor Process Language Model
Inference: Dirichlet Process: Chinese Restaurant Process; Hierarchical Dirichlet Process: Chinese Restaurant Franchise; Pitman-Yor Process: Chinese Restaurant Process; Hierarchical Pitman-Yor Process: Chinese Restaurant Franchise; Doubly Hierarchical Pitman-Yor Language Model: Graphical Pitman-Yor Process, Multi-floor Chinese Restaurant Process, Multi-floor Chinese Restaurant Franchise
Experimental results (HPYLM)
Experimental results (DHPYLM)
Summary: DHPYLM achieves encouraging domain adaptation results. A graphical Pitman-Yor process is constructed, and a multi-floor Chinese restaurant representation is proposed for sampling. DHPYLM may be integrated into topic models to eliminate “bag-of-words” assumptions.