1 Bayesian Learning for Latent Semantic Analysis Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu Presenter: Hsuan-Sheng Chiu
Speech Lab. NTNU 2 Reference
Chia-Sheng Wu, "Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval", 2005
Q. Huo and C.-H. Lee, "On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate", 1997
Speech Lab. NTNU 3 Outline
Introduction
PLSA: ML (Maximum Likelihood), MAP (Maximum A Posteriori), QB (Quasi-Bayes)
Experiments
Conclusions
Speech Lab. NTNU 4 Introduction
LSA vs. PLSA
Linear algebra and probability
Semantic space and latent topics
Batch learning vs. incremental learning
Speech Lab. NTNU 5 PLSA
PLSA is a general machine learning technique that adopts the aspect model to represent co-occurrence data.
Topics (hidden variables)
Corpus (document-word pairs)
Speech Lab. NTNU 6 PLSA
Assume that d_i and w_j are conditionally independent given the associated latent topic z_k.
Joint probability:
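The slide's equation did not survive extraction. For reference, the standard PLSA aspect-model joint probability (the form the paper builds on; notation may differ slightly from the slides) is:

\[ P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k)\, P(z_k \mid d_i) \]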
Speech Lab. NTNU 7 ML PLSA Log likelihood of Y: ML estimation:
Speech Lab. NTNU 8 ML PLSA Maximization:
Speech Lab. NTNU 9 ML PLSA
Complete data:
Incomplete data:
EM (Expectation-Maximization) algorithm: E-step, M-step
Speech Lab. NTNU 10 ML PLSA E-Step
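As a reference for the missing formula, the E-step of standard ML PLSA computes the posterior over latent topics (a sketch of the usual form, not copied from the paper):

\[ P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k)\, P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l)\, P(z_l \mid d_i)} \]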
Speech Lab. NTNU 11 ML PLSA Auxiliary function: And
Speech Lab. NTNU 12 ML PLSA M-step: Lagrange multiplier
Speech Lab. NTNU 13 ML PLSA Differentiation New parameter estimation:
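A minimal sketch of one EM iteration for ML PLSA, assuming a dense document-word count matrix; the array names and dense representation are illustrative and not taken from the paper:

import numpy as np

def plsa_em_step(n, p_w_given_z, p_z_given_d):
    # n           : (D, W) co-occurrence counts n(d_i, w_j)
    # p_w_given_z : (K, W) current estimate of P(w_j | z_k)
    # p_z_given_d : (D, K) current estimate of P(z_k | d_i)
    # E-step: posterior P(z_k | d_i, w_j) for every (d, w) pair
    joint = p_z_given_d[:, None, :] * p_w_given_z.T[None, :, :]      # (D, W, K)
    posterior = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
    # M-step: re-estimate parameters from expected counts
    expected = n[:, :, None] * posterior                              # (D, W, K)
    p_w_given_z = expected.sum(axis=0).T                              # (K, W)
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
    p_z_given_d = expected.sum(axis=1)                                # (D, K)
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_given_z, p_z_given_d

Iterating this step monotonically increases the incomplete-data log-likelihood, which is the ML training loop the preceding slides describe.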
Speech Lab. NTNU 14 MAP PLSA
Estimation by maximizing the a posteriori probability:
Definition of the prior distribution:
Dirichlet density:
Prior density: (Kronecker delta)
Assume the two parameter sets are independent
Speech Lab. NTNU 15 MAP PLSA Consider prior density: Maximum a Posteriori:
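For reference, the MAP objective augments the likelihood with the prior, and a Dirichlet prior over the multinomial parameters takes the standard form below (a sketch with illustrative hyperparameter names, not the paper's exact notation):

\[ \hat{\Theta}_{\mathrm{MAP}} = \arg\max_{\Theta} \; P(Y \mid \Theta)\, g(\Theta), \qquad g\big(\{P(w_j \mid z_k)\}\big) \propto \prod_{k=1}^{K} \prod_{j=1}^{M} P(w_j \mid z_k)^{\alpha_{jk} - 1} \]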
Speech Lab. NTNU 16 MAP PLSA
E-step: expectation
Auxiliary function:
Speech Lab. NTNU 17 MAP PLSA M-step Lagrange multiplier
Speech Lab. NTNU 18 MAP PLSA Auxiliary function:
Speech Lab. NTNU 19 MAP PLSA Differentiation New parameter estimation:
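A sketch of the resulting MAP re-estimation formula under a Dirichlet prior (the standard form, shown for P(w_j | z_k); the paper's hyperparameter notation may differ):

\[ \hat{P}(w_j \mid z_k) = \frac{\sum_{i} n(d_i, w_j)\, P(z_k \mid d_i, w_j) + (\alpha_{jk} - 1)}{\sum_{j'} \Big[ \sum_{i} n(d_i, w_{j'})\, P(z_k \mid d_i, w_{j'}) + (\alpha_{j'k} - 1) \Big]} \]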
Speech Lab. NTNU 20 QB PLSA
The model needs to be updated continuously for an online information system.
Estimation by maximizing the a posteriori probability:
The posterior density is approximated by the closest tractable prior density with hyperparameters.
Compared with MAP PLSA, the key difference in QB PLSA is the updating of the hyperparameters.
Speech Lab. NTNU 21 QB PLSA
Conjugate prior: in Bayesian probability theory, a conjugate prior is a prior distribution such that the posterior distribution belongs to the same family.
A closed-form solution
A reproducible prior/posterior pair for incremental learning
Speech Lab. NTNU 22 QB PLSA Hyperparameter α:
Speech Lab. NTNU 23 QB PLSA
After careful rearrangement, the exponential of the posterior expectation function can be expressed as:
A reproducible prior/posterior pair is generated to build the updating mechanism for the hyperparameters.
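A sketch of the conjugate (Dirichlet) hyperparameter update that makes the prior/posterior pair reproducible: the new hyperparameters accumulate the expected counts from the current data block (the paper's exact update may include forgetting or scaling factors not shown here):

\[ \alpha'_{jk} = \alpha_{jk} + \sum_{i} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \]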
Speech Lab. NTNU 24 Initial Hyperparameters
An open issue in Bayesian learning
If the initial prior knowledge is too strong, or after a lot of adaptation data have been incrementally processed, new adaptation data usually have only a small impact on parameter updating in incremental training.
Speech Lab. NTNU 25 Experiments
MED corpus: 1033 medical abstracts with 30 queries; 7014 unique terms; 433 abstracts for ML training; 600 abstracts for MAP or QB training; query subset for testing; K = 8
Reuters-21578: 4270 documents for training; 2925 for QB learning; 2790 documents for testing; 13353 unique words; 10 categories
Speech Lab. NTNU 26 Experiments
Speech Lab. NTNU 27 Experiments
Speech Lab. NTNU 28 Experiments
Speech Lab. NTNU 29 Conclusions
This paper presented an adaptive text modeling and classification approach for PLSA-based information systems.
Future work: extension of PLSA to bigram or trigram models will be explored; application to spoken document classification and retrieval.
30 Discriminative Maximum Entropy Language Model for Speech Recognition Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung Chien Presenter: Hsuan-Sheng Chiu
Speech Lab. NTNU 31 Reference
R. Rosenfeld, S. F. Chen and X. Zhu, "Whole-sentence exponential language models: a vehicle for linguistic-statistical integration", 2001
W.H. Tsai, "An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition", 2005
Speech Lab. NTNU 32 Outline
Introduction
Whole-sentence exponential model
Discriminative ME language model
Experiment
Conclusions
Speech Lab. NTNU 33 Introduction
Language models: statistical n-gram model, latent semantic language model, structured language model
Based on the maximum entropy principle, we can integrate different features to establish an optimal probability distribution.
Speech Lab. NTNU 34 Whole-Sentence Exponential Model Traditional method: Exponential form: Usage: When used for speech recognition, the model is not suitable for the first pass of the recognizer, and should be used to re-score N-best lists.
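For reference, the whole-sentence exponential model of Rosenfeld et al. combines a baseline model with sentence-level features (the standard form; symbol names may differ from the slides):

\[ P_{\Lambda}(W) = \frac{1}{Z_{\Lambda}}\, p_0(W)\, \exp\!\Big( \sum_{i} \lambda_i f_i(W) \Big) \]

where p_0(W) is a baseline (e.g., n-gram) language model, f_i(W) are whole-sentence feature functions, and Z_Λ is the normalization constant.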
Speech Lab. NTNU 35 Whole-Sentence ME Language Model
Expectation of feature function:
Empirical:
Actual:
Constraint:
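The ME constraints equate the empirical and model expectations of each feature (a sketch of the standard constraint, not copied from the paper):

\[ E_{\tilde{P}}[f_i] = \sum_{W} \tilde{P}(W)\, f_i(W) = \sum_{W} P_{\Lambda}(W)\, f_i(W) = E_{P_{\Lambda}}[f_i] \]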
Speech Lab. NTNU 36 Whole-Sentence ME Language Model
To solve the constrained optimization problem:
Speech Lab. NTNU 37 GIS algorithm
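A minimal sketch of Generalized Iterative Scaling over a small, enumerable event space; for whole-sentence models the model expectation must in practice be approximated by sampling, so this toy version only illustrates the update rule (names and setup are illustrative, not the paper's):

import numpy as np

def gis(features, empirical_probs, n_iters=200):
    # features        : (N, F) non-negative feature values f_i(x) for N events
    # empirical_probs : (N,) empirical distribution over the events
    N, F = features.shape
    C = features.sum(axis=1).max()                       # GIS constant
    # slack feature so every event's features sum exactly to C (GIS requirement)
    slack = (C - features.sum(axis=1))[:, None]
    feats = np.hstack([features, slack])                 # (N, F+1)
    lam = np.zeros(F + 1)
    emp_expect = empirical_probs @ feats                 # empirical expectations
    for _ in range(n_iters):
        logits = feats @ lam                             # model: p(x) proportional to exp(lam . f(x))
        p = np.exp(logits - logits.max())
        p /= p.sum()
        model_expect = p @ feats
        # multiplicative GIS update on the log-weights
        lam += np.log((emp_expect + 1e-12) / (model_expect + 1e-12)) / C
    return lam[:F]                                       # weights of the original features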
Speech Lab. NTNU 38 Discriminative ME Language Model
In general, ME can be considered a maximum likelihood model with a log-linear distribution.
Propose a discriminative language model based on the whole-sentence ME model (DME)
Speech Lab. NTNU 39 Discriminative ME Language Model
Acoustic features for ME estimation: sentence-level log-likelihood ratio of competing and target sentences
Feature weight parameter: namely, we set the feature parameter to one for those speech signals observed in the training database
Speech Lab. NTNU 40 Discriminative ME Language Model New estimation: Upgrade to discriminative linguistic parameters
Speech Lab. NTNU 41 Discriminative ME Language Model
Speech Lab. NTNU 42 Experiment
Corpus: TCC300; 32 mixtures; 12 Mel-frequency cepstral coefficients, 1 log-energy, and their first derivatives; 4200 sentences for training, 450 for testing
Corpus: Academia Sinica CKIP balanced corpus; five million words; vocabulary of 32909 words
Speech Lab. NTNU 43 Experiment
Speech Lab. NTNU 44 Conclusions
A new ME language model integrating linguistic and acoustic features for speech recognition
The derived ME language model was inherently discriminative.
The DME model involved a constrained optimization procedure and was powerful for knowledge integration.
Speech Lab. NTNU 45 Relation between DME and MMI MMI criterion: Modified MMI criterion: Express ME model as ML model:
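For reference, the standard MMI criterion maximizes the posterior probability of the correct transcriptions over the training set (shown in its usual form; the slide's modified criterion and the ME-as-ML rewriting are not reconstructed here):

\[ F_{\mathrm{MMI}}(\Lambda) = \sum_{r} \log \frac{p(X_r \mid W_r)\, P_{\Lambda}(W_r)}{\sum_{W} p(X_r \mid W)\, P_{\Lambda}(W)} \]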
Speech Lab. NTNU 46 Relation between DME and MMI The optimal parameter:
Speech Lab. NTNU 47 Relation between DME and MMI
Speech Lab. NTNU 48 Relation between DME and MMI