
1 A Multi-span Language Modeling Framework for Speech Recognition Jimmy Wang, Speech Lab, NTU

2 Outline
1. Introduction
2. N-gram Language Modeling
3. Smoothing and Clustering of N-gram Language Models
4. LSA Modeling
5. Hybrid LSA + N-gram Language Model
6. Conclusion

3 INTRODUCTION Homophones make Mandarin ASR ambiguous. The syllable string ㄌㄧㄡˊ ㄅㄤ ㄧㄡˇ ㄒㄩㄝˇ ㄢˋ ㄓㄨㄚ ㄉㄠˋ ㄧˊ ㄉㄨㄟˋ ㄒㄧㄤˋ can be decoded as 劉邦友血案抓到一對象 ("a suspect was caught in the Liu Pang-yu murder case") or as 劉邦友血案抓到一隊象 ("a herd of elephants was caught in the Liu Pang-yu murder case"). Similarly, ㄕㄨㄟ ㄐㄧㄠ ㄧ ㄨㄢ ㄉㄨㄛ ㄕㄠ ㄑㄧㄢ can be decoded as 水餃一碗多少錢 ("how much for a bowl of dumplings?") or 睡覺一晚多少錢 ("how much for one night's sleep?").

4 INTRODUCTION Stochastic modeling of speech recognition: given the acoustic evidence $A$, find the word sequence $\hat{W} = \arg\max_W P(W \mid A) = \arg\max_W P(A \mid W)\,P(W)$, where $P(A \mid W)$ is the acoustic model and $P(W)$ is the language model.

5 INTRODUCTION N-gram language modeling has been the formalism of choice for ASR because its estimates are reliable, but it can only impose local constraints. For global constraints, parsing and rule-based grammars have been successful only in small-vocabulary applications.

6 INTRODUCTION N-gram + LSA (Latent Semantic Analysis) language models integrate local constraints via the N-gram component and global constraints via the LSA component.

7 N-gram Language Model Assume each word depends only on the previous N-1 words (N words in total, including the predicted word), so an N-gram is an order-(N-1) Markov model. For a trigram: P(象 | … 抓到 一隊) ≈ P(象 | 抓到, 一隊). Perplexity measures how well the model predicts held-out text: $PP = P(w_1 w_2 \cdots w_T)^{-1/T}$, the inverse geometric mean of the per-word probability.
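As a concrete illustration, here is a minimal perplexity sketch, assuming we already have the model's per-word probabilities for a test sequence (the function name and interface are hypothetical):

```python
import math

def perplexity(word_probs):
    """Perplexity of a test sequence given per-word model probabilities
    P(w_i | history): the inverse geometric mean of those probabilities."""
    n = len(word_probs)
    # Sum log-probabilities instead of multiplying to avoid underflow.
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / n)

# A model that assigns 1/8 to every word has perplexity 8.
print(perplexity([0.125] * 20))  # -> 8.0
```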

8 N-gram Language Model N-gram training from a text corpus: corpus sizes range from hundreds of megabytes to several gigabytes. Maximum Likelihood approach: P("the" | "nothing but") = C("nothing but the") / C("nothing but").
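A minimal sketch of maximum-likelihood trigram estimation from raw counts, assuming a toy corpus of whitespace-tokenized sentences (all names are illustrative):

```python
from collections import Counter

def train_trigram_ml(corpus):
    """Maximum-likelihood trigram estimates from raw counts:
    P(z | x, y) = C(x y z) / C(x y)."""
    tri, bi = Counter(), Counter()
    for sentence in corpus:
        w = sentence.split()
        for i in range(len(w) - 2):
            bi[(w[i], w[i + 1])] += 1             # history count C(x y)
            tri[(w[i], w[i + 1], w[i + 2])] += 1  # full count C(x y z)
    def prob(x, y, z):
        return tri[(x, y, z)] / bi[(x, y)] if bi[(x, y)] else 0.0
    return prob

p = train_trigram_ml(["nothing but the truth", "nothing but the best"])
print(p("nothing", "but", "the"))  # C("nothing but the")/C("nothing but") = 2/2 = 1.0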

9 Smoothing and Clustering Maximum-likelihood estimates are terrible on test data: if C(xyz) = 0, the predicted probability is 0. Interpolation smoothing mixes in lower-order estimates, e.g. $P(z \mid xy) = \lambda P_{ML}(z \mid xy) + (1-\lambda)P(z \mid y)$; find $0 < \lambda < 1$ by optimizing on held-out data.
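A sketch of linear interpolation with a held-out grid search for λ, assuming `p_tri` and `p_bi` are already-trained ML estimators (hypothetical callables) and that the backoff probability is nonzero on the held-out trigrams:

```python
import math

def interp(lam, p_tri_val, p_bi_val):
    """Linear interpolation: back off toward the bigram estimate so
    unseen trigrams never get probability exactly zero."""
    return lam * p_tri_val + (1.0 - lam) * p_bi_val

def pick_lambda(heldout, p_tri, p_bi):
    """Grid-search lambda in (0, 1) to maximize held-out log-likelihood.
    heldout: iterable of (x, y, z) trigrams from data not used in training."""
    best_lam, best_ll = None, float("-inf")
    for step in range(1, 100):
        lam = step / 100.0
        ll = sum(math.log(interp(lam, p_tri(x, y, z), p_bi(y, z)))
                 for (x, y, z) in heldout)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam
```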

10 Smoothing and Clustering Clustering = grouping words into classes of similar things. Instead of estimating P(Tuesday | party on) and P(Tuesday | celebration on) separately, estimate P(WEEKDAY | EVENT), with WEEKDAY = {Sunday, Monday, Tuesday, …} and EVENT = {party, celebration, birthday, …}. Clustering can give good results when there is very little training data; see the sketch below.
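The slide's idea can be written as the standard class-based decomposition P(w | h) ≈ P(class(w) | class(h)) · P(w | class(w)); a sketch, with all lookup tables assumed to be trained elsewhere:

```python
def class_based_prob(word, history_word, word2class, p_class, p_word_in_class):
    """Class-based estimate: P(w | h) ~= P(class(w) | class(h)) * P(w | class(w)).
    Sparse word-level counts are pooled into denser class-level counts."""
    cw, ch = word2class[word], word2class[history_word]
    return p_class(cw, ch) * p_word_in_class(word, cw)

# Toy usage: both "party" and "celebration" map to EVENT, so
# P(Tuesday | party) and P(Tuesday | celebration) share one estimate.
word2class = {"Tuesday": "WEEKDAY", "party": "EVENT", "celebration": "EVENT"}
p = class_based_prob("Tuesday", "party", word2class,
                     p_class=lambda cw, ch: 0.3,        # stand-in trained table
                     p_word_in_class=lambda w, c: 1 / 7)  # uniform over weekdays
print(p)  # 0.3 * 1/7
```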

11 Smoothing and Clustering Word clustering methods:
1. Build the clusters by hand.
2. Use Part-of-Speech (POS) tags.
3. Automatic clustering: swap words between clusters to minimize perplexity.
Automatic clustering comes in two flavors:
1. Top-down splitting (decision tree): fast.
2. Bottom-up merging: accurate.

12 LSA MODELING Word co-occurrence matrix W: V = vocabulary of size M (M = 40,000~80,000); T = training corpus of N documents (N = 80,000~100,000); $c_{i,j}$ = number of occurrences of word $w_i$ in document $d_j$; $n_j$ = total number of words in $d_j$; $E_i$ = normalized entropy of $w_i$ across the corpus T. Each cell is the entropy-weighted relative frequency $[W]_{i,j} = (1 - E_i)\,c_{i,j}/n_j$.
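A sketch of this matrix construction in NumPy, following the Bellegarda-style weighting $(1 - E_i)\,c_{i,j}/n_j$ given above (the exact normalization on the original slide may differ):

```python
import numpy as np

def build_lsa_matrix(counts):
    """counts[i, j] = c_ij, occurrences of word i in document j (M x N).
    Returns the entropy-weighted matrix with entries (1 - E_i) * c_ij / n_j."""
    counts = np.asarray(counts, dtype=float)
    M, N = counts.shape
    n_j = counts.sum(axis=0)                    # n_j: total words in document j
    t_i = counts.sum(axis=1, keepdims=True)     # total count of word i in corpus
    p = counts / np.maximum(t_i, 1.0)           # c_ij / t_i
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    E = -plogp.sum(axis=1) / np.log(N)          # normalized entropy E_i in [0, 1]
    return (1.0 - E)[:, None] * counts / np.maximum(n_j, 1.0)[None, :]
```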

13 LSA MODELING Vector representation via SVD (Singular Value Decomposition) of W: $W \approx U S V^T$, where U is an M×R matrix whose rows $u_i$ represent words, S is an R×R diagonal matrix of singular values, and V is an N×R matrix whose rows $v_j$ represent documents. Experiments with different values showed that R = 100~300 strikes an adequate balance.
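A minimal truncated-SVD sketch in NumPy; `R` is the retained rank, and `full_matrices=False` keeps the factors at the economical sizes listed above:

```python
import numpy as np

def truncated_svd(W, R=200):
    """Rank-R SVD of the weighted word-document matrix: W ~= U_R S_R V_R^T.
    Rows of U_R (scaled by S_R) embed words, rows of V_R embed documents,
    in the same R-dimensional latent semantic space."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :R], np.diag(s[:R]), Vt[:R, :].T   # U: MxR, S: RxR, V: NxR
```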

14 LSA MODELING Language modeling: $H_{q-1}$ = overall history of the current document. Word-clustered LSA model: because this clustering operates on the global context, the classes capture more semantic information.

15 LSA+N-gram Language Model Integration with N-grams via maximum entropy estimation: $H_{q-1}$ = overall history covering both the n-gram component and the LSA component; see the sketch below.
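The slide's actual formulation is a maximum-entropy estimation; as an illustrative stand-in only, here is a simpler product-and-renormalize fusion of the two components, with `p_ngram` and `p_lsa` assumed to be trained models (hypothetical callables):

```python
def integrated_prob(word, ngram_hist, doc_hist, p_ngram, p_lsa, vocab):
    """Fuse local (n-gram) and global (LSA) evidence by multiplying the
    two probabilities and renormalizing over the vocabulary. This is a
    sketch, not the maximum-entropy combination named on the slide."""
    score = lambda w: p_ngram(w, ngram_hist) * p_lsa(w, doc_hist)
    z = sum(score(w) for w in vocab)   # normalizer over all candidate words
    return score(word) / z
```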

16 LSA+N-gram Language Model Context scope selection: in practice the prior probability changes over time, so we must define the current document history or limit the amount of history considered. Exponential forgetting discounts older words with a decay factor $0 < \lambda \le 1$, as sketched below.
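A sketch of an exponentially discounted pseudo-document update, assuming `v_prev` is the running pseudo-document vector and `u_word` is the LSA vector of the newly observed word (the exact update on the original slide may normalize differently):

```python
import numpy as np

def update_pseudo_doc(v_prev, u_word, lam=0.98):
    """Exponentially discount the running pseudo-document vector so that
    older words count less as the topic drifts: with 0 < lam <= 1, a
    smaller lam forgets faster; lam = 1 keeps the full history."""
    return lam * v_prev + (1.0 - lam) * u_word
```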

17 LSA+N-gram Language Model Initialization of $v_0$: at the start of a session the pseudo-document $v_0$ may be set to:
1. The zero vector.
2. The centroid vector of all training documents.
3. If the domain is known, the centroid of the corresponding region of the LSA space.
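A small sketch covering these options, assuming document vectors from the SVD step are available (the interface is hypothetical):

```python
import numpy as np

def init_v0(doc_vectors=None, dim=200):
    """Pseudo-document initialization: the zero vector when nothing is
    known, or the centroid of the given document vectors (all training
    documents, or only the in-domain ones when the domain is known)."""
    if doc_vectors is None:
        return np.zeros(dim)
    return np.asarray(doc_vectors, dtype=float).mean(axis=0)
```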

18 CONCLUSION The hybrid N-gram+LSA model performs much better than a traditional N-gram in both perplexity (~25% reduction) and WER (~14% reduction). LSA performs well on within-domain test data but not as well on cross-domain test data. Discounting obsolete data with exponential forgetting works better when the topics change incrementally.

19 CONCLUSION LSA modeling is much more sensitive to "content words" than to "function words", which complements N-gram modeling. Given a suitable domain-adaptation framework, the hybrid LSA+N-gram model should further improve perplexity and recognition rate.

