Chapter 12 Search and Speaker Adaptation
12.1 General Search Algorithms
12.2 Search Algorithms for Speech Recognition
12.3 Language Model States
12.4 Speaker Adaptation
12.1 General Search Algorithms (1)
General Graph Searching Procedures
(1) The Graph-Search Algorithm
1. Initialization: put the start node S in the OPEN list and create an initially empty CLOSE list.
2. If the OPEN list is empty, exit and declare failure.
3. Pop the first node N from the OPEN list and put it into the CLOSE list.
4. If node N is a goal node, exit successfully with the solution obtained by tracing back the path along the pointers from N to S.
General Search Algorithms (2)
5. Expand node N by applying the successor operator to generate the successor set SS(N) of node N. Be sure to eliminate the ancestors of N from SS(N).
6. For every v ∈ SS(N) do:
6a. (optional) If v ∈ OPEN and the accumulated distance (cost) of the new path is smaller than that of the path in the OPEN list, do
(1) Change the trace-back (parent) pointer of v to N and adjust the accumulated distance (cost) for v.
(2) Go to step 7.
6b. (optional) If v ∈ CLOSE and the accumulated distance (cost) of the new path is smaller than that of the partial path ending at v in the CLOSE list, do
(1) Change the trace-back (parent) pointer of v to N and adjust the accumulated distance (cost) for all the paths containing v.
(2) Go to step 7.
General Search Algorithms (3)
6c. Otherwise, create a pointer from v to N and push v into the OPEN list.
7. Reorder the OPEN list according to the search strategy or some heuristic measurement.
8. Go to step 2.
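The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `graph_search`, the `reorder` callback, and the toy graph are all hypothetical, optional step 6b is left out, and step 6a keeps processing the remaining successors instead of jumping straight to step 7.

```python
def graph_search(start, is_goal, successors, reorder=None):
    """Graph search with OPEN and CLOSE lists.

    successors(n) returns (neighbor, edge_cost) pairs. reorder(open_list, cost)
    sorts OPEN in place (step 7); with reorder=None the list stays first-in,
    first-out. Optional step 6b is not implemented in this sketch.
    """
    open_list = [start]                    # step 1
    closed = set()
    parent = {start: None}                 # trace-back pointers
    cost = {start: 0.0}                    # accumulated distance from start
    while open_list:                       # step 2: an empty OPEN list means failure
        n = open_list.pop(0)               # step 3
        closed.add(n)
        if is_goal(n):                     # step 4: trace back to the start
            total, path = cost[n], []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], total
        for v, w in successors(n):         # steps 5 and 6
            if v in closed:                # also removes the ancestors of n
                continue
            new_cost = cost[n] + w
            if v not in parent:            # step 6c: v is reached for the first time
                open_list.append(v)
            elif new_cost >= cost[v]:      # v is in OPEN and its old path is no worse
                continue
            parent[v], cost[v] = n, new_cost   # steps 6a/6c: point v back to n
        if reorder is not None:
            reorder(open_list, cost)       # step 7
    return None


# Hypothetical toy graph: node -> list of (successor, edge cost).
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 1), ("G", 5)], "B": [("G", 1)], "G": []}


def cheapest_first(open_list, cost):
    open_list.sort(key=cost.get)


path, total = graph_search("S", lambda n: n == "G", graph.__getitem__, cheapest_first)
print(path, total)  # ['S', 'A', 'B', 'G'] 3.0
```

Passing a different `reorder` function changes the search strategy without touching the rest of the loop.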
General Search Algorithms (4)
Depth-First Search and Breadth-First Search
Modify the algorithm correspondingly: the two strategies differ only in how step 7 orders the OPEN list (newest nodes first for depth-first, oldest nodes first for breadth-first).
Heuristic Graph Search Algorithm
Heuristic search uses guidance to steer the search in the right direction. In general, this hill-climbing style of guidance helps find the destination much more efficiently. The guidance requires domain-specific knowledge and is called a heuristic. In most practical problems, the choice among heuristics is a tradeoff between the quality of the solution and the cost of finding it.
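As a small illustration of how depth-first and breadth-first search differ only in the ordering of the OPEN list, the sketch below places new successors either at the front or at the back of a queue. The function name `traverse` and the toy tree are hypothetical, and the example assumes a tree (no node is reachable by two paths).

```python
from collections import deque


def traverse(start, successors, depth_first):
    """Return the order in which nodes are expanded."""
    open_list = deque([start])
    seen = {start}
    order = []
    while open_list:
        n = open_list.popleft()
        order.append(n)
        new_nodes = [v for v in successors(n) if v not in seen]
        seen.update(new_nodes)
        if depth_first:
            # Newest nodes go to the front of OPEN, keeping their original order.
            open_list.extendleft(reversed(new_nodes))
        else:
            # Newest nodes go to the back of OPEN.
            open_list.extend(new_nodes)
    return order


# Hypothetical toy tree: node -> list of children.
tree = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"], "C": [], "D": [], "E": []}

print(traverse("S", tree.__getitem__, depth_first=True))   # ['S', 'A', 'C', 'D', 'B', 'E']
print(traverse("S", tree.__getitem__, depth_first=False))  # ['S', 'A', 'B', 'C', 'D', 'E']
```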
General Search Algorithms (5)
f(N) = g(N) + h(N) is the estimate of the total distance of the path going through node N, where
g(N) is the distance of the partial path already traveled from S to node N, and
h(N) is the heuristic estimate of the remaining distance from node N to the goal node G.
A heuristic search method uses f to reorder the OPEN list in step 7, so the node with the smallest f(N) is explored first. Heuristic functions that underestimate the remaining distance (cost) are often used in search methods that aim to find the optimal solution.
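A minimal sketch of a search ordered by f(N) = g(N) + h(N) is shown below, using a priority queue as the OPEN list. The function name `a_star`, the toy graph, and the heuristic table are hypothetical; the heuristic values are chosen so that they never overestimate the remaining distance.

```python
import heapq


def a_star(start, goal, successors, h):
    """Expand nodes in order of f = g + h; return (path, cost) or None."""
    g = {start: 0.0}                       # distance already traveled from start
    parent = {start: None}
    open_heap = [(h(start), start)]        # OPEN list ordered by f
    while open_heap:
        f, n = heapq.heappop(open_heap)
        if f > g[n] + h(n):                # stale entry: a cheaper path was found later
            continue
        if n == goal:
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], g[goal]
        for v, w in successors(n):
            new_g = g[n] + w
            if v not in g or new_g < g[v]:
                g[v] = new_g
                parent[v] = n
                heapq.heappush(open_heap, (new_g + h(v), v))
    return None


# Hypothetical toy graph and an underestimating heuristic for goal "G".
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 1), ("G", 5)], "B": [("G", 1)], "G": []}
remaining = {"S": 3, "A": 2, "B": 1, "G": 0}

print(a_star("S", "G", graph.__getitem__, remaining.__getitem__))
# (['S', 'A', 'B', 'G'], 3.0)
```

With h(N) = 0 for every node, the same code reduces to a cheapest-first search on g(N) alone.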
General Search Algorithms (6)
Best-First (A*) Search
Beam Search
Beam search has become one of the most popular methods for complicated speech recognition problems because of its simplicity in both its search strategy and its requirement for domain-specific heuristic information. It is particularly attractive when different knowledge sources must be integrated in a time-synchronous fashion. It explores nodes in a consistent way, level by level, and needs only minimal communication between different paths. Because of its breadth-first nature, it is also very suitable for parallel implementation.
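The level-by-level behavior of beam search can be sketched as below. This is an illustrative sketch only: the function name `beam_search` and the toy graph are hypothetical, the graph is assumed to be acyclic, and partial paths are ranked by accumulated cost alone. Because paths outside the beam are discarded, the result is not guaranteed to be optimal.

```python
def beam_search(start, is_goal, successors, beam_width):
    """Keep only the beam_width cheapest partial paths at each level."""
    beam = [(0.0, [start])]
    finished = []
    while beam:
        candidates = []
        for cost, path in beam:
            if is_goal(path[-1]):
                finished.append((cost, path))
                continue
            for v, w in successors(path[-1]):
                candidates.append((cost + w, path + [v]))
        candidates.sort(key=lambda c: c[0])
        beam = candidates[:beam_width]     # pruning: drop everything outside the beam
    return min(finished, key=lambda c: c[0]) if finished else None


# Hypothetical toy graph where the cheapest first step leads to an expensive path.
graph = {"S": [("A", 1), ("B", 2)], "A": [("G", 10)], "B": [("G", 1)], "G": []}

print(beam_search("S", lambda n: n == "G", graph.__getitem__, beam_width=1))
# (11.0, ['S', 'A', 'G'])
print(beam_search("S", lambda n: n == "G", graph.__getitem__, beam_width=2))
# (3.0, ['S', 'B', 'G'])
```

The two calls show the tradeoff: a narrow beam does less work but can prune the path that would have been best.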
12.2 Search Algorithms for Speech Recognition (1)
The basic problem of large-scale, speaker-independent continuous speech recognition can be expressed as
$$\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(W)\,P(O \mid W)}{P(O)}$$
Search Algorithms for Speech Recognition (2)
Almost all search techniques fall into two categories: sharing and pruning. Sharing means that intermediate results are kept so that other paths can use them without redundant recomputation. Pruning means that unpromising subpaths are reliably discarded before they are extended too far. Search strategies based on dynamic programming or the Viterbi algorithm, with the help of clever pruning, have been applied successfully to a wide range of speech recognition tasks, from small-vocabulary tasks such as digit recognition to unconstrained large-vocabulary (more than words) speech recognition.
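The combination of sharing and pruning can be illustrated with a small Viterbi decoder. This is a sketch under stated assumptions, not a speech decoder: the function name `viterbi_beam`, the `beam` parameter (a log-probability margin), and the two-state toy HMM are all hypothetical. Sharing appears as keeping one best score per state; pruning appears as dropping states that fall too far below the current best.

```python
import math


def viterbi_beam(obs, states, log_init, log_trans, log_emit, beam=math.inf):
    """Return (best state path, its log-probability)."""
    scores = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        best = max(scores.values())
        # Pruning: only states within `beam` of the best score stay active.
        active = {s: v for s, v in scores.items() if v >= best - beam}
        new_scores, pointers = {}, {}
        for s in states:
            # Sharing: all paths entering s are merged; only the best is kept.
            prev = max(active, key=lambda p: active[p] + log_trans[p][s])
            new_scores[s] = active[prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)
    last = max(scores, key=scores.get)
    path = [last]
    for pointers in reversed(back):        # trace back along the stored pointers
        path.append(pointers[path[-1]])
    return path[::-1], scores[last]


def logs(table):
    """Convert a (possibly nested) table of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}


# Hypothetical toy HMM.
states = ["Healthy", "Fever"]
init = logs({"Healthy": 0.6, "Fever": 0.4})
trans = logs({"Healthy": {"Healthy": 0.7, "Fever": 0.3},
              "Fever": {"Healthy": 0.4, "Fever": 0.6}})
emit = logs({"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
             "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}})

best_path, best_score = viterbi_beam(["normal", "cold", "dizzy"], states, init, trans, emit)
print(best_path)  # ['Healthy', 'Healthy', 'Fever']
```

Setting `beam` to a finite value reduces the number of active states per step at the risk of pruning the path that would have won.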
Search Algorithms for Speech Recognition (3)
With Bayes' formulation, searching for the minimum-cost path (word sequence) is equivalent to finding the path with maximum probability. For consistency with the general search algorithms, we use the inverse of the Bayes posterior probability as the objective function. Taking the logarithm turns multiplications into additions, which makes the speech decoder closely resemble the general graph search algorithms. P(O) does not depend on W, so it is omitted. The new criterion used to find the optimal word sequence (W is the word sequence) is
$$C(W \mid O) = -\log\left[P(W)\,P(O \mid W)\right]$$
$$\hat{W} = \arg\min_{W} C(W \mid O)$$
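A short numerical check of the criterion is sketched below. The candidate word sequences and their probabilities are made-up illustrative values, and `cost` is a hypothetical helper name. The example only shows that minimizing C(W|O) selects the same hypothesis as maximizing P(W)P(O|W).

```python
import math


def cost(p_w, p_o_given_w):
    """C(W|O) = -log[P(W) P(O|W)], computed as a sum of negative logs."""
    return -(math.log(p_w) + math.log(p_o_given_w))


# Hypothetical hypotheses: word sequence -> (P(W), P(O|W)).
hypotheses = {
    "recognize speech": (0.02, 1e-5),
    "wreck a nice beach": (0.0001, 3e-5),
}

best_by_cost = min(hypotheses, key=lambda w: cost(*hypotheses[w]))
best_by_prob = max(hypotheses, key=lambda w: hypotheses[w][0] * hypotheses[w][1])
print(best_by_cost)  # recognize speech
```

Working with sums of logarithms also avoids the numerical underflow that products of many small probabilities would cause.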
12.3 Language Model States (1)
This section deals with the search space (language model states) of various grammars for continuous speech recognition.
Search space with unigram
In the grammar network, the unigram probability is attached as the transition probability from the starting state S to the first state of each word HMM.
Search space with bigram
When a bigram is used, the probability of a word depends only on the immediately preceding word. In the grammar network, the bigram expansion requires |V|^2 word-to-word transitions, where |V| is the vocabulary size.
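The difference in search-space size can be made concrete by listing the word-level arcs of each grammar network. The sketch below is an assumption-laden illustration: the function names, the three-word vocabulary, and the uniform probabilities are hypothetical, and each word HMM is collapsed to a single node.

```python
def unigram_arcs(unigram):
    """One arc from the starting state S to each word, weighted by P(w)."""
    return [("S", w, p) for w, p in unigram.items()]


def bigram_arcs(vocab, bigram):
    """One arc for every ordered word pair (u, w), weighted by P(w | u)."""
    return [(u, w, bigram(u, w)) for u in vocab for w in vocab]


vocab = ["yes", "no", "maybe"]
unigram = {w: 1 / len(vocab) for w in vocab}   # hypothetical uniform unigram


def uniform_bigram(u, w):
    return 1 / len(vocab)                      # hypothetical uniform bigram


print(len(unigram_arcs(unigram)))               # 3, one arc per word
print(len(bigram_arcs(vocab, uniform_bigram)))  # 9, that is |V|^2
```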
Language Model States (2)
Another option is the rule-based language model. The grammar is defined first, then a compiler creates the sentence network, and the search space is the entire network. At the acoustic level, every node of the network is an HMM.
12.4 Speaker Adaptation (1)
Adaptation means adjusting the model parameters according to new training data. It can cover a wide range of changes, for example in the speaking environment, channel characteristics, speaker characteristics, task characteristics, and application domain. Here we discuss only speaker adaptation, but the methods are also suitable for the other cases. In general, a speaker-dependent (SD) system performs better than a speaker-independent (SI) system under almost the same conditions: the error rate of SD is only 1/3 to 1/4 that of SI. Speaker adaptation uses a limited amount of training data from a new speaker to modify the parameters of an existing model so that the new model is adapted to that speaker.
Speaker Adaptation (2)
There are four ways to do speaker adaptation:
Input                  Method                   Resulting model
SI data & SI model*    adaptation clustering    SI model
SD data & SD model*    speaker transformation   SD model
SD data & SI model     speaker adaptation       SA model
SD data & SD model*    serial adaptation        SA model
(* means optional; SA means speaker adaptation.)
Adaptation clustering divides the data into several types. The acoustic characteristics of the speakers within one type are close to each other. A set of SI templates can be created for every type. During recognition, a small amount of data is used to decide which type the speaker belongs to, and then the corresponding template is used for recognition.
Speaker Adaptation (3)
The basic idea of speaker transformation is that the difference between two speakers lies mostly in the short-time spectra they produce when uttering the same utterance. These differences stem from differences in their speech organs. It is therefore possible to find a linear transformation that maps one speaker's short-time spectrum to the other's.
Speaker Adaptation (4)
Serial adaptation gradually adjusts the parameters toward the optimal state.
1. Speaker clustering
The number of clusters K is the key factor. It should be neither too large nor too small (2 to 10 is suitable). There are two clustering approaches:
(1) Supervised, based on HMM similarity
First, K types of speakers need to be created. This can be done by a merging procedure.
Speaker Adaptation (5)
Then the combined training data are used to create the VQ codebook and HMM parameters for every type. At test time, the speaker utters some predefined sentences; the probability that each type generates these sentences is calculated and compared, and the type with the maximum probability is selected. The accuracy of the system is about the same as that of the original SI system.
Speaker Adaptation (6)
(2) Unsupervised, based on GMM
$$P(O \mid \lambda_i) = \sum_{m=1}^{c} p_m\, P(O \mid \lambda_i^m), \qquad \sum_{m=1}^{c} p_m = 1$$
$$\lambda_i = \{\lambda_i^1, \lambda_i^2, \ldots, \lambda_i^c\}$$
The weights p_m need to be preset.
How is λ_i determined from the N feature vectors $O_i^N$? The idea is that λ_i should maximize
$$L = \sum_i \log P(O \mid \lambda_i)$$
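The mixture likelihood above can be evaluated numerically as sketched below. Several simplifications are assumptions made for illustration: the features are one-dimensional, each component is a Gaussian with a given mean and variance, the feature vectors are treated as independent so their log-likelihoods add, and the names `gmm_log_likelihood`, `pick_speaker_type`, and the two toy models are hypothetical. Parameter estimation (for example by EM) is not shown.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_log_likelihood(data, weights, means, variances):
    """Sum over frames of log sum_m p_m N(x; mean_m, var_m)."""
    total = 0.0
    for x in data:
        mixture = sum(p * gaussian_pdf(x, m, v)
                      for p, m, v in zip(weights, means, variances))
        total += math.log(mixture)
    return total


def pick_speaker_type(data, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(data, *models[name]))


# Hypothetical two-component models: (weights, means, variances).
models = {
    "type_a": ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0]),
    "type_b": ([0.5, 0.5], [5.0, 6.0], [1.0, 1.0]),
}

print(pick_speaker_type([0.2, 0.9, 0.4], models))  # type_a
```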
Speaker Adaptation (8)
2. Spectrum transformation
Suitable for VQ systems. The idea is to use a small amount of data from the new speaker to obtain a relation between the old and new speakers.
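One simple way to picture such a relation is a per-dimension linear map fitted by least squares. The sketch below is only an assumption about what the relation might look like, since the slides do not give its form: it assumes paired, time-aligned spectral vectors from the old and new speakers, and the names `fit_affine`, `fit_transform`, and `apply_transform` are hypothetical.

```python
def fit_affine(xs, ys):
    """Least-squares fit of y = a * x + b for one spectral dimension."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_transform(old_frames, new_frames):
    """Fit one (a, b) pair per dimension from paired frames."""
    return [fit_affine(xs, ys)
            for xs, ys in zip(zip(*old_frames), zip(*new_frames))]


def apply_transform(frame, params):
    return [a * x + b for x, (a, b) in zip(frame, params)]


# Hypothetical paired frames (old speaker, new speaker), two dimensions each.
old_frames = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
new_frames = [[3.0, 0.0], [5.0, -1.0], [7.0, -0.5]]

params = fit_transform(old_frames, new_frames)
print(apply_transform([4.0, 4.0], params))
```

In a VQ system the fitted map could be applied to the codebook vectors of the old speaker, so that only a small amount of new-speaker data is needed.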
Speaker Adaptation (9)
3. Bayes adaptation of CDHMM
This method adjusts the parameters by Bayes estimation (Bayes learning). The mean is treated as a random variable whose prior distribution is given by the SI parameters. Once SD data are provided, the mean takes on new (posterior) parameters, and the resulting model can be used as the SD model.
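For a single Gaussian mean, the Bayes update has a simple closed form that can be sketched as follows. The assumptions here are not taken from the slides: the variance is treated as known, the prior on the mean is a conjugate Gaussian centered on the SI mean, and `tau` is a hypothetical prior weight that says how many observations the SI mean is worth. The function name `map_adapt_mean` is also hypothetical.

```python
def map_adapt_mean(si_mean, sd_data, tau):
    """Posterior (MAP) mean: a weighted blend of the SI mean and the SD data."""
    return (tau * si_mean + sum(sd_data)) / (tau + len(sd_data))


si_mean = 0.0
print(map_adapt_mean(si_mean, [], tau=4.0))               # 0.0, no data: keep the SI mean
print(map_adapt_mean(si_mean, [2.0] * 4, tau=4.0))        # 1.0, halfway to the sample mean
print(map_adapt_mean(si_mean, [2.0] * 4000, tau=4.0))     # close to 2.0, data dominates
```

With little adaptation data the estimate stays near the SI mean, and with more data it moves toward the speaker's own sample mean.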