1
HIWIRE Progress Report – July 2006. Technical University of Crete, Speech Processing and Dialog Systems Group. Presenter: Alex Potamianos
2
Outline. Work package 1 (Task 1: Blind Source Separation for ASR; Task 2,5: Feature extraction and fusion; Task 4: Segment models for ASR). Work package 2 (Task 1,2: VTLN; Task 2: Bayes optimal adaptation). Work package 3 (Task 1: Fixed platform integration).
3
Blind Speech Separation (BSS) problem
4
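Data Model – Problem Statement. Observed signal: $x(t) = \sum_{\tau=0}^{L} H(\tau)\, s(t-\tau) + n(t)$. $H(\tau) = [h_1(\tau), \dots, h_n(\tau)]$: mixing impulse response matrix; $h_i(\tau)$: spatial signature of the i-th speaker for lag τ; $n(t)$: additive noise vector; L: channel order. Objective: estimate the inverse-channel impulse response matrix W(τ) from the observed signal x(t).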
5
BSS permutation problem. Permutation problem: the “order” of the separated sources may be different in the solution for each frequency bin. To solve the permutation problem, combine spatial constraints with continuity constraints in the frequency domain. The solution to the permutation problem can be formulated using an ILS minimization criterion.
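The continuity idea can be illustrated with a minimal sketch. It covers only a continuity constraint (correlation of magnitude envelopes between adjacent frequency bins), not the spatial constraints or the ILS criterion mentioned on the slide. The function name `align_permutations` and the envelope-correlation score are assumptions made for illustration, not the group's method.

```python
import itertools
import numpy as np

def align_permutations(env):
    """Align output order across frequency bins (hypothetical helper).

    env: array of shape (n_bins, n_src, n_frames) holding the magnitude
    envelope of each separated output in each frequency bin.
    Returns one permutation per bin so that env[f, perms[f]] keeps a
    consistent source order from bin to bin.
    """
    n_bins, n_src, _ = env.shape
    perms = [tuple(range(n_src))]   # bin 0 defines the reference order
    ref = env[0]
    for f in range(1, n_bins):
        best, best_score = None, -np.inf
        # Exhaustive search over permutations; fine for a few sources.
        for p in itertools.permutations(range(n_src)):
            score = sum(np.corrcoef(ref[i], env[f, p[i]])[0, 1]
                        for i in range(n_src))
            if score > best_score:
                best, best_score = p, score
        perms.append(best)
        ref = env[f, list(best)]    # continuity: compare with the previous aligned bin
    return perms
```

With more than a handful of sources the exhaustive search becomes expensive, which is one reason to combine it with other constraints.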
6
Recent progress. Improved solution to the permutation problem: combining spatial and continuity constraints; trying out different continuity criteria. Created a synthetic database using typical room impulse responses. First ASR experiments using the “synthetic” database.
8
Motivation. Combining classifiers/information sources is an important problem in machine learning applications. A simple, yet powerful, way to combine classifiers is the “multi-stream” approach, which assumes independent information sources. Unsupervised stream weight computation for multi-stream classifiers is an open problem.
9
Problem Definition. Compute “optimal” exponent weights for each stream s [HMM Gaussian mixture formulation; similar expressions for MM, naïve Bayes, Euclidean/Mahalanobis classifier]. Optimality is in the sense of minimizing the “total classification error”.
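The formula on this slide did not survive extraction. As a sketch, the usual multi-stream HMM observation probability with exponent weights has the form below. The symbols (state $j$, stream $s$, mixture component $m$, mixture weight $c_{jsm}$, stream weight $w_s$) are assumed notation, not necessarily the slide's.

```latex
b_j(o_t) \;=\; \prod_{s=1}^{S}
  \left[ \sum_{m=1}^{M_s} c_{jsm}\,
  \mathcal{N}\!\left(o_{st};\, \mu_{jsm},\, \Sigma_{jsm}\right) \right]^{w_s}
```

In the log domain the exponents become a weighted sum of per-stream log-likelihoods, $\log b_j(o_t) = \sum_s w_s \log b_{js}(o_{st})$.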
10
Optimal Stream Weights: Result I. If the single-stream classifiers have equal error rates, the optimal stream weights are inversely proportional to the total stream estimation error variance.
11
Optimal Stream Weights: Result II. If the estimation error variance is equal in each stream, the optimal weights are approximately inversely proportional to the single-stream classification error.
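Results I and II both state an inverse-proportionality rule, which can be sketched as follows. The helper names are hypothetical, and normalizing the weights to sum to one is an assumed convention that the slides do not specify.

```python
def inverse_proportional_weights(values):
    """Weights inversely proportional to `values`, normalized to sum to 1.

    Result I:  pass the per-stream estimation error variances.
    Result II: pass the single-stream classification error rates.
    (The normalization is an assumption, not taken from the slides.)
    """
    inverse = [1.0 / v for v in values]
    total = sum(inverse)
    return [x / total for x in inverse]

def multistream_log_score(stream_log_likelihoods, weights):
    """Weighted sum of per-stream log-likelihoods (exponent weights in the log domain)."""
    return sum(w * ll for w, ll in zip(weights, stream_log_likelihoods))

# Example: stream 1 has error rate 0.1, stream 2 has error rate 0.3.
weights = inverse_proportional_weights([0.1, 0.3])
```

In this example the more reliable stream receives three times the weight of the less reliable one.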
12
Recent Progress. Experiments with synthetic data (Gaussian distribution classification problem): results show a good match with the theoretical results. Experimental verification for naïve Bayes classifiers: utterance classification (NLP application). First experiments with “unsupervised” estimates of stream weights: “intra-class” based metrics on observations; AV-ASR application.
14
Dynamical System Segment Model. Based on a linear dynamical system, where x is the state, y the observation, u the control, and w, v the noise. The system parameters should guarantee identifiability, controllability, observability, and stability. We investigated more generalized parameter structures.
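The state equations themselves were lost in extraction. A standard linear dynamical system consistent with the variables named above can be written as below. The control matrix $B$ and the time index $k$ are assumed notation, and reading $P$ and $R$ as the covariances of the state and observation noise is an assumption based on the parameter names on the next slide.

```latex
\begin{aligned}
x_{k+1} &= F\,x_k + B\,u_k + w_k, & w_k &\sim \mathcal{N}(0, P),\\
y_k     &= H\,x_k + v_k,          & v_k &\sim \mathcal{N}(0, R).
\end{aligned}
```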
15
Generalized forms of parameter structures. The system’s parameters have an identifiable canonical form. F: “ones” in the superdiagonal, “zeros” elsewhere; rows r_i (i = 1, …, n) are filled with free parameters. H: column dimension equal to that of F; filled with “zeros”, except that, taking r_0 = 0, row i has a “one” in column r_{i-1} + 1. P, R: filled with free parameters. We propose a novel element-wise estimation based on the EM algorithm for system identification.
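The canonical structure described on this slide can be sketched in code. This is one reading of the slide, not the authors' implementation: the function name `canonical_structure` is hypothetical, the row indices r_i are taken as 1-based, and free-parameter rows of F are returned as a boolean mask with placeholder zeros.

```python
import numpy as np

def canonical_structure(state_dim, free_rows):
    """Build the fixed pattern of F and H for the canonical form.

    state_dim: dimension of the state vector (F is state_dim x state_dim).
    free_rows: sorted 1-based row indices r_1 < ... < r_n of F that hold
               free parameters (n is the observation dimension).
    Returns (F, free_mask, H); entries of F where free_mask is True are
    placeholders to be estimated, for example by EM.
    """
    F = np.eye(state_dim, k=1)               # ones on the superdiagonal
    free_mask = np.zeros((state_dim, state_dim), dtype=bool)
    for r in free_rows:
        F[r - 1, :] = 0.0                     # row r_i is entirely free
        free_mask[r - 1, :] = True
    H = np.zeros((len(free_rows), state_dim))
    previous = 0                              # r_0 = 0
    for i, r in enumerate(free_rows):
        H[i, previous] = 1.0                  # column r_{i-1} + 1 (1-based)
        previous = r
    return F, free_mask, H

# Example: 4-dimensional state, 2-dimensional observation, free rows 2 and 4.
F, free_mask, H = canonical_structure(4, [2, 4])
```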
16
Application on speech. Experiments on clean data from AURORA 2. 11 word-models (one…nine, zero, oh). The number of segments of each model depends on the number of phones of the word-model. HTK for feature extraction (14 MFCCs). Alignments taken by HTK using HMMs. 4000 training sentences; 600 isolated words for testing.
17
Results. Fig. (a): classification performance (using 3 different initializations). Fig. (b): the log-likelihood is increasing for the same runs.
18
Conclusions & Future Work. Developed new forms of linear state-space models. Proposed a novel element-wise parameter estimation process. Performed training and classification on AURORA 2 based on speech segments and LDS. Results showed a correlation between performance and initialization. In the future: investigation of optimal initialization; feature-segment alignment (through dynamic programming); investigation of the state-space dimension.
20
Vocal Tract Length Normalization. Linear and Non-Linear Frequency Warping. Multi-Parameter Frequency Warping. Warping and Spectral Bias Addition by ML Estimation.
21
Linear and Non-Linear Warping: Analysis. An optimal warping factor a is computed (for each phoneme), so that the Euclidean spectral distance (MSE) between the warped spectrum g(X) and the corresponding unwarped spectrum X is minimized. Optimization is achieved by full search. The mapped spectrum is warped according to this optimal warping factor.
22
Linear and Non-Linear Warping. Frequency warping is implemented by re-sampling the spectral envelope at linearly and non-linearly warped frequency indices, i.e., (1) linear, (2) piecewise non-linear, (3) power.
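The re-sampling and full-search steps can be sketched as below. The warping formulas on the slide were lost, so the linear form a·f and the power form f_max·(f/f_max)^a used here are common textbook choices assumed for illustration. All function names are hypothetical, and linear interpolation stands in for whatever re-sampling the authors used.

```python
import numpy as np

def warp_linear(f, a):
    """Linear warping: every frequency is scaled by the factor a (assumed form)."""
    return a * f

def warp_power(f, a, f_max):
    """Power warping on [0, f_max]; a = 1 is the identity (assumed form)."""
    return f_max * (f / f_max) ** a

def resample_envelope(envelope, warp, f_max=1.0):
    """Re-sample a uniformly sampled spectral envelope at warped frequencies."""
    f = np.linspace(0.0, f_max, len(envelope))
    warped = np.clip(warp(f), 0.0, f_max)     # stay inside the sampled band
    return np.interp(warped, f, envelope)

def best_warp_factor(source, reference, candidates, f_max=1.0):
    """Full search: pick the linear warping factor that minimizes the MSE."""
    errors = []
    for a in candidates:
        warped = resample_envelope(source, lambda f, a=a: warp_linear(f, a), f_max)
        errors.append(np.mean((warped - reference) ** 2))
    best = int(np.argmin(errors))
    return float(candidates[best]), float(errors[best])
```

A grid such as `np.linspace(0.8, 1.2, 9)` could serve as the candidate set; the grid range and step are illustrative values, not taken from the slides.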
23
Multi-Parameter Frequency Warping. After the computation of the optimal warping factor, we explore alternative piecewise linear frequency warping strategies. Bi-parametric warping function (2pts): different warping factors are evaluated for the low (F < 3 kHz) and high (F ≥ 3 kHz) frequencies. Four-parametric warping function (4pts): different warping factors are evaluated for the frequency ranges 0-1.5, 1.5-3, 3-4.5 and 4.5-8 kHz.
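A piecewise linear warping function with one factor per band can be sketched as follows. Making the function continuous at the band edges is an assumption (the slides do not say how the bands are joined), and the name `piecewise_warp` is hypothetical.

```python
import numpy as np

def piecewise_warp(f, edges, factors):
    """Continuous piecewise linear warping with one slope per band.

    edges:   band edges in Hz, e.g. [0, 3000, 8000] for the 2-point case
             or [0, 1500, 3000, 4500, 8000] for the 4-point case.
    factors: one warping factor (slope) per band.
    """
    edges = np.asarray(edges, dtype=float)
    factors = np.asarray(factors, dtype=float)
    # Warped position of each band edge: accumulate slope * band width.
    warped_edges = np.concatenate(([0.0], np.cumsum(factors * np.diff(edges))))
    return np.interp(f, edges, warped_edges)

# Bi-parametric example: stretch below 3 kHz, compress above.
warped = piecewise_warp([1000.0, 3000.0, 4000.0, 8000.0], [0, 3000, 8000], [1.1, 0.9])
```

With this construction the upper band edge generally no longer maps to itself, so a real system would need to decide how to handle the top of the band.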
24
Reduction in MSE: Non-linear warping
25
Reduction in MSE: Multi-parametric warping
26
Reduction in MSE: Bias Removal and Multi-parametric warping
27
Ongoing work. Implementation of “phone-dependent” warping in HTK. Implementation of multi-parametric warping and bias removal in HTK.
28
Outline Work package 1 Task 1:Blind Source Separation for ASR Task 2,5: Feature extraction and fusion Task 4: Segment models for ASR Work package 2 Task 1,2: VTLN Task 2: Bayes optimal adaptation Work package 3 Task 1: Fixed platform integration
29
Optimal Bayes Adaptation. The central problem is to determine [equation lost in extraction]. Using Bayes' rule, we have a two-step process: (1) obtain the priors from the SI models; (2) compute the likelihoods.
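The quantity to be determined is missing from the extracted slide. As a sketch under the assumption that it is the posterior of a model component θ given adaptation data X, the two-step process corresponds to the two factors in Bayes' rule, with the prior $p(\theta)$ taken from the speaker-independent (SI) models. The symbols here are assumed notation.

```latex
p(\theta \mid X) \;=\;
  \frac{p(X \mid \theta)\, p(\theta)}
       {\sum_{\theta'} p(X \mid \theta')\, p(\theta')}
```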
30
Phone-Based Clustering. Cluster the output distributions based on a common central phone. [Figure: genone 1 and genone 2, each with mixture components 1, 2, …, M; axes: number of dimensions (cepstrum coefficients) and number of mixture components.] θ is every component of the above representation and stands for the prior.
31
Our Implementation. Computation of priors using: [equation lost in extraction]. Computation of likelihoods using the Baum-Welch algorithm and ML. After computing the posterior probabilities, we apply smoothing. Such techniques are: flooring, uniform, delta.
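The slide names three smoothing techniques without defining them. The sketch below gives one plausible reading of each, and all three definitions are assumptions: flooring clamps small posteriors to a minimum, uniform interpolates with a uniform distribution, and delta adds a constant before renormalizing. Function names and default values are hypothetical.

```python
import numpy as np

def _renormalize(p):
    """Scale a non-negative vector so that it sums to 1."""
    p = np.asarray(p, dtype=float)
    return p / p.sum()

def smooth_floor(posteriors, floor=1e-3):
    """Flooring (assumed definition): raise values below `floor`, then renormalize."""
    return _renormalize(np.maximum(posteriors, floor))

def smooth_uniform(posteriors, lam=0.1):
    """Uniform (assumed definition): interpolate with a uniform distribution."""
    p = np.asarray(posteriors, dtype=float)
    return (1.0 - lam) * p + lam / len(p)

def smooth_delta(posteriors, delta=0.01):
    """Delta (assumed definition): add a constant to every value, then renormalize."""
    return _renormalize(np.asarray(posteriors, dtype=float) + delta)
```

All three variants keep the result a valid probability vector and remove exact zeros, which is the usual motivation for smoothing posteriors estimated from little adaptation data.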