Histogram-based Quantization for Distributed / Robust Speech Recognition Chia-yu Wan, Lin-shan Lee College of EECS, National Taiwan University, R. O. C. 2007/08/16

Outline Introduction Histogram-based Quantization (HQ) Joint Uncertainty Decoding (JUD) Three-stage Error Concealment (EC) Conclusion

Problems of Distance-based VQ
- Conventional distance-based VQ (e.g., SVQ) has been widely used in DSR.
- Environmental noise and codebook mismatch jointly degrade the performance of SVQ:
  - noise moves a clean feature into another partition cell (X to Y);
  - mismatch between the fixed VQ codebook and the test data increases distortion;
  - quantization increases the difference between clean and noisy features.
- Histogram-based Quantization (HQ) is proposed to solve these problems.

Histogram-based Quantization (HQ)
- The decision boundaries y_i (i = 1, …, N) are dynamically defined by C(y).
- The representative values z_i (i = 1, …, N) are fixed, obtained by transforming through a standard Gaussian.

The actual decision boundaries (on the horizontal scale) for x_t are dynamically defined by the inverse transformation of C(y).

Histogram-based Quantization (HQ)
- With a new histogram C'(y'), the decision boundaries change automatically.
- The decision boundaries are adjusted according to the local statistics, so there is no codebook-mismatch problem.

Histogram-based Quantization (HQ)
- Based on the CDF on the vertical scale and the histogram, HQ is less sensitive to noise on the horizontal scale: disturbances are automatically absorbed within an HQ block.
- Dynamic nature of HQ: a fixed hidden codebook on the vertical scale is transformed by the dynamic C(y) into boundaries {y_i} that are dynamic on the horizontal scale.
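The scheme above can be sketched in a few lines. This is a toy illustration of the idea, not the authors' implementation (the function name, block layout, and parameter choices are mine): the decision boundaries come from the empirical CDF of the current block, while the representative values are fixed standard-Gaussian quantiles shared by the encoder and the decoder, so no codebook is transmitted.

```python
import numpy as np
from statistics import NormalDist

_PHI_INV = NormalDist().inv_cdf  # inverse CDF of the standard Gaussian


def hq_quantize(block, n_levels=8):
    """Quantize a 1-D block of features with a sketch of Histogram-based
    Quantization: decision boundaries are the empirical quantiles of the
    block (they track local statistics), while the representative values
    z_i are fixed standard-Gaussian quantiles."""
    block = np.asarray(block, dtype=float)
    n = len(block)
    # Empirical CDF value of each sample, kept strictly inside (0, 1).
    ranks = block.argsort().argsort()          # rank 0 .. n-1 of each sample
    cdf = (ranks + 0.5) / n
    # Cell index on the vertical (CDF) scale: N uniform cells.
    idx = np.minimum((cdf * n_levels).astype(int), n_levels - 1)
    # Fixed hidden codebook: Gaussian quantiles of the cell centers.
    z = np.array([_PHI_INV((i + 0.5) / n_levels) for i in range(n_levels)])
    return idx, z[idx]
```

Because only the ranks within the block matter, any monotone distortion of the features on the horizontal scale leaves the transmitted indices unchanged, which is the robustness property the slides describe.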

Histogram-based Vector Quantization (HVQ)

Discussions on the robustness of Histogram-based Quantization (HQ)
- Distributed speech recognition: SVQ vs. HQ
- Robust speech recognition: HEQ vs. HQ

Comparison of Distance-based VQ (SVQ) and Histogram-based Quantization (HQ)
HQ solves the major problems of conventional distance-based VQ:
- SVQ: the fixed codebook cannot well represent noisy speech. HQ: dynamically adjusted to local statistics, so no codebook mismatch.
- SVQ: quantization increases the difference between clean and noisy speech. HQ: inherent robustness, since noise disturbances are automatically absorbed by C(y).

HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization)
- HEQ performs a point-to-point transformation; the point-based order statistics are more disturbed by noise.
- HQ performs a block-based transformation, automatically absorbing disturbances within a block; with a proper choice of block size, the block uncertainty can be compensated by a GMM and uncertainty decoding.
(Figure: averaged normalized distance between clean and corrupted speech features, on the AURORA 2 database)

HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization)
- HQ gives a smaller distance d for all SNR conditions, i.e., it is less influenced by the noise disturbance.

HQ as a feature transformation method

HQ as a feature quantization method

Further analysis
- Bit rates vs. SNR
- Clean-condition training vs. multi-condition training

HQ-JUD
For both robust and distributed speech recognition:
- Robust speech recognition: HQ as the front-end feature transformation, JUD as the enhancement approach at the back-end recognizer.
- Distributed Speech Recognition (DSR): HQ applied at the client for data compression, JUD at the server.

Joint Uncertainty Decoding (1/4): Uncertainty Observation Decoding
- The HMM should be less discriminative on features with higher uncertainty, i.e., a larger variance is used for more uncertain features.
- w: observation; o: uncorrupted features.
- The uncertainty p(w | o) is assumed Gaussian.

Joint Uncertainty Decoding (2/4): Uncertainty for quantization errors
- The codeword is the observation w; the samples in the partition cell are the possible uncorrupted features o.
- p(o) is the pdf of the samples within the partition cell.
- The uncertainty is the variance of the samples within the partition cell, so the variances are increased for the loosely quantized cells (the more uncertain regions).
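The per-cell uncertainty can be computed directly from the samples falling in each cell. A minimal sketch (my own helper, assuming the block samples and their cell indices are available on the decoding side):

```python
import numpy as np


def cell_uncertainty(block, idx, n_levels):
    """Variance of the samples inside each partition cell, used as the
    quantization uncertainty: loosely quantized cells get larger
    variances, which uncertainty decoding then adds to the HMM output
    variances."""
    block = np.asarray(block, dtype=float)
    idx = np.asarray(idx)
    var = np.zeros(n_levels)
    for i in range(n_levels):
        members = block[idx == i]
        if len(members) > 1:       # a single-sample cell carries no spread
            var[i] = members.var()
    return var
```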

Joint Uncertainty Decoding (3/4): Uncertainty for environmental noise
- Increase the variances for HQ features with a larger histogram shift.

Joint Uncertainty Decoding (4/4)
- Jointly consider the uncertainty caused by both the environmental noise and the quantization errors.
- One of the two dominates:
  - quantization errors at high SNR, where disturbances are absorbed within the HQ block;
  - environmental noise at low SNR, where noisy features are moved to other partition cells.
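Uncertainty decoding, as used by JUD here, amounts to adding the uncertainty variance to the HMM Gaussian variance before evaluating the likelihood, which flattens the score differences for uncertain features. A one-function sketch (my illustration of the standard mechanism, not the authors' code):

```python
import math


def uncertain_loglik(x, mean, var, uncertainty_var):
    """Gaussian log-likelihood with uncertainty decoding: the feature
    uncertainty variance is added to the model variance, making the HMM
    less discriminative for features with higher uncertainty."""
    v = var + uncertainty_var
    return -0.5 * (math.log(2 * math.pi * v) + (x - mean) ** 2 / v)
```

With a larger uncertainty variance, the likelihood gap between competing Gaussians shrinks, so unreliable features influence the Viterbi path less.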

HQ-JUD for robust speech recognition

HQ-JUD for distributed speech recognition
(Results for different types of noise, averaged over all SNR values; systems compared: client HEQ-SVQ; client HEQ-SVQ with server UD; client HQ; client HQ with server JUD)

HQ-JUD for distributed speech recognition
(Different types of noise, averaged over all SNR values)
- HEQSVQ-UD was slightly worse than HEQ-SVQ for set C.

HQ-JUD for distributed speech recognition
(Different types of noise, averaged over all SNR values)
- HQ-JUD consistently improved the performance of HQ.

HQ-JUD for distributed speech recognition
(Different types of noise, averaged over all SNR values)
- HQ performed better than HEQ-SVQ for all types of noise.

HQ-JUD for distributed speech recognition
(Different types of noise, averaged over all SNR values)
- HQ-JUD consistently performed better than HEQSVQ-UD.

HQ-JUD for distributed speech recognition
(Different SNR conditions, averaged over all noise types)
- HQ-JUD significantly improved the performance of SVQ-UD.
- HQ-JUD consistently performed better than HEQSVQ-UD.

Three-stage error concealment (EC)

Stage 1: error detection
- Frame-level error detection: the received frame-pairs are first checked with a CRC.
- Subvector-level error detection: the erroneous frame-pairs are then checked by the HQ consistency check.
  - The quantized HQ codewords represent the order-statistics information of the original parameters, and the quantization process does not change the order statistics.
  - Re-performing HQ on a received subvector: the codeword should fall in the same partition cell.
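To make the order-statistics argument concrete, here is a toy version of a consistency check (my construction, not the paper's exact subvector test): for a rank-based HQ over n samples and N uniform CDF cells, the multiset of cell indices in an error-free block is known a priori, so any received block with a different multiset must contain bit errors.

```python
def hq_block_is_consistent(received_idx, n_levels):
    """Stage-1 style consistency check (sketch): HQ codewords encode
    order statistics, so for a block of n samples and N uniform CDF
    cells the sorted list of indices is fixed in advance; a received
    block that deviates from it must contain a transmission error."""
    n = len(received_idx)
    # Index of the sample with rank r: floor(((r + 0.5) / n) * N).
    expected = sorted(int((r + 0.5) * n_levels / n) for r in range(n))
    return sorted(received_idx) == expected
```

This catches any bit error that moves an index out of the expected multiset; errors that swap two valid indices would need the bigram-based reasoning of stage 2.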

Stage 1: error detection
- Noise seriously affects SVQ with the data-consistency check: precision degrades from 66% in clean conditions down to 12% at 0 dB.
- The HQ-based consistency approach is much more stable at all SNR values: both recall and precision rates are higher.

Stage 2: reconstruction
- Based on the maximum a posteriori (MAP) criterion: consider the probability of every possible codeword S_t(i) at time t, given the current and previous received subvector codewords R_t and R_t-1.
- Prior speech-source statistics: an HQ codeword bigram model.
- Channel transition probability: based on the BER estimated in stage 1.
- Reliability of the received subvectors: the relative weighting between the speech-source prior and the wireless channel.

Stage 2: reconstruction
- Channel transition probability P(R_t | S_t(i)):
  - significantly differentiated (over codewords i with different distances d) when R_t is more reliable, i.e., the BER is smaller;
  - puts more emphasis on the speech-source prior when R_t is less reliable;
  - the BER is estimated as the number of inconsistent subvectors in the present frame divided by the total number of bits in the frame.

Stage 2: reconstruction
- Prior source information P(S_t(i) | R_t-1): based on the codeword bigram trained from the clean training data of AURORA 2.
- HQ can estimate the lost subvectors more precisely than SVQ, as shown by a conditional entropy measure.
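Combining the stage-2 ingredients, the MAP estimate can be sketched as follows (my reconstruction under simplifying assumptions: a memoryless binary channel driven by the estimated BER, and a bigram prior indexed by the previously received codeword; the function and variable names are mine):

```python
import numpy as np


def map_reconstruct(r_t, r_prev, bigram, ber, n_bits):
    """Stage-2 MAP reconstruction sketch.

    posterior(i) ∝ P(R_t | S_t(i)) * P(S_t(i) | R_t-1)
    Channel term: P(R_t | S_t(i)) = ber^d * (1 - ber)^(n_bits - d),
    where d is the Hamming distance between R_t and codeword i.
    """
    n_codewords = bigram.shape[0]

    def hamming(a, b):
        return bin(a ^ b).count("1")

    channel = np.array([
        ber ** hamming(r_t, i) * (1.0 - ber) ** (n_bits - hamming(r_t, i))
        for i in range(n_codewords)
    ])
    prior = bigram[r_prev]          # P(S_t(i) | previous codeword)
    post = channel * prior
    post /= post.sum()
    return post, int(post.argmax())
```

When the channel is reliable (small BER) the channel term dominates and the received codeword wins; when the estimated BER is large, the bigram prior takes over, matching the behavior the slides describe.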

Stage 3: compensation in Viterbi decoding
- The distribution P(S_t(i) | R_t, R_t-1) characterizes the uncertainty of the estimated features.
- Assuming P(S_t(i) | R_t, R_t-1) is Gaussian, its variance is used in uncertainty decoding, making the HMMs less discriminative for the estimated subvectors with higher uncertainty.

HQ-based DSR system with transmission errors
- Features corrupted by noise are more susceptible to transmission errors: for SVQ, accuracy drops from 98% to 87% in clean conditions, and from 60% to 36% at 10 dB SNR.

HQ-based DSR system with transmission errors
- The improvements that HQ offers over HEQ-SVQ when transmission errors are present are consistent and significant at all SNR values.
- HQ is robust against both environmental noise and transmission errors.

Degradation of recognition accuracy caused by transmission errors
- Comparison of SVQ, HEQ-SVQ, and HQ on the percentage of words that were correctly recognized without transmission errors but incorrectly recognized after transmission.

HQ-Based DSR with Wireless Channels and Error Concealment ETSI repetition technique actually degraded the performance of HEQ-SVQg the whole feature vectors including the correct subvectors are replaced by inaccurate estimations g: GPRS r: ETSI repetition c: three-stage EC

HQ-Based DSR with Wireless Channels and Error Concealment Three-stage EC improved the performance significantly for all cases. Robust against not only transmission errors, but against environmental noise as well. g: GPRS r: ETSI repetition c: three-stage EC

HQ-Based DSR with Wireless Channels and Error Concealment

Different client traveling speed (1/3)

Different client traveling speed (2/3)

Different client traveling speed (3/3)

Conclusions
- Histogram-based Quantization (HQ) is proposed as a novel approach for robust and/or distributed speech recognition (DSR), robust against environmental noise (for all types of noise and all SNR conditions) and against transmission errors.
- For future personalized and context-aware DSR environments, HQ can be adapted to network and terminal capabilities, with recognition performance optimized for the environmental conditions.

Thank you for your attention