PCS Research & Advanced Technology Labs, Speech Lab
How to deal with the noise in real systems?
Hsiao-Chun Wu, Motorola PCS Research and Advanced Technology Labs


How to deal with the noise in real systems?
Hsiao-Chun Wu
Motorola PCS Research and Advanced Technology Labs, Speech Laboratory
Phone: (815)

PCS Research & Advanced Technology Labs, Speech Lab, November 14, 2000

Why do we need to study noise? Noise exists everywhere, and it degrades the performance of signal processing systems in practice. Since noise cannot be avoided by system engineers, modern “noise-processing” technology has been researched and designed to overcome this problem. Many related research areas have therefore emerged, such as signal detection, signal enhancement/noise suppression, and channel equalization.

How to deal with noise? Cut it off!
– Spectral truncation: spectral subtraction (1989)
– Time truncation: signal detection
– Spatial and/or temporal filtering: equalization; array signal separation (blind source separation)
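As a concrete illustration of spectral truncation, here is a minimal sketch of classical magnitude spectral subtraction in Python with NumPy. The frame length, overlap, and spectral floor below are illustrative choices, not parameters taken from the slides:

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame_len=256, floor=0.01):
    """Minimal magnitude spectral subtraction.

    noisy     : 1-D array of noisy speech samples
    noise_est : 1-D array of noise-only samples, used to estimate
                the average noise magnitude spectrum
    """
    window = np.hanning(frame_len)
    hop = frame_len // 2

    # Average noise magnitude spectrum from the noise-only segment.
    n_frames = [noise_est[i:i + frame_len] * window
                for i in range(0, len(noise_est) - frame_len + 1, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in n_frames], axis=0)

    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[i:i + frame_len] * window
        spec = np.fft.rfft(frame)
        mag = np.abs(spec)
        # Subtract the noise estimate; clamp to a spectral floor
        # to limit "musical noise" artifacts.
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        clean = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)))
        out[i:i + frame_len] += clean  # 50% overlap-add
    return out
```

On a noise-dominated input, the output energy drops sharply while speech-band peaks that exceed the noise estimate are preserved.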

Session 1. On-line Automatic End-of-Speech Detection Algorithm (Time Truncation)
1. Project goal.
2. Review of current methods.
3. Introduction to the voice-metric-based end-of-speech detector.
4. Simulation results.
5. Conclusion.

1. Project Goal:
Problem
– Digit-dial recognition with unknown digit-string length.
Solution 1
– A fixed-length capture window, such as 10 seconds? (Inconvenient for users.)
Solution 2
– Dynamic termination of data capture? (Needs a robust detection algorithm.)

Research and design a robust dynamic termination mechanism for the speech recognizer:
– a new on-line automatic end-of-speech detection algorithm with low computational complexity.
Design a more robust front end to improve the recognition accuracy of speech recognizers:
– the new algorithm also avoids wasting feature extraction on redundant noise.

2. Review of Current Methods:
Most speech detection algorithms fall into three categories.
Frame energy detection
– Short-term frame energy (20 ms) can be used for speech/noise classification.
– It is not robust at high background noise levels.
Zero-crossing rate detection
– The short-term zero-crossing rate can also be used for speech/noise classification.
– It is not robust across a wide variety of noise types.
Higher-order-spectral detection
– Short-term higher-order spectra can be used for speech/noise classification.
– It carries a heavy computational load, and its threshold is difficult to predetermine.
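As a hedged sketch of the first two categories, short-term frame energy and zero-crossing rate can be computed and thresholded like this; the thresholds below are illustrative, not the values used in any of the methods above:

```python
import math

def frame_features(frame):
    """Short-term log energy (dB) and zero-crossing rate for one frame."""
    energy = sum(x * x for x in frame)
    log_energy = 10.0 * math.log10(energy + 1e-10)
    zcr = sum(1 for a, b in zip(frame, frame[1:])
              if (a >= 0) != (b >= 0)) / (len(frame) - 1)
    return log_energy, zcr

def classify_frame(frame, energy_thresh=-20.0, zcr_thresh=0.25):
    """Toy speech/noise decision: sufficiently energetic frames with a
    low zero-crossing rate (voiced-speech-like) are labelled speech."""
    log_e, zcr = frame_features(frame)
    if log_e > energy_thresh and zcr < zcr_thresh:
        return "speech"
    return "noise"
```

A 20 ms frame at 8 kHz is 160 samples; a loud low-frequency tone classifies as speech, silence as noise. The slide's caveats apply: both features degrade quickly as the background noise level rises.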

3. Introduction to the Voice-Metric-Based End-of-Speech Detector:
End-of-speech detection using voice metric features is based on the Mel-energies. Voice metric features are robust over a wide variety of background noise. The voice-metric-based speech/noise classifier was originally used in the IS-127 CELP speech coder standard. We modify and enhance the voice metric features to design a new end-of-speech detector for the Motorola voice recognition front end (VR LITE III).
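To make "based on the Mel-energies" concrete, here is a generic mel filterbank energy computation in Python with NumPy. This is a textbook sketch; the exact band layout and scoring used by the IS-127 voice metric differ:

```python
import numpy as np

def mel_filterbank(n_filters=20, n_fft=256, sr=8000):
    """Triangular mel-spaced filterbank (rows: filters, cols: rfft bins)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for b in range(lo, mid):          # rising edge of the triangle
            fb[i, b] = (b - lo) / max(mid - lo, 1)
        for b in range(mid, hi):          # falling edge
            fb[i, b] = (hi - b) / max(hi - mid, 1)
    return fb

def mel_band_energies(frame, fb):
    """Log energy in each mel band for one windowed frame."""
    power = np.abs(np.fft.rfft(frame, n=2 * (fb.shape[1] - 1))) ** 2
    return np.log(fb @ power + 1e-10)
```

A voice metric then scores each band's energy relative to a running noise estimate and sums the scores into a per-frame speech/noise statistic.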


(Voice metric score table.)

(Block diagram of the end-of-speech detector attached to the original VR LITE front end: raw data → FFT → Mel-spectrum → voice metric → voice metric scores; an SNR estimate with threshold adaptation feeds the pre- and post-S/N classifiers; a “speech start?” test, EOS buffer, and silence-duration threshold decide when data capture stops.)

(Flow chart of the front end with end-of-speech detector: speech input is segmented into frames; each frame i is buffered as a feature vector for the VR LITE recognition engine; if end of speech is detected, data capture terminates; otherwise the next frame i+1 is processed.)

(Waveform plot: raw data with detected end point for string “ ” recorded in a car at 55 mph; time markers at 3.78 seconds and 4.81 seconds.)

(Plot: correct versus false end-point detections for string “ ” in a car at 55 mph, showing the false-detection and correct-detection time errors in seconds.)

4. Simulation Results:
(Simulations are run over the Motorola digit-string database: 16 speakers and 15,166 variable-length digit strings in 7 different conditions. The silence threshold is 1.85 seconds.)
A. Receiver Operating Characteristic (ROC) curve: the ROC curve plots the end-of-speech detection rate against the false (early) detection rate. We compare two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.

(ROC curve: detection rate (%) versus false detection rate (%).)

B. String-accuracy-convergence (SAC) curve: the SAC curve plots the string recognition accuracy against the false (early) detection rate. We compare the same two methods: (1) the new voice-metric-based end-of-speech detector and (2) the old speech/noise-flag-based end-of-speech detector.

(SAC curve: string recognition accuracy (%) versus false detection rate (%).)

C. Table of detection results:
(This table reports results on the Madison sub-database, restricted to data files with 1.85 seconds or more of silence after the end of speech.)

(This table reports results over the small database collected by Motorola PCS CSSRL. All digit strings were recorded in a fixed 15-second window.)

Condition         | Avg. Time Error | Avg. False Det. Time Error | Avg. Correct Det. Time Error | False Det. Rate | Strings | Total Det. Rate | String Acc. (w/ EOS) | String Acc. (w/o EOS)
Overall           | 1.82 s          | 0 s                        | 1.82 s                       | 0%              | —       | —               | 50.41%               | 29.75%
Office close-talk | 1.85 s          | 0 s                        | 1.85 s                       | 0%              | 21      | 100%            | 66.67%               | 61.90%
Office arm-length | 1.84 s          | 0 s                        | 1.84 s                       | 0%              | 20      | 100%            | 65.00%               | —
Café close-talk   | 1.76 s          | 0 s                        | 1.76 s                       | 0%              | 40      | 100%            | 40.00%               | 15.00%
Café arm-length   | 1.85 s          | 0 s                        | 1.85 s                       | 0%              | 40      | 90%             | 45.00%               | 10.00%

Analysis of the Simulation Results: Why didn’t EOS detection work well in babble noise?

Optimal Detection Decision
– Bayes classifier
– Likelihood ratio test
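The likelihood ratio test on a scalar feature can be sketched as follows. The Gaussian class models and priors here are hypothetical stand-ins (e.g., for frame log-energy), not the classifier from the slides:

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar feature x under N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def likelihood_ratio_test(x, speech=(10.0, 9.0), noise=(0.0, 4.0),
                          prior_speech=0.5):
    """Decide speech vs. noise for feature x.

    Declares speech when
        log p(x|speech) - log p(x|noise) > log(P(noise)/P(speech)),
    which is the Bayes-optimal decision rule for 0/1 loss.
    """
    llr = gaussian_loglik(x, *speech) - gaussian_loglik(x, *noise)
    threshold = math.log((1.0 - prior_speech) / prior_speech)
    return "speech" if llr > threshold else "noise"
```

Sweeping the threshold instead of fixing it from the priors traces out exactly the kind of ROC curve shown in the simulation results above.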

(Waveform/spectrogram: digit “one”, close-talking mic, quiet office.)

(Waveform/spectrogram: digit “one”, hands-free mic, car at 55 mph.)

(Waveform/spectrogram: digit “one”, far-talking mic, cafeteria.)

5. Conclusion:
– The new voice-metric-based end-of-speech detector is robust over a wide variety of background noise.
– It adds only a small computational overhead and can be implemented in real time.
– It can improve recognition performance by discarding the extra noise captured by a fixed data capture window.
– It still needs further improvement in babble-noise environments.

Session 2. Speech Enhancement Algorithms: Blind Source Separation Methods (Spatial and Temporal Filtering)
1. Motivation and research goal.
2. Statement of the “blind source separation” problem.
3. Principles of blind source separation.
4. Criteria for blind source separation.
5. Application to blind channel equalization for digital communication systems.
6. Simulation and comparison.
7. Summary and conclusion.

1. Motivation: Mimic the human auditory system, which can separate the signals of interest from other sounds, such as interfering sources and background noise, for clear recognition of their content.
‘One of the most striking facts about our ears is that we have two of them -- and yet we hear one acoustic world; only one voice per speaker.’ (E. C. Cherry and W. K. Taylor, “Some further experiments on the recognition of speech, with one and two ears,” Journal of the Acoustical Society of America, 26, 1954.)
The ‘‘cocktail party effect’’ -- the ability to focus one’s listening attention on a single talker among a cacophony of conversations and background noise -- has been recognized for some time. This specialized listening ability may stem from characteristics of the human speech production system, the auditory system, or high-level perceptual and language processing.

Research Goal: Design a preprocessor built on digital signal processing speech enhancement algorithms. The input signals are collected through an array of multiple sensors (microphones). After the embedded signal processing algorithms run, the outputs are cleanly separated signals.

(Block diagram: audio input → blind source separation algorithms → enhanced output.)

2. Problem Statement of Blind Source Separation:
What is “blind source separation”? Given the N linearly mixed received input signals, we need to recover the M statistically independent sources as well as possible.

Formulation of the Blind Source Separation Problem: the received signal vector from the array, X(t), is the original source vector S(t) passed through the channel distortion H(t), such that X(t) = H(t) S(t), where X(t) = [x_1(t), ..., x_N(t)]^T and S(t) = [s_1(t), ..., s_M(t)]^T. We need to estimate a separator W(t) such that Y(t) = W(t) X(t) recovers S(t), up to the permutation and scaling ambiguities inherent to the blind problem.

3. Principles of Blind Source Separation:
The independence measure: Shannon’s mutual information.
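For reference, the standard definition (not spelled out on the slide): the mutual information of the separator outputs y_1, ..., y_M is

```latex
I(y_1,\dots,y_M)
  = \int p_Y(\mathbf{y})\,
    \log \frac{p_Y(\mathbf{y})}{\prod_{i=1}^{M} p_{y_i}(y_i)}\,
    d\mathbf{y}
  = \sum_{i=1}^{M} H(y_i) - H(\mathbf{y}),
```

which is nonnegative and vanishes exactly when the outputs are statistically independent, so separation criteria drive it toward zero.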

4. Criteria to Separate Independent Sources:
– Constrained entropy (Wu, IJCNN99)
– Hadamard measure (Wu, ICA99)
– Frobenius norm (Wu, NNSP97)
– Quadratic Gaussianity (Wu, NNSP99)

5. Application to Blind Single-Channel Equalization for Digital Communication Systems:
We apply the minimization of a modified constrained entropy to adapt an equalizer w(t) = [w_0, w_1, ...] for a digital channel h(t). Assume a PAM signal constellation with symbols s(t), passing through a digital channel h(t) = [c(t, 0.11) + 0.8 c(t-1, 0.11) - 0.4 c(t-3, 0.11)] W_6T(t), where c(t, β) is the raised-cosine function with roll-off factor β (here 0.11) and W_6T(t) is a rectangular window. The input signal to the equalizer is x(t) = h(t) * s(t) + n(t), where n(t) is the background noise. We apply generalized anti-Hebbian learning to adapt w(t) so that the equalizer output approximates the transmitted symbols s(t).
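The slides do not spell out the generalized anti-Hebbian update, so here is a sketch of blind equalization using the constant modulus algorithm (CMA) instead, a standard alternative, on 2-PAM symbols through a hypothetical FIR channel (not the raised-cosine channel above):

```python
import numpy as np

def cma_equalize(x, n_taps=11, mu=1e-3, n_iter=3):
    """Constant modulus algorithm (CMA) blind equalizer.

    x : received samples (real PAM). Returns the adapted taps w.
    For 2-PAM (+/-1 symbols) the constant modulus target is R2 = 1.
    """
    w = np.zeros(n_taps)
    w[n_taps // 2] = 1.0          # center-spike initialization
    R2 = 1.0
    for _ in range(n_iter):
        for k in range(n_taps, len(x)):
            u = x[k - n_taps:k][::-1]         # regression vector
            y = w @ u                         # equalizer output
            w -= mu * (y * y - R2) * y * u    # stochastic CMA gradient step
    return w

# Illustration on a toy minimum-phase channel.
rng = np.random.default_rng(1)
s = rng.choice([-1.0, 1.0], size=20000)       # 2-PAM symbols
h = np.array([1.0, 0.4, -0.2])                # hypothetical channel
x = np.convolve(s, h)[:len(s)] + 0.01 * rng.standard_normal(len(s))
w = cma_equalize(x)
y = np.convolve(x, w)[:len(s)]                # equalized output
```

After adaptation the "eye" opens: the equalized samples cluster much closer to the ±1 constellation points than the raw received samples do, without the equalizer ever seeing the transmitted symbols.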

(Plot: signal-to-interference ratio (dB) versus signal-to-noise ratio (dB).)

(Plot: bit error rate versus signal-to-noise ratio (dB).)

6. Simulation and Comparison:
We compare our generalized anti-Hebbian learning against the SDIF algorithm and Lee’s Infomax method (Lee, IJCNN97) over three real recordings downloaded from the Salk Institute, University of California at San Diego.

New VR LITE front end: blind source separation + end-of-speech detection.

7. Conclusion and Future Research:
– The computational cost of blind source separation needs to be reduced.
– Test BSS-based EOS detection with microphone arrays of the same kind.
– Incorporate other array signal processing techniques (e.g., beamforming) to improve speech detection and recognition.