Noise Robust Speech Recognition. Group SB740

Presentation transcript:

1 Noise Robust Speech Recognition. Group SB740
Sections: Introduction, Methods, Results, Conclusion

2 Standard feature extraction
speech -> Framing -> FFT -> Filter Bank -> Cepstrum Coefficients -> features
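The chain on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the Hamming window, the naive DFT, the DCT-II used for the cepstrum, and all function names are choices made for this sketch and are not taken from the group's implementation. The filter bank is passed in as a plain weight matrix.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Framing: split the samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """FFT step (naive DFT here): Hamming-window the frame, return |X[k]| for k = 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def filter_bank_energies(spectrum, weights):
    """Filter bank: one weight row per channel, applied to the spectrum bins."""
    return [sum(w * s for w, s in zip(row, spectrum)) for row in weights]


def cepstrum(energies, n_coeffs):
    """Cepstrum coefficients: DCT-II of the log filter-bank energies."""
    m = len(energies)
    logs = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return [sum(logs[j] * math.cos(math.pi * k * (j + 0.5) / m)
                for j in range(m))
            for k in range(n_coeffs)]


def extract_features(samples, frame_len, hop, weights, n_coeffs):
    """Run the whole chain and return one feature vector per frame."""
    return [cepstrum(filter_bank_energies(magnitude_spectrum(f), weights),
                     n_coeffs)
            for f in frame_signal(samples, frame_len, hop)]
```

A real front end would use an FFT library instead of the naive DFT; the structure of the chain stays the same.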

3 Improved feature extraction
Framed FFT spectrum -> Pre-Processing -> Filter Bank -> Cepstrum Coefficients -> Post-Processing -> features

4 Pre-Processing: Quantile Based Noise Estimation for spectral subtraction (QBNE)
- Assumes that each frequency band contains only noise for a fraction of the time, even during speech
- For each frequency band, the frames are sorted by amplitude
- A fixed q-value is used, equal for all frequency bands
- The intersection between the vertical line at q and each band's sorted-amplitude curve is the noise estimate
- Problem: mismatched training and test conditions
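The sorting-and-quantile idea can be sketched as follows. The code assumes magnitude-domain subtraction, a floor-index quantile, and a zero spectral floor; none of these details are given on the slide, and the function names are hypothetical.

```python
def qbne_noise_estimate(magnitudes, q):
    """Return one noise estimate per frequency band.

    magnitudes[t][k] is the spectral magnitude of frame t in band k.
    For each band the frames are sorted by amplitude and the value at
    the fixed quantile q (0..1) is taken as the noise level.
    """
    n_frames = len(magnitudes)
    n_bands = len(magnitudes[0])
    index = int(q * (n_frames - 1))  # same position in every band
    noise = []
    for k in range(n_bands):
        band = sorted(frame[k] for frame in magnitudes)
        noise.append(band[index])
    return noise


def spectral_subtraction(magnitudes, noise, floor=0.0):
    """Subtract the per-band noise estimate from every frame.

    The floor (an assumption of this sketch) keeps magnitudes non-negative.
    """
    return [[max(m - n, floor) for m, n in zip(frame, noise)]
            for frame in magnitudes]
```

Usage: `clean = spectral_subtraction(mags, qbne_noise_estimate(mags, 0.5))`, where `mags` holds the spectra of a whole utterance.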

5 Pre-Processing: Adaptive Quantile Based Noise Estimation for spectral subtraction (AQBNE)
- Goal: improve performance when training with low noise and testing with high noise
- Adapts to the utterance and noise levels
- Adjusts the q-value for each frequency band
- The result is a q-estimation curve as opposed to a fixed value
- High and low noise situations will converge to similar representations
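The only structural change relative to QBNE that the slide states is that q becomes a per-band curve. The sketch below shows just that part. How the curve is adapted to the utterance and noise level is not described on the slide, so `q_curve` is simply an input here and any values passed to it are hypothetical.

```python
def band_quantile(values, q):
    """Value at quantile q (0..1) of one band's amplitudes."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def aqbne_noise_estimate(magnitudes, q_curve):
    """Per-band noise estimate using one q-value per frequency band.

    magnitudes[t][k] is the magnitude of frame t in band k;
    q_curve[k] is the q-value for band k (supplied by the caller).
    """
    return [band_quantile([frame[k] for frame in magnitudes], q_curve[k])
            for k in range(len(q_curve))]
```

With a constant `q_curve` this reduces to the fixed-q QBNE case.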

6 Filter Bank: Speech Band Emphasizing Filter Bank (SBE)
Mel Frequency Cepstrum Coefficient (MFCC)
- Motivated by human perception and critical bands
Mel Frequency Filter Bank
- Triangular filters
- Highest resolution at low frequencies
- Resulting importance function
Speech Band Emphasizing Filter Bank
- Emphasizes the primary speech band
- Highest resolution at 1500 Hz
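A triangular mel filter bank can be sketched as below. The mel formula used is one common variant (2595 * log10(1 + f / 700)); the slide does not say which variant the group used, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Hz to mel, using one common variant of the mel formula."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_points(n_filters, f_low, f_high):
    """n_filters + 2 frequencies (Hz), equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def triangular_filter_bank(points_hz, n_bins, sample_rate):
    """One triangular weight row per filter over n_bins bins (0..fs/2).

    Filter i rises from points_hz[i - 1] to a peak of 1 at points_hz[i]
    and falls back to 0 at points_hz[i + 1].
    """
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for i in range(1, len(points_hz) - 1):
        left, center, right = points_hz[i - 1], points_hz[i], points_hz[i + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))
            elif center < f < right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank
```

Because `triangular_filter_bank` accepts any list of edge frequencies, a bank with its highest resolution elsewhere could be built by passing a different spacing. The actual SBE spacing is not given on the slide and is not reproduced here.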

7 Results
- QBNE with the Mel Frequency Filter Bank showed an improvement of 15%
- AQBNE with the SBE Filter Bank showed an improvement of 28%
- Under highly mismatched conditions, AQBNE with the SBE Filter Bank showed a remarkable result: an 80% improvement, compared to 21% when using QBNE with the Mel Frequency Filter Bank

8 Conclusion
- AQBNE avoids describing speech signals during training to a level of detail which is unattainable during testing under noisy conditions
- The suggested SBE Filter Bank, though empirically chosen, indicates that filter distributions other than the standard Mel scale may attain improved performance in noisy conditions

9 Presentation of Abstract
Agenda:
- Purpose of the abstract
- Structure of the abstract
- Content of the abstract

10 Purpose of abstract
- Announcement for the 17th 7th-semester conference on the 21st of December
- Appetizer to attract the right audience
- The abstract is written with the audience in mind: other 7th-semester students from the Institute of Electronic Systems in Aalborg and Esbjerg

11 Structure of the abstract
Title:
- Topic: the long title gives a detailed description of the content: "Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank"
- Nature: noise estimation
- Scope: automatic speech recognition
The text follows the IMRaD structure.

12 Structure of the abstract
Important keywords are used throughout the text:
- ASR, Noise Estimation, Feature Extraction
Known methods are presented before new methods to create continuity.
Complexity increases through the abstract.

13 Content of the abstract
Introduction:
- Contains information on the initial problem, the proposals made in the paper, and the field of operation
- This is the shortest section of the abstract, but it contains many keywords

14 Content of the abstract
Methods:
- This is the longest section of the abstract; it refers to known methods and introduces the new methods and solutions
- The first sentence of this section is linked to the introduction by the phrase "feature extraction"
- The section ends with an advertisement for the results

15 Content of the abstract
Results:
- The methods that improved the recognition performance are presented first
- The best result is stated with its exact figure, compared to known methods
- The proposed solutions that did not improve the recognition are mentioned last in the section

16 Content of the abstract
Discussion:
- First, the method that did not improve the recognition performance is explained
- Second, the methods that improved the recognition performance are described
- The abstract concludes with recommendations based on the results achieved in this project

17

18 Structure of Paper
IMRaD model (IMRaD part: section of the paper)
- Introduction: Introduction
- Methods: Methods (PP, QBNE, AQBNE, SBE)
- Results: Experimental framework, Experimental results
- Discussion: Conclusion

19 Introduction
Problem definition
- Noise in speech signals has a dramatic effect on ASR
Analysis
- Analysis of known methods
- Interesting known methods (PP, QBNE, MFCC)
- Results: develop new methods and combine different methods

20 Methods
Known methods
- PP: short presentation of method and implementation
- QBNE: short presentation of method and thorough description of implementation
New methods
- AQBNE and SBE
  - Motivation (why is this a good method?)
  - Implementation (compared to QBNE and MFCC)

21 Results
- Description of the measurement instrument (HTK) and the SpeechDat-Car database
- Results in tables

22 Results
- Discussion of results in text
- Chosen results in graph

23 Conclusion
- Contains a summary of the important results, so it can be read and understood right after reading the abstract

24 Worksheets
Agenda:
- Structure and organization
- Brief presentation of worksheets

25 Structure and organization
The worksheets are the basis for the paper and the implementation of our system
- Direct information about methods
- Necessary background knowledge
Give the group members the necessary knowledge to understand a subject
Write in English
The topic of the project was completely new to us
- Impossible to plan work for a long time period
- Discuss subjects, study, discuss new subjects
Writing procedure:
- The group discusses which subjects need to be investigated
- 1-2 persons work together and write a worksheet
- The group reads it and gives feedback
- 1 person finishes it

26 Brief presentation of worksheets
1. Introduction
- States the aim of the project and our initial problem
2. Speech production
- Human speech characteristics
3. Hidden Markov Model
- Often used in speech recognition systems
4. Unwanted noise and effects
- Noise and effects that can affect our system
5. Java execution speed test
- Consideration of implementation language
6. Java processor blocks
- Documents the implementation of our system
7. Matlab related
- How to read sound files from the SpeechDat-Car database

27 Brief presentation of worksheets
8. Frontend Interfaces
- Input: SpeechDat-Car audio wave format. Output: HTK format
9. The standard frontend
- Transformation of the sampled audio data into feature vectors
10. Post-Processing
11. The Mel filterbank
12. Quantile Based Noise Estimation
13. Spectral subtraction
14. Experimental framework
- How we have tested the methods' influence on the speech recognition
15. Experimental results
- Describes our baseline and refers to App. A
16. Structure of abstract and paper
- Overview of the important elements
App. A: Raw results

28 Causality
Causal:
- Post-Processing
- Speech Band Emphasizing Filter Bank
Non-causal:
- (Adaptive) Quantile Based Noise Estimation

29 Ordinary (non-causal) QBNE
- One discrete frequency
- The entire utterance is used for the noise estimate

30 Causal QBNE
- One discrete frequency
- The noise estimate is updated for each new frame

31 Causal QBNE
[Figure: panels for n = 0, n = 1, n = 2]

32 Causal Adaptive QBNE

33 Causality
- PP and SBE are inherently causal
- QBNE and AQBNE can be made causal by using a buffer for the quantile
  - Additional computational cost
  - Reduced storage requirement
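One way to keep a quantile in a buffer that is updated frame by frame is sketched below for a single frequency band. The sorted-list buffer, the optional `max_frames` limit, and the class name are assumptions of this sketch; the slides do not describe the buffer layout the group had in mind.

```python
import bisect


class CausalQuantileTracker:
    """Running q-quantile of one frequency band, using only past frames."""

    def __init__(self, q, max_frames=None):
        self.q = q                    # quantile in 0..1
        self.max_frames = max_frames  # None keeps every frame seen so far
        self._sorted = []             # amplitudes in ascending order
        self._history = []            # amplitudes in arrival order

    def update(self, magnitude):
        """Add the newest frame and return the current noise estimate."""
        bisect.insort(self._sorted, magnitude)
        self._history.append(magnitude)
        if self.max_frames is not None and len(self._history) > self.max_frames:
            # Drop the oldest frame so the buffer size stays bounded.
            oldest = self._history.pop(0)
            del self._sorted[bisect.bisect_left(self._sorted, oldest)]
        return self._sorted[int(self.q * (len(self._sorted) - 1))]
```

A full front end would hold one tracker per frequency band and call `update` once per frame, which is where the additional per-frame cost comes from.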

34 Closure
Agenda:
- Future work
- Project working process

35 Future work (1/2)
Implement causal AQBNE
- Find the optimal q-estimation curve, etc.

36 Future work (2/2)
- Combine AQBNE and SBE with the advanced front-end (WI008)
[Figure labels: AQBNE, SBE Filter-Bank. Source: ETSI ES V1.1.3 ( )]

37 Project working process
Project reporting form
- No 3 weeks of final report correction
- Worksheets are easier to write than report chapters
Difficult to parallelize tasks
- Few tasks
- Large groups
Information gathering
- State-of-the-art knowledge from scientific papers
- No textbooks with up-to-date information exist