Wake-up Word Detector Douglas Rauscher ECE5525 April 30, 2008.

Introduction
- The purpose of this project is to generate feature vectors and Hidden Markov Models (HMMs) for a single word.
- Data is processed using Sphinx and Matlab.
- The Wake-up Word chosen is “Help”.

Corpus
- The corpus used is the original WUW_Corpus, provided on the ECE5526 server: ftp:// /CORPORA/WUW_Corpora/WUW_Corpus/
- This corpus was used because single utterances of the word “Help” are frequent in the data set.
- Data is in µ-law format.

File lists & Transcriptions
- Before processing in Sphinx, “transcription” and “fileids” files need to be created:
  - wuw_corpus_train.fileids
  - wuw_corpus_train.transcription
  - wuw_corpus_test.fileids
  - wuw_corpus_test.transcription
- These were created in Matlab by searching the given “|”-delimited file for “Help” utterances.
- 80% of the “Help” utterances were used in the training list. The remaining 20% were used in the test list.
- All utterances that did not contain “Help” were included in the test set to test for false alarms.
- A handful of the utterances in the original .trans file were manually removed from the list for one of these reasons:
  - They had no data bytes in the file
  - Sphinx had trouble with the sound quality
  - The utterance was cut off in such a way that Sphinx threw an error
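The split and the two output line formats described above can be sketched in a few lines of Python. This is only an illustration, not the author's Matlab script: the function names and the `(filename, call_no, ortho)` record layout are assumptions made for the example, while the line formats follow the `fprintf` patterns used in dcr_extract.m.

```python
def split_utterances(records, wake_word="Help", train_fraction=0.8):
    """Split records into train/test lists.

    records: list of (filename, call_no, ortho) tuples (hypothetical layout).
    The first 80% of wake-word utterances go to training; the remaining
    wake-word utterances plus every other utterance go to the test list.
    """
    help_recs = [r for r in records if r[2] == wake_word]
    other_recs = [r for r in records if r[2] != wake_word]
    n_train = int(len(help_recs) * train_fraction)  # floor, as in the Matlab code
    train = help_recs[:n_train]
    test = help_recs[n_train:] + other_recs
    return train, test


def fileid_line(filename, call_no):
    """One line of a .fileids file: path to the audio file without extension."""
    return f"calls/{filename}/WUW{filename}_{call_no}"


def transcription_line(filename, call_no, ortho):
    """One line of a .transcription file: upper-cased text plus the utterance id."""
    return f"{ortho.upper()} (WUW{filename}_{call_no})"


if __name__ == "__main__":
    demo = [("00001", str(i).zfill(3), "Help") for i in range(5)]
    demo += [("00002", "001", "Operator"), ("00002", "002", "Stop")]
    train, test = split_utterances(demo)
    print(len(train), len(test))
    print(fileid_line(*train[0][:2]))
    print(transcription_line(*train[0]))
```

Placing every non-“Help” utterance in the test list means false alarms are measured on data the model never saw during training.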

dcr_extract.m

    close all; clear all; clc;
    A = textread('C:\CMUtutorial\WUW_Corpus\wuw.trans','%s','delimiter','|');
    idx = 1:length(A);
    idx = idx((strcmp(A,'Male')+strcmp(A,'Female'))>0);
    gender = A(idx);
    dialect = A(idx+1);
    phone_type = A(idx+2);
    filename = A(idx+3);
    CallNO = A(idx+4);
    UttNO = A(idx+5);
    Ortho = A(idx+6);
    AllIdx = 1:length(Ortho);
    HelpIdx = AllIdx(strcmp(Ortho,'Help'));
    NotHelpIdx = AllIdx(~strcmp(Ortho,'Help'));
    N = floor(length(HelpIdx)*0.8);

    fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.fileids','w');
    ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.transcription','w');
    for k=1:N
        fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
            char(filename(HelpIdx(k))),...
            char(CallNO(HelpIdx(k))));
        fprintf(ftsn,' %s (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
            char(filename(HelpIdx(k))),...
            char(CallNO(HelpIdx(k))));
    end
    fclose(fout);
    fclose(ftsn);

    fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.fileids','w');
    ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.transcription','w');
    % Remaining "Help"
    for k=(N+1):length(HelpIdx)
        fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
            char(filename(HelpIdx(k))),...
            char(CallNO(HelpIdx(k))));
        fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
            char(filename(HelpIdx(k))),...
            char(CallNO(HelpIdx(k))));
    end
    % Other utterances
    for k=1:length(NotHelpIdx)
        fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(NotHelpIdx(k))),...
            char(filename(NotHelpIdx(k))),...
            char(CallNO(NotHelpIdx(k))));
        fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(NotHelpIdx(k)))),...
            char(filename(NotHelpIdx(k))),...
            char(CallNO(NotHelpIdx(k))));
    end
    fclose(fout);
    fclose(ftsn);

Data preparation
- Corpus data was originally:
  - File extension .ulaw
  - 8-bit µ-law format
  - 8kHz sample rate
- This data must be converted, as .ulaw files are not readable by Sphinx.
- Format chosen to convert to:
  - File extension .raw
  - 16-bit linear quantization
  - 16kHz (linearly interpolated)
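The two signal-processing steps in this conversion, µ-law expansion and 2x linear interpolation, can be illustrated with a small Python sketch. It is a sketch under stated assumptions rather than the author's converter: it uses the continuous µ-law expansion formula with µ = 255 on values already normalized to [-1, 1] (not the segmented G.711 byte decoding), and the function names and the full-scale 16-bit mapping are choices made for this example.

```python
def mulaw_expand(y, mu=255.0):
    """Invert continuous mu-law companding for a normalized sample y in [-1, 1]."""
    sign = -1.0 if y < 0 else 1.0
    return sign * ((1.0 + mu) ** abs(y) - 1.0) / mu


def upsample_2x_linear(samples):
    """Double the sample rate by inserting the midpoint between neighbors.

    The last sample is averaged with an assumed trailing zero, so the
    output always has exactly twice as many samples as the input.
    """
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else 0.0
        out.append(s)
        out.append((s + nxt) / 2.0)
    return out


def to_int16(x):
    """Map a linear sample in [-1, 1] to a clipped 16-bit integer (assumed scaling)."""
    v = int(round(x * 32767))
    return max(-32768, min(32767, v))


if __name__ == "__main__":
    companded = [0.0, 0.5, 1.0, -1.0]           # normalized mu-law values
    linear = [mulaw_expand(y) for y in companded]
    doubled = upsample_2x_linear(linear)        # 8 kHz -> 16 kHz
    print([to_int16(x) for x in doubled])
```

Midpoint insertion is the simplest form of interpolation; it adds no new information above the original 4 kHz bandwidth, it only produces the sample rate the recognizer expects.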

ulaw2raw.m

    for k=0:252
        ulaw2raw(sprintf('C:\\CMUtutorial\\WUW_Corpus\\calls\\%05d\\',k),0);
    end

    function ulaw2raw(filepath,playflag)
    % ulaw2raw('C:\CMUtutorial\WUW_Corpus\calls\00000\');
    cd_save = cd;
    cd(filepath);
    files = dir;
    % US standard u-law coeff
    u=255;
    for k=3:length(files)
        if (files(k).isdir==0) && (strcmp(files(k).name(end-4:end),'.ulaw'))
            disp(files(k).name);
            fin = fopen(files(k).name,'r');
            A = fread(fin,'int8');
            % move data to proper sign
            A1 = A.*(A 0);
            % remove u-law
            B1 = sign(A1).*(1/u).*(((1+u).^abs(A1/128))-1);
            B2 = reshape([B1,((B1+[B1(2:end);0])./2)].',1,[]);
            if(playflag)
                sound(B2,16000)
                pause(length(B2)/16000);
            end
            fclose(fin);
            generateRawWav(files(k).name(1:end-5),B2);
        end
    end
    cd(cd_save);

    function generateRawWav(filename,data)
    fout = fopen(strcat(filename,'.raw'),'w');
    dataq = round(32768.*data./128);
    fwrite(fout,dataq,'int16');
    fclose(fout);

Language model creation
- For a Wake-up Word recognizer, a language model is not particularly useful for detecting the word.
- Sphinx allows you to weight the priority of the language model in its calculations, but does not appear to allow the user to disable the language model altogether.
- Therefore, to avoid errors, a custom language model had to be created.
  1. The lm tool generator was used to convert a text file that contained only the word “Help” to a .lm file.
  2. The lm3g2dmp tool was used to convert the .lm file to .lm.DMP format.

    run cmd
    cd C:\CMUtutorial\lm3g2dmp\Debug
    > lm3g2dmp 7092.lm ./

Training the Model
- The Sphinx training configuration file was edited to use the proper input files.
- The Max Number of Gaussians was set to 8.
- The Number of HMM States was increased from 3 to 5, without significant improvement.
- Sphinx commands:

    cd c:/CMUtutorial/WUW_Corpus/
    perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_train.fileids -cfg etc/sphinx_train.cfg -param etc/feat.params
    perl scripts_pl/RunAll.pl

Testing the Model
- The Sphinx testing configuration file was edited to use the proper input files.
- Language model weight was set to “1” (the lowest allowable setting).
- Number of Gaussians was set to 8 to match the training configuration.
- Sphinx commands:

    perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_test.fileids -cfg etc/sphinx_decode.cfg -param etc/feat.params
    perl scripts_pl/decode/slave.pl

Sphinx Output
- Sphinx was used to calculate acoustic scores only, not to perform thresholding.
- The resulting scores were parsed in Matlab, and PDF/CDF plots were generated.
- See the attached output document for the raw Cygwin output.

plotDistributions.m

    % plotDistributions
    clear all; clc; close all;
    fn = 'C:\CMUtutorial\WUW_Corpus\logdir\decode\wuw_corpus-1-1.log';
    RawText = textread(fn,'%s');
    idx = [];
    % stop 7 tokens before the end so the window k:k+7 stays in range
    for k=1:(length(RawText)-7)
        if(~isempty(findstr(char(RawText(k)),'fv:')) &&...
                strcmp(char(RawText(k+1)),'HELP'))
            idx = [idx; k:k+7];
        end
    end
    RawText = RawText(idx);

    % fetch and plot Acoustic Score histograms
    HelpAScr = [];
    FalsAScr = [];
    for k=1:size(RawText,1)
        if(findstr(char(RawText(k,1)),'_008>'))
            % True HELP
            HelpAScr = [HelpAScr str2num(char(RawText(k,5)))];
        else
            % Not a HELP
            FalsAScr = [FalsAScr str2num(char(RawText(k,5)))];
        end
    end

    % common 101-point score axis for both histograms
    mn = min(min(HelpAScr),min(FalsAScr));
    mx = max(max(HelpAScr),max(FalsAScr));
    vals = mn:((mx-mn)/100):mx;
    HelpAScrHist = hist(HelpAScr,vals);
    HelpAScrHist = HelpAScrHist./sum(HelpAScrHist);
    FalsAScrHist = hist(FalsAScr,vals);
    FalsAScrHist = FalsAScrHist./sum(FalsAScrHist);

    % "Help" is accumulated from the low end, other utterances from the high end
    for k=1:length(vals)
        HelpAScrCDF(k) = sum(HelpAScrHist(1:k));
        FalsAScrCDF(k) = sum(FalsAScrHist(k:end));
    end

    figure;
    subplot(2,1,1);
    plot(vals,HelpAScrHist,'b',vals,FalsAScrHist,'r');
    title('Probability Density Function')
    legend('Help','Other Utterances')
    axis([mn,mx,0,1.1*max(max(HelpAScrHist),max(FalsAScrHist))]);
    subplot(2,1,2);
    plot(vals,HelpAScrCDF, 'b',vals,FalsAScrCDF, 'r');
    title('Cumulative Distribution Function')
    axis([mn,mx,0,1.1]);
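The two cumulative curves computed at the end of the script can be read as error rates for a candidate score threshold. The Python sketch below shows that reading; it is not part of the original project, which did no thresholding. It assumes that a higher acoustic score means a better match and that an utterance is accepted as the wake-up word when its score is at or above the threshold. The function names and the example scores are hypothetical.

```python
def rates_at_threshold(help_scores, other_scores, threshold):
    """Return (miss_rate, false_alarm_rate) for one threshold.

    Assumption: an utterance is accepted when score >= threshold.
    A miss is a true "Help" that falls below the threshold; a false
    alarm is any other utterance at or above it.
    """
    misses = sum(1 for s in help_scores if s < threshold)
    false_alarms = sum(1 for s in other_scores if s >= threshold)
    return misses / len(help_scores), false_alarms / len(other_scores)


def sweep_thresholds(help_scores, other_scores, steps=100):
    """Evaluate both rates on an evenly spaced grid covering all scores."""
    lo = min(min(help_scores), min(other_scores))
    hi = max(max(help_scores), max(other_scores))
    results = []
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        results.append((t, *rates_at_threshold(help_scores, other_scores, t)))
    return results


if __name__ == "__main__":
    # Made-up scores for illustration only, not results from the project.
    help_scores = [-100.0, -90.0, -80.0]
    other_scores = [-200.0, -150.0, -95.0]
    for t, miss, fa in sweep_thresholds(help_scores, other_scores, steps=4):
        print(f"threshold={t:8.1f}  miss={miss:.2f}  false_alarm={fa:.2f}")
```

The point where the two rates cross is one common choice of operating threshold, although the best choice depends on how costly a miss is relative to a false alarm.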

[Figure: plotDistributions.m output]

Conclusions
- Sphinx had problems correctly detecting the word “Help” in this test, but a reasonably good model was clearly created.
- The test set was rather constrained and limited, and would benefit from a much larger sampling of “Help” utterances.
- Sphinx features that would have been nice:
  - Native .ulaw file input
  - Simpler mechanism to input sample rate
  - Native text file input for the language model, by integrating the .lm generator and .lm.DMP converter into Sphinx
  - Better handling of utterance fragments