Wake-up Word Detector
Douglas Rauscher
ECE5525
April 30, 2008





2. Introduction
- The purpose of this project is to generate feature vectors and Hidden Markov Models (HMMs) for a single word.
- Data is processed using Sphinx and Matlab.
- The Wake-up Word chosen is "Help".

3. Corpus
- The corpus used is the original WUW_Corpus, provided on the ECE5526 server: ftp://163.118.203.219/CORPORA/WUW_Corpora/WUW_Corpus/
- This corpus was chosen because it contains many single-word utterances of "Help".
- The data is in µ-law format.

4. File lists & transcriptions
- Before processing in Sphinx, "transcription" and "fileids" files must be created:
  - wuw_corpus_train.fileids
  - wuw_corpus_train.transcription
  - wuw_corpus_test.fileids
  - wuw_corpus_test.transcription
- These were created in Matlab by searching the provided "|"-delimited file (wuw.trans) for "Help" utterances.
- 80% of the "Help" utterances were used in the training list; the remaining 20% were used in the test list.
- All utterances that did not contain "Help" were included in the test set to test for false alarms.
- A handful of utterances in the original .trans file were manually removed from the list because:
  - they had no data bytes in the file,
  - Sphinx had trouble with the sound quality, or
  - the utterance was cut off in such a way that Sphinx threw an error.
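The split rule above can be sketched in Python (the slides themselves use Matlab). This is a minimal illustration, not the author's script: records are assumed to be (filename, CallNO, Ortho) tuples named after the fields in dcr_extract.m, and the two line formats copy that script's fprintf patterns.

```python
import math
from typing import List, Tuple

# Assumed record layout: (filename/folder, CallNO, Ortho), as in dcr_extract.m
Record = Tuple[str, str, str]


def fileid_line(rec: Record) -> str:
    """Format one line of a .fileids list, e.g. calls/<folder>/WUW<folder>_<call>."""
    folder, call_no, _ = rec
    return f"calls/{folder}/WUW{folder}_{call_no}"


def transcription_line(rec: Record) -> str:
    """Format one line of a .transcription file: upper-case words plus the file id."""
    folder, call_no, ortho = rec
    return f"{ortho.upper()} (WUW{folder}_{call_no})"


def split_records(records: List[Record], train_fraction: float = 0.8):
    """First 80% of "Help" records train; remaining "Help" plus all others test."""
    help_recs = [r for r in records if r[2] == "Help"]
    other_recs = [r for r in records if r[2] != "Help"]
    n_train = math.floor(len(help_recs) * train_fraction)
    train = help_recs[:n_train]
    test = help_recs[n_train:] + other_recs
    return train, test
```

Like the Matlab script, this takes the first 80% in file order rather than a random sample, so the assignment depends on how wuw.trans is ordered.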

5. dcr_extract.m

    % dcr_extract.m: build the Sphinx fileids/transcription lists
    close all; clear all; clc;
    A = textread('C:\CMUtutorial\WUW_Corpus\wuw.trans','%s','delimiter','|');
    % Every record starts with a gender token followed by six fields
    idx = 1:length(A);
    idx = idx((strcmp(A,'Male')+strcmp(A,'Female'))>0);
    gender     = A(idx);
    dialect    = A(idx+1);
    phone_type = A(idx+2);
    filename   = A(idx+3);
    CallNO     = A(idx+4);
    UttNO      = A(idx+5);
    Ortho      = A(idx+6);
    AllIdx     = 1:length(Ortho);
    HelpIdx    = AllIdx(strcmp(Ortho,'Help'));
    NotHelpIdx = AllIdx(~strcmp(Ortho,'Help'));
    N = floor(length(HelpIdx)*0.8);   % first 80% of "Help" go to training

    % Training lists
    fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.fileids','w');
    ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.transcription','w');
    for k=1:N
        fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
            char(filename(HelpIdx(k))),char(CallNO(HelpIdx(k))));
        fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
            char(filename(HelpIdx(k))),char(CallNO(HelpIdx(k))));
    end
    fclose(fout); fclose(ftsn);

    % Test lists
    fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.fileids','w');
    ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.transcription','w');
    % Remaining "Help"
    for k=(N+1):length(HelpIdx)
        fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
            char(filename(HelpIdx(k))),char(CallNO(HelpIdx(k))));
        fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
            char(filename(HelpIdx(k))),char(CallNO(HelpIdx(k))));
    end
    % Other utterances (used to test for false alarms)
    for k=1:length(NotHelpIdx)
        fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(NotHelpIdx(k))),...
            char(filename(NotHelpIdx(k))),char(CallNO(NotHelpIdx(k))));
        fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(NotHelpIdx(k)))),...
            char(filename(NotHelpIdx(k))),char(CallNO(NotHelpIdx(k))));
    end
    fclose(fout); fclose(ftsn);
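The record-finding step of dcr_extract.m (treat every "Male"/"Female" token as the start of a record and read the next six fields) can be sketched in Python as follows. The field order is taken from the Matlab code; any sample values used with it are hypothetical, not real corpus entries.

```python
from typing import Dict, List

# Field order assumed from dcr_extract.m: gender token, then six fields
FIELDS = ["gender", "dialect", "phone_type", "filename", "CallNO", "UttNO", "Ortho"]


def parse_trans(text: str) -> List[Dict[str, str]]:
    """Split a "|"-delimited transcription file into per-utterance records."""
    # Flatten all fields; line breaks separate fields just like "|" does.
    tokens = [t.strip() for line in text.splitlines() for t in line.split("|")]
    records = []
    for i, tok in enumerate(tokens):
        # A gender token marks the start of a record (as in the Matlab code);
        # skip a trailing record that does not have all six following fields.
        if tok in ("Male", "Female") and i + 6 < len(tokens):
            records.append(dict(zip(FIELDS, tokens[i:i + 7])))
    return records
```

Anchoring on the gender token keeps parsing aligned even if a line has extra or missing separators elsewhere, but an orthography field that happened to read "Male" or "Female" would be misparsed by both versions.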

6. Data preparation
- The corpus data was originally:
  - file extension .ulaw
  - 8-bit µ-law format
  - 8 kHz sample rate
- This data had to be converted, because .ulaw files are not readable by Sphinx.
- Target format:
  - file extension .raw
  - 16-bit linear quantization
  - 16 kHz sample rate (linearly interpolated)
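For reference, standard ITU-T G.711 µ-law decoding of one byte into a 16-bit linear sample is sketched in Python below. It is offered as an assumption about what "8-bit µ-law" means for this corpus; the Matlab script on the next slide instead applies the continuous µ-law curve with u = 255.

```python
from typing import List


def ulaw_byte_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte (0..255) to a linear sample (about +/-32124)."""
    b = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = b & 0x80                  # set -> negative sample
    exponent = (b >> 4) & 0x07       # 3-bit segment number
    mantissa = b & 0x0F              # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove bias
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> List[int]:
    """Decode a whole .ulaw byte string."""
    return [ulaw_byte_to_linear(b) for b in data]
```

Because of the bit inversion, byte 0xFF decodes to 0 and 0x80 to the largest positive value, which is why a plain signed-byte read needs a sign/magnitude correction before expansion.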

7. ulaw2raw.m

    % Driver (run from the command window): convert every call folder
    for k=0:252
        ulaw2raw(sprintf('C:\\CMUtutorial\\WUW_Corpus\\calls\\%05d\\',k),0);
    end

    function ulaw2raw(filepath,playflag)
    % ulaw2raw('C:\CMUtutorial\WUW_Corpus\calls\00000\',0);
    cd_save = cd;
    cd(filepath);
    files = dir('*.ulaw');   % only the .ulaw files in this folder
    % US standard u-law coeff
    u=255;
    for k=1:length(files)
        disp(files(k).name);
        fin = fopen(files(k).name,'r');
        A = fread(fin,'int8');
        fclose(fin);
        % move data to proper sign: mu-law bytes are bit-inverted, so int8
        % values < 0 (bytes 0x80-0xFF) are positive samples and values >= 0
        % are negative samples; magnitude 0..127
        A1 = (-1-A).*(A<0) + (A-127).*(A>=0);
        % remove u-law
        B1 = sign(A1).*(1/u).*(((1+u).^abs(A1/128))-1);
        % 8 kHz -> 16 kHz: interleave each sample with the midpoint to the next
        B2 = reshape([B1,((B1+[B1(2:end);0])./2)].',1,[]);
        if(playflag)
            sound(B2,16000)
            pause(length(B2)/16000);
        end
        generateRawWav(files(k).name(1:end-5),B2);
    end
    cd(cd_save);

    function generateRawWav(filename,data)
    % data is in [-1,1]; scale to the full 16-bit range
    fout = fopen(strcat(filename,'.raw'),'w');
    dataq = round(32767.*data);
    fwrite(fout,dataq,'int16');
    fclose(fout);
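The last two steps of ulaw2raw.m (doubling the sample rate by inserting midpoints, then writing headerless 16-bit samples) are sketched in Python below. Assumptions: input samples are floats in [-1, 1], they are scaled to the full 16-bit range (×32767), and the byte order is little-endian, which matches Matlab's default fwrite output on an x86 Windows machine.

```python
import struct
from typing import List


def upsample2_linear(x: List[float]) -> List[float]:
    """Double the sample rate by inserting the midpoint after each sample.

    Mirrors the B2 line in ulaw2raw.m: the midpoint after the last
    sample is taken against an implicit trailing 0.
    """
    out = []
    for n, s in enumerate(x):
        nxt = x[n + 1] if n + 1 < len(x) else 0.0
        out.append(s)
        out.append((s + nxt) / 2)
    return out


def to_raw_int16_le(samples: List[float]) -> bytes:
    """Pack samples in [-1, 1] as headerless little-endian 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [round(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)
```

Midpoint insertion adds no information above the original 4 kHz bandwidth; it only changes the sample rate to 16 kHz.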

8. Language model creation
- A Wake-up Word recognizer does not need a language model to detect the word.
- Sphinx allows you to weight the language model in its calculations, but does not appear to allow the user to disable the language model altogether.
- Therefore, to avoid errors, a custom language model had to be created:
  1. The lmtool web-based generator was used to convert a text file containing only the word "Help" into a .lm file: http://www.speech.cs.cmu.edu/tools/lmtool.html
  2. The lm3g2dmp tool was used to convert the .lm file to .lm.DMP format, from a Windows command prompt (cmd):
       cd C:\CMUtutorial\lm3g2dmp\Debug
       lm3g2dmp 7092.lm ./
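The .lm file produced by lmtool is a plain-text ARPA-format model. For a one-word vocabulary, a minimal unigram-only ARPA file could also be written directly, as in the Python sketch below. This is not what the slides did: the uniform probabilities and the -99 entry for <s> are illustrative choices, and whether lm3g2dmp accepts a file with no bigram section is an untested assumption.

```python
import math
from typing import List


def unigram_arpa(words: List[str]) -> str:
    """Return a unigram-only ARPA-format language model as a string.

    </s> and each listed word share the probability mass uniformly.
    <s> is never predicted, so it gets the conventional -99 log10 value.
    """
    predicted = ["</s>"] + [w.upper() for w in words]
    logp = math.log10(1.0 / len(predicted))  # uniform log10 probability
    lines = ["\\data\\", f"ngram 1={len(predicted) + 1}", "", "\\1-grams:"]
    lines.append(f"{-99.0:.4f} <s>")
    lines += [f"{logp:.4f} {w}" for w in predicted]
    lines += ["", "\\end\\", ""]
    return "\n".join(lines)
```

For ["Help"] this gives three unigrams (<s>, </s>, HELP), with </s> and HELP each at log10(0.5).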

9. Training the model
- The Sphinx training configuration file was edited to use the proper input files.
- The maximum number of Gaussians was set to 8.
- The number of HMM states was increased from 3 to 5, without significant improvement.
- Sphinx commands:
    cd c:/CMUtutorial/WUW_Corpus/
    perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_train.fileids -cfg etc/sphinx_train.cfg -param etc/feat.params
    perl scripts_pl/RunAll.pl
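Configuration edits like these can be scripted instead of made by hand. A Python sketch, assuming the configuration uses Perl-style `$NAME = value;` assignments. The names CFG_STATESPERHMM and CFG_FINAL_NUM_DENSITIES are used in common SphinxTrain templates, but treat them as assumptions and check them against your own etc/sphinx_train.cfg.

```python
import re


def set_cfg_var(cfg_text: str, name: str, value: str) -> str:
    """Rewrite a Perl-style `$name = ...;` assignment in a SphinxTrain cfg.

    Raises KeyError if no assignment to `name` exists, so a misspelled
    variable name is not silently ignored.
    """
    pattern = re.compile(
        r"^(\$" + re.escape(name) + r"\s*=\s*)[^;]*;", re.MULTILINE
    )
    # A function replacement avoids backreference surprises such as "\15".
    new_text, count = pattern.subn(lambda m: m.group(1) + value + ";", cfg_text)
    if count == 0:
        raise KeyError(name)
    return new_text
```

Raising on a missing variable is deliberate: a silent no-op would leave the old HMM topology in place without any warning.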

10. Testing the model
- The Sphinx decoding configuration file (sphinx_decode.cfg) was edited to use the proper input files.
- The language model weight was set to 1 (the lowest allowable setting).
- The number of Gaussians was set to 8 to match the training configuration.
- Sphinx commands:
    perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_test.fileids -cfg etc/sphinx_decode.cfg -param etc/feat.params
    perl scripts_pl/decode/slave.pl

11. Sphinx output
- Sphinx was used only to compute acoustic scores, not to perform thresholding.
- The resulting scores were parsed in Matlab, and PDF/CDF plots were generated.
- See the attached output document for the raw Cygwin output.
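Because thresholding was left out of Sphinx, it can be applied afterwards to the parsed scores. The Python sketch below computes false-rejection and false-acceptance rates for a threshold and picks the observed score that minimizes their sum. Two assumptions: a higher acoustic score means a better match, and minimizing FR + FA is only one possible criterion (the equal-error-rate point is another).

```python
from typing import List, Tuple


def error_rates(help_scores: List[float], other_scores: List[float],
                t: float) -> Tuple[float, float]:
    """Return (false_rejection, false_acceptance) for threshold t.

    An utterance is accepted as "Help" when its score is >= t.
    """
    fr = sum(s < t for s in help_scores) / len(help_scores)
    fa = sum(s >= t for s in other_scores) / len(other_scores)
    return fr, fa


def best_threshold(help_scores: List[float], other_scores: List[float]) -> float:
    """Pick the observed score minimizing FR + FA (ties go to the lowest t)."""
    candidates = sorted(set(help_scores) | set(other_scores))
    return min(candidates,
               key=lambda t: sum(error_rates(help_scores, other_scores, t)))
```

Sweeping t over all observed scores traces the same trade-off as reading the two CDF curves against each other.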

12. plotDistributions.m

    % plotDistributions
    clear all; clc; close all;
    fn = 'C:\CMUtutorial\WUW_Corpus\logdir\decode\wuw_corpus-1-1.log';
    RawText = textread(fn,'%s');
    % Keep the 8-token records whose hypothesis (token after 'fv:...') is HELP
    idx = [];
    for k=1:(length(RawText)-7)
        if(~isempty(strfind(char(RawText(k)),'fv:')) &&...
                strcmp(char(RawText(k+1)),'HELP'))
            idx = [idx; k:k+7];
        end
    end
    RawText = reshape(RawText(idx),size(idx));   % one row per HELP hypothesis

    % fetch and plot Acoustic Score histograms
    HelpAScr = [];
    FalsAScr = [];
    for k=1:size(RawText,1)
        if(~isempty(strfind(char(RawText(k,1)),'_008>')))
            % True HELP
            HelpAScr = [HelpAScr str2double(char(RawText(k,5)))];
        else
            % Not a HELP
            FalsAScr = [FalsAScr str2double(char(RawText(k,5)))];
        end
    end
    mn = min(min(HelpAScr),min(FalsAScr));
    mx = max(max(HelpAScr),max(FalsAScr));
    vals = mn:((mx-mn)/100):mx;
    HelpAScrHist = hist(HelpAScr,vals);
    HelpAScrHist = HelpAScrHist./sum(HelpAScrHist);
    FalsAScrHist = hist(FalsAScr,vals);
    FalsAScrHist = FalsAScrHist./sum(FalsAScrHist);
    % "Help" CDF accumulates from the left (scores at or below each value);
    % "Other" CDF accumulates from the right (scores at or above each value)
    HelpAScrCDF = cumsum(HelpAScrHist);
    FalsAScrCDF = fliplr(cumsum(fliplr(FalsAScrHist)));
    figure;
    subplot(2,1,1);
    plot(vals,HelpAScrHist,'b',vals,FalsAScrHist,'r');
    title('Probability Density Function')
    legend('Help','Other Utterances')
    axis([mn,mx,0,1.1*max(max(HelpAScrHist),max(FalsAScrHist))]);
    subplot(2,1,2);
    plot(vals,HelpAScrCDF,'b',vals,FalsAScrCDF,'r');
    title('Cumulative Distribution Function')
    axis([mn,mx,0,1.1]);
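The log-parsing logic of plotDistributions.m can be restated as a Python sketch over a list of whitespace-separated tokens. The 8-token record layout, the score in the fifth token, and the '_008>' marker for true "Help" utterances are taken from the Matlab code; the token list used to test it is hypothetical, not real Sphinx output.

```python
from typing import List, Tuple


def parse_scores(tokens: List[str],
                 score_col: int = 4) -> Tuple[List[float], List[float]]:
    """Split HELP hypotheses into true and false detections.

    A token containing 'fv:' followed by the word HELP starts an 8-token
    record; token index 4 (the 5th token) is taken as the acoustic score,
    and '_008>' in the first token marks a true "Help" utterance.
    """
    help_scores: List[float] = []
    other_scores: List[float] = []
    # Stop early enough that a full 8-token record (k .. k+7) always fits.
    for k in range(len(tokens) - 7):
        if "fv:" in tokens[k] and tokens[k + 1] == "HELP":
            score = float(tokens[k + score_col])
            target = help_scores if "_008>" in tokens[k] else other_scores
            target.append(score)
    return help_scores, other_scores
```

The loop bound keeps every k+7 index inside the list, the same constraint the Matlab loop must respect.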

13. plotDistributions.m output
[Figure: PDF/CDF plots generated by plotDistributions.m]

14. Conclusions
- Sphinx had problems correctly detecting the word "Help" in this test, but a decent model was clearly created.
- The test set was constrained and limited; it would benefit from a much larger sample of "Help" utterances.
- Sphinx features that would have been nice:
  - native .ulaw file input
  - a simpler mechanism to specify the sample rate
  - native text-file input for the language model, by integrating the .lm generator and .lm.DMP converter into Sphinx
  - better handling of utterance fragments

