ISIP: Research Presentation
Research Presentation: On a Utility for Speaker Verification
Seungchan Lee
Intelligent Electronic Systems, Human and Systems Engineering
Department of Electrical and Computer Engineering
Feb. 16, 2006

Set up a standard IES environment
My first impression of CAVS is good. The first thing to do is set up the IES environment.
- Create an enlistment.
- Our production system consists of many classes.
- I was surprised at the structure of our software environment. Even though much work has already been done, I still need to consolidate our system with the other IFCers.
- GroupWise: a good communication and schedule-management tool within our group.
- After that, I could write a program and compile it on my local machine.
[Diagram: client, CVS repository, server]

First IFC program: instructions
The first simple IFC program does the following:
- Reads a 3×3 float matrix from an Sof file.
- Reads a 3×1 float vector from an Sof file.
- Multiplies the matrix and the vector using the equation Z = alpha*A*B.
- Writes the result to an Sof file.
- Allows the value of alpha to be set from the command line: foo.exe -alpha 2.0 input.sof output.sof
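The arithmetic in Z = alpha*A*B can be sketched in a few lines of plain Python. This is only an illustration of the computation, not the IFC program itself: the function name scaled_matvec and the sample values are assumptions, and the Sof reading and writing steps are left out.

```python
def scaled_matvec(alpha, matrix, vector):
    """Return Z = alpha * A * B for a matrix A (list of rows) and a vector B."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each matrix row must match the vector length")
    # One dot product per matrix row, scaled by alpha.
    return [alpha * sum(a * b for a, b in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    # Hypothetical 3x3 matrix and 3x1 vector; alpha plays the role of -alpha 2.0.
    A = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
    B = [1.0, 0.0, -1.0]
    print(scaled_matvec(2.0, A, B))
```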

First IFC program: flow
Foo.exe -alpha 2.0 input_file output_file
- Read the input Sof file.
- Read the 3×3 float matrix.
- Read the 3×1 float vector.
- Multiply the matrix and the vector.
- Write the result to the output Sof file.

First IFC program
After completing my first IFC program, I am more familiar with our production system. When I have questions about it, our experienced group members always help me. It is good to study alone, but sometimes it is better to ask an expert programmer. The more I learn about our production system, the more questions I have.

First IFC program
First question: How can we view the contents of a class?
Answer: Through the debug method. The contents of an Sof object are hard to inspect while stepping through a debugger, so instead I call the debug method that is included in the source code. This can slow debugging down, but it is the best way I have found so far. With it, I can see which data the Sof object contains. The same approach works for the variables of all other classes.

First IFC program
Second question: When using the debug method, why does the string capacity change?
SysString token = "oscar had a heap of apples"
With the debug method, we can see each value:
value_d = (5 >= 5) oscar
value_d = (5 >= 3) had
value_d = (5 >= 1) a
value_d = (5 >= 4) heap
value_d = (5 >= 2) of
value_d = (6 >= 6) apples
Answer: In the expression (n >= m), n is the total capacity of the data structure and m is the current length. So for the line
value_d = (5 >= 3) had
the capacity of the SysString is 5 and the current length is 3, which matches the string 'had'. The capacity changes to 6 only when the six-character token 'apples' no longer fits.
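The capacity and length behaviour can be imitated with a toy buffer. This is a sketch under one assumption: the buffer grows only when a token does not fit and never shrinks. That policy reproduces the output shown above, but the real SysString allocation strategy may differ. GrowOnlyBuffer and debug_tokens are hypothetical names.

```python
class GrowOnlyBuffer:
    """Toy string buffer whose capacity grows when needed but never shrinks."""

    def __init__(self):
        self.capacity = 0
        self.value = ""

    def assign(self, text):
        # Grow only if the new text does not fit in the current capacity.
        if len(text) > self.capacity:
            self.capacity = len(text)
        self.value = text

    def debug(self):
        # Same "(capacity >= length) value" shape as the debug output above.
        return f"value_d = ({self.capacity} >= {len(self.value)}) {self.value}"


def debug_tokens(sentence):
    """Assign each whitespace-separated token in turn and collect debug lines."""
    buf = GrowOnlyBuffer()
    lines = []
    for token in sentence.split():
        buf.assign(token)
        lines.append(buf.debug())
    return lines


if __name__ == "__main__":
    for line in debug_tokens("oscar had a heap of apples"):
        print(line)
```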

First IFC program
Third question: What does the "L" mean?
In our production system, all of the classes use the "L" character in front of string literals. For example:
SysString file1;
file1.assign(L"/tmp/foo_bin.sof");
At first I could not figure out why this "L" is used.
Answer: The "L" prefix tells the compiler that the following string literal is a wide-character (wchar_t) string, which is used here to hold Unicode text.

ISIP_VERIFY
Basic work flow
- Decide what to add and remove in the new version
- Analyze the old version
- Draw class diagrams
- Design the new version
- Coding and compilation
- Testing and fixing bugs

ISIP_VERIFY
Decide what to add and remove in the new version
- Currently, isip_verify performs speaker verification, but only with the HMM algorithm. We want the new isip_verify to perform that function using the HMM, SVM, or RVM algorithm. This means the new version of isip_verify will be a more general utility than the old one.
Analyze the old version
- The isip_verify utility uses the SpeakerVerifier, VerifyHMM, and HMM classes, and does both training and testing. Unlike the "GMM" case, the "SVM" statistical model has two separate utilities: isip_svm_learn handles training, and isip_svm_classify handles testing.

ISIP_VERIFY
The problem:
1. isip_verify can only process the "GMM" statistical model.
2. We do not have an "RVM" routine that provides the same functions as the "SVM" utilities.
Solution:
1. Add SVM and RVM routines to isip_verify.
2. Add the same functionality to the RVM class.
3. Modify the SpeakerVerifier class.
With these changes, we can make one utility that performs all of the functions mentioned above. To begin with, I drew class block diagrams of each utility to confirm the relationships between classes and functions. After that, these utilities were much easier to understand. Next, I drew the flow chart of the new utility.

ISIP_VERIFY: class block diagram
[Diagram. Boxes: ISIP_VERIFY (util/speech), with a parameter check; SpeakerVerifier (asr), with "if algorithm = HMM", set algorithm, set implementation; VerifyHMM (pr), with "if algorithm = VERIFY: Verify()" (implementation = LIKELIHOOD: Verifyl(); LIKELIHOOD RATIO: Verifylr()) and "if algorithm = TRAIN: train and model creation"; HiddenMarkovModel (pr), with run(), "algorithm = TRAIN, implementation = BAUM WELCH", else linearDecoder().]

ISIP_VERIFY: ISIP_SVM_LEARN class block diagram
[Diagram. Boxes: isip_svm_learn (util/speech), with a parameter check and loadFeature() for positive and negative examples; SupportVectorMachine (pr), with train(), "if algorithm = SEQUENTIAL_MINIMAL_OPTIMIZATION: sequentialMinimalOptimization()", determine the support vectors, writeModel(); StatisticalModel (stat) of SupportVectorModel type, StatisticalModelBase, getSupportVectorModel(); SupportVectorModel (stat), with getBias(), getKernels(), getAlphas(), getSupportVectors(), write().]

ISIP_VERIFY: ISIP_SVM_CLASSIFY class block diagram
[Diagram. Boxes: isip_svm_classify (util/speech); StatisticalModel (stat); AudioDatabase (mmedia); FeatureFile (mmedia). Calls shown: read(), getSupportVectorModel(), open(), getRecord(), getBufferData(), getDistance(). The distance is written to the output file.]

ISIP_VERIFY: flow chart
isip_verify (new version)
isip_verify -param.sof.... -algo_type [hmm,svm,rvm] -mode [train, test]
- Check the "algo_type" option. If no algo_type was specified, the HMM algo_type is chosen.
- Check the "mode" option. If it is missing: error, "You must specify mode".
- Check statistical_model against the chosen algorithm ("GMM", "SVM", or "RVM"). If it does not match: error (model incorrect).
- HMM: gmmVerify(). The VerifyHMM class processes the parameter file for isip_verify and can do both training and testing; this is the old version of isip_verify.
- SVM: mode train calls svmTrain(); mode test calls svmTest().
- RVM: mode train calls rvmTrain(); mode test calls rvmTest().
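The option checks in the flow chart amount to a small dispatch table. The sketch below is a Python illustration only: the handler names come from the flow chart, but the function select_handler, the order of the checks, and the exact error messages are assumptions, and the handlers are returned as names rather than called.

```python
# Handler names as they appear in the flow chart.
HANDLERS = {
    ("hmm", "train"): "gmmVerify",
    ("hmm", "test"): "gmmVerify",
    ("svm", "train"): "svmTrain",
    ("svm", "test"): "svmTest",
    ("rvm", "train"): "rvmTrain",
    ("rvm", "test"): "rvmTest",
}

# Statistical model expected in the parameter file for each algo_type.
EXPECTED_MODEL = {"hmm": "GMM", "svm": "SVM", "rvm": "RVM"}


def select_handler(statistical_model, algo_type=None, mode=None):
    """Return the name of the routine the flow chart would reach."""
    if algo_type is None:
        algo_type = "hmm"  # no algo_type given: HMM is chosen
    if algo_type not in EXPECTED_MODEL:
        raise ValueError(f"unknown algo_type: {algo_type}")
    if mode is None:
        raise ValueError("You must specify mode")
    if mode not in ("train", "test"):
        raise ValueError(f"unknown mode: {mode}")
    if statistical_model != EXPECTED_MODEL[algo_type]:
        raise ValueError("Model incorrect")
    return HANDLERS[(algo_type, mode)]


if __name__ == "__main__":
    print(select_handler("SVM", algo_type="svm", mode="train"))
```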

ISIP_VERIFY
Coding and compilation
1. Add and remove parameters, and check the parameters (Won).
2. Combine the three functionalities: the new isip_verify calls a run() method, which invokes the support vector machine object or the relevance vector machine object and then performs training. This lets us implement three models in one utility.
3. SpeakerVerifier class
- include the SVM and RVM classes
- modify the parameter check
- modify the run(sdb) method
- add a run(pos_sdb, neg_sdb) method
4. RVM class
- add training and testing modules (Sridhar)

ISIP_VERIFY
Problems during coding and compilation
- How do we verify the SpeakerVerifier class? After modifying an existing class, we need to verify its correctness. In our production system, the diagnose method performs this function. It is implemented in *_02.cc in every class. After compiling the class, we run "make test", which automatically checks every function in that class.
- How can we resolve a segmentation fault? Finding the cause is one of the most difficult things. Comment out all new modules, then add back one module, compile the class, and test it. Repeat this until every new module has been tested.

ISIP_VERIFY
Problems during coding and compilation
- Compilation and debugging time
- When developing a new program, compiling and debugging are among the most time-consuming tasks.
- In our production system, it takes a long time to compile and debug a program, because compiling a program involves many linking steps.
- How can we resolve it? It is faster to work in our local repository.

ISIP_VERIFY
Testing and fixing bugs
- This part is as important as the previous steps.
- We can find faults and missing pieces during this step.
- Problem: What happens to the sdb object?
- Normally, the sdb object contains every command-line option (except parameters).
- However, the sdb object loses its contents when it is passed to the SpeakerVerifier class.
- How can we fix that? Comment out all code except the control code. The cause turned out to be that I did not give the list file option.

Software Release
What do we need to know for a software release?
- Varmint utility: to track down all problems.
- Production system: in order to better understand our system, I have worked and will work through the following:
  Data preparation, Feature extraction, Recognition, Acoustic modeling, Language modeling
- These will be explained more specifically after this topic.

Software Release
ProductionRuleTokenType class
- It uses many if-else statements in its read/write functions.
- Instead, we can use the NameMap class.
- To do that, I declared a NameMap object and modified the related modules.
Problem: I ran into run-time errors.
Solution:
- I wrote a simple program that includes the diagnose method in prtt_02.cc.
- After tracing through the function, I found the cause.
- This was my first check-in of a class to our production system.
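The idea of replacing an if-else chain with a name map can be shown with a small Python stand-in. This is not the API of the production system's NameMap class: the class below, its method names, and the three token type labels are all hypothetical, chosen only to show a two-way lookup between an index and its name.

```python
class NameMap:
    """Toy two-way mapping between enum-like indices and their names."""

    def __init__(self, names):
        self._names = list(names)
        self._index = {name: i for i, name in enumerate(self._names)}

    def get_name(self, index):
        # Used when writing: index -> text label.
        return self._names[index]

    def get_index(self, name):
        # Used when reading: text label -> index.
        return self._index[name]


# Hypothetical token type labels, for illustration only.
TOKEN_TYPES = NameMap(["TERMINAL", "NON_TERMINAL", "EPSILON"])


def write_type(index):
    """One lookup replaces an if-else branch per token type."""
    return TOKEN_TYPES.get_name(index)


def read_type(label):
    return TOKEN_TYPES.get_index(label)


if __name__ == "__main__":
    print(write_type(1), read_type("EPSILON"))
```

Adding a new token type then means adding one entry to the list instead of one branch to every read and write function.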

Software Release
isip_lm_tester
- This utility randomly generates sentences based on the language model file and tests the language model.
- Problem: currently, generating state transcriptions does not get past the first symbols at the highest level.
- What to do? I need to track down this problem, but that requires an understanding of the language model.
- I will read and study our production system tutorial thoroughly, and then I can get involved in fixing bugs in isip_lm_tester.

Production System
In this part, I will go from data preparation to feature extraction.
How can we better understand our production system?
- Data preparation
- Feature extraction
- Recognition
- Acoustic modeling
- Language modeling

Production System: Data Preparation
Data preparation
- Why is it difficult for a beginner?
  - In ordinary programming, preparing input data is not hard.
  - But in our production system, it is not easy for a beginner.
- It requires knowledge of speech data, including speech file formats, file conversion, and sampling.
- Speech file
  - Header + sampled data: WAV, Sof, SPHERE, AU
  - Sampled data only: raw files
[Diagram: WAV, Sof, SPHERE, and AU files shown as header + data; Raw shown as data only]

Production System: Data Preparation
Sof format
- Stores the location of each object in the file, together with the corresponding object data.
- Supports two basic storage formats:
  - text: human-readable files
  - binary: sampled data
- Used by all data objects in the ISIP environment to unify and simplify I/O.
- Binary format:
  - Handles machine architecture differences with automatic byte transformations.
  - Used for large quantities of data, for the obvious efficiency gains.
  - The objects are stored in a binary tree, and a symbol table holds the object class names.

Production System: Data Preparation
- Text format:
  - Used for user input parameter files in the ISIP environment.
  - A simple format that consists of object names and tags, followed by the object data.
  - Example:
@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
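To make the "object name and tag, followed by the object data" layout concrete, here is a toy reader for the example above. It is a sketch that assumes every object has the exact shape shown (one "@ Name tag @" line followed by one "value = ...;" line); the real Sof text format is richer, and parse_sof_text is a hypothetical helper, not an ISIP function. Values are kept as strings so that no type conversion is guessed.

```python
import re

SOF_TEXT = """@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
"""

# "@ <ClassName> <tag> @"; the "@ Sof v1.0 @" header does not match this.
TAG_RE = re.compile(r"^@\s*(\w+)\s+(\d+)\s*@$")
# "value = <comma-separated items>;"
VALUE_RE = re.compile(r"^value\s*=\s*(.*);$")


def parse_sof_text(text):
    """Return a list of (class_name, tag, items) tuples from toy Sof text."""
    objects = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        tag = TAG_RE.match(line)
        if tag:
            current = (tag.group(1), int(tag.group(2)))
            continue
        value = VALUE_RE.match(line)
        if value and current is not None:
            items = [item.strip() for item in value.group(1).split(",")]
            objects.append((current[0], current[1], items))
            current = None
    return objects


if __name__ == "__main__":
    for obj in parse_sof_text(SOF_TEXT):
        print(obj)
```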

Production System: Data Preparation
Converting from an external format (e.g., SPHERE, WAV) to raw format: speech.sph -> speech.raw
1. Convert the SPHERE file's binary data to 16-bit linear samples using w_decode:
w_decode -o pcm speech.sph speech-nb.sph
2. Strip the file's header using h_strip:
h_strip speech-nb.sph speech.raw
3. The result is speech.raw, which is identical to the decoded file except that the first 1024 bytes of header information are missing.
4. One-line command combining steps 1 and 2:
w_decode -o pcm speech.sph - | h_strip - - > speech.raw
[Diagram: header + data -> data]
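What the header-stripping step does to the bytes can be sketched as follows. This only imitates the effect of h_strip on a file that is already 16-bit linear PCM; it does not decode anything the way w_decode does, and it assumes the header is exactly the 1024 bytes stated above. The function names are hypothetical.

```python
SPHERE_HEADER_BYTES = 1024  # header size stated above; assumed fixed here


def strip_header(data, header_bytes=SPHERE_HEADER_BYTES):
    """Return the sample bytes that follow a fixed-size header."""
    if len(data) < header_bytes:
        raise ValueError("input is shorter than the header")
    return data[header_bytes:]


def strip_header_file(src_path, dst_path, header_bytes=SPHERE_HEADER_BYTES):
    """File-to-file version: read src_path, write the headerless data to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_header(data, header_bytes))


if __name__ == "__main__":
    # Fake file: 1024 header bytes followed by two 16-bit samples.
    fake = b"H" * 1024 + b"\x01\x00\x02\x00"
    print(len(fake), len(strip_header(fake)))
```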

Production System: Data Preparation
Verification of conversion to raw
- SoX: audio playback
sox -t.sw -r 16000 speech.raw -t.au speech.au
audioplay speech.au
- File size comparison using "ls -l"
ls -l speech.*
-rw-rw-r-- 1 may isip 97486 Sep 10 15:19 speech.raw
-rw-rw-r-- 1 may isip 98510 Sep 10 15:12 speech.sph
The fifth field is the file size: speech.raw is 1024 bytes smaller than speech.sph.
- Octal dump (od): listing values
od -t d2 speech.raw
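Two of these checks are easy to reproduce in a few lines: the size difference, and a listing of the first samples as signed 16-bit integers, similar in spirit to od -t d2. This is a sketch; the helper names are hypothetical, and the byte order is an assumption (little-endian by default) that has to match how the raw file was actually written.

```python
import struct


def header_size(original_bytes, raw_bytes):
    """Size difference between the original file and the raw file."""
    return original_bytes - raw_bytes


def read_samples(data, count=8, little_endian=True):
    """Decode up to `count` signed 16-bit samples from raw bytes."""
    fmt = ("<" if little_endian else ">") + "h"
    usable = min(count, len(data) // 2)
    return [struct.unpack_from(fmt, data, 2 * i)[0] for i in range(usable)]


if __name__ == "__main__":
    # Sizes taken from the ls -l listing above.
    print(header_size(98510, 97486))
    # Three hypothetical samples encoded as little-endian 16-bit values.
    print(read_samples(b"\x01\x00\xfe\xff\x2c\x01"))
```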

Production System: Data Preparation
Creating an Sof file: raw file -> Sof file
- Use isip_make_sof. Type the following:
isip_make_sof speech.raw
- This creates a binary file. To create a text version, type the following:
isip_make_sof -type text -suffix _text speech.raw
[Diagram: data -> isip_make_sof -> header + data]

Production System: Feature Extraction
What is feature extraction?
- A speech recognizer does not understand the human voice directly.
- Only certain features of the human voice are useful for recognizer decoding.
- They must be numerically measured and stored as a feature vector.
- The process of taking these measurements is known as feature extraction.
- It includes the following:
  - converting the signal to a digital form
  - measuring some important characteristics of the signal
  - augmenting these measurements
[Diagram: human voice -> microphone -> digital signal]

Production System: Feature Extraction
Frame
- A typical frame duration in speech recognition is 10 ms.
- Determines how often we produce a feature vector.
Window
- A typical window duration is 25 ms.
- Surrounds the frame for a smoother representation of the speech data.
- Determines the number of samples used for each measurement.
Sampling rate
- The number of samples per second taken from a continuous signal to make a discrete signal.
- Example: with an 8 kHz sampling rate and a frame duration of 10 ms, measurements are taken over 80 samples to produce one feature vector.
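The 80-sample figure follows from samples = sampling rate × duration. A short sketch of that arithmetic, with hypothetical function names and integer inputs assumed:

```python
def samples_per_duration(sample_rate_hz, duration_ms):
    """Number of samples covered by a duration given in milliseconds."""
    return sample_rate_hz * duration_ms // 1000


def frame_count(total_samples, sample_rate_hz, frame_ms=10):
    """Number of whole frames, and so feature vectors, in a signal."""
    return total_samples // samples_per_duration(sample_rate_hz, frame_ms)


if __name__ == "__main__":
    print(samples_per_duration(8000, 10))  # frame length in samples
    print(samples_per_duration(8000, 25))  # window length in samples
    print(frame_count(8000, 8000))         # frames in one second of speech
```

With the same 8 kHz rate, a 25 ms window spans 200 samples, so each measurement uses more samples than the 80-sample frame step.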

Production System: Feature Extraction, Signal Flow Graph
Basic process of extracting a single feature
- Input: speech data stored in digital form on a computer.
- Energy: a computer program or algorithm specifically designed to measure energy values in the speech data.
- Output: a computer file that stores the measured features.
Including a window
- Determines the number of samples used to calculate the energy measurements.
[Signal flow graphs: input, Energy, output; with a window: input, Wind, Energy, output]
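A minimal sketch of the window-then-energy graph, under stated assumptions: energy is taken as the plain sum of squared samples, the window is rectangular and centered on the frame, and windows are clipped at the ends of the signal. The production system's Energy and Wind components may use other definitions; frame_energies is a hypothetical name.

```python
def frame_energies(samples, frame_len, window_len):
    """One energy value per frame, measured over a window around the frame."""
    # Extra samples taken on each side of the frame.
    pad = (window_len - frame_len) // 2
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        lo = max(0, start - pad)
        hi = min(len(samples), start + frame_len + pad)
        energies.append(sum(s * s for s in samples[lo:hi]))
    return energies


if __name__ == "__main__":
    # Eight unit samples, frames of 2 samples, windows of 4 samples.
    print(frame_energies([1] * 8, frame_len=2, window_len=4))
```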

Production System: Feature Extraction, Signal Flow Graph
Process of computing the frequency spectrum for a speech signal
- Energy is measured in the time domain.
- The signal is converted from the time domain to the frequency domain.
- Spec: represents the Fourier transform.
Additional methods are needed to fully measure the features needed by a speech recognizer.
- Further analyze the FFT of the speech signal.
- MFCC: uses a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.
[Signal flow graphs: input, Wind, Spec, output; input, Wind, Spec, Ceps, output]
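The cepstrum definition above (inverse Fourier transform of the log-spectrum) can be written directly with a naive DFT. This is a sketch of the plain real cepstrum only: a full MFCC front end also applies a mel filter bank and typically a cosine transform, which are not shown. The function names and the small floor that avoids log(0) are assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    spectrum = dft(frame)
    # Floor the magnitude so that log() is defined for zero-valued bins.
    log_mag = [math.log(max(abs(v), floor)) for v in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]


if __name__ == "__main__":
    # An impulse has a flat spectrum, so its log-spectrum is constant.
    print(real_cepstrum([2.0, 0.0, 0.0, 0.0]))
```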

Production System: Feature Extraction, Signal Flow Graph
Recipe
- The information for each component is stored in a single entity:
  - format of the speech input
  - algorithms for extracting the features
  - format of the output
- Make a recipe using isip_transform.
- Example: a simple signal flow graph for extracting energy.
[Recipe1: inp, Engy, out, stored in a recipe file]

Production System: Feature Extraction, Signal Flow Graph
More complex recipes
- A single recipe file is produced for the entire graph.
[Recipe2: inp, Wind, Engy, Ceps, out, stored in a recipe file]

Q & A
1. Ordinary data types and functions
- In our production system, every data type is wrapped in one of our classes. Why do we use Float instead of float? This confused me. When I wrote command-line programs in ordinary C++, I used cout and cin. However, the situation is different in our system.

Reference
Production System Tutorial: http://www.cavs.msstate.edu/hse/ies/projects/speech/software/tutorials/production/fundamentals/current/
