Seminar Speech Recognition: Project Support. E.M. Bakker, LIACS Media Lab (LML), Leiden University.


Introduction: What is Speech Recognition?
Goal: automatically extract the string of spoken words from the speech signal (e.g. "How are you?").
Other interesting areas:
–Who is talking (speaker recognition, identification)
–Text to speech (speech synthesis)
–What do the words mean (speech understanding, semantics)

Recognition Architectures: A Communication-Theoretic Approach
Message Source → Linguistic Channel → Articulatory Channel → Acoustic Channel
Observable: Message → Words → Sounds → Features
Bayesian formulation for speech recognition: P(W|A) = P(A|W) P(W) / P(A)
–Objective: minimize the word error rate.
–Approach: maximize P(W|A) during training.
–Components:
P(A|W): acoustic model (hidden Markov models, mixtures)
P(W): language model (statistical, finite-state networks, etc.)
The language model typically predicts a small set of next words based on knowledge of a finite number of previous words (N-grams).
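The argmax over P(A|W) P(W) can be illustrated with a toy decoder; the candidate word strings and all probabilities below are invented for illustration only:

```python
# Toy noisy-channel decoder: pick the word sequence W maximizing
# P(A|W) * P(W). The probabilities are made-up illustrative numbers.
acoustic = {"recognize speech": 0.40, "wreck a nice beach": 0.35}  # P(A|W)
language = {"recognize speech": 1e-5, "wreck a nice beach": 1e-9}  # P(W)

def decode(candidates):
    # P(A) is constant over W, so it can be dropped from the argmax.
    return max(candidates, key=lambda w: acoustic[w] * language[w])

best = decode(acoustic)
print(best)  # "recognize speech": the language model resolves the ambiguity
```

Although the two hypotheses sound almost alike (similar acoustic scores), the language model prior tips the decision.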

Recognition Architectures: Incorporating Multiple Knowledge Sources
Input Speech → Acoustic Front-end: the speech signal is converted to a sequence of feature vectors based on spectral and temporal measurements.
Acoustic Models P(A|W): acoustic models represent sub-word units, such as phonemes, as finite-state machines. States model spectral structure and transitions model temporal structure.
Language Model P(W): the language model predicts the next set of words and controls which (acoustic) models are hypothesized.
Search → Recognized Utterance: efficient search strategies are crucial, since many combinations of words must be investigated to find the most probable word sequence.

Acoustic Modeling: Feature Extraction
Input Speech → Fourier Transform → Cepstral Analysis → Perceptual Weighting → Time Derivatives
Outputs: Energy + Mel-Spaced Cepstrum; Delta Energy + Delta Cepstrum; Delta-Delta Energy + Delta-Delta Cepstrum
–Measure features 100 times per second (every 10 ms).
–Use a 25 ms window for frequency-domain analysis.
–Include absolute energy and 12 spectral measurements.
–Time derivatives are used to model spectral change.
Knowledge of the nature of speech sounds is incorporated in the feature measurements; rudimentary models of human perception are utilized.
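The framing and delta parameters above (25 ms window, 10 ms shift, time derivatives) can be sketched as follows; `frame_signal` and `deltas` are hypothetical helper names, not part of any particular toolkit:

```python
import numpy as np

# Sketch of the framing and delta computation described above, assuming a
# 16 kHz signal, a 25 ms analysis window, and a 10 ms frame shift.
def frame_signal(signal, rate=16000, win_ms=25, hop_ms=10):
    win, hop = rate * win_ms // 1000, rate * hop_ms // 1000
    n = 1 + max(0, (len(signal) - win) // hop)
    return np.stack([signal[i * hop:i * hop + win] for i in range(n)])

def deltas(features):
    # First-order time derivative, approximated by a symmetric difference
    # over neighboring frames (edge frames are padded).
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

frames = frame_signal(np.zeros(16000))  # 1 second of audio
print(frames.shape)                     # (98, 400): ~100 frames/sec
```

Each 400-sample frame would then be transformed into the 13-dimensional energy + cepstrum vector, with `deltas` applied twice for the delta and delta-delta streams.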

Acoustic Modeling: Hidden Markov Models
–Acoustic models encode the temporal evolution of the features (spectrum).
–Gaussian mixture distributions are used to account for variations in speaker, accent, and pronunciation.
–Phonetic model topologies are simple left-to-right structures.
–Skip states (time-warping) and multiple paths (alternate pronunciations) are also common features of models.
–Sharing model parameters is a common strategy to reduce complexity.
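A minimal sketch of the left-to-right topology with skip states mentioned above; the transition probabilities are illustrative values, not trained parameters:

```python
import numpy as np

# Left-to-right phone model: each state may loop on itself (modeling
# duration), advance to the next state, or skip one state (time-warping).
def left_to_right_transitions(n_states=3, p_stay=0.6, p_next=0.3, p_skip=0.1):
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i] = p_stay
        if i + 1 < n_states:
            A[i, i + 1] = p_next
        if i + 2 < n_states:
            A[i, i + 2] = p_skip
        A[i] /= A[i].sum()  # renormalize rows near the model exit
    return A

A = left_to_right_transitions()
print(A)  # upper-triangular: no backward transitions in time
```

The upper-triangular structure is what makes the topology "left-to-right": once a state is left, it is never revisited.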

Acoustic Models (HMM)
Some typical HMM topologies used for acoustic modeling in large-vocabulary speech recognition: a) a typical triphone, b) short pause, c) silence. The shaded states denote the start and stop states of each model.

Acoustic Modeling: Parameter Estimation
–Closed-loop, data-driven modeling, supervised only by a word-level transcription.
–The expectation-maximization (EM) algorithm is used to improve the parameter estimates.
–Computationally efficient training algorithms have been crucial. Batch-mode parameter updates are typically preferred.
–Decision trees are used to optimize parameter sharing, system complexity, and the use of additional linguistic knowledge.
Training schedule: Initialization → Single Gaussian Estimation → 2-Way Split → Mixture Distribution Reestimation → 4-Way Split → Reestimation
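The 2-way and 4-way splits in the training schedule can be sketched as mean-perturbation mixture splitting; `split_mixture` and the perturbation factor `eps` are assumptions for illustration, not the actual RES procedure:

```python
import numpy as np

# Each Gaussian component is split into two by perturbing its mean by a
# fraction of its standard deviation, doubling the number of components
# before the next EM reestimation pass.
def split_mixture(means, variances, weights, eps=0.2):
    offset = eps * np.sqrt(variances)
    new_means = np.concatenate([means - offset, means + offset])
    new_vars = np.concatenate([variances, variances])
    new_weights = np.concatenate([weights / 2, weights / 2])
    return new_means, new_vars, new_weights

m, v, w = np.array([0.0]), np.array([1.0]), np.array([1.0])
for _ in range(2):        # 1 -> 2 -> 4 components (2-way, then 4-way)
    m, v, w = split_mixture(m, v, w)
print(len(m), w.sum())    # 4 components; weights still sum to 1
```

In a full system, each split is followed by several Baum-Welch reestimation passes before splitting again.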

Language Modeling: The Wheel of Fortune

Language Modeling: N-Grams (Words)
Bigrams (SWB):
–Most common: "you know", "yeah SENT!", "!SENT um-hum", "I think"
–Rank 100: "do it", "that we", "don't think"
–Least common: "raw fish", "moisture content", "Reagan Bush"
Trigrams (SWB):
–Most common: "!SENT um-hum SENT!", "a lot of", "I don't know"
–Rank 100: "it was a", "you know that"
–Least common: "you have parents", "you seen Brooklyn"
Unigrams (SWB):
–Most common: "I", "and", "the", "you", "a"
–Rank 100: "she", "an", "going"
–Least common: "Abraham", "Alastair", "Acura"
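Statistics like those above come from maximum-likelihood n-gram estimation over a corpus; a minimal bigram sketch, with a toy corpus standing in for Switchboard (SWB):

```python
from collections import Counter

# Maximum-likelihood bigram estimate: P(w2 | w1) = count(w1 w2) / count(w1).
corpus = "you know I think you know I do not think".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("you", "know"))  # 1.0: "you" is always followed by "know" here
```

Real systems smooth these counts (e.g. backoff to unigrams), since most bigrams never occur in training data.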

Language Modeling: Integration of Natural Language
–Natural language constraints can easily be incorporated.
–Lack of punctuation and the size of the search space pose problems.
–Speech recognition typically produces a word-level, time-aligned annotation. Time alignments for other levels of information are also available.

Implementation Issues: Search Is Resource-Intensive
–Typical LVCSR systems have about 10M free parameters, which makes training a challenge.
–Large speech databases are required (several hundred hours of speech).
–Tying, smoothing, and interpolation are required.

Implementation Issues: Dynamic Programming-Based Search
–Dynamic programming is used to find the most probable path through the network.
–Beam search is used to control resources.
–Search is time-synchronous and left-to-right.
–Arbitrary amounts of silence must be permitted between words.
–Words are hypothesized many times with different start/stop times, which significantly increases search complexity.
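The time-synchronous search with beam pruning can be sketched as follows; the state space, transition scores, and beam width are toy values, not an actual decoder:

```python
import math

# Time-synchronous Viterbi search with beam pruning (log domain).
def viterbi_beam(obs_scores, trans, beam=5.0):
    # obs_scores: list over time of {state: log P(observation | state)}
    # trans: {(prev_state, state): log transition probability}
    frontier = dict(obs_scores[0])
    for frame in obs_scores[1:]:
        nxt = {}
        for s, sc in frame.items():
            best = max((frontier[p] + trans.get((p, s), -math.inf)
                        for p in frontier), default=-math.inf)
            if best > -math.inf:
                nxt[s] = best + sc
        # Beam pruning: drop hypotheses far below the current best score.
        top = max(nxt.values())
        frontier = {s: sc for s, sc in nxt.items() if sc >= top - beam}
    return max(frontier, key=frontier.get)

trans = {("a", "a"): -0.1, ("a", "b"): -2.0, ("b", "b"): -0.1}
obs = [{"a": -1.0}, {"a": -1.0, "b": -6.0}, {"a": -1.0, "b": -1.0}]
print(viterbi_beam(obs, trans))  # "a"
```

Tightening the beam trades search errors for speed, which is exactly the resource control described above.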

Implementation Issues: Cross-Word Decoding Is Expensive
Since word boundaries are not marked in spontaneous speech, we must allow for sequences of sounds that span word boundaries. Cross-word decoding significantly increases memory requirements.

Example ASR System: RES

Applications: Conversational Speech
Conversational speech collected over the telephone contains background noise, music, fluctuations in the speech rate, laughter, partial words, hesitations, mouth noises, etc.
Typical phenomena: laughter, singing, unintelligible speech, spoonerisms, background speech, no pauses, restarts, vocalized noise, coinage.
WER (word error rate) has decreased from 100% to 30% in six years.
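The WER figures quoted above are computed as the word-level edit distance (substitutions + insertions + deletions) divided by the reference length; a minimal sketch:

```python
# Word error rate via Levenshtein distance over words.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# One deletion ("a") and one substitution ("temper" -> "tempers"): 2/6.
print(wer("the emperor had a mean temper", "the emperor had mean tempers"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why early conversational systems could score at or above 100%.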

Applications: Audio Indexing of Broadcast News
Broadcast news offers some unique challenges:
–Lexicon: important information in infrequently occurring words.
–Acoustic modeling: variations in channel, particularly within the same segment ("in the studio" vs. "on location").
–Language model: must adapt ("Bush," "Clinton," "Bush," "McCain," "???").
–Language: multilingual systems? Language-independent acoustic modeling?

Applications: Real-Time Translation
From President Clinton's State of the Union address (January 27, 2000): "These kinds of innovations are also propelling our remarkable prosperity... Soon researchers will bring us devices that can translate foreign languages as fast as you can talk... molecular computers the size of a tear drop with the power of today's fastest supercomputers."
Imagine a world where:
–You book a travel reservation from your cellular phone while driving in your car, without ever talking to a human (database query).
–You converse with someone in a foreign country and neither speaker speaks a common language (universal translator).
–You place a call to your bank to inquire about your bank account and never have to remember a password (transparent telephony).
–You can ask questions by voice and your Internet browser returns answers to your questions (intelligent query).
Human language engineering: a sophisticated integration of many speech- and language-related technologies... a science for the next millennium.

RES
–Copying the source code
–The sound files
–The source code
–The modules
–The examples
–Compiling the code with MS Visual C++

RES: Copying the Source Code
–Copy all the files from the CD to a directory.
–Right-click the RES directory that was just copied, select Properties, and deselect the read-only option.
–Apply the change to all sub-folders.
Also included:
–Adobe Acrobat Reader
–Gpp for MS-DOS, Linux and MS projects
–Sound, annotation, and feature files
–Source code used in the projects
–Compiled examples for testing

RES: The Sound Files
Directory: RES\Sndfile
File types:
–.wav   16 kHz, signed 16-bit, mono sound files
–.phn   annotated phoneme representation
–.sgm   annotated phoneme representation
–.sro   text string
–.lsn   text string
–.fts   features file

RES: Speech Databases
Many are distributed by the Linguistic Data Consortium. TIMIT and ATIS are the most important databases used to build acoustic models of American English.
TIMIT (TI (Texas Instruments) + MIT):
–1 CD, 5.3 hours, 650 MB, 630 speakers of 8 main US regional varieties.
–6300 sentences, divided into a train set (70-80%) and a test set (20-30%).
–None of the speakers appear in both sets.
–Minimal coincidence of the same words in the two sets.
–Phonetic database: all phonemes are included many times in different contexts.
–Every phrase is described by:
file.txt   the orthographic transcription of the phrase (spelling)
file.wav   the waveform of the sound
file.phn   the correspondence between the phonemes and the samples
file.wrd   the correspondence between the words and the samples
–Furthermore:
SX   phonetically compact phrases, in order to obtain a good coverage of every pair of phones
SI   phonetically varied phrases, for different allophonic contexts
SA   for dialectal pronunciation

RES: Speech Databases
ATIS (Air Travel Information System, 1989 ARPA-SLS project):
–6 CDs, 10.2 hours, 2.38 GB, 36 speakers.
–Natural speech in a system for air travel requests: "What is the departure time of the flight to Boston?"
–Word recognition applications.
–Every phrase is described by:
file.cat   category of the phrase
file.nli   phrase text, with points describing what the speaker had in mind
file.ptx   text in prompting form (question, exclamation, ...)
file.snr   SNOR (Standard Normal Orthographic Representation) transcription of the phrase (abbreviations and numbers explicitly expanded)
file.sql   additional information
file.sro   detailed description of the major acoustic events
file.lsn   SNOR lexical transcription derived from the .sro file
file.log   scenario of the session
file.wav   the waveform of the phrase in NIST_1A format (sampling rate, LSB or MSB byte order, min/max amplitude, type of microphone, etc.)
file.win   references for the interpretation
–Phrase labeling: 's' close-speaking (Sennheiser mic), 'c' table microphone (Crown mic), 'x' lack of direct microphone, 's' spontaneous speech, 'r' read phrases.

RES: The Sound Files
SX127.WAV: 16 kHz, signed 16-bit, mono sound file.
"The emperor had a mean temper"

RES: The Sound Files
SX127.WAV: 16 kHz, signed 16-bit, mono sound file.
SX127.PHN and SX127.SGM: annotated phoneme representations.
.PHN: h# dh iy q eh m pcl p r ix hv ae dx ix m iy n tcl t eh m pcl p axr h#
.SGM: 0 2240 sil dh iy k eh m sil p r ih hh ae dx ih m iy n sil t eh m sil p er sil
Words: THE EMPEROR HAD A MEAN TEMPER (starts at sample ~17280, i.e. 17280/16000 = 1.08 sec)
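The annotation files give phoneme and word boundaries as sample indices; the arithmetic above (17280/16000 = 1.08 sec) is just a division by the sampling rate. A one-line sketch, assuming the 16 kHz rate of the .wav files:

```python
# Convert a boundary given as a sample index to seconds.
def sample_to_seconds(sample_index, rate=16000):
    return sample_index / rate

print(sample_to_seconds(17280))  # 1.08: where the utterance starts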

Speech Recognition LIACS Media Lab Leiden University RES: The Sound Files 4Y0021SS.WAV 16 kHz, signed 16-bits, mono sound files 4Y0021SS.PHN annotated phoneme representation 4Y0021SX.SRO “ which airlines. depart from boston” 4Y0021SS.LSN “ WHICH AIRLINES DEPART FROM BOSTON” 4Y0021SS.FTS FeaturesFile: File=..\..\..\sndfile\4y0021ss.fts window_lenght=512 window_overlap=352 preemphasis_and_hamming_window: preemphasis=0.95 mfcc_with_energy: num_features=12 compute_energy=yes compute_log_of_energy=yes feature_dim= 13 feature_first_byte= 1024 feature_n_bytes= 8 feature_byte_format= 01 end_head

Speech Recognition LIACS Media Lab Leiden University RES: The Source Code baseclas_polymorf –Tests the class implementing polymorphism. The class is used to implement “drivers” that handle different databases or different DSP operations. baseclas_testbase –Tests the classes handling memory and strings. The class handling memory is the root class from which all the other classes are derived. Also diagnostics is tested. Ioclass –Tests the class that retrieves data from speech databases. Feature –Tests the class that performs feature extraction. This class is designed to perform arbitrary sequences of digital signal processing on the input sequence according to the configuration file. Resconf –This project tests the class that handles configuration services.

Speech Recognition LIACS Media Lab Leiden University RES: The Source Code utils –This project shows a simple program that performs arbitrary sequences of operations on a list of files according to the configuration file. The implemented operations are utilities for conversion from MS-DOS to Unix. Vetclas –This project shows and tests the mathematical operations over vectors, diagonal matrices and full matrices.

Speech Recognition LIACS Media Lab Leiden University RES: The Source Code Projects related to programs required for speech recognition Print_feature –This project writes features of each single sound file. This is useful to avoid recomputing features in the embedded training procedure. endpoint_feature –This project does the same as Print_feature but eliminates silences. Print_phon_feature –This project writes features of the required files where all the same phonemes of all the files are collected in one file, i.e. one output feature file for each phoneme. This is required for non-embedded training.

Speech Recognition LIACS Media Lab Leiden University RES: The Source Code Projects related to programs required for speech recognition Initiali –This project initializes the HMM models. HMM model parameters are evaluated according to a clustering procedure training –This project re-estimates HMM models phoneme per phoneme using the Baum–Welch algorithm. The bounds of each phoneme within the utterances are required, i.e. segmentation of all the training speech data. Embedded –This project re-estimates HMM models per utterance using the Baum–Welch algorithm. Segmentation is not required.

Speech Recognition LIACS Media Lab Leiden University RES: The Source Code Projects related to programs required for speech recognition lessico –This project estimates language model parameters according to various algorithms. Recog –This project performs phoneme/word recognition. Segmen –This project performs phonetic segmentation. eval_rec –This project evaluates accuracy of word/phoneme recognition. eval_segm –This project evaluates accuracy of segmentation.

Speech Recognition LIACS Media Lab Leiden University RES Modules Common BaseClasses Configuration and Specification Speech Database, I/O Feature Extraction HMM Initialisation and Training Language Models Recognition: Searching Strategies Evaluators

Speech Recognition LIACS Media Lab Leiden University RES Modules Common BaseClasses Configuration and Specification Speech Database, I/O Feature ExtractionRecognition: Searching Strategies Evaluators Language Models HMM Initialisation and Training

Speech Recognition LIACS Media Lab Leiden University RES Modules: Files baseclas baseclas.cpp Baseclas.h Baseclas.hpp Boolean.h Compatib.h Defopt.h Diagnost.cpp Diagnost.h Polymorf.cpp Polymorf.h Polytest.cpp Testbase.cpp Textclas.cpp Textclas.h Embedded Emb_b_w.cpp Emb_b_w.h Emb_Train.cpp Vetclas Arraycla.cpp Arraycla.h Arraycla.hpp Diagclas.cpp Diagclas.h Diagclas.hpp Testvet.cpp Vetclas.cpp Vetclas.h Vetclas.hpp eval_rec evalopt.cpp evalopt.h Evaluate.cpp Evaluate.h eval_rec.cpp eval_segm eval.cpp eval.h main_eval.cpp Features DSPPROC.CPP endpoint.cpp Feature.cpp Feature.h mean_feature.cpp print_file_feat.cpp print_ph_feat.cpp Test_feature.cpp Initiali Iniopt.cpp Iniopt.h Initiali.cpp Initiali.h Proiniti.cpp labelcl.cpp labelcl.h Soundfil.cpp ioclass Soundfil.h Soundlab.cpp Soundlab.h TESTIONE.CPP Test_MsWav.cpp Lessico lessico.cpp lessico.h lexopt.cpp lexopt.h main_lessico.c pp Recog hypolist.cpp Hypolist.h Hypolist.hpp recog.cpp recopt.cpp recopt.h resconf resconf.cpp Resconf.h TESTCONF.CPP Segment Hypolist.cpp Hypolist.h hypolist.hpp hypolistseg.cpp Segment.cpp Segopt.cpp Segopt.h Training Baumwelc.cpp Baumwelc.h Protrain.cpp tspecmod testtspecbase.cpp Tspecbas.cpp Tspecbas.h Tspecbas.hpp utils multifop.cpp multifop.h

Speech Recognition LIACS Media Lab Leiden University RES Modules: Files baseclas baseclas.cpp Baseclas.h Baseclas.hpp Boolean.h Compatib.h Defopt.h Diagnost.cpp Diagnost.h Polymorf.cpp Polymorf.h Polytest.cpp Testbase.cpp Textclas.cpp Textclas.h Embedded Emb_b_w.cpp Emb_b_w.h Emb_Train.cpp Vetclas Arraycla.cpp Arraycla.h Arraycla.hpp Diagclas.cpp Diagclas.h Diagclas.hpp Testvet.cpp Vetclas.cpp Vetclas.h Vetclas.hpp eval_rec evalopt.cpp evalopt.h Evaluate.cpp Evaluate.h eval_rec.cpp eval_segm eval.cpp eval.h main_eval.cpp Features DSPPROC.CPP endpoint.cpp Feature.cpp Feature.h mean_feature.cpp print_file_feat.cpp print_ph_feat.cpp Test_feature.cpp Initiali Iniopt.cpp Iniopt.h Initiali.cpp Initiali.h Proiniti.cpp labelcl.cpp labelcl.h Soundfil.cpp ioclass Soundfil.h Soundlab.cpp Soundlab.h TESTIONE.CPP Test_MsWav.cpp Lessico lessico.cpp lessico.h lexopt.cpp lexopt.h main_lessico.cpp Recog hypolist.cpp Hypolist.h Hypolist.hpp recog.cpp recopt.cpp recopt.h resconf resconf.cpp Resconf.h TESTCONF.CPP Segment Hypolist.cpp Hypolist.h hypolist.hpp hypolistseg.cpp Segment.cpp Segopt.cpp Segopt.h Training Baumwelc.cpp Baumwelc.h Protrain.cpp tspecmod testtspecbase.cpp Tspecbas.cpp Tspecbas.h Tspecbas.hpp utils multifop.cpp multifop.h

Speech Recognition LIACS Media Lab Leiden University RES: The Examples Test_me/Phoneme/Start_me.bat: “recog res.ini eval_rec res.ini” - The output here is the phoneme recognition. On a 2GHz machine it takes 7 seconds for 3 sentences. Test_me/Word_Rec/Start_me.bat –This test shows an example of word recognition with RES –The file recog.sol contains the recognized sentence, –the file recog.rsl is the true sentence –and result.txt is the result in term of accuracy and percent correct –The recognition module is many times slower than real-time on this notebook, on a 2GHz machine the small example still takes 30 seconds

Speech Recognition LIACS Media Lab Leiden University RES Compiling with MS Visual C++ Building the Executables Goto the directory “RES\Projects\projectMS” Double-click RES.dsw (Click yes, if it wants to convert to a workspace of the current version of MS Visual C++) Goto the MS Visual C++ menu-item Select the items you want to build. Select Left-click the -button. Test_me Again Now the directories: \eval_rec and \recog contain the newly built executables “eval_rec.exe” and “recog.exe”, respectively, that can replace the executables in the directory “\Test_me\PHONEME” Then, by executing “Start_me.bat” you can run the examples with the newly built executable.