Published by William Richards. Modified over 9 years ago.
1
Speech Processing and Recognition © Florida Institute of Technology
Access audio data in real time and apply to speech recognition
Final Exam Project
By Hesheng Li
Instructor: Dr. Kepuska
Department of Electrical and Computer Engineering
2
Overview
- Introduction
- Three models to access live audio data
- How to get audio data by using the low-level API model
- Application in speech recognition
- Comparison and analysis
- Conclusion
3
Introduction
Why? How?
Live audio data access has a wide range of applications.
4
Three models to access live audio data
- High-level digital audio API: MCI
- DirectSound
- Low-level digital audio API: WaveX
5
High-level digital audio API: MCI
The Media Control Interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files.
There are two ways to send a command to a device:
1. Command message interface
2. Command string interface
6
Command message interface
Passing binary values and structures to an audio device is referred to as using the "command message interface". We use the function mciSendCommand() to send commands with this approach.
Example
    /* MCI_OPEN_TYPE_ID means lpstrDeviceType holds a device-type constant, not a string */
    waveParams.lpstrDeviceType = (LPCSTR)MCI_DEVTYPE_WAVEFORM_AUDIO;
    waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV";
    /* The last two parameters are DWORD_PTR, so the pointer cast is also safe on 64-bit Windows */
    mciSendCommand(0, MCI_OPEN,
                   MCI_WAIT|MCI_OPEN_ELEMENT|MCI_OPEN_TYPE|MCI_OPEN_TYPE_ID,
                   (DWORD_PTR)(LPVOID)&waveParams);
7
Command string interface
Passing strings to an audio device is referred to as using the "command string interface". We use the function mciSendString() to send commands with this approach.
Example
    mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", 0, 0, 0);
8
MCI
Some other commands:
Command message interface:
1. Start recording with MCI_RECORD
2. Write data to a wave file with MCI_SAVE
3. Stop with MCI_STOP
4. Play with MCI_PLAY
Command string interface:
1. Play with "play %s %s %s"
2. Stop with "stop %s %s %s"
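The command strings above are ordinary text, so they can be assembled with standard C string formatting before being handed to mciSendString(). The following is a minimal, portable sketch of that step only; it does not call the Windows API. The helper names build_open_command() and build_alias_command() are hypothetical and introduced purely for illustration, and the "verb alias" shape of the second command is an assumption based on the alias opened in the earlier example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of MCI): formats an "open" command string
   in the same shape as the slide's mciSendString() example.
   Returns 0 on success, -1 if the command does not fit in out. */
int build_open_command(char *out, size_t out_size,
                       const char *path, const char *alias)
{
    int n;
    assert(out != NULL && path != NULL && alias != NULL);
    n = snprintf(out, out_size, "open %s type waveaudio alias %s", path, alias);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: formats "<verb> <alias>", for example "play A_Chord".
   Returns 0 on success, -1 if the command does not fit in out. */
int build_alias_command(char *out, size_t out_size,
                        const char *verb, const char *alias)
{
    int n;
    assert(out != NULL && verb != NULL && alias != NULL);
    n = snprintf(out, out_size, "%s %s", verb, alias);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

On Windows, the resulting buffer would be passed as the first argument of mciSendString(). Checking the return value first avoids sending a truncated command.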
9
DirectSound
Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way. Here are some other things that DirectSound makes easy:
- Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration
- Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound
- Low-latency mixing of audio streams for rapid response
- Implementing three-dimensional (3-D) sound
10
DirectSound
DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers. DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.
11
Low-level digital audio API: WaveX
1. Open audio device
2. Prepare structure for recording
3. Start recording
4. Data processing
5. Release structure
6. Close audio device
12
Open Audio Device
There are several different approaches you can take, depending upon how fancy and flexible you want your program to be.
1. Pass the value WAVE_MAPPER to open the "preferred" audio input/output device.
2. Call a function to get the list of devices, then open the audio device you want.
3. In both cases, the device is opened with waveInOpen() or waveOutOpen().
13
EXAMPLE
    /* Open the preferred input device; messages go to the window myWindow */
    result = waveInOpen(&outHandle, WAVE_MAPPER, &waveFormat,
                        (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
    if (result)
    {
        printf("There was an error opening the preferred Digital Audio In device!\r\n");
    }
14
EXAMPLE
    /* List the input devices; wic is a WAVEINCAPS structure */
    iNumDevs = waveInGetNumDevs();
    for (i = 0; i < iNumDevs; i++)
    {
        /* Use the input variant so it matches waveInGetNumDevs() */
        if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS)))
        {
            printf("Device ID #%u: %s\r\n", i, wic.szPname);
        }
    }
    /* Valid device IDs run from 0 to iNumDevs - 1; here the first device (ID 0) is opened */
    result = waveInOpen(&outHandle, 0, &waveFormat,
                        (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
15
Structure WAVEFORMATEX
Field             Description
wFormatTag        Format type: PCM, mu-law, A-law
nChannels         Mono or stereo
nSamplesPerSec    Sample rate, e.g. 8000 Hz
nAvgBytesPerSec   Average data-transfer rate
nBlockAlign       Minimum atomic unit of data
wBitsPerSample    8 or 16 bits per sample
cbSize            Size of extra format information
16
Example
    WAVEFORMATEX waveFormat;
    /* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */
    waveFormat.wFormatTag = WAVE_FORMAT_PCM;
    waveFormat.nChannels = 2;
    waveFormat.nSamplesPerSec = 44100;
    waveFormat.wBitsPerSample = 16;
    /* Bytes per sample block: one sample for every channel */
    waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample/8);
    /* Bytes per second of audio */
    waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
    /* No extra format information for plain PCM */
    waveFormat.cbSize = 0;
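The two derived fields in the example follow directly from the other three, which is easy to check with plain arithmetic. The sketch below is portable C that does not include the Windows headers: WaveFormatSketch is a hypothetical stand-in that only mirrors the WAVEFORMATEX field names, and make_pcm_format() is an illustrative helper, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the Win32 WAVEFORMATEX structure (assumption: same
   field names as on the slide, fixed-width types chosen for this sketch). */
typedef struct {
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
} WaveFormatSketch;

#define SKETCH_FORMAT_PCM 1 /* WAVE_FORMAT_PCM has the value 1 in the Win32 headers */

/* Hypothetical helper: fills in a PCM format and derives the dependent fields. */
WaveFormatSketch make_pcm_format(uint16_t channels, uint32_t sample_rate,
                                 uint16_t bits_per_sample)
{
    WaveFormatSketch f;
    assert(channels > 0 && bits_per_sample % 8 == 0);
    f.wFormatTag = SKETCH_FORMAT_PCM;
    f.nChannels = channels;
    f.nSamplesPerSec = sample_rate;
    f.wBitsPerSample = bits_per_sample;
    /* One block holds one sample for every channel. */
    f.nBlockAlign = (uint16_t)(channels * (bits_per_sample / 8));
    /* Bytes per second = blocks per second * bytes per block. */
    f.nAvgBytesPerSec = sample_rate * f.nBlockAlign;
    f.cbSize = 0; /* plain PCM carries no extra format information */
    return f;
}
```

For the slide's 16-bit, 44.1 kHz, stereo case this gives a block of 4 bytes and a data rate of 176400 bytes per second. For 8-bit, 8000 Hz, mono it gives 1 byte and 8000 bytes per second.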
17
Recording engine
[Diagram: buffer1, buffer2, buffer3, buffer4; audio device; msg; callback function; data processing; AddInBuffer(); waveInStart()]
18
Recording engine
[Diagram: circular buffer of buffer2, buffer3, buffer4, buffer1; audio device; msg; callback function; data processing]
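The circular arrangement can be modeled without any audio hardware: the device always fills the buffer at the head of the queue, and the application puts each buffer back at the tail once it has processed it. The following is a minimal sketch of that rotation in portable C. BufferRing, ring_take_filled() and ring_requeue() are hypothetical names for illustration; in a real WaveX program the re-queue step would correspond to calling waveInAddBuffer() again.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4

/* Hypothetical model of the ring: queue[] holds buffer indices in the order
   in which the device will fill them. */
typedef struct {
    int queue[NUM_BUFFERS];
    size_t head;  /* position of the next buffer the device fills */
    size_t count; /* number of buffers currently queued to the device */
} BufferRing;

/* Queue all buffers to the device, in order 0..NUM_BUFFERS-1. */
void ring_init(BufferRing *r)
{
    size_t i;
    assert(r != NULL);
    for (i = 0; i < NUM_BUFFERS; i++) {
        r->queue[i] = (int)i;
    }
    r->head = 0;
    r->count = NUM_BUFFERS;
}

/* Device side: returns the index of the buffer that was just filled,
   or -1 if no buffer was queued (incoming audio would be lost). */
int ring_take_filled(BufferRing *r)
{
    int idx;
    assert(r != NULL);
    if (r->count == 0) {
        return -1;
    }
    idx = r->queue[r->head];
    r->head = (r->head + 1) % NUM_BUFFERS;
    r->count--;
    return idx;
}

/* Application side: after processing, put the buffer back at the tail. */
void ring_requeue(BufferRing *r, int idx)
{
    size_t tail;
    assert(r != NULL && r->count < NUM_BUFFERS);
    tail = (r->head + r->count) % NUM_BUFFERS;
    r->queue[tail] = idx;
    r->count++;
}
```

As long as every filled buffer is re-queued before the other three run out, the device always has somewhere to write. If processing falls behind, ring_take_filled() returns -1, which models dropped audio.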
19
1+3+1
Three important methods:
1. Prepare a buffer for wave-audio input. Function: waveInPrepareHeader()
2. Send the buffer to the audio device; when the buffer is full, the application is notified. Function: waveInAddBuffer()
3. Start recording. Function: waveInStart()
20
Example
    /* Prepare the header, hand the buffer to the device, then start recording */
    if (MMSYSERR_NOERROR != waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR)))
    {
        printf("prepare buffer failure!");
    }
    waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR));
    waveInStart(m_hWaveIn);
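Before a header such as waveheader can be prepared, its lpData and dwBufferLength fields must describe a buffer, and the slides do not say how large that buffer should be. One plausible way to size it, sketched here as an assumption rather than the presentation's method, is to pick a duration in milliseconds and round down to a whole number of sample blocks. The helper name buffer_bytes_for_ms() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to hold duration_ms of audio,
   rounded down to a whole number of sample blocks so that no sample is split.
   avg_bytes_per_sec and block_align correspond to the nAvgBytesPerSec and
   nBlockAlign fields of the wave format. */
uint32_t buffer_bytes_for_ms(uint32_t avg_bytes_per_sec, uint16_t block_align,
                             uint32_t duration_ms)
{
    uint64_t bytes;
    assert(block_align > 0);
    /* 64-bit intermediate avoids overflow for long durations. */
    bytes = (uint64_t)avg_bytes_per_sec * duration_ms / 1000u;
    bytes -= bytes % block_align;
    return (uint32_t)bytes;
}
```

Shorter buffers mean the application is notified more often and sees the audio sooner, at the cost of more callbacks per second. Longer buffers do the opposite.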
21
Message
Windows messages:
- MM_WIM_DATA: this message is sent to a window when data is present in the buffer and the buffer is being returned to the application
- Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN
Callback function messages:
- WIM_DATA: this message is sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application
- Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN
22
Message Example
Callback function message:
    waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
               (DWORD_PTR)waveInProc, 0L, CALLBACK_FUNCTION);

    waveInProc(…..)
    {
        switch(msg)
        {
            case WIM_OPEN:
                ………….
                break;
            case WIM_DATA:
                ………….
                break;
            case WIM_CLOSE:
                …………
                break;
        }
    }
Window message:
    waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
               (DWORD_PTR)hWnd, 0L, CALLBACK_WINDOW);
23
Application in Real-time Key Word Recognition
To be continued….
24
Application in Real-time Key Word Recognition
Practical problems when we apply this model to speech recognition:
1. Asynchronism
2. Efficiency
25
Application in Real-time Key Word Recognition
[Diagram: buffer1, buffer2, buffer3, buffer4, buffer500, ….; msg; callback function; CALL; data processing]
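One common way to handle the asynchrony between the audio callback and the recognizer is to have the callback copy incoming bytes into a queue and let the processing code pull fixed-size frames from it at its own pace. The slides do not give the details of their scheme, so the following portable C sketch is an assumption: SampleQueue, queue_push() and queue_pop_frame() are hypothetical names, the capacity is arbitrary, and the code is single-threaded for clarity (a real callback and consumer would need synchronization).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAPACITY 1024 /* arbitrary size chosen for this sketch */

/* Hypothetical byte queue sitting between the callback and the recognizer. */
typedef struct {
    unsigned char data[QUEUE_CAPACITY];
    size_t read_pos;
    size_t write_pos;
    size_t used;    /* bytes waiting to be processed */
    size_t dropped; /* bytes discarded because the queue was full */
} SampleQueue;

void queue_init(SampleQueue *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof *q);
}

/* Callback side: copy as much as fits and count the rest as dropped.
   Returns the number of bytes actually stored. */
size_t queue_push(SampleQueue *q, const unsigned char *src, size_t n)
{
    size_t free_space, to_copy, i;
    assert(q != NULL && src != NULL);
    free_space = QUEUE_CAPACITY - q->used;
    to_copy = n < free_space ? n : free_space;
    for (i = 0; i < to_copy; i++) {
        q->data[q->write_pos] = src[i];
        q->write_pos = (q->write_pos + 1) % QUEUE_CAPACITY;
    }
    q->used += to_copy;
    q->dropped += n - to_copy;
    return to_copy;
}

/* Processing side: copy out one frame only when a whole frame is available.
   Returns 1 if a frame was written to dst, 0 otherwise. */
int queue_pop_frame(SampleQueue *q, unsigned char *dst, size_t frame_size)
{
    size_t i;
    assert(q != NULL && dst != NULL);
    if (q->used < frame_size) {
        return 0;
    }
    for (i = 0; i < frame_size; i++) {
        dst[i] = q->data[q->read_pos];
        q->read_pos = (q->read_pos + 1) % QUEUE_CAPACITY;
    }
    q->used -= frame_size;
    return 1;
}
```

The dropped counter makes the efficiency problem visible: if it grows, the recognizer is consuming frames more slowly than the device produces them.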
26
Comparison and Analysis
- MCI is the easiest model and very convenient, but it offers the least control ("file level").
- WaveX is more complicated, but it gives flexible control over the audio data ("buffer level").
- DirectSound is the most efficient method, but also the most complicated ("buffer level").
27
Conclusion
- Apply MCI to the audio document part of a "video conference".
- Apply WaveX to real-time speech recognition and also to "video conference".
- DirectSound is widely used in computer game design.