Speech Processing and Recognition © Florida Institute of Technology Access audio data in real time and apply to speech recognition Final Exam Project By.

Presentation transcript:

Speech Processing and Recognition © Florida Institute of Technology
Access audio data in real time and apply to speech recognition
Final Exam Project
By Hesheng Li
Instructor: Dr. Kepuska
Department of Electrical and Computer Engineering

Overview
- Introduction
- Three models to access live audio data
- How to get audio data using the low-level API model
- Application in speech recognition
- Comparison and analysis
- Conclusion

Introduction
- Why? How?
- Live audio data access has a wide range of applications.

Three models to access live audio data
- High-level digital audio API: MCI
- DirectSound
- Low-level digital audio API: WaveX (the waveIn*/waveOut* functions)

High-level digital audio API: MCI
- The Media Control Interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files.
- There are two ways to send a command to a device:
  1. Command-message interface
  2. Command-string interface

Command-message interface
- Passing binary values and structures to an audio device is referred to as using the "command-message interface".
- The function mciSendCommand() sends commands using this approach.
- Example:

    MCI_OPEN_PARMS waveParams = {0};  /* assumed type; the slide does not show the declaration */
    /* MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID requires the device type to be given as an ID */
    waveParams.lpstrDeviceType = (LPCSTR)MCI_DEVTYPE_WAVEFORM_AUDIO;
    waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV";
    mciSendCommand(0, MCI_OPEN,
                   MCI_WAIT | MCI_OPEN_ELEMENT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID,
                   (DWORD_PTR)(LPVOID)&waveParams);

Command-string interface
- Passing strings to an audio device is referred to as using the "command-string interface".
- The function mciSendString() sends commands using this approach.
- Example:

    mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", 0, 0, 0);
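Because the command-string interface takes ordinary text, the command can be assembled with standard string formatting before it is handed to mciSendString(). The following is a minimal, portable sketch under that assumption. build_mci_open() is a hypothetical helper introduced only for illustration; it is not part of MCI, and it only builds the string without opening any device.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of MCI): formats an "open" command string
 * in the same shape as the slide's example.
 * Returns 0 on success, -1 if formatting fails or the buffer is too small.
 */
int build_mci_open(char *out, size_t out_size,
                   const char *path, const char *alias)
{
    int n = snprintf(out, out_size,
                     "open %s type waveaudio alias %s", path, alias);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

On Windows, the resulting buffer would be passed as the first argument of mciSendString(), in place of the string literal in the slide's example.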

MCI: some other commands
- Command-message interface:
  1. Start recording with MCI_RECORD
  2. Write data to a wave file with MCI_SAVE
  3. Stop with MCI_STOP
  4. Play with MCI_PLAY
- Command-string interface:
  1. Play with "play %s %s %s"
  2. Stop with "stop %s %s %s"

DirectSound
- Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way.
- DirectSound also makes the following easy:
  - Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration
  - Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound
  - Low-latency mixing of audio streams for rapid response
  - Implementing three-dimensional (3-D) sound

DirectSound
- DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers.
- DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.

Low-level digital audio API: WaveX
1. Open audio device
2. Prepare structure for recording
3. Start recording
4. Data processing
5. Release structure
6. Close audio device

Open audio device
- There are several approaches you can take, depending on how fancy and flexible you want your program to be:
  - Pass the value WAVE_MAPPER to open the preferred audio input/output device.
  - Call a function to get the list of devices, then open the audio device you want.
- Functions: waveInOpen() and waveOutOpen()

Example: opening the preferred input device

    result = waveInOpen(&outHandle, WAVE_MAPPER, &waveFormat,
                        (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
    if (result)
    {
        printf("There was an error opening the preferred Digital Audio In device!\r\n");
    }

Example: listing the input devices and opening one by ID

    iNumDevs = waveInGetNumDevs();
    for (i = 0; i < iNumDevs; i++)
    {
        /* Query input devices with the waveIn call, to match waveInGetNumDevs() */
        if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS)))
        {
            printf("Device ID #%u: %s\r\n", i, wic.szPname);
        }
    }
    /* Valid device IDs run from 0 to iNumDevs - 1; device 0 is used here for illustration */
    result = waveInOpen(&outHandle, 0, &waveFormat,
                        (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);

Structure WAVEFORMATEX

    Field             Meaning
    wFormatTag        Format type: PCM, mu-law, A-law
    nChannels         Mono, stereo
    nSamplesPerSec    Sample rate, e.g. 8000 Hz
    nAvgBytesPerSec   Average data-transfer rate
    nBlockAlign       Minimum atomic unit of data
    wBitsPerSample    8 bits or 16 bits per sample
    cbSize            Size of the extra format information

Example

    WAVEFORMATEX waveFormat;
    /* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */
    waveFormat.wFormatTag = WAVE_FORMAT_PCM;
    waveFormat.nChannels = 2;
    waveFormat.nSamplesPerSec = 44100;
    waveFormat.wBitsPerSample = 16;
    waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample / 8);
    waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
    waveFormat.cbSize = 0;
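The two derived fields in the example follow directly from the other three, which can be checked with a small portable sketch. PcmFormat and make_pcm_format() are hypothetical stand-ins used only so the arithmetic can run without <windows.h>; real code would fill a WAVEFORMATEX as shown on the slide.

```c
#include <stdint.h>

/* Hypothetical stand-in for the PCM-related WAVEFORMATEX fields. */
typedef struct {
    uint16_t channels;           /* nChannels       */
    uint32_t samples_per_sec;    /* nSamplesPerSec  */
    uint16_t bits_per_sample;    /* wBitsPerSample  */
    uint16_t block_align;        /* nBlockAlign     */
    uint32_t avg_bytes_per_sec;  /* nAvgBytesPerSec */
} PcmFormat;

/* Fills in the derived fields the same way the slide's example does. */
PcmFormat make_pcm_format(uint16_t channels,
                          uint32_t samples_per_sec,
                          uint16_t bits_per_sample)
{
    PcmFormat f;
    f.channels = channels;
    f.samples_per_sec = samples_per_sec;
    f.bits_per_sample = bits_per_sample;
    /* One block holds one sample for every channel. */
    f.block_align = (uint16_t)(channels * (bits_per_sample / 8));
    /* Bytes per second = blocks per second * bytes per block. */
    f.avg_bytes_per_sec = samples_per_sec * f.block_align;
    return f;
}
```

For the slide's format (stereo, 16-bit, 44100 Hz) this gives a block of 2 × 2 = 4 bytes and a data rate of 44100 × 4 = 176400 bytes per second.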

Recording engine
[Diagram. Elements shown: buffer1, buffer2, buffer3, buffer4; AddInBuffer(); waveInStart(); audio device; msg; callback function; data processing.]

Recording engine
[Diagram. Elements shown: the buffers in the order buffer2, buffer3, buffer4, buffer1, labeled as a circular buffer; audio device; msg; callback function; data processing.]
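The circular arrangement of buffers can be illustrated with a small simulation that does not touch the audio API. This is only a sketch under stated assumptions: Ring, ring_init(), ring_feed(), and on_buffer_full() are hypothetical names, the "device" is a plain function that copies samples, and the "data processing" step merely counts samples.

```c
#include <stddef.h>

#define NUM_BUFFERS 4
#define BUFFER_LEN  8

typedef struct {
    short  data[NUM_BUFFERS][BUFFER_LEN]; /* the capture buffers             */
    size_t next;        /* buffer the simulated device is filling            */
    size_t fill;        /* samples already written into that buffer          */
    size_t last_full;   /* index of the buffer most recently handed over     */
    long   processed;   /* total samples seen by the processing step         */
} Ring;

void ring_init(Ring *r)
{
    r->next = 0;
    r->fill = 0;
    r->last_full = 0;
    r->processed = 0;
}

/* Stand-in for the callback plus data processing: it only counts samples. */
static void on_buffer_full(Ring *r, size_t index)
{
    r->last_full = index;
    r->processed += BUFFER_LEN;
}

/* Simulated device: copies samples into the current buffer, hands each
 * full buffer to the callback, then moves on to the next buffer and
 * wraps around after the last one. */
void ring_feed(Ring *r, const short *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        r->data[r->next][r->fill++] = src[i];
        if (r->fill == BUFFER_LEN) {
            on_buffer_full(r, r->next);
            r->fill = 0;
            r->next = (r->next + 1) % NUM_BUFFERS;
        }
    }
}
```

After the fourth buffer fills, the simulated device starts overwriting the first one again, so the processing step has to finish with a buffer before the device comes back around to it.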

Three important methods
- Prepare a buffer for wave-audio input: waveInPrepareHeader()
- Send the buffer to the audio device; the application is notified when the buffer is full: waveInAddBuffer()
- Start recording: waveInStart()

Example

    if (MMSYSERR_NOERROR != waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR)))
    {
        printf("prepare buffer failure!");
    }
    waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR));
    waveInStart(m_hWaveIn);

Messages
- Window messages:
  - MM_WIM_DATA: sent to a window when data is present in the input buffer and the buffer is being returned to the application.
  - Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN
- Callback function messages:
  - WIM_DATA: sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application.
  - Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN

Message example
- Callback function:

    waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
               (DWORD_PTR)waveInProc, 0L, CALLBACK_FUNCTION);

    waveInProc(…..)
    {
        switch (msg)
        {
        case WIM_OPEN:
            ………….
            break;
        case WIM_DATA:
            ………….
            break;
        case WIM_CLOSE:
            …………
            break;
        }
    }

- Window message:

    waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
               (DWORD_PTR)hWnd, 0L, CALLBACK_WINDOW);
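The message dispatch in the callback can be tried out without an audio device. The sketch below is a simulation under stated assumptions: SimWaveInMsg, CaptureState, and sim_wave_in_proc() are hypothetical names, the enum values merely stand in for WIM_OPEN, WIM_DATA, and WIM_CLOSE from <mmsystem.h>, and the real callback receives additional parameters (device handle, instance data, and two message parameters) that are left out here.

```c
/* Hypothetical stand-ins for WIM_OPEN / WIM_DATA / WIM_CLOSE. */
typedef enum {
    SIM_WIM_OPEN,
    SIM_WIM_DATA,
    SIM_WIM_CLOSE
} SimWaveInMsg;

typedef struct {
    int      is_open;           /* nonzero between OPEN and CLOSE    */
    unsigned buffers_received;  /* DATA messages seen while open     */
} CaptureState;

/* Simplified callback: reacts to each message type in turn. */
void sim_wave_in_proc(CaptureState *state, SimWaveInMsg msg)
{
    switch (msg) {
    case SIM_WIM_OPEN:
        state->is_open = 1;
        break;
    case SIM_WIM_DATA:
        /* A filled buffer is only meaningful while the device is open. */
        if (state->is_open)
            state->buffers_received++;
        break;
    case SIM_WIM_CLOSE:
        state->is_open = 0;
        break;
    }
}
```

In a real program the WIM_DATA branch is where the returned buffer would be handed to the data-processing step and then queued again for recording.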

Application in real-time keyword recognition
To be continued…

Application in real-time keyword recognition
- Practical problems when applying this model to speech recognition:
  1. Asynchronism
  2. Efficiency

Application in real-time keyword recognition
[Diagram. Elements shown: buffer1, buffer2, buffer3, buffer4, …, buffer500; msg; callback function; CALL; data processing.]
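One common way to deal with the asynchronism between a fast audio callback and slower recognition code is to let the callback only record which buffer is ready and to do the processing elsewhere. The slides do not say this is the approach taken, so the following is a hedged sketch, not the author's design. BufferQueue and its functions are hypothetical, the capacity of 500 simply mirrors the buffer500 label in the diagram, and the code is single-threaded; a real program would need synchronization between the callback and the processing loop.

```c
#include <stddef.h>

#define QUEUE_CAPACITY 500  /* mirrors the "buffer500" label; an assumption */

typedef struct {
    size_t items[QUEUE_CAPACITY]; /* indices of buffers ready for processing */
    size_t head;                  /* position of the oldest queued entry     */
    size_t count;                 /* number of queued entries                */
    size_t dropped;               /* entries rejected because it was full    */
} BufferQueue;

void queue_init(BufferQueue *q)
{
    q->head = 0;
    q->count = 0;
    q->dropped = 0;
}

/* Callback side: note that a buffer is ready. Returns 1 if queued,
 * 0 if the queue was full and the entry was dropped. */
int queue_push(BufferQueue *q, size_t buffer_index)
{
    if (q->count == QUEUE_CAPACITY) {
        q->dropped++;
        return 0;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = buffer_index;
    q->count++;
    return 1;
}

/* Processing side: fetch the oldest ready buffer. Returns 1 and stores
 * the index in *buffer_index, or returns 0 if nothing is waiting. */
int queue_pop(BufferQueue *q, size_t *buffer_index)
{
    if (q->count == 0)
        return 0;
    *buffer_index = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return 1;
}
```

The dropped counter gives a simple efficiency check: if it ever becomes nonzero, the processing step is not keeping up with the rate at which the device returns buffers.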

Comparison and analysis
- MCI is the easiest model and very convenient, but it offers the least control ("file level").
- WaveX is more complicated, but gives flexible control over the audio data ("buffer level").
- DirectSound is the most efficient method, but also the most complicated ("buffer level").

Conclusion
- Apply MCI to the audio document part of "video conference".
- Apply WaveX to real-time speech recognition and also to "video conference".
- DirectSound is widely used in computer game design.