Project 1 Speech Detection Due: Sunday, February 1st, 11:59pm.


Outline Motivation Problem Statement Details Hints Grading Turn in

Motivation
Speech recognition detects word boundaries in raw audio data
“Silence Is Golden”

Motivation
Recognizing silence can reduce:
– Network bandwidth
– Processing load
Easy in a soundproof room, with digitized tape
– Measure energy level in digitized voice
What about elsewhere?

Speech in the Presence of Noise
[Figure: speech corrupted by additive background noise at decreasing SNRs. Panels: clean, SNR = 5 dB, SNR = -5 dB] [RGS07]

Speech in the Presence of Noise
[Figure: energy (dB)] [RGS07]

Research Problem
Noisy rooms with background noise make some edges difficult
[Figure: the word “Five”]

Research Problem
Computer audio is often used for interactive applications
– Voice commands
– Teleconferencing (Voice over IP or VoIP)
-> Needs to be done in ‘real-time’

Project
Implementation in Linux or Windows (or Cygwin)
Implement end-point algorithm by Rabiner and Sambur [RS75]
– (Paper for class, next)
Embed in “record” utility
– data from microphone
Build “play” utility for testing
– data to speakers
Basis for VoIP
– (Project 2)

Details - Record
Voice-quality:
– 8000 samples/second
– 8 bits per sample
– One channel
Record sound, write files:
– sound.data: audio data (values from 0 to 255) -> can graph
– sound.raw: raw audio data with all sound -> can play
– speech.raw: raw audio data without silence -> can play
– energy.data: energy, one row per frame -> can graph
– zero.data: zero crossings, one row per frame -> can graph
Other features allowed
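
The per-frame values written to energy.data and zero.data could be computed along these lines. This is a minimal sketch, assuming unsigned 8-bit samples centered on 128 and energy taken as the sum of sample magnitudes around that midpoint. The function names and the PCM8_MIDPOINT constant are illustrative and not part of the assignment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: unsigned 8-bit PCM, so silence sits near the midpoint 128. */
#define PCM8_MIDPOINT 128

/* Energy of one frame: sum of sample magnitudes around the midpoint. */
long frame_energy(const unsigned char *samples, size_t n)
{
    long energy = 0;
    for (size_t i = 0; i < n; i++)
        energy += labs((long)samples[i] - PCM8_MIDPOINT);
    return energy;
}

/* Zero crossings of one frame: count of sign changes between
   consecutive samples, with the midpoint treated as zero. */
int frame_zero_crossings(const unsigned char *samples, size_t n)
{
    int crossings = 0;
    for (size_t i = 1; i < n; i++) {
        int prev_positive = samples[i - 1] >= PCM8_MIDPOINT;
        int curr_positive = samples[i] >= PCM8_MIDPOINT;
        if (prev_positive != curr_positive)
            crossings++;
    }
    return crossings;
}
```

At 8000 samples/second, an 80-sample frame would cover 10 ms; the frame length itself is a choice left to the implementation.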

Details - Play
Plays sound file
Repeat until file empty
E.g.,
    play sound.raw
    play speech.raw
Other features allowed

Sound in Windows
Microsoft Visual C++
– See Web page for hints
Use sound device -> WAVEFORMATEX struct:
– wFormatTag: set to WAVE_FORMAT_PCM
– nChannels, nSamplesPerSec, wBitsPerSample: set to voice-quality audio settings
– nBlockAlign: set to number of channels times number of bytes per sample
– nAvgBytesPerSec: set to number of samples per second times the nBlockAlign value
– cbSize: set this to zero
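
The derived fields can be checked with a little arithmetic. The sketch below uses a hypothetical stand-in struct, voice_format_t, that mirrors only the WAVEFORMATEX field names listed above, so it compiles without the Windows headers; in a real program the same assignments would go into an actual WAVEFORMATEX variable. The helper name make_voice_format is also an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in mirroring the WAVEFORMATEX field names,
   used here only so the arithmetic can be compiled anywhere. */
typedef struct {
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
} voice_format_t;

/* Fill in voice-quality settings: 8000 samples/second, 8 bits, one channel. */
voice_format_t make_voice_format(void)
{
    voice_format_t f;
    f.wFormatTag = 1;            /* WAVE_FORMAT_PCM has the value 1 */
    f.nChannels = 1;
    f.nSamplesPerSec = 8000;
    f.wBitsPerSample = 8;
    /* channels times bytes per sample: 1 * (8 / 8) = 1 */
    f.nBlockAlign = (uint16_t)(f.nChannels * (f.wBitsPerSample / 8));
    /* samples per second times block alignment: 8000 * 1 = 8000 */
    f.nAvgBytesPerSec = f.nSamplesPerSec * f.nBlockAlign;
    f.cbSize = 0;
    return f;
}
```

With these settings one second of audio occupies 8000 bytes, which is also a convenient way to size the recording buffers.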

Sound in Windows
waveInOpen()
– a device handle (HWAVEIN)
– device number (may be 1, depends upon PC)
– WAVEFORMATEX variable
– a callback function -> gets invoked when sound device has a sample of audio

Sound in Windows
Sound device needs buffers to fill
Type LPWAVEHDR
– lpData: for raw data samples
– dwBufferLength: set to nBlockAlign times length (in bytes) of sound chunk you want
waveInAddBuffer() to give buffer to sound device
– Give it device
– Buffer (LPWAVEHDR)
– Size of variable
When callback invoked, buffer (lpData) has raw data to analyze
– Must give it another via waveInAddBuffer() again

Sound in Windows
Useful header files: #include extern "C"
Useful data types:
– HWAVEOUT: writing audio device
– HWAVEIN: reading audio device
– WAVEFORMATEX: sound format structure
– LPWAVEHDR: buffer
– MMRESULT: return type from wave system calls
Online documentation from Visual C++ for more information (Visual Studio -> Samples)

Sound in Linux
Two primary methods: Open Sound System (OSS) or Advanced Linux Sound Architecture (ALSA)
ALSA part of kernel, v2.4+
– Phonon, Xine, GStreamer, PulseAudio, JACK, even OSS all interface with it
OSS “legacy” but broader than Linux
How it works: Linux audio explained

Linux – OSS
Can do in Windows! -> Cygwin
– Unix-like environment and command-line interface for Microsoft Windows
Audio device just like file (POSIX): /dev/dsp
    open("/dev/dsp", O_RDWR)
Recording and playing by:
– read() to record
– write() to play

OSS – Sound Parameters
Use ioctl() to change sound card parameters
E.g., to change sample size to 8 bits:
    fd = open("/dev/dsp", O_RDWR);
    arg = 8;
    ioctl(fd, SOUND_PCM_WRITE_BITS, &arg);
Remember to error check all system calls!

OSS – Sound Parameters
The parameters of interest are:
– SOUND_PCM_WRITE_BITS: number of bits per sample
– SOUND_PCM_WRITE_CHANNELS: mono or stereo
– SOUND_PCM_WRITE_RATE: sample/playback rate
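
Setting all three parameters with error checking might look like the following. This is a sketch under assumptions: the helper name set_voice_quality is hypothetical, the values are the voice-quality settings from the Details - Record slide, and fd is expected to be a descriptor already opened on /dev/dsp. OSS drivers write the value actually selected back into the argument, so the sketch also compares it against the request where an exact match is expected.

```c
#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Hypothetical helper: request 8-bit, mono, 8000 samples/second.
   Returns 0 if the device accepted the settings, -1 otherwise. */
int set_voice_quality(int fd)
{
    int bits = 8;
    int channels = 1;
    int rate = 8000;

    /* Sample size: fail if the call fails or the driver chose another size. */
    if (ioctl(fd, SOUND_PCM_WRITE_BITS, &bits) == -1 || bits != 8)
        return -1;

    /* Channel count: 1 means mono. */
    if (ioctl(fd, SOUND_PCM_WRITE_CHANNELS, &channels) == -1 || channels != 1)
        return -1;

    /* Sample rate: the driver may round to a nearby supported rate,
       so only the call itself is checked here. */
    if (ioctl(fd, SOUND_PCM_WRITE_RATE, &rate) == -1)
        return -1;

    return 0;
}
```

A caller would typically print an error with perror() and exit when the helper returns -1.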

OSS Compatibility Mode
OSS has not been in the Linux kernel since v2.4
But there is OSS compatibility mode
Try: aoss name_of_program_using_oss
– E.g., aoss record
Or have snd_pcm_oss kernel module loaded

Linux – ALSA
The samples at a given time for all channels are called a frame
If stream is non-interleaved, each channel is stored in a separate buffer
If stream is interleaved, the samples are mixed together in a single buffer
A period contains multiple samples (frames)
Only needed audio include is: #include
When compiling, -lasound needed to link in libasound library
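
The frame and interleaving terms can be made concrete with plain indexing, independent of ALSA itself. The sketch below assumes 8-bit samples for the lookup; both function names are illustrative and are not ALSA calls.

```c
#include <stddef.h>

/* Bytes occupied by one frame: one sample for every channel. */
size_t bytes_per_frame(unsigned channels, unsigned bits_per_sample)
{
    return (size_t)channels * (bits_per_sample / 8);
}

/* Look up one sample in an interleaved 8-bit buffer, where the samples
   of a frame are stored next to each other, one per channel. */
unsigned char sample_at(const unsigned char *interleaved, unsigned channels,
                        size_t frame, unsigned channel)
{
    return interleaved[frame * channels + channel];
}
```

For the one-channel, 8-bit voice-quality stream used in this project, a frame is a single byte, so frame counts and byte counts coincide.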

ALSA – Open Device
Open with snd_pcm_open()
    snd_pcm_t *handle;
    /* open playback device (e.g. speakers default) */
    snd_pcm_open(&handle, "default", SND_PCM_STREAM_PLAYBACK, 0);
    /* open record device (e.g. microphone default) */
    snd_pcm_open(&handle, "default", SND_PCM_STREAM_CAPTURE, 0);
When done, close with snd_pcm_close()

ALSA – Write/Read
Done by snd_pcm_writei() and snd_pcm_readi(), respectively:
    /* write to audio device */
    snd_pcm_writei(handle, buffer, frames);
    /* read from audio device */
    snd_pcm_readi(handle, buffer, frames);
[Tra04] J. Tranter. Introduction to Sound Programming with ALSA, Linux Journal, Issue #126, October 2004.

Program Template (Linux)
open sound device
set sound device parameters
record silence
set algorithm parameters
while (1)
    record sound
    compute algorithm stuff
    detect speech
    write data to file
    write sound to file
    if speech, write speech to file
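
The "record silence, set algorithm parameters, detect speech" steps could be sketched as follows. This is a per-frame simplification for illustration, not the full endpoint search of [RS75]: the type thresholds_t, both function names, and the two multipliers are assumptions, and real thresholds would need tuning against the recorded silence.

```c
#include <stddef.h>

/* Assumed multipliers applied to the silence averages; illustrative only. */
#define ENERGY_FACTOR 4.0
#define ZC_FACTOR     2.0

typedef struct {
    double energy_threshold; /* frame energy above this counts as speech */
    double zc_threshold;     /* zero crossings above this count as speech */
} thresholds_t;

/* Derive thresholds from frames recorded while nobody is speaking. */
thresholds_t thresholds_from_silence(const long *energy, const int *zc,
                                     size_t frames)
{
    thresholds_t t = {0.0, 0.0};
    double energy_sum = 0.0;
    double zc_sum = 0.0;

    if (frames == 0)
        return t;

    for (size_t i = 0; i < frames; i++) {
        energy_sum += (double)energy[i];
        zc_sum += (double)zc[i];
    }
    t.energy_threshold = ENERGY_FACTOR * (energy_sum / (double)frames);
    t.zc_threshold = ZC_FACTOR * (zc_sum / (double)frames);
    return t;
}

/* Classify one frame: speech if either measure exceeds its threshold. */
int is_speech(long energy, int zc, thresholds_t t)
{
    return (double)energy > t.energy_threshold
        || (double)zc > t.zc_threshold;
}
```

Inside the while loop, each recorded frame would be passed to is_speech(), and only frames classified as speech would be appended to speech.raw.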

Questions
When done, brief answers (in answers.txt)
1. What might happen to the speech detection algorithm in a situation where the background noise changes a lot over the audio session?
2. What are some cases where you might want the silence to remain in a recorded audio stream?
3. Accurately detecting the beginning of speech might be easier with a large sample size (i.e., capturing more of the audio before computing energy and zero crossings). Why might this be a bad idea for some audio applications?
4. Do you think the algorithm is language (e.g., English versus Spanish) specific? Why or why not?

Hand In
Online turnin (see Web page)
Turn in:
– Code
– Makefile/Project file
– Answers
Zip/Tar up in one file
Via

Grading
– 25% basic recording of sound
– 25% basic playback of sound
– 20% speech detection
– 10% adjustment of thresholds
– 10% proper file output (sound, speech, data)
– 10% answers to questions
Rubric on Web page