Ch.1: Introduction to audio signal processing


Ch.1: Introduction to audio signal processing
Dr. K.H. Wong, Introduction to Speech Processing
KH WONG, CSE Dept., CUHK. Email: khwong@cse.cuhk.edu.hk http://www.cse.cuhk.edu.hk/~khwong
Audio signal processing, v.8c V.74d

References
Audio signal processing:
- Theory and Applications of Digital Speech Processing, Lawrence Rabiner, Ronald Schafer, Pearson, 2011.
- DAFX: Digital Audio Effects, Udo Zölzer (2nd edition, 2011), John Wiley & Sons, Ltd. The first edition can be found at http://books.google.com.hk
- The Audio Programming Book, Richard Boulanger, Victor Lazzarini, The MIT Press, 2010. Can be found at the CUHK e-library.
- Digital Audio Signal Processing, Udo Zölzer, Wiley, 2008.
- Real Sound Synthesis for Interactive Applications, Perry Cook, AK Peters.
Machine learning:
- https://www.tensorflow.org/tutorials

Overview of audio signal processing
- Chapter 1: Introduction
- Chapter 2: Preprocessing
- Chapter 3: Feature extraction
- Chapter 4: Speech compression: vector quantization
- Chapter 5: Recognition procedures

Chapter 1
- Chapter 1.A: Introduction
- Chapter 1.B: Signals in time & frequency domain

Chapter 1: Introduction. Content
- Components of a speech recognition system
- Types of speech recognition systems
- Speech recognition hardware
- A speech production model
- Phonetics: English and Cantonese

Components of a speech recognition system
- Pre-processor
- Feature extraction
- Training of the system
- Recognition

Types of speech recognition technology
- Isolated speech recognition - the speaker has to speak into the system word by word.
- Continuous speech recognition - the speaker talks naturally, as when speaking to a human.
Current products:
- http://developer.android.com/reference/android/speech/SpeechRecognizer.html
- https://chrome.google.com/webstore/detail/voice-recognition/ikjmfindklfaonkodbnidahohdfbdhkn?hl=en

Types depending on speakers
- Speaker-dependent recognition - designed for one speaker who has trained the system.
- Speaker-independent recognition - designed for all users without prior training.

Speech recognition hardware
[Figure: speech recording system; labels: DAC (digital-to-analog converter), ADC (analog-to-digital conversion system)]

Sampling example
- 16-bit resolution: the voltage or pressure range is mapped to digitized levels 0 to (2^16 - 1) = 65535.
- Sampling is at 1 kHz.
[Figure: voltage or pressure (0 to 65535) against time in ms; source: www.webkinesia.com/games/images/quant.gif]
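
The 16-bit quantization on this slide can be sketched in a few lines of Python. This is only an illustration: the function name quantize_16bit, the normalised input range 0.0 to 1.0, and the 100 Hz test sine are assumptions, not part of the original slides.

```python
import math

def quantize_16bit(x: float) -> int:
    """Map a normalised value x in [0.0, 1.0] to an integer level 0..65535."""
    levels = 2 ** 16                 # 65536 distinct 16-bit codes
    x = min(max(x, 0.0), 1.0)        # clip input that falls outside the range
    return round(x * (levels - 1))   # highest level is 2^16 - 1 = 65535

# Hypothetical test signal: a 100 Hz sine, offset into [0, 1], sampled at 1 kHz
fs = 1000
samples = [
    quantize_16bit(0.5 + 0.5 * math.sin(2 * math.pi * 100 * n / fs))
    for n in range(10)               # 10 samples = 10 ms at 1 kHz
]
print(samples)
```

Each printed value is one digitized level per millisecond, matching the 1 kHz sampling shown in the figure.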

Conversion time and sampling time
- The human listening range is 20 Hz to 20 kHz.
- The sampling frequency must be at least double the highest frequency in the signal (sampling theory), so Hi-Fi music is sampled above 40 kHz.
- 74 minutes of CD music at 44.1 kHz, 16-bit, 2 channels: 44.1 kHz * 2 bytes * 2 channels * 60 seconds * 74 min. = 783,216,000 bytes (about 747 MB) (see http://en.wikipedia.org/wiki/CD-ROM).
- Compromise: telephone-quality sound uses 8 kHz, 8-bit sampling, which is still adequate for human speech.
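
The storage arithmetic for the 74-minute CD can be checked with a short Python sketch. The helper name pcm_bytes is hypothetical, and the sketch assumes uncompressed audio with a whole number of bytes per sample.

```python
def pcm_bytes(sample_rate_hz: int, bits_per_sample: int,
              channels: int, seconds: int) -> int:
    """Bytes needed to store uncompressed sampled audio."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# 74 minutes of CD audio: 44.1 kHz, 16-bit, 2 channels
cd_bytes = pcm_bytes(44_100, 16, 2, 74 * 60)
print(cd_bytes)                      # 783216000
print(round(cd_bytes / 2**20, 1))    # 746.9 (megabytes of 2^20 bytes)

# One second of telephone-quality sound: 8 kHz, 8-bit, 1 channel
print(pcm_bytes(8_000, 8, 1, 1))     # 8000
```

The same helper applies to any sampling rate, resolution and duration.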

A speech wave
[Figure: speech waveform plotted against time (samples)]

Music wave: violin3.wav (repeated 6 times for demo purposes) (http://www.youtube.com/watch?v=xdMX5D99xgU&feature=youtu.be)
- Sampling frequency FS = 44100 Hz (42070 samples).
- How long is the play time? Answer: (1/44100) * 42070 = 0.954 seconds.
[Figure: three views of the waveform: all 42070 samples, zoom in to see 1000 samples, zoom in to see 300 samples]
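
The play-time calculation is the number of samples divided by the sampling frequency. A small Python sketch (the function name is hypothetical) reproduces the answer and also gives the time spans of the two zoomed views:

```python
def play_time_seconds(num_samples: int, fs_hz: int) -> float:
    """Duration of a recording: one sample lasts 1/fs seconds."""
    return num_samples / fs_hz

fs = 44_100
print(round(play_time_seconds(42_070, fs), 3))        # 0.954 s, whole clip
print(round(play_time_seconds(1_000, fs) * 1000, 1))  # 22.7 ms, 1000-sample zoom
print(round(play_time_seconds(300, fs) * 1000, 1))    # 6.8 ms, 300-sample zoom
```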

Class exercise 1.1
For a signal sampled at 20 kHz with 16 bits per sample, how many bytes are used in 5 seconds?
Answer: ?

Sampling and reconstruction
- After sampling you only have the data points.
- You may reconstruct the signal by joining the data points.
[Figure: sampled signal, levels 0 to (2^16 - 1) = 65535 against time; source: https://edocs.uis.edu/jduva1/www/courses/455/sampling.jpg]
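
Joining the data points with straight lines is linear interpolation. The Python sketch below shows one way to do it. The function name, the sample values, and the deliberately tiny 4 Hz sampling rate are all assumptions chosen to keep the arithmetic easy to follow. Practical systems usually apply a smoothing filter instead of straight lines.

```python
def reconstruct_linear(samples, fs_hz, t):
    """Estimate the signal at time t (seconds) by joining neighbouring
    samples with a straight line. Past the last sample, hold its value."""
    if t < 0:
        raise ValueError("t must be non-negative")
    pos = t * fs_hz                  # position measured in sample periods
    i = int(pos)                     # index of the sample at or before t
    if i >= len(samples) - 1:
        return float(samples[-1])
    frac = pos - i                   # how far t lies between sample i and i+1
    return samples[i] + frac * (samples[i + 1] - samples[i])

points = [0, 30000, 65535, 30000]    # hypothetical 16-bit sample values
fs = 4                               # hypothetical rate: one sample every 0.25 s
print(reconstruct_linear(points, fs, 0.125))  # 15000.0, halfway between 0 and 30000
```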

Hardware for speech recognition setup
[Figure source: http://www.ras.ucalgary.ca/grad_project_2005/asph_sampling.jpg]
- Speech is captured by a microphone.
- It is sampled periodically (e.g. at 16 kHz) by an analogue-to-digital converter (ADC).
- Each converted sample is 16-bit data.
- Tutorial: for a signal sampled at 16 kHz with 16 bits per sample, how many bytes are used in 1 second? (= 32 Kbytes)
- If sampling is too slow, sampling may fail. Sampling theorem for a signal X: the sampling frequency must be at least double the highest frequency in the signal X.
- E.g. if the highest frequency in a signal is 16 kHz, the sampling frequency must be 32 kHz or higher. If the highest frequency in a signal is 20 kHz, the sampling frequency must be 40 kHz or higher.
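
The sampling theorem and the failure that occurs when sampling is too slow can be illustrated with a short Python sketch. The function names and the 900 Hz / 100 Hz test tones are assumptions made for this example. The failure mode it demonstrates is commonly called aliasing.

```python
import math

def is_sampling_adequate(fs_hz: float, highest_freq_hz: float) -> bool:
    """Sampling theorem as stated on the slide: fs must be at least
    double the highest frequency in the signal."""
    return fs_hz >= 2 * highest_freq_hz

def sample_cosine(freq_hz: float, fs_hz: float, count: int):
    """Take `count` samples of a cosine of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

print(is_sampling_adequate(32_000, 16_000))  # True
print(is_sampling_adequate(16_000, 16_000))  # False

# Sampling too slowly: a 900 Hz tone sampled at only 1 kHz yields the
# same data points as a 100 Hz tone, so the two cannot be told apart.
fast = sample_cosine(900, 1000, 20)
slow = sample_cosine(100, 1000, 20)
print(max(abs(a - b) for a, b in zip(fast, slow)) < 1e-9)  # True
```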

Exercise 1.2
- If the sampling rate of the analog-to-digital conversion system is 20 kHz, what is the highest sound frequency that can be sampled? Answer: ?
- If the sound is 20 kHz, what is the minimum sampling rate of the analog-to-digital conversion system? Answer: ?

Discussion: Conversion resolution
- Music: 44.1 kHz, 16-bit sampling is very good. Higher specifications may be used, e.g. 96 kHz sampling at 24 bits.
- Compression: MP3 and similar formats can compress the data.
- Speech: 20 kHz, 16-bit sampling is good enough.

Class exercise 1.3
A sound is sampled at 22 kHz and the resolution is 16 bits.
- How many bytes are needed to store the sound wave for 10 seconds? Answer: ?
- What is the highest frequency allowed in the sound signal? Answer: ?

Signal analysis: spectrum

Can we see speech?
Yes, using a spectrogram.
- The "time domain signal" shows the amplitude of air pressure against time.
- The "spectrogram" shows the energies of the frequency contents against time.
[Figure: time domain signal (pressure / output of mic against time) and spectrogram (frequency against time); MATLAB function spectrogram.m]
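
A spectrogram can be sketched as a sequence of short frames, each analysed for the energy at every frequency. The Python sketch below is a simplified illustration, not a reimplementation of MATLAB's spectrogram.m: it uses a direct discrete Fourier transform with no window function, and the function names, frame sizes, and 1 kHz test tone are all assumptions.

```python
import cmath
import math

def frame_energies(frame):
    """Energy |X[k]|^2 in each DFT bin k = 0..N/2 of one frame."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        xk = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                 for t in range(n))
        energies.append(abs(xk) ** 2)
    return energies

def spectrogram(signal, frame_len, hop):
    """One row of bin energies per frame; rows advance in time by `hop` samples."""
    return [frame_energies(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]

fs = 8000                                  # hypothetical sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
spec = spectrogram(tone, frame_len=64, hop=32)

# The strongest bin of the first frame should sit at the tone's frequency
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin * fs / 64)                  # 1000.0
```

Plotting the rows of spec side by side, with time on one axis and bin frequency on the other, gives the kind of picture shown on the slide.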

Basic phonetics
Phonemes are symbols that show how a word is pronounced.
Phonemes:
- Consonants
  - Nasals: /M/
  - Stops: /B/, /P/
  - Fricatives: /V/, /S/
  - Whisper: /H/
  - Affricates: /JH/, /CH/
- Vowels: /AA/, /I/, /UH/
- Diphthongs: /AY/, /AW/

Phonetic table
[Figure source: http://www.telefonica.net/web2/eseducativa/phonetics/tablea.gif]

Special features of Cantonese (廣東話) phonetics
- Each word combines an initial (consonant, 聲母) and a final (vowel, 韵母).
- Nine tones (九聲): lower-flat (陽平), lower-rising (陽上), lower-go (陽去), higher-flat (陰平), higher-rising (陰上), higher-go (陰去), and the entering tones (入聲), which end in /p/, /t/ or /k/.

Summary
Studied:
- Basic digital audio recording systems
- Speech recognition system applications and classifications

Appendix

Answer: Class exercise 1.1
For a signal sampled at 20 kHz with 16 bits per sample, how many bytes are used in 5 seconds?
Answer: 20 kHz * 2 bytes * 5 seconds = 200 Kbytes.

Answer: Exercise 1.2
- If the sampling rate of the analog-to-digital conversion system is 20 kHz, what is the highest sound frequency that can be sampled? Answer: 20/2 = 10 kHz.
- If the sound is 20 kHz, what is the minimum sampling rate of the analog-to-digital conversion system? Answer: 20 x 2 = 40 kHz.

Answer: Class exercise 1.3
A sound is sampled at 22 kHz and the resolution is 16 bits.
- How many bytes are needed to store the sound wave for 10 seconds? Answer: one second has 22K samples, so for 10 seconds: 22K x 2 bytes x 10 seconds = 440 Kbytes. Note: 2 bytes are used because 16 bits = 2 bytes.
- What is the highest frequency allowed in the sound signal? Answer: 11 kHz. The sampling frequency is 22 kHz, so the signal cannot be higher than 22 kHz / 2 = 11 kHz.