Characteristics of Speech
- Long-term (sentence level, several seconds)
  - Drastic, irregular changes
- Short-term (frame level, 20 ms or so)
  - Regular, periodic changes for voiced sounds
  - Noise-like for unvoiced sounds
- Hard to recognize without context information

Spectrum in the Frequency Domain
- Three basic characteristics of a spectrum:
  - Timbre: the spectrum after smoothing
  - Pitch: the distance between harmonics
  - Intensity: the magnitude of the spectrum
[Figure: spectrum annotated with first formant F1, second formant F2, pitch frequency, and intensity]
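As a rough illustration of reading pitch and intensity off a spectrum, the sketch below builds a synthetic harmonic frame and measures the spacing between its spectral peaks. The sampling rate, frame length, fundamental frequency, and the 10% peak threshold are all assumptions chosen so that each harmonic falls exactly on an FFT bin; real speech would need windowing and a more careful peak picker.

```matlab
% Assumed values for illustration only
fs = 8000;                 % sampling rate (Hz)
N  = 400;                  % frame length (50 ms), bin spacing fs/N = 20 Hz
f0 = 200;                  % fundamental frequency of the synthetic frame (Hz)
t  = (0:N-1)'/fs;

% Five harmonics with decaying amplitude
x = zeros(N, 1);
for k = 1:5
    x = x + sin(2*pi*k*f0*t)/k;
end

% Magnitude spectrum (intensity) over the non-negative frequencies
mag  = abs(fft(x));
mag  = mag(1:N/2);
freq = (0:N/2-1)'*fs/N;

% Harmonic peaks: bins well above the numerical noise floor
isPeak   = mag > 0.1*max(mag);
peakFreq = freq(isPeak);

% Pitch is the distance between adjacent harmonics
pitchEstimate = median(diff(peakFreq));
fprintf('Estimated pitch: %g Hz\n', pitchEstimate);
```

Smoothing `mag` (for example with a moving average) would give the spectral envelope that the slide associates with timbre.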

Timbre Demo: Real-time Spectrogram
- Simulink model for real-time display of the spectrogram
  - dspstfft_audio (before MATLAB R2011a)
  - dspstfft_audioInput (R2012a or later)
[Figure: spectrogram and spectrum displays]

Audio Feature Extraction & Recognition
- Frame blocking
  - Frame duration of 20 ms
- Feature extraction
  - Volume, pitch, MFCC, LPC, etc.
- Endpoint detection
  - Based on volume and ZCR
- Recognition
  - DTW, HMM

Example: Audio Feature Extraction
- 256 points per frame
- 84 points of overlap
- 11025/(256-84) ≈ 64 feature vectors per second
[Figure: zoomed-in waveform showing a frame and its overlap with the next frame]
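The frame arithmetic above can be sketched as follows. The frame size, overlap, and sampling rate come from the slide; the one-second test tone and the variable names are assumptions for illustration, and trailing samples that do not fill a whole frame are simply dropped.

```matlab
fs        = 11025;                 % sampling rate from the slide (Hz)
frameSize = 256;                   % points per frame
overlap   = 84;                    % points shared by adjacent frames
hop       = frameSize - overlap;   % frame step: 172 points

% Assumed input: one second of a 440 Hz tone
y = sin(2*pi*440*(0:fs-1)'/fs);

% Number of complete frames that fit in the signal
frameCount = floor((length(y) - overlap)/hop);

% Each column of 'frames' holds one frame
frames = zeros(frameSize, frameCount);
for i = 1:frameCount
    startIdx = (i-1)*hop + 1;
    frames(:, i) = y(startIdx:startIdx+frameSize-1);
end

frameRate = fs/hop;                % feature vectors per second
fprintf('%d frames, %.2f frames per second\n', frameCount, frameRate);
```

The frame rate `fs/hop` is about 64.1, matching the slide's figure; a one-second signal yields 63 complete frames because the last partial frame is discarded.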

Three Basic Acoustic Features
- Three basic speech features:
  - Volume/Energy/Intensity: vibration amplitude
  - Pitch: fundamental frequency (the reciprocal of the fundamental period)
  - Timbre: the waveform within a fundamental period
- Humans perceive these features subjectively. However, we can use mathematics to emulate human perception and capture them.

Acoustic Feature: Energy
- Energy is the sum of squares of the samples in a frame, also known as intensity or volume.
- Characteristics:
  - Noise and fricatives usually have low energy.
  - Energy is strongly influenced by the microphone setup.
  - Taking 10 times the base-10 logarithm of the sum of squares gives the energy in decibels (dB).
  - Energy is commonly used in endpoint detection.
  - In embedded-system implementations, volume can be computed as the sum of absolute values in a frame to reduce computation.
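A minimal sketch of the three volume measures described above, computed on an assumed synthetic frame (a 0.1-amplitude sine spanning 8 full cycles in 256 samples). The variable names are illustrative only.

```matlab
% Assumed test frame: 256 samples, 8 full cycles, amplitude 0.1
frame = 0.1*sin(2*pi*(0:255)'/32);

energy    = sum(frame.^2);        % sum of squares
energyDb  = 10*log10(energy);     % energy in decibels
absVolume = sum(abs(frame));      % cheaper abs-sum variant for embedded use

fprintf('energy = %.4f, %.4f dB, abs-sum volume = %.4f\n', ...
        energy, energyDb, absVolume);
```

An all-zero frame gives `-Inf` in decibels, so practical code usually guards the logarithm, for example by adding a small constant to the energy first.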

Acoustic Feature: Zero Crossing Rate
- Zero crossing rate (ZCR)
  - The number of zero crossings in a frame.
- Characteristics:
  - Noise and unvoiced sounds have high ZCR.
  - ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds.
  - To distinguish noise/silence from unvoiced sounds, we usually add a bias to the signal before computing ZCR.
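The effect of the bias can be sketched with a small hand-made example. The helper `countZc`, the sample values, and the shift of 0.02 are assumptions for illustration; the shift only needs to exceed the amplitude of the background noise.

```matlab
% Count sign changes between adjacent samples
countZc = @(x) sum(x(1:end-1).*x(2:end) < 0);

% A short frame with clear sign changes
frame    = [0.5 -0.2 0.3 0.1 -0.4 -0.1 0.2]';
zcrFrame = countZc(frame);

% Low-level noise hovering around zero crosses on every sample
noise      = 0.01*[1 -1 1 -1 1 -1]';
zcrNoise   = countZc(noise);

% Shifting by more than the noise amplitude removes those crossings
zcrShifted = countZc(noise - 0.02);

fprintf('frame: %d, noise: %d, shifted noise: %d\n', ...
        zcrFrame, zcrNoise, zcrShifted);
```

Unvoiced speech has a larger amplitude than this background noise, so it keeps a high ZCR after the same shift, which is what makes the two separable.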

Demo: Volume and ZCR
- Function: record your voice and display the waveform, volume, and ZCR on the fly.
- Demo files:
  - goVolZcr.m
    - Set recordViaMic to 1 to use your own recording.
    - With an appropriate shift, the ZCR of noise can be reduced to zero.
  - showVolZcr.mdl
    - Real-time display of volume and ZCR.

Pitch
- Computation
  - Pitch frequency is the reciprocal of the fundamental period.
  - Pitch can also be expressed in semitones.
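The semitone formula itself did not survive in the transcript. The sketch below assumes the common MIDI convention, semitone = 69 + 12*log2(f/440), in which A4 (440 Hz) maps to 69 and each octave spans 12 semitones; the slide may have used a different reference.

```matlab
% Assumed convention: MIDI note numbers, A4 = 440 Hz = semitone 69
freq = [220 440 880];                   % pitch frequencies in Hz

fundamentalPeriod = 1./freq;            % period is the reciprocal of frequency (s)
semitone = 69 + 12*log2(freq/440);      % one octave = 12 semitones

disp([freq; fundamentalPeriod; semitone]');
```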

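Production and Reception of Sound in General
- Basic process
  - Vibration of the sounding body
  - Wave motion in the air
  - Vibration of the eardrum
  - Reception by the nerves of the inner ear
  - Recognition by the brain
- Sound production mechanisms
  - Natural vibration frequency induced by striking (e.g., a tuning fork)
  - Resonance frequency induced by air friction (e.g., a flute)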

Human Speech Production

The Vocal Tract

Glottal Volume Velocity & Resulting Sound Pressure (Voiced)

Speech Production
[Figure: glottal pulses pass through the vocal tract to produce the speech signal; in the frequency domain, (a) source spectrum + (b) filter function = (c) output energy spectrum]

Acoustical Analysis (speech signal of the Mandarin syllable “七”, qī, "seven")

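Speech Production Modeling
[Figure: block diagram. An impulse train generator (controlled by the pitch period) and a noise generator supply the excitation, which is multiplied by the gain G to give u(n); u(n) drives a time-varying digital filter (controlled by the vocal tract parameters) whose output is s(n). Diagram labels: phonation, whispering, frication, compression, vibration]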

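Parametric Representation
[Figure: excitation u(n), multiplied by gain G, passes through a filter written in terms of A(z) to give s(n)]
- Z-transform model, written in terms of A(z)
- G = gain of the excitation
- u(n) = excitation source (quasi-periodic pulse train or random noise)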

The Speech Model: A Summary
- Voiced/unvoiced classification
- Pitch period for voiced sounds
- The gain parameter
- The coefficients of the digital filter, {a_k}
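The four parameters above are enough to synthesize a signal. The sketch below is one way to do it, assuming the usual all-pole form H(z) = G / (1 - sum_k a_k z^-k); the transcript does not preserve the slide's equations, so the sign convention, the sampling rate, and the single 500 Hz resonance used here are assumptions.

```matlab
% Assumed model parameters for illustration
fs          = 8000;        % sampling rate (Hz)
n           = 800;         % 100 ms of output
isVoiced    = true;        % voiced/unvoiced decision
pitchPeriod = 80;          % samples between pulses (100 Hz pitch)
G           = 0.5;         % gain of the excitation

% Filter coefficients a_1, a_2: one resonance at 500 Hz, pole radius 0.95
r     = 0.95;
theta = 2*pi*500/fs;
a     = [2*r*cos(theta), -r^2];

% Excitation: quasi-periodic pulse train (voiced) or random noise (unvoiced)
if isVoiced
    u = zeros(n, 1);
    u(1:pitchPeriod:end) = 1;
else
    u = randn(n, 1);
end

% s(n) = G*u(n) + a_1*s(n-1) + a_2*s(n-2)
s = filter(G, [1, -a], u);

plot((0:n-1)/fs, s); xlabel('Time (s)'); ylabel('s(n)');
```

Setting `isVoiced` to false swaps the pulse train for noise while leaving the filter unchanged, which mirrors the two generators in the model.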

Glossary (English-Chinese)
- Cochlea: 耳蝸
- Phoneme: 音素、音位
- Phonics: 聲學; a sound-based teaching method (teaching spelling on the basis of sounds)
- Phonetics: 語音學
- Phonology: 音系學、語音體系
- Prosody: 韻律學; versification
- Syllable: 音節
- Tone: 音調
- Alveolar: 齒槽音
- Silence: 靜音
- Noise: 雜訊
- Glottis: 聲門
- Larynx: 喉頭
- Pharynx: 咽頭
- Pharyngeal: 咽部的，喉音的
- Velum: 軟顎
- Vocal cords: 聲帶
- Esophagus: 食管
- Diaphragm: 橫隔膜
- Trachea: 氣管

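Hints for Exercises
- How to generate a sine wave signal:
  - Math formula: $y(t) = 0.8 \sin(2\pi f t)$
  - MATLAB code:
      duration=3; f=440; fs=16000;   % length (s), frequency (Hz), sampling rate (Hz)
      time=(0:duration*fs-1)/fs;     % time axis in seconds
      y=0.8*sin(2*pi*f*time);        % sine wave with amplitude 0.8
      plot(time, y);                 % display the waveform
      sound(y, fs);                  % play it back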
