Published by Rudolph Jackson. Modified over 9 years ago.
2
Basic Features of Audio Signals (音訊的基本特徵)
Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan
3
Audio Features
- Four commonly used audio features
  - Volume, pitch, timbre, zero crossing rate
- Our goal
  - These features can be perceived (more or less) subjectively.
  - Our goal is to compute them quantitatively (and objectively) for further processing and recognition.
4
General Steps for Audio Analysis
1. Frame blocking
   - Frame duration of 20~40 ms or so
2. Frame-based feature extraction
   - Volume, zero-crossing rate, pitch, MFCC, etc.
3. Frame-based analysis
   - Pitch vector for QBSH comparison
   - MFCC for HMM evaluation
   - …
5
Frame Blocking
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size - overlap = 512 - 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec
[Figure: zoomed-in waveform showing a frame and the overlap between adjacent frames]
Quiz candidate!
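The arithmetic on this slide can be checked with a short sketch. It is written in Python rather than the MATLAB used in the slides, and the helper name `frame_params` is hypothetical. The frame count assumes that a trailing partial frame is dropped, which is one common convention and not something the slide states.

```python
def frame_params(fs, frame_size, overlap, n_samples=None):
    """Derive frame-blocking quantities from sample rate, frame size, and overlap."""
    hop = frame_size - overlap                      # samples between frame starts
    out = {
        "frame_duration_ms": 1000.0 * frame_size / fs,
        "hop_size": hop,
        "frame_rate": fs / hop,                     # frames per second
    }
    if n_samples is not None:
        # Assumption: only complete frames are kept.
        if n_samples < frame_size:
            out["frame_count"] = 0
        else:
            out["frame_count"] = 1 + (n_samples - frame_size) // hop
    return out


# Values from the slide, applied to one second of audio.
params = frame_params(fs=16000, frame_size=512, overlap=192, n_samples=16000)
print(params)
```

With the slide's numbers this gives a 32 ms frame, a hop of 320 samples, and 50 frames per second.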
6
Audio Features in Time Domain
- Three of the most prominent time-domain audio features in a frame (also known as an analysis window):
  - Intensity
  - Fundamental period (FP)
  - Timbre: waveform within an FP
Quiz candidate!
7
Audio Features in Frequency Domain
- Frequency-domain audio features in a frame
  - Energy: sum of the power spectrum
  - Pitch: distance between harmonics
  - Timbre: smoothed spectrum
[Figure: spectrum annotated with energy, pitch frequency, first formant F1, and second formant F2]
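The "energy as sum of power spectrum" idea ties back to the time domain through Parseval's theorem: the sum of squared samples equals the sum of squared DFT magnitudes divided by the frame length. The sketch below is in Python rather than MATLAB and uses a naive DFT for illustration only. The function names are hypothetical, and a real system would use an FFT routine.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def energy_time(x):
    """Energy computed in the time domain: sum of squared samples."""
    return sum(s * s for s in x)


def energy_freq(x):
    """Energy computed from the power spectrum, scaled by 1/N (Parseval)."""
    return sum(abs(c) ** 2 for c in dft(x)) / len(x)


frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
print(energy_time(frame), energy_freq(frame))  # the two values agree up to rounding
```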
8
Frame-based Manipulation
- For simplicity, we usually pack frames into a matrix for easy manipulation in MATLAB:
    [y, fs] = audioread('file.wav');
    frameMat = enframe(y, frameSize, overlap);
- frameMat = [ Frame 1 | Frame 2 | … | Frame n ]
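To make the packing step concrete, here is a minimal Python sketch of what a frame-blocking helper might do. It is not the toolbox's `enframe`; the function below only borrows the name, returns a list of frames (one list per frame) rather than a MATLAB matrix, and assumes that a trailing partial frame is discarded.

```python
def enframe(signal, frame_size, overlap):
    """Split a 1-D signal into overlapping frames (hypothetical stand-in)."""
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_size <= len(signal):
        frames.append(list(signal[start:start + frame_size]))
        start += hop
    return frames


signal = list(range(10))
frames = enframe(signal, frame_size=4, overlap=2)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each inner list plays the role of one frame of `frameMat`, so per-frame features can be computed with a simple loop or comprehension.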
9
Introduction to Volume
- Loudness of audio signals
  - Visual cue: amplitude of vibration
  - Also known as energy or intensity
- Two major ways of computing volume:
  - Volume
  - Log energy (in decibels)
Quiz candidate!
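As a rough illustration of the two measures, the sketch below assumes the commonly used definitions: volume as the sum of absolute sample values in a frame, and log energy as 10·log10 of the sum of squared samples. These formulas, the function names, and the small `floor` that guards against log of zero are assumptions for this example, written in Python rather than MATLAB.

```python
import math


def volume_abs_sum(frame):
    """Volume as the sum of absolute sample values (assumed definition)."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame, floor=1e-12):
    """Log energy in decibels: 10*log10(sum of squares) (assumed definition)."""
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(max(energy, floor))  # floor avoids log10(0) on silence


frame = [0.5, -0.5, 0.5, -0.5]
print(volume_abs_sum(frame))  # 2.0
print(log_energy_db(frame))   # 0.0, since the sum of squares is exactly 1.0
```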
10
Volume: Perceived and Computed
- Perceived volume is influenced by
  - Frequency (example shown later)
  - Timbre (example shown later)
- Computed volume is influenced by
  - Microphone types
  - Microphone setups
11
Volume Computation
- To avoid DC bias (or DC drifting)
  - DC bias: the vibration is not centered around zero
  - Computation:
    - Volume
    - Log energy (in decibels)
- Theoretical background (How to prove them?)
Quiz candidate!
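One plausible way to remove DC bias, sketched here under assumptions, is to subtract the frame median before summing absolute values and the frame mean before summing squares. The pairing is motivated by two standard facts: the median minimizes the sum of absolute deviations, and the mean minimizes the sum of squared deviations. The function names are hypothetical and the code is Python rather than MATLAB.

```python
import math
import statistics


def volume_no_dc(frame):
    """Sum of absolute deviations from the frame median (assumed definition)."""
    center = statistics.median(frame)
    return sum(abs(s - center) for s in frame)


def log_energy_db_no_dc(frame, floor=1e-12):
    """10*log10 of squared deviations from the frame mean (assumed definition)."""
    center = sum(frame) / len(frame)
    energy = sum((s - center) ** 2 for s in frame)
    return 10.0 * math.log10(max(energy, floor))  # floor avoids log10(0)


# The same +/-0.5 oscillation as an unbiased frame, but riding on a DC offset of 1.0.
biased = [1.5, 0.5, 1.5, 0.5]
print(volume_no_dc(biased))         # 2.0, same as the unbiased oscillation
print(log_energy_db_no_dc(biased))  # 0.0
```

Without the subtraction, the plain absolute sum of this frame would be 4.0, so the offset alone would double the reported volume.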
12
Examples of Volume
- Functions for computing volume
  - Example: volume01
  - Example: volume02
  - Example: volume03
- Volume depends on…
  - Frequency
    - Equal loudness test
  - Timbre
    - Example: volume04
13
Zero Crossing Rate
- Zero crossing rate (ZCR)
  - The number of zero crossings in a frame.
- Characteristics:
  - ZCR is higher for noise and unvoiced sounds, lower for voiced sounds.
  - Zero-justification is required before computing ZCR.
- Usage
  - For endpoint detection, especially for detecting the start and end of unvoiced sounds.
  - To distinguish noise from unvoiced sounds, we usually add a shift before computing ZCR.
Quiz candidate!
14
ZCR Computations
- Two types of ZCR definitions
  - If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise it is lower.
  - The distinction diminishes at higher bit resolutions.
- Other considerations
  - ZCR with a shift can be used to distinguish between unvoiced sounds and silence.
  - But it is hard to choose the right shift amount.
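The two ZCR definitions and the effect of a shift can be sketched as follows. This is an illustrative Python version, not the slides' MATLAB code. The names `zero_justify` and `zcr` are hypothetical, zero-justification is assumed to mean subtracting the frame mean, and the shift value of 0.05 is arbitrary.

```python
def zero_justify(frame):
    """Subtract the frame mean (assumed meaning of zero-justification)."""
    mean = sum(frame) / len(frame)
    return [s - mean for s in frame]


def zcr(frame, count_zero=False, shift=0.0):
    """Count sign changes between neighbouring samples.

    count_zero=False: only strict sign changes (product < 0) are counted.
    count_zero=True:  a zero-valued sample also counts (product <= 0).
    shift: constant added to every sample before counting.
    """
    x = [s + shift for s in frame]
    count = 0
    for a, b in zip(x, x[1:]):
        prod = a * b
        if prod < 0 or (count_zero and prod == 0):
            count += 1
    return count


frame = zero_justify([1, -1, 0, 1, -1, 0])
print(zcr(frame), zcr(frame, count_zero=True))  # 2 5

quiet = [0.01, -0.01, 0.01, -0.01]   # low-level fluctuation around zero
loud = [0.5, -0.5, 0.5, -0.5]        # stronger high-ZCR sound
print(zcr(quiet), zcr(quiet, shift=0.05))  # 3 0
print(zcr(loud), zcr(loud, shift=0.05))    # 3 3
```

The shift suppresses crossings caused by fluctuations smaller than the shift while leaving stronger signals unaffected, which is why the right amount depends on the recording level.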
15
Examples of ZCR
- ZCR computing
  - Example: zcr01
  - Example: zcr02
- To use ZCR to distinguish between unvoiced sounds and environmental noise
  - Example: zcrWithShift
16
Pitch
- Definition
  - Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is Hertz (Hz).
- Unit
  - More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz.
  - Piano roll via HTML5
Quiz candidate!
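A common way to express pitch in semitones is the MIDI convention, where A4 (440 Hz) maps to 69 and each octave spans 12 semitones: semitone = 69 + 12·log2(f/440). The sketch below assumes that convention; the function names are hypothetical and the code is Python rather than MATLAB.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert frequency in Hz to a MIDI-style semitone number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion from semitone number back to Hz."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)


print(hz_to_semitone(440.0))    # 69.0
print(hz_to_semitone(880.0))    # 81.0, one octave (12 semitones) higher
print(semitone_to_hz(60.0))     # about 261.6 Hz (middle C)
```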
17
Pitch Computation for Tuning Forks
- Pitch of tuning forks (code)
Quiz candidate!
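The linked code is not reproduced here. As a stand-in, the following Python sketch estimates the pitch of a synthetic 440 Hz sine (a rough model of a tuning fork) by picking the lag that maximizes the autocorrelation of one frame. The method, the function name `acf_pitch`, and the 40 to 1000 Hz search range are assumptions chosen for illustration, not necessarily what the original example does.

```python
import math


def acf_pitch(frame, fs, fmin=40.0, fmax=1000.0):
    """Estimate pitch (Hz) from the autocorrelation peak within [fmin, fmax]."""
    n = len(frame)
    min_lag = int(fs / fmax)                 # shortest period considered
    max_lag = min(int(fs / fmin), n - 1)     # longest period considered
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag


fs = 16000
frame_size = 512
tone = [math.sin(2 * math.pi * 440.0 * t / fs) for t in range(frame_size)]
pitch = acf_pitch(tone, fs)
print(pitch)  # close to 440 Hz; the estimate is limited by integer-lag resolution
```

Because the lag is an integer number of samples, the estimate can only take values of the form fs/lag, so it lands near but not exactly on 440 Hz.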
18
Pitch Computation for Speech
- Pitch of speech (code)
Quiz candidate!
19
Tones in Mandarin Chinese
- Some statistics about Mandarin Chinese
  - 5401 characters, each associated with at least one base syllable and a tone
  - 411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
- Syllables with 3 or fewer tones
  - 媽麻馬罵 (mā, má, mǎ, mà)、當檔蕩 (dāng, dǎng, dàng)、嗲 (diǎ)
- More examples
  - 1234: 三民主義 (sān mín zhǔ yì)、三國演義 (sān guó yǎn yì)、優柔寡斷 (yōu róu guǎ duàn)
  - ?????: 美麗大教堂、滷蛋有夠鹹 (Taiwanese)
  - Tone sandhi: 勇猛果敢 (yǒng měng guǒ gǎn)
20
Features Related to Tones
- Tone is characterized by the pitch curves:
  - Tone 1: high-high
  - Tone 2: low-high
  - Tone 3: high-low-high
  - Tone 4: high-low
  (Put your hand on your throat and you can feel it…)
- Tone recognition is mostly based on features obtained from pitch and volume
Quiz candidate!
21
Tones in Mandarin TTS
- TTS: Text to speech (demo)
- Tone sandhi: phonological change occurring in tonal languages
  - 3+3 → 2+3
    - 總統、總統府、李總統、母老虎、膽小鬼
  - 不 (bù, "not")
    - 不好、不難 vs. 不對、不妙
  - 一 (yī, "one")
    - 一個、一次、一半 vs. 一般、一毛、一會兒
22
Mandarin Tone Practice
- Tone combinations in two-syllable words (雙音節詞連音組合)
23
Sentences of All Tone 3
- Tone sandhi of 3+3
  - 請老李給我買五把好雨傘
  - 老李買好酒請馬小姐買幾百把小雨傘
  - 總統府裏的李總統有點想請我買酒
  - 北海只有兩里遠,水也很淺
  - 展覽館北館有好幾百種展覽品
  - 你早晚打掃,我啃水果咬水餃
  - 我很了解你,我倆永遠友好
  - 水管可以點火,趕緊買保險
Quiz candidate!
24
Pitch Change due to Fast Forward
- If audio is played at a higher sample rate…
  - Pitch is higher
  - Duration is shorter
- Pitch change due to sample rate change at playback
  - Sample rate: fs → k*fs (at playback)
  - Duration: d → d/k
  - Fundamental frequency: ff → k*ff
  - Pitch: pitch → pitch + 12*log2(k)
Quiz candidate!
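These relations can be tried numerically. The sketch below is in Python rather than MATLAB and the helper name `playback_changes` is hypothetical. It returns the new duration, the new fundamental frequency, and the pitch change in semitones for a playback-rate factor k.

```python
import math


def playback_changes(duration_s, f0_hz, k):
    """Effect of playing audio at k times its original sample rate."""
    new_duration = duration_s / k            # d -> d/k
    new_f0 = f0_hz * k                       # ff -> k*ff
    semitone_shift = 12.0 * math.log2(k)     # pitch -> pitch + 12*log2(k)
    return new_duration, new_f0, semitone_shift


# Playing a 3-second, 220 Hz tone at double speed.
result = playback_changes(duration_s=3.0, f0_hz=220.0, k=2.0)
print(result)  # (1.5, 440.0, 12.0): half the duration, one octave higher
```

A factor below 1 slows playback down, so the same function yields a longer duration and a negative semitone shift.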
25
Pitch Perception
- Age-related hearing loss
  - As one grows old, the audible frequency bandwidth becomes narrower
  - Mosquito ringtone
    - Low to high, high to low
    - Applications
- Frequencies vs. ages (figure): 8 kHz, 12 kHz, 15 kHz, 17.4 kHz, 21 kHz
26
Other Things about Pitch
- Some interesting phenomena about pitch
  - Beat
  - Doppler effect
  - Shepard tone
    - An auditory illusion of a tone that continually ascends or descends in pitch
  - Overtone singing
- How to create these effects in MATLAB?
Quiz candidate!
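As one possible starting point for the beat effect, the sketch below (Python rather than MATLAB, with hypothetical function names and arbitrary example frequencies) adds two sines of nearby frequencies. By the identity sin a + sin b = 2·sin((a+b)/2)·cos((a−b)/2), the sum behaves like a tone at the average frequency whose amplitude rises and falls |f1 − f2| times per second.

```python
import math


def beat_frequency(f1, f2):
    """Number of loudness fluctuations per second for two nearby tones."""
    return abs(f1 - f2)


def two_tone(f1, f2, fs, duration):
    """Samples of sin(2*pi*f1*t) + sin(2*pi*f2*t)."""
    n = int(fs * duration)
    return [
        math.sin(2 * math.pi * f1 * t / fs) + math.sin(2 * math.pi * f2 * t / fs)
        for t in range(n)
    ]


def envelope(f1, f2, fs, duration):
    """Slowly varying amplitude |2*cos(pi*(f1 - f2)*t)| that bounds the sum."""
    n = int(fs * duration)
    return [abs(2 * math.cos(math.pi * (f1 - f2) * t / fs)) for t in range(n)]


fs = 8000
samples = two_tone(440.0, 444.0, fs, 0.5)
env = envelope(440.0, 444.0, fs, 0.5)
print(beat_frequency(440.0, 444.0))  # 4.0
print(env[0], env[1000])             # loudest at t = 0, nearly silent at t = 0.125 s
```

Writing `samples` to a sound file or audio device would make the 4 Hz pulsing audible; that output step is left out here.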
27
Timbre
- Timbre is represented by
  - Waveform within a fundamental period
  - Frame-based energy distribution over frequencies
    - Power spectrum (over a single frame)
    - Spectrogram (over many frames)
  - Frame-based MFCC (mel-frequency cepstral coefficients)
28
Timbre Demo: Real-time Spectrogram
- Simulink model for real-time display of spectrogram
  - dspstfft_audio (before MATLAB R2011a)
  - dspstfft_audioInput (R2012a or later)
[Figures: spectrogram and spectrum displays]