Basic Features of Audio Signals
Jyh-Shing Roger Jang (張智星)
http://www.cs.nthu.edu.tw/~jang
MIR Lab, CS Dept., Tsing Hua Univ., Hsinchu, Taiwan
Audio Features
- Four commonly used audio features:
  - Volume
  - Pitch
  - Zero-crossing rate
  - Timbre
- Our goal:
  - These features can be perceived subjectively,
  - but we need to compute them quantitatively for further processing and recognition.
Audio Features in Time Domain
- Audio features as presented in the time domain (figure labels):
  - Intensity
  - Fundamental period (FP)
  - Timbre: waveform within an FP
Audio Features in Frequency Domain
- Volume: magnitude of the spectrum
- Pitch: distance between harmonics
- Timbre: smoothed spectrum
(Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2.)
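To make "pitch = distance between harmonics" concrete, here is a minimal Python sketch (standard library only). It builds a synthetic frame with harmonics at 200, 400 and 600 Hz, computes its magnitude spectrum with a direct DFT, and reads the pitch off the spacing of the spectral peaks. The signal, the 8000 Hz sample rate, and the 10% peak threshold are illustrative assumptions; practical code would use an FFT (for example `numpy.fft.rfft`) instead of this O(N^2) loop.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitude of the DFT of x for bins 0..N/2 (direct O(N^2) sum)."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
        for b in range(n // 2 + 1)
    ]


def harmonic_peaks(mag, fs, n, ratio=0.1):
    """Frequencies (Hz) of local maxima above ratio * max magnitude (assumed rule)."""
    th = ratio * max(mag)
    return [
        b * fs / n
        for b in range(1, len(mag) - 1)
        if mag[b] > th and mag[b] >= mag[b - 1] and mag[b] >= mag[b + 1]
    ]


fs, n, f0 = 8000, 800, 200.0  # 800 samples at 8000 Hz -> 10 Hz per DFT bin
frame = [
    math.sin(2 * math.pi * f0 * k / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * k / fs)
    + 0.25 * math.sin(2 * math.pi * 3 * f0 * k / fs)
    for k in range(n)
]
peaks = harmonic_peaks(magnitude_spectrum(frame), fs, n)
spacing = [b - a for a, b in zip(peaks, peaks[1:])]
print(peaks, sum(spacing) / len(spacing))  # harmonic frequencies, average spacing
```

The average spacing between the peaks recovers the 200 Hz fundamental; with real speech the peaks are broader and a smarter peak picker would be needed.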
Demo: Real-time Spectrogram
- Try “dspstfft_audio” under MATLAB.
(Figure: a spectrogram panel and a spectrum panel.)
Steps for Audio Feature Extraction
- Frame blocking
  - Frame duration of about 20 ms
- Feature extraction
  - Volume, zero-crossing rate, pitch, MFCC, etc.
- Endpoint detection
  - Usually based on volume and zero-crossing rate
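The last step, endpoint detection from volume and ZCR, can be sketched as follows. This is a minimal Python illustration under assumed rules, not the tutorial's own algorithm: frames louder than 10% of the maximum volume form the core of the sound, and the core is then extended outward over neighboring frames whose ZCR exceeds a hypothetical threshold, so that quiet unvoiced sounds at the start or end are kept. Both thresholds (`vol_ratio`, `zcr_threshold`) are illustrative assumptions.

```python
def frame_volume(frame):
    """Sum of absolute deviations from the frame mean."""
    mu = sum(frame) / len(frame)
    return sum(abs(s - mu) for s in frame)


def frame_zcr(frame):
    """Number of strict sign changes between adjacent samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def detect_endpoints(frames, vol_ratio=0.1, zcr_threshold=5):
    """Return (start, end) frame indices of the detected sound, or None.

    Assumed heuristic: frames louder than vol_ratio * max volume form the
    core; the core is extended over adjacent frames whose ZCR exceeds
    zcr_threshold (candidate unvoiced sounds).
    """
    vols = [frame_volume(f) for f in frames]
    peak = max(vols)
    if peak == 0:
        return None  # the whole input is silent
    loud = [i for i, v in enumerate(vols) if v > vol_ratio * peak]
    start, end = loud[0], loud[-1]
    while start > 0 and frame_zcr(frames[start - 1]) > zcr_threshold:
        start -= 1
    while end < len(frames) - 1 and frame_zcr(frames[end + 1]) > zcr_threshold:
        end += 1
    return start, end


silence = [0.0] * 8
unvoiced = [0.05, -0.05] * 4         # quiet, but many zero crossings
voiced = [0.9, 0.9, -0.9, -0.9] * 2  # loud, few zero crossings
frames = [silence, silence, unvoiced, voiced, voiced, silence]
print(detect_endpoints(frames))
```

Here the volume rule alone would start the sound at frame 3; the ZCR rule pulls the start back to the quiet, high-ZCR frame 2.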
Frame Blocking
- Sample rate = 11025 Hz
- Frame size = 256 samples (256/11025 ≈ 23.2 ms)
- Overlap = 84 samples, so hop size = 256 - 84 = 172 samples
- Frame rate = 11025/172 ≈ 64 frames/sec
(Figure: zoomed-in waveform showing consecutive frames and their overlap.)
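Frame blocking with the parameters above can be sketched in a few lines of Python. This is a minimal illustration; the function name `frame_blocking` is hypothetical, and dropping the trailing partial frame is an assumed convention (zero-padding it is another common choice).

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence into frames of frame_size samples.

    Consecutive frames start hop = frame_size - overlap samples apart.
    Trailing samples that do not fill a whole frame are dropped
    (assumed convention).
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    return [
        signal[start:start + frame_size]
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]


fs = 11025
frames = frame_blocking(list(range(fs)))  # one second of dummy samples
print(len(frames), fs / (256 - 84))       # full frames obtained, nominal frame rate
```

One second of audio yields 63 complete frames rather than 64, because the last frames starting within that second would run past its end.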
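Intensity (I)
- Intensity
  - Visual cue: amplitude of vibration
  - Computation over a frame of n samples $s_1, \dots, s_n$:
    - Volume: $\text{volume} = \sum_{i=1}^{n} |s_i|$
    - Log energy (in decibels): $\text{volume} = 10 \log_{10} \sum_{i=1}^{n} s_i^2$
- Characteristics
  - Influenced by microphone types and microphone setups
  - Perceived volume is influenced by frequency and timbre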
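The two volume measures above translate directly into code. Below is a minimal Python sketch; the small constant `eps` is an assumption added so that a completely silent frame does not cause `log10(0)`.

```python
import math


def volume(frame):
    """Volume: sum of absolute sample values in the frame."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame, eps=1e-12):
    """Log energy in decibels: 10 * log10(sum of squared samples).

    eps is an assumed small constant that avoids log10(0) on silent frames.
    """
    return 10 * math.log10(sum(s * s for s in frame) + eps)


quiet = [0.5, -0.5, 0.5, -0.5]
loud = [1.0, -1.0, 1.0, -1.0]
print(volume(quiet), volume(loud))
print(log_energy_db(quiet), log_energy_db(loud))
```

Doubling the amplitude doubles the absolute-sum volume but raises the log energy by only about 6 dB (10 log10 4), which is one reason the decibel scale is closer to perceived loudness.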
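Intensity (II)
- To avoid DC drifting
  - DC drifting: the vibration is not centered around zero
  - Computation, after subtracting the frame mean $\mu = \frac{1}{n}\sum_{i=1}^{n} s_i$:
    - Volume: $\text{volume} = \sum_{i=1}^{n} |s_i - \mu|$
    - Log energy (in decibels): $\text{volume} = 10 \log_{10} \sum_{i=1}^{n} (s_i - \mu)^2$
- Theoretical background (How to prove?)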
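The effect of DC drifting, and of removing the frame mean, is easy to see numerically. In this minimal Python sketch (function names hypothetical; `eps` assumed as before), the same oscillation is shifted by a DC offset of 1.0: the naive volume doubles, while the mean-subtracted volume and log energy are unchanged.

```python
import math


def volume_zero_mean(frame):
    """Volume after removing the frame mean (DC offset)."""
    mu = sum(frame) / len(frame)
    return sum(abs(s - mu) for s in frame)


def log_energy_zero_mean_db(frame, eps=1e-12):
    """Log energy (dB) after removing the frame mean; eps avoids log10(0)."""
    mu = sum(frame) / len(frame)
    return 10 * math.log10(sum((s - mu) ** 2 for s in frame) + eps)


clean = [0.5, -0.5, 0.5, -0.5]
drifted = [s + 1.0 for s in clean]        # same vibration, DC offset of 1.0
raw_volume = sum(abs(s) for s in drifted)  # naive volume is inflated by the offset
print(raw_volume, volume_zero_mean(drifted), volume_zero_mean(clean))
print(log_energy_zero_mean_db(drifted), log_energy_zero_mean_db(clean))
```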
Intensity (III)
- Examples
  - Please refer to the online tutorial.
Pitch
- Definition
  - Pitch is also known as the fundamental frequency, which equals the number of fundamental periods per second. Its unit is hertz (Hz).
  - More commonly, pitch is expressed in semitones, converted from pitch in hertz by $\text{semitone} = 69 + 12 \log_2\left(\frac{f}{440}\right)$, where $f$ is the pitch in Hz.
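The hertz-to-semitone conversion and its inverse can be sketched in Python as follows; the function names are hypothetical. Semitone 69 corresponds to A4 = 440 Hz, and each doubling of frequency adds 12 semitones.

```python
import math


def hz_to_semitone(f):
    """Convert a pitch in Hz to semitones (69 = A4 = 440 Hz)."""
    return 69 + 12 * math.log2(f / 440.0)


def semitone_to_hz(s):
    """Inverse conversion: semitones back to Hz."""
    return 440.0 * 2 ** ((s - 69) / 12)


for f in (220.0, 440.0, 880.0):
    print(f, hz_to_semitone(f))  # one octave apart -> 12 semitones apart
print(semitone_to_hz(60))        # middle C, about 261.63 Hz
```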
Pitch Computation (I)
- Pitch of tuning forks
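For a nearly pure tone such as a tuning fork, pitch can be computed as the sample rate divided by the fundamental period measured in samples. The minimal Python sketch below measures the period as the average distance between successive upward zero crossings; this measuring rule, the 400 Hz test tone, and the 16 kHz sample rate are assumptions for illustration.

```python
import math


def period_by_upward_crossings(x):
    """Average distance (in samples) between successive upward zero crossings.

    Suited to a nearly pure sinusoid such as a tuning fork; raises
    ValueError if fewer than two crossings are found.
    """
    ups = [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]
    if len(ups) < 2:
        raise ValueError("need at least two upward zero crossings")
    gaps = [b - a for a, b in zip(ups, ups[1:])]
    return sum(gaps) / len(gaps)


fs, f = 16000, 400.0  # assumed sample rate and tuning-fork frequency
fork = [math.sin(2 * math.pi * f * n / fs + 0.1) for n in range(fs // 10)]
period = period_by_upward_crossings(fork)
print(period, fs / period)  # period in samples, pitch in Hz
```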
Pitch Computation (II)
- Pitch of speech
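Speech contains strong harmonics, so counting zero crossings is less reliable than for a tuning fork. One common alternative, swapped in here and not necessarily the method used in the tutorial, is autocorrelation: the lag at which the frame best matches a shifted copy of itself is taken as the fundamental period. This Python sketch uses a synthetic "voiced" frame (200 Hz plus two weaker harmonics) and an assumed search range of 50 to 1000 Hz.

```python
import math


def acf_pitch(frame, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch (Hz) as fs / lag of the largest autocorrelation value.

    The lag search is limited to the assumed pitch range [fmin, fmax];
    the frame mean is removed first (zero justification).
    """
    n = len(frame)
    mu = sum(frame) / n
    x = [s - mu for s in frame]
    lag_min = max(1, int(fs / fmax))
    lag_max = min(n - 1, int(fs / fmin))
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


fs, f0 = 16000, 200.0
# Synthetic voiced frame: fundamental plus two weaker harmonics (assumed).
frame = [
    math.sin(2 * math.pi * f0 * k / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * k / fs)
    + 0.3 * math.sin(2 * math.pi * 3 * f0 * k / fs)
    for k in range(640)
]
print(acf_pitch(frame, fs))
```

Because the unnormalized sum has fewer terms at larger lags, it naturally favors the first full period over its multiples; real pitch trackers usually add normalization and voicing checks.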
Statistics of Mandarin Chinese
- 5401 characters, each associated with at least one base syllable and a tone
- 411 base syllables; most syllables occur with all 4 tones, giving 1501 tonal syllables
- Tone is characterized by the pitch curve:
  - Tone 1: high-high
  - Tone 2: low-high
  - Tone 3: high-low-high
  - Tone 4: high-low
- Some examples of tones:
  - 1242: 清華大學 (Tsing Hua University)
  - 1234: 三民主義 (Three Principles of the People)、優柔寡斷 (indecisive)、搭達打大 (da1 to da4)、依宜以易 (yi1 to yi4)、夫福府負 (fu1 to fu4)
  - ?????: 美麗大教堂 (beautiful cathedral)、滷蛋有夠鹹 (Taiwanese: "the braised egg is really salty")
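Sinusoidal Signals
- How to generate a sinusoidal signal in MATLAB:

```matlab
fs = 16000;                    % sample rate (Hz)
duration = 3;                  % signal length (s)
f = 440;                       % frequency of the sinusoid (Hz)
t = (1:fs*duration)/fs;        % sample times (s)
y = 0.8*sin(2*pi*f*t);         % amplitude 0.8 stays within [-1, 1]
plot(t, y);
axis([0.6, 0.65, -1, 1]);      % zoom in on 50 ms so single periods are visible
sound(y, fs);                  % play the signal
```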
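For readers without MATLAB, a rough standard-library Python counterpart of the MATLAB code is sketched below. Python has no built-in equivalent of `sound`, so this version instead encodes the samples as a mono 16-bit WAV file in memory with the `wave` module; the 16-bit format and the function names are assumptions. Unlike the MATLAB code, the time axis here starts at t = 0.

```python
import io
import math
import struct
import wave


def sinusoid(f=440.0, fs=16000, duration=3.0, amp=0.8):
    """Samples of amp * sin(2*pi*f*t) at t = n/fs, for duration seconds."""
    return [amp * math.sin(2 * math.pi * f * n / fs) for n in range(int(fs * duration))]


def to_wav_bytes(samples, fs):
    """Encode samples in [-1, 1] as a mono 16-bit PCM WAV file held in memory."""
    ints = [int(round(s * 32767)) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(fs)
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))
    return buf.getvalue()


y = sinusoid()
data = to_wav_bytes(y, 16000)
```

Writing `data` to disk, for example to a hypothetical `sine440.wav`, gives a file that any audio player can open.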
Zero Crossing Rate
- Zero crossing rate (ZCR)
  - The number of zero crossings in a frame.
- Characteristics:
  - Noise and unvoiced sounds have high ZCR.
  - ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds.
  - To distinguish noise/silence from unvoiced sounds, we usually add a bias (shift) to the samples before computing ZCR.
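The effect of the bias can be shown with a minimal Python sketch. The sample values and the bias of 0.05 are assumptions chosen for illustration: low-level background noise and a louder unvoiced sound both alternate sign rapidly, so both have a high ZCR, but after adding the bias only the larger unvoiced signal still crosses zero.

```python
def zcr(frame, bias=0.0):
    """Zero crossings of (frame + bias): adjacent samples with strictly opposite signs."""
    x = [s + bias for s in frame]
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


silence = [0.01, -0.01] * 4   # low-level background noise
unvoiced = [0.3, -0.3] * 4    # e.g., a fricative: larger, still rapidly alternating
print(zcr(silence), zcr(unvoiced))              # both high without a bias
print(zcr(silence, 0.05), zcr(unvoiced, 0.05))  # the bias suppresses only the noise
```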
ZCR Computations
- Two definitions of ZCR
  - If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise, it is lower.
  - The choice affects the ZCR, especially when the sample rate is low.
- Other considerations
  - Zero justification (removing the DC offset so the waveform oscillates around zero) is required.
  - ZCR with a shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
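The two ZCR definitions and the need for zero justification can be compared in a short Python sketch; the function names and sample values are hypothetical. Integer samples that land exactly on zero produce no crossings under the strict definition but several under the inclusive one, and an oscillation riding on a DC offset shows no crossings at all until the frame mean is removed.

```python
def zcr_strict(frame):
    """Count only strict sign changes; zero-valued samples never count."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def zcr_with_zeros(frame):
    """Also count any adjacent pair that touches a zero-valued sample."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b <= 0)


def zero_justify(frame):
    """Remove the frame mean so the waveform oscillates around zero."""
    mu = sum(frame) / len(frame)
    return [s - mu for s in frame]


quantized = [3, 0, -3, 0, 3]  # integer samples that hit exactly zero
print(zcr_strict(quantized), zcr_with_zeros(quantized))  # the second is higher

offset = [5, 6, 5, 6]         # oscillation riding on a DC offset
print(zcr_strict(offset), zcr_strict(zero_justify(offset)))
```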
ZCR
- Examples
  - Please refer to the online tutorial.