Basic Features of Audio Signals ( 音訊的基本特徵 ) Jyh-Shing Roger Jang ( 張智星 ) MIR Lab, CS Dept, Tsing Hua Univ. Hsinchu, Taiwan.



Audio Features
- Four commonly used audio features
  - Volume
  - Pitch
  - Zero crossing rate
  - Timbre
- Our goal
  - These features can be perceived subjectively.
  - But we need to compute them quantitatively for further processing and recognition.

Audio Features in Time Domain
- Audio features presented in the time domain
  - Intensity
  - Fundamental period (FP)
  - Timbre: waveform within a fundamental period

Audio Features in Frequency Domain
- Volume: magnitude of the spectrum
- Pitch: distance between harmonics
- Timbre: smoothed spectrum
(Figure labels: first formant F1, second formant F2, pitch frequency, intensity.)

Demo: Real-time Spectrogram
- Try “dspstfft_audio” under MATLAB.
(Figure panels: spectrogram, spectrum.)

Steps for Audio Feature Extraction
- Frame blocking
  - Frame duration of 20 ms or so
- Feature extraction
  - Volume, zero-crossing rate, pitch, MFCC, etc.
- Endpoint detection
  - Usually based on volume and zero-crossing rate

Frame Blocking
- Sample rate = 11025 Hz
- Frame size = 256 samples
- Overlap = 84 samples (hop size = 256 - 84 = 172 samples)
- Frame rate = 11025/(256-84) ≈ 64 frames/sec
(Figure labels: zoom in, overlap, frame.)
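
The frame-blocking arithmetic above can be checked with a short sketch. It is written in Python rather than the MATLAB used elsewhere in these slides, and the function name `frame_blocking` and the choice to drop a trailing partial frame are assumptions made for illustration, not part of the original deck.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence of samples into overlapping frames.

    Assumption: a trailing partial frame is dropped rather than zero-padded.
    """
    hop = frame_size - overlap  # samples between the starts of consecutive frames
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    last_start = len(signal) - frame_size
    return [signal[start:start + frame_size]
            for start in range(0, last_start + 1, hop)]


sample_rate = 11025                   # Hz, as on the slide
frame_size, overlap = 256, 84         # samples, as on the slide
hop_size = frame_size - overlap       # 172 samples
frame_rate = sample_rate / hop_size   # about 64 frames per second

# One second of dummy samples, just to exercise the function.
one_second = list(range(sample_rate))
frames = frame_blocking(one_second, frame_size, overlap)
print(hop_size, round(frame_rate, 1), len(frames))
```

One second of audio yields slightly fewer complete frames than the nominal frame rate, because the last frame must fit entirely inside the signal.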

Intensity (I)
- Intensity
  - Visual cue: amplitude of vibration
  - Computation
    - Volume
    - Log energy (in decibels)
- Characteristics
  - Influenced by
    - Microphone types
    - Microphone setups
  - Perceived volume is influenced by frequency and timbre
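
The formulas for the two intensity measures did not survive in this transcript. As a hedged illustration, the Python sketch below uses two common definitions, which are assumed here rather than taken from the slide: volume as the sum of absolute sample values in a frame, and log energy as 10·log10 of the sum of squared samples. The function names and the `floor` parameter (which avoids log of zero on silent frames) are hypothetical.

```python
import math


def volume_abs_sum(frame):
    """Volume as the sum of absolute sample values (assumed definition)."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame, floor=1e-12):
    """Log energy in decibels: 10*log10 of the sum of squares (assumed definition).

    `floor` keeps the logarithm finite for an all-zero frame.
    """
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(max(energy, floor))


frame = [1.0] * 10
print(volume_abs_sum(frame), log_energy_db(frame))
```

Both measures depend on the recording gain, which is consistent with the slide's remark that intensity is influenced by microphone type and setup.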

Intensity (II)
- To avoid DC drift
  - DC drift: the vibration is not centered around zero
  - Computation
    - Volume
    - Log energy (in decibels)
- Theoretical background (How to prove?)
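
The drift-corrected formulas are also missing from the transcript. One plausible reading, assumed here, is to subtract the frame median before summing absolute values and to subtract the frame mean before summing squares. That pairing fits the "theoretical background" bullet: it is a standard result that the median minimizes the sum of absolute deviations and the mean minimizes the sum of squared deviations. The Python sketch below follows that assumption; the function names are hypothetical.

```python
import math
import statistics


def volume_zero_justified(frame):
    """Sum of absolute deviations from the frame median (assumed definition)."""
    center = statistics.median(frame)
    return sum(abs(s - center) for s in frame)


def log_energy_db_zero_justified(frame, floor=1e-12):
    """10*log10 of the sum of squared deviations from the frame mean (assumed)."""
    center = sum(frame) / len(frame)
    energy = sum((s - center) ** 2 for s in frame)
    return 10.0 * math.log10(max(energy, floor))


# A small vibration of amplitude 0.5 riding on a DC offset of 10.
drifting = [10.5, 9.5, 10.5, 9.5]
print(volume_zero_justified(drifting), log_energy_db_zero_justified(drifting))
```

Without the subtraction, the offset of 10 would dominate both measures and hide the actual vibration.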

Intensity (III)
- Examples
  - Please refer to the online tutorial.

Pitch
- Definition
  - Pitch is also known as fundamental frequency, which equals the number of fundamental periods within one second. The unit used here is Hertz (Hz).
  - More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz.
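
The conversion formula itself is missing from the transcript. A widely used convention, assumed here and not confirmed by the slide, is the MIDI note scale, in which A4 = 440 Hz maps to semitone 69 and each octave spans 12 semitones: semitone = 69 + 12·log2(f/440). The Python sketch below implements that convention; the function names are hypothetical.

```python
import math


def hz_to_semitone(freq_hz, ref_hz=440.0, ref_semitone=69.0):
    """Convert frequency in Hz to semitones on the MIDI scale (assumed convention)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return ref_semitone + 12.0 * math.log2(freq_hz / ref_hz)


def semitone_to_hz(semitone, ref_hz=440.0, ref_semitone=69.0):
    """Inverse conversion: semitones back to frequency in Hz."""
    return ref_hz * 2.0 ** ((semitone - ref_semitone) / 12.0)


print(hz_to_semitone(440.0), hz_to_semitone(880.0), semitone_to_hz(69.0))
```

A logarithmic scale is convenient because equal musical intervals become equal numeric differences: doubling the frequency always adds 12 semitones.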

Pitch Computation (I)
- Pitch of tuning forks

Pitch Computation (II)
- Pitch of speech

Statistics of Mandarin Chinese
- 5401 characters; each character is associated with at least one base syllable and one tone
- 411 base syllables; most syllables have 4 tones, which gives 1501 tonal syllables
- Tone is characterized by its pitch curve:
  - Tone 1: high-high
  - Tone 2: low-high
  - Tone 3: high-low-high
  - Tone 4: high-low
- Some examples of tones:
  - 1242: 清華大學 (Tsing Hua University)
  - 1234: 三民主義 (Three Principles of the People), 優柔寡斷 (indecisive), 搭達打大, 依宜以易, 夫福府負 (the last three are each a single syllable spoken in tones 1 through 4)
  - ?????: 美麗大教堂 (beautiful cathedral), 滷蛋有夠鹹 (the braised egg is very salty; Taiwanese)

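Sinusoidal Signals
- How to generate a stream of sinusoidal signals
    fs = 16000;                  % sample rate (Hz)
    duration = 3;                % length (seconds)
    f = 440;                     % frequency of the sinusoid (Hz)
    t = (1:fs*duration)/fs;      % time axis (seconds)
    y = 0.8*sin(2*pi*f*t);       % sinusoid with amplitude 0.8
    plot(t, y); axis([0.6, 0.65, -1, 1]);   % zoom in on 50 ms
    sound(y, fs);                % play the signal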

Zero Crossing Rate
- Zero crossing rate (ZCR)
  - The number of zero crossings in a frame.
- Characteristics:
  - Noise and unvoiced sounds have high ZCR.
  - ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds.
  - To distinguish noise/silence from unvoiced sound, we usually add a bias before computing ZCR.

ZCR Computations
- Two types of ZCR definition
  - If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise it is lower.
  - The choice affects the ZCR, especially when the sample rate is low.
- Other considerations
  - Zero-justification (removing the DC offset before counting) is required.
  - ZCR with a shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
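
The two ZCR definitions and the shift can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a crossing is detected from the sign of the product of adjacent samples (strictly negative for the lower count, non-positive for the higher count), zero-justification is done by subtracting the frame mean, and the shift is subtracted afterwards. The function name, parameter names, and the shift value used in the example are all hypothetical.

```python
def zero_crossing_count(frame, count_zero_samples=False, shift=0.0):
    """Count zero crossings in one frame.

    count_zero_samples=False: adjacent samples must have strictly opposite signs.
    count_zero_samples=True:  a zero-valued sample also counts, so the result
                              is never lower than with the strict rule.
    shift: bias subtracted after mean removal, so that low-amplitude noise
           stays on one side of zero and produces no crossings.
    """
    mean = sum(frame) / len(frame)          # zero-justification (assumed: mean removal)
    x = [s - mean - shift for s in frame]
    count = 0
    for a, b in zip(x, x[1:]):
        product = a * b
        if product < 0 or (count_zero_samples and product == 0):
            count += 1
    return count


loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.01, -0.01, 0.01, -0.01]          # stands in for near-silence
print(zero_crossing_count(loud, shift=0.05), zero_crossing_count(quiet, shift=0.05))
```

With a shift larger than the background noise amplitude, near-silent frames drop to zero crossings while louder unvoiced frames keep a high count. The slide leaves open how to choose that amount.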

ZCR
- Examples
  - Please refer to the online tutorial.