Acoustic/Prosodic Features

Presentation transcript:

Characteristics of Speech
- Long-term (sentence level, several seconds)
  - Drastic, irregular changes
- Short-term (frame level, about 20 ms)
  - Regular, periodic changes for voiced sounds
  - Noise-like for unvoiced sounds
- Hard to recognize without context information

Spectrum in the Frequency Domain
- Three basic characteristics of a spectrum:
  - Timbre: the spectrum after smoothing
  - Pitch: the distance between harmonics
  - Intensity: the magnitude of the spectrum
[Figure: spectrum annotated with the first formant F1, the second formant F2, pitch frequency, and intensity]

Timbre Demo: Real-time Spectrogram
- Simulink model for real-time display of the spectrogram
  - dspstfft_audio (before MATLAB R2011a)
  - dspstfft_audioInput (R2012a or later)
[Figure: spectrogram and spectrum displays]

Audio Feature Extraction & Recognition
- Frame blocking
  - Frame duration of 20 ms
- Feature extraction
  - Volume, pitch, MFCC, LPC, etc.
- Endpoint detection
  - Based on volume and zero crossing rate (ZCR)
- Recognition
  - DTW, HMM

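Example: Audio Feature Extraction
- 256 points per frame
- 84 points of overlap between neighbouring frames
- At a sampling rate of 11025 Hz, this gives 11025/(256-84) ≈ 64 feature vectors per second
[Figure: waveform with a zoomed-in view showing a frame and its overlap]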
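The arithmetic on this slide can be checked with a short sketch. It assumes that 11025 is the sampling rate in Hz and uses a synthetic one-second signal in place of real audio; the variable names (`frameSize`, `overlap`, `hop`, `frames`) are illustrative and not taken from the slides.

```matlab
fs = 11025;                 % sampling rate in Hz (assumed meaning of 11025)
frameSize = 256;            % points per frame
overlap = 84;               % points shared by neighbouring frames
hop = frameSize - overlap;  % frame advance: 172 points

y = randn(fs, 1);           % one second of placeholder audio (column vector)

% Number of complete frames that fit in the signal
frameCount = floor((length(y) - overlap) / hop);

% Cut the signal into a frameSize-by-frameCount matrix, one frame per column
frames = zeros(frameSize, frameCount);
for i = 1:frameCount
    startIndex = (i - 1) * hop + 1;
    frames(:, i) = y(startIndex : startIndex + frameSize - 1);
end

frameRate = fs / hop;            % about 64.1 feature vectors per second
frameDuration = frameSize / fs;  % about 23 ms, close to the 20 ms target
fprintf('%d frames, %.1f frames/s, %.1f ms per frame\n', ...
        frameCount, frameRate, 1000 * frameDuration);
```

Each column of `frames` can then be passed to a feature function (volume, ZCR, pitch, and so on), which yields one feature vector per frame.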

Three Basic Acoustic Features
- Three basic speech features:
  - Volume/energy/intensity: vibration amplitude
  - Pitch: fundamental frequency (the reciprocal of the fundamental period)
  - Timbre: the waveform within a fundamental period
- These features are perceived subjectively by humans, but they can be approximated mathematically so that a program can capture them.

Acoustic Feature: Energy
- Energy is the square sum of the samples in a frame, also known as intensity or volume.
- Characteristics:
  - Noise and fricatives usually have low energy.
  - Energy is strongly influenced by the microphone setup.
  - Taking 10 times the base-10 logarithm of the square sum gives energy in decibels (dB).
  - Energy is commonly used in endpoint detection.
  - In embedded implementations, volume can be computed as the sum of absolute values in a frame to reduce computation.
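The three volume measures described above can be sketched for a single frame as follows. The test frame is a synthetic 200 Hz sine, and the mean-removal step is an added assumption (a common precaution against DC offset), not something stated on the slide.

```matlab
fs = 16000;                          % sampling rate in Hz (illustrative)
t = (0:319)' / fs;                   % one 20 ms frame (320 samples)
frame = 0.5 * sin(2 * pi * 200 * t); % placeholder voiced-like frame

frame = frame - mean(frame);         % remove any DC offset (assumed extra step)

energy = sum(frame .^ 2);            % square sum of the frame
energyDb = 10 * log10(energy);       % the same quantity in decibels
volumeAbs = sum(abs(frame));         % cheaper absolute-sum volume

fprintf('energy = %.2f, %.2f dB, abs-sum volume = %.2f\n', ...
        energy, energyDb, volumeAbs);
```

The decibel value here is relative to a square sum of 1. An all-zero frame would give `-Inf`, so practical code usually adds a small constant before taking the logarithm.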

Acoustic Feature: Zero Crossing Rate
- Zero crossing rate (ZCR)
  - The number of zero crossings in a frame.
- Characteristics:
  - Noise and unvoiced sounds have a high ZCR.
  - ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds.
  - To distinguish noise/silence from unvoiced sounds, a bias is usually added to the signal before computing ZCR.
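A minimal sketch of ZCR and the bias trick, using synthetic frames. The helper `countZcr` and the choice of bias (twice the peak noise level) are hypothetical; the slides do not say how the bias is chosen.

```matlab
fs = 16000;                                  % sampling rate in Hz (illustrative)
t = (0:319)' / fs;                           % one 20 ms frame
voiced = 0.5 * sin(2 * pi * 200 * t + 0.1);  % low-frequency, voiced-like frame
noise = 0.01 * randn(320, 1);                % low-level background noise

% Count sign changes between neighbouring samples
countZcr = @(x) sum(x(1:end-1) .* x(2:end) < 0);

zcrVoiced = countZcr(voiced);   % few crossings for a low-pitched periodic frame
zcrNoise = countZcr(noise);     % many crossings for noise

% Shifting the frame by a bias larger than the noise level removes its crossings
bias = 2 * max(abs(noise));     % hypothetical choice of bias
zcrNoiseShifted = countZcr(noise + bias);

fprintf('voiced: %d, noise: %d, noise after bias: %d\n', ...
        zcrVoiced, zcrNoise, zcrNoiseShifted);
```

A louder unvoiced sound such as a fricative would still cross the shifted zero line, which is why the bias helps separate it from background noise.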

Demo: Volume and ZCR
- Function: record your voice and display the waveform, volume, and ZCR on the fly.
- Demo files:
  - goVolZcr.m
    - Set recordViaMic to 1 to use your own recording.
    - With an appropriate shift, the ZCR of noise can be reduced to zero.
  - showVolZcr.mdl
    - Real-time display of volume and ZCR.

Pitch
- Computation
  - Pitch frequency is the reciprocal of the fundamental period.
  - Pitch can also be expressed in terms of semitones.
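The slide's semitone formula is not preserved in this transcript. One common convention, assumed here, is the MIDI note number, semitone = 69 + 12*log2(f/440), which maps 440 Hz (A4) to 69. The sketch below converts a fundamental period to a frequency and then to semitones; the period value and the name `freq2semitone` are hypothetical.

```matlab
fs = 16000;                 % sampling rate in Hz (illustrative)
periodInSamples = 73;       % hypothetical fundamental period from a pitch tracker

% Pitch frequency is the reciprocal of the fundamental period
pitchFreq = fs / periodInSamples;   % in Hz

% Assumed convention: MIDI note number, where 440 Hz (A4) maps to 69
freq2semitone = @(f) 69 + 12 * log2(f / 440);

semitone = freq2semitone(pitchFreq);
fprintf('%.1f Hz -> %.2f semitones\n', pitchFreq, semitone);
```

Under this convention, doubling the frequency raises the pitch by exactly 12 semitones (one octave).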

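Production and Reception of Sound in General
- Basic process
  - Vibration of the sound source
  - Wave propagation through the air
  - Vibration of the eardrum
  - Reception by the nerves of the inner ear
  - Recognition by the brain
- Sound-generation mechanisms
  - Natural vibration frequency excited by striking (e.g., a tuning fork)
  - Resonant frequency excited by air friction (e.g., a flute)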

Human Speech Production

The Vocal Tract

Glottal Volume Velocity & Resulting Sound Pressure (Voiced)

Speech Production
[Figure: glottal pulses pass through the vocal tract to produce the speech signal. Panels: (a) source spectrum + (b) filter function = (c) output energy spectrum]

Acoustical Analysis (speech signal of "七", the Mandarin word for "seven")

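Speech Production Modeling
[Block diagram: an impulse train generator (driven by the pitch period) or a noise generator produces the excitation u(n), which is multiplied by the gain G and passed through a time-varying digital filter (driven by the vocal tract parameters) to give the speech signal s(n). Other labels in the figure: phonation, whispering, frication, compression, vibration]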

Parametric Representation
[Block diagram: u(n) × G → A(z) → s(n)]
- Z-transform model, written in terms of A(z)
- G = gain of excitation
- u(n) = excitation source (quasi-periodic pulse train or random noise)

The Speech Model: A Summary
- Voiced/unvoiced classification
- Pitch period for voiced sounds
- The gain parameter
- The coefficients of the digital filters, {a_k}
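The four parameters in this summary are enough to synthesize a signal. The sketch below assumes the common all-pole form, in which s(n) is the output of the filter G/A(z) with A(z) = 1 + a_1 z^-1 + a_2 z^-2 + ...; the transcript does not preserve the slides' exact definition of A(z), and the single 500 Hz resonance is a made-up stand-in for real vocal tract coefficients.

```matlab
fs = 8000;                          % sampling rate in Hz (illustrative)
n = round(0.05 * fs);               % 50 ms of signal (400 samples)
pitchPeriod = 80;                   % samples between pulses (100 Hz pitch)
G = 0.5;                            % gain of the excitation

% Voiced excitation: quasi-periodic impulse train
uVoiced = zeros(n, 1);
uVoiced(1:pitchPeriod:n) = 1;

% Unvoiced excitation: random noise
uUnvoiced = randn(n, 1);

% Hypothetical vocal tract filter: one resonance near 500 Hz
r = 0.95;                           % pole radius (< 1 keeps the filter stable)
theta = 2 * pi * 500 / fs;          % pole angle in radians
a = [1, -2 * r * cos(theta), r^2];  % coefficients of A(z), assumed all-pole form

% s(n): the excitation scaled by G and passed through 1/A(z)
sVoiced = filter(G, a, uVoiced);
sUnvoiced = filter(G, a, uUnvoiced);
```

Switching between `uVoiced` and `uUnvoiced` corresponds to the voiced/unvoiced classification, while `pitchPeriod`, `G`, and `a` play the roles of the other three parameters.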

Glossary (English / Chinese)
- Cochlea: 耳蝸
- Phoneme: 音素、音位
- Phonics: 聲學；聲音基礎教學法（以聲音為基礎進而教拼字的教學法）
- Phonetics: 語音學
- Phonology: 音系學、語音體系
- Prosody: 韻律學；作詩法
- Syllable: 音節
- Tone: 音調
- Alveolar: 齒槽音
- Silence: 靜音
- Noise: 雜訊
- Glottis: 聲門
- Larynx: 喉頭
- Pharynx: 咽頭
- Pharyngeal: 咽部的，喉音的
- Velum: 軟顎
- Vocal cords: 聲帶
- Esophagus: 食管
- Diaphragm: 橫隔膜
- Trachea: 氣管

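Hints for Exercises
- How to generate a sine wave signal:
  - Math formula: y(t) = A sin(2πft), with amplitude A and frequency f
  - MATLAB code:
      duration = 3;                    % length in seconds
      f = 440;                         % frequency in Hz
      fs = 16000;                      % sampling rate in Hz
      time = (0:duration*fs-1)/fs;     % time axis in seconds
      y = 0.8*sin(2*pi*f*time);        % sine wave with amplitude 0.8
      plot(time, y);
      sound(y, fs);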