Speech signal processing & its real-time implementation

Speech signal processing & its real-time implementation, 20th November

Goals & objectives
- Introducing basic speech signal processing
  - Speech production model
  - Vocal tract transfer function
  - Fundamental frequency
- Basic parameter estimation
  - Linear predictive coefficients (LPC)
  - Pitch/voicing status analysis
- Real-time implementation
  - LPC analysis
  - Estimation of the vocal tract transfer function
  - Pitch estimation

What is a “speech signal”?
- Physical definition: signals produced by the human speech production organs (lungs, larynx, pharyngeal cavity, oral cavity, lips, tongue, nasal cavity)
- Informational definition: context + personality

Speech production mechanism

Speech production model: an excitation signal is shaped by the vocal tract transfer function to produce the speech signal

Examples of vocal tract MR images: ‘a’ of ‘matt’, ‘i’ of ‘vit’, ‘fi’ of ‘fiffig’, ‘j’ of ‘jord’

Primitive speech synthesizer: in 1779 Professor Christian Kratzenstein built an apparatus that produced the five vowels (/a/, /e/, /i/, /o/, /u/) artificially, in response to a prize competition of the Russian Imperial Academy of Sciences

Von Kempelen's speaking machine (1791)

Digital model for speech production: an excitation model followed by a vocal tract model

Model for each stage
- Excitation model -> impulse train generator
- Vocal tract model -> all-pole linear time-varying filter (an IIR digital filter)
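As a rough illustration of these two stages (a minimal sketch, not taken from the slides; the pole location, pitch period, and 8 kHz sampling rate are assumed values), the following C program drives a single all-pole resonance with a periodic impulse train:

    #include <math.h>
    #include <stdio.h>

    #define N      240   /* number of synthesized samples                    */
    #define PERIOD 64    /* pitch period in samples: 64 / 8000 Hz -> 125 Hz  */

    int main(void)
    {
        const float pi = 3.14159265f;
        const float r  = 0.95f;                        /* pole radius (inside the unit circle) */
        const float w  = 2.0f * pi * 800.0f / 8000.0f; /* pole angle: a resonance near 800 Hz  */
        const float a1 = 2.0f * r * cosf(w);           /* all-pole (IIR) filter coefficients   */
        const float a2 = -r * r;

        float y1 = 0.0f, y2 = 0.0f;                    /* filter memory */

        for (int n = 0; n < N; n++) {
            float x = (n % PERIOD == 0) ? 1.0f : 0.0f; /* impulse-train excitation             */
            float y = x + a1 * y1 + a2 * y2;           /* y[n] = x[n] + a1*y[n-1] + a2*y[n-2]  */
            y2 = y1;
            y1 = y;
            printf("%d %f\n", n, y);                   /* one sample of the synthetic waveform */
        }
        return 0;
    }

A real synthesizer would use a higher-order, time-varying all-pole filter (the LPC filter estimated below) and switch between impulse-train and noise excitation for voiced and unvoiced sounds.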

Formant frequencies: the tonal components (resonance frequencies) produced by the resonances of the vocal tract; the 1st, 2nd, and 3rd formants appear as peaks of the vocal tract transfer function

Waveforms & Formant frequency

Vocal tract transfer function
- A fundamental element of speech production
- Context-dependent: used for speech recognition
- Speaker-dependent: used for speaker recognition

Related signal processing theory (the Z-transform): continuous-time signals are handled with the Fourier transform (F.T.) and the Laplace transform (L.T.); discrete-time signals with the DFT and the Z-transform
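As a brief reminder (standard definitions, not reproduced from the slide figure): the Z-transform of a discrete-time signal is

$$X(z) = \sum_{n=-\infty}^{\infty} x[n]\, z^{-n},$$

and evaluating it on the unit circle, z = e^{j\omega}, gives the discrete-time Fourier transform, just as evaluating the Laplace transform on the imaginary axis gives the continuous-time Fourier transform.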

Pole & Zero

Estimation of the VTF: AR (auto-regressive) model of the vocal tract
- The vocal tract is tube-like in shape and has resonance frequencies
- It can therefore be modeled as an all-pole filter, i.e. an AR (auto-regressive) model
- The model coefficients are the LPC (linear prediction coefficients)

VTF from the AR model: that is, the VTF can be estimated from the linear prediction coefficients {ak}.
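In the usual all-pole formulation (standard LPC notation; the sign convention here matches the example code later in these notes, where the stored lpc[] values are the coefficients of A(z) itself):

$$H(z) = \frac{G}{A(z)}, \qquad A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}$$

Many textbooks instead write A(z) = 1 - \sum_k \alpha_k z^{-k} with \alpha_k the predictor coefficients; the two forms differ only in the sign of the coefficients.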

Estimation of the linear prediction coefficients (LPC)

The Durbin recursion: an order-recursive solution of the autocorrelation normal equations for the LPC coefficients
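As a reference implementation (a standard textbook form of the Levinson-Durbin recursion, not code taken from the original slides; names such as levinson_durbin and MAXORDER are placeholders), one frame's LPC can be computed from its autocorrelation like this:

    /* Levinson-Durbin recursion: solve the LPC normal equations from the
     * autocorrelation values R[0..p] (p <= MAXORDER).
     * On return a[1..p] are predictor coefficients with A(z) = 1 - sum a_k z^-k;
     * the FFT-based VTF example later in these notes stores the negated values,
     * i.e. A(z) = 1 + sum lpc[k] z^-k, so flip the signs before reusing there.
     * Returns the final prediction-error energy. */
    #define MAXORDER 32

    float levinson_durbin(const float R[], float a[], int p)
    {
        float a_prev[MAXORDER + 1] = { 0.0f };
        float E = R[0];

        for (int i = 1; i <= p; i++) {
            float k = R[i];                      /* reflection coefficient k_i       */
            for (int j = 1; j < i; j++)
                k -= a_prev[j] * R[i - j];
            k /= E;

            a[i] = k;
            for (int j = 1; j < i; j++)          /* update lower-order coefficients  */
                a[j] = a_prev[j] - k * a_prev[i - j];

            for (int j = 1; j <= i; j++)         /* keep the order-i solution        */
                a_prev[j] = a[j];

            E *= (1.0f - k * k);                 /* shrink the prediction error      */
        }
        return E;
    }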

From LPC to magnitude response of VTF
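Evaluating H(z) on the unit circle z = e^{j\omega} (with G = 1, as in the example code below) gives the magnitude response that the exercises plot:

$$|H(e^{j\omega})| = \frac{1}{\left|1 + \sum_{k=1}^{p} a_k e^{-j\omega k}\right|}
= \frac{1}{\sqrt{\Big(1 + \sum_{k} a_k \cos k\omega\Big)^{2} + \Big(\sum_{k} a_k \sin k\omega\Big)^{2}}}$$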

Exercise 1
- Compute the autocorrelation of one frame
- From the autocorrelation, compute the LPC using the Durbin recursion algorithm
- From the LPC, compute the magnitude response of the VTF
- Display the VTF magnitude response of the input speech together with the waveform
- Use a microphone input, or play the distributed (clean) speech file as input
- The frame length is up to you; 256 samples per frame with a 128-sample frame shift is recommended
- Use the profiling feature of CCS to check the CPU load and verify whether real-time processing is possible at each sampling frequency (8 kHz, 32 kHz, ...); if real-time processing fails, devise a way to reduce the computational load
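A minimal short-time autocorrelation routine of the kind this exercise needs might look like the following sketch (function and argument names are illustrative, not from the slides):

    /* Short-time autocorrelation of one frame of length N:
     *   R[k] = sum_{n=k}^{N-1} s[n] * s[n-k],  k = 0 .. maxlag. */
    void frame_autocorrelation(const float s[], int N, float R[], int maxlag)
    {
        for (int k = 0; k <= maxlag; k++) {
            float acc = 0.0f;
            for (int n = k; n < N; n++)
                acc += s[n] * s[n - k];
            R[k] = acc;
        }
    }

For the LPC analysis only lags up to the LPC order are needed; the pitch exercise in Part II needs lags up to the longest expected pitch period.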

Flow chart
- Filled ping or pong buffer
- Form a frame
- Compute the autocorrelation
- Run Durbin's recursion algorithm
- Compute the vocal tract transfer function
- Plot it together with the waveform and compare the two signals
- Move to the next frame

Example of results

Computational load of the VTF calculation

Computing the VTF using the FFT (evaluating A(e^jw) at all frequency bins with a single FFT instead of the per-frequency cosine/sine loops)

LPC-to-VTF example code

Without FFT:

    /* Evaluate |H(e^jw)| = 1 / |A(e^jw)| one frequency at a time, with
     * A(z) = 1 + sum_j lpc[j] * z^-(j+1).  Cost: O(FFTLEN * LPCORD). */
    for (i = 0; i < FFTLEN/2; i++) {
        factor = 2.0f * 3.14159265f * (float)i / FFTLEN; /* w = 2*pi*i/FFTLEN, same grid as the FFT bins */
        rs = 0.0f;
        is = 0.0f;
        for (j = 0; j < LPCORD; j++) {
            rs += lpc[j] * cosf(factor * (j + 1));  /* real part of A(e^jw) - 1                    */
            is += lpc[j] * sinf(factor * (j + 1));  /* imaginary part (sign irrelevant for |.|)    */
        }
        rs += 1.0f;                                 /* add the leading 1 of A(z)                   */
        H[i] = 1.0f / sqrtf(rs*rs + is*is);         /* magnitude response of the VTF               */
    }

With FFT:

    /* Put {0, lpc[0], ..., lpc[LPCORD-1], 0, ...} into a complex buffer,
     * take one FFT, then add the leading 1 of A(z) per frequency bin. */
    tfftb[0] = 0.0f;                                /* clear element 0: the leading 1 of A(z)      */
    tfftb[1] = 0.0f;                                /* is added after the FFT instead              */
    for (i = 1; i < FFTLEN; i++) {
        tfftb[i<<1]     = (i-1 < LPCORD) ? lpc[i-1] : 0.0f;  /* real part                          */
        tfftb[(i<<1)+1] = 0.0f;                              /* imaginary part                     */
    }
    DSPF_sp_cfftr2_dit(tfftb, ffttw, FFTLEN);       /* DSPLIB radix-2 complex FFT                  */
    DSPF_sp_bitrev_cplx(tfftb, brv, FFTLEN);        /* bit-reversal reordering                     */
    for (i = 0; i < FFTLEN/2; i++) {
        rs = tfftb[i<<1] + 1.0f;                    /* real part of A(e^jw)                        */
        is = tfftb[(i<<1)+1];                       /* imaginary part                              */
        H[i] = 1.0f / sqrtf(rs*rs + is*is);         /* magnitude response of the VTF               */
    }

Exercise 2
- Obtain the magnitude response of the VTF using the FFT function provided in DSPLIB
- Check whether the result is the same as in Exercise 1
- When profiling, check how much the computational load is reduced compared with Exercise 1
- Check whether the sampling frequencies that were not feasible in Exercise 1 are now feasible

Part II Pitch estimation

What is “pitch”?
- The interval between the pulses of the excitation signal; it determines how high or low the voice sounds
- Fundamental frequency (F0) = 1 / pitch period
- Narrow spacing: high-pitched voice (typically female)
- Wide spacing: low-pitched voice (typically male)
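For example (illustrative numbers, not from the slide): at an 8 kHz sampling rate, a pitch period of 80 samples is 80/8000 = 10 ms, so F0 = 100 Hz (a typical male voice), while a period of 40 samples is 5 ms, so F0 = 200 Hz (a typical female voice).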

Pitch estimation using the autocorrelation function: the spacing between autocorrelation peaks is approximately the period of the signal

Autocorrelation for voiced/unvoiced speech signals

Usefulness of autocorrelation
- Pitch: the time lag of the first peak = the pitch period
- Voiced/unvoiced: the relative intensity of the first peak; if sufficiently high -> voiced frame, otherwise -> unvoiced/silence frame
- LPC computation: supplies the parameters needed for LP analysis
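A minimal per-frame pitch detector built on these observations might look like the following sketch (the function and variable names are placeholders; the 0.64 threshold comes from the flow chart later in these notes, and the lag range assumes 8 kHz sampling):

    /* Search the autocorrelation R[] for its largest peak in the plausible
     * pitch-lag range and compare it with R[0] to decide voiced/unvoiced.
     * Returns the pitch period in samples, or 0 for an unvoiced frame. */
    int estimate_pitch(const float R[], int min_lag, int max_lag,
                       float threshold, int *voiced)
    {
        int   best_lag = 0;
        float rmax     = 0.0f;

        for (int k = min_lag; k <= max_lag; k++) {
            if (R[k] > rmax) {
                rmax     = R[k];
                best_lag = k;
            }
        }

        /* voiced if the normalized peak is large enough, e.g. Rmax/R(0) > 0.64 */
        *voiced = (R[0] > 0.0f) && (rmax / R[0] > threshold);

        return *voiced ? best_lag : 0;
    }

With 8 kHz input, min_lag = 20 and max_lag = 160 cover roughly 50 to 400 Hz; dividing the sampling rate by the returned lag gives F0.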

Considerations in pitch estimation: the pitch-doubling problem
- If the period is T, then 2T, 3T, ... are also periods
- The peaks of the autocorrelation therefore also appear periodically

Plot of the autocorrelation peaks: pitch doubling and pitch halving

Considerations in pitch estimation: use a median filter
- Take the middle value among N values
- Example: 100, 110, 120, 122, 230, 123, 121, 119, … -> 110, 120, 122, 123, 123, …
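A 3-tap median filter of the kind used in the flow chart below takes only a few lines (a sketch, assuming the pitch values are stored as floats; names are illustrative):

    /* Median of three values: max(a, min(b, c)) after ordering a <= b. */
    static float median3(float a, float b, float c)
    {
        if (a > b) { float t = a; a = b; b = t; }  /* ensure a <= b    */
        if (b > c) { b = c; }                      /* b = min(b, c)    */
        return (a > b) ? a : b;                    /* the middle value */
    }

    /* 3-tap median smoothing of a pitch track p[0..n-1]. */
    void median_smooth(const float p[], float out[], int n)
    {
        for (int i = 0; i + 2 < n; i++)
            out[i] = median3(p[i], p[i + 1], p[i + 2]);
    }

Median filtering removes isolated outliers such as the 230 in the example above (pitch-doubling or pitch-halving errors) while leaving the smooth parts of the pitch track untouched.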

Flow chart
- Filled ping or pong buffer
- Form a frame
- Compute the autocorrelation
- Is Rmax/R(0) > 0.64? If not, mark the frame as unvoiced
- If yes, the lag i at which Rmax occurs is the pitch period
- Apply 3-tap median filtering to the pitch values
- Move to the next frame

Exercise 3
- Using the autocorrelation function written in Exercise 1, write a program that estimates the pitch
- Write a program that decides whether the current input speech is voiced or unvoiced; recommended rule: classify a frame as voiced when R(t)/R(0) > 0.64
- Display the estimated pitch values together with the input waveform
- Use a microphone input, or play the distributed clean speech file continuously from the player