Introduction of Speech Signal, 30th October

What is a "speech signal"?
Physical definition:
– Signals produced by the human speech production organs
– Lungs, larynx, pharyngeal cavity, oral cavity, lips, tongue, nasal cavity
Informational definition:
– Context + personality

Speech production mechanism

Examples of vocal tract MR images: 'a' of 'matt', 'i' of 'vit', 'fi' of 'fiffig', 'j' of 'jord'

Primitive speech synthesizer: in 1779, Professor Christian Kratzenstein in Russia built an apparatus that produced the five vowels (/a/, /e/, /i/, /o/, /u/) artificially.

Von Kempelen's speaking machine (1791)

Digital model for speech production: excitation model + vocal tract model

Model for each stage:
– Excitation model -> impulse train generator
– Vocal tract model -> all-pole linear time-varying filter (an IIR digital filter)
A small sketch of this two-stage model follows.
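The following minimal, self-contained C sketch (not from the lecture) drives a toy second-order all-pole IIR filter with an impulse-train excitation; the pitch period, sampling rate, and filter coefficients are assumed values chosen only for illustration.

```c
/* Source-filter sketch: impulse-train excitation -> all-pole (IIR) vocal-tract
 * filter.  All numeric values are illustrative assumptions, not lecture data. */
#include <stdio.h>

#define N_SAMPLES 400
#define ORDER     2                      /* toy 2nd-order all-pole filter      */

int main(void)
{
    /* denominator coefficients of 1/(1 + a1 z^-1 + a2 z^-2):
       a complex pole pair near ~450 Hz, assuming fs = 8 kHz                  */
    const double a[ORDER] = { -1.778, 0.9 };
    const int pitch_period = 80;         /* 100 Hz pitch at fs = 8 kHz         */

    double y[N_SAMPLES] = { 0.0 };

    for (int n = 0; n < N_SAMPLES; n++) {
        /* excitation model: impulse-train generator (voiced speech)           */
        double x = (n % pitch_period == 0) ? 1.0 : 0.0;

        /* vocal-tract model: all-pole IIR filter, y[n] = x[n] - a1*y[n-1] - a2*y[n-2] */
        double out = x;
        for (int k = 0; k < ORDER; k++)
            if (n - 1 - k >= 0)
                out -= a[k] * y[n - 1 - k];
        y[n] = out;

        printf("%d %f\n", n, y[n]);      /* one synthetic "speech" sample per line */
    }
    return 0;
}
```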

Spectrum of speech signal Voiced speech Unvoiced speech

Speech parameters
Context information
– What is spoken?
– Carried by the vocal tract transfer function
Prosody information
– Rhythm
– Carried by intonation, duration, intensity
Speaker information
– Who is speaking?

Speech parameters: context information

F1, F2 according to phoneme

Formant tracking

Part II: Real-time implementation of a speech enhancement algorithm

Objectives
– Improve one or more perceptual aspects of speech, such as overall quality, intelligibility, or degree of listener fatigue.
– Make the processed speech sound better than the unprocessed speech (cf. speech restoration).

Applications
Background noise reduction
– Office, street, motor-vehicle, and aircraft noise
Correcting for distorted speech
– Deep-sea divers breathing a helium-oxygen mixture
– Pathological difficulties of the speaker
– Improvement for people with impaired hearing

Speech enhancement algorithms: "older" algorithms
Spectral subtraction method
– Frequency-domain approach, very simple
Maximum-likelihood estimation approach
– Uses statistical properties of the noise/signal
Adaptive filtering method
– Time-domain approach, using the LMS algorithm

Speech enhancement algorithms: "new" algorithms
Perceptual-domain approach
– Uses psychoacoustic properties of the human ear
Hearing-aid approach
– Enhances selective frequency bands
Newly developed approaches
– Hybrid methods, signal decomposition methods, ICA-based methods

Spectral subtraction method

Adaptive filter approach (block diagram: signal S, noise N, noisy input X, noise reference V, adaptive filter H, output U)
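The LMS adaptive filtering method listed among the "older" algorithms can be sketched as follows, assuming the standard noise-canceller structure suggested by the block-diagram labels (primary input X = S + N, noise reference V, adaptive filter H, enhanced output U); the toy sinusoidal signals, filter length, and step size are assumptions, not lecture values.

```c
/* LMS adaptive noise canceller sketch.  The "speech" and noise signals, the
 * filter length and the step size are illustrative assumptions.              */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define L   16        /* adaptive filter length */
#define NS  2000      /* number of samples      */
#define MU  0.01      /* LMS step size          */

int main(void)
{
    double h[L] = { 0.0 }, vbuf[L] = { 0.0 };

    for (int n = 0; n < NS; n++) {
        double s = sin(2.0 * M_PI * 0.010 * n);   /* wanted signal S (toy)        */
        double v = sin(2.0 * M_PI * 0.060 * n);   /* noise reference V            */
        double x = s + 0.8 * v;                   /* primary input X = S + N      */

        for (int k = L - 1; k > 0; k--)           /* shift reference delay line   */
            vbuf[k] = vbuf[k - 1];
        vbuf[0] = v;

        double nhat = 0.0;                        /* filter H estimates the noise */
        for (int k = 0; k < L; k++)
            nhat += h[k] * vbuf[k];

        double u = x - nhat;                      /* enhanced output U ~ S        */

        for (int k = 0; k < L; k++)               /* LMS coefficient update       */
            h[k] += MU * u * vbuf[k];

        if (n % 200 == 0)
            printf("n=%4d  u=%8.4f\n", n, u);
    }
    return 0;
}
```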

Examples of older/new algorithms: original corrupted speech, hybrid, spectral subtraction

Audio demonstration: computer fan noise, phone channel interference, vacuum cleaner noise, industrial noise, office noise, aircraft cabin noise, street noise, background conversation, background music, power hum

Block diagram of the spectral subtraction method: input signal -> windowing -> FFT -> compute magnitude and phase -> silence/speech decision -> noise spectrum estimation -> IFFT -> overlap-and-add -> output signal.
Two problems:
1. How to decide between silence and speech?
2. How to estimate the noise spectrum?

Silence/signal (speech) decision
Heuristic methods
– Use short-time energy, short-time zero-crossing rate, etc.
– Silence region: short-time energy low, ZCR high
– Speech region: short-time energy high, ZCR low or medium
– Performance may degrade when the noise characteristics change
Statistical methods
– Assumptions about the statistical characteristics of the speech and of the noise
– Require estimation of the parameters that describe these statistics
Model-based methods
– Assumptions about the generation models of the speech and of the noise
– Performance may degrade for noise with speech-like characteristics

Heuristic method
– Short-time energy (Eng), short-time zero-crossing rate (Zcr)
– Decision rule with thresholds Eth and Zth: if Eng < Eth and Zcr > Zth, classify the frame as "noise" (see the sketch below)
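A minimal C sketch of this heuristic decision (and of what Exercise 1 below asks for); the frame length and threshold values are placeholders to be tuned experimentally, not values given in the lecture.

```c
/* Heuristic silence/speech decision from short-time energy and zero-crossing
 * rate.  Thresholds eng_th and zcr_th are to be determined by experiment.    */

double short_time_energy(const short *x, int n)
{
    double e = 0.0;
    for (int i = 0; i < n; i++)
        e += (double)x[i] * (double)x[i];
    return e / n;                              /* average energy of the frame */
}

int zero_crossing_count(const short *x, int n)
{
    int z = 0;
    for (int i = 1; i < n; i++)
        if ((x[i] >= 0) != (x[i - 1] >= 0))
            z++;
    return z;
}

/* returns 1 if the frame is judged to be silence (noise), 0 if speech        */
int is_silence(const short *frame, int n, double eng_th, int zcr_th)
{
    double eng = short_time_energy(frame, n);
    int    zcr = zero_crossing_count(frame, n);
    return (eng < eng_th) && (zcr > zcr_th);   /* low energy, high ZCR => silence */
}
```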

Short-time energy for different windows

Zero-crossing rate: distribution and typical values according to the characteristics of the speech

Combining energy & ZCR: regions for speech (unvoiced), speech (voiced), and silence (noise)

Exercise 1
– Compute the energy of the short-time speech signal
– Compute the zero-crossing rate of the short-time speech signal
– Store the two results in arrays and use the CCS "insert graphic" feature to:
– Plot energy and ZCR alongside the input waveform
– Determine at which values the input signal corresponds to silence
– Input signal: use a microphone (AIC input = MIC), or play the distributed "Noisy_sample[LMH].snd" files continuously with a player (GoldWave) (AIC input = LINE)
– When playing in GoldWave: sampling_freq = 16 kHz, 16 bits

Example (speech region)

Example (silence region)

Noise spectrum estimation
– Combine the previously estimated noise spectrum with the spectrum of the current frame
– Inputs: the noise magnitude spectrum used so far, and the magnitude spectrum of the current frame judged to be silence
– Output: the updated noise magnitude spectrum

Example code
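The code shown on this slide is not reproduced in the transcript; a minimal sketch of the recursive update it describes might look like the following, where NFFT and the smoothing constant ALPHA are assumed values, not lecture data.

```c
/* Recursive noise-magnitude-spectrum update, applied only to frames that were
 * classified as silence.  ALPHA weights the previous estimate.               */
#define NFFT  256
#define ALPHA 0.9

void update_noise_spectrum(double *noise_mag, const double *frame_mag)
{
    for (int k = 0; k < NFFT; k++)
        noise_mag[k] = ALPHA * noise_mag[k] + (1.0 - ALPHA) * frame_mag[k];
}
```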

Clean spectrum estimation
From the FFT coefficients of the current frame:
– Subtract the currently estimated noise spectrum from the magnitude spectrum
– Use the phase spectrum of the current frame unchanged
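A sketch of this step in C, assuming separate magnitude/phase arrays of length nfft; the array layout and the clamping of negative magnitudes to zero are assumptions, not details from the slide.

```c
/* Clean-spectrum estimation: subtract the estimated noise magnitude from the
 * current frame's magnitude, keep the noisy phase, and rebuild the complex
 * spectrum for the IFFT.                                                     */
#include <math.h>

void estimate_clean_spectrum(const double *mag, const double *phase,
                             const double *noise_mag,
                             double *re, double *im, int nfft)
{
    for (int k = 0; k < nfft; k++) {
        double m = mag[k] - noise_mag[k];     /* magnitude subtraction         */
        if (m < 0.0)
            m = 0.0;                          /* avoid negative magnitudes     */
        re[k] = m * cos(phase[k]);            /* phase of the current frame    */
        im[k] = m * sin(phase[k]);            /* is reused unchanged           */
    }
}
```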

Overall flowchart
1. Build the frame x(n) and apply windowing
2. Compute the FFT X(m), then its amplitude and phase
3. Compute energy and zero-crossing rate; decide whether the current frame is "silence"
4. If yes, update the noise amplitude spectrum
5. Clean spectrum estimation
6. IFFT and overlap-add
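Putting the flowchart together, a per-frame processing skeleton might look like the sketch below; fft_mag_phase() and ifft_from_mag_phase() are hypothetical placeholders for the FFT routines of the earlier FFT-based filter lab, while is_silence() and update_noise_spectrum() refer to the sketches above.

```c
/* Per-frame spectral-subtraction skeleton following the flowchart.  The FFT
 * helpers declared here are hypothetical, not actual lab routines.           */
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define NFFT 256

/* hypothetical FFT helpers, assumed to exist in the lab framework            */
void fft_mag_phase(const double *frame, double *mag, double *phase);
void ifft_from_mag_phase(const double *mag, const double *phase, double *out);

/* defined in the earlier sketches                                            */
int  is_silence(const short *frame, int n, double eng_th, int zcr_th);
void update_noise_spectrum(double *noise_mag, const double *frame_mag);

void process_frame(const short *in_frame, double *noise_mag,
                   double *out_frame, double eng_th, int zcr_th)
{
    double windowed[NFFT], mag[NFFT], phase[NFFT];

    /* 1. build frame and apply a Hanning window */
    for (int i = 0; i < NFFT; i++)
        windowed[i] = in_frame[i] * (0.5 - 0.5 * cos(2.0 * M_PI * i / (NFFT - 1)));

    /* 2. FFT -> magnitude and phase */
    fft_mag_phase(windowed, mag, phase);

    /* 3./4. energy + ZCR decision; update noise spectrum on silence frames */
    if (is_silence(in_frame, NFFT, eng_th, zcr_th))
        update_noise_spectrum(noise_mag, mag);

    /* 5. clean spectrum estimation (done in place on mag for brevity) */
    for (int k = 0; k < NFFT; k++) {
        mag[k] -= noise_mag[k];
        if (mag[k] < 0.0)
            mag[k] = 0.0;
    }

    /* 6. IFFT; the caller performs overlap-add on out_frame */
    ifft_from_mag_phase(mag, phase, out_frame);
}
```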

Exercise 2
Implement the spectral subtraction algorithm
– Parts to be newly added:
– Compute energy and zero-crossing rate
– Silence/speech decision: determine the energy and ZCR thresholds empirically
– Noise power spectrum estimation (update)
– Clean speech estimation
– The remaining parts are the same as the earlier FFT-based digital filter
Checking the results
– Play the distributed noisy speech continuously as the input source
– Output either the original speech or the processed result according to the DIP switch setting

Example result (time domain)

Example result (frequency domain)

Improving the performance of the spectral subtraction method

Problems of the spectral subtraction method
When measuring the noise power spectrum:
– An averaging effect occurs
– The true noise power is larger than the estimated noise power
– Residual background noise remains audible
Possible solution (noise spectrum boosting)
– Artificially amplify the estimated noise power spectrum
– That is, apply a boosting factor during clean spectrum estimation
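A sketch of the boosted (over-)subtraction in C; beta is the boosting factor to be chosen by listening in Exercise 3, not a value given in the lecture.

```c
/* Over-subtraction ("noise spectrum boosting"): the estimated noise magnitude
 * is scaled by a factor beta > 1 before it is subtracted.                     */
void over_subtract(const double *mag, const double *noise_mag,
                   double *clean_mag, int nfft, double beta)
{
    for (int k = 0; k < nfft; k++) {
        double d = mag[k] - beta * noise_mag[k];
        clean_mag[k] = (d > 0.0) ? d : 0.0;    /* clamp negative results to 0  */
    }
}
```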

Still a problem: annoying artifacts
– A watery, running-water-like sound underlies the output, the so-called "musical noise"
– Low-energy regions are driven to values near "0", while high-energy regions end up dominating the total
– Beep-like noises occur
Solution (noise floor)
– Insert a small artificial signal level even into the low-energy regions
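Combining the boosting factor with a noise floor gives a sketch like the following; beta and eta are the tuning parameters of Exercise 3, not lecture values.

```c
/* Spectral subtraction with boosting factor beta and noise floor eta: bins
 * that would fall below eta * noise_mag are clamped to that floor instead of
 * zero, which reduces the "musical noise" artifacts.                          */
void subtract_with_floor(const double *mag, const double *noise_mag,
                         double *clean_mag, int nfft, double beta, double eta)
{
    for (int k = 0; k < nfft; k++) {
        double d       = mag[k] - beta * noise_mag[k];
        double floor_k = eta * noise_mag[k];
        clean_mag[k]   = (d > floor_k) ? d : floor_k;
    }
}
```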

Exercise 3
Apply the noise spectrum boosting technique
– Listen to the sound quality as the amount of boosting varies
– Determine the boosting factor that gives the best sound quality
– Compare the sound quality with Exercise 2
Apply the noise floor technique
– Listen to the sound quality as the noise floor level varies
– Determine the floor level that gives the best sound quality
Modify the program so that the two methods above can be selected with the DIP switch.