[Advanced] Speech & Audio Signal Processing ES 157/257: Speech and Audio Processing Prof. Patrick Wolfe, Harvard DEAS 02 February 2006.

State of the Art in Speech/Audio
Speech and audio processing may be divided into “low-level” and “high-level” inference. Speech enhancement, compression, and coding are all widely used technologies, and this low-level work is the most mature. High-level tasks will drive future advances: speech/music database information retrieval, and automatic speaker and speech recognition. But low-level issues also remain.

Fundamental Questions
How can we obtain highly structured representations of speech and audio signals? Time-frequency “atoms” can serve as building blocks.
How can statistical inference enable advances in speech signal processing? It offers a means to obtain an “atomic decomposition”: statistical modeling of time-frequency coefficients provides a principled solution.
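One familiar family of time-frequency atoms is the set of windowed complex exponentials behind the short-time Fourier transform (STFT). The sketch below is not taken from the lecture; it assumes a periodic Hann window, an 8 kHz sampling rate, and a naive DFT, and it shows that each STFT coefficient is the inner product of one signal frame with one such atom.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(x: list[float], frame_len: int, hop: int) -> list[list[complex]]:
    """Naive short-time Fourier transform (non-negative frequency bins only).

    X[m][k] is the inner product of x with a Hann-windowed complex
    exponential (a time-frequency "atom") localized to frame m and bin k.
    """
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(frame_len)]
        # Direct O(N^2) DFT; adequate for a small illustration.
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


if __name__ == "__main__":
    fs = 8000                 # assumed sampling rate (Hz)
    frame_len, hop = 64, 32   # bin spacing fs / frame_len = 125 Hz
    # A 1000 Hz tone falls exactly on bin 1000 / 125 = 8.
    x = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(512)]
    X = stft(x, frame_len, hop)
    peak_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in X]
    print(len(X), sorted(set(peak_bins)))  # expected: 15 [8]
```

Every frame concentrates the tone's energy in the same atom (bin 8), which is the kind of sparse, structured coefficient pattern that a statistical model of time-frequency coefficients can exploit.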

Representative Applications
Missing data in the context of VoIP: original, missing, and restored examples.
Source/speaker separation: Source 1 and Source 2; Mixture 1 and Mixture 2; Recovery 1 and Recovery 2.

Digital Speech/Audio Processing

Speech Production

Time-Scale Modification

Speech and quasi-periodic audio: sinewave-based modification with a voicing-dependent rate factor. Examples: male and female speakers (original, fast, faster, slower); trumpet (original, fast, slow).

More Time-Scale Modification
Complex non-speech signals: phase-vocoder-based modification with event-dependent phase coherence. Examples: falling can, bongo drums, loon (original and slow).
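A generic phase vocoder can be sketched as follows. This is a textbook-style phase vocoder, not the lecture's event-dependent phase-coherence variant. It assumes a Hann window, a 256-point FFT, a 64-sample hop, and linear magnitude interpolation between analysis frames. Phases are accumulated from per-bin instantaneous-frequency estimates so that duration changes by a factor of 1/rate while pitch is preserved. Pure Python keeps it dependency-free, so it is slow and meant only for illustration.

```python
import cmath
import math


def fft(a: list[complex]) -> list[complex]:
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


def ifft(a: list[complex]) -> list[complex]:
    """Inverse FFT via the conjugation identity."""
    n = len(a)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in a])]


def wrap(p: float) -> float:
    """Map a phase to the principal interval [-pi, pi)."""
    return (p + math.pi) % (2 * math.pi) - math.pi


def phase_vocoder(x: list[float], rate: float,
                  n_fft: int = 256, hop: int = 64) -> list[float]:
    """Time-scale x by 1/rate (rate < 1 slows down) at constant pitch."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    frames = [fft([complex(x[s + i] * win[i]) for i in range(n_fft)])
              for s in range(0, len(x) - n_fft + 1, hop)]
    # Expected phase advance per hop for a sinusoid centred on bin k.
    advance = [2 * math.pi * k * hop / n_fft for k in range(n_fft)]
    phase = [cmath.phase(c) for c in frames[0]]
    out_frames, t = [], 0.0
    while t < len(frames) - 1:
        i = int(t)
        alpha = t - i
        a, b = frames[i], frames[i + 1]
        # Interpolated magnitude combined with the accumulated phase.
        out_frames.append([((1 - alpha) * abs(a[k]) + alpha * abs(b[k]))
                           * cmath.exp(1j * phase[k]) for k in range(n_fft)])
        for k in range(n_fft):
            # Instantaneous frequency: expected advance plus the wrapped
            # deviation measured between adjacent analysis frames.
            dev = wrap(cmath.phase(b[k]) - cmath.phase(a[k]) - advance[k])
            phase[k] += advance[k] + dev
        t += rate
    # Weighted overlap-add with the same window, normalized by sum of w^2.
    y = [0.0] * ((len(out_frames) - 1) * hop + n_fft)
    norm = [0.0] * len(y)
    for m, spec in enumerate(out_frames):
        seg = ifft(spec)
        for i in range(n_fft):
            y[m * hop + i] += seg[i].real * win[i]
            norm[m * hop + i] += win[i] ** 2
    return [v / w2 if w2 > 1e-8 else 0.0 for v, w2 in zip(y, norm)]
```

For example, `phase_vocoder(x, 0.5)` roughly doubles the duration of `x`. Because each bin's phase is propagated independently, a plain phase vocoder like this one tends to smear sharp onsets such as drum hits; preserving phase coherence around such events is presumably what the slide's event-dependent approach addresses.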

Pitch and Vocal Tract Change
Sinewave-based modification. Examples: male and female speakers (original; low pitch/long vocal tract; high pitch/short vocal tract); male speaker (original and monotone).
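In a sinewave (harmonic) model, pitch and vocal-tract changes can be separated: pitch change rescales the harmonic frequencies while their amplitudes are re-sampled from an unchanged spectral envelope, and a vocal-tract-length change warps the envelope's frequency axis. The sketch below is a simplified illustration, not the lecture's system. It assumes constant parameters, zero phases, and a hypothetical toy envelope with formants at 500, 1500, and 2500 Hz.

```python
import math

# Hypothetical formant (centre Hz, width Hz) pairs for a neutral vowel;
# chosen for illustration only, not measured from the lecture's examples.
FORMANTS = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]


def envelope(f: float) -> float:
    """Toy spectral envelope: a sum of Gaussian bumps at the formants."""
    return sum(math.exp(-0.5 * ((f - fc) / bw) ** 2) for fc, bw in FORMANTS)


def harmonic_amplitudes(f0: float, fs: float, pitch: float = 1.0,
                        tract: float = 1.0) -> list[tuple[float, float]]:
    """(frequency, amplitude) for each harmonic below the Nyquist frequency.

    pitch scales the fundamental; tract scales the envelope's frequency axis
    (tract > 1 moves formants up, mimicking a shorter vocal tract).
    """
    new_f0 = f0 * pitch
    pairs = []
    k = 1
    while k * new_f0 < fs / 2:
        f = k * new_f0
        pairs.append((f, envelope(f / tract)))
        k += 1
    return pairs


def synthesize(pairs: list[tuple[float, float]], fs: float, n: int) -> list[float]:
    """Sum of constant-amplitude, zero-phase sinewaves; n samples long."""
    return [sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in pairs)
            for t in range(n)]


if __name__ == "__main__":
    fs = 8000.0
    low_long = harmonic_amplitudes(120.0, fs, pitch=0.8, tract=0.9)
    high_short = harmonic_amplitudes(120.0, fs, pitch=1.25, tract=1.1)
    y = synthesize(high_short, fs, 800)   # 0.1 s of modified "voice"
    print(len(low_long), len(high_short), len(y))
```

A monotone version simply holds `f0` fixed over time; a full system would also vary the parameters frame by frame and model phase.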

Speech Coding
Sinewave-based coding compared with Code-Excited Linear Prediction (CELP). Examples: female speaker and male speaker, each as original, CELP at 8000 bps, sinewave coder at 4800 bps, and sinewave coder at 2400 bps.
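To put these bit rates in perspective, the arithmetic below converts each rate into a per-frame bit budget and a compression ratio. Both reference points are assumptions, not taken from the slide: a 20 ms coding frame, and 64 kbit/s log-PCM telephony (8 kHz sampling at 8 bits per sample, as in G.711) as the uncompressed baseline.

```python
# Assumptions (not from the slide): 20 ms coding frames and a 64 kbit/s
# baseline (8 kHz sampling x 8-bit log-PCM, as in G.711 telephony).
FRAME_MS = 20
PCM_BPS = 8000 * 8


def bits_per_frame(bps: int, frame_ms: int = FRAME_MS) -> float:
    """Bits available to describe one frame at the given bit rate."""
    return bps * frame_ms / 1000


if __name__ == "__main__":
    for name, bps in [("CELP", 8000), ("Sine", 4800), ("Sine", 2400)]:
        print(f"{name} {bps} bps: {bits_per_frame(bps):.0f} bits/frame, "
              f"{PCM_BPS / bps:.1f}x smaller than 64 kbit/s PCM")
    # CELP 8000 bps: 160 bits/frame, 8.0x smaller than 64 kbit/s PCM
    # Sine 4800 bps: 96 bits/frame, 13.3x smaller than 64 kbit/s PCM
    # Sine 2400 bps: 48 bits/frame, 26.7x smaller than 64 kbit/s PCM
```

At 2400 bps only 48 bits per 20 ms frame remain for pitch, voicing, and spectral-envelope parameters, which is why a compact parametric (sinewave) description becomes attractive at the lowest rates.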

Noise Reduction
Adaptive Wiener filter, with adaptation based on spectral change. Examples: cell phone noise, cocktail party, automobile noise (original and enhanced).
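The core of a Wiener-filter enhancer is a per-frequency gain applied to the short-time spectrum of the noisy signal. The sketch below assumes a noise power spectrum is already available and estimates the per-bin SNR by simple power subtraction. The gain floor (0.05) and smoothing constant (0.95) are arbitrary illustrative values, and the recursive noise update is a generic scheme, not the spectral-change adaptation described on the slide.

```python
def wiener_gain(noisy_power: list[float], noise_power: list[float],
                floor: float = 0.05) -> list[float]:
    """Per-bin Wiener gain G = xi / (1 + xi), limited below by `floor`.

    xi (the SNR) is approximated by max(|Y|^2 / N - 1, 0); the floor keeps
    some residual noise, which limits "musical noise" artefacts.
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if n <= 0.0:
            gains.append(1.0)  # no noise estimate: pass the bin unchanged
            continue
        xi = max(y / n - 1.0, 0.0)  # power-subtraction SNR estimate
        gains.append(max(xi / (1.0 + xi), floor))
    return gains


def update_noise(noise_power: list[float], frame_power: list[float],
                 alpha: float = 0.95) -> list[float]:
    """Recursive average of the noise spectrum.

    Assumed to be called only on frames judged noise-only (e.g. by a
    voice activity detector, which is not shown here).
    """
    return [alpha * n + (1 - alpha) * p for n, p in zip(noise_power, frame_power)]
```

In use, each noisy STFT coefficient is multiplied by its bin's gain and the enhanced signal is resynthesized by overlap-add; bins near the noise level are attenuated to the floor, while bins well above it pass almost unchanged.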

Compression
Reduction of the peak-to-RMS amplitude ratio, based on sinewave analysis/synthesis. Examples: low-noise case and high-noise case, each as original, 1.5 dB reduction, and 3.0 dB reduction.
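The quantity being reduced here, the peak-to-RMS ratio (crest factor), can be measured directly. This minimal sketch assumes a real-valued sampled signal held in a Python list; it only measures the ratio in decibels and does not implement the sinewave-based reduction itself.

```python
import math


def peak_to_rms_db(x: list[float]) -> float:
    """Peak-to-RMS amplitude ratio (crest factor) of x, in decibels."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(peak / rms)


if __name__ == "__main__":
    # 100 full periods of a sinusoid: peak / RMS = sqrt(2), about 3.01 dB.
    sine = [math.sin(2 * math.pi * n / 64) for n in range(6400)]
    print(round(peak_to_rms_db(sine), 2))   # 3.01
    # Under a fixed peak limit, a 3.0 dB lower crest factor lets the RMS
    # level rise by 3.0 dB, i.e. an amplitude factor of about 1.41.
    print(round(10 ** (3.0 / 20), 2))       # 1.41
```

The second line shows why such a reduction matters: in a peak-limited channel, every decibel removed from the crest factor can be returned as loudness.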