Speech & Audio Processing

Speech & Audio Coding Examples
4/16/2017

Linear Prediction Analysis
A Simple Speech Coder: LPC-Based Analysis Structure
[Block diagram labels: Audio Input, Pre-emphasis, Windowing Analysis, Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin), Analysis Filter, Residual, Quantization, Filter Coeffs]
16 April 2017 Veton Këpuska
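The pre-emphasis block at the front of the analysis structure is commonly realized as a first-order high-pass filter. The following is only a sketch under stated assumptions: the coefficient alpha = 0.95, the sampling rate, and the synthetic test signal are illustration values that do not come from the slides.

```matlab
% Pre-emphasis sketch: y[n] = x[n] - alpha*x[n-1].
% alpha = 0.95 is an assumed, commonly used value; it is not taken from the slides.
fs    = 8000;                      % assumed sampling rate (Hz)
n     = 0:fs/10-1;                 % 100 ms of samples
x     = sin(2*pi*200*n/fs) + 0.1*sin(2*pi*3000*n/fs);   % synthetic test input
alpha = 0.95;
y     = filter([1 -alpha], 1, x);  % first-order FIR high-pass

% Magnitude response of the filter: |1 - alpha*exp(-j*w)|
gain = @(f) abs(1 - alpha*exp(-1j*2*pi*f/fs));
fprintf('gain at 200 Hz: %.3f, gain at 3000 Hz: %.3f\n', gain(200), gain(3000));
```

The filter attenuates the low-frequency component and boosts the high-frequency one, which flattens the spectral tilt of speech before the LPC analysis.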

Windowing Analysis Stage
N: length of the analysis window (10-30 msec)
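As a rough illustration of how the window length N relates to the 10-30 msec range, the sketch below cuts a signal into overlapping windowed frames. The sampling rate, the 20 msec length, the 50% hop, and the Hamming shape are all assumptions chosen for the example.

```matlab
% Framing sketch: cut a signal into overlapping analysis windows.
% fs, the 20 ms window length, the 50% hop and the Hamming shape are assumptions.
fs  = 8000;                          % assumed sampling rate (Hz)
N   = round(0.020 * fs);             % window length in samples for 20 ms
hop = N / 2;                         % 50% overlap between frames
x   = sin(2*pi*150*(0:fs-1)/fs);     % 1 s synthetic test signal

w = 0.54 - 0.46*cos(2*pi*(0:N-1)/(N-1));   % Hamming window from its definition

numFrames = floor((numel(x) - N) / hop) + 1;
frames = zeros(numFrames, N);
for m = 1:numFrames
    start = (m-1)*hop;                     % zero-based offset of frame m
    frames(m, :) = w .* x(start + (1:N));  % windowed frame
end
```

Each row of frames is one analysis window that would be passed on to the LPC analysis stage.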

Some Analysis Windows

MATLAB Useful Functions
wintool: use "doc wintool" for more information.
window: use "doc window" for the list of supported windows.
Define your own window if needed, e.g., the sine window and the Vorbis window.
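A user-defined window can be written directly from its formula. The sketch below defines the sine window and the Vorbis window for a block of N samples and checks the power-complementarity (Princen-Bradley) property that both are known to satisfy. N = 64 is an arbitrary illustration value.

```matlab
% User-defined windows for a block of N samples.
% N = 64 is an arbitrary illustration value.
N = 64;
n = 0:N-1;

sineWin   = sin(pi * (n + 0.5) / N);
vorbisWin = sin(pi/2 * sin(pi * (n + 0.5) / N).^2);

% Princen-Bradley condition: w[n]^2 + w[n+N/2]^2 = 1.
% It is what allows perfect reconstruction with 50% overlap in MDCT-based coders.
pbSine   = sineWin(1:N/2).^2   + sineWin(N/2+1:N).^2;
pbVorbis = vorbisWin(1:N/2).^2 + vorbisWin(N/2+1:N).^2;
```

Both pbSine and pbVorbis should equal 1 for every n up to rounding error.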

LPC Analysis Stage
LPC method described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.ppt
Summary:
  Perform autocorrelation.
  Solve the system of equations with the Durbin-Levinson method.
MATLAB help: doc lpc, etc.
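The two steps of the summary can be sketched with base functions only. This is an illustration under assumptions, not the implementation behind lpc: the test frame, the order p, and the sign convention A(z) = 1 + a(2) z^-1 + ... are choices made for the example.

```matlab
% Autocorrelation method + Levinson-Durbin recursion, written with base functions.
% The frame x and the order p are hypothetical illustration values.
p = 2;                                        % LPC order
x = filter(1, [1 -0.9], [1 zeros(1, 199)]);   % impulse response of 1/(1 - 0.9 z^-1)

% Step 1: autocorrelation, r(1) = R[0], ..., r(p+1) = R[p]
r = zeros(1, p+1);
for lag = 0:p
    r(lag+1) = sum(x(1:end-lag) .* x(1+lag:end));
end

% Step 2: Levinson-Durbin recursion for A(z) = 1 + a(2) z^-1 + ... + a(p+1) z^-p
a = 1;                 % order-0 predictor
E = r(1);              % prediction error energy
k = zeros(1, p);       % reflection (PARCOR) coefficients
for i = 1:p
    acc  = a * r(i+1:-1:2).';                 % sum over j of a_j * R[i-j]
    k(i) = -acc / E;
    a    = [a 0] + k(i) * fliplr([a 0]);      % order update
    E    = E * (1 - k(i)^2);
end
```

For this test frame the recursion recovers the generating filter, so a is approximately [1 -0.9 0] and the second reflection coefficient is approximately zero.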

Example of MATLAB Code
[Diagram labels: ge[n], ŝ[n]]

function myLPCCodec(wavfile, N)
%
% wavfile - input MS wav file
% N       - LPC filter order

% audioread replaces wavread, which is no longer available in current MATLAB
[x, fs] = audioread(wavfile);
x = x(:, 1);                          % use the first channel only
% plot(x);

% Playing original signal
soundsc(x, fs);

% Performing LPC analysis using the MATLAB lpc function
% (g is the prediction error variance; it is not needed for lossless synthesis)
[a, g] = lpc(x, N);

% Filtering with the estimated coefficients to produce the predicted samples
est_x = filter([0 -a(2:end)], 1, x);

% Error (residual) signal
e = x - est_x;

% Testing the quality of the predicted samples
soundsc(est_x, fs);

% Synthesis stage with zero loss of information:
% e = A(z) x, so the all-pole filter 1/A(z) recovers x from e
syn_x = filter(1, a, e);
soundsc(syn_x, fs);

Analysis of Quantization Errors
Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
  Double (float64) representation (software emulation)
  Float (float32) representation (software emulation)
  Int (int32) representation (hardware emulation)
  Short (int16) representation (hardware emulation)
Useful MATLAB functions: fix, floor, round, ceil
Example: sig_hat = fix(sig*2^(B-1))/2^(B-1); truncates sig (scaled to the range [-1, 1)) to B bits.
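The truncation example can be extended into a small experiment that compares truncation with rounding. The word length B and the test signal below are hypothetical values picked for illustration.

```matlab
% Truncation vs. rounding to B bits for a signal scaled to [-1, 1).
% B and the test signal are hypothetical illustration values.
B   = 8;
sig = 0.9 * sin(2*pi*(0:999)/50);

sig_fix   = fix(sig   * 2^(B-1)) / 2^(B-1);   % truncation toward zero
sig_round = round(sig * 2^(B-1)) / 2^(B-1);   % rounding to the nearest level

err_fix   = sig - sig_fix;
err_round = sig - sig_round;

snr_db = @(s, e) 10*log10(sum(s.^2) / sum(e.^2));
fprintf('SNR with fix:   %.1f dB\n', snr_db(sig, err_fix));
fprintf('SNR with round: %.1f dB\n', snr_db(sig, err_round));
```

Truncation errors stay below one quantization step 2^-(B-1), while rounding errors stay within half a step, so rounding never gives the lower SNR of the two.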

Quantization of Error Signal & Filter Coefficients
ADPCM can be applied to the error signal.
Filter coefficients in the direct filter form are found to be sensitive to quantization errors: a small quantization error can have a large effect on the filter characteristics.
The issue is that the polynomial coefficients have a non-linear mapping to the poles of the filter (i.e., the roots of the polynomial).
Alternate representations are possible that have significantly better tolerance to quantization error.
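The non-linear mapping from coefficients to poles can be made concrete with a small, deliberately extreme example. The polynomial (a double pole at z = 0.9) and the size of the coefficient error are hypothetical values chosen so that the result can be checked by hand.

```matlab
% Sensitivity of pole locations to direct-form coefficient errors.
% A(z) = (1 - 0.9 z^-1)^2 and the error size are hypothetical illustration values.
a     = [1 -1.8 0.81];        % double pole at z = 0.9
delta = 1e-4;                 % small error in one coefficient
a_q   = a - [0 0 delta];      % perturbed coefficient set

poles_q   = roots(a_q);                 % poles after the perturbation
poleShift = max(abs(poles_q - 0.9));    % distance from the nominal pole at 0.9

fprintf('coefficient error: %.1e, pole shift: %.1e\n', delta, poleShift);
```

Solving z^2 - 1.8 z + 0.8099 = 0 by hand gives z = 0.9 ± 0.01, so a coefficient error of 1e-4 moves the poles by about 1e-2, roughly 100 times the error.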

LPC Filter Representations
As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: the PARCOR coefficients.
LPC to PARCOR: [equation]
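One common way to convert direct-form LPC coefficients to PARCOR coefficients is the step-down (backward Levinson) recursion, sketched below. It is not necessarily the form given on the slide: the sign convention (A(z) = 1 + a(2) z^-1 + ..., with k(i) equal to the last coefficient of the order-i polynomial) and the example polynomial are assumptions. The Signal Processing Toolbox provides poly2rc for this conversion; the loop here uses base functions only.

```matlab
% Step-down recursion: direct-form LPC coefficients -> PARCOR coefficients.
% Assumed convention: A(z) = 1 + a(2) z^-1 + ..., and k(i) is the last
% coefficient of the order-i polynomial. The polynomial is a hypothetical example.
a   = [1 -1.2 0.5];
p   = numel(a) - 1;
k   = zeros(1, p);
cur = a;                                   % order-i polynomial, starting at i = p
for i = p:-1:1
    k(i) = cur(end);                       % reflection coefficient of order i
    cur  = (cur - k(i)*fliplr(cur)) / (1 - k(i)^2);
    cur  = cur(1:end-1);                   % order i-1 polynomial
end
isStable = all(abs(k) < 1);                % all |k(i)| < 1 means 1/A(z) is stable
```

For this example the result is k = [-0.8 0.5]. Because stability reduces to the simple bound |k(i)| < 1, PARCOR coefficients are easier to quantize safely than direct-form coefficients.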

PARCOR Filter Representation
PARCOR to LPC: [equation]
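The reverse conversion can be sketched with the step-up recursion, which is the order update of the Levinson-Durbin algorithm run on given reflection coefficients. The sign convention (A(z) = 1 + a(2) z^-1 + ...) and the coefficient values are assumptions for illustration. The Signal Processing Toolbox provides rc2poly for the same purpose.

```matlab
% Step-up recursion: PARCOR coefficients -> direct-form LPC coefficients.
% The reflection coefficients below are hypothetical illustration values.
k = [-0.8 0.5];
a = 1;                                      % order-0 polynomial
for i = 1:numel(k)
    a = [a 0] + k(i) * fliplr([a 0]);       % raise the order by one
end
```

With these values the loop yields a = [1 -1.2 0.5].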

Line Spectral Frequency Representation
It turns out that PARCOR coefficients can be represented with LSF, which have significantly better properties.
Note the PARCOR lattice structure of the LPC synthesis filter:
[Lattice diagram: Input to Output through the forward signals A_p, A_{p-1}, ..., A_0 and the backward signals B_p, B_{p-1}, ..., B_0, with reflection coefficients k_p, k_{p-1}, ..., unit delays z^-1, and the boundary values k_{p+1} = ∓1 and k_0 = -1]

Line Spectral Frequency Representation
From the previous slide the following holds: [equation]
From this realization of the filter the LSP representation is derived: [equation]
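A standard way to obtain the line spectral frequencies is through the sum and difference polynomials P(z) and Q(z), which correspond to closing the lattice with k_{p+1} = +1 and k_{p+1} = -1 under the convention assumed here. The sketch below is an illustration with a hypothetical A(z), not a reconstruction of the slide's equations. The Signal Processing Toolbox provides poly2lsf for this conversion.

```matlab
% Line spectral frequencies from A(z), using the sum and difference polynomials
%   P(z) = A(z) + z^-(p+1) A(1/z),   Q(z) = A(z) - z^-(p+1) A(1/z).
% The polynomial a is a hypothetical stable example.
a = [1 -1.2 0.5];
P = [a 0] + [0 fliplr(a)];
Q = [a 0] - [0 fliplr(a)];

rootsPQ  = [roots(P); roots(Q)];
onCircle = max(abs(abs(rootsPQ) - 1));     % all roots lie on the unit circle

w   = angle(rootsPQ);
tol = 1e-6;
lsf = sort(w(w > tol & w < pi - tol)).';   % one angle per conjugate pair, without z = +1 and z = -1
```

For this example P(z) factors as (1 + z^-1)(1 - 1.7 z^-1 + z^-2) and Q(z) as (1 - z^-1)(1 - 0.7 z^-1 + z^-2), so the LSFs are acos(0.85) and acos(0.35) radians, interleaved between P and Q.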

LSF Representation

LPC Synthesis Filter with LSF
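One simple way to use LSFs at the synthesis side is to convert them back to A(z) and then run the usual all-pole filter; the structure in the slide's figure may differ. The sketch assumes an even order p, LSFs in radians in ascending order, and hypothetical values. The Signal Processing Toolbox provides lsf2poly for this conversion.

```matlab
% LSF -> A(z) sketch for even order p (assumption), LSFs in radians, ascending.
% The LSF values are hypothetical illustration values.
lsf = [acos(0.85) acos(0.35)];
P = [1  1];                                  % P(z) has a root at z = -1
Q = [1 -1];                                  % Q(z) has a root at z = +1
for i = 1:2:numel(lsf)
    P = conv(P, [1 -2*cos(lsf(i))   1]);     % odd-indexed LSFs belong to P(z)
    Q = conv(Q, [1 -2*cos(lsf(i+1)) 1]);     % even-indexed LSFs belong to Q(z)
end
a = (P + Q) / 2;
a = a(1:end-1);                              % drop the trailing zero of order p+1

% Synthesis: drive the all-pole filter 1/A(z) with a residual (here an impulse)
s = filter(1, a, [1 zeros(1, 49)]);
```

With these values the reconstructed polynomial is a = [1 -1.2 0.5].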

A Simple Speech Coder: LPC-Based Synthesis Structure
[Block diagram labels: Residual Signal, Synthesis Filter, De-emphasis, Audio Output, Residual Decoding, Filter Coeffs]
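The synthesis structure mirrors the analysis structure: the synthesis filter 1/A(z) undoes the analysis filter, and de-emphasis undoes pre-emphasis. The sketch below runs both chains without any quantization to show that they are exact inverses. The pre-emphasis coefficient, the polynomial A(z), and the test signal are assumptions.

```matlab
% Analysis chain followed by synthesis chain, with no quantization in between.
% alpha, a and x are hypothetical illustration values.
alpha = 0.95;                               % assumed pre-emphasis coefficient
a     = [1 -1.2 0.5];                       % hypothetical LPC polynomial A(z)
x     = sin(2*pi*0.03*(0:399));             % synthetic input

% Encoder side: pre-emphasis, then analysis filter A(z)
pre = filter([1 -alpha], 1, x);
e   = filter(a, 1, pre);                    % residual

% Decoder side: synthesis filter 1/A(z), then de-emphasis 1/(1 - alpha z^-1)
syn = filter(1, a, e);
out = filter(1, [1 -alpha], syn);

maxErr = max(abs(out - x));
```

In a real coder the residual and the coefficients are quantized between the two chains, so out only approximates x.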

Audio Coding

Audio Coding
Most audio coding standards use principles of psychoacoustics.
Example of the basic structure of an MP3 encoder:
[Block diagram labels: Audio Input, Filterbank & Transform, Quantization, Psychoacoustic Model, Bit-stream]

Basic Structure of Audio Coders
  Filterbank Processing
  Psychoacoustic Model
  Quantization

Filter Bank Analysis Synthesis

Filterbank Processing
Splitting the full-band signal into several sub-bands:
  Uniform sub-bands (FFT)
  Critical bands (FFT followed by a non-linear transformation), which reflect the human auditory apparatus
  Mel-scale and Bark-scale transformations

Mel-Scale
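Several formulas for the mel scale are in use. The sketch below uses one widely quoted form, m = 2595 log10(1 + f/700), and its inverse; it is an assumption that this matches the curve on the slide. The function names hz2mel and mel2hz are hypothetical helpers defined here.

```matlab
% Mel-scale sketch using the common 2595*log10(1 + f/700) form (an assumption).
hz2mel = @(f) 2595 * log10(1 + f/700);
mel2hz = @(m) 700 * (10.^(m/2595) - 1);

f = [100 500 1000 4000 8000];      % frequencies in Hz
m = hz2mel(f);                     % corresponding mel values
fprintf('%6d Hz -> %7.1f mel\n', [f; m]);
```

The mapping is close to linear below about 1 kHz and logarithmic above, with 1000 Hz mapping to approximately 1000 mel.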

Bark-Scale
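A frequently used closed-form approximation of the Bark scale is z = 13 atan(0.00076 f) + 3.5 atan((f/7500)^2), with f in Hz. The sketch below evaluates it at a few frequencies; whether the slide uses exactly this approximation is an assumption, and hz2bark is a hypothetical helper name.

```matlab
% Bark-scale sketch using the 13*atan(0.00076 f) + 3.5*atan((f/7500)^2) form
% (an assumed, commonly used approximation).
hz2bark = @(f) 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);

f = [100 500 1000 4000 16000];     % frequencies in Hz
z = hz2bark(f);                    % critical-band rate in Bark
fprintf('%6d Hz -> %5.2f Bark\n', [f; z]);
```

The audible range spans roughly 24 to 25 critical bands on this scale.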

Analysis Structure of Filterbank
hk[n]: impulse response of the k-th quadrature mirror filter
N: number of channels, typically 32
↓: down-sampling
MDCT: Modified Discrete Cosine Transform
[Block diagram: the Audio Input feeds N parallel branches h1[n], ..., hk[n], ..., hN[n]; each branch is down-sampled (↓) and passed through an MDCT, followed by Quantization to produce the Bit Stream]

Synthesis Structure of Filterbank
gk[n]: impulse response of the k-th inverse quadrature mirror filter
N: number of channels, typically 32
↑: up-sampling
IMDCT: Inverse Modified Discrete Cosine Transform
[Block diagram: the Bit Stream is decoded (Decoding) into N parallel branches; each branch passes through an IMDCT, is up-sampled (↑), and is filtered by g1[n], ..., gk[n], ..., gN[n] to produce the Audio Output]
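The MDCT and IMDCT blocks can be illustrated in isolation, without the quadrature mirror filters. The sketch below applies a windowed MDCT to overlapping blocks and reconstructs the signal by windowed IMDCT and overlap-add, which cancels the time-domain aliasing. The block size M, the sine window, and the test signal are assumptions, and this is not the exact transform configuration of any particular standard.

```matlab
% MDCT / IMDCT sketch with 50% overlap-add (time-domain aliasing cancellation).
% M, the sine window and the test signal are assumptions for illustration.
M = 8;                                   % coefficients per block (block length 2*M)
n = 0:2*M-1;                             % sample index inside a block
k = (0:M-1).';                           % coefficient index
C = cos(pi/M * (k + 0.5) * (n + 0.5 + M/2));   % M-by-2M MDCT basis
w = sin(pi * (n + 0.5) / (2*M));         % sine window (Princen-Bradley)

x = sin(0.3*(0:4*M-1)) + 0.5*cos(1.1*(0:4*M-1));   % synthetic input, 4*M samples
y = zeros(size(x));
for b = 0:2                              % three blocks, hop size M
    idx = b*M + (1:2*M);
    X   = C * (w .* x(idx)).';           % analysis: M MDCT coefficients
    blk = (2/M) * (C.' * X).';           % synthesis: IMDCT (still contains aliasing)
    y(idx) = y(idx) + w .* blk;          % window again and overlap-add
end

% Samples covered by two blocks are reconstructed; the first and last M are not.
maxErr = max(abs(y(M+1:3*M) - x(M+1:3*M)));
```

Each block of 2*M input samples produces only M coefficients, so the transform is critically sampled even though the blocks overlap by 50%.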

Psycho-Acoustic Modeling

Psychoacoustic Model
Computes the masking threshold according to human auditory perception.
The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
The analysis is done in the frequency domain, represented by the DFT and computed by the FFT.

Threshold of Hearing
Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
Any signal below the threshold can be removed without effect on the perception.
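A widely used closed-form approximation of the threshold in quiet (in dB SPL, with f in Hz) is Tq(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 10^-3 (f/1000)^4. The sketch below uses it to decide which spectral components could be discarded. The component levels are hypothetical, and mapping digital levels to dB SPL requires an assumed playback calibration that is not part of this example.

```matlab
% Threshold-in-quiet sketch (a common approximation; f in Hz, result in dB SPL).
Tq = @(f) 3.64*(f/1000).^(-0.8) - 6.5*exp(-0.6*(f/1000 - 3.3).^2) + 1e-3*(f/1000).^4;

f     = [100 1000 3300 10000];     % component frequencies in Hz
thr   = Tq(f);                     % threshold at each frequency
level = [10 20 -8 30];             % hypothetical component levels in dB SPL

audible = level > thr;             % components below the threshold can be dropped
fprintf('%6d Hz: threshold %6.2f dB, level %4d dB, audible = %d\n', ...
        [f; thr; level; audible]);
```

The threshold is lowest near 3 to 4 kHz, where hearing is most sensitive, and rises steeply toward both ends of the audible range.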

Threshold of Hearing

Frequency Masking
Schröder Spreading Function
Bark Scale Function: [equation]
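The spreading function attributed to Schroeder is usually written in dB as SF(dz) = 15.81 + 7.5 (dz + 0.474) - 17.5 sqrt(1 + (dz + 0.474)^2), where dz is the distance from the masker in Bark. The sketch below combines it with a common Bark approximation to compare how strongly a 1 kHz masker spreads to 900 Hz and to 5 kHz. Both formulas are assumed to match the ones on the slide, and the helper names are hypothetical.

```matlab
% Spreading-function sketch (assumed standard forms; helper names are hypothetical).
hz2bark = @(f) 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);
sf_db   = @(dz) 15.81 + 7.5*(dz + 0.474) - 17.5*sqrt(1 + (dz + 0.474).^2);

masker = 1000;                             % masker frequency in Hz
probes = [900 5000];                       % probe frequencies in Hz
dz     = hz2bark(probes) - hz2bark(masker);   % distance from the masker in Bark
spread = sf_db(dz);                        % masker excitation relative to its peak, in dB

fprintf('%5d Hz: dz = %6.2f Bark, spread = %7.2f dB\n', [probes; dz; spread]);
```

The excitation of the 1 kHz masker is only a few dB down at 900 Hz but far weaker at 5 kHz, which is why a quiet 900 Hz tone is a much better candidate for masking than a quiet 5 kHz tone.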

Masking Curve

Primary Tone 1kHz

Masked Tone 900 Hz

Combined Sound 1kHz + 0.9kHz

Combined 1kHz + 0.9kHz (-10dB)

Combined 1kHz + 5kHz (-10dB)
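Test signals like the ones named on these slides can be generated with a few lines. The sampling rate, the duration, and the exact construction are assumptions; the original audio examples may have been produced differently.

```matlab
% Masking demo signals (sketch; fs and dur are assumed values).
fs  = 16000;                       % sampling rate in Hz, above twice 5 kHz
dur = 1;                           % duration in seconds
t   = (0:dur*fs-1) / fs;

g = 10^(-10/20);                   % linear gain for -10 dB

primary  = sin(2*pi*1000*t);                   % primary tone, 1 kHz
masked   = sin(2*pi*900*t);                    % second tone, 900 Hz
combEq   = primary + masked;                   % 1 kHz + 0.9 kHz at equal level
comb900  = primary + g*masked;                 % 1 kHz + 0.9 kHz at -10 dB
comb5000 = primary + g*sin(2*pi*5000*t);       % 1 kHz + 5 kHz at -10 dB

% To listen, e.g.: soundsc(comb900, fs);
```

Listening to comb900 and comb5000 in turn lets one compare how audible the quieter tone is when it sits close to, or far from, the 1 kHz primary tone.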

END