EE Dept., IIT Bombay CEP-cum-TEQUIP-KITE Course “Digital Signal Processing”, IIT Bombay, 2–6 November 2015, Course Coordinator:


EE Dept., IIT Bombay
CEP-cum-TEQUIP-KITE Course “Digital Signal Processing”, IIT Bombay, 2–6 November 2015
Course Coordinator: Prof. V. M. Gadre

Audio Signals and Dynamic Range Compression
P. C. Pandey, EE Dept., IIT Bombay (ee.iitb.ac.in)
Day, Date, & Time: Tuesday 3/11/2015, 1115 – 1245
Venue: Maths Dept, A1A2 Classroom, IIT Bombay

Overview
1. Introduction
2. Sliding-band Dynamic Range Compression
3. Offline & Real-time Implementations
4. Test Results
5. Summary & Conclusion

References
U. Zölzer, “Dynamic range control,” in Digital Audio Signal Processing, 2nd ed., Chichester, West Sussex, U.K.: Wiley, 2008, pp
N. Tiwari and P. C. Pandey, “A sliding-band dynamic range compression for use in hearing aids,” in Proc. National Conference on Communications 2014, Kanpur, paper no

1. Introduction

Dynamic Range
Ratio of maximum to minimum signal amplitude, expressed in dB.
Dynamic range of audio signals: 40−120 dB.

Dynamic Range Control
Measuring the input level and adaptively adjusting the signal level in accordance with the requirements of
◦ Signal acquisition
◦ Processing; Storage; Transmission
◦ Reproduction
◦ Listening environment; Listener characteristics
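The dB definition of dynamic range given above can be checked numerically. The following is a minimal sketch; the function name and the 16-bit example are illustrative assumptions, not part of the original slides.

```python
import math

def dynamic_range_db(max_amplitude, min_amplitude):
    """Dynamic range in dB for a ratio of amplitudes (20*log10 of the ratio)."""
    if max_amplitude <= 0 or min_amplitude <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(max_amplitude / min_amplitude)

# Illustrative case: a 16-bit quantizer spans amplitudes from 1 LSB to 32768 LSB.
print(round(dynamic_range_db(32768, 1), 1))  # 90.3
```

An amplitude ratio of 1000:1 corresponds to 60 dB, which lies inside the 40−120 dB range quoted for audio signals.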

Some Applications
◦ A/D converter & recording system: Optimal use of the amplitude range without causing saturation or overload.
◦ Sound reproduction: Lowest level above the ambient noise and highest level within the linear range of the speaker.
◦ Hearing aids for persons with moderate-to-severe sensorineural hearing loss: To present sounds comfortably within the severely limited dynamic range of the listener by amplifying the low-level sounds without making the high-level sounds uncomfortably loud.

Sensorineural Hearing Loss

Causes
Loss of the sensory mechanism in the inner ear or abnormalities in the auditory nerve.

Problems
◦ Reduced dynamic range of hearing: Frequency-dependent elevation of hearing threshold levels (HTL) without a corresponding increase in uncomfortable listening level (UCL), leaving a narrow gap between HTL and UCL (as low as 10 dB).
◦ Loudness recruitment: Abnormally rapid growth in loudness with sound level. Different growth functions for different frequencies. No easy established tests.

Processing Steps in Dynamic Range Control
◦ Level estimation (input or output)
◦ Gain calculation
◦ Gain application

Classification of Dynamic Range Controllers
◦ On the basis of signal level calculation: single-band or multiband
◦ On the basis of gain control method: feedback or feed-forward

Single-Band Dynamic Range Compression

Processing
Gain dependent on the dynamically varying signal level.
Parameters:
◦ Compression threshold (T_H)
◦ Compression ratio (CR)
◦ Attack & release times in level calculation

Problems
◦ Compensation for frequency-dependent loudness growth not feasible.
◦ Power mostly contributed by low-frequency components → level of high-frequency components controlled by low-frequency components → inaudibility of high-frequency components, distortions in temporal envelope.
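The three steps (level estimation, gain calculation, gain application) and the parameters listed above can be illustrated with a generic feed-forward single-band compressor. This is a sketch under assumptions: the one-pole level detector, the default parameter values, and the function name are hypothetical and are not taken from the slides.

```python
import math

def compress_single_band(x, fs, threshold_db=-20.0, cr=2.0,
                         attack_ms=5.0, release_ms=100.0):
    """Feed-forward single-band compressor (illustrative sketch only)."""
    # One-pole smoothing coefficients for the attack and release phases.
    a_att = math.exp(-1.0 / (fs * attack_ms * 1e-3))
    a_rel = math.exp(-1.0 / (fs * release_ms * 1e-3))
    level = 0.0
    y = []
    for s in x:
        mag = abs(s)
        # Level estimation: fast rise (attack), slow decay (release).
        a = a_att if mag > level else a_rel
        level = a * level + (1.0 - a) * mag
        level_db = 20.0 * math.log10(max(level, 1e-10))
        # Gain calculation: above the threshold, the output level rises
        # by 1/CR dB for every 1 dB rise in the input level.
        over_db = max(level_db - threshold_db, 0.0)
        gain_db = -over_db * (1.0 - 1.0 / cr)
        # Gain application: one gain for the whole band.
        y.append(s * 10.0 ** (gain_db / 20.0))
    return y
```

Because a single gain is applied to all frequencies, a strong low-frequency component lowers the gain for weak high-frequency components as well, which is the problem noted above.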

Multiband Dynamic Range Compression

General Scheme of Processing
Spectral components of the input signal are divided into multiple bands, and the gain for each band is calculated on the basis of the signal power in that band.
Parameters (band-specific): compression threshold T_H, compression ratio CR, attack & release times for detection.

Some Earlier Investigations
◦ Lippmann et al. (1980): 16-channel compression. 9% improvement in recognition score over linear amplification.
◦ Asano et al. (1991): Multiband dynamic range compression realized as a single time-varying FIR filter & implemented on a 32-bit DSP fixed-point processor. Less spectral distortion due to the smoothed frequency response of the FIR filter.
◦ Stone et al. (1999): Comparison of single-channel and four-channel compression schemes & effect of varying CR, T_H, and attack & release times. Intelligibility & quality tests showed no specific preference for either scheme.
◦ Li et al. (2000): Wavelet-based compression (7-octave sub-band analysis using a wavelet filter bank & resynthesis after applying a logarithmic compression on the wavelet coefficients). Increase in intelligibility without introducing noticeable distortions.
◦ Magotra et al. (2000): Multiband dynamic range compression using a 16-bit fixed-point processor. Taylor series approximation used for the compression function to reduce computations in gain calculation.

Disadvantages of Multiband Compression
◦ Spurious spectral distortions
◦ Reduction in spectral contrasts and modulation depth
◦ Distortion in spectral shape of spectral peaks (speech formants) lying across the band boundaries
◦ Distortion of transitions of spectral peaks across the adjacent bands
◦ Time-varying magnitude response without corresponding variation in the phase response, leading to quality degradation
→ Audible distortions, perceptible discontinuities, adverse effect on the perception of certain speech cues.

[Figure: Example of distortion due to multiband dynamic range compression during a spectral transition. Input: swept sinusoid with constant amplitude, 125–250 Hz linearly swept frequency, 200 ms sweep duration. Processed output: multiband compression with 18 auditory critical bands, CR = 30, T_a = 6.4 ms, T_r = 192 ms. Horizontal axis: Time (s).]

Investigation for a Solution
Real-time dynamic range compression to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss, for use in hearing aids with a low-power processor.
◦ Low distortions
◦ Low computational complexity & memory requirement
◦ Low signal delay (algorithmic + computational)

Proposed Scheme: Sliding-Band Dynamic Range Compression
◦ Proposed for significantly reducing the temporal and spectral distortions associated with the single-band and multiband compression currently used in hearing aids.
◦ Realized with computational complexity acceptable for implementation on a 16-bit fixed-point DSP processor and signal delay acceptable for real-time application.

Investigations Using Offline & Real-Time Implementations
◦ Selection of processing parameters
◦ Evaluation of the implementations: informal listening, PESQ measure

2. Sliding-Band Dynamic Range Compression

Processing
Applying a frequency-dependent gain function, with the gain for each spectral sample determined by the short-time power in the auditory critical bandwidth centered at it, in accordance with the specified hearing thresholds, compression ratios, and attack and release times.
◦ Short-time spectral analysis: windowing, zero-padding, DFT calculation
◦ Spectral modification: gain calculation, output spectrum calculation
◦ Resynthesis: IDFT calculation, windowing, overlap-add
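The "sliding band" idea, in which every spectral sample gets its own band centered on it, can be sketched as follows. The critical-bandwidth expression used here is the Zwicker and Terhardt approximation and is an assumption on my part, as are the function names and the choice to sum whole DFT bins within half a bandwidth on each side.

```python
def critical_bandwidth_hz(f_hz):
    """Auditory critical bandwidth in Hz (Zwicker-Terhardt approximation, assumed)."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def sliding_band_power(power_spectrum, fs, n_fft):
    """Band power centered at each bin.

    power_spectrum holds |X(k)|^2 for k = 0 .. n_fft/2 of one analysis frame.
    """
    df = fs / n_fft                      # frequency spacing of the DFT bins
    n_bins = len(power_spectrum)
    band_power = []
    for k in range(n_bins):
        half_bw = 0.5 * critical_bandwidth_hz(k * df)
        h = int(half_bw / df)            # whole bins on each side of bin k
        lo = max(k - h, 0)
        hi = min(k + h, n_bins - 1)
        band_power.append(sum(power_spectrum[lo:hi + 1]))
    return band_power
```

Unlike a fixed filter bank, no spectral peak ever sits on a band boundary, because the band moves with the spectral sample being processed.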

◦ Short-time spectral analysis: windowing (length L, shift S), zero-padding, N-point DFT
◦ Spectral modification, with P_mc(k): power at upper comfortable listening level, CR(k): compression ratio
◦ Resynthesis: N-point IDFT, overlap-add

Gain Calculation

Auditory critical bandwidth (freq. sample = k, freq. = f in kHz):
$BW(k) = 25 + 75\,[1 + 1.4 f^2]^{0.69}$

Target gain calculation
◦ Power at upper comfortable listening level: $P_{mc}(k)$
◦ Compression ratio: $CR(k)$
◦ Input power: $P_{ic}(k)$, output power: $P_{oc}(k)$
◦ Target gain: $G_t(k) = P_{oc}(k) / P_{ic}(k)$

Compression relation
◦ dB scale: $[P_{oc}(k) / P_{mc}(k)]_{dB} = [P_{ic}(k) / P_{mc}(k)]_{dB} / CR(k)$
◦ linear scale: $P_{oc}(k) / P_{mc}(k) = [P_{ic}(k) / P_{mc}(k)]^{1/CR(k)}$

Target gain for the k-th spectral sample
$[G_t(k)]_{dB} = [1 - 1/CR(k)]\,[P_{mc}(k) / P_{ic}(k)]_{dB}$
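The target-gain relation can be verified numerically for one spectral sample. The sketch below treats all quantities as powers (so dB values use 10 log10); the function names and the test values are illustrative assumptions.

```python
import math

def target_gain_db(p_ic, p_mc, cr):
    """Target power gain in dB: (1 - 1/CR) * 10*log10(P_mc / P_ic)."""
    return (1.0 - 1.0 / cr) * 10.0 * math.log10(p_mc / p_ic)

def output_power(p_ic, p_mc, cr):
    """Output power after applying the target gain to the input power."""
    return p_ic * 10.0 ** (target_gain_db(p_ic, p_mc, cr) / 10.0)

# Input 40 dB below P_mc with CR = 2: the gain is 20 dB, so the output
# ends up 20 dB below P_mc, i.e. the input offset divided by CR.
print(round(target_gain_db(1e-4, 1.0, 2.0), 6))  # 20.0
```

With CR = 1 the gain is 0 dB for every input level, and an input at P_mc always receives 0 dB gain, so P_mc acts as the fixed point of the compression curve.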

Gain changed in steps from the previous value towards the target value, with settable attack and release times
◦ Fast attack: to prevent the output level from exceeding the UCL during transients
◦ Slow release: to avoid the pumping effect or amplification of breathing

Number of steps during attack phase = $s_a$
Number of steps during release phase = $s_r$
Target gain corresponding to min. input level = $G_{max}$
Target gain corresponding to max. input level = $G_{min}$
Gain ratio for attack phase: $\gamma_a = (G_{max} / G_{min})^{1/s_a}$
Gain ratio for release phase: $\gamma_r = (G_{max} / G_{min})^{1/s_r}$

Gain for the i-th window & k-th spectral sample:
$G(i,k) = \max[G(i-1,k) / \gamma_a,\ G_t(i,k)]$ for $G_t(i,k) < G(i-1,k)$
$G(i,k) = \min[G(i-1,k)\,\gamma_r,\ G_t(i,k)]$ for $G_t(i,k) > G(i-1,k)$

Attack time $T_a = s_a S / f_s$, release time $T_r = s_r S / f_s$ [$f_s$ = sampling freq., S = window shift]
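The stepwise gain update can be sketched for a single spectral sample as below. Gains are linear (not dB) values; the function names are hypothetical, and the equal-gain case, which the slide leaves implicit, is assumed to keep the previous gain.

```python
def step_ratios(g_max, g_min, s_a, s_r):
    """Per-frame gain ratios for the attack and release phases."""
    ratio = g_max / g_min
    return ratio ** (1.0 / s_a), ratio ** (1.0 / s_r)

def update_gain(g_prev, g_target, gamma_a, gamma_r):
    """One frame of gain smoothing for one spectral sample."""
    if g_target < g_prev:
        # Attack: input level rose, so the gain steps down, but not below target.
        return max(g_prev / gamma_a, g_target)
    if g_target > g_prev:
        # Release: input level fell, so the gain steps up, but not above target.
        return min(g_prev * gamma_r, g_target)
    return g_prev

def phase_time_s(steps, shift, fs):
    """Attack or release time for the given number of steps: steps * S / fs."""
    return steps * shift / fs
```

With S = 64 and f_s = 10 kHz, `phase_time_s(1, 64, 10000)` gives 6.4 ms and `phase_time_s(30, 64, 10000)` gives 192 ms, matching the attack and release times used elsewhere in the slides.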

Implementation-Related Challenges
◦ Audible distortions due to modifications in the short-time magnitude spectrum without associated modification in the phase spectrum.
◦ High computational complexity: log or series-approximation based gain calculation at each spectral sample for use in sliding-band compression.

Solutions
◦ Analysis-synthesis using least-squares error based signal estimation from modified STFT (Griffin & Lim, 1984): Processing artifacts reduced by masking the effect of phase discontinuities in the modified short-time complex spectrum.
◦ Look-up table based gain calculation: Two-dimensional look-up table relating the input power with gain as a function of frequency. Permits the compression function most suited to compensate for the abnormal loudness growth.
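The least-squares estimate of Griffin & Lim weights each modified frame by the analysis window again and divides by the summed squared windows. The sketch below shows only that resynthesis step in the time domain: the frames passed to `lsee_synthesize` stand in for the IDFT outputs of the modified spectra, and the Hann window, helper names, and small test sizes are assumptions made for illustration.

```python
import math

def hann(length):
    """Periodic Hann window (assumed here; the slides do not name the window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def analyze(x, window, shift):
    """Windowed frames of x, taken every `shift` samples."""
    L = len(window)
    return [[window[n] * x[start + n] for n in range(L)]
            for start in range(0, len(x) - L + 1, shift)]

def lsee_synthesize(frames, window, shift, out_len):
    """Least-squares overlap-add: sum(w * y_m) / sum(w^2) at each sample."""
    L = len(window)
    num = [0.0] * out_len
    den = [0.0] * out_len
    for m, frame in enumerate(frames):
        start = m * shift
        for n in range(L):
            num[start + n] += window[n] * frame[n]
            den[start + n] += window[n] ** 2
    # Samples not covered by any nonzero window value are set to zero.
    return [num[i] / den[i] if den[i] > 1e-12 else 0.0 for i in range(out_len)]
```

When the frames are left unmodified, the division by the summed squared windows returns the input exactly wherever at least one window value is nonzero, so any deviation in the output comes from the spectral modification itself.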

3. Offline & Real-Time Implementations

Implementation for Offline Processing
Implementation using Matlab 7.10 for evaluating the proposed technique and the effect of processing parameters.

Processing parameters
◦ f_s = 10 kHz
◦ Frame length = 25.6 ms (L = 256)
◦ Overlap = 75% (S = 64)
◦ FFT size N = 512

2D look-up table for frequency-dependent compression based on a linear relation between input dB and output dB, with settable CR(k) and P_mc(k)
◦ Input range: 20 log intervals (trade-off: small gain increments, look-up table size)
◦ Look-up table with 256×20 entries

Attack and release times
◦ s_a = 1, T_a = 6.4 ms: fast attack to avoid uncomfortable level during transients
◦ s_r = 30, T_r = 192 ms: slow release to avoid pumping & amplification of breathing
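A 256×20 gain table of the kind described above might be built and indexed as in the following sketch. Only the table dimensions and the dB-linear compression relation come from the slides; the input power range (`P_MIN_DB`, `P_MAX_DB`), the use of interval centres, and the function names are hypothetical.

```python
import math

N_FREQ = 256                        # spectral samples (table rows)
N_LEVELS = 20                       # input-level intervals on a dB scale (columns)
P_MIN_DB, P_MAX_DB = -100.0, 0.0    # hypothetical input power range
STEP_DB = (P_MAX_DB - P_MIN_DB) / N_LEVELS

def build_gain_table(p_mc_db, cr):
    """Linear power gains for each (frequency sample, input-level interval)."""
    table = []
    for k in range(N_FREQ):
        row = []
        for j in range(N_LEVELS):
            p_in_db = P_MIN_DB + (j + 0.5) * STEP_DB           # interval centre
            g_db = (1.0 - 1.0 / cr[k]) * (p_mc_db[k] - p_in_db)
            row.append(10.0 ** (g_db / 10.0))
        table.append(row)
    return table

def lookup_gain(table, k, p_in):
    """Gain for input power p_in at frequency sample k (no log at run time
    would be needed on a DSP if the interval index is found by comparison)."""
    p_in_db = 10.0 * math.log10(max(p_in, 1e-12))
    j = int((p_in_db - P_MIN_DB) / STEP_DB)
    j = min(max(j, 0), N_LEVELS - 1)                            # clamp to the table
    return table[k][j]
```

The returned value is a power gain, so its square root would be applied to spectral magnitudes. More intervals give smaller gain increments between adjacent entries at the cost of a larger table, which is the trade-off noted above.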

Implementation for Real-Time Processing
Implementation on a 16-bit fixed-point DSP board to examine suitability of the technique for use in hearing aids.

DSP chip: TI/TMS320C5515
◦ 16 MB memory space (320 KB on-chip RAM with 64 KB dual-access data memory)
◦ Three 32-bit programmable timers
◦ 4 DMA controllers, each with 4 channels
◦ FFT hardware accelerator (up to point FFT)
◦ Max. clock speed: 120 MHz

DSP board: eZdsp
◦ 4 MB on-board NOR flash for user program
◦ Stereo codec TLV320AIC3204: 16/20/24/32-bit ADC & DAC, 8 – 192 kHz sampling

Software development: C using TI's CCStudio ver. 4.0

Implementation details
◦ Input-output operations: DMA-based I/O with cyclic buffers
◦ ADC and DAC: one codec (left channel) with 16-bit quantization
◦ Processing parameters (same as for offline processing): f_s = 10 kHz, L = 256, S = 64, N = 512
◦ Data representation (input samples, spectral values, processed samples): 16-bit real & 16-bit imaginary

Data transfers & buffering operations (S = L/4)

DMA cyclic buffers
◦ 5-block S-sample input buffer
◦ 2-block S-sample output buffer

Pointers (incremented cyclically on DMA interrupt)
◦ Current input block
◦ Just-filled input block
◦ Current output block
◦ Write-to output block

Signal delay
◦ Algorithmic: 1 frame (25.6 ms)
◦ Computational: ≤ frame shift (6.4 ms)
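The pointer bookkeeping for the cyclic buffers can be simulated as below. This is a sketch under assumptions: the class and method names are hypothetical, and the reading that a frame is formed from the four most recently filled input blocks while the fifth is being filled is my inference from S = L/4, not something the slides state.

```python
S = 64                 # samples per block (window shift)
N_IN, N_OUT = 5, 2     # blocks in the input and output cyclic buffers

class BlockPointers:
    """Block indices that advance cyclically on each DMA interrupt."""

    def __init__(self):
        self.current_in = 0     # input block the DMA is filling
        self.current_out = 0    # output block the DMA is sending out

    def on_dma_interrupt(self):
        self.current_in = (self.current_in + 1) % N_IN
        self.current_out = (self.current_out + 1) % N_OUT

    def just_filled_in(self):
        return (self.current_in - 1) % N_IN

    def write_to_out(self):
        # Processed samples go to the block not currently being output.
        return (self.current_out + 1) % N_OUT

    def frame_blocks(self):
        """Four most recently filled input blocks, oldest first (L = 4S)."""
        return [(self.current_in - 4 + i) % N_IN for i in range(4)]

def delay_bound_ms(frame_len, shift, fs):
    """Algorithmic delay (one frame) plus the computational bound (one shift)."""
    return 1000.0 * (frame_len + shift) / fs
```

For L = 256, S = 64, and f_s = 10 kHz, `delay_bound_ms(256, 64, 10000)` evaluates to 32 ms, the sum of the two delay components listed above.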

4. Test Results

Tests for Verification & Evaluation

Offline processing
◦ Verification of the compression technique for speech input with a large level variation and examination of the effect of different sets of processing parameters.
◦ Assessment of output speech quality (using informal listening) for different input speech materials and time-varying levels.
◦ Comparison of distortions introduced by different compression techniques during spectral transitions.

Real-time processing
◦ Comparison of the processed outputs from offline & real-time implementations: informal listening, PESQ measure (0 – 4.5).
◦ Signal delay & computational requirement.

Results from Offline Processing

[Figure: Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, T_a = 6.4 ms, T_r = 6.4 & 192 ms. Panels: input waveform (scaling factor, unprocessed waveform); processed, T_r = 6.4 ms, low P_mc; processed, T_r = 192 ms, low P_mc; processed, T_r = 6.4 ms, high P_mc; processed, T_r = 192 ms, high P_mc. Horizontal axis: Time (s).]

Processing of different speech materials with varying levels: No audible roughness or distortion during informal listening.

Distortions during spectral transitions: Example of swept sinusoidal input.

[Figure: Input: constant amplitude, 125–250 Hz linearly swept frequency, 200 ms sweep duration. CR = 30, T_a = 6.4 ms, T_r = 192 ms. Panels: sliding-band compression output; multiband compression (18 auditory critical bands) output; single-band compression output. Horizontal axis: Time (s).]

Results from Real-Time Processing
◦ Informal listening: real-time output perceptually similar to the offline output
◦ PESQ for real-time w.r.t. offline: 3.5
◦ Signal delay = 36 ms
◦ Use of processing capacity: 41% (lowest acceptable clock: 50 MHz, max = 120 MHz)

[Figure: Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, T_a = 6.4 ms, T_r = 192 ms, low P_mc. Panels: unprocessed; offline processed; real-time processed. Horizontal axis: Time (s).]

5. Summary & Conclusions

Summary: Development & investigation of sliding-band compression scheme
◦ Realized using modified fixed-frame analysis-synthesis for low computational complexity & without distortions associated with phase discontinuities.
◦ Suitable for speech & non-speech audio, with provision for settable attack time, release time, & compression ratios.
◦ Implemented using a 16-bit fixed-point DSP chip & tested for satisfactory operation: 36 ms signal delay, 41% use of processing capacity, indicating scope for combination with other processing techniques.

Conclusion: Sliding-band compression can be used to compensate for frequency-dependent loudness recruitment without introducing the distortions associated with single-band & multiband compression.

Thank you