A Full Frequency Masking Vocoder for Legal Eavesdropping Conversation Recording R. F. B. Sotero Filho, H. M. de Oliveira (qPGOM), R. Campello de Souza.


A Full Frequency Masking Vocoder for Legal Eavesdropping Conversation Recording R. F. B. Sotero Filho, H. M. de Oliveira (qPGOM), R. Campello de Souza Signal Processing Group, Federal University of Pernambuco – UFPE

Abstract
A new approach for a vocoder, based on full frequency masking by octaves.
Useful to save bandwidth in applications requiring intelligibility.
Recommended for legal eavesdropping of long conversations.

Introduction
Vocoder: a contraction of "voice encoder". It does not recreate the original waveform in appearance, but the synthesized signal should be perceptually similar to it.
First described by Homer Dudley at Bell Telephone Laboratories in 1939.
Parameters are extracted from the spectrum and updated every ms.
Properties of voice that are exploited: the limitations of the human auditory system and the physiology of the voice generation process.

Psycho-Acoustics of the Human Auditory System
Frequency masking: masking in frequency, or "reduced audibility of a sound due to the presence of another".
Insensitivity to phase: the human ear has little sensitivity to the phase of signals.

Simplification of the spectrum via frequency masking
For each voice segment: FFT of blocklength 160 (frame of 20 ms).
The spectrum is segmented into regions of influence (octaves). The range Hz is removed. 64 Hz-128 Hz, 128 Hz-512 Hz, and so on.
Each spectral sample corresponds to a multiple of 50 Hz (8000 Hz / 160 = 50 Hz per spectral line).
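The masking step described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' routine. The four octave bands between 256 Hz and 4096 Hz are an assumption (the band list on the slide is incomplete), and the function names are hypothetical. A direct DFT is used instead of an FFT to keep the example dependency-free.

```python
import cmath
import math

FS = 8000   # sample rate in Hz (from Table 1)
N = 160     # DFT length: 160 samples = 20 ms at 8 kHz
# Assumed octave edges in Hz (hypothetical; the slide's own list is incomplete).
OCTAVE_EDGES = [256, 512, 1024, 2048, 4096]

def dft_magnitudes(frame):
    """Return |X[k]| for k = 0 .. len(frame)/2 - 1 using a direct DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def full_frequency_masking(frame, fs=FS, edges=OCTAVE_EDGES):
    """Keep only the strongest spectral line in each octave; zero the rest."""
    mags = dft_magnitudes(frame)
    resolution = fs / len(frame)          # 50 Hz per line for N = 160
    masked = [0.0] * len(mags)
    survivors = []                        # (frequency in Hz, magnitude)
    for lo, hi in zip(edges[:-1], edges[1:]):
        bins = [k for k in range(1, len(mags)) if lo <= k * resolution < hi]
        if not bins:
            continue
        k_max = max(bins, key=lambda k: mags[k])
        masked[k_max] = mags[k_max]
        survivors.append((k_max * resolution, mags[k_max]))
    return masked, survivors
```

With these assumed bands, a 160-sample frame yields one survivor per octave, i.e. four survivors in total, which matches the count quoted for Table 1.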

Table 1. Number of spectral lines per octave (DFT of length N=160, sample rate 8 kHz).
Columns: Octave (Hz); # spectral samples/octave.
A total of 79 frequencies (DFT with N=160) is reduced to 4 survivors, which hold about 5% of the spectral components.

Figure 1. The spectrum of a voice frame computed by the FFT: a) original spectrum; b) simplified full-masking spectrum.
This technique is called full frequency masking.

Signal synthesis via spectral filling
The beta distribution is a probability distribution defined over $0 \le x \le 1$ and characterized by a pair of parameters $\alpha$ and $\beta$:
$P(x) = \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, \quad 1 < \alpha, \beta < +\infty,$
whose normalizing factor is $B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$, where $\Gamma(\cdot)$ is the generalized Euler factorial function and $B(\cdot,\cdot)$ is the Beta function.
Figure 2. Envelope shape of a survivor tone for different parameters $\alpha$ and $\beta$.

Fitting the envelope to an octave: the mode is moved to
$\text{new mode} = \frac{\alpha-1}{\alpha+\beta-2}\,(f_M - f_m) + f_m.$
The width of the support equals the difference between the upper ($f_M$) and lower ($f_m$) normalized cutoff frequencies of each octave, i.e., $f_M - f_m$.
To fill the spectrum in each frame, the envelope used is
$P(x) = \frac{1}{(f_M - f_m)^{\alpha+\beta-2}}\, (x - f_m)^{\alpha-1} (f_M - x)^{\beta-1}.$
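As a rough illustration of how such an envelope could be evaluated, the sketch below implements the rescaled beta shape and its mode exactly as written on the slide. The helper `fill_octave` is hypothetical: it assumes that α and β are chosen so that the mode falls on the survivor frequency, with a free `concentration` parameter equal to α+β-2, and that the envelope is scaled to the survivor amplitude. The slides do not state how the authors pick α and β.

```python
def beta_envelope(x, f_m, f_M, alpha, beta):
    """Beta-shaped envelope on [f_m, f_M] (unnormalised, as on the slide)."""
    if not (f_m <= x <= f_M):
        return 0.0
    width = f_M - f_m
    return ((x - f_m) ** (alpha - 1) * (f_M - x) ** (beta - 1)
            / width ** (alpha + beta - 2))

def envelope_mode(f_m, f_M, alpha, beta):
    """Location of the envelope maximum inside the octave."""
    return (alpha - 1) / (alpha + beta - 2) * (f_M - f_m) + f_m

def fill_octave(f_s, amp, f_m, f_M, freqs, concentration=4.0):
    """Hypothetical filling rule: peak of height amp placed at survivor f_s."""
    if not (f_m < f_s < f_M):
        raise ValueError("survivor must lie strictly inside the octave")
    t = (f_s - f_m) / (f_M - f_m)
    alpha = 1.0 + concentration * t           # alpha > 1
    beta = 1.0 + concentration * (1.0 - t)    # beta > 1
    peak = beta_envelope(f_s, f_m, f_M, alpha, beta)
    return [amp * beta_envelope(f, f_m, f_M, alpha, beta) / peak
            for f in freqs]
```

For example, `fill_octave(700, 10.0, 512, 1024, list(range(550, 1001, 50)))` returns ten amplitudes that peak at the 700 Hz line and decay towards the octave edges.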

Figure 3. Full masking and spectral filling.
A few audio files generated by this vocoder are available at the URL ( – piece of speech from radio) (vocoder with Hamming windowing).

Quantization and Coding of Speech Signals
The maximum excursion of the full spectrum was divided into 256 intervals of equal length, so each quantized amplitude is represented by one byte.
There are no negative samples to be quantized, so the quantizer is unipolar rather than bipolar.
Table 2. Bit allocation in a voice frame (20 ms). The required number of bits is expressed as A + P, where A is the number of bits for the spectral line amplitude and P is the number of bits expressing the relative position within the octave.
Columns: Relevant octave; # possible survivor components; Bits A+P.
#1 ( Hz)
#2 ( Hz)
#3 ( Hz)
#4 ( Hz)
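A uniform unipolar quantizer of this kind might look like the sketch below. The function names are hypothetical, and reconstructing at the midpoint of each interval is an assumption, since the slide does not say how amplitudes are decoded.

```python
LEVELS = 256  # 256 equal intervals, so each index fits in one byte

def quantize(magnitude, max_magnitude, levels=LEVELS):
    """Map a non-negative magnitude in [0, max_magnitude] to an index 0..255."""
    if max_magnitude <= 0:
        raise ValueError("max_magnitude must be positive")
    step = max_magnitude / levels
    index = int(magnitude / step)
    return min(max(index, 0), levels - 1)   # clamp the top edge into 255

def dequantize(index, max_magnitude, levels=LEVELS):
    """Reconstruct at the interval midpoint (an assumed decoding rule)."""
    step = max_magnitude / levels
    return (index + 0.5) * step
```

With midpoint reconstruction, the error for any magnitude inside the range is at most half a step, i.e. `max_magnitude / 512`.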

Each voice frame needs 50 bits (18 for identifying positions and 32 for the amplitudes of the masking tones), so the vocoder rate is 50 bits / 20 ms = 2.5 kbps.
The binary format .voz
Representation of a voice frame in this format (extension .voz): the 50 bits are distributed into four sub-blocks, each giving the value of a spectral sample followed by its position in the spectrum.
Voice files recorded in the .wav format are converted to this binary format by a Matlab routine.
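The bit budget and the four sub-block layout can be checked with a short sketch. The per-octave position widths of 3, 4, 5 and 6 bits are an assumption, chosen only because they add up to the 18 position bits quoted above. The bit ordering inside the 50-bit word is likewise hypothetical, not the documented .voz layout.

```python
FRAME_MS = 20
AMPLITUDE_BITS = 8              # one byte per survivor amplitude (A)
POSITION_BITS = [3, 4, 5, 6]    # assumed P per octave; sums to 18

def bits_per_frame():
    return sum(AMPLITUDE_BITS + p for p in POSITION_BITS)

def bit_rate_bps():
    return bits_per_frame() * 1000 // FRAME_MS

def pack_frame(survivors):
    """Pack four (amplitude, position) pairs into one 50-bit integer."""
    if len(survivors) != len(POSITION_BITS):
        raise ValueError("expected one survivor per octave")
    word = 0
    for (amp, pos), p_bits in zip(survivors, POSITION_BITS):
        if not 0 <= amp < (1 << AMPLITUDE_BITS) or not 0 <= pos < (1 << p_bits):
            raise ValueError("value out of range")
        word = (word << AMPLITUDE_BITS) | amp   # value first
        word = (word << p_bits) | pos           # then its position
    return word

def unpack_frame(word):
    """Inverse of pack_frame."""
    out = []
    for p_bits in reversed(POSITION_BITS):
        pos = word & ((1 << p_bits) - 1)
        word >>= p_bits
        amp = word & ((1 << AMPLITUDE_BITS) - 1)
        word >>= AMPLITUDE_BITS
        out.append((amp, pos))
    return out[::-1]
```

Here `bits_per_frame()` evaluates to 50 and `bit_rate_bps()` to 2500, in agreement with the 2.5 kbps figure on the slide.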

Figure 4. Frame of files in the format .voz (20 ms).
Intelligibility and voice quality versus bit rate
Voice quality is estimated using the Mean Opinion Score (MOS).
Table 3. MOS scores for the voice signals synthesized by four different techniques.
Vocoder technique: MOS score
1. Synthesized signals with no spectral filling: 3.0
2. Vocoder signals reconstructed via the beta spectral filling technique: 2.5
3. Synthesized voice signals combining techniques 1 and 2 (linear): 2.8
4. Voice signals from item 2, but with an extra Hamming windowing: 3.0

Conclusions
New vocoder: represents the voice signal using fewer samples of the spectrum.
Voice of acceptable quality at a rate of a few kbit/s.
A new technique of spectral filling: not helpful in improving the voice quality, but it improves naturalness.
Applications: maintenance voice channels in large plants; speaker recognition systems; monitoring of voice conversations under authorized eavesdropping.
THAT'S ALL FOLKS! TKS.

Pre-signal processing
Sampling: Shannon sampling theorem (a signal band-limited to $f_m$ Hz is sampled at a rate of at least $2f_m$ equally spaced samples per second). LPF.
Voice segmentation and windowing: partition of the speech signal into pieces (stationary frames): (~ ms). The Hamming window was chosen for its smoothness at the edges.
Pre-emphasis: the voice signal radiated from the lips during speech has a spectral decay of about -6 dB/octave. This spectral distortion can be eliminated by applying a filter with a response of approximately +6 dB/octave:
$y(n) = x(n) - a\,x(n-1)$, for $1 \le n < M$,
where $M$ is the number of samples of $x(n)$, $y(n)$ is the emphasized signal, and the constant $a$ is normally set to 0.95.
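The pre-emphasis filter and the Hamming windowing could be sketched as follows. Passing the first sample through unchanged, using non-overlapping frames, and the default frame length of 160 samples (20 ms at 8 kHz) are assumptions made for illustration. The function names are hypothetical.

```python
import math

def pre_emphasis(x, a=0.95):
    """y(n) = x(n) - a*x(n-1) for 1 <= n < M; y(0) = x(0) is an assumption."""
    if not x:
        return []
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def hamming(m):
    """Standard Hamming window of length m."""
    if m == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (m - 1)) for n in range(m)]

def windowed_frames(x, frame_len=160):
    """Split x into non-overlapping frames and apply a Hamming window to each."""
    w = hamming(frame_len)
    return [[s * c for s, c in zip(x[i:i + frame_len], w)]
            for i in range(0, len(x) - frame_len + 1, frame_len)]
```

A typical chain would be `windowed_frames(pre_emphasis(samples))`, after which each frame can be passed to the spectral analysis stage.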