Mobile Systems Workshop 1 Narrow band speech coding for mobile phones


Mobile Systems Workshop 1: Narrow-band speech coding for mobile phones
COMP28512, Steve Furber & Barry Cheetham

Speech coding for mobile phones
64 kbit/s is too high for mobile phones. Originally they used 24.7 kbit/s, but this included extra bits for correcting bit-errors.
The bit-rate available for narrow-band speech is about 13 kbit/s.
More recently, the AMR speech coder is used. It encodes narrow-band speech at bit-rates ranging from 4.75 to 12.2 kbit/s.
'Toll' quality speech is achieved at 12.2 kbit/s. How?
Audio example: an AMR file recorded on a mobile phone (10 s, 16 kBytes), converted to wav.
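A quick sanity check of the quoted file size, in Python (the language used in the lab): 10 s at the 12.2 kbit/s AMR mode comes to roughly 15 kB of speech payload, so 16 kB including file headers and mode signalling is plausible.

```python
# Rough check of the figure quoted above: 10 s of AMR at the highest
# narrow-band mode (12.2 kbit/s) gives about 15 kB of speech payload;
# file headers and mode signalling add a little more.
duration_s = 10
bit_rate = 12_200                          # bits per second (AMR 12.2 kbit/s mode)
payload_bytes = duration_s * bit_rate / 8
print(f"{payload_bytes:.0f} bytes = {payload_bytes / 1024:.1f} kB")   # ~15250 bytes
```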

The problem
If Fs = 8 kHz, we only have 1.5 bits per sample at 12 kbit/s.
If we reduce Fs to 4 kHz, we can have 3 bits/sample, but we must filter off all sound above 2 kHz, so the speech will sound 'muffled'.
And 3 bits/sample is still not enough for uniform quantisation with reasonable accuracy.
Audio examples: Operator_long_orig (8 kHz, uniform, 12 bits/sample); Operator_short_orig (8 kHz, uniform, 12 bits/sample); low-pass filtered (<2 kHz); low-pass filtered (<1 kHz).
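A minimal sketch, assuming a NumPy array of speech samples scaled to [-1, 1], of what uniform re-quantisation to a few bits per sample looks like (illustrative only, not the workshop code):

```python
import numpy as np

# Bits available per sample at 12 kbit/s with Fs = 8 kHz:
print(12_000 / 8_000)                      # 1.5 bits/sample

# Illustrative uniform re-quantisation to n bits per sample; 's' is assumed
# to be speech already scaled to the range [-1, 1].
def uniform_quantise(s, n_bits):
    step = 2.0 / (2 ** n_bits)             # quantiser step size over [-1, 1)
    return np.clip(np.round(s / step) * step, -1.0, 1.0 - step)

tone = 0.5 * np.sin(2 * np.pi * 400 * np.arange(8000) / 8000)   # stand-in signal
coarse = uniform_quantise(tone, 3)         # what 3 bits/sample does
print(np.max(np.abs(tone - coarse)))       # error up to about step/2 = 0.125
```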

An idea: differential encoding
Encode the differences between samples (slide diagram: the input s[n] is compared with the previous sample s[n-1] via a one-sample delay, and the difference e[n] is quantised and transmitted or stored).
If successive samples are close together, the differences e[n] will be smaller and easier to encode.
We can get back to the original speech by adding up the differences.
Works at 32 & 16 kbit/s with a few tricks.
Known as G.726 ADPCM (adaptive differential PCM).
Still not good enough for mobile phones.
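A minimal Python sketch of the differential idea (not the actual G.726 algorithm, which also adapts its quantiser step size and uses a smarter predictor):

```python
import numpy as np

# Encode coarsely quantised sample-to-sample differences; the encoder tracks
# the decoder's reconstruction so quantisation errors do not accumulate.
def dpcm_encode(s, step=1 / 32):
    recon_prev = 0.0
    codes = np.empty(len(s))
    for n in range(len(s)):
        e = s[n] - recon_prev                # prediction error e[n]
        q = np.round(e / step) * step        # coarse quantisation of e[n]
        codes[n] = q
        recon_prev += q                      # what the decoder will reconstruct
    return codes

def dpcm_decode(codes):
    return np.cumsum(codes)                  # add the differences back up

s = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)   # 1 s of a 200 Hz tone
s_hat = dpcm_decode(dpcm_encode(s))
print(np.max(np.abs(s - s_hat)))             # stays within about step/2
```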

Waveform & parametric speech coders
So far, all our ideas have tried to encode the shape of the signal (its 'waveform') by sampling it. These are 'waveform coders': simple, but no good at very low bit-rates.
'Parametric' coders instead try to model the human speech production process: the vibration of the vocal cords, and the shape of the 'vocal tract' (mouth, lips etc.).
These do not change as quickly as the waveform, so they can be sampled more slowly without losing accuracy.

Human speech
There are two types of human speech: voiced (vowels) & unvoiced (consonants).
For vowels, the vocal cords open & close periodically with period P. This causes a pressure variation (a 'buzzing' sound) with fundamental frequency F0 = 1/P Hz.
The buzz is modified (filtered) by the vocal tract to produce the vowel sound.
(Slide figure: air from the lungs passes the vocal cords into the mouth and nose; plots show the pressure variation created by the vocal cords and the resulting voiced speech waveform, both with period P.)

Human speech: unvoiced
The vocal cords are held open, so there is no vibration.
A constriction in the air-flow creates 'turbulence'. Turbulent flow is chaotic & random, and sounds like white noise.
(Slide figure: air from the lungs passes the open vocal cords through the mouth and nose; plots show the pressure variation at the constriction and the waveform for the consonant 'ss'.)

Model of human speech production
(Slide diagram: an impulse-sequence generator, controlled by the fundamental frequency, and a noise generator; a voiced/unvoiced switch selects between them; the chosen excitation passes through an amplifier (gain) and a digital filter whose coefficients form the vocal-tract model, giving the speech output.)
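A toy Python sketch of this source-filter model; the all-pole filter coefficients here are invented purely for illustration, not derived from real speech (that derivation is the subject of the next slide):

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                                     # sample rate, Hz
n = fs // 5                                   # one 200 ms segment
f0 = 120                                      # fundamental frequency, Hz

voiced_excitation = np.zeros(n)
voiced_excitation[::fs // f0] = 1.0           # impulse every pitch period (the 'buzz')
unvoiced_excitation = np.random.randn(n)      # white noise (the 'hiss')

a = [1.0, -1.3, 0.8]                          # made-up all-pole 'vocal tract' filter
vowel_like = lfilter([1.0], a, voiced_excitation)     # buzz shaped by the filter
hiss_like = lfilter([1.0], a, unvoiced_excitation)    # noise shaped the same way
```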

Linear predictive speech coding (LPC)
A similar model is implemented in all mobile phones, at both sender & receiver.
The sender derives parameters every 20 ms (50 times/s):
- the coefficients of a digital filter which models the effect of the vocal tract,
- how loud the speech is,
- whether it is voiced or unvoiced,
- if voiced, the fundamental frequency.
These parameters are sent to represent the speech.
The receiver reconstructs the speech from these parameters.
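A hedged sketch of how the filter coefficients might be derived for one frame using the autocorrelation method (illustrative Python only, not the coder used in the lab; 'frame' is a stand-in for 160 samples of real speech):

```python
import numpy as np
from scipy.signal import lfilter

# Autocorrelation-method LPC for one 20 ms frame at 8 kHz (160 samples),
# with the normal equations solved directly rather than by Levinson-Durbin.
def lpc_coeffs(frame, order=10):
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # lags 0..N-1
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])    # predictor coefficients a1..a10
    return a

frame = np.random.randn(160)                  # placeholder for a speech frame
a = lpc_coeffs(frame)
residual = lfilter(np.concatenate(([1.0], -a)), [1.0], frame)   # prediction error
```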

LPC-10
A 2400 b/s coder once widely used in military comms. Not used in mobile phones.
It encodes 20 ms speech frames by deriving:
- 10 digital filter coeffs by LPC analysis,
- an unvoiced/voiced decision (1 bit),
- gain (or amplitude): a single number (8 bits, say),
- fundamental frequency: a single number (8 bits, say).
Each frame then has 48 bits, which leaves 31 bits for the 10 digital filter coeffs.
Quite simple to understand and implement, but its quality is far from good (run & listen to the demo).
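A one-line check of the bit budget above, using the figures quoted on this slide:

```python
# Bit budget per 20 ms frame at 2400 b/s:
frame_bits = 2400 * 0.020                 # 48 bits per frame
side_info = 1 + 8 + 8                     # voiced/unvoiced flag + gain + pitch
print(frame_bits - side_info)             # 31 bits left for the 10 filter coefficients
```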

LPC10 demo
A simple LPC10 encoder & decoder written in MATLAB & Python.
The encoder derives filter coeffs & vocal-tract excitation for 20 ms segments.
'Voiced' & 'excitation freq' are determined by a very simple method.
The encoder stores the parameters in a file; the decoder implements the model using the parameters read from that file.
Run the encoder, but don't worry about how it works yet. Then run the decoder.
You can modify the code to investigate:
- what happens if the speech is forced to be voiced with a constant pitch-period?
- what happens if the speech is always forced to be unvoiced?
Audio examples: Orig: modoperamp.wav; LPC10SpeechNPP100; LPC10SpeechNPP25; LPC10SpeechWisp; LPC10SpeechNPP50; LPC10SpeechPP.

Voiced & unvoiced speech
Voiced speech (vowels):
- Resonances (formants) which change as the person speaks.
- These determine the vowel sounds.
Unvoiced speech (consonants):
- The vocal cords do not vibrate.
- Turbulent air-flow produces a "hissing" sound.
- The vocal-tract excitation is a random, noise-like signal.

Codebook excited LPC (CELP)
This is what is really used in mobile phones.
Instead of having fixed noise & impulse generators, the CELP model has a selection of different excitations stored in a code-book.
The sender tells the receiver which one to use for each frame, so it just needs to send a code-book index.
The sender has a copy of the code-book & uses 'analysis-by-synthesis' to find the best excitation to use: it tries them one by one & compares what each produces with the original speech.
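A bare-bones Python sketch of the analysis-by-synthesis search (real CELP also uses long-term pitch prediction and perceptual weighting, neither of which is shown; the codebook and LPC coefficients below are random placeholders):

```python
import numpy as np
from scipy.signal import lfilter

# Try every stored excitation, pass it through the LPC synthesis filter,
# and keep the index (and gain) whose output is closest to the original
# frame.  Only the index and gain need to be transmitted.
def choose_excitation(frame, codebook, a):
    best = (None, 0.0, np.inf)                 # (index, gain, squared error)
    synth_den = np.concatenate(([1.0], -a))    # denominator of the synthesis filter
    for i, c in enumerate(codebook):
        y = lfilter([1.0], synth_den, c)       # synthesised candidate frame
        g = np.dot(frame, y) / np.dot(y, y)    # best gain for this candidate
        err = np.sum((frame - g * y) ** 2)
        if err < best[2]:
            best = (i, g, err)
    return best[0], best[1]

codebook = np.random.randn(64, 160)            # 64 candidate excitations (6-bit index)
frame = np.random.randn(160)                   # placeholder for one speech frame
a = np.array([1.2, -0.6] + [0.0] * 8)          # placeholder LPC coefficients
index, gain = choose_excitation(frame, codebook, a)
```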

LPC10, CELP & AMR
The LPC-10 coder at 2400 b/s is used in military comms.
CELP is used in mobile telephony (13 kbit/s & lower).
The adaptive multi-rate (AMR) coder uses CELP at various bit-rates: 4.75, ..., 7.4, 7.95, 10.2, 12.2 kbit/s.
It uses vector (code-book) quantisation.
Better quality than LPC10.

Waveform & parametric coding
Waveform coding techniques such as PCM & ADPCM try to preserve the exact shape of the waveform as far as possible. They are simple to understand & implement, but cannot achieve very low bit-rates.
Parametric techniques such as LPC10 & CELP do not aim to preserve the exact wave-shape. Instead they represent the features expected to be perceptually significant by sets of parameters, i.e. by filter coeffs & the params of the excitation signal.
Parametric coding is more complicated to understand & implement than waveform coding, but achieves lower bit-rates.

Laboratory (reminder)
Task 1 ends in Week 2.
Submit to Blackboard a Jupyter notebook containing:
- your Python code,
- documentation for the code,
- using 'markdown', answers to the questions & a summary of results for each part of the Task,
- headings for each part (i.e. clear presentation).
(Not many marks for just code without clear documentation.)
The deadline is Friday of Week 3 at 18:00.
Please book a slot during the lab session to demonstrate your code and answer a few questions.
Task 2 will be available on Monday 12 Feb 2018.

Summary
Differential coding & ADPCM.
Modelling the human speech mechanism.
The concept of linear predictive coding (LPC), closely based on the characteristics of the human voice.
LPC10 illustrated.
CELP (as used in mobile telephony).
Waveform & parametric speech coding.