Y(J) Stein VoP4 1 VOPVOP YJS Other Features
Echo Cancellation

Acoustic Echo

Line echo
[Figure labels: Telephone 1, Telephone 2, hybrid]

Subjective reaction to echo
[Plot: required suppression (dB) versus round-trip delay (ms)]

Subjective effect of 15 dB echo return loss
[Plot: percent difficulty and decrease in MOS versus round-trip delay (ms)]

Echo suppressor
[Figure labels: comp, switch, inv, 4w]
In practice more is needed: VOX, override, reset, etc.

Why not echo suppression?
Echo suppression makes the conversation half duplex
– Wastes the full-duplex infrastructure
– Conversation is unnatural
– Hard to break in
– Dead-sounding line
It would be better to cancel the echo: subtract the echo signal while letting the desired signal through. But that requires DSP.
[Figure labels: near end, far end, subtraction (−)]

Echo cancellation?
Unfortunately, it's not so easy: the outgoing signal is delayed, attenuated, and distorted
Two echo canceller architectures:
– Modem type
– Line echo canceller (LEC)
[Figure labels, both diagrams: near end, far end, echo path, clean, subtraction (−)]

LEC architecture
[Block diagram labels: hybrid, A/D, D/A, near end, far end, filter H, adapt, doubletalk detector, NLP, subtraction (−), signals X and Y]

Adaptive Algorithms
How do we find the echo-cancelling filter, and keep it correct even if the echo path parameters change?
We need an algorithm that continually updates the filter parameters
All adaptive algorithms are based on the same idea: lack of correlation between the desired signal and the interference
Let's start with a simpler case: adaptive noise cancellation

Noise cancellation
[Figure labels: x, n, h, e, y, subtraction (−)]

Noise cancellation (cont.)
Assume that the noise is distorted only by an unknown gain h
We correct by transmitting e n, so that the audience hears
$$y = x + h\,n - e\,n = x + (h - e)\,n$$
The energy of this signal is
$$E_y = \langle y^2 \rangle = \langle x^2 \rangle + (h - e)^2 \langle n^2 \rangle + 2\,(h - e)\,\langle x\,n \rangle$$
Assume that $C_{xn} = \langle x\,n \rangle = 0$
We need only set e to minimize $E_y$! (turn the knob until it is minimal)
Even if the distortion is a complete filter h, we set the ANC filter e to minimize $E_y$

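The minimization step can be written out explicitly. This is a short worked sketch, assuming that angle brackets denote time averages (a notational assumption) and that the signal x and the noise n are uncorrelated, so the cross term drops out:

```latex
\begin{aligned}
E_y(e) &= \langle x^2 \rangle + (h - e)^2 \langle n^2 \rangle \\
\frac{dE_y}{de} &= -2\,(h - e)\,\langle n^2 \rangle = 0
  \;\Longrightarrow\; e = h \\
E_y\big|_{e = h} &= \langle x^2 \rangle
\end{aligned}
```

The energy is a parabola in e whose minimum sits at e = h, which is why turning the knob until the output energy is minimal finds the unknown gain without ever measuring h directly.
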
The LMS algorithm
Gradient descent on the energy
The correction to H is proportional to the error times the input X
$$H \leftarrow H + \lambda\,\varepsilon\,X$$
where ε is the error and λ is a small step size

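The update rule can be tried on a toy echo path. The following is a minimal sketch, not the deck's implementation: the function name `lms_echo_canceller`, the tap count, the step size, and the echo path `true_path` are all illustrative assumptions, and the near end is silent so the returned signal is pure echo.

```python
import random

def lms_echo_canceller(far_end, near_end, taps=8, step=0.05):
    """Return (residual, H): the echo-cancelled signal and the final filter taps."""
    H = [0.0] * taps            # adaptive estimate of the echo path
    X = [0.0] * taps            # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        X = [x] + X[:-1]                                   # shift in the new far-end sample
        echo_estimate = sum(h * xi for h, xi in zip(H, X))
        err = d - echo_estimate                            # signal left after subtraction
        H = [h + step * err * xi for h, xi in zip(H, X)]   # H <- H + step * error * X
        residual.append(err)
    return residual, H

# Hypothetical echo path: 2-sample delay with gain 0.5, plus a weaker tap.
random.seed(0)
true_path = [0.0, 0.0, 0.5, -0.2]
far = [random.uniform(-1.0, 1.0) for _ in range(5000)]
echo = [sum(true_path[k] * far[n - k] for k in range(len(true_path)) if n - k >= 0)
        for n in range(len(far))]

residual, H = lms_echo_canceller(far, echo)
print([round(h, 3) for h in H])
```

With no near-end speech and no noise, the taps settle on the echo path and the residual decays toward zero; a step size that is too large makes the recursion diverge, so the value here is deliberately small.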
Nonlinear processing
Because of finite numeric precision, the (linear) LEC filtering cannot completely remove the echo
The standard LEC adds center clipping to remove the residual echo
The clipping threshold needs to be set properly by adaptation

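Center clipping itself is a one-line operation. A minimal sketch, assuming the simplest variant (samples below the threshold are zeroed, the rest pass unchanged) and a fixed, hand-picked threshold rather than an adapted one; `center_clip` is a hypothetical name:

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below threshold; pass the rest unchanged."""
    return [0.0 if abs(s) < threshold else s for s in samples]

# Hypothetical residual after the linear filter: small leftovers plus real speech.
residual = [0.004, -0.003, 0.25, -0.4, 0.002]
clipped = center_clip(residual, threshold=0.01)
print(clipped)
```

The small leftover samples are removed while the larger ones survive, which is why the threshold matters: too low leaves audible residual echo, too high starts to chop quiet near-end speech.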
Doubletalk detection
Adaptation of H should take place only when the far end alone is speaking
So we freeze adaptation when there is no far-end signal or during doubletalk, that is, whenever the near end speaks
The Geigel algorithm compares the absolute value of the near-end signal to half the maximum absolute value in the X buffer
If the near-end signal exceeds this threshold, we can assume that the near end is speaking

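The Geigel comparison can be sketched in a few lines. This is an illustration under stated assumptions: the function name `geigel_detect`, the buffer length `window`, and the sample values are hypothetical; only the factor of one half and the comparison against the buffer maximum come from the slide.

```python
from collections import deque

def geigel_detect(far_end, near_end, window=64, ratio=0.5):
    """Return one boolean per sample: True where near-end speech is declared."""
    recent_far = deque([0.0] * window, maxlen=window)   # the "X buffer" of far-end magnitudes
    flags = []
    for x, y in zip(far_end, near_end):
        recent_far.append(abs(x))
        # Echo is attenuated, so a near-end sample above half the recent
        # far-end peak is taken to contain near-end speech.
        flags.append(abs(y) > ratio * max(recent_far))
    return flags

flags = geigel_detect(far_end=[1.0, 1.0, 1.0, 1.0], near_end=[0.3, 0.3, 0.8, 0.3])
print(flags)
```

An echo canceller would skip the filter update on every sample where the flag is set; practical detectors usually also hold the flag for some time after it fires, which this sketch omits.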
Data Relays

The need for relays
Voice is a relatively forgiving signal (or rather, the ear is)
Compression techniques are designed to pass voice, but may hopelessly distort other signals
Even simple tones (or DTMF) may not be passed by coders
We could go back to 64 Kbps G.711 for non-voice signals
But isn't that silly? Using 64 Kbps for 64 bps or even 9.6 Kbps data?
The solution is to use a relay

Open Channel
Reasons to use 64 Kbps G.711 (open channel); 32 Kbps ADPCM may work as well
– Inexpensive
– Simple design
– Robust
Even open channel is not trivial!
– Need a dynamic bandwidth mechanism
– Need to detect the event (fax/modem tone, DTMF, MF, CPT, etc.)
– Need to return to compressed voice (end of session, time-out)

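The switching logic implied by these requirements can be pictured as a two-state machine. This is a rough sketch only: the class `OpenChannelSwitch`, its frame-based time-out, and the mode names are hypothetical, and the tone detector itself is assumed to exist elsewhere and to deliver a boolean per frame.

```python
class OpenChannelSwitch:
    """Toy state machine: compressed voice <-> open channel (G.711)."""

    def __init__(self, timeout_frames=100):
        self.mode = "compressed"
        self.timeout_frames = timeout_frames
        self.idle = 0                      # frames since the last detected event

    def on_frame(self, event_detected, session_ended=False):
        if self.mode == "compressed":
            if event_detected:             # fax/modem tone, DTMF, etc.
                self.mode, self.idle = "open", 0
        else:
            self.idle = 0 if event_detected else self.idle + 1
            if session_ended or self.idle >= self.timeout_frames:
                self.mode = "compressed"   # fall back to the speech coder
        return self.mode

switch = OpenChannelSwitch(timeout_frames=3)
modes = [switch.on_frame(e) for e in (False, True, False, False, False)]
print(modes)
```

Each transition would also have to trigger the bandwidth change on the network side, which is the part the slide flags as non-trivial and which this sketch leaves out.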
Tone / Fax / Modem Relay
[Figure labels: Fax, Analog, A/D and D/A, 64 Kbps, Demodulate/Remodulate, PSN]
Problems:
– need highly accurate detectors
– need a low false alarm rate
– need an appropriate protocol
– need accurate timing
– need expensive DSP processing
– delay may be too large
– may need "spoofing"
– can the two sides operate with different parameters?

VoP DSP Architecture
[Block diagram labels: Multi Channel Codec, PCM Interface, LEC, VAD, CNG, DISC., Speech Coders, Tone Detector, Tone Generator, Voice Packet Module, Packet Voice Protocol, Playout Unit, Control, Real Time Operating System, Serial Port, PSN]

VoP System Implementation
[Block diagram labels: PSTN, Telephony Signaling Module, Voice Packet Module, Packet Protocol Module, Network Management Module, DSP, Microprocessor (twice), ATM / FR / IP Network; flows: Voice, Signaling, Voice & Signaling Packets, NM info]

Quality of Service

The meaning of QoS
For general-purpose data:
Every little bit counts
– only lossless compression
– best-effort delivery
Real-time not essential
– dynamic routing and packet reordering allowed
For speech:
Only subjective quality counts
– can use lossy compression
– can drop segments with little effect
Real-time essential
– predetermined route preferable (traffic engineering)

PSTN QoS
Virtually all calls (>95%) completed
Once connected, virtually no disconnects or faults
Toll-quality voice
Low delay (except satellite calls)
Full switching, optimized routing
Call management
Fax/modem functions
Wireline and wireless services

Paying for QoS
Law of Photonics: the price of transmitting a bit drops by half every 9 months
Free Internet telephony
– Several firms offer free long-distance service over the Internet
– Strong compression, significant delay and jitter
We no longer need to pay for service … but we are willing to pay for quality of service

Paying for QoS
[Figure labels: wire service, mobile service, toll]

Speech Quality Measurement

Why does it sound the way it sounds?
PSTN
– BW= KHz, SNR > 30 dB
– PCM, ADPCM (BER )
– five nines reliability
– line echo cancellation
Voice over packet network
– speech compression
– delay, delay variation, jitter
– packet loss/corruption/priority
– echo cancellation

Subjective Voice Quality
Old measures
– 5/9
– DRT
– DAM
The modern scale
– MOS
– DMOS
[Figure text, rhyming word list: meet, neat, seat, feet, Pete, beat, heat]

MOS according to ITU P.800
Subjective Determination of Transmission Quality
Annex B: Absolute Category Rating (ACR)

Score  Listening Quality  Listening Effort
5      excellent          relaxed
4      good               attention needed
3      fair               moderate effort
2      poor               considerable effort
1      bad                no meaning with feasible effort

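A mean opinion score is the arithmetic mean of the listeners' category ratings. A small sketch of that arithmetic, using made-up ratings on the five-point ACR scale; `mean_opinion_score` is a hypothetical helper, and a real P.800 test adds controlled listening conditions and statistics that are not modeled here:

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Average a list of ACR ratings, each an integer from 1 (bad) to 5 (excellent)."""
    if not ratings or any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    return mean(ratings)

# Hypothetical ratings from eight listeners for one speech sample.
ratings = [4, 4, 3, 5, 4, 3, 4, 4]
mos = mean_opinion_score(ratings)
print(mos)
```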
MOS according to ITU (cont.)
Annex D: Degradation Category Rating (DCR)
Annex E: Comparison Category Rating (CCR)
ACR is not good at high-quality speech

Score  DCR                CCR
5      inaudible
4      not annoying
3      slightly annoying  much better
2      annoying           better
1      very annoying      slightly better
0                         the same
-1                        slightly worse
-2                        worse
-3                        much worse

Some MOS numbers
Effect of speech compression (from ITU-T Study Group 15):
Quiet room, 48 KHz 16-bit linear sampling   5.0
PCM (A-law/μ-law) 64 Kb/s
Kb/s
Kb/s   3.9
ADPCM G Kb/s   3.8
toll quality
13 Kb/s   3.6
VSELP 8 Kb/s   3.4

The Problem(s) with MOS
Accurate MOS tests are the only reliable benchmark
BUT
– MOS tests are off-line
– MOS tests are slow
– MOS tests are expensive
– Different labs give consistently different results
– Most MOS tests only check one aspect of the system

The Problem(s) with SNR
Naive question: isn't CCR the same as SNR?
SNR does not correlate well with subjective criteria
Squared difference is not an accurate comparator; it is thrown off by
– Gain
– Delay
– Phase
– Nonlinear processing

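A quick numerical check shows how badly the squared difference behaves. In this sketch a 1 kHz tone sampled at 8 kHz is compared with a copy delayed by one sample and with a copy that is merely louder; the helper `snr_db` and the test signals are illustrative assumptions.

```python
import math

def snr_db(reference, test):
    """SNR in dB, treating the sample-by-sample difference as noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)

fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
delayed = [0.0] + tone[:-1]          # one-sample (125 microsecond) delay
louder = [1.5 * s for s in tone]     # pure gain change, no distortion

print(round(snr_db(tone, delayed), 1), "dB for the delayed copy")
print(round(snr_db(tone, louder), 1), "dB for the louder copy")
```

Both copies come out at only a few dB of "SNR" although a listener would hear essentially the same tone, which is the mismatch with subjective quality that the slide points to.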
Speech distance measures
Many objective measures have been proposed:
– Segmental SNR
– Itakura-Saito distance
– Euclidean distance in cepstrum space
– Bark spectral distortion
– Coherence function
None correlate well with MOS
ITU target: find a quality measure that does correlate well

Return to Biology
The standard speech model (LPC), used by most speech processing/compression/recognition systems, is a model of speech production
Unfortunately, the speech production and perception systems are not matched
Speech quality measurement idea: use a model of the human auditory system (perception)
– ITU-T P.861 Perceptual Speech Quality Measurement (PSQM)
– ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ)
– ITU-R BS.1387 Objective Measurements of Perceived Audio Quality

Some objective methods
– Perceptual Speech Quality Measurement (PSQM): ITU-T P.861
– Perceptual Analysis Measurement System (PAMS): BT proprietary technique
– Perceptual Evaluation of Speech Quality (PESQ): ITU-T P.862
– Objective Measurement of Perceived Audio Quality (PAQM): ITU-R BS.1387
– E-model: ITU-T G.107, G.108; ETSI ETR-250

Objective Quality Strategy
[Figure labels: speech, channel, QM, QM to MOS, MOS estimate]

PSQM philosophy (from P.861)
[Block diagram labels: Perceptual model (twice), Internal Representation (twice), Audible Difference, Cognitive Model]

PSQM philosophy (cont.)
Perceptual modelling (internal representation)
– Short-time Fourier transform
– Frequency warping (telephone-band filtering, Hoth noise)
– Intensity warping
Cognitive modelling
– Loudness scaling
– Internal cognitive noise
– Asymmetry
– Silent interval processing
PSQM values
– 0 (no degradation) to 6.5 (maximum degradation)
Conversion to MOS
– PSQM to MOS calibration using known references
– Equivalent Q values

Problems with PSQM
Designed for telephony-grade speech codecs
Doesn't take network effects into account:
– filtering
– variable time delay
– localized distortions
Draft standard P.862 adds:
– transfer function equalization
– time alignment, delay skipping
– distortion averaging

PESQ philosophy (from P.862)
[Block diagram labels: Time Alignment, Perceptual model (twice), Internal Representation (twice), Audible Difference, Cognitive Model]

E-model
R factor: mouth-to-ear transmission quality model
$$R = R_0 - I_s - I_d - I_e + A$$
where
– $R_0$: effect of SNR
– $I_s$: effect of simultaneous impairments
– $I_d$: effect of delayed impairments
– $I_e$: effect of equipment distortion
– $A$: advantage of the method (e.g. mobility of a cellphone)
Defined in ITU-T G.107, G.108 and ETSI ETR-250

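The R factor is simple arithmetic once the impairment terms are known. The sketch below adds up hypothetical impairment values (chosen only for illustration, not taken from any measurement) and then maps R to an estimated MOS with the cubic conversion given in ITU-T G.107; the function names are assumptions.

```python
def r_factor(r0, i_s, i_d, i_e, a):
    """R = R0 - Is - Id - Ie + A, as on the slide."""
    return r0 - i_s - i_d - i_e + a

def r_to_mos(r):
    """Map an R factor to an estimated MOS (conversion curve from ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

# Hypothetical impairment values, for illustration only.
r = r_factor(r0=94.0, i_s=1.5, i_d=10.0, i_e=11.0, a=0.0)
print(r, round(r_to_mos(r), 2))
```

Because the terms simply add, the effect of one change (a different codec through $I_e$, more delay through $I_d$) can be estimated without rerunning a listening test.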
VQMon
PSQM and PESQ are intrusive techniques
PSQM and PESQ require on-line DSP processing
Given the speech encoder, shouldn't there be a connection between network parameters (e.g. packet loss, jitter) and speech quality?
A nonintrusive technique has been developed, based on the E-model
Invented by A. D. Clark (Telchemy); accepted by ETSI TIPHON