Y(J) Stein VoP4 1 VOPVOP YJS Other Features. Y(J) Stein VoP4 2 VOPVOP YJS Echo Cancellation.


Presentation on theme: "Y(J) Stein VoP4 1 VOPVOP YJS Other Features. Y(J) Stein VoP4 2 VOPVOP YJS Echo Cancellation."— Presentation transcript:

1 Y(J) Stein VoP4 1 VOPVOP YJS Other Features

2 Echo Cancellation

3 Acoustic Echo

4 Line echo
[Diagram labels: Telephone 1, Telephone 2, hybrid]

5 Subjective reaction to echo

Round-trip delay (ms)   Required suppression (dB)
  0                      1.4
 20                     11.1
 40                     17.7
 60                     22.7
 80                     27.2
100                     30.9

7 Subjective effect of 15 dB echo return loss

Round-trip delay (ms)   Decrease in MOS   Percent difficulty
   0                    0                  0
 300                    1.3               30
 600                    2.0               60
1200                    2.0               60

8 Echo suppressor
[Diagram labels: comp, switch, inv, 4w]
In practice more is needed: VOX, over-ride, reset, etc.

9 Why not echo suppression?
Echo suppression makes the conversation half duplex
–Waste of full-duplex infrastructure
–Conversation unnatural
–Hard to break in
–Dead sounding line
It would be better to cancel the echo: subtract the echo signal, allowing the desired signal through
but that requires DSP.
[Diagram labels: near end, far end, subtraction (-)]

10 Echo cancellation?
Unfortunately, it’s not so easy
The outgoing signal is delayed, attenuated, and distorted
Two echo canceller architectures:
–MODEM TYPE
–LINE ECHO CANCELLER (LEC)
[Two diagrams, each with labels: near end, far end, echo path, clean, subtraction (-)]

11 LEC architecture
[Block diagram labels: A/D, hybrid, D/A, near end, far end, doubletalk detector, adapt, filter H, subtraction (-), NLP, signals X and Y]

12 Adaptive Algorithms
How do we find the echo cancelling filter, and keep it correct even if the echo path parameters change?
We need an algorithm that continually changes the filter parameters
All adaptive algorithms are based on the same idea (lack of correlation between desired signal and interference)
Let’s start with a simpler case: adaptive noise cancellation

13 Noise cancellation
[Diagram labels: x, n, h, y, e, subtraction (-)]

14 Noise cancellation - cont.
Assume that the noise is distorted only by an unknown gain $h$
We correct by transmitting $e\,n$, so that the audience hears
$y = x + h\,n - e\,n = x + (h - e)\,n$
The energy of this signal is
$E_y = \langle y^2 \rangle = \langle x^2 \rangle + (h - e)^2 \langle n^2 \rangle + 2\,(h - e)\,\langle x\,n \rangle$
Assume that $C_{xn} = \langle x\,n \rangle = 0$
We need only set $e$ to minimize $E_y$! (turn the knob until minimal)
Even if the distortion is a complete filter $h$, we set the ANC filter $e$ to minimize $E_y$
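The "turn the knob until minimal" idea can be tried numerically. The following is a minimal sketch, not taken from the slides: it assumes Gaussian test signals, an illustrative gain h = 0.7, and a hypothetical helper `best_gain` that simply scans candidate knob positions. Because x and n are uncorrelated, the energy minimum should land close to e = h.

```python
import random

def output_energy(x, n, h, e):
    """Mean energy of y = x + (h - e) * n, the signal the audience hears."""
    return sum((xi + (h - e) * ni) ** 2 for xi, ni in zip(x, n)) / len(x)

def best_gain(x, n, h, candidates):
    """Return the knob position e that minimizes the output energy (hypothetical helper)."""
    return min(candidates, key=lambda e: output_energy(x, n, h, e))

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]  # desired signal (assumed Gaussian)
n = [random.gauss(0.0, 1.0) for _ in range(5000)]  # noise, generated independently of x
h = 0.7                                            # unknown gain, illustrative value only
knob = [k / 100 for k in range(0, 151)]            # knob positions 0.00 .. 1.50
e_best = best_gain(x, n, h, knob)                  # expected to land near h
```

With finite data the sample cross term between x and n is small but not exactly zero, so `e_best` only approximates h. The approximation improves as more samples are used.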

15 The LMS algorithm
Gradient descent on the energy
The correction to $H$ is proportional to the error $\epsilon$ times the input $X$
$H \leftarrow H + \lambda\,\epsilon\,X$
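As a rough illustration of the LMS update, the sketch below adapts a short FIR filter H so that its output tracks an echo. The echo path coefficients, the step size `mu`, the filter length and the function name `lms_identify` are all assumptions made for this example. The test signal is noise-free, which a real line is not.

```python
import random

def lms_identify(x, d, taps, mu):
    """Adapt FIR weights H so that H applied to x tracks d. Returns (H, errors)."""
    H = [0.0] * taps
    buf = [0.0] * taps                       # most recent inputs, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(hk * xk for hk, xk in zip(H, buf))          # echo estimate
        err = dn - y                                        # residual after subtraction
        H = [hk + mu * err * xk for hk, xk in zip(H, buf)]  # H <- H + mu * error * X
        errors.append(err)
    return H, errors

# Illustrative echo path (made-up coefficients) and a bounded random far-end signal.
random.seed(1)
echo_path = [0.0, 0.5, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(3000)]
d = [sum(echo_path[k] * x[i - k] for k in range(len(echo_path)) if i - k >= 0)
     for i in range(len(x))]
H, errors = lms_identify(x, d, taps=4, mu=0.1)
```

In this idealized setting H converges to `echo_path` and the residual error decays toward zero. With near-end speech or noise present, the step size trades convergence speed against misadjustment.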

16 Nonlinear processing
Because of finite numeric precision, the (linear) LEC filtering cannot completely remove the echo
A standard LEC adds center clipping to remove the residual echo
The clipping threshold needs to be properly set by adaptation
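Center clipping itself is simple to sketch. The version below assumes a fixed threshold passed in by the caller. The adaptation of the threshold that the slide calls for is not shown, and the function name `center_clip` is made up for this example.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below threshold; pass the rest unchanged."""
    return [0.0 if abs(s) < threshold else s for s in samples]

# Small residual-echo samples are removed, larger near-end samples pass through.
residual = [0.01, -0.02, 0.5, -0.7]
cleaned = center_clip(residual, threshold=0.05)
```

A threshold set too high would also clip quiet near-end speech, which is one reason the threshold has to follow the actual residual echo level.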

17 Doubletalk detection
Adaptation of H should take place only when the far end speaks
So we freeze adaptation when there is no far-end signal or during double-talk, that is, whenever the near end speaks
The Geigel algorithm compares the absolute value of the near-end speech to half the maximum absolute value in the X buffer
If the near end exceeds the far end, we can assume that only the near end is speaking
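The Geigel comparison described above fits in a few lines. This is a sketch under stated assumptions: the far-end buffer is a plain list of recent X samples, the function name is hypothetical, and the hangover time that practical detectors add after a detection is omitted.

```python
def geigel_near_end_active(near_sample, far_buffer, ratio=0.5):
    """True when |near| exceeds ratio * max|far| over the recent far-end (X) buffer."""
    far_peak = max((abs(v) for v in far_buffer), default=0.0)
    return abs(near_sample) > ratio * far_peak

far = [0.1, -0.8, 0.3]                        # recent far-end samples (illustrative)
quiet = geigel_near_end_active(0.2, far)      # 0.2 is not above 0.5 * 0.8
loud = geigel_near_end_active(0.6, far)       # 0.6 is above 0.5 * 0.8
adapt_allowed = not loud                      # freeze adaptation of H when triggered
```

The factor of one half reflects the assumption that the hybrid attenuates the echo by at least 6 dB. A line with less return loss would need a different `ratio`.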

18 Data Relays

19 The need for relays
Voice is a relatively forgiving signal (or rather, the ear is)
Compression techniques are designed to pass voice but may hopelessly distort other signals
Even simple tones (or DTMF) may not be passed by coders
We could go back to 64 Kbps G.711 for non-voice signals
But isn’t that silly? Using 64 Kbps for 64 bps or even 9.6 Kbps data?
The solution is to use a relay

20 Open Channel
Reasons to use 64 Kbps G.711 (open channel) (32 Kbps ADPCM may work as well)
–Inexpensive
–Simple design
–Robust
Even open channel is not trivial!
–Need a dynamic BW mechanism
–Need to detect the event (fax/modem tone, DTMF, MF, CPT, etc.)
–Need to return to compressed voice (end of session, time-out)
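The slides do not say how the triggering event is detected. One common choice for single-tone detection is the Goertzel algorithm, sketched here as an assumption rather than as the author's method. The example checks a block of 8 kHz samples for the 2100 Hz fax/modem answer tone; the block length of 400 samples is chosen only so that both test frequencies fall on exact DFT bins.

```python
import math

def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Squared magnitude of the DFT bin nearest freq_hz, via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate_hz)          # nearest bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

rate = 8000
tone = [math.sin(2.0 * math.pi * 2100 * i / rate) for i in range(400)]
p_on = goertzel_power(tone, 2100, rate)    # energy at the tone frequency
p_off = goertzel_power(tone, 1100, rate)   # energy at an unrelated frequency
```

A practical detector would compare `p_on` against the total block energy and require the tone to persist over several blocks, which helps with the low false alarm rate that relays need.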

21 Tone / Fax / Modem Relay
[Diagram labels: Fax, Analog, A/D, D/A, 64 Kbps, Demodulate/Remodulate, PSN, Demodulate/Remodulate, 64 Kbps, A/D, D/A, Analog]
Problems:
–need highly accurate detectors
–need low false alarm rate
–need appropriate protocol
–need accurate timing
–need expensive DSP processing
–delay may be too large
–may need “spoofing”
–can the two sides operate with different parameters?

22 VoP DSP Architecture
[Block diagram labels: Multi Channel Codec, Speech Coders, Tone Detector, Packet Voice Protocol, Playout Unit, Control, Real Time Operating System, VAD, CNG, DISC., PCM Interface, Tone Generator, Serial Port, Voice Packet Module, LEC, Relays, PSN]

23 DSP VoP System Implementation
[Block diagram labels: Telephony Signaling Module, Microprocessor, Voice Packet Module, Microprocessor, Voice, Signaling, Packet Protocol Module, Network Management Module, NM info, Voice & Signaling Packets, ATM / FR / IP Network, PSTN]

24 Quality of Service

25 The meaning of QoS
For general purpose data:
Every little bit counts
–only lossless compression
–best effort delivery
Real-time not essential
–dynamic routing and packet reordering allowed
For speech:
Only subjective quality counts
–Can use lossy compression
–Can drop segments with little effect
Real-time essential
–predetermined route preferable (traffic engineering)

26 PSTN QoS
Virtually all calls (>95%) completed
Once connected, virtually no disconnects or faults
Toll quality voice
Low delay (except satellite calls)
Full switching, optimized routing
Call Management
Fax/Modem functions
Wireline and wireless services

27 Paying for QoS
Law of Photonics
–Price of transmitting a bit drops by half every 9 months
Free Internet telephony
–Several firms offer free long distance service over the Internet
–Strong compression, significant delay and jitter
We no longer need to pay for service … but we are willing to pay for quality of service

28 Paying for QoS
[Figure labels: wire service, mobile service, toll]

29 Speech Quality Measurement

30 Why does it sound the way it sounds?
PSTN
–BW = 0.2-3.8 kHz, SNR > 30 dB
–PCM, ADPCM (BER $10^{-3}$)
–five nines reliability
–line echo cancellation
Voice over packet network
–speech compression
–delay, delay variation, jitter
–packet loss/corruption/priority
–echo cancellation

31 Subjective Voice Quality
Old Measures
–5/9
–DRT
–DAM
The modern scale
–MOS
–DMOS
[Word list shown on slide: meet, neat, seat, feet, Pete, beat, heat]

32 MOS according to ITU P.800
Subjective Determination of Transmission Quality
Annex B: Absolute Category Rating (ACR)

Score   Listening Quality   Listening Effort
5       excellent           relaxed
4       good                attention needed
3       fair                moderate effort
2       poor                considerable effort
1       bad                 no meaning with feasible effort

33 MOS according to ITU (cont)
Annex D: Degradation Category Rating (DCR)
Annex E: Comparison Category Rating (CCR)
ACR is not good at high quality speech

Score   DCR                 CCR
 5      inaudible
 4      not annoying
 3      slightly annoying   much better
 2      annoying            better
 1      very annoying       slightly better
 0                          the same
-1                          slightly worse
-2                          worse
-3                          much worse

34 Some MOS numbers
Effect of Speech Compression (from ITU-T Study Group 15):

Coder                                        MOS
Quiet room, 48 kHz 16 bit linear sampling    5.0
PCM (A-law/μ-law) 64 Kb/s                    4.1
G.723.1 @ 6.3 Kb/s                           3.9
G.729 @ 8 Kb/s                               3.9
ADPCM G.726 32 Kb/s                          3.8   (toll quality)
GSM @ 13 Kb/s                                3.6
VSELP IS54 @ 8 Kb/s                          3.4

35 The Problem(s) with MOS
Accurate MOS tests are the only reliable benchmark
BUT
–MOS tests are off-line
–MOS tests are slow
–MOS tests are expensive
–Different labs give consistently different results
–Most MOS tests only check one aspect of the system

36 The Problem(s) with SNR
Naive question: Isn’t CCR the same as SNR?
SNR does not correlate well with subjective criteria
Squared difference is not an accurate comparator
–Gain
–Delay
–Phase
–Nonlinear processing

37 Speech distance measures
Many objective measures have been proposed:
–Segmental SNR
–Itakura-Saito distance
–Euclidean distance in cepstrum space
–Bark spectral distortion
–Coherence function
None correlate well with MOS
ITU target: find a quality measure that does correlate well
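To make the first measure in the list concrete, here is a minimal sketch of segmental SNR: the SNR is computed per frame and then averaged. The 160-sample frame (20 ms at 8 kHz) and the clamping limits of -10 dB and 35 dB are common conventions assumed for this example, not values from the slides. The test signals also show why a squared-difference measure is a poor comparator: a pure delay that a listener would hardly notice produces a negative SNR.

```python
import math

def segmental_snr_db(reference, degraded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Average of per-frame SNRs in dB, each clamped to [floor_db, ceil_db]."""
    snrs = []
    usable = min(len(reference), len(degraded))
    for start in range(0, usable - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, deg))
        if signal == 0.0:
            continue                                  # skip silent frames
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        snrs.append(max(floor_db, min(ceil_db, snr)))
    return sum(snrs) / len(snrs) if snrs else float("nan")

rate = 8000
ref = [math.sin(2.0 * math.pi * 400 * i / rate) for i in range(1600)]
scaled = [0.9 * r for r in ref]                       # small gain change
delayed = [math.sin(2.0 * math.pi * 400 * (i - 10) / rate) for i in range(1600)]

snr_same = segmental_snr_db(ref, ref)
snr_gain = segmental_snr_db(ref, scaled)
snr_delay = segmental_snr_db(ref, delayed)            # 10 samples = half a period
```

Here the 10-sample delay inverts the 400 Hz tone, so the difference signal carries four times the energy of the reference, even though the delayed tone sounds identical.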

38 Return to Biology
The standard speech model (LPC) (used by most speech processing/compression/recognition systems) is a model of speech production
Unfortunately, the speech production and perception systems are not matched
Speech quality measurement idea: use a model of the human auditory system (perception)
–ITU-T P.861 Perceptual Speech Quality Measurement (PSQM)
–ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ)
–ITU-R BS.1387 Objective Measurements of Perceived Audio Quality

39 Some objective methods
–Perceptual Speech Quality Measurement (PSQM), ITU-T P.861
–Perceptual Analysis Measurement System (PAMS), BT proprietary technique
–Perceptual Evaluation of Speech Quality (PESQ), ITU-T P.862
–Objective Measurement of Perceived Audio Quality (PEAQ), ITU-R BS.1387
–E-model, ITU-T G.107, G.108, ETSI ETR-250

40 Objective Quality Strategy
[Diagram labels: speech, channel, QM, QM to MOS, MOS estimate]

41 PSQM philosophy (from P.861)
[Block diagram labels: Perceptual model (two blocks), Internal Representation (two blocks), Audible Difference, Cognitive Model]

42 PSQM philosophy (cont)
Perceptual Modelling (Internal representation)
–Short time Fourier transform
–Frequency warping (telephone-band filtering, Hoth noise)
–Intensity warping
Cognitive Modelling
–Loudness scaling
–Internal cognitive noise
–Asymmetry
–Silent interval processing
PSQM Values
–0 (no degradation) to 6.5 (maximum degradation)
Conversion to MOS
–PSQM to MOS calibration using known references
–Equivalent Q values

43 Problems with PSQM
Designed for telephony grade speech codecs
Doesn’t take network effects into account:
–filtering
–variable time delay
–localized distortions
Draft standard P.862 adds:
–transfer function equalization
–time alignment, delay skipping
–distortion averaging

44 PESQ philosophy (from P.862)
[Block diagram labels: Time Alignment, Perceptual model (two blocks), Internal Representation (two blocks), Audible Difference, Cognitive Model]

45 E-model
R factor: mouth to ear transmission quality model
$R = R_0 - I_s - I_d - I_e + A$
where
$R_0$: effect of SNR
$I_s$: effect of simultaneous impairments
$I_d$: effect of delayed impairments
$I_e$: effect of equipment distortion
$A$: advantage of method (e.g. mobility of cellphone)
Defined in ITU-T G.107, G.108 and ETSI ETR-250
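The R factor sum is easy to evaluate once the individual terms are known. The sketch below computes R from the formula on the slide and converts it to an estimated MOS. The conversion polynomial is not on the slide: it is the R-to-MOS mapping given in ITU-T G.107, and the input values in the example are purely illustrative.

```python
def r_factor(r0, i_s, i_d, i_e, a=0.0):
    """R = R0 - Is - Id - Ie + A, as on the slide."""
    return r0 - i_s - i_d - i_e + a

def r_to_mos(r):
    """Estimated MOS from R, using the mapping given in ITU-T G.107 (not on the slide)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6

# Illustrative impairment values only; real values come from the G.107 calculations.
r = r_factor(r0=94.0, i_s=1.0, i_d=10.0, i_e=11.0, a=0.0)
mos_estimate = r_to_mos(r)
```

Because the impairments are simply subtracted, each one can be estimated from its own network or codec parameters, so no speech signal needs to be analyzed.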

46 VQMon
PSQM and PESQ are intrusive techniques
PSQM and PESQ require on-line DSP processing
Given the speech encoder, shouldn’t there be a connection between network parameters (e.g. packet loss, jitter) and speech quality?
A nonintrusive technique has been developed based on the E-model
Invented by AD Clark (Telchemy), accepted by ETSI TIPHON

