QoS Measurement and Management for VoIP. Wenyu Jiang, IRT Lab, March 5, 2003.

Introduction to VoIP & IP Telephony Transport of voice packets over IP networks Cost savings – Consolidates voice and data networks – Avoids leased lines, long-distance toll calls Smart and new services – Call management (filtering, TOD forwarding): CPL – Better than PSTN quality: wide-band codecs Protocols and Standards – Signaling: SIP (IETF), H.323 (ITU-T) – Transport: RTP/RTCP (IETF)

Practical Issues in VoIP Quality of Service (QoS) – Internet is a best-effort network: loss, delay and jitter – Users expect at least PSTN quality for VoIP! Ease of deployment – Requires seamless integration with legacy networks (PSTN/PBX) – Security is a must High bar for service availability – Can your network achieve 99.999% uptime?

Outline QoS measurement – Objective vs. subjective metrics – Automated measurement of subjective quality QoS management: improving your quality – End-to-End: FEC, LBR, PLC – Network provisioning: voice traffic aggregation Reality check – Performance of end-points (IP phones, …) – Deployment issues in VoIP – Evaluation of VoIP service availability through Internet measurement

Workings of a VoIP Client Audio is encoded, packetized and transmitted Forward error correction (FEC) may be used to recover lost packets Playout control smooths out jitter to minimize late losses; coupled with FEC Packet loss concealment (PLC) – Last line of “defense” after FEC and playout

LBR: An Alternative to FEC An (n,k) block FEC code can recover up to n-k losses per block Low Bit-rate Redundancy (LBR) – Transmit a lower bit-rate version of original audio – No notion of “blocks” – Not bit-exact recovery
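The recovery limit of an (n,k) block code can be turned into a rough residual-loss estimate. The following is a minimal sketch, assuming a systematic code that repairs any pattern of up to n-k losses per block and independent (Bernoulli) packet loss with probability p. The function name `residual_loss` is illustrative and not from the talk; bursty loss, discussed on the following slides, makes FEC less efficient than this estimate suggests.

```python
from math import comb


def residual_loss(n: int, k: int, p: float) -> float:
    """Loss probability of a data packet after (n,k) FEC, under Bernoulli loss.

    Assumption: a data packet stays lost only if it is lost itself AND at
    least n-k of the other n-1 packets in its block are also lost (so the
    block has more than n-k losses in total and cannot be repaired).
    """
    if not (0 < k <= n):
        raise ValueError("need 0 < k <= n")
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must be a probability")
    # P(at least n-k of the other n-1 packets are lost)
    others = sum(
        comb(n - 1, j) * p**j * (1 - p) ** (n - 1 - j)
        for j in range(n - k, n)
    )
    return p * others


if __name__ == "__main__":
    for n, k in [(2, 2), (3, 2), (5, 3)]:
        print(f"({n},{k}) code, p=5%: residual loss = {residual_loss(n, k, 0.05):.5f}")
```

With no redundancy (n = k) the estimate reduces to p itself, which is a quick sanity check on the formula.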

Objective QoS Metrics: Loss Internet packet loss is often bursty – May degrade voice quality more than random (Bernoulli) loss Characterization of packet loss – 2-state Markov (Gilbert) model: conditional loss prob. – More detailed models, but more states! Extended Gilbert model, nth-order Markov model, hidden Markov model, Gilbert-Elliott model, inter-loss distance – More states ⇒ larger test set, loss of the big picture – Adaptive applications can trade off model accuracy for fast feedback – Gilbert model provides an acceptable compromise
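The 2-state Gilbert model can be sketched in a few lines. This is an illustrative simulation, not the author's tool: it assumes the model is parameterized by the unconditional loss probability `p_u` and the conditional loss probability `p_c` = P(loss | previous packet lost), and all function names are hypothetical. The received-to-lost transition probability follows from the stationary distribution of the two-state chain.

```python
import random


def gilbert_p01(p_u: float, p_c: float) -> float:
    """P(loss | previous packet received), derived from p_u and p_c.

    Stationary loss rate of the chain: p_u = p01 / (p01 + (1 - p_c)).
    """
    if not (0.0 <= p_u < 1.0 and 0.0 <= p_c < 1.0):
        raise ValueError("p_u and p_c must be in [0, 1)")
    p01 = p_u * (1.0 - p_c) / (1.0 - p_u)
    if p01 > 1.0:
        raise ValueError("p_u and p_c are not jointly feasible")
    return p01


def simulate_gilbert(p_u: float, p_c: float, n: int, seed: int = 0) -> list:
    """Return a list of n booleans, True meaning the packet was lost."""
    rng = random.Random(seed)
    p01 = gilbert_p01(p_u, p_c)
    lost = False  # assumption: start in the "received" state
    trace = []
    for _ in range(n):
        lost = rng.random() < (p_c if lost else p01)
        trace.append(lost)
    return trace


def burst_lengths(trace) -> list:
    """Lengths of consecutive-loss runs in a loss trace."""
    bursts, run = [], 0
    for lost in trace:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return bursts


if __name__ == "__main__":
    trace = simulate_gilbert(p_u=0.1, p_c=0.5, n=100_000)
    bursts = burst_lengths(trace)
    print("loss rate:", sum(trace) / len(trace))
    print("mean burst length:", sum(bursts) / len(bursts))
```

In this model burst lengths are geometrically distributed with mean 1/(1 - p_c), so setting p_c = p_u recovers Bernoulli loss.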

Effect of Gilbert Loss Model Loss burst distribution of a packet trace – Roughly, though not exactly, exponential Loss burstiness on FEC performance – FEC less efficient under bursty loss [Figure: number of occurrences (log scale) vs. loss burst length; packet trace compared with Gilbert model]

Objective QoS Metrics: Delay Complementary Conditional CDF (C³DF) – More descriptive than auto-correlation function (ACF) – Delay correlation rises rapidly beyond a threshold – Approximates conditional late loss probability

Subjective QoS Metrics Perceived quality – Mean Opinion Score (MOS), ITU-T P.800/P.830, obtained via listening tests – MOS variations: DMOS (Degradation), CMOS (Comparison), MOSc (Conversational, considers delay), A/B preference Pros: more meaningful to end users Cons: time consuming, labor intensive

MOS grade   Score
Excellent   5
Good        4
Fair        3
Poor        2
Bad         1

Effect of Loss Model on Perceived Quality Codec: G.729 (8 kb/s ITU std) Random (Bernoulli) vs. bursty (Gilbert) loss – Bursty loss ⇒ lower MOS – True even when FEC or LBR is used

Going Further: Bridging Objective and Subjective Metrics The E-model (ITU-T G.107/108) – Originally for telephone network planning – Considers various impairments – Reduces to delay and loss impairment when adapted for VoIP Objective quality estimation algorithms – Suitable when network statistics are not available, e.g., phone-to-phone service with IP in between – Speech recognition performance may be used as a quality predictor, by comparing the recognized text with the original text

The E-model Map from loss and delay to impairment scores (Ie, Id) Compute a gross score (R value) and map to MOSc Limited number of codec loss impairment mappings
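The last step of the E-model, mapping an R value to a MOS estimate, can be sketched as follows. The cubic R-to-MOS conversion is the one given in ITU-T G.107. The reduced form R = R0 - Id - Ie with R0 = 93.2 is a commonly used VoIP simplification and is an assumption here, not something stated on the slide. The impairment values Id and Ie are taken as inputs, because the codec-specific loss mappings are not reproduced in this transcript.

```python
def r_value(i_d: float, i_e: float, r0: float = 93.2) -> float:
    """Reduced E-model rating: base value minus delay and loss impairments.

    Assumption: r0 = 93.2 (often-quoted default with all other E-model
    parameters left at their default values).
    """
    return r0 - i_d - i_e


def r_to_mos(r: float) -> float:
    """Convert an E-model R value to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Hypothetical impairment values, for illustration only.
    for i_d, i_e in [(0.0, 0.0), (5.0, 15.0), (20.0, 30.0)]:
        r = r_value(i_d, i_e)
        print(f"Id={i_d:4.1f} Ie={i_e:4.1f} -> R={r:5.1f} MOS={r_to_mos(r):.2f}")
```

Keeping the two steps separate makes it easy to plug in a different impairment mapping per codec while reusing the same R-to-MOS conversion.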

Using Speech Recognition to Predict MOS Evaluation of automatic speech recognition (ASR) based MOS prediction – IBM ViaVoice, Linux version – Codec used: G.729 – Performance metrics: absolute word recognition ratio, relative word recognition ratio

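Recognition Ratio vs. MOS Both MOS and Rabs (absolute word recognition ratio) decrease w.r.t. loss Then eliminate the intermediate variable p (the loss probability) to relate recognition ratio directly to MOS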

Speaker Dependency Absolute performance is speaker-dependent But relative word recognition ratio is not – Suitable for MOS prediction

Summary of QoS Measurement Loss burstiness: – Affects (generally worsens) perceived quality as well as FEC performance – May be described with, e.g., a Gilbert model Delay correlation: – Increases rapidly beyond a threshold, revealed through Complementary Conditional CDF (C³DF) – Late losses are also bursty Perceived quality (MOS) estimation – Analytical: the E-model – If network statistics are not available: relative word recognition ratio can provide speaker-independent MOS prediction

Outline QoS measurement – Objective vs. subjective metrics – Automated measurement of subjective quality QoS management: improving your quality – End-to-End: FEC, LBR, PLC – Network provisioning: voice traffic aggregation Reality check – Performance of VoIP end-points (IP phones, …) – Deployment issues in VoIP – Evaluation of VoIP service availability through Internet measurement

Quality of FEC vs. LBR FEC is substantially and consistently better – At comparable bandwidth overhead – Across all codec configurations tested [Figure panels: G.729 + G.723.1 LBR; AMR LBR]

Quality of FEC under Bursty Loss Packet interval T has a stronger effect on MOS with FEC than without FEC

FEC MOS Optimization Considering Delay Effect Larger T ⇒ FEC efficiency ↑, but delay ↑ Optimizing T with the E-model – Calculate final loss probability after FEC, apply delay impairment of FEC, map to MOSc Prediction close to FEC MOS test results – Suitable for analytical perceived quality prediction

Trade-off Analysis between Codec Robustness and FEC 3 loss repair options – FEC, LBR, PLC Loss-resilient codec – Better PLC, e.g., iLBC (IETF) – But higher bit-rate – Better than FEC?

Observations and Results When considering delay: – iLBC is usually preferred in low loss conditions – G.729 or G.723.1 + FEC is better for high loss Example: max bandwidth 14 kb/s – Consider delay impairment (use MOSc)

Effect of Max Bandwidth on Achievable Quality From 14 to 21 kb/s: significant improvement in MOSc From 21 to 28 kb/s: marginal change, due to increasing delay impairment by FEC

Provisioning a VoIP Network Silence detection/suppression – Transmit only during On period, saves bandwidth – Allows traffic aggregation through statistical multiplexing Characteristics of On/Off patterns in VoIP – Traditionally found to be exponentially distributed – Modern silence detectors (G.729B VAD, NeVoT SD) produce different patterns

Traffic Aggregation Simulation Token bucket filter with N sources, R: reserved to peak BW ratio CDF model resembles trace model in most cases Exponential (traditional) model – Under-predicts out-of-profile packet probability – Under-prediction ratio ↑ as token buffer size B ↑ Similar results for NeVoT SD
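The aggregation experiment can be approximated with a small discrete-time simulation. This is a sketch under stated assumptions, not the simulator used in the talk: time advances in packet intervals, each of N sources alternates between On and Off periods with geometrically distributed lengths (the discrete analogue of the traditional exponential model), the token rate is R times the aggregate peak rate, and the bucket holds at most B tokens. The mean On/Off durations are illustrative values only.

```python
import random


def out_of_profile_fraction(n_sources: int, ratio: float, bucket: float,
                            mean_on: float = 20.0, mean_off: float = 30.0,
                            ticks: int = 50_000, seed: int = 1) -> float:
    """Fraction of packets that find no token (out-of-profile).

    ratio  -- reserved-to-peak bandwidth ratio R (tokens per source per tick)
    bucket -- token buffer size B, in packets
    mean_on, mean_off -- hypothetical mean talk-spurt/gap lengths, in ticks
    """
    rng = random.Random(seed)
    p_end_on = 1.0 / mean_on    # chance an On period ends at each tick
    p_end_off = 1.0 / mean_off  # chance an Off period ends at each tick
    talking = [False] * n_sources
    tokens = float(bucket)
    sent = out_of_profile = 0
    for _ in range(ticks):
        # Refill once per tick, never exceeding the bucket depth.
        tokens = min(float(bucket), tokens + ratio * n_sources)
        for i in range(n_sources):
            flip = rng.random() < (p_end_on if talking[i] else p_end_off)
            if flip:
                talking[i] = not talking[i]
            if talking[i]:
                sent += 1
                if tokens >= 1.0:
                    tokens -= 1.0
                else:
                    out_of_profile += 1
    return out_of_profile / sent if sent else 0.0


if __name__ == "__main__":
    for r in (0.4, 0.5, 0.6):
        frac = out_of_profile_fraction(n_sources=10, ratio=r, bucket=20)
        print(f"R={r:.1f}: out-of-profile fraction = {frac:.4f}")
```

Replacing the geometric period lengths with samples drawn from a measured On/Off distribution would give the trace-driven or CDF-based variant that the slide compares against.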

Summary of QoS Management End-to-End – FEC is superior in quality to LBR – Codec robustness is better than FEC in low loss conditions – Combining both schemes brings the best of both sides Network provisioning – Observation: new silence detectors (G.729B, NeVoT SD) ⇒ non-exponential voice On/Off patterns – Result: performance of voice traffic aggregation ↓ under new On/Off patterns – Important in traffic engineering and Service Level Agreement (SLA) validation

Outline QoS measurement – Objective vs. subjective metrics – Automated measurement of subjective quality QoS management: improving your quality – End-to-End: FEC, LBR, PLC – Network provisioning: voice traffic aggregation Reality check – Performance of end-points (IP phones, …) – Deployment issues in VoIP – Assessment of VoIP service availability through Internet measurement

Mouth-to-ear Delay of VoIP End-points All receivers can adjust M2E delay adaptively whenever it is too low or too high M2E delay depends mainly on receiver (esp. RAT) HW phones have relatively low delay (~45-90ms)

But Adaptiveness ≠ Perfection Symptom of playout buffer underflow – Waveforms are dropped – Occurred at point of delay adjustment – Bugs in software? Does a LAN guarantee perfect quality?

Major Observations Overall: end-points matter a lot! HW IP phones: 45-90 ms average M2E delay SW clients: – Messenger 2000 lowest (68 ms), XP (96-120 ms); cf. GSM to PSTN: about 110 ms in either direction – NetMeeting very bad (> 400 ms) PLC robustness – Acceptable in all 3 IP phones tested, Cisco phone more robust Silence detection/suppression – Works for speech input – Often fails for non-speech (e.g., music) input: generates many unnatural gaps, not good for customer support center (on-hold music)! Acoustic echo cancellation (AEC): – Good on most IP phones (Echo Return Loss > 40 dB) – But some do not implement AEC at all

Reality Check #2: IP Telephony Deployment Localized deployment at Columbia Univ. [Architecture diagram components: core server running sipd (SIP proxy, redirect server); SQL database; conference server; voicemail server; web server (web-based configuration, server status monitoring); IP phones (RTP/SIP); SIP/PSTN gateway; T1/E1 link; telephone switch/PBX; regular phone]

Issues and Lessons Learned PSTN/PBX integration – Requires full understanding of legacy networks: lower layer (e.g., T1 line configuration), where parameters must match on both PSTN/PBX and gateway; PBX access configurations, to ensure calls go through in both directions; address translation (dial-plan) in both directions – Previous lessons/experiences can help greatly, e.g., second gateway installed in weeks instead of months Security – Issue: SIP/PSTN gateway has no authentication feature – Solution: use gateway’s access control lists to block direct calls; SIP proxy server handles authentication using record-route

Reality Check #3: VoIP Service Availability Focus on availability rather than traditional QoS – Delay is a minor issue; FEC recovers most isolated losses – Ability to make a call is vital, especially in an emergency Internet measurement sites: – 14 nodes worldwide, not just Internet2 and alike Definitions: – Equipment view: Availability = MTBF / (MTBF + MTTR) – Call view: Availability = successful calls / first call attempts Reference points: – Equipment availability: 99.999% (“5 nines”) ⇒ about 5 minutes of downtime per year – AT&T: 99.98% availability (1997) – IP frame relay SLA: 99.9% – UK mobile phone survey: 97.1-98.8%
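Both availability definitions, and the "5 minutes/year" figure, reduce to short arithmetic. The sketch below assumes a 365-day year; the function names are illustrative. The call counts in the example are the ones reported in the first availability results of this talk (62,027 successful calls and 292 failures).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def availability_from_mtbf(mtbf: float, mttr: float) -> float:
    """Equipment view: mean time between failures vs. mean time to repair."""
    return mtbf / (mtbf + mttr)


def availability_from_calls(successful: int, attempts: int) -> float:
    """Call view: successful calls divided by first call attempts."""
    return successful / attempts


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("five nines", 0.99999), ("99.98%", 0.9998), ("99.9%", 0.999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")
    calls = availability_from_calls(62_027, 62_027 + 292)
    print(f"call availability: {100 * calls:.2f}%")
```

Expressing each availability figure as downtime per year makes the gap between the reference points easier to compare than the percentages alone.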

First Look at Availability Call success probability: – 62,027 calls succeeded, 292 failed ⇒ 99.53% availability – Roughly constant across I2, I2+, commercial ISPs: 99.39-99.58% Overall network loss – PSTN: once connected, call is usually of good quality (exception: mobile phones) – Compute % of time below a loss threshold: 5% loss causes degradation for many codecs, others are acceptable up to 20% loss

% of time below loss threshold:
Path set    0%     5%      10%     20%
All         82.3   97.48   99.16   99.75
ISP         78.6   96.72   99.04   99.74
I2          97.7   99.67   99.77   99.79
I2+         86.8   98.41   99.32   99.76
US          83.6   96.95   99.27   99.79
Int.        81.7   97.73   99.11   99.73
US ISP      73.6   95.03   98.92   99.79
Int. ISP    81.2   97.60   99.10   99.71

Network Outages Sustained packet losses – arbitrarily defined as 8 or more consecutive packets – far beyond recoverable (FEC, interpolation) 23% of packet losses are outages Make up a significant part of the 0.25% unavailability Symmetric: an outage on A→B usually implies one on B→A Spatially correlated: an outage on A→B makes one on A→X more likely Not correlated across networks (e.g., I2 and commercial) Mostly short (a few seconds), but some are very long (100’s of seconds) and make up the majority of outage time

Outage-induced Call Abortion Probability Long interruption ⇒ user likely to abandon call From E.855 survey: P[holding] = e^(-t/17.26) (t in seconds) ⇒ half the users will abandon the call after 12 s 2,566 calls have at least one outage 946 of 2,566 expected to be dropped ⇒ 1.53% of all calls

Path set    Calls expected to be dropped
All         1.53%
I2          1.16%
I2+         1.15%
ISP         1.82%
US          0.99%
Int.        1.78%
US ISP      0.86%
Int. ISP    2.30%
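The holding-time model can be checked numerically. A minimal sketch, assuming the exponential form quoted from the E.855 survey with time constant 17.26 s; the function names are illustrative.

```python
import math

TAU_SECONDS = 17.26  # time constant quoted on the slide


def p_holding(t: float, tau: float = TAU_SECONDS) -> float:
    """Probability a user is still holding after an interruption of t seconds."""
    return math.exp(-t / tau)


def p_abandon(t: float, tau: float = TAU_SECONDS) -> float:
    """Probability the user has abandoned the call by t seconds."""
    return 1.0 - p_holding(t, tau)


def median_patience(tau: float = TAU_SECONDS) -> float:
    """Interruption length at which half of the users have abandoned the call."""
    return tau * math.log(2.0)


if __name__ == "__main__":
    print(f"median patience: {median_patience():.1f} s")
    for t in (2, 5, 12, 30, 100):
        print(f"outage of {t:3d} s -> P[abandon] = {p_abandon(t):.2f}")
```

Summing `p_abandon` over the measured outage durations of each affected call is one way to arrive at an expected number of dropped calls like the one on the slide, although the exact procedure used in the talk is not spelled out here.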

Summary of Service Availability Through several metrics, one can translate from network loss to VoIP service availability (no Internet dial-tone) Current results show availability far below five 9’s, but comparable to mobile telephony – Outage statistics are similar in research and ISP networks Working on identifying fault sources and locations Additional measurement sites are welcome

Conclusions Measuring QoS – Loss burstiness and delay correlation affect (generally worsen) perceived quality – Bridging objective and subjective metrics: the E-model, or speech recognition based MOS prediction – Performance of real products: IP phones and soft clients Ensuring/improving QoS – Network provisioning (voice traffic aggregation): efficient, but may be expensive to deploy and manage – End-to-End (FEC > LBR, PLC): easier to deploy, but must control overhead of FEC Reality Check – Good implementation at the end-point (e.g., IP phones) is vital – VoIP deployment requires PSTN integration and security – Service availability is crucial for VoIP, but still far from 99.999% over the Internet

Ongoing and Future Work Sampling Internet performance – Where do the problems reside? Access networks (Cable, DSL), or International paths? – How can we solve these problems? Can adaptive FEC react fast enough to changes in network conditions? Playout delay behaviors of VoIP end-points – How well do they react to jitter, delay spikes?
