Fault and Performance Management for Next Generation IP Communication Alan Clark, Telchemy
Outline Problems affecting VoIP performance Tools for Measuring and Diagnosing Problems Protocols for Reporting QoS Performance Management Architecture What to ask for/ integrate?
Enterprise VoIP Deployment Branch Office IP Phone IP VPN IP Phone Teleworker IP Phones Gateway
VoIP Deployment - Issues IP Phone IP VPN IP Phone IP Phones Gateway ECHO ACCESS LINK CONGESTION LAN CONGESTION, DUPLEX MISMATCH, LONG CABLES…. ROUTE FLAPPING, LINK FAIL CODEC DISTORTION
Call Quality Problems Packet Loss Jitter (Packet Delay Variation) Codecs and PLC Delay (Latency) Echo Signal Level Noise Level
Packet Loss and Jitter Codec IP Network Jitter Buffer Packets lost in network Packets discarded due to jitter Distorted Speech
Routers, Loss and Jitter Arriving packets Output queue Prioritize/ Route Voice packet delayed by one or more data packets Queuing delay Serialization delay Packet loss due to buffer Overflow or RED Input queue Queuing delay Processing delay
Queuing Delays Added delay due to wait for data packets to be sent = Jitter
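As a rough worked example of the queuing effect on this slide: the extra wait a voice packet sees behind one queued data packet is that packet's serialization time. The 1500-byte packet size and the link rates below are illustrative assumptions, not figures from the slides.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time needed to clock one packet onto a link, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000.0


# Hypothetical case: a voice packet queued behind 1500-byte data packets.
for link_bps in (128_000, 1_000_000, 100_000_000):
    wait = serialization_delay_ms(1500, link_bps)
    print(f"{link_bps:>11} bit/s: {wait:.2f} ms per queued data packet")
```

On slow access links each queued data packet adds tens of milliseconds, which is why access link congestion shows up as jitter.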
Jitter Average jitter level (PPDV) = 4.5 ms Peak jitter level = 60 ms
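A minimal sketch of how average and peak jitter figures like these could be derived, assuming PPDV means packet-to-packet delay variation (the absolute change in one-way delay between consecutive packets). The function name and the timestamps are hypothetical.

```python
def ppdv_stats(arrival_ms, send_ms):
    """Return (mean, peak) absolute packet-to-packet delay variation in ms."""
    delays = [a - s for a, s in zip(arrival_ms, send_ms)]
    diffs = [abs(d2 - d1) for d1, d2 in zip(delays, delays[1:])]
    if not diffs:
        return 0.0, 0.0
    return sum(diffs) / len(diffs), max(diffs)


# Hypothetical 20 ms packet stream with one packet held up in a queue.
send = [0, 20, 40, 60, 80]
arrive = [50, 72, 90, 135, 131]
mean_ppdv, peak_ppdv = ppdv_stats(arrive, send)
print(mean_ppdv, peak_ppdv)
```

Because only differences between delays are used, the sender and receiver clocks do not need to be synchronized.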
WiFi can also cause jitter
Effects of Jitter
Low levels of jitter are absorbed by the jitter buffer
High levels of jitter
o lead to packets being discarded
o cause an adaptive jitter buffer to grow - increasing delay but reducing discards
Packets that arrive too late to be played out are dropped by the jitter buffer and counted as “discarded”
Packets that arrive extremely late are counted as “lost”, hence some “lost” packets actually did arrive
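The played/discarded/lost distinction above can be sketched with a fixed jitter buffer. This is a simplified illustration, not how any particular endpoint works: the 60 ms playout delay and the 500 ms "give up" threshold are assumed values, and an adaptive buffer would change the playout delay over time.

```python
def classify(delays_ms, playout_delay_ms=60.0, lost_after_ms=500.0):
    """Count packets as played, discarded (too late) or lost.

    delays_ms holds one network delay per packet; None means the
    packet never arrived.
    """
    counts = {"played": 0, "discarded": 0, "lost": 0}
    for delay in delays_ms:
        if delay is None or delay > lost_after_ms:
            counts["lost"] += 1        # never seen, or far too late to matter
        elif delay > playout_delay_ms:
            counts["discarded"] += 1   # arrived, but after its playout time
        else:
            counts["played"] += 1
    return counts


print(classify([40, 55, 90, None, 700, 60]))
```

The packet with a 700 ms delay did arrive, yet it is counted as lost, which is the case the slide points out.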
Packet Loss Average packet loss rate = 2.1% Peak packet loss = 30%
Packet Loss is bursty
Packet loss (and packet discard) tends to occur in sparse bursts - say 20-30% in density and one second or so in length
Terminology
o Consecutive burst
o Sparse burst
o Burst of Loss vs Loss/Discard
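One way to find sparse bursts in a loss/discard trace is to group loss events that are separated by fewer than Gmin received packets, in the spirit of the burst definition in RFC 3611. The sketch below is a simplification of that definition, and the default of 16 for `gmin` is the value I understand RFC 3611 to recommend; treat both as assumptions.

```python
def find_bursts(lost, gmin=16):
    """Return (start, end, density) for each burst in a loss trace.

    lost is a list of booleans, True = packet lost or discarded.
    Loss events separated by fewer than gmin received packets are
    grouped into one burst; single isolated losses are not reported.
    """
    spans = []
    start = last = None
    for i, is_lost in enumerate(lost):
        if not is_lost:
            continue
        if start is not None and i - last - 1 < gmin:
            last = i                      # still inside the current burst
        else:
            if start is not None:
                spans.append((start, last))
            start = last = i              # begin a new candidate burst
    if start is not None:
        spans.append((start, last))

    bursts = []
    for s, e in spans:
        n_lost = sum(lost[s:e + 1])
        if n_lost > 1:
            bursts.append((s, e, n_lost / (e - s + 1)))
    return bursts


trace = [False] * 100
for index in (10, 14, 19, 22, 60):        # hypothetical loss positions
    trace[index] = True
print(find_bursts(trace))
```

Here packets 10 to 22 form one sparse burst with a density of 4/13 (about 31%), while the loss at packet 60 is isolated.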
Example Packet Loss Distribution 20 percent burst density (sparse burst) Consecutive loss
Loss and Discard
Loss is often associated with periods of high congestion
Jitter is (usually) due to congestion and leads to packet discard
Hence Loss and Discard often coincide
Other factors can apply - e.g. duplex mismatch, link failures etc.
Example Loss/Discard Distribution
Leads To Time Varying Call Quality High jitter/ loss/ discard
Packet Loss Concealment
Mitigates impact of packet loss/ discard by replacing lost speech segments
Very effective for isolated lost packets, less effective for bursty loss/discard
But isn’t loss/discard bursty?
Need to be able to deal with % loss!!!
Estimated by PLC
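To illustrate why concealment copes with isolated losses better than bursts, here is a deliberately simple PLC sketch that repeats the last good frame with increasing attenuation. It is not the PLC of any specific codec; the function name, the 0.5 attenuation factor and the frame representation are assumptions made for the example.

```python
def conceal(frames, attenuation=0.5, frame_len=2):
    """Replace missing frames (None) with a faded copy of the last good frame."""
    out = []
    last = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last, gain = frame, 1.0
            out.append(list(frame))
        elif last is not None:
            gain *= attenuation               # fade further on each repeat
            out.append([sample * gain for sample in last])
        else:
            out.append([0.0] * frame_len)     # nothing received yet: silence
    return out


# Two consecutive missing frames: the second repeat is already much quieter.
print(conceal([[4.0, 8.0], None, None, [2.0, 2.0]]))
```

A single missing frame is replaced by something close to the original, but during a burst the repeated material fades toward silence, so the distortion becomes audible.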
Effectiveness of PLC Codec distortion Impact of loss/ discard and PLC
Call Quality Problems Packet Loss Jitter (Packet Delay Variation) Codecs and PLC Delay (Latency) Echo Signal Level Noise Level
Effect of Delay on Conversational Quality
Causes of Delay CODEC Echo Control RTP IP UDP TCP CODEC Echo Control RTP IP UDP TCP External delay Accumulate and encode Network delay Jitter buffer, decode and playout
Cause of Echo IP Echo Canceller Gateway Line Echo Round trip delay - typically 50 ms+ Additional delay introduced by VoIP makes existing echo problems more obvious Also - “convergence” echo Acoustic Echo
Echo problems
Echo with very low delay sounds like “sidetone”
Echo with some delay makes the line sound hollow
Echo with over 50 ms delay sounds like…. Echo
Echo Return Loss
o 55dB or above is good
o 25dB or below is bad
Call Quality Problems Packet Loss Jitter (Packet Delay Variation) Codecs and PLC Delay (Latency) Echo Signal Level Noise Level
Signal Level Problems
Temporal Clipping occurs with VAD or Echo Suppressors -- gaps in speech, start/end of words missing
Amplitude Clipping occurs -- speech sounds loud and “buzzy”
Figure labels: 0 dBm0, -36 dBm0
Noise
Noise can be due to
o Low signal level
o Equipment/ encoding (e.g. quantization noise)
o External local loops
o Environmental (room) noise
From a service provider perspective - how to distinguish between
o room noise (not my problem)
o Network/equipment/circuit noise (is my problem)
Measuring VoIP performance
Active Test - Measure test calls
o VoIP Specific: VQmon, ITU G.107
o Analog signal based: ITU P.862 (PESQ)
Passive Test - Measure live calls
o VoIP Specific: VQmon, ITU P.VTQ
o Analog signal based: ITU P.563
“Gold Standard” - ACR Test
Speech material
o Phonetically balanced speech samples 8-10 seconds in length
o Test designed to eliminate bias (e.g. presentation order different for each listener)
o Known files included as anchors (e.g. MNRU)
Listening conditions
o Panel of listeners
o Controlled conditions (quiet environment with known level of background noise)
Example ACR test results Extract from an ITU subjective test Mean Opinion Score (MOS) was 2.4 1=Unacceptable 2=Poor 3=Fair 4=Good 5=Excellent
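For reference, a MOS is simply the arithmetic mean of the listeners' category ratings for one test condition. The votes below are made up for illustration and are not taken from the ITU test on the slide.

```python
def mean_opinion_score(votes):
    """Average a list of 1-5 absolute category ratings into a MOS."""
    if not votes:
        raise ValueError("at least one vote is required")
    if any(v < 1 or v > 5 for v in votes):
        raise ValueError("ACR votes must be in the range 1..5")
    return sum(votes) / len(votes)


# Hypothetical panel of eight listeners rating one speech sample.
print(mean_opinion_score([4, 3, 4, 5, 3, 4, 4, 3]))
```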
Packet based approaches VoIP Test System VoIP Test System IP VoIP End System VoIP End System IP Passive Test Passive Test Measure call Test Call Live Call VQmon, G.107. P.VTQ
Packet based approaches
ITU G.107: R = Ro - Is - Ie - Id + A
o Really a network planning tool
o Missing many essential monitoring features
VQmon
o ITU G ETSI TS Annex E +…….
o Proprietary but widely used (Superset of G.107 & P.VTQ)
ITU P.VTQ
o Available late 2005, very limited functionality
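The G.107 formula on this slide can be sketched as below, together with the R-to-MOS mapping that, as far as I know, is the one given in ITU-T G.107 (worth checking against the Recommendation). The input values in the example are illustrative assumptions, not defaults from the standard.

```python
def r_factor(ro, is_, ie, id_, a=0.0):
    """E-model rating: R = Ro - Is - Ie - Id + A."""
    return ro - is_ - ie - id_ + a


def r_to_mos(r):
    """Map an R factor to an estimated MOS (believed to follow G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# Hypothetical call: modest codec/loss impairment and some delay impairment.
r = r_factor(ro=94.0, is_=1.0, ie=10.0, id_=3.0)
print(r, round(r_to_mos(r), 2))
```

The mapping is non-linear: a drop of a few R points costs little near the top of the scale and much more in the middle of it.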
Extended E Model - VQmon Arriving packets Discarded CODEC Jitter buffer Loss/ Discard events Metrics Calculation 4 State Markov Model Gather detailed packet loss info in real time Signal level Noise level Echo level Call Quality Scores Diagnostic Data
Modeling transient effects Time (seconds) Measured Call quality User Reported Call quality Ie(gap) Ie(burst) Ie(VQmon)
VQmon - computational model Burst loss rate Gap loss rate Ie mapping Perceptual model Calculate R-LQ MOS-LQ Calculate Ro, Is Signal level Noise level Calculate Id Echo Delay Calculate R-CQ MOS-CQ Recency model ETSI TS ITU-T G.107
Accuracy: Non-bursty conditions
Accuracy: Bursty conditions
G.107
o Well established model for network planning
o No way to represent jitter
o Few codec models
o Inaccurate for bursty loss
o Conversational Quality only
VQmon
o Extended G.107
o Transient impairment model
o Wide range of codec models
o Narrow & Wideband
o Jitter Buffer Emulator
o Listening and Conversational Quality
Figure: Comparison of VQmon and E Model for severely time varying conditions (series: VQmon, E Model)
Signal based approaches VoIP End System VoIP End System IP VoIP End System VoIP End System IP P.862 Tester Test Call P.563 Tester P.862 is an Active Test Approach P.563 is a Passive Test Approach
ITU P.862 Active testing IP Time align Audio files FFT… Compare PESQ Score Tested segment of connection PESQ
ITU P.862 Active testing
Send speech file
Compare received file with original using FFT
Takes typically MIPS per call
MOS-like score in the range to 4.5
Widely used within the industry
Figure: Results for G.729A codec for a set of speech files (i.e. for each packet loss rate the only thing changed is the speech source file)
ITU P.563 Passive monitoring
Analyses received speech file (single ended)
Produces a MOS score
Correlates well with MOS when averaged over many calls
Requires 100 MIPS per call
Figure: Comparison of P.563 estimated MOS scores with actual ACR test scores. Each point is the per-file average ACR MOS from 16 listeners compared to the P.563 score
Performance Monitoring - Passive Test RTCP XR SIP QoS Report Embedded Monitoring Function
SLA Monitoring - Active Test Active Test Functions Test call
Active or Passive Testing?
Active testing
o works for pre-deployment testing and on-demand troubleshooting
But!!!!
o IP problems are transient
Passive monitoring
o Monitors every call made - but needs a call to monitor
o Captures information on transient problems
o Provides data for post-analysis
Therefore - you need both
VoIP Performance Management Framework Media Path Reporting (RTCP XR) Call Server and CDR database VoIP Endpoint VoIP Gateway SNMP Reporting Network Management System Signaling Based QoS Reporting Embedded Monitoring Network Probe, Analyzer or Router VQ Embedded Monitoring VQ RTP stream (possibly encrypted)
VoIP Performance Management Framework
Embedded monitoring function in IP phones, residential gateways….
o Close to the user
o Least cost + widest coverage
Protocol support developed
o RTCP XR (RFC3611), SIP, MGCP, H.323, Megaco
o Draft SNMP MIB
Works in encrypted environments
Already being deployed by equipment vendors
The role of RTCP XR
RTCP XR (RFC3611)
1. Provides a useful set of metrics for VoIP performance monitoring and diagnosis
2. Supports both real time monitoring and post-analysis
3. Extracts signal level, noise level and echo level from DSP software in the endpoint
4. Exchanges info on endpoint delay and echo to allow remote endpoint to assess echo impact
5. Provides midstream probes/ analyzers access to analog metrics if secure RTP is used
6. Goes through firewalls………
RFC 3611 RTCP XR
Loss Rate | Discard Rate | Burst Density | Gap Density
Burst Duration (ms) | Gap Duration (ms)
Round Trip Delay (ms) | End System Delay (ms)
Signal level | RERL | Noise Level | Gmin
R Factor | Ext R | MOS-LQ | MOS-CQ
Rx Config | - | Jitter Buffer Nominal
Jitter Buffer Max | Jitter Buffer Abs Max
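A sketch of decoding these fields from the wire, using Python's `struct` module. The byte layout, the /256 scaling of the rate and density fields, the MOS x 10 encoding and the value 127 for "unavailable" reflect my reading of the VoIP Metrics Report Block in RFC 3611 section 4.7 and should be verified against the RFC; note that the RFC places noise level before RERL, unlike the slide. The function and field names are hypothetical.

```python
import struct

# 36-byte VoIP Metrics Report Block, network byte order (assumed layout).
VOIP_METRICS_FMT = "!BBHI" "BBBB" "HH" "HH" "bbBB" "BBBB" "BBH" "HH"
VOIP_METRICS_FIELDS = (
    "block_type", "reserved1", "block_length", "ssrc",
    "loss_rate", "discard_rate", "burst_density", "gap_density",
    "burst_duration_ms", "gap_duration_ms",
    "round_trip_delay_ms", "end_system_delay_ms",
    "signal_level_dbm0", "noise_level_dbm0", "rerl_db", "gmin",
    "r_factor", "ext_r_factor", "mos_lq", "mos_cq",
    "rx_config", "reserved2", "jb_nominal_ms",
    "jb_max_ms", "jb_abs_max_ms",
)


def parse_voip_metrics(block: bytes) -> dict:
    """Decode one VoIP Metrics Report Block into a dict of scaled values."""
    size = struct.calcsize(VOIP_METRICS_FMT)
    if len(block) < size:
        raise ValueError("block too short")
    values = struct.unpack(VOIP_METRICS_FMT, block[:size])
    m = dict(zip(VOIP_METRICS_FIELDS, values))
    if m["block_type"] != 7:
        raise ValueError("not a VoIP Metrics Report Block")
    for key in ("loss_rate", "discard_rate", "burst_density", "gap_density"):
        m[key] /= 256.0                      # 8-bit fixed-point fraction
    for key in ("r_factor", "ext_r_factor"):
        if m[key] == 127:
            m[key] = None                    # 127 = unavailable
    for key in ("mos_lq", "mos_cq"):
        m[key] = None if m[key] == 127 else m[key] / 10.0
    return m


# Hypothetical block, built here only to exercise the parser.
sample = struct.pack(VOIP_METRICS_FMT, 7, 0, 8, 0x1234, 5, 3, 64, 1, 900,
                     12000, 150, 60, -20, -60, 55, 16, 80, 127, 40, 38,
                     0, 0, 40, 80, 120)
metrics = parse_voip_metrics(sample)
print(metrics["burst_density"], metrics["mos_lq"], metrics["ext_r_factor"])
```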
SIP Service Quality Reporting Event
PUBLISH SIP/2.0
Via: SIP/2.0/UDP pc22.example.com;branch=z9hG4bK3343d7
………
Content-Type: application/rtcpxr
Content-Length:...
VQSessionReport
LocalMetrics:
TimeStamps=START: STOP:
SessionDesc=PT:0 PD:G.711 SR:8000 FD:20 FPP:2 PLC:3 SSUP:on
………
Signal=SL:2 NL:10 RERL:14
QualityEst=RLQ:90 RCQ:85 EXTR:90 MOSLQ:3.4 MOSCQ:3.3 QoEEstAlg:VQMonv2.1
DialogID: ;to-tag= ;from-tag=9123dh311
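A collector receiving such a report could split each metric line into a name and a set of KEY:value pairs. The sketch below assumes the line syntax is exactly what the example on this slide shows (`Name=KEY:value KEY:value ...`); the function name is hypothetical and the real report grammar may allow more than this handles.

```python
def parse_metric_line(line):
    """Split 'Name=K1:v1 K2:v2' into ('Name', {'K1': 'v1', 'K2': 'v2'})."""
    name, _, rest = line.partition("=")
    fields = {}
    for token in rest.split():
        key, _, value = token.partition(":")
        fields[key] = value
    return name.strip(), fields


name, fields = parse_metric_line(
    "QualityEst=RLQ:90 RCQ:85 EXTR:90 MOSLQ:3.4 MOSCQ:3.3 QoEEstAlg:VQMonv2.1"
)
print(name, float(fields["MOSLQ"]), fields["QoEEstAlg"])
```

Values are kept as strings so that the caller decides which ones to convert to numbers.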
RTCP XR MIB Session table Basic parameters Call quality metrics History table Alerting
Passive Monitoring Framework Branch Office IP Phone IP VPN IP Phone Teleworker VQ IP Phones Gateway NMS VQ RTCP XR SIP QoS Report SNMP
What to Implement/ Ask For
Embedded monitoring functionality in IP Phones and Gateways (e.g. VQmon)
RTCP XR for mid-call data exchange between endpoints
SIP Service Quality Events for reporting end of call quality
RTCP XR MIB for SNMP support
Summary Problems affecting VoIP performance Tools for Measuring and Diagnosing Problems Protocols for Reporting QoS Performance Management Architecture What to ask for/ integrate?