Network reliability and QoS measurements Henning Schulzrinne Columbia University Samsung, Seoul March 2004.

2 Network reliability and QoS measurements Henning Schulzrinne Columbia University Samsung, Seoul March 2004

3 Overview: The IRT Lab at Columbia University; application: Internet multimedia; quality of service = scheduling and admission control → thousands of papers…; network signaling; end-system performance → embedded end systems + PCs; QoS → network application reliability

4 Laboratory overview: 11 PhDs; 3 at IBM, Lucent, Telcordia; 5 MS; visitors (Ericsson, Fujitsu, Mitsubishi, Nokia, U. Coimbra, U. Oulu, …); China, Finland, Greece, India, Japan, Portugal, Spain, Sweden, US, Taiwan

5 IRT topics: Internet multimedia protocols and systems; Internet telephony and radio (SIP, RTSP, RTP); content distribution networks; Internet-scale event distribution; service creation; ubiquitous, context-aware computing and communications; protocols and services for wireless ad-hoc networks; service discovery; quality of service; pricing for adaptive services; scalable resource reservation protocols (CASP, BGRP, YESSIR); end-system evaluation; network measurements; service reliability

6 Internet multimedia: Internet telephony = replacing the existing circuit-switched system with Internet-based systems; signaling and services; quality of service philosophies: (1) end systems adapt and compensate: end systems use FEC, LBR, PLC; jitter → playout delay compensation; (2) network offers guarantees → difficult architecturally, business, not necessarily technically; we pursue both

7 Assessment of VoIP Service Availability Wenyu Jiang Henning Schulzrinne IRT Lab, Dept. of Computer Science Columbia University

8 Overview (on-going work, preliminary results, still looking for measurement sites, …): service availability; measurement setup; measurement results: call success probability, overall network loss, network outages, outage-induced call abortion probability

9 Service availability: users do not care about QoS, at least not about packet loss, jitter, delay; FEC and PLC can deal with losses up to 5-8%; rather, it's service availability → how likely is it that I can place a call and not get interrupted?; availability = MTBF / (MTBF + MTTR), where MTBF = mean time between failures and MTTR = mean time to repair; availability = successful calls / first call attempts; equipment availability: 99.999% (“5 nines”) → 5 minutes/year; AT&T: 99.98% availability (1997); IP frame relay SLA: 99.9%; UK mobile phone survey: 97.1-98.8%
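
The availability formula on this slide can be turned into expected downtime per year. The sketch below is only an illustration of that arithmetic; the function names are made up for this example, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes per year for an availability fraction."""
    return (1.0 - avail) * MINUTES_PER_YEAR


# Availability figures quoted on the slide.
for label, a in [("5 nines", 0.99999),
                 ("AT&T (1997)", 0.9998),
                 ("IP frame relay SLA", 0.999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")
```

For "5 nines" this gives roughly 5.3 minutes per year, which matches the slide's "5 minutes/year" figure.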

10 Availability – PSTN metrics (Worldbank study): fault rate “should be less than 0.2 per main line”; fault clearance (~ MTTR) “next business day”; call completion rate during network busy hour “varies from about 60% - 75%”; dial tone delay

11 Example PSTN statistics Source: Worldbank

12 Measurement setup
Node name       Location                            Connectivity  Network
columbia        Columbia University, NY             >= OC3        I2
wustl           Washington U., St. Louis                          I2
unm             Univ. of New Mexico                               I2
epfl            EPFL, Lausanne, CH                                I2+
hut             Helsinki University of Technology                 I2+
rr              NYC                                 cable modem   ISP
rrqueens        Queens, NY                          cable modem   ISP
njcable         New Jersey                          cable modem   ISP
newport         New Jersey                          ADSL          ISP
sanjose         San Jose, California                cable modem   ISP
suna            Kitakyushu, Japan                   3 Mb/s        ISP
sh              Shanghai, China                     cable modem   ISP
Shanghaihome    Shanghai, China                     cable modem   ISP
Shanghaioffice  Shanghai, China                     ADSL          ISP

13 Measurement setup: active measurements; call duration 3 or 7 minutes; UDP packets: 36 bytes alternating with 72 bytes (FEC); 40 ms spacing; September 10 to December 6, 2002; 13,500 call hours
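
The probe stream parameters on this slide imply a packet count per call and a data rate. The following is a rough sketch of that arithmetic; it treats the 36 and 72 bytes as payload sizes and ignores UDP/IP headers, which is an assumption rather than something the slide states.

```python
PACKET_SPACING_S = 0.040   # 40 ms between packets (from the slide)
PAYLOAD_BYTES = (36, 72)   # alternating packet sizes (from the slide)


def packets_per_call(minutes: float) -> int:
    """Number of probe packets sent during a call of the given length."""
    return round(minutes * 60 / PACKET_SPACING_S)


def payload_rate_kbps() -> float:
    """Average payload rate in kb/s, headers excluded (assumption)."""
    mean_bytes = sum(PAYLOAD_BYTES) / len(PAYLOAD_BYTES)
    return mean_bytes * 8 / PACKET_SPACING_S / 1000


print(packets_per_call(3), packets_per_call(7))  # 3- and 7-minute calls
print(f"{payload_rate_kbps():.1f} kb/s")
```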

14 Call success probability: 62,027 calls succeeded, 292 failed → 99.53% availability; roughly constant across I2, I2+, commercial ISPs
All                       99.53%
Internet2                 99.52%
Internet2+                99.56%
Commercial                99.51%
Domestic (US)             99.45%
International             99.58%
Domestic commercial       99.39%
International commercial  99.59%
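
The overall figure follows from the call counts on this slide, using the earlier definition availability = successful calls / first call attempts. A minimal check (the function name is illustrative):

```python
def call_availability(succeeded: int, failed: int) -> float:
    """Fraction of call attempts that succeeded."""
    return succeeded / (succeeded + failed)


# Counts taken from the slide.
print(f"{call_availability(62_027, 292):.2%}")
```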

15 Overall network loss: PSTN: once connected, call usually of good quality (exception: mobile phones); compute periods of time below loss threshold; 5% causes degradation for many codecs; others acceptable till 20%
loss      0%    5%     10%    20%
All       82.3  97.48  99.16  99.75
ISP       78.6  96.72  99.04  99.74
I2        97.7  99.67  99.77  99.79
I2+       86.8  98.41  99.32  99.76
US        83.6  96.95  99.27  99.79
Int.      81.7  97.73  99.11  99.73
US ISP    73.6  95.03  98.92  99.79
Int. ISP  81.2  97.60  99.10  99.71
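
One way to compute "periods of time below loss threshold" is to split a trace into intervals, compute the loss rate of each, and count the share of intervals at or below each threshold. The sketch below assumes exactly that; the interval length, the "at or below" comparison, and the sample loss rates are all hypothetical and are not taken from the measurements.

```python
def fraction_within(loss_rates: list[float], threshold: float) -> float:
    """Share of intervals whose loss rate does not exceed the threshold."""
    if not loss_rates:
        raise ValueError("need at least one interval")
    ok = sum(1 for rate in loss_rates if rate <= threshold)
    return ok / len(loss_rates)


# Made-up per-interval loss rates, for illustration only.
sample = [0.0, 0.0, 0.02, 0.0, 0.08, 0.0, 0.25, 0.0, 0.0, 0.12]

# Thresholds from the slide: 0%, 5%, 10%, 20%.
for threshold in (0.0, 0.05, 0.10, 0.20):
    print(f"<= {threshold:.0%}: {fraction_within(sample, threshold):.0%}")
```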

16 Network outages: sustained packet losses, arbitrarily defined at 8 packets; far beyond any recoverable loss (FEC, interpolation); 23% outages make up a significant part of the 0.25% unavailability; symmetric (A→B and B→A); spatially correlated (A→B and A→X); not correlated across networks (e.g., I2 and commercial)
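
The outage definition above (a sustained loss of 8 packets) can be applied to a per-packet receive log. This is a minimal sketch, assuming "at 8 packets" means a run of at least 8 consecutive losses; the function name and the trace are hypothetical.

```python
def find_outages(received: list[bool], min_run: int = 8) -> list[tuple[int, int]]:
    """Return (start_index, length) for every run of >= min_run lost packets.

    received[i] is True if packet i arrived, False if it was lost.
    """
    outages = []
    run_start = None
    for i, ok in enumerate(received):
        if not ok:
            if run_start is None:
                run_start = i          # a loss run begins here
        elif run_start is not None:
            if i - run_start >= min_run:
                outages.append((run_start, i - run_start))
            run_start = None           # run ended by a received packet
    # A run that lasts until the end of the trace.
    if run_start is not None and len(received) - run_start >= min_run:
        outages.append((run_start, len(received) - run_start))
    return outages


# Illustrative trace: a 3-packet burst (not an outage) and a 10-packet outage.
trace = [True] * 5 + [False] * 3 + [True] * 2 + [False] * 10 + [True] * 4
print(find_outages(trace))
```

At the 40 ms packet spacing used in the measurements, 8 packets correspond to 320 ms without audio.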

17 Network outages

18
      no. of outages  % symmetric  duration (mean)  duration (median)  total (all, h:m)  outages > 1000 packets
all   10,753          30%          145              25                 17:20             10:58
I2    819             14.5%        360              25                 3:17              2:33
I2+   2,708           10%          259              26                 7:47              5:37
ISP   8,045           37%          107              24                 9:33              4:58
US    1,777           18%          269              20                 5:18              3:53
Int.  8,976           33%          121              26                 12:02             6:42

19 Outage-induced call abortion probability: long interruption → user likely to abandon call; from E.855 survey: P[holding] = e^{-t/17.26} (t in seconds) → half the users will abandon call after 12 s; 2,566 have at least one outage; 946 of 2,566 expected to be dropped → 1.53% of all calls
all       1.53%
I2        1.16%
I2+       1.15%
ISP       1.82%
US        0.99%
Int.      1.78%
US ISP    0.86%
Int. ISP  2.30%
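
The holding model on this slide is a simple exponential, so the "half the users abandon after 12 s" figure can be checked directly. The sketch below also shows one plausible way to turn outage durations into an expected number of dropped calls; how the 946 figure was actually derived is not stated on the slide, so `expected_dropped` and its one-duration-per-call input are assumptions.

```python
import math

TAU_S = 17.26  # time constant from the slide's E.855-based model


def p_holding(t_seconds: float) -> float:
    """Probability that a user is still holding after t seconds of outage."""
    return math.exp(-t_seconds / TAU_S)


def median_abandon_time() -> float:
    """Outage length at which half of the users have abandoned the call."""
    return TAU_S * math.log(2)


def expected_dropped(outage_seconds: list[float]) -> float:
    """Expected abandoned calls, given one outage duration per affected call
    (hypothetical aggregation, not necessarily the authors' method)."""
    return sum(1.0 - p_holding(t) for t in outage_seconds)


print(f"median abandon time: {median_abandon_time():.2f} s")
print(f"P[holding] at 12 s: {p_holding(12):.3f}")
```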

20 Conclusion: availability in space is (mostly) solved → availability in time restricts usability for new applications; initial investigation into service availability for VoIP; need to define metrics for, say, web access; unify packet loss and “no Internet dial tone”; far less than “5 nines”; working on identifying fault sources and locations; looking for additional measurement sites

21 Quality and Performance Evaluation of VoIP End-points Wenyu Jiang Henning Schulzrinne Columbia University

22 Motivations: the quality of VoIP depends on both the network and the end-points; extensive QoS literature on network performance, e.g., IntServ, DiffServ; focus is on limiting network loss & delay; little is known about the behavior of VoIP end-points

23 Performance Metrics for VoIP End-points: mouth-to-ear (M2E) delay (compare network delay); clock skew: whether it causes any voice glitches, amount of clock drift; silence suppression behavior: whether the voice is clipped (depends much on hangover time), robustness to non-speech input, e.g., music; robustness to packet loss: voice quality under packet loss; acoustic echo cancellation; jitter adaptation: delay > max(jitter)?

24 Measurement Approach: capture both original and output audio; use adelay program to measure M2E delay (auto correlation, no clock synchronization needed); assume a LAN environment by default; serves as a baseline reference, or lower bound
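
Because both the original and the output audio are captured on the same recorder, the delay can be read off as the shift that best aligns the two signals, with no clock synchronization between end-points. The sketch below illustrates that idea with a brute-force cross-correlation; it is not the adelay program, whose internals the slide does not describe, and the function names and the synthetic noise signal are assumptions.

```python
import random

SAMPLE_RATE_HZ = 8000  # G.711 sampling rate


def estimate_delay(original: list[float], output: list[float], max_lag: int) -> int:
    """Return the lag (in samples) at which output best matches original."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(original), len(output) - lag)  # overlapping samples
        if n <= 0:
            break
        score = sum(original[i] * output[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_ms(lag_samples: int) -> float:
    """Convert a lag in samples to milliseconds."""
    return lag_samples * 1000 / SAMPLE_RATE_HZ


# Synthetic check: noise stands in for speech, delayed by 80 samples.
rng = random.Random(1)
original = [rng.uniform(-1.0, 1.0) for _ in range(400)]
output = [0.0] * 80 + original
lag = estimate_delay(original, output, max_lag=200)
print(lag, "samples =", lag_to_ms(lag), "ms")
```

A real measurement would use longer recordings and a faster correlation method; the brute-force loop is kept only for readability.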

25 VoIP End-points Tested: hardware end-points: Cisco, 3Com and Pingtel IP phones; Mediatrix 1-line SIP/PSTN gateway. Software clients: Microsoft Messenger, NetMeeting (Win2K, WinXP); Net2Phone (NT, Win2K, Win98); sipc/RAT (Solaris, Ultra-10), with Robust Audio Tool (RAT) from UCL as media client. Operating parameters: in most cases, codec is G.711 μ-law, packet interval is 20 ms

26 IP Phone Hardware: DSP for audio coding, AEC; μC for protocol processing; embedded OS (Linux, Windriver, …) with web browser; Ethernet interface, maybe with hub

27 Example M2E Delay Plot: 3Com to Cisco, shown with gaps > 1 sec; delay adjustments correlate with gaps, even though the 3Com phone has no silence suppression

28 Visual Illustration of M2E Delay Drop, Snapshot #1: 3Com to Cisco, 1-1 case; left/upper channel is original audio; highlighted section shows M2E delay (59 ms)

29 Snapshot #2 M2E delay drops to 49ms, at time of 4:16

30 Snapshot #3 Presence of a gap during the delay change

31 Effect of RTP Marker Bits on Delay Adjustments: Cisco phone sends M-bits, whereas Pingtel phone does not; presence of M-bits results in more adjustments

32 Sender Characteristics: certain senders may introduce delay spikes, despite operating on a LAN

33 Average M2E Delays for IP phones and sipc: averaging the M2E delay allows more compact presentation of end-point behaviors; receiver (especially RAT) plays an important role in M2E delay

34 Average M2E Delays for PC Software Clients: Messenger 2000 wins the day; its delay as receiver (68 ms) is even lower than Messenger XP, on the same hardware; it also results in slightly lower delay as sender; NetMeeting is a lot worse (> 400 ms); Messenger's delay performance is similar to or better than a GSM mobile phone
A             B                    A→B     B→A
MgrXP (pc)    MgrXP (notebook)     109ms   120ms
Mgr2K (pc)                         96.8ms  68.5ms
NM2K (pc)     NM2K (notebook)      401ms   421ms
Mobile (GSM)  PSTN (local number)  115ms   109ms

35 Delay Behaviors for PC Clients, contd.: Net2Phone's delay is also high (~200-500 ms); V1.5 reduces PC->PSTN delay; PC-to-PC calls have fairly high delays
A                      B                             A→B    B→A
N2P v1.1 NT P-2 (pc2)  PSTN (local number)           292ms  372ms
N2P v1.5 NT P-2 (pc2)                                201ms  373ms
N2P v1.5 W2K K7 (pc)                                 196ms  401ms
N2P v1.5 W2K K7 (pc)   N2P v1.5 W98 P-3 (notebook2)  525ms  350ms

36 Effect of Clock Skew: Cisco to 3Com, Experiment 1-1: symptom of playout buffer underflow; waveforms are dropped; occurred at point of delay adjustment; bugs in software?

37 Clock Skew Rates: mostly symmetric between two devices; RAT (Sun Ultra-10) has unusually high drift rates, > 300 ppm (parts per million); high clock skews confirmed in many (but not all) PCs and workstations
Drift rates (in ppm)
           3Com   Cisco  Mediatrix  Pingtel  RAT
3Com       -8.3   55.4   43.3       41.2     -333
Cisco      -55.2  -0.4   -11.8      -12.1    -381
Mediatrix  -43.1  11.7   1.3        -0.8
Pingtel    -40.9  12.7   2.8        -3.5     -380
RAT        343    403               376      12.3
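
To give a feel for what a drift rate in ppm means for audio playout, the sketch below converts ppm into delay drift per minute and into the time it takes to slip by one 20 ms packet. It also shows how a drift rate could be estimated from two delay readings. The function names and the example readings are illustrative assumptions.

```python
def drift_ms_per_minute(ppm: float) -> float:
    """Delay drift accumulated per minute for a clock skew given in ppm."""
    return ppm * 1e-6 * 60 * 1000


def seconds_per_packet_slip(ppm: float, packet_ms: float = 20.0) -> float:
    """Time until sender and receiver clocks differ by one packet interval."""
    return (packet_ms / 1000) / (abs(ppm) * 1e-6)


def estimate_ppm(delay_start_ms: float, delay_end_ms: float, elapsed_s: float) -> float:
    """Drift rate implied by two M2E delay readings taken elapsed_s apart,
    assuming no playout delay adjustment happened in between."""
    return (delay_end_ms - delay_start_ms) / 1000 / elapsed_s * 1e6


print(f"{drift_ms_per_minute(300):.1f} ms/min at 300 ppm")
print(f"one 20 ms packet every {seconds_per_packet_slip(300):.1f} s")
print(f"{estimate_ppm(60.0, 78.0, 60.0):.0f} ppm")
```

At 300 ppm the playout buffer gains or loses one 20 ms packet roughly every 67 seconds, which is why periodic delay adjustments become necessary.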

38 Drift Rates for PC Clients: drift rates not always symmetric! But they appear to be consistent between Messenger 2K/XP and Net2Phone on the same PC; existence of 2 clocking circuits in sound card?
A                   B                 A→B  B→A
MgrXP (pc)          MgrXP (notebook)  172  87.7
Mgr2K (pc)                            165  85.6
NM2K (pc)           NM2K (notebook)   ?    -33?
Net2Phone NT (pc2)  PSTN              290  -287
Net2Phone 2K (pc)                     166  82
Mobile (GSM)                          0    0

39 Packet Loss Concealment: common PLC methods: silence substitution (worst); packet repetition, with optional fading; extrapolation (one-sided); interpolation (two-sided), best quality. Use deterministic bursty loss pattern: 3/100 means 3 consecutive losses out of every 100 packets; easier to locate packet losses; tested 1/100, 3/100, 1/20, 5/100, etc.
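
The deterministic loss pattern and the two simplest concealment methods can be sketched as follows. Where the burst falls inside each period is not specified on the slide, so placing it at the end of the period is an assumption, as are the function names; fading, extrapolation and interpolation are not shown.

```python
def bursty_loss_mask(burst: int, period: int, n_packets: int) -> list[bool]:
    """True marks a lost packet: the last `burst` packets of every `period`."""
    return [(i % period) >= period - burst for i in range(n_packets)]


def conceal(frames: list[list[int]], lost: list[bool], method: str = "repeat") -> list[list[int]]:
    """Replace lost frames by the last good frame ("repeat") or by silence."""
    out = []
    last_good = None
    for frame, is_lost in zip(frames, lost):
        if not is_lost:
            last_good = frame
            out.append(frame)
        elif method == "repeat" and last_good is not None:
            out.append(list(last_good))       # packet repetition, no fading
        else:
            out.append([0] * len(frame))      # silence substitution
    return out


mask = bursty_loss_mask(3, 100, 200)          # the "3/100" pattern
print(sum(mask), "losses in", len(mask), "packets")

frames = [[1], [2], [3], [4]]
print(conceal(frames, [False, True, True, False], "repeat"))
print(conceal(frames, [False, True, True, False], "silence"))
```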

40 PLC Behaviors: loss tolerance (at 20 ms interval), by measuring loss-induced gaps in output audio: 3Com and Pingtel phones: 2 packet losses; Cisco phone: 3 packet losses. Level of audio distortion by packet loss: inaudible at 1/100 for all 3 phones; inaudible at 3/100 and 1/20 for Cisco phone, yet audible to very audible for the other two. Cisco phone is the most robust; probably uses interpolation

41 Effect of PLC on Delay: no definite effect on M2E delay, e.g., sipc to Pingtel

42 Silence Suppression: why? Saves bandwidth; may reduce processing power (e.g., in conferencing mixer); facilitates per-talkspurt delay adjustment. Key parameters: silence detection threshold; hangover time, to delay silence suppression and avoid end clipping of speech; usually 200 ms is long enough [Brady ’68]
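
The two key parameters, detection threshold and hangover time, interact as in the sketch below: a frame is sent if it is above the threshold, or if it falls within the hangover period after the last active frame. This is a generic energy-threshold detector written for illustration, not the algorithm of any tested end-point; the frame levels and parameter values are made up.

```python
def transmit_flags(frame_levels_db: list[float], threshold_db: float,
                   hangover_frames: int) -> list[bool]:
    """Return True for each frame that is transmitted.

    Frames at or above threshold_db are always sent; after speech stops,
    up to hangover_frames further frames are sent before suppression starts.
    """
    flags = []
    quiet_run = hangover_frames   # start in the suppressed state
    for level in frame_levels_db:
        if level >= threshold_db:
            quiet_run = 0
            flags.append(True)
        else:
            quiet_run += 1
            flags.append(quiet_run <= hangover_frames)
    return flags


# With 20 ms frames, a 200 ms hangover would be 10 frames; 2 is used here for brevity.
levels = [-60, -20, -20, -60, -60, -60, -60]
print(transmit_flags(levels, threshold_db=-40, hangover_frames=2))
```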

43 Hangover Time: measured by feeding ON-OFF waveforms and monitoring RTP packets; Cisco phone's is the longest (2.3-2.36 sec), then Messenger (1.06-1.08 sec), then NetMeeting (0.56-0.58 sec); a long hangover time is not necessarily bad, as it reduces voice clipping; indeed, no unnatural gaps are found; it does waste a bit more bandwidth

44 Robustness of Silence Detectors to Music: on-hold music is often used in customer support centers; need to ensure music is played without any interruption due to silence suppression; tested with a 2.5-min long soundtrack; Messenger starts to generate many unwanted gaps at an input level of -24 dB; Cisco phone is more robust, but still fails starting at an input level of -41.4 dB

45 Acoustic Echo Cancellation: important for hands-free/conferencing (business) applications; primary metric: Echo Return Loss (ERL); measured by LAN-sniffing RTP packets; most IP phones support AEC; ERL depends slightly on input level and speaker-phone volume; usually > 40 dB (good AEC performance)
IP phone  3Com   Cisco  ipDialog  Pingtel  Snom-100
ERL (dB)  40-45  53-    49-54     33-42    -5 (no AEC)
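
Echo return loss is the level difference, in dB, between the signal sent to the phone and the echo that comes back from it. The sketch below computes it from two blocks of decoded audio samples; extracting and decoding the samples from sniffed RTP packets is not shown, and the function names and test signals are assumptions.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def erl_db(sent: list[float], echo: list[float]) -> float:
    """Echo return loss in dB: how far the returned echo is below the sent signal."""
    return 20 * math.log10(rms(sent) / rms(echo))


# Illustrative signals: the echo has 1/100 of the sent amplitude.
sent = [1000.0, -1000.0] * 50
echo = [10.0, -10.0] * 50
print(f"ERL = {erl_db(sent, echo):.1f} dB")
```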

46 M2E Delay under Jitter: delay properties under the LAN environment serve as a baseline reference. When operating over the Internet: fixed portion of delay adds to M2E delay as a constant; variable portion (jitter) has a more complex effect. Initial test: used typical cable modem delay traces; tested RAT to Cisco; no audible distortion due to late loss; added delay is normal

47 M2E Delay under Jitter, contd.: Cisco phone generally within expectation; follows network delay changes promptly; takes longer (10-20 sec) to adapt to decreasing delay; does not overshoot playout delay; more end-points to be examined. Figure panels: Artificial Trace; Real Trace with Spikes

48 Conclusions: average M2E delay: low (mostly < 80 ms) for hardware IP phones; software clients: lowest for Messenger 2000 (68.5 ms); application (receiver) most vital in determining delay; poor implementation easily undoes good network QoS. Clock skew high on SW clients (RAT, Net2Phone). Packet loss concealment quality: acceptable in all 3 IP phones tested, with Cisco more robust. Silence detector behavior: long hangover time, works well for speech input, but may falsely predict music as silence. Acoustic echo cancellation: good on most IP phones. Playout delay behavior: good based on initial tests

49 Future Work: further tests with more end-points on how jitter influences M2E delay; measure the sensitivity (threshold) of various silence detectors; investigate the non-symmetric clock drift phenomena; additional experiments as more brands of VoIP end-points become available

