1/30/2008 International SIP 2008 (Paris)
Peer-to-Peer-based Automatic Fault Diagnosis in VoIP
Henning Schulzrinne (Columbia U.), Kai X. Miao (Intel)

Overview
- The transition in IT cost metrics
- End-to-end application-visible reliability is still poor (~99.5%)
  - even though network elements have become much more reliable
  - particular impact on interactive applications (e.g., VoIP)
  - transient problems
- Lots of "voodoo" network management
- Existing network management doesn't work for VoIP and other modern applications
- Need user-centric rather than operator-centric management
- Proposal: peer-to-peer management
  - "Do You See What I See?"
- Using VoIP as the running example, since it is the most complex consumer application
  - but the approach also applies to IPTV and other services
- Also usable for reliability estimation and statistical fault characterization

Circle of blame
The OS vendor, the VSP (VoIP service provider), the app vendor, and the ISP each point the user somewhere else:
- "must be a Windows registry problem → re-install Windows"
- "probably packet loss in your Internet connection → reboot your DSL modem"
- "must be your software → upgrade"
- "probably a gateway fault → choose us as provider"

Diagnostic undecidability
- Symptom: "cannot reach server"
- More precisely: a packet is sent, but no response arrives
- Possible causes:
  - NAT problem (return packet dropped)?
  - firewall problem?
  - path to server broken?
  - outdated server information (server moved)?
  - server dead?
- 5 causes → very different remedies
  - no good way for a non-technical user to tell them apart
- Whom do you call?

Traditional network management model
- SNMP: "management from the center"

Old assumptions, now wrong
- Single provider (enterprise, carrier)
  - has access to most path elements
  - professionally managed
- Problems are hard failures, and elements otherwise operate correctly
  - element failures ("link dead")
  - substantial packet loss
- Mostly L2 and L3 elements
  - switches, routers
  - rarely APs
- Problems are specific to a protocol
  - "IP is not working"
- Indirect detection
  - MIB variable vs. actual protocol performance
- End systems don't need management
  - DMI and SNMP never succeeded
  - each application does its own updates

Managing the protocol stack
- media: echo, gain problems, VAD action
- SIP: protocol problem, authorization, asymmetric connectivity (NAT)
- RTP: protocol problem, playout errors
- UDP/TCP: TCP negotiation failure, NAT time-out, firewall policy
- IP: no route, packet loss

Types of failures
- Hard failures
  - connection attempt fails
  - no media connection
  - NAT time-out
- Soft failures (degradation)
  - packet loss (bursts): access network? backbone? remote access?
  - delay (bursts): OS? access networks?
  - acoustic problems (microphone gain, echo)
  - a software bug (poor voice quality): protocol stack? codec? software framework?
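
The sketch below illustrates the hard/soft split by labeling a single observed session. It is a minimal sketch that assumes a simplified observation record (`SessionObservation`) and hypothetical loss and jitter thresholds. None of these names or values come from the talk. A mid-call NAT time-out would also need a time-series view, which this one-shot check does not have.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
LOSS_THRESHOLD = 0.05        # fraction of media packets lost
JITTER_THRESHOLD_MS = 30.0   # interarrival jitter, milliseconds


@dataclass
class SessionObservation:
    connected: bool                        # did signaling (e.g., SIP) succeed?
    media_established: bool                # did any media packets arrive?
    loss_fraction: Optional[float] = None  # None if not measured
    jitter_ms: Optional[float] = None      # None if not measured


def classify_failure(obs: SessionObservation) -> str:
    """Return 'hard', 'soft', or 'ok' for one observed session."""
    # Hard failure: the connection attempt or the media path failed outright.
    if not obs.connected or not obs.media_established:
        return "hard"
    # Soft failure: the session works but is degraded.
    if obs.loss_fraction is not None and obs.loss_fraction > LOSS_THRESHOLD:
        return "soft"
    if obs.jitter_ms is not None and obs.jitter_ms > JITTER_THRESHOLD_MS:
        return "soft"
    return "ok"
```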

Examples of additional problems
- ping and traceroute no longer work reliably
  - WinXP SP2 turns off ICMP
  - some networks filter all ICMP messages
- Early NAT binding time-out
  - the initial packet exchange succeeds, but then the TCP binding is removed ("web-only Internet")
- Policy intent vs. failure
  - "broken by design"
  - "we don't allow port 25" vs. "SMTP server temporarily unreachable"

Fault localization
- Fault classification: local vs. global
  - Does it affect only me, or does it affect others as well?
- Global failures
  - server failure, e.g., SIP proxy, DNS, or database failures
  - network failures
- Local failures
  - specific source failure: node A cannot make calls to anyone
  - specific destination or participant failure: no one can make calls to node B
  - locally observed but global failures: the DNS service failed, but only B observed it
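
The local-versus-global question can be sketched as a comparison between one node's result and its peers' results for the same target. This is a minimal sketch that assumes each peer actively retries the same target and returns a success flag. The function name `localize` is hypothetical. So is the extra "partial" label (some peers affected, e.g., only one domain), which is not part of the original classification.

```python
from typing import Dict


def localize(my_attempt_ok: bool, peer_results: Dict[str, bool]) -> str:
    """Classify a failure using peers' views of the same target.

    peer_results maps a (hypothetical) peer identifier to whether that
    peer succeeded in reaching the same target, e.g., the same SIP proxy.
    """
    if my_attempt_ok:
        return "no-failure"
    if not peer_results:
        return "unknown"      # nobody to compare against
    failures = sum(1 for ok in peer_results.values() if not ok)
    if failures == 0:
        return "local"        # only this node sees the problem
    if failures == len(peer_results):
        return "global"       # every peer that tried sees it too
    return "partial"          # some peers affected (hypothetical extra label)
```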

Proposal: "Do You See What I See?" (DYSWIS)
- Each node has a set of active and passive measurement tools
- Use intercept (NDIS, pcap)
  - to detect problems automatically, e.g., no response to a SIP, HTTP, or DNS request, or deviation from normal protocol exchange behavior
  - to gather performance statistics (packet jitter)
  - to capture RTCP and similar measurement packets
- Nodes can ask others for their view
  - possibly also dedicated "weather stations"
- Iterative process, leading to:
  - user indication of the cause of failure
  - in some cases, a work-around (application-layer routing) → use a TURN server or remote DNS servers
- Nodes collect statistical information on failures and their likely causes
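
Packet jitter, one of the performance statistics listed above, has a standard definition in RFC 3550 (Section 6.4.1). It is the same quantity that RTCP receiver reports carry. The sketch below computes it from (RTP timestamp, arrival time) pairs and assumes both values are already converted to the same units. Capturing the packets through NDIS or pcap is not shown, and the function name is only illustrative.

```python
from typing import Iterable, Optional, Tuple


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """Estimate RTP interarrival jitter as defined in RFC 3550, Section 6.4.1.

    packets: (send_timestamp, arrival_time) pairs, in arrival order,
    with both values expressed in the same units.
    """
    jitter = 0.0
    prev: Optional[Tuple[float, float]] = None
    for send_ts, arrival in packets:
        if prev is not None:
            prev_send, prev_arrival = prev
            # D(i-1, i): change in transit time between consecutive packets.
            d = (arrival - prev_arrival) - (send_ts - prev_send)
            # Running estimate with gain 1/16, as specified in RFC 3550.
            jitter += (abs(d) - jitter) / 16.0
        prev = (send_ts, arrival)
    return jitter
```

A constant network transit time gives zero jitter. A single 16-unit change in transit time moves the estimate by 1 unit.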

Architecture
- Three types of nodes: sensor, probe, and diagnosis
- (Figure: sensor, probe, and diagnosis nodes alongside a SIP proxy, DNS server, SMTP server, firewall, and other elements.)

Diagnosis node architecture
- Sensor node: inspects protocol requests (DNS, HTTP, RTCP, …) and sends a "not working" notification to the diagnosis node
- Diagnosis node: orchestrates tests
  - contacts other nodes and requests diagnostics, e.g., ping, "can buddy reach our resolver?"
  - notifies the admin (IM, SIP events, …), e.g., "DNS failure for 15m"
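
The sensor side of this flow can be sketched as a table of pending requests: it inspects protocol requests and raises "not working" when one goes unanswered. This is a minimal sketch that assumes requests and responses are matched by a protocol name plus a key, such as a DNS query ID. The class name `RequestMonitor` and the 5-second default timeout are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RequestMonitor:
    """Passive sensor sketch: flag protocol requests that never got a response."""

    timeout_s: float = 5.0  # hypothetical timeout
    # (protocol, key) -> time the request was seen
    pending: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def on_request(self, protocol: str, key: str, now: float) -> None:
        self.pending[(protocol, key)] = now

    def on_response(self, protocol: str, key: str) -> None:
        # A matching response clears the request; unknown keys are ignored.
        self.pending.pop((protocol, key), None)

    def expired(self, now: float) -> List[Tuple[str, str]]:
        """Return, and stop tracking, requests whose timeout has elapsed."""
        late = [k for k, t in self.pending.items() if now - t >= self.timeout_s]
        for k in late:
            del self.pending[k]
        return late
```

Each entry returned by `expired()` would become a "not working" notification to a diagnosis node, which would then orchestrate further tests.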

Solution architecture
- Nodes in different domains cooperate to determine the cause of a failure
- (Figure: a call fails at peer P1 in Domain A; peers P1 to P8, spread across Service Provider 1, Service Provider 2, and Domain A, form a P2P network and run a DNS test against the DNS server, a SIP test against the SIP server, and a PESQ test.)

Failure detection tools
- STUN server
  - "what is your IP address?"
- ping and traceroute
- Transport-level liveness and QoS
  - open a TCP connection to a port
  - send a UDP ping to a port
  - measure packet loss and jitter
- Need scriptable tools with a dependency graph
  - using DROOLS for now
- TBD: remote diagnostics
  - a fixed set ("do DNS lookup") or
  - applets (only remote access)
- (Figure: protocol layers media, RTP, UDP/TCP, IP.)
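
The standard socket library is enough for a transport-level liveness check of the "open TCP connection to port" kind. Unlike ping, it does not depend on ICMP passing through. This is an illustrative sketch, not the DYSWIS tool itself, and the function name `tcp_probe` is hypothetical. A UDP "ping" would additionally need a cooperating responder, such as an application that answers a request, which is not shown.

```python
import socket
import time
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Try to open a TCP connection to host:port.

    Returns the connection setup time in seconds, or None if the attempt
    failed (refused, unreachable, or timed out).
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # socket.timeout and ConnectionRefusedError are both OSError subclasses.
        return None
```

A `None` result only says that this node could not connect. Telling a firewall policy apart from a dead server still requires the other perspectives described above.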

Components and operations
- Distributed P2P architecture with an iterative process involving all of these functions:
  - data gathering from multiple perspectives
  - knowledge that already exists or is built over time (learning)
  - tools (with built-in intelligence) for active probing or observation
  - inference, analysis, and decision making
- Peer nodes: detection nodes, diagnosis nodes, and probe nodes
- P2P protocol for fault diagnosis
- Operation rules used to generate tests, built or learned in real time
- Inference based on rules (inference modeling)

Components and operations (continued)
- Fault diagnosis architecture, components, and domain agents
- Dependency graphs: dependency relationships and decision trees
- Passive tests and active tests
  - normal network behavior; monitoring deviant behavior
  - active probes, adaptive probes, diagnostic tests
- Analysis, inference, and diagnosis
  - diagnostic analysis, statistical inference
  - learning and modeling, fault profiles
- Fault types: hard vs. soft

Dependency classification
- Functional dependency
  - at the generic service level, e.g., a SIP proxy depends on a DB service and a DNS service
- Structural dependency
  - fixed at configuration time, e.g., the Columbia CS SIP proxy is configured to use the mysql database on host metro-north
- Operational dependency
  - runtime dependencies or runtime bindings, e.g., the call that failed was using a failover SIP server obtained from DNS, which was running on host a.b.c.d in the IRT lab
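
The three dependency types can be recorded as typed edges, so that a diagnosis step can ask for one kind only, for instance configuration-time dependencies. This is a minimal sketch. The `DepKind`, `Dependency`, and `providers_of` names are hypothetical, and the edges simply restate the slide's examples as plain strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DepKind(Enum):
    FUNCTIONAL = "functional"    # generic service level
    STRUCTURAL = "structural"    # fixed at configuration time
    OPERATIONAL = "operational"  # runtime binding for a particular session


@dataclass(frozen=True)
class Dependency:
    dependent: str
    provider: str
    kind: DepKind


# The slide's examples, restated as typed edges (labels are illustrative).
DEPENDENCIES: List[Dependency] = [
    Dependency("SIP proxy", "DB service", DepKind.FUNCTIONAL),
    Dependency("SIP proxy", "DNS service", DepKind.FUNCTIONAL),
    Dependency("Columbia CS SIP proxy", "mysql on metro-north", DepKind.STRUCTURAL),
    Dependency("failed call", "failover SIP server (from DNS)", DepKind.OPERATIONAL),
]


def providers_of(dependent: str, kind: DepKind) -> List[str]:
    """List what `dependent` relies on, restricted to one dependency kind."""
    return [d.provider for d in DEPENDENCIES
            if d.dependent == dependent and d.kind is kind]
```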

Dependency Graph

Dependency graph encoded as a decision tree
- Components: A = SIP call, B = DNS server, C = SIP proxy, D = connectivity
- A failed: use the decision tree
  - C failed? Yes: invoke the decision tree for C
  - No: B failed? Yes: invoke the decision tree for B
  - the decision tree for D is invoked in the same way
  - otherwise: cause not known; report it and add a new dependency
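
The decision-tree walk can be sketched as a recursive descent. When a component fails, test its dependencies in order and descend into the first one that also fails. This is a minimal sketch under stated assumptions:
- The edge set (the SIP call depends on the SIP proxy and the DNS server, and both depend on connectivity) is a guess, because the slide's figure did not survive.
- `test` stands in for whatever active probe checks each component.
- Returning `None` corresponds to the "cause not known: report, add new dependency" branch.

The talk's own system uses Drools rules rather than code like this.

```python
from typing import Callable, Dict, List, Optional

# Assumed edges (hypothetical): component -> dependencies, in test order.
DEPENDS_ON: Dict[str, List[str]] = {
    "SIP call": ["SIP proxy", "DNS server"],
    "SIP proxy": ["connectivity"],
    "DNS server": ["connectivity"],
    "connectivity": [],
}


def diagnose(node: str,
             test: Callable[[str], bool],
             graph: Optional[Dict[str, List[str]]] = None) -> Optional[str]:
    """Walk the dependency graph below a failed node like a decision tree.

    test(name) returns True if the named component currently works.
    Returns the deepest failed component found, or None if no dependency
    of `node` fails (cause not known: report it, consider a new dependency).
    """
    graph = DEPENDS_ON if graph is None else graph
    for dep in graph.get(node, []):
        if not test(dep):
            # Descend into the failed dependency's own decision tree.
            deeper = diagnose(dep, test, graph)
            # If none of dep's own dependencies fail, dep itself is the suspect.
            return deeper if deeper is not None else dep
    return None
```

For example, if only the SIP proxy test fails, the walk stops at "SIP proxy". If connectivity is also down, it continues down to "connectivity".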

Current work
- Building a decision tree system
- Using JBoss Rules (Drools 3.0)

Future work
- Learning the dependency graph from failure events and diagnostic tests
- Learning through random or periodic testing to identify failures and determine relationships
- Self-healing
- Predicting failures
- Protocols for labeling event failures, to enable automatically incorporating new devices and applications into the dependency system
- Event correlation based on decision trees (dependency graphs)

Conclusion
- Hypothesis: network reliability is the single largest open technical issue
  - it prevents (some) new applications
- Existing management tools are of limited use to most enterprises and end users
- Transition to "self-service" networks
  - support non-technical users, not just NOCs running HP OpenView or Tivoli
- Need a better view of network reliability