Do You See What I See (DYSWIS)? or Leveraging end systems to improve network reliability Henning Schulzrinne Dept. of Computer Science Columbia University.

Presentation transcript:

Do You See What I See (DYSWIS)? or Leveraging end systems to improve network reliability Henning Schulzrinne Dept. of Computer Science Columbia University July 2005

Overview
End-to-end application-visible reliability still poor (~ 99.5%)
  – even though network elements have gotten much more reliable
  – particular impact on interactive applications (e.g., VoIP)
  – transient problems
Lots of voodoo network management
Existing network management doesn’t work for VoIP and other modern applications
Need user-centric rather than operator-centric management
Proposal: peer-to-peer management
  – “Do You See What I See?”
Also use for reliability estimation and statistical fault characterization

Transition in cost balance
Total cost of ownership
  – Ethernet port cost about $10
  – about 80% of Columbia CS’s system support cost is staff cost
      – about $2500/person/year, roughly 2 new PCs/year
      – much of the rest is backup & license for spam filters
Does not count hours of employee or son/daughter time
PC, Ethernet port and router cost seem to have reached plateau
  – just that the $10 now buys a 100 Mb/s port instead of 10 Mb/s
All of our switches, routers and hosts are SNMP-enabled, but no suggestion that this would help at all

Circle of blame
[Diagram: four parties blaming each other: OS, VSP, app vendor, ISP]
  – must be a Windows registry problem → re-install Windows
  – probably packet loss in your Internet connection → reboot your DSL modem
  – must be your software → upgrade
  – probably a gateway fault → choose us as provider

Diagnostic undecidability
symptom: “cannot reach server”
more precise: send packet, but no response
causes:
  – NAT problem (return packet dropped)?
  – firewall problem?
  – path to server broken?
  – outdated server information (moved)?
  – server dead?
5 causes → very different remedies
  – no good way for non-technical user to tell
Whom do you call?

VoIP user experience
Only % call attempt success
  – “Keynote was able to complete VoIP calls 96.9% of the time, compared with 99.9% for calls made over the public network. Voice quality for VoIP calls on average was rated at 3.5 out of 5, compared with 3.9 for public-network calls and 3.6 for cellular phone calls. And the amount of delay the audio signals experienced was 295 milliseconds for VoIP calls, compared with 139 milliseconds for public-network calls.” (InformationWeek, July 11, 2005)
Mid-call disruptions common
Lots of knobs to turn
  – Separate problem: manual configuration

Traditional network management model
[Diagram labels: SNMP, X, “management from the center”]

Old assumptions, now wrong
Single provider (enterprise, carrier)
  – has access to most path elements
  – professionally managed
Problems are hard failures & elements operate correctly
  – element failures (“link dead”)
  – substantial packet loss
Mostly L2 and L3 elements
  – switches, routers
  – rarely APs
Problems are specific to a protocol
  – “IP is not working”
Indirect detection
  – MIB variable vs. actual protocol performance
End systems don’t need management
  – DMI & SNMP never succeeded
  – each application does its own updates

Management
[Diagram: element inspection, configuration, fault location, network understanding; annotations: “we’ve only succeeded here” and “what causes the most trouble?”]

Managing the protocol stack
[Diagram: protocol stack annotated with typical problems per layer]
IP: no route, packet loss
UDP/TCP: TCP neg. failure, NAT time-out, firewall policy
RTP: protocol problem, playout errors
media: echo, gain problems, VAD action
SIP: protocol problem, authorization, asymmetric conn (NAT)

Types of failures
Hard failures
  – connection attempt fails
  – no media connection
  – NAT time-out
Soft failures (degradation)
  – packet loss (bursts)
      – access network? backbone? remote access?
  – delay (bursts)
      – OS? access networks?
  – acoustic problems (microphone gain, echo)
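As a rough illustration of spotting the "packet loss (bursts)" soft failure, the sketch below finds gaps in a list of received sequence numbers (for example RTP sequence numbers). The function names are hypothetical, and the sketch assumes the numbers do not wrap around; it is not taken from the presentation.

```python
from typing import List, Tuple


def loss_bursts(received_seqs: List[int]) -> List[Tuple[int, int]]:
    """Return (first_missing_seq, burst_length) for each gap in the stream.

    Assumption: sequence numbers do not wrap around within the sample.
    """
    ordered = sorted(set(received_seqs))  # tolerate reordering and duplicates
    bursts = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev - 1
        if gap > 0:
            bursts.append((prev + 1, gap))
    return bursts


def loss_rate(received_seqs: List[int]) -> float:
    """Fraction of packets missing between the lowest and highest seen."""
    ordered = sorted(set(received_seqs))
    if not ordered:
        return 0.0
    expected = ordered[-1] - ordered[0] + 1
    return 1.0 - len(ordered) / expected


if __name__ == "__main__":
    seqs = [1, 2, 3, 7, 8, 10]
    print(loss_bursts(seqs))  # gaps starting at 4 (3 packets) and 9 (1 packet)
    print(loss_rate(seqs))
```

Reporting bursts rather than only an average rate matters here because a single three-packet gap is far more audible in a voice call than three isolated losses.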

Examples of additional problems
ping and traceroute no longer work reliably
  – WinXP SP 2 turns off ICMP
  – some networks filter all ICMP messages
Early NAT binding time-out
  – initial packet exchange succeeds, but then TCP binding is removed (“web-only Internet”)
policy intent vs. failure
  – “broken by design”
  – “we don’t allow port 25” vs. “SMTP server temporarily unreachable”

Proposal: “Do You See What I See?”
Each node has a set of active and passive measurement tools
Use intercept (NDIS, pcap)
  – to detect problems automatically
      – e.g., no response to HTTP or DNS request
  – gather performance statistics (packet jitter)
  – capture RTCP and similar measurement packets
Nodes can ask others for their view
  – possibly also dedicated “weather stations”
Iterative process, leading to:
  – user indication of cause of failure
  – in some cases, work-around (application-layer routing) → TURN server, use remote DNS servers
Nodes collect statistical information on failures and their likely causes
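One way to picture "nodes can ask others for their view" is the comparison step sketched below: a node that cannot reach a target checks whether its peers can. The function name, the result labels and the three-way classification are assumptions made for illustration, not the DYSWIS design itself.

```python
from typing import Dict


def localize_failure(i_can_reach: bool, peer_views: Dict[str, bool]) -> str:
    """Guess where a fault lies by comparing our view with our peers' views.

    peer_views maps a (hypothetical) peer name to whether that peer
    could reach the same target.
    """
    if i_can_reach:
        return "ok"
    if not peer_views:
        return "unknown"  # nobody answered, so nothing to compare against
    reachable = sum(1 for ok in peer_views.values() if ok)
    if reachable == len(peer_views):
        # Everyone else is fine: suspect our own NAT, firewall or access link.
        return "local"
    if reachable == 0:
        # Nobody can reach it: suspect the server or its side of the network.
        return "remote"
    # Mixed answers: suspect a path or policy that only affects some nodes.
    return "path"


if __name__ == "__main__":
    print(localize_failure(False, {"buddy-a": True, "buddy-b": True}))
```

In the iterative process described on the slide, a "local" verdict might then trigger further local tests, while a "remote" verdict might lead to a work-around or a notification.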

Architecture
[Diagram labels: “not working” (notification); inspect protocol requests (DNS, HTTP, RTCP, …); “DNS failure for 15m”; orchestrate tests; contact others; ping; can buddy reach our resolver?; notify admin ( , IM, SIP events, …); request diagnostics]

Failure detection tools
STUN server
  – what is your IP address?
ping and traceroute
Transport-level liveness and QoS
  – open TCP connection to port
  – send UDP ping to port
  – measure packet loss & jitter
TBD: Need scriptable tools with dependency graph
  – initially, we’ll be using ‘make’
TBD: remote diagnostic
  – fixed set (“do DNS lookup”) or
  – applets (only remote access)
[Diagram: protocol stack: media, RTP, UDP/TCP, IP]
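The slide leaves the scriptable test runner as TBD and mentions ‘make’ as a starting point. Purely as an illustration of the dependency-graph idea, the sketch below runs named tests in dependency order and skips any test whose prerequisite did not pass. The names `tcp_probe` and `run_tests`, the test names and the result strings are assumptions; `deps` is assumed to mention only names that appear in `tests`.

```python
import socket
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Set


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Transport-level liveness: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_tests(tests: Dict[str, Callable[[], bool]],
              deps: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run tests in dependency order; skip a test if any prerequisite did not pass."""
    graph = {name: deps.get(name, set()) for name in tests}
    results: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        if any(results.get(d) != "pass" for d in graph[name]):
            results[name] = "skipped"
        else:
            results[name] = "pass" if tests[name]() else "fail"
    return results


if __name__ == "__main__":
    # Stub tests stand in for real probes so the example runs offline.
    tests = {"ip": lambda: True, "dns": lambda: False, "tcp": lambda: True}
    deps = {"dns": {"ip"}, "tcp": {"dns"}}
    print(run_tests(tests, deps))
```

In practice a test such as `lambda: tcp_probe("example.org", 80)` could replace a stub; skipping dependents keeps the report focused on the lowest failing layer.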

Failure statistics
Which parts of the network are most likely to fail (or degrade)
  – access network
  – network interconnects
  – backbone network
  – infrastructure servers (DHCP, DNS)
  – application servers (SIP, RTSP, HTTP, …)
  – protocol failures/incompatibility
Currently, mostly guesses
End nodes can gather and accumulate statistics
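A minimal sketch of how an end node might accumulate such statistics, assuming each diagnosed failure has already been assigned to one of the categories above. The class and method names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


class FailureStats:
    """Per-node tally of diagnosed failures by network component category."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, category: str) -> None:
        """Count one diagnosed failure in the given category."""
        self.counts[category] += 1

    def shares(self) -> Dict[str, float]:
        """Fraction of all recorded failures attributed to each category."""
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {cat: n / total for cat, n in self.counts.items()}

    def most_likely(self) -> Optional[str]:
        """Category with the most recorded failures, or None if empty."""
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]


if __name__ == "__main__":
    stats = FailureStats()
    for cat in ["access network", "access network",
                "infrastructure servers", "access network"]:
        stats.record(cat)
    print(stats.shares())
    print(stats.most_likely())
```

Tallies like these, shared among peers, could replace the "mostly guesses" with observed failure frequencies.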

Conclusion
Hypothesis: network reliability as single largest open technical issue → prevents (some) new applications
Existing management tools of limited use to most enterprises and end users
Transition to “self-service” networks
  – support non-technical users, not just NOCs running HP OpenView or Tivoli
Need better view of network reliability