Automated Fault Diagnosis in VoIP. 31st March, 2006. Vishal Kumar Singh and Henning Schulzrinne.

Presentation transcript:

1 Automated Fault Diagnosis in VoIP. 31st March, 2006. Vishal Kumar Singh and Henning Schulzrinne.

2 VoIP Diagnosis. What is automated VoIP diagnosis? Determining failures in the network and automatically finding the root cause of each failure. Why VoIP diagnosis? Networks are complex, which makes problems difficult to troubleshoot; automatic fault diagnosis reduces human intervention. Issues in VoIP diagnosis: detecting failures and faults; finding the cause of a failure; determining dependency relationships among different components for diagnosis. Solution steps and approaches.

3 Issues in Automated VoIP Diagnosis. Network elements are increasingly complex and diverse. Interactions and relationships between different network elements are complex. Each application usage instance has different run-time bindings, e.g., different calls may use different DNS servers, SIP proxy servers, and media paths. A problem in one network element may manifest itself as a user-perceived failure of another element.

4 Fault Identification. Service unavailability reporting: (1) A node, device, or UA generates faults (failure events), e.g., SNMP traps or failure messages. (2) A monitoring application, e.g., an SNMP-based application, detects service unavailability and reports the failure event. (3) An affected user reports service unavailability, e.g., by email, by calling the helpdesk, or automatically by pressing a button on the phone while in a call and experiencing echo. (4) A dependent application detects service unavailability and generates a fault (failure event).

5 Fault Localization: Determining the Source of the Problem. Fault classification: local vs. global (does it affect only me, or does it affect others too?). Global failures: server failures, e.g., SIP proxy, DNS, or DB failures; network failures. Local failures: specific source failure, e.g., node A cannot call anyone; specific destination or participant failure, e.g., no one can call node B. Locally observed but global failures, e.g., the DNS service failed, but only B observed it.
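The local vs. global distinction on this slide can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the `Report` record, the `classify` function, and the returned labels are hypothetical names, not part of the original presentation. Note that a single failed report is ambiguous and is treated here as a source failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One call attempt as seen by a peer (hypothetical record format)."""
    source: str
    destination: str
    failed: bool


def classify(reports):
    """Label a set of call reports as a local or a global failure."""
    failed = [r for r in reports if r.failed]
    ok = [r for r in reports if not r.failed]
    if not failed:
        return "no failure"
    sources = {r.source for r in failed}
    dests = {r.destination for r in failed}
    # All failures come from one node that never succeeds: source failure.
    if len(sources) == 1 and all(r.source not in sources for r in ok):
        return f"local: source {next(iter(sources))}"
    # All failures go to one node that nobody reaches: destination failure.
    if len(dests) == 1 and all(r.destination not in dests for r in ok):
        return f"local: destination {next(iter(dests))}"
    # Several sources and destinations fail and nothing succeeds.
    if not ok:
        return "global"
    return "undetermined"
```

A "locally observed but global" failure is exactly the case this simple rule cannot settle from one node's reports, which is why peers are asked to repeat the observation.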

6 Solution Approach. DYSWIS, "Do You See What I See" [1]. Peers (nodes) perform diagnostic tests when another peer reports or detects a failure. Nodes choose the diagnostic test according to dependencies encoded as a decision tree. At least some nodes will initially be preloaded with the dependency relationships in some format (e.g., XML-based). At least some nodes may build and update the dependency relationships based on statistical and temporal analysis of the failure events they receive and the diagnostic tests they perform.

7 Solution Approach. Store context information about past failures experienced by each node, e.g., the specific server that was acting as the proxy server for my call that failed. Store the locality of past failure instances: LAN, domain, subnet; first hop at each layer, e.g., switch (MAC), default gateway (IP), domain’s proxy (application layer). For each network element, store the failure count (statistical), the last failure timestamp, and the last successfully seen timestamp (why do I need to test the proxy for you when my call just went through?). Temporal correlation of past failures (e.g., the proxy seems to fail after DNS fails). Each node has a run-time dependency list based on past failures and diagnostic tests.
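The per-element bookkeeping listed on this slide might look like the following minimal sketch. The class names, the `fresh_seconds` threshold, and the `needs_test` rule are assumptions made for illustration; the slides do not specify a data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementHistory:
    """Statistics a node keeps for one network element."""
    failure_count: int = 0
    last_failure: Optional[float] = None
    last_success: Optional[float] = None


class FailureContext:
    """Per-node store of past observations (hypothetical design)."""

    def __init__(self, fresh_seconds=30.0):
        # Assumed threshold: how long a success counts as "just seen".
        self.fresh_seconds = fresh_seconds
        self.elements = {}

    def record(self, element, ok, now):
        history = self.elements.setdefault(element, ElementHistory())
        if ok:
            history.last_success = now
        else:
            history.failure_count += 1
            history.last_failure = now

    def needs_test(self, element, now):
        """Skip a probe if the element was recently seen working."""
        history = self.elements.get(element)
        if history is None or history.last_success is None:
            return True
        if history.last_failure is not None and history.last_failure > history.last_success:
            return True
        return now - history.last_success > self.fresh_seconds
```

The `needs_test` check mirrors the slide's remark: a peer whose own call just went through the proxy can answer from its history instead of running a new probe.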

8 Solution Architecture. [Figure labels: DNS Server; SIP Server; P2P; Service Provider 1; Service Provider 2; Domain A; peers P1 to P8; DNS Test; SIP Test; PESQ Test; "Call Failed at P1".] Nodes in different domains cooperate to determine the cause of the failure.

9 Solution Architecture: Logical View. [Figure labels: admin input (dependency relationships and tests, XML); dependencies encoded as a decision tree, static and dynamic rules; triggers to perform tests (peer selection and probe selection); alerts; dependency graph generation (Bayesian-network based, inference, other models); failures in the network; decision tree updates; test results.] The figure shows the logical entities and the separation between dependency graph generation and the distributed diagnostic infrastructure (enclosed in blue).

10 Solution Requirements. A request-response protocol between the node that experiences the failure and its peer nodes. Node capability to perform diagnostic tests (probes), with probe selection based on cost and result. Encoding of the dependency relationships into a decision tree (given as input by an expert, e.g., as XML). Peer node discovery based on location (local network, domain) and on capability to perform specific tests. Dependency graph generation and updating based on network failure events and on diagnostic test results correlated with failures.
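To make the request-response requirement concrete, here is a rough sketch of one exchange between the node that saw the failure and a peer. The JSON message layout, the field names, and the `probes` table are all hypothetical; the slides do not define a wire format.

```python
import json


def make_request(reporter, element, test):
    """Build a diagnostic request (hypothetical message format)."""
    return json.dumps({
        "type": "diag-request",
        "reporter": reporter,
        "element": element,
        "test": test,
    })


def handle_request(raw, probes):
    """Run the requested probe on the peer and build the response.

    `probes` maps a test name to a callable that takes the element
    and returns True when the element looks healthy.
    """
    request = json.loads(raw)
    probe = probes.get(request["test"])
    if probe is None:
        result = "unsupported"  # this peer lacks the capability
    else:
        result = "pass" if probe(request["element"]) else "fail"
    return json.dumps({
        "type": "diag-response",
        "element": request["element"],
        "test": request["test"],
        "result": result,
    })
```

Returning "unsupported" rather than failing silently lets the requesting node pick another peer, which ties in with the capability-based peer discovery requirement.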

11 Test/Probe Selection. Which diagnostic probe to run, network layer or application layer, and for what kind of failures? A probe covering a broad range of failures can give faster but cruder, less accurate results, e.g., PING vs. TCP connect vs. SIP PING tests. Cost of the probe.
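One simple way to trade cost against coverage is to pick the cheapest probe that can detect the suspected fault. The sketch below assumes made-up cost values and fault labels for the three probes named on the slide; neither the numbers nor the labels come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    cost: int             # assumed relative cost, not a measured value
    detects: frozenset    # fault labels this probe can reveal


# Illustrative catalogue: cheaper probes cover fewer kinds of fault.
PROBES = [
    Probe("PING", 1, frozenset({"host-down"})),
    Probe("TCP connect", 2, frozenset({"host-down", "port-closed"})),
    Probe("SIP PING", 5, frozenset({"host-down", "port-closed", "sip-service-down"})),
]


def select_probe(suspected_fault, probes=PROBES):
    """Return the cheapest probe able to detect the suspected fault."""
    candidates = [p for p in probes if suspected_fault in p.detects]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.cost).name
```

A real selector would also weigh how conclusive each result is, since a passing PING says nothing about the SIP service itself.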

12 Dependency Classifications. Functional dependency: at the generic service level, e.g., a SIP proxy depends on the DB service and the DNS service. Structural dependency: set at configuration time, e.g., the Columbia CS SIP proxy is configured to use the mysql database on metro-north. Operational dependency: run-time dependencies or run-time bindings, e.g., the call that failed was using the failover SIP server obtained from DNS, which was running on host a.b.c.d in the IRT lab.

13 Dependency Classifications: Layered Approach. Vertical and lateral dependencies: applications depend on other application-layer services (e.g., the SIP service depends on the DB and DNS services) as well as on lower-layer services. OSI layers as service dependency layers: an application-layer service also depends on a transport-layer service, which in turn depends on a network-layer service. MAC layer: access point, switch. Network layer: router. Application layer: DNS, SIP, database. Topology-based dependency: e.g., calls from the CS domain depend on a specific SIP server; calls from lab phones depend on specific switches and routers.
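The lateral and vertical dependencies described here can be held in a plain adjacency map and expanded transitively. The particular edges in `DEPENDS_ON` are an assumed example built from the services and layers the slide names, not a graph taken from the presentation.

```python
# Assumed example graph: lateral edges (SIP proxy -> DNS, Database)
# and vertical edges (application -> transport -> network -> MAC).
DEPENDS_ON = {
    "SIP call": ["SIP proxy"],
    "SIP proxy": ["DNS", "Database", "Transport"],
    "DNS": ["Transport"],
    "Database": ["Transport"],
    "Transport": ["Network"],
    "Network": ["MAC"],
    "MAC": [],
}


def all_dependencies(service, graph=DEPENDS_ON):
    """List everything `service` depends on, directly or indirectly."""
    seen = []
    stack = list(reversed(graph.get(service, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already reached through another path
        seen.append(node)
        stack.extend(reversed(graph.get(node, [])))
    return seen
```

Every element returned for a failed service is a candidate for a diagnostic test, which is why a fault low in the stack can surface as a failure of a service far above it.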

14 Dependency Graph

15 Dependency Graph Encoded to Decision Tree. [Figure: a dependency graph over nodes A, B, C, and D, where A = SIP call, B = DNS server, C = SIP proxy, D = connectivity, and the corresponding decision tree. Flowchart labels: "A failed, use decision tree"; Yes/No branches; "Invokes decision tree for C"; "Invokes decision tree for B"; "Invokes decision tree for D"; "Cause not known: report, add new dependency".]
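The walk suggested by this figure can be sketched as a recursive descent: when an element fails, test each thing it depends on, and descend into the first one that also fails. The edges in `DEPENDENCIES` and the order of the tests are assumptions, since the branch layout of the original figure is not recoverable from the transcript.

```python
# Assumed edges among the four elements named on the slide.
DEPENDENCIES = {
    "SIP call": ["SIP proxy", "DNS server", "Connectivity"],
    "SIP proxy": ["DNS server", "Connectivity"],
    "DNS server": ["Connectivity"],
    "Connectivity": [],
}


def diagnose(failed, is_healthy, deps=DEPENDENCIES):
    """Return the deepest failing dependency of `failed`, or None.

    `is_healthy` stands in for running a diagnostic test on an
    element. None corresponds to the "cause not known" leaf, where
    the node would report and add a new dependency.
    """
    for dep in deps.get(failed, []):
        if not is_healthy(dep):
            # Invoke the decision tree for the failing dependency.
            deeper = diagnose(dep, is_healthy, deps)
            return deeper if deeper is not None else dep
    return None
```

For example, if the connectivity test fails along with the DNS and proxy tests, the walk ends at "Connectivity" rather than blaming the proxy.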

16 Diagnostic Tests. SIP proxy: proxy server availability (SIP PING); call routing availability (INVITE tests); call path determination (SIP traceroute). Media path, quality related: speech quality degradation (MOS); echo, jitter (MOS, PESQ); QoS (RTCP). NAT/firewall: checking binding expiration; firewall failure to open a port (one-way media). How to determine which firewall in the path? SIP signaling?
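As an illustration of the proxy availability test, the sketch below builds a SIP OPTIONS request, which is one common way to implement a "SIP PING"; the slides do not say which method the authors used. The function name, the `probe` user, and the host names are hypothetical, and the code only constructs the message without sending it.

```python
import uuid


def build_options(target_host, local_host, local_port=5060, user="probe"):
    """Build a minimal SIP OPTIONS request as a string."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 branch magic cookie
    tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_host}>;tag={tag}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```

A probe would send this over UDP to the proxy and treat any SIP response within a timeout as evidence that the proxy is reachable and answering.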

17 Diagnostic Tests. DNS tests; DHCP; switch/router; ARP/RARP/multicast; BGP failures; conference mixers; gateway (echo return loss readings, analysis); DB; XCAP server tests; presence service availability tests.

18 Example Call Failure: Possible Causes. SIP proxy server; database; authentication; media path failure; gateway; specific call legs (ERL, authentication, etc.); DNS server failure; end station failure; network failure, e.g., router or switch failure. Different calls will have different run-time dependencies.

19 Mapping to a Human Medical System. Doctors perform diagnostic tests to find the cause of a disease when the symptoms are reported, and they may learn new things about the disease from those tests. Likewise, failures and triggered tests update the dependency graph. Medical researchers run different types of tests to learn about new diseases and to determine the cause of a disease and its relationship with other physiological systems. Likewise, a set of tests can run periodically and be used to build the dependency graph independently of failures.

20 Solution Evolution. Learning the dependency graph from failure events and diagnostic tests. Learning through random or periodic testing to identify failures and determine relationships.
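One very simple form of learning from failure events is to count how often a failure of one element is followed shortly by a failure of another. This is an illustrative stand-in, not the Bayesian-network approach mentioned earlier in the slides; the function name, the event format, and the `window` parameter are assumptions.

```python
from collections import Counter


def temporal_correlation(events, window):
    """Count ordered pairs of failures that occur within `window`.

    `events` is a list of (timestamp, element) failure events.
    A high count for (X, Y) hints that Y may depend on X.
    """
    ordered = sorted(events)
    follows = Counter()
    for i, (t_first, first) in enumerate(ordered):
        for t_second, second in ordered[i + 1:]:
            if t_second - t_first > window:
                break  # later events are even further away
            if second != first:
                follows[(first, second)] += 1
    return follows
```

Pairs with high counts are only candidate dependencies; a node could confirm them with targeted diagnostic tests before adding an edge to its dependency graph.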

21 Future Directions. Self-healing. Predicting failures. Protocols for labeling failure events, which would enable new devices and applications to be incorporated into the dependency system automatically. Decision tree (dependency graph) based event correlation.

22 Reference [1] User-oriented Management of VoIP Applications ( bs.de/projects/nmrg/meetings/2005/nancy /dyswis.pdf)