1 Root-Cause VoIP Troubleshooting Optimizing the Process Tim Titus CTO, PathSolutions.


2 Agenda
- Business disconnect
- Why is VoIP troubleshooting so hard?
- Troubleshooting methodology
- Tool selection
- Finding the root-cause
- Achieving Total Network Visibility

3 Business Disconnect
- You’re responsible for the entire VoIP infrastructure
- Most telecom engineers know less about their network’s health and performance than their user community
- “You can’t manage what you can’t measure” -- Peter Drucker

4 Why is VoIP Troubleshooting so Hard?
Business reasons:
- Networks are getting more complex
- Fewer staff remain to support the network
Technical reasons:
- Proper methodology is not utilized
- Wrong tools are employed

5 Troubleshooting Methodologies
What graduates a junior-level engineer to a senior-level engineer is their troubleshooting methodology.

6 Bad Methodology
“Do something to try to fix the problem”
- Reboot the device
- Change the network settings
- Replace hardware
- Re-install the OS

7 Good Methodology
- Collect information
- Create hypothesis
- Test hypothesis
- Implement fix
- Verify original problem is solved and no new problems exist
- Undo changes (if the fix did not solve the problem)
- Document fix
- Notify users

8 Tool Selection
Types of tools:
- Packet analyzers/capture
- Application Performance Monitoring (call simulation)
- CDR analysis tools
- SNMP collectors

9 Packet Capture
Using a sniffer to solve a call quality problem
Diagram labels: Packet Capture, Actual VoIP Call
Results of VoIP call:
- Latency: 127ms
- Jitter: 87ms
- Packet loss: 8.2%
You have confirmation that there is a problem, but no idea which device or link caused the packet loss.

10 Packet Capture
Good for:
- Confirming packet loss (Are we missing packets?)
- Confirming packet contents issues (No QoS tagging on packets when there should be)
- Determining application-level issues (Source and destination IP and ports used for a session)
Bad for:
- Finding physical, data-link, or network issues
- Finding bandwidth limitations
- Finding device limitations
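As a rough illustration of what a capture can confirm, the sketch below estimates packet loss from the RTP sequence numbers seen in a capture and computes a running jitter estimate in the style of RFC 3550. The function names and input shapes are assumptions made for this example, and sequence-number wraparound is not handled.

```python
def packet_loss_percent(seq_numbers):
    """Percentage of RTP packets missing from a capture.

    Assumes the sequence numbers do not wrap around within the capture.
    """
    if not seq_numbers:
        return 0.0
    expected = max(seq_numbers) - min(seq_numbers) + 1
    received = len(set(seq_numbers))  # ignore duplicated packets
    return 100.0 * (expected - received) / expected


def interarrival_jitter(packets):
    """Running jitter estimate in the style of RFC 3550.

    packets: list of (arrival_ms, rtp_timestamp_ms) tuples in arrival order.
    """
    jitter = 0.0
    for (prev_arrival, prev_ts), (arrival, ts) in zip(packets, packets[1:]):
        # Spacing on arrival minus spacing at the sender.
        d = (arrival - prev_arrival) - (ts - prev_ts)
        # Smooth with gain 1/16, as the RFC describes.
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

These numbers confirm that a call was degraded. They say nothing about which device or link along the path caused it.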

11 Application Performance Monitoring
Using call simulation to determine performance
Diagram labels: Agent, Simulated VoIP Call, Agent
Results of simulation:
- Latency: 127ms
- Jitter: 87ms
- Packet loss: 8.2%
You have knowledge of the experience across the network, but no understanding of the source or cause of the problem.

12 Application Performance Monitoring
Good for:
- Measuring user experience across the network (Are we having problems right now?)
Bad for:
- Finding physical, data-link, or network issues
- Finding bandwidth limitations
- Finding device limitations

13 CDR Analysis Tools
Using Call Detail Records to determine VoIP usage
Diagram labels: PBX, Actual VoIP Call, CDR Record, CDR Collector
Result:
- Call from x43 to x53 at 2:45pm
- 8.3% packet loss
- 46ms jitter
You have knowledge of a VoIP call and its perception of call quality, but no understanding of where or why there was a problem.

14 CDR Analysis Tools
Good for:
- Confirming a VoIP problem
Bad for:
- Finding physical, data-link, or network issues
- Finding bandwidth limitations
- Finding device limitations

15 SNMP Collectors
Collecting information from switches and routers to discover faults
Diagram labels: SNMP Collector, Actual VoIP Call
Results of collection:
- WAN link is overloaded at 2:35pm
You have data about conditions on some parts of the network, but no analysis of the problem or correlation to events.

16 SNMP Collectors
Good for:
- Tracking packet loss per interface/device (Are we dropping packets on a link? Why?)
- Monitoring device and link resource limitations (Are we over-utilizing a link? Is the router CPU pegged?)
Bad for:
- Determining who is using the network
- Finding application layer problems
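The two questions above (are we dropping packets, are we over-utilizing a link) reduce to arithmetic on successive counter readings. The sketch below assumes the readings have already been collected from IF-MIB-style 32-bit counters such as ifInOctets, ifInErrors, and ifInDiscards. The function names and the dictionary layout are hypothetical, and no SNMP library is involved.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit interface counters wrap at this value


def counter_delta(old, new, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, tolerating a single wrap."""
    return (new - old) % modulus


def utilization_percent(old_octets, new_octets, interval_s, if_speed_bps):
    """Average link utilization over the polling interval."""
    bits = counter_delta(old_octets, new_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


def loss_percent(old, new):
    """Share of received packets that were errored or discarded.

    old, new: dicts with 'packets', 'errors', and 'discards' readings,
    where 'packets' counts only successfully delivered packets.
    """
    errors = counter_delta(old["errors"], new["errors"])
    discards = counter_delta(old["discards"], new["discards"])
    delivered = counter_delta(old["packets"], new["packets"])
    total = delivered + errors + discards
    if total == 0:
        return 0.0
    return 100.0 * (errors + discards) / total
```

For example, 187,500,000 octets received in a 300-second interval on a 10 Mbps link works out to 50% average utilization.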

17 Finding the Root-Cause
Step 1: Identify the involved endpoints and where they are connected into the network
Diagram label: Poor Quality VoIP Call

18 Finding the Root-Cause
Step 2: Identify the full layer-2 path through the network from the first phone to the second phone
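One way to picture Step 2 is as a graph search. The sketch below assumes the physical topology is already known as a list of links and is loop-free (for example, with spanning-tree-blocked links left out), so the path found is the one traffic actually takes. The device names and the `layer2_path` helper are hypothetical. On a live network the path would be discovered from switch forwarding tables.

```python
from collections import deque


def layer2_path(links, start, end):
    """Breadth-first search for the hop-by-hop path between two devices.

    links: iterable of (device_a, device_b) pairs describing physical links.
    Returns a list of device names, or None if no path exists.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


links = [
    ("phone-x43", "switch-1"),
    ("switch-1", "core"),
    ("core", "switch-2"),
    ("switch-2", "phone-x53"),
    ("core", "switch-3"),
]
path = layer2_path(links, "phone-x43", "phone-x53")
```

Every device and link on the returned path is a candidate for the health checks in the following steps.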

19 Finding the Root-Cause
Step 3: Investigate involved switch and router health (CPU & Memory) for acceptable levels

20 Finding the Root-Cause
Step 4: Investigate involved interfaces for:
- VLAN assignment
- DiffServ/QoS tagging
- Queuing configuration
- 802.1p priority settings
- Duplex mismatches
- Cable faults
- Half-duplex operation
- Broadcast storms
- Incorrect speed settings
- Over-subscription
TRANSIENT PROBLEM WARNING: If the error condition is no longer occurring when this investigation is performed, you may not catch the problem.
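Part of the Step 4 checklist can be expressed as simple per-interface checks applied to every hop on the path. The sketch below covers only a few items (half-duplex operation, incorrect speed settings, over-subscription, packet loss). The `InterfaceStatus` fields and the threshold values are assumptions chosen for illustration, not recommended limits.

```python
from dataclasses import dataclass


@dataclass
class InterfaceStatus:
    device: str
    port: str
    duplex: str               # "full" or "half"
    speed_mbps: int           # speed the port is actually running at
    expected_speed_mbps: int  # speed the port should be running at
    utilization_pct: float
    loss_pct: float


def check_interface(status, max_utilization_pct=90.0, max_loss_pct=1.0):
    """Return a list of checklist items this interface fails."""
    findings = []
    if status.duplex == "half":
        findings.append("half-duplex operation")
    if status.speed_mbps != status.expected_speed_mbps:
        findings.append("incorrect speed setting")
    if status.utilization_pct > max_utilization_pct:
        findings.append("over-subscription")
    if status.loss_pct > max_loss_pct:
        findings.append("packet loss")
    return findings


def check_path(interfaces):
    """Map (device, port) to findings, for interfaces with any finding."""
    report = {}
    for status in interfaces:
        findings = check_interface(status)
        if findings:
            report[(status.device, status.port)] = findings
    return report
```

A check like this only sees the values it is handed, so it is subject to the same transient-problem caveat: if the readings are taken after the fault has cleared, the report comes back empty.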

21 Optimizing the Methodology
In a perfect world, you want:
- Monitoring of:
  - Every switch, router, and link in the entire infrastructure
  - All error counters on the interfaces
  - QoS configuration and performance
- Continuous collection of information
- Automatic layer-1, 2, and 3 mapping from any IP endpoint to any other IP endpoint
- Problems identified in plain English for rapid remediation
This is what PathSolutions TotalView does.

22 Deployment
Install TotalView
All switches and routers are queried for information
Result: One location is able to monitor all devices and links in the entire network for performance and errors

23 Total Network Visibility®
- Broad: All ports on all routers & switches
- Continuous: Health collected every 5 minutes
- Deep: 18 different error counters collected and analyzed
Network Prescription engine provides plain-English descriptions of errors:
“This interface is dropping 12% of its packets due to a cable fault”

24 Establish Baseline of Network Health
Results within 12 minutes:
- 7% loss from cabling fault
- 12% loss from alignment errors
- 28% loss from duplex mismatch
- 11% loss from jumbo frame misconfiguration

25 Repair Issues
Results within 12 minutes:
- 7% loss from cabling fault
- 12% loss from alignment errors
- 28% loss from duplex mismatch
- 11% loss from jumbo frame misconfiguration

26 Investigate a call quality problem between x43 and x51 that happened around 2:35pm
Path Analysis Report
2:36pm: 18% loss from cable fault
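Investigating a past incident depends on having stored samples to look back at. The sketch below assumes each interface keeps a time-sorted history of loss readings, where each reading summarizes the polling interval that ended at its timestamp, and picks the first sample taken at or after the incident. The `sample_covering` helper and the history values are hypothetical and only echo the times on this slide.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def sample_covering(history, when, poll_interval=timedelta(minutes=5)):
    """Return the (timestamp, loss_pct) sample whose interval covers `when`.

    history: list of (timestamp, loss_pct) tuples sorted by timestamp.
    Returns None if no sample was taken within one polling interval
    after `when`.
    """
    timestamps = [ts for ts, _ in history]
    index = bisect_left(timestamps, when)  # first sample at or after `when`
    if index == len(history):
        return None
    ts, loss_pct = history[index]
    if ts - when >= poll_interval:
        return None
    return ts, loss_pct


day = datetime(2024, 1, 15)  # arbitrary date for the example
history = [
    (day.replace(hour=14, minute=26), 0.0),
    (day.replace(hour=14, minute=31), 0.1),
    (day.replace(hour=14, minute=36), 18.0),
    (day.replace(hour=14, minute=41), 0.0),
]
incident = day.replace(hour=14, minute=35)
hit = sample_covering(history, incident)
```

Running the same lookup for every interface on the call path shows which link was losing packets at the time of the incident.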

27 Demo

28 Don’t turtle your network


