9/19/20151 Automating Cross-Layer Diagnosis of Enterprise 802.11 Wireless Networks Geoffrey M. Voelker Mikhail Afanasyev, John Bellardo, Peter Benko, Yu-Chung.

Slides:



Advertisements
Similar presentations
IEEE INFOCOM 2004 MultiNet: Connecting to Multiple IEEE Networks Using a Single Wireless Card.
Advertisements

Hidden Terminal Problem and Exposed Terminal Problem in Wireless MAC Protocols.
Cs/ee 143 Communication Networks Chapter 6 Internetworking Text: Walrand & Parekh, 2010 Steven Low CMS, EE, Caltech.
Emerging Technologies in Wireless LANs. Replacement for traditional Ethernet LANs Several Municipalities Portland, OR Philadelphia, PA San Francisco,
Wireless in the real world Gabriel Weisz October 8, 2010.
Overview r Ethernet r Hubs, bridges, and switches r Wireless links and LANs.
1 Solutions to Performance Problems in VOIP over Wireless LAN Wei Wang, Soung C. Liew Presented By Syed Zaidi.
20 – Collision Avoidance, : Wireless and Mobile Networks6-1.
11 Diagnosing and improving Wireless Networks Patrick Verkaik Mikhail Afanasyev Yuvraj Agarwal Yu-Chung Cheng Stefan Savage Alex C. Snoeren Geoffrey.
Do You See What I See (DYSWIS) Aditya Muthyala (am3551) School of Engineering and Applied Science Columbia University, Fall 2011.
6: Wireless and Mobile Networks6-1 Chapter 6: Wireless and Mobile Networks Background: r # wireless (mobile) phone subscribers now exceeds # wired phone.
1 Elements of a wireless network network infrastructure wireless hosts r laptop, PDA, IP phone r run applications r may be stationary (non- mobile) or.
11 Automating Cross-Layer Diagnosis of Enterprise Wireless Networks Yu-Chung Cheng Mikhail Afanasyev Patrick Verkaik Jennifer Chiang Alex C. Snoeren.
1. 2 Enterprise WLAN setting 2 Vivek Shrivastava Wireless controller Access Point Clients Internet NSDI 2011.
A victim-centric peer-assisted framework for monitoring and troubleshooting routing problems.
5-1 Data Link Layer r What is Data Link Layer? r Wireless Networks m Wi-Fi (Wireless LAN) r Comparison with Ethernet.
Jigsaw: Solving the Puzzle of Enterprise Analysis Yu-Chung Cheng John Bellardo, Peter Benko, Alex C. Snoeren, Geoff Voelker, Stefan Savage.
Maintaining and Updating Windows Server 2008
Low Latency Wireless Video Over Networks Using Path Diversity John Apostolopolous Wai-tian Tan Mitchell Trott Hewlett-Packard Laboratories Allen.
Junxian Huang 1 Feng Qian 2 Yihua Guo 1 Yuanyuan Zhou 1 Qiang Xu 1 Z. Morley Mao 1 Subhabrata Sen 2 Oliver Spatscheck 2 1 University of Michigan 2 AT&T.
INTRUSION DETECTION SYSTEMS Tristan Walters Rayce West.
Wireless MESH network Tami Alghamdi. Mesh Architecture – Mesh access points (MAPs). – Mesh clients. – Mesh points (MPs) – MP uses its Wi-Fi interface.
Presenter: Chi-Hung Lu 1. Problems Distributed applications are hard to validate Distribution of application state across many distinct execution environments.
6: Wireless and Mobile Networks6-1 Elements of a wireless network network infrastructure wireless hosts r laptop, PDA, IP phone r run applications r may.
Data Communications and Networks
© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public ITE PC v4.0 Chapter 1 1 Troubleshooting Your Network Networking for Home and Small Businesses.
6: Wireless and Mobile Networks6-1 Chapter 6 Wireless and Mobile Networks Computer Networking: A Top Down Approach Featuring the Internet, 3 rd edition.
Adapted from: Computer Networking, Kurose/Ross 1DT066 Distributed Information Systems Chapter 6 Wireless, WiFi and mobility.
ECE 4450:427/527 - Computer Networks Spring 2015
High Performance, Easy to Deploy Wireless. Agenda Foundry Key Differentiators Business Value Product Overview Questions.
Wi-Fi Wireless LANs Dr. Adil Yousif. What is a Wireless LAN  A wireless local area network(LAN) is a flexible data communications system implemented.
CS640: Introduction to Computer Networks Aditya Akella Lecture 22 - Wireless Networking.
Divert: Fine-grained Path Selection for Wireless LAN Allen Miu, Godfrey Tan, Hari Balakrishnan, John Apostolopoulos * MIT Computer Science and Artificial.
Wireless LAN Advantages 1. Flexibility 2. Planning 3. Design
Wave Relay System and General Project Details. Wave Relay System Provides seamless multi-hop connectivity Operates at layer 2 of networking stack Seamless.
2006/12/04NTU/nslab1 Jigsaw: Solving the Puzzle of Enterprise Analysis Yu­Chung ChengYu­Chung Cheng, John Bellardo, P´eter Benk¨o Alex C. Snoeren,
1 Architecture and Techniques for Diagnosing Faults in IEEE Infrastructure Networks Atul Adya, Victor Bahl, Ranveer Chandra, Lili Qiu Microsoft.
© 2006 Cisco Systems, Inc. All rights reserved. Optimizing Converged Cisco Networks (ONT) Module 6: Implement Wireless Scalability.
6: Wireless and Mobile Networks6-1 Chapter 6 Wireless and Mobile Networks Computer Networking: A Top Down Approach Featuring the Internet, 3 rd edition.
Wireless Access avoid collisions: 2 + nodes transmitting at same time CSMA - sense before transmitting –don’t collide with ongoing transmission by other.
Layer 2 Technologies At layer 2 we create and transmit frames over communications channels Format of frames and layer 2 transmission protocols are dependent.
Wireless Mesh Network 指導教授:吳和庭教授、柯開維教授 報告:江昀庭 Source reference: Akyildiz, I.F. and Xudong Wang “A survey on wireless mesh networks” IEEE Communications.
Computer Networks Performance Metrics. Performance Metrics Outline Generic Performance Metrics Network performance Measures Components of Hop and End-to-End.
Written by Yu-Chung Cheng, John Bellardo, Peter Benko, Alex C. Snoeren, Geoffrey M. Voelker and Stefan Savage Written by Yu-Chung Cheng, John Bellardo,
MOJO: A Distributed Physical Layer Anomaly Detection System for WLANs Richard D. Gopaul CSCI 388.
SHAWN CROWE LTEC /026 ASSIGNMENT #3 Networking Components.
Wireless and Mobility The term wireless is normally used to refer to any type of electrical or electronic operation which is accomplished without the use.
CROSS-LAYER OPTIMIZATION PRESENTED BY M RAHMAN ID:
Networking Fundamentals. Basics Network – collection of nodes and links that cooperate for communication Nodes – computer systems –Internal (routers,
1 Evaluating NGI performance Matt Mathis
WIRELESS COMMUNICATION Husnain Sherazi Lecture 1.
Department of Electronic Engineering City University of Hong Kong EE3900 Computer Networks Protocols and Architecture Slide 1 Use of Standard Protocols.
1 Chapter 4. Protocols and the TCP/IP Suite Wen-Shyang Hwang KUAS EE.
CO5023 Wireless Networks. Varieties of wireless network Wireless LANs: the main topic for this week. Consists of making a single-hop connection to an.
Wireless Mesh Networks Myungchul Kim
TCP continued. Discussion – TCP Throughput TCP will most likely generate the saw tooth type of traffic. – A rough estimate is that the congestion window.
Optimization Problems in Wireless Coding Networks Alex Sprintson Computer Engineering Group Department of Electrical and Computer Engineering.
1 Chapter 4 MAC Layer – Wireless LAN Jonathan C.L. Liu, Ph.D. Department of Computer, Information Science and Engineering (CISE), University of Florida.
© ExplorNet’s Centers for Quality Teaching and Learning 1 Select appropriate hardware for building networks. Objective Course Weight 2%
Dirk Grunwald Dept. of Computer Science, ECEE and ITP University of Colorado, Boulder.
CIS 700-5: The Design and Implementation of Cloud Networks
Outline What is Wireless LAN Wireless Transmission Types
Solving Real-World Problems with Wireshark
Instructor Materials Chapter 9: Testing and Troubleshooting
ETHANE: TAKING CONTROL OF THE ENTERPRISE
Chapter 6 Wireless and Mobile Networks
CS 457 – Lecture 7 Wireless Networks
Jigsaw: Solving the Puzzle of Enterprise Analysis
Building A Network: Cost Effective Resource Sharing
IEEE Wireless Local Area Networks (RF-LANs)
Presentation transcript:

9/19/20151 Automating Cross-Layer Diagnosis of Enterprise Wireless Networks Geoffrey M. Voelker Mikhail Afanasyev, John Bellardo, Peter Benko, Yu-Chung Cheng, Jennifer Chiang, Patrick Verkaik, Alex Snoeren, Stefan Savage

2 9/19/2015 Diagnosing distributed systems  Simple systems Few components Inputs/Output observed Cause of failure usually obvious  Distributed systems Many interdependent components Hard to monitor all interactions Cause of failure/degradation is non-obvious

3 9/19/2015 The promise of enterprise Blanket AP coverage = seamless connectivity

4 9/19/2015

5 A familiar story... “The wireless is being flaky.” “Flaky how?” “Well, my connections got dropped earlier and now things seem very sloooow.” “OK, we will take a look” “Wait, wait … it’s ok now” “Mmm… well let us know if you have any more problems.” User Support

6 9/19/2015 Our story: new CSE building at UCSD  150k square feet 4 floors + basement >500 occupants  Building-wide WiFi 40 APs (802.11b/g)  Channel 1, 6, 11  Users complain about wireless performance since we moved in July 2005 Admins and vendors can not solve the issues

7 9/19/2015 Why is it hard to figure out?  Problems can be in anywhere Across layers – protocols  Even in the same layer – {a,b,f,g,h,i,n,s} Software incompatibilities – vendor variations Transient or persistent - time Radio propagates in free space - locations Radio spreads across channels – frequencies  Shared spectrum makes it worse APs bridge wireless and wired worlds – infrastructure  To diagnose Gather data everywhere Analyze across all layers  Want a system to do this job automatically

8 9/19/2015 Better world The wireless is being flaky User Your SSH has over 200ms response time in average, 8% TCP packet is lost due to the interferences from the microwave oven nearby This problem is logged for sys admins

9 9/19/2015 Shaman  Goal: Develop a system to automatically diagnose problems in wireless networks  Pervasive data collection (Jigsaw) Extensive passive monitoring system  Observe all transmissions across locations, channels, and time Provides a unified synchronized trace of every packet transmission  Explicitly model protocols on critical path DHCP, MAC, TCP, etc. Provides complete delay and loss breakdown  For every packet transmission, all protocol stages  Framework for diagnostic tools Use model outputs to determine root cause of problems Users can query on demand, also alert admins

10 9/19/2015 Shaman system architecture Trace sync & merging Gather and merge traces from monitors into one global trace Protocol modeling Infer protocol states Critical path diagnosis Identify problems on the critical path Do all in real-time Wired gateway monitor Wireless monitor …

11 9/19/2015 Why pervasive monitoring?  Protocol states are often not directly observable Inferred from packet traces and protocol state machines Packet delay and losses  PHY/MAC interactions with each other and the environment  Capturing all wireless events provide the ground truth to model protocol states Require a global perspective = one clock Require high resolution timestamp for timing analysis  How? DHCP req DHCP rsp

12 9/19/2015 Jigsaw passive monitor system  Overlays existing WiFi network Series of passive monitors Blanket deployment for best coverage  Monitor PoE box w/ 266Mhz P MB ram 2 b/g radios  96 monitors (192 radios) Monitors are paired in each location  Covering all channels in use Captures all activity (including PHY/CRC errors) Stream back to centralized storage

13 9/19/2015 Trace merging (ideal) Time

14 9/19/2015 Not all monitors see all packets

15 9/19/2015 Trace merging (reality) Time Time (s) Clock diff (us)

16 9/19/2015 Challenge 1: sync at 10us precision  Why 10us precision? Critical evidence for layer analysis  channel access mechanism Carrier-sense multiple access (CSMA)  Channel busy  wait  Channel idle  send Timing unit is ~10us  Precise trace timestamps reveal internal states Ex1: if A and B send at same time, they could interfere  A can’t hear B Ex2: if A sends right after B’s transmission  A can hear B  How? Create a global clock Monitors timestamp packets w/ local HW clocks  HW clocks has 1us granularity Estimate the offset between local and global clock for each monitor

17 9/19/2015 Challenge 2: sync across 192 radios  Goal: estimate the offset between local and global clock for each monitor Time route from one monitor to the other  Sync across channels Ch. 1 monitor does not hear packet sent in ch. 6. Dual radios on same monitor slaved to same clock ToTo ∆t 1 ∆t Jigsaw: Solving the Puzzle of Enterprise Analysis Cheng, Bellardo, Benko, Snoeren, Voelker, and Savage SIGCOMM 2006

18 9/19/2015 Trace merging (reality) Time Frame 1Frame 4Frame 5Frame 3Frame 2 Shaman sync’d trace

19 9/19/2015 Part of a sync’d trace Traces synchronized User 1 User 2

20 9/19/2015 Shaman system architecture Trace sync & merging Gather and merge traces from monitors into one global trace Protocol modeling Infer protocol states Critical path diagnosis Identify problems on the critical path Wired gateway monitor Wireless monitor …

21 9/19/2015  Now we have fully sync’d global traces What protocols must we model?  Critical path Mobility management  Scan/associate w/ AP  DHCP  ARP  Portal page login Data transport protocols  TCP  Mac delay/loss Modeling protocols

22 9/19/2015 Mobility Management  Create the illusion of a single AP Proprietary system w/ site specific policy  Most components are simple protocols 1-2 request-response transactions  Easier to model compared to TCP Very reliable in wired network  ARP, DHCP, DNS  Seldom suspected as the culprits in wireless  Users expect seamless connectivity People often suspend/resume laptops while moving in the office building

23 9/19/2015 Mobility overhead in UCSD CSE Major Problem: (Gratuitous) ARPs & Scans

24 9/19/2015 Protocol Modeling  Distribution is not enough Want to diagnose any user’s problem by finding the root cause  Need to track per packet delay and loss Essential to model e2e protocols like TCP  Complex mechanisms to accommodate delay and loss  Example: slow SSH response  high TCP losses  most retries failed  microwave ovens operating nearby

25 9/19/2015 The journey of a packet in Wireless gateway CNN.com APUser Time Wired packet Data Ack Queuing Channel busy Exp. Backoff

26 9/19/2015 Modeling packet delays  Emulate AP queue Based on input/output events  Events observed directly Ethernet packet on wires data/ack on wireless  Need to infer when a packet … Reaches head of TxQ Is scheduled to the TxQ Is received by the AP Directly observed Inferred/Modeled AP

27 9/19/2015 Applying delays to TCP diagnosis  Scenario 3 users downloading same large tar ball through same AP from the CSE website 1 user complains about download performance in spite of having 54Mbps g connectivity  Major performance bottleneck is queue competition

28 9/19/2015 Modeling packet losses  Delay alone is not enough for diagnosis  Loss is another major factor performs retransmission on loss Loss happens on both ways  Data or Ack Must model conversations  Loss causes Attenuation (e.g. not enough signal strength) Interference from other devices (hidden- terminals) Interference from other devices in 2.4GHz

29 9/19/2015 Broadband interference ~9 am12-2 pm

30 9/19/2015 Interference fingerprints

31 9/19/2015 TCP performance measures  Want to measure TCP performance bottlenecks Compares actual goodput with (modeled) ideal goodput [JP98]  Major problems in UCSD CSE TCP bulk flows 30% small receiver window 19% AP retry bug 30% AP b/g compatibility policy (protected mode) Rcv wnd = 0 Internet Delay/Loss Wireless Loss Over- protected Best Rate? Automating Cross-Layer Diagnosis of Enterprise Wireless Networks Cheng, Afanasyev, Benko, Verkaik, Snoeren, Voelker, and Savage. SIGCOMM Wireless Delay

32 9/19/2015 Putting everything together Critical path diagnosis DHCP ARP TCP Diagnose Delay/Loss Broadband Interference Scan/Association

33 9/19/2015 System status  Real-time monitoring and diagnosis of UCSD CSE wireless network 30 seconds delay  Serving UCSD CSE wireless users Resolved 67 tickets  Validated manually Discovered various implementation bugs and protocol problems  Only retry once  Do not respect CSMA, burst frames in a row  Very large transmission duration  Overly conservative g protection policy  … Working w/ vendors and admins to fix AP bugs  Re-deployed in city mesh network in the bay area

34 9/19/2015 Conclusions  Wireless diagnosis is challenging Especially for large enterprise network Need to check a lot of factors Need a system to the job automatically  Shaman: an automatic comprehensive wireless diagnosis system Large-scale 24x7 monitoring High resolution synchronization Models protocol states on the critical path  Mobility management  TCP  delay and losses Automatically diagnose user problems in real-time

35 9/19/2015 Q & A  Source code, data, live traffic monitoring sysnet.ucsd.edu/wireless/ sysnet.ucsd.edu/wireless/

36 9/19/2015

37 9/19/2015 Related Work  WiFiProfiler [Mobisys06] Peer diagnosis among clients  DAIR [NSDI07] Distributed monitors but application-spec traffic summaries  No centralized merging/sync Fine-grained location system  Wit [SIGCOMM06] Automatic protocol states inference engine  Airmagnet Troubleshooting user problems (PHY/MAC)  Detect interferences, security problems, protocol incompatibilities Special devices to perform active probes  Airtight/Airdefense/Kismet Detect rogue APs and security problems

38 9/19/2015 Other work  Metropolitan-scale Wi-Fi location system Cheng, Chawathe, LaMarca, Krumm. Mobisys 2005  Monkey See, Monkey Do: A tool for TCP Tracing and Replaying Cheng, Hoezle, Cardwell, Savage, Voelker. USENIX 2004  Fatih: Detecting and Isolating Malicious Routers Mizrak, Cheng, Marzullo, Savage. DSN 2005  Total Recall: System Support for Automated Availability Management Bhagwan, Tati, Cheng, Savage, Voelker. NSDI 2003