Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis


1 Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis
Yu-Chung Cheng, John Bellardo, Mikhail Afanasyev, Patrick Verkaik, Jennifer Chiang, Peter Benko, Alex C. Snoeren, Geoff Voelker, Stefan Savage
Department of Computer Science & Engineering, University of California, San Diego
11/16/2018 Yu-Chung Cheng/Qualcomm CR&D

2 The promise of Enterprise 802.11?

3 A familiar story...
Employee: “The wireless is being flaky.”
Support: “Flaky how?”
Employee: “Well, my connections got dropped earlier and now things seem very sloooow.”
Support: “OK, we will take a look.”
Employee: “Wait, wait … it’s ok now.”
Support: “Mmm… well, let us know if you have any more problems.”
Now what?

4 What are the problems?
Contention with nearby wireless devices?
Bad AP channel assignments?
Microwave ovens?
Congestion in the Internet?
Bad interaction between TCP and 802.11?
Rogue access points?
Poor choice of APs (weak signal)?
Incompatible user software/hardware?
DoS attack?!
Network admins are not paid enough to figure this out…

5 Why is this hard to understand?
RF domain defies traditional networking intuition
  Wireless topology is not well modeled as a graph
  Asymmetry is common for all characteristics: packet loss, bandwidth, interference, etc.
  Variability in all characteristics is caused by distance/mobility, orientation, temperature, RF workload, etc.
Automatic management: MAC, rate control, access point selection
  Huge inter-vendor variation
Scale: lots of different RF domains
Mobility management is complex
  The undeclared layer 2.5: L2 (assoc, scan, etc.), ARP, DHCP, registration, etc.

6 Goal: What’s going on in my network?
Real-time diagnosis of wireless network problems
  In a production network
Identify components of delay at the physical, link, network and transport layers
Deconstruct full end-to-end behavior
  Interactions between environment, PHY/MAC, TCP/UDP
Ultimately: understand the most important sources of performance problems and opportunities for improvement

7 New CSE building at UCSD
150k square feet
  4 floors + basement
>500 occupants
  150 faculty/staff
  350 students
Building-wide WiFi
  40 access points
  802.11b/g
  Channels 1, 6, 11
  active clients anytime
Daily traffic ~10 GB

8 UCSD passive monitor system
Overlays existing WiFi
Series of passive sniffers
  Blanket deployment for best coverage
  48 sensor pods (192 radios)
  4 radios per pod (cover all channels in use)
Captures/timestamps all activity (including physical errors)
Streams back to a centralized server (>6 TB storage)

9 Jigsaw system
Constructs a single view of all activity
  Unifies frame views from all radios
  Transitive synchronization across all views (max dispersion ~10 us; 80% within 5 us)
Reconstructs discrete L2, L3 and L4 state
  Infers unseen events and host state (vantage point limitations) via protocol behavior
Designed to make it easy to add analysis modules
  Physical fingerprints, contention inference, DHCP analysis, etc.
  Easy to measure cross-layer interactions
Yu-Chung Cheng, John Bellardo, Peter Benko, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage, Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis, SIGCOMM 2006

10 Trace synchronization and unification
Sniffers label packets with a local timestamp (TSF)
Need a global clock
Estimate the offset between TSF and the global clock for each sniffer
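
One way to picture the offset estimate: when two sniffers capture the same frame, the difference between their local TSF timestamps for that frame is one sample of their relative offset. A minimal Python sketch, assuming frames can be matched by some content key; the function name, the keys, and the data layout are illustrative and not taken from Jigsaw itself:

```python
from statistics import median

def estimate_offset_us(ref_capture, other_capture):
    """Estimate (other TSF - reference TSF) in microseconds.

    Each capture maps a hypothetical frame key (for example a hash of the
    frame contents) to the local TSF timestamp at which that sniffer saw it.
    """
    shared = ref_capture.keys() & other_capture.keys()
    if not shared:
        raise ValueError("no frame was seen by both sniffers")
    # The median is less sensitive to a few mismatched frames than the mean.
    return median(other_capture[k] - ref_capture[k] for k in shared)

# Illustrative timestamps in microseconds (made up for this example).
ref = {"f1": 1_000, "f2": 2_500, "f3": 9_000}
other = {"f1": 1_040, "f2": 2_541, "f3": 9_039, "f4": 12_000}
offset = estimate_offset_us(ref, other)  # per-frame differences are 40, 41, 39
```

Because TSF clocks drift, an estimate like this would only hold for a short time and would need to be refreshed as new shared frames arrive.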

11 Part of a Jigsaw trace (L1/L2)
[Figure: part of a synchronized Jigsaw trace. Axes: time vs. monitors. Legend: received frames; received with CRC error; HW corrupted. Labels: Client 1, Client 2.]

12 Jigsaw in Action
Physical layer inference
Link layer modeling
Transport layer flow reconstruction
End-to-end cross-layer diagnosis
  Media access problems
  Mobility management overhead

13 Hidden terminal interference
Co-channel interference from other transmitters
For sender s and receiver r, estimate the conditional probability of loss given a simultaneous transmission by interferer i
Current finding: hidden terminals are not such a big deal (with some exceptions)
[Diagram: sender s, receiver r, interferer i]
Normal: s sends data, r sends ACK
Hidden terminal: s sends data, r’s reception is interfered with by i
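
The conditional probability on this slide can be read as a simple ratio: of the frames s sent to r while i was also transmitting, what share was lost? A minimal Python sketch, assuming each transmission has already been labeled with whether i overlapped it and whether it was lost; the input layout and the sample data are hypothetical:

```python
def loss_given_interferer(transmissions):
    """Estimate loss probabilities for frames sent from s to r.

    transmissions: iterable of (overlapped_by_i, lost) booleans.
    Returns (P(loss | overlap), P(loss | no overlap)); an entry is None
    when there are no samples for that case.
    """
    counts = {True: [0, 0], False: [0, 0]}  # overlap flag -> [lost, total]
    for overlapped, lost in transmissions:
        counts[bool(overlapped)][1] += 1
        if lost:
            counts[bool(overlapped)][0] += 1

    def ratio(lost, total):
        return lost / total if total else None

    return ratio(*counts[True]), ratio(*counts[False])

# Made-up sample: 4 overlapped frames (3 lost), 4 clear frames (1 lost).
samples = [(True, True), (True, True), (True, False), (True, True),
           (False, False), (False, False), (False, True), (False, False)]
p_overlap, p_clear = loss_given_interferer(samples)  # 0.75 and 0.25
```

Comparing the two values matters: a high loss rate under overlap only points to a hidden terminal if the loss rate without overlap is clearly lower.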

14 Broadband interference
[Figure: broadband interference; annotated times: ~9 am, 12-2 pm]

15 Interference fingerprints
Microwave oven: magnetron driven by a half-wave 60 Hz voltage
Automatically detect and tag “microwave-like” physical interference
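
A source driven by a half-wave 60 Hz voltage should be active during roughly one half of each ~16.7 ms mains cycle, so its physical errors should cluster in one half of the cycle when timestamps are folded modulo that period. The Python sketch below illustrates that idea only; it is not the detector used in Jigsaw, and the function name, bin count, and sample data are assumptions:

```python
def half_cycle_fraction(error_times_us, mains_hz=60.0, bins=20):
    """Share of events that fall inside the best half of the mains cycle.

    Values near 1.0 suggest a half-wave, mains-synchronous source;
    values near 0.5 suggest no 60 Hz structure.
    """
    if not error_times_us:
        raise ValueError("need at least one event")
    period_us = 1e6 / mains_hz
    counts = [0] * bins
    for t in error_times_us:
        phase = (t % period_us) / period_us          # position within the cycle
        counts[min(int(phase * bins), bins - 1)] += 1
    half = bins // 2
    # Slide a half-cycle window around the (circular) phase histogram.
    best = max(sum(counts[(start + k) % bins] for k in range(half))
               for start in range(bins))
    return best / len(error_times_us)

period = 1e6 / 60.0
# Made-up data: errors confined to the first half of every cycle...
bursty = [n * period + p for n in range(50) for p in (1000.0, 3000.0, 5000.0, 7000.0)]
# ...versus errors spread evenly over the whole cycle.
uniform = [n * period + (k + 0.5) * period / 20 for n in range(50) for k in range(20)]
bursty_share = half_cycle_fraction(bursty)
uniform_share = half_cycle_fraction(uniform)
```

A real detector would also have to tolerate mains frequency drift and background errors from other sources, which this sketch ignores.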

16 Link layer Contention: a challenge to measure
Three kinds of network events
  Directly observable: packet sent (easy)
  Directly inferable: packet received (harder)
  Indirectly inferable: packet delayed by contention (surprisingly tricky)
Key issues
  Need to know input and output at each AP
  Need to model internal state of AP

17 Model
Infer the time at which a packet is queued on the AP (via wireline analysis)
  Ethernet serialization delay
  AP bus overhead (2 I/O)
  AP processing overhead
Determine whether the previous packet had cleared the AP (via wireless analysis)
  Head-of-line blocking (delay attributable to queuing)
  No head-of-line blocking (delay attributable to contention/MAC)
[Figure legend: directly observed; inferred/modeled]
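
The split between queuing delay and contention/MAC delay can be sketched as follows: a packet reaches the head of the AP queue either when it is enqueued or when the previous packet clears the air, whichever is later. This is a simplified Python illustration under that assumption; the `Packet` fields and the sample numbers are hypothetical, and the enqueue time is taken as already inferred from the wired side:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    enqueue_us: float   # inferred arrival in the AP queue (wired capture plus modeled overheads)
    tx_start_us: float  # observed start of the wireless transmission
    tx_end_us: float    # observed end of the frame exchange

def split_delay(packets):
    """Return (queue_delay_us, access_delay_us) for each packet, in send order."""
    result = []
    prev_end = float("-inf")
    for p in packets:
        # The packet cannot contend for the medium before its predecessor is done.
        head_of_line_at = max(p.enqueue_us, prev_end)
        queue_delay = head_of_line_at - p.enqueue_us    # head-of-line blocking
        access_delay = p.tx_start_us - head_of_line_at  # contention/MAC
        result.append((queue_delay, access_delay))
        prev_end = p.tx_end_us
    return result

# Made-up timeline: the second packet arrives while the first is still in flight.
pkts = [Packet(0, 200, 600), Packet(100, 850, 1200), Packet(2000, 2150, 2500)]
delays = split_delay(pkts)  # [(0, 200), (500, 250), (0, 150)]
```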

18 Access delay (Dacc) at an AP
[Figure: components of access delay]
Contention beyond backoff
Contention during DIFS convolved with pkt backoff
Mandatory backoff for last pkt: 0-15 slot times (20 us each)
Distributed Inter-Frame Space (50 us)
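
With the numbers on this slide (DIFS of 50 us, backoff of 0-15 slots at 20 us each), the access delay of an uncontended packet has a known range, and anything beyond it can be attributed to contention. A small Python sketch of that arithmetic, assuming the backoff draw is uniform over 0-15 and, for the second function, that the draw is known; the function names are illustrative:

```python
DIFS_US = 50            # Distributed Inter-Frame Space, from the slide
SLOT_US = 20            # slot time, from the slide
MAX_BACKOFF_SLOTS = 15  # initial backoff range 0-15, from the slide

def uncontended_access_delay_us(backoff_slots):
    """Access delay on an idle channel for a given backoff draw."""
    if not 0 <= backoff_slots <= MAX_BACKOFF_SLOTS:
        raise ValueError("backoff must be between 0 and 15 slots")
    return DIFS_US + backoff_slots * SLOT_US

def contention_beyond_backoff_us(observed_access_delay_us, backoff_slots):
    """Delay left over after subtracting DIFS and the mandatory backoff."""
    return max(0, observed_access_delay_us - uncontended_access_delay_us(backoff_slots))

best_case = uncontended_access_delay_us(0)    # 50 us
worst_case = uncontended_access_delay_us(15)  # 350 us
mean_case = sum(uncontended_access_delay_us(s) for s in range(16)) / 16  # 200.0 us
```

In a passive trace the backoff draw is not directly visible, which is part of why the slide describes the contention component as convolved with the backoff.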

19 End-to-end cross-layer diagnoses
Media access problems
Mobility overhead

20 Pathologies
802.11b faster than g
  Significant unsuccessful effort over 12 months by IT groups (and vendor) to understand the problem
Issue
  Avaya AP attempts only one retry for g frames in “protection mode”
  High-rate transmissions are more sensitive to noise
  Exports many more losses to IP -> TCP backoff
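
Why a single retry exports so many more losses to IP can be seen with a back-of-the-envelope model: if each transmission attempt fails independently with probability p, a frame is dropped only when the first attempt and every retry fail. A minimal Python sketch under that independence assumption (real losses are often correlated); the 20% loss rate and the six-retry comparison are illustrative values, not measurements from the talk:

```python
def residual_loss(per_attempt_loss, retries):
    """Probability that a frame is still lost after the first attempt
    plus `retries` retransmissions, assuming independent losses."""
    if not 0.0 <= per_attempt_loss <= 1.0 or retries < 0:
        raise ValueError("loss must be in [0, 1] and retries non-negative")
    return per_attempt_loss ** (retries + 1)

one_retry = residual_loss(0.2, retries=1)    # 0.2 ** 2, about 4% of frames dropped
six_retries = residual_loss(0.2, retries=6)  # 0.2 ** 7, about 0.001%
```

Under these assumptions the one-retry policy hands TCP thousands of times more drops, each of which can trigger congestion backoff.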

21 Pathologies (2)
Big L2 retry delay (> 10 ms). Why?
Broadcast frames have > 50 ms avg delay. Why? Same reason
  If any client requests power-save mode, then the AP must buffer broadcast frames until a beacon is sent
  Pending frame exchange is postponed until the broadcast burst is completed

22 Pathologies (3)
802.11g protection mode
  Used when 802.11b clients are present
  802.11g client sends a pilot CTS-to-Self frame (slow) before data
  Overhead is about 100% air time
Issue: we still have many 11b clients
  But most 11b traffic is bursty, so there is no need to use protection all the time
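
The "about 100%" overhead figure is plausible from airtime arithmetic alone. The Python sketch below compares the airtime of a CTS-to-Self sent at a slow 802.11b rate with that of a 1500-byte data frame at 54 Mbps. None of these numbers appear on the slide: the 192 us long-preamble PLCP, the 14-byte CTS, the 10 us SIFS, and the OFDM framing constants are standard 802.11b/g timing values assumed here, and the 1 Mbps CTS rate and frame size are illustrative choices:

```python
import math

SIFS_US = 10.0  # assumed gap between the CTS-to-Self and the data frame

def cts_to_self_us(rate_mbps=1.0, plcp_us=192.0, cts_bytes=14):
    """Airtime of a CTS frame at an 802.11b rate with the long preamble."""
    return plcp_us + cts_bytes * 8 / rate_mbps

def ofdm_data_us(mac_bytes, rate_mbps=54.0):
    """Airtime of an 802.11g OFDM frame: 20 us preamble and SIGNAL field,
    4 us symbols carrying 16 SERVICE bits + payload + 6 tail bits,
    plus the 6 us signal extension."""
    bits_per_symbol = rate_mbps * 4  # 216 bits per symbol at 54 Mbps
    symbols = math.ceil((16 + 8 * mac_bytes + 6) / bits_per_symbol)
    return 20.0 + 4.0 * symbols + 6.0

data_us = ofdm_data_us(1500)                     # 250.0 us
protect_us = cts_to_self_us() + SIFS_US          # 314.0 us at 1 Mbps
overhead = protect_us / data_us                  # about 1.26, i.e. over 100%
overhead_2mbps = (cts_to_self_us(2.0) + SIFS_US) / data_us  # about 1.03
```

Under these assumptions the protection frame costs roughly as much airtime as the data it protects, consistent with the slide's estimate.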

23 Pathologies (4)
Lots of “vendor” hacks
  Do not respect CSMA
    Burst packets in a row
  Early retransmission
    Do not wait for the full ACK time
  Do not respect protection mode
  Do not do exponential back-off (linear instead)
  Announce very large transmission durations
    Could mount a DoS, but it does not work in practice
  Do not increment sequence numbers

24 TCP diagnoses breakdown
Major causes: slow receiver, AP retry bug, protection mode

25 Mobility management overhead
Around 30% of time is spent in mobility management (DHCP, ARP, association, etc.)

26 Pathologies (5)
Large startup delays (10s of secs)
  Client requests DHCP lease for private address space ( /16)
  Wireless management system (Vernier) grants an address with a short timeout and won’t refresh it
  Client has to do two DHCP transactions with long timeouts in between

27 Startup delays breakdown
[Figure axis: delay (seconds)]
Major contributors: (gratuitous) ARPs + scans

28 Where to next?
Real-time system for automated detection and evaluation of poor network performance
  Identifies problem flows and isolates potential causes of poor performance
City-wide network monitoring
  Currently deployed in a Bay Area metropolitan network
Future: explore deployment and protocol fixes

29 Q & A
Live traffic monitoring and more information at

30 Synchronization
Create a virtual global clock
  To keep unification working
  Critical evidence for analysis
    If A and B are transmitting at the same time, they could interfere
    If A starts transmitting after B has started, then A can’t hear B
Requires fine time scales (10-50 us)
  NTP has >100 us accuracy
  HW clocks (TSF) have 100 PPM stability
[Figure: TSF diff of two sniffers; x-axis: time (s); y-axis: TSF diff (us)]
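
The 100 PPM figure explains why a one-time offset is not enough: 1 ppm of relative skew is 1 us of drift per second, so two clocks that differ by 100 ppm use up a 10 us budget in a tenth of a second. A tiny Python sketch of that arithmetic, assuming a constant relative skew; the function name is illustrative:

```python
def seconds_until_drift(budget_us, relative_skew_ppm):
    """Time for two clocks to drift apart by budget_us microseconds.

    1 ppm of relative skew equals 1 us of drift per second.
    """
    if relative_skew_ppm <= 0:
        raise ValueError("skew must be positive")
    return budget_us / relative_skew_ppm

resync_10us = seconds_until_drift(10, 100)  # 0.1 s
resync_50us = seconds_until_drift(50, 100)  # 0.5 s
```

This is why offsets have to be re-estimated continuously rather than measured once.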

31 Trace unification (ideal)
[Figure: ideal trace unification along a time axis]

32 Trace unification (reality)
[Figure: Jigsaw unified trace along a time axis, showing JFrame 1 through JFrame 5]
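
Unification can be pictured as mapping every sniffer's observations onto the global clock and then grouping observations of the same frame that land within a small time window. The Python sketch below shows that grouping step only, assuming per-sniffer offsets are already known and frames can be matched by a content key; the dict layout, the 10 us window, and the sample data are illustrative and not Jigsaw's actual data structures:

```python
def unify(observations, offsets_us, window_us=10):
    """Group per-sniffer observations into JFrame-like records.

    observations: list of (sniffer_id, local_tsf_us, frame_key).
    offsets_us: sniffer_id -> value added to local TSF to reach global time.
    Returns records ordered by the global time of their first instance.
    """
    events = sorted((tsf + offsets_us[s], key, s) for s, tsf, key in observations)
    jframes = []
    latest_by_key = {}  # frame_key -> most recent record with that content
    for t, key, sniffer in events:
        jf = latest_by_key.get(key)
        # Same content far apart in time is a different transmission.
        if jf is None or t - jf["first_us"] > window_us:
            jf = {"key": key, "first_us": t, "instances": []}
            jframes.append(jf)
            latest_by_key[key] = jf
        jf["instances"].append((sniffer, t))
    return jframes

obs = [("A", 1000, "beacon-17"), ("B", 5003, "beacon-17"), ("C", 904, "beacon-17"),
       ("A", 1400, "data-42"), ("B", 5398, "data-42"), ("A", 3000, "beacon-17")]
offsets = {"A": 0, "B": -4000, "C": 100}
unified = unify(obs, offsets)
instance_counts = [len(jf["instances"]) for jf in unified]  # [3, 2, 1]
```

The last observation has the same content as the first group but arrives 2 ms later, so it becomes its own record instead of being merged.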

33 Challenge: sync at large-scale
[Diagram: sniffers 1-4, reference time To, offsets ∆t1 and ∆t2]
How to bootstrap?
  Goal: estimate the offset between TSF and the global clock for each sniffer
  Time reference passed from one sniffer to the other
Sync across channels
  Dual radios on the same sniffer are slaved to the same clock
Manage TSF clock skews
  Continuously re-adjust offsets when unifying frames
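
Passing the time reference from one sniffer to the next amounts to walking a graph: sniffers are nodes, an edge exists where two sniffers heard a common frame, and each edge carries their pairwise offset. A minimal Python sketch using breadth-first search from a chosen root sniffer; the function name and input layout are hypothetical, and clock skew is ignored, so this is a static snapshot rather than the continuous re-adjustment the slide describes:

```python
from collections import deque

def global_offsets(pairwise_us, root):
    """Propagate offsets outward from a root sniffer.

    pairwise_us: {(a, b): d} meaning clock_b - clock_a = d microseconds,
    for pairs of sniffers that heard at least one common frame.
    Returns sniffer -> (its clock - root clock) for every reachable sniffer.
    """
    neighbors = {}
    for (a, b), d in pairwise_us.items():
        neighbors.setdefault(a, []).append((b, d))
        neighbors.setdefault(b, []).append((a, -d))
    offsets = {root: 0}
    queue = deque([root])
    while queue:
        cur = queue.popleft()
        for nxt, d in neighbors.get(cur, []):
            if nxt not in offsets:  # first (shortest) path wins
                offsets[nxt] = offsets[cur] + d
                queue.append(nxt)
    return offsets

# Sniffers 1-4 in a chain: only neighbors hear common frames.
pairs = {(1, 2): 40, (2, 3): -15, (3, 4): 7}
offsets = global_offsets(pairs, root=1)  # {1: 0, 2: 40, 3: 25, 4: 32}
```

Estimation error accumulates along each hop, which is one reason a shortest path from the root is a reasonable default.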

34 Jigsaw syncs 99% of frames to < 10 us
Measure sync quality by max dispersion per JFrame
10 us is an important threshold
  Back-off slot time is 20 us
  Inter-frame time is 50 us
Sufficient to infer many events
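
The quality metric can be sketched by taking dispersion to mean the spread between the earliest and latest global timestamps among the instances of one JFrame (an interpretation assumed here). The Python example below computes that spread and the share of JFrames under a threshold; the function names and timestamps are made up:

```python
def max_dispersion_us(instance_times_us):
    """Spread between the earliest and latest timestamp of one JFrame."""
    return max(instance_times_us) - min(instance_times_us)

def fraction_within(jframes, threshold_us=10):
    """Share of multi-instance JFrames whose dispersion is below the threshold.

    jframes: list of lists of global timestamps, one list per JFrame.
    Single-instance JFrames carry no dispersion information and are skipped.
    """
    dispersions = [max_dispersion_us(ts) for ts in jframes if len(ts) > 1]
    if not dispersions:
        raise ValueError("no JFrame was seen by more than one sniffer")
    return sum(d < threshold_us for d in dispersions) / len(dispersions)

# Dispersions are 4, 2 and 12 us; the single-instance JFrame is ignored.
sample = [[1000, 1003, 1004], [1398, 1400], [5000, 5012], [7000]]
share = fraction_within(sample)  # 2 of 3 JFrames are under 10 us
```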

35 Sensor pods
Pod = pair of monitors
  Separated by ~1 meter
  >35 dB separation at 2.4 GHz
Monitor = Soekris
  266 MHz 586-class CPU
  128 MB RAM, 64 MB Flash
  100 Mbps Ethernet
  Dual Atheros a/b/g radios
  Power-over-Ethernet (semi-std)
Jigdump software
  Captures/timestamps all activity (including physical errors)
  Streams back to a centralized server (>6 TB storage)

