Jigsaw: Solving the Puzzle of Enterprise Analysis Yu-Chung Cheng John Bellardo, Peter Benko, Alex C. Snoeren, Geoff Voelker, Stefan Savage
2 Enterprise ? Easy. Blanket the building with APs for 100% coverage
3 A familiar story... “The wireless is being flaky.” “Flaky how?” “Well, my connections got dropped earlier and now things seem very sloooow.” “OK, we will take a look” “Wait, wait … it’s ok now” “Mmm… well let us know if you have any more problems.” Now what? Employee Support
4 What are the problems? Contention with nearby wireless devices? Bad AP channel assignments? Microwave ovens? Congestions in the Internet? Bad interaction between TCP and ? Rogue access points? Poor choice of APs (weak signal)? Incompatible user software/hardware? DoS attack?! …… Need to monitor the wireless network across time, locations, channels, and protocol layers
5 How to monitor ? MeasurementLimitations AP tracesOnly packets that AP sees 1 passive snifferLimited coverage N passive sniffers in 1 channel Limited frequency (roaming, broadband interference, AP channel assignments) N passive sniffers of all channels Need synchronized traces
6 Jigsaw Measure real large wireless networks Collect every possible information PHY/Link/IP/TCP/App layer trace Collect every single wireless packet Need many sniffers for 100% coverage Provide global view of wireless networks across time, locations, channels, and protocol layers
7 New CSE building at UCSD 150k square feet 4 floors >500 occupants 150 faculty/staff 350 students Building-wide WiFi 39 access points b/g Channel 1, 6, 11 active clients anytime Daily traffic ~5 GB
8 UCSD passive monitor system Overlays existing WiFi network Series of passive sniffers Blanket deployment over 4 floors 39 sensor pods (156 radios) 4 radios per pod, cover all channels in use Captures all activities Including CRC/PHY events Stream back over wired network to a centralized storage
9 Jigsaw design Traces synchronization and unification L2 state reconstruction TCP flow reconstruction
10 Synchronization Create a virtual global clock To keep unification working Critical evidence for analysis If A and B are transmitting at the same time they could interfere If A starts transmitting after B has started then A can’t hear B Require fine time-scales (10-50us) NTP is >100 usec accuracy HW clocks (TSF) have 100PPM stability Time (s) TSF diff (us) TSF diff of two sniffers
11 Traces synchronization and unification Sniffers label packets w/ local timestamp (TSF) Need a global clock Estimate the offset between TSF and the global clock for each sniffer
12 Trace unification (ideal) Time
13 Trace unification (reality) Time JFrame 1JFrame 4JFrame 5JFrame 3JFrame 2 Jigsaw unified trace
14 Challenge: sync at large-scale How to bootstrap? Goal: estimate the offset between TSF and the global clock for each sniffer Time reference from one sniffer to the other Sync across channels Dual radios on same sniffer slaved to same clock Manage TSF clock skews Continuously re-adjust offsets when unifying frames ToTo 1234 ∆t 1 ∆t 2
15 Jigsaw in action Jigsaw unifies 156 traces into one global trace Covers 99% of AP frames, 96% of client frames StartsJan 24,2006 (Tuesday) Duration24 hr Total APs107 (39 CSE) CSE Clients1026 Active CSE clients anytime Total Events2,700M PHY/CRC Errors48% Valid Frames52% JFrames530M Events per Jframe 2.97
16 L2-ACK Beacon Synchronized Valid packets CRC errors PHY errors
17 Jigsaw syncs 99% frames < 20us Measure sync. quality by max dispersion per Jframe 20 us is important threshold back-off time is 20 us inter frame time is 50 us Sufficient to infer many events
18 Hidden terminal problems Infer transmission failure by absence of ACK Estimate conditional probability of loss given simultaneous transmission by some hidden- terminal senderreceiverhidden terminal How much packet is lost due to hidden- terminal? ?
19 Hidden Terminal Problems 10% of sender-receiver pairs have over 10% losses due to hidden terminals
20 Trace analysis b/g interactions ARP Broadcast Storms TCP loss rate in wireless vs. in Internet Microwave Ovens
21 Moving forward Developed “Jigsaw” that allows 24x7 monitor system in UCSD CSE w/ 156 sniffers Global fine-grained view of large wireless network (time, locations, channels) Jigsaw software will be available shortly Ongoing work Root cause diagnoses of end-to-end performance in wireless networks Standard wireless problem analysis Ex. Exposed terminal problems
22 Q & A Live traffic monitoring and more information at