1 Co-Designing Software and Hardware for Declarative Performance Measurement
Srinivas Narayana, MIT CSAIL, October 7, 2016

2 An example: High tail latencies
High tail latencies delay the completion of flows (and thus of the applications that depend on them).

3 An example: High tail latencies
Where is the queue buildup? How did the queues build up (UDP on-off traffic? fan-in?)? Which other flows cause the buildup? What is the remedy (throttle UDP? change the traffic pattern?)?

4 An example: High tail latencies
What measurement support do you need to answer questions like these? Where is the queue buildup? How did the queues build up? Which other flows cause it? What is the remedy?

5 Existing measurement support
Sampling (NetFlow, sFlow): may not sample the packets or events you care about
Counting (OpenSketch, UnivMon, …): counts traffic only; time granularity too coarse
Packet capture (Endace, Niksun, Fmadio, …): too much data to collect everywhere and always
Endpoint data collection (Pingmesh, …): data distributed over several hosts; insufficient network visibility

6 Network performance questions
Flow-level packet drop rates
Queue latency EWMA per connection
Route flapping incidents
Persistently long queues
TCP incast and outcast
Interference from bursty traffic
Incidence of TCP reordering and retransmissions
Understanding congestion control schemes
Incidence and lengths of flowlets
High end-to-end latencies
...

7 Can we build better performance monitoring tools for networks?

8 An example: High tail latencies
Switches have precise visibility into network conditions…

9 Performance monitoring on switches
(+) Precise visibility into performance (e.g., queue buildup, contributing flows, …)
(+) Speed: packets processed at line rate (many Gb/s per port)
(-) Hardware is costly and time-consuming to build (2-3 years)
Ideally, new hardware should suit diverse measurement needs.

10 Co-design software and hardware
(1) Design a declarative query language to ask performance questions
(2) Design hardware primitives to support the query language at line rate
Does programmability come at the cost of speed?

11 Performance query system
The network operator writes declarative performance queries; the system returns accurate query results with low overhead, which feed diagnostic apps.

12 (1) Declarative language abstraction
Write SQL-like performance queries against an abstract table that contains, for every packet at every queue: packet headers, queue id, queue size, time at queue ingress, time at queue egress, and packet path info.

13 (1) Example performance queries
Queues and traffic sources with high latency:
SELECT srcip, qid FROM T WHERE tout - tin > 1ms

Traffic counters per source and destination:
SELECT srcip, dstip, count, sum_len GROUPBY srcip, dstip

Queue latency EWMA:
SELECT 5tuple, qid, ewma GROUPBY 5tuple, qid
def ewma(lat_est, (tin, tout)):
    lat_est = alpha * lat_est + (1 - alpha) * (tout - tin)

User-defined fold functions; packet ordering matters (recent queue sizes are more significant).
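The GROUPBY-with-fold semantics of the EWMA query can be sketched in plain Python; the packet records, field layout, and alpha value below are illustrative, not from the slides:

```python
# Minimal sketch of per-group fold evaluation for the EWMA query.
# Each packet record updates the aggregation state of its (5tuple, qid) group.

alpha = 0.5  # smoothing factor (hypothetical value)

def ewma(lat_est, tin, tout):
    """User-defined fold: EWMA of per-packet queue latency (tout - tin)."""
    return alpha * lat_est + (1 - alpha) * (tout - tin)

# Hypothetical packet records: (5tuple, qid, tin, tout)
packets = [
    (("10.0.0.1", "10.0.0.2", 6, 1234, 80), 0, 0.0, 2.0),
    (("10.0.0.1", "10.0.0.2", 6, 1234, 80), 0, 3.0, 7.0),
]

state = {}  # (5tuple, qid) -> latency EWMA
for five_tuple, qid, tin, tout in packets:
    key = (five_tuple, qid)
    state[key] = ewma(state.get(key, 0.0), tin, tout)
```

Note that reordering the packet list changes the result: the fold weights recent latency samples more heavily, which is exactly why packet ordering matters.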

14 (2) Hardware primitives
Good news: many existing primitives prove useful!
Selection: match-action rules
Per-packet latency and queue data: in-band network telemetry (INT)

15 (2) Hardware support for aggregation
SELECT 5tuple, ewma GROUPBY 5tuple
Idea: run a key-value store on the switch, mapping each 5tuple to its EWMA. The K-V store must support read-modify-write operations.

16 (2) Challenges in building switch K-V
Run at line rate: roughly a 1 GHz packet rate
Scale to millions of keys (e.g., the number of connections)
No existing memory is both fast enough and large enough!

17 (2) “Split” key-value store

18 Need more information? See our upcoming HotNets ‘16 paper!

22 Why performance monitoring?
Determine network performance bottlenecks
Exonerate the network as the source of problems
Evaluate new proposals for scheduling, congestion control, …: what works?

23 Semantically useful language primitives
Per-packet performance attributes: latency, loss
Isolate traffic with "interesting" performance
Aggregate performance stats over sets of packets
"Simultaneous" occurrences of multiple conditions
Compose results to pose more complex questions

24 The SRAM cache

25 Does caching lead to correct results?
In general, no. The cache "forgets" keys and their values when they are evicted, and previously evicted keys can come back. However, merging the SRAM and DRAM values can eventually produce correct results.

26 Merging example
SELECT COUNT, SUM(pkt_len) GROUPBY srcip, dstip
True COUNT = COUNT_SRAM + COUNT_DRAM
True SUM(pkt_len) = SUM_SRAM + SUM_DRAM
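For associative aggregates like COUNT and SUM, merging is just addition of the SRAM and DRAM parts. A minimal sketch of such a split key-value store follows; the capacity and eviction choice are hypothetical, not from the slides:

```python
# Sketch of a split key-value store for associative folds (COUNT, SUM).
# On eviction, the SRAM value is merged into DRAM by addition; a read
# combines both parts to recover the true aggregate.

from collections import defaultdict

SRAM_CAPACITY = 2                   # illustrative tiny cache
sram = {}                           # small, fast on-chip cache: key -> (count, sum_len)
dram = defaultdict(lambda: (0, 0))  # large, slower backing store

def update(key, pkt_len):
    """Per-packet read-modify-write; evict-and-merge on a miss when full."""
    if key not in sram and len(sram) >= SRAM_CAPACITY:
        evicted, (c, s) = sram.popitem()      # evict some cached key
        dc, ds = dram[evicted]
        dram[evicted] = (dc + c, ds + s)      # merge: true value = SRAM + DRAM
    c, s = sram.get(key, (0, 0))
    sram[key] = (c + 1, s + pkt_len)

def read(key):
    """True COUNT and SUM(pkt_len): add the SRAM and DRAM parts."""
    c, s = sram.get(key, (0, 0))
    dc, ds = dram[key]
    return (c + dc, s + ds)

for key, length in [("A", 100), ("B", 200), ("C", 50), ("A", 100)]:
    update(key, length)
```

After these updates, a read for any key returns the correct totals even though some keys have been evicted from SRAM in between.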

27 Linear-in-state condition
State updates of the form S = A * S + B, where A and B are functions of a bounded history of packets.

28 Linear-in-state update
correct_DRAM = final_SRAM - A^n * default_SRAM + A^n * previous_DRAM
(n: the number of updates applied to the key in SRAM since it was inserted with its default value)
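The merge rule can be checked numerically for the EWMA fold, which is linear-in-state with A = alpha and B = (1 - alpha) * latency. The constants below are illustrative; the point is that eviction-time merging needs only A^n, not the packet history:

```python
# Numeric check of the linear-in-state merge rule:
#   correct_DRAM = final_SRAM - A^n * default_SRAM + A^n * previous_DRAM
# for an EWMA update S = A*S + B, with A = alpha and B = (1 - alpha) * latency.

alpha = 0.8                  # illustrative smoothing factor
default_sram = 0.0           # value a key starts with when inserted into SRAM
previous_dram = 10.0         # stale state left in DRAM after an earlier eviction
latencies = [4.0, 6.0, 5.0]  # hypothetical per-packet latency samples

# Ground truth: apply every update to the true state (starting from DRAM).
true_state = previous_dram
for lat in latencies:
    true_state = alpha * true_state + (1 - alpha) * lat

# What SRAM actually computes: the same updates, but from the default value.
final_sram = default_sram
for lat in latencies:
    final_sram = alpha * final_sram + (1 - alpha) * lat

# Merge at eviction: no packet replay needed, only A^n (here alpha**n).
n = len(latencies)
merged = final_sram - alpha**n * default_sram + alpha**n * previous_dram
```

The merged value matches the ground truth exactly (up to floating-point error), because both states accumulate the same B terms and differ only in the initial value, which A^n scales away.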

29 Current work: Compiler
Detecting the linear-in-state condition
Queries that use multiple key-value stores
Network-wide queries
Nested queries
An equivalent of linear-in-state for multiple tables?

30 How YOU can help
What are useful performance diagnostics questions? How would you evaluate this system? What's a reasonable prototype?

