1
Naiad: A Timely Dataflow System
Presented by Leeor Peled
Advanced Topics in Computer Architecture, April 2014
2
Greece, sometime B.C.
Naiads (Ναϊάδες) were a type of nymph (female spirit) who presided over fountains, wells, springs, streams, brooks, and other bodies of fresh water.
Dryads (Δρυάδες) are tree nymphs, or female tree spirits.
There’s one hiding over here!
3
Dryad (EuroSys’07)
4
Distributed dataflow frameworks
- Batch processing: MapReduce, Dryad, Spark – deterministic, stateless, synchronous
- Stream processing: Storm, MillWheel, TimeStream – asynchronous
- Graph processing: Pregel, GraphLab, Giraph
5
Graph descriptive language
6
Timely Dataflow – from 10k ft
”Naiad” (SOSP’13)
- The framework no longer requires a DAG; simple loop contexts are allowed
- Vertices are stateful and extended with epoch-based timestamps (sketched below)
- Datasets can be preserved across epochs
- No global coordination on the critical path (only some lazy “GC”)
- A choice between immediate responsiveness and aggregated syncs
- Better suited to implementing graph algorithms over big-data streaming input
- …and it’s open source!
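The epoch-based timestamps above combine an input epoch with one counter per nested loop context. A minimal C# sketch of such a timestamp and its ordering (hypothetical type and method names, not Naiad’s actual classes; assumes both timestamps come from the same loop context):

struct Timestamp
{
    public int Epoch;           // which input epoch the message belongs to
    public int[] LoopCounters;  // one iteration counter per nested loop context

    // t1 can influence t2 only if every component of t1 is <= the matching
    // component of t2 (a partial order over timestamps).
    public bool CouldResultIn(Timestamp other)
    {
        if (Epoch > other.Epoch) return false;
        for (int i = 0; i < LoopCounters.Length; i++)
            if (LoopCounters[i] > other.LoopCounters[i]) return false;
        return true;
    }
}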
7
Processing example
8
For comparison: the Map/Reduce model
9
Timeliness in depth
10
Timeliness (2)
11
Timeliness (3)
Message-passing API:
Callbacks:
- OnRecv(Edge, Message, Timestamp)
- OnNotify(Timestamp)
Calls:
- SendBy(Edge, Message, Timestamp)
- NotifyAt(Timestamp)
Restrictions / guarantees:
- Messages are queued and may be reordered
- OnNotify is guaranteed to occur after all OnRecv calls (to the same vertex) with earlier times – it works like a barrier
- Calls must advance time monotonically
(See the DistinctCount example on the next slide.)
12
Example code

class DistinctCount<S, T> : Vertex<T>
{
    Dictionary<T, Dictionary<S, int>> counts;

    // Low-latency processing: distinct records are sent upon receive
    void OnRecv(Edge<S, T> e, S msg, T time)
    {
        if (!counts.ContainsKey(time))
        {
            counts[time] = new Dictionary<S, int>();
            this.NotifyAt(time); // ask for a callback once `time` is complete
        }
        if (!counts[time].ContainsKey(msg))
        {
            counts[time][msg] = 0;
            this.SendBy(output1, msg, time);
        }
        counts[time][msg]++;
    }

    // Synchronization point: runs only after all OnRecv calls for `time`
    void OnNotify(T time)
    {
        foreach (var pair in counts[time])
            this.SendBy(output2, pair, time);
        counts.Remove(time);
    }
}
13
Framework implementation
A single-threaded scheduler keeps track of events:
- Pointstamp = {Location, Timestamp}, initialized per epoch and per input vertex
- For each active (pending) pointstamp, maintain:
  - Occurrence count (number of events associated with it)
  - Precursor count (number of “could-result-in” events)
- “Frontier”: the set of pointstamps with precursor count 0 – it is safe to deliver notifications at these times
A distributed update protocol lets workers agree on the global epoch state (see the sketch below).
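A minimal sketch of this bookkeeping (hypothetical names, not Naiad’s internal API); the could-result-in test is supplied from outside, since it derives from the logical graph, and the path that registers new pointstamps is omitted:

using System;
using System.Collections.Generic;

class Pointstamp { public int Location; public Timestamp Time; } // reuses the Timestamp sketch above

class ProgressTracker
{
    class Entry { public int Occurrences; public int Precursors; }

    // One entry per active (pending) pointstamp.
    readonly Dictionary<Pointstamp, Entry> active = new Dictionary<Pointstamp, Entry>();

    // Could-result-in relation, derived from the logical graph.
    readonly Func<Pointstamp, Pointstamp, bool> couldResultIn;

    public ProgressTracker(Func<Pointstamp, Pointstamp, bool> couldResultIn)
    {
        this.couldResultIn = couldResultIn;
    }

    // A pointstamp is in the frontier when no active pointstamp precedes it;
    // notifications at frontier times are safe to deliver.
    public bool InFrontier(Pointstamp p)
    {
        return active.TryGetValue(p, out var e) && e.Precursors == 0;
    }

    // Retire one event at p. When the last event retires, p leaves the active
    // set and the precursor counts of every pointstamp it could-result-in are
    // decremented, possibly moving them into the frontier.
    public void Retire(Pointstamp p)
    {
        var e = active[p];
        if (--e.Occurrences > 0) return;
        active.Remove(p);
        foreach (var kv in active)
            if (couldResultIn(p, kv.Key))
                kv.Value.Precursors--;
    }
}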
14
Logical graph deployment
- With parallel physical nodes, the logical graph is cloned on each worker
- Each node passes messages locally by default
- The framework supports partitioning functions for routing – this enables hash/key partitioning, map/reduce logic, and group-by operations (a sketch follows)
- “Could-result-in” relations are computed over the logical graph (less precise, but this simplifies scaling)
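A partitioning function of this kind can be as simple as hashing the message key onto the set of workers; a sketch (hypothetical helper, not Naiad’s API):

// Routes each message to the worker that owns its key; key-based routing is
// what makes group-by and reduce-style operators work across physical nodes.
static int PartitionByKey<TKey>(TKey key, int workerCount)
{
    // Mask to a non-negative hash value, then map onto the worker count.
    return (key.GetHashCode() & int.MaxValue) % workerCount;
}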
15
Optimizations
- Checkpointing (per vertex) for fault tolerance
- Disabling Windows TCP send delays (Nagle’s algorithm) – see the snippet below
- Reduced backoff times on concurrency conflicts
- Reduced garbage-collection frequency
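For the Nagle item: on .NET, the standard way to disable the delay is the socket’s NoDelay flag, e.g.:

using System.Net.Sockets;

// Disable Nagle's algorithm (TCP_NODELAY): small messages are sent
// immediately instead of being coalesced – lower latency, more packets.
var client = new TcpClient();
client.NoDelay = true;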
16
Results: micro benchmarks
17
Results: Real-world apps
18
Discussion…
- Focus is on in-memory workloads
- Lock contention and producer/consumer issues
- The performance comparison is made against externally reported results – fishy…
- Automatic mapping from the logical to the physical graph