Using Queries for Distributed Monitoring and Forensics
Atul Singh, Rice University
Peter Druschel, Max Planck Institute for Software Systems
Timothy Roscoe, Intel Research Berkeley
Petros Maniatis, Intel Research Berkeley
Atul Singh/Rice, EuroSys
Building and monitoring a system
Building a distributed system is a complex undertaking:
– Select properties
– Design algorithms
– Implement and deploy
Then switch to monitoring the system:
– Testing, debugging, profiling, tuning
Monitoring is hard and error-prone:
– Distributed state
– Partial faults
– Complex interactions
– Asynchrony
– External factors
Monitoring is hard!
Current state of the art:
– Manual insertion of "printf" statements
– Bringing logs to one place
– Parsing/processing of logs with scripts (Perl/Python) or queries (Astrolabe)
– Offline by nature
Each step is ad hoc and error-prone: expose internal state, probe the exposed state, correlate events, bridge the semantic gap
Declarative systems: building systems via queries
– Declarative specification via queries
– Execution by a distributed query processor
P2 [SOSP'05]: a prototype declarative system
– Concise specifications
– Enables rapid prototyping
We present a monitoring framework for P2 that exposes internals and probes the state:
– Flexible introspection
– Retains the semantics of the application
– Online execution tracing
Overview
– Introduction
– P2 background
– Monitoring framework
– Example applications/performance
– Conclusions
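Example: route operation in P2
Rules have the form "action :- precondition.":
  route(B,K) :- route(A,K), nextHop(A,D,B), D == K.
– route(A,K) is the triggering event; nextHop is application state (a table at each router)
– The rule compiles into a rule strand in the dataflow graph:
  Network In → Join (route.A == nextHop.A) → Select (D == K) → Project route(B,K) → Network Out
– Router A's nextHop table maps K -> B, K' -> D, …; Router B's maps K -> C, K' -> E, …
– So a route(A,K) event at Router A is forwarded to B as route(B,K), and B forwards it on to C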
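To make the rule strand concrete, here is a minimal Python sketch of the Join, Select, and Project elements applied to one route event. It assumes tuples are plain NamedTuples and each router's nextHop table is an in-memory list; the names `Route`, `NextHop`, and `route_strand` are hypothetical, and this is not P2's actual element implementation.

```python
from __future__ import annotations
from typing import NamedTuple

class Route(NamedTuple):      # route(A, K)
    node: str                 # A: node currently holding the event
    key: str                  # K: key being routed

class NextHop(NamedTuple):    # nextHop(A, D, B)
    node: str                 # A: node that owns this table entry
    dest: str                 # D: key covered by this entry
    hop: str                  # B: neighbour to forward to

def route_strand(event: Route, next_hop_table: list[NextHop]) -> list[Route]:
    """Sketch of: route(B,K) :- route(A,K), nextHop(A,D,B), D == K."""
    # Join: route.A == nextHop.A
    joined = [(event, nh) for nh in next_hop_table if nh.node == event.node]
    # Select: D == K
    selected = [(ev, nh) for ev, nh in joined if nh.dest == ev.key]
    # Project: route(B, K), which Network Out would send to B
    return [Route(node=nh.hop, key=ev.key) for ev, nh in selected]

# Hypothetical nextHop tables for the two routers on the slide.
tables = {
    "A": [NextHop("A", "K", "B"), NextHop("A", "K'", "D")],
    "B": [NextHop("B", "K", "C"), NextHop("B", "K'", "E")],
}

at_a = route_strand(Route("A", "K"), tables["A"])
at_b = route_strand(at_a[0], tables["B"])
```

Feeding Router A's output into Router B's strand forwards the event on to C, as in the slide's example.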
Introspection and logging
Introspection at three levels:
– Application state level
– Rule level
– Dataflow level
Systematic instrumentation:
– The system is built from smaller, reusable components (e.g., rule r1 is a Join, Selection, Project strand)
– Logging statements are inserted systematically
Logging data is in the form of tuples:
– Retains the semantics of the application logic
– No need for translation
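The idea that instrumentation lives in reusable components, and that its output is itself tuples, can be sketched as follows. This is a hypothetical Python model, not P2's code: `Element`, `InstrumentedStrand`, and the log row layout `(node, rule_id, element, event, tuple)` are assumptions made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Element:
    """Hypothetical reusable dataflow component (Join, Select, Project, ...)."""
    name: str
    fn: Callable[[tuple], Optional[tuple]]  # returns None to drop the tuple

@dataclass
class InstrumentedStrand:
    """Chains elements and logs every step as a plain tuple."""
    node: str
    rule_id: str
    elements: list
    log: list = field(default_factory=list)  # rows: (node, rule_id, element, event, tuple)

    def push(self, tup: tuple) -> Optional[tuple]:
        for el in self.elements:
            self.log.append((self.node, self.rule_id, el.name, "in", tup))
            tup = el.fn(tup)
            if tup is None:  # element filtered the tuple out
                self.log.append((self.node, self.rule_id, el.name, "drop", None))
                return None
            self.log.append((self.node, self.rule_id, el.name, "out", tup))
        return tup

# Input is an already-joined (A, K, D, B) tuple from route(A,K) and nextHop(A,D,B).
SELECT_D_EQ_K = Element("Select", lambda t: t if t[2] == t[1] else None)
PROJECT_ROUTE = Element("Project", lambda t: (t[3], t[1]))  # route(B, K)

strand = InstrumentedStrand("A", "r1", [SELECT_D_EQ_K, PROJECT_ROUTE])
forwarded = strand.push(("A", "K", "K", "B"))
dropped = strand.push(("A", "K", "K'", "D"))

# Because log rows are tuples, they can be queried like application state.
drops = [row for row in strand.log if row[3] == "drop"]
```

Since the log is just another table of tuples, the same query machinery that runs the application can filter it, which is what "no need for translation" buys.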
Tracing rule executions
We want to step through the execution:
– Each step corresponds to a rule
– Do it in an "online" fashion
For rule-level tracing, we need to trace tuples:
1. Match each output tuple of a rule to its input
2. Track tuples as they go over the wire
(Figure: rule r0 at Node A and rule r1 at Node B, with tuples x, y, z, w flowing between them.)
(1) Tracing rule executions
Matching the input and output tuples of a rule:
– Tap elements at the beginning and end of a rule strand
– An execution tracer tracks rule executions
– Execution records are stored as tuples in the exec table, with columns input, ruleId, output, dest.
(Figure: the execution tracer taps the input and output of rule r1's Join → Selection → Project strand; when input x produces output y, it adds the row x, r1, y, d to the exec table.)
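The input/output taps can be modelled as a wrapper around a strand that, whenever the rule fires, appends an `(input, ruleId, output, dest)` row to an exec table. A minimal sketch, assuming a strand is any function returning `(dest, output_tuple)` or `None`; `ExecutionTracer`, `make_route_strand`, and the `t1, t2, …` ID format are hypothetical.

```python
from __future__ import annotations
import itertools
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical strand signature: input tuple -> (dest, output_tuple), or None
# when the rule does not fire (e.g. the Select drops the tuple).
Strand = Callable[[tuple], Optional[tuple]]

@dataclass
class ExecutionTracer:
    """Hypothetical tracer tapping the start and end of each rule strand."""
    exec_table: list = field(default_factory=list)  # rows: (input_id, rule_id, output_id, dest)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1), init=False, repr=False)

    def new_id(self) -> str:
        return f"t{next(self._ids)}"  # locally unique tuple ID

    def run_rule(self, rule_id: str, strand: Strand, input_id: str, input_tuple: tuple):
        result = strand(input_tuple)  # input tap: we know which tuple entered
        if result is None:            # rule did not fire: no exec row
            return None
        dest, output_tuple = result
        output_id = self.new_id()     # output tap: name the produced tuple
        self.exec_table.append((input_id, rule_id, output_id, dest))
        return output_id, dest, output_tuple

def make_route_strand(next_hop: dict) -> Strand:
    """Simplified, hypothetical strand for route(B,K) :- route(A,K), nextHop(A,K,B)."""
    def strand(tup: tuple):
        _, _node, key = tup           # ("route", A, K)
        hop = next_hop.get(key)
        if hop is None:
            return None
        return hop, ("route", hop, key)
    return strand

tracer = ExecutionTracer()
x = tracer.new_id()
fired = tracer.run_rule("r1", make_route_strand({"K": "B"}), x, ("route", "A", "K"))
missed = tracer.run_rule("r1", make_route_strand({"K": "B"}), tracer.new_id(), ("route", "A", "Z"))
```

Only executions that produce output leave an exec row here; the slides note that real systems also have aborted and pipelined executions, which this sketch does not model.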
(2) Tracing tuples across the wire
Each tuple has a locally unique ID:
– The tuple ID is sent along with the tuple
– Upon receipt, a new tuple is created with a different ID
Hooks in the Network In/Network Out subsystem:
– A record is created in tupleTable with the tuple's local ID, the tuple's remote ID, and the node it came from
(Figure: tuple x leaves Node A via Network Out and arrives at Node B's Network In as y; B's tupleTable records x, y, A.)
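The network hooks can be sketched as a pair of functions: Network Out ships the sender's tuple ID with the payload, and Network In allocates a fresh local ID and records the mapping. This is a hypothetical Python model (`Node`, `network_out`, `network_in` are illustrative names); IDs are per-node integer counters, so the same number can appear on two nodes, which is why each row also stores the source node.

```python
from __future__ import annotations
import itertools
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    # rows: (local_id, remote_id, from_node), in the order listed on the slide
    tuple_table: list = field(default_factory=list)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1), init=False, repr=False)

    def new_id(self) -> int:
        return next(self._ids)  # unique only on this node

    def network_out(self, tuple_id: int, payload: tuple) -> tuple:
        # The sender's tuple ID travels on the wire with the tuple.
        return (self.name, tuple_id, payload)

    def network_in(self, message: tuple):
        src, remote_id, payload = message
        local_id = self.new_id()  # receiver creates a new tuple with its own ID
        self.tuple_table.append((local_id, remote_id, src))
        return local_id, payload

a, b = Node("A"), Node("B")
a.new_id()  # A has already used ID 1 for some other tuple
x = a.new_id()
y, payload = b.network_in(a.network_out(x, ("route", "B", "K")))
```

After the exchange, B's tupleTable links its local tuple back to A's tuple, which is the cross-node half of the trace.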
Putting it all together
Of course, in reality it's more complicated:
– Aborted rule executions
– Pipelined rule executions
(Figure: Node A's exec table holds the row x, r0, y, B and its tupleTable the row v, x, C; Node B's exec table holds z, r1, w, C and its tupleTable y, z, A.)
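Combining the two tables lets a tuple be traced backwards across rules and nodes: an exec lookup finds the rule execution that produced a tuple, and a tupleTable lookup finds where a received tuple came from. The sketch below replays the figure's example. It assumes we read each figure row as exec `(input, rule, output, dest)` and tupleTable `(remote, local, from)`, stored here in the slide's `(local, remote, from)` order; `EXEC`, `TUPLE_TABLE`, and `trace_back` are hypothetical, and aborted or pipelined executions are not handled.

```python
from __future__ import annotations

# Per-node tables with the tuple names from the figure (hypothetical layout).
EXEC = {         # node -> [(input_id, rule_id, output_id, dest)]
    "A": [("x", "r0", "y", "B")],
    "B": [("z", "r1", "w", "C")],
}
TUPLE_TABLE = {  # node -> [(local_id, remote_id, from_node)]
    "A": [("x", "v", "C")],
    "B": [("z", "y", "A")],
}

def trace_back(node: str, tuple_id: str) -> list:
    """Walk backwards from (node, tuple_id), alternating exec and tupleTable lookups."""
    steps = []
    while True:
        # Which rule execution at this node produced tuple_id?
        rec = next((r for r in EXEC.get(node, []) if r[2] == tuple_id), None)
        if rec is not None:
            steps.append(("exec", node, rec[1], rec[0], rec[2]))
            tuple_id = rec[0]
            continue
        # Otherwise, did tuple_id arrive over the wire?
        arr = next((t for t in TUPLE_TABLE.get(node, []) if t[0] == tuple_id), None)
        if arr is not None:
            steps.append(("recv", node, arr[2], arr[1], arr[0]))
            node, tuple_id = arr[2], arr[1]
            continue
        # No record here: this is as far back as the tables go.
        steps.append(("origin", node, tuple_id))
        return steps

steps = trace_back("B", "w")
```

Starting from w at B, the walk passes through r1 at B, the hop from A, r0 at A, and the hop from C, stopping at v on C because no tables for C are given.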
Example applications (I)
Distributed watchpoints: trigger an event when a condition becomes true
– Possibly trace backward/forward from it
Oscillation of faulty/stale information (route flaps)
– Gossiping for stabilization or updates
Inconsistent routing in DHTs [Pastry, Chord,…]
– Each node is responsible for a unique region of the key space
– Route using distinct paths and check that the results agree [Bamboo, Secure Routing]
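As an illustration of a watchpoint for route flaps, the sketch below raises an alarm whenever a (node, key) entry changes its next hop more than `max_changes` times within a sliding time window. It is a hypothetical Python model over a list of `(time, node, key, hop)` updates; the function name, the update format, and the thresholds are assumptions, not the paper's rules.

```python
from __future__ import annotations
from collections import defaultdict, deque

def route_flap_watchpoint(updates, window: float, max_changes: int) -> list:
    """Return (time, node, key) whenever (node, key) changes next hop more than
    max_changes times within the last `window` time units."""
    last_hop = {}
    changes = defaultdict(deque)  # (node, key) -> times of recent hop changes
    alarms = []
    for t, node, key, hop in sorted(updates):
        k = (node, key)
        if k in last_hop and last_hop[k] != hop:
            q = changes[k]
            q.append(t)
            while q and q[0] < t - window:  # forget changes outside the window
                q.popleft()
            if len(q) > max_changes:
                alarms.append((t, node, key))
        last_hop[k] = hop
    return alarms

updates = [
    (0, "A", "K", "B"), (1, "A", "K", "D"), (2, "A", "K", "B"),
    (3, "A", "K", "D"), (20, "A", "K", "B"),
    (0, "A", "K'", "D"), (10, "A", "K'", "D"),
]
alarms = route_flap_watchpoint(updates, window=5, max_changes=2)
```

Key K flaps three times within five time units and triggers one alarm at t = 3; the later change at t = 20 falls in a fresh window, and the stable entry for K' never fires.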
Example applications (II)
Online execution profiling:
– How much time is spent in each rule?
– Where are the bottlenecks?
– Which rule is the costliest? Which operation?
Consistent snapshots [Chandy-Lamport]:
– Snapshot of the routing state
– Queries on the snapshots themselves
– What is the degree distribution?
– How many node-disjoint paths are there?
Each of the above takes no more than 16 rules
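Per-rule profiling can be computed from timestamped execution records. The sketch assumes each record is a hypothetical `(rule_id, start_time, end_time)` tuple, i.e. an exec table extended with timing columns, which the slides do not specify; `profile_rules` and `costliest_rule` are illustrative names.

```python
from __future__ import annotations
from collections import defaultdict

def profile_rules(records) -> dict:
    """records: iterable of (rule_id, start_time, end_time), one per rule execution.
    Returns {rule_id: (count, total_time, mean_time)}."""
    count = defaultdict(int)
    total = defaultdict(float)
    for rule_id, start, end in records:
        count[rule_id] += 1
        total[rule_id] += end - start
    return {r: (count[r], total[r], total[r] / count[r]) for r in count}

def costliest_rule(profile: dict) -> str:
    """Rule with the largest total time spent."""
    return max(profile, key=lambda r: profile[r][1])

records = [
    ("r1", 0.0, 0.5), ("r2", 0.5, 0.75),
    ("r1", 1.0, 2.0), ("r3", 2.0, 2.25),
]
profile = profile_rules(records)
```

Ranking by total time answers "which rule is costliest"; ranking by mean time would instead highlight rules that are individually slow but rarely run.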
Performance
21-node Chord overlay in P2
– Monitored node on a separate, unloaded machine
Overhead of introspection:
– CPU ( %), memory (8 MB → 13 MB)
Consistent distributed snapshot
Other results are in the paper
(Plot axes: % CPU util., rate (1/#sec), Tx pkts (×1000).)
Related work
– Management using database techniques [Hy+…]
– Performance debugging [Magpie, Causeway…]
– Configuration debugging for BGP and OSes [Time-travel…]
– Distributed debuggers [WiDS, Pip, Replay Debugging…]
– Deep embedded monitoring [IBM WebSphere, Adaptations…]
Conclusions
Declarative development of systems:
– Integrated approach to building and monitoring
– Automatic execution tracing
– Online, in-place monitoring
A step towards "autonomic" distributed systems:
– Fault-finding tasks evolve with the system
Interesting future directions:
– User interface
– Trade-off between monitoring accuracy and overhead
Request to EuroSys
– Please schedule my next talk on the first day
– Move the submission deadline away from NSDI's (last year: NSDI submission on 19 Oct, EuroSys on 20 Oct)
Questions? Thank you!