Pip: Detecting the Unexpected in Distributed Systems


1 Pip: Detecting the Unexpected in Distributed Systems
Patrick Reynolds Duke University Charles Killian, Amin Vahdat UC San Diego Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah HP Labs, Palo Alto Today I want to talk about the Pip toolkit for discovering structural and performance bugs in distributed systems. This work was published at NSDI just about a year ago. The main contribution of Pip is that it combines two existing debugging approaches, path analysis and automated expectation checking, into a single system, and shows how well this works on distributed systems and applications. UWCS OS Seminar Discussion Andy Pavlo 07 May 2007

2 Problem Statement Distributed systems exhibit complex behaviors that can be difficult to debug: Often more difficult than centralized systems. Parallel, inter-node activity is difficult to capture with serial, single-node tools: Need something more robust than traditional profilers and debuggers. The main motivation behind this work is the premise that it is difficult to debug distributed systems. This is because of the inherent parallelism of these systems, and because they are susceptible to extra sources of failure: more components, more network communication, more messages. Applications may cross administrative boundaries. AFS? Thus, the authors argue that existing debugging tools are inadequate in this environment. We need something more robust than just debug messages.

3 Problem Statement Once behavior is captured, how do you analyze it?
Structural bugs: Application processing & communication Performance problems: Throughput bottlenecks Consumption of resources Unexpected interdependencies But the problem doesn't stop there: once you actually have trace data, how do you decide whether your application is behaving as it should? This is especially hard in distributed systems, where the placement and order of events can differ from one operation to the next. So the goal is to be able to reason about your system in two different ways. First, is the program behaving correctly? That is, is the system sending, processing, and transmitting messages in the proper order? Second, is the system using too many or too few resources during operations? Too long = bottleneck. Too little = truncated processing. How can we express this in a debugging tool?

4 Pip Overview Suite of programs to gather, check, and display the behavior of distributed systems. Uses explicit path identifiers and programmer- written expectations to check program behavior. Pip compares actual behavior to expected behavior. The system they developed is called Pip. Automatically check the actual behavior of a system against how the programmers' expectn Infer activity based on traces of network, application, and OS events. Useful for systems driven by user requests. Captures the delays and resource consumption associate with each request. Path-based debuggers can help programmers find aberrant paths and assist in optimizing throughput and end-to-end latency. Authors argue that Pip is useful for 3 types: Orig Developers: verify/debug own systems 2nd Developers: learn about exsting systems Sys Maintainers: monitor system for changes

5 System Overview Annotation Library Declarative Expectations Language
Trace Checker Behavior Explorer GUI The Pip system has four main components. In the workflow shown here, we start with an application that is augmented with annotations. When the program is executed, it produces trace files. These trace files are aggregated into a single database by a reconciliation process. The resulting paths are then checked against the expectations provided by the programmer about how the system is supposed to run. The results from this process, and other information about the program's execution, can then be visualized in graphical interfaces. (Diagram labels: Application, Annotations, Trace Files, Reconciliation, Path Database, Expectations, Checker+Explorer.) Icon Source: Paul Davey (2007)
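To make the reconciliation step concrete, here is a minimal Python sketch of merging per-host trace files into a path database keyed by path identifier. The record fields (`path_id`, `host`, `kind`, `name`, `ts`) and the `reconcile` function are hypothetical illustrations, not Pip's actual on-disk format or code.

```python
from collections import defaultdict

def reconcile(trace_files):
    """Merge per-host trace files into a single path database:
    path_id -> time-ordered list of events. Field names are
    illustrative, not Pip's real trace format."""
    paths = defaultdict(list)
    for records in trace_files:
        for event in records:
            paths[event["path_id"]].append(event)
    for events in paths.values():
        events.sort(key=lambda e: e["ts"])  # order events within each path
    return dict(paths)

# Two hosts' trace files contributing to one HTTP-request path:
www = [{"path_id": 7, "host": "www", "kind": "task_start",
        "name": "Parse HTTP", "ts": 1}]
app = [{"path_id": 7, "host": "app", "kind": "task_start",
        "name": "Execute", "ts": 2}]
db = reconcile([www, app])  # {7: [www event, app event]}
```

Because every annotation carries the path ID the application assigned, reconciliation reduces to a group-by over all hosts' traces, which is what makes cross-host paths recoverable at all.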

6 Application Annotation
Pip constructs an application's behavior model from generated events: Manual source code annotations Automatic middleware insertions Execution paths are based on: Tasks Messages Notices Annotations are used to generate events and resource measurements of a running application. Annotations may be added manually by the programmer into the source code, or programmers can link their programs with middleware that has been augmented with Pip's annotation library (automatic generation). This behavior model is composed of multiple execution paths in the system. An execution path in Pip is made up of three components: Tasks: an interval of processing with a beginning and an end. Messages: any communication between hosts, or between threads on the same host. Notices: an indication that an event has occurred. This is equivalent to a log message with a timestamp and a path identifier for context.

7 Application Annotation
Set Path ID Start/End Task Send/Receive Message Generate Notice I want to give a quick example of what I mean by an execution path: a simple HTTP request that triggers some processing, a database query, and a response. First, the application sets a unique identifier for this execution path; notice that this path ID will be used across the multiple hosts involved in responding to the request. We then define start and end markers for the various tasks in the path. We also tag the messages that are sent and received between the hosts. Lastly, as the operation proceeds, the application publishes notices to Pip about what happened during the execution. (Diagram: the WWW host runs the Parse HTTP and Send Response tasks; the App Server runs Execute; the Database runs Query; notices mark Received Request, Processed Request, and Sent Response along the timeline.)

8 Expectations Declarative language to describe application structure, timing, and resource consumption. Expresses parallelism. Accommodates variation in the order and number of events for multiple paths. Expectations are external descriptions of the behavior of an application. This includes the execution steps, the timing of certain events, and how many resources are used. This is an example of an expectation written for the Web server in the previous example. We set a time limit on how quickly the application can process the request. We define the communications between the web server and the app server. We also have a check to see if the execution caused a database error, which would invalidate the execution path. In practice one would write the expectations first, but it is easier to illustrate the concepts in reverse. One thing the Pip authors tout is the ability to express parallelism in expectations, so you could define certain actions to occur in any order (Quorum example).
validator CGIRequest
  task("Parse HTTP") limit(CPU_TIME, 100ms);
  notice(m/Received Request: .*/);
  send(AppServer);
  recv(AppServer);
invalidator DatabaseError
  notice(m/Database error: .*/);

9 Expectations Example: Quorum
validator Request
  recv(Client) limit(SIZE, {=44b});
  task("Read") {
    repeat 3 { send(Peer); }
    repeat 2 { recv(Peer); task("ReadReply"); }
    future { send(Client); }
  }
This is the Quorum example mentioned on the previous slide: the read must be sent to three peers, but only two replies are required, and the response to the client may happen at any later point, which the future block expresses.

10 Expectations
Recognizers: Descriptions of structural and performance behavior. A path can be matching, matching with performance violations, or non-matching.
Aggregates: Assertions about properties of sets of paths.
Expectations can take two forms. Recognizers are used to validate or invalidate individual path instances; the example from the previous slide is a recognizer, with parts to both validate and invalidate the execution path. Aggregates are rules about the properties of multiple paths. For example, you can define rules about the number of instances matched by a given recognizer.
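Where a recognizer validates one path at a time, an aggregate asserts a property over the whole set of matched paths. A sketch in the spirit of Pip's expectation language follows; the exact aggregate syntax here is an assumption for illustration, not Pip's actual grammar.

```
// Illustrative aggregate sketch -- syntax is assumed, not Pip's real grammar.
assert(instances(Request) >= 1);            // the Quorum recognizer must match at least once
assert(average(CPU_TIME, Request) < 50ms);  // matched paths must be cheap on average
```

Aggregates of this kind are what let a maintainer catch population-level regressions (e.g., "too few requests took the fast path") that no single-path recognizer can express.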

11 Trace Checker Pip generates a search tree from expectations.
The trace checker matches results from the path database with expectations.
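A toy version of the checking step can be sketched in Python: a recognizer is reduced to an expected sequence of event kinds, and the checker walks a path's events to decide whether it matches. Pip's real checker builds a search tree that also handles repeats, futures, and resource limits; this simplified `matches` function only shows the core matching idea and is not Pip's algorithm.

```python
def matches(recognizer, path_events):
    """Return True if the path's events, in order, are exactly the
    sequence of kinds the recognizer expects. (Simplified: no repeats,
    futures, or resource limits.)"""
    if len(recognizer) != len(path_events):
        return False
    return all(exp == ev["kind"] for exp, ev in zip(recognizer, path_events))

# A recognizer for a minimal request path, and a path pulled from the database:
recognizer = ["RECV", "START_TASK", "END_TASK", "SEND"]
path = [{"kind": k} for k in ["RECV", "START_TASK", "END_TASK", "SEND"]]
ok = matches(recognizer, path)  # this path matches the recognizer
```

In the real system, paths that fail every recognizer (or match one with a limit violation) are the ones surfaced to the programmer as unexpected behavior.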

12 Behavior Explorer Interactive GUI displays: Causal Path Structure
Communication Structure Valid/Invalid Paths Resource Usage Graphs

13 Behavior Explorer Source: Pip web page (2007)

14 Behavior Explorer Causal Path Viewer
Executed tasks, messages, and notices Timing & Resource Properties Source: Pip web page (2007)

15 Pip vs. Paradyn The Paradyn Configuration Language (PCL) allows programmers to describe expected characteristics of applications. “...PCL cannot express the causal path structure of threads, tasks, and messages in a program, nor does Paradyn reveal the program's structure”.

16 Using Pip in Condor No high-level debugging tool is currently used by Condor developers. Knowledge of the inner workings of daemon interactions is either scattered through source-code documentation or held by a few developers. One of the running themes in Condor is that whenever we get a support request from somebody, the standard response is always “send us your log files”.

17 Discussion Questions?

