Pip: Detecting the Unexpected in Distributed Systems
Janet Wiener, Jeff Mogul, Mehul Shah, Chip Killian, Amin Vahdat, Patrick Reynolds

Pip - November 2005

Motivation
Distributed systems exhibit complex behaviors
Some behaviors are unexpected
– Structural bugs
    Placement or timing of processing and communication
– Performance problems
    Throughput bottlenecks
    Over- or under-consumption of resources
    Unexpected interdependencies
Parallel, inter-node behavior is hard to capture with serial, single-node tools
– Not captured by traditional debuggers, profilers
– Not captured by unstructured log files

Motivation
Three target audiences:
Primary programmer
– Debugging or optimizing his/her own system
Secondary programmer
– Inheriting a project or joining a programming team
– Learning how the system behaves
Operator
– Monitoring running system for unexpected behavior
– Performing regression tests after a change

Motivation
Programmers wish to examine and check system-wide behaviors
– Causal paths
– Components of end-to-end delay
– Attribution of resource consumption
Unexpected behavior might indicate a bug
[Figure: a request crossing Web server, App server, and Database; annotations: 500ms, 2000 page faults]

Pip overview
Pip:
1. Captures events from a running system
2. Reconstructs behavior from events
3. Checks behavior against expectations
4. Displays unexpected behavior
    Both structure and resource violations
Goal: help programmers locate and explain bugs
[Figure: Pip workflow; labels: Application, Behavior model, Expectations, Pip checker, Unexpected structure, Resource violations, Pip explorer (visualization GUI)]

Outline
Expressing expected behavior
Building a model of actual behavior
Exploring application behavior
Results
– FAB
– RanSub
– SplitStream

Describing application behavior
Application behavior consists of paths
– All events, on any node, related to one high-level operation
– Definition of a path is programmer defined
– Path is often causal, related to a user request
[Figure: timeline of one path across WWW, App server, and DB; tasks: Parse HTTP, Run application, Query, Send response; axis: time]

Describing application behavior
Within paths are tasks, messages, and notices
– Tasks: processing with start and end points
– Messages: send and receive events for any communication
    Includes network, synchronization (lock/unlock), and timers
– Notices: time-stamped strings; essentially log entries
[Figure: timeline of one path across WWW, App server, and DB; tasks: Parse HTTP, Run application, Query, Send response; notices: “Request = /cgi/…”, “2096 bytes in response”, “done with request 12”; axis: time]
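
The path vocabulary on this slide can be pictured as a small data model. The sketch below is illustrative only: the class and field names are assumptions made for this example and are not Pip's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Processing with a start and an end point (times in milliseconds)."""
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Message:
    """A send event paired with its receive event."""
    sender: str
    receiver: str
    send_ms: int
    recv_ms: int

    @property
    def latency_ms(self) -> int:
        return self.recv_ms - self.send_ms


@dataclass
class Notice:
    """A time-stamped string, essentially a log entry."""
    timestamp_ms: int
    text: str


@dataclass
class Path:
    """All events, on any node, related to one high-level operation."""
    path_id: int
    tasks: List[Task] = field(default_factory=list)
    messages: List[Message] = field(default_factory=list)
    notices: List[Notice] = field(default_factory=list)


# Hypothetical path loosely following the slide's WWW / App server example.
path = Path(path_id=12)
path.tasks.append(Task("Parse HTTP", start_ms=0, end_ms=4))
path.messages.append(Message("WWW", "App server", send_ms=4, recv_ms=6))
path.notices.append(Notice(20, "done with request 12"))
```

Keeping tasks, messages, and notices as separate lists mirrors the slide's three event kinds; a real trace store would also record the host and thread for each event.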

Expectations: Recognizers
Application behavior consists of paths
Each recognizer matches paths
– A path can match more than one recognizer
A recognizer can be a validator, an invalidator, or neither
Any path matching zero validators or at least one invalidator is unexpected behavior: bug?

    validator CGIRequest
      task("Parse HTTP") limit(CPU_TIME, 100ms);
      notice(m/Request URL:.*/);
      send(AppServer);
      recv(AppServer);

    invalidator DatabaseError
      notice(m/Database error:.*/);
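
The classification rule on this slide (zero validators matched, or at least one invalidator matched, means unexpected) can be sketched in a few lines. This is a simplified illustration, not Pip's checker: a path is reduced to its list of notice strings, and the helper `has_notice` is a hypothetical stand-in for a full recognizer.

```python
import re
from typing import Callable, Dict, List

# Assumption: a path is represented only by its notice strings.
Recognizer = Callable[[List[str]], bool]


def has_notice(pattern: str) -> Recognizer:
    """Build a recognizer that matches when any notice matches the regex."""
    rx = re.compile(pattern)
    return lambda notices: any(rx.search(n) for n in notices)


validators: Dict[str, Recognizer] = {
    "CGIRequest": has_notice(r"Request URL:.*"),
}
invalidators: Dict[str, Recognizer] = {
    "DatabaseError": has_notice(r"Database error:.*"),
}


def is_unexpected(notices: List[str]) -> bool:
    """Unexpected = matches zero validators or at least one invalidator."""
    matched_validators = [n for n, r in validators.items() if r(notices)]
    matched_invalidators = [n for n, r in invalidators.items() if r(notices)]
    return not matched_validators or bool(matched_invalidators)


ok_path = ["Request URL: /cgi/search"]
db_failure = ["Request URL: /cgi/search", "Database error: timeout"]
unknown = ["something nobody wrote a recognizer for"]
```

Note that `db_failure` is flagged even though it matches a validator: one invalidator is enough to make a path unexpected.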

Expectations: Recognizers language
repeat: matches a ≤ n ≤ b copies of a block
xor: matches any one of several blocks
call: include another recognizer (macro)
future: block matches now or later
– done: force named block to match

    repeat between 1 and 3 { … }
    xor { branch: … }
    future F1 { … } … done(F1);
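
To make the semantics of repeat and xor concrete, here is a minimal matcher over a flat list of event names. It is a sketch under stated assumptions: the combinator names (`event`, `sequence`, `repeat`, `xor`, `matches`) are invented for this example, events are plain strings, and `call`, `future`, and `done` are not modeled.

```python
from typing import Callable, List, Set

# A matcher takes the event list and a start index and returns every
# index at which a successful match could end (empty set = no match).
Matcher = Callable[[List[str], int], Set[int]]


def event(name: str) -> Matcher:
    def m(events: List[str], i: int) -> Set[int]:
        return {i + 1} if i < len(events) and events[i] == name else set()
    return m


def sequence(*parts: Matcher) -> Matcher:
    def m(events: List[str], i: int) -> Set[int]:
        positions = {i}
        for part in parts:
            positions = {j for p in positions for j in part(events, p)}
        return positions
    return m


def xor(*branches: Matcher) -> Matcher:
    """Matches if any one of the branches matches."""
    def m(events: List[str], i: int) -> Set[int]:
        return {j for b in branches for j in b(events, i)}
    return m


def repeat(block: Matcher, lo: int, hi: int) -> Matcher:
    """Matches between lo and hi consecutive copies of block."""
    def m(events: List[str], i: int) -> Set[int]:
        positions, result = {i}, set()
        if lo == 0:
            result |= positions
        for count in range(1, hi + 1):
            positions = {j for p in positions for j in block(events, p)}
            if count >= lo:
                result |= positions
        return result
    return m


def matches(recognizer: Matcher, events: List[str]) -> bool:
    """A recognizer matches a path only if it consumes every event."""
    return len(events) in recognizer(events, 0)


rec = sequence(event("recv"),
               repeat(event("send"), 1, 3),
               xor(event("ok"), event("error")))
```

With this recognizer, a path of `recv`, two `send` events, and `ok` matches, while a path with four `send` events or none does not.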

Expectations: Aggregate expectations
Recognizers categorize paths into sets
Aggregates make assertions about sets of paths
– Count, unique count, resource constraints
– Simple math and set operators

    assert(instances(CGIRequest) > 4);
    assert(max(CPU_TIME, CGIRequest) < 500ms);
    assert(max(REAL_TIME, CGIRequest) <= 3*avg(REAL_TIME, CGIRequest));
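
The three assertions above can be evaluated by hand over a small set of categorized paths. The sketch below uses made-up measurements and hypothetical helper names (`metric`, `instances`); it only illustrates what each aggregate computes.

```python
from statistics import mean
from typing import Dict, List

# Invented sample data: five paths already categorized as CGIRequest.
paths: List[Dict] = [
    {"recognizer": "CGIRequest", "cpu_ms": 40, "real_ms": 120},
    {"recognizer": "CGIRequest", "cpu_ms": 55, "real_ms": 150},
    {"recognizer": "CGIRequest", "cpu_ms": 60, "real_ms": 130},
    {"recognizer": "CGIRequest", "cpu_ms": 45, "real_ms": 140},
    {"recognizer": "CGIRequest", "cpu_ms": 70, "real_ms": 900},  # outlier
]


def metric(name: str, recognizer: str) -> List[float]:
    """Collect one resource metric over all paths in a recognizer's set."""
    return [p[name] for p in paths if p["recognizer"] == recognizer]


def instances(recognizer: str) -> int:
    return sum(1 for p in paths if p["recognizer"] == recognizer)


real = metric("real_ms", "CGIRequest")
checks = {
    "instances(CGIRequest) > 4": instances("CGIRequest") > 4,
    "max(CPU_TIME) < 500ms": max(metric("cpu_ms", "CGIRequest")) < 500,
    "max(REAL_TIME) <= 3*avg(REAL_TIME)": max(real) <= 3 * mean(real),
}

for name, passed in checks.items():
    print(("PASS" if passed else "FAIL"), name)
```

In this sample the first two checks pass and the third fails: the average real time is 288 ms, so the 900 ms outlier exceeds the 864 ms bound.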

Outline
Expressing expected behavior
Building a model of actual behavior
Exploring application behavior
Results

Building a behavior model
Sources of events:
Annotations in source code
– Programmer inserts statements manually
Annotations in middleware
– Middleware inserts annotations automatically
– Faster and less error-prone
Passive tracing or interposition
– Easier, but less information
Or any combination of the above
Model consists of paths constructed from events recorded by the running application

Annotations
Set path ID
Start/end task
Send/receive message
Notice
[Figure: timeline of one path across WWW, App server, and DB; tasks: Parse HTTP, Run application, Query, Send response; notices: “Request = /cgi/…”, “2096 bytes in response”, “done with request 12”; axis: time]
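
The four annotation kinds listed above could be exposed to application code roughly as follows. This is a hypothetical in-memory recorder written for illustration; the `Tracer` class and its method names are assumptions and do not reproduce Pip's annotation library.

```python
import time
from typing import List, Optional, Tuple

# (timestamp, path_id, kind, detail)
Event = Tuple[float, Optional[int], str, str]


class Tracer:
    """Hypothetical recorder: every event is tagged with the current path ID."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self.path_id: Optional[int] = None

    def _record(self, kind: str, detail: str) -> None:
        self.events.append((time.monotonic(), self.path_id, kind, detail))

    def set_path_id(self, path_id: int) -> None:
        self.path_id = path_id

    def start_task(self, name: str) -> None:
        self._record("start_task", name)

    def end_task(self, name: str) -> None:
        self._record("end_task", name)

    def send(self, destination: str) -> None:
        self._record("send", destination)

    def recv(self, source: str) -> None:
        self._record("recv", source)

    def notice(self, text: str) -> None:
        self._record("notice", text)


tracer = Tracer()
tracer.set_path_id(12)
tracer.start_task("Parse HTTP")
tracer.end_task("Parse HTTP")
tracer.send("AppServer")
tracer.notice("done with request 12")
```

Setting the path ID once and stamping it on every later event is what lets a checker regroup events from many nodes into one path afterwards.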

Automating expectations and annotations
Expectations can be generated from behavior model
– Create a recognizer for each actual path
– Eliminate repetition
– Strike a balance between over- and under-specification
Annotations can be generated by middleware
Automatic annotations in Mace, Sandstorm, J2EE, FAB
– Several of our test systems use Mace annotations
[Figure: Pip workflow; labels: Application, Annotations, Behavior model, Expectations, Pip checker, Unexpected behavior]

Checking expectations
[Figure: checking pipeline, reconstructed from the diagram labels]
Application
→ Traces
→ Reconciliation: match start/end task, send/receive message
→ Events database
→ Path construction: organize events into causal paths
→ Paths
→ Expectation checking (input: Expectations)
    For each path P
        For each recognizer R
            Does R match P?
    Check each aggregate
→ Categorized paths
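
The reconciliation and path-construction stages can be sketched as grouping events by path ID and then pairing each task start with its end. The event tuple layout and function names below are assumptions for illustration; message matching across hosts is left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed event layout: (timestamp_ms, path_id, kind, task_name)
Event = Tuple[int, int, str, str]

events: List[Event] = [
    (0, 1, "start", "Parse HTTP"),
    (100, 2, "start", "Parse HTTP"),
    (300, 1, "end", "Parse HTTP"),
    (400, 1, "start", "Query"),
    (900, 1, "end", "Query"),
    # Path 2 never records the end of "Parse HTTP".
]


def build_paths(all_events: List[Event]) -> Dict[int, List[Event]]:
    """Path construction: group time-ordered events by path ID."""
    grouped: Dict[int, List[Event]] = defaultdict(list)
    for ev in sorted(all_events):
        grouped[ev[1]].append(ev)
    return dict(grouped)


def reconcile_tasks(path_events: List[Event]):
    """Pair start/end events; return (completed tasks, unmatched names)."""
    stack: List[Tuple[str, int]] = []
    tasks: List[Tuple[str, int]] = []
    unmatched: List[str] = []
    for ts, _path_id, kind, name in path_events:
        if kind == "start":
            stack.append((name, ts))
        elif kind == "end":
            if stack and stack[-1][0] == name:
                _, started = stack.pop()
                tasks.append((name, ts - started))
            else:
                unmatched.append(name)  # end without a matching start
    unmatched.extend(name for name, _ in stack)  # starts that never ended
    return tasks, unmatched


paths = build_paths(events)
tasks_1, unmatched_1 = reconcile_tasks(paths[1])
tasks_2, unmatched_2 = reconcile_tasks(paths[2])
```

A stack is used so that nested tasks pair correctly; an unmatched start or end is reported rather than silently dropped, since it may itself indicate a bug or a missing annotation.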

Exploring behavior
Expectations checker generates lists of valid and invalid paths
Explore both sets
– Why did invalid paths occur?
– Is any unexpected behavior misclassified as valid?
    Insufficiently constrained expectations
    Pip may be unable to express all expectations
Two ways to explore behavior
– SQL queries over tables
    Paths, threads, tasks, messages, notices
– Visualization
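
As an illustration of exploring behavior with SQL, the sketch below loads a few task rows into an in-memory SQLite database and asks which paths used the most CPU. The table layout (`tasks` with `path_id`, `name`, `cpu_ms`) and the data are assumptions made for this example; the slide names the tables but not their columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per task, tagged with its path.
conn.execute("CREATE TABLE tasks (path_id INTEGER, name TEXT, cpu_ms REAL)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        (1, "Parse HTTP", 40.0),
        (1, "Query", 30.0),
        (2, "Parse HTTP", 90.0),
        (2, "Query", 60.0),
        (3, "Parse HTTP", 120.0),
    ],
)

# Paths whose total CPU time exceeds 100 ms, most expensive first.
slow_paths = conn.execute(
    """
    SELECT path_id, SUM(cpu_ms)
    FROM tasks
    GROUP BY path_id
    HAVING SUM(cpu_ms) > 100
    ORDER BY SUM(cpu_ms) DESC
    """
).fetchall()
conn.close()
```

Queries like this are useful for the second question on the slide: scanning paths that were classified as valid for resource usage the expectations did not constrain.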

Visualization: causal paths
[Screenshot of the explorer GUI; callouts:]
– Causal view of path
– Caused tasks, messages, and notices on that thread
– Timing and resource properties for one task

Visualization: communication graph
Graph view of all host-to-host network traffic

Visualization: performance graphs
Plot per-task or per-path resource metrics
– Cumulative distribution (CDF), probability density (PDF), or vs. time
Click on a point to see its value and the task/path represented
[Figure: example plot; axes: Time (s), Delay (ms)]
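
For readers unfamiliar with the CDF view, the following sketch computes the points of an empirical cumulative distribution from a list of per-path delays. The delay values are invented and the function name is an assumption; plotting is left to whatever tool is at hand.

```python
from typing import List, Tuple


def empirical_cdf(samples: List[float]) -> List[Tuple[float, float]]:
    """Return (value, fraction of samples <= value) points, sorted by value.

    Duplicate values yield one point each, so the last point for a
    repeated value carries the true cumulative fraction.
    """
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


delays_ms = [12, 7, 30, 7, 18]  # hypothetical per-path delays
cdf = empirical_cdf(delays_ms)
```

A long flat tail at the top of such a curve is the visual signature of a few slow paths, which are the ones worth clicking on.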

Pip vs. printf
Both record interesting events to check off-line
– Pip imposes structure and automates checking
– Generalizes ad hoc approaches

Pip                                                | printf
Nesting, causal order                              | Unstructured
Time, path, and thread                             | No context
CPU and I/O data                                   | No resource information
Automatic verification using declarative language  | Verification with ad hoc grep or expect scripts
SQL queries                                        | “Queries” using Perl scripts
Automatic generation for some middleware           | Manual placement

Results
We have applied Pip to several distributed systems:
– FAB: distributed block store
– SplitStream: DHT-based multicast protocol
– RanSub: tree-based protocol used to build higher-level systems
– Others: Bullet, SWORD, Oracle of Bacon
We have found unexpected behavior in each system
We have fixed bugs in some systems
… and used Pip to verify that the behavior was fixed

Results: SplitStream (DHT-based multicast protocol)
13 bugs found, 12 fixed
– 11 found using expectations, 2 found using GUI
Structural bug: some nodes have up to 25 children when they should have at most 18
– This bug was fixed and later reoccurred
– Root cause #1: variable shadowing
– Root cause #2: failed to register a callback
How discovered: first in the explorer GUI, confirmed with automated checking

Results: FAB (distributed block store)
1 bug (so far), fixed
– Four protocols checked: read, write, Paxos, membership
Performance bug: nodes seeking quorum call self and peers in arbitrary order
– Should call self last, to overlap computation
– For cached blocks, should call self second-to-last
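
The ordering rule described for the FAB fix can be written down as a small function. This is only a reading of the two bullets above, not FAB's code: the function name, the node identifiers, and the choice of which peer goes last in the cached case are all assumptions.

```python
from typing import List


def quorum_call_order(self_id: str, nodes: List[str],
                      block_cached: bool = False) -> List[str]:
    """Order quorum calls so remote work overlaps with local work.

    Default: call every peer first and self last.
    Cached block: call self second-to-last (assumption: the peer that
    ends up last is simply the final peer in the input order).
    """
    peers = [n for n in nodes if n != self_id]
    if block_cached and peers:
        return peers[:-1] + [self_id, peers[-1]]
    return peers + [self_id]


nodes = ["A", "B", "C", "D"]
normal_order = quorum_call_order("B", nodes)
cached_order = quorum_call_order("B", nodes, block_cached=True)
```

Calling peers first means their requests are already in flight while the local node does its own work, which is the overlap the slide refers to.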

Results: RanSub (tree-based protocol)
2 bugs found, 1 fixed
Structural bug: during first round of communication, parent nodes send summary messages before hearing from all children
– Root cause: uninitialized state variables
Performance bug: linear increase in end-to-end delay for the first ~2 minutes
– Suspected root cause: data structure listing all discovered nodes

Future work
Further automation of annotations, tracing
– Explore tradeoffs between black-box, annotated behavior models
Extensible annotations
– Application-specific schema for notices
Composable expectations for large systems

Related work
Expectations-based systems
– PSpec [Perl, 1993]
– Meta-level compilation [Engler, 2000]
– Paradyn [Miller, 1995]
Causal paths
– Pinpoint [Chen, 2002]
– Magpie [Barham, 2004]
– Project5 [Aguilera, 2003]
Model checking
– MaceMC [Killian, 2006]
– VeriSoft [Godefroid, 2005]

Conclusions
Finding unexpected behavior can help us find bugs
– Both structure and performance bugs
Expectations serve as a high-level external specification
– Summary of inter-component behavior and timing
– Regression test for structure and performance
Some bugs not exposed by expectations can be found through exploring: queries and visualization

Extra slides

Resource metrics
Real time
User time, system time
– CPU time = user + system
– Busy time = CPU time / real time
Major and minor page faults (paging and allocation)
Voluntary and involuntary context switches
Message size and latency
Number of messages sent
Causal depth of path
Number of threads, hosts in path
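
The two derived metrics above are simple arithmetic; a worked example makes the units explicit. The function names and the sample timings are assumptions for illustration.

```python
def cpu_time(user_s: float, system_s: float) -> float:
    """CPU time = user + system (all values in seconds)."""
    return user_s + system_s


def busy_time(user_s: float, system_s: float, real_s: float) -> float:
    """Busy time = CPU time / real time, a fraction of the elapsed time."""
    if real_s <= 0:
        raise ValueError("real time must be positive")
    return cpu_time(user_s, system_s) / real_s


# Hypothetical task: 0.30 s user, 0.10 s system, 2.0 s elapsed.
task_cpu = cpu_time(0.30, 0.10)
task_busy = busy_time(0.30, 0.10, 2.0)
print(f"CPU time: {task_cpu:.2f} s, busy: {task_busy:.0%}")
```

For a single thread, a busy fraction well below 1 means the task spent most of its elapsed time waiting (on I/O, locks, or messages) rather than computing.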