WAP5 Black-box Performance Debugging for Wide-Area Distributed Systems Patrick Reynolds With:

Slides:



Advertisements
Similar presentations
1
Advertisements

Feichter_DPG-SYKL03_Bild-01. Feichter_DPG-SYKL03_Bild-02.
© 2008 Pearson Addison Wesley. All rights reserved Chapter Seven Costs.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
Author: Julia Richards and R. Scott Hawley
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
UNITED NATIONS Shipment Details Report – January 2006.
1 Hyades Command Routing Message flow and data translation.
We need a common denominator to add these fractions.
XP New Perspectives on Microsoft Office Word 2003 Tutorial 7 1 Microsoft Office Word 2003 Tutorial 7 – Collaborating With Others and Creating Web Pages.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 10 second questions
1 Processes and Threads Creation and Termination States Usage Implementations.
1 Discreteness and the Welfare Cost of Labour Supply Tax Distortions Keshab Bhattarai University of Hull and John Whalley Universities of Warwick and Western.
Communicating over the Network
Photo Slideshow Instructions (delete before presenting or this page will show when slideshow loops) 1.Set PowerPoint to work in Outline. View/Normal click.
Protocol layers and Wireshark Rahul Hiran TDTS11:Computer Networks and Internet Protocols 1 Note: T he slides are adapted and modified based on slides.
1. 2 Objectives Become familiar with the purpose and features of Epsilen Learn to navigate the Epsilen environment Develop a professional ePortfolio on.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
Break Time Remaining 10:00.
Configuration management
Local Area Networks - Internetworking
Seungmi Choi PlanetLab - Overview, History, and Future Directions - Using PlanetLab for Network Research: Myths, Realities, and Best Practices.
GETTING STARTED WITH WINDOWS COMMUNICATION FOUNDATION 4.5 Ed Jones & Grey Guindon.
1 Generating Network Topologies That Obey Power LawsPalmer/Steffan Carnegie Mellon Generating Network Topologies That Obey Power Laws Christopher R. Palmer.
EU market situation for eggs and poultry Management Committee 20 October 2011.
MySQL Access Privilege System
Taming User-Generated Content in Mobile Networks via Drop Zones Ionut Trestian Supranamaya Ranjan Aleksandar Kuzmanovic Antonio Nucci Northwestern University.
Intentional Networking: Opportunistic Exploitation of Mobile Network Diversity T.J. Giuli David Watson Brett Higgins Azarias Reda Timur Alperovich Jason.
Outline Minimum Spanning Tree Maximal Flow Algorithm LP formulation 1.
Green Eggs and Ham.
IP Multicast Information management 2 Groep T Leuven – Information department 2/14 Agenda •Why IP Multicast ? •Multicast fundamentals •Intradomain.
VOORBLAD.
15. Oktober Oktober Oktober 2012.
Name Convolutional codes Tomashevich Victor. Name- 2 - Introduction Convolutional codes map information to code bits sequentially by convolving a sequence.
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
© 2012 National Heart Foundation of Australia. Slide 2.
1 © 2004, Cisco Systems, Inc. All rights reserved. CCNA 1 v3.1 Module 10 Routing Fundamentals and Subnets.
Chapter 10 Software Testing
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Note to the teacher: Was 28. A. to B. you C. said D. on Note to the teacher: Make this slide correct answer be C and sound to be “said”. to said you on.
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
: 3 00.
Systems Analysis and Design in a Changing World, Fifth Edition
1 © 2004, Cisco Systems, Inc. All rights reserved. CCNA 1 v3.1 Module 11 TCP/IP Transport and Application Layers.
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Clock will move after 1 minute
Intracellular Compartments and Transport
1 © 2004, Cisco Systems, Inc. All rights reserved. CCNA 1 v3.1 Module 9 TCP/IP Protocol Suite and IP Addressing.
PSSA Preparation.
Essential Cell Biology
1 Chapter 13 Nuclear Magnetic Resonance Spectroscopy.
Murach’s OS/390 and z/OS JCLChapter 16, Slide 1 © 2002, Mike Murach & Associates, Inc.
TCP/IP Protocol Suite 1 Chapter 18 Upon completion you will be able to: Remote Login: Telnet Understand how TELNET works Understand the role of NVT in.
New Opportunities for Load Balancing in Network-Wide Intrusion Detection Systems Victor Heorhiadi, Michael K. Reiter, Vyas Sekar UNC Chapel Hill UNC Chapel.
Performance Debugging for Distributed Systems of Black Boxes Marcos K. Aguilera Jeffrey C. Mogul Janet L. Wiener HP Labs Patrick Reynolds, Duke Athicha.
Presentation transcript:

WAP5 Black-box Performance Debugging for Wide-Area Distributed Systems Patrick Reynolds With: Marcos Aguilera Amin Vahdat Janet Wiener Jeffrey Mogul

page 2 WAP5 - WWW'06 Motivation Discover structure and performance problems in large, wide-area systems Infer paths through nodes – One path per client request – Discover timing at each step Focus attention on nodes that are problematic – First step in performance debugging Local DHT node Web proxy Client Remote DHT node DNS Origin web server

page 3 WAP5 - WWW'06 Coral example Second-level hit (4 messages) Second-level miss (6 messages) Also: DHT lookups Client Proxy DNS Origin server Origin server Causal path: a sequence of related messages and processing, annotated with timing/delays 250ms 500ms

page 4 WAP5 - WWW'06 Goals Find bugs in wide-area applications – Performance bugs: too much or too little time at any point – Structure bugs: incorrect ordering or placement of processing or communication Expose causal paths – Structure discovery – Measure latency for processing and communication – Unexpected structure or timing Indicates possible bugs Black-box approach – Do not require source code access – Allow heterogeneity

page 5 WAP5 - WWW'06 Three target audiences Primary programmer – Debugging or optimizing his/her own system Secondary programmer – Inheriting a project or joining a programming team – Discovery: learning how the system behaves Operator – Monitoring a running system for unexpected behavior – Performing regression tests after a change

page 6 WAP5 - WWW'06 Contributions New causality analysis algorithm Full tool chain – Trace capture library – Causal path analysis – Visualization Results with two PlanetLab CDNs – Coral and CoDeeN Packet or socket traces Reconciliation Message traces Causal analysis Causal paths and timing Visualization Trace capture

page 7 WAP5 - WWW'06 Outline Introduction Naming Trace capture Reconciliation Causality analysis – Message linking algorithm Results with CoDeeN & Coral

page 8 WAP5 - WWW'06 Naming Message is single read/write system call – May be many TCP or UDP packets Node can be process or host Endpoint can be socket path or Client Web server Host = foo.cs.duke.edu Web proxy pid=2297 DHT pid= /tmp/corald… DHT node

page 9 WAP5 - WWW'06 Naming Node names are causal names – Message into a process/host can cause messages out Endpoint names guide aggregation – Calls to foo:8080 are different from calls to foo:53 – Client hosts and ports can be ignored Client Web server Host = foo.cs.duke.edu /tmp/corald… DHT node Web proxy pid=2297 DHT pid=2312

page 10 WAP5 - WWW'06 Outline Introduction Naming Trace capture Reconciliation Causality analysis – Message linking algorithm Results with CoDeeN & Coral

page 11 WAP5 - WWW'06 Trace capture Capture events using host/net sniffing or library interposition – All three choices: no modifications to applications – On PlanetLab: sniffing on host only, limited flexibility We capture events using library interposition – Captures all calls that create, modify, or use a socket kernel program libc kernel program libc Network sniffing Host sniffing Library interposition

page 12 WAP5 - WWW'06 Outline Introduction Naming Trace capture Reconciliation Causality analysis – Message linking algorithm Results with CoDeeN & Coral

page 13 WAP5 - WWW'06 Reconciliation: Convert socket calls to logical messages client/5040 server/ server/8712 client/ bind(fd=6, addr={ :33250}) connect(fd=6, addr={ :80}) send(fd=6, len=10, time=0.592) recv(fd=6, len=12, time=2.033) client pid=5040 bind(fd=4, addr={ :80}) accept(lfd=4, addr={ :33250}) = 5 recv(fd=5, len=10, time=0.852) send(fd=5, len=12, time=1.705) server pid=8712 Assign endpoint names to each call

page 14 WAP5 - WWW'06 Combine send and recv events for each message – Detect dropped or reordered UDP packets – Detect differing message (buffer) boundaries client/5040 server/ server/8712 client/ bind(fd=6, addr={ :33250}) connect(fd=6, addr={ :80}) send(fd=6, len=10, time=0.592) recv(fd=6, len=12, time=2.033) client pid=5040 bind(fd=4, addr={ :80}) accept(lfd=4, addr={ :33250}) = 5 recv(fd=5, len=10, time=0.852) send(fd=5, len=12, time=1.705) server pid=8712 Reconciliation: Convert socket calls to logical messages

page 15 WAP5 - WWW'06 Assign node (process) names to each message client/5040 server/ server/8712 client/ bind(fd=6, addr={ :33250}) connect(fd=6, addr={ :80}) send(fd=6, len=10, time=0.592) recv(fd=6, len=12, time=2.033) client pid=5040 bind(fd=4, addr={ :80}) accept(lfd=4, addr={ :33250}) = 5 recv(fd=5, len=10, time=0.852) send(fd=5, len=12, time=1.705) server pid=8712 Reconciliation: Convert socket calls to logical messages

page 16 WAP5 - WWW'06 Outline Introduction Naming Trace capture Reconciliation Causality analysis – Message linking algorithm Results with CoDeeN & Coral

page 17 WAP5 - WWW'06 Causal path analysis Which call to B caused outgoing calls? – Could be spontaneous action – May be ambiguous Make good guesses Use statistics over whole trace Try multiple possibilities Build paths by combining calls

page 18 WAP5 - WWW'06 Score possible parents for each message Message linking algorithm Link-probability trees Build and aggregate paths Estimate average causal delays Message traces Causal-path patterns

page 19 WAP5 - WWW'06 Estimate average causal delay Look at all messages into B, plus all B C messages – Take smallest delay before each B C message – Trace-specific upper limit D B C = average of these delays – Might underestimate D Scaling factor λ B C = 1/D B C Create exponential distribution – f(t) = λe –λ t Smallest delay for B C

page 20 WAP5 - WWW'06 Find and weight possible parent messages Use f(t) to find weight of link from each parent

page 21 WAP5 - WWW'06 Find and weight possible parent messages Normalize so sum of weights to each child = 1 Possible-parent trees – Spontaneous action has small probability, not shown – Links to B D are slightly less likely Z BY BX B B C Z BY BX B B D

page 22 WAP5 - WWW'06 Z BY BX B B C Z BY BX B B D B C Build causality trees Invert to get possible-child trees Z B B CB D Y B B CB D X B B CB D Z BY BX BB CZ BY BX BB D B CB D

page 23 WAP5 - WWW'06 Build causality trees Build trees from individual links – Use probability to decide whether or not to keep child – Some links are try-both and generate 2 trees Tree probability is product of link probabilities p = 0.8 * 0.9 * (1-0.2) * (1-0.1) * (1-0.48) C G A B B CB DB EB F ABCG p=0.270 ABCGF p=0.249

page 24 WAP5 - WWW'06 Build causality trees Aggregate trees with identical structure – Combine client names and ports for better aggregation Total probabilities for each pattern ranking – Expected number of instances – Highlights paths that appear many times with high confidence

page 25 WAP5 - WWW'06 Outline Introduction Naming Trace capture Reconciliation Causality analysis – Message linking algorithm Results with CoDeeN & Coral

page 26 WAP5 - WWW'06 Results: Timeline vs. call tree Coral miss path with DNS lookup Coral processingOrigin serverResponse

page 27 WAP5 - WWW'06 Results: Two CoDeeN miss paths Different mean delays at proxies – 0.20 to 4.86 ms in different proxies Different delays at origin web servers All clients aggregated together

page 28 WAP5 - WWW'06 Results: Coral DHT lookup Three-level DHT lookups 3 calls in parallel

page 29 WAP5 - WWW'06 Conclusions WAP5 exposes structure and timing of wide-area applications – Particularly PlanetLab applications Successful analysis of CoDeeN and Coral traces – We found paths that match authors descriptions of systems – We characterized delays at each step and found outliers

Extra slides

page 31 WAP5 - WWW'06 Future work Phased behavior – Time-varying patterns – Time-varying delays Better aggregation – Coalesce similar path patterns Paths through DHTs Better visualization

page 32 WAP5 - WWW'06 Related work Trace-based analysis tools – Our LAN-based work in SOSP 2003 – Magpie: detailed picture per machine by using OS-level instrumentation (Event Tracing for Windows) – Pinpoint: instrument middleware – Others instrument applications Inference-based performance analysis – SLIC uses statistical induction to correlate low-level metrics with SLO violations Interposition tools – Trickle, ModelNet

page 33 WAP5 - WWW'06 Node and network latency Node latency = t3 – t2 – Not affected by clock offset All timestamps are local to B Network latency = t2 – t1; t4 – t3 – Correct for clock offset [Paxson98] RTT = (t4 – t3) + (t2 – t1) Skew = (t2 – t1) – RTT/2 A B A time t1 t2 t4 t3

page 34 WAP5 - WWW'06 LibSockCap interposition library Low overhead Easy to deploy in CoDeeN and Coral – Use existing framework to push out new software – Restart process to begin/end trace Advantages – Logical message semantics – Per process, not per machine – Capture UNIX, TCP, and UDP sockets Disadvantages – Timestamps combine OS and network latency – No control packets or fragments