Monitoring and Debugging Dryad(LINQ) Applications with Daphne
Vilas Jagannath, Zuoning Yin, Mihai Budiu
University of Illinois, Microsoft Research SVC


1 Monitoring and Debugging Dryad(LINQ) Applications with Daphne
Vilas Jagannath, Zuoning Yin, Mihai Budiu
University of Illinois, Microsoft Research SVC
International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS) 2011

2 Programming Clusters: Marketing Map-Reduce

3 Programming Clusters: Reality

4 Complexity Exposed
Correctness or performance bugs break the single-system abstraction

5 Motivation; Job structure; The Job Object Model; Tools for job understanding; Conclusions

6 Data-Parallel Computation
Application
Language: Sawzall, FlumeJava (Sawzall, Java) | DryadLINQ, Scope (LINQ, SQL) | Pig, Hive (≈SQL)
Execution: Map-Reduce | Dryad | Hadoop
Storage: GFS, BigTable | Cosmos, Azure, HPC | HDFS, S3

7 2-D Piping
Unix Pipes: 1-D
    grep | sed | sort | awk | perl
Dryad: 2-D
    grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
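
The 2-D pipe on this slide can be read as a list of stages, each replicated into several parallel instances. The following is a minimal Python sketch under the assumption that the number after each command is that stage's instance count; the names `STAGES` and `expand` are illustrative and are not part of Dryad.

```python
# Each stage of the 2-D pipe is a (command, instance_count) pair.
# Assumption: the numbers on the slide are per-stage instance counts.
STAGES = [("grep", 1000), ("sed", 500), ("sort", 1000), ("awk", 500), ("perl", 50)]


def expand(stages):
    """Return one vertex name per (stage, instance), e.g. 'grep[0]'."""
    return [f"{cmd}[{i}]" for cmd, count in stages for i in range(count)]


vertices = expand(STAGES)
print(len(vertices))  # total number of vertices across all stages
print(vertices[:2])   # the first two grep instances
```

The 1-D Unix pipe corresponds to the special case where every count is 1.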

8 Dryad Job Structure
[Figure: job graph built from grep, sed, sort, awk, and perl vertices. Labels: Input files, Vertices (processes), Channels, Stage, Output files.]

9 Dryad System Architecture
[Figure labels: Job manager, job schedule, control plane, data plane, NS, Sched, Exec, V, Network, cluster.]

10 How does it work in detail?
[Figure labels: Localhost, Application, IDE, Compiler, Job Submission, Firewall, Cluster/Cloud, Cluster Scheduler, Job Manager (JM), Vertex, Exec, Storage. Legend: L: Logs, IO: Input/Output, R: Resources.]

11 Logs – lots of them
Job-related: Plan (xml), status, resources
Job-manager: stdout.txt, stderr.txt, *.log
Vertex: stdout.txt, *.log, *.xml, *.cmd

12 Monitoring Tools Structure
[Figure labels: Cosmos, Scope, HPC v2, HPC v3; Cluster abstraction; Job Object Model; Monitoring, Profiling, Debugging GUIs.]

13 Job Object Model
[Figure labels: Logs, JOM, Views, Tools; Job, Vertices, Plan.]
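
As a rough illustration of the idea that tools read a Job Object Model rather than raw logs, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Vertex` and `Job` classes, the `build_job` helper, and the comma-separated log format are hypothetical and do not describe Daphne's actual data structures or the Dryad log layout.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    stage: str
    state: str  # e.g. "Completed" or "Failed"


@dataclass
class Job:
    name: str
    vertices: list = field(default_factory=list)

    def stages(self):
        """Group vertices by stage name, preserving first-seen order."""
        grouped = {}
        for v in self.vertices:
            grouped.setdefault(v.stage, []).append(v)
        return grouped


def build_job(name, log_lines):
    """Parse hypothetical 'vertex,stage,state' log lines into a Job."""
    job = Job(name)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        vertex, stage, state = line.split(",")
        job.vertices.append(Vertex(vertex, stage, state))
    return job


# Made-up log records, used only to exercise the model.
sample_log = ["grep[0],grep,Completed", "grep[1],grep,Failed", "sort[0],sort,Completed"]
job = build_job("demo", sample_log)
print({stage: len(vs) for stage, vs in job.stages().items()})
```

A view or tool would then work only with `Job` and `Vertex` objects, so a change in log format would affect `build_job` alone.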

14 Motivation; Job structure; The Job Object Model; Tools for job understanding; Conclusions

15 The Job Browser
[Screenshot labels: Job, Stage, Vertex.]

16 Job Schedule

17 Failure diagnosis

18 Diagnosis decision tree
“Hand-made”
Least portable tool
Incomplete
High-coverage
Bug types:
– User-level
– System-level
– Cluster malfunction
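
To make the notion of a hand-made diagnosis tree concrete, here is a small Python sketch that sorts a failed vertex into the three bug types named on the slide. The rules, the dictionary keys (`machine_reachable`, `stderr`, `exit_code`), and the matched error string are all invented for illustration; the slide does not say which checks the real tree performs.

```python
def diagnose(vertex):
    """Classify a failed vertex with a tiny hand-made decision tree.

    `vertex` is a dict with hypothetical keys 'machine_reachable',
    'stderr', and 'exit_code'. The rules below are illustrative only.
    """
    # Rule 1 (assumed): an unreachable machine points at the cluster.
    if not vertex.get("machine_reachable", True):
        return "cluster malfunction"
    # Rule 2 (assumed): an exception raised in user code is a user-level bug.
    if "Unhandled exception in user code" in vertex.get("stderr", ""):
        return "user-level"
    # Rule 3 (assumed): any other non-zero exit is blamed on the system.
    if vertex.get("exit_code", 0) != 0:
        return "system-level"
    # The tree is incomplete by design: some failures match no rule.
    return "unknown"


print(diagnose({"machine_reachable": False}))
print(diagnose({"stderr": "Unhandled exception in user code: ...", "exit_code": 1}))
print(diagnose({"exit_code": 3}))
```

The `"unknown"` fallback reflects the slide's point that such a tree is incomplete even when its coverage is high.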

19 PowerShell = Interactive Queries
    $cluster = get-cluster X
    $job = $cluster | select-AllJobs | sort-object Date | select-object -last 1 | select-DryadJob
    $failed = $job.Vertices | where-object { $_.State -eq "Failed" }
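
For readers less familiar with PowerShell pipelines, the same query shape (take the most recent job, then keep its failed vertices) can be sketched in Python. The job records below are invented sample data; a real tool would obtain them from the cluster, and the dictionary layout is an assumption rather than the Job Object Model's actual schema.

```python
from datetime import date

# Hypothetical job records standing in for what the cluster would return.
jobs = [
    {"name": "job-a", "date": date(2011, 5, 1),
     "vertices": [{"name": "v0", "state": "Completed"}]},
    {"name": "job-b", "date": date(2011, 5, 3),
     "vertices": [{"name": "v0", "state": "Failed"},
                  {"name": "v1", "state": "Completed"}]},
]

# Mirrors: sort-object Date | select-object -last 1
latest = sorted(jobs, key=lambda j: j["date"])[-1]

# Mirrors: where-object { $_.State -eq "Failed" }
# Note: PowerShell's -eq ignores case for strings by default; Python's == does not.
failed = [v for v in latest["vertices"] if v["state"] == "Failed"]

print(latest["name"], [v["name"] for v in failed])
```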

20 Vertex Debugging on Client

21 Vertex Profiling on Client

22 Debugging on Cluster
    Collection collection;
    var results = from c in collection
                  where c.name.length > 10
                  orderby c.age
                  select c.name;
[Figure labels: Program, Job, Breakpoint; highlighted clause: where c.name.length > 10]
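
The query on this slide filters, orders, and projects a collection. A single-machine Python sketch of the same three steps follows; the `Person` type and the sample records are hypothetical, since the slide does not show the element type of `collection`.

```python
from dataclasses import dataclass


@dataclass
class Person:  # hypothetical element type; the slide does not define one
    name: str
    age: int


collection = [
    Person("Bartholomew Smith", 41),
    Person("Ann", 30),
    Person("Christopher Lee", 25),
]

# where c.name.length > 10
long_names = (c for c in collection if len(c.name) > 10)
# orderby c.age
ordered = sorted(long_names, key=lambda c: c.age)
# select c.name
results = [c.name for c in ordered]

print(results)
```

On a cluster the `where` clause would run inside many vertices at once, which is why a breakpoint set on that one clause in the program can map to many processes in the job.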

23 Remote debugging
[Figure labels: Localhost, DryadLINQ Application, Visual Studio, Job Submission, Firewall, Cluster/Cloud, Cluster Scheduler, Job Manager (JM), Vertex 1, Vertex 2, Exec, Storage. Annotations: Breakpoint, "Breakpoint hit…", attach. Legend: L: Logs, IO: Input/Output, R: Resources.]

24 Notifications: Our Implementation
[Figure labels: Localhost, DryadLINQ Application, Visual Studio, Daphne, Job Submission, Firewall, Cluster/Cloud, Cluster Scheduler, Job Manager (JM), Vertex 1, Vertex 2, Exec, Storage. Annotation: attach. Legend: L: Logs, IO: Input/Output, R: Resources.]

25 Remote debugging

26 Open Problems
What happens when 100,000 processes hit a breakpoint?
How to evaluate expressions in the debugger when state is distributed?
How to do large-scale performance debugging?
How to preserve map between distributed state and original program state?
How much can the illusion of a single system be preserved?

27 Conclusions
Single-machine abstractions break down in the presence of (performance/correctness) bugs
Job Object Model insulates tools from messy details
Design the cluster runtime to make it easy to build a JOM
Rich interactive tools easily built on top of JOM
Much more work needed for debugging at scale

