The Distributed Application Debugger (DAD)

Slides:



Advertisements
Similar presentations
TCP-IP Primer David Cozens. Targets Have a basic understanding of Ethernet network technology Be aware of how this technology is applied on the 5000 series.
Advertisements

Implementing A Simple Storage Case Consider a simple case for distributed storage – I want to back up files from machine A on machine B Avoids many tricky.
Setting up of condor scheduler on computing cluster Raman Sehgal NPD-BARC.
ALEPH version 21 Task Manager. New Task Manager Interface Admin tab 2 The Task Manager interface has been removed from the ALEPH menu, and is now found.
K. Salah 1 Chapter 31 Security in the Internet. K. Salah 2 Figure 31.5 Position of TLS Transport Layer Security (TLS) was designed to provide security.
Report Distribution Report Distribution in PeopleTools 8.4 Doug Ostler & Eric Knapp 7264.
How Clients and Servers Work Together. Objectives Learn about the interaction of clients and servers Explore the features and functions of Web servers.
Asynchronous Solution Appendix Eleven. Training Manual Asynchronous Solution August 26, 2005 Inventory # A11-2 Chapter Overview In this chapter,
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
Replay Debugging for Distributed Systems Dennis Geels, Gautam Altekar, Ion Stoica, Scott Shenker.
Automatic Software Testing Tool for Computer Networks ARD Presentation Adi Shachar Yaniv Cohen Dudi Patimer
1 The SpaceWire Internet Tunnel and the Advantages It Provides For Spacecraft Integration Stuart Mills, Steve Parkes Space Technology Centre University.
Administering Windows 7 Lesson 11. Objectives Troubleshoot Windows 7 Use remote access technologies Troubleshoot installation and startup issues Understand.
WaveMaker Visual AJAX Studio 4.0 Training Troubleshooting.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
Tutorial 1: Getting Started with Adobe Dreamweaver CS4.
Copyright 2003 CCNA 1 Chapter 9 TCP/IP Transport and Application Layers By Your Name.
Support for Debugging Automatically Parallelized Programs Robert Hood Gabriele Jost CSC/MRJ Technology Solutions NASA.
MCTS Guide to Microsoft Windows Server 2008 Applications Infrastructure Configuration (Exam # ) Chapter Four Windows Server 2008 Remote Desktop Services,
V. Serbo, SLAC ACAT03, 1-5 December 2003 Interactive GUI for Geant4 by Victor Serbo, SLAC.
Automated Scheduling and Operations for Legacy Applications.
Module 3 Configuring File Access and Printers on Windows 7 Clients.
Debugging and Profiling With some help from Software Carpentry resources.
ITGS Networks. ITGS Networks and components –Server computers normally have a higher specification than regular desktop computers because they must deal.
Integrating and Troubleshooting Citrix Access Gateway.
Debugging parallel programs. Breakpoint debugging Probably the most widely familiar method of debugging programs is breakpoint debugging. In this method,
Program Systems Institute RASTDB TDB: THE INTERACTIVE DISTRIBUTED DEBUGGING TOOL FOR PARALLEL MPI PROGRAMS.
Server Administration. [vpo_server_admin] 2 Server Administration Section Overview Controlling Management Server processes Controlling Managed Node processes.
1 Tips for the assignment. 2 Socket: a door between application process and end- end-transport protocol (UDP or TCP) TCP service: reliable transfer of.
R. Krempaska, October, 2013 Wir schaffen Wissen – heute für morgen Controls Security at PSI Current Status R. Krempaska, A. Bertrand, C. Higgs, R. Kapeller,
January 9, 2001 Router Plugins (Crossbow) 1 Washington WASHINGTON UNIVERSITY IN ST LOUIS Exercises.
Basic Navigation in Oracle R12 BY: Muhammad Irfan.
Active-HDL Server Farm Course 11. All materials updated on: September 30, 2004 Outline 1.Introduction 2.Advantages 3.Requirements 4.Installation 5.Architecture.
 Project Team: Suzana Vaserman David Fleish Moran Zafir Tzvika Stein  Academic adviser: Dr. Mayer Goldberg  Technical adviser: Mr. Guy Wiener.
Wednesday NI Vision Sessions
Fermilab Scientific Computing Division Fermi National Accelerator Laboratory, Batavia, Illinois, USA. Off-the-Shelf Hardware and Software DAQ Performance.
Network customization
Architecture Review 10/11/2004
Fundamental of Databases
Cookies Tutorial Cavisson Systems Inc..
Computer System Laboratory
Module 9: Preparing to Administer a Server
Introduction to Operating Systems
Jan Bækgaard Pedersen Alan Wagner Department of Computer Science
Troubleshooting Tools
Introduction to Windows Azure AppFabric
File System Implementation
Authentication & .htaccess
Getting Started with R.
FTP Lecture supp.
Module 4 Remote Login.
Chapter 2: System Structures
Environment Manager Troubleshooting and Debugging
Chrome Developer Tools
Chapter 10: Application Layer
Chapter 3: Windows7 Part 4.
Chapter 16: Distributed System Structures
Packet Sniffing.
Unit 27 - Web Server Scripting
Distributed System Structures 16: Distributed Structures
DTC Troubleshooting and Support Webinar
X Windows.
Process-to-Process Delivery:
Remote Procedure Call Hank Levy 1.
Remote Procedure Call Hank Levy 1.
Module 9: Preparing to Administer a Server
Remote Procedure Call Hank Levy 1.
Exceptions and networking
Virtual Private Network
Presentation transcript:

The Distributed Application Debugger (DAD) Michael Q. Jones & Matt B. Pedersen University of Nevada Las Vegas

Introduction The Distributed Application Debugger is a debugging tool for parallel programs Targets the MPI platform Runs remotley even on private networks Has record and replay features. Integrates GDB

Why a new tool? Results from survey of students learning parallel programming concluded 3 things: 1. Sequential errors are still frequent 2. Message errors are time consuming 3. Print statements are still used for debugging

Survey Survey results categorized according to the domains of multilevel debugging Sequential errors Message errors Protocol errors In addition to Data decomposition errors Functional decomposition errors

Survey Results

Survey Results

The Components The Client The Call Center Bridges The Runtime The GUI interacting with the programmer The Call Center A central messaging hub (running on the cluster) for Routing messages from the MPI program to The Client Routing commands from The Client to the MPI program Bridges A relay application for passing data between The Client and The Call Center, when The Call Center is not directly accessible (cluster behind firewall) The Runtime A libraries with wrapper code for the MPI functions (talks to The Call Center)

The Setup Login Server Home Cluster Login Firewall Server Cluster Login from Home to Cluster not Directly possible

The Setup Login Server Home Cluster Login Firewall Server Bridge Client Bridge Cluster Client runs at home Bridges on the servers in between home and the cluster Call Center on the cluster MPI processes on the cluster Call Center MPI MPI MPI MPI

The Distributed Application Debugger

Client Side The user provides a connection path and credentials on all machines

Client Side The user provides a connection path and credentials on all machines The system initiates SSH connections to each configured computer and launches a Bridge or The Call Center. Each component then connects to each other via TCP.

Connection Initiated

MPI Cluster Side Include a special mpi.h header file MPI calls are caught by wrapper functions Upon start up, each node creates a callback connection to The Call Center Data passed to MPI functions is sent back.

MPI Code Redirected

The Connected System

The Connected System

Session Modes An MPI session can be run in 3 modes: Play Just run like regular MPI Record (Record all messages) Record all messages Replay Use recorded messages to play back

Session Modes (Play) The Runtime behaves like regular MPI Nothing is saved to disk Nothing is read from disk Messages and parameters ARE sent back to The Client

Session Modes (Record) The Runtime Saves messages and parameters to a log file Executes the actual MPI call Saves the result

Session Modes (Replay) The Runtime does not execute any real MPI calls. All data is supplied from log files. No actual communication takes place Guarantees the same run as when the log file was recorded

Session Mode (Replay – Mixed) Mixed mode is special Some processes execute real MPI calls Some replay from log file Sometimes its necessary to execute MPI calls if communicating with someone who is executing real MPI calls; E.g. to avoid buffer overflow Validation is done on real values and log file values

Debugging Data The Runtime sends back 2 debugging messages per MPI command A PRE message indicating that an MPI command is about to be executed A POST message indicating that an MPI command completed Console messages are routed per node to the appropriate window.

Analyzing Data Debugging data gets displayed within the Console, Messages, or MPI tabs

Analyzing Data The Console Tab displays anything that the user’s code wrote to stdout.

Analyzing Data The Messages Tab displays messages as they come Matches Send/Receive pairs between nodes. Messages without a corresponding Send or Receive message get highlighted in red.

Analyzing Data The MPI tab displays all MPI commands in the order they were executed along with their parameters. Commands statuses (success, fail, or blocked) are displayed with icons in the Status Column.

Analyzing Data

Analyzing Data Buffer values can be requested and inspected.

Analyzing Data

Attaching GDB GDB can be attached to any node and controlled with the GDB Control Panel.

Attaching GDB

Attaching GDB

Source Code The source code to The Distributed Application Debugger can be found on GitHub at: https://github.com/mjones112000/DistributedApplicationDebugger

Questions??