GXP in a nutshell
You can send jobs (Unix shell command lines) to many machines, very fast.
Very small prerequisites:
– Each node has Python (ver. 2.3.4 or later)
– You have ssh access to each node without being asked for passphrases (e.g., use ssh-agent; see the sketch below)
– Install GXP only on your home node; GXP propagates itself to the nodes you want to use
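The passphrase-free ssh requirement is usually met with ssh-agent. A minimal sketch, assuming you already have a keypair; the key path and host name below are illustrative, not from the slides:

    # Start an agent for this shell session and load the key once;
    # later ssh logins from this shell will not ask for the passphrase.
    eval "$(ssh-agent -s)"
    ssh-add ~/.ssh/id_rsa

    # Check that a login to a worker node is now prompt-free.
    ssh node001.cluster.example.org true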

What is it useful for?
With GXP, you can comfortably:
– operate many nodes, interactively or non-interactively
– use nodes across multiple clusters
– reach nodes behind firewalls/NATs
– deal with many nodes, some of which are dead or unavailable on any given day
– coordinate multiple clusters as a parallel processing resource without installing any job-scheduling software (PBS, Condor, etc.)

Things made simple by GXP (1)
– Launch a parallel program on many nodes across multiple clusters
– Kill them with a single stroke of Ctrl-C
– Simple PBS/Condor-like job scheduling
– Monitor specific programs (ps … on all nodes; see the sketch after this list)
– Kill specific programs (killall … on all nodes)
– Clean up all processes as a last resort (bomb)
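As an illustration of the ps/killall items above: once GXP has reached the nodes, cluster-wide monitoring and cleanup reduce to running the usual Unix tools through the e command. The gxpc front-end name and the process name my_solver are assumptions made for illustration; the slides themselves only name e:

    # Show, on every node, the processes of a (hypothetical) program my_solver.
    # The [m] trick keeps grep from matching itself; the quotes keep the pipe
    # on the remote side (exact quoting rules may differ between GXP versions).
    gxpc e "ps auxww | grep [m]y_solver"

    # Kill that program on every node in one shot.
    gxpc e killall my_solver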

Things made simple by GXP (2)
– Copy a file to many nodes, some of them behind firewalls/NATs
– Elect a single node from each file system
– Get the load average of all nodes and drop highly loaded nodes (see the sketch after this list)
– Check whether a command is installed and drop nodes that don't have it
– List processes consuming a significant amount of CPU time
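A sketch of the load-average item above, under the same gxpc assumption; /proc/loadavg is Linux-specific and the 2.0 threshold is arbitrary:

    # Look at every node's load averages.
    gxpc e 'echo "$(hostname) $(cat /proc/loadavg)"'

    # Run a test that succeeds only where the 1-minute load is below 2.0
    # (the backslash protects $1 from the home shell, so awk sees $1) ...
    gxpc e "awk 'END { exit (\$1 >= 2.0) }' /proc/loadavg"

    # ... then keep only the nodes where the test succeeded
    # (node selection itself is the smask feature described below).
    gxpc smask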

Our Experience
A fairly large natural language processing task:
– parse > 100M web documents collected and archived by our web crawler
– resources: 350 CPUs across two clusters
– GXP integrates them with no special effort
Environment:
– no Globus/PBS installed (Globus/rsh ports are blocked between the clusters)
– documents must be staged on demand because of limited disk capacity
– nodes in one of the two clusters cannot connect to hosts outside the cluster; we used GXP to stage files through multiple relaying hosts

Basic Usage
– the "explore" command: log in to and authenticate yourself on many nodes
– the "e" command: send and execute a command line on them, very fast (see the sketch below)
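A minimal session sketch built from the two commands above. The gxpc front-end name, the host names, and the way targets are listed are assumptions; check the GXP documentation of your version for the exact explore syntax:

    # Log in to (and deploy GXP on) the nodes you want to use.
    gxpc explore node001 node002 node003

    # Send an ordinary shell command line to every node you reached.
    gxpc e hostname
    gxpc e uptime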

Feature (1): multihop logins
You can reach nodes through other nodes.
Two typical scenarios where this is important:
– home node → a cluster gateway → cluster nodes
– very large clusters, for which login trees are mandatory
Subsequent command submissions transparently reach all nodes (see the sketch below).
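GXP performs these multihop logins itself, so no GXP syntax is attempted here. Purely as an illustration of the topology, the plain OpenSSH command below (ProxyJump, OpenSSH 7.3 or later; host names are made up) reaches one internal node through its gateway, the same home → gateway → node path that GXP traverses once per node and then keeps open for you:

    # Hop through the cluster gateway to reach an internal node by hand.
    ssh -J gateway.cluster.example.org node001.internal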

Feature (2): node selection
You don't always want to send commands to all nodes.
After logging in to many nodes, you can interactively select a subset of them.
The "smask" command selects the nodes on which the last command succeeded (see the sketch below).
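A sketch of the select-by-success pattern, again assuming a gxpc front-end and using java as a made-up example; only e and smask come from the slides:

    # Test something on every node, e.g. whether java is installed.
    gxpc e which java

    # Keep only the nodes where the previous command exited with status 0;
    # later e commands go to this reduced set.
    gxpc smask

    # This now runs only on the nodes that have java.
    gxpc e java -version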

Feature (3): coordination syntax
The "e" command takes an extended shell syntax:
e {{ S }} M
– run S on all selected nodes
– run M on the home node
– merge the standard output of all the S's and feed it to M
e B {{ S }}
– feed B's standard output to all the S's
e S is an abbreviation of e {{ S }} (two concrete examples follow below)
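Two concrete instances of the syntax above, filling in the S, M, and B placeholders. The gxpc front-end name and the file names are made up, and the second form assumes (by symmetry with the first) that B runs on the home node:

    # e {{ S }} M : run hostname on every selected node, merge the outputs
    # on the home node, and feed the merged stream to sort there.
    gxpc e {{ hostname }} sort

    # e B {{ S }} : read a file on the home node and broadcast its contents
    # to every node, where each node writes its own copy with tee
    # (tee is used instead of ">" so the home shell does not grab the redirection).
    gxpc e cat input.conf {{ tee /tmp/input.conf }}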