GXP Make Kenjiro Taura

GXP make

What is it?
–Another parallel & distributed make

What is it useful for?
–Running jobs with dependencies in parallel
–Running many long-running jobs where fault tolerance (FT) is mandatory
–Managing dynamic jobs generated from changing sources

Make is a very good “workflow” language and execution engine

Usage

STEP 1: With GXP, grab (explore) as many hosts as you like
STEP 2: Write a regular Makefile; tasks are then distributed to the hosts you explored (-j option for parallel execution)

  $ gxpc use …
  $ gxpc explore …
  $ gxpc make -j
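
A minimal sketch of that flow. The target names and the ./process script are hypothetical, and the gxpc use / gxpc explore arguments (elided above) depend entirely on your site:

  $ gxpc use …              # choose a login method for your hosts
  $ gxpc explore …          # grab the hosts
  $ cat Makefile
  all : out.1 out.2 out.3
  out.% : in.%
  	./process $< > $@       # hypothetical per-file command; recipe lines start with a TAB
  $ gxpc make -j 4          # run up to 4 commands at a time on the explored hosts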

Why make is good (1)

Concise descriptions
–A single rule line can generate tasks with 1,000 different parameters, or tasks with 1,000 different input files

(Almost) automatic parallelization
–GNU make’s -j option (but by itself this is only for SMP)

Dependency management
–Make analyzes what needs to be done from the file system state and executes it
–Parallelization with almost no effort
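
A small sketch of what “a single line” buys you; the in.*/out.* naming and the ./run command are hypothetical:

  # one pattern rule covers every in.* file in the directory
  INPUTS := $(wildcard in.*)
  all : $(INPUTS:in.%=out.%)

  out.% : in.%
  	./run $< > $@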

Why make is good (2)

Simple fault tolerance
–If some jobs fail, just run make again
–A byproduct of dependency management

Popular and robust implementation
–Much more popular than any other “future-standard” workflow system
–GNU make comfortably manages 10,000 child processes on a machine with 1GB of memory (a quality rarely found in research prototypes)

Parallel make is not new

Sun Grid Engine has qmake
–Only for SGE
–Each job goes through a job submission to the queue
–GXP, by contrast, is an “overlay scheduler”

Other implementations (dmake, etc.)
–I don’t know of any that runs with unmodified GNU make and on multiple underlying resource access protocols

Useful Features

state.html (a status page generated while make runs)
–Job/worker status, parallelism profile, etc.
–Simple heuristics find file names in the command lines and turn them into anchors (links)

Current Directory

Rule:
–If you run gxpc make from directory D, then D must exist on all hosts
  They do not have to be the same (shared) directory; directories with the same path name suffice
–Commands will run with D as their CWD
–You do not have to cd to D on all hosts

Current Directory

Special rule:
–If D is under your home directory, it is interpreted relative to your home
–E.g.,
  On A you have /home/tau/abc
  On B you have /usr/home/taue/abc
  If you run gxpc make in A’s /home/tau/abc, a command would run on B in /usr/home/taue/abc

How many tasks per worker?

(tentative)
–Write a config file gxpc_make.conf
–Format:
  cpu regexp n
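
A hypothetical gxpc_make.conf following that format; the host-name regexps and counts below are made up, and the exact semantics of the cpu line are whatever your GXP version documents:

  # cpu <host name regexp> <number of simultaneous tasks>
  cpu node0[0-7].*  8
  cpu bigmem.*      16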

Implementation: to understand how resources are used and where the current scalability limit comes from

GNU make features essential for GXPC make

GNU make is a great tool that
–allows the command shell to be replaced (SHELL=…)
–allows common include files to be specified in an environment variable (MAKEFILES)
–handles >10,000 child processes with no problems

SHELL=

By writing

  SHELL=foo

in your Makefile, make will run “foo -c cmd” to run cmd (instead of “/bin/sh -c cmd”)

You can effectively “intercept” every sub-process invocation of make
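
A tiny illustration of the interception trick outside GXP; the wrapper script below is hypothetical and is not GXP’s actual mksh:

  # Makefile
  SHELL = ./trace-shell.sh
  all :
  	date > stamp.txt

  # trace-shell.sh (make it executable): make invokes it as ./trace-shell.sh -c 'command'
  #!/bin/sh
  echo "about to run: $2" >> commands.log   # $2 is the command text
  exec /bin/sh -c "$2"                      # hand the command to a real shell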

MAKEFILES environment variable

But wouldn’t the user have to insert SHELL= into every single Makefile?

No, thanks to the MAKEFILES environment variable
–make automatically includes the files named in this variable before reading every Makefile
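
A minimal sketch of how the two features combine; the file names are hypothetical, and GXP sets this up for you, so this only illustrates the mechanism:

  $ cat gxp_prolog.mk
  SHELL = /path/to/mksh            # hypothetical path to the wrapper shell
  $ export MAKEFILES=$PWD/gxp_prolog.mk
  $ make -j 100                    # every Makefile now has SHELL= injected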

Process Structure (diagram)
–On the master host, GNU make spawns mksh processes (parent-child); each mksh talks over a socket to the scheduler (xmake), which spawns gxpc processes
–gxpc talks to the GXP daemons on the compute resources

How commands run and die
–GNU make spawns a shell (mksh) for each command to run
–mksh talks to our scheduler (xmake)
–xmake selects jobs and spawns gxpc to send them to the remote hosts
–When a command terminates, xmake sends EOF to the mksh
–mksh exits upon receiving EOF
–GNU make then knows the command finished

Performance/Scalability

The master host is a bottleneck
–This seems unavoidable with the approach of using unmodified GNU make
–The question is when we hit the limit

In our structure, the main resource consumers on the master host are
–mksh (the direct children of make)
–gxpc

Naively, “make -j N” with P workers retains
–N mksh processes
–P gxpc processes
on the master host

–If mksh and gxpc each take 1MB, running 1,000 workers eats 2GB

In our implementation
–gxpc terminates without waiting for the remote command to finish (so P stays small)
–mksh becomes (via exec) a tiny process that only waits for a line from stdin and exits:

  if read x; then exit $x; else exit 126; fi

Performance tips (1)

Use multi-core machines for the master
–They are good at handling thousands of small processes

Use -j N to limit concurrency (to somewhere around the number of workers)

Performance tips (2)

Load average on the master gets high (10.0 is the norm), but this is not a red signal per se

Watch for thrashing (the si and so fields of “vmstat 1”)

state.html also shows the load average and vmstat output, but its update is infrequent

Useful GNU make tips
–Pattern rules (%.xyz : %.pqr)
–$(wildcard …), $(shell …), $(addsuffix …)
–Automatic variables: $@ (target), $< (input), $* (stem: whatever % matched)
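
A small Makefile exercising those tips; the .pqr/.xyz suffixes are taken from the pattern above, while the ./filter command is hypothetical:

  SRCS  := $(wildcard *.pqr)            # e.g. a.pqr b.pqr
  OBJS  := $(SRCS:%.pqr=%.xyz)          # a.xyz b.xyz
  LOGS  := $(addsuffix .log,$(OBJS))    # a.xyz.log b.xyz.log (just to show addsuffix)
  TODAY := $(shell date +%F)            # just to show $(shell …)

  all : $(OBJS)

  %.xyz : %.pqr
  	./filter $< > $@    # $< = the .pqr input, $@ = the target, $* = the stem (a, b, …)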

Recursive make for “dynamic jobs”

With a single invocation of make, it is difficult (impossible?) to have a graph with a dynamic number of tasks
–dynamic: not known until some commands run

Example:
–How to split a large data file into fixed-size chunks and apply a command to each chunk in parallel?

  result : big
  	split big
  	for f in big.*; do … ; done

This is serial

Recursive make

  result : big
  	split big frag.
  	$(MAKE) $(subst frag.,int., $(wildcard frag.*))
  	cat int.* > result

  int.% : frag.%
  	…
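
One way to flesh this idea out, shown here as a variant that computes the chunk list when the sub-make re-reads the Makefile (i.e. after split has run); the ./work command is hypothetical:

  result : big
  	split big frag.                  # produces frag.aa, frag.ab, …
  	$(MAKE) chunks                   # recursive make; inherits -j via the jobserver
  	cat int.* > result

  # evaluated when the sub-make reads the Makefile, so it sees the fresh frag.* files
  chunks : $(patsubst frag.%,int.%,$(wildcard frag.*))

  int.% : frag.%
  	./work $< > $@                   # hypothetical per-chunk command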

Issues and extensions for large scale
–Job placement
–Concurrency control
–File sharing
  Staging
  File systems

Job placement

Currently very limited
–With == before the command, it will run locally. E.g., the following will run gcc locally:

  main.exe : a.o b.o c.o
  	== gcc -o main.exe a.o b.o c.o

Concurrency Control

You may want to limit the concurrency of certain types of jobs, e.g.
–< 10 concurrent downloads from a web server

gxpc make can define “semaphores” and annotate commands so that they acquire some of them

Notation:
–=(res=semaphore_name) or =(semaphore_name) before the command

Example

  %.dat :
  	=(fileserv) wget …

Then

  gxpc make -- --sem fileserv:5

will run at most 5 wgets in parallel

Default Concurrency

By default, local jobs (commands with ==) have a concurrency limit of 1

You can change it with

  gxpc make -- --sem local:N

File sharing

Within a cluster it is usually not an issue (NFS to some extent, or a cluster file system)

It is a headache across clusters
–Staging
–Distributed file systems? Gfarm, Hadoop, sshfs-mux

Staging (experimental)

Only staging from/to the local host is currently supported

Give =(in=file, out=file, …) before the command. E.g.,

  a.dat : a.src
  	=(in=$
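
The transcript is cut off above. As a purely hypothetical sketch of what such a rule might look like, following the =(in=…, out=…) format just described (do not take this as GXP’s exact syntax), with a made-up convert command:

  a.dat : a.src
  	=(in=$<, out=$@) convert $< $@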