DIANE Overview Germán Carrera, Alfredo Solano (CNB/CSIC) EMBRACE COURSE Monday 19th of February to Friday 23rd. CNB-CSIC Madrid.

DIANE is a lightweight distributed framework for parallel scientific applications based on the master-worker model. It assumes that a job can be split into a number of independent tasks, which is typical of many scientific applications. Unlike standard message-passing libraries such as MPI, the DIANE framework handles all synchronization, communication and workflow management on behalf of the application. The framework fully controls the execution of a job and decides when and where the tasks are executed. DIANE is a thin software layer that works on top of more fundamental middleware such as LSF, PBS or the Grid Resource Brokers. It can also work in standalone mode and does not require any complex underlying software.

Main Features and Design Principles
– The big picture

Master-Worker Workflow Model
– DIANE is based on a pull model: workers ask the master for tasks. The master decides how to assign tasks to workers, and the user may optimize this process for a particular application.
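The pull model described above can be illustrated with a small, self-contained sketch. It does not use DIANE: the names `run_pull_model`, `pending` and `worker` are hypothetical, threads stand in for remote workers, and squaring a number stands in for real work. The point is only that each worker asks for its next task when it is free, so faster workers naturally take more tasks.

```python
import queue
import threading


def run_pull_model(tasks, n_workers=3):
    """Run (task_id, payload) tasks with workers that pull from a shared queue.

    Hypothetical illustration only; this is not DIANE's implementation.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def worker(wid):
        while True:
            try:
                # The worker asks for work; nothing is pushed to it.
                task_id, payload = pending.get_nowait()
            except queue.Empty:
                return  # no tasks left to pull
            value = payload * payload  # stand-in for the real computation
            with lock:
                results[task_id] = (wid, value)

    threads = [threading.Thread(target=worker, args=(wid,))
               for wid in range(1, n_workers + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_pull_model([(i, i) for i in range(10)])
```

Which worker handles which task varies from run to run, but every task is processed exactly once.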

Active feedback versus batch operation mode
– Typically the end user interacts with the GRID through some sort of user interface. The user interface may be as simple as a set of command-line tools, or a more complicated GUI-based application containing modules to prepare and monitor jobs (the Application and the Job Handler, respectively).

In the active feedback operation mode, each Worker pulls new subjobs whenever it becomes available. Fast feedback to the Job Master allows the end user to work interactively.
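A rough way to picture active feedback, as opposed to batch operation, is a master that reports each partial result as soon as a subjob finishes instead of waiting for the whole job. The sketch below uses only the Python standard library; `subjob`, `run_with_feedback` and the `on_partial` callback are hypothetical names chosen for illustration and are not part of DIANE.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def subjob(n):
    """Stand-in for one independent subjob."""
    return n * n


def run_with_feedback(inputs, on_partial):
    """Integrate results as they arrive and report progress after each one."""
    total = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(subjob, n) for n in inputs]
        # as_completed yields each future when it finishes, in completion order.
        for done, future in enumerate(as_completed(futures), start=1):
            total += future.result()
            on_partial(done, len(futures), total)
    return total


progress = []
final = run_with_feedback(
    range(5),
    lambda done, n_total, total: progress.append((done, n_total, total)),
)
```

In a batch mode the callback would fire once at the end; here the user sees a running total after every completed subjob.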

Core framework
– The DIANE core framework does not depend on any concrete application (in particular, on any data analysis software) and is explicitly designed so that application-specific parts are implemented as separate components. The core framework is implemented in Python, with CORBA running in the backend in a way that is completely transparent to applications.

Supported languages for applications
– C++ and Python application components are supported directly and may be configured at runtime according to different usage scenarios (as threads or as separate processes). Applications written in any other language (FORTRAN, Java) may also be used in the form of an executable file.

Error Recovery
– Users may specify customized error recovery policies if needed. A set of default policies is provided and may be used immediately. Users may write and add special recovery policies by implementing simple Python functions.
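The slides do not show what a recovery policy function looks like, so the following is only a sketch of the general idea under assumed names. The signature `policy(task_id, attempt, error)`, the return values `"retry"`, `"skip"` and `"abort"`, and the helper `run_task` are all hypothetical and should not be read as DIANE's API.

```python
def retry_then_skip(task_id, attempt, error, max_attempts=3):
    """Hypothetical policy: retry a failed task a few times, then skip it."""
    return "retry" if attempt < max_attempts else "skip"


def run_task(task_id, func, policy):
    """Run func, consulting the policy function after every failure."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return ("ok", func())
        except Exception as error:
            decision = policy(task_id, attempt, error)
            if decision == "retry":
                continue
            if decision == "skip":
                return ("skipped", None)
            raise  # any other decision aborts the job


calls = {"n": 0}


def flaky():
    """Fails twice, then succeeds; simulates a transient error."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return 42


status, value = run_task(1, flaky, retry_then_skip)
skipped = run_task(2, lambda: 1 / 0, retry_then_skip)
```

Keeping the policy as a plain function means a different strategy can be swapped in without touching the code that runs the tasks.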

Job Monitoring and Outbound Connectivity from Worker Nodes
– A remote client (user) gets full information about the state of a job and may connect and disconnect at any time. An administrator may set up any number of proxies between Client and Master, so outbound connectivity from worker nodes is not required. In this way DIANE can be adapted very easily to the local policies of computing centers.
– Example: each of the commands below is executed on a different machine.

Connecting a remote client directly to the job master:

    % diane.startmaster --job=test    # cluster
    % diane.startclient --job=test    # end user

Connecting a remote client through a proxy:

    % diane.startmaster --job=test    # cluster
    % diane.startclient --proxy       # proxy on a gateway machine of the cluster
    % diane.startclient --job=test    # end user

Simple single-user job execution on a local cluster
– A user of LSF at CERN may start a new job by running the master on a local desktop machine while submitting the workers as individual jobs to LSF:

    % diane.startjob --job=test --workers=30 --broker=LSF --broker-options=-q8nm

Software building blocks
– Master/Worker components may be arranged in a variety of ways to build more sophisticated systems or to integrate into existing frameworks.

How DIANE fits into the GRID picture
– DIANE runs on top of low-level GRID services.

Quick start
– JobInitData contains application-specific parameters for the 'crashTest' application, which is used to simulate application failures in different time patterns. Here we will use it to make sure everything was installed correctly.

    diane.startjob -j $DIANE_TOP/dev/workspace/testOK.job --wms=xterm

    DIANE: 22:18:39: Initializing: appname = crashTest
    DIANE: 22:18:39: starting new job: id = 2
    DIANE: 22:18:39: number of registered workers = 0
    DIANE: 22:18:39: client running...
    [, ] [, ] [, ] [, ] [, ] [, ] [, ] [, ] [, ] [, ]
    DIANE: 22:18:39: job plan: #10 tasks
    : 22:18:39: current job processing time: 0 s

At the same time, two xterm windows should pop up automatically: you will see the worker immediately put to work by the master, and tasks successively dispatched, executed and integrated.

    switching to current user job workspace: /home/moscicki/diane.workspace/jobs/2
    DIANE: 22:19:41: reading master address from the default file: MasterOID
    DIANE: 22:19:42: registering new worker with wid = 1
    worker: 22:19:42: initializing job 2, worker id 1
    worker: 22:19:42: job initialization finished with the status: ok
    : 22:19:43: dispatching taskid=1 to worker wid=1
    worker: 22:19:43: starting task #1
    doing action: sleeping:
    worker: 22:19:48: task 1 finished with the status: ok
    DIANE: 22:19:48: recieved result, taskid =1 status: ok from worker: 1
    integrating result... waiting
    DIANE: 22:19:54: Integrated result successfully...
    : 22:19:54: dispatching taskid=2 to worker wid=1
    worker: 22:19:54: starting task #2
    doing action: sleeping:
    worker: 22:20:05: task 2 finished with the status: ok

At the end of job execution you should see output like this:

    : 22:22:55: job completed ok, quitting control loop
    DIANE: 22:22:55: notifying workers about finished job
    DIANE: 22:22:55: deactivating master
    worker: 22:22:55: notification from master: job 2 finished
    worker: 22:22:55: worker cleanup status: ok
    DIANE: 22:22:55: Trying to terminate server...
    DIANE: 22:22:55: notifying client
    Job terminated, id= 2
    Summary =
    DIANE: 22:22:55: Trying to terminate server...
    DIANE: 22:22:55: job output in: /home/moscicki/diane.workspace/jobs/2

You can construct applications by creating Planner, Integrator and Worker objects in Python. You decide what data structures are exchanged between these objects. More examples can be found in $DIANE_TOP/dev/applications.

    import random

    class Planner:
        def env_createPlan(self, jobData, chunkNum):
            init_list = []
            random.seed(jobData[1])
            prob = jobData[2]
            avg_wait = jobData[3]
            std_dev = jobData[4]
            # ... (elided on the slide; `failures` is not defined in the code shown)
            for i in range(jobData[0]):
                # list() lets random.choice index the dictionary values
                action = random.choice(list(failures.values()))
                init_list.append([action, random.gauss(avg_wait, std_dev)])
            return (None, init_list)

    import time

    class Integrator:
        def env_init(self, job_data):
            pass

        def env_addPartialOutput(self, wait):
            # Simulate a slow integration step by sleeping for `wait` seconds
            print("integrating result... waiting..." + repr(wait))
            if wait > 0:
                time.sleep(wait)
            return 1

        def env_getResult(self):
            return None

    import time

    class Worker:
        def env_init(self, init_data):
            return 1

        def env_performWork(self, what):
            # Each task is an [action, wait] pair produced by the Planner
            action = what[0]
            wait = what[1]
            print("doing action: " + str(action) + " sleeping: " + repr(wait))
            if wait > 0:
                time.sleep(wait)
            return action(wait)

        def env_done(self):
            return 1
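To see how the three objects might cooperate, here is a single-process sketch that calls them in sequence. The method names are the ones shown on the slides, but everything else is an assumption made for illustration: the `run_locally` driver, the order of the calls, the reading of the first element of the plan tuple as worker initialization data, and the toy doubling task. The real framework distributes these calls across master and worker processes.

```python
class Planner:
    def env_createPlan(self, jobData, chunkNum):
        # Same (something, task_list) return shape as on the slide.
        return (None, [[n] for n in range(jobData[0])])


class Worker:
    def env_init(self, init_data):
        return 1

    def env_performWork(self, what):
        return what[0] * 2  # toy task: double the input

    def env_done(self):
        return 1


class Integrator:
    def env_init(self, job_data):
        self.total = 0

    def env_addPartialOutput(self, output):
        self.total += output
        return 1

    def env_getResult(self):
        return self.total


def run_locally(job_data):
    """Hypothetical in-process driver: plan, perform work, integrate."""
    planner, worker, integrator = Planner(), Worker(), Integrator()
    init_data, tasks = planner.env_createPlan(job_data, 1)
    integrator.env_init(job_data)
    worker.env_init(init_data)  # assumption: first tuple element is init data
    for task in tasks:
        integrator.env_addPartialOutput(worker.env_performWork(task))
    worker.env_done()
    return integrator.env_getResult()


result = run_locally([5])
```

A driver like this can be handy for checking application logic on a desktop before submitting workers to a cluster.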

DIANE is free software released under the GPL license. You can download it at: