REI – Recipe Execution Infrastructure
Jens Knudstrup, 2005-02-08



Purpose of REI

Main Objectives of REI
- Provide the services of a parallel batch queue system.
- Make it easy to control and monitor complicated batches with job synchronization.
- Make it possible to distribute tasks (processing load) over a cluster of CPUs/nodes.

Not Provided in the Present Implementation
- Services for distributing data within the cluster to the nodes doing the processing (data sharing/distribution is done via a common storage area/file server).
- Services for resource management and advertising.
- Services for explicit load balancing (optimized job distribution).
- Special features for GRID appliance.

Main Features

Main Features of REI
- Implemented in C++ (in-house implementation from scratch).
- Uses an RDBMS for information sharing and task synchronization.
- Executes shell commands or runs CPL Recipes natively (no generic interfacing to shared object files).
- Pworker task execution daemon provided; it can take three roles:
  - Process Master Commands (Master Pworker).
  - Process Standard Commands (Standard Pworker).
  - Process Master and Standard Commands.
- Command line utilities provided to add/remove/monitor commands and to control Pworkers.
- API provided for implementing Master Command Libraries (also referred to as Recipe Planners) and Standard Command Libraries.

Command Line Interface

Interaction with REI
- Command line interface provided:
  - addcmd: Add a Master Command to the Master Command Queue (handles ABs and SOFs, which are not part of the core of REI).
  - cmdstat: Query the status of all commands or of a specific command. Tail feature provided.
  - rmcmd: Remove the information for one command or for all commands from the Command Queues (clean up).
  - pworker: The Pworker daemon.
  - stopworker: Stop one specific Pworker or all running Pworkers.
  - listworkers: List the Pworkers running in the system.
  - rmworker: Remove a Pworker (make it exit) or all Pworkers.
- The commands are not part of the core REI system; they are convenience features built on the REI libraries.
- Commands can also be added to the DB directly via the REI libraries, i.e., the operation of REI can be controlled and monitored programmatically.

Command Lifecycle

Command States
- Each submitted command is in one of 7 states indicating its current status. [Figure: the list of states is not reproduced in this transcript.]

Command Transitions

[Figure: command state transitions; not reproduced in this transcript.]

Interprocess Synchronization

Interprocess Synchronization/Information Sharing
- Pworkers synchronize themselves via the DB.
- The DB is also used for exchanging information between processes in the system.
- Tables:
  - pworker_registry: Information about the Pworkers in the system (ID, node, Master and/or Standard Commands, …).
  - pworker_master_command_queue: Information about the Master Commands waiting to be executed, under execution, and executed.
  - pworker_master_sequencer: Information about Master Commands that are BLOCKED.
  - pworker_command_queue: Standard Commands waiting to be executed, under execution, and executed.
  - pworker_command_sequencer: Used to sequence Standard Commands.
  - pworker_log: Log messages from Pworker processes.
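How several workers can synchronize through a shared queue table can be sketched as follows. This is only an illustration under stated assumptions, not REI's implementation (REI is written in C++ and the slides do not name its RDBMS or its schema): the table name pworker_command_queue comes from the slide, while the columns, the state values QUEUED and EXECUTING, the worker IDs, the command names, and the use of SQLite are all hypothetical.

```python
import sqlite3


def create_queue(conn):
    """Create a toy queue table. Only the table name is taken from the slides;
    the columns and state values are assumptions made for this sketch."""
    conn.execute(
        """
        CREATE TABLE pworker_command_queue (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            state TEXT NOT NULL,
            worker_id TEXT
        )
        """
    )


def claim_next(conn, worker_id):
    """Claim the oldest queued command for worker_id.

    Returns the command name, or None if nothing is queued or if another
    worker claimed the selected row first.
    """
    with conn:  # commits on success, rolls back on an exception
        row = conn.execute(
            "SELECT id, name FROM pworker_command_queue "
            "WHERE state = 'QUEUED' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cmd_id, name = row
        # The state check in the WHERE clause makes the claim safe against a
        # competing worker: only one UPDATE can match the still-queued row.
        cur = conn.execute(
            "UPDATE pworker_command_queue "
            "SET state = 'EXECUTING', worker_id = ? "
            "WHERE id = ? AND state = 'QUEUED'",
            (worker_id, cmd_id),
        )
        return name if cur.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
create_queue(conn)
conn.executemany(
    "INSERT INTO pworker_command_queue (name, state) VALUES (?, 'QUEUED')",
    [("split_frame",), ("join_frame",)],
)

first = claim_next(conn, "pworker-1")
second = claim_next(conn, "pworker-2")
third = claim_next(conn, "pworker-3")  # queue is empty by now
```

The point of the guarded UPDATE is that the database, rather than any direct communication between workers, decides which worker gets a command.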

OmegaCam Demo Science Reduction Cascade/1

OmegaCam Science Demo Cascade – Example
- Used adapted WFI frames (8 extensions).
- Provided:
  - OCAM REI Recipe Planner Plug-In to schedule tasks for the recipes (a general Recipe Planner made for all Recipes).
  - REI Standard Command Library Plug-Ins to do FITS file splitting and joining.
  - Cascade Scheduler Script to submit Master Commands and to create the SOFs needed.
- 6 Recipes executed during the cascade (6 Master Commands issued to REI).
- Total number of commands scheduled within REI for the cascade: ~100.
- Total number of intermediate/temporary and final data products: ~200.
- Number of SOFs involved: 10.

OmegaCam Demo Science Reduction Cascade/2

Setting up Cascade – Example:

$ addcmd -name ocam_reduce_sci_W_ T16:29:05 -bg -waitfor ocam_reduce_std_W_ T16:29:05 -recipe ocam_reduce_sci /data/ocam/sof/ocam_reduce_sci_W_ T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_sci_W_ T16:29:05

$ addcmd -name ocam_reduce_std_W_ T16:29:05 -bg -waitfor ocam_mflat_W_ T16:29:05 -trigger ocam_reduce_std_W_ T16:29:05 -recipe ocam_reduce_std /raid/data/ocam/sof/ocam_reduce_std_W_ T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_std_W_ T16:29:05

$ addcmd -name ocam_mflat_W_ T16:29:05 -bg -waitfor ocam_mtwilight_W_ T16:29:05 -trigger ocam_mflat_W_ T16:29:05 -recipe ocam_mflat /raid/data/ocam/sof/ocam_mflat_W_ T16:29:05.sof -out /raid/data/ocam/products/ocam_mflat_W_ T16:29:05

…
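The dependency chain set up by these addcmd calls can be sketched in a few lines. The sketch assumes that -waitfor blocks a command until the named trigger has fired and that -trigger names the trigger a command fires on completion; the slides do not spell out these semantics. The Command class, the execution_order function, and the shortened command names (without the timestamp suffix) are hypothetical, and the ocam_mtwilight trigger is treated as already fired because the command that raises it is elided in the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    name: str
    waitfor: Optional[str] = None  # trigger that must have fired before this runs
    trigger: Optional[str] = None  # trigger fired when this command completes


def execution_order(commands, already_fired=()):
    """Return command names in an order that respects waitfor/trigger links."""
    fired = set(already_fired)
    pending = list(commands)
    order = []
    while pending:
        # A command is runnable if it waits for nothing or its trigger has fired.
        runnable = [c for c in pending if c.waitfor is None or c.waitfor in fired]
        if not runnable:
            raise ValueError("remaining commands wait for triggers that never fire")
        for cmd in runnable:
            order.append(cmd.name)
            if cmd.trigger is not None:
                fired.add(cmd.trigger)
            pending.remove(cmd)
    return order


# Same submission order as in the example: the last step is submitted first.
cascade = [
    Command("ocam_reduce_sci", waitfor="ocam_reduce_std"),
    Command("ocam_reduce_std", waitfor="ocam_mflat", trigger="ocam_reduce_std"),
    Command("ocam_mflat", waitfor="ocam_mtwilight", trigger="ocam_mflat"),
]

order = execution_order(cascade, already_fired={"ocam_mtwilight"})
print(order)
```

Under these assumptions the submission order does not matter: each command stays blocked until the step it waits for has completed, so the cascade runs from the flat field step towards the science reduction.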

Task Synchronization

[Figure: task synchronization diagram; surviving labels: Master, Split, BIAS, Join, Master, Split, DOME, Join, Compl.]

Command Scheduling

[Figure: command scheduling diagram; surviving labels: Frame A, Frame B, Split, Join, Recipe.]
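The Split, Recipe and Join labels on this slide suggest a split/process/join pattern per frame. A minimal sketch of that pattern is shown here, assuming a frame can be modelled as a list of extensions and that the recipe step is independent per extension. All function names are hypothetical, the recipe body is a placeholder, and local threads stand in for the Pworkers that REI would run on cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split(frame):
    """Split a frame into its extensions (here the frame is already a list)."""
    return list(frame)


def run_recipe(extension):
    """Placeholder for the recipe step applied to one extension."""
    return [2 * value for value in extension]


def join(parts):
    """Join the processed extensions back into one frame."""
    return list(parts)


def process_frame(frame, max_workers=4):
    parts = split(frame)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so the join keeps
        # the extensions in their original positions.
        reduced = list(pool.map(run_recipe, parts))
    return join(reduced)


frame_a = [[1, 2], [3, 4]]
frame_b = [[5], [6], [7]]
result_a = process_frame(frame_a)
result_b = process_frame(frame_b)
```

The join step can only start once every part has been processed, which is the kind of synchronization point a scheduler has to enforce between the split commands and the join command.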

DFO Cascading

Controlling REI – DFO Environment
- Already used in operation by DFO for some time.
- DFO uses REI to control the scheduling of a UNIX shell script, which itself controls the execution of the recipes (calling esorex internally).
- DFO uses parallelism at frame level; there is no parallelism within the processing of each frame.
- REI is used as a queue system: jobs are submitted, and the scheduling and execution of the jobs are carried out by REI.
- Example addcmd in the DFO environment:

$ addcmd -name SINFO T20:25:28.895_tpl.ab -bg -trigger mflat_SINFO T20:25:28.895_tpl.ab -exe processAB -a SINFO T20:25:28.895_tpl.ab

$ addcmd -name SINFO T19:55:07.961_tpl.ab -bg -trigger mwave_SINFO T19:55:07.961_tpl.ab -waitfor mflat_SINFO T20:25:28.895_tpl.ab -exe processAB -a SINFO T19:55:07.961_tpl.ab

Using REI

How to Integrate a Pipeline in REI (Simplified …)
- Decide how to execute the recipes:
  1. Natively, in the form of CPL Recipes.
  2. Invoke the recipe library methods/functions from within Standard Commands.
  3. Execute via jacket scripts/applications encapsulating the recipe.
- Define the necessary/desirable level of parallelism.
- Define execution plans for the various cascades.
- Implement a Recipe Planner, if necessary, to do the internal coordination of the command scheduling (and to produce data for the Standard Commands).
- Implement a Standard Command Library with special commands that should execute internally within the REI environment (if required).
- Implement external control scripts to submit Master Commands, defining dependencies and providing data for the command execution if necessary.
- Decide the architecture of the processing cluster (number of Master Pworkers, Pworkers, CPUs, nodes, amount of memory per CPU, …).
- Start up the Pworkers, defining their proper role and referring to the Command Plug-in Libraries provided (if any) and/or possible CPL Recipe Plug-in Libraries.