Distributed Processing
Craig E. Tull, HCG/NERSC/LBNL (US)
ATLAS Grid Software Workshop, BNL, May 7, 2002

Distributed Processing
ATLAS distributed processing, PPDG year 2 program
role of MOP, other middleware & third-party tools
objectives: deliverables to users
job description language: status, objectives, options
—EDG JDL

Architecture
[EDG architecture diagram: a Local Application and Local Database sit atop the Grid Application Layer (Job Management, Data Management, Metadata Management, Object to File Mapper); beneath it the Collective Services (Grid Scheduler, Replica Manager, Replica Optimization, Information & Monitoring, Replica Catalog Interface), then the Underlying Grid Services (Computing Element Services, Storage Element Services, Replica Catalog, SQL Database Service, Service Index, Authorisation, Authentication and Accounting), and finally the Grid Fabric / Local Computing layer with the Fabric services (Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management). Pink: WP1; yellow: WP2.]

Jul'01: Pseudocode for ATLAS Short-Term UC01

Naming conventions:
    Logical File Name    LFN = "lfn://"hostname"/"any_string
    Physical File Name   PFN = "pfn://"hostname"/"path
    Transfer File Name   TFN = "gridftp://"PFN_hostname"/"path

JDL:
    InputData = {LFN[]}
    OutputSE  = host.domain.name

Worker Node:
    LFN[] = WP1.LFNList()
    for (i = 0; i < LFN.length; i++) {
        PFN[] = ReplicaCatalog.getPhysicalFileNames(LFN[i])
        j = Athena.eventSelectionSrv.determineClosestPFN(PFN[])
        localFile = GDMP.makeLocal(PFN[j], OutputSE)
        Athena.eventSelectionSrv.open(localFile)
    }

Data management API:
    PFN[] = getPhysicalFileNames(LFN)
    PFN = getBestPhysicalFileName(PFN[], String[] protocols)
    TFN = getTransportFileName(PFN, String protocol)
    filename = getPosixFileName(TFN)
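A minimal Python rendering of the worker-node loop above may help; the ReplicaCatalog, GDMP, and Athena objects stand in for the services named in the pseudocode and are assumptions for illustration, not real APIs:

    # Hypothetical Python sketch of the worker-node loop above.
    # replica_catalog, gdmp, and athena stand in for the grid services
    # named in the pseudocode; none of these are real module imports.

    def process_input_files(lfn_list, output_se, replica_catalog, gdmp, athena):
        """Resolve each LFN to its closest physical replica and open it locally."""
        for lfn in lfn_list:
            pfns = replica_catalog.get_physical_file_names(lfn)       # all replicas
            best = athena.event_selection_srv.determine_closest_pfn(pfns)
            local_file = gdmp.make_local(best, output_se)             # stage near the CE
            athena.event_selection_srv.open(local_file)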

Sample Use Case: Simple Grid Job
Submit and run a simple batch job that processes one input file into one output file.
The user specifies the job via a JDL file:
    Executable   = /usr/local/atlas.sh
    Requirements = TS >= 1GB
    Input.LFN    = lfn://atlas.hep/foo.in
    argv1        = TFN(Input.LFN)
    Output.LFN   = lfn://atlas.hep/foo.out
    Output.SE    = datastore.rl.ac.uk
    argv2        = TFN(Output.LFN)
and where the submitted "job" is:
    #!/bin/sh
    gridcp $1 $HOME/tmp1
    grep higgs $HOME/tmp1 > $HOME/tmp2
    gridcp $HOME/tmp2 $2
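The argv1/argv2 lines imply that logical names are rewritten into transport names before the script sees them. A hedged sketch of that substitution, reusing the (hypothetical) data-management calls from the pseudocode slide; catalog.allocate is invented here for the output file, which does not exist yet:

    # Hypothetical sketch of the TFN substitution implied by the JDL above.
    # The catalog methods mirror the data-management API on the previous
    # slide; they are illustrative, not a real library.

    def build_argv(input_lfn, output_lfn, catalog, protocol="gridftp"):
        """Map the job's logical file names to the transport names $1 and $2."""
        pfns = catalog.get_physical_file_names(input_lfn)
        best = catalog.get_best_physical_file_name(pfns, [protocol])
        argv1 = catalog.get_transport_file_name(best, protocol)    # input TFN
        # The output file must first be allocated on the target SE
        # (catalog.allocate is an assumed helper, not part of the slide's API).
        out_pfn = catalog.allocate(output_lfn, se="datastore.rl.ac.uk")
        argv2 = catalog.get_transport_file_name(out_pfn, protocol)
        return [argv1, argv2]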

Steps for Simple Job Example
[Diagram: the User sends the job to the Grid Scheduler, which gets the LFN-to-SFN mapping from the Replica Manager / Replica Catalogue, selects a CE and SE (site A or site B, each hosting a Compute Element and a Storage Element), copies the input file and allocates the output file, and starts the job; on "job done" the output file is copied.]

Steps to Execute this Simple Grid Job
1. User submits the job to the Grid Scheduler.
2. Grid Scheduler asks the Replica Manager for the list of all PFNs for the specified input file.
3. Grid Scheduler determines whether the job can run at a Compute Element that is "local" to one of the PFNs.
   —If not, it locates the best CE for the job and creates a new replica of the input file on an SE local to that CE.
4. Grid Scheduler then allocates space for the output file and "pins" the input file so that it is not deleted or staged to tape until after the job has completed.
5. The job is then submitted to the CE's job queue.
6. When the Grid Scheduler is notified that the job has completed, it tells the Replica Manager to create a copy of the output file at the site specified in the job description file.
7. The Replica Manager then tags this copy of the output file as the "master" and makes the original file a "replica".
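A minimal sketch of the scheduling decision in steps 2-3, assuming toy CE objects with is_local_to, rank, and close_se attributes (all invented for illustration; the real WP1 broker performed matchmaking against resource advertisements, see the JDL slide below):

    # Sketch of "run where the data is, else replicate" from steps 2-3.
    # The CE attributes and replica_manager interface are assumptions.

    def choose_ce(lfn, compute_elements, replica_manager):
        """Prefer a CE local to an existing replica; otherwise replicate."""
        pfns = replica_manager.get_physical_file_names(lfn)
        for ce in compute_elements:
            if any(ce.is_local_to(pfn) for pfn in pfns):
                return ce                                 # data already local
        best = max(compute_elements, key=lambda ce: ce.rank)   # best remaining CE
        replica_manager.replicate(pfns[0], best.close_se)      # new replica near it
        return best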

WP1: Job Status
SUBMITTED -- The user has submitted the job to the User Interface.
WAITING -- The Resource Broker has received the job.
READY -- A Computing Element matching the job requirements has been selected.
SCHEDULED -- The Computing Element has received the job.
RUNNING -- The job is running on a Computing Element.
CHKPT -- The job has been suspended and checkpointed on a Computing Element.
DONE -- The execution of the job has completed.
ABORTED -- The job has been terminated.
CLEARED -- The user has retrieved all output files successfully. Bookkeeping information is purged some time after the job enters this state.
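The states above form a simple lifecycle; one plausible reading of the normal transitions, expressed as a Python table (the exact graph, in particular where ABORTED can occur, is an assumption from the descriptions, not from the slide):

    # Assumed transition table for the WP1 job states listed above.
    JOB_TRANSITIONS = {
        "SUBMITTED": ["WAITING"],              # accepted by the User Interface
        "WAITING":   ["READY", "ABORTED"],     # RB has the job, match pending
        "READY":     ["SCHEDULED", "ABORTED"],
        "SCHEDULED": ["RUNNING", "ABORTED"],
        "RUNNING":   ["CHKPT", "DONE", "ABORTED"],
        "CHKPT":     ["RUNNING"],              # resumed after checkpoint
        "DONE":      ["CLEARED"],              # user retrieves the output sandbox
    }

    def can_transition(src, dst):
        return dst in JOB_TRANSITIONS.get(src, [])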

WP1: Job Submission Service (JSS)
strictly coupled with a Resource Broker
—deployed for each installed RB
single interface (non-blocking), used by the RB:
—job_submit(): submit a job to the specified Computing Element, also managing the input and output sandboxes
—job_cancel(): kill a list of jobs, identified by their dg_jobId
Logging and Bookkeeping (LB) Service: stores & manages the logging and bookkeeping information generated by the Scheduler & JSS components (an Information and Monitoring service)
—Bookkeeping: currently active jobs - job definition expressed in JDL, status, resource consumption, user-defined data(?)
—Logging: status of the Grid Scheduler & related components. These data are kept for a longer term and are used mainly for debugging, auditing and statistical purposes
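The two-call JSS interface can be pictured as an abstract class; a sketch in Python, with argument names guessed from the slide text rather than taken from the actual WP1 code:

    # Sketch of the JSS interface described above; parameter names are
    # assumptions based on the slide, not the real WP1 signatures.
    from abc import ABC, abstractmethod

    class JobSubmissionService(ABC):
        """Non-blocking interface used by the Resource Broker."""

        @abstractmethod
        def job_submit(self, jdl, computing_element, input_sandbox, output_sandbox):
            """Submit a job to the given CE, staging both sandboxes."""

        @abstractmethod
        def job_cancel(self, dg_job_ids):
            """Kill the jobs identified by their dg_jobId values."""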

WP1: Job Description Language (JDL)
Condor classified advertisements (ClassAds) adopted as the Job Description Language (JDL)
—Semi-structured data model: no specific schema is required.
—Symmetry: all entities in the Grid, in particular applications and computing resources, should be expressible in the same language.
—Simplicity: the description language should be simple both syntactically and semantically.

Example:
    Executable = "simula";
    Arguments = "1 2 3";
    StdInput = "simula.config";
    StdOutput = "simula.out";
    StdError = "simula.err";
    InputSandbox = {"/home/joe/simula.config", "/usr/local/bin/simula"};
    OutputSandbox = {"simula.out", "simula.err", "core"};
    InputData = "LF:test367-2";
    ReplicaCatalog = "ldap://pcrc.cern.ch:2010/rc=Replica Catalog, dc=pcrc, dc=cern, dc=ch";
    DataAccessProtocol = {"file", "gridftp"};
    OutputSE = "lxde01.pd.infn.it";
    Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX";
    Rank = other.AverageSI00;
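The point of the Requirements and Rank attributes is matchmaking: the job's expressions are evaluated against each resource's advertised attributes, and matching resources are ordered by Rank. A toy Python illustration (a deliberate simplification of real ClassAds; the resource values are made up):

    # Toy matchmaking: filter resources by the job's Requirements,
    # then order by Rank (AverageSI00). Illustration only.

    def job_requirements(other):
        return other["Architecture"] == "INTEL" and other["OpSys"] == "LINUX"

    def job_rank(other):
        return other["AverageSI00"]   # higher SpecInt rating is better

    resources = [   # invented example advertisements
        {"Name": "lxde01.pd.infn.it", "Architecture": "INTEL",
         "OpSys": "LINUX", "AverageSI00": 380},
        {"Name": "sparc01.example.org", "Architecture": "SPARC",
         "OpSys": "SOLARIS", "AverageSI00": 210},
    ]

    matches = sorted((r for r in resources if job_requirements(r)),
                     key=job_rank, reverse=True)
    print(matches[0]["Name"])   # -> lxde01.pd.infn.it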

WP1: Sandbox
Working area (input & output) replicated on each CE to which a Grid job is submitted.
—Very convenient & natural.
My Concerns:
—Requires network access (with associated privileges) to all CEs on the Grid. Could be a huge security issue with local administrators.
—Not (yet) coordinated with WP2 services.
—Sandbox contents not customizable to the local (CE/SE/PFN) environment.
—Temptation to abuse (not for data files).

EDG JDL
job description language: status, objectives, options
Status:
—Working in the EDG testbed
Objectives:
—Provide the WP1 Scheduler enough information to locate the necessary resources (CE, SE, data, software) to execute the job.
Options: