ICE-CREAM
Luigi Zangrando
On behalf of the JRA1 IT-CZ Padova group

Slide shown last time
Last time we showed these 2 slides:
Test: LSFConnector vs BLAHConnector
- Submitted 100 jobs to a CREAM-based CE, sequentially
- No other load on CREAM (e.g. no other jobs)
- Measured LRMSSubmissionTime - SubmissionTime for all the jobs, in the two scenarios (LSFConnector and BLAHConnector)
  - SubmissionTime: when the job is received by CREAM (i.e. when CREAM inserts the job in its journal manager)
  - LRMSSubmissionTime: when the job is submitted to LSF (as reported by the LSF log)
- For the purpose of this test, jobs in the JournalManager are managed sequentially (i.e. a job is submitted to the LRMS only when the previous job has been submitted)
  - I.e. the sync mode was used, as far as BLAH is concerned
- Both connectors could be made to perform better
EGEE JRA1-ITCZ cluster meeting. Torino, November 2005
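The metric used in this test is a per-job difference of two timestamps. A minimal sketch of how the collected values could be summarised, assuming the timestamps have already been extracted (in seconds) from the CREAM journal and from the LSF log; the `JobTimes`, `DelayStats` and `summariseDelays` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical record: the two timestamps collected for one job, in seconds.
struct JobTimes {
    double submissionTime;      // job inserted in the CREAM journal manager
    double lrmsSubmissionTime;  // job submission reported by the LSF log
};

struct DelayStats {
    double min;
    double max;
    double mean;
};

// Computes LRMSSubmissionTime - SubmissionTime for every job and summarises it.
DelayStats summariseDelays(const std::vector<JobTimes>& jobs) {
    assert(!jobs.empty());
    std::vector<double> delays;
    delays.reserve(jobs.size());
    for (const auto& j : jobs) {
        delays.push_back(j.lrmsSubmissionTime - j.submissionTime);
    }
    const auto [lo, hi] = std::minmax_element(delays.begin(), delays.end());
    const double sum = std::accumulate(delays.begin(), delays.end(), 0.0);
    return {*lo, *hi, sum / static_cast<double>(delays.size())};
}
```

With sequential handling in the journal manager, each job also waits for the previous submission, so the maximum is usually the more telling value.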

Slide shown last time

CREAM - BLAH
- This triggered a discussion with the BLAH developers
- Decided a revision of the CREAM architecture
  - Decided to give up the LRMS-specific connectors and to use BLAH instead for every “interaction” with the underlying resource management system
- CREAM journal manager modified to allow parallel BLAH submissions
  - Since BLAH submission is I/O bound
  - The number of threads is configurable
- Test repeated with 10 threads
  - 9-10 s (constant) as LRMSSubmissionTime - SubmissionTime
  - 4 s measured on another CREAM installation
    - Not investigated further
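Because each BLAH submission spends most of its time waiting on I/O, several submissions can overlap. A minimal sketch of a journal manager handing queued jobs to a configurable number of worker threads; `submitInParallel` is a hypothetical name and `submit` is a placeholder callback standing in for the actual BLAH submission, not the CREAM code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Drains `jobs` with `numThreads` workers; each worker pops one job id at a
// time and calls `submit` (hypothetical stand-in for the BLAH submission).
void submitInParallel(std::queue<std::string> jobs,
                      std::size_t numThreads,
                      const std::function<void(const std::string&)>& submit) {
    assert(numThreads > 0);
    std::mutex queueMutex;
    auto worker = [&] {
        for (;;) {
            std::string jobId;
            {
                std::lock_guard<std::mutex> lock(queueMutex);
                if (jobs.empty()) return;  // nothing left: worker exits
                jobId = jobs.front();
                jobs.pop();
            }
            submit(jobId);  // called outside the lock so submissions overlap
        }
    };
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < numThreads; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

Setting `numThreads` to 1 reproduces the sequential behaviour of the earlier test.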

CREAM - BLAH
- Changes negotiated with the BLAH developers to get notifications about job status changes from the BLAH log parser
  - See: http://savannah.cern.ch/bugs/?func=detailitem&item_id=12225
  - Just provided by the BLAH developers
  - Starting integration with CREAM
- Changes negotiated with the BLAH developers to have BLAH commands working on multiple jobs
  - Waiting to get these modifications
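Once BLAH commands accept multiple jobs, the caller can group job ids and issue one command per group instead of one per job. A minimal sketch of that grouping step, assuming a maximum batch size chosen by the caller; `makeBatches` and `maxPerCommand` are hypothetical, and no actual BLAH command syntax is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits job ids into batches of at most `maxPerCommand`, so that a
// multi-job command would be invoked once per batch.
std::vector<std::vector<std::string>> makeBatches(
        const std::vector<std::string>& jobIds, std::size_t maxPerCommand) {
    assert(maxPerCommand > 0);
    std::vector<std::vector<std::string>> batches;
    for (std::size_t i = 0; i < jobIds.size(); i += maxPerCommand) {
        const std::size_t end = std::min(jobIds.size(), i + maxPerCommand);
        batches.emplace_back(jobIds.begin() + static_cast<std::ptrdiff_t>(i),
                             jobIds.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```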

Credential mapping
- Glexec (formerly known as su-exec) is not in gLite 1.5 and has not been released yet
  - Needed for credential mapping
- Talked in Pisa with the JRA3 developers
  - Discussed the dirty details
  - Agreed on some needed modifications
  - They reported that they should be able to release something working about 10 days after Pisa
    - That should be about now
- Started discussing with the BLAH developers where to apply this integration
  - In CREAM calling BLAH, or in BLAH calling the LRMS commands?
  - The decision also depends on the overhead introduced by glexec
    - To be measured when glexec is usable
- In the meantime, started applying some other needed changes
  - Deployment and integration of an LCMAPS-enabled gridftp server
  - Proper ownership and protection of directories
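The directory-protection item can be illustrated with the standard filesystem library. A minimal sketch, assuming one sandbox directory per job that only its owner may access (mode 0700); `createPrivateSandbox` and the directory layout are hypothetical. Changing the owner to the mapped local account would require a privileged chown(2), for instance through glexec, and is deliberately left out.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Creates `dir` (and missing parents) and restricts it to owner-only access.
// Ownership assignment to the mapped local user is NOT done here.
bool createPrivateSandbox(const fs::path& dir) {
    std::error_code ec;
    fs::create_directories(dir, ec);
    if (ec) return false;
    fs::permissions(dir, fs::perms::owner_all, fs::perm_options::replace, ec);
    return !ec;
}
```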

CREAM: other accomplishments
- Porting to Axis 1.2.1
- Porting to gSOAP 2.7.6b
  - Several problems managing faults
- Applied the modifications needed because of changes in the delegation code
- Support for a configuration file in the CREAM CLI
- Several bug fixes (in both client and server)
- User documentation updated
- First draft of a “high level” document describing the CREAM architecture and functionality available
- First unit tests committed
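The slide does not describe the format of the CREAM CLI configuration file. Purely for illustration, a minimal sketch of loading a configuration file, assuming a simple `key = value` syntax with `#` comments; both the syntax and the key names in the usage test are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Removes leading and trailing blanks and tabs.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parses "key = value" lines; blank lines and '#' comments are ignored,
// and lines without '=' are skipped as malformed.
std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> conf;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        conf[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return conf;
}
```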

CREAM: other issues and next steps
- Integration of VOMS-based authorization
  - VomsPDP just released
- Integration with CEMon
  - To provide asynchronous notifications about CREAM jobs
- Support of DAGs and bulk jobs
  - We plan to implement parametric and collection jobs as DAG jobs, as done in the WMS
- CREAM CLI in the build system
  - The circular dependency problem still has to be addressed
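Treating parametric and collection jobs as DAG jobs means both can be expressed as a DAG whose nodes have no dependencies. A minimal sketch under that assumption: a collection maps directly to nodes, and a parametric job is first expanded by substituting each parameter value into a template. The `Dag` type, the function names and the `_PARAM_`-style placeholder are illustrative assumptions, not the WMS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal DAG: one description per node plus (parent, child) edges by index.
struct Dag {
    std::vector<std::string> nodes;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
};

// A collection becomes a DAG whose nodes have no dependencies.
Dag collectionAsDag(const std::vector<std::string>& jobs) {
    return Dag{jobs, {}};
}

// Replaces every occurrence of `placeholder` in `text` with `value`.
std::string replaceAll(std::string text, const std::string& placeholder,
                       const std::string& value) {
    assert(!placeholder.empty());
    for (std::size_t pos = text.find(placeholder); pos != std::string::npos;
         pos = text.find(placeholder, pos + value.size())) {
        text.replace(pos, placeholder.size(), value);
    }
    return text;
}

// A parametric job expands into `count` independent nodes; node i gets
// start + i * step substituted for the placeholder.
Dag parametricAsDag(const std::string& templ, const std::string& placeholder,
                    int start, int step, int count) {
    std::vector<std::string> jobs;
    for (int i = 0; i < count; ++i) {
        jobs.push_back(replaceAll(templ, placeholder, std::to_string(start + i * step)));
    }
    return collectionAsDag(jobs);
}
```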

Safe interactive access to jobs
- Tobia Conforto joined us for his BSc university internship (“stage”)
  - Resumed the work on interactive access to jobs
- General idea
  - One-way interactivity: job → user
  - Let a user monitor her job’s stdout, stderr and output files in real time
- In detail
  - Interactive read-only access to a running job’s environment
  - The CREAM JobId is the only parameter needed
  - Remote ps, top, ls, cat and tail-like functionality on the Worker Node
  - Intelligent browsing of remote files: the client-side hex viewer and view-like functionality only transfer the chunks of the remote file that are actually needed
  - GUI clients are possible, although not currently scheduled
- Why
  - Inspection of long-running jobs: the user is not blind to the job’s progress and can make an informed decision on whether to stop it or let it run
  - Early sampling of the correct operation of a batch of jobs can save considerable amounts of resources that would otherwise be wasted
  - Faster turnaround of debug sessions, trial runs and other kinds of tests
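Transferring only the needed chunks amounts to mapping the byte range currently displayed onto fixed-size chunks and fetching those not already cached on the client. A minimal sketch, assuming fixed-size chunks and a client-side set of cached chunk indices; the chunk size and `chunksToFetch` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Returns the indices of the chunks covering [offset, offset + length)
// that are not yet in `cache`; only these would be fetched remotely.
std::vector<std::uint64_t> chunksToFetch(std::uint64_t offset, std::uint64_t length,
                                         std::uint64_t chunkSize,
                                         const std::set<std::uint64_t>& cache) {
    assert(chunkSize > 0);
    std::vector<std::uint64_t> needed;
    if (length == 0) return needed;
    const std::uint64_t first = offset / chunkSize;
    const std::uint64_t last = (offset + length - 1) / chunkSize;
    for (std::uint64_t c = first; c <= last; ++c) {
        if (cache.count(c) == 0) needed.push_back(c);
    }
    return needed;
}
```

Scrolling a hex view over a large output file then costs one small transfer per newly visible chunk, rather than a copy of the whole file.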

Safe interactive access to jobs
- How
  - C++ client → (gLite-authenticated SOAP messages, over the Internet) → specific CE Web service → (ssh as the local user, within the CE LAN) → Worker Node
- Security considerations
  - Access to the service is subject to the same authentication as CREAM
  - The user only has access to worker nodes where one of her jobs is running
  - She may only issue a fixed set of commands, none of which can alter files
  - User-supplied arguments are strictly parsed against shell escaping
- Privacy considerations
  - SOAP messages, including all traffic payload, are encrypted with SSL
  - The set of files / directories / devices the user has read access to on the worker node is restricted by the same OS file permissions as her job’s
  - Additional filters can restrict the commands to the job’s working directories
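The two command restrictions, a fixed command set and strict argument parsing, can be combined into one check performed before anything is forwarded over ssh. A minimal sketch, assuming a whitelist built from the commands named earlier and an argument alphabet with no shell metacharacters; the exact whitelist, regular expression and `isSafeRequest` name are assumptions.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Fixed set of read-only commands (assumed list, based on the slides).
static const std::set<std::string> kAllowedCommands = {"ps", "top", "ls", "cat", "tail"};

// Accepts a request only if the command is whitelisted and every argument
// uses characters with no special meaning to the shell.
bool isSafeRequest(const std::string& command, const std::vector<std::string>& args) {
    if (kAllowedCommands.count(command) == 0) return false;
    static const std::regex safeArg("[A-Za-z0-9._/+-]+");
    for (const auto& a : args) {
        if (!std::regex_match(a, safeArg)) return false;  // rejects ; | $ ` spaces etc.
    }
    return true;
}
```

A whitelist of characters is preferred here over trying to escape dangerous ones, since anything not explicitly allowed is rejected.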

CREAM: “external links”
- GRIDCC
  - GRIDCC is integrating CREAM submission in their portal
    - Based on Java clients that we provided, as they requested
    - This was shown at the recent GridCC EU review
  - We maintain a CREAM installation, deployed on a small LSF farm in Legnaro
  - Support to Laura Del Cano (Elettra, Trieste), who is doing the work
- AVANADE
  - Software company with whom we had a meeting some time ago
  - Interested in evaluating our software and possibly collaborating with us
  - Trying to deploy CREAM in their .NET environment
    - They need a document/literal version of the services
      - Provided for CREAM
      - The problem is with the delegation part
        - Pinged the security group, but it looks like they are not going to do it in the short term

XYZ → ICE
- Found a better name (than XYZ) for the WMS component dealing with submissions to CREAM CEs: ICE
  - ICE: Interface to Cream Environment
  - Isn’t “ICE-CREAM integration” nice?
- Contacted the GridICE team first
  - They didn’t see problems, even if the ICE name can make people think of GridICE (and vice versa)
  - But this couldn’t be bad
- ICE is in CVS (org.glite.wms.ice) but not yet linked to the build system

ICE (Interface to Cream Environment)
- ICE is the software component acting as an interface between the WMS and CREAM CEs
- Operations initially handled by ICE
  - Job submissions
  - Job removals
- ICE is being developed as a stand-alone process
  - Written in C++
  - Whether it can become a WM thread will be investigated in the future
- At the moment it is under heavy development; many features are missing
- Right now jobs are polled to get status changes
  - In the future, an additional ICE thread will receive notifications from the CEMon instances coupled with the CREAM CEs
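Polling for status changes in a dedicated thread needs a loop that wakes up periodically and can be stopped promptly on shutdown. A minimal sketch using a condition variable, assuming the per-cycle work (querying CREAM for the managed jobs) is supplied as a callback; `StatusPoller` and `pollOnce` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Runs `pollOnce` every `period` in its own thread until destroyed.
class StatusPoller {
public:
    StatusPoller(std::function<void()> pollOnce, std::chrono::milliseconds period)
        : pollOnce_(std::move(pollOnce)), period_(period), thread_([this] { run(); }) {}

    StatusPoller(const StatusPoller&) = delete;
    StatusPoller& operator=(const StatusPoller&) = delete;

    // Asks the thread to stop and waits for it to finish.
    ~StatusPoller() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            lock.unlock();
            pollOnce_();  // query job states without holding the lock
            lock.lock();
            // Wait one period, waking up early if a stop is requested.
            cv_.wait_for(lock, period_, [this] { return stop_; });
        }
    }

    std::function<void()> pollOnce_;
    std::chrono::milliseconds period_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread thread_;  // declared last: starts after the other members exist
};
```

A CEMon-based listener could later run alongside such a poller, leaving polling as a fallback for missed notifications.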

WMS-CREAM integration / 1
- ICE takes the job management requests from its filelist
- ICE manages the submission to CREAM (see next slides)
- ICE keeps the mapping between the GridJobId and the CREAMJobId
  - This mapping is critical: it is essential that ICE remembers which jobs it controls
  - For the moment the mapping is kept on disk, using a journal to record updates
  - To be investigated if LBproxy can be used
- Failed submissions are reinserted into the WM’s filelist, as in the current implementation (JC+LM)
- ICE features:
  - Multithreaded (the submitter and the status poller are two separate threads)
  - Uses log4cpp for logging debug messages
  - Tries to be fault tolerant
[Figure: WMS architecture with the components NS, WMProxy, FileList, WM (Helpers, MM, JA), JC+LM, Condor, ICE (Submitter, Poller) and CREAM]
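Keeping the GridJobId to CREAMJobId mapping on disk with a journal means every update is appended to a log before the in-memory map changes, and the map is rebuilt by replaying the log after a restart. A minimal sketch of that idea; the `JobMap` class and the record format (`PUT <gridId> <creamId>` / `DEL <gridId>`, ids without whitespace) are hypothetical, not the ICE implementation.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

class JobMap {
public:
    explicit JobMap(std::string journalPath) : path_(std::move(journalPath)) { replay(); }

    void put(const std::string& gridId, const std::string& creamId) {
        append("PUT " + gridId + " " + creamId);
        map_[gridId] = creamId;
    }

    void erase(const std::string& gridId) {
        append("DEL " + gridId);
        map_.erase(gridId);
    }

    const std::map<std::string, std::string>& entries() const { return map_; }

private:
    // Appends one record and flushes it before the in-memory map changes.
    // A production journal would also fsync(2); that is omitted here.
    void append(const std::string& record) {
        std::ofstream out(path_, std::ios::app);
        out << record << '\n';
        out.flush();
        if (!out) throw std::runtime_error("cannot write journal " + path_);
    }

    // Rebuilds the map by replaying every record in order.
    void replay() {
        std::ifstream in(path_);  // a missing journal simply means an empty map
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream rec(line);
            std::string op, gridId, creamId;
            rec >> op >> gridId;
            if (op == "PUT" && (rec >> creamId)) map_[gridId] = creamId;
            else if (op == "DEL") map_.erase(gridId);
        }
    }

    std::string path_;
    std::map<std::string, std::string> map_;
};
```

An append-only journal keeps each update cheap; periodic compaction (rewriting only the live entries) would be needed to stop it growing indefinitely.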

ICE: what we have so far
- Implemented
  - Submission to a CREAM CE is working
    - Support for multiple CREAM CEs, done sequentially right now
  - The job status poller is working fine
  - Removal (cancel) of a job is coming soon
- To do
  - Job status change listener via CEMon
  - Extending ICE to handle submission to multiple CREAM CEs in parallel (being implemented)
  - LB logging
  - “Lease” submission protocol
  - Proxy renewal
- All tests are being done with a stand-alone ICE, to easily identify where problems are located
  - Requests are inserted into the “WM” filelist via a testing tool which simulates the WM inserting requests in the filelist
  - “True” WM integration will be done when everything has been tested enough
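Moving from sequential to parallel submission across several CREAM CEs can be done by grouping jobs by target CE and giving each group its own asynchronous task, so that a slow CE does not hold up the others. A minimal sketch with `std::async`; `submitToAllCes` is hypothetical, and `submitToCe` is a placeholder callback (returning how many jobs the CE accepted) rather than the real CREAM client call.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <string>
#include <vector>

// Submits each CE's job group in its own task; jobs for the same CE are
// still handled together by one call. Returns accepted counts per CE.
std::map<std::string, int> submitToAllCes(
        const std::map<std::string, std::vector<std::string>>& jobsByCe,
        const std::function<int(const std::string&, const std::vector<std::string>&)>& submitToCe) {
    std::map<std::string, std::future<int>> pending;
    for (const auto& [ce, jobs] : jobsByCe) {
        pending.emplace(ce, std::async(std::launch::async, submitToCe, ce, jobs));
    }
    std::map<std::string, int> accepted;
    for (auto& [ce, fut] : pending) accepted[ce] = fut.get();  // waits for each task
    return accepted;
}
```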