Preview Testbed Massimo Sgaravatto – INFN Padova

Slides:



Advertisements
Similar presentations
WP1 Grid Workload Management Massimo Sgaravatto INFN Padova
Advertisements

INFSO-RI Enabling Grids for E-sciencE XACML and G-PBox update MWSG 14-15/09/2005 Presenter: Vincenzo Ciaschini.
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
Grid Workload Management & Condor Massimo Sgaravatto INFN Padova.
DataGrid WP1 Massimo Sgaravatto INFN Padova. WP1 (Grid Workload Management) Objective of the first DataGrid workpackage is (according to the project "Technical.
Grid Workload Management Massimo Sgaravatto INFN Padova.
Getting started DIRAC Project. Outline  DIRAC information system  Documentation sources  DIRAC users and groups  Registration with DIRAC  Getting.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks CREAM and ICE Massimo Sgaravatto – INFN Padova.
Conference name Company name INFSOM-RI Speaker name The ETICS Job management architecture EGEE ‘08 Istanbul, September 25 th 2008 Valerio Venturi.
WP1 WMS rel. 2.0 Some issues Massimo Sgaravatto INFN Padova.
EGEE is a project funded by the European Union under contract INFSO-RI Practical approaches to Grid workload management in the EGEE project Massimo.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Update Authorization Service Christoph Witzig,
Summary from WP 1 Parallel Section Massimo Sgaravatto INFN Padova.
INFSO-RI Enabling Grids for E-sciencE Policy management and fair share in gLite Andrea Guarise HPDC 2006 Paris June 19th, 2006.
WP1 WMS release 2: status and open issues Massimo Sgaravatto INFN Padova.
EGEE 3 rd conference - Athens – 20/04/2005 CREAM JDL vs JSDL Massimo Sgaravatto INFN - Padova.
WP1 Status and plans Francesco Prelz, Massimo Sgaravatto 4 th EDG Project Conference Paris, March 6 th, 2002.
INFN GRID Production Infrastructure Status and operation organization Cristina Vistoli Cnaf GDB Bologna, 11/10/2005.
Stephen Burke – Sysman meeting - 22/4/2002 Partner Logo The Testbed – A User View Stephen Burke, PPARC/RAL.
EGEE is a project funded by the European Union under contract IST LCG open issues Massimo Sgaravatto INFN Padova JRA1 IT-CZ cluster meeting,
INFSO-RI Enabling Grids for E-sciencE File Transfer Software and Service SC3 Gavin McCance – JRA1 Data Management Cluster Service.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Management Claudio Grandi.
INFN/IGI contributions Federated Clouds Task Force F2F meeting November 24, 2011, Amsterdam.
INFSO-RI Enabling Grids for E-sciencE Padova site report Massimo Sgaravatto On behalf of the JRA1 IT-CZ Padova group.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Architectural Framework Presentation Vincenzo Ciaschini CNAF 15/5/06.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks CREAM: current status and next steps EGEE-JRA1.
Enabling Grids for E-sciencE Work Load Management & Simple Job Submission Practical Shu-Ting Liao APROC, ASGC EGEE Tutorial.
Job Priorities and Resource sharing in CMS A. Sciabà ECGI meeting on job priorities 15 May 2006.
CE design report Luigi Zangrando
EGEE-II INFSO-RI Enabling Grids for E-sciencE Simone Campana (CERN) Job Priorities: status.
INFSO-RI Enabling Grids for E-sciencE Padova site report Massimo Sgaravatto On behalf of the JRA1 IT-CZ Padova group.
EGEE is a project funded by the European Union under contract IST Padova report Massimo Sgaravatto On behalf of the INFN Padova JRA1 Group.
CREAM Status and plans Massimo Sgaravatto – INFN Padova
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI MPI VT report OMB Meeting 28 th February 2012.
INFSO-RI Enabling Grids for E-sciencE CREAM, WMS integration and possible deployment scenarios Massimo Sgaravatto – INFN Padova.
G-PBox Facts and status JRA1 Authz Coord Meeting January CNAF/INFN Bologna Andrea Ferraro.
JRA1/Job Submission and Monitoring Moreno Marzolla INFN Padova.
Resource access in the EGEE project Massimo Sgaravatto INFN Padova
LCG and Glite open issues Massimo Sgaravatto INFN Padova
Practical using C++ WMProxy API advanced job submission
CEMon
Grid Computing: Running your Jobs around the World
Job monitoring and accounting data visualization
JRA1 IT-CZ cluster meeting Milano, May 3-4, 2004
The EDG Testbed Deployment Details
CREAM and ICE Test Results
Security aspects of the CREAM-CE
Design rationale and status of the org.glite.overlay component
WP1 WMS release 2: status and open issues
Andreas Unterkircher CERN Grid Deployment
Summary on PPS-pilot activity on CREAM CE
Claudio Grandi (INFN and CERN)
BLAHPd developments (David, Massimo, Giuseppe)
Comparison of LCG-2 and gLite v1.0
Global Banning List and Authorization Service
gLite Middleware Status
Accounting at the T1/T2 Sites of the Italian Grid
Massimo Sgaravatto INFN Padova On behalf of the CREAM product team
The CREAM CE: When can the LCG-CE be replaced?
Grid2Win: Porting of gLite middleware to Windows XP platform
Introduction to Grid Technology
CRC exercises Not happy with the way the document for testbed architecture is progressing More a collection of contributions from the mware groups rather.
Short update on the latest gLite status
Outline Introduction Objectives Motivation Expected Output
TCG Discussion on CE Strategy & SL4 Move
CMS report from FNAL demo week Marco Verlato (INFN-Padova)
Francesco Giacomini – INFN JRA1 All-Hands Nikhef, February 2008
Report on GLUE activities 5th EU-DataGRID Conference
QoS and SLA in INFN Grid INFN team: Andrea Ceccanti, Vincenzo Ciaschini, Alberto Forti, Andrea Ferraro, Valerio Venturi Location Catania (Italy) Date 4/3/2008.
Presentation transcript:

Preview Testbed Massimo Sgaravatto – INFN Padova Luigi Zangrando – INFN Padova EGEE06 conference

Preview testbed Testbed where JRA1 developers can expose early versions of middleware to users to get early feedback on the matching with the requirements Not a system intended to bypass the integration and certification process Activity not foreseen in the EGEE II proposal, so no official resources have been devoted to this Partners involved in JRA1 asked to provide support (hardware and manpower) for this activity without compromising their commitments Now CESNET (Prague), INFN (CNAF & Padova), RAL are participating in this testbed Good News: We have had confirmation that NIKHEF (FOM) and HIP (Helsinki Institute of Physics) will participate to PTB. Massimo Sgaravatto INFN Padova - EGEE06 conference

Middleware in the preview testbed Software components planned to be exposed in the preview testbed CREAM New (3.1) WMS (including ICE, supporting job submissions to CREAM based CE) Job Provenance G-PBOX R-GMA Glexec on WN Massimo Sgaravatto INFN Padova - EGEE06 conference

CREAM Sites Documentation INFN Padova Prague INFN CNAF NIKHEF (FOM) 1 CREAM CE machine + 6 WNs Both LSF and Torque installed Software installed Even if installation delayed because of some perturbations due to attempted adoption of new VDT Prague Installation of CREAM is in progress INFN CNAF CREAM UI Not installed yet, because of the 3.1 build issues NIKHEF (FOM) Four new machines ordered and Installed on private firewalled network segment. Outside access provided for CREAM installation. Local experts will setup/support glexec on CE and WN. HIP (Helsinki Institute of Physics) Four machines will be made available (approx 1.4 Ghz AMD) Outside access provided. Can be gLite CE or CREAM. Documentation CREAM User’s guide and CREAM JDL specification available in the CREAM web site (http://www.pd.infn.it/cream/field.php) CREAM Preview testbed user’s guide available as well Massimo Sgaravatto INFN Padova - EGEE06 conference

CREAM Performed stress tests (by developers) Submissions of an increasing number of jobs Submissions of 100 jobs from 1, 2, 5, 10 threads Submissions of 500 jobs from 1, 2, 5, 10 threads Submissions of 1000 jobs from 1, 2, 5, 10 threads Submissions of 2000 jobs from 1, 2, 5, 10 threads Measured values The number of failed jobs (taking into account the reported failure reasons) For each job, the time took to submit the job to the CREAM based CE (i.e. the time needed to get back the CREAM-jobid) For each job, the “schedule time”, i.e. the time needed to actual submit the job to the LRMS, since when it has been received by CREAM Tests done for both LSF and Torque Tests done using a UI in Padova, since the official one at CNAF is not ready yet Massimo Sgaravatto INFN Padova - EGEE06 conference

Submission times in 2 tests Some test results: submission time in 2 tests … more details in the CREAM web site (http://grid.pd.infn.it/cream/field.php) under test results Massimo Sgaravatto INFN Padova - EGEE06 conference

Failures in 2 tests 100 jobs 500 jobs 1000 jobs 2000 jobs 1 thread 0/100 (LSF) 0/100 (Torque) 0/500 (LSF) 0/500 (Torque) 6/1000 (LSF) 11/1000 (Torque) 23/2000 (LSF) 32/2000 (Torque) 2 thread 7/1000 (LSF) 4/1000 (Torque) 26/2000 (LSF) 38/2000 (Torque) 5 threads 3/100 (Torque) 1/500 (Torque) 5/1000 (Torque) 22/2000 (LSF) 29/2000 (Torque) 10 threads 1/100 (Torque) 2/1000 (LSF) 8/1000 (Torque) 24/2000 (LSF) 22/2000 (Torque) Basically all failures because of AuthN problems in gridftp while transferring sandbox Not fully understood yet the reasons Massimo Sgaravatto INFN Padova - EGEE06 conference

CREAM When the UI is ready at CNAF, we are basically open to expose CREAM “direct” job submissions to interested users Users can test first of all the functionality provided by CREAM Job submissions, Proxy delegation, Job cancellation, Job status monitoring, Job listing, Job suspension and resume, Job proxy renewal. Job purging for completed jobs, Disabling/enabling of job submissions (when a certain condition is meet and when these operations are forced by the administrator) Users are also invited to do performance measurements Relevant values The time took to submit the job to the CREAM based CE Represented by the time needed to get back the CREAM-jobid. The time needed to actual submit the job to the LRMS, since when it has been received by CREAM Since for the Glite (Condor based) CE a direct submission is not straightforward, comparisons between the CREAM based CE and the Glite CE is not easy for this direct submission scenario Massimo Sgaravatto INFN Padova - EGEE06 conference

New WMS (including ICE) Testbed configuration WMS, BDII, UI @ INFN-CNAF CREAM CE @ INFN-PADOVA, Prague and HIP, FOM (?) Glite CE @ INFN-CNAF, Prague and HIP, FOM (?) Installation of the WMS services perturbed and delayed Because of the attempted adoption of new VDT Because Glite 3.1 build system “definition” far from being ideal wrt testing purposes Mix of branch and release tags We are now converging Recent decisions of having a branch for GLite 3.1 with old VDT and latest (i.e. blessed by users) release tags for the various subsystems New tags have been produced Now we should be ready to proceed with installation in the Preview testbed We (developers) will perform some initial tests and then the testbed will be opened to interested users Massimo Sgaravatto INFN Padova - EGEE06 conference

ICE Test first if everything work as expected Job submissions tests (simple sequential jobs, MPICH jobs and interactive jobs), job cancellation, job status monitoring, job output retrieval, proxy renewal, job perusal Tests already done on a more controlled (and smaller) environment Performance and stress tests focusing on the functionality involving ICE and CREAM, so comparing submissions to CREAM based CEs (via the ICE component) with submissions to Glite (Condor based) CEs (via the JC, LM, Condor components) Submissions of an increasing number of jobs to a Glite WMS, making them match only CREAM based CEs (Scenario 1) Then same test, but making the jobs match only Glite (Condor based) CEs (Scenario 2). Values to be measured The number of failed jobs (taking into account the reported failure reasons) For each job the time took by ICE (in Scenario 1) / JC + LM + Condor (in Scenario 2) to actual submit the jobs to the CREAM (in Scenario 1) / Condor based (in Scenario 2) CEs. For each job, the time took by the system to actual detect the job status changes More details in the ICE&CREAM Preview Testbed User’s guide Massimo Sgaravatto INFN Padova - EGEE06 conference

Job Provenance JP installed at skurut1.cesnet.cz, together with the 3.1RC LB. 3.1 LB is required due the support of data export to JP However, it is 100% compatible with 3.0 clients at the wire level If necessary due to performance reasons, more resources to setup multiple JP index servers (the primary endpoints for user queries) can be provided Tests already performed by developers Basic functionality verified according to the JP Test Plan. Participation in the First Provenance Challenge, (http://twiki.ipaw.info/bin/view/Challenge/CESNET) Good smoke test opportunity Ready to be used by internal users Needed to setup the WMS first In order to use the service, one has redirect his jobs to LB at skurut1, i.e. specifying LBAddress = "skurut1.cesnet.cz:9000" in JDL. JP User Guide and JP Test Plan available at http://egee.cesnet.cz/en/JRA1/index.html The service is new in gLite, providing rather low-level interface only Developers expect feedback on its design and suggestion on extensions that will be evaluated and considered in design of more user-friendly interfaces. Massimo Sgaravatto INFN Padova - EGEE06 conference

R-GMA Registry, MON and Glue archiver installed at rgma16.pp.rl.ac.uk Developers will deploy new code on the preview system as and when they have new functionality to offer The idea is also to test RGMA as information system (instead of BDII) Massimo Sgaravatto INFN Padova - EGEE06 conference

G-PBox deployment on Preview TB G-PBox tests will require the deployment of the following components to Preview TB: 1 VOMS server (needed to have under control the creation of groups and role inside the VOs) (CNAF) 1 VO G-PBox server (CNAF) 1 SITE G-PBox server (for 4 gLite CEs located at least in two different sites) Additional gLite CEs as needed G-PBox installations: will start in October 2006 will be opened to VOs after a period of internal testing UI UI VOMS VOMS VO G-PBox VO G-PBox gLite-3.1 WMS gLite-3.1 WMS BDII BDII gLite-3.0 CE + WNs gLite-3.0 CE + WNs SITE G-Pbox SITE G-Pbox gLite-3.0 CE + WNs gLite-3.0 CE + WNs gLite-3.0 CE + WNs gLite-3.0 CE + WNs gLite-3.0 CE + WNs gLite-3.0 CE + WNs Massimo Sgaravatto INFN Padova - EGEE06 conference Massimo Sgaravatto INFN Padova - EGEE06 conference 13

G-PBox/Service Class: What It Is All About PROBLEM STATEMENT Intra-VO fair sharing: enabling VO’s to differentiate the access and usage of computing resources within its users. OUR PROPOSAL In each site, a VO has a certain share (i.e., utilization target) agreeded with the site for computing resources We divide the utilization target in three abstracted sub-shares: GOLD >> SILVER >> BRONZE We enable the VO to assign VOMS groups/roles to the different sub-shares We call GOLD, SILVER, BRONZE as Service Classes published by the resources providers (so each Service Class characterizes a specific level of quality of service of the resource ) Massimo Sgaravatto INFN Padova - EGEE06 conference

Use case: VO realm VO G-PBox helps the VO Resource Broker (RB) to choose right CEs based on Service Classes (SC) policies. The RB asks to the G-PBox what is the SC for the group which user belongs to. Then the RB can choose the CEs associated with the SC provided by the G-PBox. VO G - PBox VO Mapping Policies VO Mapping Policies VO GROUP: /atlas/analysis Atlas group Atlas group Service Class (SC) Service Class (SC) VO G - PBox USER: smith VO G - PBox /atlas/production /atlas/production (SC:GOLD) (SC:GOLD) VO: atlas analysis /atlas/analysis /atlas/analysis GROUP: (SC:SILVER) (SC:SILVER) ATLAS RB SC:SILVER /atlas/students /atlas/students (SC:BRONZE) (SC:BRONZE) ATLAS ATLAS ATLAS ATLAS ATLAS ATLAS CE CE CE CE CE CE Massimo Sgaravatto INFN Padova - EGEE06 conference

Use case: Site realm G-PBox helps CE to choose the right unix group for the submission to the LRMS. SITE G-PBox helps CE to choose the right unix group for the submission to the LRMS. CE asks to G-PBox what is the unix group for the VO group which user belongs to. G-PBox scans its policies: “VO group <-> SC“ mapping policies “SC <-> Unix Group” mapping policies and returns a matching Unix group. Then CE can submit to the LRMS with the Unix group provided by the G-PBox. G - PBox VO Mapping Policies VO Mapping Policies USER: smith Atlas group Atlas group Service Class (SC) Service Class (SC) VO: atlas /atlas/production /atlas/production (SC:GOLD) (SC:GOLD) GROUP: analysis /atlas/analysis /atlas/analysis (SC:SILVER) (SC:SILVER) VO GROUP: /atlas/analysis /atlas/students /atlas/students (SC:BRONZE) (SC:BRONZE) ATLAS CE Site Mapping Policies Site Mapping Policies UNIX GROUP: atlas_mid_priority Service Class (SC) Service Class (SC) Local UNIX group Local UNIX group (SC:GOLD) (SC:GOLD) atlas_hi_priority atlas_hi_priority LRMS (SC:SILVER) (SC:SILVER) atlas_mid_priority atlas_mid_priority (LSF) (SC:BRONZE) (SC:BRONZE) atlas_low_priority atlas_low_priority Massimo Sgaravatto INFN Padova - EGEE06 conference