INFN-GRID: Stato ed Organizzazione

INFN-GRID: Status and Organization (Stato ed Organizzazione)
Alessandro Paolini, INFN-CNAF
Meeting of the PON Avviso 1575 projects with the INFN Grid ROC
Catania, 4 July 2008

Primary components of the production grid
The primary components of the Italian Production Grid are:
- Computing and storage resources
- Access points to the grid
- Services
Other elements are just as essential for operating, managing and monitoring the grid:
- Middleware
- Monitoring tools
- Accounting tools
- Management and control infrastructure
- Users

GRID Management
Grid management is performed by the Italian Regional Operation Center (ROC). Its main activities are:
- Production and testing of the INFNGRID release
- Deployment of the release to the sites, support to local administrators and site certification
- Periodic checks of the status of resources and services
- Support at the Italian level
- Support at the European level
- Introduction of new Italian sites into the grid
- Introduction of new regional VOs into the grid

The Italian Regional Operation Center (ROC)
- Operations Coordination Centre (OCC): management and oversight of all operational and support activities
- Regional Operations Centres (ROC): provide the core of the support infrastructure, each supporting a number of resource centres within its region
- Grid Operator on Duty
- Grid User Support (GGUS): at FZK, coordination and management of user support, single point of contact for users
The Italian ROC is one of the 10 existing ROCs in EGEE.

Central Management Team (CMT) Shifts
About 20 supporters carry out a checking activity organized as 1 shift per day, from Monday to Friday, with 2 people per shift. During each shift a report is compiled, and the supporters:
- Check the grid status and raise warnings about problems, following them up until they are solved where possible
- Certify sites during the deployment phases
- Check the tickets that are still open and press the experts or the site managers to answer and solve them

Service Availability Monitoring (SAM)
- SAM jobs are launched every hour and reveal submission problems, including batch system errors, out-of-date CAs and replica errors
- There are also more specific tests for SRM, SE and LFC

Service Availability Monitoring (SAM) CE Tests

SAM Admin
- SAM jobs are available for both EGEE production and preproduction sites
- Each site manager can submit new SAM tests on their own site
- Each ROC can submit new test jobs on the sites of its own region

GSTAT
- GSTAT queries the Information System every 5 minutes
- The sites and nodes checked are those registered in the GOC DB
- Inconsistencies in the published information, and any service that a site should publish but does not, are reported as errors
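
The "missing service" check described above can be pictured as a set comparison between what a site is registered to provide and what it actually publishes. The following is only an illustrative sketch, not GSTAT's actual code: the function name, the dictionary layout, the site names and the service labels are all assumptions made for the example.

```python
def missing_services(registered, published):
    """Return, per site, the registered services that are not published.

    registered: dict mapping site name -> set of service types the site is
                expected to publish (hypothetical stand-in for GOC DB data)
    published:  dict mapping site name -> set of service types found in the
                Information System (hypothetical stand-in for a query result)
    """
    errors = {}
    for site, expected in registered.items():
        # A site absent from the published data is treated as publishing nothing.
        missing = expected - published.get(site, set())
        if missing:
            errors[site] = sorted(missing)
    return errors


# Hypothetical example data: SITE-B should publish an SE but does not.
registered = {"SITE-A": {"CE", "SE", "sBDII"}, "SITE-B": {"CE", "SE"}}
published = {"SITE-A": {"CE", "SE", "sBDII"}, "SITE-B": {"CE"}}
print(missing_services(registered, published))
```

Sites that publish everything they are registered for do not appear in the result, so an empty dictionary means no error of this kind.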

Introducing a new site
- Before entering the grid, each site has to accept several rules of conduct, described in a Memorandum of Understanding (MoU). The COLG (Grid Local Coordinator) reads and signs it, and faxes the document to INFN-CNAF.
- All sites must also provide this email alias: grid-prod@. The alias is used to report problems and is added to the site managers' mailing list, so it should include all the site managers of the grid site.
- At this point, IT-ROC registers the site and its site managers in the GOC-DB, and creates a supporter-operative group in the XOOPS ticketing system.
- Site managers have to register in XOOPS so that they can be assigned to their supporter-operative groups. Each site manager also has to register in the test VOs infngrid and dteam.

Introducing a new site
- Site managers install the middleware, following the instructions distributed by the Release Team (http://grid-it.cnaf.infn.it/, Installation section).
- When finished, they run some preliminary tests (http://grid-it.cnaf.infn.it/ --> Test&Cert --> Fry) and then ask their own ROC to certify the site.
- IT-ROC opens a ticket to communicate with the site managers during the certification.

Memorandum of Understanding
Every site has to:
- Provide computing and storage resources. Farm size (at least 10 CPUs) and storage capacity will be agreed with each site
- Guarantee sufficient manpower to manage the site: at least 2 people
- Manage the site resources efficiently: middleware installation and upgrades, patch application, and configuration changes as requested by the CMT, all within the maximum time stated for each operation
- Answer tickets within 24 hours (T2) or 48 hours (other sites), from Monday to Friday
- Check its own status from time to time
- Guarantee continuity of site management and support, also during holiday periods
- Participate in SA1/Production-Grid phone conferences and meetings, and compile the weekly pre-report
- Keep the information in the GOC DB up to date
- Enable the test VOs (ops, dteam and infngrid), with a higher priority than other VOs
Any non-fulfilment noticed by the ROC will be referred to the biweekly INFNGRID phone conferences, then to the COLG and, if necessary, to the EB.

Availability & Reliability
- The status of the CE, SE, SRM and sBDII services is taken into account, as given by the results of the SAM tests
- A logical AND is applied across these services, and a logical OR across services of the same type when a site has several instances of the same service
- A site must be available at least 70% of the time each month (daily availability is measured over 24 hours)
- The reliability of the site must be at least 75% each month (Reliability = Availability / (Availability + Unscheduled Downtime))
- Scheduled downtime periods must be declared in advance in the GOC-DB
- Scheduled downtimes reduce availability, but not reliability
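
The AND/OR rule and the reliability formula above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the data layout (one boolean per service instance) and the example figures are hypothetical and are not taken from the SAM implementation.

```python
REQUIRED_SERVICES = ("CE", "SE", "SRM", "sBDII")


def site_available(status):
    """status maps a service type to a list of booleans, one per instance.

    Instances of the same type are combined with OR, the service types
    with AND. A required type with no instance counts as unavailable.
    """
    return all(any(status.get(service, [])) for service in REQUIRED_SERVICES)


def reliability(availability, unscheduled_downtime):
    """Reliability = Availability / (Availability + Unscheduled Downtime).

    Both arguments are fractions of the same period (for example a month).
    """
    return availability / (availability + unscheduled_downtime)


# A site with two CEs, one of which is failing, still counts as available.
sample = {"CE": [True, False], "SE": [True], "SRM": [True], "sBDII": [True]}
print(site_available(sample))

# Hypothetical month: 72% available, 10% scheduled and 18% unscheduled downtime.
monthly_availability = 0.72
monthly_reliability = reliability(monthly_availability, 0.18)
print(monthly_availability >= 0.70, round(monthly_reliability, 2) >= 0.75)
```

In the hypothetical month, the 10% of scheduled downtime lowers availability (72% instead of 82%) but does not enter the reliability formula, which is why the two thresholds are tracked separately.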

GRID Services
These services allow you to use the grid resources:
- Resource Broker (RB) / Workload Management System (WMS): accept submitted jobs and send them to the appropriate resources
- Information System (IS): provides information about the grid resources and their status
- Virtual Organization Management System (VOMS): database for the authentication and authorization of users
- GridICE: monitoring of resources, services and jobs
- Home Location Register (HLR): database for the accounting information on resource usage
- LCG File Catalog (LFC): file catalog
- File Transfer Service (FTS): efficient and reliable file movement
- MonBox: collector for local R-GMA data

General Purpose Services (I)
http://grid-it.cnaf.infn.it/index.php?gridservices0&type=1
- 2 Resource Brokers
- 1 Top Level BDII
- 2 VOMS servers + 1 replica each
- 1 GridICE server
- 1 LCG File Catalog server

General Purpose Services (II)
- 3 WMS
- 2 Logging & Bookkeeping
- 2 Resource Brokers
- 2 Top Level BDII
- 1 MyProxy server
- 1 FTS

Accounting using DGAS
DGAS (Distributed Grid Accounting System) is fully deployed in INFNGrid (13 site HLRs + 1 second-level HLR, in testing). The site HLR is a service designed to manage a set of 'accounts' for the Computing Elements of a given computing site. For each job executed on a Computing Element (or on a local queue), the Usage Record for that job is stored in the database of the site HLR.
Each site HLR can:
- Receive Usage Records from the registered Computing Elements
- Answer site manager queries such as:
  - Detailed job list queries (with many search keys: per user, VO, FQAN, CEId…)
  - Aggregate usage reports, such as per hour, day, month…, with flexible search criteria
- Optionally forward Usage Records to the APEL database
- Optionally forward Usage Records to a VO-specific HLR
[Figure: DGAS layers. Resource layer: Usage Metering (detailed resource usage info, job-level info). Site layer: Site HLR (aggregate site info; VO usage on the site, with role/group). GOC.]
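
To make the idea of an aggregate usage report concrete, the sketch below groups per-job Usage Records by a chosen search key. It is only an illustration: the record fields (`user`, `vo`, `ce_id`, `cpu_time_s`), the function name and the sample values are assumptions for the example and do not reflect the actual DGAS schema or query interface.

```python
from collections import defaultdict


def aggregate_by(records, key):
    """Sum job count and CPU time of usage records, grouped by `key`.

    records: iterable of dicts, one per job (hypothetical record layout)
    key:     field to group on, for example "vo", "user" or "ce_id"
    """
    totals = defaultdict(lambda: {"jobs": 0, "cpu_time_s": 0})
    for rec in records:
        bucket = totals[rec[key]]
        bucket["jobs"] += 1
        bucket["cpu_time_s"] += rec["cpu_time_s"]
    return dict(totals)


# Hypothetical usage records, one per executed job.
records = [
    {"user": "alice", "vo": "atlas", "ce_id": "ce01", "cpu_time_s": 3600},
    {"user": "bob", "vo": "cms", "ce_id": "ce01", "cpu_time_s": 1800},
    {"user": "alice", "vo": "atlas", "ce_id": "ce02", "cpu_time_s": 600},
]
print(aggregate_by(records, "vo"))
print(aggregate_by(records, "ce_id"))
```

Grouping on a different key gives a different view of the same records, which is the sense in which the reports above support flexible search criteria.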

Tier1 & Tier2 HLRs
11 site Home Location Registers for Tier1 and Tier2:

Host                       Site
hlr-t1.cr.cnaf.infn.it     INFN-T1
prod-hlr-02.ct.infn.it     INFN-CATANIA
prod-hlr-01.pd.infn.it     INFN-PADOVA
prod-hlr-01.ba.infn.it     INFN-BARI
atlashlr.lnf.infn.it       INFN-FRASCATI
t2-hlr-01.lnl.infn.it      INFN-LEGNARO
prod-hlr-01.mi.infn.it     INFN-MILANO
t2-hlr-01.na.infn.it       INFN-NAPOLI (ATLAS, PAMELA)
gridhlr.pi.infn.it         INFN-PISA
t2-hlr-01.roma1.infn.it    INFN-ROMA1, INFN-ROMA1-CMS, INFN-ROMA1-VIRGO
grid005.to.infn.it         INFN-TORINO

2 HLRs for small and medium-sized sites:
- prod-hlr-01.ct.infn.it (INFN-CATANIA), reference for central-southern area sites: ENEA-INFO, INFN-ROMA3, INFN-CAGLIARI, ITB-BARI, INFN-LECCE, SPACI-CS-IA64, INFN-LNS, SPACI-LECCE-IA64, INFN-NAPOLI-CMS, SPACI-NAPOLI, INFN-ROMA2, SPACI-NAPOLI-IA64
- prod-hlr-01.pd.infn.it (INFN-PADOVA), reference for central-northern area sites: CNR-ILC-PISA, INFN-GENOVA, CNR-PROD-PISA, INFN-PARMA, INAF-TRIESTE, INFN-PERUGIA, INFN-CNAF, INFN-TRIESTE, INFN-BOLOGNA, SNS-PISA, INFN-FERRARA, UNIV-PERUGIA, INFN-FIRENZE

VO Dedicated Services (I)

VO Dedicated Services (II)

Experimental Services
- Tests on some components released by the developers, in parallel with SA3
- Application of the latest patches, as soon as they are released, on some of the WMS in production, so that the VOs can test compatibility with their tools
- CreamCE: in collaboration with some sites where several instances have been installed

Other Services

Deployment Status (I)
45 sites in total:
- 35 active sites
- 2 sites in the certification phase
- 32 INFN sites
- 13 sites from other institutions (CNR, ENEA, ESA, INAF, SPACI, Univ. PG)
- 3 sites with IA64 architecture (1 active)

CERTIFIED (35): CNR-ILC-PISA, CNR-PROD-PISA, ENEA-INFO, ESA-ESRIN, INFN-BARI, INFN-BOLOGNA, INFN-CATANIA, INFN-CNAF, INFN-CNAF-LHCB, INFN-FERRARA, INFN-FRASCATI, INFN-GENOVA, INFN-LECCE, INFN-LNL-2, INFN-LNS, INFN-MILANO, INFN-NAPOLI, INFN-NAPOLI-ATLAS, INFN-NAPOLI-CMS, INFN-NAPOLI-PAMELA, INFN-PADOVA, INFN-PARMA, INFN-PERUGIA, INFN-PISA, INFN-ROMA1, INFN-ROMA1-CMS, INFN-ROMA1-VIRGO, INFN-ROMA2, INFN-ROMA3, INFN-T1, INFN-TORINO, INFN-TRIESTE, SNS-PISA, SPACI-CS-IA64, UNI-PERUGIA

Other sites, with the status labels in the order they appear on the slide:
- INAF-TRIESTE: HW PROBLEMS
- INFN-CAGLIARI
- INFN-CASCINA
- INFN-FIRENZE: Supp. Unavailable
- ITB-BARI: Cooling Maint.
- SISSA-TRIESTE: TESTs ONGOING
- SPACI-LECCE-IA64: HW & MW PROBLEMS
- SPACI-NAPOLI
- SPACI-NAPOLI-IA64

New site:
- INFN-NAPOLI-ARGO: CANDIDATE

Release INFNGRID
- Based on gLite3
- We are still in an O.S. transition phase: there are two INFNGRID releases, 3.0 for SL3 and 3.1 for SL4
- Several customizations:
  - additional VOs (~20)
  - accounting (DGAS): new profile (HLR server) + additional packages on CE and WN
  - monitoring (GRIDICE)
  - Quattor (collaboration with CNAF-T1)
  - Dynamic Information-Providers for LSF: corrected configuration, new vomaxjobs (3.1/SL4 WIP)
  - transparent support for MPICH and MPICH-2
  - GRelC (Grid Relational Catalog)
  - StoRM (Storage Resource Manager)
  - GFAL Java API & NTP
- Work in progress:
  - creamCE
  - patched MyProxy (long-lived proxy delegation with VOMS extensions)
  - AMGA Web Interface
  - GSAF (Grid Storage Access Framework)
  - Secure Storage System
  - gLite for Windows with torque/maui support

Deployment Status (II)
EGEE gLite 3.1 updates (http://glite.web.cern.ch/glite/packages/R3.1/updates.asp):
- 09.06.08 - 3.1 Update 26
- 29.05.08 - 3.1 Update 25
- 22.05.08 - 3.1 Update 24
- 16.05.08 - 3.1 Update 23
- 13.05.08 - 3.1 Update 22
- 24.04.08 - 3.1 Update 21
- 22.04.08 - 3.1 Update 20
- 15.04.08 - 3.1 Update 19
- 07.04.08 - 3.1 Update 18
- 19.03.08 - 3.1 Update 17
- 06.03.08 - 3.1 Update 16
- 27.02.08 - 3.1 Update 15
- 21.02.08 - 3.1 Update 14
INFNGRID gLite 3.1 releases:
- INFNGRID gLite 3.1 Update 22/23/24/25/26 (SL4) - 24/06/2008
- INFNGRID gLite 3.1 Update 18/19/20/21 (SL4) - 28/04/2008
- INFNGRID gLite 3.1 Update 17 (SL4) - 01/04/2008
- INFNGRID gLite 3.1 Update 14/15/16 (SL4) - 18/03/2008
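
The dates above make it possible to work out how long each EGEE gLite 3.1 update waited before being included in an INFNGRID release. The sketch below does that arithmetic for the oldest and newest update bundled in each release. The dates are copied from the lists above; the variable and function names are just illustrative choices.

```python
from datetime import date

# EGEE gLite 3.1 update number -> EGEE release date (only the updates that
# open or close an INFNGRID bundle are needed here).
egee_updates = {
    26: date(2008, 6, 9),
    22: date(2008, 5, 13),
    21: date(2008, 4, 24),
    18: date(2008, 4, 7),
    17: date(2008, 3, 19),
    16: date(2008, 3, 6),
    14: date(2008, 2, 21),
}

# (oldest update, newest update) in each INFNGRID release -> INFNGRID date.
infngrid_releases = {
    (22, 26): date(2008, 6, 24),
    (18, 21): date(2008, 4, 28),
    (17, 17): date(2008, 4, 1),
    (14, 16): date(2008, 3, 18),
}


def release_lag(bundle):
    """Return (days waited by the oldest update, days waited by the newest)."""
    oldest, newest = bundle
    released = infngrid_releases[bundle]
    return ((released - egee_updates[oldest]).days,
            (released - egee_updates[newest]).days)


for bundle in infngrid_releases:
    print(bundle, release_lag(bundle))
```

For example, the release of 24/06/2008 came 15 days after EGEE Update 26 and 42 days after Update 22, the oldest update it bundled.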

Deployment Status (III)
EGEE gLite 3.0 updates (http://glite.web.cern.ch/glite/packages/R3.0/updates.asp):
- 22.05.08 - 3.0.2 Update 43
- 18.04.08 - 3.0.2 Update 42
- 19.03.08 - 3.0.2 Update 41
- 04.03.08 - 3.0.2 Update 40
- 25.01.08 - 3.0.2 Update 39
- 14.01.08 - 3.0.2 Update 38
- 28.11.07 - 3.0.2 Update 37
- 12.11.07 - 3.0.2 Update 36
- 31.10.07 - 3.0.2 Update 35
INFNGRID gLite 3.0 releases:
- INFNGRID gLite 3.0 Update 43 (SL3) - 24/06/2008
- INFNGRID gLite 3.0 Update 42 (SL3) - 28/04/2008
- INFNGRID gLite 3.0 Update 41 (SL3) - 01/04/2008
- INFNGRID gLite 3.0 Update 40 (SL3) - 18/03/2008
- INFNGRID gLite 3.0 Update 39 (SL3) - 05/02/2008
- INFNGRID gLite 3.0 Update 38 (SL3) - 25/01/2008
- INFNGRID gLite 3.0 Update 37 (SL3) - 05/12/2007
- INFNGRID gLite 3.0 Update 35/36 (SL3) - 29/11/2007

Supported VOs
49 VOs supported:
- 4 LHC (ALICE, ATLAS, CMS, LHCB)
- 3 test (DTEAM, OPS, INFNGRID)
- 20 regional
- 1 catch-all VO: GRIDIT
- 21 other VOs

Regional VOs
2376 users registered in CDF

VO                Users
argo              25
bio               68
compassit         7
compchem          59
cyclops           13
egrid             28
enea              12
enmr.eu           14
euchina           61
euindia           51
eumed             99
gridit            132
inaf              27
infngrid          207
ingv
libi              17
lights.infn.it    16
pamela            19
planck            33
theophys          57
virgo             18

Introducing a new VO
When an experiment asks to enter the grid and form a new VO, a formal request is required, followed by some technical steps.
Formal part:
- Agree the required resources and the financial contribution between the experiment and the grid Executive Board (EB)
- Identify the software that will be used and verify that it works
- Verify that it can be supported at the various INFN-GRID production sites
- Communicate to IT-ROC the names of the VO managers, the software managers, and the people responsible for the resources and for supporting users of the experiment software at every site
- Specify the software requirements, the kind of jobs and the final storage destination (CASTOR, SE, experiment disk server)

Introducing a new VO
Once the Executive Board (EB) has approved the experiment's request, the technical part begins:
- IT-ROC creates the VO on its VOMS server (if one does not already exist)
- IT-ROC creates the VO support group in the ticketing system
- The VO manager fills in the VO identity card on the CIC portal
- IT-ROC announces the new VO and tells the sites how to enable it

HLRmon

WMS MONITOR (I)

WMS MONITOR (II)

Useful links
- Italian grid project: http://grid.infn.it/
- Italian production grid: http://grid-it.cnaf.infn.it/
- SAM: https://lcg-sam.cern.ch:8443/sam/sam.py
- CIC Portal: http://cic.gridops.org/
- GSTAT: http://goc.grid.sinica.edu.tw/goc/
- GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site/site.php
- EGEE SA1 Failover: http://www.gridops.org/
- HLR MON: https://dgas.cnaf.infn.it/hlrmon/report/charts.php
- WMS MON: https://cert-wms-01.cnaf.infn.it:8443/wmsmon/main/main.php