

1 Introduction to the INFNGRID project. Alessandro Paolini (INFN-CNAF). 2nd INFN training course for GRID site administrators, ICTP Trieste, 24–28 November 2008

2 Primary components of the production grid
The primary components of the Italian Production Grid are:
- Computing and storage resources
- Access points to the grid
- Services
Other elements are just as fundamental to the operation, management and monitoring of the grid:
- Middleware
- Monitoring tools
- Accounting tools
- Management and control infrastructure
- Users

3 GRID Management
Grid management is performed by the Italian Regional Operation Centre (ROC). The main activities are:
- Production and testing of the INFNGRID release
- Deployment of the release to the sites, support to local administrators, and site certification
- Periodic checks of the status of resources and services
- Support at the Italian level
- Support at the European level
- Introduction of new Italian sites into the grid
- Introduction of new regional VOs into the grid

4 The Italian Regional Operation Centre (ROC)
One of the 10 existing ROCs in EGEE:
- Operations Coordination Centre (OCC): management and oversight of all operational and support activities
- Regional Operations Centres (ROC): the core of the support infrastructure, each supporting a number of resource centres within its region
- Grid Operator on Duty
- Grid User Support (GGUS): at FZK; coordination and management of user support, single point of contact for users

5 Central Management Team (CMT) Shifts
About 20 supporters perform a checking activity organized as one shift per day, Monday to Friday, with two people per shift, during which a report is compiled:
- Checking the grid status, reporting problems, and tracking them until their solution where possible
- Performing site certification during the deployment phases
- Checking the tickets still open and pressing the experts or the site managers to answer and solve them

6 Users and sites support
EGEE makes use of the GGUS (Global Grid User Support) ticketing system. Each ROC uses different tools interfaced to GGUS in a bidirectional way. By means of web services it is possible to:
- Transfer tickets from the global to the regional system
- Transfer tickets from the regional to the global system
The support groups to whom tickets will be addressed are defined both in GGUS and in the regional systems. In the Italian Regional Operation Centre the ticketing system used is based on XOOPS/xHelp.
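The bidirectional transfer described above can be sketched as two mapping functions between ticket schemas. This is only an illustration of the idea: the real systems exchange tickets over web services, and all field names here (ticket_id, responsible_unit, assigned_group) are invented for the example, not the actual GGUS or XOOPS/xHelp schema.

```python
# Hypothetical sketch of the GGUS <-> regional ticket bridge.  The real
# bridge works over web services; here the two directions of transfer
# are modelled as plain dictionary mappings with invented field names.

def to_regional(ggus_ticket):
    """Map a GGUS ticket onto a regional (XOOPS/xHelp-style) ticket."""
    return {
        "origin": "GGUS",
        "origin_id": ggus_ticket["ticket_id"],
        "subject": ggus_ticket["subject"],
        # If GGUS named no responsible unit, the regional operators triage it.
        "assigned_group": ggus_ticket.get("responsible_unit", "IT-ROC"),
        "status": "open",
    }

def to_global(regional_ticket, response):
    """Send a site's response on a regional ticket back to the global system."""
    return {
        "ticket_id": regional_ticket["origin_id"],
        "update": response,
        "status": "in progress",
    }
```

A ticket arriving from the global system would be converted with `to_regional`, assigned to the site it concerns, and the site's answer pushed back with `to_global`, keeping the original GGUS ticket id so the two systems stay linked.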

7 Italian support system

8 Interface to GGUS
A new ticket arrives from GGUS; we assign the ticket to the site it concerns.

9 Interface to GGUS
The site reassigns the ticket to GGUS… …and adds a response.

10 Service Availability Monitoring (SAM)
SAM jobs are launched every hour and make it possible to detect submission problems, among which batch system errors, CAs not updated, and replica errors. There are also more specific tests for SRMv2, SE, LFC, CreamCE, WMS, BDII…

11 Service Availability Monitoring (SAM): CE Tests

12 Service Availability Monitoring (SAM): RGMA (MONBOX) Tests
These tests check whether the service host certificate is valid. On a MONBOX the host certificate is also present in the rgma and tomcat (hidden) directories, which usually are:
- /etc/tomcat5/
- /opt/glite/var/rgma/.certs/
So when you change the host certificate you also have to restart:
- /etc/rc.d/init.d/rgma-servicetool
- /etc/rc.d/init.d/tomcat5

13 SAM Admin
SAM jobs are available for both EGEE production and preproduction sites:
- each site manager can submit new SAM tests on his own site
- each ROC can submit new test jobs to the sites of its own region

14 GSTAT
GSTAT queries the Information System every 5 minutes. The sites and nodes checked are those registered in the GOC DB. Inconsistencies in the published information, and the absence of a service that a site should publish, are reported as errors.
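The consistency check GSTAT performs can be sketched as a comparison between the services a site registered in the GOC DB and the services it actually publishes in the Information System. This is a minimal sketch of the idea only, with mocked inputs; the real GSTAT polls the BDII via LDAP every 5 minutes.

```python
# Sketch of a GSTAT-style consistency check: the expected service set
# comes from the site's GOC DB registration, the published set from the
# Information System.  Both are mocked here as plain Python sets.

def check_site(expected_services, published_services):
    """Return a list of error strings for missing or unregistered services."""
    errors = []
    for svc in sorted(expected_services):
        if svc not in published_services:
            errors.append(f"missing service: {svc}")
    for svc in sorted(published_services):
        if svc not in expected_services:
            errors.append(f"unregistered service published: {svc}")
    return errors
```

A site registered with a CE and an SE but publishing only the CE would be flagged with "missing service: SE", which is exactly the kind of error GSTAT reports.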

15 Introducing a new site
Before entering the grid, each site has to accept several norms of behaviour, described in a Memorandum of Understanding (MoU). The COLG (Grid Local Coordinator) reads and signs it and faxes the document to INFN-CNAF. Moreover, all sites must provide the email alias grid-prod@. This alias will be used to report problems and will be added to the site managers' mailing list; of course it should include all site managers of your grid site. At this point IT-ROC registers the site and the site managers in the GOC DB and creates a supporter-operative group in the XOOPS ticketing system. Site managers have to register in XOOPS, so that they can be assigned to their supporter-operative groups; each site manager also has to register in the test VOs infngrid and dteam.

16 Introducing a new site
Site managers install the middleware following the instructions distributed by the Release Team (http://grid-it.cnaf.infn.it/, Installation section). When finished, they make some preliminary tests (http://grid-it.cnaf.infn.it/ --> Test&Cert) and then request certification by their own ROC. IT-ROC logs a ticket to communicate with the site managers during the certification.

17 Memorandum of Understanding
Every site has to:
- Provide computing and storage resources; farm dimensions (at least 10 CPUs) and storage capacity will be agreed with each site
- Guarantee sufficient manpower to manage the site: at least 2 people
- Manage the site resources efficiently (middleware installation and upgrades, patch application, configuration changes as requested by CMT) within the maximum time stated for the various operations
- Answer tickets within 24 hours (T2) or 48 hours (other sites), Monday to Friday
- Check its own status from time to time
- Guarantee continuity of site management and support, also in holiday periods
- Participate in SA1/Production-Grid phone conferences and meetings and compile the weekly report
- Keep the information in the GOC DB up to date
- Enable the test VOs (ops, dteam and infngrid) with a higher priority than other VOs
Any non-fulfilment noticed by the ROC will be referred to the biweekly INFNGRID phone conferences, then to the COLG, and eventually to the EB.

18 Registration in GOC DB
Three phases:
- candidate: monitoring is OFF
- uncertified: monitoring can be turned on
- certified: monitoring is ON

19 Availability & Reliability
- The status of the CE, SE, SRM and sBDII services is taken into account, as reported by the SAM test results
- A logical AND is applied between these services, and a logical OR between services of the same type when a site has more than one instance of the same service
- A site must be available at least 70% of the time each month (daily availability is measured over 24 hours)
- The site's reliability must be at least 75% per month (Reliability = Availability / (Availability + Unscheduled Downtime))
- Scheduled downtime periods must be declared in advance in the GOC DB
- Scheduled downtimes lower the availability, but not the reliability
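The two metrics above can be checked with a short worked example. The formula is the one stated on the slide; the 18-hours-up day with a 3h/3h split between scheduled and unscheduled downtime is an invented example used only to show how scheduled downtime hurts availability but not reliability.

```python
# Worked example of the availability / reliability figures.  Daily
# availability is measured over 24 hours; scheduled downtime lowers
# availability but not reliability:
#   Reliability = Availability / (Availability + Unscheduled Downtime)

def availability(up_hours, total_hours=24.0):
    return up_hours / total_hours

def reliability(avail, unscheduled_downtime):
    return avail / (avail + unscheduled_downtime)

# A site up 18 of 24 hours, with the 6 down hours split into 3 h of
# scheduled and 3 h of unscheduled downtime (0.125 of the day each):
a = availability(18)          # 0.75 -> meets the 70% availability target
r = reliability(a, 3 / 24.0)  # 0.75 / 0.875 ~= 0.857 -> meets the 75% target
```

Had all 6 down hours been scheduled, availability would still be 0.75 but reliability would rise to 1.0, illustrating the last bullet above.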

20 GRID Services
These allow you to use the grid resources:
- Resource Broker (RB) / Workload Management System (WMS): responsible for accepting submitted jobs and for sending those jobs to the appropriate resources
- Information System (IS): provides information about the grid resources and their status
- Virtual Organization Management System (VOMS): database for the authentication and authorization of users
- GridICE: monitoring of resources, services and jobs
- Home Location Register (HLR): database for the accounting information on resource usage
- LCG File Catalog (LFC): file catalog
- File Transfer Service (FTS): file movement in an efficient and reliable way
- MonBox: collector for the local data of R-GMA

21 Access to the GRID
Access is by means of a User Interface (UI), which can be:
- a dedicated PC, installed in a similar way to the other grid elements
- the UI Plug-and-Play (UI PnP), software you can install on any PC without root privileges
- a web portal: https://genius.ct.infn.it/
To access the GRID you need a personal certificate released by a Certification Authority trusted by the EGEE/LCG infrastructure: user authentication is performed through X.509 certificates. To be authorized to submit jobs you have to belong to a Virtual Organisation (VO): a group of users usually working on the same project and using the same application software on the grid.
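The authentication/authorization split described above (certificate proves who you are, VO membership decides what you may do) can be sketched with a toy membership lookup. The DNs and VO assignments below are entirely invented; in the real infrastructure this decision is taken by the VOMS server from the attributes embedded in the user's proxy.

```python
# Toy sketch of the authorization step: a (already authenticated)
# certificate subject DN is mapped to the VOs the user belongs to, and
# job submission is allowed only for members.  DNs and memberships here
# are invented examples, not real users.

VO_MEMBERS = {
    "/C=IT/O=INFN/CN=Mario Rossi": {"infngrid", "dteam"},
    "/C=IT/O=INFN/CN=Anna Bianchi": {"atlas"},
}

def can_submit(user_dn, vo):
    """True if the authenticated user is authorized for this VO."""
    return vo in VO_MEMBERS.get(user_dn, set())
```

An unknown DN, or a known DN asking for a VO it does not belong to, is refused even though its certificate is valid: authentication alone is not authorization.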

22 General Purpose Services (I)
- 1 Resource Broker
- 1 WMS
- 1 LB
- 1 Top Level BDII
- 2 VOMS servers + 1 replica each
- 1 GridICE server
- 1 LCG File Catalog server
http://grid-it.cnaf.infn.it/index.php?gridservices0&type=1

23 General Purpose Services (II)
- 3 WMS
- 2 Logging & Bookkeeping
- 1 Resource Broker
- 5-in-1 Top Level BDII
- 1 MyProxy server
- 1 FTS

24 VOMS Servers (I)

25 VOMS Servers (II)

26 Accounting using DGAS
DGAS (Distributed Grid Accounting System) is fully deployed in INFNGRID (13 site HLRs + 1 second-level HLR, in testing). The site HLR is a service designed to manage a set of 'accounts' for the Computing Elements of a given computing site. For each job executed on a Computing Element (or on a local queue), the Usage Record for that job is stored in the database of the site HLR. Each site HLR can:
- Receive Usage Records from the registered Computing Elements
- Answer site manager queries such as:
  - detailed job list queries (with many search keys: per user, VO, FQAN, CE id…)
  - aggregate usage reports (per hour, day, month…) with flexible search criteria
- Optionally forward Usage Records to the APEL database
- Optionally forward Usage Records to a VO-specific HLR
[Diagram: usage metering at the resource layer feeds job-level and detailed resource usage information to the site HLR, which provides aggregate site information and per-VO (role/group) usage to the GOC.]
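The query interface of a site HLR can be sketched with a minimal in-memory store of usage records that supports aggregation by any record field. The record fields mirror the search keys named on the slide (user, VO, CE id); the storage, class name, and example data are illustrative assumptions, not the real DGAS database layout.

```python
# Sketch of a site-HLR-style query interface: per-job Usage Records are
# stored, then aggregated by user, VO or CE id.  An in-memory list
# stands in for the real HLR database; all names here are illustrative.

from collections import defaultdict

class SiteHLR:
    def __init__(self):
        self.records = []

    def add_record(self, user, vo, ce_id, cpu_hours):
        """Store the Usage Record for one executed job."""
        self.records.append(
            {"user": user, "vo": vo, "ce_id": ce_id, "cpu_hours": cpu_hours}
        )

    def usage_by(self, key):
        """Aggregate CPU hours by an arbitrary record field (user, vo, ce_id)."""
        totals = defaultdict(float)
        for rec in self.records:
            totals[rec[key]] += rec["cpu_hours"]
        return dict(totals)
```

The same record store answers both kinds of query from the slide: the raw `records` list is the detailed job list, and `usage_by("vo")` or `usage_by("user")` gives the aggregate reports.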

27 Tier1 & Tier2 HLRs
11 site Home Location Registers for Tier1 and Tier2 sites, plus 2 HLRs for small and medium sites.

Host / site:
- hlr-t1.cr.cnaf.infn.it: INFN-T1
- prod-hlr-02.ct.infn.it: INFN-CATANIA
- prod-hlr-01.pd.infn.it: INFN-PADOVA
- prod-hlr-01.ba.infn.it: INFN-BARI
- atlashlr.lnf.infn.it: INFN-FRASCATI
- t2-hlr-01.lnl.infn.it: INFN-LEGNARO
- prod-hlr-01.mi.infn.it: INFN-MILANO
- t2-hlr-01.na.infn.it: INFN-NAPOLI, INFN-NAPOLI-ATLAS
- gridhlr.pi.infn.it: INFN-PISA
- t2-hlr-01.roma1.infn.it: INFN-ROMA1, INFN-ROMA1-CMS, INFN-ROMA1-VIRGO
- grid005.to.infn.it: INFN-TORINO

HLR prod-hlr-01.pd.infn.it (INFN-PADOVA), reference for the central-northern area sites: CNR-ILC-PISA, CNR-PROD-PISA, INAF-TRIESTE, INFN-BOLOGNA, INFN-CNAF, INFN-FERRARA, INFN-FIRENZE, INFN-GENOVA, INFN-PARMA, INFN-PERUGIA, INFN-TRIESTE, SNS-PISA, UNIV-PERUGIA

HLR prod-hlr-01.ct.infn.it (INFN-CATANIA), reference for the central-southern area sites: ENEA-INFO, INFN-CAGLIARI, INFN-LECCE, INFN-LNS, INFN-NAPOLI-ARGO, INFN-NAPOLI-CMS, INFN-ROMA2, INFN-ROMA3, ITB-BARI, SPACI-CS-IA64, SPACI-LECCE, SPACI-LECCE-IA64, SPACI-NAPOLI, SPACI-NAPOLI-IA64

28 VO Dedicated Services (I)

29 VO Dedicated Services (II)

30 Experimental Services
Tests on some components released by the developers, in parallel with SA3:
- Application of the latest patches, as soon as they are released, on some WMS instances in production, to let the VOs test their compatibility with their tools
- CreamCE: in collaboration with some sites where several instances have been installed

31 Other Services

32 Deployment Status (I)
48 sites in total:
- 38 active sites
- 1 site being certified
- 34 INFN sites
- 14 sites from other institutes (CNR, ENEA, ESA, INAF, SPACI, Univ. PG)
- 3 sites with IA64 architecture (1 active)

Site status:
- CERTIFIED: CNR-ILC-PISA, CNR-PROD-PISA, ENEA-INFO, ESA-ESRIN, INFN-BARI, INFN-BOLOGNA, INFN-CAGLIARI, INFN-CATANIA, INFN-CNAF, INFN-CNAF-LHCB, INFN-CS, INFN-FERRARA, INFN-FRASCATI, INFN-GENOVA, INFN-LNL-2, INFN-LNS, INFN-MILANO, INFN-NAPOLI, INFN-NAPOLI-ARGO, INFN-NAPOLI-ATLAS, INFN-NAPOLI-CMS, INFN-NAPOLI-PAMELA, INFN-PADOVA, INFN-PARMA, INFN-PERUGIA, INFN-PISA, INFN-ROMA1, INFN-ROMA1-CMS, INFN-ROMA1-TEO, INFN-ROMA1-VIRGO, INFN-ROMA3, INFN-T1, INFN-TORINO, INFN-TRIESTE, SNS-PISA, SPACI-CS-IA64, SPACI-LECCE, UNI-PERUGIA
- Farm migration to SL4: INFN-LECCE, INFN-ROMA2
- Tests ongoing: SISSA-TRIESTE
- HW problems: INAF-TRIESTE, INFN-CASCINA, SPACI-NAPOLI
- Support unavailable: INFN-FIRENZE
- Cooling maintenance: ITB-BARI
- HW & MW problems: SPACI-LECCE-IA64, SPACI-NAPOLI-IA64

33 INFNGRID Release
Based on gLite 3. We are still in an O.S. transition phase, so there are two INFNGRID releases: 3.0 for SL3 and 3.1 for SL4. Several customizations:
- additional VOs (~20)
- Secure Storage System
- CreamCE
- accounting (DGAS): new profile (HLR server) + additional packages on CE and WN
- monitoring (GridICE)
- Quattor (collaboration with CNAF-T1)
- dynamic information providers for LSF: corrected configuration, new vomaxjobs (3.1/SL4)
- transparent support for MPICH and MPICH-2
- GRelC (Grid Relational Catalog)
- StoRM (Storage Resource Manager)
- GFAL Java API & NTP
Work in progress:
- patched MyProxy (long-lived proxy delegation with VOMS extensions)
- AMGA Web Interface
- GSAF (Grid Storage Access Framework)
- gLite for Windows with torque/maui support

34 Deployment Status (II)
EGEE gLite 3.1 updates (http://glite.web.cern.ch/glite/packages/R3.1/updates.asp):
- 16.10.08 - 3.1 Update 34
- 08.10.08 - 3.1 Update 33
- 01.10.08 - 3.1 Update 32
- 16.09.08 - 3.1 Update 31
- 01.09.08 - 3.1 Update 30
- 18.08.08 - 3.1 Update 29
- 06.08.08 - 3.1 Update 28
- 03.07.08 - 3.1 Update 27
- 09.06.08 - 3.1 Update 26
- 29.05.08 - 3.1 Update 25
- 22.05.08 - 3.1 Update 24
- 16.05.08 - 3.1 Update 23
- 13.05.08 - 3.1 Update 22
- 24.04.08 - 3.1 Update 21
- 22.04.08 - 3.1 Update 20
- 15.04.08 - 3.1 Update 19
- 07.04.08 - 3.1 Update 18

INFNGRID gLite 3.1 releases:
- Update 18/19/20/21 (SL4) - 28/04/2008
- Update 22/23/24/25/26 (SL4) - 24/06/2008
- Update 27 (SL4) - 22/07/2008
- Update 28/29 (SL4) - 27/08/2008
- Update 30/31 (SL4) - 23/09/2008
- Update 32/33/34 (SL4) - 07/11/2008

35 Deployment Status (III)
EGEE gLite 3.0 updates (http://glite.web.cern.ch/glite/packages/R3.0/updates.asp):
- 02.10.08 - 3.0.2 Update 44
- 22.05.08 - 3.0.2 Update 43
- 18.04.08 - 3.0.2 Update 42
- 19.03.08 - 3.0.2 Update 41
- 04.03.08 - 3.0.2 Update 40
- 25.01.08 - 3.0.2 Update 39
- 14.01.08 - 3.0.2 Update 38
- 28.11.07 - 3.0.2 Update 37
- 12.11.07 - 3.0.2 Update 36
- 31.10.07 - 3.0.2 Update 35

INFNGRID gLite 3.0 releases:
- Update 35/36 (SL3) - 29/11/2007
- Update 37 (SL3) - 05/12/2007
- Update 38 (SL3) - 25/01/2008
- Update 39 (SL3) - 05/02/2008
- Update 40 (SL3) - 18/03/2008
- Update 41 (SL3) - 01/04/2008
- Update 42 (SL3) - 28/04/2008
- Update 43 (SL3) - 24/06/2008
- Update 44 (SL3) - 02/10/2008

36 Supported VOs
52 VOs supported:
- 4 LHC (ALICE, ATLAS, CMS, LHCB)
- 3 test (DTEAM, OPS, INFNGRID)
- 23 regional
- 1 catch-all VO: GRIDIT
- 21 other VOs

37 Regional VOs
VO / users:
- argo: 33
- bio: 69
- compassit: 8
- compchem: 73
- cyclops: 15
- egrid: 28
- enea: 13
- enmr.eu: 24
- euchina: 61
- euindia: 62
- eumed: 101
- glast.org: 6
- gridit: 136
- inaf: 27
- infngrid: 214
- ingv: 13
- libi: 17
- lights.infn.it: 22
- pamela: 20
- planck: 38
- theophys: 67
- tps.infn.it: 3
- virgo: 29
2268 users registered in CDF

38 Introducing a new VO
When an experiment asks to enter the grid and form a new VO, a formal request is necessary, followed by some technical steps.
Formal part:
- Agree on the needed resources and the economic contribution between the experiment and the grid Executive Board (EB)
- Pick out the software that will be used and verify its functioning
- Verify the possibility of support in the various INFN-GRID production sites
- Communicate to IT-ROC the names of the VO managers, the software managers, and the persons responsible for the resources and for the experiment software support for the users at every site
- Specify the software requisites, the kind of jobs, and the final storage destination (CASTOR, SE, experiment disk server)

39 Introducing a new VO
Once the Executive Board (EB) has approved the experiment's request, the technical part begins:
- IT-ROC creates the VO on its VOMS server (if one does not already exist)
- IT-ROC creates the VO support group on the ticketing system
- The VO manager fills in the VO identity card on the CIC portal
- IT-ROC makes the existence of the new VO known and informs the sites how to enable it

40 CIC Portal and VO Identity Card
Every VO supported in EGEE has to be registered in the VO identity card:
- general information
- VOMS information
- contacts
The CIC Portal was created as part of the SA1 activity. It is dedicated to:
- being a management and operations tool
- being an entry point for all EGEE actors for their operational needs
- managing the available information about EGEE VOs and related VOs
- monitoring and ensuring grid day-to-day operations on grid resources and services

41 Freedom of Choice for Resources
The Freedom of Choice for Resources (FCR) is a VO policy enforcement tool used to manipulate top-level BDIIs. It is fully integrated with the SAM (Service Availability Monitoring) framework. FCR allows the VOs to define a preference on grid resources, optionally taking the SAM test results into account as well. Only VO responsibles (VO software managers, etc.) can get access to the FCR admin pages, where they can modify their VO's FCR profile. There they can select the:
- set of critical tests for all services
- set of site resources (CEs and SEs) to be used by the VO
- set of central service nodes (note: this will be used in the future)
Changes are written to the database (shared with SAM), and an LDAP ldif file is created, which the top-level BDIIs download every 2 minutes in order to apply the site resource changes.
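The selection logic described above can be sketched in a few lines: starting from the VO's chosen resources, drop any resource failing one of the VO's critical SAM tests, and the survivors are what would be exported for the top-level BDIIs to apply. This is a sketch of the idea only; the test names, result encoding, and resource names are all invented for the example.

```python
# Sketch of FCR-style resource selection: keep only the VO's chosen
# resources that pass all of the VO's critical SAM tests.  The SAM
# result encoding {(resource, test): "ok" | "error"} is an assumption
# made for this example.

def fcr_select(vo_resources, critical_tests, sam_results):
    """Return the resources passing every critical test for this VO."""
    good = []
    for res in vo_resources:
        if all(sam_results.get((res, t)) == "ok" for t in critical_tests):
            good.append(res)
    return good
```

Note that a missing result counts as a failure here (`dict.get` returns `None`, which is not `"ok"`), a conservative choice: a resource with no recent SAM result for a critical test is excluded rather than trusted.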

42 HLRmon

43 WMS Monitor (I)

44 WMS Monitor (II)

45 Useful links
- Italian grid project: http://grid.infn.it/
- Italian production grid: http://grid-it.cnaf.infn.it/
- HLR MON: https://dgas.cnaf.infn.it/hlrmon/report/charts.php
- WMS MON: https://cert-wms-01.cnaf.infn.it:8443/wmsmon/main/main.php
- gLite Middleware: http://glite.web.cern.ch/glite/default.asp

