NE ROC overview
Jules Wolfrat / Per Öster
Enabling Grids for E-sciencE (INFSO-RI-508833) – www.eu-egee.org
SA1 open meeting, 26 October, Pisa



NE Resources
– 8 sites certified (BE, NL, SE), delivering 526 CPUs; 600 CPUs to be added in the coming weeks
– 3 sites in the certification process, adding 200 CPUs
– VOs supported
  – National: NCF, PVIER, ASTRON, Nadc, VLe, Asci, BEtest
  – EGEE: ALICE, ATLAS, LHCb, CMS, Dzero, ESR, Biomed, Magic
– Service challenge support: 2 Tier-1 centres (SARA/NIKHEF, NDGF)

Activities and Issues
– NE problem ticket procedures
  – Difficulties with the interface between the NE RT system and GGUS: the procedure generates a mail loop between the two systems, which in turn generates many new tickets
  – Lack of robustness in the GGUS SOAP interface: information must be sent in the right order
  – The NE RT system is functioning, but not all procedures are in place to run a smooth daily service
– Focused activity started in the summer to increase the number of RCs
  – Nordic sites
  – BEgrid and Belgrid joined
– Lack of dedicated application support: no NA4 involvement in the NE federation

ROC DECH Report
Sven Hermann, Holger Marten
Forschungszentrum Karlsruhe

Centres in DECH contributing to the infrastructure
[Map slide: distributed ROC; legend distinguishes certified RCs from RCs still to be certified]
– DESY, GSI, FhG SCAI, FZK, HU Berlin, LRZ München, U Wuppertal, RWTH Aachen, TU Karlsruhe (IEKP), CSCS (Manno), FhG ITWM (Kaiserslautern), U Freiburg, U Dortmund

Resources in ROC DECH

    Service      Count   Global VOs   Regional VOs
    VO Server      9         3             6
    RLS/RMC        7         3             4
    RB             8         3             5
    BDII           9         4             5
    UI            >7

    Resource               Magnitude   Global   Regional
    Sites                      9          9         0
    Estimated CPUs
    Disk Storage            121 TB                  0
    Mass Storage Systems       3          1         2

Sites are EGEE certified (funded and unfunded partners); 4 more sites are to be certified.

Services in DECH 1/2
FZK
– October milestone fulfilled: >1550 CPUs, 310 TB disk (83 TB for LHC), 0.5 PB tape
– DECH portal online
– GridKa School 05: largest event in the region
– Pre-production 80% ready
DESY
– Storage management developments (dCache SE with SRM)
– Running Monte Carlo production (simulations) for H1, ZEUS and ILC in collaboration with 22 sites, 1800 CPUs
  – Status: H1: 61 M events, ZEUS: 343 M events

Services in DECH 2/2
FhG/SCAI
– Participation in the BioMed data challenge
– Operating the metadata server for ESR
– Hosting the regional DECH VO (LCG and regional gLite testbed)
GSI
– Running service challenge and analysis jobs for ALICE with 336 CPUs in total
– Successful participation in LCG SC3
– Migrated from classic SE to a dCache-based SE with SRM front-end
CSCS
– Running production environment
– Added support for the H1 VO
– Cannot yet migrate from classic SE to DPM (CMS PhEDEx); fix expected at the end of the month

Issues raised by site managers
– Restrictions on OS
  – Need support for at least a few Linux distributions
  – Source code should always be provided
  – List of required packages
  – Improve documentation of migrations
– Few supported batch systems (makes it difficult for new sites to join)
– Firewall management (see the sketch after this list)
  – High-performance firewalls
  – Port list
  – No changes to iptables
– More transparent release management
– Interface to fabric management: put middleware into the fabric management tools (→ deployment tools session)
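
The "port list" request is concrete: sites wanted each release to ship a machine-readable list of required ports rather than instructions to edit iptables by hand. A minimal sketch of what that could look like for a CE, assuming the commonly quoted LCG defaults (2119 gatekeeper, 2811 GridFTP control, 2170 BDII, a 20000-25000 GLOBUS_TCP_PORT_RANGE); the authoritative list must come from the release's official port table:

    #!/bin/bash
    # Sketch: generate firewall rules from a port list instead of editing
    # iptables by hand. Port numbers are the commonly used LCG defaults and
    # are illustrative; take the real list from the release notes.
    SERVICES="
    gatekeeper 2119 tcp
    gridftp    2811 tcp
    bdii       2170 tcp
    "
    echo "$SERVICES" | while read -r name port proto; do
      [ -z "$name" ] && continue
      echo "opening $proto/$port for $name"
      iptables -A INPUT -p "$proto" --dport "$port" -j ACCEPT
    done

    # GridFTP data channels use an ephemeral range (GLOBUS_TCP_PORT_RANGE)
    iptables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT

Keeping the list in one place means the same data can drive iptables, a hardware firewall request, or site documentation, which is the transparency the site managers asked for.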

UK/I ROC current issues and concerns
Jeremy Coles

UKI – current issues & concerns I
– VO boxes. Our LCG Tier-1 site (RAL) is evaluating the impact within SC3, but generally our sites are concerned. Some are shared sites and feel that current VO box requirements would compromise site security policies. One box per site per VO will not scale. GridPP will evaluate responses to the security questionnaire and seek ways to agree on a solution. Nevertheless, some VOs will likely get fewer resources if VO boxes are a requirement. How will this area move forward?
– Middleware releases. Sites were adjusting to regular releases and wanted this to continue. The "continuous update" model (for SC4) will introduce problems: what software version will sites need, when will they need to update, how will "versions" be tagged, and what will the operational and support impact be if sites are running different software versions? N.B. the EGEE-II proposal stresses certification and testing models with release cycles. In addition, what is the future of YAIM – will sites have to use the gLite installation tool, and if so, when?
– Defining and measuring availability. LCG sites are being asked to deliver a specified level of service in an LCG MoU. How does this translate to EGEE sites?

UKI – current issues & concerns II
– User/operational support. We need to present a support system that is easy to use and transparent in operation, with efficient workflows. Do we tell all users the same thing, whether EGEE-wide or national? How do we improve the service (which requires more use) and keep users happy (who are reluctant to trust the system!)?
– Technical forums. Can we create forums to discuss technical issues such as the use of volatile vs. permanent storage?
– Levels of SFT. The BDII can apply specific tests for VOs, but the COD only reviews one set of critical tests. We need to move towards greater flexibility; e.g. the UK National Grid Service could run and use SFT results, but they would fail tests not important to them (such as LCG replication). This will soon be an issue for EGEE anyway, as LCG-specific tools feature more in the middleware release. A different subset of tests for regional SFT is foreseen in any case; sites in SC3 run specific services that need special SFTs.

Asia Pacific ROC current issues and concerns
Min Tsai

AsiaPacific Federation
– Members include: India, Korea, Japan, Singapore, Taiwan (8 sites)
– Participation in the CMS and ATLAS service challenges
  – Seeking storage requirements from ATLAS
  – Starting to contact more RCs for participation
– Sites' experience and participation vary widely across the region. To improve this situation, we:
  – have compiled a list of recommended startup documentation for new administrators
  – need to improve the level of communication in this large region
  – are considering providing a centralized deployment model for some sites to improve reliability. How have others done this?

AsiaPacific Federation
– Deployment tools
  – The ROC uses Quattor for WN installation: high learning curve, but good support from CERN and NIKHEF
  – YAIM for service nodes
– Ticketing using OTRS
  – Tickets are created automatically from GGUS e-mails
  – Not yet integrated with the GGUS web services
– Common issues encountered in the region (a monitoring sketch follows below)
  – Missing BDII entries (service endpoints)
  – Replication errors (could be related to an IS issue)
  – Queuing-system-level failures (jobs get stuck, Maui malfunctions)
  – Experiment software problems (updating, debugging, etc.)
  – Proxy expiration when using a UI set to the local time zone (switched to UTC)
  – Missing SW repository (missing entry in /etc/fstab, or simply NFS errors)
  – CA RPMs not updated
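
Several of these failures are detectable before they break jobs. A hypothetical watchdog sketch for a few of them; the package and path names (lcg-CA, /opt/exp_soft) are illustrative and differ per site:

    #!/bin/bash
    # Hypothetical self-check for recurring site problems listed above.
    warn() { echo "WARN: $*"; }

    # Proxy lifetime in seconds; comparing seconds rather than local
    # wall-clock time sidesteps the UI time-zone problem noted above.
    left=$(grid-proxy-info -timeleft 2>/dev/null || echo 0)
    [ "$left" -lt 3600 ] && warn "proxy expires in ${left}s"

    # Experiment software area: catches a missing /etc/fstab entry or a
    # plain NFS failure before jobs start dying.
    SW_AREA=/opt/exp_soft            # illustrative mount point
    mountpoint -q "$SW_AREA" || warn "$SW_AREA not mounted"

    # CA certificates: the usual meta-package was called lcg-CA.
    rpm -q lcg-CA >/dev/null 2>&1 || warn "lcg-CA package not installed"

    # Scheduler liveness: a dead scheduler leaves jobs stuck in the queue.
    pgrep -x maui >/dev/null || warn "maui scheduler not running"

Run from cron, a check like this turns the region's ticket patterns into alarms the site sees first.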

Production status – France
P. Girard, R. Rumler
4th EGEE Conference, Pisa

Contents
– Status in the region
– Open issues

Site status in the region
– The number of sites increases if counted geographically
  – Three sites have decided to pool their resources, and two more will join that cooperation (region of Paris)
  – A new site is under installation at Nantes
– Current list:
  – CGG
  – GRIF (CEA-DAPNIA, IN2P3-IPNO, IN2P3-LAL, IN2P3-LLRP, IN2P3-LPNHE)
  – IN2P3-CC
  – IN2P3-CPPM
  – IN2P3-LAPP
  – IN2P3-LPC
  – IN2P3-SUBATECH
  – IPSL-IPGP

Open Issues
– ROC support (TPM)
– Managing production and pre-production together
– Reliability and pertinence of SFT results
– VO box implementation, setup, and configuration
– Operations metrics

IT ROC
E. Ferro (INFN Padova), P. Veronesi, A. Cavalli (INFN CNAF), G. Bracco (ENEA Frascati)

Italian ROC
– The LCG/EGEE production infrastructure is used by other projects/experiments: Grid.it, egrid, BaBar, Virgo, CDF, ARGO, MAGIC, ZEUS, ...
– Additional middleware configuration is provided at ROC level by a regional team:
  – More VOs: define VO servers, pool accounts, VOMS certificates, etc. once, to reduce misconfiguration risks
  – MPI (requested by non-HEP sciences), additional GridICE configuration (monitoring WNs), AFS read-only (a CDF requirement), ...
– Additional middleware is deployed in a non-intrusive way:
  – VOMS (since November), now in EGEE
  – DGAS (DataGrid Accounting System)
  – NetworkMonitor (monitors network connection quality)
– Of course, 100% compatibility is mandatory

Customizing YAIM
– This activity is not very complex because YAIM is designed to be flexible and extensible
  – An additional package (ig-yaim) provides scripts and configuration files that add to or override features of standard lcg-yaim (see the sketch after this list)
– When needed, new metapackages are created: supersets of the corresponding LCG metapackages
– Documentation is published at each release: release notes, upgrade and installation guides
– Everything is available to site managers from a central repository
– Collaboration with the LCG Certification and Deployment Team via Savannah: requests for new features, bug reports, help requests, suggestions, etc.
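
Why this is "not very complex" can be illustrated with a sketch of the override pattern. This is an assumption about the mechanism, not the actual ig-yaim code: YAIM keeps one bash function per file under /opt/lcg/yaim/functions and sources them before running, so a regional package that ships same-named files sourced afterwards wins without modifying lcg-yaim itself (the regional path and the example function are illustrative):

    #!/bin/bash
    # Sketch of the YAIM override pattern assumed to be behind ig-yaim.
    shopt -s nullglob

    STANDARD=/opt/lcg/yaim/functions   # one bash function per file (lcg-yaim)
    REGIONAL=/opt/ig-yaim/functions    # regional add-ons/overrides (assumed)

    # Source standard functions first, regional ones second: a regional file
    # that defines a function of the same name replaces the standard one.
    for f in "$STANDARD"/config_*; do . "$f"; done
    for f in "$REGIONAL"/config_*; do . "$f"; done

    # A regional file such as $REGIONAL/config_mydefaults could then add
    # region-wide settings before standard configuration runs:
    config_mydefaults () {
      # illustrative regional VO list (not from the slides)
      VOS="$VOS infngrid virgo"
    }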

Improving regional certification
We would like to improve the process.
– FRY, a set of basic low-level tests (sketched below)
  – To be run on every grid element
  – Performs some boring checks (daemons, full disks, certificates, CRLs, home directories, ...)
  – Useful to test nodes not yet in the grid, or to debug nodes with troubles
  – To be updated on every release
– Use and extend the LCG certification test suite
  – A powerful and flexible framework for site testing, maintained by G. Grodidier
  – We are extending it with additional modules to test our customizations
– Development of a generic grid middleware benchmark framework
  – Simulates the activity of one or more grid users
  – Measures the performance of user tools under stress conditions
  – Prototype implementation just started
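
A hypothetical sketch of the kind of low-level checks FRY runs on a grid element; the real FRY test list is not in the slides, so the daemon names, thresholds and paths here are illustrative:

    #!/bin/bash
    # Illustrative node self-test in the spirit of FRY.
    fail=0
    err() { echo "FAIL: $*"; fail=1; }

    # expected daemons (illustrative list for a generic node)
    for d in sshd ntpd; do
      pgrep -x "$d" >/dev/null || err "daemon $d not running"
    done

    # full disks: flag any filesystem above 90% usage
    full=$(df -P | awk 'NR>1 && $5+0 > 90 {print $6 "(" $5 ")"}')
    [ -n "$full" ] && err "nearly full filesystems: $full"

    # host certificate must stay valid for at least another 7 days
    openssl x509 -checkend $((7*24*3600)) -noout \
        -in /etc/grid-security/hostcert.pem \
        || err "host certificate expires within 7 days"

    # installed CRLs must not be past their nextUpdate time
    now=$(date +%s)
    for crl in /etc/grid-security/certificates/*.r0; do
      next=$(openssl crl -nextupdate -noout -in "$crl" | cut -d= -f2)
      [ "$(date -d "$next" +%s)" -lt "$now" ] && err "stale CRL: $crl"
    done

    # pool account home directories must exist (pattern is illustrative)
    for h in /home/dteam*; do
      [ -d "$h" ] || err "missing home directory: $h"
    done

    exit $fail

Returning a non-zero exit status makes the same script usable interactively for debugging a troubled node and non-interactively from a test harness.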

Open issues
– Computing resources, storage and services are published via a single site BDII (see the query sketch below)
  – The grid services information is tightly coupled to the CE status
  – Is a separate site BDII reliable enough to prevent problems with the CE from impacting SE access?
– Minimum resource centre size (CPU and storage)?
  – Overhead for COD and ROC-on-duty
  – General agreement or regional policy?
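
The coupling is visible in how the information system is queried: every service record at the site is served from the same LDAP endpoint, so if that endpoint runs on the CE, a CE outage hides the SE too. A sketch with an illustrative host and site name (port 2170 and the mds-vo-name base are the standard site BDII conventions):

    # list the SEs a site publishes through its (single) site BDII
    ldapsearch -x -H ldap://sitebdii.example.it:2170 \
        -b mds-vo-name=EXAMPLE-SITE,o=grid \
        '(objectClass=GlueSE)' GlueSEUniqueID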

Integration of ENEA-GRID multi-platform resources in EGEE
G. Bracco, P. D'Angelo, S. Migliori, A. Quintiliani, F. Simoni (ENEA INFO, Frascati, Roma, Italy), C. Sciò (Esse3Esse, Roma), A. Secco (Nice), M. Fontana, M. Leone (CRIAI, Portici)
– ENEA: Italian National Agency for New Technologies, Energy and Environment
– ENEA-GRID: the infrastructure providing a unified user environment and a homogeneous access method to ENEA computational resources distributed over 6 Italian sites, connected over a WAN
– ENEA-GRID computational resources: ~300 CPUs (IBM SP; SGI Altix & Onyx; 32/64-bit Linux clusters; Apple MacOSX cluster; Windows servers)
  – Most relevant resource: IBM SP4, 128 nodes, ~1 Tflops, located at the Frascati site
– ~600 registered users; installed storage ~15 TB

GRID functionalities (unique authentication, authorization, resource access and resource discovery) are provided using "mature", multi-platform components that constitute the ENEA-GRID middleware:
– Distributed file system: AFS/OpenAFS (native authentication & authorization)
– Resource manager: LSF Multicluster (job forwarding model)
– Unified user interface: Java & Citrix technologies (including web access)
[Diagram: ENEA-GRID architecture]
ENEA participation in EGEE/SA1
– The focus of ENEA's participation in other GRID projects is interoperability: sharing resources with other GRIDs must be compatible with the ENEA-GRID architecture -> gateway implementation. Moreover, gateways are a simple approach to multi-platform resources.
– The ENEA EGEE site [ENEA-INFO] is located in Frascati and consists of:
  – Standard EGEE Linux nodes: YAM/YUM/YAIM repository, LCG 2.6, SE, UI
  – EGEE/ENEA Linux cluster: CE egce.frascati.enea.it, 16 WNs
  – EGEE/ENEA gateway to AIX: Linux CE egceaix.frascati.enea.it

EGEE/ENEA Linux Cluster
The LCG 2.6 middleware has been patched for compatibility with the ENEA-GRID architecture, AFS and LSF:
– Pool account users are pre-defined as standard AFS users of ENEA-GRID, with AFS homes: e.g. for the dteam VO, $HOME=/afs/enea.it/grd_egee/user/dteam
– lcmaps has been modified to acquire an AFS token using gssklog
– The gssklogd daemon has been modified for compatibility with VO security
– The lsf job manager is used; JobManager.pm and lsf.pm have been modified for compatibility with the AFS file system and the ENEA LSF configuration
In this approach the CE and the WNs share the same file system via AFS. Jobs can be submitted with edg-job-submit using the INFN certification RB and setting the CE explicitly: egce.frascati.enea.it.
Implementing the gateway: one of the Linux WNs was modified and tested
1) The /etc/grid-security and /opt/ directories were completely removed
2) A new /opt/globus was created with 2 subdirectories: bin and etc
3) /opt/globus/etc/globus-user-env.sh is a link to the usual file in an AFS location
4) /opt/globus/bin/globus-url-copy & grid-proxy-info are links to wrappers in AFS
   e.g. globus-url-copy -> wrapper
   wrapper: lsrun -m egce.frascati.enea.it globus-url-copy $*

where lsrun is the LSF command for immediate execution on a remote host. As a result, all data transfers between the WN and the RB are effectively done by the CE, which shares $HOME with the WN via AFS (a sketch of such a wrapper follows below).
EGEE/ENEA gateway to AIX
– With the described approach, a similar /opt/globus has been created on the IBM SP hosts; the CE egceaix.frascati.enea.it differs from the Linux CE only in the name of the resource where LSF submissions are performed
– Successful tests have been performed with the same approach for submissions to SGI/IRIX 6.5, Apple/MacOSX and SGI/Altix IA64: hosts located at 3 different sites (Frascati, Trisaia in Southern Italy, Casaccia)
– Advantage of this approach: easy WN firewall configuration!
– Limitation: monitoring and accounting middleware is not yet available on the WNs
ENEA-INFO site status: site certification in progress
Conclusion: successful job submission and result retrieval to AIX, MacOSX, IRIX and Altix IA64 worker nodes has been demonstrated using a Linux gateway between EGEE LCG 2.6 and ENEA-GRID.
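
The one-line wrapper above generalises naturally: if each tool under /opt/globus/bin is a symlink to the same script, the script can dispatch on its own name. A short sketch under that assumption (the slides show the one-line form per tool; the symlink-per-tool dispatch is illustrative):

    #!/bin/bash
    # AFS-side wrapper: runs the real Linux binary on the CE via LSF's lsrun.
    # This works because CE and WN share $HOME over AFS, so file arguments
    # and the user's proxy are visible on both sides without explicit copying.
    CE_HOST=egce.frascati.enea.it   # Linux CE holding the real middleware
    TOOL=$(basename "$0")           # globus-url-copy, grid-proxy-info, ...
    exec lsrun -m "$CE_HOST" "$TOOL" "$@"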