ALICE WLCG operations report Maarten Litmaath CERN IT-SDC ALICE T1-T2 Workshop Torino Feb 23, 2015 v1.2.

AliEn vs. OpenSSL (1)
- AliEn needs to move to a newer OpenSSL version to close a potential vulnerability on the VOBOX
- Despite a big effort, recent OpenSSL builds for AliEn could not be made to work with the Globus legacy proxies in use today
  - But RFC proxies work fine
- WLCG wants to move to RFC proxies anyway:
  - they are standard
  - they are better supported
- Ergo: now is the time to do that!

AliEn vs. OpenSSL (2)
- Legacy proxy:
    $ grid-proxy-info
    subject  : [...]/CN=proxy/CN=proxy/CN=proxy/CN=proxy
    issuer   : [...]/CN=proxy/CN=proxy/CN=proxy
    identity : [...]
    type     : full legacy globus proxy
    strength : 1024 bits
    path     : /var/lib/vobox/alice/proxy_repository/[...]
    timeleft : 47:47:34  (2.0 days)
- RFC proxy:
    $ grid-proxy-info
    subject  : [...]/CN= /CN= /CN= /CN=
    issuer   : [...]/CN= /CN= /CN=
    identity : [...]
    type     : RFC 3820 compliant impersonation proxy
    strength : 1024 bits
    path     : /var/lib/vobox/alice/proxy_repository/[...]
    timeleft : 47:44:50  (2.0 days)
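
The two outputs above differ mainly in the `type` line. An admin who wants to confirm which kind of proxy a VOBOX currently holds could parse that output. The following is a minimal Python sketch, assuming the `key : value` layout shown on this slide (other grid-proxy-info versions may format it differently); the function names are hypothetical.

```python
import re

def parse_proxy_info(text: str) -> dict:
    """Turn 'key : value' lines of grid-proxy-info output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon, such as the '$ grid-proxy-info' prompt.
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info

def is_rfc_proxy(info: dict) -> bool:
    """RFC 3820 proxies report 'RFC 3820 compliant ...' in the type field."""
    return info.get("type", "").startswith("RFC 3820")

def timeleft_seconds(info: dict) -> int:
    """Convert the leading HH:MM:SS of the timeleft field into seconds."""
    m = re.match(r"(\d+):(\d{2}):(\d{2})", info.get("timeleft", ""))
    if m is None:
        raise ValueError("no HH:MM:SS timeleft found")
    hours, minutes, seconds = map(int, m.groups())
    return hours * 3600 + minutes * 60 + seconds
```

For the RFC example above, `is_rfc_proxy` would be true and `timeleft_seconds` would give 171890 seconds (47:44:50).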

AliEn vs. OpenSSL (3)
- RFC proxies are not supported by the old gLite VOBOX (still used at 3 small sites)
  - They do work with the current AliEn “v ”
- Therefore the plan is as follows:
  1. Switch each WLCG VOBOX to an RFC proxy
     - Everything keeps working as before
  2. Make the new AliEn v the default
     - The gLite VOBOXes keep using the old version, but should be upgraded ASAP

AliEn vs. OpenSSL (4)
- RFC proxies are not the default yet
  - That may change later this year
- To switch a WLCG VOBOX to an RFC proxy, one just needs to upload a new “myproxy” of that type:
    export GT_PROXY_MODE=rfc
    myproxy-init -d -n -t 48 -c 3000
  - Please let us know when you do that!
- The proxy renewal daemon will discover and handle the change automatically
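
The two commands above can also be wrapped in a script, e.g. when several VOBOXes need switching. The sketch below is a minimal Python illustration, assuming `myproxy-init` is installed and the MyProxy server is already configured through the `MYPROXY_SERVER` environment variable (the command on the slide has no `-s` option); the function names are hypothetical. In myproxy-init, `-d` uses the certificate subject as the MyProxy user name, `-n` stores the credential without a retrieval pass phrase, `-t 48` sets the lifetime in hours of each proxy later retrieved from the server (consistent with the 2.0 days of timeleft in the earlier example), and `-c 3000` sets the lifetime of the credential stored on the server, i.e. 125 days.

```python
import os
import subprocess

PROXY_HOURS = 48    # -t: lifetime of each proxy retrieved from the MyProxy server
CRED_HOURS = 3000   # -c: lifetime of the credential stored on the server

def rfc_myproxy_command(proxy_hours: int = PROXY_HOURS, cred_hours: int = CRED_HOURS):
    """Return the argv and environment for uploading an RFC-type credential."""
    argv = ["myproxy-init", "-d", "-n",
            "-t", str(proxy_hours), "-c", str(cred_hours)]
    # As on the slide, GT_PROXY_MODE=rfc selects an RFC 3820 proxy.
    env = dict(os.environ, GT_PROXY_MODE="rfc")
    return argv, env

def upload_rfc_myproxy() -> None:
    """Run myproxy-init; it prompts for the grid certificate pass phrase."""
    argv, env = rfc_myproxy_command()
    subprocess.run(argv, env=env, check=True)

def cred_days(cred_hours: int = CRED_HOURS) -> float:
    """Lifetime of the stored credential in days."""
    return cred_hours / 24
```

Building the command separately from running it keeps the script testable on machines without the MyProxy client.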

AliEn vs. OpenSSL (5)
- The WLCG VOBOXes at CERN already run the new AliEn with RFC proxies
  - More in Miguel’s talk

WLCG VOBOX firewall configuration
- Please open the firewall only as needed:
  - 8084/TCP from CERN and the site WNs: ClusterMonitor
  - 1093/TCP from the world: MonALISA FTD server
  - 8884/UDP from the site WNs and the site SE nodes: monitoring info
  - 9930/UDP from the site SE nodes: Xrootd metrics
  - ICMP incoming and outgoing: network topology for file placement and access
- Please allow inbound connectivity to port 1975 (gsissh) from the CERN networks:
  - IPv4: /16, /16, /15
  - IPv6: 2001:1458::/32, FD01:1458::/32
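
For a site managing its VOBOX firewall with iptables, the list above might translate into rules like the ones generated below. This is a minimal Python sketch, not an official recipe: the site subnets are hypothetical placeholders, the CERN IPv4 prefixes are left out because only their lengths (/16, /16, /15) are given, and it assumes a default-deny INPUT policy to which these ACCEPT rules are appended. It only builds the commands; applying them is left to the admin.

```python
import ipaddress

# Hypothetical site networks: replace with the real ones.
SITE_WN_NETS = ["10.1.0.0/16"]   # worker nodes (hypothetical)
SITE_SE_NETS = ["10.2.0.0/24"]   # SE nodes (hypothetical)
# CERN networks from the slide; the IPv4 prefixes (/16, /16, /15) must be added by hand.
CERN_NETS = ["2001:1458::/32", "FD01:1458::/32"]
WORLD = ["0.0.0.0/0", "::/0"]

# (protocol, port, allowed sources, purpose), following the list above
RULES = [
    ("tcp", 8084, CERN_NETS + SITE_WN_NETS, "ClusterMonitor"),
    ("tcp", 1093, WORLD, "MonALISA FTD server"),
    ("udp", 8884, SITE_WN_NETS + SITE_SE_NETS, "monitoring info"),
    ("udp", 9930, SITE_SE_NETS, "Xrootd metrics"),
    ("tcp", 1975, CERN_NETS, "gsissh"),
]

def firewall_commands(rules):
    """Translate rules into iptables/ip6tables ACCEPT commands (not executed)."""
    cmds = []
    for proto, port, sources, purpose in rules:
        cmds.append(f"# {port}/{proto.upper()}: {purpose}")
        for src in sources:
            net = ipaddress.ip_network(src)  # raises ValueError on a malformed prefix
            tool = "ip6tables" if net.version == 6 else "iptables"
            cmds.append(f"{tool} -A INPUT -p {proto} -s {net} --dport {port} -j ACCEPT")
    # ICMP in both directions, for IPv4 and IPv6
    cmds.append("# ICMP incoming and outgoing: network topology")
    for tool, proto in (("iptables", "icmp"), ("ip6tables", "ipv6-icmp")):
        for chain in ("INPUT", "OUTPUT"):
            cmds.append(f"{tool} -A {chain} -p {proto} -j ACCEPT")
    return cmds

if __name__ == "__main__":
    print("\n".join(firewall_commands(RULES)))
```

Passing every prefix through `ipaddress.ip_network` catches typos and picks the right tool per address family; note that it prints IPv6 prefixes in lowercase.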

WLCG VOBOX VOMS configuration
- Reminder: the old VOMS servers should no longer be configured anywhere
- Please check /etc/edg-mkgridmap.conf:
  - lcg-voms2.cern.ch and voms2.cern.ch should both be present
  - voms.cern.ch still works today, but will soon be decommissioned
- Please check /etc/vomses:
  - Remove *-voms.cern.ch
- Documentation
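
The two checks above can be scripted. The sketch below is a minimal Python illustration, assuming the usual file formats: edg-mkgridmap.conf `group` lines carrying a `vomss://host:port/...` URI, and vomses lines of five quoted fields with the server host in second place. /etc/vomses may also be a directory of such files, which the sketch does not handle. The function names are hypothetical.

```python
import fnmatch
import shlex
from urllib.parse import urlparse

REQUIRED_MKGRIDMAP_HOSTS = {"lcg-voms2.cern.ch", "voms2.cern.ch"}
OLD_HOSTS = ("voms.cern.ch", "*-voms.cern.ch")  # old servers to phase out

def is_old(host: str) -> bool:
    return any(fnmatch.fnmatch(host, pattern) for pattern in OLD_HOSTS)

def mkgridmap_hosts(text: str) -> set:
    """Hosts of the vomss:// URIs on 'group' lines (format assumed)."""
    hosts = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "group":
            host = urlparse(fields[1]).hostname
            if host:
                hosts.add(host)
    return hosts

def vomses_hosts(text: str) -> set:
    """Server host (second quoted field) of each vomses line."""
    hosts = set()
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)
        if len(fields) >= 2:
            hosts.add(fields[1])
    return hosts

def check(mkgridmap_text: str, vomses_text: str) -> list:
    """Return a list of human-readable problems; empty means all OK."""
    problems = []
    mk = mkgridmap_hosts(mkgridmap_text)
    for host in sorted(REQUIRED_MKGRIDMAP_HOSTS - mk):
        problems.append(f"edg-mkgridmap.conf: {host} missing")
    for host in sorted(h for h in mk if is_old(h)):
        problems.append(f"edg-mkgridmap.conf: old server {host} still configured")
    for host in sorted(h for h in vomses_hosts(vomses_text) if is_old(h)):
        problems.append(f"vomses: remove {host}")
    return problems
```

The glob `*-voms.cern.ch` matches lcg-voms.cern.ch but not the new lcg-voms2.cern.ch, so the new servers are never flagged.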

SAM-3 Availability/Reliability computation (1)
- The SAM-Nagios machinery only tests the CEs
  - Mostly CREAM, a few ARC
- MonALISA forwards selected metrics to SAM
  - VOBOX and SE tests
- We now can and should include them in a new formula to determine whether a site looks available/reliable for use by ALICE
  - In particular, this will allow sites without a CE to appear (again) in the WLCG A/R reports
    - Notably NDGF and OSG

SAM-3 Availability/Reliability computation (2)
- Considerations:
  - Sites using a CE should have at least one working.
  - Sites not using a CE take full responsibility for their VOBOX: it has to look OK.
    - In particular, the AliEn proxy must be valid.
  - Sites with an SE should at least have it working for reading files.
    - The write test is allowed to fail if the SE is (almost) full; since that is not easy to determine reliably, a write failure only gives a warning.
  - Sites without an SE should set one up!
    - Otherwise they can only be used efficiently for some workflows.

SAM-3 Availability/Reliability computation (3)
- Proposal for a new A/R formula as of March:
    Computing = (any CE) || (!CE && VOBOX)
    Storage   = all SE
    Value     = Computing && Storage
- Meaning:
  1. If any CE is working: Computing OK
  2. If no CE is used and the VOBOX is working: Computing OK
  3. If all SEs at the site are working: Storage OK
     - A T1 has multiple (logical) SEs
  4. If the site has no SE: Storage OK (!)
     - For now…
- Discussion…
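
The proposed formula maps directly onto Python's `any` and `all`, whose behaviour on empty lists matches rules 2 and 4: `any([])` is false, so a site without a CE falls through to the VOBOX test, and `all([])` is true, so a site without an SE gets Storage OK. Below is a minimal sketch that evaluates one snapshot of boolean test results and also applies the earlier consideration that only the SE read test counts. How SAM-3 aggregates such results over time is not modelled, and the function names are hypothetical.

```python
from typing import Sequence

def se_ok(read_ok: bool, write_ok: bool) -> bool:
    """An SE counts as working if reading works; a failed write test is only a warning."""
    return read_ok

def site_value(ce_ok: Sequence[bool], vobox_ok: bool, se_status: Sequence[bool]) -> bool:
    """Value = Computing && Storage, with one boolean per CE and per SE."""
    # Computing = (any CE) || (!CE && VOBOX)
    computing = any(ce_ok) or (len(ce_ok) == 0 and vobox_ok)
    # Storage = all SE; all([]) is True, so a site without an SE passes (for now)
    storage = all(se_status)
    return computing and storage

# Hypothetical T1 with one working CE and two logical SEs, one failing to write.
t1 = site_value([True], vobox_ok=True,
                se_status=[se_ok(True, True), se_ok(True, False)])
```

Representing "no CE" as an empty list, rather than as a single failed CE, keeps the two cases apart: a site whose CEs all fail is not rescued by a working VOBOX.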

SAM-3 Availability/Reliability computation (4)
- NDGF is a special case:
  - The computing is spread over 5 sites
  - There is a shared SE
  - One site has its own SE
  - SAM-3 is flexible by design, so we think we can implement a reasonable A/R determination
- ARC CE tests:
  - Currently via a WMS at RAL
  - A direct submission probe is expected soon
- SAM-3 overview: see Pablo’s talk

Sites mostly OK, thanks! However…
- VOBOX issues:
  - CE not ready for jobs, wrong proxy being used, MyProxy credential running out, …
  - Admins, please check the site issues page
  - Subscribe to the relevant notifications
- Files unavailable due to SE problems
  - See above
- Absence of a “system” library on the WNs
  - The HEP_OSlibs rpm was created to avoid that