GDB - February 2014 Summary (Jeremy's notes)
Agenda: http://indico.cern.ch/event/272618/
Introduction (M Jouvin)
- Please check the 2014 meeting dates. March 12th GDB at CNAF, Bologna (please register). WLCG workshop in Barcelona, 1st/2nd week of July, possibly 8th-9th July.
- GDB actions: https://twiki.cern.ch/twiki/bin/view/LCG/GDBActionInProgress
- Future (pre-)GDB topics welcome. Upcoming: pay-per-usage as part of the funding model. Introducing a pay-per-usage scheme would give the funding agencies the information to measure the level of usage of a service and whether it justifies their investment; if the model also gives some financial control to users, they will favour the services offering better value propositions.
- Site Nagios testing - any feedback?
- OSG Federation workshop: https://indico.fnal.gov/conferenceDisplay.py?confId=7207
- HEPiX 19th-23rd May, Annecy. EGI CF 19th-23rd May, Helsinki.
HEP SW Collaboration (I Bird)
- Software performance is now a limiting factor. CPU technology trends give more transistors, but they are not easy to exploit: most software is designed for sequential processing and migrating to multi-threading is not easy. Targets are Geant and ROOT.
- The Concurrency Forum was established two years ago. Moving towards an open scientific software initiative: components such as Geant and ROOT should be part of a modular infrastructure.
- HEP Software Collaboration goal: build and maintain libraries; establish a formal collaboration to develop open scientific software packages guaranteed to work together (including frameworks to assemble applications).
- Workshop 3rd-4th April 2014.
IPv6 Update (D Kelsey)
- WG meeting 23/24 January 2014 (included CERN cloud and OpenStack); progress in various areas.
- CERN campus-wide deployment in March (some DHCPv6 issues): http://ipv6.web.cern.ch/
- perfSONAR very useful; works at IC. Run dual stack?
- IPv6 file transfer testbed has decayed a bit. ATLAS testing (Alastair): AGIS, simple tests then HammerCloud. Squid 2.8 not IPv6 compatible. Plan to get the mesh working again; site deployments; move to use SRM/FTS; define use cases.
- A barrier to moving for some sites: availability may be affected by going to dual stack etc.
- Software survey shows 15/66 'services' known to be fully compliant. Pre-release of dCache 2.8.0 has IPv6 fixes.
- Want to survey sites: when will they run out of IPv4, and when will they be capable of IPv6? Pre-GDB meeting in June.
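As an illustration (not from the slides), a minimal dual-stack reachability check of the kind a site might run while testing IPv6; the host and port below are placeholders.

```python
#!/usr/bin/env python
# Minimal dual-stack (IPv4 + IPv6) reachability check; illustrative only.
# HOST/PORT are placeholders - substitute a real dual-stack service endpoint.
import socket

HOST, PORT = "fts3.example.org", 8446  # hypothetical endpoint

def try_connect(family, label):
    try:
        addr = socket.getaddrinfo(HOST, PORT, family, socket.SOCK_STREAM)[0][4]
    except socket.gaierror as exc:
        print("%s: no address for %s (%s)" % (label, HOST, exc))
        return
    s = socket.socket(family, socket.SOCK_STREAM)
    s.settimeout(5)
    try:
        s.connect(addr)
        print("%s: connected to %s" % (label, addr[0]))
    except socket.error as exc:
        print("%s: connect failed (%s)" % (label, exc))
    finally:
        s.close()

try_connect(socket.AF_INET, "IPv4")
try_connect(socket.AF_INET6, "IPv6")
```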
Future of SLC (J Polok)
- The CentOS team is joining Red Hat (the open source and standards team, not RHEL). The CentOS Linux platform is not changing.
- Impact for SL5/6: source packages may have to be generated from git repositories; no other changes, releases stay as now.
- SL(C)7 options being discussed: rebuild from source as for 5 and 6, OR create a Scientific CentOS variant, OR adopt CentOS core. Approaches: 1. keep the current process - build from source with the existing tool chain; 2. create a SIG for our variant; 3. SL becomes an add-on repository to CentOS core.
- CentOS 7 beta in preparation; RHEL7 production due in the summer. Source RPMs are not guaranteed after the summer, so the risks for 5 and 6 need to be covered.
Ops coordination report (S Campana)
- Input based on the pre-GDB Ops Coordination meeting.
- gLexec: the CMS SAM test is not yet critical; 20 sites have still not deployed.
- perfSONAR: it is a service; sites without it, or running an old release, will feature in report(s) to the MB.
- Tracking tools evolution: Savannah to JIRA; JIRA is still lacking some GGUS functionality.
- SHA-2 migration: progress with VOMS-Admin but a manual process is needed; new host certificates soon.
- Machine/job features: prototype ready; options for clouds being looked at (illustrative sketch after this list).
- Middleware readiness: the model will rely on experiments and frameworks, plus sites deploying test instances, plus monitoring. The MB will discuss a process for 'rewarding' site participation.
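A hedged illustration of the machine/job features idea (not from the slides): assuming the prototype exposes per-key files under directories pointed to by $MACHINEFEATURES and $JOBFEATURES, a payload could read them as below; the key names used are examples only.

```python
#!/usr/bin/env python
# Illustrative reader for machine/job features; the directory layout and key
# names (e.g. "hs06", "wall_limit_secs") are assumptions for this sketch.
import os

def read_feature(env_var, key):
    base = os.environ.get(env_var)
    if not base:
        return None
    try:
        with open(os.path.join(base, key)) as f:
            return f.read().strip()
    except IOError:
        return None

print("HS06 of the machine:", read_feature("MACHINEFEATURES", "hs06"))
print("Wall-clock limit (s):", read_feature("JOBFEATURES", "wall_limit_secs"))
```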
Ops Coordination - continued
- Baseline enforcement: looking at options to monitor, and then automate, campaigns.
- WMS decommissioning: shared/CMS instances end in April; SAM will use the WMS until June.
- Multi-core deployment: ATLAS and CMS have different usage patterns; prototypes being tried. Torque/Maui is a concern.
- FTS3 deployment: FTS3 works well; few instances needed - 3 or 4 for resilience.
- Experiment computing commissioning: experiment plans for 2014 were discussed; conclusion is that no common commissioning exercise is needed.
- Conclusion: some deployment areas are being escalated.
High memory jobs (J Templon)
- NIKHEF observations - which high-memory problem? Virtual memory usage in GB, with pvmem set to 4096 MB. User jobs and some production jobs show high usage, and these jobs do not 'ask' for the memory. High memory usage is linked with multi-core.
- pvmem places a ulimit on the process, which lets the job handle an out-of-memory condition (an allocation failure) rather than being killed; a toy illustration follows this list.
- There are different ways to ask for more memory in a job, and few of them work, so inconsistencies arise. The situation is being reviewed.
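A toy illustration (not from the slides) of the pvmem/ulimit point: with a per-process virtual memory limit in place, an allocation beyond the limit fails inside the job, which can then handle the failure, instead of the whole job being killed by the batch system.

```python
#!/usr/bin/env python
# Emulate a pvmem-style limit with a ulimit on virtual memory (RLIMIT_AS);
# the 512 MB value is arbitrary and chosen only for demonstration.
import resource

LIMIT = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

try:
    data = bytearray(1024 * 1024 * 1024)  # attempt to allocate 1 GB
except MemoryError:
    # The allocation fails under the limit; the job can clean up and exit
    # gracefully instead of receiving a kill signal from the batch system.
    print("Allocation refused under the vmem limit; handling gracefully")
```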
SAM test scheduling (L Magnoni)
- SAM is a framework to schedule checks (Nagios) via dedicated plug-ins (probes = scripts/executables); a generic probe sketch follows this list.
- Categories: public grid/cloud services (custom probes); job submission (via WMS); worker nodes (via job payloads). Job submission will be extended to include direct CREAM and Condor-G.
- Remote testing assumes deterministic execution. There are granularity issues (CE vs site) and the site and experiment views do not always agree. Tests can run with different credentials. Jobs can time out when a VO is out of its share. Site availability is determined by the experiments' critical profiles. Most timeouts looked to be on the WMS side!
- New Condor-G and CREAM probes for job submission are coming. Aim to provide a web UI/API for users. Looking at options to replace Nagios as the scheduler.
- Test submission via other frameworks (e.g. HammerCloud) is being investigated: ATLAS want a hybrid approach; CMS do not support the framework approach.
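A generic sketch (not a real SAM probe) of the Nagios plug-in convention such probes follow: a standalone executable that prints one status line and uses exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN. The endpoint is a placeholder.

```python
#!/usr/bin/env python
# Generic Nagios-style probe sketch; HOST/PORT are placeholders.
import socket
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
HOST, PORT = "cream-ce.example.org", 8443  # hypothetical CE endpoint

def main():
    try:
        socket.create_connection((HOST, PORT), timeout=10).close()
    except socket.timeout:
        print("CE WARNING - connection to %s:%d timed out" % (HOST, PORT))
        return WARNING
    except socket.error as exc:
        print("CE CRITICAL - cannot reach %s:%d (%s)" % (HOST, PORT, exc))
        return CRITICAL
    print("CE OK - %s:%d is reachable" % (HOST, PORT))
    return OK

if __name__ == "__main__":
    sys.exit(main())
```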
New transfer dashboard (A Beche)
- Reviewed the history of data transfer monitoring: separate web API/UI for FTS, FAX and AAA; ALICE and EOS have been added; plan to federate them.
- Data is split into schemas: FTS, XRootD and a highly optimised one. Data retention policies differ for raw data and statistics.
- The dashboard now aggregates over the APIs. A map view is planned.
WLCG monitoring coordination (P Saiz)
- Consolidation group: reduce complexity; modular design; simplify operations and support; common development and core. More site input is needed.
- Timeline: starting to deploy. Survey and tasks; tasks tracked in JIRA: https://its.cern.ch/jira/browse/WLCGMON
- Task areas: 1. application support (for jobs, transfers, infrastructure, ...); 2. running the services (moving to AI, Koji, SL6, Puppet, ...); 3. merging applications (SSB+SAM, SSB+REBUS, HC+Nagios, ...) - the idea is to reduce their number to make maintenance easier (there are many infrastructure monitoring tools; the schema copes with several use cases: http://wlcg-sam-atlas.cern.ch/); 4. technology evaluation.
- Nagios plug-in for sites developed by PIC. SAM/SUM -> SAM3 (for SUM background see https://indico.cern.ch/event/285330/contribution/3/material/slides/1.pdf).
- Next steps: https://its.cern.ch/jira/browse/WLCGMON
Data Preservation Update (J Shiers)
- Things are going well. Workshop held; archive growth is increasing. The annual cost of WLCG is 100M euro.
- Need 4 staff: documentation; standards; ... DPHEP portal core; digital library; sustainable software + virtualisation technology + validation frameworks; sustainable funding; open data.
LHCOPN/LHCONE evolution workshop (E Martelli)
- Networking is stable and is key; growth through technology evolution is OK. New sites are in areas where the network is under-developed.
- ATLAS: expect bursty traffic; US sites -> 40/100. CMS: the mesh will increase traffic. LHCb: no specific concerns.
- More bandwidth is needed at Tier-2s. Connectivity to Asia needs to improve - capacity and RTTs.
- Demands for better network monitoring and LHCONE operations. Point-to-point links on demand: over-provisioning vs complexity (L3VPN).
perfSONAR (S McKee)
- Sites to use the 'mesh' configuration; metrics will adjust over time.
- 85% of sites with perfSONAR have issues to resolve (firewalls, versions, ...).
- Likely to go with MaDDash (Monitoring and Debugging Dashboard); checking of primitive services with OMD (Open Monitoring Distribution).
- For the test instance: WLC*** WLC*** - context between all sites.
- The 3.3.4 release will mean only one machine is needed.
- Alerting: high priority but complicated.