ADC Requirements and Recommendations for Sites

ADC Requirements and Recommendations for Sites
Bob Ball and Wei Yang, March 2017
OSG-AHM San Diego, March 2017

Some Goals for ATLAS Sites
- Unify and simplify treatment of all ATLAS sites
- Complete the transition to HTCondor-CE
- Eliminate legacy protocols
  - BDII
  - lcg-utils
- Automate APF usage, thereby simplifying manpower needs
- Migrate USERDISK to SCRATCHDISK
- Prepare for SL7
- Other interesting trends

ATLAS Site Simplification
- The goal is to unify and simplify treatment of all ATLAS sites
- Name “Resources” in a more common and somewhat standardized way
  - E.g., Panda queue naming
- All sites should create AGIS entries in the same way and with the same conventions
- Complex, multi-location sites (e.g., NET2, SWT2) may require close coordination with Ale DiG
- The transition to HTCondor-CE has been pushed for over a year now. Can everyone just please do this? Without further delay?

Eliminate the BDII
- This primarily affects SAM tests
- Implement an ETF default in AGIS and automatically set it to the queue with pq_is_default=1 & pq_capability=score (sites can change this manually if we don’t pick the right queue from the start)
- Add an etf_default flag to ce_resources in the VO feed (to propagate the information, so we can configure ETF); see https://twiki.cern.ch/twiki/bin/view/EGEE/vofeeddoc for details
- As agreed in the last IS TF, add nonprod=true/false flags to services in the VO feed, marking non-production services. I would take is_monitored=0 and status != production for this as a start. ETF will still monitor all services anyway; it’s just that SAM3 won’t consider them for reports (corner cases can be followed up during validation), and we can drop the existing code we have for this.
- Site admins should check their AGIS queues for these settings and adjust as appropriate
- Site validation cannot begin until the first 2 points are completed
  - Historically, validation has then taken 6-12 weeks to complete
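The selection rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names pq_is_default, pq_capability, is_monitored and status come from the slide, but the dictionary layout, the "name" key and both function names are hypothetical and are not the AGIS or VO feed schema. The nonprod rule reads the slide's "is_monitored=0 and status != production" literally as a logical AND.

```python
from typing import Dict, List, Optional


def pick_etf_default(queues: List[Dict]) -> Optional[str]:
    """Return the name of the first queue with pq_is_default=1 and
    pq_capability=score, or None if no queue qualifies."""
    for queue in queues:
        if queue.get("pq_is_default") == 1 and queue.get("pq_capability") == "score":
            return queue["name"]  # "name" is a hypothetical key
    return None


def is_nonprod(service: Dict) -> bool:
    """Starting rule from the slide, read as a logical AND:
    is_monitored=0 and status != production."""
    return service.get("is_monitored") == 0 and service.get("status") != "production"


# Hypothetical queue records, for illustration only.
example_queues = [
    {"name": "SITE_MCORE", "pq_is_default": 0, "pq_capability": "mcore"},
    {"name": "SITE_SCORE", "pq_is_default": 1, "pq_capability": "score"},
]
print(pick_etf_default(example_queues))
print(is_nonprod({"is_monitored": 0, "status": "test"}))
```

A site admin could run the same kind of check against their own queue settings to see which queue the automatic choice would land on before deciding whether to override it manually.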

Eliminate the BDII: Status update, March 6, 2017
- We implemented a flag in AGIS (etf_default), and a corresponding change has been made to the SAM ETF probe
- However, the change is applied only in pre-production
- Validation of the new system is currently under way. It may take an additional two or more weeks to make sure everything works.
- See https://its.cern.ch/jira/browse/ADCINFR-33

Dropping of lcg-utils
- OSG no longer provides lcg-utils
  - Still available from EPEL
- New mover controls do not support them
- Of course, lsm could still use the lcg-utils suite, but…
- The ADC considers this a closed issue

APF and queues
- The new, automated APF will pull information from AGIS queues
  - No more use of manual setups at BNL (for US)
- OSG 3.3.21 was recently released with many enhancements for this
  - [Resource Entry CHANGEME] section in 30-gip.ini
  - osg-configure pulls info directly from this section as input to AGIS via OSG GOCDB
- See the Google doc for examples of how to configure this at your site: https://docs.google.com/document/d/1D-Z3_FTKfPVKZDe-WRsbHcc62pcj_ZPqrbVz6ULhW_Q/edit
- All site admins should follow up on this ASAP
- Interesting note: the time granularity for the new AGIS/switcher2 mechanism to check downtimes in GOCDB is about 20-30 minutes
  - Be wary of when you change the duration of an outage
  - Work is ongoing on the correct mapping of GOCDB downtime names to AGIS downtime names
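As a quick sanity check before osg-configure publishes anything, a site could confirm that the placeholder section in 30-gip.ini has been renamed. The sketch below uses only Python's standard configparser. The sample text, the "queue" key and the function names are hypothetical and do not reproduce the real list of options; the Google doc above remains the reference for what each section should contain.

```python
import configparser
from typing import Dict, List

PREFIX = "resource entry"  # section prefix, matched case-insensitively


def resource_entries(ini_text: str) -> Dict[str, Dict[str, str]]:
    """Map each 'Resource Entry <label>' section to its options."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    entries = {}
    for section in parser.sections():
        if section.lower().startswith(PREFIX):
            label = section[len(PREFIX):].strip()
            entries[label] = dict(parser[section])
    return entries


def unconfigured(entries: Dict[str, Dict[str, str]]) -> List[str]:
    """Return labels still carrying the CHANGEME placeholder."""
    return [label for label in entries if label == "CHANGEME"]


# Hypothetical file content; the 'queue' key is illustrative only.
SAMPLE = """
[Resource Entry CHANGEME]
queue = placeholder

[Resource Entry EXAMPLE_CE]
queue = placeholder
"""

print(unconfigured(resource_entries(SAMPLE)))
```

In practice the text would be read from the site's own 30-gip.ini rather than from an inline string.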

Migration of USERDISK to SCRATCHDISK
- Armen will coordinate with sites
- During the switchover, following an initial boost, space token sizes will gradually swap between the two
- The change is likely to be a simple reorder of the “Associated DDM Storages” on the queues
- https://its.cern.ch/jira/browse/ADCINFR-38

Preparation for SL7
- There is no rush on this, but I don’t believe there are any reasons left to hang back
- Singularity will allow SL6 images to run natively inside an SL7 WN
- CentOS7 WN readiness: https://its.cern.ch/jira/browse/ADCINFR-11
- https://twiki.cern.ch/twiki/bin/view/AtlasComputing/CentOS7Readiness

Other new trends from ADC
Things worth paying attention to:
- Container technology
  - singularity-2.2.1-1.osgup.el6.x86_64 is available from the osg-forthcoming repo
- IPv6
- Object stores such as Ceph
  - Efforts underway at multiple sites
- Globus Online for data transfer
- Making SRM optional
- Store caching
- AFS free (CERN, not ADC)

Some Interesting Links
- A list of registered ADC Infrastructure issues is here: https://its.cern.ch/jira/projects/ADCINFR/issues/ADCINFR-28?filter=allopenissues
- ADC Technical Coordination Board meetings: every Monday, 4pm CERN time
  - Vidyo room: ADC_Technical_Coordination_Board
- ADC Weekly meeting: every Tuesday, 3:40pm CERN time
  - Vidyo room: ADC_Weekly