Frontier Status Alessandro De Salvo on behalf of the Frontier group

Presentation transcript:

Frontier Status
Alessandro De Salvo on behalf of the Frontier group
25-6-2019

Sites’ news: CERN
- Investigating the possibility of upgrading to CentOS
  - Migrate keeping the same IPs, if possible, or change the IPs
  - Both options are viable, but keeping the old IPs would be desirable
- Smooth operation, no relevant problems observed since March

Sites’ news: Lyon (Emmanouil Vamvakopoulos)
- Stable infrastructure
  - 4 Frontier launchpads
    - 3 x VM: 4 vCPU, 8 GB RAM, 200 GB on CC-openstack
    - 1 physical machine (DELL R430): 2 x E5-2623 v3 @ 3.00GHz, 32 GB RAM, 2 x 1 TB disk
  - Frontier squid version 4.4-1.1
  - Frontier tomcat version 7.0.90_3.40-1
  - All monitoring tools installed and configured (SNMP/MRTG, awstats, max-threads, filebeat to send logs to ElasticSearch)
- Hardware upgrade of the Oracle infrastructure
  - The 4 dedicated machines shared between DBATL (ATRL replica) and DBAMI (AMI) were replaced in Sep 2015 (DELL 630 servers) with a 7-year warranty
  - The SAN storage back-end and the Fabric switches were replaced in 2014 with a 5-year warranty (Hitachi, HS130)
  - Plan to replace the SAN storage back-end by next year
- Recent problems with the DB infrastructure
  - Some isolated issues with storage back-end I/O saturation in Feb 2019
  - The back-end SAN storage is shared among many DBs/groups; another Oracle DB (on different nodes) triggered an I/O issue
  - Tuning performed by the DBMaster (on the other VO's DB, which triggered the issue)

Sites’ news: RAL (Tim Adye)
- Completed migration of the RAL Frontier service from the old Hyper-V to the new VMWare hypervisors [1 Apr 2019]
  - 3 new servers on the LHC-OPN network, allowing logstash monitoring through the CERN firewall
  - All monitoring now working: MRTG, logstash (Kibana), awstats, maxthreads
- Added IPv6 support [4 June 2019]
- All running smoothly
  - One incident [2 June 2019], when the service was degraded for 14 hours because the disk was full on 2 of 3 servers. Fixed by enabling rotation for one logfile.
- RAL Frontier service supported until the end of 2019. As agreed with ADC, it will be decommissioned in January as Oracle is phased out at RAL.

ES/Kibana Monitoring
- Smooth operation
- New filters, enabling the parsing of the SQL queries
  - The filters are fast enough to guarantee no delay in operations
  - Very useful for debugging queries at all levels, including the analytics
- Data loss incident in Chicago temporarily affecting the ES/Kibana monitoring operations
  - Solved thanks to the prompt restore of the data
  - Data from May are not available; it is not clear why they were not recorded and this is still under investigation, but it is not a critical problem
- Some progress on the CMS dashboards and data collection

Backup Proxies and Failover monitor
- Backup proxies
  - In production for a few months
  - No problems observed, and they were very useful to track down misconfigured sites or general problems
  - AGIS cleanup completed, removing the off-site squids from the AGIS site configs
  - Plans to inhibit direct connections to the launchpads starting in July
    - Need to coordinate with ADC in order to test this
- Failover monitor
  - Protection against corrupted awstats records added
    - Sometimes many "l" characters are added to the beginning of the hostname
    - This increases the number of hostnames, resulting in a huge, hard-to-read table and a long script processing time
    - The problem is hard to debug, so it was decided to just put a protection into the monitoring
    - If there are more than 2 "l" characters in the hostname, they are counted and removed from the beginning of the hostname
  - Moving unidentified cern.ch hostnames from the Unknown site to the CERN-PROD site
    - The IP for hostnames of (presumably) VMs which end with cern.ch often cannot be identified
    - Now, every hostname in the Unknown site which ends with cern.ch is moved to the CERN-PROD site
  - IP ranges are now used to discern sites which share a geoip location
    - IP ranges are defined in the GOCDB in the site definition (or OIM equivalent)
    - IP ranges are rarely used; when one is defined, sites most often set it to 0.0.0.0/0 (which is ignored by the script, since all IPs would seem to belong to that range)
    - This makes a difference for very few sites right now
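The three failover-monitor rules on this slide can be illustrated with a minimal Python sketch. It is not the monitor's actual code: the function names, the site names in the usage note, and the data layout (a dict mapping site name to a list of CIDR strings) are hypothetical, and the sketch assumes that "more than 2" means three or more consecutive leading "l" characters, all of which are stripped.

```python
import ipaddress
import re

# Assumption: a corrupted awstats record shows up as 3 or more leading "l"
# characters; hostnames with 1 or 2 leading "l" are treated as genuine.
_LEADING_L = re.compile(r"^l{3,}")


def clean_hostname(hostname):
    """Strip a run of 3+ leading "l" characters.

    Returns (cleaned_hostname, number_of_characters_removed).
    """
    match = _LEADING_L.match(hostname)
    if match is None:
        return hostname, 0
    return hostname[match.end():], match.end()


def reassign_site(hostname, site):
    """Move hostnames ending with cern.ch from the Unknown site to CERN-PROD."""
    if site == "Unknown" and hostname.endswith("cern.ch"):
        return "CERN-PROD"
    return site


def site_for_ip(ip, site_ranges):
    """Pick the first site whose declared IP range contains the address.

    site_ranges is a hypothetical dict: site name -> list of CIDR strings.
    Catch-all ranges such as 0.0.0.0/0 are skipped, because every address
    belongs to them and they cannot tell sites apart.
    Returns None when no usable range matches.
    """
    addr = ipaddress.ip_address(ip)
    for site, cidrs in site_ranges.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr, strict=False)
            if net.prefixlen == 0:
                continue  # catch-all range, ignored
            if addr.version == net.version and addr in net:
                return site
    return None
```

For example, clean_hostname("lllllnode1.cern.ch") strips five characters and returns "node1.cern.ch", while "lxplus.cern.ch" is left untouched. With the hypothetical ranges {"SITE-A": ["0.0.0.0/0"], "SITE-B": ["192.0.2.0/24"]}, the address 192.0.2.10 resolves to SITE-B because the catch-all range of SITE-A is skipped.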

WLCG Squid Ops joint group
- New (joint) group of experts following up the issues shown in the monitoring pages (both Frontier and CVMFS)
  - Michal Svatos (ATLAS), Edita Kizinevic (CMS) and Barry Blumenfeld (CMS)
  - wlcg-squid-ops@cern.ch