Frontier Status Alessandro De Salvo on behalf of the Frontier group


1 Frontier Status Alessandro De Salvo on behalf of the Frontier group 25-6-2019
A. De Salvo – June 25th 2019

2 Sites’ news: CERN
CERN is investigating the possibility of upgrading to CentOS. The migration could either keep the same IPs, if possible, or change them. Both options are viable, but keeping the old IPs would be desirable. Operation has been smooth, with no relevant problems observed since March 2

3 Sites’ news: Lyon (Emmanouil Vamvakopoulos)
Stable infrastructure: 4 Frontier launchpads, namely 3 VMs (4 vCPU, 8 GB RAM, 200 GB disk each) on CC-OpenStack plus 1 physical machine (DELL R430, 2 x E GHz, 32 GB RAM, 2 x 1 TB disks). Frontier squid version …; Frontier tomcat version _3.40-1. All monitoring tools are installed and configured (SNMP, MRTG, awstats, max-threads, and filebeat to send logs to ElasticSearch).

Hardware upgrade of the Oracle infrastructure: the 4 dedicated machines shared between DBATL (ATRL replica) and DBAMI (AMI) were replaced in September 2015 (DELL 630 servers, 7-year warranty). The SAN storage back-end and the Fabric switches were replaced in 2014 (Hitachi, HS130; 5-year warranty). There is a plan to replace the SAN storage back-end by next year.

Recent problems with the DB infrastructure: some isolated issues with I/O saturation of the storage back-end in February 2019. The back-end SAN storage is shared among many DBs/groups, and another Oracle DB (on different nodes) triggered the I/O issue. Tuning was performed by the DB master on the other VO's DB that triggered the issue.

4 Sites’ news: RAL (Tim Adye)
The migration of the RAL Frontier service from the old Hyper-V to the new VMware hypervisors was completed [1 Apr 2019]: 3 new servers on the LHC-OPN network, allowing logstash monitoring through the CERN firewall. All monitoring is now working: MRTG, logstash (Kibana), awstats and maxthreads. IPv6 support was added [4 June 2019]. Everything is running smoothly. One incident [2 June 2019]: the service was degraded for 14 hours because the disk filled up on 2 of the 3 servers; it was fixed by enabling rotation for one log file. The RAL Frontier service is supported until the end of …; as agreed with ADC, it will be decommissioned in January, as Oracle is being phased out at RAL.

5 ES/Kibana Monitoring
Smooth operation. New filters enable the parsing of the SQL queries. Their performance is sufficient to guarantee no delay in operations, and they are very useful for debugging queries at all levels, including the analytics. A lost-data incident in Chicago temporarily affected the ES/Kibana monitoring operations; it was solved thanks to the prompt restore of the data. Data from May are not available: it is not clear why they were not recorded and this is still under investigation, but it is not a critical problem. Some progress was made on the CMS dashboards and data collection.

6 Backup Proxies and Failover monitor
Backup proxies have been in production for a few months. No problems were observed, and they were very useful for tracking down misconfigured sites or general problems. The AGIS cleanup is complete, removing the off-site squids from the AGIS site configs. There are plans to inhibit direct connections to the launchpads starting in July; this needs to be coordinated with ADC in order to test it.

Failover monitor, protection against corrupted awstats records: sometimes many 'l' characters are added to the beginning of the hostname. This increases the number of distinct hostnames, resulting in a huge, hard-to-read table and long script processing times. The problem is hard to debug, so it was decided to just add a protection to the monitoring: if there are more than 2 'l' characters in the hostname, their number is measured and they are removed from the beginning of the hostname.

Failover monitor, unidentified cern.ch hostnames moved from the Unknown site to the CERN-PROD site: the IPs of hostnames of (presumably) VMs ending with cern.ch often cannot be identified, so now every hostname in the Unknown site that ends with cern.ch is moved to the CERN-PROD site.

Failover monitor, IP ranges are now used to discern sites that share a geoip location: IP ranges are defined in GOCDB in the site definition (or the OIM equivalent). They are rarely useful: when a range is defined, most often sites define it as /0, which the script ignores because every IP belongs to that range. This makes a difference for very few sites right now.
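The three failover-monitor fixes can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the monitor's actual code. All function names are hypothetical. It assumes that the awstats corruption appears as a leading run of 'l' characters, stripped when the run is longer than 2 (so a genuine leading 'l' right after a corrupted run is lost too). It matches cern.ch hosts on a `.cern.ch` suffix and passes each site's GOCDB/OIM IP ranges in as a plain dictionary of CIDR strings.

```python
import ipaddress
from typing import Dict, List, Optional

# Site labels as plain strings (assumption: the real monitor may use other identifiers).
UNKNOWN_SITE = "Unknown"
CERN_SITE = "CERN-PROD"


def strip_corrupted_prefix(hostname: str, threshold: int = 2) -> str:
    """Hypothetical helper: drop a leading run of 'l' characters longer than `threshold`."""
    run = len(hostname) - len(hostname.lstrip("l"))  # length of the leading 'l' run
    if run > threshold:
        return hostname[run:]
    return hostname


def reassign_cern_host(hostname: str, site: str) -> str:
    """Hypothetical helper: move Unknown hosts ending with cern.ch to CERN-PROD."""
    if site == UNKNOWN_SITE and (hostname == "cern.ch" or hostname.endswith(".cern.ch")):
        return CERN_SITE
    return site


def pick_site_by_ip_range(ip: str, candidates: Dict[str, List[str]]) -> Optional[str]:
    """Hypothetical helper: choose among sites sharing a geoip location.

    `candidates` maps a site name to its declared CIDR ranges. A /0 range
    contains every address, so it carries no information and is skipped.
    Returns the single matching site, or None if zero or several sites match.
    """
    addr = ipaddress.ip_address(ip)
    matches = []
    for site, ranges in candidates.items():
        for cidr in ranges:
            net = ipaddress.ip_network(cidr, strict=False)
            if net.prefixlen == 0:
                continue  # ignore /0, as the slide describes
            if addr in net:  # False automatically for IPv4 vs IPv6 mismatches
                matches.append(site)
                break
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    # Example hostnames and documentation-only IP ranges (RFC 5737), all hypothetical.
    print(strip_corrupted_prefix("llllatlas-squid.example.org"))
    print(reassign_cern_host("vm123.cern.ch", UNKNOWN_SITE))
    print(pick_site_by_ip_range("192.0.2.10",
                                {"SITE-A": ["192.0.2.0/24"], "SITE-B": ["0.0.0.0/0"]}))
```

In this sketch the hostname is cleaned first, so that corrupted variants collapse onto one entry before any site lookup. The IP-range check returns None on ambiguity rather than guessing, leaving such hosts to the existing geoip assignment.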

7 WLCG Squid Ops joint group
A new (joint) group of experts follows up the issues shown in the monitoring pages (both Frontier and CVMFS): Michal Svatos (ATLAS), Edita Kizinevic (CMS) and Barry Blumenfeld (CMS).

