Published by Belinda Ford. Modified over 8 years ago.
1
www.egi.eu EGI-InSPIRE RI-261323
EGI 2nd level support training
Marian Babik, David Collados, Wojciech Lapka, Pedro Andrade, Paloma Fuente, Jacobo Tarragon (CERN)
Emir Imamagic (SRCE)
Christos Triantafyllidis (AUTH)
2
Introduction
Aim
– provide a detailed technical overview of SAM: improve understanding of how the system works and help you to solve the most common issues
– get feedback from the 2nd level
Approach
– overview of the architecture
– per component (3 slides): configuration and debugging, the most common issues and how to resolve them
3
Introduction
GGUS 2nd level – 69 tickets
GGUS 3rd level – 249 tickets
4
Disclaimer
Many internal/development APIs will be shown. They can change at any time and shouldn't be considered public.
The public API is documented at:
– https://tomtools.cern.ch/confluence/display/SAMDOC/Web+Services+Specification
5
Terminology
service – endpoint (hostname, port)
service flavour – service type (as registered in GOCDB)
profile – set of tuples (flavour, metric, vo, fqan)
status – discrete state (one of OK, CRITICAL, WARNING, UNKNOWN)
availability – time period for which the status was OK (- downtime)
reliability – availability (+ downtime)
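The availability and reliability entries above are terse. The sketch below assumes the WLCG/EGI convention: time with UNKNOWN status is left out of both figures, and scheduled downtime is also left out of the reliability denominator. Durations are in hours. The function names are hypothetical and are not part of any SAM API.

```python
from typing import Optional

# Assumption: WLCG/EGI-style definitions; not taken from the SAM/ACE source code.


def availability(up: float, unknown: float, total: float) -> Optional[float]:
    """Fraction of the known (non-UNKNOWN) time during which the status was OK.

    Scheduled downtime stays in the denominator, so it counts against availability.
    """
    known = total - unknown
    return up / known if known > 0 else None  # undefined if nothing is known


def reliability(up: float, unknown: float, sched_down: float,
                total: float) -> Optional[float]:
    """Like availability, but scheduled downtime is also removed from the period."""
    denom = total - unknown - sched_down
    return up / denom if denom > 0 else None


# One day: 20 h OK, 4 h of scheduled downtime, no UNKNOWN time.
print(round(availability(20, 0, 24), 3))  # 0.833
print(reliability(20, 0, 4, 24))          # 1.0
```

The example shows why a site can have full reliability yet lower availability: a scheduled downtime only hurts the latter.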
6
SAM Architecture
7
SAM Architecture
8
ATP - Configuration
atp_synchro.conf: main configuration file
– debug level
– location of external data sources (GOCDB, CIC, VOMS, etc.)
– location of the VO feed and ROC configuration files
– synchronizer selector
atp_db.conf: database connection configuration
atp_logging_files.conf: location of the log configuration file
atp_logging_parameters_config.conf: log configuration
roc.conf: list of enabled regions
vo_feeds.conf: list of enabled VO feeds
All configuration files are based on key-value pairs.
The default configuration structure is distributed in the ATP package.
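Every ATP configuration file is a list of key-value pairs, so a reader for them can be very small. The sketch assumes `key = value` lines with `#` comments, which the slide does not specify. The keys in the sample are hypothetical and are not taken from the real atp_synchro.conf.

```python
def parse_kv_config(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments (assumed syntax)."""
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        config[key.strip()] = value.strip()
    return config


sample = """
# hypothetical keys, for illustration only
debug_level = INFO
vo_feeds_file = vo_feeds.conf
"""
print(parse_kv_config(sample)["debug_level"])  # INFO
```

A strict parser that rejects malformed lines makes a typo in a configuration file fail loudly instead of silently dropping a setting.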
9
ATP - Debugging
Log of the last execution: /var/log/atp/atp.log
Log of all executions: /var/log/atp/atp_full.log (rotated with logrotate)
Errors are also sent to the system log.
Six logging levels:
– CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET
– the default configuration is INFO (20)
Standard log file line:
– "2012-03-22 15:24:02,308 - ATP - INFO - CIC - Execution - Starting"
– CIC: synchronizer name (e.g. CIC, GOCDB Topology, VOFeeds, etc.)
– Execution: task type (e.g. configuration, validation, execution)
– Starting: action description
ATP_sync probe POEM/NCG calls (for all non-deleted VOs):
– localhost/atp/api/search/servicemap/json?vo= &ismonitored=on
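The standard log line can be split into its fields mechanically. This sketch makes two assumptions:
- the six fields are separated by " - ";
- the level names are Python's standard `logging` names, whose numeric values match the INFO (20) default above.

`parse_atp_line` and `level_number` are hypothetical helpers, not part of ATP.

```python
import logging
from typing import NamedTuple


class AtpLogLine(NamedTuple):
    timestamp: str
    component: str
    level: str
    synchronizer: str   # e.g. CIC, GOCDB Topology, VOFeeds
    task: str           # e.g. configuration, validation, execution
    action: str         # free-text action description


def parse_atp_line(line: str) -> AtpLogLine:
    # At most 5 splits, so the action text may itself contain " - ".
    parts = [p.strip() for p in line.split(" - ", 5)]
    if len(parts) != 6:
        raise ValueError(f"unexpected log line: {line!r}")
    return AtpLogLine(*parts)


def level_number(name: str) -> int:
    # Python's numeric levels: NOTSET=0, DEBUG=10, INFO=20, WARNING=30, ERROR=40, CRITICAL=50.
    value = getattr(logging, name.upper(), None)
    if not isinstance(value, int):
        raise ValueError(f"unknown level {name!r}")
    return value


rec = parse_atp_line("2012-03-22 15:24:02,308 - ATP - INFO - CIC - Execution - Starting")
print(rec.synchronizer, rec.task, rec.action, level_number(rec.level))
# CIC Execution Starting 20
```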
10
ATP – Common Issues
A line-by-line analysis of atp.log explains 99% of the problems with the ATP synchronizer.
ATP synchronizes data from several distinct external data sources. Sometimes an ATP execution fails because of "invalid" or "not available" input data.
– Check for the "Validation" tag in the log to find out which data source was unreachable or provided invalid data.
ATP is based on several PL/SQL procedures/functions.
– If you see ORA-* error codes, please assign the ticket to the 3rd level.
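Both checks above (the "Validation" tag and ORA-* codes) are easy to automate when skimming a long atp.log. This sketch assumes Oracle errors appear in the usual ORA-NNNNN form. The sample lines are invented for illustration.

```python
import re
from typing import Dict, Iterable

ORA_CODE = re.compile(r"\bORA-\d{5}\b")


def triage_atp_log(lines: Iterable[str]) -> Dict[str, list]:
    """Collect the lines and codes behind the two common ATP failure modes."""
    validation_lines = []
    ora_codes = set()
    for line in lines:
        if "Validation" in line:                   # unreachable or invalid data source
            validation_lines.append(line.rstrip("\n"))
        ora_codes.update(ORA_CODE.findall(line))   # PL/SQL problem: 3rd level ticket
    return {"validation": validation_lines, "ora_codes": sorted(ora_codes)}


# Invented sample lines in the standard format, for illustration only.
sample = [
    "2012-03-22 15:24:03,001 - ATP - ERROR - GOCDB Topology - Validation - source not reachable\n",
    "2012-03-22 15:24:05,412 - ATP - ERROR - VOFeeds - Execution - ORA-00001: unique constraint violated\n",
]
report = triage_atp_log(sample)
print(report["ora_codes"])  # ['ORA-00001']

# On a SAM node: with open("/var/log/atp/atp.log") as fh: report = triage_atp_log(fh)
```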
11
POEM sync
/etc/poem/poem_sync.ini
– logging
– database details
– POEM_SYNC_NS_URLS: list of URLs from which to synchronize (NGI defaults to grid-monitoring, VO defaults to localhost)
– POEM_SYNC_NS_RESTRICT: space-separated list of namespace!profile entries that should be synchronized for the given namespace (e.g. ch.cern.sam!ROC ch.cern.sam)
– reasonable defaults are provided
Debugging
– localhost/poem_sync/api/0.1/json/servicemetricinstances
– localhost/poem_sync/api/0.1/json/profiles
– Poem_sync probe (dumps log information)
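A POEM_SYNC_NS_RESTRICT value can be read as a list of (namespace, profile) pairs. This sketch assumes that an entry without "!" (such as the plain `ch.cern.sam` in the example) selects the whole namespace. That interpretation is not confirmed by the slide, and `parse_ns_restrict` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def parse_ns_restrict(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a POEM_SYNC_NS_RESTRICT value into (namespace, profile) pairs.

    Assumption: an entry without '!' means the whole namespace (profile None).
    """
    entries = []
    for token in value.split():                 # entries are space separated
        namespace, sep, profile = token.partition("!")
        entries.append((namespace, profile if sep else None))
    return entries


print(parse_ns_restrict("ch.cern.sam!ROC ch.cern.sam"))
# [('ch.cern.sam', 'ROC'), ('ch.cern.sam', None)]
```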
12
POEM Web
/etc/poem/poem.ini: main configuration file for POEM web
– database details
– logging
– namespace
POEM web instance, list of defined profiles and metrics:
– localhost/poem/api/0.1/json/profiles/
– localhost/poem/api/0.1/json/namespace/
POEM web (mod_wsgi), Django admin:
– DEBUG=True in /etc/poem/poem.ini
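When debugging, the POEM web resources listed above can be fetched from a script instead of a browser. This is a minimal sketch with two assumptions:
- the API is served over plain HTTP on localhost;
- the JSON layout is unknown here, so it is only decoded, not interpreted.

The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: plain HTTP on the local host; adjust scheme/host for your instance.
POEM_API = "http://localhost/poem/api/0.1/json/"


def poem_url(resource: str, base: str = POEM_API) -> str:
    """Build the URL of a POEM web API resource such as 'profiles' or 'namespace'."""
    return base + resource.strip("/") + "/"


def fetch_poem(resource: str, timeout: float = 30.0):
    """Fetch and decode one resource; no particular JSON structure is assumed."""
    with urlopen(poem_url(resource), timeout=timeout) as response:
        return json.load(response)


# On a SAM node (not executed here):
# profiles = fetch_poem("profiles")
# namespace = fetch_poem("namespace")
```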
13
POEM known issues
No history
– changes take effect immediately (critical profiles need to be changed at the beginning of a month – PROC10)
Metric configuration is not integrated with POEM
– POEM web doesn't filter metrics in any way
– no guidance in terms of dependencies, internal metrics, etc.
FQAN support
– if the FQAN is null, results with any FQAN are accepted
– local profiles with custom FQANs can overwrite results of the central profiles
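The null-FQAN rule above can be written as a matching function over the (flavour, metric, vo, fqan) tuples that make up a profile. This is an illustrative sketch, not POEM's implementation. The flavour, metric and FQAN values in the example are placeholders.

```python
from typing import Iterable, NamedTuple, Optional


class ProfileEntry(NamedTuple):
    flavour: str
    metric: str
    vo: str
    fqan: Optional[str]  # None (null) means "accept results with any FQAN"


def accepts(entry: ProfileEntry, flavour: str, metric: str, vo: str,
            fqan: Optional[str]) -> bool:
    if (entry.flavour, entry.metric, entry.vo) != (flavour, metric, vo):
        return False
    return entry.fqan is None or entry.fqan == fqan


def profile_accepts(profile: Iterable[ProfileEntry], flavour: str, metric: str,
                    vo: str, fqan: Optional[str]) -> bool:
    return any(accepts(e, flavour, metric, vo, fqan) for e in profile)


central = ProfileEntry("CREAM-CE", "org.sam.CREAMCE-JobSubmit", "ops", None)
local = ProfileEntry("CREAM-CE", "org.sam.CREAMCE-JobSubmit", "ops", "/ops/Role=lcgadmin")
print(profile_accepts([central], "CREAM-CE", "org.sam.CREAMCE-JobSubmit", "ops", "/ops/Role=pilot"))  # True
print(profile_accepts([local], "CREAM-CE", "org.sam.CREAMCE-JobSubmit", "ops", "/ops/Role=pilot"))    # False
```

The central entry with a null FQAN also matches results produced under a local profile's custom FQAN. This is how such results can end up overwriting central ones.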
14
NCG configuration
/etc/ncg/ncg.conf
– basic structure /etc/ncg/ncg
Outputs to /etc/nagios/wlcg.d/
Log: /var/log/ncg/ncg.log
15
NCG debugging
Review /var/log/ncg/ncg.log
Check the metric configuration:
– /etc/ncg-metric-config.conf
– /etc/ncg-metric-config.d
Probes:
– NCGPidFile (freshness)
– ncg_sync
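The slide only says that NCGPidFile checks freshness. The sketch below is a hypothetical illustration of a file-freshness check that uses the standard Nagios plugin return codes. The path in the usage comment and the thresholds are assumptions, not NCGPidFile's actual behaviour.

```python
import os
import time
from typing import Optional, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def file_freshness(path: str, warn_s: float, crit_s: float,
                   now: Optional[float] = None) -> Tuple[int, str]:
    """Classify a file by the age of its last modification time (hypothetical check)."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return UNKNOWN, f"cannot stat {path}: {exc.strerror}"
    if age >= crit_s:
        return CRITICAL, f"{path} is {age:.0f}s old"
    if age >= warn_s:
        return WARNING, f"{path} is {age:.0f}s old"
    return OK, f"{path} is {age:.0f}s old"


# Hypothetical usage (path and thresholds are assumptions):
# status, message = file_freshness("/var/run/ncg.pid", warn_s=3600, crit_s=7200)
```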
16
NCG known issues
17
voms2htpasswd
Authorization for Nagios
Configuration files:
– /etc/voms2htpasswd.conf: main configuration file
– /etc/voms2htpasswd-bans.conf: banned DNs
– /etc/voms2htpasswd-static.d/: files containing lists of DNs
Sample entries for /etc/voms2htpasswd.conf:
– atps://grid-monitoring.cern.ch/atp/api/search/contactgroup/json?groupname=NGI_HU
– atps://grid-monitoring.cern.ch/atp/api/search/contactgroup/json?groupname=NGI_PL&role=Regional%20Manager
– atps://grid-monitoring.cern.ch/atp/api/search/contactsite/json?sitename=KR-KISTI-GSDC-01
Sample entry for /etc/voms2htpasswd-bans.conf and /etc/voms2htpasswd-static.d/:
– /C=GR/O=HellasGrid/OU=auth.gr/CN=Christos Triantafyllidis
Debugging:
– check for the existence of entries in /etc/httpd/httpd.users
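Two quick debugging aids for voms2htpasswd:
- decode a source entry, to see exactly which ATP query (group, role, site) it performs;
- test whether a DN reached the generated users file.

This sketch assumes that httpd.users uses the htpasswd layout `DN:hash`, with the last ":" separating the two. Both helper names are hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def describe_source(entry: str) -> Dict[str, object]:
    """Break a voms2htpasswd source entry into scheme, host, path and decoded query."""
    parts = urlsplit(entry)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"scheme": parts.scheme, "host": parts.hostname,
            "path": parts.path, "query": query}


def dn_in_users_file(dn: str, users_text: str) -> bool:
    """True if `dn` is the user field of some line (assumed 'DN:hash' layout)."""
    for line in users_text.splitlines():
        user, sep, _hash = line.rpartition(":")
        if sep and user == dn:
            return True
    return False


entry = ("atps://grid-monitoring.cern.ch/atp/api/search/contactgroup/json"
         "?groupname=NGI_PL&role=Regional%20Manager")
print(describe_source(entry)["query"])
# {'groupname': 'NGI_PL', 'role': 'Regional Manager'}

# On a SAM node:
# with open("/etc/httpd/httpd.users") as fh: print(dn_in_users_file(my_dn, fh.read()))
```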
18
Messaging config
Brokers:
– /var/cache/msg/broker-cache-file/broker-list
msg-to-handler daemon:
– /etc/msg-to-handler.conf (/etc/msg-to-handler.d)
Nagios probes:
– org.egee.SendToMsg: publishes config and metrics
– org.egee.RecvFromQueue: imports results
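A first check when results stop arriving is whether the cached broker list contains usable entries. The slide does not describe the broker-list file format. This sketch assumes one broker URI per line (such as `stomp://host:port/`) with `#` comments, and the hostnames are placeholders.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def read_broker_list(text: str) -> List[Tuple[str, str, int]]:
    """Return (scheme, host, port) per entry; the one-URI-per-line format is assumed."""
    brokers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = urlsplit(line)
        if not parts.hostname or parts.port is None:
            raise ValueError(f"not a broker URI: {line!r}")
        brokers.append((parts.scheme, parts.hostname, parts.port))
    return brokers


sample = """
# placeholder brokers
stomp://broker1.example.org:6163/
stomp://broker2.example.org:6163/
"""
print(read_broker_list(sample))
# [('stomp', 'broker1.example.org', 6163), ('stomp', 'broker2.example.org', 6163)]
```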
19
MRS configuration
Basic configuration
– mrs.conf is located at:
  /etc/mrs.d/mysql-mrs.conf (MySQL)
  /etc/mrs.d/oracle-mrs.conf (Oracle)
send_to_db.ini is located at:
– /etc/nagios/plugins/send_to_db.ini
Structure:
[send_to_db]
db_uri=mrs;host=localhost
db_user=msuser
db_pwd=mspass
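The send_to_db.ini structure above is a standard INI file, so Python's `configparser` can read it. The sketch also splits `db_uri` on ";" on the assumption that it is `<database>;key=value;...`, which the single example suggests but does not confirm. `read_send_to_db` is a hypothetical helper.

```python
import configparser

SAMPLE = """\
[send_to_db]
db_uri=mrs;host=localhost
db_user=msuser
db_pwd=mspass
"""


def read_send_to_db(text: str) -> dict:
    # interpolation=None: passwords may contain '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["send_to_db"]
    # Assumption: db_uri is '<database>;key=value;...'.
    name, *options = section["db_uri"].split(";")
    opts = dict(opt.split("=", 1) for opt in options if opt)
    return {"database": name, **opts,
            "user": section["db_user"], "password": section["db_pwd"]}


print(read_send_to_db(SAMPLE))
# {'database': 'mrs', 'host': 'localhost', 'user': 'msuser', 'password': 'mspass'}
```

By default `configparser` does not treat ";" as an inline comment, so the full `db_uri` value survives parsing.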
20
MRS debugging
Latest entry in metricdata_spool. It shouldn't be old; if it is too old, metrics may not be arriving from messaging:
– select uts_to_w3ctime(max(check_time)) from metricdata_spool; (Oracle)
– select FROM_UNIXTIME(max(check_time)) from metricdata_spool; (MySQL)
Latest entry in metricdata. It shouldn't be old; if it is too old, metrics may not be moving from metricdata_spool:
– select uts_to_w3ctime(max(check_time)) from metricdata; (Oracle)
– select FROM_UNIXTIME(max(check_time)) from metricdata; (MySQL)
Rejected metrics (check the reason to understand why a metric was rejected):
– select uts_to_w3ctime(m.check_time), uts_to_w3ctime(m.insert_time), m.* from metricdata_rejected m; (Oracle)
– select FROM_UNIXTIME(m.check_time), FROM_UNIXTIME(m.insert_time), m.* from metricdata_rejected m; (MySQL)
Nagios probes: SendToMetricStore, MrsDirSize, MrsCheckMissingProbes
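The reasoning above can be scripted:
- if metricdata_spool is stale, suspect messaging;
- if only metricdata is stale, suspect the spool transfer.

This sketch works with any DB-API 2.0 connection. It reads the raw Unix `check_time`, so neither `uts_to_w3ctime` nor `FROM_UNIXTIME` is needed. The demo uses an in-memory SQLite stand-in for the MRS schema, and the staleness threshold is a hypothetical, site-specific choice.

```python
import sqlite3
import time
from typing import Optional

ALLOWED_TABLES = ("metricdata_spool", "metricdata")


def latest_age(conn, table: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the newest check_time in `table` (None if the table is empty)."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table {table!r}")  # never format untrusted names
    cur = conn.cursor()
    cur.execute(f"SELECT MAX(check_time) FROM {table}")
    (latest,) = cur.fetchone()
    if latest is None:
        return None
    now = time.time() if now is None else now
    return now - latest


def diagnose(spool_age: Optional[float], data_age: Optional[float], max_age: float) -> str:
    """Apply the slide's reasoning to the two ages."""
    if spool_age is None or spool_age > max_age:
        return "metricdata_spool is stale: results may not be arriving from messaging"
    if data_age is None or data_age > max_age:
        return "metricdata is stale: results may not be moving out of metricdata_spool"
    return "both tables are fresh"


# Demonstration with an in-memory SQLite stand-in for the MRS schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metricdata_spool (check_time INTEGER)")
conn.execute("CREATE TABLE metricdata (check_time INTEGER)")
conn.executemany("INSERT INTO metricdata_spool VALUES (?)", [(1000,), (2100,)])
conn.execute("INSERT INTO metricdata VALUES (?)", (1000,))
spool = latest_age(conn, "metricdata_spool", now=2200)
data = latest_age(conn, "metricdata", now=2200)
print(spool, data, diagnose(spool, data, max_age=600))
# 100 1200 metricdata is stale: results may not be moving out of metricdata_spool
```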
21
MRS known issues
No known issues.
Basic contracts:
– a metric is marked as REMOVED if its status is MISSING and the service is marked as deleted
– a metric is marked as REMOVED if its tuple disappears from the MRS bootstrapper
– a metric is marked as MISSING after 24 hours
Retention:
– the statuschange_service_profile table keeps data for 12 months
– the metricdata table keeps data for 6 months
– the metricdata_rejected table keeps data for 1 month
– the metricdata_latest table contains metric results newer than 7 days
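The three basic contracts can be read as a small state rule, which helps when explaining to a site why a metric shows MISSING or REMOVED. This is an illustrative sketch, not MRS code. It assumes that "after 24 hours" means 24 hours without a new result.

```python
MISSING_AFTER = 24 * 3600  # seconds; assumed: 24 hours without a new result


def lifecycle_status(status: str, last_result: float, now: float,
                     service_deleted: bool, in_bootstrapper: bool) -> str:
    """Apply the MRS basic contracts to one metric (illustrative only)."""
    if not in_bootstrapper:               # tuple disappeared from the MRS bootstrapper
        return "REMOVED"
    if now - last_result > MISSING_AFTER:
        status = "MISSING"
    if status == "MISSING" and service_deleted:
        return "REMOVED"
    return status


now = 100 * 3600
print(lifecycle_status("OK", now - 3600, now, False, True))       # OK
print(lifecycle_status("OK", now - 25 * 3600, now, False, True))  # MISSING
print(lifecycle_status("OK", now - 25 * 3600, now, True, True))   # REMOVED
```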
22
SAM reloading
/etc/rc.d/init.d/sam-sync (log: /var/log/sam-sync.log)
Reloads SAM:
– suspends ATP, POEM
– runs ncg.reload.sh
– MRS bootstrapping
– resumes ATP, POEM
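Any wrapper around the reload sequence above should resume ATP and POEM even if a middle step fails. This sketch shows that ordering. The actual actions are passed in as hypothetical callables, so it does not reproduce the contents of the sam-sync init script.

```python
from typing import Callable, List, Sequence


def reload_sam(suspend: Callable[[str], None], resume: Callable[[str], None],
               steps: Sequence[Callable[[], None]],
               components: Sequence[str] = ("ATP", "POEM")) -> None:
    """Suspend synchronizers, run the reload steps, and resume them even on failure."""
    suspended: List[str] = []
    try:
        for name in components:
            suspend(name)
            suspended.append(name)   # only resume what was actually suspended
        for step in steps:
            step()                   # e.g. NCG reload, then MRS bootstrapping
    finally:
        for name in suspended:
            resume(name)


events: List[str] = []
reload_sam(suspend=lambda n: events.append(f"suspend {n}"),
           resume=lambda n: events.append(f"resume {n}"),
           steps=[lambda: events.append("ncg.reload.sh"),
                  lambda: events.append("mrs bootstrap")])
print(events)
# ['suspend ATP', 'suspend POEM', 'ncg.reload.sh', 'mrs bootstrap', 'resume ATP', 'resume POEM']
```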
23
myEGI config and debug
/etc/mywlcg/mywlcg.ini
– database connection
/var/log/httpd/error.log
Based on Django (mod_wsgi)
– you can get more explicit errors if you set DEBUG=True
myegi tests
myegi web service tests
24
ACE - Configuration
ace.conf: main configuration file
– database configuration file path
– logging level and configuration file path
– computation_delay: sets how far behind the current time the last computed period ends, e.g.:
  current time: 08:45
  computation delay: 15 (minutes)
  when calculations are performed, the last period considered ends at 08:30
ace_db.conf: database connection configuration
atp_logging.conf: log path and logging configuration
All configuration files are based on key-value pairs.
The default configuration structure is distributed in the ACE package.
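The computation_delay example is plain time arithmetic. The sketch reproduces it with an arbitrary date and a hypothetical function name. Any alignment of periods to hour boundaries that ACE may apply is not modelled here.

```python
from datetime import datetime, timedelta


def last_period_end(now: datetime, computation_delay_min: int) -> datetime:
    """Latest time up to which calculations are performed: now minus the delay."""
    return now - timedelta(minutes=computation_delay_min)


end = last_period_end(datetime(2012, 3, 22, 8, 45), 15)
print(end.strftime("%H:%M"))  # 08:30
```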
25
ACE - Debugging
Log of the last execution: /var/log/ace/ace.log
– used for both ace_status and ace_availability
Five logging levels:
– CRITICAL, ERROR, WARNING, INFO, DEBUG
– the default configuration is ERROR (40)
Logging of performed actions:
– status auto-summarization (missing status calculations in the past 24 h)
– regular status summarization (from the last summarization to the current time minus the delay)
– availability auto-summarization (missing availability calculations in the past 24 h)
– regular availability summarization (from the last summarization to the current time minus the delay)
Hourly, daily, weekly and monthly calculations are performed for each hour, day, week and month within the period.
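To check which hourly summaries a run should have logged, you can enumerate the complete hours in its window. The sketch makes two assumptions that the slide does not state:
- periods are aligned to full hours;
- only complete periods ending by "now minus the delay" are computed.

The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

HOUR = timedelta(hours=1)


def hourly_periods(start: datetime, now: datetime, delay: timedelta) -> List[datetime]:
    """Start times of the complete hourly periods between start and now - delay."""
    end = now - delay
    period = start.replace(minute=0, second=0, microsecond=0)  # align to the hour
    periods = []
    while period + HOUR <= end:
        periods.append(period)
        period += HOUR
    return periods


now = datetime(2012, 3, 22, 8, 45)
delay = timedelta(minutes=15)
# Regular summarization: from the last summarization (here 05:00) to now - delay.
print([p.strftime("%H:%M") for p in hourly_periods(datetime(2012, 3, 22, 5), now, delay)])
# ['05:00', '06:00', '07:00']
# Auto-summarization: look back over the past 24 hours.
print(len(hourly_periods(now - timedelta(hours=24), now, delay)))  # 24
```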
26
ACE – Common Issues
Availability recomputation requests
– must follow the request policy: the request must be caused by problems in the monitoring infrastructure and be made within 10 days of the publication of the monthly report
– if coming from a site admin, assign to regional operations staff (policy for EGI sites and regions: https://wiki.egi.eu/wiki/PROC10)
– if coming from regional operations staff, assign to 3rd level
Apparently wrong values caused by external reasons
– topology issues
– MRS data
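The routing rules for recomputation requests fit into a few lines. This sketch encodes only what the slide states: the monitoring-infrastructure cause, the 10-day window and who the ticket goes to. The requester labels and the fallback message are hypothetical placeholders.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=10)  # "within 10 days of the publication of the monthly report"


def route_recomputation(requester: str, monitoring_problem: bool,
                        report_published: date, requested: date) -> str:
    """Route an availability recomputation ticket (requester labels are hypothetical)."""
    if not monitoring_problem or requested - report_published > WINDOW:
        return "does not meet the request policy"
    if requester == "site_admin":
        return "assign to regional operations staff"
    if requester == "regional_operations":
        return "assign to 3rd level"
    return "unknown requester"  # placeholder: not covered by the slide


print(route_recomputation("site_admin", True, date(2012, 4, 2), date(2012, 4, 12)))
# assign to regional operations staff
```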
27
Documentation
https://tomtools.cern.ch/confluence/display/SAMDOC/Home
https://tomtools.cern.ch/confluence/display/SAMDOC/FAQs
https://tomtools.cern.ch/confluence/display/SAMDOC/Troubleshooting
https://tomtools.cern.ch/confluence/display/SAMDOC/Released+Probes