SQL Server on Linux: Troubleshooting tips and tricks
Suresh Kandoth

What exactly are you troubleshooting?
  https://cloudblogs.microsoft.com/sqlserver/2016/12/16/sql-server-on-linux-how-introduction/

diagnostic info & basic logs
  Identify installed software
    SQL Server packages -> yum info mssql-server / dpkg -l mssql-server
    select @@version
    systemctl status mssql-server
  SQL Server logs
    default location -> /var/opt/mssql/log
    Database engine -> errorlog
    Database engine -> default trace [log*.trc] & default system session [system_health*.xel]
    SQL Agent -> sqlagent.out
  Operating System logs
    centralized system logging -> journalctl
    kernel messages -> dmesg
    application or daemon logs -> syslog / messages
    Other key logs -> /var/log

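As a quick first check, the SQL Server log files named above can be inventoried with a small helper. This is a minimal sketch: the function name list_mssql_logs and its optional directory argument are illustrative and not part of any SQL Server tool; the default directory and file patterns are the ones from the slide.

```shell
# List the SQL Server log files named on the slide that exist in a log directory.
# list_mssql_logs is a hypothetical helper name; pass a directory to override the default.
list_mssql_logs() {
    log_dir="${1:-/var/opt/mssql/log}"
    for pattern in 'errorlog*' 'log*.trc' 'system_health*.xel' 'sqlagent.out'; do
        # $pattern is left unquoted on purpose so the shell expands the glob.
        for f in "$log_dir"/$pattern; do
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
    return 0
}
```

Calling list_mssql_logs with no argument checks the default location; reading that directory usually requires root or the mssql account.
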
permissions & accounts
  binary
    /opt/mssql/
    ownership = root
    fixed path
  data
    /var/opt/mssql/
    ownership = mssql
    customizable
  config
    service daemon
    process startup context
    mssql config file

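The ownership expectations above can be spot-checked with find. A minimal sketch, assuming GNU find; the function name find_wrong_owner and its arguments are illustrative, and the defaults are the data path and owner from the slide.

```shell
# Print every path under a directory that is NOT owned by the expected user.
# find_wrong_owner is a hypothetical helper; the owner may be a user name or a numeric uid.
find_wrong_owner() {
    dir="${1:-/var/opt/mssql}"
    owner="${2:-mssql}"
    find "$dir" ! -user "$owner" -print
}
```

Empty output means everything under the directory has the expected owner. For the binary path the equivalent call would be find_wrong_owner /opt/mssql root.
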
troubleshooting file activity
  error messages
    client application, sql server error log, journalctl
  tracing
    specific process, specific path, typical syscalls
    errors -> strace
    normal activity (writes & reads) -> strace

troubleshooting memory consumption
  operating system
    free / sar
    swap file
    OOM Killer settings
  process
    pidstat
    mssql-conf [memory.memorylimitmb]
    multiple components inside single process
  database engine
    DMV [dm_os_process_memory, dm_os_memory_nodes, dm_os_memory_clerks]
    DBCC MEMORYSTATUS
    SSMS Memory consumption report
    Extended Event [page_allocated and page_freed]
    sp_configure 'max server memory', memory grants [RG/hints]

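When checking the process-level limit, it can help to read memory.memorylimitmb straight from the configuration file. The sketch below assumes the file lives at /var/opt/mssql/mssql.conf and uses INI-style [section] / key = value lines; the function name read_mssql_conf is illustrative.

```shell
# Read one setting from an mssql.conf-style INI file.
# Usage: read_mssql_conf SECTION KEY [FILE]   (hypothetical helper, not a SQL Server tool)
read_mssql_conf() {
    section="$1"
    key="$2"
    file="${3:-/var/opt/mssql/mssql.conf}"   # assumed default location
    awk -v section="$section" -v key="$key" '
        /^[ \t]*\[/ {
            # Section header: strip the brackets and remember the section name.
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        current == section {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$file"
}
```

For example, read_mssql_conf memory memorylimitmb prints the configured limit, or nothing when the setting has not been set in the file.
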
Performance monitoring
  OS diagnostics
    overview of system (top / vmstat / uptime / pmstat); process view (pidstat)
    cpu (mpstat); memory (free / sar -r); disk (iostat / iotop); network (sar -n)
    long term collection: sar or pcp
    collectd/InfluxDB/Grafana [mssql-monitoring]
    Telegraf [cross platform monitoring agents for SQL Server]
  SQL diagnostics
    Activity monitor [SSMS]
    Performance dashboard [SSMS]
    Dashboard insights [sqlops]
    Extended Events

db engine performance troubleshooting
  Query Store
    runtime statistics
    plans used at different times
    wait categories
  DMVs
    queries [sys.dm_exec_requests & sys.dm_exec_query_stats]
    waits [sys.dm_os_waiting_tasks & sys.dm_os_wait_stats]
  PSSDIAG and SQL Nexus utility
    unified data collection
    Linux collector on GitHub
    import data into database for analysis
  sys.dm_os_performance_counters can be used to query performance counters of the db engine

deeper investigations
  Memory dumps
    sqldumper.exe
    exceptions in db engine
    dbcc stackdump
    dumptrigger / -y / create_dump event
  Core dumps
    gdb or paldumper
    exceptions in sql pal
    generate-sql-dump.sh <pid> <target_dir>
  PAL logger
    Track activity of SQLPAL components
    /var/opt/mssql/logger.ini
    Trace written to log file

Availability Groups
  cluster management
    pcs cluster status
    pcs cluster start / stop
  resource management
    pcs resource show --full
    pcs status / crm_mon
    pcs resource update
  failover resources
    pcs resource move
    pcs resource clear
    pcs resource unmanage

high availability logs
  /var/log/cluster/ or /var/log/corosync/ -> corosync.log
  /var/opt/mssql/log/ -> errorlog
  /var/opt/mssql/log/ -> AlwaysOn_health*.xel

AD authentication
  SPN configuration (setspn)
  keytab file (ktutil)
  kerberos (id, kinit, klist)
  networking issues
  SSSD+NSS
  logs and tracing: /var/log/sssd, KRB5_TRACE

Thank you!

PSSDIAG for SQL Server on Linux
  framework
    shell script based
    unified data collection
    SQL Server instances: host instance, container instances
    OS distributions: RHEL, Ubuntu & SLES
    customize & extend
    analyze data using same tools [SQL Nexus]
  scenarios
    performance
    static logs
    memory dumps
  data collection
    OS data points
      performance data using sysstat
      configuration information
      common OS logs
    SQL data points
      perfstats scripts
      DMVs & catalog views
      SQL performance counters
      extended events
      SQL log files
  control scripts
    start_collector.sh
    stop_collector.sh
    scenario config [*.scn files]
  https://github.com/Microsoft/DiagManager/blob/master/LinuxPSSDiag/Readme.txt

Live Monitoring: mssql-monitoring
  https://blogs.msdn.microsoft.com/sqlcat/2017/07/03/how-the-sqlcat-customer-lab-is-monitoring-sql-on-linux/

PAL logging
  Content of /var/opt/mssql/logger.ini to trace libos activity
    [Output.FileOutput]
    type=File
    filename=/var/opt/mssql/log/palstart_$(_pid).log
    [Logger]
    level=silent
    [Logger.libos.trace]
    level=debug
    outputs=FileOutput
  Content of /var/opt/mssql/logger.ini to trace kerberos activity
    [Output.FileOutput]
    type=File
    filename=/var/opt/mssql/log/palkerb_$(_pid).log
    [Logger]
    level=silent
    [Logger.security.kerberos]
    level=debug
    outputs=FileOutput
  Trace output is captured in /var/opt/mssql/log/pal*.log. Don’t leave the trace running; the files can get very big.
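
To avoid retyping the libos trace configuration, it can be written from a script. This is a sketch only: the function name write_pal_logger_ini and its optional target argument are illustrative, and the file content is the libos example from the slide.

```shell
# Write the libos trace configuration from the slide to a logger.ini file.
# Usage: write_pal_logger_ini [FILE]   (hypothetical helper name)
write_pal_logger_ini() {
    target="${1:-/var/opt/mssql/logger.ini}"
    # The quoted heredoc delimiter keeps $(_pid) literal instead of running it as a command.
    cat > "$target" <<'EOF'
[Output.FileOutput]
type=File
filename=/var/opt/mssql/log/palstart_$(_pid).log

[Logger]
level=silent

[Logger.libos.trace]
level=debug
outputs=FileOutput
EOF
}
```

Writing to a scratch path first, for example write_pal_logger_ini /tmp/logger.ini, lets the content be reviewed before it is copied into place. Remove the file once the trace has been captured so the pal*.log files do not keep growing.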