Anything But: Troubleshooting When It’s Not SQL Server
Ben DeBow, CEO, Fortified Data
About Ben DeBow
Founder and CEO of Fortified Data
bdebow@fortifieddata.com | @BBQSQL | www.FortifiedData.com
Specialties / Focus Areas
- Scaling Systems
- Performance Tuning
- Private/Public Clouds
- Infrastructure Design
- BBQ: Smoking out
- High Availability/Disaster Recovery
- Health and Efficiency
- Consolidated Architectures
- Deep-dive Troubleshooting
- BBQ Judge: Tasty Job
Agenda
- Identifying the Problem
- Application Stack and SQL
- Top Non-SQL Issues
- Supporting Complex Environments
Classic Issue
Application Owner: My application is not responding, and I think it is the database.
DBA: I just looked, and nothing is happening on SQL Server.
Application Owner: But my issue is still happening. Can you check again?
DBA: I just did, but let me now engage the other infrastructure teams.
The Game: Clue
Who did it? Mrs. White
Where? Library
How? Candlestick
When? Last week
Define the Problem
- Too often, we jump straight in
- Clearly define the problem
- Ask the right questions
- Understand the impact
  - Business and users
  - Data and systems
Fundamentals
- Talk facts: hard data only
- Too many times, we work from hypotheticals
- Rely on fundamentals
- Eliminate and minimize options
- Do not get tunnel vision
Today
Applications
- No longer 2 or 3 tier
- Multiple languages
- Thousands or millions of users
Infrastructure
- No longer big, physical servers
- Separate groups for each technology
Know the Environment: Configuration
Need to know the current state!
- BIOS and hypervisor
- Operating system
- SQL Server
Supporting Infrastructure
- Disk subsystem
- Server hardware
- Network topology
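Knowing the current state is easier when it can be diffed against a known-good baseline, such as a certified build or template. Below is a minimal Python sketch of that comparison. It assumes the settings have already been collected into a dictionary. The setting names and baseline values are hypothetical examples, not recommendations.

```python
# Minimal configuration-drift check: compare a server's current state
# against a baseline (for example, a certified build or template).
# Setting names and values below are HYPOTHETICAL examples.

BASELINE = {
    "bios.power_profile": "Maximum Performance",
    "bios.c_states": "Disabled",
    "os.power_plan": "High performance",
    "sql.max_server_memory_mb": 57344,
}


def find_drift(baseline: dict, current: dict) -> dict:
    """Return {setting: (expected, actual)} for every setting that differs.

    A setting missing from `current` is reported with actual=None, so gaps
    in data collection show up instead of being silently ignored.
    """
    drift = {}
    for setting, expected in baseline.items():
        actual = current.get(setting)
        if actual != expected:
            drift[setting] = (expected, actual)
    return drift


if __name__ == "__main__":
    current = {
        "bios.power_profile": "Balanced",
        "bios.c_states": "Disabled",
        "os.power_plan": "High performance",
    }
    for setting, (expected, actual) in find_drift(BASELINE, current).items():
        print(f"{setting}: expected {expected!r}, found {actual!r}")
```

Reporting missing settings as drift is a deliberate choice. A setting nobody collected is part of the state you do not know.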
Know the Environment: Application
- OLTP versus OLAP
  - Ratio: many environments are mixed
  - Batch processes, reporting
- High-level application architecture
- Interfaces
- Scheduled events
- Environment (virus updates, patching, maintenance)
Application Stack
[Diagram: Load Balancer, Web servers, Web Cache, Web Application servers, SQL Server]
Virtualization Issues
Problem: SQL Server looks fine, but a batch process runs hours longer after going virtual.
DEMO
Demo 1: Host-level contention is impacting the workload
- How do we see it? There are no large CPU consumers, yet scheduler pressure inside SQL Server is high
- Look for co-stop times
- CPU Ready time
Demo 2: DRS is set to aggressive
- ESX hosts over-allocated from day 1
- Very high number of vMotions
- The SQL VM sees different run times at the workload level
- You may see scheduler pressure or slight slowness while a vMotion is occurring
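vSphere reports CPU Ready and co-stop as "summation" counters. These are milliseconds accumulated over one sample interval: 20 seconds for real-time charts, and longer for historical rollups. Raw values are therefore hard to compare across charts and VMs. Below is a minimal Python sketch of the usual conversion to a percentage. It assumes the value is the VM-level total across all vCPUs. The function name and sample numbers are hypothetical.

```python
# Convert a vSphere CPU "summation" counter (CPU Ready or Co-stop, reported
# in milliseconds accumulated over one sample interval) into a percentage.
# ASSUMPTION: the value is the VM-level total across all vCPUs, so it is
# divided by the vCPU count to get a per-vCPU figure.


def percent_of_interval(summation_ms: float, interval_s: float = 20.0,
                        vcpus: int = 1) -> float:
    """Percentage of the sample interval each vCPU spent in the counted state."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return summation_ms / (interval_s * 1000.0 * vcpus) * 100.0


if __name__ == "__main__":
    # Hypothetical sample: 4,000 ms of CPU Ready in one 20 s real-time sample
    # for a 4-vCPU SQL Server VM.
    ready = percent_of_interval(4000, interval_s=20, vcpus=4)
    print(f"{ready:.1f}% ready per vCPU")
```

For historical data, pass the rollup interval instead of 20 s. For example, use 300 s for past-day charts.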
CPU Issues
Problem: Scaling the application and process to 50,000+ transactions/sec, but hitting a ceiling and seeing delays.
DEMO
BIOS configuration
- C-states: all disabled, or C2 and C3 disabled
- Power settings
- Memory
CPU
- Spec the CPU for the workload
- Non-licensed CPUs
- Non-licensed CPUs with multiple instances
- Lower frequency
- Older generation: DL980 versus DL580
Storage Issues
Problem: Slow IO, but there is a large SAN with 100% solid-state drives.
DEMO
Demo 1: Storage contention
- Fewer IOs but higher latency
- All the time or periodically?
Demo 2: Bus saturation
- Higher IOPS or, more importantly, high Disk Bytes/sec
- Divide by 1024 and know the configuration
- HBA or network card speed
- Which PCI slot is it in?
Misconfiguration
- iSCSI: jumbo frames
- PVSCSI versus LSI
- Pathing
- Block size
- Thin versus thick provisioning (old school)
- IO affinity
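To check for bus saturation, compare Disk Bytes/sec with what the HBA or NIC can physically carry. Below is a minimal Python sketch of that arithmetic. It assumes link speed is given in decimal gigabits per second. It also ignores encoding and protocol overhead, so real headroom is smaller than the result suggests. The function names and sample values are hypothetical.

```python
# Rough bus-saturation check: how close is the observed Perfmon
# "Disk Bytes/sec" to the raw line rate of the HBA or NIC carrying the IO?
# ASSUMPTIONS: link speed in decimal Gb/s; encoding/protocol overhead ignored,
# so actual headroom is smaller than this shows.


def mib_per_sec(disk_bytes_per_sec: float) -> float:
    """Disk Bytes/sec divided by 1024 twice gives MiB/sec."""
    return disk_bytes_per_sec / 1024 / 1024


def link_utilization(disk_bytes_per_sec: float, link_gbps: float,
                     links: int = 1) -> float:
    """Fraction of the raw link bandwidth the IO is consuming (can exceed 1.0)."""
    capacity_bytes_per_sec = link_gbps * 1e9 / 8 * links
    return disk_bytes_per_sec / capacity_bytes_per_sec


if __name__ == "__main__":
    observed = 1_000_000_000  # hypothetical Disk Bytes/sec sample
    print(f"{mib_per_sec(observed):.0f} MiB/sec")
    print(f"{link_utilization(observed, link_gbps=10):.0%} of one 10 Gb/s link")
```

If utilization is high on one link while other paths sit idle, look at pathing and PCI slot placement before blaming the SAN.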
Network Issues
Problem: Reports run slowly, but I have tuned the stored procedures.
DEMO
Demo 1: Report returning 1 GB of data
- ASYNC_NETWORK_IO wait times
- OLEDB waits
Demo 2: Network transfer times (latency)
- DR replication
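Both demos come down to moving bytes across a wire. Below is a back-of-the-envelope Python estimate of transfer time. It assumes a single TCP stream, no protocol overhead, and a client that reads as fast as data arrives. A slow-reading client, which is what often shows up as ASYNC_NETWORK_IO on SQL Server, is not modeled. The function name, window size, and RTT values are hypothetical.

```python
from typing import Optional

# Back-of-the-envelope transfer time for a large result set or a replication
# stream. ASSUMPTIONS: one TCP stream, no protocol overhead, and a client that
# consumes data as fast as it arrives (a slow client is NOT modeled here).


def estimated_transfer_seconds(payload_bytes: float, link_gbps: float,
                               rtt_ms: float = 0.0,
                               window_bytes: Optional[float] = None) -> float:
    """Seconds to move payload_bytes across the link.

    When window_bytes is given, throughput is capped at one window per round
    trip, because a single TCP stream can have at most one window of
    unacknowledged data in flight.
    """
    rate_bps = link_gbps * 1e9
    if window_bytes is not None and rtt_ms > 0:
        rate_bps = min(rate_bps, window_bytes * 8 / (rtt_ms / 1000.0))
    return payload_bytes * 8 / rate_bps


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Same 1 GiB payload: on a local 1 Gb/s LAN, and across a hypothetical
    # 50 ms WAN link (for example, to a DR site) with a 64 KiB window.
    lan = estimated_transfer_seconds(one_gib, 1)
    wan = estimated_transfer_seconds(one_gib, 1, rtt_ms=50, window_bytes=64 * 1024)
    print(f"LAN: {lan:.1f} s")
    print(f"WAN: {wan:.1f} s")
```

The WAN case shows why latency, not bandwidth, often dominates DR replication. With a small window, a faster link does not help.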
Shared Server Issues
Problem: The same processes run sometimes slow and sometimes fast during the day.
DEMO
Demo 1: Shared server scenario
- SSRS, SSAS, SSIS: 12 different servers
- Each Database Engine instance configured with most of the memory
- CPU affinity set
- Application components
- Competing local resources
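On a shared server, each component can look reasonably configured on its own while the total is impossible. Below are two quick Python checks: whether the configured memory limits (for example, max server memory on each Database Engine instance) add up to more than the box has, and whether CPU affinity settings overlap. The instance names, limits, CPU lists, and 4 GB OS reserve are hypothetical values, not sizing guidance.

```python
from __future__ import annotations

from itertools import combinations

# Two quick checks for a shared server running several SQL Server components.
# Instance names, memory limits, CPU lists, and the 4 GB OS reserve are
# HYPOTHETICAL values for illustration, not sizing guidance.


def memory_overcommit_gb(physical_gb: float, caps_gb: dict[str, float],
                         os_reserve_gb: float = 4.0) -> float:
    """GB by which configured memory limits plus an OS reserve exceed RAM.

    A positive result means the components cannot all reach their limits
    at the same time.
    """
    return sum(caps_gb.values()) + os_reserve_gb - physical_gb


def affinity_overlaps(affinity: dict[str, set[int]]) -> list[tuple[str, str, set[int]]]:
    """Pairs of components whose CPU affinity settings share logical CPUs."""
    overlaps = []
    items = sorted(affinity.items(), key=lambda kv: kv[0])
    for (a, cpus_a), (b, cpus_b) in combinations(items, 2):
        shared = cpus_a & cpus_b
        if shared:
            overlaps.append((a, b, shared))
    return overlaps


if __name__ == "__main__":
    caps = {"SQL01": 110, "SQL02": 110, "SSAS": 64}
    print(f"Overcommitted by {memory_overcommit_gb(128, caps):.0f} GB")
    cpus = {"SQL01": {0, 1, 2, 3}, "SQL02": {2, 3, 4, 5}, "SSAS": {6, 7}}
    for a, b, shared in affinity_overlaps(cpus):
        print(f"{a} and {b} both use CPUs {sorted(shared)}")
```

Overlap is not always wrong. However, it explains why the same process can be fast in one hour and slow in the next, depending on what else is running.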
Application Issues
Problem: Application calls take a long time, but the code is tuned and fast.
DEMO
Demo 1: High number of round trips
- Show how to capture the data
- Application cursors
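Tuned statements can still add up to a slow application. Every call pays a round trip, however quickly SQL Server executes it, and row-by-row fetches from application cursors follow the same pattern. The call count itself can be captured with a trace, for example the Extended Events `rpc_completed` and `sql_batch_completed` events. Below is a minimal Python sketch of the arithmetic. It assumes total server work stays the same whether it is split into many small calls or a few batched ones. All numbers are hypothetical.

```python
# Why a "tuned and fast" call can still be slow end to end: every round trip
# pays network latency, no matter how quickly SQL Server runs the statement.
# ASSUMPTION: total server work is the same whether it is split across many
# small calls or a few batched ones. All numbers are HYPOTHETICAL.


def round_trip_seconds(calls: int, rtt_ms: float, server_ms_per_call: float) -> float:
    """Elapsed time for sequential requests: (latency + server time) per call."""
    return calls * (rtt_ms + server_ms_per_call) / 1000.0


if __name__ == "__main__":
    total_server_ms = 1000.0  # same total work in both cases
    for calls in (5000, 10):
        per_call = total_server_ms / calls
        elapsed = round_trip_seconds(calls, rtt_ms=1.0, server_ms_per_call=per_call)
        print(f"{calls:>5} calls: {elapsed:.2f} s")
```

In the chatty case, most of the elapsed time is latency. None of that time appears as expensive queries on the SQL Server side.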
Root Cause Analysis Defined
Create for critical events
- Issue
- Timeline
- Findings: facts only
- Next steps and recommendations
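The RCA structure (issue, timeline, findings, next steps) is simple enough to capture as a small record. That keeps every critical-event write-up in the same shape and the timeline in order. Below is a minimal Python sketch. The class and field names and the sample event are hypothetical, not a prescribed template.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

# A minimal RCA record: Issue, Timeline, Findings (facts only), and
# Next Steps and Recommendations. Class and field names are HYPOTHETICAL.


@dataclass
class TimelineEntry:
    at: datetime
    event: str


@dataclass
class RootCauseAnalysis:
    issue: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)  # facts and hard data only
    next_steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Plain-text report with the timeline sorted chronologically."""
        lines = [f"Issue: {self.issue}", "Timeline:"]
        for entry in sorted(self.timeline, key=lambda e: e.at):
            lines.append(f"  {entry.at:%Y-%m-%d %H:%M}  {entry.event}")
        lines.append("Findings:")
        lines.extend(f"  - {finding}" for finding in self.findings)
        lines.append("Next Steps and Recommendations:")
        lines.extend(f"  - {step}" for step in self.next_steps)
        return "\n".join(lines)


if __name__ == "__main__":
    rca = RootCauseAnalysis(
        issue="Nightly batch ran 3 hours longer than baseline",  # hypothetical
        timeline=[
            TimelineEntry(datetime(2024, 1, 9, 2, 40), "Batch completed"),
            TimelineEntry(datetime(2024, 1, 8, 23, 0), "Batch started"),
        ],
        findings=["CPU Ready above baseline on the host during the batch window"],
        next_steps=["Review host allocation with the virtualization team"],
    )
    print(rca.render())
```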
Troubleshooting and Support
- Monitor the full stack
- Validate the configuration
- Leverage certified builds
- Right-size the full stack
- Other application VMs on ESX
- Capacity planning
- Application profiling
- Work with other teams to solve issues
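Right-sizing the full stack means counting the other application VMs on the same ESX host, not just the SQL Server VM. Below is a minimal Python sketch of the two ratios usually checked first. It assumes ratios are computed against physical cores (not hyperthreads) and physical memory. The VM names and sizes are hypothetical.

```python
from dataclasses import dataclass

# Rough host allocation check for capacity planning: vCPUs allocated per
# physical core and VM memory allocated per GB of physical memory on one host,
# including the other application VMs alongside the SQL Server VM.
# ASSUMPTION: physical cores only (hyperthreads not counted).
# VM names and sizes below are HYPOTHETICAL.


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gb: float


def host_allocation(vms: list, physical_cores: int, physical_memory_gb: float) -> dict:
    """Allocation ratios for one host; values above 1.0 mean overcommitment."""
    return {
        "vcpu_per_core": sum(vm.vcpus for vm in vms) / physical_cores,
        "memory_ratio": sum(vm.memory_gb for vm in vms) / physical_memory_gb,
    }


if __name__ == "__main__":
    vms = [VM("sql01", 16, 128)] + [VM(f"app{i:02d}", 8, 32) for i in range(1, 4)]
    ratios = host_allocation(vms, physical_cores=24, physical_memory_gb=256)
    print(f"vCPU per core: {ratios['vcpu_per_core']:.2f}")
    print(f"memory allocated / physical: {ratios['memory_ratio']:.2f}")
```

A host that looks fine on memory can still be overcommitted on CPU. When that happens, CPU Ready and co-stop are the symptoms to check.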
Summary
- Define the issue clearly
- Facts and hard data only
- Validate the configurations, even if built from templates
- Work with other teams
  - Understand the other technologies at a high level
- Monitor and collect historical data
Questions