Troubleshooting beyond what you understand
Or: How to figure out what’s broken so you can get some help from the real owner, because your stuff never breaks. Right?
Ryan McCauley, #492 – Phoenix 2016
Ryan McCauley
VB6/VB.NET developer for 10 years
Full-time DBA/T-SQL developer for 6 years
Currently employed by Cable ONE as Data and Reporting Manager
Microsoft Certified Professional (MCTS – SQL 2008 DBA)
Active on Experts-Exchange and StackOverflow
Blog:
SQL SATURDAY | #492 | PHOENIX 2016
It Was a Dark and Stormy Night
Also, applications are broken somewhere…
Connections to SQL Server are intermittent
Information comes in slowly
Talk about the rotating DNS issue
Agenda Today
Ground rules
Techniques
Major symptoms
Common confusion
Next steps
Ground Rules
Ground Rules
Never say “random”, say “intermittent”
It’s not just your components: consider their interactions and what’s around them
“Intermittent” is something you don’t yet understand, but it always has a cause
When you say “random”, you’re saying you can’t own it because it’s not in your control
Given the same inputs, computers always behave consistently
See everything as something you own and can influence; you’re not helpless
Ground Rules
Something always changed! Always!
Just maybe not on purpose
Don’t take anything for granted, both in this class and in troubleshooting
Monitoring only has a single perspective
Techniques
Techniques
Figure out what it’s not
If that’s true, what else would be true?
Make the problem as small as possible
You need to isolate it to prove it, especially to others
Does it work at all? Where can you connect from?
Reproduce the problem in a second location, with as many differences as possible
Myers-Briggs: S types (focus on resolving the individual examples) vs. N types (every example needs to fit a pattern first)
Techniques
Is it consistent? Can you find somewhere it’s not broken?
Are the same sources always broken?
How long does it take when it does run? Does it vary? Is it quick or slow?
DAC FTP issue: 1 server takes 0.5 seconds, the other 7 take seconds, even for a failed login
Shared vs. dedicated components: which components are shared and which are dedicated?
VMs can dramatically complicate things, because everything is shared and live migration is seamless
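To answer “does the time vary?” with numbers instead of impressions, you can time repeated connection attempts against each server and compare them side by side, including the failures. The sketch below assumes Python 3 and only the standard library; `time_connects` and `summarize` are hypothetical helper names, and a plain TCP connect is used as a stand-in for whatever your real client does (an FTP login, a SQL query).

```python
import socket
import statistics
import time


def time_connects(host, port, attempts=5, timeout=5.0):
    """Attempt a TCP connection `attempts` times; return (seconds, outcome) pairs.

    Failed attempts are timed too: a failure that takes 5 seconds tells you
    something different from a failure that takes 5 milliseconds.
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                outcome = "ok"
        except socket.timeout:
            outcome = "timeout"
        except OSError as exc:
            outcome = type(exc).__name__  # e.g. ConnectionRefusedError
        results.append((time.perf_counter() - start, outcome))
    return results


def summarize(durations):
    """Min/median/max of a list of durations, to spot a slow or erratic target."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Run `time_connects` against each of the servers (for FTP, port 21) and pass the durations to `summarize`; one server with a median far from the others, as in the DAC FTP example, points at whatever that server does not share with the rest.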
Simplify everything!
Things your service depends on
How they get to your service
Your service
Customers
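One way to apply this simplified picture is to write one check per layer and walk them in order, so the first failing layer is where you start isolating. This is a minimal sketch, assuming Python 3; `first_broken_layer` and the layer names are hypothetical, and the lambdas stand in for real checks such as a port probe or a test query.

```python
from typing import Callable, List, Optional, Tuple

# A layer is a name plus a zero-argument check that returns True if healthy.
Layer = Tuple[str, Callable[[], bool]]


def first_broken_layer(layers: List[Layer]) -> Optional[str]:
    """Walk layers in order and return the name of the first failing one.

    Order the list like the slide: dependencies first, then the path to your
    service, then the service itself, then the customer's view.
    A check that raises an exception is treated as a failure.
    """
    for name, check in layers:
        try:
            healthy = check()
        except Exception:
            healthy = False
        if not healthy:
            return name
    return None  # every layer passed


if __name__ == "__main__":
    # Hypothetical checks for illustration only.
    layers = [
        ("dependencies", lambda: True),
        ("path to service", lambda: False),
        ("service", lambda: True),
        ("customers", lambda: True),
    ]
    print(first_broken_layer(layers))  # prints "path to service"
```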
Major symptoms – cheat sheet
Major Symptoms, part 1
Never works: firewall, or the app isn’t listening
Intermittently not accessible: what’s changing? Load balancer/cluster?
Always slow but consistent: hardware config/resources; likely not load on shared components
Major Symptoms, part 2
Intermittent slowness: hardware bottleneck or shared resource?
Unchanging or predictable: more likely configuration
Shifting or unpredictable: more likely capacity somewhere
VMs are a shared component, which makes the impact harder to see
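The two cheat-sheet slides amount to a small decision table. Encoding it makes the branching explicit; this is a minimal sketch in Python 3, and the enum and function names are hypothetical. It only returns where the slides suggest looking first, not a diagnosis.

```python
from enum import Enum


class Reachability(Enum):
    NEVER = "never works"
    INTERMITTENT = "intermittently not accessible"
    RELIABLE = "always reachable"


class Speed(Enum):
    FINE = "performance is fine"
    SLOW_CONSISTENT = "always slow but consistent"
    SLOW_PREDICTABLE = "intermittently slow, unchanging or predictable"
    SLOW_UNPREDICTABLE = "intermittently slow, shifting or unpredictable"


def first_suspect(reachability: Reachability, speed: Speed = Speed.FINE) -> str:
    """Map the cheat-sheet symptoms to where the slides suggest looking first."""
    # Reachability problems come first: no point tuning speed if it never connects.
    if reachability is Reachability.NEVER:
        return "firewall, or the app is not listening"
    if reachability is Reachability.INTERMITTENT:
        return "something is changing: load balancer or cluster?"
    suspects = {
        Speed.SLOW_CONSISTENT: "hardware config/resources, likely not load on shared components",
        Speed.SLOW_PREDICTABLE: "more likely configuration",
        Speed.SLOW_UNPREDICTABLE: "more likely capacity somewhere (VMs are shared components)",
        Speed.FINE: "no major symptom",
    }
    return suspects[speed]
```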
Common Confusion
Common Confusion
Login failures vs. firewall timeouts
Ever used TCPING? Know common ports!
Firewall rules: when are they evaluated?
If somebody says “Kerberos”, it’s probably not
Ping isn’t the same as making sure the path is open! Ping (ICMP) doesn’t use a TCP port at all
Talk about subnets/VLANs
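What a TCPING-style check adds over ping is a real TCP handshake to a specific port, and the way it fails separates a login failure from a firewall problem. The sketch below assumes Python 3 and only the standard library; `tcp_probe` and `COMMON_TCP_PORTS` are hypothetical names, and the port list is a short, non-exhaustive selection of well-known defaults (a named SQL Server instance usually listens on a different port).

```python
import socket

# A few well-known default TCP ports worth memorizing (not exhaustive).
COMMON_TCP_PORTS = {
    "FTP (control)": 21,
    "DNS (TCP fallback)": 53,
    "Kerberos": 88,
    "LDAP": 389,
    "HTTPS": 443,
    "SMB": 445,
    "SQL Server (default instance)": 1433,
    "RDP": 3389,
}


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP handshake, the way TCPING does, and report how it ended.

    "open"    - handshake completed; a login failure after this is unlikely to be the firewall
    "refused" - something answered with a reset: host reachable, nothing listening
                (or a device actively rejecting the connection)
    "timeout" - no answer at all, typical of a firewall silently dropping packets
                (or a routing problem)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # e.g. a name resolution failure or "no route to host"
        return f"error: {type(exc).__name__}"
```

For example, `tcp_probe("sqlserver.example.com", COMMON_TCP_PORTS["SQL Server (default instance)"])` (a hypothetical host) returning `"timeout"` sends you to the firewall team, while `"open"` followed by a failed login sends you back to authentication.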
Slightly less dark and stormy…
Let’s approach our outage again
Resolve the DNS issue
If time, talk about either:
Firewall timeouts when we moved reporting servers (5 minutes)
Misaligned disks on clusters = consistently slow read times
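For a rotating DNS problem, a first step is to see what the client’s resolver actually hands back over several lookups. This is a minimal sketch, assuming Python 3 and the standard library; `resolve_addresses` and `sample_resolution` are hypothetical helpers. Local resolver caching and address reordering can hide rotation, so it is worth comparing against `nslookup` pointed at each DNS server in turn.

```python
import socket
from collections import Counter
from typing import List, Optional, Tuple


def resolve_addresses(hostname: str, port: Optional[int] = None) -> List[str]:
    """Return the addresses the local resolver hands back, in the order given."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return [sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos]


def sample_resolution(hostname: str, rounds: int = 5) -> Tuple[Counter, List[str]]:
    """Resolve `hostname` several times.

    Returns how often each address appeared, plus the first address from each
    round, which is usually the one a client tries first. More than one distinct
    first address suggests the name is rotating between servers.
    """
    seen: Counter = Counter()
    first_answers: List[str] = []
    for _ in range(rounds):
        addresses = resolve_addresses(hostname)
        first_answers.append(addresses[0])
        seen.update(set(addresses))
    return seen, first_answers
```

If one of the rotating addresses turns out to be a server that no longer answers, that alone could explain connections that work most of the time.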
Next Steps
Next Steps
Learn about what you don’t know: shadowing, training, ask!
Specialized knowledge isn’t required, but it can help
If you don’t understand a concept, ask
It’s not resolved until you understand why!
Root cause analysis is critical
Don’t let “root cause analysis” be “it’s not happening anymore” or “it resolved itself”; it’s not resolved until you know it’s not going to happen again!
Thanks for attending, and visit the sponsors!