Troubleshooting beyond what you understand

1 Troubleshooting beyond what you understand
Or: How to figure out what’s broken so you can get some help from the real owner, because your stuff never breaks. Right?
Ryan McCauley
#492 – Phoenix 2016

2 Ryan McCauley
VB6/VB.NET developer for 10 years
Full-time DBA/T-SQL dev for 6 years
Currently employed by Cable ONE as Data and Reporting Manager
Microsoft Certified Professional (MCTS – SQL 2008 DBA)
Active on Experts-Exchange and StackOverflow
Blog:
SQL SATURDAY | #492 | PHOENIX 2016

3 It Was a Dark and Stormy Night
Also, applications are broken somewhere…
Talk about the rotating DNS issue
Connections to SQL Server intermittent
Information comes in slowly

4 Agenda Today
Ground rules
Techniques
Major symptoms
Common confusion
Next steps

5 Ground Rules

6 Ground Rules
Never say “randomly”, say “intermittent”
It’s not just your components
Consider their interaction and what’s around them
“Intermittent” is something you don’t yet understand, but it always has a cause
When you say “random”, you’re saying you can’t own it because it’s not in your control
Given the same inputs, the behavior of computers is always consistent
See everything as something you own and can influence – you’re not helpless

7 Ground Rules
Something always changed! Always!
Just maybe not on purpose
Don’t take anything for granted!
Both in this class and in troubleshooting
Monitoring only has a single perspective

8 Techniques

9 Techniques
Figure out what it’s not
If that’s true, what else would be true?
Make the problem as small as possible
Need to isolate it to prove it
Does it work at all? Where can you connect from?
Myers-Briggs: S (focus on resolving the examples) vs. N (every example needs to fit a pattern first)
Small problem: you need to isolate it to prove it, especially to others
Reproduce the problem in a second location with as much different as possible

10 Techniques
Is it consistent?
Can you find somewhere it’s not broken?
Shared vs. dedicated components
VMs can dramatically complicate things
Time it takes when it does run: does it vary? Is it quick or slow?
Are the same sources always broken?
DAC FTP issue: 1 server takes 0.5 seconds, other 7 take seconds, even for failed login
Which components are shared vs. dedicated?
VMs complicate this issue because everything is shared and live migration is seamless
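
The "does it vary?" question on this slide can be answered with numbers rather than impressions. The sketch below is one possible way to do that, not something from the talk: it times repeated runs of any operation and reports how many failed and how far the fastest and slowest runs are apart. The names `time_attempts` and `summarize` are hypothetical, and the operation being timed (a login, a query, a file transfer) is supplied by the caller.

```python
import statistics
import time


def time_attempts(action, attempts=5):
    """Run action() repeatedly; return a list of (succeeded, seconds) tuples.

    Failures are timed too, because a slow failure (for example a failed
    login that takes seconds) is itself a clue.
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            action()
            succeeded = True
        except Exception:
            succeeded = False
        results.append((succeeded, time.perf_counter() - start))
    return results


def summarize(results):
    """Condense the timings so runs from different sources can be compared."""
    durations = [seconds for _, seconds in results]
    return {
        "attempts": len(results),
        "failures": sum(1 for succeeded, _ in results if not succeeded),
        "fastest": min(durations),
        "slowest": max(durations),
        "median": statistics.median(durations),
    }
```

Running the same action from several clients, or against several servers, and comparing the summaries side by side is a quick way to see whether the same sources are always the slow or broken ones.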

11

12 Simplify everything!
Things your service depends on
How they get to your service
Your service
Customers

13 Major symptom – cheat sheet

14 Major Symptoms, part 1
Never works
Firewall or app not listening
Intermittently not accessible
What’s changing? Load balancer/cluster?
Always slow but consistent
Hardware config/resource
Likely not load on shared components

15 Major Symptoms, part 2
Intermittent slowness
Hardware bottleneck or shared resource?
Unchanging or predictable
More likely configuration
Shifting or unpredictable
More likely capacity somewhere
VM as shared component, harder to see the impact
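
The cheat sheet on slides 14 and 15 is effectively a lookup from symptom to first suspect, so it can be written down as one. The sketch below is an illustration only: the dictionary text is taken from the slides, but the function `first_suspect`, the reduction of each observation to a `(succeeded, slow)` pair, and the `predictable` flag are all assumptions made for the example.

```python
# Symptom -> where to look first, paraphrased from slides 14 and 15.
CHEAT_SHEET = {
    "never works": "firewall, or the app is not listening",
    "intermittently not accessible": "what is changing? load balancer or cluster?",
    "always slow but consistent": "hardware config or resource; likely not load on shared components",
    "intermittent slowness, predictable": "more likely configuration",
    "intermittent slowness, unpredictable": "more likely capacity somewhere",
}


def first_suspect(samples, predictable=False):
    """Map observations to a cheat-sheet entry.

    samples: list of (succeeded, slow) booleans, one per attempt.
    predictable: whether the slow attempts follow a pattern you can
    anticipate (hypothetical flag; the observer has to judge this).
    Returns a (symptom, suspect) tuple.
    """
    if not samples:
        raise ValueError("need at least one observation")
    successes = [succeeded for succeeded, _ in samples]
    if not any(successes):
        symptom = "never works"
    elif not all(successes):
        symptom = "intermittently not accessible"
    else:
        slow_flags = [slow for _, slow in samples]
        if all(slow_flags):
            symptom = "always slow but consistent"
        elif any(slow_flags):
            symptom = ("intermittent slowness, predictable" if predictable
                       else "intermittent slowness, unpredictable")
        else:
            return "no symptom observed", ""
    return symptom, CHEAT_SHEET[symptom]
```

The result is only a starting point for the conversation with the component's real owner, not a diagnosis.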

16 Common Confusion

17 Common Confusion
Login failures vs. firewall timeouts
Ever used TCPING?
Know common ports!
Firewall rules – when are they evaluated?
If somebody says “Kerberos”, it’s probably not
Ping isn’t the same as making sure the path is open!
Ping doesn’t use a TCP port at all
Talk about subnets/VLANs
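
Because ping uses ICMP and never touches a TCP port, a successful ping says nothing about whether the port your application needs is reachable. The sketch below is a minimal Python stand-in for that kind of check, not the TCPING tool itself; the function name `tcp_check` and the three-second default timeout are assumptions for illustration.

```python
import socket


def tcp_check(host, port, timeout=3.0):
    """Attempt a plain TCP connection and classify the outcome.

    "open"    - something accepted the connection, so a later login
                failure is an authentication problem, not a path problem.
    "refused" - the host answered but nothing accepted the connection
                (often: the app is not listening on that port).
    "timeout" - no answer at all, which commonly means packets are being
                dropped somewhere on the path, such as by a firewall.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and so on.
        return f"error: {exc}"


# Example call (hypothetical host name; 1433 is the default SQL Server port):
# tcp_check("sqlserver.example.internal", 1433)
```

Running the same check from several machines in different subnets helps answer "where can you connect from?" and narrows down which hop is blocking the path.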

18 Slightly less dark and stormy…
Let’s approach our outage again
Resolve the DNS issue
If time, talk about either:
Firewall timeouts when we moved reporting servers (5 minutes)
Mis-aligned disks on clusters = consistently slow read times
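
The transcript does not record the details of the rotating DNS issue. Assuming it involved one name resolving to several addresses in rotation, with only some of them healthy, the sketch below shows one way to turn an "intermittent" connection problem into a consistent one: resolve every address behind the name and test each directly. The function names `resolve_all` and `check_each` are hypothetical.

```python
import socket


def resolve_all(host, port):
    """Return every distinct address the name currently resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def check_each(host, port, timeout=3.0):
    """Try a TCP connection to each resolved address individually.

    Connecting by address bypasses the rotation, so an address that
    always fails shows up as always failing instead of "sometimes".
    """
    results = {}
    for address in resolve_all(host, port):
        try:
            with socket.create_connection((address, port), timeout=timeout):
                results[address] = "open"
        except OSError as exc:
            results[address] = f"failed: {exc.__class__.__name__}"
    return results


# Example call (hypothetical host name; 1433 is the default SQL Server port):
# check_each("sqlserver.example.internal", 1433)
```

If one address fails every time while the others succeed, the problem has been made small enough to hand to whoever owns that record or that host.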

19 Next Steps

20 Next Steps
Learn about what you don’t know
Shadowing, training, ask!
Specialized knowledge not required, but can help
If you don’t understand a concept, ask
It’s not resolved until you understand why!
Root cause analysis is critical
Don’t let “root cause analysis” be “it’s not happening anymore” or “it resolved itself”: it’s not resolved until you know it’s not going to happen again!

21 Thanks for attending, and visit the sponsors!

22 Platinum Level Sponsors
Gold Level Sponsors Venue Sponsor Key Note Sponsor Pre Conference Sponsor

23 Silver Level Sponsors Bronze Level Sponsors

