Performance Tuning Renegade Target: FUDCon Boston 2009 Version: 1.0 Contact: mmcgrath@fedoraproject.org Topics on performance tuning and web applications. Mike McGrath Fedora Infrastructure 2009-01-05
Target of Discussion Specifically as it relates to web applications How Fedora preps for a release Not so much on coding, more on the infrastructure side In like a lamb, out like a lion
NOT Performance Tuning 'Making Things Faster' An Art A quick fix Hope Your mom
Performance Tuning Metrics (investigation) Science (research) Desires (SLA / what you define / your boss) Solutions Planning (Get the fix designed) Blood, Sweat, Tears, sacrifice (cost) Metrics (Prove you fixed it!) Control (you run the servers, not the other way around)
Obligatory Einstein Quote 'Any intelligent fool can make things bigger and more complex... It takes a touch of genius – and a lot of courage to move in the opposite direction'
Fedora Release Day Key Systems Distribution Marketing Documentation
Fedora Release Day Needs (desires) Site availability Low service time (time for page load) Ease of troubleshooting
Flash Back! Problems SPOF Moin and the wiki Slow page loads / Low concurrency Hanging Goals: Availability, service time, troubleshooting
Problem 1: SPOF Issues One Server to rule them all Performance issues and no flexibility Solutions Scaled out from one site to 4 (via DNS balancing) Cost Simplicity $$
Problem 2: Moin Issues: Flat File Poor performance over NFS Few caching options Solutions Move to a database backend (MediaWiki) Cache Cost Simplicity Man hours
You get the idea Take previous high level views and apply them at the lower level Quick note about queueing Little's Law N = λT N = Average length of the queue λ = Average arrival rate T = Average time a request sits waiting Quicker T's vs adding a new queue
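A small worked example of Little's Law may help make "quicker T's vs adding a new queue" concrete. The numbers below are illustrative assumptions, not measurements from Fedora's servers, and `queue_length` is a hypothetical helper name.

```python
def queue_length(arrival_rate, avg_time):
    """Little's Law: N = lambda * T."""
    return arrival_rate * avg_time


# Hypothetical numbers, chosen only for illustration.
arrival_rate = 50.0  # lambda: requests arriving per second
slow_time = 2.0      # T: seconds a request spends waiting
fast_time = 0.5      # T after tuning

n_slow = queue_length(arrival_rate, slow_time)
n_fast = queue_length(arrival_rate, fast_time)
# Adding a second queue splits the arrivals but leaves T unchanged.
n_per_queue = queue_length(arrival_rate / 2, slow_time)

print("one queue, slow T: ", n_slow)
print("one queue, quick T:", n_fast)
print("two queues, slow T:", n_per_queue, "each,", 2 * n_per_queue, "total")
```

Under these assumptions, a quicker T shrinks the queue itself, while a second queue only spreads the same number of waiting requests across two servers.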
Problem 3: Slow Page Loads (issues) VPN bottleneck (remember our new proxy servers?) Database speed High load on app servers (not a problem, a symptom, a metric) Lots of I/O wait
Problem 3: Slow Page Loads (metrics) ab: What is 'slow'? Load times vs concurrency Pick your target mod_headers: Where is it slow? sar: check machine status (proxy, app, db)
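The "load times vs concurrency" measurement is what ab produces when run with a request count (`-n`) and a concurrency level (`-c`). The sketch below is not ab; it is a rough Python stand-in showing the shape of the measurement. `run_level`, `timed_call`, and `fake_request` are hypothetical names, and `fake_request` only sleeps, so the numbers it prints say nothing about a real site.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fn):
    """Return how long one call to fn takes, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_level(fn, requests, concurrency):
    """Issue `requests` calls with `concurrency` workers and summarize them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_call(fn), range(requests)))
    elapsed = time.perf_counter() - start
    return {
        "concurrency": concurrency,
        "mean_latency_s": sum(latencies) / len(latencies),
        "requests_per_s": requests / elapsed,
    }


def fake_request():
    # Stand-in for a page fetch; a real test would request a URL instead.
    time.sleep(0.01)


results = [run_level(fake_request, 20, c) for c in (1, 2, 4)]
for row in results:
    print(row)
```

Plotting mean latency against concurrency is one way to pick the target: the point where load time crosses whatever has been defined as 'slow'.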
Problem 3: Slow Page Loads (science) Measurements are not problems! Create sar disk graph Demonstrate mod_headers ab example Hits/sec actual
Problem 3: Slow Page Loads (desires) Serve every request Serve it in a defined period of time 'possible' is often a limit of $$ Faster database access (it's shared)
Problem 3: Slow Page Loads (solutions) Proxy caching Decreases service times Scales for more concurrent connections RAID10 on database Iowait, raid5 writes, logs on different disks, the works my-large.cnf Mediawiki caching decreases reads from db
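Why proxy caching both decreases service times and relieves the database can be sketched with a weighted average. This is a simplified model, assuming every request is either a cache hit or a full backend request; the function names and all the numbers are hypothetical.

```python
def mean_service_time(hit_ratio, cache_time, backend_time):
    """Average service time when a fraction hit_ratio is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_time + (1.0 - hit_ratio) * backend_time


def backend_rate(arrival_rate, hit_ratio):
    """Requests per second that still reach the app servers and database."""
    return arrival_rate * (1.0 - hit_ratio)


# Hypothetical numbers, for illustration only.
arrival_rate = 200.0  # requests per second at the proxy
cache_time = 0.01     # seconds to serve a cached page
backend_time = 1.0    # seconds to serve an uncached page

for hit_ratio in (0.0, 0.75):
    print(
        "hit ratio", hit_ratio,
        "mean service time", mean_service_time(hit_ratio, cache_time, backend_time),
        "backend req/s", backend_rate(arrival_rate, hit_ratio),
    )
```

The same model shows the cost side: every hit is a page that was not regenerated, which is where stale content comes from.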
Problem 3: Slow Page Loads (cost) Simplicity $$ (new db server) Caching increases risk of stale content being shown
Problem 4: Hanging (Issues) Pages don't load Can't log in Shell not coming up Dom0 still responsive
Problem 4: Hanging (Metrics) ab sar -W sar -r Other questions What does 'unresponsive' mean? (console timeout)
Problem 4: Hanging (Science) Swap vs swapping (measurement vs activity) Apache and memory usage http://mmcgrath.fedorapeople.org/csi/html-single/
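The difference between swap (a measurement) and swapping (an activity) can be shown by sampling the kernel's swap counters twice and looking at the rate of change, which is roughly what `sar -W` reports as pswpin/s and pswpout/s. The sketch below parses text in the `name value` format of Linux's /proc/vmstat; the sample strings are made up, and `parse_vmstat` and `swap_rates` are hypothetical helper names.

```python
def parse_vmstat(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Pages swapped in and out per second between two samples."""
    return (
        (after["pswpin"] - before["pswpin"]) / interval_s,
        (after["pswpout"] - before["pswpout"]) / interval_s,
    )


# Made-up samples taken 10 seconds apart. The counters are cumulative,
# so large values alone do not mean the machine is swapping right now.
sample_t0 = "pswpin 1000\npswpout 5000\n"
sample_t1 = "pswpin 1000\npswpout 5000\n"
sample_t2 = "pswpin 1600\npswpout 5900\n"

idle = swap_rates(parse_vmstat(sample_t0), parse_vmstat(sample_t1), 10)
busy = swap_rates(parse_vmstat(sample_t1), parse_vmstat(sample_t2), 10)
print("idle:", idle)
print("busy:", busy)
```

In the first interval swap has been used at some point but nothing is moving; in the second, pages are actively moving in and out, which is the condition that makes a shell feel hung.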
Problem 4: Hanging (Desires) Limit failed requests (via MaxClients and a queue) Queue == memory Shell access via ssh or console that is responsive (swap) Providing predictable growth (hits/s / server) Provide 'in meeting estimates' Impress your boss
Problem 4: Hanging (Solutions) Proper MaxClients Other tunables in sysctl Connection limits
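One common rule of thumb for a "proper" MaxClients is to divide the memory Apache is allowed to use by the memory of a typical child process, so that a full set of children never pushes the machine into swapping. The sketch below assumes that rule; it is not necessarily how the Fedora value was chosen. `estimate_max_clients` is a hypothetical helper, and the memory figures are invented.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_child_mb):
    """Largest number of Apache children that fits in RAM without swapping.

    reserved_mb is memory kept back for the OS and other services;
    per_child_mb is the measured resident size of one busy child.
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        return 0
    return int(usable_mb // per_child_mb)


# Hypothetical server: 4 GB RAM, 1 GB reserved, 48 MB per busy child.
max_clients = estimate_max_clients(4096, 1024, 48)
print("MaxClients", max_clients)
```

Requests beyond this limit wait in the queue instead of spawning more children, which is the trade described on the cost slide: fewer hits per second and longer waits under load, in exchange for a server that stays responsive.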
Problem 4: Hanging (Cost) Lower hits / second Raised time to serve requests under high load (the queue) Made assumptions about load type based on release day traffic
Questions Ask anything