Scalable Web Site Antipatterns
Justin Leitgeb
Stack Builders Inc.
Overview
- Based on architectures that have caused significant downtime and pain
- Like the examples in Nygard's book, but with more emphasis on essential rather than accidental properties of the system
Anti-pattern 1: Monotonically increasing data set with rapid growth
- A system that relies on querying all historical data
- Requires joins across mega-tables (hundreds of millions of rows)
- These tables often hold automatically aggregated data
Detection
- Slow query log
- SHOW FULL PROCESSLIST
- SHOW ENGINE INNODB STATUS
- vmstat
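As a minimal sketch of the SHOW FULL PROCESSLIST check, the Python below filters processlist rows for statements that have been running past a threshold. It assumes the rows were already fetched by some MySQL driver as tuples in the statement's column order (Id, User, Host, db, Command, Time, State, Info); the `ProcessRow` type, the `long_running` function, and the 10-second default are illustrative choices, not part of any tool.

```python
from typing import NamedTuple, Optional


class ProcessRow(NamedTuple):
    # Columns in the order SHOW FULL PROCESSLIST returns them
    id: int
    user: str
    host: str
    db: Optional[str]
    command: str          # "Query" for a running statement, "Sleep" for an idle connection
    time: int             # seconds, as reported in the Time column
    state: Optional[str]
    info: Optional[str]   # statement text (FULL avoids truncating it)


def long_running(rows, threshold_s=10):
    """Return rows for statements running at least threshold_s seconds, longest first."""
    hits = [r for r in rows if r.command == "Query" and r.time >= threshold_s]
    return sorted(hits, key=lambda r: r.time, reverse=True)
```

Sleep rows are skipped on purpose: an idle connection with a large Time value is not a slow query.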
Anti-solutions
- Partitioning
- Pre-caching (cron jobs)
- Switching to MyISAM
- NoSQL?
NoSQL
- Out-of-the-box NoSQL solutions (e.g., MongoDB) help with data modeling
- Relax ACID guarantees in favor of CAP trade-offs (e.g., availability under network partitions)
- May make it easier to distribute algorithms
- But:
  - Their storage engines have not yet had as much engineering effort as MySQL's InnoDB
  - They often use the same algorithms (e.g., B-tree indexes)
  - They can require more development time (e.g., Cassandra, where distributed algorithms must be implemented well)
Stop the bleeding
- Cut off long-running queries
- Turn off site sections
- Show a "fail whale" (a friendly over-capacity page)
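A hedged sketch of the first two tactics: `kill_statements` turns processlist data into MySQL `KILL QUERY <id>` statements, which abort the running statement while leaving the connection open, and `handle` serves a static over-capacity page for switched-off sections instead of touching the database. The reduced row shape (Id, User, Command, Time), the `app` user name, the 30-second cutoff, and the `DISABLED_SECTIONS` set are all hypothetical.

```python
# Hypothetical kill switch: sections turned off during an incident
DISABLED_SECTIONS = {"reports"}

FAIL_PAGE = "<h1>We're over capacity. Please try again shortly.</h1>"


def kill_statements(rows, cutoff_s=30, app_user="app"):
    """Build KILL QUERY statements for the app's queries running longer than cutoff_s.

    rows: iterable of (id, user, command, time_s) tuples, a subset of the
    SHOW FULL PROCESSLIST columns.
    """
    return [
        f"KILL QUERY {int(pid)}"
        for pid, user, command, seconds in rows
        if user == app_user and command == "Query" and seconds > cutoff_s
    ]


def handle(section, render):
    """Return (status, body); disabled sections get a cheap static page."""
    if section in DISABLED_SECTIONS:
        return 503, FAIL_PAGE
    return 200, render()
```

Restricting kills to the application's own user is a deliberate choice so that replication or administrative work is never aborted by the script.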
Band-aids
- The obvious ones: adding app servers, memcached, a bigger DB server
  - Adding app servers puts more pressure on the DB server
- HTTP caching (Varnish)
- MySQL tuning (look for things like "Using filesort" in EXPLAIN output)
- Read slaves
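Read slaves only help if read traffic is actually sent to them. Below is a minimal read/write-splitting sketch, assuming hypothetical host names and a deliberately simple rule: plain SELECTs rotate across replicas, while writes, locking reads (`SELECT ... FOR UPDATE`), and reads that must see the caller's own recent writes go to the primary.

```python
import itertools


class Router:
    """Pick a database host per statement (host names are hypothetical)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; None means everything goes to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def host_for(self, sql, needs_fresh=False):
        stmt = sql.lstrip().upper()
        plain_read = stmt.startswith("SELECT") and "FOR UPDATE" not in stmt
        if plain_read and not needs_fresh and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes, locking reads, read-your-writes
```

The `needs_fresh` flag exists because MySQL replication is asynchronous, so a replica can lag behind the primary.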
Solutions
- Hard-limit data volume: look for cases where data decreases in value with time
  - Add features related to scale
- Distributed algorithms and data stores
- Data warehousing
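One way to hard-limit data volume is a retention job that deletes rows once they have lost their value. The sketch assumes a hypothetical `events` table with an integer primary key and a `created_at` epoch-seconds column, and uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere. Deleting in small batches keeps each transaction short, so the purge does not hold locks for long. On MySQL the subquery is unnecessary (`DELETE ... WHERE created_at < ? LIMIT n` works directly, and MySQL rejects LIMIT inside an IN subquery), whereas SQLite accepts LIMIT on DELETE only when built with an optional compile-time flag.

```python
import sqlite3


def purge_old_events(conn, cutoff_epoch, batch_size=1000):
    """Delete events created before cutoff_epoch in batches; return the number deleted."""
    total = 0
    while True:
        with conn:  # each batch commits as its own short transaction
            cur = conn.execute(
                "DELETE FROM events WHERE id IN "
                "(SELECT id FROM events WHERE created_at < ? LIMIT ?)",
                (cutoff_epoch, batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)")
    conn.executemany("INSERT INTO events (created_at) VALUES (?)", [(t,) for t in range(100)])
    conn.commit()
    print(purge_old_events(conn, cutoff_epoch=60, batch_size=7))  # prints 60
```

An index on `created_at` keeps each batch's lookup cheap as the table grows.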
Anti-pattern 2: Allowing "risky" writes to block HTTP responses
- Symptoms:
  - Slow requests
  - Servers hitting MaxClients and returning 500 errors
Possible causes
- Database-backed analytics tracking
- Session management
- Any SQL DML (e.g., UPDATE, DELETE)
Risk increases with:
- The number of requests invoking the write operation
- Traffic
- Concurrent background operations
- The algorithmic complexity of the write
- Slow AWS I/O on EBS
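Why these factors compound can be seen with Little's law, L = λW: the average number of requests in flight (busy workers) equals the arrival rate times the mean time per request. The sketch below uses hypothetical numbers (200 requests/s, 0.25 s of baseline work, half of all requests performing a blocking write, MaxClients set to 150); only the relationship comes from queueing theory, not the figures.

```python
MAX_CLIENTS = 150  # hypothetical Apache MaxClients setting


def busy_workers(rate_per_s, base_s, write_fraction, write_s):
    """Little's law, L = lambda * W, where W is the baseline time per request
    plus the blocking write, weighted by the fraction of requests that do it."""
    mean_s = base_s + write_fraction * write_s
    return rate_per_s * mean_s


healthy = busy_workers(200, 0.25, 0.5, 0.1)   # fast write: about 60 workers
degraded = busy_workers(200, 0.25, 0.5, 1.5)  # write slowed by contention or EBS I/O: 200 workers
print(f"healthy={healthy:.0f} degraded={degraded:.0f} limit={MAX_CLIENTS}")
```

Each item in the list raises one of the inputs: traffic raises the rate, more write-invoking requests raise the write fraction, and complexity, concurrent background work, and slow I/O raise the write time. Once the product exceeds MaxClients, new connections queue, which lengthens response times further.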
Solutions
- Asynchronize!
  - Write to a queue
- Write to memcached or another non-ACID store
  - Later, bring the data into a data warehouse for advanced analytics
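A minimal sketch of "asynchronize" using only the standard library: request handlers call `record`, which enqueues the event without waiting, and a background thread hands events to a sink in batches. The sink callable, queue size, and batch size are hypothetical; in production the sink might publish to a message queue or write to memcached, with a later job loading the data into the warehouse. An in-process queue is lost if the process dies, which may be acceptable for analytics-style data but not for writes that must be durable.

```python
import queue
import threading


class AsyncRecorder:
    """Accept events on the request path without blocking; write them in the background."""

    def __init__(self, sink, maxsize=10_000, batch_size=100):
        self._q = queue.Queue(maxsize=maxsize)
        self._sink = sink              # hypothetical: callable taking a list of events
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self.dropped = 0               # events shed because the queue was full
        self.failed = 0                # events in batches where the sink raised
        threading.Thread(target=self._run, daemon=True).start()

    def record(self, event):
        """Called from request handlers: returns immediately."""
        try:
            self._q.put_nowait(event)
        except queue.Full:
            with self._lock:
                self.dropped += 1      # shed load rather than block the response

    def _run(self):
        while True:
            batch = [self._q.get()]    # wait for at least one event
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            try:
                self._sink(batch)
            except Exception:
                self.failed += len(batch)  # only this worker thread updates `failed`
            finally:
                for _ in batch:
                    self._q.task_done()

    def flush(self):
        """Block until every accepted event has been passed to the sink."""
        self._q.join()
```

For example, `AsyncRecorder(written.extend, batch_size=3)` with a plain list `written` collects events in arrival order. When the queue is full, `record` drops the event instead of waiting, trading some data loss for keeping HTTP responses fast.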
More info
1. Nygard, Michael T. Release It!: Design and Deploy Production-Ready Software. Raleigh, NC: Pragmatic.
2. Fowler, Martin. Patterns of Enterprise Application Architecture. Boston: Addison-Wesley.
3. Kimball, Ralph. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons.
4. Schwartz, Baron. High Performance MySQL. O'Reilly, 2008.