The FTS Case
Alejandro Álvarez on behalf of the FTS team
Introduction about FTS
Implements low-level transferring for LHCb, ATLAS and CMS, and a few other smaller experiments
Multi-level, fair-share transfer scheduler
Maximize resource usage & congestion avoidance
Multiple protocol support
Support for recall from tape
Introduction about FTS
Experiments need to copy a big number of files between sites
They send the list to FTS3
FTS3 decides how many transfers to run (optimizer) and how to share them (scheduler)
FTS3 runs the transfer when suitable
Messages are sent for asynchronous clients and for monitoring
MySQL in FTS3
FTS3 uses MySQL to keep the queue, and the state of each transfer
When scheduling, need to get them from there
On changes, need to update the DB
For the optimizer and monitoring views, need to aggregate
One database used by a few hosts
MySQL in FTS3
Performing well with the DB is necessary for a well-performing service
MySQL could be quite stressed (80% CPU usage wasn't rare)
Architecture changes were (are) considered, but that's very hard!
Can't take years to make things better
The DB was a "low" hanging fruit
MySQL in FTS3
Some ideas were already in place
Each node only accesses an even subset of the tables
Avoids contention
Today
Architecture is still the same
CPU usage now between 14% and peaks of 50%
Way better! We can do more with the same
What changed?
Step 1: Disable backups
Yes, really
DboD scheduled backups
We can afford it: recovering from a 23-hour-old backup is worse than not recovering at all (for us!)
They were damaging us
Symptom: blocked queries and a "FLUSH TABLES WITH READ LOCK;"
Step 1: Disable backups
From time to time we would see MySQL (thus, FTS3) deadlocking
A massive query Q1 may be read-locking table T1
mysqldump tries to get a global lock, blocking updates first, but then mysqldump gets blocked by Q1
All updates get blocked until Q1 is done
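An illustration of that chain, as a minimal SQL sketch (table and column names are hypothetical, not the real FTS3 schema):

-- Session 1: a long-running read (Q1) keeps a table open
SELECT COUNT(*) FROM t_file;

-- Session 2: mysqldump asks for a consistent snapshot
FLUSH TABLES WITH READ LOCK;   -- blocks new writes right away, then waits for Q1 to finish

-- Session 3: a normal FTS3 write
UPDATE t_file SET file_state = 'ACTIVE' WHERE file_id = 42;   -- queued behind the pending global lock

-- Nothing moves until Q1 completes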
Step 1: Disable backups
Don't do this!
Again, we can afford it
Nice to know this can happen
You may be able to live without single-transaction dumps
Or reconsider the long queries
Or use master-slave replication
Step 2: Profile database
Slow queries
Archival of old jobs is the most time consuming part!
Unused indexes
Archive tables… hum
Thanks to the DB people!
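One way to surface slow queries, as a sketch assuming the standard MySQL slow query log (the per-query summaries later in this deck resemble a digest over such a log):

-- Illustrative thresholds; adjust to taste
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;                  -- log statements slower than 1 second
SET GLOBAL log_queries_not_using_indexes = 1;    -- optionally also catch full scans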
Step 3: Reconsider engine type
ARCHIVE is better for data rarely read, never modified
Not indexed
Low disk footprint, fast INSERT
Perfect for the archive tables!
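A minimal sketch of the switch on a hypothetical archive table (the real table and index names differ):

-- ARCHIVE only supports INSERT and SELECT, has no indexes
-- (apart from an optional AUTO_INCREMENT one) and compresses rows on disk,
-- so existing secondary indexes must be dropped first
ALTER TABLE t_job_backup DROP INDEX idx_finish_time;   -- hypothetical index
ALTER TABLE t_job_backup ENGINE = ARCHIVE;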
Step 4: Low hanging fruit
Drop unused and redundant fields
Smaller reads
Reconsider column types
Reconsider index types
Step 4: Low hanging fruit
Reconsider column types
Some string fields (state) could be enums: 1 byte vs ~O(10)
Indexed! Adding values is cheap, deleting/renaming them is expensive
Some string fields could be booleans
And others could be shorter
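For instance, with made-up column names and state values (the real FTS3 schema differs):

-- A state kept as a short string becomes a 1-byte ENUM;
-- appending new values later is cheap, removing or renaming them rewrites the table
ALTER TABLE t_job
  MODIFY job_state ENUM('SUBMITTED', 'READY', 'ACTIVE', 'FINISHED', 'FAILED') NOT NULL;

-- A yes/no string becomes a boolean (TINYINT(1) in MySQL)
ALTER TABLE t_job
  MODIFY reuse_job BOOL NOT NULL DEFAULT 0;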
Step 4: Low hanging fruit
Reconsider index type
BTREE vs HASH index type
HASH only supported by MEMORY engine
Nevermind, then
Step 5: Slow queries
Gives a hint at what indexes to add
Look at EXPLAIN <query>
Indexes improve SELECT, hurt INSERT/DELETE, and maybe UPDATE
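For example, against a made-up t_file table (not the actual FTS3 query):

-- key = NULL and a large rows estimate in the EXPLAIN output point at a missing index
EXPLAIN
SELECT file_id, source_surl
FROM t_file
WHERE file_state = 'SUBMITTED' AND vo_name = 'atlas';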
Step 6: Redundant indexes
(a, b, c) covers (a), (a, b) and (a, b, c)
(a, b, c) is redundant with (b, c, a) only for lookups on all three columns; (b, c) may still be needed, while (a, b) never is
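A small illustration of the leftmost-prefix rule on a made-up table:

CREATE TABLE t_demo (a INT, b INT, c INT, d VARCHAR(32));
CREATE INDEX idx_abc ON t_demo (a, b, c);

-- These can use idx_abc (leftmost prefixes of the index):
SELECT d FROM t_demo WHERE a = 1;
SELECT d FROM t_demo WHERE a = 1 AND b = 2;
SELECT d FROM t_demo WHERE a = 1 AND b = 2 AND c = 3;

-- This one cannot (the leading column a is missing), so a separate (b, c) index may still be justified:
SELECT d FROM t_demo WHERE b = 2 AND c = 3;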
Step 6: Redundant indexes
Very coupled with which queries are run
Queries can be reworded to match an index
Or an index can be added to match the query
More queries using the same index => fewer indexes => good
Harder, because you need to move queries and indexes in lockstep
Step 7: Rewrite queries
Multiple nodes may pick the same entry
SELECT ... FOR UPDATE is bad: it locks the record for potentially a long time
  SELECT job FROM t_job WHERE running = 0 FOR UPDATE;
  UPDATE t_job SET running = 1 WHERE job = X;
Rather, UPDATE ... WHERE + affected rows
  SELECT job FROM t_job WHERE running = 0;
  UPDATE t_job SET running = 1 WHERE job = X AND running = 0;
  mysql_affected_rows() > 0
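The same pattern expressed purely in SQL, with ROW_COUNT() standing in for mysql_affected_rows() (a sketch; the real FTS3 scheduler queries are more involved):

-- Pick a candidate without locking it
SELECT job INTO @picked FROM t_job WHERE running = 0 LIMIT 1;

-- Try to claim it; another node may have claimed it in the meantime
UPDATE t_job SET running = 1 WHERE job = @picked AND running = 0;

-- 1 row affected: this node owns the job; 0 rows: somebody else won, pick again
SELECT ROW_COUNT();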
An example
Retrieving files to recall from tape was very slow
Reading way more than needed

# Attribute    pct   total     min     max     avg      95%  stddev  median
# ============ === ======= ======= ======= ======= ======== ======= =======
# Count          0     450
# Exec time     74 395835s     10s  24060s    880s    3781s   2352s    113s
# Lock time      0    95ms   115us     2ms   211us    515us   161us   159us
# Rows sent      1 263.24k       0    1000  599.01   964.41  440.36  964.41
# Rows examine  99  94.44G 141.27k   5.16G 214.91M 1006.04M 566.21M  23.50M
# Query size     1 377.26k     855     859  858.47   833.10       0  833.10
An example
The original query had a DEPENDENT SUBQUERY, which blows up the number of rows read
It degraded with time, ending up O(N²)
To fix it, both the query and the index had to be considered
Many iterations of EXPLAIN and rewrite
Managed to drop the nested query
Turned into a self-JOIN
An example
To make queries easier, the self-join was wrapped into a view
The 2 PRIMARY + DEPENDENT SUBQUERY plan went away, replaced with three SIMPLE selects
Added an index to make it better
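Not the actual FTS3 query, but a sketch of the same kind of rewrite on a made-up t_file table: a per-row dependent subquery becomes an aggregate exposed as a view and joined back, plus a supporting index.

-- Before: the subquery is re-evaluated once per outer row (DEPENDENT SUBQUERY)
SELECT f.*
FROM t_file f
WHERE f.submit_time = (SELECT MIN(f2.submit_time)
                       FROM t_file f2
                       WHERE f2.source_se = f.source_se);

-- After: one aggregate pass, exposed as a view, then a plain join
CREATE OR REPLACE VIEW v_oldest_per_source AS
SELECT source_se, MIN(submit_time) AS submit_time
FROM t_file
GROUP BY source_se;

SELECT f.*
FROM t_file f
JOIN v_oldest_per_source o
  ON o.source_se = f.source_se AND o.submit_time = f.submit_time;

-- An index covering both the grouping and the join makes it cheaper still
CREATE INDEX idx_source_submit ON t_file (source_se, submit_time);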
An example
Exec time: 395 835s => 13 490s

# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          1     316
# Exec time      2  13490s     10s    128s     43s     76s     21s     42s
# Lock time      0   151ms   227us    14ms   476us   384us     1ms   287us
# Rows sent      3 308.19k     609    1000  998.68  964.41   20.91  964.41
# Rows examine   3   1.23G  42.99k   6.80M   4.00M   5.71M   1.10M   4.06M
# Query size     0 104.04k     334     338  337.15  329.68       0  329.68

No way around knowing your queries, and iterating
TODO: UUIDs are terrible keys
36 characters
InnoDB stores the primary key in every secondary index
A UUID is randomly distributed, which is actually bad:
Scattered writes
Fragmentation
See Percona's blog post on it
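A sketch of the two layouts on hypothetical tables; since every InnoDB secondary index entry carries the primary key, the 36-character random key is paid for many times over:

-- Random 36-character primary key: inserts land all over the clustered index,
-- and every secondary index repeats those 36 characters
CREATE TABLE t_uuid (
  id    CHAR(36) PRIMARY KEY,
  state ENUM('QUEUED', 'DONE') NOT NULL,
  KEY idx_state (state)
);

-- Compact, monotonically growing key: inserts append at the end;
-- the UUID can stay as an ordinary (unique) column if it is still needed
CREATE TABLE t_seq (
  id    BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  uuid  CHAR(36) NOT NULL,
  state ENUM('QUEUED', 'DONE') NOT NULL,
  UNIQUE KEY uq_uuid (uuid),
  KEY idx_state (state)
);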