Mailboxes and MySQL at Zimbra


1 Mailboxes and MySQL at Zimbra
Boris Burtin, Dan Karp

2 Overview
Zimbra Collaboration Server architecture
Why we chose MySQL and InnoDB
Backup/restore implementation
Performance monitoring
Database problems we've had to solve
Questions

3 What is ZCS?
Zimbra Collaboration Server
Email server
AJAX web UI
SOAP API
POP/IMAP
Calendaring
Contacts

4 ZCS Architecture
Client protocols: SOAP/HTTP, POP/IMAP
ZCS: Java, Jetty
OpenLDAP: accounts and server settings
MySQL 5.0: mailbox and message metadata
Filesystem: RFC 822 message content
Lucene: index data for search
Horizontal scaling: each ZCS server has its own file store, MySQL, and OpenLDAP

5 Why Zimbra ♥ MySQL and InnoDB
The best combination of features, performance, robustness, cost, and ease of use available today
Simple things are simple. Don't need a DBA on staff.
Transactions: need to roll back db ops if LDAP or filesystem ops fail
MySQL has a world-class support team
Huge user base and online community means you can usually find the answers you need with an internet search
Open source: we can figure out how the internals work when we need to

6 Server size and throughput
Sample ISP numbers (per server):
  95k mailboxes
  11M messages
  530k messages delivered per day
  464k SOAP requests per day
  6.4M POP requests per day
Hardware:
  HP 460c quad-core servers, 32GB RAM
  Hitachi storage array

7 Backup, restore, crash recovery
ZCS has its own redo log infrastructure
  We don't use MySQL binlog or replication
  Mailbox operations affect the database, filesystem, index, and LDAP
  Database recovery alone is not enough
Full backup dumps the filesystem, database (SELECT INTO OUTFILE) and LDAP
Restore
  Loads most recent full backup
  Replays subsequent redo log operations
ZCS redo log is also used to play uncommitted operations during crash recovery
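The restore sequence on this slide can be sketched as a toy model. This is not ZCS code: the backup layout, the `seq` sequence numbers, the operation format, and the `apply_op` helper are all hypothetical, and a real redo log would also have to cover the filesystem, index, and LDAP.

```python
import copy


def apply_op(state, op):
    """Apply one logged mailbox operation to the in-memory state.

    The op format (kind, mailbox_id, item_id, subject) is hypothetical.
    """
    mailbox = state.setdefault(op["mailbox_id"], {})
    if op["kind"] == "add":
        mailbox[op["item_id"]] = op["subject"]
    elif op["kind"] == "delete":
        mailbox.pop(op["item_id"], None)
    else:
        raise ValueError(f"unknown op kind: {op['kind']}")


def restore(full_backup, redo_log):
    """Load the full backup, then replay only the operations logged after it."""
    state = copy.deepcopy(full_backup["state"])
    for op in sorted(redo_log, key=lambda o: o["seq"]):
        if op["seq"] > full_backup["last_seq"]:
            apply_op(state, op)
    return state


# The backup was taken after operation 2; operations 3 and 4 came later.
backup = {"last_seq": 2, "state": {134: {1: "hello", 7: "old"}}}
redo_log = [
    {"seq": 1, "kind": "add", "mailbox_id": 134, "item_id": 1, "subject": "hello"},
    {"seq": 2, "kind": "add", "mailbox_id": 134, "item_id": 7, "subject": "old"},
    {"seq": 3, "kind": "add", "mailbox_id": 134, "item_id": 2, "subject": "world"},
    {"seq": 4, "kind": "delete", "mailbox_id": 134, "item_id": 1},
]
restored = restore(backup, redo_log)
```

Skipping operations at or before the backup's sequence number keeps the replay from applying the same change twice.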

8 Performance monitoring
Data that we collect
  CPU utilization (system and per-process)
  Disk utilization and throughput per device
  Process memory consumption
  Client request count and server response time
  InnoDB buffer pool hit rate
  InnoDB pages read/written
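The slides do not say how the buffer pool hit rate is computed. One common approach, sketched here as an assumption, is to sample the MySQL status counters `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that had to go to disk) at two points in time and take the ratio of their deltas. The function name and the dict-based samples are illustrative only.

```python
def buffer_pool_hit_rate(before, after):
    """Return the buffer pool hit rate between two status samples.

    Each sample is a dict of counters as reported by SHOW STATUS.
    Returns None when no logical reads happened in the interval.
    """
    requests = (after["Innodb_buffer_pool_read_requests"]
                - before["Innodb_buffer_pool_read_requests"])
    disk_reads = (after["Innodb_buffer_pool_reads"]
                  - before["Innodb_buffer_pool_reads"])
    if requests <= 0:
        return None
    return 1.0 - disk_reads / requests


# Hypothetical samples taken one monitoring interval apart.
sample_t0 = {"Innodb_buffer_pool_read_requests": 1000, "Innodb_buffer_pool_reads": 100}
sample_t1 = {"Innodb_buffer_pool_read_requests": 11000, "Innodb_buffer_pool_reads": 200}
rate = buffer_pool_hit_rate(sample_t0, sample_t1)
```

Using deltas instead of the raw counters gives the hit rate for the interval itself, so a long warm history does not hide a recent drop.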

9 Performance monitoring
More data that we collect
  Message delivery rate and speed
  Slow query count
  Active connections per client protocol
  Java VM garbage collection
Monitoring interfaces: JMX, SOAP

10 Fun database problem #1: tags
A user can set up to 64 tags on a message
Tags are stored as a 64-bit bitfield
  Violates 1st Normal Form, but...
  Optimized for storage: all tags stored as a single long integer
  Optimized for performance: avoids joins and subqueries
  Makes AND/OR queries easy
Problem: bitwise operations in a WHERE clause result in a bad execution plan

11 Fun database problem #1: tags
How to search for a tag combination and use an index?
In practice, users have a small number of distinct tag combinations (I have 80)
The server knows all possible tag combinations for each mailbox
When a user does a tag search, the server determines which tags match and generates a "tag IN (t1, t2, ...)" query
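A minimal sketch of that idea, assuming the server keeps the set of distinct tag bitfields seen in a mailbox. The function names are hypothetical, the column name `tag` follows the slide's wording, and the `%s` placeholders assume a MySQL driver that uses the DB-API "format" parameter style.

```python
def matching_tag_values(known_combinations, all_mask=0, any_mask=0):
    """Return the known bitfields that match an AND mask and/or an OR mask.

    all_mask: every bit set here must be set in the bitfield (AND search).
    any_mask: at least one bit set here must be set (OR search); 0 disables it.
    """
    return sorted(
        combo for combo in known_combinations
        if combo & all_mask == all_mask
        and (any_mask == 0 or combo & any_mask != 0)
    )


def build_tag_query(mailbox_id, known_combinations, all_mask=0, any_mask=0):
    """Build an index-friendly IN query, or return None if nothing can match."""
    values = matching_tag_values(known_combinations, all_mask, any_mask)
    if not values:
        return None
    placeholders = ", ".join(["%s"] * len(values))
    sql = (
        "SELECT id FROM mail_item "
        f"WHERE mailbox_id = %s AND tag IN ({placeholders})"
    )
    return sql, [mailbox_id] + values


# Hypothetical mailbox with six distinct tag combinations.
known = {0b0000, 0b0001, 0b0010, 0b0011, 0b0101, 0b1000}
with_tag_1 = matching_tag_values(known, all_mask=0b0001)
with_tag_2_or_4 = matching_tag_values(known, any_mask=0b1010)
query = build_tag_query(134, known, all_mask=0b0001)
```

The bitwise test runs in the application over a few dozen known combinations, so the database only ever sees equality matches that an index can serve.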

12 Fun problem #2: inbox query
Select the latest messages in the inbox folder, sorted by date:
mysql> EXPLAIN SELECT id, ...
    -> FROM mboxgroup34.mail_item
    -> WHERE mailbox_id = 134
    -> AND type = 5
    -> AND flags IN (0, 1, 2, 3, ...)
    -> AND folder_id = 2
    -> ORDER BY date DESC LIMIT 0, 33\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mail_item
         type: ref
possible_keys: ...
          key: i_flags_date
      key_len: 4
          ref: const
         rows: 16427
        Extra: Using where; Using filesort
i_flags_date is on (mailbox_id, flags, date)
Using filesort: not so good

13 Fun problem #2: inbox query
Solution: force MySQL to use the index on (mailbox_id, folder_id, date)
  Mailbox has 128k messages, 37k in INBOX
  LIMIT returns a small subset of rows
  Do an index sort first, then select data
mysql> EXPLAIN SELECT id, ...
    -> FROM mboxgroup34.mail_item
    -> FORCE INDEX (i_folder_id_date)
    -> WHERE mailbox_id = 134
    -> AND type = 5
    -> AND flags IN (0, 1, 2, 3, ...)
    -> AND folder_id = 2
    -> ORDER BY date DESC LIMIT 0, 33\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mail_item
         type: ref
possible_keys: i_folder_id_date
          key: i_folder_id_date
      key_len: 9
          ref: const,const
         rows: 43512
        Extra: Using where
i_folder_id_date is on (mailbox_id, folder_id, date)
Row count is misleading
No filesort: better
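The effect of the index choice on sorting can be reproduced without a MySQL server. The sketch below swaps in SQLite, so it is an analogy and not the MySQL optimizer: `INDEXED BY` plays the role of `FORCE INDEX`, and `EXPLAIN QUERY PLAN` stands in for `EXPLAIN`. The table is reduced to the columns named on the slides, and the column types and the closed flag list `(0, 1, 2, 3)` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mail_item (
        mailbox_id INTEGER, id INTEGER, type INTEGER,
        folder_id INTEGER, flags INTEGER, date INTEGER
    );
    CREATE INDEX i_flags_date ON mail_item (mailbox_id, flags, date);
    CREATE INDEX i_folder_id_date ON mail_item (mailbox_id, folder_id, date);
""")

QUERY = """
    EXPLAIN QUERY PLAN
    SELECT id FROM mail_item INDEXED BY {index}
    WHERE mailbox_id = 134 AND type = 5
      AND flags IN (0, 1, 2, 3) AND folder_id = 2
    ORDER BY date DESC LIMIT 33
"""


def plan(index):
    """Return the plan steps SQLite reports when restricted to one index."""
    rows = conn.execute(QUERY.format(index=index)).fetchall()
    return [row[-1] for row in rows]  # the last column holds the step text


def needs_sort(index):
    """True if the plan includes a separate sort step for ORDER BY."""
    return any("TEMP B-TREE" in step for step in plan(index))


for name in ("i_flags_date", "i_folder_id_date"):
    print(name, plan(name), needs_sort(name))
```

With (mailbox_id, folder_id, date), both equality conditions pin the index prefix, so rows already come out in date order and the LIMIT can stop the scan early. With (mailbox_id, flags, date), the IN list spans several flag values, so date order is only available within each value.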

14 Problem #3: data partitioning
How to store data for many mailboxes with maximum stability and performance?
Initial design: all mailboxes in one database
  Pros
    Purest, simplest design
  Cons
    Huge mail_item table
    Filesystem corruption means all mailboxes are hosed

15 Problem #3: data partitioning
Second design: database per mailbox
  Pros
    Filesystem corruption in one mailbox doesn't affect others
    Small table size
  Cons
    Limit of 32k databases imposed by ext3 and other filesystems (MySQL bug 20796)
    InnoDB doesn't scale with large schemas (MySQL bug 20877). 2-3GB of table metadata for 100k tables.
    InnoDB index dives
      1MB of data read when a database is first accessed: > 1GB of data read per 1000 mailboxes
      Index dives recur when a table grows by 10%

16 Problem #3: data partitioning
Final design: mailbox groups
  Mailboxes distributed among 100 databases
  About 900 tables total
  Not the smallest table size, but hundreds of thousands of rows per table is much better than millions
Result: good performance, InnoDB reads are under control
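The slides do not spell out how a mailbox is assigned to a group. The sketch below assumes a simple modulo mapping, chosen because it agrees with the earlier query example, where mailbox 134 lives in database mboxgroup34. The function name and the 1-based group numbering are assumptions.

```python
NUM_GROUPS = 100  # "Mailboxes distributed among 100 databases"


def mailbox_group_db(mailbox_id, num_groups=NUM_GROUPS):
    """Map a mailbox id to its group database name (assumed modulo scheme)."""
    if mailbox_id < 1:
        raise ValueError("mailbox ids are assumed to start at 1")
    group_id = (mailbox_id - 1) % num_groups + 1  # groups numbered 1..num_groups
    return f"mboxgroup{group_id}"


# Build a fully qualified table name for a query against one mailbox.
table = f"{mailbox_group_db(134)}.mail_item"
```

A fixed mapping like this needs no lookup table, and consecutive mailbox ids spread evenly across the groups.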

17 Wrap-up
Zimbra is a happy MySQL customer
  40M mailboxes deployed worldwide
Existing issues
  IMAP performance
  Slow migration
    Would really help if we could add/drop columns without rewriting the whole table
  Taming buffer pool requirements
    Hopefully InnoDB plugin table compression in 5.1 will help
Questions?

