1
Creating an Email Archiving Service with Amazon S3
Bill Boebel, CTO of Webmail.us & Mark Washenberger, Software Engineer at Webmail.us
2
Replace your tape drives with something truly scalable
3
Who are we?
4
An email hosting company
- Blacksburg, VA
- founded in 1999 by two Virginia Tech students
- 54 employees
- 47,261 customers
- 476,130 email accounts
- 200 resellers
5
Amazon Web Services (AWS)
Infrastructure:
- S3 = Simple Storage Service
- EC2 = Elastic Compute Cloud (virtual servers)
- SQS = Simple Queue Service
E-Commerce & Data:
- ECS = E-Commerce Service
- Historical Pricing
- Mechanical Turk
- Alexa
6
Example Uses
- Data backup (S3) - Altexa, JungleDisk
- Content delivery (S3) - Microsoft (MSDN Student Download program)
- Live application data (S3) - 37signals
- Image repository (S3) - SmugMug
7
Example Uses
- Audio/video streaming (EC2 + S3) - Jamglue, GigaVox
- Web indexing (EC2 + S3) - Powerset
- Development (EC2 + S3) - UC Santa Barbara
8
Our Use... Backing up Email Data
9
Backing up the old way (tape)
Not smart
- file system diffs, but... maildir filenames change
- wasteful (needless I/O, duplicates = $$$)
Does not scale
- 100s of servers = slow backups
- needed more tape systems... ugh
Hard to add on to
- we like to build stuff
10
Possible solutions
- Commercial storage systems - e.g. Isilon, NetApp... $$$
- Clustered file systems - e.g. Lustre, Red Hat GFS
- Distributed storage systems - e.g. MogileFS, Hadoop
- Build it ourselves - again, we like to build stuff
11
Possible solutions These all require a lot of development work, and we needed a solution quickly...
12
Amazon S3 to the rescue
In Spring 2006, Amazon released a new storage API: Put, Get, List, Delete
Build whatever you want, quickly!
13
Amazon S3 to the rescue (photo by pseudopff @ Flickr)
14
Backing up the new way (S3)
Smart - because we wrote the client
- maildir filename changes are OK
- everything is incremental
Scales
- no longer our concern... Amazon's concern
- all servers back up in parallel
Cheap
- old cost = $180K per year
- new cost = $36K per year
15
Backing up the new way (S3)
And look what else we can build now!
- web-based restore tool for customers
- custom retention policies
- real-time archiving
16
The backup client
Two processes run nightly on each mail server (a sketch follows below):
1. Figure out what to back up
- take a snapshot of the file list per maildir
- compare to the previous night's snapshot
- create a list of new files
2. Send it to S3
- compress each file and send
- 1 email = 1 file = 1 S3 object (for now)
- send state information too (flags, status)
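A minimal sketch of that nightly pass, assuming a hypothetical key layout ("backup/&lt;mailbox&gt;/&lt;date&gt;/&lt;path&gt;") and an illustrative bucket name; this uses the modern boto3 SDK, which the 2007-era client predates, and is not the actual Webmail.us code:

```python
import gzip
import os

import boto3  # modern AWS SDK, used here for illustration

s3 = boto3.client("s3")
BUCKET = "example-mail-backups"  # placeholder bucket name

def snapshot(maildir):
    """Step 1: snapshot the file list, keyed by maildir base name.
    Maildir stores flags after ':2,' in the filename, so keying on the
    base name means a flag change (a rename) isn't mistaken for new mail."""
    files = {}
    for root, _dirs, names in os.walk(maildir):
        for name in names:
            rel = os.path.relpath(os.path.join(root, name), maildir)
            files[name.split(":2,")[0]] = rel
    return files

def backup(mailbox, maildir, previous_snapshot, date):
    """Step 2: compress each new file and send it (1 email = 1 S3 object)."""
    current = snapshot(maildir)
    for key, rel in current.items():
        if key in previous_snapshot:
            continue  # already backed up on an earlier night
        with open(os.path.join(maildir, rel), "rb") as f:
            s3.put_object(Bucket=BUCKET,
                          Key=f"backup/{mailbox}/{date}/{rel}",
                          Body=gzip.compress(f.read()))
    return current  # persist as tomorrow's "previous" snapshot
```

Diffing snapshots the client wrote itself, rather than raw filesystem diffs, is what makes maildir flag renames harmless: only genuinely new messages get uploaded.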
17
The backup client (diagram)
18
The restore client
Command line utility (snapshot listing sketched below):
- get list of backup snapshots for a given mailbox
- get number of emails contained in a given snapshot
- get list of folders contained in a given snapshot
- restore a folder or set of folders
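The listing operations map naturally onto S3's List call. A sketch, assuming the same hypothetical "backup/&lt;mailbox&gt;/&lt;date&gt;/..." key layout and placeholder bucket as above:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-mail-backups"  # placeholder

def list_snapshots(mailbox):
    """Return the backup snapshot dates available for a mailbox."""
    resp = s3.list_objects_v2(Bucket=BUCKET,
                              Prefix=f"backup/{mailbox}/",
                              Delimiter="/")
    # Each CommonPrefix is one nightly snapshot "directory".
    return [p["Prefix"].split("/")[-2] for p in resp.get("CommonPrefixes", [])]

def count_emails(mailbox, snapshot_date):
    """Count the email objects in one snapshot (paginates the LIST)."""
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=BUCKET, Prefix=f"backup/{mailbox}/{snapshot_date}/")
    return sum(page.get("KeyCount", 0) for page in pages)
```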
19
Repetitive, manual work
- 3-4 restore requests per day
- must charge $100
- only one in four customers goes through with it
Customer not happy when they accidentally delete mail
Customer not happy about the $100 fee
Customer not happy if they decide to save the $100 and skip the restore
20
Repetitive, manual work
We want happy customers. So, automate it... and make it free.
21
Web-based Restore Tool
- in customer control panel
- full control (list backups, list emails/folders, restore)
- free = happy customers
22
Web-based Restore Tool
Behind the scenes (queue handoff sketched below):
- the control panel does not talk to S3 directly
- it calls our custom REST API hosted on EC2 servers
- the EC2 servers talk to S3 and insert restore jobs into a queue
- mail servers pop restore jobs from the queue
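A minimal sketch of that handoff, assuming SQS as the queue (the deck uses SQS for the cleanup workload below; this slide says only "a queue") and an illustrative queue name and job format:

```python
import json

import boto3

sqs = boto3.client("sqs")
# Placeholder queue name; the real queue and job schema are not in the deck.
QUEUE_URL = sqs.get_queue_url(QueueName="restore-jobs")["QueueUrl"]

def enqueue_restore(mailbox, snapshot_date, folders):
    """Called by the EC2-hosted REST API once the customer picks a restore.
    A mail server later pops this job and pulls the objects from S3."""
    job = {"mailbox": mailbox, "snapshot": snapshot_date, "folders": folders}
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(job))
```

Keeping the control panel off S3 entirely means credentials and retry logic live in one place, the EC2-hosted API, and the mail servers only ever see queue jobs.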
23
Deleting old data from S3
Distributed workload via EC2 and SQS (worker sketched below):
- thousands of cleanup jobs inserted into SQS
- many worker EC2 servers are spawned, which pop jobs out of the SQS queue
- job = set of mailboxes to clean up
- workers check retention policies and delete old data
- EC2 servers killed when work is complete
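A sketch of one such worker. The queue name, job format, key layout, and the flat retention window are illustrative simplifications; the real workers check per-customer retention policies:

```python
import json
from datetime import date, timedelta

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
QUEUE_URL = sqs.get_queue_url(QueueName="cleanup-jobs")["QueueUrl"]  # placeholder

def run_worker(bucket, retention_days=90):
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained; this EC2 instance can be killed
        msg = messages[0]
        job = json.loads(msg["Body"])  # job = a set of mailboxes to clean up
        cutoff = (date.today() - timedelta(days=retention_days)).isoformat()
        for mailbox in job["mailboxes"]:
            pages = s3.get_paginator("list_objects_v2").paginate(
                Bucket=bucket, Prefix=f"backup/{mailbox}/")
            for page in pages:
                for obj in page.get("Contents", []):
                    # assumed key layout: backup/<mailbox>/<YYYY-MM-DD>/...
                    if obj["Key"].split("/")[2] < cutoff:
                        s3.delete_object(Bucket=bucket, Key=obj["Key"])
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])
```

ISO dates in the key compare correctly as strings, so the retention check is a plain lexicographic comparison.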
24
AWS Customer Support: forums (but very active)
25
Things to watch out for
Internal Server Errors are frequent, but manageable
- work around them (e.g. by retrying; see the sketch below)
Request overhead can really slow things down
- batch your small files into larger S3 objects
- hit 'em too hard and they'll ask you to throttle
- PUT and LIST requests are much slower than GET
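One way to work around the sporadic 500s is to retry the PUT with exponential backoff. A sketch using boto3/botocore, which (unlike 2007-era clients) can also retry automatically; the retry budget and delays here are arbitrary choices:

```python
import time

import botocore.exceptions

def put_with_retry(s3, bucket, key, body, attempts=5):
    """PUT an object, retrying server-side (5xx) errors with backoff."""
    for attempt in range(attempts):
        try:
            return s3.put_object(Bucket=bucket, Key=key, Body=body)
        except botocore.exceptions.ClientError as err:
            status = err.response["ResponseMetadata"]["HTTPStatusCode"]
            if status < 500 or attempt == attempts - 1:
                raise  # 4xx errors and exhausted retries are real failures
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
```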
26
Batch your files
(benchmark chart; testing done from EC2, 100 requests per data point)
27
No really... batch your files!
Requests are expensive. New pricing will force everyone to play nice.
Effective June 1st, 2007:
Requests (new):
- $0.01 per 1,000 PUT or LIST requests
- $0.01 per 10,000 GET and all other requests*
* no charge for DELETE requests
Storage: same as before ($0.15/GB-month)
Bandwidth: slightly cheaper (was $0.20/GB)
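A quick back-of-the-envelope with the quoted PUT price shows why batching matters: at 100 emails per object, the request cost of an upload drops by a factor of 100.

```python
PUT_PRICE = 0.01 / 1000  # dollars per PUT under the June 2007 pricing

emails = 1_000_000
print(f"1 email = 1 object:      ${emails * PUT_PRICE:.2f}")        # $10.00
print(f"100 emails per object:   ${emails / 100 * PUT_PRICE:.2f}")  # $0.10
```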
28
Things we're working on
Batching files :)
Real-time archiving (sketched below)
- send data to S3 as it arrives, using the transaction log
- transaction log already used for live mirroring
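A hedged sketch of the real-time archiving idea: tail the same transaction log used for live mirroring and ship each message as it lands. The log format (one delivered-file path per line) and the mailbox_for helper are assumptions for illustration:

```python
import gzip
import time

import boto3

s3 = boto3.client("s3")

def tail_and_archive(log_path, bucket, mailbox_for):
    """Follow the transaction log; archive each newly delivered message."""
    with open(log_path) as log:
        log.seek(0, 2)  # start at end of file, like `tail -f`
        while True:
            line = log.readline()
            if not line:
                time.sleep(0.5)  # wait for the next delivery to be logged
                continue
            path = line.strip()
            with open(path, "rb") as f:
                s3.put_object(
                    Bucket=bucket,
                    Key=f"archive/{mailbox_for(path)}/{path.rsplit('/', 1)[-1]}",
                    Body=gzip.compress(f.read()))
```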
29
My Amazon Wishlist
- SLA
- ability to modify S3 metadata
- static IP option for EC2
- load balancing for EC2
- monitoring/management tool for EC2
30
Please fill out your session evaluation form and return it to the box at the entrance to the room.
Thank you! Questions?
Blog: http://billboebel.typepad.com
Amazon Web Services home: http://aws.amazon.com