1
Analysis of File Systems Performance in Amazon EC2 Storage
UCCS Master’s Thesis Proposal Nagamani Pudtha 10/02/2014
2
Motivation & Challenges
Cloud computing is emerging today as a commercial infrastructure. Storage performance is often opaque and hard to understand, and clouds complicate the issue further with even more complex storage settings. There has been little prior work that examines different storage settings in combination with different file systems.
3
Objective
Understand the performance implications of file systems under two different storage settings.
Perform a detailed block-level analysis of different file systems.
Uncover the reasons behind I/O performance variations across file systems on Amazon EC2's EBS and instance storage.
There is currently no full understanding of the performance of these two settings.
4
Goals of the project
Is there a wide range of performance variation between Amazon EC2 storage options?
Can block size be a cause of I/O performance degradation?
Which option delivers the better peak performance?
Which option delivers more consistent performance?
5
Experimental Setup
The entire host OS is installed on a single disk (sda), and another disk (sdb) is used for the experiments. Multiple equal-sized partitions were created from sdb, each corresponding to a different host file system.
6
Devices | #Blocks | Type
/dev/xvdb1 | 30720 | Ext2
/dev/xvdb2 | 30720 | Ext3
/dev/xvdb3 | 30720 | Ext4
/dev/xvdb4 | 30720 | ReiserFS
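As a minimal sketch of how this setup could be scripted, the Python snippet below formats and mounts one partition per file system under test. The mount points under /mnt and the use of mkfs defaults are assumptions for illustration, not the procedure actually used in the thesis.

#!/usr/bin/env python3
# Sketch: format and mount one partition per file system under test.
# Assumes the /dev/xvdb1-4 partitions from the table above already exist,
# the mkfs tools are installed, and the script is run as root.
import os
import subprocess

FILESYSTEMS = {
    "/dev/xvdb1": "ext2",
    "/dev/xvdb2": "ext3",
    "/dev/xvdb3": "ext4",
    "/dev/xvdb4": "reiserfs",
}

for device, fstype in FILESYSTEMS.items():
    mountpoint = f"/mnt/{fstype}"
    os.makedirs(mountpoint, exist_ok=True)
    # Some mkfs variants prompt before overwriting an existing signature;
    # force flags (e.g. mkfs.ext4 -F, mkreiserfs -f) may be needed.
    subprocess.run([f"mkfs.{fstype}", device], check=True)
    subprocess.run(["mount", "-t", fstype, device, mountpoint], check=True)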
7
Benchmarks
Macro-benchmarks: FileBench, to understand the potential performance impact of file systems on realistic workloads.
Micro-benchmarks: FIO, coupled with low-level I/O tracing mechanisms to investigate the underlying causes.
Blktrace: a block-layer I/O tracing mechanism that generates traces of the I/O traffic on block devices.
FIO (Flexible I/O Tester) is a synthetic benchmark used to generate various multi-threaded I/O workloads with different configurations, such as read/write ratio, block size, and the number of concurrent I/O jobs. It reports bandwidth, IOPS, and latency as performance metrics for a given I/O workload on a given storage device over a given period of time.
The Filebench workloads (file, web, mail, and database server) are described on the next slide.
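As an illustration of how FIO can drive such micro-benchmarks, the sketch below launches one random-read job per mounted file system. The workload pattern, block size, job count, runtime, and mount points are assumed values, not the configuration used in the thesis.

#!/usr/bin/env python3
# Sketch: run one FIO micro-benchmark against each mounted file system.
# Assumes fio is installed and the mount points from the setup sketch exist.
import subprocess

MOUNTPOINTS = ["/mnt/ext2", "/mnt/ext3", "/mnt/ext4", "/mnt/reiserfs"]

for mnt in MOUNTPOINTS:
    subprocess.run([
        "fio",
        "--name=randread",
        f"--directory={mnt}",
        "--rw=randread",         # other patterns: randwrite, read, write
        "--bs=4k",               # block size under test
        "--size=1g",             # file size per job
        "--numjobs=4",           # concurrent I/O jobs
        "--runtime=60",
        "--time_based",
        "--direct=1",            # bypass the page cache
        "--group_reporting",     # aggregate bandwidth, IOPS, and latency
    ], check=True)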
8
Filebench workloads:
File server: emulates a server that hosts the home directories of multiple users (threads). Users are assumed to access only files and directories within their own home directories. Each thread picks a different set of files based on its thread id and performs a sequence of create, delete, append, read, write, and stat operations, exercising both the metadata and data paths of the file system.
Web server: emulates a web service. File operations are dominated by reads (open, read, close); writing to the web log file is emulated by having one append operation per open.
Mail server: emulates an e-mail service. File operations are within a single directory and consist of I/O sequences such as open/read/close, open/append/close, and delete.
Database server: emulates the I/O characteristics of Oracle 9i. File operations are mostly reads and writes on small files; a stream of synchronous writes is used to simulate database logging.
FIO workloads: random read, random write, sequential read, sequential write.
Blktrace keeps a detailed account of each I/O request from start to finish as it goes through the various I/O states (e.g., being put onto an I/O queue, merged with an existing request, or waiting on the I/O queue).
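The sketch below illustrates how a blktrace capture could be wrapped around a Filebench run and then post-processed with blkparse. The traced device, the workload file name, and the output names are assumptions for illustration only, not the thesis's actual configuration.

#!/usr/bin/env python3
# Sketch: capture a block-level trace with blktrace while a Filebench workload
# runs, then post-process it with blkparse. Assumes blktrace, blkparse, and
# filebench are installed, debugfs is mounted, and the script runs as root.
import signal
import subprocess

DEVICE = "/dev/xvdb3"        # partition whose I/O is being traced
WORKLOAD = "fileserver.f"    # hypothetical Filebench personality file

# Start blktrace in the background; it records each request's life cycle
# on the block layer (queued, merged, dispatched, completed).
tracer = subprocess.Popen(["blktrace", "-d", DEVICE, "-o", "trace"])

# Run the macro-benchmark while the trace is being collected.
subprocess.run(["filebench", "-f", WORKLOAD], check=True)

# Stop tracing gracefully and turn the raw per-CPU files into a readable log.
tracer.send_signal(signal.SIGINT)
tracer.wait()
subprocess.run(["blkparse", "-i", "trace", "-o", "trace.txt"], check=True)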
9
Tasks and Timeline

Task | Timeline | Status
Understand EBS and instance storage | 15 days | Completed
Understand file systems | |
Set up a VM with requirements | 1 day |
Identify benchmarking tools | 1 day |
Research how to use Filebench, FIO, and Blktrace | 1.5 months |
Proposal writing | 10 days |
10
Tasks and Timeline

Task | Timeline | Status
Using the Filebench benchmarking tool, create four server workloads: a file, web, mail, and database server | ~10 days | Estimated completion: October 5th
Use the FIO benchmarking tool to examine disk I/O workloads | ~10 days | Estimated completion: October 15th
Understand the root causes of performance variations using the Blktrace tool | ~7 days | Estimated completion: October 22nd
11
Tasks and Timeline

Task | Timeline | Status
Perform analysis of collected data and draw conclusions about limitations and ideas to improve performance | ~15 days | Estimated completion: November 6th
Final project report | | Estimated completion: November 21st
Final presentation + defense | ~5 days | Estimated completion: November 26th
12
References
Wikipedia: Storage Virtualization.
FIO - How To.
Blktrace: generate traces of the I/O traffic on block devices. git://git.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git [Accessed: Sep 2011].
Walker, E.: Benchmarking Amazon EC2 for High-Performance Scientific Computing. ;login: 33(5), 18–23 (2008).
13
References
Wolman, B., Olson, T.M.: IOBENCH: a system independent IO benchmark. SIGARCH Comput. Archit. News 17, 55–70 (September 1989).
Shan, H., Antypas, K., Shalf, J.: Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC '08), pp. 42:1–42:12. Piscataway (2008).
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T.L., Ho, A., Pratt, I., Warfield, A.: Xen and the art of virtualization. In: SOSP. ACM, New York (2003).
Deelman, E., Singh, G., Livny, M., Berriman, J.B., Good, J.: The cost of doing science on the cloud: the Montage example. In: SC, p. 50. IEEE/ACM (2008).
Nagarajan, A.B., Mueller, F., Engelmann, C., Scott, S.L.: Proactive fault tolerance for HPC with Xen virtualization. In: ICS, pp. 23–32. ACM, New York (2007).