Optimized Caching Policies for Storage Systems
Final Presentation
Amir Rachum, Chai Ronen
Industrial Supervisor: Dr. Roee Engelberg, LSI
Introduction – Storage Tiering
System data is stored across different types of storage devices. Generally speaking, for a given price, the higher a device's speed, the lower its capacity. The idea is to enable the use of larger, low-cost disk space while keeping the benefits of high-speed hardware, optimizing data placement for the fastest overall disk access. This requires a dynamic algorithm for managing (migrating) the data across the tiers.
SSD: high cost, high performance, low volume
SATA drive: low cost, low performance, high volume
Goals
Create a platform that allows us to test different algorithms in system-specific scenarios.
Test several algorithms and identify which of them performs best for storage tiering in different scenarios.
Methodology
We coded a simulator that represents the platform running the tiered storage system.
We created several data structures that represent the data on the system and its location at all times, record read/write operations, and support several other unique features.
We replayed a recording of real I/O calls captured on such a system to simulate an actual scenario.
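The replay loop described above can be sketched roughly as follows. This is a minimal Python sketch, not the project's simulator: the trace format `op,offset,length` and the names `IORecord`, `parse_trace`, `chunks_touched`, `ChunkLocations` and `replay` are hypothetical. It maps each request to the chunks it overlaps and records a chunk's tier only once the chunk has left the default tier.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical trace record; the real trace format is not given in the slides.
@dataclass(frozen=True)
class IORecord:
    op: str      # "R" for read, "W" for write (assumed encoding)
    offset: int  # byte offset of the request
    length: int  # request length in bytes


def parse_trace(lines):
    """Parse lines of the assumed form 'R,offset,length', skipping blanks."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        op, offset, length = line.split(",")
        records.append(IORecord(op, int(offset), int(length)))
    return records


def chunks_touched(record, chunk_size):
    """Indices of every chunk overlapped by the request."""
    first = record.offset // chunk_size
    last = (record.offset + record.length - 1) // chunk_size
    return range(first, last + 1)


class ChunkLocations:
    """Tracks which tier each chunk lives on.

    Only chunks that are NOT on the default tier are stored, so chunks
    that never moved take no space in the map.
    """

    def __init__(self, default_tier):
        self.default_tier = default_tier
        self._moved = {}  # chunk index -> tier name

    def tier_of(self, chunk):
        return self._moved.get(chunk, self.default_tier)

    def move(self, chunk, tier):
        if tier == self.default_tier:
            self._moved.pop(chunk, None)  # back on the default tier: forget it
        else:
            self._moved[chunk] = tier


def replay(records, chunk_size, locations):
    """Count reads and writes per tier (no migrations in this sketch)."""
    stats = Counter()
    for rec in records:
        for chunk in chunks_touched(rec, chunk_size):
            stats[(locations.tier_of(chunk), rec.op)] += 1
    return stats


if __name__ == "__main__":
    trace = parse_trace(["R,0,16", "W,20,30", "R,100,4"])
    locations = ChunkLocations(default_tier="SATA")
    locations.move(1, "SSD")
    print(replay(trace, chunk_size=16, locations=locations))
```

With a 16-byte chunk, the write at offset 20 of length 30 touches chunks 1 to 3, so one of its three chunk accesses is counted against the SSD and two against the SATA drive.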
Accomplishments
Created an algorithm interface that supports any algorithm, multiple tiers, and multiple platform data structures.
Our design is generic enough to make adding new usage statistics and platform data very easy.
A command-line interface (CLI) allows quick specification of the input file, chunk size, and tier information.
Varying the chunk size let us study its effect on run time and on algorithm effectiveness.
We implemented 2 caching algorithms:
- A “naïve” algorithm that moves every chunk to the top tier upon I/O
- A more efficient algorithm that minimizes migrations
A careful implementation kept the disk space used by the various data structures low (by using a default tier).
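A pluggable algorithm interface of this kind could look like the Python sketch below, simplified to two tiers. `CachingAlgorithm`, `on_io`, `DummyAlgorithm`, `NaiveAlgorithm` and `count_migrations` are hypothetical names, and the least-recently-used eviction in the naïve policy is an assumption, since the slides do not say what happens when the top tier is full. A dummy baseline that never migrates is included for comparison.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import OrderedDict


class CachingAlgorithm(ABC):
    """Hypothetical plug-in interface: the simulator calls on_io() for every
    chunk access and applies the migrations the algorithm returns."""

    @abstractmethod
    def on_io(self, chunk: int) -> list[tuple[int, str]]:
        """Return a list of (chunk, destination_tier) migrations."""


class DummyAlgorithm(CachingAlgorithm):
    """Baseline that never migrates: all data stays on the default tier."""

    def on_io(self, chunk):
        return []


class NaiveAlgorithm(CachingAlgorithm):
    """Moves every accessed chunk to the top tier.

    Assumption (not stated in the slides): when the top tier is full,
    the least recently used chunk is demoted to the bottom tier.
    """

    def __init__(self, top_capacity, top="SSD", bottom="SATA"):
        self.top_capacity = top_capacity
        self.top, self.bottom = top, bottom
        self._resident = OrderedDict()  # chunks on the top tier, LRU first

    def on_io(self, chunk):
        if chunk in self._resident:
            self._resident.move_to_end(chunk)  # already on top: refresh recency
            return []
        migrations = []
        if len(self._resident) >= self.top_capacity:
            victim, _ = self._resident.popitem(last=False)  # evict LRU chunk
            migrations.append((victim, self.bottom))
        self._resident[chunk] = None
        migrations.append((chunk, self.top))
        return migrations


def count_migrations(algorithm, chunk_trace):
    """Total number of migrations an algorithm requests over a chunk trace."""
    return sum(len(algorithm.on_io(c)) for c in chunk_trace)
```

For example, `count_migrations(NaiveAlgorithm(top_capacity=2), [0, 1, 0, 2])` returns 4: three promotions plus one demotion when chunk 2 arrives at a full top tier.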
Algorithm conclusions
We ran 3 different scenarios:
- Small chunk size (16B), small SSD size (64B, 4× chunk size)
- Large chunk size (2048B), relatively small SSD size (8192B, 4× chunk size)
- Small chunk size (16B), relatively large SSD size (8192B, 512× chunk size)
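The SSD sizes in these scenarios are multiples of the chunk size, so the SSD capacity in chunks is the SSD size divided by the chunk size. The hypothetical helper below checks the three scenarios, using 8192 B for the two larger SSDs as in the chart titles (4 × 2048 = 512 × 16 = 8192).

```python
# Scenarios: (chunk size in bytes, SSD size in bytes)
SCENARIOS = {
    "small chunks, small SSD": (16, 64),
    "large chunks, small SSD": (2048, 8192),
    "small chunks, large SSD": (16, 8192),
}


def ssd_capacity_in_chunks(chunk_size, ssd_size):
    """Number of whole chunks the SSD tier can hold."""
    if ssd_size % chunk_size:
        raise ValueError("SSD size should be a multiple of the chunk size")
    return ssd_size // chunk_size


for name, (chunk, ssd) in SCENARIOS.items():
    print(f"{name}: {ssd_capacity_in_chunks(chunk, ssd)} chunks")
```

The first two scenarios both give the SSD room for only 4 chunks; the third gives it room for 512.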
Algorithm conclusions
With an extremely small SSD (4× chunk size), both caching algorithms are ineffective:
- The naïve algorithm showed a high number of reads from the higher tier, but performed twice as many migrations between tiers as the smart algorithm.
- The smart algorithm, despite performing half as many migrations as the naïve algorithm, showed very little reading from the higher tier.
In this case, the dummy algorithm, which performs no migrations, proved very efficient, as it saved all the time spent on relatively useless migrations.
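Why a no-migration baseline can win is easiest to see with a simple cost model. The sketch below assumes, hypothetically, that each access costs the latency of the tier serving it and each migration costs a fixed time; `total_io_time` and `caching_pays_off` are illustrative names, and the latencies and counts in the example are arbitrary values, not measurements from these runs.

```python
def total_io_time(top_accesses, bottom_accesses, migrations,
                  t_top, t_bottom, t_migrate):
    """Hypothetical cost model: each access costs the latency of the tier
    that serves it, and each migration costs a fixed t_migrate."""
    return (top_accesses * t_top
            + bottom_accesses * t_bottom
            + migrations * t_migrate)


def caching_pays_off(extra_top_accesses, migrations, t_top, t_bottom, t_migrate):
    """True if the time saved by serving extra accesses from the top tier
    exceeds the time spent on migrations."""
    return extra_top_accesses * (t_bottom - t_top) > migrations * t_migrate


if __name__ == "__main__":
    # Arbitrary illustrative numbers, NOT results from the project.
    T_TOP, T_BOTTOM, T_MIGRATE = 1, 10, 20
    caching = total_io_time(10, 90, 180, T_TOP, T_BOTTOM, T_MIGRATE)
    dummy = total_io_time(0, 100, 0, T_TOP, T_BOTTOM, T_MIGRATE)
    print(caching, dummy)  # 4510 1000
```

Under this model, a caching algorithm only beats the baseline when the accesses it moves to the fast tier save more time than its migrations cost, which a tiny SSD makes hard to achieve.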
Algorithm Conclusions (chunk size 16B / SSD size 64B)
Algorithm conclusions
With a large chunk size and an SSD of 4× chunk size, the caching algorithms achieved much better results than the dummy algorithm. However, the 2 caching algorithms did not differ from each other.
Algorithm Conclusions (chunk size 2048B / SSD size 8192B)
Algorithm conclusions
With a small chunk size and a large SSD, the 2 caching algorithms again gave similar results. However, these results were far worse than those of the previous run.
Algorithm Conclusions (chunk size 16B / SSD size 8192B)
General Conclusions
- Chunk size greatly affects the runtime of the platform, but runs with a “standard” chunk size do not take long.
- Smart use of Boost greatly reduces the work and is very effective.
- A good implementation can result in huge disk space savings.
- Despite the data structures the platform provides, most non-naïve algorithms also need their own data structure of some sort.
- Working with Git source control proved very helpful for:
  - Retrieving old code that was once thought to be obsolete
  - Collaboration