
1  A New Approach to File System Cache Writeback of Application Data
Sorin Faibish – EMC Distinguished Engineer
P. Bixby, J. Forecast, P. Armangau and S. Pawar
EMC USD Advanced Development
SYSTOR 2010, May 24-26, 2010, Haifa, Israel

2  Outline
- Motivation: changes in server technology
- Cache writeback problem statement
- Monitoring the behavior of application data flush
- Cache writeback as a closed-loop system
- Current cache writeback methods are obsolete
- The I/O "slow down" problem
- New algorithms for cache writeback
- Simulation results of the new algorithms
- Experimental results on a real NFS server
- Summary and conclusions
- Future work and extension to Linux file systems

3  Motivation: changes in server technology
- Large numbers of cores per CPU – more computing power
- Large, cheaper memory caches – very large amounts of cached data
- Very large disk drives – but only a modest increase in disk throughput
- Application data I/O has grown much faster – but still requires a constant flush to disk
- Cache writeback is used to smooth bursty I/O traffic to disk
- Conclusion: writing back large amounts of cached application data is increasingly slow relative to the rest of the system

4  Cache writeback problem statement
- Increasing I/O speeds force servers to cache large amounts of dirty pages to hide disk latency
- Large numbers of clients access the servers, increasing the burstiness of disk I/O and the need for cache
- Large file system and server caches allow longer retention of dirty pages
- Cache writeback flush is based on cache-fullness metrics
- When the cache is full, flushing to disk runs at maximum speed, leaving no room for additional I/Os
- As long as the cache is full, new I/Os must wait for empty cache pages to become available – an I/O "stoppage"
- As a result, application performance is lower than disk performance

5  Monitoring the behavior of application data flush
Understanding the problem:
- Instrument the kernel to measure the dynamics of dirty pages (DP) in the cache
- Monitor the behavior of DP in the buffer cache
- Run a multi-client benchmark application
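The slide does not show the instrumentation itself; as a rough user-space analogue of the kernel measurements described here, one could sample the dirty-page counters that Linux already exposes in /proc/vmstat. This is only an illustrative sketch, not the instrumentation used in the paper; nr_dirty and nr_writeback are standard Linux counters, while the sampling interval and sample count are arbitrary choices.

    import time

    def read_vmstat(fields=("nr_dirty", "nr_writeback")):
        """Return selected page counters from /proc/vmstat (Linux only)."""
        counters = {}
        with open("/proc/vmstat") as f:
            for line in f:
                name, value = line.split()
                if name in fields:
                    counters[name] = int(value)
        return counters

    def monitor(interval_s=0.1, samples=100):
        """Poll the dirty-page counters to observe writeback dynamics."""
        trace = []
        for _ in range(samples):
            trace.append((time.time(), read_vmstat()))
            time.sleep(interval_s)
        return trace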

6  Cache writeback as a closed-loop system
- The application controls the flush using I/O commits, based on the application's view of the cache state
  – DP in the cache is the difference between incoming I/O and DP flushed to disk
  – The goal is to keep this difference (the error) at zero
  – The loop is closed because the application sends a commit after each I/O
  – Cache writeback is controlled by the application
- Flush to disk based on the fullness of the buffer cache
  – The cache control mechanism ensures cache availability for new I/Os
  – DP in the cache behave like water in a tank
  – The water level is controlled by the cache manager to prevent overflow
  – There is no relation between when an application I/O arrives and when it is flushed to disk
  – This results in large delays between I/O creation and I/O reaching disk – an open loop
  – Cache writeback is controlled by the flush algorithm
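A minimal way to make the "water in a tank" analogy concrete is a discrete-time balance: the dirty-page level in the cache is the running sum of incoming I/O minus whatever the writeback policy flushes each interval. The sketch below is an assumed toy model, not the authors' simulator; flush_policy is a placeholder for the algorithms discussed on the following slides, and pages that cannot be accepted because the cache is full model the I/O "slow down".

    def simulate_cache(incoming, flush_policy, cache_capacity):
        """Toy closed-loop model of the buffer cache: dirty pages (DP)
        accumulate from incoming I/O and drain according to a writeback policy."""
        dirty = 0
        trace = []
        for arrivals in incoming:                 # pages arriving in this interval
            accepted = min(arrivals, cache_capacity - dirty)
            stalled = arrivals - accepted         # I/O that must wait: the "slow down"
            dirty += accepted
            flushed = min(dirty, flush_policy(dirty, accepted))
            dirty -= flushed
            trace.append((dirty, flushed, stalled))
        return trace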

7  Current cache writeback methods
- Trickle flush of DPs
  – Flush a proportion of the incoming application I/Os (rate based)
  – Runs at low priority to reduce CPU consumption
  – Background task with low efficiency
  – Used only to reduce memory pressure
  – Cannot cope with high bursts of I/O
- Watermark-based flush of DPs (see the sketch below)
  – Inspired by database and transactional applications
  – Cache writeback is triggered by the number/proportion of DP in the cache
  – No prediction of high I/O bursts – a disadvantage with many clients
  – Flushing runs at maximum disk speed to reduce latency
  – For small caches the flush rate stays close to the incoming I/O rate – flushes are frequent
  – Inefficient for very large caches
  – Interferes with metadata and read operations
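As a sketch of the watermark scheme described above (assumed thresholds and hysteresis, not the exact kernel logic), flushing stays off until the dirty-page level crosses the high watermark and then runs at full disk speed until the level drops below the low watermark. It plugs into the toy simulate_cache model from the previous slide.

    def make_watermark_policy(high, low, disk_max):
        """Hysteresis flush: idle below `high`, then full disk speed until `low`."""
        state = {"flushing": False}

        def policy(dirty, _new_pages):
            if dirty >= high:
                state["flushing"] = True
            elif dirty <= low:
                state["flushing"] = False
            return disk_max if state["flushing"] else 0

        return policy

    # Example (hypothetical numbers, in pages per interval):
    # trace = simulate_cache(workload, make_watermark_policy(80000, 40000, 10000), 100000)

The on/off, full-speed behavior of this policy is exactly the saturation effect that the next slide identifies as the source of oscillation.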

8  Current cache writeback deficiency
- Watermark-based flushing of DPs acts like a non-linear saturation element in the cache's closed loop
- The saturation introduces oscillations in the DP level
- The oscillation adds extra I/O latency on top of the disk latency
- It also creates burstiness in the disk I/O, reducing aggregate performance

9  The I/O "slow down" problem
- Flushing application data requires file system metadata (MD) updates to the same disks
- The flush is triggered when the high-watermark threshold is crossed
- Watermark-based flushes cannot throttle the I/O rate: they are a last resort before the kernel crashes from page starvation
- Additional I/Os are slowed down until the MD for the newly arriving I/Os has been flushed
- Even if NVRAM is used, the DP still need to be removed from the cache to make room for additional I/Os
- Application I/O latency increases until the cache is freed – the "slow down"
- In the worst cases the latency is so high that it resembles an I/O stoppage
- If an additional burst of I/Os arrives from other clients, there is no room for them; new I/Os wait until the DP level drops below the low watermark – a stoppage

10  New algorithms for cache writeback
- Address the deficiencies of current cache writeback methods
- Inspired by control systems and signal processing theory
- Use adaptive control and machine learning methods
- Make better use of modern hardware characteristics
- The goals of the solution are:
  – Reduce the I/O slowdown so it is limited only by the maximum disk I/O throughput
  – Reduce disk I/O burstiness to a minimum
  – Maximize the aggregate I/O performance of the system (benchmark)
- The same algorithms apply to both network and local file systems
- All the algorithms can be used to flush both application DPs and MD DPs

11  New algorithms for cache writeback (cont.)
- We present and simulate only 5 algorithms (more were considered):
  – Modified Trickle Flush – an improved version of trickle flush that raises the task priority and uses more CPU
  – Fixed Interval Algorithm – targets a goal number of DPs, similar to the watermark methods, but compensates better for I/O bursts (semi-throttling) by pacing the flush to disk
  – Variable Interval Algorithm – an adaptive control scheme that adapts the time interval based on the change in DP during the previous interval; similar to trickle flush but with faster adaptation to I/O bursts
  – Quantum Flush – keeps the retention of DP in the cache as low as possible, similar to the watermark methods, but adapts the flush speed in proportion to the number of new I/Os in the previous sample interval
  – Rate of Change Proportional Algorithm – flushes DPs in proportion to the first derivative of the number of DPs, using a fixed interval and a forgetting factor proportional to the difference between the I/O rate and the maximum disk throughput:
      c = R * (t - ti) + W * μ, where μ = α * (B - R) / B
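The formula on this slide is terse, so the sketch below shows one possible reading of the rate-proportional rule. The symbol meanings are assumptions, not stated by the slide: R is the rate of change of dirty pages over the last interval, t - ti is the elapsed interval, W is the number of dirty pages currently in the cache, B is the maximum disk throughput, and α is the gain of the forgetting factor. This is a hedged illustration, not the authors' implementation.

    def rate_proportional_flush(dirty, prev_dirty, interval_s, disk_max, alpha=0.5):
        """One reading of c = R*(t - ti) + W*mu, with mu = alpha*(B - R)/B:
        flush in proportion to the growth rate of dirty pages, plus a
        'forgetting factor' share of the backlog whenever the incoming
        rate R is below the maximum disk throughput B."""
        rate = (dirty - prev_dirty) / interval_s       # R: dirty-page growth rate
        mu = alpha * (disk_max - rate) / disk_max      # forgetting factor
        pages = rate * interval_s + dirty * max(mu, 0.0)
        return max(0, min(int(pages), dirty))

Under this reading, bursts (large R) are followed closely by the first term, while the second term drains the backlog faster when the disk has spare throughput, which is what keeps the flush paced rather than saturated.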

12  Simulation results of the new algorithms
- The best algorithm was selected by (see the metric sketch below):
  – Optimal behavior under unexpected bursts of I/Os
  – A flush rate that best matches the rate of change of DPs in the cache (minimum DP level)
  – Minimum I/O slowdown for clients (lowest average I/O latency)
- The rate-of-change-based algorithm with a forgetting factor was the best
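The selection criteria above can be reduced to simple numbers over a simulated trace. The sketch below uses assumed metric definitions and reuses the (dirty, flushed, stalled) trace produced by the toy simulate_cache model from slide 6: peak dirty-page level for burst behavior, total stalled I/O as a latency proxy, and the variability of the flush rate as a burstiness measure.

    import statistics

    def score_policy(trace):
        """Summarize a (dirty, flushed, stalled) simulation trace into the three
        selection criteria: peak cache level, stalled I/O, and flush burstiness."""
        dirty_levels = [d for d, _, _ in trace]
        flushes = [f for _, f, _ in trace]
        stalls = [s for _, _, s in trace]
        return {
            "peak_dirty": max(dirty_levels),
            "total_stalled_io": sum(stalls),
            "flush_burstiness": statistics.pstdev(flushes),
        }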

13  Experimental results on a real NFS server
- We implemented the Modified Trickle and Rate Proportional algorithms on the Celerra NAS server
- We used the SPEC sfs2008 benchmark and measured the number of DP in the cache at 4 ms resolution
- The experimental results show some I/O slowdown with the Modified Trickle algorithm, yielding 92K NFS IOPS (diagrams sampled at the same 55K NFS IOPS level)
- The Rate Proportional algorithm shows a much shorter I/O slowdown, yielding 110.6K NFS IOPS

14  Summary and conclusions
- Discussed a new paradigm and new algorithms for cache writeback in modern file systems and servers
- Discussed how the new algorithms can reduce the impact of application I/O bursts on aggregate I/O performance, which is otherwise bounded by the maximum disk speed
- Showed how current cache writeback algorithms create I/O slowdown even at I/O rates that are below disk speed but change rapidly
- Reviewed a small set of algorithms from the literature and explained their deficiencies
- Discussed several new algorithms and showed the simulation results that let us select the best algorithm for experimentation
- Presented experimental results for 2 algorithms and showed that Rate Proportional is the best algorithm by the given success criteria
- Finally, discussed how these algorithms can be used for both MD and DP on any file system, network or local

15  Future work and extension to Linux file systems
- Investigate additional algorithms, inspired by the signal processing of non-linear signals, that address the oscillatory behavior
- Address the same behavior in the cache writeback of local file systems, including ext3, ReiserFS and ext4 on Linux (a discussion at the next Linux workshop)
- Linux FS developers are aware of this behavior and are currently working to instrument the Linux kernel with the same measurement tools we used
- Investigate machine learning to compensate for very fast changes in I/O rate, so that application performance can be optimized for very large numbers of clients
- More work is needed to find algorithms that let the maximum application performance equal the maximum aggregate disk performance
- We also plan to instrument the NFS clients' kernels so we can evaluate the I/O slowdown and tune the flush algorithm to reduce the slowdown effect to zero
- More work is needed to extend this study to MD and to find new MD-specific flushing methods
