1. Adaptive File Caching in Distributed Systems
Ekow J. Otoo, Frank Olken, Arie Shoshani
2. Objectives
Goals
— Develop coordinated, optimal file caching and replication of distributed datasets
— Develop a software module, the Policy Advisory Module (PAM), as part of Storage Resource Managers (SRMs) and other grid storage middleware
Example application areas
— Particle Physics Data Grid (PPDG)
— Earth System Grid (ESG)
— Grid Physics Network (GriPhyN)
3. Managing File Requests at a Single Site
[Diagram: multiple clients issue file requests to a Storage Resource Manager with a Policy Advisory Module; requests are queued and scheduled against a shared disk cache used for accessing the local Mass Storage System and other sites over the network.]
4. Two Principal Components of the Policy Advisory Module
A disk cache replacement policy
— Evaluates which files are to be replaced when space is needed
An admission policy for file requests
— Determines which request is to be processed next
— e.g., may prefer to admit requests for files already in cache
Work done so far concerns:
— Disk cache replacement policies
— Development of the SRM-PAM interface
— Some models of file admission policies
5. New Results Since the Last Meeting
— Implementation of the Greedy Dual Size (GDS) replacement policy
— New experimental runs with new workloads:
  - a six-month log of access traces from JLab
  - a synthetic workload with file sizes from 500 KB to 2.14 GB
— Implementation of an SRM-PAM simulation in OMNeT++
Papers:
1. Disk cache replacement algorithm for storage resource managers on the grid (SC2002)
2. Disk file caching algorithms that account for delays in space reservation, file transfers, and file processing (to be submitted to the Mass Storage Conference)
3. A discrete event simulation model of a storage resource manager (to be submitted to SIGMETRICS 2002)
6. Some Known Results in Caching (1)
Disk-to-memory caching
— Least Recently Used (LRU): keeps the last reference time
— Least Frequently Used (LFU): keeps reference counts
— LRU-K: keeps the last reference times, up to a maximum of K
Best known result (O'Neil et al. 1993)
— A small K is sufficient (K = 2, 3)
— Gains of 5-10% over LRU, depending on the reference pattern
Significance of a 10% saving in time per reference: improved response time
In the Grid and wide area networks, this translates to
— Reduced network traffic
— Reduced load at the source
— Savings in time to access files
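For concreteness, here is a minimal LRU-K sketch in Python (not from the original slides; it follows the standard rule of evicting the file whose K-th most recent reference is oldest, with K = 1 reducing to plain LRU):

```python
import collections

class LRUKCache:
    """Minimal LRU-K sketch: evict the cached file whose K-th most
    recent reference time is oldest.  K = 1 reduces to plain LRU."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity   # max number of files in cache
        self.k = k                 # number of reference times kept per file
        self.times = {}            # file_id -> deque of last K reference times
        self.cached = set()        # files currently in cache
        self.clock = 0             # logical time

    def _kth_backward_time(self, file_id):
        # Oldest of the (up to K) stored times; files with fewer than K
        # recorded references rank as -infinity and are preferred victims.
        hist = self.times[file_id]
        return hist[0] if len(hist) == self.k else float("-inf")

    def reference(self, file_id):
        """Record a reference; return True on a cache hit."""
        self.clock += 1
        self.times.setdefault(
            file_id, collections.deque(maxlen=self.k)).append(self.clock)
        if file_id in self.cached:
            return True
        if len(self.cached) >= self.capacity:
            victim = min(self.cached, key=self._kth_backward_time)
            self.cached.discard(victim)
        self.cached.add(file_id)
        return False
```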
7. Some Known Results in Caching (2)
File caching from tertiary storage to disk
— Modeling of robotic tapes [Johnson 1995; Sarawagi 1995; ...]
— Hazard Rate Optimal [Olken 1983]
— Object LRU [Hahn et al. 1999]
Web caching
— Self-Adjusting LRU [Aggarwal and Yu 1997]
— Greedy Dual [Young 1991]
— Greedy Dual Size (GDS) [Cao and Irani 1997]
8. Differences Between the Environments
Caching in primary memory
— Fixed page size
— Cost (in time) is assumed constant:
  - transfer time is negligible
  - latency is assumed fixed for disk
  - a memory reference is instantaneous
Caching in a Grid environment
— Large files with variable sizes
— Cost of retrieval (in time) varies considerably:
  - from one time instant to another, even for the same file
  - files may be available from different locations in a WAN
  - transfer time may be comparable to the latency in a WAN
— The duration of a file reference is significant and cannot be ignored
— The main goal is to reduce network traffic and file access times
9. Our Theoretical Result on Caching Policies in a Grid Environment
Latency delays, transfer delays, and file size all affect caching policies in the Grid
— Cache replacement algorithms such as LRU and LRU-K do not take these into account and are therefore inappropriate
The replacement policy we advocate is based on a cost-beneficial function, computed at the current time t_0, of the form

  φ_i(t_0) = (k_i(t_0) / (t_0 − t_{−K})) × (C_i(t_0) / S_i)

where:
— t_0 is the current time
— k_i(t_0) is the count of references to file i, up to a maximum of K
— C_i(t_0) is the cost, in time, of accessing file i
— S_i is the size of file i
— f_i(t_0) is the total count of references to file i over its active time T
— t_{−K} is the time of the K-th backward reference
The eviction candidate is the file with the minimum φ_i(t_0)
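A sketch of how this ranking could be computed (Python; the rate-times-cost-per-byte form above is our reading of the slide's definitions, since the original formula image was lost, so treat the exact expression as an assumption):

```python
def cost_beneficial_rank(t0, k_i, t_back_k, cost_i, size_i):
    """phi_i(t0): estimated reference rate times retrieval cost per byte.
    t0       -- current time
    k_i      -- count of recorded references (up to K)
    t_back_k -- time of the K-th backward reference
    cost_i   -- current cost, in time, of re-fetching file i
    size_i   -- size of file i in bytes
    """
    rate = k_i / (t0 - t_back_k)       # estimated references per unit time
    return rate * (cost_i / size_i)    # benefit per byte of keeping the file

def eviction_candidate(files, t0):
    """files: file_id -> (k_i, t_back_k, cost_i, size_i).
    The candidate is the file with the minimum phi_i(t0)."""
    return min(files, key=lambda f: cost_beneficial_rank(t0, *files[f]))
```

Intuitively, a file is worth keeping in proportion to how often it is referenced and how expensive each byte is to re-fetch; the file scoring lowest on both counts is evicted first.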
10. Implementations from the Theoretical Results
Two new practical implementations were developed:
— MIT-K: Maximum average Inter-arrival Time, an improved LRU-K
  - MIT-K dominates LRU-K
  - Does not take access costs and file size into account
  - The main ranking function is the average inter-arrival time of the last K references, a_i(t_0) = (t_0 − t_{−K}) / k_i(t_0); the file with the maximum a_i(t_0) is evicted
— LCB-K: Least Cost Beneficial with K backward references
  - Does take retrieval delay and file size into account
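The MIT-K ranking is small enough to show directly (Python; the average inter-arrival form is reconstructed from the policy's name and the definitions on the previous slide):

```python
def mit_k_rank(t0, k_i, t_back_k):
    """Average inter-arrival time of the last k_i (up to K) references.
    A larger value means the file is referenced less often, so the
    eviction candidate is the file with the MAXIMUM rank."""
    return (t0 - t_back_k) / k_i
```

Because neither C_i nor S_i appears in the rank, MIT-K tracks LRU closely on hit ratio but cannot win on retrieval cost, which is consistent with the simulation results later in the deck.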
11. Some Details of the Implementation Algorithms
Evaluating replacement policies with no delay considerations involves:
— a reference stream r_1, r_2, r_3, ..., r_N
— a specified cache size Z, and
— two appropriate data structures:
  - one holds information on the referenced files (a binary search tree, BST)
  - the second holds information about the files in cache while allowing fast selection of the eviction candidate (the cache content kept as a priority queue, PQ, ordered by the policy's ranking, e.g. LRU order)
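A minimal sketch of that evaluation loop (Python; the lazily rebuilt heap stands in for the slide's PQ, and `rank(f, t)` can be any of the policies above, with smaller values marking better eviction candidates):

```python
import heapq

def simulate(references, sizes, Z, rank):
    """Replay a reference stream against a cache of Z bytes.
    references -- iterable of file ids, in arrival order
    sizes      -- file_id -> size in bytes (each assumed <= Z)
    rank(f, t) -- policy ranking; smaller = better eviction candidate
    Returns the hit ratio."""
    in_cache, used, hits = set(), 0, 0
    for t, f in enumerate(references):
        if f in in_cache:
            hits += 1
            continue
        # Evict until the new file fits.  The heap is rebuilt on each miss
        # for brevity; a real implementation updates it incrementally.
        heap = [(rank(g, t), g) for g in in_cache]
        heapq.heapify(heap)
        while used + sizes[f] > Z and heap:
            _, victim = heapq.heappop(heap)
            in_cache.remove(victim)
            used -= sizes[victim]
        in_cache.add(f)
        used += sizes[f]
    return hits / len(references)
```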
12. Implementation When Delays Are Considered
When delays are considered, each reference r_i in the reference stream has five event times:
— time of arrival
— time when file caching starts
— time when caching ends
— time when processing begins
— time when processing ends and the file is released
Data structures (varying with the different policies):
— a BST over the active lifetime T
— a PQ of unpinned files in cache
— a vector of pinned files in cache
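One way to carry the five event times through a simulation (Python; the field names and the pinned interval are illustrative assumptions, not from the slides):

```python
from dataclasses import dataclass

@dataclass
class Reference:
    """The five event times attached to one file reference.  A file is
    treated as 'pinned' (not evictable) from the start of caching until
    it is released."""
    file_id: str
    arrival: float            # request enters the system
    caching_starts: float     # space reserved, transfer begins
    caching_ends: float       # file fully staged on disk
    processing_begins: float  # client starts reading the file
    released: float           # processing ends; file becomes unpinned

    def pinned_at(self, t: float) -> bool:
        return self.caching_starts <= t < self.released
```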
13. Performance Metrics
— Hit Ratio = (number of requests satisfied from cache) / (total number of requests)
— Byte Hit Ratio = (number of bytes delivered from cache) / (total number of bytes requested)
— Retrieval Cost Per Reference = (total time spent retrieving files from their sources) / (total number of references)
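These three metrics could be accumulated during a run as follows (Python sketch; the counter names are illustrative):

```python
class Metrics:
    """Accumulates the three metrics over a simulation run."""

    def __init__(self):
        self.refs = self.hits = 0
        self.bytes_total = self.bytes_hit = 0
        self.retrieval_time = 0.0   # time spent fetching misses

    def record(self, size, hit, fetch_time=0.0):
        self.refs += 1
        self.bytes_total += size
        if hit:
            self.hits += 1
            self.bytes_hit += size
        else:
            self.retrieval_time += fetch_time

    def report(self):
        return {
            "hit_ratio": self.hits / self.refs,
            "byte_hit_ratio": self.bytes_hit / self.bytes_total,
            "retrieval_cost_per_ref": self.retrieval_time / self.refs,
        }
```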
14. Implications of the Metric Measures
Hit Ratio:
— Measures the relative savings as a count of the number of file requests hit in cache
Byte Hit Ratio:
— Measures the relative savings as the volume of data transfer avoided (a proxy for transfer time)
Retrieval Cost Per Reference:
— Measures the relative savings as the time avoided in data transfers and in retrieving data from their sources
15. Parameters of the Simulations
Real workload from the Thomas Jefferson National Accelerator Facility (JLab)
— A six-month trace log of file accesses to tertiary storage
— The log contains batched requests
— The replacement policy in use at JLab is LRU
Synthetic workload based on JLab
— 250,000 files with sizes uniformly distributed between 500 KB and 2.147 GB
— Inter-arrival times exponentially distributed with a mean of 90 seconds
— About 500,000 references generated
— Locality of reference: the reference stream is partitioned into random-size intervals, and within each interval accesses follow the 80-20 rule (80% of references go to 20% of the files)
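A sketch of how such a synthetic workload could be generated (Python; the slides do not specify the interval sizes or how the hot set is chosen, so those details are assumptions):

```python
import random

def synthetic_workload(n_files=250_000, n_refs=500_000,
                       min_size=500_000, max_size=2_147_000_000,
                       mean_gap=90.0, seed=42):
    """Return (refs, sizes): refs is a list of (timestamp, file_id),
    sizes maps file_id -> bytes.  80-20 locality is applied within
    random-size intervals of the reference stream."""
    rng = random.Random(seed)
    sizes = {f: rng.uniform(min_size, max_size) for f in range(n_files)}
    refs, t, emitted = [], 0.0, 0
    while emitted < n_refs:
        interval = rng.randint(1_000, 10_000)            # assumed interval sizes
        hot = rng.sample(range(n_files), n_files // 5)   # the "20%" hot set
        for _ in range(min(interval, n_refs - emitted)):
            t += rng.expovariate(1.0 / mean_gap)         # mean 90 s gaps
            f = (rng.choice(hot) if rng.random() < 0.8   # 80% to hot set
                 else rng.randrange(n_files))
            refs.append((t, f))
            emitted += 1
    return refs, sizes
```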
16. Replacement Policies Compared
— RND: Random
— LFU: Least Frequently Used
— LRU: Least Recently Used
— MIT-K: Maximum Inter-arrival Time based on the last K references
— LCB-K: Least Cost Beneficial based on the last K references
— GDS: Greedy Dual Size
The active lifetime T of a file is set at 5 days.
All results were accumulated with a variance reduction technique.
17. Simulation Results for the JLab Workload: Comparison of Hit Ratios
(Higher values represent better performance.)
— MIT-K and LRU give the best performance
— LCB-K, GDS, and RND are comparable
— LFU is the worst
18. Simulation Results for the JLab Workload: Comparison of Byte Hit Ratios
(Higher values represent better performance.)
— MIT-K and LRU perform slightly better than the rest
— All policies except LFU are comparable
— LFU is the worst
19. Simulation Results for the JLab Workload: Comparison of Average Retrieval Time Per Reference
(Lower values represent better performance.)
— LCB-K and GDS give the best performance
— MIT-K, LRU, and RND are comparable
— LFU shows the worst performance
20. Simulation Results for the Synthetic Workload: Comparison of Average Retrieval Time Per Reference
(Lower values represent better performance.)
— LCB-K clearly gives the best performance, although not significantly better than GDS
— LFU is still the worst
— Hit ratio and byte hit ratio are not good measures of caching-policy effectiveness on the Grid
21. Summary
— Developed a good replacement policy, LCB-K, for caching in storage resource management on the grid.
— Developed a realistic model for evaluating cache replacement policies that takes into account delays at the data sources, in transfers, and in processing.
— Applied the model in extensive simulations of different policies under synthetic and real workloads of accesses to the mass storage system at JLab.
— We conclude that two worthwhile replacement policies for storage resource management on the Grid are LCB-K and GDS.
— LCB-K gives about 10% savings in retrieval cost per reference compared with the widely used LRU. The cumulative effect can be significant in terms of reduced network traffic and reduced load at the source.