An Effective Disk Caching Algorithm in Data Grid
Song Jiang and Xiaodong Zhang, College of William and Mary, USA

Why Disk Caching in Data Grids?
- Loading data from a Mass Storage System (MSS) over a LAN incurs long latency (up to several minutes);
- Completing the file transfers for a request over a wide-area network (WAN) can take a very long time (up to a few hours);
- A researcher's workstation, or even her local computing center, may not be able to hold all the required datasets for as long as she needs them.

Disk Caching in Storage Resource Managers (SRM)
[Architecture diagram: Clients, DRM, HRM, and HPSS connected across a wide-area network.]
HPSS: High Performance Storage System
HRM: Hierarchical Resource Manager
DRM: Disk Resource Manager

A General Replacement Utility Function Based on Locality, Size, and Transfer Cost
- Locality estimation of files is the most critical factor determining the hit ratio of disk caching.
- In a data grid, replacement is managed by middleware, so the maintenance-cost constraint can be more relaxed than in an OS.
- For a file i at time t:

    Φ_i(t) = L_i(t) × C_i / S_i

  where L_i(t) denotes the file's locality strength, S_i the file size, and C_i its retrieval cost.
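As a concrete illustration, here is a minimal sketch of this utility computation in Python; the function name and the way the locality estimate is supplied are illustrative assumptions, not the authors' code:

```python
def utility(locality: float, cost: float, size: float) -> float:
    """Phi_i(t) = L_i(t) * C_i / S_i: a file is more valuable to keep when
    its locality is strong, it is expensive to re-fetch, and it is small."""
    return locality * cost / size
```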

Existing Replacement Policies and Their Limits
(1) LRU only
-: Carries all the weaknesses of LRU; no consideration of file size or transfer cost.
+: Easy to implement; low maintenance cost.
(2) LRU-based locality estimation
-: Carries all the weaknesses of LRU.
+: Easy to implement; low maintenance cost.
(3) Frequency-based estimation
Based on the number of times an object in the file has been accessed since it was fetched.
-: Frequency can be misleading because counts are interval-dependent. Cache pollution: a file that has accumulated a high frequency can stay in the cache for a long time even after its accesses stop (see the sketch below).
+: Easy to implement; can monitor multiple objects.
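The cache pollution problem is easy to reproduce. Below is a minimal LFU-style sketch in Python (the file names and the capacity of two files are made up for illustration): a file that piles up references early is never displaced by later traffic, exactly the failure mode described above.

```python
from collections import Counter

freq = Counter()          # access counts, kept even across evictions
cache: set[str] = set()
CAPACITY = 2              # cache holds two files (illustrative)

def lfu_access(f: str) -> None:
    """Admit f, evicting the least-frequently-used resident if full."""
    freq[f] += 1
    if f not in cache:
        if len(cache) >= CAPACITY:
            cache.discard(min(cache, key=freq.__getitem__))
        cache.add(f)

for _ in range(100):
    lfu_access("hot.dat")               # early burst of accesses
for f in ["a", "b", "c", "d"] * 5:
    lfu_access(f)                       # 'hot.dat' gets no more accesses...
print(cache)                            # ...yet it still occupies a slot
```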

Existing Replacement Policies and Their Limits (continued)
(4) Greedy-Dual-Size (USENIX Internet Technologies '97)
For a time interval, the locality estimator is access frequency / number of missed objects.
-: Shares the same weaknesses as frequency-based estimation.
+: Easy to implement.
(5) Least Cost Beneficial based on K backward references (SC '02)
For a time interval, the locality estimator is (number of most recent references) × (total references).
-: The estimator is time-interval dependent.
+: Easy to implement; low maintenance cost.
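For background, here is a minimal sketch of Greedy-Dual-Size in its classic formulation (Cao and Irani, USENIX Symposium on Internet Technologies and Systems '97): each file carries a credit H = L + cost/size, and the global value L is inflated to the credit of the last victim. The slide above gives a frequency-oriented reading of the policy, so treat this only as reference material; the class and method names are my own.

```python
import heapq

class GreedyDualSize:
    """Classic Greedy-Dual-Size with a lazy-deletion min-heap."""

    def __init__(self, capacity: float):
        self.capacity = capacity
        self.used = 0.0
        self.L = 0.0                      # global inflation value
        self.h: dict[str, float] = {}     # file -> current H credit
        self.size: dict[str, float] = {}
        self.heap: list[tuple[float, str]] = []

    def access(self, f: str, size: float, cost: float) -> bool:
        hit = f in self.h
        if not hit:
            if size > self.capacity:
                return False              # file cannot fit at all
            while self.used + size > self.capacity:
                h_min, victim = heapq.heappop(self.heap)
                if self.h.get(victim) != h_min:
                    continue              # stale heap entry; skip it
                self.L = h_min            # inflate the baseline
                self.used -= self.size.pop(victim)
                del self.h[victim]
            self.size[f] = size
            self.used += size
        # On every access (hit or fetch), reset the file's credit.
        self.h[f] = self.L + cost / size
        heapq.heappush(self.heap, (self.h[f], f))
        return hit
```

The inflation value L is what gives GDS its recency flavor: files untouched for a long time keep stale, low credits and are evicted first.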

Drawbacks of the Existing Locality Estimation Methods
(1) Recency based
The locality of large-file accesses in data grids is weaker than that in I/O block and Web file caching, and recency-based methods handle weak-locality file requests poorly.
Example: Greedy-Dual-Size (GDS)
(2) Frequency based
Suffers from the pollution problem.
Examples: Greedy-Dual-Size with Frequency (GDSF), Hybrid, Lowest Relative Value (LRV)
(3) Re-use density based
Overcomes the drawbacks of the previous methods, but can be irrelevant to locality because density is computed over wall-clock time.
Example: Least Cost Beneficial based on K backward references (LCB-K)

Our Principles on Disk Replacement
- Only relevant history information is used in locality estimation: the order of file requests, and cache size demands.
- The total cache size is combined with the locality estimate to answer the question: does a file have enough locality that, if it is cached, it will be hit on its next reference before it is replaced from a cache of its specific size?

Our Utility Function
- The caching time of a file measures how much cache space has been consumed since the last reference to the file.
- A caching time is maintained for a file for a certain period even after the file is replaced, so that the utility of its next access can be assessed more accurately.
[Diagram: the caching time stack, holding both resident and non-resident entries; each entry records FileSize and CachingTime.]

Least Value Based on Caching Time (LVCT)
(1) If file f is fetched into the cache, push its entry onto the stack top and set its caching time to 0.
(2) If file f is hit in the cache, advance its entry up to Size(f) positions.
(3) If the access to file f is a miss:
- Select a set of resident files whose utility values are the smallest among resident files;
- If the utility value of f (determined mainly by its file size and transfer cost) is smaller than that of any file in the set, f is not cached;
- Otherwise, f is cached;
- The caching time stack is updated accordingly.
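The slides give only this outline, so the following Python sketch is a speculative reconstruction of the LVCT bookkeeping, not the authors' implementation. It approximates caching time with a global clock that advances by the size of each accessed file, models locality strength as the inverse of caching time, and seeds a new file's history as if its last reference were one cache-capacity ago; all three choices, and every name, are my assumptions.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    size: float
    cost: float
    last_clock: float          # global clock value at the previous reference
    resident: bool = False

class LVCTCache:
    """Speculative sketch of LVCT (Least Value based on Caching Time).

    clock - last_clock approximates a file's caching time: the cache space
    consumed since its last reference.  Entries are kept after eviction
    (the non-resident part of the caching time stack) so a re-access can
    still be valued accurately; a real implementation would prune this
    history after a while."""

    def __init__(self, capacity: float):
        self.capacity = capacity
        self.used = 0.0
        self.clock = 0.0
        self.entries: dict[str, Entry] = {}

    def _utility(self, e: Entry) -> float:
        # Phi = L * C / S, with locality strength L modeled as the inverse
        # of caching time (assumption: shorter caching time = stronger locality).
        caching_time = max(self.clock - e.last_clock, e.size)
        return (1.0 / caching_time) * e.cost / e.size

    def access(self, name: str, size: float, cost: float) -> bool:
        e = self.entries.get(name)
        if e is None:
            # First reference: seed history as if last touched one full
            # cache ago, so the utility reduces to ~cost/(size*capacity).
            e = self.entries[name] = Entry(size, cost, self.clock - self.capacity)
        hit = e.resident
        if not hit and size <= self.capacity:
            # Step (3): gather the lowest-utility residents that would have
            # to leave to make room, and admit f only if it beats them all.
            victims, freed = [], 0.0
            for v in sorted((v for v in self.entries.values() if v.resident),
                            key=self._utility):
                if self.used - freed + size <= self.capacity:
                    break
                victims.append(v)
                freed += v.size
            if all(self._utility(e) >= self._utility(v) for v in victims):
                for v in victims:
                    v.resident = False
                    self.used -= v.size
                e.resident = True
                self.used += size
        # Every reference, cached or not, advances the clock and refreshes
        # the file's caching-time baseline.
        self.clock += size
        e.last_clock = self.clock
        return hit
```

Under these assumptions, a resident file whose caching time has grown beyond the cache capacity scores lower than a fresh request of comparable cost/size, which matches the advantage claimed on the results slide: files with large caching times are replaced promptly.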

Trace Description
- Collected from a mass storage system, JASMine, at the Thomas Jefferson National Accelerator Facility (JLab);
- Represents file access activity over a period of about 6 months;
- 207,331 files are accessed in the trace; their total size is TBytes.

Simulation Results: LVCT Advantages
- It replaces files with large caching times in a timely manner, because they are unlikely to generate hits even if cached;
- Cache space is saved to serve files with small caching times;
- The improvement is more apparent with small cache sizes.

Main Results
- Disk caching in data grids exhibits properties different from traditional I/O buffering and Web file caching;
- We identify a critical drawback in how existing policies abstract locality information from grid request streams: they rely on misleading time measurements;
- We propose a new locality estimator that uses more relevant access events;
- Real-life workload traces demonstrate the effectiveness of our replacement policy.