
Proxy Cache and YOU By Stuart H. Schwartz

What is cache anyway? The general idea of cache is simple… Buffer data from a slow, large source within a (usually) smaller, faster source. We can free up bus traffic (or bandwidth) and also speeeeeed up access to data. Caches are used everywhere in modern computers.

Why are we here? Today? Learn a little about:
Cache
Web cache
Problems of web cache
Solutions to the problems of web cache
Becoming s-m-r-t… Smart!

Ok… Cool… How does this apply to the inter-ma-web? Well, the web can be thought of as another I/O medium, like a very, very, very large number of external devices on a USB bus. Caching data at various locations between the requesting process and the data source can really speed things up, both for the local machine and across the web.

So, why not just use regular caching methods? A few small differences exist between web cache and the hardware caches on a local machine. On a local machine, the program using the memory usually changes the cached values; once all changes are made, or an eviction occurs, the values are written back to main memory. This is not the case for web cache: web objects are generally pulled in for reading only.

There’s more… Traditional replacement policies work under the assumption that blocks of data are pulled into cache in uniform sizes, and that the cost of pulling in any given block is relatively consistent. Again, web cache is different: individual web objects vary in size, and also vary in the network cost of retrieving them.

So… who cares? It's not a big problem until the web cache must evict data. Common eviction policies aren’t designed to handle non-uniform block sizes or non-uniform retrieval costs. Evicting the object that will be used furthest in the future is optimal only when size and cost are equal across all objects.

That sucks… so what do we do? We come up with a policy that will evict objects from the web cache in such an order as to maximize some metric we care about… …or have someone smarter than us come up with one.

Wait, what metrics are you talking about? Isn’t it just hit or miss? Now that there is more to the problem, there are more metrics we might care about minimizing or maximizing. Some include: object hit rate, byte hit rate, latency, network hops, and dollar cost ($).
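To make the first two metrics concrete, here is a minimal sketch (my own illustration, not from the talk) that computes the object hit rate and byte hit rate over a hypothetical request trace; the file names and sizes are made up.

```python
def hit_rates(trace, cached):
    """Return (object hit rate, byte hit rate) for a request trace.

    trace:  list of (url, size_in_bytes) requests
    cached: set of URLs assumed to be in cache at request time
    """
    hits = sum(1 for url, _ in trace if url in cached)
    hit_bytes = sum(size for url, size in trace if url in cached)
    total_bytes = sum(size for _, size in trace)
    return hits / len(trace), hit_bytes / total_bytes

# Three small objects hit, one large video misses:
trace = [("a.html", 1_000), ("b.jpg", 50_000), ("a.html", 1_000), ("c.mp4", 949_000)]
cached = {"a.html", "b.jpg"}
print(hit_rates(trace, cached))  # object hit rate 0.75, byte hit rate ≈ 0.052
```

Notice how the two metrics can disagree: 75% of requests hit, but barely 5% of the bytes were served from cache, which is exactly why a policy tuned for one metric can look bad on another.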

So, hasn’t anyone looked into this? YES! Some replacement policies exist, like: Least Recently Used, Least Frequently Used, LRU-Threshold, Log(Size) + LRU, and Hyper-G.

And… Pitkow/Recker, Lowest-Latency-First, Hybrid, Lowest Relative Value, and GreedyDual-Size.

Whoa… That’s a lot of algorithms… What’s the best? Well, that’s a tough call. They all have their advantages and disadvantages, but GreedyDual-Size has been tested to perform very well under normal conditions. GreedyDual-Size does well at maximizing object hits or byte hits, and also at minimizing latency or network hops… but it can only be set up to optimize one of those at any given time.

How do you know GreedyDual-Size performs well? Huh? Proving optimality is pretty difficult for such a complicated problem. Instead, sample sets of web access requests (traces), recorded over time, can be replayed to simulate how well each algorithm performs with respect to the specific metrics discussed before.

Locality is still a factor. In machine cache, locality is a big factor: data that is logically close to already-accessed data is more likely to be accessed next than data far away. The same goes for temporal locality: data that has been accessed recently is more likely to be accessed again than data that has not been touched in a while.

Locality on the web. Studies have shown that web access follows some of the same patterns. Data within a single web site is more likely to be accessed next than data from another site, and data that has been accessed most recently is likely to be accessed again. Another odd property: these patterns tend to recur in k * 24-hour cycles.

Back to GreedyDual-Size. GreedyDual-Size is an eviction policy that attempts to perform very well without fine-tuning any heuristics to network behavior. It is based on the tried-and-true idea of Least Recently Used, but adds provisions for different object sizes and different network costs of bringing each object in.

How does it work? Well, GreedyDual-Size associates a value (call it H) with every object in cache. This value is H = Cost / Size, where Cost is some abstract cost of bringing the object into cache, and Size is the size of the object in bytes. This simple Cost / Size relationship works very well at maximizing or minimizing the desired metrics.
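A toy illustration of the H value (the costs and sizes below are made-up numbers, not from the talk): a small page that is expensive to fetch gets a high H and is kept, while a large object that was cheap to fetch gets a low H and becomes an eviction candidate.

```python
def h_value(cost, size):
    """H = Cost / Size: per-byte value of keeping an object cached."""
    return cost / size

# A small, expensive-to-fetch page vs. a large, cheap-to-fetch image:
h_page  = h_value(cost=8.0, size=4_000)     # 0.002
h_image = h_value(cost=2.0, size=200_000)   # 0.00001

# The image has the far lower H, so it would be evicted first.
assert h_image < h_page
```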

Ok… then what? When it comes time to evict an object, we pick the object with the lowest H value. Then we subtract that H value from all the objects still in memory, depreciating their H values as evictions occur over time. If an object in memory is accessed again, we restore its H value to the full Cost / Size.

Pseudo Code
Set L = 0
If Object is in memory
    Set H(Object) = L + Cost(Object) / Size(Object)
    Return Object
Else
    While memory cannot fit Size(Object)
        L = minimum H value of objects in memory
        Evict the object with H value L
    End While
    Insert Object into memory
    Set H(Object) = L + Cost(Object) / Size(Object)
End If

Wow, that’s crazy, what does that do? If we actually subtracted the minimum H value from every object in memory, each eviction would be O(n), where n is the number of objects in cache, and that is unreasonable. Instead, this pseudo code uses a priority queue (a heap) ordered by H value, and rather than depreciating the H values of objects already in memory, it inflates the H value of new objects by L, which ‘remembers’ the accumulated depreciation. This allows evictions and insertions to occur in O(log n) time, which is very reasonable.
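Here is a runnable sketch of that heap-based version in Python (my own illustration, not code from the talk). It uses the lazy-deletion trick common with `heapq`: when an object's H is refreshed on a hit, the old heap entry is left behind and simply skipped at pop time if it no longer matches the current value.

```python
import heapq

class GreedyDualSize:
    """GreedyDual-Size sketch: H = L + cost/size, with a lazy-deletion heap."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.L = 0.0        # inflation value: H of the most recent eviction
        self.entries = {}   # url -> (h, size)
        self.used = 0       # bytes currently cached
        self.heap = []      # (h, url) pairs; may contain stale entries

    def access(self, url, size, cost=1.0):
        """Record an access; returns True on a cache hit."""
        if url in self.entries:
            h = self.L + cost / size            # refresh H on a hit
            self.entries[url] = (h, size)
            heapq.heappush(self.heap, (h, url))
            return True
        # Miss: evict lowest-H objects until the new object fits.
        while self.used + size > self.capacity and self.heap:
            min_h, victim = heapq.heappop(self.heap)
            entry = self.entries.get(victim)
            if entry is None or entry[0] != min_h:
                continue                        # stale heap entry, skip it
            self.L = min_h                      # 'remember' the evicted H
            self.used -= entry[1]
            del self.entries[victim]
        if size <= self.capacity:               # too big to ever cache? skip
            h = self.L + cost / size            # use L updated by evictions
            self.entries[url] = (h, size)
            heapq.heappush(self.heap, (h, url))
            self.used += size
        return False
```

A quick walk-through with a 100-byte cache: caching "a" (60 bytes), then "b" (60 bytes) forces "a" out, so a repeat access to "b" hits while "a" misses again:

```python
cache = GreedyDualSize(capacity_bytes=100)
cache.access("a", size=60)          # miss; cached
cache.access("b", size=60)          # miss; evicts "a" to make room
assert cache.access("b", size=60)       # hit
assert not cache.access("a", size=60)   # miss: "a" was evicted
```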

Performance Results. Comparing the performance of GreedyDual-Size with LRU, Size, Hybrid, and LRV yielded very promising results. On a sample trace set, GreedyDual-Size beats its competitors on hit ratio, incurring only a 5% miss rate when the cache is 5% of the total data size. LRV (Lowest Relative Value) sometimes performs better than GreedyDual-Size, but that can be attributed to the fact that LRV is customized to the patterns of its network, whereas GreedyDual-Size is generic.

Main flavors of GreedyDual-Size
GD-Size(1) – Set all the network costs to 1; this aims to achieve maximum object hit rate.
GD-Size(Packets) – Set the network cost to the number of packets required for an object; aims to reduce network traffic.
GD-Size(Latency) – Account for network latency and improve response times.
GD-Size(Average Latency) – Take an average of the network latency; works better on larger caches.
GD-Size(Hops) – Cost is the number of network hops; works the best.
GD-Size(Weighted Hops) – Number of network hops weighted by the number of packets to transfer.
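The flavors differ only in the cost function plugged into H = Cost / Size. A hypothetical sketch of a few of them (the request fields, the per-packet payload constant, and the packet-count estimate are my own assumptions, not the talk's definitions):

```python
PACKET_PAYLOAD = 1460  # assumed TCP payload bytes per packet (illustrative)

# Each flavor is just a different cost function over a request record.
flavors = {
    # Maximize object hit rate: every object costs the same.
    "GD-Size(1)":       lambda req: 1.0,
    # Reduce network traffic: rough estimate of packets needed to fetch it.
    "GD-Size(Packets)": lambda req: 2 + req["size"] / PACKET_PAYLOAD,
    # Improve response times: cost is the measured download latency.
    "GD-Size(Latency)": lambda req: req["latency"],
    # Reduce total hops: cost is network hops to the origin server.
    "GD-Size(Hops)":    lambda req: req["hops"],
}

req = {"size": 14_600, "latency": 0.120, "hops": 12}
for name, cost_fn in flavors.items():
    print(name, cost_fn(req) / req["size"])  # this flavor's H for the object
```

The eviction machinery never changes; only the cost plugged into H does, which is why one implementation can be pointed at whichever metric you care about.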

Use GreedyDual-Size! GreedyDual-Size(Hops) and GreedyDual-Size(Weighted Hops) work the best at minimizing latency and network traffic as well as maximizing hit rate. They are simple to implement with no need for custom-i-zation. Using these cache replacement algorithms at all levels of web cache would yield a faster internet!

Cache is useful. Cache is useful in many ways, and a good cache replacement policy is the key to making it perform well. A well-performing cache can bring us web pages, images, and videos at much faster rates than no cache at all, or a cache with a poor replacement policy. Cache can bring us media like…

LASER CATS! Laser fast! This picture came from the internet