Cache Performance Analysis of Traversals and Random Accesses
R. E. Ladner, J. D. Fix, and A. LaMarca
Presented by Tomer Shiran

The Model
A large memory – M blocks
A smaller cache – C blocks
We examine only direct-mapped caches
Each memory block x maps to exactly one cache block y, namely y = x mod C

The Model (2)

The Model (3)
There are n different memory blocks that map to each cache block
Thus, M = nC

Algorithms and Cache
An algorithm is simply a sequence of accesses to blocks in memory
We assume that initially none of the blocks to be accessed are in the cache
A read or write to a variable that is part of a block is modeled as one access to that block
We do not distinguish between reads and writes – a copy-back architecture with a write buffer is assumed
An access to a memory block x is a hit if x is in the cache, and a miss otherwise
The cache performance of an algorithm is measured by the number of misses it incurs
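This counting model is easy to make concrete. Below is a minimal sketch (not from the paper) of a direct-mapped cache miss counter in C; the cache size C, the tags[] array, and the access_block() function are illustrative choices, not the authors' code:

```c
#include <stdio.h>

#define C 1024                /* number of cache blocks (assumed for the sketch) */

static long tags[C];          /* which memory block currently occupies each cache block */
static long misses = 0;

/* Model one access to memory block x in a direct-mapped cache. */
static void access_block(long x)
{
    long y = x % C;           /* the unique cache block that x maps to */
    if (tags[y] != x) {       /* x is not resident: a miss */
        misses++;
        tags[y] = x;          /* x evicts whatever was cached at y */
    }                         /* otherwise the access is a hit */
}

int main(void)
{
    long i;
    for (i = 0; i < C; i++)
        tags[i] = -1;         /* initially none of the blocks are in the cache */

    /* Example: a scan traversal with block access rate K = 4 over 4096 accesses */
    for (i = 0; i < 4096; i++)
        access_block(i / 4);

    printf("misses = %ld (expected N/K = %d)\n", misses, 4096 / 4);
    return 0;
}
```

Running the example reproduces the scan-traversal behavior analyzed next: only the first of the K consecutive accesses to each block misses.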

Traversals
A traversal with block access rate K accesses each block of a contiguous array of N/K blocks exactly K times (we always assume that K divides N)
There are a total of N accesses in a traversal
Two types of traversals:
–Scan traversal
–Permutation traversal

Scan Traversals
A scan traversal accesses the first block K times, then the second block K times, and so forth (for a total of N/K blocks and N accesses)
Scan traversals are extremely common in algorithms that manipulate arrays
–If B array elements fit in a block, then a left-to-right traversal of the array is a scan traversal with block access rate B
[P-5.1] A scan traversal with block access rate K has 1/K cache misses per access
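The proposition follows directly from the counting model; a one-line worked derivation (assuming, as the model does, that none of the traversed blocks start in the cache and no other accesses intervene):

```latex
% Only the first of the K consecutive accesses to each block can miss,
% so a scan traversal incurs one miss per block:
\text{misses} = \frac{N}{K},
\qquad
\frac{\text{misses}}{\text{access}} = \frac{N/K}{N} = \frac{1}{K}.
```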

Permutation Traversals
Consider the multiset S that contains K copies of each x where 0 ≤ x < N/K
Let σ = σ_1 σ_2 … σ_N be a permutation of S, chosen uniformly at random
If σ_i = x, then the i-th access (out of N) in the permutation traversal is to block x
At any point in the permutation traversal, if there are k accesses remaining and memory block x has j accesses remaining, then memory block x is chosen for the next access with probability j/k
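A permutation traversal can be generated concretely by shuffling the multiset of block indices. A small illustrative sketch in C (not from the paper; the function name and parameters are assumptions for the example, and the slight modulo bias of rand() is ignored for a sketch):

```c
#include <stdio.h>
#include <stdlib.h>

/* Build a uniformly random permutation traversal: N accesses with block
 * access rate K over blocks 0 .. N/K - 1 (assumes K divides N). */
static void permutation_traversal(long *seq, long N, long K)
{
    long i;
    for (i = 0; i < N; i++)        /* the multiset S: K copies of each block */
        seq[i] = i / K;
    for (i = N - 1; i > 0; i--) {  /* Fisher-Yates shuffle -> uniform permutation */
        long j = rand() % (i + 1);
        long tmp = seq[i];
        seq[i] = seq[j];
        seq[j] = tmp;
    }
}

int main(void)
{
    long N = 24, K = 3, i;
    long *seq = malloc(N * sizeof *seq);
    permutation_traversal(seq, N, K);
    for (i = 0; i < N; i++)
        printf("%ld ", seq[i]);    /* the order in which blocks are accessed */
    printf("\n");
    free(seq);
    return 0;
}
```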

Hit Rate of Permutation Traversals
[T-5.1] Assuming all permutations are equally likely, a permutation traversal with block access rate K of N/K contiguous memory blocks has the following number of misses per access:
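The formula on this slide did not survive the transcript; the following is a reconstruction derived from the definitions on the next three slides (writing n = N/(CK) for the number of traversed memory blocks that map to each cache block), so read it as a sketch rather than a quotation of the paper:

```latex
\frac{\text{misses}}{\text{access}}
  \;=\; 1 - \frac{C(K-1)}{N}
  \;=\; 1 - \frac{K-1}{nK}.
```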

Hit Rate of Permutation Traversals (2)
x is a particular cache block
m_1, m_2, …, m_n are the memory blocks that map to cache block x in the region accessed by the traversal (N = nCK)
During the traversal, nK accesses will be made to x
B_i = j whenever the i-th access that maps to x is to location m_j (1 ≤ i ≤ nK)

Hit Rate of Permutation Traversals (3)
X_ij is a random variable that indicates whether the i-th access that maps to x is a hit to location m_j
The first access to x is always a miss, so X_1j = 0 for all j
For i > 1 (and i ≤ nK) we have the following:
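The expression itself is missing from the transcript. Reconstructing it from the definitions above (the i-th access mapping to x is a hit at m_j exactly when both the (i−1)-th and the i-th such accesses are to m_j, and the accesses that map to x form a uniformly random arrangement of K copies of each of m_1, …, m_n), one plausible reading is:

```latex
E[X_{ij}]
  \;=\; \Pr\bigl[B_{i-1} = j \text{ and } B_i = j\bigr]
  \;=\; \frac{K(K-1)}{nK\,(nK-1)}.
```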

Hit Rate of Permutation Traversals (4)
For a traversal, the expected number of hits at x is then:
For the expected number of hits incurred by the traversal over all cache blocks, we multiply this result by the number of cache blocks:
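Both expressions are missing from the transcript; a reconstruction obtained by summing the expectation from the previous slide (again a sketch, not a quotation):

```latex
E[\text{hits at } x]
  = \sum_{i=2}^{nK}\sum_{j=1}^{n} E[X_{ij}]
  = (nK-1)\,n \cdot \frac{K(K-1)}{nK\,(nK-1)}
  = K-1,
\qquad
E[\text{total hits}] = C\,(K-1).
```

Dividing the C(K−1) expected hits by the N = nCK accesses gives the miss rate quoted after [T-5.1] above.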

Tree Traversals – An Example
The nodes of the tree are allocated contiguously in memory
L is the number of tree nodes that fit in a single cache block, so K = 3L (each node access touches the key and the two child pointers)
Even if the tree is arbitrary, the access sequence that arises from a preorder traversal is not completely arbitrary:
–When the key of a node is visited, the next access will always be to pL (the left child pointer)
–pR (the right child pointer) will be accessed next for the majority of nodes (the leaves), or soon after
Therefore, we model the accesses to the keys as a permutation traversal with K = L, and the remaining accesses to the child pointers as hits

Tree Traversals – An Example (2)
The total number of misses in a preorder traversal is:
This result was validated with an implementation in C on a DEC Alpha (memory accesses were monitored using Atom), and was found to be extremely accurate!
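The formula is missing from the transcript. Under the model on the previous slide, and writing t for the number of tree nodes (a symbol introduced here only for illustration; the paper's own notation may differ), the t key accesses form a permutation traversal with block access rate L over the t/L contiguous blocks of the tree, so applying the reconstructed [T-5.1] gives a sketch of the count:

```latex
E[\text{misses}] \;\approx\; t\left(1 - \frac{L-1}{nL}\right),
\qquad n = \frac{t}{LC}.
```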

Random Access
In a random access pattern, each block x of memory is accessed statistically (in other words, on a given access, x is accessed with some probability)
We make the independent reference assumption
The analysis of a set of random access patterns is called collective analysis

Collective Analysis
The cache is partitioned into a set R of regions
The accesses are partitioned into a set P of processes
The processes are used to model accesses to different portions of memory that map to the same portion of the cache (a single process doesn’t access different data items that conflict in the cache)
λ_ij is the probability that region i is accessed by process j
r_i is the size of region i in blocks
λ_i is the probability that region i is accessed

Collective Analysis (2)
[P-6.1] In a system of random accesses, in the limit as the number of accesses goes to infinity, the expected number of misses per access is:
We define the following quantities:
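Both formulas are missing from the transcript. From the later slides (which quote 1−η as the limiting miss rate and use the quantities η_i and η), a plausible reconstruction is that the defined quantities are per-region hit rates and their mixture, and that the proposition states the limiting miss rate as one minus that mixture. Treat this as a sketch of the paper's result, not a quotation:

```latex
\eta_i = \sum_{j \in P} \frac{\lambda_{ij}^2}{\lambda_i^2},
\qquad
\eta = \sum_{i \in R} \lambda_i\,\eta_i
     = \sum_{i \in R}\sum_{j \in P} \frac{\lambda_{ij}^2}{\lambda_i},
\qquad
\frac{\text{misses}}{\text{access}} \;\longrightarrow\; 1 - \eta .
```

As a consistency check, the example near the end of the talk (one region, two equally likely processes) gives η_1 = ¼ + ¼ = ½, matching the η = η_1 = ½ quoted there.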

Random Access for a Finite Period
Proposition 6.1 gives the expected miss ratio if we think of a system of random accesses running forever
In some cases we are interested in the number of misses that occur in N accesses
[L-6.1] In a system of random accesses, for each block in region i, the expected number of misses in N accesses is:
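The lemma's expression is missing; the following is a reconstruction consistent with the derivation sketched on the next slides and with the N → ∞ limit stated for Theorem 6.1 (an inferred sketch, not a quotation):

```latex
E\bigl[\text{misses at a block of region } i \text{ in } N \text{ accesses}\bigr]
  \;=\; (1-\eta_i)\,\frac{\lambda_i N}{r_i}
        \;+\; \eta_i\left(1 - \Bigl(1 - \tfrac{\lambda_i}{r_i}\Bigr)^{N}\right).
```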

Random Access for a Finite Period (2)
x is a particular block in region i
ρ_ik is the probability that the k-th access is a miss at block x
q_ik is the probability that the k-th access was a hit to x, given that it was an access to x (i.e., q_ik is the hit ratio of x at access k)
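The slide's formulas relating these quantities are not in the transcript. Under the independent reference assumption, one natural reconstruction (a sketch) is that an access to x at step k is a hit iff some earlier access touched x and the most recent such access was made by the same process, which gives:

```latex
q_{ik} = \eta_i\left(1 - \Bigl(1 - \tfrac{\lambda_i}{r_i}\Bigr)^{k-1}\right),
\qquad
\rho_{ik} = \frac{\lambda_i}{r_i}\,(1 - q_{ik}),
\qquad
\sum_{k=1}^{N} \rho_{ik}
  = (1-\eta_i)\,\frac{\lambda_i N}{r_i}
    + \eta_i\left(1-\Bigl(1-\tfrac{\lambda_i}{r_i}\Bigr)^{N}\right),
```

which recovers the reconstructed Lemma 6.1 above.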

Random Access for a Finite Period (3)
From Lemma 6.1 (which we just proved), we can find the expected number of misses over all N accesses
[T-6.1] In a system of random accesses, the expected number of misses per access in N accesses is:
As N goes to infinity, the expected number of misses per access goes to 1−η, the expected miss rate from Proposition 6.1
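The theorem's formula is missing; summing the reconstructed Lemma 6.1 over the r_i blocks of every region and dividing by N gives the following sketch, whose N → ∞ limit is the 1−η quoted on the slide:

```latex
\frac{\text{misses}}{\text{access}}
  \;=\; 1 - \eta
        \;+\; \frac{1}{N}\sum_{i \in R} r_i\,\eta_i
              \left(1 - \Bigl(1 - \tfrac{\lambda_i}{r_i}\Bigr)^{N}\right).
```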

Random Access for a Finite Period (4)
In the simplest case, there is only one process and one region
In the collective analysis model, an access to a block in a direct-mapped cache by process j will be a hit if no other process has accessed the block since the last access by process j
When there is only one process, an access to a block is always a hit (once the block has first been loaded), so η = 1
As a consequence, the expected number of misses per access simplifies to:
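The simplified formula is missing; substituting η = η_1 = 1 and λ_1 = 1 into the reconstructed Theorem 6.1 (with r_1 the size of the single region) leaves only the cold misses:

```latex
\frac{\text{misses}}{\text{access}}
  \;=\; \frac{r_1}{N}\left(1 - \Bigl(1 - \tfrac{1}{r_1}\Bigr)^{N}\right).
```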

Interaction of a Scan Traversal with a System of Random Accesses
Suppose we have a system of accesses that consists of a scan traversal with block access rate K to some segment of memory, interleaved with a system of random accesses to another segment of memory that makes L accesses per traversal access
The pattern of access is described by the regular expression (t_1 r^L t_2 r^L … t_K r^L)^*, where a sequence t_1 t_2 … t_K indicates K accesses to the same block and r represents a random access
We assume that the system of random accesses has regions R and processes P, and that the probability that process j accesses region i is λ_ij
As before, region i has r_i blocks

Scan Traversal with Access Rate 1
In this case K = 1, and we are analyzing the access pattern described by the regular expression (t r^L)^*, where t indicates a traversal access and r indicates a random access
N is the total number of accesses, and we assume that (1+L)C divides N
A traversal access is always a miss, because K = 1 and the traversal accesses and random accesses are to different memory segments
The number of traversal misses is N/(1+L)

Scan Traversal with Access Rate 1 (2)
Consider a block x in region i
Every C traversal accesses, the traversal captures block x (i.e., the traversal accesses a memory block that maps to x)
During the next C−1 traversal accesses, a random access might be made to the block that was evicted from x by the traversal
By Lemma 6.1 (with N = LC), the expected number of misses per block of region i in the random accesses during C traversal accesses is:
The expected number of misses, from both traversal and random accesses, during C traversal accesses is:
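Both expressions are missing; instantiating the reconstructed Lemma 6.1 with the N = LC random accesses made per period, and adding the C traversal misses incurred in the same period, gives the following sketch:

```latex
E\bigl[\text{random misses per block of region } i\bigr]
   = (1-\eta_i)\,\frac{\lambda_i L C}{r_i}
     + \eta_i\left(1-\Bigl(1-\tfrac{\lambda_i}{r_i}\Bigr)^{LC}\right),
```

```latex
E\bigl[\text{misses per period of } C \text{ traversal accesses}\bigr]
   = C + \sum_{i \in R} r_i \left[(1-\eta_i)\,\frac{\lambda_i L C}{r_i}
     + \eta_i\left(1-\Bigl(1-\tfrac{\lambda_i}{r_i}\Bigr)^{LC}\right)\right].
```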

Scan Traversal with Access Rate 1 (3)
[T-7.1] In a system consisting of a scan traversal with access rate 1 and a system of random accesses with L accesses per traversal access, the expected number of misses per access is:
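The theorem's formula is missing; dividing the per-period miss count from the previous slide by the C(1+L) accesses in a period gives this reconstruction (a sketch, but it does reproduce the ≈0.91 figure quoted two slides later):

```latex
\frac{\text{misses}}{\text{access}}
  = \frac{1}{1+L}\left[\,1
    + \sum_{i \in R}\left(
        (1-\eta_i)\,\lambda_i L
        + \frac{r_i}{C}\,\eta_i\left(1-\Bigl(1-\tfrac{\lambda_i}{r_i}\Bigr)^{LC}\right)
      \right)\right].
```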

Scan Traversal with Access Rate 1 (4)

Scan Traversal with Access Rate 1 (5)
Assume there is one region of size C and two processes, each equally likely to access a given block
r_1 = C, λ_1 = 1, and η = η_1 = ½
For large C, the previous formula (Theorem 7.1) evaluates to approximately:
For L = 1 (giving the access pattern (tr)^*), this formula evaluates to approximately 0.91 misses per access
As L grows, the number of misses per access approaches 0.5, which is what one would expect from the system of random accesses without any interaction with a traversal
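The approximate formula is missing; with one region of size C and the values above, the reconstructed Theorem 7.1 gives, for large C (using (1 − 1/C)^{LC} ≈ e^{−L}):

```latex
\frac{\text{misses}}{\text{access}}
  \;\approx\; \frac{1 + \tfrac{L}{2} + \tfrac{1}{2}\bigl(1 - e^{-L}\bigr)}{1+L},
\qquad
L = 1:\quad \frac{1.5 + 0.316}{2} \approx 0.91 .
```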

Any Questions?