Learning Cache Models by Measurements Jan Reineke joint work with Andreas Abel Uppsala University December 20, 2012.

Slides:



Advertisements
Similar presentations
You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
Advertisements

Advanced Piloting Cruise Plot.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 1 Embedded Computing.
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 3 CPUs.
By D. Fisher Geometric Transformations. Reflection, Rotation, or Translation 1.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
ALGEBRA Number Walls
Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
My Alphabet Book abcdefghijklm nopqrstuvwxyz.
Multiplying binomials You will have 20 seconds to answer each of the following multiplication problems. If you get hung up, go to the next problem when.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Addition Facts
Year 6 mental test 5 second questions
Year 6 mental test 10 second questions
Chapter 4 Memory Management 4.1 Basic memory management 4.2 Swapping
Re-examining Instruction Reuse in Pre-execution Approaches By Sonya R. Wolff Prof. Ronald D. Barnes June 5, 2011.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
SE-292 High Performance Computing
DOROTHY Design Of customeR dRiven shOes and multi-siTe factorY Product and Production Configuration Method (PPCM) ICE 2009 IMS Workshops Dorothy Parallel.
Chapter 4 Memory Management Basic memory management Swapping
ABC Technology Project
CS 241 Spring 2007 System Programming 1 Memory Replacement Policies Lecture 32 Klara Nahrstedt.
Page Replacement Algorithms
Online Algorithm Huaping Wang Apr.21
Cache and Virtual Memory Replacement Algorithms
Chapter 3.3 : OS Policies for Virtual Memory
Chapter 3 Memory Management
Virtual Memory II Chapter 8.
Memory Management.
A Survey of Web Cache Replacement Strategies Stefan Podlipnig, Laszlo Boszormenyl University Klagenfurt ACM Computing Surveys, December 2003 Presenter:
Virtual Memory 1 Computer Organization II © McQuain Virtual Memory Use main memory as a cache for secondary (disk) storage – Managed jointly.
Bypass and Insertion Algorithms for Exclusive Last-level Caches
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
VOORBLAD.
Name Convolutional codes Tomashevich Victor. Name- 2 - Introduction Convolutional codes map information to code bits sequentially by convolving a sequence.
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
BIOLOGY AUGUST 2013 OPENING ASSIGNMENTS. AUGUST 7, 2013  Question goes here!
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
Virtual Memory In this lecture, slides from lecture 16 from the course Computer Architecture ECE 201 by Professor Mike Schulte are used with permission.
© 2012 National Heart Foundation of Australia. Slide 2.
Lets play bingo!!. Calculate: MEAN Calculate: MEDIAN
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Chapter 5 Test Review Sections 5-1 through 5-4.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
Addition 1’s to 20.
25 seconds left…...
Januar MDMDFSSMDMDFSSS
Week 1.
SE-292 High Performance Computing
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan
1 Unit 1 Kinematics Chapter 1 Day
PSSA Preparation.
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Oct. 23, 2002 Topic: Memory Hierarchy Design (HP3 Ch. 5) (Caches, Main Memory and.
CpSc 3220 Designing a Database
From Model-based to Model-driven Design of User Interfaces.
The University of Adelaide, School of Computer Science
Presentation transcript:

Learning Cache Models by Measurements Jan Reineke joint work with Andreas Abel Uppsala University December 20, 2012

Jan Reineke, Saarland 2 The Timing Analysis Problem Embedded Software Timing Requirements ? Microarchitecture +

Jan Reineke, Saarland 3 What does the execution time of a program depend on? Input-dependent control flow Microarchitectural State + Pipeline, Memory Hierarchy, Interconnect

Jan Reineke, Saarland 4 Example of Influence of Microarchitectural State Motorola PowerPC 755

Jan Reineke, Saarland 5 Typical Structure of Timing Analysis Tools ISA-specific analysis parts Microarchitecture- specific part

Jan Reineke, Saarland 6 Construction of Timing Models Needs to accurately model all aspects of the microarchitecture that influence execution time.

Jan Reineke, Saarland 7 Construction of Timing Models Today

Jan Reineke, Saarland 8 Construction of Timing Models Tomorrow

Jan Reineke, Saarland 9 Cache Models Cache Model is important part of Timing Model Caches have simple interface: loads+stores Can be characterized by a few parameters: ABC: associativity, block size, capacity Replacement policy

Jan Reineke, Saarland 10 High-level Approach Generate memory access sequences Determine number of cache misses on these sequences using performance counters, or by measuring execution times. Infer property from measurement results.

Jan Reineke, Saarland 11 Warm-up I: Inferring the Cache Capacity Notation: Preparatory accesses Accesses to measure Basic idea: Access sets of memory blocks A once. Then access A again and count cache misses. If A fits into the cache, no misses will occur. Measure with.

Jan Reineke, Saarland 12 Example: Intel Core 2 Duo E6750, L1 Data Cache Capacity = 32 KB Way Size = 4 KB |Misses| |Size|

Jan Reineke, Saarland 13 Warm-up II: Inferring the Block Size Given: way size W and associativity A Wanted: block size B

Jan Reineke, Saarland 14 Inferring the Replacement Policy There are infinitely many conceivable replacement policies… For any set of observations, multiple policies remain possible. Need to make some assumption on possible policies to render inference possible.

Jan Reineke, Saarland 15 A Class of Replacement Policies: Permutation Policies Permutation Policies: Maintain total order on blocks in each cache set Evict the greatest block in the order Update the order based on the position of the accessed block in the order Can be specified by A+1 permutations Associativity-many hit permutations one miss permutation Examples: LRU, FIFO, PLRU, …

Jan Reineke, Saarland 16 Permutation Policy LRU LRU (least recently used): order on blocks based on recency of last access a b c d c a b d c e c a b e most-recently-used least-recently-used hit permutation 3miss permutation

Jan Reineke, Saarland 17 Permutation Policy FIFO FIFO (first-in first-out): order on blocks based on order of cache misses a b c d a b c d c e a b c e last-in first-in hit permutation 3miss permutation

Jan Reineke, Saarland 18 Inferring Permutation Policies Strategy to infer permutation i: 1) Establish a known cache state s 2) Trigger permutation i 3) Read out the resulting cache state s. Deduce permutation from s and s.

Jan Reineke, Saarland 19 1) Establishing a known cache state Assume miss permutation of FIFO, LRU, PLRU: ? ? ? ? d ? ? ? c d ? ? b c d ? a b c d dcba

Jan Reineke, Saarland 20 2) Triggering permutation i Simply access i th block in cache state. E,g, for i = 3, access c: a b c d ? ? ? ? ? c

Jan Reineke, Saarland 21 3) Reading out a cache state Exploit miss permutation to determine position of each of the original blocks: If position is j then A-j+1 misses will evict the block, but A-j misses will not. ? ? ? c ? ? c ? ? ? ? ? After 1 miss, c is still cached. After 2 misses, c is not cached

Jan Reineke, Saarland 22 Inferring Permutation Policies: Example of LRU, Permutation 3 a b c d c a b d c most-recently-used least-recently-used 1) Establish known state 2) Trigger permutation 3 3) Read out resulting state

Jan Reineke, Saarland 23 Implementation Challenges Interference Prefetching Instruction Caches Virtual Memory L2+L3 Caches: Strictly-inclusive Non-inclusive Exclusive Shared Caches: Coherence

Jan Reineke, Saarland 24 Experimental Results Undocumente d variant of LRU Nehalem introduced L0 micro-op buffer Exclusive hierarchy Surprising measurements ;-)

Jan Reineke, Saarland 25 L2 Caches on some Core 2 Duos |misses| n Number of cache sets, 4096 associativities Core 2 Duo E6300, 2 MB, 8 ways Core 2 Duo E6750, 4 MB, 16 ways Core 2 Duo E8400, 6 MB, 24 ways

Jan Reineke, Saarland 26 Replacement Policy of the Intel Atom D525 Discovered to our knowledge undocumented hierarchical policy: a b c d e f d c a b e f d x e d c a b x ATOM = LRU(3, LRU(2)) PLRU(2k) = LRU(2, PLRU(k))

Jan Reineke, Saarland 27 Conclusions and Future Work Inference of cache models I More general class of replacement policies, e.g. by inferring canonical register automata. Shared caches in multicores, coherency protocols, etc. Deal with other architectural features: translation lookaside buffers, branch predictors, prefetchers Infer abstractions instead of concrete models

Jan Reineke, Saarland 28 Questions? Measurement-based Modeling of the Cache Replacement Policy A. Abel and J. Reineke RTAS, 2013 (to appear).