Computer Architecture Project #2 Cache Simulator

Objectives
- To understand cache memory
  - Organization: set associativity
  - Operation: cache read & write, hit & miss, LRU replacement policy
  - Performance: hit/miss ratio, miss penalty
- To develop your own cache simulator
  - Inputs: memory access pattern, cache organization, display option
  - Outputs: hit/miss, performance

General Cache Organization (S, E, B)
- $S = 2^s$ sets
- $E = 2^e$ lines per set
- $B = 2^b$ bytes per cache block (the data)
- Each line holds a valid bit (v), a tag, and data bytes 0, 1, 2, …, B-1.
- Cache size: $C = S \times E \times B$ data bytes
- If $E = 1$ ($e = 0$): "Direct Mapped Cache"; else if $S = 1$ ($s = 0$): "Fully Associative Cache"; otherwise: "E-Way Set Associative Cache"
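
A minimal C sketch of the (S, E, B) organization, assuming the simulator only needs valid bits and tags (not the data bytes themselves) and allocates the $S \times E$ lines dynamically. The names `cache_line_t`, `cache_t`, `cache_create`, `cache_set`, `cache_data_bytes`, and `cache_free`, and the `last_used` field reserved for LRU bookkeeping, are hypothetical choices for illustration.

```c
#include <stdlib.h>

/* One cache line: valid bit, tag, and an LRU timestamp (an assumption).
   A hit/miss simulator does not need to store the B data bytes. */
typedef struct {
    int valid;
    unsigned long long tag;
    unsigned long long last_used;
} cache_line_t;

/* S = 2^s sets, E lines per set, B = 2^b bytes per block. */
typedef struct {
    int s, E, b;
    unsigned long long S, B;
    cache_line_t *lines;   /* S * E lines stored set by set */
} cache_t;

/* Allocate an empty cache (calloc clears every valid bit). NULL on failure. */
cache_t *cache_create(int s, int E, int b) {
    if (s < 0 || E < 1 || b < 0 || s + b >= 64)
        return NULL;
    cache_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->s = s;
    c->E = E;
    c->b = b;
    c->S = 1ULL << s;
    c->B = 1ULL << b;
    c->lines = calloc(c->S * (size_t)E, sizeof *c->lines);
    if (c->lines == NULL) {
        free(c);
        return NULL;
    }
    return c;
}

/* Pointer to the first of the E lines that make up set `set`. */
cache_line_t *cache_set(cache_t *c, unsigned long long set) {
    return &c->lines[set * (unsigned long long)c->E];
}

/* Data capacity C = S x E x B bytes (valid bits and tags not counted). */
unsigned long long cache_data_bytes(const cache_t *c) {
    return c->S * (unsigned long long)c->E * c->B;
}

void cache_free(cache_t *c) {
    if (c != NULL) {
        free(c->lines);
        free(c);
    }
}
```

For example, `cache_create(4, 1, 4)` describes a direct-mapped cache with 16 sets of 16-byte blocks (256 data bytes), and `cache_create(0, 8, 3)` a fully associative cache with a single set of eight 8-byte lines.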

E-way Set Associative Cache (Here: E = 2)
E = 2: two lines per set. Assume that the cache block size is 8 bytes.
Address of a short int: [ t tag bits | set index 0…01 | block offset 100 ]
- Find the set: the set index bits select one set; each of its two lines holds a valid bit (v), a tag, and data bytes 0 to 7.
- Compare both lines: valid? + tag match: yes = hit. The block offset then locates the data inside the block: the short int (2 bytes) starts at byte 4 (offset 100).
- No match: one line in the set is selected for eviction and replacement. Replacement policies: random, least recently used (LRU), …
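
Selecting the set and comparing tags amounts to splitting the address into three bit fields. A small C sketch, assuming 64-bit addresses and $s + b < 64$; `addr_fields_t` and `split_address` are hypothetical names.

```c
#include <stdint.h>

/* | tag (t bits) | set index (s bits) | block offset (b bits) | */
typedef struct {
    uint64_t tag;
    uint64_t set;
    uint64_t offset;
} addr_fields_t;

/* Assumes s >= 0, b >= 0 and s + b < 64. */
addr_fields_t split_address(uint64_t addr, int s, int b) {
    addr_fields_t f;
    f.offset = addr & ((UINT64_C(1) << b) - 1);          /* low b bits        */
    f.set    = (addr >> b) & ((UINT64_C(1) << s) - 1);   /* next s bits       */
    f.tag    = addr >> (s + b);                          /* remaining t bits  */
    return f;
}
```

With b = 3 (8-byte blocks) and s = 2, the address 0b01100 splits into set index 01 and block offset 100, i.e. set 1, byte 4, which is where the short int in the figure begins.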

LRU Replacement Policy
(Figure: LRU replacement, shown "Theoretically…" and "Practically…"; labels: Address 1 2 3 4 5, Set)
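
One common way to implement LRU in a software simulator is to stamp each line with the value of a global access counter and, on a miss in a full set, evict the line with the smallest stamp. This is exact LRU rather than a hardware-style approximation. The sketch below assumes that scheme; `line_t`, `access_set`, and `access_result_t` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    int valid;
    uint64_t tag;
    uint64_t last_used;   /* access counter value at the most recent use */
} line_t;

typedef enum { ACCESS_HIT, ACCESS_MISS, ACCESS_MISS_EVICT } access_result_t;

/* Look up `tag` in one set of E lines with LRU replacement.
   `now` must grow on every call (e.g. a global access counter). */
access_result_t access_set(line_t *set, int E, uint64_t tag, uint64_t now) {
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now;          /* hit: mark most recently used */
            return ACCESS_HIT;
        }
    }
    for (int i = 0; i < E; i++) {            /* miss: use an empty line first */
        if (!set[i].valid) {
            set[i].valid = 1;
            set[i].tag = tag;
            set[i].last_used = now;
            return ACCESS_MISS;
        }
    }
    int victim = 0;                          /* set full: evict the oldest stamp */
    for (int i = 1; i < E; i++)
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    set[victim].tag = tag;
    set[victim].last_used = now;
    return ACCESS_MISS_EVICT;
}
```

In a 2-way set, the tag sequence 1, 2, 1, 3 gives miss, miss, hit, and then a miss that evicts tag 2, because tag 1 was used more recently.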

Performance
$$\text{Average Access Time} = \text{Hit Time} + \text{Miss Rate} \times \text{Miss Penalty} = \text{Hit Time} + (1 - \text{Hit Rate}) \times \text{Miss Penalty}$$
Example: suppose the cache hit time is 1 cycle, the miss penalty is 100 cycles, and the hit rate is 97%. Then the average access time is:
$$1 + (1 - 0.97) \times 100 = 1 + 0.03 \times 100 = 4 \text{ cycles}$$
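
The formula translates directly into C. The second function derives the miss rate from simulator counters, assuming miss rate = misses / (hits + misses); both function names are hypothetical, and returning 0 for an empty trace is an arbitrary choice.

```c
/* Average access time = hit time + (1 - hit rate) x miss penalty. */
double average_access_time(double hit_time, double miss_penalty, double hit_rate) {
    return hit_time + (1.0 - hit_rate) * miss_penalty;
}

/* Same formula from counters: miss rate = misses / (hits + misses). */
double average_access_time_from_counts(double hit_time, double miss_penalty,
                                       unsigned long hits, unsigned long misses) {
    unsigned long total = hits + misses;
    if (total == 0)
        return 0.0;   /* no accesses: defined as 0 here (an assumption) */
    return hit_time + ((double)misses / (double)total) * miss_penalty;
}
```

`average_access_time(1, 100, 0.97)` reproduces the 4-cycle example above.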

Requirements of the cache simulator (1)
- The cache simulator (hereinafter referred to as CSIM) shall support an arbitrary number of sets, number of lines, and block size.
  - You should implement a way to provide the number of sets, the number of lines, and the block size as inputs to CSIM.
- CSIM shall read a trace file line by line and process it.
  - You should determine whether each memory operation is a cache hit or miss.
  - You should implement the LRU replacement policy.
- CSIM shall report the result of the cache simulation.
  - You should report these three basic results: the numbers of hits, misses, and evictions.
  - You should be able to report the average access time of the cache simulation.
  - You should be able to report whether each memory access in the trace file results in a cache hit or miss.
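
A sketch of the reporting side, assuming the three counters live in a small struct and the summary uses the hit time (1 cycle) and miss penalty (100 cycles) given in the restrictions. The struct, the function name, and the output format are hypothetical, not a prescribed format.

```c
#include <stdio.h>

/* The three basic results CSIM must report. */
typedef struct {
    unsigned long hits;
    unsigned long misses;
    unsigned long evictions;
} csim_stats_t;

/* Write a one-line summary into buf; returns snprintf's result.
   Assumes hit time = 1 cycle and miss penalty = 100 cycles. */
int format_summary(char *buf, size_t len, const csim_stats_t *st) {
    unsigned long total = st->hits + st->misses;
    double aat = total ? 1.0 + 100.0 * (double)st->misses / (double)total : 0.0;
    return snprintf(buf, len,
                    "hits:%lu misses:%lu evictions:%lu avg_access_time:%.2f cycles",
                    st->hits, st->misses, st->evictions, aat);
}
```

Keeping formatting separate from printing makes the summary easy to test; a CSIM would then simply `puts(buf)`.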

Restrictions & Advice
- Implement a method for input parameters.
  - You should implement it with command-line argument passing (full credit).
  - If you can't, you can use standard input such as scanf() (reduced credit).
- Evaluate only data cache performance. Therefore, you should ignore instruction loads.
- You should assume that memory accesses are properly aligned. Therefore, you can ignore the requested size in the trace file.
- You should evaluate your CSIM with at least 3 different trace files. You can use the one provided with this project.
- Calculate the average access time under the following assumptions: hit time = 1 cycle, miss penalty = 100 cycles.
- Your CSIM must compile without warnings.

How to trace memory accesses
"valgrind": a GPL-licensed programming tool for memory debugging, memory leak detection, and profiling (from http://en.wikipedia.org/wiki/Valgrind).
Usage:
>> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
Valgrind prints the memory accesses of "ls -l" on stdout, so you need to capture them with a redirect:
>> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l > ls.trace
Output format: [space]operation address,size

Output          Type              Example              Naccess  [space]
I 0400d7d4,8    Instruction load  All instructions     1        X
L 04f6b868,8    Data load         movl (%eax), %ebx    1        O
S 7ff0005c8,8   Data store        movl %eax, (%ebx)    1        O
M 0421c7f0,4    Data modify       incl (%ecx)          2        O

[space]: O = the line starts with a space, X = it does not. Naccess: number of memory accesses the line represents (a modify is a load followed by a store).
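
A minimal parser for one lackey line, assuming the format above. It skips instruction loads, ignores the size field (as the alignment assumption allows), and returns the Naccess value so a modify can be replayed as two accesses. Lines that do not match, such as Valgrind's own `==PID==`-prefixed messages that also reach stdout with `--log-fd=1`, count as zero data accesses. `parse_trace_line` is a hypothetical name.

```c
#include <stdio.h>

/* Parse one trace line such as " L 04f6b868,8".
   Returns the number of data-cache accesses it represents:
   0 for 'I' or unparseable lines, 1 for 'L' or 'S', 2 for 'M'.
   The size after the comma is skipped (%*d) and never stored. */
int parse_trace_line(const char *line, char *op, unsigned long long *addr) {
    char c;
    unsigned long long a;

    if (sscanf(line, " %c %llx,%*d", &c, &a) != 2)
        return 0;                 /* e.g. "==1234== ..." Valgrind messages */
    *op = c;
    *addr = a;
    switch (c) {
    case 'L':
    case 'S':
        return 1;
    case 'M':
        return 2;                 /* load followed by store, same address */
    default:
        return 0;                 /* 'I': instruction load, ignored */
    }
}
```

A CSIM could call this on each line returned by `fgets()` and run the cache lookup as many times as the return value says.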

Reference Cache Simulator
Usage: >> ./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
- -v: Optional verbose flag that displays trace info
- -s <s>: Number of set index bits ($S = 2^s$ is the number of sets)
- -E <E>: Associativity (number of lines per set)
- -b <b>: Number of block bits ($B = 2^b$ is the block size)
- -t <trace file>: Name of the valgrind trace to replay
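
Argument passing (the full-credit option) can be handled with POSIX `getopt()`. A sketch, assuming a POSIX system; `csim_args_t` and `parse_args` are hypothetical names, and `atoi` does no error checking (a real CSIM might prefer `strtol`).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose getopt() under strict C modes */
#endif
#include <stdlib.h>
#include <unistd.h>               /* getopt(), optarg */

typedef struct {
    int verbose;
    int s, E, b;
    const char *trace_file;
} csim_args_t;

/* Parse "[-v] -s <s> -E <E> -b <b> -t <trace file>".
   Returns 0 on success, -1 on an unknown or missing option. */
int parse_args(int argc, char *argv[], csim_args_t *out) {
    int opt;

    out->verbose = 0;
    out->s = out->E = out->b = -1;
    out->trace_file = NULL;
    while ((opt = getopt(argc, argv, "vs:E:b:t:")) != -1) {
        switch (opt) {
        case 'v': out->verbose = 1; break;
        case 's': out->s = atoi(optarg); break;
        case 'E': out->E = atoi(optarg); break;
        case 'b': out->b = atoi(optarg); break;
        case 't': out->trace_file = optarg; break;
        default:  return -1;      /* getopt() already printed a message */
        }
    }
    if (out->s < 0 || out->E < 1 || out->b < 0 || out->trace_file == NULL)
        return -1;
    return 0;
}
```

In `main`, a return value of -1 would typically print the usage line above and exit with a nonzero status.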

Cache Simulation Example (1)
Usage: >> ./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
Example: >> ./csim -v -s 4 -E 1 -b 4 -t ./traces/yi.trace
- Number of set index bits = 4 (16 sets)
- Associativity = 1 (direct-mapped cache)
- Number of block bits = 4 (16 bytes per cache block)
Output:
L 10,1 miss
M 20,1 miss hit
….
hits: 4 misses:5 eviction: 3

Cache Simulation Example (2) to (9)
Example memory access pattern (s = 4, E = 1, b = 4):

Oper.    Address  Byte
Load     0x10     1
Modify   0x20     1
Load     0x22     1
Store    0x18     1
Load     0x110    1
Load     0x210    1
Modify   0x12     1

The cache has 16 sets (S = 0x0 to 0xF); each line holds a valid bit (V), a tag, and data bytes 0x0 to 0xF. The slides replay the pattern one access at a time and keep running Hit / Miss / Evict counters:

Slide  Access        Set  Tag   Result               Hit  Miss  Evict
(3)    Load 0x10     1    0x0   miss                 0    1     0
(4)    Modify 0x20   2    0x0   miss, hit            1    2     0
(5)    Load 0x22     2    0x0   hit                  2    2     0
(6)    Store 0x18    1    0x0   hit                  3    2     0
(7)    Load 0x110    1    0x1   miss, eviction       3    3     1
(8)    Load 0x210    1    0x2   miss, eviction       3    4     2
(9)    Modify 0x12   1    0x0   miss, eviction, hit  4    5     3

Each modify counts as two accesses, so there are 9 accesses in total:
Average Access Time $= 1 + (5/9) \times 100 \approx 56.6$ cycles
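
To check a CSIM against this example, the access pattern can be replayed in memory. The sketch below combines the address split and LRU replacement from the earlier slides into one self-contained function, using a global access counter as the LRU timestamp (an assumption). `simulate`, `sim_access`, `sim_line_t`, and `sim_counts_t` are hypothetical names; a real CSIM would read the operations from the trace file instead of arrays.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { int valid; uint64_t tag; uint64_t last_used; } sim_line_t;
typedef struct { unsigned long hits, misses, evictions; } sim_counts_t;

/* One data access against 2^s sets of E lines (stored set by set). */
static void sim_access(sim_line_t *lines, int s, int E, int b,
                       uint64_t addr, uint64_t now, sim_counts_t *cnt) {
    uint64_t set = (addr >> b) & ((UINT64_C(1) << s) - 1);  /* set index bits */
    uint64_t tag = addr >> (s + b);                          /* remaining bits */
    sim_line_t *ln = &lines[set * (uint64_t)E];              /* first line of set */
    int victim = -1;

    for (int i = 0; i < E; i++) {
        if (ln[i].valid && ln[i].tag == tag) {               /* hit */
            ln[i].last_used = now;
            cnt->hits++;
            return;
        }
    }
    cnt->misses++;
    for (int i = 0; i < E && victim < 0; i++)                /* empty line? */
        if (!ln[i].valid)
            victim = i;
    if (victim < 0) {                                        /* full: evict LRU */
        victim = 0;
        for (int i = 1; i < E; i++)
            if (ln[i].last_used < ln[victim].last_used)
                victim = i;
        cnt->evictions++;
    }
    ln[victim].valid = 1;
    ln[victim].tag = tag;
    ln[victim].last_used = now;
}

/* Replay n (op, address) pairs; 'M' is a load followed by a store. */
sim_counts_t simulate(const char *ops, const uint64_t *addrs, size_t n,
                      int s, int E, int b) {
    sim_counts_t cnt = {0, 0, 0};
    sim_line_t *lines = calloc((size_t)E << s, sizeof *lines);  /* all invalid */
    uint64_t now = 0;                                           /* LRU clock */

    if (lines == NULL)
        return cnt;
    for (size_t k = 0; k < n; k++) {
        /* 'I' and unknown ops: 0 accesses; 'L'/'S': 1; 'M': 2 */
        int accesses = ops[k] == 'M' ? 2 : (ops[k] == 'L' || ops[k] == 'S') ? 1 : 0;
        for (int j = 0; j < accesses; j++)
            sim_access(lines, s, E, b, addrs[k], ++now, &cnt);
    }
    free(lines);
    return cnt;
}
```

With the access pattern above, `simulate(ops, addrs, 7, 4, 1, 4)` reports 4 hits, 5 misses, and 3 evictions, matching Example (1). Rerunning the same pattern with s = 0 and E = 4 (fully associative, same total size as 4 lines) gives 5 hits, 4 misses, and no evictions, which is the kind of comparison the performance evaluation asks for.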

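Report Guidelines (1)
Include the following:
- Design requirements
  - Redefine the given CSIM design requirements to fit your own CSIM.
- Implementation
  - Describe how your CSIM works and how it reflects the design requirements.
  - Describe how to use your CSIM and how it outputs the simulation results.
- Testing
  - Describe how you verified the CSIM requirements.
  - Perform the verification with at least 3 sets of trace data.
  - Additionally, describe how you obtained the trace data.
  - Attach screenshots that show your CSIM implementation.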

Report Guidelines (2)
Include the following:
- Performance evaluation
  - Measure the performance of each cache organization (direct-mapped, E-way set associative, and fully associative) and compare them.

Grading Criteria

Title            Pts.  Description                 Details
CSIM             70    Submission: 10              Warning: -0.5 pt. each / Error: -1 pt. each
                       Parameter input             Argument passing: 10 pts.; other methods: 5 pts.
                       Cache organization: 5       Using dynamic allocation: 5 pts.; using arrays: 2 pts.
                       Cache operation: 20         Correct handling of hits/misses: 10 pts.
                                                   Replacement policy (LRU): 5 pts. (implementing random replacement: 3 pts.)
                                                   Displaying the result (hit/miss) of each memory access: 4 pts.
                                                   Providing an option to turn this display on or off: 1 pt.
                       Performance evaluation      Providing the correct average access time
                       Comments                    Comment as many lines as possible
Report           30    Design requirements: 7
                       Implementation
                       Testing: 8
Late submission        -5 pts. per day             Submissions accepted up to 1 week after the deadline

Submission
Submit the deliverables listed below by e-mail.
- Deadline: Dec. 18, 2013 (Wed), 23:59
- E-mail address: yonghunlee@archi.snu.ac.kr
- E-mail subject: "[CSIM]StudentID_Name"
- Compress the deliverables into "StudentID_Name.zip" or "StudentID_Name.tar".
Deliverables:
- CSIM source code
- Project report
- Trace files used to verify CSIM