Computer Architecture Project #2 Cache Simulator
Objectives
- To understand cache memory
  - Organization: set associativity
  - Operation: cache read & write, hit & miss, LRU replacement policy
  - Performance: hit/miss ratio, miss penalty
- To develop your own cache simulator
  - Inputs: memory access pattern, cache organization, display option
  - Outputs: hit/miss, performance
General Cache Organization (S, E, B)
- S = 2^s sets
- E = 2^e lines per set
- B = 2^b bytes per cache block (the data)
- Each line holds a valid bit (v), a tag, and a block of B data bytes (bytes 0 to B-1)
- Cache size: C = S x E x B data bytes
- If E = 1 (e = 0): "Direct Mapped Cache"
- Else if S = 1 (s = 0): "Fully Associative Cache"
- Else: "E-Way Set Associative Cache"
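The (S, E, B) organization above implies that every address splits into a tag, a set index of s bits, and a block offset of b bits. The following is a minimal sketch of that split, not part of the original slides; the type `addr_fields` and the function `split_address` are hypothetical names chosen for illustration, and the code assumes 64-bit addresses with s + b < 64.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an address for a cache
 * with 2^s sets and 2^b bytes per block. */
typedef struct {
    uint64_t tag;     /* remaining high-order bits           */
    uint64_t set;     /* s bits selecting the set            */
    uint64_t offset;  /* b low-order bits within the block   */
} addr_fields;

/* Split addr into tag / set index / block offset.
 * Assumption: s + b < 64, so none of the shifts below overflow. */
addr_fields split_address(uint64_t addr, unsigned s, unsigned b)
{
    addr_fields f;
    f.offset = addr & ((UINT64_C(1) << b) - 1);
    f.set    = (addr >> b) & ((UINT64_C(1) << s) - 1);
    f.tag    = addr >> (s + b);
    return f;
}
```

For example, with s = 4 and b = 4, address 0x110 yields offset 0x0, set 0x1, and tag 0x1. With s = 0 (a fully associative cache) the set index is always 0.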
E-way Set Associative Cache (Here: E = 2)
- E = 2: two lines per set
- Assume that the cache block size is 8 bytes
- Address of short int: [ tag: t bits | set index: 0…01 | block offset: 100 ]
- Step 1 (find set): the set index bits select one set; each of its two lines holds a valid bit (v), a tag, and data bytes 0 to 7
E-way Set Associative Cache (Here: E = 2), continued
- Step 2 (compare both lines): a line that is valid and whose tag matches the address tag is a hit
- The block offset then locates the requested data inside the matching block
E-way Set Associative Cache (Here: E = 2), continued
- Step 3 (block offset): offset 100 (binary) = 4, so the short int (2 bytes) starts at byte 4 of the block
- No match: one line in the set is selected for eviction and replacement
- Replacement policies: random, least recently used (LRU), …
LRU Replacement Policy
- Theoretically…
- Practically…
[Figure: table with columns "Address" (1, 2, 3, 4, 5) and "Set"]
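One common way to realize LRU in a simulator is to stamp each line with the value of an access counter and, on a miss with no free line, replace the line with the smallest stamp. The sketch below illustrates that idea under stated assumptions; it is not taken from the slides. The names `line_t`, `access_set`, and the `HIT` / `MISS` / `MISS_EVICT` codes are hypothetical, and the caller is assumed to pass a strictly increasing value of `now` on every access.

```c
#include <stdint.h>

/* Hypothetical line layout: only the metadata a simulator needs. */
typedef struct {
    int      valid;
    uint64_t tag;
    uint64_t last_used;   /* value of the access counter at the last use */
} line_t;

enum { HIT = 0, MISS = 1, MISS_EVICT = 2 };

/* Access one set of E lines with the given tag.
 * `now` must increase on every call so that stamps order the accesses. */
int access_set(line_t *set, int E, uint64_t tag, uint64_t now)
{
    int victim = 0;
    int result;

    /* Hit: refresh the stamp of the matching line. */
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now;
            return HIT;
        }
    }

    /* Miss: take the first invalid line if there is one,
     * otherwise the valid line with the oldest stamp (the LRU line). */
    for (int i = 0; i < E; i++) {
        if (!set[i].valid) { victim = i; break; }
        if (set[i].last_used < set[victim].last_used) victim = i;
    }

    result = set[victim].valid ? MISS_EVICT : MISS;
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = now;
    return result;
}
```

A per-line stamp costs one comparison per line on a miss, which is acceptable for the small associativities used in this project; an ordered list per set is an alternative design.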
Performance
- (Average Access Time) = (Hit Time) + (Miss Rate) × (Miss Penalty)
                        = (Hit Time) + [1 - (Hit Rate)] × (Miss Penalty)
- Example: suppose the cache hit time is 1 cycle, the miss penalty is 100 cycles, and the hit rate is 97%. Then the average access time is:
  1 cycle + (1 - 0.97) × 100 cycles = 1 + 0.03 × 100 = 4 cycles
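The formula above translates directly into a small function that works from hit and miss counts. This is a sketch for illustration; the function name `average_access_time` is hypothetical, and returning the hit time when there are no accesses at all is an assumption made here, not something the slides specify.

```c
/* Average access time = hit_time + miss_rate * miss_penalty,
 * where miss_rate = misses / (hits + misses).
 * Assumption: with zero accesses the function returns hit_time. */
double average_access_time(double hit_time, double miss_penalty,
                           unsigned long hits, unsigned long misses)
{
    unsigned long total = hits + misses;
    double miss_rate;

    if (total == 0)
        return hit_time;

    miss_rate = (double)misses / (double)total;
    return hit_time + miss_rate * miss_penalty;
}
```

Calling `average_access_time(1.0, 100.0, 97, 3)` reproduces the example: 97 hits and 3 misses give a 3% miss rate and an average of 4 cycles.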
Requirements of the cache simulator
- The cache simulator (hereinafter referred to as CSIM) shall support arbitrary numbers of sets and lines, and an arbitrary block size.
  - You should implement a way to provide the numbers of sets and lines and the block size as inputs to CSIM.
- CSIM shall read a trace file line by line and process it.
  - You should determine whether each memory operation is a cache hit or miss.
  - You should implement the LRU replacement policy.
- CSIM shall report the result of the cache simulation.
  - You should report these three basic results: the numbers of hits, misses, and evictions.
  - You should be able to report the average access time of the cache simulation.
  - You should be able to report whether each memory access in the trace file results in a cache hit or miss.
Restrictions & Advice
- Implement a method for input parameters.
  - You should implement it with command-line arguments. (full credit)
  - If you can't, you can use standard input such as scanf(). (reduced credit)
- Evaluate only data cache performance. Therefore, you should ignore instruction loads.
- You should assume that memory accesses are properly aligned. Therefore, you can ignore the requested size in the trace file.
- You should evaluate your CSIM with at least 3 different trace files. You can use the one provided with this project.
- Calculate the average access time using the following assumption: hit time = 1 cycle, miss penalty = 100 cycles.
- Compile your CSIM without warnings.
How to trace memory accesses
- "valgrind": GPL-licensed programming tool for memory debugging, memory leak detection, and profiling (from http://en.wikipedia.org/wiki/Valgrind)
- Usage:
  >> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
- Valgrind prints the memory accesses of "ls -l" on stdout, so you need to capture them with:
  >> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l > ls.trace
- Output format: [space]operation address,size

  Output          Type              Example             Naccess  [space]
  I 0400d7d4,8    Instruction load  All instructions    1        X
  L 04f6b868,8    Data Load         movl (%eax), %ebx   1        O
  S 7ff0005c8,8   Data Store        movl %eax, (%ebx)   1        O
  M 0421c7f0,4    Data Modify       incl (%ecx)         2        O

  Naccess: number of memory accesses (a modify is a load followed by a store)
  [space]: O = the line starts with a space, X = it does not
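Given the output format above, a trace line can be parsed with `sscanf()`. The following sketch assumes that data accesses start with a space and that everything else (instruction loads and any other valgrind output) should be skipped; the function name `parse_trace_line` is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Parse one trace line of the form " L 04f6b868,8".
 * Returns 1 and fills op/addr/size for a data access (L, S, or M).
 * Returns 0 for instruction loads and for any line that does not
 * match the data access format, so the caller can skip it. */
int parse_trace_line(const char *line, char *op, uint64_t *addr, unsigned *size)
{
    /* Data accesses start with a space; instruction loads start with 'I'. */
    if (line[0] != ' ')
        return 0;

    /* The address is hexadecimal without a 0x prefix, the size is decimal. */
    if (sscanf(line, " %c %" SCNx64 ",%u", op, addr, size) != 3)
        return 0;

    return *op == 'L' || *op == 'S' || *op == 'M';
}
```

In a simulator this would sit inside a loop that reads the trace with `fgets()`; because accesses are assumed to be aligned, the parsed size can then be ignored.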
Reference Cache Simulator
- Usage: >>./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
  -v: Optional verbose flag that displays trace info
  -s <s>: Number of set index bits (S = 2^s is the number of sets)
  -E <E>: Associativity (number of lines per set)
  -b <b>: Number of block bits (B = 2^b is the block size)
  -t <trace file>: Name of the valgrind trace to replay
- Cache size: C = S x E x B data bytes
Cache Simulation Example (1)
- Usage: >>./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
- Example: >>./csim -v -s 4 -E 1 -b 4 -t ./traces/yi.trace
  - Number of set index bits = 4 (16 sets)
  - Associativity = 1 (Direct Mapped Cache)
  - Number of block bits = 4 (16 bytes per cache block)
- Output:
  L 10,1 miss
  M 20,1 miss hit
  ….
  hits:4 misses:5 evictions:3
Cache Simulation Example (2) to (9)
Step-by-step replay of the example memory access pattern with s = 4, E = 1, b = 4. Each address splits into tag | set index (4 bits) | block offset (4 bits). The Hit, Miss, and Evict columns are running totals.

  Step  Oper.   Address  Byte  Set  Tag  Result                Hit  Miss  Evict
  1     Load    0x10     1     0x1  0x0  miss                  0    1     0
  2     Modify  0x20     1     0x2  0x0  miss, hit             1    2     0
  3     Load    0x22     1     0x2  0x0  hit                   2    2     0
  4     Store   0x18     1     0x1  0x0  hit                   3    2     0
  5     Load    0x110    1     0x1  0x1  miss, eviction        3    3     1
  6     Load    0x210    1     0x1  0x2  miss, eviction        3    4     2
  7     Modify  0x12     1     0x1  0x0  miss, eviction, hit   4    5     3

Total: 9 accesses (4 hits + 5 misses), 3 evictions
Average Access Time = 1 + (5 / 9) × 100 ≈ 56.6 cycles
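The worked example can be checked with a compact replay loop. The sketch below is an illustration under stated assumptions, not the reference simulator: the names `access_t`, `stats_t`, `cline_t`, and `simulate` are hypothetical, accesses are passed in an array rather than read from a trace file, LRU is tracked with a per-line access stamp, and s + b is assumed to be less than 64.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { char op; uint64_t addr; } access_t;               /* one trace entry  */
typedef struct { unsigned long hits, misses, evictions; } stats_t; /* result counters  */
typedef struct { int valid; uint64_t tag, last_used; } cline_t;    /* line metadata    */

/* Replay n accesses on a cache with 2^s sets, E lines per set, and
 * 2^b-byte blocks, using LRU replacement.
 * Returns 0 on success, -1 if the cache cannot be allocated. */
int simulate(const access_t *trace, size_t n,
             unsigned s, unsigned E, unsigned b, stats_t *out)
{
    size_t S = (size_t)1 << s;
    cline_t *cache = calloc(S * E, sizeof *cache);  /* all lines start invalid */
    uint64_t now = 0;

    if (cache == NULL)
        return -1;
    out->hits = out->misses = out->evictions = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t set_idx = (trace[i].addr >> b) & (S - 1);
        uint64_t tag = trace[i].addr >> (s + b);
        cline_t *set = cache + set_idx * E;
        cline_t *slot = &set[0];   /* hit line, or the line to fill on a miss */
        int hit = 0;

        now++;
        for (unsigned j = 0; j < E; j++) {
            if (set[j].valid && set[j].tag == tag) {
                hit = 1;
                slot = &set[j];
                break;
            }
            /* Replacement candidate: an invalid line if any, else the LRU line. */
            if (slot->valid && (!set[j].valid || set[j].last_used < slot->last_used))
                slot = &set[j];
        }

        if (hit) {
            out->hits++;
        } else {
            out->misses++;
            if (slot->valid)
                out->evictions++;
            slot->valid = 1;
            slot->tag = tag;
        }
        slot->last_used = now;

        if (trace[i].op == 'M')    /* the store half of a modify always hits */
            out->hits++;
    }

    free(cache);
    return 0;
}
```

Replaying the seven accesses of the example (L 0x10, M 0x20, L 0x22, S 0x18, L 0x110, L 0x210, M 0x12) with s = 4, E = 1, b = 4 gives 4 hits, 5 misses, and 3 evictions, matching the totals above. Allocating the cache with `calloc()` keeps the numbers of sets and lines arbitrary.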
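Report Guidelines (1)
Include the following:
- Design requirements
  - Restate the given CSIM design requirements in terms of your own CSIM.
- Implementation
  - Describe how your CSIM works and how it reflects the design requirements.
  - Describe how to use your CSIM and how it outputs the simulation results.
- Testing
  - Describe how you verified the CSIM requirements.
  - Perform the verification with at least 3 sets of trace data.
  - In addition, describe how you obtained the trace data.
- Attach screenshots that show what your CSIM implementation does.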
Report Guidelines (2)
Include the following:
- Performance evaluation
  - Measure the performance of each cache organization (direct mapped, E-way set associative, and fully associative) and compare the results.
Grading Criteria

  Title            Pts.  Description             Pts.  Details
  CSIM             70    Submission              10    Warning: -0.5 pt. each / Error: -1 pt. each
                         Parameter Input         10    Argument passing: 10 pts.; other methods: 5 pts.
                         Cache Organization      5     Dynamic allocation: 5 pts.; arrays: 2 pts.
                         Cache Operation         20    Correct hit/miss handling: 10 pts.
                                                       Replacement policy (LRU): 5 pts.; random replacement instead: 3 pts.
                                                       Displaying the result (hit/miss) of each memory access: 4 pts.
                                                       Option to choose whether the results are displayed: 1 pt.
                         Performance Evaluation        Providing an accurate average access time
                         Comments                      Comment as many individual lines as possible
  Report           30    Design Requirements     7
                         Implementation
                         Testing                 8
  Late Submission        -5 per day                    Accepted up to one week after the deadline
Submission
- Submit the deliverables listed below by e-mail.
- Deadline: Dec. 18, 2013 (Wed), 23:59
- E-mail address: yonghunlee@archi.snu.ac.kr
- E-mail subject: "[CSIM]StudentID_Name"
- Compress the deliverables into "StudentID_Name.zip" or "StudentID_Name.tar" before submitting.
- Deliverables:
  - CSIM source code
  - Project report
  - Trace files used to verify CSIM