Published by Herbert Welch. Modified over 6 years ago.
1
Optical Overlay NUCA: A High Speed Substrate for Shared L2 Caches
OONUCA
Eldhose Peter*, Anuj Arora**, Akriti Bagaria* and Dr. Smruti R Sarangi*
*IIT Delhi, **CISCO Bangalore
2
Motivation Overlay NUCA Architecture Results
3
Understand the problem - Cache
UCA
Cache performance ∝ 1/(hit latency)
Cache performance ∝ hit rate
Sets are static => not adaptable to the access pattern
NUCA
Improved cache utilization
[Figure: grid of L2 banks above the lower level memory]
Optical overlay NUCA: A high speed substrate for shared L2 caches 11/22/2018
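The two proportionalities on this slide are commonly combined into an average memory access time (AMAT). The sketch below is only an illustration of that standard formula; the function name and all numeric values are hypothetical and are not taken from the slides.

```python
def amat(hit_latency: float, hit_rate: float, miss_penalty: float) -> float:
    """Average access time in cycles: hit latency plus the expected miss cost."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_latency + (1.0 - hit_rate) * miss_penalty


# Hypothetical values, chosen only to show the trend.
baseline = amat(hit_latency=20, hit_rate=0.80, miss_penalty=250)
faster_hits = amat(hit_latency=10, hit_rate=0.80, miss_penalty=250)  # lower hit latency
more_hits = amat(hit_latency=20, hit_rate=0.90, miss_penalty=250)    # higher hit rate
```

Both a lower hit latency and a higher hit rate reduce the average access time, which is the trade-off a NUCA design tries to balance.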
4
Understand the problem – Optical Communication
Electrical: 50-60 cycles
Optical: 1-2 cycles
5
Optical Communication
Basic components
Reservation-assisted Single Write Multi Read (R-SWMR)
[Figure: source S and destinations D1, D2, D3]
Methods to Leverage Optical Networks for Multicore Processors 11/22/2018
6
Prior Approaches
No prior work on caches using an optical NOC
Electrical NOC: SNUCA, DNUCA, RNUCA
[Figure: L1 caches above the L2 banks and the lower level memory, annotated "Search" and "Migration near to the core"]
[Figure: address fields Tag, Set Index, Block Size and HomeBank, with example bit strings 1000, 100011 and 10100]
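The address-field figure on this slide suggests that a bit field of the block address selects the home bank. A minimal sketch of such a decode follows. The field order (tag, set index, home bank, block offset from high to low bits) and the 9-bit set index are assumptions; the 64 B block size and the 32 banks come from the configuration slide. `DecodedAddress` and `decode` are hypothetical names.

```python
from typing import NamedTuple

BLOCK_OFFSET_BITS = 6  # 64 B blocks (configuration slide)
BANK_BITS = 5          # 32 banks (configuration slide)
SET_INDEX_BITS = 9     # assumption: 512 sets per bank


class DecodedAddress(NamedTuple):
    tag: int
    set_index: int
    home_bank: int
    block_offset: int


def decode(addr: int) -> DecodedAddress:
    """Split an address into fields, assuming the layout tag | set | bank | offset."""
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    addr >>= BLOCK_OFFSET_BITS
    home_bank = addr & ((1 << BANK_BITS) - 1)
    addr >>= BANK_BITS
    set_index = addr & ((1 << SET_INDEX_BITS) - 1)
    tag = addr >> SET_INDEX_BITS
    return DecodedAddress(tag, set_index, home_bank, block_offset)


# Build an address from the example bit strings shown on the slide.
example = decode((0b1000 << 20) | (0b100011 << 11) | (0b10100 << 6))
```

Under this assumed layout, the bit string 10100 maps the block to home bank 20.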
7
Equidistant nodes
Banks are approximately equidistant in terms of delay
Dynamic creation of sets: improves the utilization of banks and improves the hit rate
[Figure: two banks, each X cycles from source S, labelled "I am near to S" and "I am also near to S"]
8
Phases of operation
Phase 1: SNUCA + Profiling
Phase 2: Reconfiguration
Phase 3: OONUCA
9
Optical Overlay
Profiling information: cache bank accesses, bank contention, cache lines used
Experimentally determined that a ring topology with 8 banks works best
10
Creation of overlay
[Figure: the 32 banks (numbered 1-32) regrouped into overlays; category labels: High, Low, Hybrid, Infreq]
11
Operations in Overlay NUCA – Search
Search schemes: Two-Side Incremental (TSI) and Broadcast
[Figure: search starting at the home bank over an overlay; bank labels 2, 23, 15, 20, 27, 18, 31, 5]
12
Operations in Overlay NUCA - Eviction
Eviction from L2
[Figure: home bank, main memory, and bank labels 18, 20, 21, 24, 26, 29, 31, 32]
13
TSI - Protocol
[Flowchart: an L1 cache miss arriving at an L2 cache bank. Recoverable node labels, in extraction order:]
If space available in message queue? Yes: search. No: NACK to sender (exponential back-off).
Type of message? Request, Notify, Kill.
If message type is request? Hit: reply; if non-home bank, Kill to home bank and Kill to opposite branch. Miss: Home bank? Create entry in RCB (home bank). Any child? Send request message to children; Notify message to home bank.
Notify: add notify to RCB entry; if notify = 2, send request to main memory.
Kill: remove RCB entry; remove MQ entry.
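One way to read the flowchart labels is that the home bank is probed first and the search then spreads one bank per step along both branches of the overlay ring, stopping at the first hit. The sketch below counts bank probes under that reading and compares it with a broadcast to the whole overlay. It is an illustrative model, not the protocol from the talk: `tsi_search`, `broadcast_search`, the lockstep stepping and the bank ids are all assumptions, and NACK, Kill and Notify messages are not modelled.

```python
from typing import Dict, List, Optional, Set, Tuple

Result = Tuple[Optional[int], int]  # (bank holding the block or None, banks probed)


def broadcast_search(ring: List[int], contents: Dict[int, Set[int]], block: int) -> Result:
    """Probe every bank of the overlay at once."""
    hit = next((b for b in ring if block in contents.get(b, set())), None)
    return hit, len(ring)


def tsi_search(ring: List[int], contents: Dict[int, Set[int]], home: int, block: int) -> Result:
    """Probe the home bank, then one more bank on each side of the ring per step."""
    n = len(ring)
    h = ring.index(home)
    probes = 1
    if block in contents.get(home, set()):
        return home, probes
    visited = {h}
    for step in range(1, n // 2 + 1):
        hit = None
        for pos in ((h + step) % n, (h - step) % n):
            if pos in visited:  # the two branches met on an even-sized ring
                continue
            visited.add(pos)
            probes += 1
            bank = ring[pos]
            if hit is None and block in contents.get(bank, set()):
                hit = bank
        if hit is not None:
            return hit, probes  # remaining banks on both branches are never probed
    return None, probes  # both branches missed: the request would go to main memory


ring = list(range(8))        # hypothetical 8-bank overlay
contents = {2: {0xABC}}      # block 0xABC lives in bank 2
tsi_hit = tsi_search(ring, contents, home=0, block=0xABC)
bcast_hit = broadcast_search(ring, contents, block=0xABC)
tsi_miss = tsi_search(ring, contents, home=0, block=0xDEF)
```

In this model the block in bank 2 is found after 5 probes with the two-sided search, against 8 probes for the broadcast; on a full miss both probe all 8 banks.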
14
Home Bank Controller
[Block diagram. Recoverable block labels: Message Queue (signals "Full" when no space is left), NACK controller (NACK message), Kill controller (Kill), Search Logic (search bank; forward request to other banks; Notify), Response Collection Buffer (fields: Message ID, Block Addr, MRBV), Eviction Logic, Victim Buffer (evicted block, migrate block), Overlay Info store, Cache Bank (bank read/write, fill), Main Memory (read, write). A hit returns a response to the sender or to the core; a miss leads to a main memory read.]
15
Message Structure
Control message: 1 flit – 3 cycles (NACK, Kill, Notify)
Data message: 5 flits – 7 cycles (Request, Response)
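Both figures on this slide (1 flit in 3 cycles, 5 flits in 7 cycles) fit a fixed overhead of 2 cycles plus one cycle per flit. That linear model is an inference from the two data points, not something the slides state. A small sketch under that assumption, with hypothetical names:

```python
# Flit counts per message type, as grouped on the slide.
FLITS = {"NACK": 1, "Kill": 1, "Notify": 1, "Request": 5, "Response": 5}

FIXED_OVERHEAD_CYCLES = 2  # assumption: inferred from 1 flit -> 3 cycles, 5 flits -> 7 cycles


def message_latency(msg_type: str) -> int:
    """Cycles to deliver one message under the assumed overhead-plus-flits model."""
    return FIXED_OVERHEAD_CYCLES + FLITS[msg_type]


control_cycles = message_latency("Kill")
data_cycles = message_latency("Response")
```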
16
Clustered architecture – 16 stations
32 cores and 32 cache banks
Distributed directory
Off-chip laser
17
Story till now
Optical Overlay
Operations
Home Bank Controller
Message Format
Architecture
18
Configuration
System: Cores – 32, Technology – 18 nm, Frequency – 3.4 GHz, Laser – Off chip
L1 cache: Block size – 64 B, Write mode – Write back, Size – 32 KB, MSHR – 32
L2 cache: Banks – 32, Associativity – 8, Size – 256 KB
Main memory: Latency – 250 cycles, Memory controllers – 4
Auxiliary structures: RCB – 128, MQ – 16, VB – 20
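The L2 parameters can be turned into derived quantities such as the number of sets per bank. The sketch below assumes that the 256 KB figure is per bank and that L2 uses the same 64 B block size listed for L1; the slide does not state either point. `L2Config` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L2Config:
    banks: int = 32
    associativity: int = 8
    bank_size_bytes: int = 256 * 1024  # assumption: 256 KB is the size of one bank
    block_size_bytes: int = 64         # assumption: same block size as L1

    @property
    def sets_per_bank(self) -> int:
        """Sets in one bank: capacity divided by (block size x ways)."""
        return self.bank_size_bytes // (self.block_size_bytes * self.associativity)

    @property
    def total_size_bytes(self) -> int:
        """Aggregate L2 capacity across all banks."""
        return self.banks * self.bank_size_bytes


cfg = L2Config()
sets_per_bank = cfg.sets_per_bank
total_mib = cfg.total_size_bytes // (1024 * 1024)
```

Under these assumptions each bank has 512 sets and the shared L2 totals 8 MiB.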
19
Results: Hits in Home Bank
More non-home bank hits => high performance in Optical Overlay NUCA
20
Results: Normalized Average Hit Latency
[Chart annotations: 0.4-0.8 and 0.2-0.55]
21
Results: L2 Hit Rate
Comparable to DNUCA, much better than SNUCA
More home bank hits
22
Results: Normalized IPC
[Chart annotations: 167%, 161%, 2-3%, and 50, 24, 18%]
High non-home bank hits
Fewer L2 requests
23
What we achieved
Flexible and efficient access mechanism for large shared L2 caches
The TSI protocol makes fewer accesses than the naive broadcast protocol
The performance difference between TSI and broadcast is minimal (2-3%)
Mean speedup of 50% over SNUCA
The proposal is agnostic to the network topology
24
Thank you