Inputs | Metrics | Code | Results
[Figure: multicore processor chip. Each core has a private data (L1) cache and a cache controller; an interconnection network connects the cache controllers to main memory.]
Number of invalidate messages in MSI and MESI; number of write-backs in MSI and MOSI
Columns: Name | #cores | C Latency | M Latency | M Blocks | Cache Blocks | Input Size | Store %
Rows:
L K1K 0
L K1K 40
L K1K 80
M K1K10K0
M K1K10K40
M K1K10K80
H K1K100K0
H K1K100K40
H K1K100K80
Columns: Name | #cores | C Latency | M Latency | M Blocks | Cache Blocks | Input Size | Store %
Sensitivity of write-backs to the cache size:
MC K1010K50
MC K10010K50
MC K1K10K50
Sensitivity of write-backs to the # of cores:
MW K10010K50
MW K10010K50
MW K10010K50
Goals: number of invalidate messages (MESI), number of write-backs (MOSI)
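The invalidate-message metric can be illustrated with a small counter. The sketch below is not the simulator used in the presentation; it assumes caches that never evict, a hypothetical trace format of `(core, op, block)` tuples, and it counts one invalidate message for every store that has to request exclusive ownership on the bus. The function name `count_invalidates` is illustrative.

```python
def count_invalidates(trace, num_cores, protocol="MSI"):
    """Count ownership requests (invalidate messages) for a trace.

    Assumptions (not from the slides): no evictions, one message per
    store that is not already in M (or E under MESI).
    """
    # One dict per core: block -> state; a missing block is in state I.
    state = [dict() for _ in range(num_cores)]
    invalidates = 0
    for core, op, block in trace:
        cur = state[core].get(block, "I")
        others = [c for c in range(num_cores)
                  if c != core and state[c].get(block, "I") != "I"]
        if op == "load":
            if cur == "I":
                # Any current holder (M, E, or S) ends up in S.
                for c in others:
                    state[c][block] = "S"
                exclusive = protocol == "MESI" and not others
                state[core][block] = "E" if exclusive else "S"
        else:  # store
            if cur == "M":
                continue  # already the owner, no message
            if cur == "E":
                state[core][block] = "M"  # silent E -> M upgrade
                continue
            invalidates += 1  # ownership request seen by all other caches
            for c in others:
                state[c][block] = "I"
            state[core][block] = "M"
    return invalidates


trace = [(0, "load", 7), (0, "store", 7), (1, "load", 7), (1, "store", 7)]
print(count_invalidates(trace, 2, "MSI"), count_invalidates(trace, 2, "MESI"))
```

In this trace the first store under MESI hits a block in E and upgrades silently, which is the effect the MSI versus MESI comparison is meant to expose.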
Introduction | Snooping | Directory | Conclusion
[Figure: two nodes joined by an interconnection network. Each node has a core, a private data (L1) cache, a cache controller, a directory controller, a directory, and main memory.]
This presentation reports the results of implementing a multiprocessor system model with a distributed directory.
[Figure: cores with their cache controllers, and a directory controller for a cache block.] A cache controller sends its request to the directory.
[Figure: cores with their cache controllers, and a directory controller for a cache block; label: bottleneck.]
[Figure: cores with their cache controllers, and a directory controller for a cache block.] Directory controller and cache controller: every request is answered by unicasting a message.
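The request and unicast-response flow can be sketched as follows. This is a simplified MSI-style directory entry written for illustration, not the MOSI directory from the presentation; the class `DirectoryEntry`, the function `handle_request`, and the message names `GetS`, `GetM`, `Fwd-GetS`, `Fwd-GetM`, `Inv`, and `Data` are assumptions. Each returned pair is one unicast message addressed to a single cache controller.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DirectoryEntry:
    """Per-block directory state (hypothetical layout)."""
    owner: Optional[int] = None
    sharers: Set[int] = field(default_factory=set)


def handle_request(entry, requester, kind):
    """Return the unicast messages (destination, message) for one request."""
    msgs = []
    if kind == "GetS":
        if entry.owner is not None:
            # Forward to the owner, which supplies the data and becomes a sharer.
            msgs.append((entry.owner, "Fwd-GetS"))
            entry.sharers.add(entry.owner)
            entry.owner = None
        else:
            msgs.append((requester, "Data"))  # memory supplies the block
        entry.sharers.add(requester)
    elif kind == "GetM":
        if entry.owner not in (None, requester):
            msgs.append((entry.owner, "Fwd-GetM"))
        else:
            msgs.append((requester, "Data"))
        # One invalidation per sharer, each sent point to point.
        for sharer in sorted(entry.sharers - {requester}):
            msgs.append((sharer, "Inv"))
        entry.sharers = set()
        entry.owner = requester
    else:
        raise ValueError(f"unknown request kind: {kind}")
    return msgs


entry = DirectoryEntry()
handle_request(entry, 0, "GetS")
handle_request(entry, 1, "GetS")
print(handle_request(entry, 2, "GetM"))
```

Because the directory tracks the sharers, a store by core 2 costs two point-to-point invalidations here instead of a broadcast to every cache.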
Message types | States
MOSI_protocol_cache_request: executes a cache controller request
MOSI_protocol_directory_request: executes a directory controller response
I_state_cache: performs the cache actions for a block in the I state
Transition_I_to_SD: performs the cache actions for a block that is in the I state and is moving to the S state with condition D
Directory_I: performs the directory action on receiving a message from a cache controller for a block in the I state
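The function list suggests one handler per state, selected by a top-level dispatcher. A minimal sketch of that structure follows. The function names come from the list above, but the bodies, the event names (`load`, `store`, `Data`), the transient state labels `IS_D` and `IM_D`, and the `(next_state, message)` return convention are all assumptions; only the I state and the I-to-S transient state are filled in.

```python
def I_state_cache(event):
    """Cache actions for a block in state I (assumed behavior)."""
    if event == "load":
        return "IS_D", "GetS"   # request a readable copy, then wait for data
    if event == "store":
        return "IM_D", "GetM"   # request ownership, then wait for data
    return "I", None            # nothing to do for other events


def Transition_I_to_SD(event):
    """Cache actions while moving from I to S and waiting for data."""
    if event == "Data":
        return "S", None        # data arrived, the block is now shared
    return "IS_D", "stall"      # any other event waits for the data


# Hypothetical dispatch table: one handler per (stable or transient) state.
HANDLERS = {"I": I_state_cache, "IS_D": Transition_I_to_SD}


def MOSI_protocol_cache_request(state, event):
    """Dispatch a cache-controller event to the handler for `state`."""
    try:
        handler = HANDLERS[state]
    except KeyError:
        raise NotImplementedError(f"state {state} is not part of this sketch")
    return handler(event)


state, msg = MOSI_protocol_cache_request("I", "load")
print(state, msg)
state, msg = MOSI_protocol_cache_request(state, "Data")
print(state, msg)
```

A table keyed by state keeps each handler short and makes it easy to see which states still lack an implementation.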
MOSI protocol. Number of cores: 8; number of requests/cycle: 4.
[Plots: write-backs per memory reference vs. L1 block size (bytes); write-backs vs. L1 cache size (KB); write-backs vs. L1 block size (bytes). Plot labels: block size = 16 bytes; cache size = 128 bytes.]
Number of write-backs: mean(MOSI/MSI) =
Number of blocks/cache: 1000; number of caches: 100; number of requests/cycle: 4. Number of stalls: mean(MOSI/MSI) =
Number of blocks/cache: 1000; number of caches: 100; number of requests/cycle: 4. Number of cycles: mean(MOSI/MSI) = 1.459
Number of blocks/cache: 1000; number of caches: 100; number of requests/cycle: 4. mean(MOSI/MSI) = 1.345; mean(MOSI/MSI) = 1.273
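The mean(MOSI/MSI) figures can be read as an average of per-configuration ratios. The slides do not say whether the value is a mean of ratios or a ratio of means; the sketch below assumes the former, and the input numbers are made up purely for illustration.

```python
from statistics import mean


def mean_ratio(mosi, msi):
    """Mean of per-configuration MOSI/MSI ratios (assumed definition)."""
    if len(mosi) != len(msi):
        raise ValueError("both protocols need one value per configuration")
    return mean([a / b for a, b in zip(mosi, msi)])


# Hypothetical counts for two configurations, not measured results.
print(mean_ratio([150, 260], [100, 200]))
```

A value above 1 means the metric is larger under MOSI than under MSI for the measured configurations, on average.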