
1 IEEE PAST 2016, Waltham, Massachusetts, USA
“A Novel Directory Based Hybrid Cache Coherence Protocol for Shared Memory Multiprocessors”
Author, Presenter: Dr. Abu Asaduzzaman, Associate Professor, Director of CAPPLab and Advisor of the IEEE Student Branch at Wichita State
Department of Electrical Engineering and Computer Science (EECS), Wichita State University (WSU), USA
18 – 21 October 2016

2 IEEE PAST 2016, Waltham, Massachusetts, USA
Author, Preparation: Mr. Kishore K. Chidella, PhD Student
Department of Electrical Engineering and Computer Science (EECS), Wichita State University (WSU), USA
18 – 21 October 2016

3 Outline
Introduction
  Cache coherence in shared memory multiprocessors
  Directory based hybrid cache coherence protocol
Background and Motivation
  Multiprocessors are convenient to program; inconsistency among cached data; limitations of snoopy, PWU, and PWI protocols
Proposed Secure Processing Technique
  Sharer groups of processors with snoopy protocol
  One request per sharer group to the directory at shared CL2
Experimental Results, Conclusions
Q/A, Discussion
Contact Information
QUESTIONS? Any time, please!

5 Flynn’s Classification of Computer Architectures [1]
UMA – Uniform memory access
COMA – Cache only memory access
NUMA – Non-uniform memory access
MPP – Massively parallel processor
COW – Cluster of workstations
CC-NUMA – Cache coherent NUMA
NC-NUMA – Non-coherent NUMA
Flynn’s classification is based on instruction streams and data streams.
  An instruction stream corresponds to a program counter (PC). A system with n CPUs has n PCs, hence n instruction streams.
  A data stream consists of a set of operands. Multiple data streams are common in modern computing.
  The instruction and data streams are, to some extent, independent.
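The classification rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the presentation: the function name `flynn_class` and its integer arguments are assumptions chosen for the example.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class for the given stream counts.

    Hypothetical helper: one stream maps to 'S' (single),
    more than one maps to 'M' (multiple).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a system needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


# A system with n CPUs has n program counters, hence n instruction streams.
print(flynn_class(1, 1))  # SISD
print(flynn_class(1, 4))  # SIMD
print(flynn_class(8, 8))  # MIMD
```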

6 Computer Architectures [1]
[Figure: example architectures by class. MIMD: multicomputer, multiprocessor. SIMD: array processor, vector processor. SISD: pure Harvard architecture, Von Neumann architecture.]

7 Cache Memory Hierarchy [2]
Single-core system: CL1 (I1 & D1), CL2
Private/dedicated cache: CL1, CL2, AMD Opteron
Shared CL2: CL2, Intel Kentsfield XE
TLB – translation lookaside buffer

8 Introduction
High performance computing (HPC) can be advanced by using better hardware and/or software [3-6].
Processor performance can be improved with private and multilevel caches [7, 8].
A common choice for the memory organization of a multicore system is a two-level cache hierarchy, where the level-1 cache (CL1) is normally split into I1 and D1, and CL2 is shared by the cores [9].

9 Introduction (+)
However, the presence of multilevel caches in multiprocessors worsens the cache coherence problem.
The cache coherence problem can be addressed through software and hardware protocols; the hardware protocols are snoopy and directory based [10, 11].
Pure write update (PWU) protocols typically incur severe performance degradation compared to pure write invalidate (PWI) protocols because of the heavy traffic they generate.
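The traffic difference between write update and write invalidate can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the simulation used in the presentation: it tracks a single memory block, charges one bus message per block fetch and one per update or invalidate broadcast, and uses a made-up access trace.

```python
def count_bus_messages(trace, policy):
    """Count bus messages and cache misses for one block.

    trace  : list of (processor, op) pairs, op is "R" or "W" (hypothetical format)
    policy : "PWU" (write update) or "PWI" (write invalidate)
    """
    holders = set()  # processors that currently hold a valid copy
    messages = 0
    misses = 0
    for proc, op in trace:
        if proc not in holders:
            misses += 1
            messages += 1          # fetch the block over the bus
            holders.add(proc)
        if op == "W" and holders - {proc}:
            messages += 1          # one broadcast: update or invalidate
            if policy == "PWI":
                holders = {proc}   # other copies become invalid
    return messages, misses


# P1 writes repeatedly while P2 reads the block before and after.
trace = [("P1", "W"), ("P2", "R"), ("P1", "W"),
         ("P1", "W"), ("P1", "W"), ("P2", "R")]
print(count_bus_messages(trace, "PWU"))  # (5, 2)
print(count_bus_messages(trace, "PWI"))  # (4, 3)
```

In this toy trace, write update sends more bus messages because every write to a shared block is broadcast, while write invalidate sends fewer messages but causes an extra miss when P2 reads again.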

10 Problem Description
When addressing cache coherency in shared memory multiprocessors, traditional snoopy based PWU and PWI protocols suffer from several issues, including low bandwidth, high memory latency, and a large cache miss ratio.
Contributions
A directory based hybrid cache coherence protocol that better addresses cache coherency and improves the performance of shared memory multiprocessors.

11 Proposed Directory Based Protocol
The system-wide status information that is relevant to coherence maintenance is stored in a directory in the shared level-2 cache.
Processors are logically grouped into Sharer Groups depending on the request types.
Processors 1 and 5 are members of Sharer Group 1, Processors 3 and N are members of Sharer Group 2, etc.
The snoopy methods (PWU, PWI) are used within each Sharer Group.
Each Sharer Group sends one request to the L2 cache.
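The idea of forwarding one request per Sharer Group can be sketched as follows. This is only an illustration under stated assumptions: the function `coalesce_requests`, the `(processor, op, block)` request format, and the concrete group assignment (with P8 standing in for Processor N) are hypothetical and not taken from the presentation.

```python
from collections import defaultdict


def coalesce_requests(requests, group_of):
    """Merge identical requests inside each sharer group.

    requests : list of (processor, op, block) tuples (hypothetical format)
    group_of : dict mapping processor name to sharer group id
    Returns a dict keyed by (group, op, block); each key stands for the
    single request that the group forwards to the directory at the L2 cache.
    """
    forwarded = defaultdict(list)
    for proc, op, block in requests:
        forwarded[(group_of[proc], op, block)].append(proc)
    return dict(forwarded)


# Assumed grouping: P1 and P5 in Sharer Group 1, P3 and P8 in Sharer Group 2.
group_of = {"P1": 1, "P5": 1, "P3": 2, "P8": 2}
requests = [("P1", "W", "X1"), ("P5", "W", "X1"),
            ("P3", "R", "Y1"), ("P8", "R", "Y1")]

forwarded = coalesce_requests(requests, group_of)
print(len(requests), "processor requests ->", len(forwarded), "requests to L2")
```

Within each group, the members would still keep their copies coherent with a snoopy method; that part is not modeled here.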

12 Working Principle
First, the priority of a request is checked, then dependency and sharer groups.
Among the sharer groups, the sharer group with the maximum number of requests is considered first.
Requests from more than one sharer group can be processed at the same time in a multiprocessor system with a multiport shared level-2 cache.
To avoid starvation, requests that have been waiting longer than a predefined amount of time (say, 1000 cycles) are processed immediately.
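The ordering rules above can be sketched as a sort key. This is a simplified illustration, not the authors' implementation: the `GroupRequest` fields, the convention that a larger `priority` value is more urgent, and the omission of dependency checks are all assumptions made for the example.

```python
from dataclasses import dataclass

STARVATION_LIMIT = 1000  # cycles; the example threshold mentioned above


@dataclass
class GroupRequest:
    group: int            # sharer group id
    priority: int         # larger means more urgent (assumption)
    num_requests: int     # requests pending inside the group
    waiting_cycles: int   # how long the request has waited


def schedule(pending):
    """Order pending group requests for service (dependencies not modeled)."""
    def key(r):
        starved = r.waiting_cycles > STARVATION_LIMIT
        # Starved requests first, then higher priority, then larger groups.
        return (not starved, -r.priority, -r.num_requests)
    return sorted(pending, key=key)


pending = [
    GroupRequest(group=1, priority=0, num_requests=4, waiting_cycles=10),
    GroupRequest(group=2, priority=0, num_requests=2, waiting_cycles=1200),
    GroupRequest(group=3, priority=1, num_requests=1, waiting_cycles=50),
    GroupRequest(group=4, priority=0, num_requests=3, waiting_cycles=5),
]
print([r.group for r in schedule(pending)])  # [2, 3, 1, 4]
```

With a multiport shared level-2 cache, the first few entries of the ordered list could be served in the same cycle; the sketch only produces the order.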

13 Sharer Groups & Requests
In this study, we use a representative set of read and write requests generated by the processors.
The table shows 15 sets of requests for an 8-core multiprocessor system.
For example, Sharer Group 1 consists of four processors P1, P2, P3, etc.
In set R1, processor P1 of Sharer Group 1 has a write request (W) to memory X1, processor P5 of Sharer Group 2 has a write request to memory Y1, and processor P7 of Sharer Group 3 has a write request to memory Z1…

14 Experimental Details
Processors in 8-, 16-, and 32-core systems
The workload is used to assess the proposed directory based hybrid, PWU, and PWI protocols.
The workload represents priority, dependency, different sharer groups, and starvation cases.
For the 16-core and 32-core systems, the requests are doubled and quadrupled, respectively, as shown in the table.

15 Experimental Details (+)
Update request and execution time of the 8-core system
The requests are categorized into read (R) and write (W) depending on the target memory blocks X, Y, and Z.
The table shows the categories, the number of identical requests in each set of requests, and the number of requests sent to CL2 per set of requests.

16 Experimental Results
The proposed hybrid protocol helps reduce the memory latency up to %, 14.53%, and 21.82% for 8-core, 16-core, and 32-core systems, respectively.
The proposed hybrid protocol helps reduce the bandwidth requirement up to 12.50%, 21.75%, and 37.50% for 8-core, 16-core, and 32-core systems, respectively.

17 Experimental Results (+)
The proposed hybrid protocol helps reduce the cache miss ratio up to 3.96%, 5.47%, and 11.88% for 8-core, 16-core, and 32-core systems, respectively.
The proposed directory based hybrid protocol shows potential to save bandwidth up to 37.50%, decrease memory latency up to 21.82%, and decrease cache miss ratio up to 11.88%.
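Reduction figures of this kind are relative to a baseline protocol. A minimal sketch of the arithmetic is shown below; the function name and the input numbers are purely hypothetical and are not measurements from the presentation.

```python
def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0


# Hypothetical example: 8 requests reach CL2 under the baseline protocol,
# 7 under the alternative.
print(percent_reduction(8, 7))  # 12.5
```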

18 Conclusions
A hybrid cache coherence protocol (using snoopy and directory schemes) with an improved sharer group mechanism for shared memory multiprocessors is proposed.
Using VisualSim, we simulate the snoopy-based PWU, PWI, and the proposed protocols using numerous read/write requests on 8-, 16-, and 32-core multiprocessor systems.
Between PWU and PWI, experimental results confirm that the bandwidth requirement is higher for the PWU scheme, but the average memory latency and overall cache miss ratio are lower for PWU.
Results suggest that the bandwidth requirement, average memory latency, and overall cache miss ratio of multiprocessors can be decreased by about 37%, 22%, and 12%, respectively, using the proposed directory based hybrid cache coherence protocol.

19 References
[1] A.S. Tanenbaum and T. Austin, “Structured Computer Organization,” Pearson, sixth edition, 2012.
[2] A. Asaduzzaman, “Cache Optimization for Real-Time Embedded Systems,” PhD Dissertation, Florida Atlantic University, Dec
[3] P. Gepner and M.F. Kowalik, “Multi-core processors: New way to achieve high system performance,” Proceedings of the International Conference on Parallel Computing in Electrical Engineering (PARELEC 06), pp. 9-13, Sept
[4] I. Tartalja and V. Milutinovic, “A Survey of Software Solutions for Maintenance of Cache Consistency in Shared Memory Multiprocessors,” IEEE/ACM, Hawaii International Conference on System Sciences, pp , January 1995.
[5] T. Rolf, “Cache organization and memory management of the Intel Nehalem computer architecture,” University of Utah Computer Engineering, 2009.
[6] M.M.K. Martin, M.D. Hill, and D.J. Sorin, “Why On-Chip Cache Coherence is Here to Stay,” Duke University Department of ECE Technical Report TR , August 2011.
[7] H.F. Jordan, “Shared Versus Distributed Memory Multiprocessors,” Technical Report 91-7, Institute for Computer Applications in Science and Engineering (ICASE), January 1991.
[8] A. Asaduzzaman, K.K. Chidella, and M. Moniruzzaman, “Efficient Cache Locking at Private First-Level Caches and Shared Last-Level Cache for Modern Multicore Systems,” IJETAE, Vol. 5, No. 1, 2015.
[9] D.J. Lilja, “Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons,” ACM Computing Surveys (CSUR), Vol. 25, No. 3, pp , Sept
[10] R. Lawrence, “A Survey of Cache Coherence Mechanisms in Shared Memory Multiprocessors,” University of Manitoba, May 1998.
[11] M. Tomasevic and V. Milutinovic, “A survey of hardware solutions for maintenance of cache coherence in shared memory multiprocessors,” Hawaii International Conference on System Sciences, Vol. 1, pp , August 1993.

20 IEEE PAST 2016 at Waltham, Massachusetts
QUESTIONS?
Contact: Abu Asaduzzaman
Phone:
CAPPLab:
Thank You!

