Published by Lee McBride; modified over 9 years ago
1
A Flexible Interleaved Memory Design for Generalized Low Conflict Memory Access
Laurence S. Kaplan, BBN Advanced Computers Inc., Cambridge, MA
Proceedings of the Sixth Distributed Memory Computing Conference, 1991, pp. 637-644
Yuan Ze University (元智大學), Systems Laboratory
楊登傑, 1999/11/24
2
Outline
– Introduction
– Interleaved memory system design
– Definitions
– Interleaving
– Implementation
– Performance
– Conclusions
3
Abstract
High-bandwidth delivery of data to the processor(s) is critical for good performance in parallel computer systems. To increase memory throughput, many systems use interleaved parallel memory banks. This paper proposes an implementation of an interleaved memory system that exhibits low contention for memory banks during virtually all patterned accesses. A variant of this design is currently in use on the BBN TC2000 parallel computer.
4
Introduction
In parallel processing architectures, interleaved memory allows the different processing elements to have concurrent requests serviced simultaneously, without contention at the memory banks of the system. Physical addresses in such systems are interpreted in a way that spreads references across these memory banks. A simple interleaving system treats a physical address as a binary 2-tuple (target memory bank, byte offset in bank).
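A minimal sketch of the simple scheme: split a physical address into the 2-tuple (target memory bank, byte offset in bank). The slides give no bit layout, so the sketch assumes the bank field sits directly above a low-order offset field; the function name `split_simple` and the parameter `offset_bits` are illustrative, not from the paper.

```python
def split_simple(addr: int, offset_bits: int) -> tuple[int, int]:
    """Split a physical address into (target bank, byte offset in bank).

    Assumed layout (not given in the slides): the low `offset_bits` bits
    are the byte offset and the remaining high-order bits name the bank.
    """
    offset = addr & ((1 << offset_bits) - 1)  # byte offset within the bank
    bank = addr >> offset_bits                # target memory bank number
    return bank, offset


if __name__ == "__main__":
    # With 16-byte banks (offset_bits = 4), 0x35 is byte 5 of bank 3.
    print(split_simple(0x35, 4))  # (3, 5)
```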
5
Introduction (cont.)
The width of the target memory bank field, l, determines the maximum number of memory banks that can be interleaved, $M = 2^{l}$. The width of the byte offset field determines the size of each memory bank, b.
A stride access in a parallel processor:
– N processors each attempt to fetch a distinct item simultaneously, where the items are separated by a stride S and start at base address a, so the N items lie at addresses $a, a + S, \ldots, a + (N-1)S$.
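The stride-access definition and the field widths can be tied together in a short sketch. It generates the N addresses $a + iS$ and counts how many requests each of the $M = 2^{l}$ banks receives, assuming (for illustration only) that the bank field sits above a k-bit byte-offset field, so each bank holds $b = 2^{k}$ bytes. The symbol k and the names `stride_addresses` and `bank_hits` are hypothetical.

```python
from collections import Counter


def stride_addresses(a: int, stride: int, n: int) -> list[int]:
    """Addresses touched by a stride access: item i is at a + i * stride."""
    return [a + i * stride for i in range(n)]


def bank_hits(addresses: list[int], l: int, k: int) -> Counter:
    """Count requests per bank for M = 2**l banks of b = 2**k bytes each.

    Assumed layout: bank number = address bits above the low k offset bits.
    """
    space = 1 << (l + k)  # M banks * b bytes
    hits: Counter = Counter()
    for addr in addresses:
        if not 0 <= addr < space:
            raise ValueError(f"address {addr:#x} is outside the {space}-byte space")
        hits[addr >> k] += 1
    return hits


if __name__ == "__main__":
    # M = 4 banks of b = 16 bytes (l = 2, k = 4).
    print(bank_hits(stride_addresses(0, 16, 4), l=2, k=4))  # one request per bank
    print(bank_hits(stride_addresses(0, 4, 4), l=2, k=4))   # all four hit bank 0
```

Under this assumed layout, a stride equal to the bank size spreads the four requests over all four banks, while a 4-byte stride sends every request to bank 0; that kind of pile-up is the contention the paper sets out to avoid.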
6
Interleaved memory system design
The paper proposes a new method for permuting addresses that reference interleaved memory. The method supports dynamic configuration of the set of memory banks being interleaved.
7
Definitions
A clump is the basic unit of interleaving; it consists of one or more bytes of storage. A gallery is a set of page frames. A stripe is a sequence of memory banks starting with a specific bank; the clump numbers for a given stripe index into this sequence. The interleaving approach starts by dividing the physical address to be permuted into a binary 4-tuple (stripe, gallery, clump, byte).
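The 4-tuple decomposition can be sketched as a pair of bit-field helpers. The slides do not state the bit order, so the sketch assumes the fields run from most- to least-significant bits in the order written; `Widths`, `split4`, and `join4` are hypothetical names, and the widths in the example are arbitrary.

```python
from typing import NamedTuple


class Widths(NamedTuple):
    """Hypothetical field widths in bits; the slides call the stripe width w
    and the clump width c but give no concrete values."""
    stripe: int
    gallery: int
    clump: int
    byte: int


def split4(addr: int, wd: Widths) -> tuple[int, int, int, int]:
    """Split addr into (stripe, gallery, clump, byte), assuming the fields run
    from most- to least-significant bits in that order."""
    byte = addr & ((1 << wd.byte) - 1)
    addr >>= wd.byte
    clump = addr & ((1 << wd.clump) - 1)
    addr >>= wd.clump
    gallery = addr & ((1 << wd.gallery) - 1)
    addr >>= wd.gallery
    stripe = addr & ((1 << wd.stripe) - 1)
    return stripe, gallery, clump, byte


def join4(stripe: int, gallery: int, clump: int, byte: int, wd: Widths) -> int:
    """Inverse of split4: pack the four fields back into one address."""
    addr = (stripe << wd.gallery) | gallery
    addr = (addr << wd.clump) | clump
    return (addr << wd.byte) | byte


if __name__ == "__main__":
    wd = Widths(stripe=2, gallery=3, clump=2, byte=3)
    print(split4(0b10_011_01_101, wd))  # (2, 3, 1, 5)
```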
8
Definitions (cont.)
An interleaved page consists of the entire set of clumps for specific values of stripe and gallery. If the clump field has c bits, an interleaved page contains $2^{c}$ clumps. With this method, interleaved pages are distinguished by their stripe number, which determines the element of the target memory bank sequence to which the first clump of the page maps.
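Under the same assumed most-to-least-significant layout, the clumps of one interleaved page are easy to enumerate: fix stripe and gallery and let clump run over all $2^{c}$ values. The function `interleaved_page` and its parameters are hypothetical, and the addresses are the unpermuted ones.

```python
def interleaved_page(stripe: int, gallery: int, gallery_bits: int,
                     clump_bits: int, byte_bits: int) -> list[int]:
    """Starting address of each clump in the interleaved page selected by
    (stripe, gallery), before any permutation is applied.

    Assumes the (stripe, gallery, clump, byte) fields run from most- to
    least-significant bits; a page holds 2**clump_bits clumps.
    """
    prefix = ((stripe << gallery_bits) | gallery) << clump_bits
    return [(prefix | clump) << byte_bits for clump in range(1 << clump_bits)]


if __name__ == "__main__":
    # 3 gallery bits, 2 clump bits (4 clumps), 3 byte bits (8-byte clumps).
    print(interleaved_page(stripe=1, gallery=2, gallery_bits=3,
                           clump_bits=2, byte_bits=3))  # [320, 328, 336, 344]
```

In this example the page spans $2^{2} \cdot 2^{3} = 32$ bytes and its clumps are 8 bytes apart; only the stripe value distinguishes it from the other pages of the same gallery.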
9
Definitions (cont.)
This interleaving method requires the clump field of the physical address to be at least as wide as the stripe field. The width of the stripe field in bits, w, is set according to the maximum number of memory banks. The width of the byte field determines the size of a clump.
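The width rules can be captured as a configuration check. As an assumption, the sketch reads "set according to the maximum number of memory banks" as allowing up to $2^{w}$ banks, which fits the w-bit-wide lookup entries described under Interleaving, and treats a clump as $2^{y}$ bytes for a y-bit byte field; `check_widths` and y are hypothetical names.

```python
def check_widths(w: int, c: int, y: int, num_banks: int) -> tuple[int, int]:
    """Validate hypothetical field widths and return
    (maximum number of banks, clump size in bytes).

    w: stripe field width, c: clump field width, y: byte field width (bits).
    """
    if c < w:
        raise ValueError("clump field must be at least as wide as the stripe field")
    max_banks = 1 << w  # assumed: a w-bit bank number names at most 2**w banks
    if not 1 <= num_banks <= max_banks:
        raise ValueError(f"{num_banks} banks do not fit a {w}-bit stripe field")
    return max_banks, 1 << y  # a clump spans 2**y bytes


if __name__ == "__main__":
    print(check_widths(w=3, c=4, y=5, num_banks=6))  # (8, 32)
```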
10
Definitions (cont.)
11
Interleaving
The interleaving method takes the physical addresses corresponding to a gallery of page frames and permutes them to yield N interleaved pages. The transformation uses clump to index into the target memory bank sequence that starts with the memory bank specified by stripe. This is done by using the sum of stripe and clump to address a lookup table of $2^{w+1}$ entries, each w bits wide. The lookup produces the target memory bank number, which replaces the stripe portion of the address.
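A minimal sketch of the permutation, under assumptions the slides do not spell out: the fields run from most- to least-significant as (stripe, gallery, clump, byte); the lookup table (called the Modulus RAM in the slides) holds the target bank sequence repeated, so entry k is `banks[k % M]` for M banks; and only the low-order w bits of clump enter the adder, which keeps the sum below $2^{w+1}$. The names `build_modulus_ram` and `permute` are hypothetical.

```python
def build_modulus_ram(banks: list[int], w: int) -> list[int]:
    """Fill a 2**(w+1)-entry table with the bank sequence repeated (assumed
    fill rule), so entry k holds banks[k % len(banks)]."""
    if not 1 <= len(banks) <= (1 << w):
        raise ValueError("the bank sequence must have 1 to 2**w entries")
    if any(not 0 <= b < (1 << w) for b in banks):
        raise ValueError("bank numbers must fit in w bits")
    return [banks[k % len(banks)] for k in range(1 << (w + 1))]


def permute(addr: int, ram: list[int], w: int, gallery_bits: int,
            clump_bits: int, byte_bits: int) -> int:
    """Replace the stripe field of addr with ram[stripe + low w bits of clump]."""
    low = gallery_bits + clump_bits + byte_bits    # bits below the stripe field
    mask_w = (1 << w) - 1
    stripe = (addr >> low) & mask_w
    clump = (addr >> byte_bits) & ((1 << clump_bits) - 1)
    index = stripe + (clump & mask_w)              # at most 2**(w+1) - 2
    bank = ram[index]                              # target memory bank number
    return (bank << low) | (addr & ((1 << low) - 1))


if __name__ == "__main__":
    ram = build_modulus_ram([0, 1, 2, 3], w=2)
    # stripe=1, gallery=0, clump=2, byte=3 -> bank ram[1 + 2] = 3
    print(permute(43, ram, w=2, gallery_bits=1, clump_bits=2, byte_bits=2))  # 107
```

With a full set of $2^{w}$ banks, distinct stripes at the same (gallery, clump, byte) position land on distinct banks, so in this sketch the mapping is a true permutation of the address space; with fewer banks, the repeated table entries take the place of an explicit modulus operation.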
12
Interleaving (cont.)
Table 1: clump is the row index, stripe is the column index, and the cells hold target memory bank numbers. The address in the Modulus RAM at which a target memory bank value is stored is computed by adding clump to stripe.
Table 2: the target memory bank is the column index, clump is the row index, and the cells hold stripe values. Here, stripe selects different starting points in the sequence that clump then indexes into.
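Both tables are two views of the same mapping, and a short sketch can build them from a Modulus RAM filled with the repeated bank sequence (an assumed fill rule). It covers clump values 0 to $2^{w}-1$ and assumes banks are numbered 0 to M-1; `build_tables` is a hypothetical name.

```python
def build_tables(banks: list[int], w: int) -> tuple[list[list[int]], list[list[list[int]]]]:
    """Build both views of the mapping for clump values 0 .. 2**w - 1.

    Table 1: row = clump, column = stripe, cell = target bank.
    Table 2: row = clump, column = target bank, cell = stripes mapping there.
    Assumes banks are numbered 0 .. len(banks) - 1 and the table entry at
    address stripe + clump holds banks[(stripe + clump) % len(banks)].
    """
    n = 1 << w
    ram = [banks[k % len(banks)] for k in range(1 << (w + 1))]  # Modulus RAM
    table1 = [[ram[stripe + clump] for stripe in range(n)] for clump in range(n)]
    table2: list[list[list[int]]] = [[[] for _ in banks] for _ in range(n)]
    for clump in range(n):
        for stripe in range(n):
            table2[clump][ram[stripe + clump]].append(stripe)
    return table1, table2


if __name__ == "__main__":
    t1, t2 = build_tables([0, 1, 2, 3], w=2)
    for clump, (row1, row2) in enumerate(zip(t1, t2)):
        print(clump, row1, row2)
```

With four banks, each row of Table 1 is the bank sequence rotated left by the clump value, and each Table 2 cell names the page (stripe) that supplies that bank's clump. With three banks and w = 2, stripes 0 and 3 share bank 0 in row 0, so a Table 2 cell can hold more than one stripe.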
13
Interleaving (cont.)
The figure shows how the pages within a gallery are distributed across the memory banks, according to the clumps within each page and the stripe value of the page.
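The distribution in that figure can also be tallied numerically. The sketch counts how many clumps of a whole gallery ($2^{w}$ pages, one per stripe) land on each bank, using the same assumed repeated-sequence fill rule and low-w-bit clump indexing; `gallery_load` is a hypothetical helper.

```python
from collections import Counter


def gallery_load(banks: list[int], w: int, clump_bits: int) -> Counter:
    """Clumps per target bank over one gallery: 2**w stripes (pages) times
    2**clump_bits clumps, each mapped through the assumed Modulus RAM."""
    ram = [banks[k % len(banks)] for k in range(1 << (w + 1))]
    mask_w = (1 << w) - 1
    load: Counter = Counter()
    for stripe in range(1 << w):
        for clump in range(1 << clump_bits):
            load[ram[stripe + (clump & mask_w)]] += 1
    return load


if __name__ == "__main__":
    print(gallery_load([0, 1, 2, 3], w=2, clump_bits=3))  # 8 clumps per bank
    print(gallery_load([0, 1, 2], w=2, clump_bits=2))     # 6, 5 and 5 clumps
```

With a power-of-two bank count every bank receives the same share; with three banks the split in this sketch is only approximately even.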
14
Implementation
A variant of this interleaver design has been implemented on the BBN TC2000 MIMD parallel processor. This computer is a distributed memory machine where each processor has local memory that is also part of the system’s globally addressable physical memory.
15
Implementation (cont.)
The hardware automatically forwards memory requests to the appropriate memory bank (remote or local). The interleaver's lookup RAMs can be loaded dynamically by the local processor.
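As a hypothetical illustration of dynamic loading, the class below lets a local processor reload its lookup table when the set of interleaved banks changes, and reports whether a request stays local or must be forwarded. The class and method names, the repeated-sequence fill rule, and the field layout are all assumptions; this is not the TC2000 interface.

```python
class Interleaver:
    """Toy model of a per-node, reloadable lookup table (hypothetical; not
    the TC2000 interface). Assumed address layout, high to low:
    (stripe, gallery, clump, byte), with the clump field at least w bits wide."""

    def __init__(self, w: int, low_bits: int, byte_bits: int, local_bank: int) -> None:
        self.w = w                    # stripe field width in bits
        self.low_bits = low_bits      # combined width of gallery + clump + byte
        self.byte_bits = byte_bits    # byte field width in bits
        self.local_bank = local_bank  # bank attached to this processor
        self.ram: list[int] = []

    def load(self, banks: list[int]) -> None:
        """(Re)load the table from a new bank sequence, e.g. after the set of
        interleaved banks is reconfigured."""
        if not 1 <= len(banks) <= (1 << self.w):
            raise ValueError("the bank sequence must have 1 to 2**w entries")
        self.ram = [banks[k % len(banks)] for k in range(1 << (self.w + 1))]

    def route(self, addr: int) -> tuple[int, bool]:
        """Return (target bank, True if the request can stay local)."""
        mask = (1 << self.w) - 1
        stripe = (addr >> self.low_bits) & mask
        clump_low = (addr >> self.byte_bits) & mask  # low w bits of clump
        bank = self.ram[stripe + clump_low]
        return bank, bank == self.local_bank


if __name__ == "__main__":
    node = Interleaver(w=2, low_bits=5, byte_bits=2, local_bank=1)
    node.load([0, 1, 2, 3])
    print(node.route(43), node.route(4))  # (3, False) (1, True)
    node.load([0, 1, 2])                  # e.g. bank 3 taken out of the interleave
    print(node.route(43))                 # (0, False)
```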
16
Performance
A performance metric is needed to show how well this method interleaves regular patterned accesses. Measurements were taken by simulating the interleaving method and measuring the non-uniformity of a stride access for each stride S, using 10,000 different starting addresses a. These results can be interpreted to mean that no memory bank is ever referenced more than once in any of the stride accesses simulated.
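The slides do not define the non-uniformity metric, so the harness below uses a stand-in: the largest number of the N requests in one stride access that hit the same bank, maximized over random starting addresses. The address-to-bank mapping under test is passed in as a function. `toy_bank` (8-byte clumps dealt round-robin over 16 banks) is a hypothetical baseline, not the paper's interleaver, and the trial count is far smaller than the paper's 10,000 starting addresses.

```python
import random
from collections import Counter
from typing import Callable


def max_hits(bank_of: Callable[[int], int], a: int, stride: int, n: int) -> int:
    """Most requests any single bank receives when n processors fetch
    the items at a, a + stride, ..., a + (n - 1) * stride."""
    return max(Counter(bank_of(a + i * stride) for i in range(n)).values())


def simulate(bank_of: Callable[[int], int], stride: int, n: int,
             trials: int, addr_bits: int, seed: int = 0) -> int:
    """Worst max_hits over `trials` random base addresses (seeded for repeatability)."""
    rng = random.Random(seed)
    return max(max_hits(bank_of, rng.randrange(1 << addr_bits), stride, n)
               for _ in range(trials))


def toy_bank(addr: int) -> int:
    """Hypothetical baseline: 8-byte clumps dealt round-robin over 16 banks."""
    return (addr >> 3) % 16


if __name__ == "__main__":
    # toy_bank: stride 8 -> 1, stride 16 -> 2, stride 128 -> 16
    for stride in (8, 16, 128):
        print(stride, simulate(toy_bank, stride, n=16, trials=1000, addr_bits=24))
```

Even this toy baseline shows the problem the paper targets: a stride that is a multiple of 16 clumps sends every request to a single bank. Swapping in a different `bank_of` lets the same harness compare other mappings.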
17
Performance (cont.)
The proposed method achieves much lower non-uniformity than the randomized baseline for all of the strides.
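For comparison, one way to compute a randomized baseline is to let each of the N requests pick a bank uniformly at random and record how many land on the busiest bank. The slides do not say how the randomized figure was obtained, so averaging over trials and the name `random_max_load` are assumptions.

```python
import random
from collections import Counter


def random_max_load(n: int, banks: int, trials: int, seed: int = 0) -> float:
    """Average request count of the busiest bank when each of n requests
    picks one of `banks` banks uniformly at random (assumed baseline)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hits = Counter(rng.randrange(banks) for _ in range(n))
        total += max(hits.values())
    return total / trials


if __name__ == "__main__":
    print(random_max_load(n=16, banks=16, trials=10_000))
```

For 16 requests over 16 banks, random placement almost always puts more than one request on some bank, whereas the slides report that the proposed method never referenced any bank more than once in the simulated stride accesses.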
18
Future work
More work is needed on the ordering of requests and on contention within the interconnection network. The importance of these topics varies with the processor architecture and with the type of interconnection network used alongside this interleaving approach.
19
Conclusions
This paper has proposed a highly effective and flexible method for reducing memory conflicts during virtually all stride accesses. The method is applicable to a wide range of architectures that require low-conflict parallel memory access.