Prof. Zhang Gang School of Computer Sci. & Tech.


1 Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures Topic 9 Memory Banks
Prof. Zhang Gang School of Computer Sci. & Tech. Tianjin University, Tianjin, P. R. China

2 Memory Banks
A memory bank is built for timely and logical data access.
(1) Each logical unit of storage is arranged consecutively so that all data can be accessed quickly.
(2) Interleaved memory is another format for a memory bank. It allows data to be accessed even faster by putting specific components of memory in the same place across a series of chips. Data can be retrieved across parallel strips instead of being indexed all on one chip.
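As a rough illustration of interleaving, the sketch below maps word addresses to banks using low-order interleaving. The slide does not name a mapping, so the modulo scheme and the names `bank_and_row` and `banks_touched` are assumptions made for this example.

```python
def bank_and_row(word_addr, num_banks):
    """Assumed low-order interleaving: consecutive words land in consecutive banks."""
    bank = word_addr % num_banks   # which bank (chip) holds the word
    row = word_addr // num_banks   # position of the word inside that bank
    return bank, row


def banks_touched(start, stride, count, num_banks):
    """List the bank hit by each of `count` accesses starting at `start`."""
    return [bank_and_row(start + i * stride, num_banks)[0] for i in range(count)]


# Eight consecutive words over four banks rotate through all banks,
# so up to four accesses can be in flight at once.
print(banks_touched(0, 1, 8, 4))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Under this assumed mapping, sequential words never hit the same bank twice in a row, which is what lets the data be retrieved in parallel.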

3 Memory Banks
The behavior of the load/store vector unit is significantly more complicated than that of the arithmetic functional units.
The start-up time for a load is the time to get the first word from memory into a register.
If the rest of the vector can be supplied without stalling, then the vector initiation rate is equal to the rate at which new words are fetched or stored.
The initiation rate may not necessarily be one clock cycle, because memory bank stalls can reduce effective throughput.
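To see how bank stalls can lower the initiation rate, here is a minimal simulation sketch. The model is an assumption for illustration only: at most one access is issued per cycle, banks are selected by address modulo the bank count, and a bank stays busy for a fixed number of cycles. The function name `issue_cycles` and the parameter values (8 banks, 6-cycle busy time, 64 words) are hypothetical, and start-up time is not included.

```python
def issue_cycles(n, stride, num_banks, bank_busy):
    """Cycles needed to issue n word accesses under the assumed model.

    Access i goes to bank (i * stride) % num_banks; a bank cannot accept
    a new access until bank_busy cycles after its previous one started.
    """
    free_at = [0] * num_banks          # first cycle each bank is free again
    cycle = 0                          # next cycle the load/store unit can issue
    for i in range(n):
        bank = (i * stride) % num_banks
        start = max(cycle, free_at[bank])   # stall if the bank is still busy
        free_at[bank] = start + bank_busy
        cycle = start + 1
    return cycle


for stride in (1, 32):
    cycles = issue_cycles(64, stride, num_banks=8, bank_busy=6)
    print(stride, cycles, round(64 / cycles, 3))
# 1 64 1.0
# 32 379 0.169
```

With unit stride the accesses rotate through all banks and one word is issued per cycle; with a stride of 32 every access hits the same bank, so each one waits out the full busy time.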

4 Memory Banks
Typically, penalties for start-ups on load/store units are higher than those for arithmetic units: over 100 clock cycles on many processors.
For VMIPS we assume a start-up time of 12 clock cycles, the same as the Cray-1.
To maintain an initiation rate of one word fetched or stored per clock, the memory system must be capable of producing or accepting this much data.
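The effect of start-up time on a single vector load can be sketched as follows. This assumes the simple model "total cycles = start-up + one cycle per word" with no bank stalls; the 12-cycle start-up comes from the slide, while the 64-word vector length and the function name `vector_load_cycles` are assumptions for the example.

```python
import math

VMIPS_LOAD_STARTUP = 12  # clock cycles, the VMIPS value stated on the slide


def vector_load_cycles(n_words, startup=VMIPS_LOAD_STARTUP, words_per_cycle=1.0):
    """Assumed model: start-up latency plus the time to stream n_words."""
    return startup + math.ceil(n_words / words_per_cycle)


n = 64                              # hypothetical vector length
total = vector_load_cycles(n)
print(total)                        # 76
print(round(n / total, 3))          # 0.842 words per cycle overall
```

If the memory system could only sustain half a word per cycle, the same load would take `vector_load_cycles(64, words_per_cycle=0.5)`, or 140 cycles, which is why the memory system has to keep up with the initiation rate.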

5 Memory Banks
The memory system must be designed to support high bandwidth for vector loads and stores:
(1) Spread accesses across multiple banks.
(2) Control bank addresses independently.
(3) Load or store non-sequential words.
(4) Support multiple vector processors sharing the same memory.

6 Example of Memory Banks
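The largest configuration of a Cray T90 (Cray T932) has 32 processors, each generating 4 loads and 2 stores per cycle. The processor cycle time is 2.167 ns and the SRAM cycle time is 15 ns. How many memory banks are needed to let all processors run at full memory bandwidth?
Peak accesses per cycle: 32 × 6 = 192.
Each SRAM bank is busy for 15 / 2.167 ≈ 6.92 processor cycles, which rounds up to 7.
Banks needed: 192 × 7 = 1344.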
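The arithmetic in this example can be checked with a short script. The function name `banks_needed` is hypothetical; the inputs are the figures given on the slide, and the only assumption is that the bank busy time is rounded up to a whole number of processor cycles.

```python
import math


def banks_needed(processors, accesses_per_cycle, sram_cycle_ns, cpu_cycle_ns):
    """Banks required so every access finds an idle bank each cycle."""
    # A bank stays busy for a whole number of processor cycles, so round up.
    busy_cycles = math.ceil(sram_cycle_ns / cpu_cycle_ns)
    peak_accesses = processors * accesses_per_cycle
    return peak_accesses * busy_cycles


# Cray T932 figures from the slide: 32 processors, 4 loads + 2 stores per cycle.
print(banks_needed(32, 4 + 2, 15.0, 2.167))  # 1344
```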

7 Exercises
What is a memory bank?
What is an interleaved memory?
What is the meaning of start-up time for a load?
Why must a memory system support high bandwidth for vector loads and stores?

