Chapter 13: Direct Memory Access


1 Chapter 13: Direct Memory Access
“…DMA…provides direct access to the memory while the microprocessor is temporarily disabled.” Typical uses of DMA: video displays for refreshing the screen, hard disk reads and writes, and high-speed memory-to-memory transfers. Timing behavior (shown in Fig. 13-1): HOLD → HLDA. The microprocessor suspends execution of its program and places its address, data, and control buses into high-impedance (Z) states.

2 Basic DMA Definitions: DMA normally occurs between an I/O device and memory without the use of the CPU. DMA read: transfers data from the memory to the I/O device. DMA write: transfers data from an I/O device to memory. The DMAC controls both the memory and the I/O device simultaneously.

3-10 DMA Operation Steps
[Figure: the CPU, DMAC, I/O device, and memory share the ADDR and DATA buses and the MRDC (memory read), MWTC (memory write), IORC (I/O read), and IOWC (I/O write) command lines. Hold (bus request) and HoldA (bus grant) connect the DMAC and the CPU; a DMA request and a DMA grant connect the I/O device and the DMAC.]
1. The CPU sends information for the data transfer to the DMAC chip (initialization).
2. The I/O device issues a DMA request (via the DREQ lines, e.g., 4 channels in the 8237).
3. The DMAC issues a bus request (HRQ, hold request, in the 8237).
4. The CPU grants the bus, setting all of its bus outputs to Z.
5. The DMAC grants the DMA to the I/O device (via the DACK lines).
6. Data transfer: for a DMA read, the MRDC and IOWC signals are controlled; for a DMA write, the MWTC and IORC signals are controlled.
7. The DMAC sends INTR to the CPU to signal completion of the DMA.
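The seven steps can be traced with a small Python model. This is a sketch, not a hardware description: memory is a bytearray, the I/O device is a plain list acting as its data buffer (a hypothetical stand-in), and the log strings simply name the signals from the slides.

```python
def dma_transfer(direction, memory, device, addr, count):
    """Walk through the seven DMA steps and return a log of signal events.

    direction: "write" moves data from the I/O device into memory,
               "read" moves data from memory out to the I/O device.
    memory:    a bytearray standing in for system memory.
    device:    a list used as the I/O device's data buffer (hypothetical model).
    """
    if direction == "write":
        strobes = "IORC + MWTC"   # read the device, write memory
    elif direction == "read":
        strobes = "MRDC + IOWC"   # read memory, write the device
    else:
        raise ValueError("direction must be 'read' or 'write'")

    log = [
        f"1. CPU programs DMAC via OUT: address={addr:#06x}, count={count}, {direction}",
        "2. I/O device asserts DREQ",
        "3. DMAC asserts HOLD (HRQ)",
        "4. CPU floats its buses (Z) and asserts HLDA",
        "5. DMAC asserts DACK",
    ]
    # 6. The DMAC, now bus master, moves one byte per bus cycle.
    for i in range(count):
        if direction == "write":
            memory[addr + i] = device.pop(0)
        else:
            device.append(memory[addr + i])
    log.append(f"6. {count} transfers using {strobes}")
    log.append("7. DMAC signals completion to the CPU (INTR)")
    return log


mem = bytearray(16)
disk = [0x11, 0x22, 0x33]
for line in dma_transfer("write", mem, disk, 0x0004, 3):
    print(line)
```

The CPU appears only in steps 1, 4, and 7; the loop in step 6 runs without it, which is the point of DMA.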

11 DMA Operation Initiation
The CPU sends information about the required data transfer operation to the DMAC chip: source device/address, destination device/address, data block size, type of data transfer (demand, single, block), etc. It uses OUT assembly instructions to send the DMAC chip this information. The DMAC chip requests a DMA from the CPU by asserting the HOLD line (via its HRQ output). The CPU acknowledges the request by asserting HLDA. Request priority in the microprocessor: Reset > Hold > Interrupt.
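As an illustration of what the OUT instructions carry, the sketch below builds the byte sequence for programming one channel. It assumes the Intel 8237A mode-register layout (bits 1-0 channel, bits 3-2 transfer type, bit 4 autoinitialize, bit 5 address decrement, bits 7-6 mode) and that the count register holds the number of transfers minus one. The register names are descriptive labels, not I/O port addresses, address bits beyond the 8237's own 16-bit register are omitted, and nothing here touches real hardware.

```python
# Assumed 8237A mode-register field encodings.
TRANSFER_TYPE = {"verify": 0b00, "write": 0b01, "read": 0b10}            # bits 3-2
MODE = {"demand": 0b00, "single": 0b01, "block": 0b10, "cascade": 0b11}  # bits 7-6


def mode_byte(channel, transfer, mode, autoinit=False, decrement=False):
    """Build the value the CPU would OUT to the 8237 mode register."""
    if not 0 <= channel <= 3:
        raise ValueError("the 8237 has channels 0-3")
    return ((MODE[mode] << 6) | (int(decrement) << 5) | (int(autoinit) << 4)
            | (TRANSFER_TYPE[transfer] << 2) | channel)


def init_writes(channel, address, nbytes, transfer, mode):
    """Return the (register, value) byte writes for one channel, in order.

    16-bit registers are written low byte first, then high byte.
    """
    if not 1 <= nbytes <= 0x10000:
        raise ValueError("one 8237 transfer moves 1 to 65536 bytes")
    count = nbytes - 1                      # count register holds transfers - 1
    return [
        ("single_mask", 0b100 | channel),   # mask the channel while programming it
        ("clear_byte_pointer", 0),          # reset the low/high flip-flop (value ignored)
        ("mode", mode_byte(channel, transfer, mode)),
        (f"ch{channel}_address", address & 0xFF),
        (f"ch{channel}_address", (address >> 8) & 0xFF),
        (f"ch{channel}_count", count & 0xFF),
        (f"ch{channel}_count", (count >> 8) & 0xFF),
        ("single_mask", channel),           # unmask: the channel now honors DREQ
    ]


for reg, value in init_writes(2, 0x1234, 512, "write", "single"):
    print(f"OUT {reg:<18} {value:#04x}")
```

For example, channel 2 in single mode with an I/O-to-memory ("write") transfer gives a mode byte of 0x46.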

12 Three Types of DMA Mode
Demand mode: transfers data until DREQ becomes inactive. Single mode: releases HOLD after each byte of data is transferred; if DREQ is still active, the DMAC requests another DMA transfer from the microprocessor. Block mode: automatically transfers the number of bytes indicated by the count register for the channel.
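To compare the three modes, here is a simplified model that counts how many bytes move during each HOLD/HLDA bus grant. It assumes one byte per cycle while the DMAC owns the bus and takes a hypothetical per-cycle DREQ trace as input; real 8237 timing is ignored.

```python
def grants(mode, count, dreq):
    """Return the bytes moved during each HOLD/HLDA bus grant.

    mode:  "demand", "single", or "block"
    count: bytes programmed into the channel's count register
    dreq:  per-cycle DREQ samples from the device (hypothetical trace)
    """
    if mode not in ("demand", "single", "block"):
        raise ValueError("unknown mode")
    remaining, burst, result = count, 0, []
    for active in dreq:
        if remaining == 0:
            break
        if mode == "block":
            if active:                 # one DREQ starts the whole block
                result.append(remaining)
                remaining = 0
        elif mode == "single":
            if active:                 # one byte, then HOLD is released
                result.append(1)
                remaining -= 1
        else:                          # demand mode
            if active:                 # keep the bus while DREQ stays active
                burst += 1
                remaining -= 1
            elif burst:                # DREQ dropped: release HOLD
                result.append(burst)
                burst = 0
    if burst:
        result.append(burst)
    return result


trace = [True, True, False, True, True, True]
for mode in ("demand", "single", "block"):
    print(mode, grants(mode, 4, trace))
```

With this trace, demand mode uses two grants of 2 bytes, single mode uses four 1-byte grants, and block mode moves all 4 bytes in one grant.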

13 Advanced Topics
Lecture: Cache (5/31); DRAM (already touched in Chapter 10); Flash memory-based storage (6/14). Practice: Introduction to RTL design in Verilog (6/2, LG105); two practices (6/7 and 6/9, LG114). Note: the two practices are run in the same manner as the normal practices, in 1st and 2nd sessions (3:20pm~4:00pm and 4:00pm~4:40pm).

14 Processor-DRAM Gap (latency)
[Source: K. Asanovic, 2008] [Figure: performance (log scale, 1 to 1000) vs. time, 1980 to 2000. µProc improves 60%/year (“Moore’s Law”); DRAM improves 7%/year; the processor-memory performance gap grows 50%/year.] Why doesn’t DRAM get faster? A four-issue 2GHz superscalar accessing 100ns DRAM could execute 800 instructions during the time for one memory access! CS252 S05
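Both numbers on this slide follow from short arithmetic: the 800-instruction figure is issue width × clock rate × latency, and the roughly 50%/year gap is the ratio of the two yearly improvement factors. A sketch:

```python
def instructions_per_miss(issue_width, clock_ghz, latency_ns):
    """Peak instructions a core could issue while one DRAM access is outstanding."""
    cycles = clock_ghz * latency_ns   # GHz x ns = clock cycles
    return issue_width * cycles


def yearly_gap_growth(cpu_rate, dram_rate):
    """Yearly growth of the CPU/DRAM performance ratio."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1


print(instructions_per_miss(4, 2, 100))           # 800
print(round(yearly_gap_growth(0.60, 0.07), 3))    # 0.495, i.e. about 50%/year
```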

15 [Source: J. Kubiatowicz, 2000]
What is a cache? Small, fast storage used to improve average access time to slow memory. It exploits spatial and temporal locality. In computer architecture, almost everything is a cache! Registers: a cache on variables. First-level cache: a cache on the second-level cache. Second-level cache: a cache on memory. Memory: a cache on disk (virtual memory). TLB: a cache on the page table. Branch prediction: a cache on prediction information? [Figure: hierarchy from Proc/Regs through L1 cache, L2 cache, and memory down to disk, tape, etc.; levels are faster toward the processor and bigger toward the disk.]
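"Average access time" is usually quantified as average memory access time (AMAT) = hit time + miss rate × miss penalty, applied recursively when a cache sits in front of another cache. The latencies and miss rates below are hypothetical, in CPU cycles.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


memory_latency = 100                        # hypothetical DRAM latency
single_level = amat(1, 0.05, memory_latency)

# Two levels: the L1 miss penalty is the L2's own AMAT.
l2_time = amat(10, 0.20, memory_latency)    # L2 hit time 10, local miss rate 20%
two_level = amat(1, 0.05, l2_time)          # L1 hit time 1, miss rate 5%

print(single_level, two_level)
```

With these numbers the L1-only system averages 6 cycles per access and adding the L2 brings it to 2.5, even though the L2 itself misses 20% of the time.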

16 Typical Memory Reference Patterns
[Source: K. Asanovic, 2008] [Figure: address vs. time for three kinds of reference: instruction fetches, stack accesses, and data accesses. Annotations: n loop iterations, subroutine call, subroutine return, argument access, vector access, scalar accesses; regions marked temporal locality, spatial locality, and temporal & spatial locality.]
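The two kinds of locality can be seen by counting misses for a vector sweep and a short loop. This sketch assumes an idealized cache that never evicts (so only first-touch misses occur) and hypothetical 64-byte lines.

```python
def misses(addresses, line_bytes):
    """Count misses in an idealized cache that never evicts."""
    resident = set()
    count = 0
    for addr in addresses:
        line = addr // line_bytes        # cache line containing this byte
        if line not in resident:
            resident.add(line)
            count += 1
    return count


LINE = 64                                  # hypothetical line size in bytes
vector = [4 * i for i in range(256)]       # sequential 4-byte elements
loop = [4 * i for i in range(16)] * 10     # the same 64 bytes, 10 iterations
print(misses(vector, LINE), "misses for", len(vector), "accesses")
print(misses(loop, LINE), "misses for", len(loop), "accesses")
```

The vector sweep misses once per 64-byte line (16 of 256 accesses) thanks to spatial locality; the loop misses once in 160 accesses because it reuses the same line (temporal locality).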

17 Memory Reference Patterns
[Source: K. Asanovic, 2008] [Figure: memory address (one dot per access) vs. time, with regions of temporal locality and spatial locality marked. From Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3) (1971).]

18 A Typical Memory Hierarchy c.2008
[Source: K. Asanovic, 2008] [Figure: CPU with a multiported register file (RF, part of the CPU); split L1 instruction and L1 data primary caches (on-chip SRAM); a large unified L2 secondary cache (on-chip SRAM); multiple interleaved memory banks (off-chip DRAM).] Note: the implementation close to the CPU looks like a Harvard machine… (jse)

19 Itanium-2 On-Chip Caches (Intel/HP, 2002)
[Source: K. Asanovic, 2008] Level 1: 16KB, 4-way s.a., 64B line, quad-port (2 load + 2 store), single-cycle latency. Level 2: 256KB, 4-way s.a., 128B line, quad-port (4 load or 4 store), five-cycle latency. Level 3: 3MB, 12-way s.a., 128B line, single 32B port, twelve-cycle latency. L3 and L2 caches occupy more than 2/3 of total area! Note: If two is good, then three must be better (jse)
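The parameters above fix each cache's geometry: sets = size / (ways × line size), and the index and offset widths are the base-2 logarithms of the set count and line size. A quick check of the three Itanium-2 levels:

```python
KB, MB = 1024, 1024 * 1024


def geometry(size_bytes, ways, line_bytes):
    """Return (sets, index bits, offset bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    # bit_length() - 1 equals log2 for the power-of-two values used here
    return sets, sets.bit_length() - 1, line_bytes.bit_length() - 1


for name, size, ways, line in [("L1", 16 * KB, 4, 64),
                               ("L2", 256 * KB, 4, 128),
                               ("L3", 3 * MB, 12, 128)]:
    print(name, geometry(size, ways, line))
```

Even though 3MB is not a power of two, the 12-way organization still yields a power-of-two number of sets (2048).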

20 Workstation Memory System (Apple PowerMac G5, 2003)
[Source: K. Asanovic, 2008] Dual 2GHz processors, each with a 64KB direct-mapped I-cache, a 32KB 2-way D-cache, and a 512KB 8-way unified L2 cache; all caches use 128B lines. Processor bus: 1GHz, 2x32-bit, 16GB/s. North Bridge chip. AGP graphics card: 533MHz, 32-bit bus, 2.1GB/s. Memory: up to 8GB DDR SDRAM, 400MHz, 128-bit bus, 6.4GB/s. PCI-X expansion: 133MHz, 64-bit bus, 1GB/s. [Figure: Apple Power Mac G5 block diagram from Apple white paper on G5 technology.]
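Most bandwidth figures on this slide are bus width in bytes × transfer rate. The sketch treats each quoted MHz figure as millions of transfers per second and reports decimal MB/s; the processor-bus figure is left out because it depends on signaling details the slide does not give.

```python
def peak_mb_per_s(width_bits, mega_transfers_per_s):
    """Peak bandwidth = bus width in bytes x transfer rate."""
    return width_bits // 8 * mega_transfers_per_s


print(peak_mb_per_s(128, 400))  # DDR SDRAM: 6400 MB/s, the slide's 6.4GB/s
print(peak_mb_per_s(32, 533))   # AGP: 2132 MB/s, about 2.1GB/s
print(peak_mb_per_s(64, 133))   # PCI-X: 1064 MB/s, about 1GB/s
```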

21 Cache Policies: Inclusion, Placement, Replacement

22 Inclusion Policy
[Source: K. Asanovic, 2008] Inclusive multilevel cache: the inner cache holds copies of data in the outer cache; external accesses need only check the outer cache; this is the most common case. Exclusive multilevel caches: the inner cache may hold data not in the outer cache; lines are swapped between the inner and outer caches on a miss; used in the AMD Athlon with a 64KB primary and a 256KB secondary cache. Why choose one type or the other? Cache size matters. In general, if L2 size >> L1 size, an inclusion policy is used.
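Why cache size matters can be shown with capacity arithmetic. The sketch takes the slide's Athlon figures (64KB primary, 256KB secondary) at face value; the 32KB/2MB pair is hypothetical and represents the L2 >> L1 case.

```python
def unique_capacity_kb(l1_kb, l2_kb, inclusive):
    """Distinct data the two levels can hold together."""
    return l2_kb if inclusive else l1_kb + l2_kb


def duplicated_fraction(l1_kb, l2_kb):
    """Share of an inclusive L2 taken up by copies of L1 lines (L1 full)."""
    return l1_kb / l2_kb


print(unique_capacity_kb(64, 256, inclusive=True))    # 256
print(unique_capacity_kb(64, 256, inclusive=False))   # 320
print(duplicated_fraction(64, 256))                   # 0.25
print(duplicated_fraction(32, 2048))                  # 0.015625
```

When L1 is a quarter of L2, inclusion wastes 25% of the L2 on duplicates, so exclusion pays off; when L2 is 64 times larger, the waste is under 2% and the simpler inclusive design wins.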

23 Types of Cache Miss: “Three Cs”
[Source: Garcia, 2008] 1st C: Compulsory misses: happen when warming up the cache. 2nd C: Conflict misses: e.g., two addresses are mapped to the same cache line; solution: increase associativity. 3rd C: Capacity misses: e.g., sequential access of 40KB of data through a 32KB data cache.
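A common way to sort misses into the three Cs: a miss to a never-seen block is compulsory; a miss that a fully associative LRU cache of the same size would also take is capacity; any remaining miss is conflict. The sketch applies this rule to a direct-mapped cache over a trace of block numbers (a hypothetical input format).

```python
from collections import OrderedDict


def classify_misses(trace, num_lines):
    """Classify the misses of a direct-mapped cache into the three Cs."""
    direct = [None] * num_lines          # direct-mapped: one block per line
    fully = OrderedDict()                # fully associative LRU, same size
    seen = set()
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for block in trace:
        index = block % num_lines
        dm_hit = direct[index] == block
        fa_hit = block in fully
        # Update the fully associative reference cache (LRU order).
        if fa_hit:
            fully.move_to_end(block)
        else:
            if len(fully) == num_lines:
                fully.popitem(last=False)  # evict least recently used
            fully[block] = True
        # Classify this access against the direct-mapped cache.
        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        direct[index] = block
        seen.add(block)
    return counts


print(classify_misses([0, 4, 0, 4], 4))        # blocks 0 and 4 share line 0
print(classify_misses([0, 1, 2, 3, 4, 0], 4))  # 5 blocks through 4 lines
```

The first trace ping-pongs between two blocks that collide in line 0, so its repeat misses are conflicts; the second touches more blocks than the cache holds, so the final miss is a capacity miss.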

24 Placement Policy
[Source: K. Asanovic, 2008] [Figure: memory block numbers and cache set numbers for the three organizations; a conflict miss is marked.] Block 12 can be placed: fully associative: anywhere; (2-way) set associative: anywhere in set 0 (12 mod 4); direct mapped: only into block 4 (12 mod 8). Note: the simplest scheme is to extract bits from the ‘block number’ to determine the ‘set’ (jse). More sophisticated schemes will hash the block number. Why could that be good/bad?
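The block-12 example generalizes to one rule: set = block number mod number of sets, where an 8-block cache has 8 sets when direct mapped, 4 sets when 2-way, and 1 set when fully associative. The frame numbering below (set s owns frames s×ways to s×ways + ways - 1) is an assumed convention for illustration.

```python
def candidate_frames(block, num_frames, ways):
    """Cache frames where a memory block may be placed.

    Assumes frames are numbered set by set: set s owns frames
    s*ways .. s*ways + ways - 1. ways=1 is direct mapped and
    ways=num_frames is fully associative.
    """
    if num_frames % ways:
        raise ValueError("ways must divide the number of frames")
    num_sets = num_frames // ways
    s = block % num_sets          # set index = block number mod number of sets
    return [s * ways + w for w in range(ways)]


print(candidate_frames(12, 8, 1))   # direct mapped: block 4 (12 mod 8)
print(candidate_frames(12, 8, 2))   # 2-way: either frame of set 0 (12 mod 4)
print(candidate_frames(12, 8, 8))   # fully associative: anywhere
```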

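25 Direct-Mapped Cache
[Source: K. Asanovic, 2008] [Figure: the address is split into Tag (t bits), Index (k bits), and Block Offset (b bits). The index selects one of 2^k lines, each holding a valid bit V, a Tag, and a Data Block; the stored tag is compared (=) with the address tag to produce HIT, and the block offset selects the data word or byte.]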
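The lookup in the figure can be written out directly: slice the address into tag, index, and offset; use the index to pick a line; hit if that line is valid and its tag matches. This sketch tracks tags only (no data) and fills the line on every miss.

```python
def split_address(addr, k, b):
    """Split an address into (tag, index, offset) for 2**k lines of 2**b bytes."""
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << k) - 1)
    tag = addr >> (b + k)
    return tag, index, offset


class DirectMappedCache:
    """Tag-only model of a direct-mapped cache with 2**k lines."""

    def __init__(self, k, b):
        self.k, self.b = k, b
        self.valid = [False] * (1 << k)
        self.tags = [0] * (1 << k)

    def access(self, addr):
        tag, index, _ = split_address(addr, self.k, self.b)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:                      # miss: fill the line (data omitted)
            self.valid[index] = True
            self.tags[index] = tag
        return hit


print([hex(f) for f in split_address(0x12345678, k=8, b=6)])
cache = DirectMappedCache(k=2, b=4)     # 4 lines of 16 bytes
print([cache.access(a) for a in (0, 4, 64, 0)])
```

Addresses 0 and 4 share a 16-byte block, so the second access hits; address 64 maps to the same line as 0 with a different tag, so the two evict each other.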

27 2-Way Set-Associative Cache
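[Source: K. Asanovic, 2008] [Figure: the address is split into Tag (t bits), Index (k bits), and Block Offset (b bits). The index selects a set containing two ways, each with V, Tag, and Data Block; two tag comparators (=) operate in parallel, either match signals HIT, and the block offset selects the data word or byte.] Note: compare the latency to the direct-mapped case? (jse)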
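A tag-only sketch of an N-way set-associative cache; true LRU within each set is an assumed replacement policy here. The test pattern shows associativity removing a conflict: blocks 0 and 8 collide in a direct-mapped cache of 8 lines but coexist in a 2-way cache of the same capacity.

```python
class SetAssociativeCache:
    """Tag-only N-way set-associative cache with true LRU in each set."""

    def __init__(self, num_sets, ways, block_bytes):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # Each set lists its tags from least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lru = self.sets[index]
        if tag in lru:            # hardware compares all ways' tags in parallel
            lru.remove(tag)
            lru.append(tag)       # now most recently used
            return True
        if len(lru) == self.ways:
            lru.pop(0)            # evict the least recently used tag
        lru.append(tag)
        return False


pattern = [0, 128, 0, 128]                            # blocks 0 and 8
direct = SetAssociativeCache(num_sets=8, ways=1, block_bytes=16)
two_way = SetAssociativeCache(num_sets=4, ways=2, block_bytes=16)
print([direct.access(a) for a in pattern])           # all misses
print([two_way.access(a) for a in pattern])          # last two hit
```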

28 4-Way Set Associative Cache Circuit
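[Source: Garcia, 2008] [Figure: 4-way set-associative cache circuit; the address tag and index drive four tag comparisons, and a multiplexer selects the hit way's data.] The mux is time-consuming!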

29 Fully Associative Cache
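[Source: K. Asanovic, 2008] [Figure: the address is split into Tag (t bits) and Block Offset (b bits); every entry (V, Tag, Data Block) has its own tag comparator (=), any match signals HIT, and the block offset selects the data word or byte.]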

30 Fully Associative Cache
[Source: Garcia, 2008] Benefit of a fully associative cache: no conflict misses (since data can go anywhere). Drawback of a fully associative cache: it needs a hardware comparator for every single entry. If we have 64KB of data in the cache with 4B entries, we need 16K comparators and a 16K-input MUX. This is infeasible for large caches; however, it is used for small (e.g., 128-entry) caches, e.g., the TLB.
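The 16K figure is just the entry count, since a fully associative lookup compares every tag at once; a set-associative lookup only compares the tags in one set. A sketch of that arithmetic:

```python
def tag_comparators(cache_bytes, entry_bytes, ways=None):
    """Tag comparators needed for one lookup.

    ways=None means fully associative: every entry is compared at once.
    Otherwise only the `ways` entries of the indexed set are compared.
    """
    entries = cache_bytes // entry_bytes
    return entries if ways is None else ways


print(tag_comparators(64 * 1024, 4))          # fully associative: 16384 (16K)
print(tag_comparators(64 * 1024, 4, ways=4))  # 4-way set associative: 4
```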

31 Replacement Policy
[Source: K. Asanovic, 2008] In an associative cache, which block from a set should be evicted when the set becomes full? Random: used in highly (fully) associative caches, e.g., TLB. Least Recently Used (LRU): LRU cache state must be updated on every access; a true implementation is only feasible for small sets (2-way); a pseudo-LRU binary tree is often used for 4-8 ways. First In, First Out (FIFO), a.k.a. Round-Robin: used in highly associative caches. Other options, e.g., recent frequently used, etc. This is a second-order effect. Why? Replacement only happens on misses. Note: NLRU used in Alpha TLBs (jse)
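A sketch of the pseudo-LRU binary tree for one 4-way set: three bits replace a full recency ordering. The convention that each bit points toward the half to evict next is one common choice, assumed here.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set (3 bits)."""

    def __init__(self):
        self.root = 0    # 0: evict from ways 0-1, 1: evict from ways 2-3
        self.left = 0    # 0: evict way 0, 1: evict way 1
        self.right = 0   # 0: evict way 2, 1: evict way 3

    def touch(self, way):
        """On every access, point the bits away from the way just used."""
        if way < 2:
            self.root = 1
            self.left = 1 - way
        else:
            self.root = 0
            self.right = 1 - (way - 2)

    def victim(self):
        """Follow the bits from the root to the way to replace."""
        if self.root == 0:
            return self.left
        return 2 + self.right


plru = TreePLRU4()
for way in (0, 1, 2, 3, 0):
    plru.touch(way)
print(plru.victim())   # 2, while true LRU would evict way 1
```

The result differs from true LRU on this access sequence, which is the approximation traded for updating only 3 bits per access.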

32 3Cs Absolute Miss Rate (SPEC92)
[Source: J. Kubiatowicz, 2000] [Figure: 3Cs absolute miss rate (SPEC92), with the conflict and compulsory components labeled; compulsory misses are vanishingly small.]

33 [Source: J. Kubiatowicz, 2000]
2:1 Cache Rule: the miss rate of a 1-way associative cache of size X ≈ the miss rate of a 2-way associative cache of size X/2. [Figure: miss-rate breakdown with the conflict component labeled.]

34 [Source: A. Hartstein, 2006] Rule: If the workload is large, the cache miss rate is observed to decrease as a power law of the cache size. If the cache size is doubled, the miss rate drops by the factor of
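A power law means m(C) = m0 · (C / C0)^(-α) for some fitted exponent α, so doubling C always divides the miss rate by the same factor, 2^α, regardless of the starting size. The exponent, base size, and base miss rate below are hypothetical.

```python
def power_law_miss_rate(size, base_size, base_miss_rate, alpha):
    """Miss rate m = m0 * (C / C0) ** -alpha."""
    return base_miss_rate * (size / base_size) ** -alpha


alpha = 0.3                                        # hypothetical exponent
m32 = power_law_miss_rate(32, 32, 0.04, alpha)     # base point: 32KB, 4% misses
m64 = power_law_miss_rate(64, 32, 0.04, alpha)
m128 = power_law_miss_rate(128, 32, 0.04, alpha)
print(m32 / m64, m64 / m128)                       # both equal 2 ** alpha
```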
