Presentation is loading. Please wait.

Presentation is loading. Please wait.

Efficient Multi-Ported Memories for FPGAs Eric LaForest Greg Steffan University of Toronto Computer Engineering Research Group February 22, 2010.

Similar presentations


Presentation on theme: "Efficient Multi-Ported Memories for FPGAs Eric LaForest Greg Steffan University of Toronto Computer Engineering Research Group February 22, 2010."— Presentation transcript:

1 Efficient Multi-Ported Memories for FPGAs Eric LaForest Greg Steffan University of Toronto Computer Engineering Research Group February 22, 2010

2 2 Parallelism in FPGAs Larger SoCs on FPGAs → Parallel Systems Parallel systems on FPGAs will need:  Queueing  Data sharing  Communication  Synchronization Boils down to:  FIFOs  Register files We can do all these with multi-ported memories

3 3 Multi-Ported Memory X X X X Existing workarounds are ad-hoc, “roll-your-own”, and have limited parallelism.

4 4 Conventional Approaches

5 5 2W/2R Multi-Ported Memory Doesn't exist on FPGAs Altera used to have one (Mercury)‏

6 6 Stratix III Building Blocks M9K (eg: 32 x 256)‏ M144K (eg: 32 x 4098)‏ Adaptive Logic Modules Registers LUTs Adders Block RAMs Flexible, but slow Fast, but inflexible

7 7 2W/2R Pure-ALM Scales very poorly with memory depth

8 8 1W/nR Replication Multiple read ports Only one write port

9 9 mW/nR Banking Multiple write ports Fragmented data

10 10 mW/nR “Multipumping” Multiple read/write ports No fragmentation Divides clock speed Read/write ordering

11 11 Block RAMs: Simple Dual Port Read Write

12 12 Block RAMs: True Dual Port R / W

13 13 “Pure Multipumping” Read as banked memory (multiple reads)‏

14 14 “Pure Multipumping” Write as replicated memory (avoids fragmentation)‏

15 15 Methodology Generate design variations over space  Vary # of ports, depth, type of memories 1W/2R to 8W/16R 2 to 256 elements deep Pure-ALM, M9K, MLAB, Multipumped  Wrap in testbench for timing and correctness Target Quartus 9.0 to Stratix III  No synthesis optimizations for speed or area  Standard P&R effort (speed, avg. over 10 runs) Measure area as Total Equivalent Area  Expresses area in a single unit (ALMs)‏

16 16 Conventional Multi-Porting Performance

17 17 1W/2R Pure-ALM Area vs. Speed NiosII/f 290 MHz 500 ALMs Smaller Faster Too big and slow!

18 18 1W/2R Replicated vs. Pure-ALM

19 19 1W/2R “Pure Multipumping”

20 20 LVT-Based Multi-Ported Memories

21 21 LVT-Based Memory

22 22 LVT-Based Memory Begin with one block RAM

23 23 LVT-Based Memory Replicate for two read ports

24 24 LVT-Based Memory Bank for two write ports

25 25 LVT-Based Memory Select bank to read from

26 26 LVT-Based Memory Add bank lookup table

27 27 LVT-Based Memory

28 28 Live Value Table Operation

29 29 LVT Operation 2W/2R, 4-deep

30 30 W0W0 LVT Operation W0W0 W1W1 R0R0 R1R1 Live Value Table Write Addresses Read Addresses 0 1 2 3

31 31 W0W0 LVT Operation: Write W0W0 W1W1 R0R0 R1R1 42 @ 1 23 @ 3 0 Records which write port last updated a location 0 1 2 3 1

32 32 W0W0 LVT Operation: Read W0W0 W1W1 R0R0 R1R1 0 @ 1 @ 3 1 0 1 Steers read port to correct memory bank 0 1 2 3

33 33 LVT Implementation LVT remains practical because it is very narrow

34 34 LVT Operation Small Pure-ALM memorycontrolling larger block RAMs

35 35 Advantages of LVTs LVTs add a layer of indirection  Everything operates in parallel  Makes banked memory behave as consistent unit LVTs are narrow  Word width = log 2 (# of write ports) < 4 bits typically  Pure-ALM, but practical size and speed

36 36 LVT Performance

37 37 2W/4R Pure-ALM

38 38 2W/4R LVT-based vs. Pure-ALM 84% smaller 43% faster 412 MHz to 375 MHz

39 39 2W/4R Multipumping Must be careful about read/write ordering!

40 40 Multipumping Performance

41 41 2W/4R Multipumping

42 42 2W/4R Multipumping Pure Multipumping (279 MHz)‏

43 43 4W/8R Multipumping Worsens as # of ports increases

44 44 2W/4R Multipumping 28% smaller on average 193 MHz to 174 MHz 54% slower on average

45 45 Conclusions Pure multipumped memories are better for memories with few ports or low speed. LVT-based memories are faster and smaller than Pure-ALM memories. LVT-based memories are faster than pure multipumping, but at a cost in area.

46 46 Future Work Pure multipumping for LVT-based memories  Build banks with 2W/4R pure multipumping blocks  Possible further area improvement Relaxing the read/write order for multipumping  Allows multiplexing the write ports  Leaves designer to watch for WAR violations

47 47 Thank You


Download ppt "Efficient Multi-Ported Memories for FPGAs Eric LaForest Greg Steffan University of Toronto Computer Engineering Research Group February 22, 2010."

Similar presentations


Ads by Google