1
The Memory/Logic Interface in FPGA's with Large Embedded Memory Arrays
Steven J. E. Wilton, Member, IEEE, Jonathan Rose, Member, IEEE, and Zvonko G. Vranesic, Senior Member, IEEE
Laboratory of Reliable Computing, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
2
Reference S. J. E. Wilton, “Architectures and algorithms for field-programmable gate arrays with embedded memory,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, Ont., Canada, 1997.
3
Outline Introduction Baseline architecture Experimental methodology and results Enhanced architecture and its improvements
4
Introduction In the past, FPGAs were primarily used to implement small logic subcircuits. As the capacities of FPGAs grow, they will be used to implement much larger circuits than ever before. To address the storage requirements of large systems, FPGAs with large embedded memory arrays are now being developed by many vendors.
5
Introduction One of the challenges in embedding memory arrays into an FPGA is providing enough interconnect between the memory arrays and the logic resources.
6
Baseline Architecture
7
Memory/Logic Interconnect Block
8
Benchmark Circuit Generation Benchmark circuits must be generated for this architecture, because typical circuits each contain only a few memories and gathering hundreds of such circuits is not feasible. The solution is to study the types of memory configurations found in real systems and to develop a stochastic memory configuration generator, using circuit analysis to ensure the generated circuits are realistic.
9
Circuit Analysis
Memory configurations
Logic-memory clustering
Interconnect patterns: point-to-point patterns, shared-connection patterns, and point-to-point patterns with no shuffling
10
Memory Configurations 171 circuits containing a total of 268 user memories were gathered from recent conference proceedings, recent journal articles, local designers, and a customer study conducted by Altera.
11
Memory Configurations
12
Logic Memory Clustering
13
Interconnect Patterns
14
Stochastic Circuit Generation A stochastic circuit generator was developed using the statistics gathered during circuit analysis. The steps for generating a benchmark circuit are:
Choose a logical memory configuration
Divide the logical memories into clusters
Choose an interconnect pattern for each cluster
Choose the number of data-in and data-out subcircuits for each cluster
Generate the logic subcircuits and connect them to the memory arrays
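The slides do not include the generator's code; the following Python sketch only illustrates the five steps above, and the probability tables and parameter names (CONFIG_DIST, PATTERNS, max_clusters) are invented for illustration rather than taken from the paper.

```python
import random

# Assumed, illustrative probability tables; the real generator draws its
# distributions from the statistics gathered during circuit analysis.
CONFIG_DIST = [((256, 8), 0.40), ((1024, 16), 0.35), ((4096, 32), 0.25)]
PATTERNS = ["point-to-point", "shared-connection", "point-to-point-no-shuffle"]

def pick(weighted):
    """Draw one value from a list of (value, probability) pairs."""
    r, acc = random.random(), 0.0
    for value, p in weighted:
        acc += p
        if r <= acc:
            return value
    return weighted[-1][0]

def generate_benchmark(num_memories=4, max_clusters=2):
    # 1. Choose a logical memory configuration (depth x width) per memory.
    memories = [pick(CONFIG_DIST) for _ in range(num_memories)]
    # 2. Divide the logical memories into clusters.
    clusters = [[] for _ in range(random.randint(1, max_clusters))]
    for m in memories:
        random.choice(clusters).append(m)
    clusters = [c for c in clusters if c]
    circuit = []
    for cluster in clusters:
        # 3. Choose an interconnect pattern for the cluster.
        pattern = random.choice(PATTERNS)
        # 4. Choose the number of data-in and data-out subcircuits.
        n_in, n_out = random.randint(1, 2), random.randint(1, 2)
        # 5. Generate logic subcircuits and connect them to the memories.
        circuit.append({"memories": cluster, "pattern": pattern,
                        "data_in_subckts": n_in, "data_out_subckts": n_out})
    return circuit

print(generate_benchmark())
```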
15
Implementation Tool Each generated benchmark circuit is "implemented" in each FPGA:
Logical-to-physical mapping
Placement: memory and logic blocks are placed simultaneously
Routing: nets connecting to memories initially have higher priority; between iterations the nets are reordered; this is repeated up to 10 times before increasing W
Determine the minimum value of the channel width W
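The router itself is not shown in the slides; the Python sketch below only illustrates the outer search for the minimum channel width W under the iteration scheme described above (the route_all_nets helper and the start_w/max_w bounds are assumptions).

```python
def find_minimum_channel_width(nets, route_all_nets, start_w=4, max_w=64,
                               max_iters=10):
    """Return the smallest channel width W for which the circuit routes.

    route_all_nets(nets, w) is an assumed helper: it attempts to route every
    net with w tracks per channel and returns True on success.
    """
    for w in range(start_w, max_w + 1):
        # Nets that connect to memory blocks are given higher priority first.
        ordered = sorted(nets, key=lambda n: not n.get("to_memory", False))
        for _ in range(max_iters):
            if route_all_nets(ordered, w):
                return w
            # Between iterations the nets are reordered (failed nets first).
            ordered = sorted(ordered, key=lambda n: not n.get("failed", False))
        # Every iteration failed at this width: increase W and try again.
    return None  # the circuit did not route within max_w tracks
```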
16
Memory/Logic Flexibility Result
18
Area Result The area of the FPGA is the sum of the areas of the logic blocks, the memory blocks, and the routing resources (programmable switches, programming bits, and metal routing segments).
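Written out as a formula (the symbols below are my own shorthand, not notation from the paper):

```latex
\[
A_{\text{FPGA}} = N_{\text{logic}} A_{\text{logic}} + N_{\text{mem}} A_{\text{mem}}
  + \underbrace{A_{\text{switch}} + A_{\text{bits}} + A_{\text{metal}}}_{\text{routing resources}}
\]
```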
19
Area Result
20
Delay Result A delay model is used to measure the memory read time of all memories in the circuit: CACTI is used to estimate the array access time, and the Elmore delay model covers the address-in and data-out paths.
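For reference, the standard Elmore delay of a node k in an RC tree (the slides do not spell the formula out) is:

```latex
\[
\tau_k = \sum_i R_{ik}\, C_i ,
\]
```

where \(C_i\) is the capacitance at node i and \(R_{ik}\) is the resistance shared by the paths from the source to node i and to node k.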
21
Delay Result
22
Issues Some nets connect more than one memory block to one or more logic blocks, for example when small memory arrays are combined to implement a large one, or when the data-in pins of several user memories are driven by a common data bus. Such nets appear often, but unfortunately they are hard to route, especially for larger architectures. Should a higher value of Fm be used for larger architectures, or is there a better option?
23
Further Investigation
24
Enhanced Architecture The issues above motivated a closer study of memory-to-memory connections. An enhanced architecture adds extra switches between memory arrays to support these nets. Result: the extra switches take up negligible area, and both speed and routability improve.
25
Enhanced Architecture
26
Baseline Architecture
27
Enhanced Architecture
28
Evaluation of Enhanced Architecture The maze routing algorithm must be restricted so that it uses the memory-to-memory switches only to implement memory-to-memory connections. If the maze router is not modified…
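A minimal sketch of such a restriction in the router's expansion step (the dict keys "kind" and "is_memory_net" are assumptions for illustration, not the paper's data structures):

```python
def can_expand(edge, net):
    """Decide whether the maze router may expand across a routing switch."""
    if edge["kind"] == "memory_to_memory":
        # Dedicated memory-to-memory switches are reserved for nets whose
        # terminals are all memory blocks.
        return net["is_memory_net"]
    return True  # ordinary routing switches stay available to every net


# Example: a logic-to-memory net may not use a memory-to-memory switch.
print(can_expand({"kind": "memory_to_memory"}, {"is_memory_net": False}))  # False
print(can_expand({"kind": "general"}, {"is_memory_net": True}))            # True
```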
29
Routing Result Using Standard Maze
31
Modified Maze Even though some tracks are wasted when a circuit contains few or no memory-to-memory connections, the modified maze router alleviates the problem above.
32
Area Result
34
Delay Result
35
Conclusion Even with this relatively unaggressive use of the memory-to-memory switches, area improves somewhat and speed improves significantly: the enhanced architecture reduces the required channel width by 0.5 to 1 tracks and improves speed by 25%. The development of algorithms that use these tracks more aggressively is left as future work.