Published by Edgar Powell. Modified over 9 years ago.
1
FPGA-Based System Design: Chapter 7
Copyright 2004 Prentice Hall PTR

Topics
• Hardware/software co-design.
2
Why put CPUs on FPGAs?
• Shrink a board to a chip.
• What CPUs do best:
  – Irregular code.
  – Code that takes advantage of a highly optimized datapath.
• What FPGAs do best:
  – Data-oriented computations.
  – Computations with local control.
3
System design
• True concurrency increases system performance.
  – CPU and accelerator should run in parallel.
• CPU cost is a non-linear function of performance.
  – Accelerator will be smaller, faster, lower power.
4
Hardware/software partitioning
[Figure: the code below is divided between the CPU and the accelerator.]
    if (foo < 8) {
        for (i=0; i<N; i++)
            x[i] = y[i]*z[i];
    }
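One plausible reading of this partition, sketched in C: the CPU keeps the irregular control decision and hands the data-oriented loop to the accelerator. The function names `accel_multiply` and `partitioned_step` are hypothetical, and the accelerator is only a software stand-in here; in a real design that loop would be implemented in the FPGA fabric.

```c
#include <stddef.h>

/* Hypothetical stand-in for the accelerator. In a real system this loop
   would be built in FPGA logic rather than called as a C function. */
static void accel_multiply(int *x, const int *y, const int *z, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = y[i] * z[i];
}

/* CPU side: evaluates the control condition and, if it holds, hands the
   regular loop to the accelerator. Returns 1 if the accelerator was used,
   0 otherwise. */
int partitioned_step(int foo, int *x, const int *y, const int *z, size_t n)
{
    if (foo < 8) {
        accel_multiply(x, y, z, n);
        return 1;
    }
    return 0;
}
```

The split follows the earlier slide: irregular code stays on the CPU, data-oriented computation with local control moves to the FPGA.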
5
Methodology
• Measure the application.
• Identify what to put onto the accelerator.
• Build interfaces.
6
Concurrency
• Concurrent applications provide the most speedup.
[Figure: the CPU runs "if (a > b)..." while the accelerator runs "x[i] = y[i] * z[i]". No data dependencies.]
7
Concurrency analysis
• Data dependencies.
    z = x * y;
    w = z - v;
• Control dependencies.
    if (a < b) u = r + s;
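The two kinds of dependency can be made concrete with a small C sketch. The function names are illustrative, not from the slides; the statements inside them are the slide's own examples.

```c
/* Data dependency: the second statement reads z, which the first one
   writes, so the two cannot be reordered or run in parallel. */
int dependent_chain(int x, int y, int v)
{
    int z = x * y;   /* producer of z */
    int w = z - v;   /* consumer of z */
    return w;
}

/* Control dependency: whether the addition executes at all depends on
   the outcome of the comparison. u keeps its old value otherwise. */
int guarded_add(int a, int b, int r, int s, int u)
{
    if (a < b)
        u = r + s;
    return u;
}
```

Statements with neither kind of dependency between them are the candidates for running concurrently on the CPU and the accelerator.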
8
Partitioning
• Can divide the application into several processes that run concurrently.
• Process partitioning exposes opportunities for parallelism.
[Figure: code divided into Process 1, Process 2, and Process 3: if (i>b) …, for (i=0; i<N; i++) …, for (j=0; j<N; j++)...]
9
Partitioning programs
• Reasonable partitioning points:
  – If statements, etc.
  – Loop nests.
10
Multi-threaded systems
• Single thread: [figure]
• Multi-thread: [figure]
11
Performance analysis
• Single threaded:
  – Find longest possible execution path.
• Multi-threaded with no synchronization:
  – Find the longest of several execution paths.
• Multi-threaded with synchronization:
  – Find the worst-case synchronization conditions.
12
Multi-threaded performance analysis
• Synchronization causes the delay along one path to affect the delay along another.
[Figure: paths with delays t_a, t_b, t_c, t_d meeting at a synchronization point.]
Delay = max(t_a, t_b) + t_d
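The delay formula can be evaluated directly. This is a minimal sketch: the function name `sync_delay` is hypothetical, and it covers only the path that ends in t_d, exactly as the slide's formula does (t_c does not appear in it).

```c
/* Larger of two delays. */
static double max2(double a, double b)
{
    return a > b ? a : b;
}

/* Delay along the path ending in t_d: the synchronization point cannot be
   passed until both preceding segments (t_a and t_b) have finished, so the
   slower of the two determines when t_d can start. */
double sync_delay(double t_a, double t_b, double t_d)
{
    return max2(t_a, t_b) + t_d;
}
```

For example, with t_a = 3, t_b = 5 and t_d = 2 the delay is 7: the shorter segment is hidden behind the longer one.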
13
Control
• Need to signal between CPU and accelerator.
  – Data ready.
  – Complete.
• Implementations:
  – Shared memory.
  – Handshake.
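A shared-memory version of the two signals might look like the sketch below. Everything here is an assumption for illustration: the `mailbox` layout, the function names, and the doubling operation that stands in for the accelerator's real work. On actual hardware the flags would also need cache management and memory barriers, which this single-threaded model omits.

```c
#include <stdbool.h>

/* Hypothetical shared-memory mailbox; field names are illustrative. */
struct mailbox {
    volatile bool data_ready;  /* set by the CPU, cleared by the accelerator */
    volatile bool complete;    /* set by the accelerator, cleared by the CPU */
    volatile int  operand;
    volatile int  result;
};

/* CPU: place the data first, then raise "data ready". */
void cpu_post(struct mailbox *m, int operand)
{
    m->operand = operand;
    m->complete = false;
    m->data_ready = true;
}

/* Software stand-in for the accelerator: if data is ready, consume it,
   write the result, and raise "complete". Returns true if work was done. */
bool accel_service(struct mailbox *m)
{
    if (!m->data_ready)
        return false;
    m->result = m->operand * 2;   /* placeholder computation */
    m->data_ready = false;
    m->complete = true;
    return true;
}

/* CPU: collect the result once "complete" is raised. */
bool cpu_collect(struct mailbox *m, int *out)
{
    if (!m->complete)
        return false;
    *out = m->result;
    m->complete = false;
    return true;
}
```

Each flag has exactly one side that sets it and one that clears it, which keeps the protocol simple to reason about.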
14
Keeping the accelerator fed
• Must get data in, must get data out.
• Data transfer costs:
  – flush CPU cache;
  – device driver;
  – bus transactions.
15
Memory buffers
• Must keep accelerator fed.
  – Buffer size in accelerator depends on amount of data needed at a time, delays in obtaining needed values.
• Streaming generally requires small buffers:
  – x[i] = y[i] * z[i];
• Values with long lifetimes need more buffer space.
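The contrast between streaming and long-lived values can be sketched in C. The sliding-window sum is an illustrative example chosen here, not one from the slides, and `WINDOW` is an arbitrary value.

```c
#include <stddef.h>

#define WINDOW 4  /* illustrative lifetime: each input is reused for 4 outputs */

/* Streaming: each y[i] and z[i] is used exactly once, so nothing has to be
   kept after the product is written. */
void stream_multiply(int *x, const int *y, const int *z, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = y[i] * z[i];
}

/* Long lifetimes: out[i] is the sum of the last WINDOW inputs, so each
   input must be held in a circular buffer until it leaves the window. */
void window_sum(int *out, const int *in, size_t n)
{
    int buf[WINDOW] = {0};
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = i % WINDOW;
        sum -= buf[slot];    /* drop the value leaving the window */
        buf[slot] = in[i];   /* keep the new value for WINDOW iterations */
        sum += in[i];
        out[i] = sum;
    }
}
```

The buffer in `window_sum` grows with the lifetime of a value (`WINDOW`), whereas `stream_multiply` needs no history at all.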
16
Allocation
• How do we decide what goes on the CPU, what goes on the FPGA?
• Allocation puts functions on the CPU or FPGA.
17
Speedup
• Speedup for one iteration:
  – t_SW - t_HW - t_I - t_O
• May be able to set up many iterations at once:
  – N*(t_SW - t_HW) - t_I - t_O
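A small C sketch of this calculation, taking speedup to mean the time saved by moving a function to the accelerator: software time minus accelerator time minus the transfer overhead. The function name `time_saved` is hypothetical, and reading t_I and t_O as input and output transfer times is an assumption; the slide does not define them.

```c
/* Time saved by running n iterations on the accelerator instead of the CPU,
   with one input transfer and one output transfer for the whole batch.
     t_sw  : time per iteration in software
     t_hw  : time per iteration on the accelerator
     t_in  : time to transfer inputs (assumed meaning of t_I)
     t_out : time to transfer outputs (assumed meaning of t_O)
   A negative result means the transfers cost more than the accelerator saves. */
double time_saved(int n, double t_sw, double t_hw, double t_in, double t_out)
{
    return n * (t_sw - t_hw) - t_in - t_out;
}
```

With t_sw = 5, t_hw = 2, t_in = 3 and t_out = 1, a single iteration loses one time unit, while batching 100 iterations saves 296: the fixed transfer cost is amortized over the batch.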
18
Drivers
• Need interface between CPU and accelerator:
  – transfer data values;
  – start, stop computation.
• If computation time is very predictable, a simpler communication scheme may be possible.
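The driver's two jobs, moving data and starting or stopping the computation, could be sketched as a thin layer over a register block. The register layout, the bit assignment, and all names below are hypothetical; a real accelerator defines its own memory map.

```c
#include <stdint.h>

/* Hypothetical register block; layout and bit assignments are illustrative. */
struct accel_regs {
    volatile uint32_t ctrl;      /* bit 0: run */
    volatile uint32_t data_in;   /* value passed to the accelerator */
    volatile uint32_t data_out;  /* value produced by the accelerator */
};

#define ACCEL_CTRL_RUN 0x1u

/* Transfer data values. */
void accel_write(struct accel_regs *r, uint32_t v) { r->data_in = v; }
uint32_t accel_read(const struct accel_regs *r)    { return r->data_out; }

/* Start and stop the computation. */
void accel_start(struct accel_regs *r) { r->ctrl |= ACCEL_CTRL_RUN; }
void accel_stop(struct accel_regs *r)  { r->ctrl &= ~ACCEL_CTRL_RUN; }

/* Report whether the run bit is currently set. */
int accel_running(const struct accel_regs *r)
{
    return (r->ctrl & ACCEL_CTRL_RUN) != 0;
}
```

On a real system the `struct accel_regs *` would point at the accelerator's mapped address range. For testing, an ordinary struct in memory can stand in for it.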
19
Debugging
• Hard to test a CPU/accelerator system:
  – Hard to control and observe the accelerator without the CPU.
  – Software on CPU may have bugs.
• Build separate test benches for CPU code, accelerator.
• Test integrated system after components have been tested.
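One way to build a separate test bench for the accelerator is to compare its output against a software reference model. This is a sketch under assumptions: all names are hypothetical, and `buggy_multiply` is a deliberately wrong model included only to show a failure being caught.

```c
#include <stddef.h>

/* Common signature for anything that claims to compute x[i] = y[i]*z[i]. */
typedef void (*multiply_fn)(int *x, const int *y, const int *z, size_t n);

/* Software reference ("golden") model. */
void reference_multiply(int *x, const int *y, const int *z, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = y[i] * z[i];
}

/* Deliberately wrong model: adds instead of multiplying. */
void buggy_multiply(int *x, const int *y, const int *z, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = y[i] + z[i];
}

/* Run the device under test and the reference on the same inputs and
   count the positions where they disagree. The caller supplies two scratch
   arrays of length n for the outputs. */
size_t count_mismatches(multiply_fn dut, const int *y, const int *z, size_t n,
                        int *dut_out, int *ref_out)
{
    size_t bad = 0;
    dut(dut_out, y, z, n);
    reference_multiply(ref_out, y, z, n);
    for (size_t i = 0; i < n; i++)
        if (dut_out[i] != ref_out[i])
            bad++;
    return bad;
}
```

The same checker can later be pointed at a wrapper around the real accelerator, so the component-level vectors are reused when the integrated system is tested.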