Configuration and Programming of Heterogeneous Multiprocessors on a Multi-FPGA System Using TMD-MPI
Manuel Saldaña, Daniel Nunes, Emanuel Ramalho, and Paul Chow
University of Toronto, Department of Electrical and Computer Engineering
3rd International Conference on Reconfigurable Computing and FPGAs (ReConFig06)
San Luis Potosi, Mexico, September 2006
Agenda
- Motivation
- Background: TMD-MPI, classes of HPC machines, design flow
- New Developments
- Example Application: heterogeneity test, scalability test
- Conclusions
Motivation: How Do We Program This?
A 64-MicroBlaze MPSoC with ring and 2D-mesh topologies on a Xilinx XC4VLX device, and not even the largest device available.
Motivation: How Do We Program This?
A 512-MicroBlaze multiprocessor system connected through a network.
Background: Classes of HPC Machines
- Class 1: supercomputers or clusters of workstations connected by an interconnection network
- Class 2: hybrid network of CPU and FPGA hardware, where the FPGA acts as an external co-processor to the CPU
- Class 3: FPGA-based multiprocessor; a recent area of academic and industrial focus
Background: MPSoC and MPI
- MPSoC (Class 3) has many similarities to typical multiprocessor computers (Class 1), but also many special requirements: similar concepts, different implementations
- MPI for MPSoC is desirable (TIMA labs, OpenFPGA, Berkeley BEE2, U. of Queensland, U. Rey Juan Carlos, UofT TMD, ...)
- MPI is a broad standard designed for big machines; full MPI implementations are too big for embedded systems
Background: TMD-MPI
Diagram: the same MPI application code runs on the MPSoC (using TMD-MPI) and on a Linux cluster (using MPICH); in both cases, processors communicate over a network.
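To make the "same code" point concrete, here is a minimal sketch of the kind of program intended to run unchanged on both targets. It is written against the standard MPI C API as provided by MPICH and restricted to point-to-point calls; the payload and message pattern are illustrative, not code from the presentation.

```c
/* Minimal sketch: an ordinary MPI-C program of the kind TMD-MPI is
 * intended to accept unchanged on the MPSoC side. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        value = 42;                               /* arbitrary payload */
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d of %d received %d\n", rank, size, value);
    }

    MPI_Finalize();
    return 0;
}
```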
Background: TMD-MPI
Multiple chips are used to obtain massive resources; processors communicate through on-chip and inter-chip networks, and TMD-MPI hides this complexity from the application.
Background: TMD-MPI Implementation Layers
- Application
- MPI Application Interface (e.g., MPI_Barrier)
- Point-to-Point MPI (MPI_Send / MPI_Recv)
- TMD-MPI Communication Functions (csend / send)
- Hardware Access Functions (fsl_cput / fsl_put macros; put/get assembly instructions)
- Hardware
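The following sketch illustrates how these layers stack on a MicroBlaze node. The packet header encoding, the my_MPI_Send name, and the stubbed FSL access functions are hypothetical stand-ins so the sketch compiles on its own; the real TMD-MPI internals are not shown in the slides.

```c
/* Illustrative layering sketch:
 * MPI_Send() -> csend() -> fsl_cput()/fsl_put() -> put/cput instructions.
 * The FSL calls are stubbed (they only print); the header format is a
 * made-up placeholder, not the actual TMD-MPI packet format. */
#include <stdint.h>
#include <stdio.h>

/* Hardware access layer (stubbed): on real hardware these are the Xilinx
 * FSL macros, which expand to MicroBlaze put/cput instructions. */
static void fsl_put(uint32_t word)  { printf("FSL data    word: 0x%08x\n", (unsigned)word); }
static void fsl_cput(uint32_t word) { printf("FSL control word: 0x%08x\n", (unsigned)word); }

/* TMD-MPI communication layer: one control (header) word, then payload. */
static void csend(int dest, const uint32_t *buf, int nwords)
{
    fsl_cput((uint32_t)((dest << 16) | nwords));  /* hypothetical header */
    for (int i = 0; i < nwords; i++)
        fsl_put(buf[i]);                          /* payload words */
}

/* MPI application interface layer (simplified blocking send). */
static int my_MPI_Send(const void *buf, int count, int dest)
{
    csend(dest, (const uint32_t *)buf, count);
    return 0;
}

int main(void)
{
    uint32_t data[4] = {1, 2, 3, 4};
    my_MPI_Send(data, 4, 1);                      /* send 4 words to rank 1 */
    return 0;
}
```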
Background: MPI Functions Implemented in TMD-MPI
- Point-to-Point: MPI_Send, MPI_Recv
- Miscellaneous: MPI_Init, MPI_Finalize, MPI_Comm_Rank, MPI_Comm_Size, MPI_Wtime
- Collective Operations: MPI_Barrier, MPI_Bcast, MPI_Gather, MPI_Reduce
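A short example restricted to this subset is sketched below. It uses the standard MPI C API (as in MPICH), which TMD-MPI is meant to mirror; the computation (summing 1..n across ranks) is an arbitrary illustration.

```c
/* Example restricted to the implemented subset. Build with: mpicc subset_demo.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, n = 0, partial = 0, total = 0;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    t0 = MPI_Wtime();

    if (rank == 0) n = 100;                          /* root picks the problem size */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);    /* everyone learns n */

    for (int i = rank + 1; i <= n; i += size)        /* each rank sums its share */
        partial += i;
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    t1 = MPI_Wtime();
    if (rank == 0)
        printf("sum 1..%d = %d, elapsed %.6f s on %d ranks\n", n, total, t1 - t0, size);

    MPI_Finalize();
    return 0;
}
```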
Background: Design Flow
A flexible hardware-software co-design flow. Previous work: Patel et al. [1] (FCCM 2006) and Saldaña et al. [2] (FPL 2006); the present work (ReConFig06) builds on that flow.
New Developments
- TMD-MPI for MicroBlaze
- TMD-MPI for PowerPC405
- TMD-MPE for hardware engines
New Developments: TMD-MPE and TMD-MPI light
Diagram: a hardware engine gains message-passing capability through the TMD-MPE.
New Developments
TMD-MPE uses the rendezvous message-passing protocol.
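As a rough illustration of the rendezvous idea (announce the message, wait for the receiver to be ready, then transfer the data), here is a software sketch. The control-word encoding and the channel helpers are hypothetical stand-ins, stubbed so the sketch compiles; TMD-MPE implements this handshake in hardware.

```c
/* Software sketch of a rendezvous send. */
#include <stdint.h>
#include <stdio.h>

enum { REQ_TO_SEND = 1, CLEAR_TO_SEND = 2 };      /* hypothetical control tags */

static void     send_ctrl_word(uint32_t w) { printf("ctrl -> 0x%08x\n", (unsigned)w); }
static uint32_t wait_ctrl_word(void)       { return CLEAR_TO_SEND; /* stub */ }
static void     send_data_word(uint32_t w) { printf("data -> 0x%08x\n", (unsigned)w); }

static void rendezvous_send(int dest, const uint32_t *buf, int nwords)
{
    /* 1. Envelope: announce destination and message length. */
    send_ctrl_word((uint32_t)((REQ_TO_SEND << 24) | (dest << 16) | nwords));

    /* 2. Synchronize: block until the receiver has a matching receive posted. */
    while (wait_ctrl_word() != CLEAR_TO_SEND)
        ;

    /* 3. Data transfer. */
    for (int i = 0; i < nwords; i++)
        send_data_word(buf[i]);
}

int main(void)
{
    uint32_t msg[3] = {10, 20, 30};
    rendezvous_send(1, msg, 3);
    return 0;
}
```

The benefit is that the receiver never has to buffer a payload it did not ask for; the cost is the synchronization overhead examined in the backup slides.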
New Developments
TMD-MPE includes:
- message queues to keep track of unexpected messages
- packetizing/depacketizing logic to handle large messages (sketched below)
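A software sketch of the packetizing step follows. The packet header layout and the MAX_PAYLOAD value are assumptions made for illustration only; the real TMD-MPE performs this in hardware logic.

```c
/* Sketch: split a long message into packets with a bounded payload,
 * each carrying its own header. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PAYLOAD 8                     /* words per packet (illustrative) */

struct packet {
    uint16_t dest;                        /* destination rank */
    uint16_t nwords;                      /* payload words in this packet */
    uint32_t payload[MAX_PAYLOAD];
};

static void packetize(int dest, const uint32_t *msg, int nwords)
{
    for (int off = 0; off < nwords; off += MAX_PAYLOAD) {
        struct packet p;
        p.dest   = (uint16_t)dest;
        p.nwords = (uint16_t)((nwords - off < MAX_PAYLOAD) ? nwords - off
                                                           : MAX_PAYLOAD);
        memcpy(p.payload, msg + off, p.nwords * sizeof(uint32_t));
        /* A hardware engine would stream the packet into the network here. */
        printf("packet to %u: %u words starting at offset %d\n",
               p.dest, p.nwords, off);
    }
}

int main(void)
{
    uint32_t msg[20];
    for (int i = 0; i < 20; i++) msg[i] = (uint32_t)i;
    packetize(1, msg, 20);                /* 20 words -> 3 packets */
    return 0;
}
```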
Heterogeneity Test: Heat Equation Application / Jacobi Iterations
Observe the change of the temperature distribution over time. Diagram: the application is distributed over processing elements that use TMD-MPI (processors) and TMD-MPE (a Jacobi hardware engine).
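For reference, here is a sketch of a message-passing Jacobi solver of this kind, using a 1-D row decomposition with halo exchange and only the MPI subset listed earlier. The grid size, iteration count, and boundary values are illustrative choices sized for a workstation run, not parameters taken from the presentation.

```c
/* Jacobi iteration for the 2-D heat equation, 1-D row decomposition. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define N      64          /* interior rows per process (illustrative) */
#define M      64          /* columns */
#define ITERS  500

static double u[N + 2][M], unew[N + 2][M];   /* two extra halo rows */

static void exchange_halos(int rank, int size)
{
    /* Even ranks send first, odd ranks receive first, so the exchange also
     * works over a rendezvous-style blocking transport without deadlock. */
    int up = rank - 1, down = rank + 1;
    if (rank % 2 == 0) {
        if (down < size) MPI_Send(u[N], M, MPI_DOUBLE, down, 0, MPI_COMM_WORLD);
        if (up >= 0)     MPI_Send(u[1], M, MPI_DOUBLE, up,   0, MPI_COMM_WORLD);
        if (down < size) MPI_Recv(u[N + 1], M, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (up >= 0)     MPI_Recv(u[0],     M, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        if (up >= 0)     MPI_Recv(u[0],     M, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (down < size) MPI_Recv(u[N + 1], M, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (up >= 0)     MPI_Send(u[1], M, MPI_DOUBLE, up,   0, MPI_COMM_WORLD);
        if (down < size) MPI_Send(u[N], M, MPI_DOUBLE, down, 0, MPI_COMM_WORLD);
    }
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    memset(u, 0, sizeof u);
    if (rank == 0)                            /* hot boundary on the top edge */
        for (int j = 0; j < M; j++) u[0][j] = 100.0;

    for (int it = 0; it < ITERS; it++) {
        exchange_halos(rank, size);
        for (int i = 1; i <= N; i++)          /* 4-point Jacobi update */
            for (int j = 1; j < M - 1; j++)
                unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                     u[i][j - 1] + u[i][j + 1]);
        memcpy(&u[1][0], &unew[1][0], N * M * sizeof(double));
    }

    if (rank == 0) printf("done: %d Jacobi iterations on %d ranks\n", ITERS, size);
    MPI_Finalize();
    return 0;
}
```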
Heterogeneity Test: MPSoC Heterogeneous Configurations
(9 processing elements, single FPGA)
Heterogeneity Test: Execution Time
Chart: execution time for configurations of PPC405s, Jacobi hardware engines, and MicroBlazes.
Scalability Test: Heat Equation Application
5 FPGAs (XC2VP100), each with 7 MicroBlazes + 2 PPC405s, for a total of 45 processing elements (35 MicroBlazes + 10 PPC405s).
Scalability Test
Chart: fixed-size speedup up to 45 processors.
UofT TMD Prototype
Conclusions
- TMD-MPI and TMD-MPE enable the parallel programming of heterogeneous MPSoCs across multiple FPGAs, including hardware engines
- TMD-MPI hides the complexity of using heterogeneous links
- The heat equation application code was executed on a Linux cluster and on our multi-FPGA system with minimal changes
- TMD-MPI can be adapted to a particular architecture
- The TMD prototype is a good platform for further research on MPSoCs
References
[1] Arun Patel, Christopher Madill, Manuel Saldaña, Christopher Comis, Régis Pomès, and Paul Chow. A Scalable FPGA-based Multiprocessor. In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'06), April 2006.
[2] Manuel Saldaña and Paul Chow. TMD-MPI: An MPI Implementation for Multiple Processors Across Multiple FPGAs. In IEEE International Conference on Field-Programmable Logic and Applications (FPL 2006), August 2006.
Thank you! (¡Gracias!)
Rendezvous Overhead
Chart: rendezvous synchronization overhead.
Testing the Functionality
TMD-MPIbench round-trip tests cover both on-chip and off-chip communication, with data placed in internal RAM (BRAM) and in external RAM (DDR).
TMD-MPI Implementation
Diagram: TMD-MPI communication protocols.
Communication Tests: TMD-MPIbench.c
- round trips (ping-pong; a sketch follows this list)
- bisection bandwidth
- round trips with congestion (worst-case traffic scenario)
- all-node broadcasts
- synchronization performance (barriers/sec)
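The following ping-pong sketch shows the shape of such a round-trip latency test. It is not the actual TMD-MPIbench source; the message size and repetition count are illustrative.

```c
/* Ping-pong latency sketch. Run with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

#define REPS  1000
#define WORDS 1            /* one word: latency-dominated message */

int main(int argc, char **argv)
{
    int rank, buf[WORDS] = {0};
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, WORDS, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, WORDS, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, WORDS, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, WORDS, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * REPS) * 1e6);

    MPI_Finalize();
    return 0;
}
```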
Communication Tests: Latency
- Testbed, internal link (@ 40 MHz): 17 µs
- Testbed, external link: 22 µs
- P3-NOW, 100 Mb/s Ethernet: 75 µs
- P4-Cluster, 1000 Mb/s Gigabit Ethernet: 92 µs
Communication Tests
Chart: MicroBlaze throughput limit with external RAM.
Communication Tests
Chart: MicroBlaze throughput limits with internal and external RAM, and the effect of memory access time.
Communication Tests
Chart: measured bandwidth @ 40 MHz and startup overhead, compared against the P4-Cluster and P3-NOW.
Many variables are involved…
Background: TMD-MPI
TMD-MPI provides a parallel programming model for MPSoCs in FPGAs with the following features:
- Portability: the application is unaffected by changes in the hardware
- Flexibility: to move from generic to application-specific implementations
- Scalability: for large-scale applications
- Reusability: no need to learn a new API for similar applications
Testing the Functionality
Diagram: hardware testbed.
New Developments: TMD-MPE
Diagram: TMD-MPE use and the network.
Background: TMD-MPI
TMD-MPI:
- is a lightweight subset of the MPI standard
- is tailored to a particular application
- does not require an operating system
- has a small memory footprint (~8.7 KB)
- uses a simple protocol
New Developments: TMD-MPE and TMD-MPI light