Multiprocessor Systems Using FPGAs
Presented By: Manuel Saldaña
Connections 2006
The University of Toronto ECE Graduate Symposium
Toronto, Ontario, Canada
June 9th, 2006
Introduction
–Multiprocessors in FPGAs can accelerate many computing tasks by up to 2 or 3 orders of magnitude
–Massive parallelism can be achieved using multiple FPGAs
–The requirements are:
  - a scalable network
  - an efficient programming model
  - a flexible design flow
–TMD is an FPGA-based multiprocessor system tailored to perform Molecular Dynamics simulations
Large-Scale Multiprocessor Systems
Class 1 Machines
–Supercomputers or clusters of workstations
–~ interconnected CPUs
Class 2 Machines
–Hybrid network of CPU and FPGA hardware
–FPGA acts as external co-processor to CPU
–Programming model still evolving
Class 3 Machines
–FPGA-based multiprocessor system
–Recent area of academic and industrial focus
(Diagram: Interconnection Network)
TMD Scalable Network
Tier 1: Intra-FPGA Communication
–Point-to-Point FIFOs are used as communication channels
–Application-specific network topologies can be defined
Tier 2: Inter-FPGA Communication
–High-speed serial links used for inter-FPGA communication
–Fully-interconnected network topology using 2N*(N-1) pairs of traces
Tier 3: Inter-Cluster Communication
–Commercially-available switches interconnect cluster PCBs
–Currently, we intend to use optical links
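As a worked example of the Tier 2 trace count, the sketch below evaluates the 2N*(N-1) expression quoted on the slide for a few sizes. Reading N as the number of FPGAs on one board is an assumption, and the function name `trace_pairs` is hypothetical.

```python
def trace_pairs(n_fpgas: int) -> int:
    """Evaluate 2N*(N-1), the trace-pair count quoted for a fully
    interconnected topology. N is assumed to be the number of FPGAs."""
    if n_fpgas < 1:
        raise ValueError("need at least one FPGA")
    return 2 * n_fpgas * (n_fpgas - 1)


if __name__ == "__main__":
    # Print the count for a few illustrative values of N.
    for n in (2, 4, 8):
        print(n, trace_pairs(n))
```

The expression grows quadratically: going from N = 4 to N = 8 raises the count from 24 to 112.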
TMD Programming Model
TMD-MPI
–Parallel applications are defined as a collection of computing tasks
–Tasks communicate by passing messages using MPI
–TMD-MPI is a subset implementation of MPI
(Diagram labels: Application, Hardware, Standard API, Hardware-dependent software, Communication Protocol, TMD-MPI)
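To make the message-passing idea concrete, here is a toy sketch of tasks exchanging messages over point-to-point FIFO channels. It is not the TMD-MPI API: the class `FifoNetwork`, its `send` and `recv` methods, and the rank numbering are all hypothetical names chosen for illustration, and everything runs sequentially in one process.

```python
from collections import deque


class FifoNetwork:
    """Toy model of tasks exchanging messages over point-to-point FIFOs.

    Hypothetical illustration only: this is not the TMD-MPI API.
    """

    def __init__(self, links):
        # One FIFO per directed (source, destination) link.
        self.fifos = {link: deque() for link in links}

    def send(self, src, dst, payload):
        # Sending is only possible on a link that exists in the topology.
        if (src, dst) not in self.fifos:
            raise KeyError(f"no channel from {src} to {dst}")
        self.fifos[(src, dst)].append(payload)

    def recv(self, dst, src):
        # Messages on a link are delivered in the order they were sent.
        fifo = self.fifos[(src, dst)]
        if not fifo:
            raise RuntimeError("no message pending")
        return fifo.popleft()


def run_demo():
    # Rank 0 hands a chunk of work to ranks 1 and 2; each returns a partial sum.
    net = FifoNetwork([(0, 1), (0, 2), (1, 0), (2, 0)])
    net.send(0, 1, [1, 2, 3])
    net.send(0, 2, [4, 5, 6])
    for rank in (1, 2):
        chunk = net.recv(rank, 0)
        net.send(rank, 0, sum(chunk))
    # Rank 0 combines the partial results.
    return net.recv(0, 1) + net.recv(0, 2)
```

Calling `run_demo()` returns 21, the sum of both chunks. The task code only names source and destination ranks, so the same logic could in principle sit on top of a different transport.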
TMD Application Design Flow
Step 1: Application Prototyping
–Software prototype of application developed
–Profiling identifies compute-intensive routines
Step 2: Application Refinement
–Partitioning into tasks communicating using MPI
–Communication patterns analyzed to determine network topology
Step 3: TMD Prototyping
–Tasks are ported to soft-processors on TMD
–Software refined to utilize TMD-MPI library
–On-chip communication network verified
Step 4: TMD Optimization
–Intensive tasks replaced with hardware engines
–MPE handles communication for hardware engines
(Diagram labels: Application Prototype; Process A, Process B, Process C; A, B, C; B)
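One way Step 2's communication-pattern analysis could look is sketched below: given a log of which task sent to which, it lists the directed channels an application-specific topology would need. The log format, the function names, and the `min_count` threshold are assumptions for illustration; the talk does not describe the actual tooling.

```python
from collections import Counter


def links_from_trace(messages):
    """Count messages per directed (source, destination) pair.

    `messages` is an iterable of (src, dst) task names, e.g. collected by
    logging every send in the software prototype (an assumed workflow).
    """
    return Counter(messages)


def required_channels(messages, min_count=1):
    """Return the sorted directed links used at least `min_count` times."""
    counts = links_from_trace(messages)
    return sorted(link for link, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    # Hypothetical trace for three tasks named as in the slide diagram.
    trace = [("A", "B"), ("B", "C"), ("A", "B"), ("C", "A")]
    print(required_channels(trace))
```

With the sample trace, only the links A to B, B to C, and C to A appear, so three directed channels would cover the observed traffic.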
The First TMD Prototype...
Acknowledgements
SOCRN
TMD Group (current and past members): David Chui, Christopher Comis, Sam Lee, Dr. Paul Chow, Andrew House, Daniel Nunes, Manuel Saldaña, Emanuel Ramalho, Dr. Régis Pomès, Christopher Madill, Arun Patel, Lesley Shannon