FPGA Cluster: MVM Reconstruction

Scalable multi-FPGA architecture
- The matrix is divided among the FPGAs; the incoming vector is broadcast to all FPGAs; the partial results are accumulated into the final result.
- MVM is easily parallelized; memory bandwidth scales with the number of FPGAs; each FPGA node runs identical firmware.

UDP external interface
- FPGA-based UDP for the lowest latency and jitter.
- Easier integration with the rest of the system via standard 10GbE hardware, e.g. a switch.
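The data flow above can be sketched in software. This is a minimal illustration, not the firmware itself: it assumes the matrix is split row-wise across nodes, each node multiplies its block by the broadcast vector, and the interface node merges the per-node output slices in node order. The function names (`split_rows`, `node_mvm`, `cluster_mvm`) are hypothetical.

```python
def split_rows(matrix, n_nodes):
    """Divide the matrix into contiguous row blocks, one per FPGA node."""
    rows_per = (len(matrix) + n_nodes - 1) // n_nodes  # ceiling division
    return [matrix[i:i + rows_per] for i in range(0, len(matrix), rows_per)]

def node_mvm(block, vector):
    """Work done on one node: its row block times the broadcast vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in block]

def cluster_mvm(matrix, vector, n_nodes):
    """Broadcast the vector, compute partial MVMs, merge the slices."""
    partials = [node_mvm(block, vector) for block in split_rows(matrix, n_nodes)]
    return [y for partial in partials for y in partial]
```

Because each node sees the full input vector, adding nodes shrinks each node's row block without changing the per-node interface, which is what makes the architecture scale by simply adding boards.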
FPGA Cluster: Example FPGA Boards

FPGA cost:

| Board     | DDR  | Speed (MHz) | Peak BW (GB/s) | GFLOPS (at 80%) | Cost (EUR) | Source  |
|-----------|------|-------------|----------------|-----------------|------------|---------|
| KCU105    | DDR4 | 1200        | 19.2           | 3.84            | 3252.43    | Digikey |
| XpressKUS | DDR3 | 933         | 14.928         | 2.9856          | 4990       | PLDA    |

FPGA cost per instrument:

| Instrument | Subaps  | DM channels | Matrix size | Freq (Hz) | BW (GB/s) | GFLOPS      | KCU105 boards | KCU105 cost (kEUR) | XpressKUS boards | XpressKUS cost (kEUR) |
|------------|---------|-------------|-------------|-----------|-----------|-------------|---------------|--------------------|------------------|-----------------------|
| Harmoni    | 21904   | 4326        | 189513408   | 800       | 606.44291 | 151.6107264 | 40            | 130.0972           | 51               | 254.49                |
| MICADO     | 32856   | 10000       | 657120000   | 500       | 1314.24   | 328.56      | 86            | 279.70898          | 111              | 553.89                |
| MOSAIC     | 36479.6 | —           | 2397147475  | 250       | 2397.1475 | 599.2868688 | 157           | 510.63151          | 201              | 1002.99               |
| HIRES      | —       | —           | 284270112   | —         | 568.54022 | 142.135056  | 38            | 123.59234          | 48               | 239.52                |
| EPICS      | 40000   | 49053       | 3924240000  | 3000      | 47090.88  | 11772.72    | 3066          | 9971.95038         | 3944             | 19680.56              |
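The board counts in the table follow from the memory-bandwidth requirement: the table values are consistent with reading the full matrix as 4-byte floats once per frame, and sizing against 80% of each board's peak DDR bandwidth. A small sketch of that arithmetic (the function name `board_count` is ours, not from the slides):

```python
import math

def board_count(matrix_size, freq_hz, peak_bw_gbs, efficiency=0.8):
    """Boards needed when the whole matrix (4-byte floats) is streamed
    from DDR every frame, at a fraction of peak DDR bandwidth."""
    required_gbs = matrix_size * freq_hz * 4 / 1e9      # bandwidth demand
    usable_gbs = peak_bw_gbs * efficiency               # per-board supply
    return math.ceil(required_gbs / usable_gbs)

# Harmoni on KCU105 (19.2 GB/s peak): 606.44 / 15.36 -> 40 boards
print(board_count(189513408, 800, 19.2))
```

The same formula reproduces the XpressKUS column (e.g. 51 boards for Harmoni at 14.928 GB/s peak) and the EPICS extreme case of 3066 KCU105 boards.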
FPGA Cluster: Evaluation Using the QuickPlay Tool

- UDP interface; MVM kernel developed in C.
- PLDA XpressKUS hardware (x3), fully supported by the QuickPlay tool: 10GbE UDP, DDR3 memory.
  - 1 board for interface processing: broadcasts the slope data and merges the partial MVM results.
  - 2 boards (expandable) for MVM processing.

Scalable architecture
- Nodes are connected to a commercial 10GbE switch.
- Performance can be expanded by adding more FPGA hardware.
- External access via the 10GbE UDP interface.

[Diagram: external access I/F FPGA node connected through a switch to the MVM FPGA nodes]
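The interface node's two jobs, broadcasting slope data and merging partial results, can be sketched at the datagram level. The packet layout here (a 32-bit frame counter followed by float32 slopes) is an assumption for illustration only; the real firmware's wire format is not specified in the slides.

```python
import struct

def pack_slopes(slopes, frame_id):
    """Build a UDP payload: hypothetical layout of a little-endian
    32-bit frame counter followed by float32 slope values."""
    return struct.pack("<I%df" % len(slopes), frame_id, *slopes)

def unpack_slopes(payload):
    """Inverse of pack_slopes: recover frame counter and slope list."""
    frame_id = struct.unpack_from("<I", payload)[0]
    n = (len(payload) - 4) // 4
    return frame_id, list(struct.unpack_from("<%df" % n, payload, 4))

def merge_partials(partials):
    """Interface-node merge: concatenate per-node output slices in
    fixed node order to form the full command vector."""
    merged = []
    for part in partials:
        merged.extend(part)
    return merged
```

The same payload would be sent unchanged to every MVM node (the broadcast step), which is why a plain 10GbE switch is sufficient to fan the data out.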