FPGA Cluster MVM reconstruction Scalable multiple FPGA architecture

FPGA Cluster

MVM reconstruction
- The matrix is divided among the FPGAs.
- The incoming vector is broadcast to all FPGAs.
- Partial results are accumulated.

Scalable multiple-FPGA architecture
- MVM is easily parallelized.
- Memory bandwidth increases with the number of FPGAs.
- Each FPGA node runs identical firmware.

UDP external interface
- FPGA-based UDP for the lowest latency and jitter.
- Easier integration with the rest of the system via standard 10GbE hardware, e.g. a switch.
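The partitioning described above can be sketched in a few lines of Python. This is a minimal, hypothetical model rather than the firmware. It assumes a column-wise split: each node holds a contiguous block of matrix columns, uses only its slice of the broadcast vector, and returns a full-length partial result that the interface node sums. The names `split_columns`, `partition`, `node_partial` and `cluster_mvm` are illustrative.

```python
def split_columns(n_cols, n_nodes):
    """Contiguous (start, stop) column ranges, one per node, as even as possible."""
    base, extra = divmod(n_cols, n_nodes)
    ranges, start = [], 0
    for k in range(n_nodes):
        stop = start + base + (1 if k < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partition(matrix, n_nodes):
    """Give each node its own column block (hypothetical column-wise split)."""
    return [(c0, [row[c0:c1] for row in matrix])
            for c0, c1 in split_columns(len(matrix[0]), n_nodes)]


def node_partial(c0, block, x):
    """One MVM node: multiply its column block by the matching slice of x."""
    width = len(block[0]) if block else 0
    xs = x[c0:c0 + width]
    return [sum(a * b for a, b in zip(row, xs)) for row in block]


def cluster_mvm(matrix, x, n_nodes):
    """Interface node: broadcast x to every node, then accumulate partial results."""
    result = [0] * len(matrix)
    for c0, block in partition(matrix, n_nodes):
        partial = node_partial(c0, block, x)  # every node receives the full x
        result = [acc + p for acc, p in zip(result, partial)]
    return result


if __name__ == "__main__":
    m = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
    print(cluster_mvm(m, [1, 1, 1, 1], n_nodes=2))  # [10, 26]
```

A row-wise split is the other common choice: each node would then produce a distinct slice of the output vector, and the merge step would concatenate the slices instead of summing them.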

FPGA Cluster

Example FPGA boards

Board      DDR   DDR clock (MHz)  Peak BW (GB/s)  GFLOPS (80%)  Cost (EUR)  Source
KCU105     DDR4  1200             19.2            3.84          3252.43     Digikey
XpressKUS  DDR3  933              14.928          2.9856        4990        PLDA

FPGA cost

Instrument  Subaps   DM channels  Matrix size  Freq (Hz)  BW (GB/s)  Compute (GFLOPS)  KCU105 boards  KCU105 cost (kEUR)  XpressKUS boards  XpressKUS cost (kEUR)
Harmoni     21904    4326         189513408    800        606.44291  151.6107264       40             130.0972            51                254.49
Micado      32856    10000        657120000    500        1314.24    328.56            86             279.70898           111               553.89
MOSAIC      36479.6  -            2397147475   250        2397.1475  599.2868688       157            510.63151           201               1002.99
HIRES       -        -            284270112    -          568.54022  142.135056        38             123.59234           48                239.52
EPICS       40000    49053        3924240000   3000       47090.88   11772.72          3066           9971.95038          3944              19680.56
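The figures in both tables appear to follow from a few simple relations, which the sketch below reproduces. These relations are inferred from the numbers rather than stated on the slide, so treat them as assumptions: a 64-bit DDR bus with two transfers per clock, 4-byte (single-precision) matrix coefficients, one operation counted per coefficient per frame, a matrix of 2 × subapertures × DM channels (x and y slopes), 80% of peak bandwidth usable, and the board count rounded up. The function names are hypothetical.

```python
import math

BYTES_PER_COEFF = 4  # assumption: single-precision matrix coefficients
EFFICIENCY = 0.8     # the "80%" of peak DDR bandwidth used in the board table


def ddr_peak_gbps(clock_mhz, bus_bytes=8):
    """Peak DDR bandwidth in GB/s: two transfers per clock on an assumed 64-bit bus."""
    return clock_mhz * 2 * bus_bytes / 1000


def board_gflops(peak_gbps):
    """Sustainable rate if every coefficient is streamed once from DDR per frame."""
    return EFFICIENCY * peak_gbps / BYTES_PER_COEFF


def size_cluster(subaps, dm_channels, freq_hz, peak_gbps, board_cost_eur):
    """Return (matrix size, BW in GB/s, GFLOPS, boards needed, cost in kEUR)."""
    matrix_size = 2 * subaps * dm_channels  # x and y slope per subaperture
    bw_gbps = matrix_size * BYTES_PER_COEFF * freq_hz / 1e9
    gflops = matrix_size * freq_hz / 1e9
    boards = math.ceil(gflops / board_gflops(peak_gbps))
    return matrix_size, bw_gbps, gflops, boards, boards * board_cost_eur / 1000


if __name__ == "__main__":
    # Harmoni on KCU105 boards (DDR4 at 1200 MHz, 3252.43 EUR each)
    print(size_cluster(21904, 4326, 800, ddr_peak_gbps(1200), 3252.43))
```

Because the required compute scales with matrix size times loop frequency, EPICS at 3000 Hz needs roughly two orders of magnitude more boards than Harmoni at 800 Hz.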

FPGA Cluster Evaluation

Using the Quickplay tool
- UDP interface
- MVM kernel developed in C

PLDA XpressKUS hardware (x3)
- Fully supported by the Quickplay tool: 10GbE UDP, DDR3 memory
- 1 board for interface processing: broadcasts slope data and merges the partial MVM results
- 2 boards (expandable) for MVM processing

Scalable architecture
- Nodes are connected to a commercial 10GbE switch.
- Performance can be scaled by adding more FPGA hardware.
- External access is provided through a 10GbE UDP interface.

[Diagram: access I/F, I/F FPGA node, switch, MVM FPGA nodes]
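The interface node's role, broadcasting slope data and merging partial MVM results over 10GbE UDP, can be sketched as below. The datagram layout (a little-endian header of frame id, chunk index and chunk count, followed by float32 values) is a hypothetical format for illustration only, not the Quickplay or firmware protocol. The 1472-byte limit is the UDP payload that fits a standard 1500-byte Ethernet MTU over IPv4 without options. `merge_partials` assumes the partial results are summed, as in a column-wise matrix split.

```python
import struct

# Hypothetical datagram header: frame id (u32), chunk index (u16), chunk count (u16)
HEADER = struct.Struct("<IHH")
MAX_PAYLOAD = 1472  # 1500-byte MTU minus 20-byte IPv4 and 8-byte UDP headers
FLOATS_PER_DATAGRAM = (MAX_PAYLOAD - HEADER.size) // 4


def packetize(frame_id, values):
    """Split one frame of slope data into datagram payloads (float32 values)."""
    chunks = [values[i:i + FLOATS_PER_DATAGRAM]
              for i in range(0, len(values), FLOATS_PER_DATAGRAM)] or [[]]
    return [HEADER.pack(frame_id, k, len(chunks)) + struct.pack(f"<{len(c)}f", *c)
            for k, c in enumerate(chunks)]


def reassemble(datagrams):
    """Rebuild (frame_id, values) from datagrams received in any order."""
    chunks, frame, total = {}, None, None
    for d in datagrams:
        fid, idx, count = HEADER.unpack_from(d)
        if frame is None:
            frame, total = fid, count
        elif fid != frame:
            raise ValueError("datagrams from different frames")
        payload = d[HEADER.size:]
        chunks[idx] = struct.unpack(f"<{len(payload) // 4}f", payload)
    if total is None or len(chunks) != total:
        raise ValueError("incomplete frame")
    return frame, [v for i in range(total) for v in chunks[i]]


def merge_partials(partials):
    """Sum the full-length partial MVM results returned by the MVM nodes."""
    return [sum(col) for col in zip(*partials)]
```

On a host, each payload would be sent with `socket.sendto` on a `SOCK_DGRAM` socket. UDP guarantees neither delivery nor ordering, which is why the sketch carries a chunk index and rejects incomplete frames.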