Multiprocessor Systems Using FPGAs
Presented by: Manuel Saldaña
Connections 2006, The University of Toronto ECE Graduate Symposium
Toronto, Ontario, Canada, June 9th, 2006

Introduction
– Multiprocessors in FPGAs can accelerate many computing tasks by up to two or three orders of magnitude
– Massive parallelism can be achieved using multiple FPGAs
– The requirements are:
  - a scalable network
  - an efficient programming model
  - a flexible design flow
– TMD is an FPGA-based multiprocessor system tailored to perform Molecular Dynamics simulations

Large-Scale Multiprocessor Systems
Class 1 Machines
– Supercomputers or clusters of workstations
– Large numbers of interconnected CPUs
Class 2 Machines
– Hybrid network of CPU and FPGA hardware
– The FPGA acts as an external co-processor to the CPU
– Programming model still evolving
Class 3 Machines
– FPGA-based multiprocessor systems
– A recent area of academic and industrial focus
[Diagram: interconnection network linking the processing nodes]

TMD Scalable Network
Tier 1: Intra-FPGA Communication
– Point-to-point FIFOs are used as communication channels
– Application-specific network topologies can be defined
Tier 2: Inter-FPGA Communication
– High-speed serial links are used for inter-FPGA communication
– Fully-interconnected topology among N FPGAs using 2N(N-1) pairs of traces (see the worked example below)
Tier 3: Inter-Cluster Communication
– Commercially-available switches interconnect cluster PCBs
– Currently, we intend to use optical links
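To make the Tier 2 figure concrete: taking the slide's formula of 2N(N-1) pairs of traces at face value, fully interconnecting N = 4 FPGAs needs 2 × 4 × 3 = 24 pairs, and N = 8 needs 2 × 8 × 7 = 112 (these values of N are illustrative, not board counts from the talk). The quadratic growth is presumably why full interconnection is confined to a single cluster PCB, with the Tier 3 switches taking over between clusters.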

TMD Programming Model: TMD-MPI
– Parallel applications are defined as a collection of computing tasks
– Tasks communicate by passing messages using MPI
– TMD-MPI is a subset implementation of MPI (see the example below)
[Diagram: layered software stack: Application, Standard API (TMD-MPI), hardware-dependent software, communication protocol, Hardware]
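Because TMD-MPI keeps the standard MPI API, code written for a conventional cluster can be moved to the TMD largely unchanged. Below is a minimal sketch, in standard MPI C, of the message-passing style the slide describes; exactly which calls fall inside TMD-MPI's subset is not stated here, so treat the specific function list as an assumption.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    int data[4] = {10, 20, 30, 40};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which task am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many tasks in total? */

    if (rank == 0 && size > 1) {
        /* Task 0 passes a block of data to task 1. */
        MPI_Send(data, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(data, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("task 1 received %d %d %d %d\n",
               data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}

On the TMD, each rank maps to a soft-processor (or, later, a hardware engine), and the sends and receives travel over the Tier 1 FIFOs or Tier 2 serial links rather than an Ethernet fabric.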

TMD Application Design Flow
Step 1: Application Prototyping
– A software prototype of the application is developed
– Profiling identifies compute-intensive routines
Step 2: Application Refinement
– The application is partitioned into tasks that communicate using MPI
– Communication patterns are analyzed to determine the network topology
Step 3: TMD Prototyping
– Tasks are ported to soft-processors on the TMD
– Software is refined to use the TMD-MPI library
– The on-chip communication network is verified
Step 4: TMD Optimization
– Compute-intensive tasks are replaced with hardware engines (see the sketch below)
– The MPE handles communication for the hardware engines
[Diagram: the application prototype is partitioned into Processes A, B, and C, which map onto ranks A, B, and C on the TMD; B is later replaced by a hardware engine]
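Step 4 is non-disruptive because ranks address one another only through MPI messages. A minimal sketch of that decoupling, again in standard MPI C (the function exchange_with_rank1, the tag values, and the float payload are hypothetical, for illustration only):

#include <mpi.h>

/* Process A's side of an exchange with rank 1. Nothing here reveals
   whether rank 1 is software running on a soft-processor (Step 3) or
   a hardware engine whose messages are handled by the MPE (Step 4);
   the caller only ever names the rank. */
void exchange_with_rank1(float *buf, int n)
{
    MPI_Status status;
    MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD); /* tag 0: request */
    MPI_Recv(buf, n, MPI_FLOAT, 1, 1, MPI_COMM_WORLD, &status); /* tag 1: result */
}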

The First TMD Prototype...

Acknowledgements
SOCRN
TMD Group and Past Members: David Chui, Christopher Comis, Sam Lee, Dr. Paul Chow, Andrew House, Daniel Nunes, Manuel Saldaña, Emanuel Ramalho, Dr. Régis Pomès, Christopher Madill, Arun Patel, Lesley Shannon