The MPC Parallel Computer: Hardware, Low-level Protocols and Performance
University P. & M. Curie (Paris), LIP6 laboratory
Olivier Glück

Outline
- Introduction
- Hardware
  - Hardware architecture
  - Hardware components
- Software
  - Low-level protocol
  - MPI
- Conclusion

Introduction
- A very low-cost, high-performance parallel computer
- A PC cluster using an optimized interconnection network
- A PCI network board (FastHSL) developed at LIP6:
  - High-speed communication network (HSL, 1 Gbit/s)
  - RCUBE: router (8x8 crossbar, 8 HSL ports)
  - PCIDDC: PCI network controller (implements a specific communication protocol)
- Goal: provide efficient software layers

Hardware architecture
[Diagram: two standard PCs running Linux or FreeBSD, each equipped with a FastHSL board (RCUBE router + PCIDDC controller), connected by HSL links]

The MPC machine

The FastHSL board

Hardware layers
- HSL link (1 Gbit/s)
  - coaxial cable, point-to-point, full duplex
  - data encoded on 12 bits
  - low-level flow control
- RCUBE
  - Rapid Reconfigurable Router, extensible
  - latency: 150 ns
  - wormhole routing, interval routing scheme (see the sketch below)
- PCIDDC
  - the network interface controller
  - implements the communication protocol: remote DMA
  - zero copy
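To make the interval routing scheme concrete, here is a minimal sketch: each output port of the router owns a contiguous interval of destination node numbers, so a routing decision is just a range lookup. The port count and interval table below are invented for illustration and do not reflect the actual RCUBE configuration.

```c
/* Hypothetical sketch of interval routing: each of the router's output
 * ports is assigned a contiguous interval of destination node numbers,
 * and routing reduces to a range lookup. The table is made up for
 * illustration, not the real RCUBE configuration. */
#include <stdio.h>

#define NPORTS 8

struct interval { int lo, hi; };            /* inclusive destination range */

static const struct interval table[NPORTS] = {
    {0, 7},  {8, 15},  {16, 23}, {24, 31},
    {32, 39}, {40, 47}, {48, 55}, {56, 63},
};

/* Return the output port whose interval contains 'dest', or -1. */
static int route(int dest)
{
    for (int p = 0; p < NPORTS; p++)
        if (dest >= table[p].lo && dest <= table[p].hi)
            return p;
    return -1;
}

int main(void)
{
    printf("destination 42 -> port %d\n", route(42));  /* prints port 5 */
    return 0;
}
```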

Low-level communication protocol
- Zero-copy protocol (direct deposit protocol)
- The FastHSL board accesses host memory directly
[Diagram: sender and receiver memory hierarchies (process, kernel, I/O memory); with direct deposit, data moves straight from the sender's process memory to the receiver's process memory, skipping the kernel and I/O memory copies]

PUT: the lowest-level software API
- Unix-based layer: FreeBSD or Linux
- Zero-copy strategy
- Provides a basic kernel API on top of the PCIDDC remote-write
- Parameters of a PUT() call (see the prototype sketch below):
  - remote node
  - local physical address
  - remote physical address
  - size of data
  - message identifier
  - callback functions for signaling
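As a rough sketch, a PUT() call taking the parameters listed above might look like the following C prototype. The names, types, and callback semantics are assumptions for illustration; the actual LIP6 kernel interface may differ.

```c
/* Hypothetical C prototype for the PUT kernel API described above.
 * Names and types are illustrative, not the real LIP6 interface. */
#include <stddef.h>
#include <stdint.h>

typedef void (*put_callback_t)(void *ctx);  /* completion notification */

int put(unsigned int   remote_node,   /* destination node identifier      */
        uint64_t       local_paddr,   /* physical address of local buffer */
        uint64_t       remote_paddr,  /* physical address on remote node  */
        size_t         size,          /* number of bytes to transfer      */
        unsigned int   msg_id,        /* message identifier               */
        put_callback_t on_sent,       /* local buffer may be reused       */
        put_callback_t on_recvd,      /* data deposited at the receiver   */
        void          *ctx);          /* argument passed to callbacks     */
```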

PUT performance (PC Pentium II, 350 MHz)
- Throughput: 494 Mbit/s
- Half-throughput point (message size at which half the peak throughput is reached): 66 bytes
- Latency: 4 µs (without system call)

MPI over MPC: an implementation of MPICH over the PUT API
[Layer stack, top to bottom: MPI / PUT / FreeBSD or Linux driver / HSL network]

MPI implementation (1)
- Two main problems:
  - Where to write data in remote physical memory?
  - PUT only transfers blocks that are contiguous in physical memory
- Two kinds of messages:
  - control (short) messages
  - data messages

MPI implementation (2)
- Short (control) messages:
  - carry control information or limited-size user data
  - use buffers allocated at start-up, contiguous in physical memory
  - one memory copy on the send side and one on the receive side (see the sketch below)
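A minimal sketch of this short-message (eager) path, assuming the put() prototype sketched earlier: the sender copies the user data into a preallocated, physically contiguous slot and PUTs it to the matching slot on the receiver, which copies it out on arrival. The slot layout, sizes, and address-exchange scheme below are invented for illustration.

```c
/* Hypothetical sketch of the short-message path: one copy into a
 * preallocated slot on send, one copy out on receive. Slot layout and
 * bookkeeping are invented; put() is the earlier prototype sketch. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NNODES    16          /* assumed cluster size          */
#define NSLOTS    32          /* assumed slots per destination */
#define SHORT_MAX 256         /* assumed short-message limit   */

typedef void (*put_callback_t)(void *ctx);
int put(unsigned int remote_node, uint64_t local_paddr, uint64_t remote_paddr,
        size_t size, unsigned int msg_id,
        put_callback_t on_sent, put_callback_t on_recvd, void *ctx);

struct slot {
    uint64_t paddr;           /* physical address of 'data' */
    char     data[SHORT_MAX];
};

/* Buffers allocated at start-up, contiguous in physical memory; each
 * node learns its peers' receive-slot physical addresses during
 * initialization. */
extern struct slot send_slots[NSLOTS];
extern struct slot recv_slots[NSLOTS];
extern uint64_t    remote_slot_paddr[NNODES][NSLOTS];

static void send_short(int dest, unsigned int slot, const void *buf, size_t len)
{
    memcpy(send_slots[slot].data, buf, len);          /* the send-side copy */
    put(dest, send_slots[slot].paddr, remote_slot_paddr[dest][slot],
        len, slot, NULL, NULL, NULL);
}

static void recv_short(unsigned int slot, void *buf, size_t len)
{
    memcpy(buf, recv_slots[slot].data, len);       /* the receive-side copy */
}
```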

MPI implementation (3)
- Data messages:
  - transfer data larger than the maximum size of a control message, or are used by specific MPI functions (e.g. MPI_Ssend)
  - rendezvous (RDV) protocol (sketched below)
  - achieve a zero-copy transfer
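A hedged sketch of how such a rendezvous exchange could work: the sender announces the message with a short request, the receiver replies with the physical pages of its destination buffer, and the sender then PUTs the data directly into those pages. The message formats and all helper functions below are hypothetical; only the overall three-step exchange follows the slides.

```c
/* Hypothetical sketch of the rendezvous (RDV) protocol for data messages:
 *   1. sender -> receiver: request-to-send (size, tag)
 *   2. receiver -> sender: physical pages of the destination user buffer
 *   3. sender PUTs the data directly into those pages (zero copy)      */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define MAX_PAGES 64

struct rts { size_t size; int tag; };                   /* request to send  */
struct cts { int npages; uint64_t paddr[MAX_PAGES]; };  /* receiver's pages */

/* Hypothetical helpers: short-message transport and address translation. */
extern void     send_control(int dest, const void *msg, size_t len);
extern void     wait_control(int src, void *msg, size_t len);
extern uint64_t virt_to_phys(const void *vaddr);
extern int      put(unsigned int node, uint64_t local_paddr,
                    uint64_t remote_paddr, size_t len, unsigned int id,
                    void (*on_sent)(void *), void (*on_recvd)(void *),
                    void *ctx);

/* Assumes 'buf' is page-aligned, for simplicity of the sketch. */
static void rdv_send(int dest, const void *buf, size_t size, int tag)
{
    struct rts req = { size, tag };
    struct cts ack;

    send_control(dest, &req, sizeof req);   /* 1: announce the message  */
    wait_control(dest, &ack, sizeof ack);   /* 2: get destination pages */

    /* 3: deposit the data page by page, with no intermediate copy. */
    size_t off = 0;
    for (int i = 0; i < ack.npages && off < size; i++) {
        size_t chunk = size - off < PAGE_SIZE ? size - off : PAGE_SIZE;
        put(dest, virt_to_phys((const char *)buf + off), ack.paddr[i],
            chunk, tag, NULL, NULL, NULL);
        off += chunk;
    }
}
```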

MPI performance (1): latency 26 µs, throughput 490 Mbit/s

MPI performance (2)
  Machine | Latency | Throughput
  MPC     | 26 µs   | 490 Mbit/s
  Cray    | 57 µs   | 1200 Mbit/s

MPI performance (3)

Conclusion
- MPC: a very low-cost PC cluster
- Performance similar to Myrinet clusters
- Very good extensibility (no centralized router)
- Perspectives:
  - a new router
  - another network controller
  - improvements in MPI over MPC