Real Parallel Computers

Background Information
Recent trends in the marketplace of high performance computing. Strohmaier, Dongarra, Meuer, Simon. Parallel Computing, 2005.

Short history of parallel machines
– 1970s: vector computers
– 1990s: Massively Parallel Processors (MPPs): standard microprocessors with special-purpose networks and I/O
– 2000s:
  – Cluster computers (using standard PCs)
  – Advanced architectures (BlueGene)
  – Comeback of the vector computer (Japanese Earth Simulator)
  – GPUs, IBM Cell/BE

Performance development and predictions

Clusters
Cluster computing:
– Standard PCs/workstations connected by a fast network
– Good price/performance ratio
– Exploits existing (idle) machines or uses (new) dedicated machines
Cluster computers vs. supercomputers (MPPs):
– Processing power is similar: both are based on microprocessors
– Communication performance used to be the key difference
– Modern networks (Myrinet, InfiniBand, 10G Ethernet) have bridged this gap
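The programming model on such clusters is typically message passing. Below is a minimal sketch in C using MPI (assuming an MPI implementation such as Open MPI or MPICH is installed; ranks and hostnames are assigned by the launcher):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);                  /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
        MPI_Get_processor_name(name, &len);      /* typically the node's hostname */

        printf("process %d of %d on node %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }

Built with mpicc and launched with, for example, mpirun -np 128 to run one process on each node of a 128-node cluster.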

Overview
Cluster computers at our department:
– DAS-1: 128-node Pentium Pro / Myrinet cluster (gone)
– DAS-2: 72-node dual Pentium III / Myrinet-2000 cluster
– DAS-3: 85-node dual-core dual-Opteron / Myrinet-10G cluster
– DAS-4 (2010): cluster with accelerators (GPUs etc.)
All are part of a wide-area system: the Distributed ASCI Supercomputer.

Distributed ASCI Supercomputer

DAS-1 node configuration
– 200 MHz Pentium Pro
– 128 MB memory
– 2.5 GB disk
– 100 Mbit/s Ethernet
– Myrinet: 1.28 Gbit/s (full duplex)
– Operating system: Red Hat Linux

DAS-2 Cluster
– 72 nodes, each with 2 CPUs (144 CPUs in total)
– 1 GHz Pentium III
– 1 GB memory per node
– 20 GB disk
– Fast Ethernet: 100 Mbit/s
– Myrinet-2000: 2 Gbit/s (crossbar)
– Operating system: Red Hat Linux
– Part of the wide-area DAS-2 system (5 clusters, 200 nodes in total)
[Photos: Myrinet switch, Ethernet switch]

DAS-3 Cluster (Sept. 2006)
– 85 nodes, each with 2 dual-core CPUs (340 cores in total)
– 2.4 GHz AMD Opterons (64-bit)
– 4 GB memory per node
– 250 GB disk
– Gigabit Ethernet
– Myrinet-10G: 10 Gbit/s (crossbar)
– Operating system: Scientific Linux
– Part of the wide-area DAS-3 system (5 clusters, 263 nodes), using the SURFnet-6 optical network with up to 80 Gb/s of DWDM wide-area links

DAS-3 Networks
[Diagram: the 85 compute nodes each connect with 1 Gb/s Ethernet to a Nortel 5510 Ethernet switch and with 10 Gb/s Myrinet to a Myri-10G switch. A 10 Gb/s Ethernet blade in the Myrinet switch feeds 8 x 10 Gb/s fiber links into a Nortel OME 6500 DWDM blade, giving 80 Gb/s into SURFnet-6. The head node (10 TB mass storage) attaches with 10 Gb/s Myrinet and 10 Gb/s Ethernet; the campus uplink is 1 or 10 Gb/s.]

DAS-3 Networks
[Photos: the Myrinet and Nortel switches]

DAS-1 Myrinet
Components:
– 8-port switches
– A network interface card in each node (on the PCI bus)
– Electrical cables: reliable links
Myrinet switches:
– 8 x 8 crossbar switch
– Each port connects to a node (network interface) or to another switch
– Source-based, cut-through routing
– Less than 1 microsecond switching delay
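To see why cut-through routing matters, compare it with store-and-forward switching, where every switch must receive the full packet before forwarding it. A back-of-the-envelope sketch in C; the packet size and exact switching delay are illustrative assumptions, while the sub-microsecond delay bound, the 1.28 Gbit/s link rate, and the 6-hop path (the cluster's diameter, see below) come from these slides:

    #include <stdio.h>

    int main(void)
    {
        double t_switch  = 0.5e-6;  /* assumed per-hop switching delay (< 1 us per slide) */
        double bandwidth = 160e6;   /* 1.28 Gbit/s = 160 MB/s */
        int    hops      = 6;       /* example path length: the cluster's diameter */
        int    bytes     = 1024;    /* assumed packet size */

        double t_xmit = bytes / bandwidth;  /* time to clock the packet onto one link */

        /* Store-and-forward pays the full transmission time on every hop. */
        double saf = hops * (t_switch + t_xmit);

        /* Cut-through forwards as soon as the header is routed, so the
           transmission time is paid only once. */
        double ct = hops * t_switch + t_xmit;

        printf("store-and-forward: %.2f us\n", saf * 1e6);
        printf("cut-through:       %.2f us\n", ct * 1e6);
        return 0;
    }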

24-node DAS-1 cluster

128-node DAS-1 cluster
A ring topology would have:
– 22 switches
– Poor diameter: 11
– Poor bisection width: 2

Topology of the 128-node cluster
– 4 x 8 grid with wrap-around (a torus)
– Each switch connects to 4 switches and 4 PCs
– 32 switches (128/4)
– Diameter: 6; bisection width: 8
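These numbers can be checked mechanically. A small C sketch that builds the 4 x 8 wrap-around grid, computes the diameter with breadth-first search, and counts the links crossed when the torus is cut into two 4 x 4 halves (the topology parameters come from the slide; the code itself is just scaffolding):

    #include <stdio.h>
    #include <string.h>

    #define ROWS 4
    #define COLS 8
    #define N (ROWS * COLS)

    static int id(int r, int c) { return r * COLS + c; }

    int main(void)
    {
        /* Adjacency of the 4 x 8 grid with wrap-around: each switch
           links to its 4 neighboring switches. */
        int adj[N][4];
        for (int r = 0; r < ROWS; r++)
            for (int c = 0; c < COLS; c++) {
                int *a = adj[id(r, c)];
                a[0] = id((r + 1) % ROWS, c);
                a[1] = id((r + ROWS - 1) % ROWS, c);
                a[2] = id(r, (c + 1) % COLS);
                a[3] = id(r, (c + COLS - 1) % COLS);
            }

        /* Diameter: the longest of all shortest paths, via BFS from every switch. */
        int diameter = 0;
        for (int s = 0; s < N; s++) {
            int dist[N], queue[N], head = 0, tail = 0;
            memset(dist, -1, sizeof dist);
            dist[s] = 0;
            queue[tail++] = s;
            while (head < tail) {
                int u = queue[head++];
                for (int k = 0; k < 4; k++) {
                    int v = adj[u][k];
                    if (dist[v] < 0) {
                        dist[v] = dist[u] + 1;
                        queue[tail++] = v;
                    }
                }
            }
            for (int v = 0; v < N; v++)
                if (dist[v] > diameter) diameter = dist[v];
        }

        /* Bisection width: cutting the torus into two 4 x 4 halves crosses
           one direct link and one wrap-around link in each of the 4 rows. */
        int bisection = 2 * ROWS;

        printf("switches: %d, diameter: %d, bisection width: %d\n",
               N, diameter, bisection);   /* expect 32, 6, 8 */
        return 0;
    }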

Performance
DAS-2:
– 9.6 μs one-way null latency
– 168 MB/s throughput
DAS-3:
– 2.6 μs one-way null latency
– 950 MB/s throughput
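Numbers like these are typically obtained with a ping-pong microbenchmark: two processes bounce a message back and forth, and the one-way latency is half the average round-trip time. A minimal MPI sketch, meant to be run with exactly two processes (mpirun -np 2); the iteration count and the 1 MB message size for the throughput test are arbitrary choices:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define ITERS 1000
    #define BYTES (1 << 20)   /* 1 MB messages for the throughput test */

    int main(int argc, char **argv)
    {
        int rank;
        char *buf = malloc(BYTES);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Latency: bounce an empty message; one-way null latency is
           half the average round-trip time. */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double rtt = (MPI_Wtime() - t0) / ITERS;
        if (rank == 0)
            printf("one-way null latency: %.2f us\n", rtt / 2 * 1e6);

        /* Throughput: the same pattern with large messages. */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double rtt2 = (MPI_Wtime() - t0) / ITERS;
        if (rank == 0)
            printf("throughput: %.1f MB/s\n", BYTES / (rtt2 / 2) / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }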

MareNostrum: large Myrinet cluster
– IBM system at the Barcelona Supercomputing Center
– 4812 PowerPC 970 processors, 9.6 TB memory (2006)