Real Parallel Computers

Modular data centers

Background Information
Strohmaier, Dongarra, Meuer, Simon, "Recent trends in the marketplace of high performance computing", Parallel Computing, 2005

Short history of parallel machines
1970s: vector computers
1990s: Massively Parallel Processors (MPPs)
– Standard microprocessors, special network and I/O
2000s:
– Cluster computers (using standard PCs)
– Advanced architectures (BlueGene)
– Comeback of the vector computer (Japanese Earth Simulator)
– IBM Cell/BE
2010s:
– Multi-cores, GPUs
– Cloud data centers

Performance development and predictions

Clusters
Cluster computing
– Standard PCs/workstations connected by a fast network
– Good price/performance ratio
– Exploit existing (idle) machines or use (new) dedicated machines
Cluster computers vs. supercomputers (MPPs)
– Processing power similar: both based on microprocessors
– Communication performance was the key difference
– Modern networks have bridged this gap (Myrinet, Infiniband, 10G Ethernet)
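To make the message-passing programming model used on such clusters concrete, here is a minimal MPI sketch (not part of the original slides): every rank reports the node it runs on, and rank 0 collects the reports over the cluster interconnect. The file name and the mpirun invocation in the comments are only illustrative.

/* Minimal MPI cluster sketch: each rank reports its host, and rank 0
 * receives a message from every other rank over the network.
 * Compile with e.g.: mpicc hello_cluster.c -o hello_cluster
 * Run with e.g.:     mpirun -np 4 ./hello_cluster
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &name_len);

    if (rank != 0) {
        /* Worker ranks send their host name to rank 0. */
        MPI_Send(host, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        printf("rank 0 runs on %s (%d processes total)\n", host, size);
        for (int src = 1; src < size; src++) {
            char peer[MPI_MAX_PROCESSOR_NAME];
            MPI_Recv(peer, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank %d runs on %s\n", src, peer);
        }
    }

    MPI_Finalize();
    return 0;
}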

Overview
Cluster computers at our department:
– DAS-1: 128-node Pentium Pro / Myrinet cluster (gone)
– DAS-2: 72-node dual-Pentium-III / Myrinet-2000 cluster
– DAS-3: 85-node dual-core dual-Opteron / Myrinet-10G cluster
– DAS-4: 72-node cluster with accelerators (GPUs etc.)
Part of a wide-area system: the Distributed ASCI Supercomputer

Distributed ASCI Supercomputer ( )

DAS-2 Cluster ( )
72 nodes, each with 2 CPUs (144 CPUs in total)
– 1 GHz Pentium-III
– 1 GB memory per node
– 20 GB disk
– Fast Ethernet, 100 Mbit/s
– Myrinet, Gbit/s (crossbar)
Operating system: Red Hat Linux
Part of the wide-area DAS-2 system (5 clusters with 200 nodes in total)
[Photo: Myrinet switch and Ethernet switch]

DAS-3 Cluster (Sept. 2006)
85 nodes, each with 2 dual-core CPUs (340 cores in total)
– 2.4 GHz AMD Opterons (64 bit)
– 4 GB memory per node
– 250 GB disk
– Gigabit Ethernet
– Myrinet-10G, 10 Gb/s (crossbar)
Operating system: Scientific Linux
Part of the wide-area DAS-3 system (5 clusters, 263 nodes), using the SURFnet-6 optical network with Gb/s wide-area links

DAS-3 Networks
[Network diagram] The 85 compute nodes connect with 85 x 1 Gb/s Ethernet links to a Nortel 5510 Ethernet switch and with 85 x 10 Gb/s Myrinet links to a Myri-10G switch. A 10 Gb/s Ethernet blade provides 8 x 10 Gb/s fiber links to a Nortel OME 6500 with a DWDM blade, which feeds 80 Gb/s of DWDM capacity into SURFnet6; a 1 or 10 Gb/s campus uplink is also attached. The headnode (10 TB mass storage) is connected via 10 Gb/s Myrinet and 10 Gb/s Ethernet.

DAS-3 Networks
[Photos: Myrinet and Nortel switching equipment]

DAS-4 (Sept. 2010)
72 nodes (2 quad-core Intel Westmere Xeon E5620 CPUs, 24 GB memory, 2 TB disk each)
2 fat nodes with 94 GB memory
Infiniband network + 1 Gb/s Ethernet
16 NVIDIA GTX 480 graphics accelerators (GPUs)
2 Tesla C2050 GPUs

DAS-4 performance
Infiniband network:
– One-way latency: 1.9 microseconds
– Throughput: 22 Gbit/s
CPU performance:
– 72 nodes (576 cores): GFLOPS
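Latency and throughput figures like the ones above are typically obtained with a ping-pong microbenchmark between two nodes. The sketch below is a generic illustration of that technique, not the actual benchmark run on DAS-4; the message sizes and repetition count are arbitrary choices.

/* Ping-pong sketch between ranks 0 and 1: half the round-trip time
 * approximates the one-way latency for small messages, and the same
 * measurement gives throughput for large messages. Illustrative only,
 * not the benchmark that produced the DAS-4 numbers above.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "need at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    for (int bytes = 1; bytes <= (1 << 20); bytes *= 1024) {
        char *buf = malloc(bytes);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double one_way = (MPI_Wtime() - t0) / (2.0 * REPS);
        if (rank == 0)
            printf("%8d bytes: %8.2f us  %6.2f Gbit/s\n",
                   bytes, one_way * 1e6, bytes * 8.0 / one_way / 1e9);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}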

Blue Gene/L Supercomputer

Blue Gene/L packaging hierarchy
Chip: 2 processors, 2.8/5.6 GF/s, 4 MB
Compute Card: 2 chips (1x2x1), 5.6/11.2 GF/s, 1.0 GB
Node Card: 16 compute cards (32 chips, 4x4x2), 0-2 I/O cards, 90/180 GF/s, 16 GB
Rack: 32 node cards, 2.8/5.6 TF/s, 512 GB
System: 64 racks (64x32x32), 180/360 TF/s, 32 TB
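The peak-performance numbers compound multiplicatively across the packaging levels. As a sanity check, the small C sketch below reproduces that arithmetic from the figures on this slide (using the higher "both FPUs active" numbers); note that the slide rounds the larger values slightly (180 GF/s, 5.6 TF/s, 360 TF/s).

/* Reproduce the Blue Gene/L packaging arithmetic from the slide above,
 * in peak GF/s with both floating-point units per processor in use.
 */
#include <stdio.h>

int main(void)
{
    double chip = 5.6;          /* GF/s per chip (2 processors)           */
    double card = chip * 2;     /* compute card: 2 chips                  */
    double node = card * 16;    /* node card: 16 compute cards (32 chips) */
    double rack = node * 32;    /* rack: 32 node cards                    */
    double sys  = rack * 64;    /* system: 64 racks (64x32x32)            */

    printf("chip:         %10.1f GF/s\n", chip);
    printf("compute card: %10.1f GF/s\n", card);
    printf("node card:    %10.1f GF/s\n", node);
    printf("rack:         %10.1f GF/s (%.1f TF/s)\n", rack, rack / 1000);
    printf("system:       %10.1f GF/s (%.1f TF/s)\n", sys, sys / 1000);
    return 0;
}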

Blue Gene/L Networks
3-Dimensional Torus
– Interconnects all compute nodes (65,536)
– Virtual cut-through hardware routing
– 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
– 1 µs latency between nearest neighbors, 5 µs to the farthest
– Communications backbone for computations
– 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
Global Collective
– One-to-all broadcast functionality
– Reduction operations functionality
– 2.8 Gb/s of bandwidth per link
– One-way traversal latency of 2.5 µs
– Interconnects all compute and I/O nodes (1024)
Low-Latency Global Barrier and Interrupt
– Round-trip latency of 1.3 µs
Ethernet
– Incorporated into every node ASIC
– Active in the I/O nodes (1:8-64)
– All external communication (file I/O, control, user interaction, etc.)
Control Network
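Applications usually address a torus network like this through a Cartesian process topology. The sketch below is generic MPI code, not Blue Gene-specific, that creates a periodic 64x32x32 grid matching the torus dimensions on the slide and looks up a rank's nearest neighbours along one axis; to populate the full grid it would need 65,536 ranks, so the dims array is something you would shrink for testing.

/* Generic MPI sketch: map ranks onto a periodic 64x32x32 grid, matching
 * the shape of the Blue Gene/L torus, and look up the neighbours along x
 * that a nearest-neighbour exchange would use.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int dims[3]    = {64, 32, 32};   /* torus dimensions (x, y, z)           */
    int periods[3] = {1, 1, 1};      /* wrap-around links in every dimension */
    int rank, coords[3], left, right;
    MPI_Comm torus;

    MPI_Init(&argc, &argv);
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);
    if (torus != MPI_COMM_NULL) {
        MPI_Comm_rank(torus, &rank);
        MPI_Cart_coords(torus, rank, 3, coords);
        MPI_Cart_shift(torus, 0, 1, &left, &right);   /* neighbours along x */
        if (rank == 0)
            printf("rank 0 at (%d,%d,%d): x-neighbours are ranks %d and %d\n",
                   coords[0], coords[1], coords[2], left, right);
    }
    MPI_Finalize();
    return 0;
}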