SC2002, Baltimore: From the Earth Simulator to PC Clusters
- Structure of SC2002
- Top500 list
- Dinosaurs department: Earth Simulator – US answers (Cray SX1, ASCI Purple), BlueGene/L
- QCD computing – QCDOC, apeNEXT poster
- Cluster architectures: low-voltage clusters, NEXCOM, Transmeta-NASA
- Interconnects – Myrinet, QsNet, Infiniband
- Large installations – LLNL, Los Alamos
- Cluster software: Rocks (NPACI), LinuxBIOS (NCSA)
- Algorithms - David H. Bailey
- Experiences - HPC in an oil company
- Conclusions

SC2002 Structure
- Technical papers, plenaries, panels, education, posters, Masterworks, awards
- Exhibits: industry and research
- BOFs (Grid, Top500, …), SCNet, tutorials, …
- Research exhibits: NIC Jülich/DESY, FNAL/SLAC, …

SC2002 Top500 list (top 5: rank, manufacturer, computer, installation site, country/year, installation type)
1. NEC Earth-Simulator (vector SX6) – Earth Simulator Center, Japan/2002, research
2. Hewlett-Packard ASCI Q, AlphaServer SC ES45/1.25 GHz – Los Alamos National Laboratory, USA/2002, research
3. Hewlett-Packard ASCI Q, AlphaServer SC ES45/1.25 GHz – Los Alamos National Laboratory, USA/2002, research
4. IBM ASCI White, SP Power3 375 MHz – Lawrence Livermore National Laboratory, USA/2000, research (energy)
5. Linux NetworX MCR Linux Cluster, Xeon 2.4 GHz, Quadrics – Lawrence Livermore National Laboratory, USA/2002, research

SC2002 Dinosaurs – The Earth Simulator
- Building: 65 m x 50 m
- 640 nodes connected by a 640 x 640 single-stage crossbar switch
- 1 node = 8 arithmetic vector processors + 16 GB shared memory
- Cost: > $350 million
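From these figures: 640 nodes x 8 vector processors = 5,120 processors in total, and 640 nodes x 16 GB = roughly 10 TB of aggregate memory.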

SC2002 Dinosaurs – The Earth Simulator
- The Earth Simulator became operational in April 2002.
- Peak performance is 40 Teraflops (TF).
- It is the new No. 1 on the Top500 list: on the LINPACK benchmark it achieved 35.9 TF, or 90% of peak.
- It ran a benchmark global atmospheric simulation model at 13.4 TF on half of the machine, i.e. at over 60% of peak.
- For comparison, the total peak capability of all DOE (US Department of Energy) computers is 27.6 Teraflops.
- The Earth Simulator is also applied to a number of other disciplines, such as fusion and geophysics.
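Checking the quoted efficiencies: 35.9 / 40 ≈ 0.90, i.e. 90% of peak on LINPACK; the atmospheric run used half of the machine (peak ≈ 20 TF), so 13.4 / 20 ≈ 0.67, consistent with "over 60% of peak".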

SC2002 Dinosaurs – BlueGene/L
- IBM and Lawrence Livermore National Laboratory (A. Gara, …, B. D. Steinmacher-Burow, …)
- 3-dimensional torus, 32 x 32 x 64
- Virtual cut-through hardware routing – adaptive, deadlock-free
- 1.4 Gb/s on each of 12 node links (total of 2.1 GB/s per node)
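As an illustration only (this is not IBM's routing code), a minimal Python sketch of nearest-neighbor addressing on the 32 x 32 x 64 torus above: each node has six neighbors, i.e. 12 unidirectional links, which at 1.4 Gb/s each gives the quoted 2.1 GB/s per node.
# Hypothetical sketch: node addressing on a 32 x 32 x 64 3-D torus.
DIMS = (32, 32, 64)  # torus extent per dimension, as on the slide
def torus_neighbors(x, y, z):
    """Return the six nearest neighbors of node (x, y, z), with wraparound."""
    neighbors = []
    for d in range(3):
        for step in (-1, +1):
            n = [x, y, z]
            n[d] = (n[d] + step) % DIMS[d]  # periodic boundary = torus
            neighbors.append(tuple(n))
    return neighbors
# 32 * 32 * 64 = 65,536 nodes, matching the node count on the next slide
assert DIMS[0] * DIMS[1] * DIMS[2] == 65536
print(torus_neighbors(0, 0, 63))  # the +z neighbor wraps around to z = 0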

SC2002 Dinosaurs – BlueGene/L
- IBM CMOS CU-11 (0.13 µm) technology
- "Big brother of QCDOC"

SC2002 Dinosaurs – BlueGene/L machine parameters:
- Peak speed: 180 / 360* Tflop/s
- Total memory: 16–32 TB
- Footprint: 230 m²
- Total power: 1.2 MW
- Cost: << $100 million
- Installation date: ~12/2004
- Number of nodes: 65,536
- CPUs per node: 2
- Clock frequency: 700 MHz
- Power dissipation per node: 15 W
- Peak speed per node: 2.8 Gflop/s
- Memory per node: 0.25–0.5 GB
- MPI latency: 7 µs
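These figures are roughly self-consistent: 65,536 nodes x 2.8 Gflop/s ≈ 183 Tflop/s, i.e. about the 180 Tflop/s peak (and ≈ 360 Tflop/s if the per-node rate doubles, which is presumably what the asterisk denotes); 65,536 x 0.25–0.5 GB = 16–32 TB of memory; and 65,536 x 15 W ≈ 1 MW of node power, in line with the 1.2 MW total.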

Lattice QCD – QCDOC
- Columbia University, RIKEN/Brookhaven National Laboratory (BNL), UKQCD, IBM
- MIMD machine with distributed memory; system-on-a-chip design (QCDOC = QCD On a Chip)
- Technology: IBM SA27E = CMOS 7SF = 0.18 µm lithography process
- The ASIC combines existing IBM components and QCD-specific, custom-designed logic:
  - 500 MHz PowerPC 440 processor core with 64-bit, 1 Gflops FPU
  - 4 MB on-chip memory (embedded DRAM)
  - Nearest-neighbor serial communications unit (SCU)
  - 6-dimensional communication network (4-D physics, 2-D partitioning)
- Silicon chip about 11 mm square, consuming 2 W
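Two quick checks (assuming one 64-bit fused multiply-add per cycle): 500 MHz x 2 flops/cycle = 1 Gflop/s, matching the quoted FPU rate; and a 6-dimensional nearest-neighbor network gives each node 2 x 6 = 12 communication links.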

Lattice QCD – QCDOC ASIC: processor cores + IBM CoreConnect architecture

Lattice QCD, NIC/DESY Zeuthen poster: APEmille

Lattice QCD, NIC/DESY Zeuthen poster: apeNEXT

Low Power Cluster Architectures – sensitivity to power consumption

Low Power Cluster Architectures – the good old Beowulf approach (LANL)
- 240 Transmeta TM5600 CPUs (667 MHz) mounted onto blade servers
- 24 blades mount into an RLX Technologies System 324 chassis
- 10 chassis, with network switches, are mounted in a standard computer rack
- Peak rate of 160 Gflops; sustained average of 38.9 Gflops on a 212-node system, in a gravitational treecode N-body simulation of galaxy formation using 200 million particles
- Power dissipation: 22 W per blade, 5.2 kW for the 240-node cluster
- 'Green Destiny' could be a model for future high-performance computing?!
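Assuming one flop per cycle per TM5600 (which reproduces the quoted peak: 240 x 0.667 GHz ≈ 160 Gflops), the other numbers line up too: 240 blades x 22 W ≈ 5.3 kW, close to the 5.2 kW quoted, and the 38.9 Gflops sustained on 212 nodes is 38.9 / (212 x 0.667) ≈ 28% of that partition's peak.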

Cluster Architectures – Blade Servers
- NEXCOM – low-voltage blade server: 200 low-voltage Intel Xeon CPUs (1.6 GHz, 30 W) in a 42U rack, with an integrated Gbit Ethernet network
- Mellanox – Infiniband blade server: single-Xeon blades connected via a 10 Gbit (4X) Infiniband network (NCSA, Ohio State University)
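For comparison with the Transmeta system above: 200 Xeon CPUs at 30 W each amount to about 6 kW of CPU power alone in a single 42U rack.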

Top500 Cluster: MCR Linux Cluster, LLNL Livermore (Linux NetworX/Quadrics), Rmax: 5.69 Tflops
- 11.2 Tflops Linux cluster
- 4.6 TB of aggregate memory, … TB of aggregate local disk space
- 1152 total nodes, plus a separate hot-spare cluster and a development cluster
- 2,304 Intel 2.4 GHz Xeon processors
- 4 GB of DDR SDRAM memory and 120 GB of disk space per node
- Cluster interconnect: QsNet ELAN3 by Quadrics
- Cluster File Systems, Inc. supplied the Lustre open-source cluster-wide file system
- A similar cluster with a Myrinet interconnect was announced for Los Alamos National Laboratory and is planned at Fermilab for 2006
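A rough consistency check, assuming two floating-point operations per cycle per Xeon (SSE2): 1,152 nodes x 2 CPUs = 2,304 processors; 2,304 x 2.4 GHz x 2 ≈ 11.1 Tflop/s, i.e. the quoted 11.2 Tflops peak, so the Rmax of 5.69 Tflops is about 51% of peak; and 1,152 nodes x 4 GB ≈ 4.6 TB of aggregate memory.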

Clusters: Infiniband interconnect
- Link: high-speed serial; 1x, 4x, and 12x widths
- Host Channel Adapter (HCA): protocol engine, moves data via messages queued in memory
- Switch: simple, low-cost, multistage network
- Target Channel Adapter (TCA): interface to I/O controllers (SCSI, FC-AL, GbE, …)
- Up to 6 GB/s bi-directional bandwidth
- Chips: IBM, Mellanox; PCI-X cards: Fujitsu, Mellanox, JNI, IBM
[Diagram: two hosts (CPU, memory controller, system memory, HCA) connected by links through a switch; TCAs attach I/O controllers]
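The "up to 6 GB/s" figure follows from the link widths: a 1x InfiniBand lane signals at 2.5 Gbit/s and carries 2 Gbit/s of data after 8b/10b coding, so a 12x link moves 12 x 2 = 24 Gbit/s ≈ 3 GB/s per direction, or about 6 GB/s bi-directionally.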

Clusters: Infiniband interconnect

Cluster/Farm Software: NPACI Rocks (NPACI = National Partnership for Advanced Computational Infrastructure)
- Open-source enhancements to Red Hat Linux (uses Red Hat Kickstart)
- 100% automatic installation – zero hand configuration
- One CD installs all servers and nodes in a cluster (PXE)
- Entire cluster-aware distribution: full Red Hat release, de-facto standard cluster packages (e.g. MPI, PBS), NPACI Rocks packages
- Initial configuration via a simple web page; integrated monitoring (Ganglia)
- Full re-installation instead of configuration management (cfengine)
- Cluster configuration database (based on MySQL)
- Sites using NPACI Rocks – Germany: ZIB Berlin, FZK Karlsruhe; US: SDSC, Pacific Northwest National Laboratory, Northwestern University, University of Texas, Caltech, …

Cluster/Farm Software: LinuxBIOS
- Replaces the normal BIOS found on Intel-based PCs, Alphas, and other machines with a Linux kernel that can boot Linux from a cold start.
- Primarily Linux: about 10 lines of patches to the current Linux kernel.
- Additionally, the startup code (about 500 lines of assembly and 1500 lines of C) executes 16 instructions to get into 32-bit mode and then performs the RAM and other hardware initialization required before Linux can take over.
- Provides much greater flexibility than a simple netboot.
- LinuxBIOS currently works on several mainboards.

PC Farm – a real-life example: High Performance Computing in a Major Oil Company (British Petroleum), Keith Gray
- Dramatically increased computing requirements in support of seismic-imaging researchers
- Moving from SGI-IRIX, SUN-Solaris and IBM-AIX systems to a Linux PC farm
- Hundreds of farm PCs, thousands of desktops
- GRD (SGEEE) batch system; cfengine for configuration updates
- Network Attached Storage; a SAN was considered too expensive (switches)

Algorithms: High Performance Computing Meets Experimental Mathematics (David H. Bailey)
- The PSLQ integer relation detection algorithm: recognize a numeric constant in terms of the formula that it satisfies.
- PSLQ is well suited to parallel computation and has led to several new mathematical results; some of these computations were performed on highly parallel computers, since they are not feasible on conventional systems (e.g. the identification of Euler zeta-sum constants).
- A new software package for performing the arbitrary-precision arithmetic required by this research.
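As a small illustration of what an integer relation algorithm does (using the mpmath library's pslq routine, not Bailey's own arbitrary-precision package), the sketch below recovers the integer relation hidden in a numerically computed constant:
# Hypothetical example with mpmath's PSLQ implementation.
from mpmath import mp, pi, e, pslq
mp.dps = 50                  # 50 decimal digits of working precision
x = 3*pi - 2*e               # pretend x is an "unknown" numeric constant
relation = pslq([x, pi, e])  # integers (a, b, c) with a*x + b*pi + c*e ~ 0
print(relation)              # e.g. [1, -3, 2], i.e. x = 3*pi - 2*e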

SC2002 Conclusions
- Big supercomputer architectures are back – Earth Simulator, Cray SX1, BlueGene/L
- Special architectures for LQCD computing are continuing
- Clusters are coming; the number of specialized cluster vendors providing hardware and software is increasing – LinuxNetworx (formerly AltaSystems), RackServer, …
- Trend towards blade servers – HP, Dell, IBM, NEXCOM, Mellanox, …
- The Intel Itanium server processor is widely accepted – a surprising success (also for Intel) at SC2002
- Infiniband could be a promising alternative to the specialized high-performance link products Myrinet and QsNet
- Standard tools for cluster and farm handling may emerge – NPACI Rocks, Ganglia, LinuxBIOS (for RedHat)
- Batch systems: an increasing number of SGE(EE) users (Raytheon)
- An increasing number of GRID projects (none mentioned here – room for another talk)