Better answers Compaq HPTC Solutions Bruce Foster, Ph.D., MBA

Presentation transcript:

Better answers Compaq HPTC Solutions Bruce Foster, Ph.D., MBA

Top100 SuperComputer Architectures (June 1999)

The Barriers to Performance Scaling
[Figure: CPU cycle time (nsec) versus number of CPUs for SC (PVP), SMP, Cluster, Farm, and MPP systems, bounded by physical limits and complexity limits, with a 100 to 200 GFLOP limit marked.]

Clusters of SMPs Are Breaking Through to the TeraFlop Level
[Figure: CPU cycle time (nsec) versus number of CPUs for SC (PVP), SMP, Cluster, Farm, and MPP systems, bounded by physical limits and complexity limits.]
Fastest microprocessors with best interconnects for SMP clusters yield maximum application performance (TeraFLOP level).

High Performance Computing Systems
HPTC Solutions: AlphaServer Systems, Interconnects, Software, Services

Compaq is in it for the long haul!
- Alpha roadmap committed for 10 years and beyond of performance leadership.
  - Tandem will use Alpha in their next generation systems.
  - Tandem owns 36 of the top 38 stock markets worldwide.
- Over 50% of Compaq's revenue is from Enterprise Systems.

Wide Presence in HPTC market
- Intel/ServerNet clusters at NCSA
- Alpha Linux/ServerNet at Caltech
- Alpha Tru64 UNIX/Fast Ethernet at Swinburne
- Alpha Linux/Myrinet "C-Plant" at Sandia (#44 on Top500 list)
- HPTi win at FSL (Alpha Linux/Myrinet), 4 TFlop system
- Compaq Visual Fortran for W95/NT
- Compaq Compilers for Alpha/Linux
- Several very large SC systems (#34 on Top500 list)
- Celera: 300 x 4 CPU ES40s (1.2 TFlop)
- ASCI PathForward and ASCI Turquoise

1999 Small and Medium AlphaServers
- Compaq DS10
- Compaq DS20 System
  - 2 CPUs, small PC tower
  - 5.13 GB/s peak, 1.3 GB/s single-CPU McCalpin memory bandwidth
- Compaq ES40 System
  - 4 CPUs, bigger cabinet
  - EV67 systems: 2.5 GB/s 4-CPU McCalpin bandwidth
  - Double the I/O bandwidth and more slots

Next Generation DS/ES AlphaServers: Designed to Protect Your Investment
[Figure: first-, second-, and third-generation feature roadmap across Processor, Architecture, Memory, and Storage. Labels on the figure: 83 MHz and 125 MHz data bus; 4 MB and 8 MB L2 cache; 16 GB and 32 GB of memory; 2 64-bit PCI busses at 33 MHz PCI; 4 PCI busses at 66 MHz PCI; AGP; Ultra2; Ultra2 64-bit RAID; Ultra3 SCSI; DVD.]
Note: Feature set varies between AlphaServer DS and ES products based on customer needs.

SC'99
- 16x4 ES40 => 64 CPUs
- Quadrics Interconnect
- 1.7TB Storage

[Figure: LINPACK NxN Rmax (GFlops)]


Cluster and Parallel File System
- Cluster File System
  - File system mounted on any node is visible to all nodes without race conditions
  - Each node is both a CFS server and CFS client
  - Coherency is maintained by exchanging tokens
  - Semantics are POSIX and X/OPEN compliant
  - Performance depends on access type and pattern
- Parallel File System
  - Aggregates CFS files into a single parallel file
  - Enables striping a single logical file across multiple underlying local files
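The striping idea in the Parallel File System bullets can be sketched as an address mapping. This is only an illustration and not the actual PFS implementation: the round-robin layout, the `stripe_size` parameter, and the function names `locate` and `split_write` are assumptions introduced here.

```python
def locate(offset, stripe_size, n_files):
    """Map a byte offset in the logical file to (file index, offset in that file).

    Assumes round-robin striping: stripe 0 goes to file 0, stripe 1 to file 1, ...
    """
    stripe = offset // stripe_size        # which stripe holds this byte
    file_index = stripe % n_files         # round-robin choice of underlying file
    local_stripe = stripe // n_files      # full stripes already placed in that file
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return file_index, local_offset


def split_write(offset, length, stripe_size, n_files):
    """Break one logical write into (file index, local offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        file_index, local_offset = locate(offset, stripe_size, n_files)
        # A single piece never crosses a stripe boundary.
        chunk = min(end - offset, stripe_size - offset % stripe_size)
        pieces.append((file_index, local_offset, chunk))
        offset += chunk
    return pieces


if __name__ == "__main__":
    # Hypothetical sizes: 4-byte stripes over 3 underlying files.
    for piece in split_write(offset=2, length=8, stripe_size=4, n_files=3):
        print(piece)
```

Because consecutive stripes land on different underlying files, a large sequential access is spread over several files at once, which is where the aggregate bandwidth of such a layout would come from.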

Compilers & Tools
- Compaq F90, C, C++, Java, …
- Shared memory
  - Parallelization within SMP node by OpenMP
  - 3rd party decomposition tools (KAI)
- Cray T3D/E-compatible Shmem library
- MPI (MPI 2, MPI-I/O, thread-safe)
- Debugger: TotalView (Etnus, Inc.)
- Performance analysis: Vampir (PALLAS GmbH)
- Load balancing: LSF (Platform Computing)
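The combination listed above (shared-memory parallelism inside an SMP node, message passing between nodes) implies a two-level split of the work. The sketch below only computes which index range each (node, thread) pair would own. It does not call OpenMP or MPI, and the balanced block split and the names `block_range` and `two_level_split` are assumptions made for illustration.

```python
def block_range(n_items, n_parts, part):
    """Half-open [start, stop) range owned by `part` in a balanced block split."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)   # the first `extra` parts get one more item
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def two_level_split(n_items, n_nodes, threads_per_node):
    """Split items across nodes first, then across threads inside each node."""
    layout = {}
    for node in range(n_nodes):
        node_start, node_stop = block_range(n_items, n_nodes, node)
        for thread in range(threads_per_node):
            t_start, t_stop = block_range(node_stop - node_start,
                                          threads_per_node, thread)
            layout[(node, thread)] = (node_start + t_start, node_start + t_stop)
    return layout


if __name__ == "__main__":
    # 16 nodes of 4 CPUs each, the configuration shown on the SC'99 slide.
    layout = two_level_split(n_items=1000, n_nodes=16, threads_per_node=4)
    print(len(layout), "workers; first:", layout[(0, 0)], "last:", layout[(15, 3)])
```

In a real code the outer level would map to message-passing ranks and the inner level to threads of a parallel loop; here both levels are plain Python loops.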

Our Capability Machine is Here
- A 16-CPU AlphaServer at SC'99
  - 16-way GS160 AlphaServer
  - 16 * 1.46 GF/CPU = 23.4 GFLOPS
  - High sustainable memory bandwidth
- 32-way:
  - 32 CPUs: 46.8 GFLOPS
  - Very high sustainable memory bandwidth
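The peak figures on this slide can be reproduced with a few lines of arithmetic. The sketch assumes the 731 MHz EV67 clock quoted elsewhere in this deck and two floating-point operations per cycle; the second value is an assumption that happens to match the 1.46 GF/CPU figure, and `peak_gflops` is a name made up for this example.

```python
CLOCK_GHZ = 0.731      # EV67 clock quoted elsewhere in the deck (731 MHz)
FLOPS_PER_CYCLE = 2    # assumption: one FP add and one FP multiply per cycle


def peak_gflops(n_cpus, clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak: CPUs x clock x flops per cycle (no efficiency losses)."""
    return n_cpus * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    for n in (1, 16, 32):
        print(f"{n:2d} CPUs: {peak_gflops(n):.3f} GFLOPS peak")
```

Under these assumptions one CPU comes out at 1.462 GFLOPS, and the 16-way and 32-way totals round to the 23.4 and 46.8 GFLOPS shown on the slide. These are peak numbers, not measured application performance.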

Alpha Microprocessor Summary
- EV6 (21264)
  - .35 µm, MHz
  - 4-wide superscalar
  - Out-of-order execution
- EV67 (21264a)
  - .25 µm, MHz
  - 8MB L2 cache
- EV68 (21264b)
  - .18 µm, MHz
- EV7 (21364)
  - .18 µm, ~1200 MHz
  - L2 cache on-chip
  - RAMBUS
  - Glueless MP
- EV8 (21464)
  - .13 µm, ~1500 MHz
  - 8-wide superscalar
  - SMT
- ...
Future Alpha microprocessors planned through to 2025!

EV67/667 MHz Preliminary HPTC Applications Results
- 30 to 45% improvement over ES40 EV6/500 MHz
- Competitive leadership
  - 1.15 to over 2 times HP N4000 (better than an 8 CPU N4000)
  - Over 2 times SGI Origin 2000 (better than an 8 CPU Origin 2000)
  - Over 2 times Sun UE3000
  - 2 to 4 times Intel Xeon III

New High-end AlphaServer Architecture: A new way of looking at Servers
[Figure: a Global Switch connecting eight building blocks, each made of EV67 CPUs, memory, I/O, and a local switch.]
- Each Quad Building Block
  - 4 EV67 CPUs (731 MHz, 1.46 GFlops)
  - 4 Memory Arrays (total of 16GB, 32-way)
  - 6.4 GB/s Local Switch
  - 28 PCI slots
- Quads aggregate via a Global Switch (8 ports)
  - Combines up to 8 quads
  - High Bandwidth, Low Latency
  - Preserves SMP programming model
- Up to 8 System Partitions
  - Hardware firewalls provide software fault isolation between partitions
  - Can be dynamically reconfigured
  - Support multiple instances and versions of the same O/S, or different O/Ss completely (Tru64 UNIX, OpenVMS, and soon Linux)
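How the quad building blocks add up can be sketched as follows. The per-quad values are taken from the slide; the `Quad` class and the `system_totals` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

MAX_QUADS = 8  # the Global Switch has 8 ports


@dataclass(frozen=True)
class Quad:
    """Per-quad resources as listed on the slide."""
    cpus: int = 4
    memory_gb: int = 16
    pci_slots: int = 28


def system_totals(n_quads, quad=Quad()):
    """Aggregate resources of a system built from `n_quads` identical quads."""
    if not 1 <= n_quads <= MAX_QUADS:
        raise ValueError(f"n_quads must be between 1 and {MAX_QUADS}")
    return {
        "cpus": n_quads * quad.cpus,
        "memory_gb": n_quads * quad.memory_gb,
        "pci_slots": n_quads * quad.pci_slots,
    }


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "quads:", system_totals(n))
```

A full 8-quad system comes out at 32 CPUs and 224 PCI slots, which agrees with the GS320 CPU count and the GS Series PCI slot count given on later slides.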

Overview of CY2000
- CPUs/SMP
  - DS10 (1 CPU),
  - DS20 (2 CPUs),
  - ES40 (4 CPUs), and
  - GS80 (8), GS160 (16) and GS320 (32)
- Systems up to 4096 CPUs
  - 128-way
- Microprocessor speed
  - Around 1 GHz at end-2000

Systems Area Network: FAST Message Passing
- Quadrics
  - Backbone of our AlphaServer SC systems.
  - High Bandwidth, Low Latency, High Node/CPU Count
  - It's a PCI card; this allows systems of both small and big servers.
- ServerNet
  - Engineered for low per-node SAN cost.
  - Brings Tandem Non-Stop technology to Alpha Linux Beowulfs
- Myrinet
  - Ties together hundreds of Alphas on Sandia's C-Plant.
- Ethernet/Fast Ethernet
  - Low cost interconnect for medium size systems (Alpha at Swinburne, Sydney Uni (Gordon Bell winner), CSIRO multiple divisions)

Customer Comments: Alpha and Red Hat
- Comments from "The Center for the Neural Basis of Cognition":
  - It runs about six times faster on that {DS20} machine than on a Pentium II 400.
- Comments from West Coast University math department:
  - PII k cache, g77 -O3: 75:02
  - Celeron 450A-128K cache, g77 -O3: 74:44
  - Alpha MB cache, g77 -O3: 29:27
  - Alpha MB cache, g77 -O3: 17:16
  - Alpha MB cache, fort -O3: 8:42
  - I'm impressed (both with the AlphaServer and Compaq Fortran). It's a 5 mesh fluid flow used for modeling blood flows.
- Comments from Canadian University:
  - With your Fortran compiler the DS20 is about 3.5x the speed of an SGI Origin 200 with a 180 MHz R10K CPU, pretty impressive.
- Callouts: 9 times! 6 times! 3.5 times!
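The "9 times" callout can be checked against the run times quoted by the math department. The sketch assumes the times are minutes:seconds (the ratios are the same if they are hours:minutes) and uses the PII run as the baseline; the labels and the function names are shorthand introduced here.

```python
def to_seconds(stamp):
    """Convert an 'MM:SS' string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# (label, elapsed time) pairs copied from the slide, in slide order.
TIMINGS = [
    ("PII, g77 -O3", "75:02"),
    ("Celeron 450A, g77 -O3", "74:44"),
    ("Alpha (first entry), g77 -O3", "29:27"),
    ("Alpha (second entry), g77 -O3", "17:16"),
    ("Alpha, fort -O3", "8:42"),
]


def speedups(timings, baseline_index=0):
    """Speedup of every run relative to the chosen baseline run."""
    base = to_seconds(timings[baseline_index][1])
    return [(label, base / to_seconds(stamp)) for label, stamp in timings]


if __name__ == "__main__":
    for label, factor in speedups(TIMINGS):
        print(f"{label:32s} {factor:4.1f}x")
```

Under these assumptions the fastest run is roughly 8.6 times quicker than the PII baseline, which the slide rounds up to 9 times.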

Complete Suite of HPTC Systems
- DS Series
  - 1-2 processors
  - Up to 4GB of memory
  - 6 PCI slots
- ES Series
  - 1-4 processors
  - Up to 16GB of memory
  - Up to 10 PCI slots
- GS Series
  - 1-32 processors
  - Up to GB of memory
  - Up to 224 PCI slots
- SC Series
  - EV MHz processors
  - Up to 2 TB memory
  - Up to 1.2K I/O slots
- Across the range
  - Switch-based system
  - 64-bit PCI I/O subsystems
  - Very Large Memory
  - Scalable clusters on DIGITAL UNIX, OpenVMS and Linux
  - Modular system packaging, advanced systems management
[Figure labels: Apr, Feb, May, Coming Soon, Announcing]

Thank You! Please visit our HPTC Web Site or send to Steve Tolnai or myself