Published by Oswin Baldwin. Modified over 9 years ago.
1
NERSC User Group Meeting
Future Technology Assessment
Horst D. Simon
NERSC, Division Director
February 23, 2001
2
DOE Science Computational Requirements always outpace available resources

FY2001 Request:    13,645,300
FY2001 Awards:      7,532,200
FY2002 “Requests”: 20,448,000
FY2003 “Requests”: 28,358,000

MPP resources only. The FY02 and FY03 figures are estimates for NERSC planning only.
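The gap between requests and awards can be expressed as simple ratios. The following is a minimal Python sketch using only the four figures on the slide; the variable names are illustrative, and the slide does not state the unit of the figures, so the code treats them as plain allocation units.

```python
# Request and award figures as listed on the slide (unit not stated there).
fy2001_request = 13_645_300
fy2001_awards = 7_532_200
fy2002_request_estimate = 20_448_000   # planning estimate
fy2003_request_estimate = 28_358_000   # planning estimate

# Fraction of the FY2001 request that was actually awarded.
award_fraction = fy2001_awards / fy2001_request

# Year-over-year growth of the (estimated) requests.
growth_fy02 = fy2002_request_estimate / fy2001_request
growth_fy03 = fy2003_request_estimate / fy2002_request_estimate

print(f"FY2001 awarded fraction: {award_fraction:.1%}")
print(f"FY2002/FY2001 request growth: {growth_fy02:.2f}x")
print(f"FY2003/FY2002 request growth: {growth_fy03:.2f}x")
```

With these inputs, roughly 55% of the FY2001 request was awarded, and the estimated requests grow by about 1.5x and 1.4x in the following two years.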
3
Traditional NERSC Computational Strategy
Traditional strategy within existing NERSC Program funding:
- Acquire new computational capability every three years
  - 3 to 4 times capability increase of existing systems
- Early, commercial, balanced systems with focus on
  - stable programming environment
  - mature system management tools
  - good sustained to peak performance ratio
- Total value of $25M - $30M
  - About $9-10M/yr. using lease to own
- Have two generations in service at a time
  - e.g. T3E and IBM SP
- Phased introduction
4
Re-evaluation of the Strategy
In order to evaluate our strategy for NERSC-4 and beyond we employ:
- Trend analysis: determine target ranges for performance of future systems which assure the high-end capability for the Office of Science
- Technology analysis: understand the different technology options to get into the target range
- Constraint analysis: understand what is feasible (space, power, budget)
5
NERSC Peak Performance History
6
TOP500 - Performance Development [chart; marked systems: NERSC-1, NERSC-2, NERSC-3 Phase 1]
7
TOP500 Performance Development
8
Performance Development [chart; marked systems: NERSC-1, NERSC-2, NERSC-3, NERSC-4; annotation: "Peak!"]
9
Trend Analysis
- In order to maintain its flagship role, new NERSC capability systems should be in the TOP10 at installation time
- Aggregate system performance has accelerated because of Moore’s Law plus increased parallelism: expect a factor of 5-6 every three years
- NERSC-4 in late 2003 should have at least 20-30 Tflops LINPACK performance
- NERSC-5 in late 2006 should have at least 100-180 Tflops LINPACK performance
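The NERSC-5 target range is consistent with applying the expected three-year growth factor to the NERSC-4 range. A minimal Python sketch of that arithmetic follows; the function name `project_range` is hypothetical, and pairing the low factor with the low end (and the high factor with the high end) is an assumption about how the range was derived.

```python
def project_range(low, high, factor_low, factor_high, periods=1):
    """Scale a (low, high) performance range by a growth factor per period.

    Assumption: the low end grows by factor_low and the high end by
    factor_high, once for each three-year period.
    """
    return low * factor_low ** periods, high * factor_high ** periods


# NERSC-4 target range in Tflops LINPACK (late 2003), from the slide.
nersc4 = (20.0, 30.0)

# One three-year period later (late 2006), at a factor of 5-6.
nersc5 = project_range(*nersc4, factor_low=5, factor_high=6)

print(f"NERSC-5 projected range: {nersc5[0]:.0f}-{nersc5[1]:.0f} Tflops")
```

Under these assumptions the projection reproduces the 100-180 Tflops range quoted for NERSC-5.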
10
Extrapolation to the Next Decade [chart; marked systems: ASCI, Earth Simulator, Blue Gene]
11
2000 - 2005: Technology Options
Clusters
- SMP nodes, with custom interconnect
- PCs, with commodity interconnect
- vector nodes (in Japan)
Custom built supercomputers
- Cray SV-2
- IBM Blue Gene
Other technology to influence HPC
- IRAM/PIM
- low power processors (Transmeta)
- consumer electronics (Playstation 2)
- Internet computing
Slide annotations: not yet mature for NERSC-4; not general purpose
12
Cray SV2 Overview
- Basic building block is a 50/100 GFLOPs node:
  - 4 x CPUs per node. IEEE. Design goal is 12.8 GFLOPs per CPU.
  - 8, 16 or 32 GB of coherent flat shared memory per CPU
- SSI to 1024 nodes: 50/100 TFLOPs, 32 TB:
  - 100 GB/sec interconnect capacity to/from each node
  - ~1 microsecond latency anywhere in hypercube topology
- Targeted date of introduction: mid-2002
- LC cabinets; integral HEU (heat exchange unit)
  - Up to 64 cabinets (4096 CPUs/50 TFLOPS), mesh topology
  - availability 4Q2002
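The node and system figures on this slide can be cross-checked against the per-CPU design goal. The Python sketch below uses only numbers from the slide; the variable names are illustrative, and it covers the "50" figures only, since the slide does not say what the "100" variant corresponds to.

```python
# Figures from the slide.
gflops_per_cpu = 12.8   # design goal per CPU
cpus_per_node = 4
max_nodes = 1024        # single system image (SSI) limit
max_cabinets = 64
max_cpus = 4096         # quoted alongside the 64-cabinet limit

# Derived quantities.
node_gflops = gflops_per_cpu * cpus_per_node       # about 51.2
system_tflops = node_gflops * max_nodes / 1000     # about 52.4
cpus_per_cabinet = max_cpus // max_cabinets        # 64
nodes_from_cpus = max_cpus // cpus_per_node        # 1024

print(f"Per node:   {node_gflops:.1f} GFLOPs")
print(f"Per system: {system_tflops:.1f} TFLOPs at {max_nodes} nodes")
print(f"CPUs per cabinet: {cpus_per_cabinet}")
print(f"Nodes implied by {max_cpus} CPUs: {nodes_from_cpus}")
```

The derived values line up with the rounded "50 GFLOPs node" and "50 TFLOPs" figures, and 4096 CPUs at 4 CPUs per node matches the 1024-node SSI limit.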
13
Liquid-Cooled Cabinet (64 CPUs) [diagram; labeled parts: Incoming Power Box, Air Coil, FC-72 Filters, Router Modules, Node Modules, Power Supplies, Heat Exchanger, FC-72 Gear Pumps, I/O Cables]
Source: Cray Scalable Systems Update - Copyright Cray Inc, used by permission
14
2000 - 2005: Technology Options
Clusters
- SMP nodes, with custom interconnect
- PCs, with commodity interconnect
- vector nodes (in Japan)
Custom built supercomputers
- Cray SV-2
- IBM Blue Gene
Other technology to influence HPC
- IRAM/PIM
- low power processors (Transmeta)
- consumer electronics (Playstation 2)
- Internet computing
Slide annotations: not yet mature for NERSC-4; not general purpose; high risk
15
Global Earth Simulator
- 30 Tflop/s system in Japan
- completion 2002
- driven by climate and earthquake simulation requirements
- built by NEC
- CMOS vector nodes
16
Earth Simulator
17
Global Earth Simulator Building
18
Japanese Vector Platforms
In the 2002 - 2005 time frame these platforms do not offer any advantage compared to SMP clusters built by American commercial vendors:
- Distributed memory requires message passing
- Three levels of memory hierarchy require more complicated trade-offs for performance
- Similar space and power requirements
By 2003-4 a shared memory vector supercomputer will no longer be a capability platform.
NERSC could pursue this as a capacity platform in addition to NERSC-4 (at the expense of a smaller capability).
19
2000 - 2005: Technology Options
Clusters
- SMP nodes, with custom interconnect
- PCs, with commodity interconnect
- vector nodes (in Japan)
Custom built supercomputers
- Cray SV-2
- IBM Blue Gene
Other technology to influence HPC
- IRAM/PIM
- low power processors (Transmeta)
- consumer electronics (Playstation 2)
- Internet computing
Slide annotations: not yet mature for NERSC-4; not general purpose; high risk; no techn. advantage
20
Cluster of SMP Approach [diagram; labels: Processor, Building, Blue Gene]
21
10 - 100 Tflop/s Cluster of SMPs
Relatively low risk
- Systems are extensions of current product line to high end
- Several commercially viable vendors in the US
- Experience at NERSC
- Leverage from ASCI investment
The first ones are already on order
- LLNL is installing a 10 Tflop/s system now
- LANL just ordered a 30 Tflop/s Compaq system
22
100 - 1000 Tflop/s Cluster of SMPs (IBM Roadmap) [diagram; labels: Processor, Building, Blue Gene]
23
PC Clusters: Contributions of Beowulf
- An experiment in parallel computing systems
- Established vision of low cost, high end computing
- Demonstrated effectiveness of PC clusters for some (not all) classes of applications
- Provided networking software
- Conveyed findings to broad community (great PR)
- Tutorials and book
- Design standard to rally community!
- Standards beget: books, trained people, software … virtuous cycle
Adapted from Gordon Bell, presentation at Salishan 2000
24
Linus’s Law: Linux Everywhere
- Software is or should be free (Stallman)
- All source code is “open”
- Everyone is a tester
- Everything proceeds a lot faster when everyone works on one code (HPC: nothing gets done if resources are scattered)
- Anyone can support and market the code for any price
- Zero cost software attracts users!
- All the developers write lots of code
- Prevents community from losing HPC software (CM5, T3E)
25
Is a Commodity Cluster a Supercomputer?
The good
- Hardware cost: commercial off-the-shelf
- Majority of software is Open Source
- Popular and trendy (see TOP500 list)
- Well established programming model (MPI)
The bad
- Architecturally imbalanced
- Higher level of complexity in HW and SW
- User and system environment not fully featured like a supercomputer
26
Is a Commodity Cluster a Supercomputer? (cont.)
The unknown
- Real lifecycle costs
- Rate of improvement of software environment (system and user level)
- Performance and scalability
- Applicability to broad range of applications
What is the feasibility and cost-effectiveness of cluster systems for a high-performance production capability computing workload?
NERSC is currently evaluating these issues to prepare for NERSC-4
Major announcements about PC Clusters (Shell, NCSA)
27
Summary on Technology Assessment
Likelihood that technology will be chosen

Technology            NERSC-4 (FY2003)   NERSC-5 (FY2006)
Cluster of SMP        75%                40%
PC Cluster            20%                40%
Vectors (Japanese)    0.1%               0%
Custom built (SV-2)   4.9%               5% (or 0%??)
New technology        0%                 15%
28
How Big Can NERSC-4 Be?
- Assume delivery in FY 2003
- Assume no other space is used in Oakland until NERSC-4
- Assume cost is not an issue (at least for now)
- Assume technology still progresses
  - ASCI will have a 30 Tflop/s system running for over 2 years
29
Full Computer Room Available Space Phase B of OSF
30
How close is 100 Tflop/s?
- Available gross space in Oakland is 7,700 sf without major changes
  - Assume it is 70% usable
  - The rest goes to air handlers, columns, etc.
- That gives about 5,400 sf of space for racks
- IBM system used for estimates
  - Other vendors are similar
- Each processor is 1.5 GHz, to yield 6 Gflop/s
- An SMP node is made up of 32 processors
- 2 nodes in a frame
  - 64 processors in a frame = 384 Gflop/s per frame
- Frames are 32 - 36" wide and 48" deep
  - service clearance of 3 feet in front and back (which can overlap)
  - 3 by 7 is 21 sf per frame
31
Practical System Peak
Rack distribution
- 60% of racks are for CPUs
  - 90% are user/computation nodes
  - 10% are system support nodes
- 20% of racks are for switch fabric
- 20% of racks are for disks
5,400 sf / 21 sf per frame = 257 frames
277 nodes are directly used by computation
- 8,870 CPUs for computation
- system total is 9,856 (308 nodes)
Practical system peak is 53 Tflop/s
- 0.192 Tflop/s per node * 277 nodes
- Some other places would claim 60 Tflop/s
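The sizing arithmetic on this slide and the previous one can be reproduced step by step. The Python sketch below uses only figures from those two slides; the variable names are illustrative, and rounding down to whole frames and whole nodes is an assumption that happens to reproduce the quoted counts. The slide's 8,870 compute CPUs corresponds to 90% of 9,856; counting whole nodes gives 277 x 32 = 8,864.

```python
# Inputs from the two sizing slides.
gross_sf = 7_700
usable_fraction = 0.70
rack_sf = 5_400                 # slide's rounded value of 7,700 * 0.70
frame_sf = 3 * 7                # footprint plus service clearance
gflops_per_cpu = 6.0
cpus_per_node = 32
nodes_per_frame = 2
cpu_rack_fraction = 0.60
compute_node_fraction = 0.90

# Floor space to frames (assumption: round down to whole frames).
usable_sf = round(gross_sf * usable_fraction)             # 5390
frames = rack_sf // frame_sf                              # 257

# Frames to nodes and CPUs.
cpu_frames = int(frames * cpu_rack_fraction)              # 154
total_nodes = cpu_frames * nodes_per_frame                # 308
compute_nodes = int(total_nodes * compute_node_fraction)  # 277
total_cpus = total_nodes * cpus_per_node                  # 9856

# Peak performance.
tflops_per_node = cpus_per_node * gflops_per_cpu / 1000   # 0.192
practical_peak = compute_nodes * tflops_per_node
all_nodes_peak = total_nodes * tflops_per_node

print(f"{frames} frames, {total_nodes} nodes, {total_cpus} CPUs")
print(f"Practical peak: {practical_peak:.1f} Tflop/s on {compute_nodes} nodes")
print(f"Peak counting all nodes: {all_nodes_peak:.1f} Tflop/s")
```

Counting all 308 nodes gives about 59 Tflop/s, which is one plausible reading of the "60 Tflop/s" that other sites would claim; the slide itself does not say how that figure is obtained.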
32
NERSC-4
- Based on this analysis, NERSC can accommodate a 53 Tflop/s peak system in the existing facility with projected cluster of SMP technology
- Even at an optimistic cost estimate of $1-2M per Tflop/s, budgets will be the limiting factor
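A short calculation shows why budget, not floor space, is the limit. The Python sketch below combines the 53 Tflop/s peak and the $1-2M per Tflop/s estimate from this slide with the $25M - $30M total system value from the traditional strategy slide; using that earlier figure as the available budget is an assumption, and the variable names are illustrative.

```python
peak_tflops = 53                      # what the facility can hold
cost_per_tflop_musd = (1, 2)          # optimistic estimate, in $M per Tflop/s
traditional_budget_musd = (25, 30)    # assumed budget: traditional system value

# Cost of a system that fills the room.
system_cost_musd = tuple(peak_tflops * c for c in cost_per_tflop_musd)

# Peak the assumed budget could buy: worst case is the low budget at the
# high unit cost, best case is the high budget at the low unit cost.
affordable_low = traditional_budget_musd[0] / cost_per_tflop_musd[1]
affordable_high = traditional_budget_musd[1] / cost_per_tflop_musd[0]

print(f"Full-room system cost: ${system_cost_musd[0]}M - ${system_cost_musd[1]}M")
print(f"Affordable peak: {affordable_low:.1f} - {affordable_high:.1f} Tflop/s")
```

Under these assumptions a room-filling system would cost $53M - $106M, while the traditional budget buys between 12.5 and 30 Tflop/s.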
33
Outline
Role of NERSC in SciDAC
- DOE Topical Computing Facilities
- Enabling Technology Centers
- DOE’s Scientific Challenge Projects
34
SciDAC Overview
35
SciDAC adjustments to strategy
- SciDAC provides an accelerated strategy
- Increased funding by $5.8M/yr planned
- NERSC-3 contract has several options to allow upgrade of existing phase 2 system
- OASCR has not yet decided on level of incremental funding for NERSC platforms
- NERSC is preparing SciDAC platform strategy to maintain a balanced system and provide maximal capability to SciDAC users
- Planned funding would permit an upgrade of NERSC-3 to 5-6 Tflop/s peak at the expense of an unbalanced system
36
NERSC’s and LBNL’s role
- The role of the NERSC Center as Flagship Facility for SciDAC is well defined
- NERSC should be able to compete for Topical Centers
- NERSC must be an active participant in the development and deployment of new technology in the ETCs (Enabling Technology Centers)
- NERSC must be an active participant in the Scientific Challenge Teams
37
PDSF: a “Topical Facility” since 1996
- PDSF and NERSC hardware arrived at LBNL at the same time in 1996
- MICS agreed to dedicate 2 FTEs to PDSF operation and to integrate PDSF into NERSC
- PDSF at NERSC evolved into a unique resource for the HEP community
- PDSF strength: cost effective processing and easy access to NERSC HPSS system
- HENP experiments can draw upon resources and expertise within NERSC
- NERSC was stimulated to pursue R&D projects in
  - data intensive computing
  - distributed data access & computing
  - cluster computing
38
PDSF Users and Collaborations
ATLAS, D0, CDF, E895, E896, GC5, PHENIX, STAR
HENP groups which are using or have used (at a significant level) PDSF include: AMANDA, ATLAS, CDF, E871, E895, GC5, NA49, PHENIX, RHIC Theory, SNO, STAR
Specific software/production projects include:
- CERNlib port to T3E
- NERSC personnel (HCG & USG) helped with port of CERNlibs to T3E
- NERSC T3E was used for port of CERNlibs
- NERSC T3E provided 1/2 of data generated by STAR GEANT for first STAR Mock Data Challenge
- Pittsburgh Supercomputing Center T3E provided 1/2 of data
- Stored on HPSS
- Transferred using DPSS and pftp
39
Current PDSF Configuration
40
Enabling Technology Centers
NERSC/LBNL is currently engaged in proposal activities for ETCs, which leverage the experience of development and deployment in the center, and the research experience of scientific staff at LBNL
- Applied Mathematics (LBNL, LANL, …)
- Scientific Data Management (LBNL, LLNL, ORNL, ANL)
- Benchmarking and Performance Evaluation (LBNL, ORNL, LLNL, ANL)
- Systems Software (LBNL, ANL, ORNL …)
- Optimal Solvers (LLNL, ANL, LBNL …)
- Data Analysis and Visualization (ANL, LBNL, …)
41
Scientific Challenge Projects
NERSC is currently actively involved with the following pre-proposal activities:
- Climate (ANL, LLNL, NCAR, LANL): FY2000 funding
- Accelerator Modeling (SLAC, LANL): FY2000 funding
- Materials (ORNL, Ames Lab, ANL …)
- Astrophysics (LBNL-Physics, …)
- Fusion (PPPL, LLNL, …)