1
Fujitsu Strategies and Solutions for High-Performance Computing
Cristian Antonucci, Maria De Luca Coruzzi, ENEA, 3 July 2012. Fujitsu #2 in Top500.
2
High Performance Computing with Fujitsu
HPC roots
K computer development
Solution positioning and use
3
Fujitsu HPC Servers - past, present and future
Fujitsu has been developing the most advanced supercomputers in the world since 1977, and PRIMERGY has been in HPC for 10 years. Product lines and milestones:
- F230-75APU: Japan's first vector (array) supercomputer (1977)
- Vector supercomputer series: VP Series; NWT* developed with NAL, No. 1 in Top500 (Nov. 1993) and Gordon Bell Prize winner (1994, 95, 96); VPP500; VPP300/700; VPP5000, world's fastest vector processor (1999)
- Scalar MPP series: AP1000
- Scalar supercomputer series: PRIMEPOWER HPC2500, world's most scalable supercomputer (2003); SPARC Enterprise; FX1, most efficient performance in Top500 (Nov. 2008); K computer, No. 1 in Top500 (June and Nov. 2011); PRIMEHPC FX10
- x86 cluster series: AP3000; HX600 cluster node; PRIMERGY RX200 cluster node, Japan's largest cluster in Top500 (July 2004); PRIMERGY BX900 cluster node; PRIMEQUEST; PRIMERGY CX400, the next x86 generation
- Roadmap: exascale and trans-exascale systems
*NWT: Numerical Wind Tunnel
4
K installation at RIKEN AICS* in Kobe
Introduction of the Kobe site. Kobe AICS facilities and aerial photo: air handling units, chillers, seismic isolation structure.
LINPACK: 10.51 PFlops
System memory: 1.27 PB
No. of racks: 864
No. of CPUs: 88,128
No. of cores: 705,024
No. of cables: > 20,000
3rd computer floor, Oct. 1st, 2010 (first 8 racks installed)
*AICS: Advanced Institute for Computational Science
Courtesy of RIKEN
5
"K computer" Achieves Goal of 10 Petaflops
Tokyo, November 2, 2011 — RIKEN and Fujitsu today announced that the "K computer", a supercomputer under their joint development, has achieved a LINPACK benchmark performance of 10.51 petaflops, with the highest performance efficiency in the Top500 (93%).
6
Fujitsu’s approach to the HPC market
Fujitsu covers HPC customer needs with tailored HPC platforms, from supercomputer through divisional and departmental down to workgroup level:
- FX10 petascale supercomputer
- PRIMERGY HPC solutions (new)
- Workgroup power workstations
7
Next level of supercomputing introduced Nov. ‘11
Fujitsu PRIMEHPC FX10, the next level of supercomputing, introduced Nov. '11.
Node:
- Theoretical computational performance: 236.5 GFLOPS
- Processor: SPARC64™ IXfx (1.848 GHz, 16 cores) x 1
- Memory capacity: 32 GB or 64 GB
- Memory bandwidth: 85 GB/s
- Inter-node transfer rate: 5 GB/s x 2 (bidirectional) per link
System:
- No. of racks: 4 to 1,024
- Nodes: 384 to 98,304
- Performance: 90.8 to 23,248 TFLOPS
- Total memory: 12 to 6,291 TB
- Interconnect: "Tofu" interconnect
- Cooling method: direct water cooling + air cooling (optional exhaust cooling unit)
SPARC64™ VIIIfx / IXfx RAS coverage ranges: hardware-based error detection with possible self-recovery, hardware-based error detection only, and ranges in which errors do not affect actual operation.
8
Processor features: single CPU as a node design
SPARC64™ IXfx architecture: SPARC64™ V9 + HPC-ACE (Arithmetic Computational Extensions)
- 1.848 GHz clock
- Floating-point arithmetic units in each core, with simultaneous floating-point arithmetic executions per core
- On-board InterConnect Controller (ICC)
- High performance per watt: 236.5 GFLOPS per CPU
- High-reliability and power-saving technologies
(Note: K = 7.99, a factor depending on the SPARC V9 architecture.)
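The peak figures above follow from clock x cores x operations per cycle. A minimal sketch of that arithmetic, assuming 8 double-precision floating-point operations per core per cycle (a number not stated on the slide, chosen because it reproduces the quoted 236.5 GFLOPS):

```python
# Minimal sketch: reproducing the quoted peak-performance figures.
# ASSUMPTION: 8 double-precision floating-point operations per core per cycle
# (not stated on the slide; chosen because it reproduces 236.5 GFLOPS).

clock_ghz = 1.848               # SPARC64 IXfx clock, from the FX10 spec slide
cores_per_cpu = 16
flops_per_core_per_cycle = 8    # assumed

peak_cpu_gflops = clock_ghz * cores_per_cpu * flops_per_core_per_cycle
print(f"Peak per CPU: {peak_cpu_gflops:.1f} GFLOPS")        # ~236.5 GFLOPS

# One FX10 node carries one CPU, so system peak scales with node count.
for nodes in (384, 98_304):     # 4-rack and 1,024-rack configurations
    print(f"{nodes:6d} nodes: {peak_cpu_gflops * nodes / 1000:,.0f} TFLOPS")
# ~91 TFLOPS and ~23,250 TFLOPS, consistent with the 90.8 to 23,248 TFLOPS
# system range quoted (the slide uses the rounded 236.5 GFLOPS per CPU).
```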
9
Tofu Interconnect: single CPU and single interconnect controller
Very fast node-to-node communication:
- 10 links for inter-node connection
- 10 GB/s per link (5 GB/s x 2, bidirectional)
- Low latency (min. 1.5 μs between adjacent nodes)
- Total 100 GB/s off-chip bandwidth feeds sufficient data to the high-performance CPU
Topology: Tofu, Fujitsu's original 6D mesh/torus interconnect, delivering high communication performance, high system scalability and high fault tolerance.
10
Tofu Interconnect: technology and topology
Technology:
- Very fast node-to-node communication: 5 GB/s x 2 (bidirectional)
- Low latency (min. 1.5 μs between adjacent nodes, max. 4.4 μs for a 1 PFLOPS configuration)
- Integrated MPI support for collective operations (barrier and reduction)
Topology:
- Physical topology: 6D torus/mesh, addressed by (x, y, z, a, b, c)
- 10 links per node: 6 links for the 3D torus and 4 links for the node group
- Node group: 12 nodes, (a, b, c) = 2 x 3 x 2, with a 3D connection of each node (the size-2 axes as mesh, the size-3 axis as torus)
- User/application view: logical 3D torus (X, Y, Z)
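To make the 6D addressing and the "10 links per node" figure concrete, here is a minimal sketch of neighbour enumeration in a 6D mesh/torus. The (x, y, z) sizes and the exact mesh/torus split of the group axes are illustrative assumptions, not the actual Tofu wiring:

```python
# Minimal sketch of neighbour addressing in a 6-D mesh/torus, following the
# description above: a node is addressed by (x, y, z, a, b, c); (x, y, z) form
# a torus and the 12-node group (a, b, c) = 2 x 3 x 2 mixes mesh and torus links.
# ASSUMPTIONS: the (x, y, z) sizes below and the mesh/torus split of the group
# axes are illustrative, not the actual Tofu wiring.

# (size, wraps) per dimension: wraps=True means a torus link, False a mesh link.
DIMS = [
    (6, True), (6, True), (6, True),    # x, y, z: 3-D torus (sizes are examples)
    (2, False), (3, True), (2, False),  # a, b, c: node group, only b wraps
]

def neighbours(coord):
    """Yield the coordinates directly linked to `coord`."""
    for dim, (size, wraps) in enumerate(DIMS):
        for step in (-1, +1):
            nxt = coord[dim] + step
            if wraps:
                nxt %= size                  # torus: wrap around the ring
            elif not 0 <= nxt < size:        # mesh: no link past the edge
                continue
            yield coord[:dim] + (nxt,) + coord[dim + 1:]

links = set(neighbours((0, 0, 0, 0, 0, 0)))
print(len(links), "links")   # 10 distinct links: 6 for the 3-D torus + 4 in the group
```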
11
High Performance Computing with Fujitsu
PRIMERGY x86-based HPC
12
Positioning of PRIMERGY HPC Portfolio
Positioning along two scales: density, HA, flexibility and capability on one axis; density, performance/cost and capacity on the other.
- RX350: maximum expandability and scalability of I/O slots, HDDs and RAM; Intel Xeon plus GPGPU card; the optimal solution for high-end performance, density, RAS and flexible I/O integration
- BX900: highest flexibility in combining different requirements (I/O, interconnect, fat nodes, thin nodes)
- CX400 (new), RX200, RX300: optimal integration of HPC capability and capacity requirements in standard rack infrastructures; Intel® Xeon® processor E5 family with up to 8 GT/s; a perfect bridge from desk side to cluster in a box
13
Modular HPC growth potential towards ….
Growth path from workgroup / departmental capacity (RX200) through the data center (BX400, CX400 new) up to the data center with BX900: scalability, density and energy costs on one side, scalability, density and availability capability on the other, with the flexibility to address all kinds of customer requirements.
NEW: skinless server PRIMERGY CX400
- Massive scale-out thanks to ultra-dense server design
- HPC GPU coprocessor support
- Latest-generation Intel® Xeon® processor E5 series
- Highest memory performance plus high reliability
- Low-latency / high-bandwidth InfiniBand infrastructure
Data center BX900:
- Industry-leading blade server density
- Industry-leading I/O bandwidth
14
PRIMERGY performance boost by Intel E5 “Sandy Bridge”
New HPC support functions in the Romley / Sandy Bridge platform (PRIMERGY S7/S3 benchmarks vs. the previous generation):
- Up to 2x FLOPS for HPC workloads with Intel® Advanced Vector Extensions (AVX)
- New operations to enhance vectorization
- Extended SSE FP register sizes
- Up to 4 channels of DDR3-1600
Performance advantage of up to 120% compared to the previous generation:
- Up to 70% more overall performance
- Up to 120% in HPC scenarios
This enables more workloads to run on the same system. Measured advantages over the previous-generation baseline (source: Intel):
- Efficiency: SPECpower +73%, SAP SD server power +43%
- Performance: HPC LINPACK +120%, SPECfp_rate_base2006 +80%, SPECint_rate_base2006 +70%, SAP SD +56%, VMmark 2.0 +39%
With increased performance of up to 120%, PRIMERGY dual-socket servers support today's requirements and meet future demand.
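As a rough illustration of where the "2x FLOPS" and memory-bandwidth claims come from, the following back-of-the-envelope calculation uses an assumed 2.7 GHz, 8-core Xeon E5 part; the clock and core count are not figures from the slide:

```python
# Back-of-the-envelope look at the "up to 2x FLOPS" and memory-bandwidth claims.
# ASSUMPTIONS: the 2.7 GHz clock and 8-core count describe a generic Xeon E5-2600
# part for illustration only.

clock_ghz, cores = 2.7, 8

# AVX widens the double-precision vector registers from 128 to 256 bits,
# doubling peak floating-point operations per core per cycle (4 -> 8).
for name, flops_per_cycle in (("SSE (128-bit)", 4), ("AVX (256-bit)", 8)):
    print(f"{name}: {clock_ghz * cores * flops_per_cycle:.1f} GFLOPS peak per socket")

# 4 memory channels of DDR3-1600 (1600 MT/s, 8 bytes per transfer) per socket.
channels, transfers_per_s, bytes_per_transfer = 4, 1600e6, 8
bw_gbs = channels * transfers_per_s * bytes_per_transfer / 1e9
print(f"Peak memory bandwidth: {bw_gbs:.1f} GB/s per socket")   # 51.2 GB/s
```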
15
Most energy efficient server in the world
Fujitsu PRIMERGY achieves a world record in energy efficiency and holds several best-in-class ratings: a world record in SPECpower_ssj2008, breaking the prestigious milestone of 6,000 overall ssj_ops/watt.
Reduce energy consumption and carbon footprint:
- Up to 73% more performance per watt compared to the previous generation
- Up to 33% less energy for the same performance level, making it easier to meet stringent environmental mandates for data centers
- Up to 66% more workloads on the current power budget without stressing existing data center cooling
16
PRIMERGY CX400 - HPC Design
CX400 combines high-performance computing with high density at a lower overall investment. High density and scalability in a 2U chassis; main usage scenarios are HPC and cloud. HPC requirements optimally fulfilled:
- Up to 4 nodes (1U) or 2 nodes (2U) per 2U chassis
- 2x Intel® Xeon® E5 processors per node (a further Intel® Xeon® processor E-family node coming soon)
- 16 DIMMs, up to 1600 MHz
- Redundant, hot-plug PSUs for enhanced availability and lower servicing effort
- Up to 24x HDD
- FDR InfiniBand interconnect option for the highest, most efficient bandwidth and lowest latency
- GPU option (2U node); support of Intel MIC planned for Q1/2013
17
PRIMERGY CX400 S1 chassis: shared power, shared cooling
Shared power: 2x PSU, 92% efficiency (80Plus Gold), hot-plug, redundant
Shared cooling: 4x 80 W fans, redundant, non-hot-plug
Shared drive bay: hot-plug disk drives with front access
Hot-plug server nodes with rear access, individually serviceable without disruption to the other nodes; rear cabling for I/O
Air flow: front-to-back
Size: 2U, 447 x 775 x 87 mm (W x D x H); fits into standard 19" racks, with no need for over-sized rack depth or rear-expander mechanics
(Front view: drive bays; rear view: hot-plug PSUs and 4x CX250 server nodes)
18
PRIMERGY CX2yy Server nodes
Double performance per U with condensed dual-socket server nodes.
CX250 S1: 1U server node, hot-plug, with 2 CPUs; standard node for HPC and cloud computing
- 2x latest Intel® Xeon® processor E5 family
- Up to 512 GB RAM
- 1x PCIe expansion slot + 1x mezzanine card
- Half-wide 1U server node, 4x per CX400
CX270 S1: 2U server node, hot-plug, with 2 CPUs + GPGPU option; HPC-optimized node with GPGPU acceleration
- 2x Intel® Xeon® processor E5 family
- 2x PCIe expansion slots
- 1x GPGPU option: NVIDIA Tesla 20 series (M2075 / M2090)
- Half-wide 2U server node, 2x per CX400
Per chassis: up to 64 Intel Xeon processor cores, up to 2 TB of memory (4 x 512 GB) and up to 36 TB of local storage in 2U height.
The PRIMERGY CX400 S1 represents the first generation of multi-node systems for large data centers as well as for small and medium-sized environments. It is a 2U rack-optimized enclosure that holds all the infrastructure components necessary for flexible operation of a wide range of application types. Thanks to its conventional front-to-back cooling and rear-side external connectivity, it can be installed and operated cost-effectively in existing rack infrastructures and air conditioning. A low level of complexity is achieved by avoiding shared fabrics, I/O or management components. The main components (server nodes, power supplies and disk drives) are hot-plug enabled for enhanced availability and reduced servicing effort. The focus on minimizing energy consumption is underlined by up to two redundant, highly efficient (92%) 80Plus Gold PSUs, which provide power for all built-in parts. Efficient cooling is guaranteed by the four centrally installed fan units. The system offers a free choice between server nodes without (1U) and with (2U) GPGPU (general-purpose computing on graphics processing units), depending on the characteristics of the specific applications. All server nodes are half-wide, so two of them can be placed side by side; up to four 1U or two 2U nodes fit per PRIMERGY CX400 enclosure, so density can be doubled compared to standard rack servers (see the density check below). Great flexibility is provided by the different types of local disk drives installed at the front of the enclosure: up to twelve 3.5" or twenty-four 2.5" drives can be used, either HDDs or SSDs, with SATA or SAS interfaces. The drives are assigned to the installed server nodes group-wise, adaptable to any demand and budget. The wide variety offered by the PRIMERGY CX400 multi-node system makes it an ideal base for use in HPC environments, virtual client connectivity areas, medium-sized application deployments and many more.
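A quick arithmetic check of the density claim, assuming a standard 42U rack (the rack height is an assumption, not a figure from the slide):

```python
# Quick check of the density claim above (two nodes per rack unit vs. one for a
# standard 1U rack server). ASSUMPTION: a 42U rack.

rack_units = 42
chassis_height_u, nodes_per_chassis = 2, 4      # CX400 chassis with CX250 nodes

standard_1u_servers = rack_units                # one server per rack unit
cx250_nodes = (rack_units // chassis_height_u) * nodes_per_chassis

print("Standard 1U servers per rack:", standard_1u_servers)   # 42
print("CX250 nodes per rack:        ", cx250_nodes)           # 84, double the density
```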
19
Storage & Parallel File System with Fujitsu
20
Fujitsu Exabyte FS architecture example
Example architecture: a PRIMERGY HPC cluster connects over an InfiniBand network to the FUJITSU EXABYTE FILE SYSTEM servers, consisting of a master node and file management node, a fail-over pair of MDS (metadata server) nodes, and fail-over pairs of OSS (object storage server) nodes (OSS 1 to OSS 4). These attach via Fibre Channel to FUJITSU ETERNUS DX controllers hosting the MDT (metadata target) and OST (object storage target) LUNs; an Ethernet network is used for management.
- A file is not limited to the maximum size of an OST
- A file's data blocks can be striped across multiple OSSs and OSTs (sketched below)
- File striping improves I/O bandwidth
- The aggregate I/O bandwidth is the sum of all OSSs that participate in the file system
- The file system size is the sum of all OSTs configured for each of the file system's OSSs
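A minimal sketch of the striping idea described above, with an assumed stripe size and OST list; this shows the general round-robin concept, not the FEFS API:

```python
# Minimal sketch: consecutive stripes of a file are placed on different OSTs
# (round robin), so a large transfer engages several OSSs at once.
# ASSUMPTIONS: the 1 MiB stripe size and the four-OST layout are illustrative.

STRIPE_SIZE = 1 << 20                       # 1 MiB stripes (assumed)
OSTS = ["OST0", "OST1", "OST2", "OST3"]     # OSTs assigned to this file (assumed)

def ost_for_offset(offset: int) -> str:
    """Return the OST that holds the stripe containing this byte offset."""
    return OSTS[(offset // STRIPE_SIZE) % len(OSTS)]

# A 16 MiB write touches all four OSTs, so its bandwidth approaches the sum of
# the participating OSS/OST bandwidths rather than that of a single server.
touched = {ost_for_offset(off) for off in range(0, 16 << 20, STRIPE_SIZE)}
print(sorted(touched))    # ['OST0', 'OST1', 'OST2', 'OST3']
```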
21
Fujitsu Exabyte File System (FEFS)
The main enhancements of FEFS are:
Scalability:
- File systems scale from terabytes up to a maximum of 8 exabytes
- Superior price/performance for clusters ranging from several dozen nodes up to those comprising a million servers
High performance:
- Up to 10,000 storage servers with the world's highest throughput of 1 TB/s
- Metadata management capable of creating several tens of thousands of files per second, with 1 to 3 times the performance of Lustre
High reliability:
- Built-in redundancy at all levels of the file system (such as RAID disks, InfiniBand multipath, and configurations of multiple servers and storage units) enables failover while jobs are running
QoS:
- Fair-share features for allocating resources among users prevent individual users from monopolizing I/O processing resources (see the sketch below)
- Per-node priority control settings enable I/O processing bandwidth control for each node
- Directory-level quotas allow efficient use of disk capacity by monitoring and managing file system usage per user
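A minimal sketch of the fair-share idea, splitting a total bandwidth budget in proportion to per-client weights; the weights, client names and policy are illustrative assumptions, not FEFS's actual implementation:

```python
# Minimal sketch: available I/O bandwidth is divided in proportion to per-client
# weights, so no single user or node can monopolise the storage servers.
# ASSUMPTIONS: weights, client names and the 1 TB/s total are illustrative.

def fair_share(total_bw_gbps, weights):
    """Allocate bandwidth proportionally to each client's weight."""
    total_weight = sum(weights.values())
    return {client: total_bw_gbps * w / total_weight for client, w in weights.items()}

allocation = fair_share(1000.0, {"login-node": 1, "job-A": 4, "job-B": 4})
for client, bw in allocation.items():
    print(f"{client}: {bw:.1f} GB/s")   # login-node gets 111.1, each job 444.4
```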
22
Specification of FEFS
System limits:
- Max. file system size: 100 PB (8 EB)
- Max. file size: 1 PB (8 EB)
- Max. number of files: 32G (8 x 10^18) files
- Max. OST size: 100 TB (1 PB)
- Max. number of stripes: 20K
Node scalability:
- Max. number of clients: 1M clients
- Max. number of OSTs: 20K
- Max. block size (backend file system): 512 KB
Usability:
- QoS (fair share / best effort): yes
- Directory quota
- InfiniBand multi-rail
23
Rack mountable modules
Product lineup of rack-mountable modules (3.5" and 2.5" types), from models with a maximum of 480 drives and 16 GB cache, through models with a maximum of 960 drives and 96 GB cache, up to the ETERNUS DX8700 S2.
ETERNUS DX8700 S2 offers a unique modular architecture: starting with minimal initial costs, storage can be flexibly expanded according to business needs.
DX8700 S2 key figures:
- Maximum drive number: 3,072 (with 2.5" drives)
- Maximum storage capacity (physical): SAS [TB] with 2.5" SAS 900 GB drives; nearline SAS [TB] with 3.5" nearline SAS 3 TB drives
- Maximum cache capacity: 768 GB
- Host interfaces (ports per device): FC 2/4/8G (128 ports), iSCSI 1G (64 ports), iSCSI 10G (64 ports), FCoE 10G (64 ports)
ETERNUS DX8700 S2 reduces the initial investment in a large-scale system environment and the operational costs of storage integration. Only a rack-mount model is available.
Supplementary note: the maximum storage capacity given here is physical capacity (1 kbyte = 1000 bytes); maximum logical capacity (1 kbyte = 1024 bytes) assumes RAID5 formatting. CE: controller enclosure; DE: drive enclosure; CM: controller module; CA: channel adapter.
24
PRIMERGY HPC Ecosystem
References and Summary
25
Building Blocks of PRIMERGY HPC Ecosystem
- ISV and research partnerships: Open Petascale Libraries Network, PreDiCT initiative
- PRIMERGY servers, PCM Edition cluster operation, ETERNUS storage
- Consulting and integration services: sizing and design, proof of concept, integration into the customer environment
- Certified system and production environment: complete assembly, pre-installation and quality assurance; ready-to-operate delivery ("Ready to Go")
26
HPC Wales – A Grid of HPC Excellence
Initiative's motivation and background:
- Position Wales at the forefront of supercomputing
- Promote research, technology and skills
- Improve economic development
- Create 400+ quality jobs and 10+ new businesses
Implementation and rollout:
- Distributed HPC clusters among 15 academic sites in Wales
- Sophisticated tier model with central hubs, tier-1 and tier-2 sites, and a portal for transparent, easy use of resources
- Final rollout of PRIMERGY x86 clusters in 2012
Performance and technology:
- PRIMERGY CX250 and BX922
- ~200 TFLOPS aggregated peak performance
- InfiniBand, 10/1 Gb Ethernet
- ETERNUS DX online SAN (home file system)
- Parallel file system (up to 10 GB/s), DDN Lustre
- Backup and archiving: Symantec, Quantum
Solution design:
- User-focused solution to access distributed HPC systems from a desktop browser
- Multiple components integrated into a consistent environment with single sign-on
- Data accessible across the entire infrastructure, automated movement driven by workflow
- Collaborative sharing of information and resources
Fujitsu value proposition:
- Best-in-class technology combination: latest Intel processor technology, mix of Linux and Windows OS, complete tuned software stack
- Full service engagement: completely integrated design, comprehensive engagement model at all levels (research, development, business), professional delivery management and governance, end-to-end program management
27
Summary: 35 years of leadership in HPC
…with the K computer, PRIMEHPC FX10 and PRIMERGY servers.
- Achieving petascale HPC today, and ready for exascale tomorrow
- HPC is instrumental for computational sciences and for product design in industry
- Fujitsu provides HPC solutions for every problem size and is enabling the adoption and efficient use of HPC in science and industry
28
Thank you! C. Antonucci, M. De Luca Coruzzi, Presales Practice