MODERN SUPERCOMPUTERS

Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P

Tianhe-2, a supercomputer developed by China’s National University of Defense Technology, retained its position as the world’s No. 1 system with a performance of 33.86 petaflop/s (quadrillions of calculations per second) on the Linpack benchmark, according to the 42nd edition of the twice-yearly TOP500 list of the world’s most powerful supercomputers. The list was announced Nov. 18 at the SC13 conference in Denver, Colo.
• http://www.top500.org/system/177999

Tianhe-2 (MilkyWay-2) – cont.

Site: National Super Computer Center in Guangzhou
Manufacturer: NUDT
Cores: 3,120,000
Linpack Performance (Rmax): 33,862.7 TFlop/s
Theoretical Peak (Rpeak): 54,902.4 TFlop/s
Power: 17,808.00 kW
Memory: 1,024,000 GB
Interconnect: TH Express-2
Operating System: Kylin Linux
Compiler: icc
Math Library: Intel MKL-11.0.0
MPI: MPICH2 with a customized GLEX channel

In a massive escalation of the supercomputing arms race, China has built Tianhe-2, a supercomputer capable of 33.86 petaflops — almost twice as fast as the US Department of Energy’s Titan, and topping the official Top 500 list of supercomputers by some margin. The US isn’t scheduled to build another large supercomputer until 2015, suggesting China will hold pole position for a long time to come. The computer has 32,000 Ivy Bridge Xeon CPUs and 48,000 Xeon Phi accelerator boards for a total of 3,120,000 compute cores, which are decked out with 1.4 petabytes of RAM. And of course the operating system is Linux.

The construction of Tianhe-2 (literally Milky Way-2) comes as a huge surprise, as it was originally scheduled for deployment in 2015. No one knows why China proceeded so quickly, but it’s fairly safe to assume that it’s a reaction to the DoE’s completion of Titan last year. Tianhe-2, which is currently being tested in a non-optimal space, is capable of 33.86 petaflops — when it’s deployed in its final location, however, and when any bugs have been ironed out, the theoretical peak performance will be 54.9 petaflops. Assuming that the US doesn’t accelerate its own supercomputing plans, the final form of Tianhe-2 will be almost four times faster than any other supercomputer in the world.
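
The per-node layout described later in these notes (two 12-core Xeons plus three 57-core Xeon Phis, 88GB of RAM per node) lines up with the headline totals above. A minimal sketch of the arithmetic, assuming the 88GB splits as 64GB of host DRAM plus 3 x 8GB of GDDR5 on the accelerator cards (an assumption, but it would also explain why the TOP500 "Memory" field reads 1,024,000 GB):

```python
# Sanity check of the Tianhe-2 totals quoted above.
# Assumed (not stated on the slide): 88 GB/node = 64 GB host DRAM + 3 x 8 GB on the Xeon Phis.
NODES = 16_000
CPUS_PER_NODE, CPU_CORES = 2, 12      # Xeon E5-2692 (Ivy Bridge), 12 cores each
PHIS_PER_NODE, PHI_CORES = 3, 57      # Xeon Phi 31S1P, 57 cores each
HOST_RAM_GB, PHI_RAM_GB = 64, 8       # assumed per-node / per-card split

cores_per_node = CPUS_PER_NODE * CPU_CORES + PHIS_PER_NODE * PHI_CORES
total_cores = NODES * cores_per_node
host_ram_gb = NODES * HOST_RAM_GB
total_ram_gb = NODES * (HOST_RAM_GB + PHIS_PER_NODE * PHI_RAM_GB)

print(f"cores per node: {cores_per_node}, total cores: {total_cores:,}")   # 195 -> 3,120,000
print(f"host RAM only:  {host_ram_gb:,} GB (the TOP500 'Memory' figure)")  # 1,024,000 GB
print(f"host + Phi RAM: {total_ram_gb:,} GB (~1.4 PB, per Dongarra)")      # 1,408,000 GB
```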

Tianhe-2 (MilkyWay-2) – cont.

System: TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P (Vendor: NUDT)

List      Rank   Total Cores   Rmax (TFlops)   Rpeak (TFlops)   Power (kW)
11/2013   1      3,120,000     33,862.7        54,902.4         17,808.00
06/2013
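
Two figures the table implies but does not state are the Linpack efficiency (Rmax/Rpeak) and the energy efficiency. A quick derivation from the 11/2013 row:

```python
# Derived metrics for the 11/2013 entry above.
rmax_tflops = 33_862.7    # Linpack (Rmax)
rpeak_tflops = 54_902.4   # theoretical peak (Rpeak)
power_kw = 17_808.0       # processors, memory and interconnect

linpack_efficiency = rmax_tflops / rpeak_tflops
gflops_per_watt = (rmax_tflops * 1_000) / (power_kw * 1_000)

print(f"Linpack efficiency: {linpack_efficiency:.1%}")                # ~61.7%
print(f"Energy efficiency:  {gflops_per_watt:.2f} GFlop/s per watt")  # ~1.90
```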

Xeon Phi / Many Integrated Core (MIC) hardware

From Larrabee to Knights Ferry: Intel’s MIC (pronounced “Mike”) began as a hybrid GPU/HPC (high-performance computing) product known as Larrabee.
* http://www.extremetech.com/extreme/133541-intels-64-core-champion-in-depth-on-xeon-phi

The Knights Ferry die, Aubrey Isle. Die size on KNF, at 45nm, was rumored to be roughly 700 mm².

Xeon Phi ramps up Knights Ferry; Intel isn’t giving many details yet, but we know the architecture will pack 50 or more cores and at least 8GB of RAM. In this space, total available memory is an important feature. Knights Ferry, with its 32 cores and a maximum of 2GB of RAM, could only offer 64MB of RAM per core; a 50-core Xeon Phi with 8-16GB of RAM would offer between 163MB and 327MB per core.
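
The memory-per-core comparison above is simple division; a small sketch, assuming binary units and the 50-core configuration mentioned in the text:

```python
# RAM per core, as discussed above (1 GB taken as 1024 MB).
def mb_per_core(ram_gb, cores):
    return ram_gb * 1024 / cores

print(f"Knights Ferry (2 GB, 32 cores):  {mb_per_core(2, 32):.0f} MB/core")   # 64
print(f"Xeon Phi (8 GB, 50 cores):       {mb_per_core(8, 50):.0f} MB/core")   # ~164
print(f"Xeon Phi (16 GB, 50 cores):      {mb_per_core(16, 50):.0f} MB/core")  # ~328
```

The small differences from the 163MB and 327MB figures quoted above are just rounding.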

To achieve a theoretical peak of 54.9 petaflops, Tianhe-2 has a mind-bogglingly insane hardware spec. There are a total of 125 cabinets housing 16,000 compute nodes, each of which contains two Intel Xeon (Ivy Bridge) CPUs and three 57-core Intel Xeon Phi accelerator cards. Each compute node has a total of 88GB of RAM. In total, according to a report [PDF] by Jack Dongarra of the Oak Ridge National Laboratory, there are a total of 3,120,000 Intel cores and 1.404 petabytes of RAM, making Tianhe-2 by far the largest installation of Intel CPUs in the world. We believe it’s also the largest amount of RAM for a single system, too.
* http://www.extremetech.com/computing/159465-chinas-tianhe-2-supercomputer-twice-as-fast-as-does-titan-shocks-the-world-by-arriving-two-years-early

Beyond the glut of x86 compute capacity, though, Tianhe-2 is notable for another reason: Except for the CPUs, almost all of the other components were made in China. The front-end system, which manages the actual operation of all the compute nodes, consists of Galaxy FT-1500 processors — 16-core SPARC chips designed and built by China’s National University of Defense Technology (NUDT). The interconnect (pictured below), also designed and constructed by the NUDT, consists of 13 576-port optoelectronic switches that connect each of the compute nodes via a fat tree topology. The operating system, Kylin Linux, was also developed by NUDT.

Tianhe-2 is currently located at the NUDT while it undergoes testing, but will be fully deployed at the National Supercomputer Center in Guangzhou (NSCC-GZ) by the end of 2013. The peak power consumption for the processors, memory, and interconnect is 17.6 megawatts, with the water cooling system bringing that up to 24MW — slightly below the gigaflops-per-watt efficiency record set by the DoE/ORNL/Cray Titan supercomputer. When Tianhe-2 is complete, its primary purpose will be as an open research platform for researchers in southern China.

With Tianhe-2, two Arch-2 network interface chips and two “Ivy Bridge-EP” Xeon E5 compute nodes (each with two processor sockets) are on a single circuit board, even though they are logically distinct. One compute node plus one Xeon Phi coprocessor share the left half of the board and five Xeon Phis share the right side. The two sides can be electrically separated and pulled out separately for maintenance. The Arch-2 NICs link to the Xeon E5 chipset through PCI-Express 2.0 ports on the NIC, which is unfortunate given the doubling of bandwidth with the move to PCI-Express 3.0 slots. (Maybe that is coming with the Arch-3 interconnect, if there is one on the whiteboard at NUDT?) There's one Arch-2 NIC per compute node; the three Xeon Phi coprocessors for each node link over three PCI-Express 3.0 x16 ports to the CPUs. Yup, the Xeon Phis can talk faster to the CPU than the CPU can talk to the Arch-2 interface. It is unknown how this imbalance might affect the performance of Tianhe-2.

First, here is a picture of the Tianhe-2 chassis. As we previously explained, based on a report of the machine put together by Jack Dongarra, a professor at the University of Tennessee and one of the stewards of the Linpack supercomputer benchmark, the Chinese government's National University of Defense Technology has done a bit of integrating with the updated “Sky River” machine. (Sky River is what Tianhe means when translated to English, and it is what we in the West call the Milky Way when we look to the night sky.)
* http://www.theregister.co.uk/2013/07/15/a_peek_inside_chinas_floptopping_tianhe2_supercomputer/
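
The bandwidth imbalance flagged above (the Arch-2 NIC on PCI-Express 2.0, the Xeon Phis on PCI-Express 3.0 x16) can be put in rough numbers. A back-of-the-envelope sketch using the standard per-lane rates and encoding overheads, not measured Tianhe-2 figures:

```python
# Rough per-direction bandwidth of a x16 link, to illustrate the imbalance noted above.
# PCIe 2.0: 5 GT/s per lane with 8b/10b encoding   -> 0.5 GB/s usable per lane.
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding -> ~0.985 GB/s usable per lane.
LANES = 16

pcie2_x16_gbs = 0.5 * LANES                  # ~8 GB/s
pcie3_x16_gbs = 8 * (128 / 130) / 8 * LANES  # ~15.75 GB/s

print(f"PCIe 2.0 x16 (Arch-2 NIC link): ~{pcie2_x16_gbs:.1f} GB/s per direction")
print(f"PCIe 3.0 x16 (Xeon Phi links):  ~{pcie3_x16_gbs:.2f} GB/s per direction")
print(f"ratio: ~{pcie3_x16_gbs / pcie2_x16_gbs:.1f}x")                  # each Phi link is roughly twice as fast
```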

The RSW switch blade for Tianhe-2

One set of RSW switches is rotated 90 degrees in parts of the system for reasons that don't make sense to me – yet. But here is how the components plug together:

How the compute nodes, switch, and backplane come together in Tianhe-2

That special version of the Xeon Phi that NUDT is getting is called the 31S1P, and the P is short for passive cooling. Dongarra said that this Xeon Phi card had 57 cores and was rated at 1.003 teraflops at double-precision floating point, and that is precisely the same feeds and speeds as the 3120A launched back in November with active cooling (meaning a fan). That 3120A had only 6GB of GDDR5 graphics memory, and the 31S1P that NUDT is getting has 8GB like the Xeon Phi 5110P card, which has 60 cores activated, which runs at a slightly slower clock speed, and which burns less juice and creates less heat. It is also 33 per cent more expensive at $2,649 in single-unit quantities. Anyway, with 48,000 of these bad boys, the Xeon Phi part of Tianhe-2 has 2.74 million cores and delivers a peak DP performance of 48.14 petaflops. Add 'em up, and you get 54.9 petaflops peak. *http://www.theregister.co.uk/Print/2013/06/10/inside_chinas_tianhe2_massive_hybrid_supercomputer/
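
The "add 'em up" step can be reproduced. A sketch of the peak arithmetic, where the per-CPU figure is an assumption on my part (Ivy Bridge sustaining 8 double-precision flops per core per cycle, i.e. 4-wide AVX add plus 4-wide AVX multiply, no FMA):

```python
# Reconstructing the 54.9 PF peak from the per-part figures quoted above.
XEON_COUNT, XEON_CORES, XEON_GHZ, DP_FLOPS_PER_CYCLE = 32_000, 12, 2.2, 8   # flops/cycle assumed
PHI_COUNT, PHI_TFLOPS = 48_000, 1.003

xeon_tflops_each = XEON_CORES * XEON_GHZ * DP_FLOPS_PER_CYCLE / 1_000   # 0.2112 TF per CPU
cpu_part = XEON_COUNT * xeon_tflops_each                                # 6,758.4 TF
phi_part = PHI_COUNT * PHI_TFLOPS                                       # 48,144.0 TF

print(f"CPU part: {cpu_part:,.1f} TF")
print(f"Phi part: {phi_part:,.1f} TF")
print(f"Total:    {cpu_part + phi_part:,.1f} TF")   # 54,902.4 TF, i.e. the 54.9 PF peak
```

Under that assumption the total lands exactly on the 54,902.4 TFlop/s Rpeak in the TOP500 entry.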

The Tianhe-2, like its predecessor, is also front-ended by a homegrown Sparc-based cluster. NUDT has created its own variant of the Sparc chip, called the Galaxy FT-1500, which has sixteen cores, runs at 1.8GHz, is etched in 40 nanometer processes, burns around 65 watts, and delivers around 144 gigaflops of double-precision performance. The front-end processor for the Tianhe-2 machine has 4,096 of these processors in its nodes, which gives another 590 teraflops.
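
The front-end numbers check out the same way (the efficiency line simply divides the quoted 144 gigaflops by the quoted 65 watts):

```python
# Front-end Galaxy FT-1500 contribution, from the figures quoted above.
FT1500_CHIPS, GFLOPS_PER_CHIP, WATTS_PER_CHIP = 4_096, 144, 65

print(f"front-end peak: {FT1500_CHIPS * GFLOPS_PER_CHIP / 1_000:,.1f} TF")              # ~589.8 TF ("another 590 teraflops")
print(f"per-chip efficiency: {GFLOPS_PER_CHIP / WATTS_PER_CHIP:.2f} GFlop/s per watt")  # ~2.22
```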

The TH Express-2 Arch interconnect created by NUDT

The Arch network has thirteen switches, each with 576 ports, at its top level, and that router chip, called the NRC and presumably short for network router chip, has a throughput of 2.76Tb/sec. The Arch network interface chip, called NIC of course, has a lot fewer pins (675 compared to 2,577 in the router) but is designed to have the same die size. This Arch NIC hooks into PCI-Express 2.0 slots on the compute nodes and it looks like there is one Arch port per node. These NIC ports hook into a passive electrical backplane and link local server nodes to each other. The thirteen 576-port switches are used to link groups of racks to each other in the fat tree setup (much like top-of-rack and end-of-row aggregation switches do in InfiniBand and Ethernet networks). Presumably this is done with optical cables. The backplane is running at 10Gb/sec or 14Gb/sec, according to the presentation by Xiangke, and it is not clear if these are two different dialable bandwidths or if different parts of the backplane must run at these two different speeds.

Dongarra said that a broadcast MPI operation was able to run at 6.36GB/sec on the Arch interconnect with a latency of around 9 microseconds with a 1KB data packet across 12,000 of the nodes in the Tianhe-2 system. Arch is said to be using a proprietary protocol, but I would bet it is a hodge-podge of different technologies and very likely a superset of InfiniBand. But that is just a hunch, and there are people much more qualified than this system hack to make better guesses.

The whole shebang has 125 compute racks and the thirteen Arch switch racks for a total of 138 racks, plus another 24 racks for storage (12.4PB in size) if you want to count that in the system. It runs the Kylin variant of Linux created by NUDT for the Chinese military as well as the H2FS file system. This beast consumes 17.4MW under load, and the closed-coupled, chilled water cooling system is designed to handle 24MW and - as it turns out - will warm up the water supply in Guangzhou.

The most interesting part of the Tianhe-2 system is probably the Arch interconnect, which is also known as TH Express-2. The heart of the Arch interconnect is a high-radix router, just like Cray’s (now Intel’s) “Gemini” and “Aries” interconnects, and like Aries it also uses a combination of electrical cables for short haul jumps and optical cables for long haul jumps. And like InfiniBand networks, Arch uses a fat tree topology, which is why many believe that Arch is, in fact, a goosed-up version of InfiniBand, but NUDT is claiming it is a proprietary protocol and frankly, we are in no position to argue.
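
The broadcast measurement Dongarra quotes (around 9 microseconds for a 1KB packet across 12,000 nodes) is the kind of number a simple MPI micro-benchmark produces. A minimal sketch of such a test, assuming mpi4py and numpy are available; it is not the benchmark NUDT ran and will obviously not reproduce TH Express-2 figures on ordinary hardware:

```python
# Minimal MPI broadcast timing sketch (run with e.g.: mpiexec -n 4 python bcast_bench.py).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

MSG_BYTES = 1024          # 1 KB payload, as in the quoted measurement
ITERATIONS = 1000
buf = np.zeros(MSG_BYTES, dtype=np.uint8)

comm.Barrier()            # line all ranks up before timing
t0 = MPI.Wtime()
for _ in range(ITERATIONS):
    comm.Bcast(buf, root=0)
comm.Barrier()
t1 = MPI.Wtime()

if rank == 0:
    per_bcast_us = (t1 - t0) / ITERATIONS * 1e6
    print(f"avg broadcast of {MSG_BYTES} B across {comm.Get_size()} ranks: {per_bcast_us:.1f} us")
```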

Links
CRAY T3E: https://cug.org/5-publications/proceedings_attendee_lists/1999CD/S99_Proceedings/S99_Papers/Frese/frese.html
TOP500 list, November 2013: http://top500.org/lists/2013/11/