Jaguar Supercomputer

Topics Covered
– Introduction
– Architecture
– Location & Cost
– Benchmark Results
– Location & Manufacturer / Machines in the Top 500
– Operating Systems & Supported Languages
– Typical Applications
– Interesting Sibling
– Interesting Facts About the Assigned Machine
– Questions

Introduction
– Jaguar is ranked number 2 in the world on the Top500 list of the world's fastest computers.
– Jaguar is number 1 for open science, with major projects in climate science, astrophysics, fusion research, and many other critical areas.
– Jaguar achieved more than 85 percent of its theoretical peak of 119 teraflops on the High-Performance Linpack (HPL) benchmark.
– Jaguar also solved the largest Linpack problem ever: a matrix of order 2.2 million, containing nearly 5 trillion elements (a quick size check follows below).
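As a rough sanity check on that problem size, the sketch below recomputes the element count and the memory it implies, using only the figures quoted on the slide and assuming the 8-byte double-precision elements that HPL uses.

    /* Back-of-the-envelope check of the Linpack problem size quoted above.
     * Assumes 8-byte double-precision matrix elements, as HPL uses. */
    #include <stdio.h>

    int main(void) {
        double n        = 2.2e6;          /* matrix order from the slide      */
        double elements = n * n;          /* total elements in a dense matrix */
        double bytes    = elements * 8.0; /* 8 bytes per double               */

        printf("Elements: %.2e (~%.1f trillion)\n", elements, elements / 1e12);
        printf("Memory:   %.1f TB\n", bytes / 1e12);
        return 0;
    }
    /* Prints roughly 4.84e12 elements (~4.8 trillion) and ~38.7 TB,
     * consistent with the "nearly 5 trillion elements" figure above. */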

Architecture
Processor: AMD x86_64 Opteron, quad-core, 2300 MHz (9.2 GFlops per core)
Jaguar has two types of dedicated nodes: compute nodes and service nodes.
Compute nodes:
– Designed to run MPI tasks efficiently and reliably to completion (a minimal MPI sketch follows this list).
– Each compute node contains a quad-core 2.1 GHz AMD Opteron processor and 8 GB of memory. Aggregate system performance is approximately 263 TF.
– Approximately 600 TB are available in the scratch file system.
– The compute nodes run the Compute Node Linux (CNL) OS.
– Each node is connected to a Cray SeaStar router through HyperTransport, and the SeaStars are all interconnected in a 3-D torus topology.
– The resulting interconnect has very high bandwidth, low latency, and extreme scalability.
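To make "MPI tasks" concrete, here is a minimal, generic MPI program of the sort a compute node runs. It is not Jaguar-specific code; on a Cray XT system it would typically be compiled with the Cray cc compiler wrapper and launched across compute nodes with aprun.

    /* Minimal MPI sketch of the kind of task Jaguar's compute nodes run.
     * Generic MPI; not Jaguar-specific code. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total tasks in the job */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }

Each instance of the program becomes one MPI rank; a lightweight compute-node kernel such as CNL exists to run these ranks with as little interference as possible.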

Architecture
Service nodes:
– Designed to provide system and I/O connectivity and also to serve as login nodes from which jobs are compiled and launched.
– Jaguar has a total of 7,832 XT4 compute nodes in addition to input/output (I/O) and login service nodes.
– Each service node consists of a 2.6 GHz dual-core AMD Opteron processor with 8 GB of memory.
– Service nodes run SuSE Linux.

Architecture
Scalable Interconnect: XT4/XT5 internal interconnect
– The Cray XT5 system incorporates a high-bandwidth, low-latency interconnect based on the Cray SeaStar2+™ chip.
– The interconnect directly connects all the nodes in a Cray XT5 system in a 3-D torus topology, eliminating the cost and complexity of external switches and allowing easy expansion and upgrades to future high-performance Cray interconnect technologies.
– It combines communications processing and high-speed routing on a single device.
– Each communications chip is composed of a HyperTransport™ link, a Direct Memory Access (DMA) engine, a communications and management processor, a high-speed interconnect router, and a service port.
– The peak bidirectional bandwidth of each link is 9.6 GB/s, with sustained bandwidth in excess of 6 GB/s (a simple way to measure this from user code is sketched below).
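The usual way to check link latency and bandwidth figures like these from user code is an MPI ping-pong test between two ranks. The sketch below is generic MPI rather than a Cray-specific benchmark, and the message size and repetition count are illustrative.

    /* Minimal MPI ping-pong sketch for estimating link latency/bandwidth.
     * Generic MPI code; message size and repetition count are illustrative. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        const int bytes = 1 << 20;        /* 1 MiB message (illustrative) */
        const int reps  = 100;
        char *buf = malloc(bytes);
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {                   /* needs at least two ranks     */
            if (rank == 0) printf("run with at least 2 ranks\n");
            MPI_Finalize();
            return 1;
        }

        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time   */

        if (rank == 0)
            printf("one-way time %.1f us, bandwidth %.2f GB/s\n",
                   t * 1e6, bytes / t / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Small messages expose latency; large messages approach the sustainable link bandwidth.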

Architecture (system diagram)

Networking: InfiniBand network
– The XT4 and XT5 parts of Jaguar are combined into a single system using an InfiniBand network that links each piece to the Spider file system.
– The InfiniBand network SION (Scalable I/O Network) connects all major NCCS systems.
File system:
– Spider, a 10-petabyte Lustre-based shared file system, connects to every system in the ORNL computing complex (a parallel-I/O sketch follows this list).
– It will serve all NCCS platforms and connect to every internal network.
– Because all simulation data will reside on Spider, file transfers among computers and other systems will be unnecessary.
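Applications typically reach a shared file system like Spider through parallel I/O, for example MPI-IO, where every rank writes into one shared file instead of producing thousands of per-process files. The sketch below is generic MPI-IO, and the output path is hypothetical.

    /* Sketch of a parallel write to a shared (e.g. Lustre-backed) file system
     * using MPI-IO. Generic MPI code; the file name "output.dat" is hypothetical. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double data[1024];                       /* this rank's block of results */
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < 1024; i++)
            data[i] = rank;                      /* dummy payload                */

        /* Every rank writes its block at a distinct offset in one shared file. */
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_Offset offset = (MPI_Offset)rank * sizeof(data);
        MPI_File_write_at_all(fh, offset, data, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }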

Architecture
Scalable software:
– The Cray XT5 utilizes the Cray Linux Environment™ (CLE).
– The Linux environment features a compute kernel which can be configured to match different workloads.
– The Cray XT5 system maintains a single root file system across all nodes.

Location & Cost
Location: Oak Ridge National Laboratory (ORNL), United States
– ORNL is a multiprogram science and technology laboratory managed for the U.S. Department of Energy by UT-Battelle, LLC.
Cost: Difficult to pin down, as the computer has evolved constantly over time.

Benchmark Results
– Jaguar, a Cray XT5 system at ORNL, is #2 on the Top500 list of supercomputers.
– It achieves a maximal Linpack performance (Rmax) and a theoretical peak performance (Rpeak) both in the petaflop/s range (the peak figure is derived in the sketch below).
– The Top500 entry also reports core count, Rmax (GFlops), Rpeak (GFlops), and power.
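For a sense of where an Rpeak figure comes from, the deck's per-core number (9.2 GFlops at 2.3 GHz) implies 4 double-precision floating-point operations per clock. The sketch below multiplies that out over the 182,000 cores quoted in "Jaguar at a Glance"; it slightly over-estimates the quoted 1.64-petaflop combined peak because the XT4 partition runs at 2.1 GHz.

    /* Rough derivation of a theoretical-peak (Rpeak) figure from the numbers
     * quoted in this deck. Treats all 182,000 cores as 2.3 GHz parts, so it
     * slightly over-estimates the quoted 1.64 PF combined XT4/XT5 peak. */
    #include <stdio.h>

    int main(void) {
        double clock_hz        = 2.3e9;    /* Opteron clock from the slides        */
        double flops_per_cycle = 4.0;      /* double-precision flops per cycle     */
        double cores           = 182000.0; /* core count from "Jaguar at a Glance" */

        double per_core_gflops = clock_hz * flops_per_cycle / 1e9;
        double rpeak_pflops    = per_core_gflops * cores / 1e6;

        printf("Per-core peak: %.1f GFlops\n", per_core_gflops);   /* 9.2   */
        printf("System peak:   %.2f PFlops\n", rpeak_pflops);      /* ~1.67 */
        return 0;
    }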

Location & Manufacturer / Machines in the Top 500
– Location in Top 500: 2
– Machines in Top 10: 4
– Machines in Top 50: 10
Manufacturer, including machines in the top ranks (all Cray Inc.):
– Rank 2: Jaguar – Cray XT5, quad-core 2.3 GHz (Cray Inc.)
– Rank 7: Franklin (NERSC/LBNL) – Cray XT4, quad-core 2.3 GHz (Cray Inc.)
– Rank 8: Jaguar – Cray XT4, quad-core 2.1 GHz (Cray Inc.)
– Rank 9: Red Storm (Sandia) – Cray Red Storm XT3/4, 2.4/2.2 GHz dual/quad-core (Cray Inc.)

Operating Systems & Supported Languages
– Compute Node Linux (CNL) is the next generation of lightweight kernels for compute nodes on the Cray XT3 and Cray XT4 computer systems.
– The CNL operating system provides a runtime environment based on the SUSE SLES distribution.
Supported languages and libraries:
– Compilers: C, C++, Fortran
– Math libraries: BLAS, FFTs, LAPACK, ScaLAPACK, SuperLU (a small BLAS example follows this list)
– Scientific libraries: Cray Scientific Library
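As a concrete example of the math libraries listed above, the sketch below calls the BLAS routine DGEMM (dense matrix multiply) through the portable CBLAS interface. On Jaguar the BLAS implementation would come from the Cray scientific library; the header name and link flags shown here are the generic CBLAS ones and may differ on a given installation.

    /* Sketch of calling the BLAS routine DGEMM (C = alpha*A*B + beta*C)
     * through the portable CBLAS interface. Header/link details vary by site. */
    #include <stdio.h>
    #include <cblas.h>

    int main(void) {
        /* 2x2 matrices stored in row-major order */
        double A[4] = {1.0, 2.0,
                       3.0, 4.0};
        double B[4] = {5.0, 6.0,
                       7.0, 8.0};
        double C[4] = {0.0, 0.0,
                       0.0, 0.0};

        /* C = 1.0 * A * B + 0.0 * C */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

        printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
        return 0;
    }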

Typical Applications
– Energy assurance: Petascale leadership systems will arm scientists with better data to aggressively pursue renewable energy sources and to exploit conventional energy options more efficiently and safely.
– Climate: The potential of petascale simulations to clarify the evolution of the climate system is difficult to overstate. Nearly every aspect of climate simulation stands to benefit from the upcoming petascale systems.
– Materials: In materials science, innovations made possible by petascale computing promise to bolster American competitiveness in multiple technological sectors.
– Biology: Biologists will use petaflop computers for detailed studies showing how proteins carry out crucial tasks. Simulations of larger structures at longer timescales and finer resolution will allow exploration of protein structure and behavior.

Interesting Sibling
Kraken, at the University of Tennessee (UT). It is a Cray XT4, but as of February 2 it has been replaced by an XT5 and will probably jump higher on the Top500 list. It is also located very near Jaguar: it belongs to the University of Tennessee and is part of a partnership with ORNL, the owner of Jaguar.

Interesting Facts About the Assigned Machine
Jaguar is the most powerful supercomputer in the world for open scientific use. Its peak performance is more than 119 trillion calculations per second (119 teraflops). To support this extraordinary concentration of computing power, the NCCS has put in place high-speed fiber-optic networks to expedite data movement.

Jaguar at a Glance
– Cray XT
– Top500 rank: 2
– 1.64-petaflops peak theoretical performance
– Petaflop-class actual performance on the HPL benchmark program
– 182,000 processing cores
– AMD quad-core Opteron™ 2.3 GHz processors
– InfiniBand network
– Cray SeaStar network interface and router
– 362 terabytes of memory
– 578 terabytes per second of memory bandwidth
– 284 gigabytes per second of input/output bandwidth
– 10-gigabyte-per-second connections to ESnet and Internet2 networks
– High-Performance Storage System scales to store increasing amounts of data
– Spider, a 10-petabyte Lustre-based shared file system, connects to every system in the ORNL computing complex
– Disk subsystem transfers data at greater than 200 gigabytes per second

Questions??