K-computer and Supercomputing Projects in Japan
Makoto Taiji
Computational Biology Research Core, RIKEN
Planning Office for the Center for Computational and Quantitative Life Science
& Processor Research Team, RIKEN Advanced Institute for Computational Science

Agenda
– K computer
– Advanced Institute for Computational Science
– High Performance Computing Infrastructure
– My own perspective on the future of HPC, and MDGRAPE-4 (briefly)

My Background
– Physics
– Special-purpose computers for scientific simulations (1986~)
  – Monte Carlo simulations of spin systems (1986, m-TIS I)
  – FPGA-based reconfigurable machine (1990, m-TIS II)
  – Gravitational N-body problems (1992~96, GRAPE-4, 5)
  – Molecular dynamics simulations (1994~, MD-GRAPE, MDM, MDGRAPE-3, 4)
  – Dense matrix calculation, quasi-general-purpose machine (MACE, 2000)
– Ultrafast laser spectroscopy (1987~92)
  – Conjugated polymers
  – Rhodopsin and bacteriorhodopsin
– Learning processes as dynamical systems, multi-agent dynamics (1996~2002)
– Physical random number generator (1997~2004)

World situation of HPC (Top 500): Japan's country share has dropped to 6th position.

Next-Generation Supercomputer Project
– National project to develop a leading general-purpose supercomputer in Japan
– Not for a single purpose (cf. Earth Simulator)
– Location: Kobe Port Island
– Developer: Fujitsu
– Linpack 10 PetaFLOPS
– Partial operation: Spring 2011; full service: Autumn 2012
(Figure: CG rendering of the K computer system.)

Location of the K computer and the Advanced Institute for Computational Sciences: Kobe Port Island, about 5 km from Sannomiya (12 min. by Portliner), near Kobe Airport and the core facilities of the Kobe Medical Industry Development Project. (Map and site photo, June 2006.)

RIKEN Advanced Institute for Computational Science: a national center covering a wide range of fields in computational science and engineering.

Formation of a Central Hub in Kobe
– Strategic Use: the strategic regions, with "strategic organizations" organizing the research
– Public Use: academia and industry, with a registered organization selecting applications and providing user support
– Operation Organization Use: the Advanced Institute for Computational Science (Director: Dr. Kimihiko Hirao) handles operation and refinement of the supercomputer and interdisciplinary research across computer science and the computational sciences

RIKEN Advanced Institute for Computational Science: organization
– Director and Deputy Director
– Operation Technology Division
– Research Promotion Division
– Research Division
  – Computational science research:
    – Field Theory Research Team (TL: Yoshinobu Kuramashi)
    – Computational Biophysics Research Team (TL: Yuji Sugita)
    – Computational Materials Science Research Team (TL: Seiji Yunoki)
    – Computational Molecular Science Research Team (TL: Takahito Nakajima)
  – Computer science research:
    – System Software Research Team (TL: Yutaka Ishikawa)
    – Processor Research Team (TL: Makoto Taiji)

Grand Challenge Applications
– Next-Generation Integrated Nano-Science Simulation Software (2006–2011). Base site: Institute for Molecular Science. Goal: to create next-generation nano-materials (new semiconductor materials, etc.) by integrating theories (such as quantum chemistry, statistical dynamics, and the electron theory of solids) and simulation techniques in the fields of next-generation information functions/materials, nano-biomaterials, and energy.
– Next-Generation Integrated Life-Science Simulation Software (2006–2012). Base site: RIKEN Wako Institute. Goal: to provide new tools for breakthroughs against various problems in life science by means of petaflops-class simulation technology, spanning scales from proteins and DNA through cells, tissues, and organs to the whole body, leading to a comprehensive understanding of biological phenomena and the development of new drugs/medical devices and diagnostic/therapeutic methods.
(Slide figures: application maps for both programs, ranging from nano quantum devices, fuel cells, and optical switches to drug design, protein folding, and organ- and body-scale simulation.)

Appointment of Strategic Regions
Computational resources and budget will be allocated to the following regions; a "strategic organization" will organize the research in each.
– Region 1: Foundations for predictive life sciences, medical care, and drug design
– Region 2: Innovation of new materials and new energies
– Region 3: Prediction of global change for disaster prevention and reduction
– Region 4: Next-generation manufacturing
– Region 5: Origin and structure of matter and the universe

Schedule of the Project (FY2006–FY2012)
– Buildings: design and construction of the computer building and the research building
– System: conceptual, basic, and detailed design; prototyping and evaluation; then production, installation, and adjustment of the processing unit, the front-end unit (total system software), and the shared file system, followed by tuning and improvement
– Applications: development, production, and evaluation of the Next-Generation Integrated Nanoscience Simulation and the Next-Generation Integrated Life Simulation software, with verification on the real system
– Research promotion: preparatory research, feasibility studies, and strategic research
– Partial operation within FY2010; full operation starts from FY2012

Features of the K computer
– 京 ("kei") = "K": the Japanese numeral for 10^16
– High performance: Linpack 10 PFLOPS
– Massive parallelization: > 80,000 processors, > 640,000 cores
– SPARC64 VIIIfx: a processor designed for HPC, with the VISIMPACT / HPC-ACE extensions
– 16 GB/node, 2 GB/core
– ~20 MW power consumption

K Computer System
– Number of nodes: > 80,000
– Number of processors: > 80,000 (one per node)
– Number of cores: > 640,000
– Peak performance: > 10 PFLOPS
– Memory capacity: > 1 PB (16 GB/node)
– Network: Tofu interconnect (6-dimensional torus; user view: 3D torus)
  – Bandwidth: 5 GB/s bidirectional for each of the six directions
  – 4 simultaneous communications per node
  – Bisection bandwidth: > 30 TB/s (bidirectional, nominal peak)
– Node: CPU of 128 GFLOPS (8 cores, each SIMD with 4 FMA units at 16 GFLOPS), 5 MB shared L2 cache, 64 GB/s memory bandwidth, 16 GB memory
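The system totals quoted above follow directly from the per-node figures. The short C sketch below (illustrative only, not part of any K software) redoes that arithmetic using the lower-bound node count from the slide.

    /* Back-of-the-envelope check of the K computer totals:
       8 cores/CPU at 16 GFLOPS each (4 FMA units, i.e. 8 flop/cycle at 2 GHz),
       16 GB/node, and the ">80,000 nodes" lower bound quoted on the slide. */
    #include <stdio.h>

    int main(void) {
        double gflops_per_core = 16.0;
        double cores_per_node  = 8.0;
        double nodes           = 80000.0;   /* lower bound */
        double gb_per_node     = 16.0;

        double node_gflops = gflops_per_core * cores_per_node;  /* 128 GFLOPS */
        double peak_pflops = node_gflops * nodes / 1.0e6;       /* ~10.24 PFLOPS */
        double memory_pb   = gb_per_node * nodes / 1.0e6;       /* ~1.28 PB */

        printf("node peak   : %.0f GFLOPS\n", node_gflops);
        printf("system peak : %.2f PFLOPS\n", peak_pflops);
        printf("total memory: %.2f PB\n", memory_pb);
        return 0;
    }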

Cabinet of the K computer: 24 boards per cabinet, 192 CPUs, 24 TFLOPS.

What is special about the K computer?
– Network: high bandwidth, low latency
– Processor designed for HPC
  – VISIMPACT: shared cache and hardware barrier for multi-core parallelization of inner loops
  – HPC-ACE: register extension; SIMD with 2 FMA pipelines and 2 issues/cycle (4 FMA operations/core); instructions for special functions (trigonometric functions, reciprocal, square root, reciprocal square root, etc.)
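To make the VISIMPACT / HPC-ACE bullets concrete, here is a minimal sketch of the kind of FMA-dominated inner loop those features target; the function and variable names are illustrative and not taken from the K software stack.

    /* An axpy-style loop: the compiler can map a*x[i] + y[i] onto fused
       multiply-add (FMA) SIMD instructions, and VISIMPACT-style automatic
       parallelization can split the iterations across the cores that share
       the L2 cache, synchronizing them with the hardware barrier. */
    void axpy(int n, double a, const double *x, double *y) {
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];   /* one FMA per iteration */
        }
    }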

(SPARC64 VIIIfx processor details: T. Maruyama, Proc. Hot Chips 2009.)

Software
– OS: Linux
– Compilers: the Fujitsu compilers will support Fortran (2003), C (C99), and C++ (2003), with GNU C/C++ extensions, automatic vectorization for SPARC64 VIIIfx, and OpenMP 3.0
– MPI-2.1
– gcc may also be available; however, it cannot generate CPU-specific instructions (e.g. SIMD), so poor performance is expected
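A minimal sketch of the hybrid programming model implied by the MPI-2.1 and OpenMP 3.0 support listed above: MPI ranks across nodes, OpenMP threads within a node. This is a generic example, not K-specific code; build it with an MPI C compiler wrapper plus the compiler's OpenMP flag.

    /* Hybrid MPI + OpenMP "hello": one MPI rank per node, several OpenMP
       threads per rank, each reporting its identity. */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            printf("rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }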

How to use it?
– Strategic use: five "Strategic Regions" have been selected; for these fields, MEXT will fund research budgets, and machine time will be allocated.
– General use: a "registered organization" will control the distribution of machine time.
– Commercial use: RIKEN is basically not responsible for how the machine is used.

HPCI: High Performance Computing Infrastructure
– A system for utilizing the academic supercomputers in Japan, starting from 2012
– User communities: the 5 strategic regions, industrial consortiums, national universities and institutes
– Computing resource providers: RIKEN AICS, university centers, national institutes

Basic idea of HPCI: a single logical structure, spanning 25 organizations, built on a distributed physical structure of 13 organizations.

Problems for future HPC hardware
– If the problem can be parallelized, computing performance is cheap.
– However, at every level data movement dominates the cost (a rough illustration follows below):
  – Core – Cache
  – Cache – Main memory
  – Node – Node
  – Node – Disk
  – System – System / apparatus / Internet
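As a rough illustration of why data movement dominates, the sketch below compares the bytes-per-flop a simple triad kernel needs with what a K node supplies, using the node figures quoted earlier (64 GB/s memory bandwidth, 128 GFLOPS); the kernel itself is a generic example, not tied to any particular application.

    /* For a[i] = b[i] + s*c[i] over doubles: 24 bytes moved (two loads,
       one store) per 2 flops, i.e. 12 B/F required, against the roughly
       0.5 B/F a K node supplies (64 GB/s / 128 GFLOPS). */
    #include <stdio.h>

    int main(void) {
        double bytes_per_iter = 3.0 * sizeof(double);             /* 24 bytes  */
        double flops_per_iter = 2.0;                               /* mul + add */
        double needed_bf      = bytes_per_iter / flops_per_iter;   /* 12 B/F    */
        double node_bf        = 64.0 / 128.0;                      /* 0.5 B/F   */

        printf("triad needs %.1f B/F, node supplies %.1f B/F (%.0fx short)\n",
               needed_bf, node_bf, needed_bf / node_bf);
        return 0;
    }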

Future Processors for HPC
– The gap between top-end HPC processors and commodity processors will increase.
– What HPC needs:
  – Many-core processors and accelerators for "dense problems"
  – Chip stacking for bandwidth
  – Network integration
– The network will be the most important factor in HPC.

Future Directions (1)
– Network integration is essential both for general-purpose machines and special-purpose ones.
– Platform for accelerators:
  – General-purpose processor cores
  – Cache or local memory
  – Fast, low-latency on-chip and off-chip networks
(Block diagram targets: network > 30 GB/s, memory > 100 GB/s, on-chip network > 100 GB/s per router, connecting processor units and accelerators.)

Future Directions (2)
– High memory bandwidth system: a "single-chip BlueGene/L" built by system-on-chip integration or by chip stacking with TSVs
  – B/F ~ 1 to local memory
  – B/F ~ 0.1 to remote nodes
(Block diagram targets: processor unit > 500 GFLOPS, memory bandwidth > 500 GB/s, network > 50 GB/s.)

Problems in the network
– Molecular dynamics: strong scaling is important.
– ~50,000 FLOP/particle/step; for N = 100,000 particles, ~5 GFLOP/step.
– At 5 TFLOPS effective performance: 1 msec/step, i.e. ~170 nsec of simulated time per day. Rather easy.
– At 5 PFLOPS effective performance: 1 μsec/step, i.e. ~200 μsec of simulated time per day(?). Difficult, but important.
(The arithmetic is sketched below.)
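A sketch of the arithmetic behind these strong-scaling figures. The particle count follows from 50,000 FLOP/particle × N = 5 GFLOP/step; the 2 fs timestep used to convert time per step into simulated time per day is an assumption typical of all-atom MD, not stated on the slide.

    /* Convert effective FLOPS into simulated nanoseconds per day for a
       5 GFLOP/step molecular dynamics run with an assumed 2 fs timestep. */
    #include <stdio.h>

    int main(void) {
        double flop_per_step   = 50e3 * 100e3;   /* 50 kFLOP/particle x 100k particles */
        double timestep_fs     = 2.0;            /* assumed MD timestep */
        double seconds_per_day = 86400.0;
        double eff[2]          = { 5e12, 5e15 }; /* 5 TFLOPS and 5 PFLOPS effective */

        for (int i = 0; i < 2; i++) {
            double sec_per_step = flop_per_step / eff[i];
            double ns_per_day   = seconds_per_day / sec_per_step * timestep_fs * 1e-6;
            printf("%.0e FLOPS -> %.0e s/step -> %.3g ns/day\n",
                   eff[i], sec_per_step, ns_per_day);
        }
        return 0;
    }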

Anton (D. E. Shaw Research)
– Special-purpose pipelines + general-purpose cores + dedicated network
– By decreasing communication latency, it achieves high sustained performance even for small systems
– Reference: R. O. Dror et al., Proc. Supercomputing 2009.

MDGRAPE-4
– Special-purpose computer for molecular dynamics simulations
– Test bed for future HPC hardware
– FY2010–FY2012
– System-on-chip integrating an accelerator, memory, general-purpose processors, and a network
– ~4 TFLOPS / chip

Fin