International conference “Modern problems of computational mathematics and mathematical modeling”, dedicated to the 90th anniversary of academician G.I. Marchuk.


AlgoWiki: an Open Encyclopedia of Parallel Algorithmic Features. Prof. Vladimir Voevodin, Deputy Director, RCC MSU; Head of Department on Supercomputers and Quantum Informatics, CMC MSU. June 10th, 2015, INM RAS, Moscow.

Computing Facilities of Moscow State University (from 1956 up to now): Strela, Setun, BESM-6, BlueGene/P, “Chebyshev”, “Lomonosov”.

Research on Algorithm and Application Structures (what are our algorithms?)
How often did we change the programming paradigm? (How many times did we have to rewrite our applications completely?)
70s: loop vectorization (innermost)
80s: loop parallelization (outer) + vectorization (innermost)
90s: MPI
mid-90s: OpenMP
mid-2000s: MPI+OpenMP
2010s: CUDA, OpenCL, MPI+OpenMP+accelerators
…
Do you see the end of this rewriting process?

Tablets, Smartphones… PCs, Laptops… Servers… Supercomputers

Tablets, Smartphones… PCs, Laptops… Servers… Supercomputers (the degree of parallelism grows across this range).

15 years ago… (24 CPUs, Intel P-III/500 MHz, SCI network, 8 m², 12 Gflops)

HPC Stages at MSU

M.V. Lomonosov, 1711–1765. …and now: (Intel cores, 2065 NVIDIA GPUs, QDR IB, 1000 m², 1.7 Pflops)

Top500: the most powerful computers… (November 2014)

MSU Supercomputer “Lomonosov-2”. 1 rack = 256 nodes: Intel (14c) + NVIDIA = 515 Tflop/s. “Lomonosov-2” supercomputer (5 racks) = 2.5 Pflop/s. MSU Supercomputer Center: users: 2511; projects: 1607; organizations: 302; MSU faculties/institutes: 20+. Computational science is everywhere…

What do we need to know to control the efficiency of supercomputer centers? Licenses, Projects, Users, Software components, Quotas, Organizations, Hardware components, Statuses, Applications, Partitions, Queues, Jobs, Users, SysAdmins, Management. Is it difficult to control a few components? A few?..

A few? Info on the MSU “Lomonosov” supercomputer (1.7 Pflops, 6000 computing nodes, 12K CPUs, 2K GPUs…): Licenses, Projects, Users, Software components, Quotas, Organizations, Hardware components, Statuses, Applications, Partitions, Queues, Jobs per day, Users, SysAdmins, Management. Current trend: all these numbers grow extremely fast!

Software Infrastructure of the “Lomonosov” Supercomputer (what about scalability?) Software packages, systems, tools, libraries: 60+, and this number grows very fast: Intel ICC/IFORT, GCC, PathScale, PGI, MPI implementations, Intel VTune Performance Analyzer, Intel Cluster Tools, RogueWave TotalView, RogueWave ThreadSpotter, Allinea DDT, ScaLAPACK, ATLAS, IMKL, AMCL, BLAS, LAPACK, FFTW, cuBLAS, cuFFT, MAGMA, cuSPARSE, CUSP, cuRAND… VASP, WIEN2k, CRYSTAL, Gaussian, MOLPRO, Turbomole, Accelrys Material Studio, MesoProp, MOLCAS, Gromacs, FireFly, LAMMPS, NAMD, GAMESS, Quantum ESPRESSO, ABINIT, Autodock, CP2K, NWChem, PRIRODA, SIESTA, Amber, CPMD, DL POLY, VMD, GULP, Aztec, Geant, OpenFOAM, PARMETIS, FDMNES, GSL, METIS, Msieve, Octave, OpenMX, PETSc, SMEAGOL, VisIt, VTK, WRF…

Fine analysis of supercomputing applications: dynamic features


AlgoWiki. Algorithms: description of properties and structures (from mobile platforms to exascale supercomputers).
I. Description of properties and structures: theoretical part (machine-independent properties).
II. Description of properties and structures: implementation issues (programming technologies, static and dynamic features, properties of computer architectures).


AlgoWiki. Algorithms: description of properties and structures (from mobile platforms to exascale supercomputers).
I. Description of properties and structures: theoretical part
General description of the algorithm
Mathematical description of the algorithm
Computational kernel
Macro structure of the algorithm
A description of the algorithm’s serial implementation
Serial complexity of the algorithm
Information graph
Describing the resource parallelism of the algorithm
Input/output data description
Algorithm’s properties
…

AlgoWiki Algorithms: description of properties and structures (information graph)



AlgoWiki. Algorithms: description of properties and structures (algorithm’s properties):
serial and parallel complexity ratio for the algorithm;
computational intensity of the algorithm (ratio of the number of arithmetic operations to the amount of data);
balance of different types of operations;
determinacy of the algorithm: number of iterations, data structures (e.g., the structure of sparse matrices), use of random number generators, use of associative operations (the effect of round-off errors);
“long” edges in the information graph and other similar features of families of edges;
intensity of data usage;
known compact representations of the information graph;
…
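The notion of computational intensity can be made concrete with a small sketch of my own (an illustration, not an AlgoWiki formula): for dense n×n matrix multiplication, roughly 2n³ arithmetic operations touch about 3n² matrix elements, so intensity grows linearly with n.

```python
# Hypothetical illustration of computational intensity (arithmetic operations
# per data element), using dense matrix multiplication C = A * B as example.
def matmul_intensity(n):
    flops = 2 * n ** 3   # n^3 multiply-add pairs
    data = 3 * n ** 2    # elements of A, B and C
    return flops / data

# Intensity grows linearly with n, so large matrix multiplications tend to be
# compute-bound, while low-intensity kernels (vector addition) are memory-bound.
print(matmul_intensity(1000))
```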

AlgoWiki. Algorithms: description of properties and structures (determinacy of algorithms: collective operations such as MPI_Reduce(buf, …)).
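Why collective operations affect determinacy: floating-point addition is not associative, so the grouping of partial sums inside a reduction (which may depend on the number of processes) can change the result. A minimal sketch in plain Python shows the effect without needing MPI:

```python
# Floating-point addition is not associative: regrouping partial sums, as a
# collective reduction may do, can change the computed result.
values = [1e16, 1.0, -1e16, 1.0] * 1000

left_to_right = 0.0
for v in values:
    left_to_right += v

# Regroup as two "processes" would: reduce each half, then combine.
half = len(values) // 2
regrouped = sum(values[:half]) + sum(values[half:])

# The two orderings disagree (and both differ from the exact sum, 2000).
print(left_to_right, regrouped)
```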

AlgoWiki. Algorithms: description of properties and structures (from mobile platforms to exascale supercomputers).
II. Description of properties and structures: implementation issues
Peculiarities of implementations of the serial algorithm;
Description of data and computation locality;
Possible methods and considerations for parallel implementation of the algorithm;
Scalability of the algorithm and its implementations;
Dynamic characteristics and efficiency of algorithm implementations;
Conclusions for different classes of computer architectures;
Existing implementations of the algorithm;
…

AlgoWiki. Description of data locality. The canonical analysis of algorithms assumes ops = time: making an algorithm faster means minimizing operations, comparisons… Data locality shows why this picture is incomplete on real memory hierarchies.

AlgoWiki. Algorithms: description of properties and structures (description of data locality): FFT, Linpack, RandomAccess. Courtesy of Vad. Voevodin, MSU.
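The locality profiles contrasted here (streaming access for Linpack, scattered access for RandomAccess) can be caricatured with a toy metric of my own devising (not the actual MSU/AlgoWiki tooling): the average distance between consecutive memory references.

```python
# Toy locality metric (hypothetical, not the MSU analysis tool): the average
# stride between consecutive accesses. Streaming codes (Linpack-like) have a
# small average stride; RandomAccess-like codes have a large one.
import random

def avg_stride(trace):
    return sum(abs(b - a) for a, b in zip(trace, trace[1:])) / (len(trace) - 1)

n = 10_000
streaming = list(range(n))                           # sequential scan
random.seed(0)
scattered = [random.randrange(n) for _ in range(n)]  # random indices

print(avg_stride(streaming), avg_stride(scattered))
```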

AlgoWiki. Algorithms: description of properties and structures (data locality of Cholesky factorization). Courtesy of Vad. Voevodin, MSU.

AlgoWiki and Top500: Top500 = “Linpack + performance + computers”; “algorithms + performance + computers” is an element of Part II of AlgoWiki. Algorithms: description of properties and structures (scalability of algorithms: performance). Courtesy of A. Teplov, MSU.

AlgoWiki. Algorithms: description of properties and structures (scalability of algorithms: efficiency). Courtesy of A. Teplov, MSU.
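Efficiency plots of this kind follow the standard definitions: speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. A small sketch with invented timings (the numbers are assumptions, not Lomonosov measurements):

```python
# Parallel speedup and efficiency, the measures behind such scalability plots:
# S(p) = T(1) / T(p), E(p) = S(p) / p. The timings below are invented for
# illustration only, not measured on any real machine.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}  # processes -> seconds

t1 = timings[1]
for p in sorted(timings):
    speedup = t1 / timings[p]
    efficiency = speedup / p
    print(f"p={p}: speedup={speedup:.2f}, efficiency={efficiency:.0%}")
```

Efficiency typically falls as p grows (here from 100% down to about 78% at p=8), and AlgoWiki records exactly this kind of curve per algorithm implementation.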

AlgoWiki. Algorithms: description of properties and structures (dynamic features). Courtesy of A. Teplov, MSU.

Scalability analysis of applications. Monitored dynamic characteristics: performance (Gflops); performance of 1 core; L1 cache misses (mln/s); L3 cache misses (mln/s); number of load ops (bln/s); number of store ops (mln/s); data transfer rate (MB/s); average data transfer rate (MB/s); data transfer rate (thousands of packets/s). Courtesy of A. Teplov, MSU.

AlgoWiki is a project for the entire computing community!

International conference “Modern problems of computational mathematics and mathematical modeling”, dedicated to the 90th anniversary of academician G.I. Marchuk. Thank you! June 10th, 2015, INM RAS, Moscow.