1 Toward Petascale Programming and Computing, Challenges and Collaborations. Serge G. Petiton, PAAP workshop, RIKEN.

2 Outline
1. Introduction
2. Present Problems and Future Architectures
3. From Worldwide GRID to a unique core
4. The End-user and Petascale Computing Challenges
5. Past and Present Research to Evaluate Petascale Programming
6. Challenges and Collaborations

3 Outline
1. Introduction
2. Present Problems and Future Architectures
3. From Worldwide GRID to a unique core
4. The End-user and Petascale Computing Challenges
5. Past and Present Research to Evaluate Petascale Programming
6. Challenges and Collaborations

4 Introduction
(Several) Petascale computing will call for a new ecosystem.
Scientific end-users will have to adapt or rewrite software and propose new methods, as they already had to in the past. Nevertheless, the evolution may be more complex this time.
Initiated last year by the French Embassy in Tokyo, the French HPC mission concluded that such challenges would generate more intensive Japanese-French collaborations in this scientific domain, and proposed organising a first workshop on this subject in 2007.
Applications, Algorithms, Middleware, Languages, Methods, and Programming Paradigms, for example, have to be proposed for future supercomputers, platforms, clusters, and other systems, with respect to several potential architectures.

5 Outline
1. Introduction
2. Present Problems and Future Architectures
3. From Worldwide GRID to a unique core
4. The End-user and Petascale Computing Challenges
5. Past and Present Research to Evaluate Petascale Programming
6. Challenges and Collaborations

6 Petascale and beyond
Approximately another factor of 10 has to be gained to reach sustained Petascale performance. Nevertheless, limitations of the programming paradigms, the software, and the present ecosystem will probably appear around the 10 Petaflop frontier.
We will face several complex computer science challenges, which will have to integrate the habits computational scientific end-users have built over decades (languages, software, ...).
We have to begin to predict and evaluate the future programming paradigms for end-users, and to extrapolate what the systems, languages, algorithms, and arithmetic of future Petascale computers will be.

7 Parallel Architectures for HPC
From vector pipelined architectures to multi-SMP computers with accelerators; hybrid, heterogeneous computers
Clusters of clusters, multi-level architectures
Inter-node communications are often the problem for many applications
Power consumption is leading Moore's law to a dead end: processor frequencies will no longer increase as they used to. Instead, cores will proliferate.
Elementary functions and floating-point arithmetic need to be adapted to the size of the new applications, and the stability of numerical methods has to increase.

8 GRID for HPC
From virtual supercomputers to P2P desktop grids
Virtualization of the computing resources for the end-users
Computational science data mining and data analysis, tools for scientists
Multi-parameter task farming programming (see the sketch below)
Mixed methods, coupled software
Multi-physics applications
Asynchronous methods, hybrid methods
Fault tolerance, load balancing, security, orchestration, scheduling strategies, ...
Communications are very slow compared to the computing resources (what is the reference memory level?)
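As an illustration of multi-parameter task farming, here is a minimal sketch in Python: a parameter sweep is farmed out to a pool of independent workers. The solver and its parameters are hypothetical placeholders, not part of the original experiments.

# Minimal task-farming sketch: sweep a parameter space over a pool of workers.
# `run_solver` is a hypothetical stand-in for a real simulation kernel.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def run_solver(m, tol):
    # Placeholder for a real computation (e.g., one restarted-GMRES run).
    return {"m": m, "tol": tol, "residual": tol / m}

if __name__ == "__main__":
    subspace_sizes = [50, 150, 358, 400]   # Krylov subspace sizes to try
    tolerances = [1e-6, 1e-8, 1e-10]       # stopping thresholds to try
    with ProcessPoolExecutor() as pool:    # one independent task per pair
        results = list(pool.map(run_solver,
                                *zip(*product(subspace_sizes, tolerances))))
    best = min(results, key=lambda r: r["residual"])
    print("best parameter set:", best)

Each task is independent, so this pattern tolerates stragglers and node failures well, which is what makes it attractive on shared grid resources.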

9 Outline
1. Introduction
2. Present Problems and Future Architectures
3. From Worldwide GRID to a unique core
4. The End-user and Petascale Computing Challenges
5. Past and Present Research to Evaluate Petascale Programming
6. Challenges and Collaborations

10 Parallel and Distributed Computing for Petascale Applications
Parallel programming paradigms and distributed computing methodologies merge toward a global large-scale programming challenge.
From Grid-oriented algorithms to many-core programming optimizations
Fault tolerance, load balancing, scheduling strategies, and resource orchestration will have to be optimized for Petascale computers
The days of waiting 18 months for sequential performance to double are over; we also have to parallelize applications to obtain accelerations on many-core processors.
From the Grid level to the many-core level, communications will be the most difficult criterion to optimize.

11 Several Correlated Challenges
Large scale: hundreds of thousands of processors
Nodes with intra-communication problems
SIMD, pipelined, or other accelerators
Many-core processors
Low-power processors
From Grid-level to core-level communications
Integration of parallel computing and distributed computing toward a large-scale global computing challenge
Networks: from worldwide latency to optical networks for intra-core communications
Multiple arithmetics and accurate, normalized elementary functions

12 Hierarchical Computing
GRID of supercomputers
Supercomputer: clusters (of clusters) of nodes (dozens)
Nodes (dozens or hundreds of thousands)
Processors (a dozen, or fewer)
Cores (dozens, hundreds)
Threads (dozens)
VLIW, SIMD, k-adic operations, ...
What about systems, compilers, languages, and programming? What efficiency? (A back-of-the-envelope sketch of the aggregate parallelism follows.)
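To make the scale of this hierarchy concrete, here is a back-of-the-envelope sketch; every count below is an illustrative assumption, not a figure from the talk.

# Multiply out the levels of the hierarchy to see the concurrency an
# application would have to express. All counts are assumptions.
levels = {
    "supercomputers in the grid": 4,
    "nodes per supercomputer": 10_000,
    "processors per node": 8,
    "cores per processor": 16,
    "threads per core": 4,
    "SIMD lanes per thread": 8,
}

total = 1
for name, count in levels.items():
    total *= count
    print(f"{name:28s} x{count:>7,} -> {total:>15,} concurrent streams")

Even with these modest assumed counts, the product runs into the hundreds of millions of concurrent streams, which is why a single flat programming model cannot cover all levels.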

13 Outline
1. Introduction
2. Present Problems and Future Architectures
3. From Worldwide GRID to a unique core
4. The End-user and Petascale Computing Challenges
5. Past and Present Research to Evaluate Petascale Programming
6. Challenges and Collaborations

14 The end-user will have to program such computers and obtain accurate results
Applications, numerical methods
Middleware/languages
Programming paradigms, algorithms
Hierarchical programming using component technologies has to be introduced.
High-level workflow languages must be developed; each component also needs other, adapted languages.
End-user expertise may be synthesized and communicated through all the levels

15 Outline
1. Introduction
2. Present Problems and Future Architectures
3. From Worldwide GRID to a unique core
4. The End-user and Petascale Computing Challenges
5. Past and Present Research to Evaluate Petascale Programming
6. Challenges and Collaborations

16 Methods
Krylov-based methods for sparse linear algebra: Lanczos, Arnoldi, GMRES
Dense linear algebra methods: block Gauss-Jordan
Basic matrix computations: A(Ax+x)+x, ... (see the sketch below)
Hybrid methods (co-methods): MERAM, GMRES-Arnoldi-LS
And: QR method and factorization, LS, inverse iteration, ...
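As a minimal sketch, the basic kernel A(Ax+x)+x on a sparse matrix reduces to two sparse matrix-vector products and two vector additions; here in Python with SciPy, where the random test matrix and its size are assumptions chosen to echo the Grid'5000 runs.

# Sketch of the basic kernel y = A(Ax + x) + x on a sparse matrix.
import numpy as np
import scipy.sparse as sp

n = 1700                                  # order used in the experiments
rng = np.random.default_rng(0)
A = sp.random(n, n, density=1e-3, random_state=rng, format="csr")
x = rng.standard_normal(n)

y = A @ (A @ x + x) + x                   # two sparse mat-vecs, two AXPYs
print(np.linalg.norm(y))

Kernels of this shape are useful benchmarks precisely because their performance is dominated by memory and communication traffic rather than by floating-point work.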

17 Platforms, Middleware and Frameworks
Several parallel computers
Clusters in Tsukuba
A Japanese-French platform, with dozens of PCs in France and clusters in Tsukuba
GRID5000
IBM Cell
Tsubame (new)
OmniRPC, Condor, XtremWeb
And YML

18 Experiments
Performance evaluation with respect to several parameters, with scheduling-strategy emulations and extrapolations
Co-methods and hybrid methods
Iterative restarted methods
Out-of-core computing to optimize the computing/communication ratio (see the sketch below)
Parallel methods adapted for cluster computing, with strategies to save power without losing too much efficiency
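A minimal sketch of the out-of-core idea: keep the matrix on disk and stream it through memory in blocks, so the computation per block amortizes the transfer cost (disk I/O here standing in for communication in general). The file name and block size are illustrative assumptions.

# Out-of-core blocked matrix-vector product: the matrix lives on disk and is
# streamed through memory one block of rows at a time.
import numpy as np

n, block = 4096, 512                       # illustrative sizes
A = np.memmap("A.dat", dtype=np.float64, mode="w+", shape=(n, n))
A[:] = 1.0 / n                             # fill once, for the demo
x = np.ones(n)
y = np.empty(n)

for i in range(0, n, block):               # only `block` rows in RAM at once
    y[i:i + block] = A[i:i + block] @ x
print(y[:4])                               # each entry should be ~1.0

Choosing the block size is exactly the computing/communication trade-off named above: larger blocks amortize transfers better but need more memory per node.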

19 World-wide grid experiments
Experimental platforms, numerical settings
Computing and network resources:
University of Tsukuba: homogeneous dedicated clusters (dual Xeon ~3 GHz, 1 to 4 GB)
University of Lille 1: heterogeneous NOWs (Celeron 1.4 GHz to P4 3.2 GHz, 128 MB to 1 GB), shared with students
Internet, OmniRPC
4 platforms:
2 local platforms: 29 / 58 nodes, Lille
2 world-wide platforms: 58 nodes (29 Lille + 29 Tsukuba dual-proc.) and 116 nodes (58 Lille + 58 Tsukuba dual-proc.)

20 Grid'5000 experiments
Presentation, motivations:
Up to 9 sites distributed in France
Dedicated PCs with a reservation policy
Fast, dedicated network: RENATER (1 Gbit/s to 10 Gbit/s)
PCs are homogeneous (few exceptions)
Homogeneous environment (deployment strategy)
For these experiments:
Orsay: up to 300 single-CPU nodes
Lille: up to 60 single-CPU nodes
Nice: up to 60 dual-CPU nodes
Rennes: up to 70 dual-CPU nodes

21 Hybrid GMRES(m,v,r)/Arnoldi(q,w,u)-LS(k,l)
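The full hybrid couples restarted GMRES with an Arnoldi process and least-squares polynomial acceleration. As a point of reference, here is a minimal sketch of the restarted GMRES(m) core alone, in plain NumPy; it is illustrative only and is not the hybrid co-method itself.

# Minimal restarted GMRES(m): Arnoldi builds a Krylov basis, a small
# least-squares problem gives the correction, and the process restarts.
import numpy as np

def gmres_restarted(A, b, m=30, restarts=50, tol=1e-8):
    n = len(b)
    x = np.zeros(n)
    for _ in range(restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        V = np.zeros((n, m + 1))            # Krylov basis
        H = np.zeros((m + 1, m))            # Hessenberg matrix
        V[:, 0] = r / beta
        for j in range(m):                  # Arnoldi, modified Gram-Schmidt
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:         # happy breakdown
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H, e1, rcond=None)  # argmin ||beta*e1 - H y||
        x = x + V[:, :m] @ y
    return x

# Illustrative use on a small, well-conditioned random system.
rng = np.random.default_rng(0)
n = 200
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = gmres_restarted(A, b)
print(np.linalg.norm(A @ x - b))

In the hybrid, the Arnoldi component runs alongside this loop and feeds back spectral information to accelerate the restarts, which is what the nG/nA process counts in the following slides control.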

22 Results on Grid'5000 (II): hybridization for nG = 2, 5, 8, 10 compared with pure GMRES (n=1700, nA=2, m(GMRES)=358, m(Arnoldi)=150, k=30, l=30)

23 Results on Grid'5000 (III): implementation on two sites compared with implementation on one site (n=1700, nG=4, nA=2, m(GMRES)=400, m(Arnoldi)=256, k=10, l=1)

24 Results and comparison (I): parallel time of pure GMRES versus the hybrid method (n=3060, nA=2, m(GMRES)=400, m(Arnoldi)=50, k=30, l=30). [Plot: execution time versus number of processors.]

25 Workflow programming for HPC
End-users have to program such platforms
They have to use component technologies
They want to develop software independently of the middleware
End-user expertise has to be exploited
YML proposes such a framework
We experiment with three middleware systems and evaluate the generated overhead.

26 YML Design

27 Example of a component/task graph
(Figure legend: begin node, end node, graph node, dependence, generic component node)

par
  compute tache1(..); signal(e1);
//
  compute tache2(..); migrate matrix(..); signal(e2);
//
  wait(e1 and e2);
  par
    compute tache3(..); signal(e3);
  //
    compute tache4(..); signal(e4);
  //
    compute tache5(..); filter-datacut(..); signal(e5);
    visualize mesh(...);
  end par
//
  wait(e3 and e4 and e5);
  compute tache6(..);
  compute tache7(..);
end par
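For readers unfamiliar with the notation, here is a rough Python emulation of the same par/signal/wait pattern, using threads and events; the task names mirror the graph above and the task bodies are empty placeholders, not the real YML runtime.

# Rough emulation of the YML-style par/signal/wait graph with threads.
import threading

e = {k: threading.Event() for k in ("e1", "e2", "e3", "e4", "e5")}

def compute(name, signals=(), waits=()):
    for w in waits:
        e[w].wait()                 # block until each dependency fires
    print("running", name)          # placeholder for the real component
    for s in signals:
        e[s].set()                  # fire the events this task signals

branches = [
    threading.Thread(target=compute, args=("tache1", ["e1"])),
    threading.Thread(target=compute, args=("tache2+migrate", ["e2"])),
    threading.Thread(target=compute, args=("tache3", ["e3"], ["e1", "e2"])),
    threading.Thread(target=compute, args=("tache4", ["e4"], ["e1", "e2"])),
    threading.Thread(target=compute, args=("tache5+visualize", ["e5"], ["e1", "e2"])),
    threading.Thread(target=compute, args=("tache6;tache7", [], ["e3", "e4", "e5"])),
]
for t in branches: t.start()
for t in branches: t.join()

The point of YML is that this dependence graph is expressed once, at a high level, and the framework maps it onto whichever middleware (OmniRPC, Condor, XtremWeb) is available underneath.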

28 Outline
1. Introduction
2. Present Problems and Future Architectures
3. From Worldwide GRID to a unique core
4. The End-user and Petascale Computing Challenges
5. Past and Present Research to Evaluate Petascale Programming
6. Challenges and Collaborations

29 Challenges
Many are well-defined, but open problems are still numerous
Several of them will be discussed during the workshop
Application behavior on these future number crunchers has to be studied
Middleware and languages for each level of the hierarchical execution model have to be proposed, and their interoperability well defined
Basic numerical and linear algebra methods have to be stabilized and/or adapted for larger-scale applications, and new ones have to be proposed
Last but not least, the programming paradigms for end-users are the key to the future of these computers, and the expertise of the end-users has to be exploited.

30 Collaborations are the main goal of this workshop. ... Rendez-vous tomorrow for the "discussions for the next".