Fine Grain MPI. Earl J. Dodd, Humaira Kamal, Alan Wagner. University of British Columbia. 1

Agenda: Motivation; Fine-Grain MPI; Key System Features; Novel Program Design. 2

The introduction of multicore has changed the architecture of modern processors dramatically. A plethora of languages and frameworks has emerged to express fine-grain concurrency on multicore systems. 3

New Languages and Frameworks: golang, parallel threads/processes, concurrency, cluster, multicore.

Cluster computing: how to take advantage of multicore with seamless execution across a cluster?

MPI + X, where X is OpenMP, UPC, a PGAS language, ...? Let X = MPI.

FG-MPI: Fine-Grain MPI. FG-MPI extends the execution model of the Message Passing Interface (MPI) to expose large-scale, fine-grain concurrency. 7

Decoupling an MPI process from an OS-level process.

FG-MPI System
Has a light-weight, scalable design integrated into the MPICH middleware, leveraging its architecture.
Implements location-aware communication inside OS-processes and nodes.
Allows the user to scale to millions of MPI processes without needing a corresponding number of processor cores.
Allows the granularity of MPI programs to be adjusted through the command line to better fit the cache, leading to improved performance.
Enables the design of novel algorithms that vary the number of MPI processes to match the problem rather than the hardware.
Enables task-oriented program design through decoupling from hardware and support for function-level concurrency.

Executing FG-MPI Programs
Example of an SPMD MPI program with 16 MPI processes, assuming two quad-core nodes: 8 pairs of processes execute in parallel, and each pair interleaves execution.
mpiexec -nfg 2 -n 8 myprog
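The idea can be pictured with an ordinary MPI program. Below is a minimal sketch (the file name pairs.c, the message tag, and the pairing logic are illustrative, not taken from FG-MPI): even ranks exchange one integer with their odd neighbours, giving the 8 independent pairs mentioned above when 16 ranks are launched. FG-MPI keeps the standard MPI interface, so the same source, with any FG-MPI-specific entry-point boilerplate omitted here, would be started with the command shown on this slide, interleaving two MPI processes inside each of the 8 OS-level processes.

/* pairs.c - minimal sketch (plain MPI). With 16 ranks, even rank r
 * exchanges one integer with rank r+1, forming 8 independent pairs. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, partner, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pair rank 0 with 1, rank 2 with 3, and so on. */
    partner = (rank % 2 == 0) ? rank + 1 : rank - 1;

    if (partner < size) {
        if (rank % 2 == 0) {
            value = rank;
            MPI_Send(&value, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&value, 1, MPI_INT, partner, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank %d received %d from rank %d\n", rank, value, partner);
        }
    }

    MPI_Finalize();
    return 0;
}

Compiled with mpicc, a program like this runs unchanged under plain MPICH (mpiexec -n 16) or, under FG-MPI, with the processes packed into fewer OS processes as shown above.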

Decoupled from Hardware
Fit the number of processes to the problem rather than the number of cores. For example, the command below launches 1000 MPI processes: 250 inside each of 4 OS processes.
mpiexec -nfg 250 -n 4 myprog

Flexible Process Mapping
Flexibly move the boundary of MPI processes mapped to OS-processes, cores and machines. Each of the commands below launches 4000 MPI processes in total (1000x4, 500x8, and 750x4 + 250x4); they differ only in how the processes are mapped to OS processes.
mpiexec -nfg 1000 -n 4 myprog
mpiexec -nfg 500 -n 8 myprog
mpiexec -nfg 750 -n 4 myprog: -nfg 250 -n 4 myprog

Scalability
Can have hundreds of thousands of MPI processes on a laptop or cluster; 100 million MPI processes have been run on 6500 cores.
mpiexec -nfg -n 8 myprog
mpiexec -nfg -n 6500 myprog

Novel Program Design
Modelling of emergent systems: bird flocking.
Distributed data structures: every data item is an MPI process.

Dynamic Graph Applications
FG-MPI distributed skip-list with support for range querying.
How to query large amounts of real-time data to extract relationship information? Examples: Twitter feeds, sensor data feeds, financial data.
Illustration: companies with an executive in common. Every dot represents an executive/director from a publicly listed company; people are connected to one another if they served the company at the same time.
Scalable, using thousands of processes executing on over 200 cores. A minimal sketch of the item-per-process idea follows.
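To make "every data item is an MPI process" concrete, here is a heavily simplified sketch in plain MPI (a linear chain of items rather than FG-MPI's skip-list; the keys, tag, and query value are invented for illustration). Each process stores one key, and a lookup message is passed from process to process until it reaches the end of the chain, with the owner of the key reporting a match. A skip-list adds shortcut links so a query can skip over many items per hop; FG-MPI makes this decomposition practical because thousands of such item-processes can be packed into a handful of OS processes (for example, mpiexec -nfg 1000 -n 4).

/* item_search.c - illustrative sketch only. Each MPI process owns one
 * data item (a key); a query travels down the chain of processes to the
 * last rank, and the rank holding the key reports a match. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, key, query;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    key = 10 * rank;            /* the single data item this process owns    */
    query = 10 * (size / 2);    /* rank 0 injects a query for the middle key */

    if (rank > 0)               /* every rank but 0 waits for the query      */
        MPI_Recv(&query, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    if (query == key)
        printf("rank %d holds key %d\n", rank, key);

    if (rank + 1 < size)        /* pass the query along to the next item     */
        MPI_Send(&query, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}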

Technical Deep-Dive Webinar
FG-MPI: A Finer Grain Concurrency Model for MPI
March 19, 2014, 3:00 PM - 4:00 PM CT
Society of HPC Professionals (SHPCP)

Acknowledgements
We acknowledge the support of the ongoing FG-MPI project by: Intel Corporation, Inc.; Mitacs Canada; NSERC (Natural Sciences and Engineering Research Council of Canada).

Thank You
… or google "FG-MPI"
Dr. Alan Wagner, UBC
Dr. Humaira Kamal, UBC
Sarwar Alam, UBC
Earl J. Dodd, Scalable Analytics Inc.

Publications
H. Kamal and A. Wagner. An integrated fine-grain runtime system for MPI. Journal of Computing, Springer, May 2013, 17 pages.
Sarwar Alam, Humaira Kamal and Alan Wagner. Service Oriented Programming in MPI. In Communicating Process Architectures, Open Channel Publishing Ltd., England, August.
H. Kamal and A. Wagner. Added concurrency to improve MPI performance on multicore. In 41st International Conference on Parallel Processing (ICPP).
H. Kamal and A. Wagner. An integrated runtime scheduler for MPI. In J. Traff, S. Benkner, and J. Dongarra, editors, Recent Advances in the Message Passing Interface, volume 7490 of Lecture Notes in Computer Science, Springer Berlin Heidelberg.
H. Kamal, S.M. Mirtaheri, and A. Wagner. Scalability of communicators and groups in MPI. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010, New York, NY, USA.
H. Kamal and A. Wagner. FG-MPI: Fine-Grain MPI for multicore and clusters. In 11th IEEE Intl. Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC) held in conjunction with IPDPS-24, pages 1-8, April 2010.