Semantics of Minimally Synchronous Parallel ML
Myrto Arapinis, Frédéric Loulergue, Frédéric Gava and Frédéric Dabrowski
LACL, Paris, France

Outline
- Introduction
- Pure functional minimally synchronous parallel programming
- High level semantics
- Distributed evaluation
- Conclusions and future work

Introduction

Our previous work: Bulk Synchronous Parallelism + Functional Programming = BSML

Bulk Synchronous Parallelism:
- Scalability
- Portability
- Simple cost model

Functional Programming:
- High-level features (higher-order functions, pattern matching, concrete types, etc.)
- Proofs of programs
- Safety of the environment

What is MSPML? Why MSPML?

MSPML = Minimally Synchronous Parallel ML, i.e. BSML without synchronization barriers

Why MSPML:
- Comparison of the efficiency of BSML and MSPML
- Further extensions that are restricted or impossible in BSML (divide-and-conquer, nesting of parallel values)
- Basis for an additional level on top of BSML for Grid computing: BSML on each node (a cluster) + MSPML for coordination

Pure Functional Minimally Synchronous Parallel Programming

The MSPML Library
- A parallel ML library for the Objective Caml language
- Operations on a parallel data structure, the abstract type 'a par
- Access to the machine parameters: p: unit -> int, where p () is the number of processes
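A minimal sketch of how these primitives can be simulated sequentially in plain OCaml, so the examples on the following slides can be tried in a toplevel (the array representation and the constant nprocs are assumptions of this sketch, not the library's implementation):

type 'a par = 'a array                (* one value per process *)

let nprocs = 4                        (* stand-in for the machine size *)

let p () = nprocs                     (* p : unit -> int *)

(* mkpar : (int -> 'a) -> 'a par *)
let mkpar f = Array.init nprocs f

(* apply : ('a -> 'b) par -> 'a par -> 'b par *)
let apply fs vs = Array.init nprocs (fun i -> fs.(i) vs.(i))

(* get : 'a par -> int par -> 'a par
   process i receives the value held by process is.(i) *)
let get vs is = Array.init nprocs (fun i -> vs.(is.(i)))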

Creation of Parallel Vectors

mkpar: (int -> 'a) -> 'a par

(mkpar f) evaluates to the parallel vector < f 0, f 1, ..., f (p-1) >
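For instance, with the sequential simulation above and p () = 4 (the names pids and squares are just for this sketch):

let pids = mkpar (fun pid -> pid)           (* < 0, 1, 2, 3 > *)
let squares = mkpar (fun pid -> pid * pid)  (* < 0, 1, 4, 9 > *)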

Pointwise Parallel Application

apply: ('a -> 'b) par -> 'a par -> 'b par

apply < f 0, ..., f (p-1) > < v 0, ..., v (p-1) > = < f 0 v 0, ..., f (p-1) v (p-1) >
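Continuing the sketch, process i applies its own function to its own value; here every process happens to hold the same function:

let incremented = apply (mkpar (fun _ x -> x + 1)) squares
(* < 1, 2, 5, 10 > *)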

Example (1)

let replicate x = mkpar (fun pid -> x)
replicate: 'a -> 'a par

(replicate 5) = < 5, 5, ..., 5 >

Example (2)

let parfun f v = apply (replicate f) v
parfun: ('a -> 'b) -> 'a par -> 'b par

parfun (fun x -> x + 1) applied to a parallel vector < v 0, ..., v (p-1) > first replicates the function and then applies it pointwise:

apply < f, ..., f > < v 0, ..., v (p-1) > = < f v 0, ..., f v (p-1) >
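A quick check with the sketch (doubled is a name made up here):

let doubled = parfun (fun x -> 2 * x) pids   (* < 0, 2, 4, 6 > *)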

Communication Operation: get

get: 'a par -> int par -> 'a par

get < v 0, ..., v (p-1) > < i 0, ..., i (p-1) > = < v_(i 0), ..., v_(i (p-1)) >

(for example, if i 0 = 1 then process 0 receives v 1)
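A classic use of get in the sketch: every process fetches the value of its right neighbour, cyclically (shift is a name made up here):

let shift v = get v (mkpar (fun i -> (i + 1) mod (p ())))
(* shift < v0, v1, v2, v3 > = < v1, v2, v3, v0 > *)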

Example (3)

let bcast_direct root parallel_vector =
  if not (within_bounds root) then raise Bcast_Error
  else get parallel_vector (replicate root)

bcast_direct: int -> 'a par -> 'a par

bcast_direct root < v 0, ..., v (p-1) > = < v root, ..., v root >
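Made self-contained for the sketch (within_bounds and Bcast_Error are not defined on the slide, so these definitions are assumptions):

exception Bcast_Error

let within_bounds root = 0 <= root && root < p ()

let bcast_direct root parallel_vector =
  if not (within_bounds root) then raise Bcast_Error
  else get parallel_vector (replicate root)

let all_zero = bcast_direct 0 pids    (* < 0, 0, 0, 0 > *)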

Global Conditional

if parallel_vector at n then ... else ...

if < b 0, ..., true, ..., b (p-1) > at n then E1 else E2 evaluates to E1 when the component at position n is true (and to E2 otherwise)
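The global conditional is special syntax; in the sequential sketch it can be mimicked by an ordinary function (if_at is a hypothetical helper of this sketch, not part of the library):

let if_at (bs : bool par) n e1 e2 = if bs.(n) then e1 () else e2 ()

(* e.g. branch on what process 0 knows: *)
let branch = if_at (mkpar (fun i -> i = 0)) 0 (fun () -> "yes") (fun () -> "no")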

Implementation of MSPML
- MSPML v0.05: F. Loulergue
- Library for Objective Caml (uses threads and TCP/IP)
- October

High level semantics

Terms

Functional semantics: terms can be evaluated sequentially or in parallel.

Terms: (grammar given as a figure on the original slide)

Values and Judgements

Values: (grammar given as a figure on the original slide)

Judgement: e ⇓ v, read: term « e » evaluates to value « v »

Some rules (given as inference-rule figures on the original slide)
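As an illustration only, the high-level rule for mkpar presumably takes the following natural-semantics shape (a reconstruction, not the paper's actual figure):

\[
\frac{f\;0 \Downarrow v_0 \qquad \cdots \qquad f\;(p-1) \Downarrow v_{p-1}}
     {(\mathtt{mkpar}\; f) \Downarrow \langle v_0, \ldots, v_{p-1} \rangle}
\]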

Distributed evaluation

Informal presentation (1)
- MSPML programs are seen as SPMD programs: the parallel machine runs p copies of the same MSPML program
- Rules for local reduction, plus a rule for communication
- For example, at processor i the expression (mkpar f) is evaluated to (f i)
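The local reading of the primitives, in the sketch's terms (my_pid and the *_local names are assumptions made here):

let my_pid = 2                 (* this copy's processor number *)

let mkpar_local f = f my_pid   (* (mkpar f) evaluates to (f i) at processor i *)
let apply_local f v = f v      (* pointwise application is purely local *)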

Informal presentation (2)

(Diagram: processors 0, 1 and 2, each with its local computation and a communication environment, initially empty.) To evaluate get v 1, processor 0 sends "request 0" to processor 1; processor 1 has already stored its step-0 value v' in its communication environment, so it answers with v'. A bit later, processor 1 stores the value of its next step as well.

Informal presentation (3)

(Diagram.) Communication environments grow at different speeds: here one processor has already stored values for steps 0, 1 and 2, while the processor being queried has not reached step 2 yet. A "request 2" therefore cannot be answered (not ready!), and the requesting processor waits until the value for step 2 becomes available.
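A sketch of the communication environment the diagrams suggest, as a table from step numbers to values (comm_env and answer are names assumed here):

type 'a comm_env = (int * 'a) list

(* answering "request step" coming from a peer *)
let answer (env : 'a comm_env) (step : int) : 'a option =
  List.assoc_opt step env   (* None means "not ready": the peer waits and retries *)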

Terms and judgments

New term: request i j, by which a processor asks for the value stored at processor j during the i-th step (each step is ended by a call to get).

Judgment: at process i, the term e_d with communication environment E_c evaluates to e'_d with new communication environment E'_c.
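Spelled out, the judgment presumably reads as follows (notation reconstructed from the slide's wording):

\[
i \vdash e_d,\, \mathcal{E}_c \;\Longrightarrow\; e'_d,\, \mathcal{E}'_c
\]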

Local computation rules (given as figures on the original slide)

Communication rule (given as a figure on the original slide)

Conclusion and Future Work

Conclusion: Minimally Synchronous Parallel ML
- Functional semantics, deadlock free
- Two semantics: a high level semantics (the programmer's view) and a distributed evaluation (the implementation's view)
- An implementation

Future Work
- MPI implementation
- Comparison with BSMLlib
- Extensions: parallel composition, etc.