Functional approaches to parallel programming


1 Functional approaches to parallel programming
Frédéric Gava, under the supervision of Frédéric Loulergue. Functional approaches to parallel programming and meta-computing: semantics, implementations and certification.

2 Background
Parallel programming, from implicit to explicit: automatic parallelization, skeletons, data-parallelism, parallel extensions, concurrent programming.

3 Projects
2002-2004, ACI Grid (4 partners): design of parallel and grid libraries of primitives for OCaml, with applications to distributed database management systems and numerical computations. ACI Young researchers: production of a programming environment in which certified parallel programs can be written, proved and safely executed.

4 Outline
Introduction. Semantics of BSML and certification. Extensions: new primitives (parallel composition and parallel IO); library of parallel data structures. Globalized operations. Conclusion and future work.

5 Introduction

6 The BSP model
BSP architecture: processor/memory pairs (P/M), a network and a synchronization unit. Characterized by: p, the number of processors; r, the processor speed; L, the cost of a global synchronization; g, the cost of a communication phase in which each processor sends or receives at most 1 word.

7 BSP model of execution
$T(s) = \left(\max_{0 \le i < p} w_i\right) + h \times g + L$
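A minimal sketch of the superstep cost formula in OCaml. The record type and the function names are illustrative assumptions, not part of BSML; w holds the local computation time of each processor and h is the largest number of words sent or received by any processor, both expressed in the same time unit as g and L.

```ocaml
(* Hypothetical helper types and names, for illustration only. *)
type bsp_params = { g : float; l : float }

(* Cost of one superstep: T(s) = (max_i w_i) + h * g + L.
   w.(i) is the local computation time of processor i;
   h is the maximum number of words sent or received by a processor. *)
let superstep_cost params w h =
  let w_max = Array.fold_left max 0.0 w in
  w_max +. (float_of_int h *. params.g) +. params.l

(* In the BSP model the cost of a program is the sum of the costs
   of its supersteps. *)
let program_cost params supersteps =
  List.fold_left
    (fun acc (w, h) -> acc +. superstep_cost params w h)
    0.0 supersteps
```

With g = 2, L = 10, local times 3, 7 and 5, and h = 4, the superstep costs 7 + 4 × 2 + 10 = 25.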

8 The BSML language
λ-calculus + parallel constructions = BSλ-calculus; ML + parallel primitives = BSML. Structured parallelism as an explicit parallel extension of ML. Functional language with BSP cost predictions. Allows the implementation of skeletons. Implemented as a parallel library for the Objective Caml language, using a parallel data structure called the parallel vector.

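9 A BSML program
[Diagram: a BSML program with its sequential part, its replicated part and its parallel part, the latter made of parallel vectors <f_0, f_1, …, f_{p-1}> and <g_0, g_1, …, g_{p-1}>]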

10 Asynchronous primitives
mkpar: (int → α) → α par. (mkpar f) = <(f 0), (f 1), …, (f (p-1))>. apply: (α → β) par → α par → β par. apply <f_0, f_1, …, f_{p-1}> <v_0, v_1, …, v_{p-1}> = <(f_0 v_0), (f_1 v_1), …, (f_{p-1} v_{p-1})>.
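A sequential simulation of the two asynchronous primitives, as a sketch only: a parallel vector is modelled here by an OCaml array with one cell per processor, and the number of processors p is fixed to an assumed value. The real BSML library keeps the type α par abstract and runs each component on its own processor.

```ocaml
(* Assumed number of processors for this simulation. *)
let p = 4

(* Simulation only: a parallel vector is an array of p values. *)
type 'a par = 'a array

(* mkpar f builds the vector <(f 0), (f 1), ..., (f (p-1))>. *)
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* apply applies, on each processor, the local function to the local value. *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) vs.(i))

(* Example: every processor squares its own identifier. *)
let pids = mkpar (fun pid -> pid)
let squares = apply (mkpar (fun _pid x -> x * x)) pids
```

Neither primitive needs communication, which is why they are called asynchronous.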

11 Synchronous primitives
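put: (int → α option) par → (int → α option) par. [Diagram: on each processor, the function gives for each destination either None (nothing to send) or Some v (send v); after put, each processor holds a function giving, for each sender, the value received.] proj: α option par → (int → α option). proj <v_0, v_1, …, v_{p-1}> = f such that (f i) = v_i.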
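A sequential simulation of the synchronous primitives, as a sketch under the same assumptions as before (a parallel vector is an array, p is fixed). The message matrix stands in for the communication and synchronization phase; returning None for an out-of-range processor number is a choice made for this simulation, not a statement about the real library. The bcast function is a hypothetical example built on put.

```ocaml
(* Assumed number of processors for this simulation. *)
let p = 3

type 'a par = 'a array

let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* put: processor i sends (f_i j) to processor j.  Afterwards processor j
   holds a function that maps a sender i to the value received from i. *)
let put (fs : (int -> 'a option) par) : (int -> 'a option) par =
  (* msgs.(i).(j) is the message from i to j, computed before delivery. *)
  let msgs = Array.init p (fun i -> Array.init p (fun j -> fs.(i) j)) in
  Array.init p (fun j ->
      fun i -> if 0 <= i && i < p then msgs.(i).(j) else None)

(* proj turns a parallel vector into a function f such that (f i) = v_i. *)
let proj (v : 'a option par) : int -> 'a option =
  fun i -> if 0 <= i && i < p then v.(i) else None

(* Hypothetical example: direct broadcast of the value held by root. *)
let bcast root (v : 'a par) : 'a option par =
  let send =
    mkpar (fun i -> fun _dst -> if i = root then Some v.(i) else None)
  in
  Array.map (fun received -> received root) (put send)
```

For instance, bcast 0 applied to the vector <10, 20, 30> yields Some 10 on every processor.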

12 Semantics and certification

13 Outline
From the programming model to the execution model: natural semantics (easy for proofs); small steps semantics (easy for costs); distributed semantics (makes asynchronous steps appear); abstract machine (close to a real implementation).

14 Mini language
Expressions of our mini language: e ::= λ.e | (e e) | … (functional core language) | (mkpar e) | … (parallel primitives) | <e, e, … , e> (parallel vector) | (e)[s] (substitution) | λ.e[s] (closure)

15 Natural semantics
Semantics = set of axioms and inference rules. Easy to understand; makes proofs easier. Confluent. [Example: inference rule, not reproduced in the transcript]

16 Small steps semantics
Semantics = set of rewriting rules, using contexts for the strategy. Easier understanding of costs (local costs and global cost) and of errors. Confluent (costs and values). Equivalent to the previous semantics. [Example: rewriting rule, not reproduced in the transcript]

17 Distributed semantics
Semantics = set of parallel rewriting rules. SPMD style: the parallel vector is split into its parts, one per processor, and every processor evaluates the same program (Prog) by small steps (distributed evaluation). Program shown on each processor:
    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' =
            mix mid (super (fun () -> scan' fst mid op vec)
                           (fun () -> scan' (mid + 1) lst op vec)) in
          let com = ... (* send wm to processes m+1…p+1 *) in
          let op' = ... (* applies op to wm and wi, m<i<p *) in
          parfun2 op' com vec'
      in scan' 0 (bsp_p () - 1) op vec
Confluent. Equivalent to the previous semantics.

18 Abstract machine
BSP-CAM = p × CAM + BSP instructions (SPMD style). Instructions shown: PUSH, SWAP, PID, CONS, APP, SEND (CAM instructions and communications). PID: identifier of the machine, for mkpar. SEND: synchronous instruction, for put. Minimal set of parallel instructions. Equivalence with the distributed semantics.

19 Certification of BSML programs
The Coq proof assistant: typed λ-calculus with dependent types; specification = term (goal); language of tactics to build a proof of this goal; extraction of the proof (certified program). BSML and Coq: axiomatization of the semantics of the primitives in Coq; proofs of BSML programs done like usual proofs of ML programs. Certification and extraction of BSML programs: broadcast, total exchange, …; prefixes; sort.

20 Example: replicate
Specification of replicate: intros T a. exists (mkpar T (fun pid: Z => a)). rewrite mkpar_def. Certified extraction: let replicate a = mkpar (fun pid -> a)
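The extracted replicate can be tried out on a sequential simulation, sketched below with a parallel vector modelled as an array and an assumed number of processors p. The parfun function is added here as an example of a function derived from replicate; it is not shown on this slide.

```ocaml
(* Simulation only: assumed number of processors, vectors as arrays. *)
let p = 4
let mkpar f = Array.init p f
let apply fs vs = Array.init p (fun i -> fs.(i) vs.(i))

(* Extracted program from the slide: the same value on every processor. *)
let replicate a = mkpar (fun _pid -> a)

(* Derived example: apply the same function on every processor. *)
let parfun f v = apply (replicate f) v
```

In this simulation, replicate 7 gives <7, 7, 7, 7> and parfun succ applied to <0, 1, 2, 3> gives <1, 2, 3, 4>.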

21 Extensions and parallel data structures

22 Outline
Parallel composition: new primitive; divide-and-conquer; properties: confluent semantics, two equivalent semantics, implemented with BSML. Parallel data structures: simplify programming; OCaml interfaces; load-balancing. External memory (IO): new primitives; new cost model; property: confluent semantics.

23 Multiprogramming
Several programs on the same machine. New primitives for parallel composition: superposition and juxtaposition (implemented with the superposition). Divide-and-conquer BSP algorithms.

24 Parallel superposition
super: (unit → α) → (unit → β) → α × β. super E1 E2 = (E1 (), E2 ()). Fusion of communications/synchronizations. Preserves the BSP model. Pure functional semantics.
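A sketch of the functional value of super, following the equation super E1 E2 = (E1 (), E2 ()). This sequential version evaluates the two computations one after the other, so it does not show the fusion of their communications and synchronizations. The range_sum function is a hypothetical divide-and-conquer example on an ordinary array.

```ocaml
(* Value semantics only: the real primitive runs both computations as
   superposed BSP programs and fuses their supersteps. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let r1 = e1 () in
  let r2 = e2 () in
  (r1, r2)

(* Hypothetical divide-and-conquer example: sum of a.(lo) .. a.(hi),
   with the two halves handed to super. *)
let rec range_sum (a : int array) lo hi =
  if lo > hi then 0
  else if lo = hi then a.(lo)
  else
    let mid = (lo + hi) / 2 in
    let s1, s2 =
      super
        (fun () -> range_sum a lo mid)
        (fun () -> range_sum a (mid + 1) hi)
    in
    s1 + s2
```

The recursion has the same shape as the scan program of the distributed semantics slide: split the range at mid, superpose the two recursive calls, then combine.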

25 Parallel superposition
Confluent BSP Equivalence

26 Example: parallel prefixes
[Plot: time (s) vs. size of the polynomials; curves: direct version (BSML+MPI), superposition version, juxtaposition version]

27 Parallel data structures
Observations: data structures are as important as algorithms; symbolic computations use these data structures massively. A parallel implementation of data structures: interfaces as close as possible to the sequential ones; modular implementation for straightforward maintenance; load-balancing of the data.

28 Parallel data structures
5 modules: Set, Map, Stack, Queue, Hashtable. Interfaces: same as in OCaml, with some specific parallel functions such as parallel reductions. A parallel data structure = one data structure on each processor. Manual or automatic load-balancing: gives similar sizes to the local data structures; better performance for parallel iterations; a two-superstep algorithm using histograms.
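A sketch of the idea "one data structure on each processor" with a parallel reduction, simulated sequentially. Everything here is an assumption made for illustration: the names, the fixed number of processors p, and the distribution rule (element x is stored on processor x mod p). It is not the interface of the library described on the slide.

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* Assumed number of processors for this simulation. *)
let p = 3

(* A parallel set: one local set per processor. *)
type par_set = IntSet.t array

(* Hypothetical distribution rule: x is stored on processor x mod p. *)
let of_list (xs : int list) : par_set =
  let parts = Array.make p IntSet.empty in
  List.iter
    (fun x ->
      let i = ((x mod p) + p) mod p in
      parts.(i) <- IntSet.add x parts.(i))
    xs;
  parts

(* Parallel reduction: each processor folds its local set, then the
   partial results are combined.  op is assumed to be associative and
   commutative, with neutral as its neutral element. *)
let reduce (op : int -> int -> int) (neutral : int) (s : par_set) : int =
  let local = Array.map (fun part -> IntSet.fold op part neutral) s in
  Array.fold_left op neutral local

(* Sizes of the local sets: the quantity that load-balancing evens out. *)
let local_sizes (s : par_set) : int array = Array.map IntSet.cardinal s
```

Similar local sizes matter because a parallel iteration takes as long as its most loaded processor.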

29 Example
Computation of the n-th nearest neighbor atoms in a molecule. [Plot: time (s) vs. number of atoms; curves: sequential version, parallel version (BSML+PUB)]

30 Example with load balancing
[Plot: time (s) vs. number of atoms; curves: without balancing, with balancing]

31 External memories
Motivations: [Plot: measured and predicted time (s) vs. number of elements]

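32 The EM-BSP model
[Diagram: each processor/memory pair (P/M) is made of a processor, a memory and local disks Disc 1, Disc 2, …, Disc D connected by a bus; the P/M pairs are linked by the network]
We add to the BSP model: D = the number of disks; B = the size of the blocks; O = the latency of the disks; G = the time to read/write a byte.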

33 Shared disks
[Diagram: shared disks Disc 1, Disc 2, …, Disc M attached to the network of P/M pairs]
We add shared disks to the BSP model, with parameters similar to those of the local disks.

34 External memory in BSML
For safety, two kinds of files: local and global ones. New primitives to manipulate these files (IO primitives). New semantics (confluent). EM-BSP cost of the primitives.

35 Modular implementation
[Diagram of the BSMLlib layers: Primitives, Std library, Comm, Super, IO, Parallel data structures; lower level: PUB, MPI, TCP/IP, Threads]

36 Cost prediction
[Plot: time (s) vs. number of elements, for lists and arrays; curves: predicted (max), predicted (avg)]

37 IO cost prediction
[Plot: time (s) vs. number of elements; curves: predicted BSML, predicted BSML-IO, measured BSML-IO]

38 Globalized operations

39 Outline
BSML, once desynchronized, gives MSPML; BSML + MSPML give DMML. For each of MSPML and DMML: semantics, cost models, implementations.

40 MSPML
Uses the MPM model (parameters similar to those of BSP), but with a different execution model. Same language as BSML (parallel vectors), but with new communication primitives: put → mget.

41 MSPML
From the programming model to the execution model: natural semantics (similar to BSML; easy for proofs); small steps semantics (similar to BSML; easy for costs); distributed semantics (very different; makes asynchronous steps appear).

42 Asynchronous communications
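[Diagram: environment of communications of a processor, initially empty, then holding the entries 0,v; 0,v’; 0,v’’ after successive local computations; a request (get v) between processors 0 and 1 is answered by a communication a bit later]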

43 Asynchronous communications
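[Diagram: environments of communications of the processors, from empty to the entries 0,v; 0,v’; 0,v’’; 1,w; 1,w’; 2,w’’; a request (labelled 2, 0) for a value that is not ready yet]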

44 Departmental meta-computing
[Diagram: several BSML units connected through an intranet and coordinated with MSPML]

45 Departmental Meta-computing ML
BSML + MSPML-like primitives for coordination. Two kinds of vectors: parallel vectors (α par) and departmental vectors (α dep). Operational semantics (confluent). Performance model (the DMM model). Implementation.

46 Example: departmental prefixes
Computation of the prefixes where each processor contains a value. Naive method: each processor sends its value to the other processors. Better method: (1) each BSP unit computes a parallel prefix; (2) one processor of each BSP unit receives the values of the other units; (3) each BSP unit finishes its computation with this value.
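The better method can be sketched sequentially as follows. This is a simulation under stated assumptions, not DMML code: a BSP unit is modelled as an array holding one value per processor, the departmental vector as an array of units, and the names prefix and departmental_prefix are hypothetical. The operator op is assumed to be associative.

```ocaml
(* Inclusive prefix inside one BSP unit: r.(i) = v.(0) op ... op v.(i). *)
let prefix op (v : 'a array) : 'a array =
  let r = Array.copy v in
  for i = 1 to Array.length v - 1 do
    r.(i) <- op r.(i - 1) v.(i)
  done;
  r

let departmental_prefix op (units : 'a array array) : 'a array array =
  (* Step 1: each BSP unit computes its own prefix. *)
  let local = Array.map (prefix op) units in
  let result = Array.map Array.copy local in
  (* Steps 2 and 3: carry holds the combined value of all preceding units;
     each unit finishes its computation by combining it with its prefix. *)
  let carry = ref None in
  Array.iteri
    (fun u loc ->
      (match !carry with
       | None -> ()
       | Some c -> result.(u) <- Array.map (fun x -> op c x) loc);
      let n = Array.length loc in
      if n > 0 then carry := Some result.(u).(n - 1))
    local;
  result
```

With addition and the units <1, 2>, <3, 4, 5> and <6>, the result is <1, 3>, <6, 10, 15> and <21>. Only one value per unit crosses the unit boundary, whereas the naive method sends every value to every processor.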

47 Experiments
[Plot: time (s) vs. size of the polynomials; curves: naive algorithm, BSP algorithm (one cluster), better algorithm]

48 Conclusion and future work

49 Conclusion
Semantics of BSML: confluent and equivalent semantics; abstract machine; proof of BSML programs. Expressivity: parallel composition; parallel data structures; parallel IO. Meta-computing: desynchronization of BSML (MSPML); Departmental Meta-computing ML (DMML); semantics, cost models, implementations.

50 Future work in the Propac project
Cost prediction: static analysis of the programs; cost prediction of certified programs. Proofs of imperative BSP programs (Coq, program correctness; BSML, IMP, ML): extension with BSP operations; extension of the logical assertions.

51 Vérification efficace par Interaction de Techniques (VITE)
Design of parallel model checkers for high-level Petri nets. Using BSML to implement a toolkit: using the BSP model to load-balance dynamically; using a modular and generic implementation to ease the use of this toolkit; using the Propac tools to certify this implementation.

52 Thank you for your attention

53 BSML and MSPML
[Diagram: for BSML (BSP model) and MSPML (MPM model): natural semantics (programming model; proofs of programs with Coq), small steps semantics (useful for costs), distributed semantics and CAM (execution model); implementations over PUB, MPI, TCP/IP]

54 Petri nets
[Diagram labels: state, place, transition, token, arc]

55 Propac
[Diagram: BSML high-level semantics and parallel semantics (natural, small steps, distributed; distributed evaluation); sequential implementation; Coq axiomatisation; abstract machines; design of BSP-CAM; parallel implementation; proofs of BSML programs; performance model; dynamic cost analysis]

