Frédéric Gava Bulk-Synchronous Parallel ML


1 Frédéric Gava Bulk-Synchronous Parallel ML
Semantics and Implementation of the Parallel Juxtaposition

2 BSML Background Parallel programming Implicit Explicit Concurrent
Automatic parallelization skeletons Data-parallelism Parallel extensions

3 Projects 2002-2004 ACI Grid LIFO, LACL, PPS, INRIA
Design of parallel and Grid libraries for OCaml. ACI « Young researchers » (LIFO, LACL): production of a programming environment in which certified parallel programs can be written and safely executed.

4 Outline
The BSML language. Parallel compositions. Superposition: types and semantics. Juxtaposition: types and semantics. Implementation of the juxtaposition. Conclusion and future works.

5 The BSML language

6 The BSP model
BSP architecture: processor/memory pairs (P/M), a network, and a unit of synchronization. Characterized by: p, the number of processors; r, the processors' speed; L, global synchronization; g, phase of communication (at most 1 word sent or received by each processor).

7 Model of execution
$T(s) = \left(\max_{0 \le i < p} w_i\right) + h \cdot g + L$
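The cost formula above can be evaluated mechanically. The following is a minimal sketch, not part of the original slides; the function name `superstep_cost` and its labelled parameters are illustrative assumptions, with `work` holding the local computation times w_i of the p processors.

```ocaml
(* Cost of one BSP superstep: (max over processors of w_i) + h * g + L.
   All names here are illustrative, not taken from the BSML library. *)
let superstep_cost ~(work : float array) ~(h : float) ~(g : float) ~(l : float)
    : float =
  (* Largest local computation time among the processors *)
  let w_max = Array.fold_left max 0.0 work in
  w_max +. h *. g +. l

let () =
  let t =
    superstep_cost ~work:[| 10.0; 30.0; 20.0 |] ~h:5.0 ~g:2.0 ~l:100.0 in
  (* 30 + 5 * 2 + 100 *)
  Printf.printf "T(s) = %.1f\n" t   (* prints "T(s) = 140.0" *)
```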

8 Example: broadcast
Direct broadcast: cost = png + L. Broadcast with 2 phases: cost = 2ng + 2L.
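To compare the two broadcast costs on concrete numbers, here is a small sketch. The function names and the sample values of p, n, g and L are assumptions chosen for illustration only. From the two formulas, the two-phase broadcast is cheaper exactly when (p - 2)ng > L.

```ocaml
(* Cost formulas from the slide; names and sample parameters are illustrative. *)
let direct_broadcast_cost ~p ~n ~g ~l =
  float_of_int (p * n) *. g +. l

let two_phase_broadcast_cost ~n ~g ~l =
  2.0 *. float_of_int n *. g +. 2.0 *. l

let () =
  let p = 16 and n = 1000 and g = 2.0 and l = 500.0 in
  Printf.printf "direct:    %.1f\n" (direct_broadcast_cost ~p ~n ~g ~l);
  (* prints "direct:    32500.0" *)
  Printf.printf "two-phase: %.1f\n" (two_phase_broadcast_cost ~n ~g ~l)
  (* prints "two-phase: 5000.0" *)
```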

9 The BSML language -calculus ML BS-calculus Parallel constructions BSML Parallel primitives Structured parallelism as an explicit parallel extension of ML Functional language with BSP cost predictions Allows the implementation of skeletons Implemented as a parallel library for the "Objective Caml" language Using a parallel data structure called parallel vector

10 A BSML program
[Figure: a BSML program with a replicated part, a parallel part (components f0, f1, …, fp-1 and g0, g1, …, gp-1) and a sequential part]

11 Parallel primitives of BSML
Asynchronous primitives: creation of a vector, mkpar : (int → α) → α par; parallel point-wise application, apply : (α → β) par → α par → β par. Synchronous and communication primitives: communications, put : (int → α option) par → (int → α option) par; projection of values, proj : α option par → (int → α option).
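To make the types of the four primitives concrete, here is a purely sequential simulation. It is a sketch under loud assumptions: a parallel vector is modelled as a plain array with one cell per processor, the machine size `p` is fixed to 4, and `bcast_direct` is a hypothetical helper. The real library keeps the vector type abstract and runs on a parallel machine.

```ocaml
(* Sequential simulation only: one array cell per processor. *)
type 'a par = 'a array

let p = 4  (* assumed number of processors *)

(* mkpar f = <f 0, ..., f (p-1)> *)
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* Point-wise application: processor i applies its function to its value *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) vs.(i))

(* put: on processor j, [send.(j) i] is the message for processor i.
   Afterwards processor i holds a function from sender to message. *)
let put (send : (int -> 'a option) par) : (int -> 'a option) par =
  Array.init p (fun i ->
    fun j -> if 0 <= j && j < p then send.(j) i else None)

(* proj: turn a vector of optional values into a replicated function *)
let proj (v : 'a option par) : int -> 'a option =
  fun i -> if 0 <= i && i < p then v.(i) else None

(* Hypothetical helper: direct broadcast of the value held by [root] *)
let bcast_direct root v =
  let send =
    apply
      (mkpar (fun i x ->
         if i = root then (fun _dst -> Some x) else (fun _dst -> None)))
      v in
  let recv = put send in
  apply
    (mkpar (fun _ f ->
       match f root with Some x -> x | None -> failwith "no message"))
    recv
```

For instance, `bcast_direct 2 (mkpar (fun i -> i * 10))` evaluates to a vector holding 20 on every simulated processor.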

12 Semantics Natural semantics Small-steps semantics
Programming model: natural semantics (easy for proofs, Coq) and small-step semantics (easy for costs). Execution model: distributed semantics (makes asynchronous steps appear, close to a real implementation).

13 Parallel compositions

14 Multi-programming Several programs on the same machine
New primitives of parallel composition: superposition; juxtaposition (implemented with the superposition). Divide-and-conquer BSP algorithms.

15 Parallel Superposition
super : (unit → α) → (unit → β) → α × β. super E1 E2 = (E1 (), E2 ()). Fusion of communications/synchronisations using super-threads. Keeps the BSP model. Pure functional semantics.
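The functional meaning of the superposition can be sketched sequentially. This one-liner only reproduces the result value (E1 (), E2 ()); it does not model the fusion of communications and synchronisations by super-threads that the slide describes, and the fixed left-to-right evaluation order is an assumption of the sketch.

```ocaml
(* Sequential sketch: only the result of the superposition is modelled. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let a = e1 () in   (* evaluate the first computation *)
  let b = e2 () in   (* then the second one *)
  (a, b)

let () =
  let (x, s) = super (fun () -> 1 + 2) (fun () -> "ok") in
  Printf.printf "%d %s\n" x s   (* prints "3 ok" *)
```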

16 Parallel Superposition

17 Parallel juxtaposition
juxta : int → (unit → α par) → (unit → α par) → α par. Fusion of communications/synchronisations on each sub-machine. Keeps the BSP model. Side-effect on the number of processors. juxta m applied to <v0, v1, …, vm-1> and <v'0, v'1, …, v'p-1-m> = <v0, …, vm-1, v'0, …, v'p-1-m>.

18 Parallel juxtaposition
Communications Synchronisation E2 Communications Synchronisation E1 Communications Synchronisation E3 = (juxta 3 E1 E2)

19 Distributed semantics
Semantics = set of parallel rewriting rules. SPMD style: parallel vector, parts of the parallel vector. Natural evaluation and distributed evaluation of a program (Prog): confluent, equivalent. Example program:
let scan op vec =
  let rec scan' fst lst op vec =
    if fst >= lst then vec
    else
      let mid = (fst + lst) / 2 in
      let vec' = mix mid (super (fun () -> scan' fst mid op vec)
                                (fun () -> scan' (mid + 1) lst op vec)) in
      let com = ... (* send wm to processes m+1…p+1 *) in
      let op' = ... (* applies op to wm and wi, m<i<p *) in
      parfun2 op' com vec'
  in scan' 0 (bsp_p () - 1) op vec

20 Implementation of the juxtaposition

21 Use of the superposition
2 references that contain the number of processors of a sub-machine and the real PID of the virtual processor 0 (on a sub-machine). Creation of uncompleted vectors. Each sub-machine in a super-thread.
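The role of the two references and of uncompleted vectors can be illustrated with a sequential sketch. Everything here is an assumption made for illustration: the names `machine_p`, `sub_p` and `sub_first`, the use of `option` cells to mark the holes of an uncompleted vector, the bound check on `m`, and the fact that the two sub-computations run one after the other instead of in super-threads.

```ocaml
let machine_p = 4            (* assumed size of the whole machine *)
let sub_p = ref machine_p    (* processors of the current sub-machine *)
let sub_first = ref 0        (* real PID of virtual processor 0 *)

let bsp_p () = !sub_p

(* Uncompleted vector: cells outside the sub-machine are None *)
type 'a par = 'a option array

let mkpar (f : int -> 'a) : 'a par =
  Array.init machine_p (fun pid ->
    let i = pid - !sub_first in            (* virtual PID *)
    if 0 <= i && i < !sub_p then Some (f i) else None)

let juxta (m : int) (e1 : unit -> 'a par) (e2 : unit -> 'a par) : 'a par =
  let p0 = !sub_p and first0 = !sub_first in
  if m < 0 || m > p0 then invalid_arg "juxta";
  (* Run [e] on a sub-machine of [p] processors starting at real PID [first],
     then restore the enclosing machine *)
  let run p first e =
    sub_p := p; sub_first := first;
    let v = e () in
    sub_p := p0; sub_first := first0;
    v in
  let v1 = run m first0 e1 in
  let v2 = run (p0 - m) (first0 + m) e2 in
  (* Glue the two uncompleted vectors together *)
  Array.init machine_p (fun pid ->
    if pid < first0 + m then v1.(pid) else v2.(pid))
```

With this sketch, `juxta 1 (fun () -> mkpar (fun i -> (bsp_p (), i))) (fun () -> mkpar (fun i -> (bsp_p (), i)))` yields a vector whose first cell was computed on a 1-processor sub-machine and whose three remaining cells were computed on a 3-processor sub-machine, each with its own virtual PIDs starting at 0.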

22 Example, parallel prefixes
scan : (α → α → α) → α par → α par. scan (+) <v0, …, vp-1> = <v0, v0+v1, …, v0+v1+…+vp-1>. [Figure: divide-and-conquer combination with op of the values a, b, …, h held by the processors]
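The specification of scan, and the divide-and-conquer scheme used by the scan' program shown earlier, can be checked on arrays. This is a sequential sketch in which an array stands in for the parallel vector; `scan_spec` and `scan_dc` are illustrative names. In `scan_dc` the two recursive calls correspond to the two sub-computations that the parallel version composes, and the final loop corresponds to sending the value at `mid` to the processors on its right.

```ocaml
(* Direct specification: res.(i) = v.(0) op v.(1) op ... op v.(i) *)
let scan_spec (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array =
  let res = Array.copy v in
  for i = 1 to Array.length v - 1 do
    res.(i) <- op res.(i - 1) v.(i)
  done;
  res

(* Divide-and-conquer version over the index range [fst, lst] *)
let scan_dc (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array =
  let res = Array.copy v in
  let rec go fst lst =
    if fst < lst then begin
      let mid = (fst + lst) / 2 in
      go fst mid;                 (* prefixes of the left half *)
      go (mid + 1) lst;           (* prefixes of the right half *)
      let w = res.(mid) in        (* total of the left half *)
      for i = mid + 1 to lst do
        res.(i) <- op w res.(i)   (* combine it into the right half *)
      done
    end in
  go 0 (Array.length v - 1);
  res

let () =
  let r = scan_dc ( + ) [| 1; 2; 3; 4 |] in
  Array.iter (Printf.printf "%d ") r;   (* prints "1 3 6 10 " *)
  print_newline ()
```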

23 Juxta versus Super
Code of a direct method: 12 lines. Code with superposition: 8 lines. Code with juxtaposition: 6 lines.

24 Performances
[Figure: time (s) versus size of the polynomials, for the direct method (BSML+MPI), the D-a-C method with superposition, and the D-a-C method with juxtaposition]

25 Conclusion and future works

26 Conclusion
BSML = BSP + ML. Superposition = primitive of parallel composition. Juxtaposition is easier for divide-and-conquer algorithms. Distributed semantics of the juxtaposition. Juxtaposition implemented using superposition. Similar performances.

27 Future works
Proofs of the implementation using semantics. Implementation of bigger algorithms. BSP model-checking of high-level Petri nets (M-nets).

28 Thanks for your attention

