Published by Valdomiro Laranjeira Canto. Modified over 6 years ago.
1
Functional approaches to parallel programming
Frédéric Gava, under the supervision of Frédéric Loulergue
Functional approaches to parallel programming and meta-computers: semantics, implementations and certification
2
Background
Parallel programming, from implicit to explicit:
Automatic parallelization
Skeletons
Data-parallelism
Parallel extensions
Concurrent programming
3
Projects
2002-2004, ACI Grid (4 partners): design of parallel and grid libraries of primitives for OCaml, with applications to distributed database management systems and numeric computations
ACI Young Researchers: production of a programming environment in which certified parallel programs can be written, proved and safely executed
4
Outline
Introduction
Semantics of BSML and certification
Extensions: new primitives (parallel composition and parallel IO); library of parallel data structures
Globalized operations
Conclusion and future work
5
Introduction
6
The BSP model
BSP architecture: processor/memory pairs (P/M), a network and a synchronization unit
Characterized by:
p: number of processors
r: processor speed
L: time of a global synchronization
g: time of a communication phase in which each processor sends or receives at most 1 word
7
BSP model of execution
$T(s) = (\max_{0 \le i < p} w_i) + h \times g + L$
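The cost formula can be evaluated directly. A minimal sketch, assuming the usual BSP reading of the symbols (w_i is the local computation time of processor i during the superstep, h the maximum number of words sent or received by any processor); the function name `superstep_cost` and the sample values are hypothetical, not part of BSMLlib:

```ocaml
(* Hypothetical helper, not part of BSMLlib: cost of one BSP superstep.
   w.(i) = local computation time of processor i,
   h     = maximum number of words sent or received by a processor,
   g, l  = BSP parameters (communication and synchronization). *)
let superstep_cost ~(w : float array) ~(h : float) ~(g : float) ~(l : float)
    : float =
  (* max over 0 <= i < p of w_i *)
  let w_max = Array.fold_left max 0.0 w in
  w_max +. (h *. g) +. l

(* Made-up values: 3 processors, h = 10 words, g = 2, L = 7. *)
let example = superstep_cost ~w:[| 3.0; 5.0; 4.0 |] ~h:10.0 ~g:2.0 ~l:7.0
```

The cost of a whole program is then the sum of the costs of its supersteps.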
8
The BSML language
[Figure: λ-calculus extended with parallel constructions gives the BSλ-calculus; ML extended with parallel primitives gives BSML.]
Structured parallelism as an explicit parallel extension of ML
Functional language with BSP cost predictions
Allows the implementation of skeletons
Implemented as a parallel library for the "Objective Caml" language
Uses a parallel data structure called the parallel vector
9
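A BSML program
[Figure: a BSML program is made of a sequential part, a replicated part and a parallel part; the parallel part holds parallel vectors <f0, f1, …, fp-1> and <g0, g1, …, gp-1>.]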
10
Asynchronous primitives
mkpar: (int → α) → α par
(mkpar f) = <(f 0), (f 1), …, (f (p-1))>
apply: (α → β) par → α par → β par
apply <f0, f1, …, fp-1> <v0, v1, …, vp-1> = <(f0 v0), (f1 v1), …, (fp-1 vp-1)>
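The behaviour of the two asynchronous primitives can be imitated sequentially. This is only a sketch: it assumes a parallel vector can be modelled by an array of p values (in the real BSMLlib the type is abstract and distributed) and fixes p = 4 arbitrarily:

```ocaml
(* Sequential simulation only, not the BSMLlib implementation. *)
let p = 4                      (* assumed number of processors *)

type 'a par = 'a array         (* component i lives on processor i *)

(* mkpar f = <f 0, f 1, ..., f (p-1)> *)
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* apply <f0, ..., fp-1> <v0, ..., vp-1> = <f0 v0, ..., fp-1 vp-1> *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) vs.(i))

(* Each processor holds its own pid ... *)
let pids = mkpar (fun i -> i)

(* ... and applies a function that depends on its pid. *)
let shifted = apply (mkpar (fun i -> fun x -> x + (10 * i))) pids
```

Neither primitive needs communication or synchronization, which is why the slide calls them asynchronous.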
11
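Synchronous primitives
put: (int → α option) par → (int → α option) par
[Figure: example of put; on each processor, a function gives None or Some v for each destination, and the values are exchanged.]
proj: α option par → (int → α option)
proj <v0, v1, …, vp-1> = f such that (f i) = vi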
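A sequential sketch of the two synchronous primitives, under the same assumption as before (a parallel vector modelled as an array of p = 4 values). The helper `bcast_from_0` is a hypothetical example, not a BSMLlib function:

```ocaml
(* Sequential simulation only, not the BSMLlib implementation. *)
let p = 4
type 'a par = 'a array
let mkpar (f : int -> 'a) : 'a par = Array.init p f
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) vs.(i))

(* put: on processor i, (fs.(i) j) is the message sent to processor j.
   In the result, on processor j, (g i) is the message received from i. *)
let put (fs : (int -> 'a option) par) : (int -> 'a option) par =
  let msgs = Array.init p (fun j -> Array.init p (fun i -> fs.(i) j)) in
  Array.init p (fun j ->
      fun i -> if 0 <= i && i < p then msgs.(j).(i) else None)

(* proj <v0, ..., vp-1> = f such that (f i) = vi *)
let proj (v : 'a option par) : int -> 'a option =
  fun i -> if 0 <= i && i < p then v.(i) else None

(* Hypothetical example: broadcast the value held by processor 0. *)
let bcast_from_0 (v : 'a par) : 'a option par =
  let to_send =
    apply (mkpar (fun i x -> fun _dst -> if i = 0 then Some x else None)) v in
  let received = put to_send in
  apply (mkpar (fun _ recv -> recv 0)) received
```

In the real library both primitives end with a global synchronization; the simulation reproduces only the values.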
12
Semantics and certification
13
Outline
Programming model:
Natural semantics: easy for proofs
Small steps semantics: easy for costs
Execution model:
Distributed semantics: makes asynchronous steps appear
Abstract machine: close to a real implementation
14
Mini language
Expressions of our mini language:
e ::= λ.e              functional core language
    | (e e)
    | …
    | (mkpar e)        parallel primitives
    | <e, e, …, e>     parallel vector
    | (e)[s]           substitution
    | λ.e[s]           closure
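The grammar maps naturally onto an OCaml variant type. The encoding below is hypothetical: the constructor names, the `Var` case (de Bruijn indices are assumed because the abstraction binds no name) and the representation of a substitution as a list of expressions are illustrative assumptions, and the cases elided by "…" on the slide are not filled in:

```ocaml
(* Hypothetical encoding of the mini language; names are illustrative. *)
type expr =
  | Var of int                  (* assumption: de Bruijn index *)
  | Lam of expr                 (* λ.e *)
  | App of expr * expr          (* (e e) *)
  | Mkpar of expr               (* (mkpar e) *)
  | Vector of expr list         (* <e, e, ..., e> *)
  | Subst of expr * subst       (* (e)[s] *)
  | Closure of expr * subst     (* λ.e[s] *)
and subst = expr list           (* assumption: a substitution is a list *)

(* True when an expression uses only the functional core, that is,
   contains neither a parallel primitive nor a parallel vector. *)
let rec is_sequential (e : expr) : bool =
  match e with
  | Var _ -> true
  | Lam body -> is_sequential body
  | App (f, a) -> is_sequential f && is_sequential a
  | Mkpar _ | Vector _ -> false
  | Subst (body, s) | Closure (body, s) ->
      is_sequential body && List.for_all is_sequential s
```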
15
Natural semantics
Semantics = set of axioms and inference rules
Easy to understand, makes proofs easier
Example: [Figure: example of an inference rule.]
Confluent
16
Small steps semantics
Semantics = set of rewriting rules
Using contexts for the strategy
Easier understanding of costs and errors
Example: [Figure: example of rewriting rules, with local costs and the global cost.]
Confluent (costs and values)
Equivalent to the previous semantics
17
Distributed semantics
Semantics = set of parallel rewriting rules
SPMD style: small steps on the parts of the parallel vector
[Figure: distributed evaluation; each processor holds a copy of the program (Prog) and evaluates its own part of the parallel vector.]
Program shown in the figure:
    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' =
            mix mid (super (fun () -> scan' fst mid op vec)
                           (fun () -> scan' (mid + 1) lst op vec)) in
          let com = ... (* send wm to processes m+1…p+1 *) in
          let op' = ... (* applies op to wm and wi, m<i<p *) in
          parfun2 op' com vec'
      in scan' 0 (bsp_p () - 1) op vec
Confluent
Equivalent to the previous semantics
18
Abstract machine
BSP-CAM = p × CAM + BSP instructions (SPMD style)
CAM instructions: PUSH, SWAP, CONS, APP
Parallel instructions: PID (PID of the machine, for mkpar) and SEND (synchronous communication instruction, for put)
Minimal set of parallel instructions
Equivalence with the distributed semantics
19
Certification of BSML programs
The Coq proof assistant:
Typed λ-calculus with dependent types
Specification = term (goal)
Language of tactics to build a proof of this goal
Extraction of the proof (certified program)
BSML and Coq:
Axiomatization of the semantics of the primitives in Coq
Proofs of BSML programs done like the usual proofs of ML programs
Certification and extraction of BSML programs: broadcast, total exchange, …; prefixes; sort
20
Example: replicate
Specification of replicate: [Figure: Coq specification.]
Proof script:
    intros T a.
    exists (mkpar T (fun pid: Z => a)).
    rewrite mkpar_def.
Certified extraction:
    let replicate a = mkpar (fun pid -> a)
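The extracted program can be tried against a sequential stand-in for mkpar. A sketch only: the array model of parallel vectors and p = 4 are assumptions, and the property checked (every component equals a) is the expected behaviour of replicate rather than the Coq statement from the slide:

```ocaml
(* Sequential simulation only, not the BSMLlib implementation. *)
let p = 4
type 'a par = 'a array
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* The extracted program from the slide. *)
let replicate (a : 'a) : 'a par = mkpar (fun _pid -> a)

(* Expected property: every processor holds the replicated value. *)
let holds_everywhere (a : 'a) : bool =
  Array.for_all (fun x -> x = a) (replicate a)
```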
21
Extensions and parallel data structures
22
Outline
Parallel composition: new primitive; divide-and-conquer. Properties: confluent semantics; two equivalent semantics; implemented with BSML
Parallel data structures: simplify programming; OCaml interfaces; load-balancing
External memory (IO): new primitives; new cost model. Property: confluent semantics
23
Multiprogramming
Several programs on the same machine
New primitives for parallel composition:
Superposition
Juxtaposition (implemented with the superposition)
Divide-and-conquer BSP algorithms
24
Parallel superposition
super: (unit → α) → (unit → β) → α × β
super E1 E2 = (E1 (), E2 ())
Fusion of communications/synchronizations
Preserves the BSP model
Pure functional semantics
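The equation super E1 E2 = (E1 (), E2 ()) fixes the result, and that part can be sketched sequentially. The sketch does not reproduce the fusion of communications and synchronizations; `sum_range` is a hypothetical divide-and-conquer example:

```ocaml
(* Sequential sketch: only the result value of super is reproduced;
   the two computations are not interleaved here. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let r1 = e1 () in
  let r2 = e2 () in
  (r1, r2)

(* Hypothetical divide-and-conquer example: sum of a.(lo) .. a.(hi),
   where the two halves are composed with super. *)
let rec sum_range (a : int array) (lo : int) (hi : int) : int =
  if lo > hi then 0
  else if lo = hi then a.(lo)
  else
    let mid = (lo + hi) / 2 in
    let left, right =
      super
        (fun () -> sum_range a lo mid)
        (fun () -> sum_range a (mid + 1) hi)
    in
    left + right
```

Both arguments are delayed with fun () -> … so that super itself decides when each computation runs.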
25
Parallel superposition
Confluent BSP Equivalence
26
Example: parallel prefixes
[Figure: time (s) against the size of the polynomials, for the direct version (BSML+MPI), the superposition version and the juxtaposition version.]
27
Parallel data structures
Observations:
Data structures are as important as algorithms
Symbolic computations use these data structures massively
A parallel implementation of data structures:
Interfaces as close as possible to the sequential ones
Modular implementation, for straightforward maintenance
Load-balancing of the data
28
Parallel data structures
5 modules: Set, Map, Stack, Queue, Hashtable
Interfaces: same as in OCaml, with some specific parallel functions such as parallel reductions
A parallel data structure = one data structure on each processor
Manual or automatic load-balancing:
To get similar sizes of the local data structures
Better performance for parallel iterations
A two-superstep algorithm using histograms
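The idea "one data structure on each processor" can be sketched with the standard Set module. Everything below is illustrative and is not the library's interface: the names, the placement of an element on processor x mod p, and the sequential fold standing in for a parallel reduction are assumptions:

```ocaml
(* Illustrative sketch, not the interface of the library. *)
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

let p = 4                                  (* assumed number of processors *)

type par_set = IntSet.t array              (* one local set per processor *)

(* Assumption: element x is stored on processor (x mod p). *)
let owner (x : int) : int = ((x mod p) + p) mod p

let of_list (xs : int list) : par_set =
  let sets = Array.make p IntSet.empty in
  List.iter (fun x -> sets.(owner x) <- IntSet.add x sets.(owner x)) xs;
  sets

(* Same meaning as the sequential Set.mem: only the owner is asked. *)
let mem (x : int) (s : par_set) : bool = IntSet.mem x s.(owner x)

(* Reduction: local cardinals are computed independently, then summed. *)
let cardinal (s : par_set) : int =
  Array.fold_left (fun acc local -> acc + IntSet.cardinal local) 0 s
```

With this placement the local sizes depend on the data, which is the situation load-balancing is meant to correct.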
29
Example
Computation of the "nth" nearest neighbor atoms in a molecule
[Figure: time (s) against the number of atoms, for the sequential version and the parallel version (BSML+PUB).]
30
Example with load balancing
[Figure: time (s) against the number of atoms, without balancing and with balancing.]
31
External memories
Motivations:
[Figure: measured and predicted time (s) against the number of elements.]
32
The EM-BSP model
[Figure: each P/M pair of the network contains a processor and a memory connected by a bus to local disks Disk 1, Disk 2, …, Disk D.]
We add to the BSP model:
D = the number of disks
B = the size of the blocks
O = the latency of the disks
G = the time to read/write a byte
33
Shared disks
[Figure: P/M pairs, the network and shared disks Disk 1, Disk 2, …, Disk M.]
We add shared disks to the BSP model, with parameters similar to those of the local disks
34
External memory in BSML
For safety, two kinds of files: local and global ones
New primitives to manipulate these files (IO primitives)
New semantics, confluent
EM-BSP cost of the primitives
35
Modular implementation
[Figure: architecture of the BSMLlib. Upper level: primitives, standard library, parallel data structures. Modules: Comm, Super, IO. Lower level: PUB, MPI, TCP/IP, threads.]
36
Cost prediction
[Figure: time (s) against the number of elements, for lists and arrays, with the predicted maximum and the predicted average.]
37
IO cost prediction
[Figure: time (s) against the number of elements: predicted BSML, predicted BSML-IO and measured BSML-IO.]
38
Globalized operations
39
Outline
Desynchronizing BSML gives MSPML: semantics, cost models, implementations
BSML + MSPML gives DMML: semantics, cost models, implementations
40
MSPML
Uses the MPM model (parameters similar to those of BSP), but with a different execution model
Same language as BSML (parallel vector) but with a new communication primitive: put is replaced by mget
41
MSPML
Programming model:
Natural semantics: similar to BSML, easy for proofs
Small steps semantics: similar to BSML, easy for costs
Execution model:
Distributed semantics: very different, makes asynchronous steps appear
42
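Asynchronous communications
[Figure: processors 0 and 1 with their environments of communications, which hold numbered values (0,v; 0,v'; 0,v'') or are empty. After a local computation and, a bit later, a request, processor 1 gets v from processor 0 by communication.]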
43
Asynchronous communications
[Figure: environments of communications holding numbered values (0,v; 1,w; 2,w''; …) or empty; a request for a value that is not ready yet has to wait.]
44
Departmental meta-computing
[Figure: several BSML units connected by an intranet and coordinated with MSPML.]
45
Departmental Meta-computing ML
BSML + an MSPML-like language for coordination
Two kinds of vectors:
parallel vectors: α par
departmental vectors: α dep
Operational semantics (confluent)
Performance model (the DMM model)
Implementation
46
Example: departmental prefixes
Computation of the prefixes where each processor contains a value
Naive method: each processor sends its value to the other processors
Better method:
Each BSP unit computes a parallel prefix
One processor of each BSP unit receives the values of the other units
Each BSP unit finishes its computation with this value
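The three steps of the better method can be checked on a sequential model. A sketch under stated assumptions: a departmental vector is modelled as an array of BSP units, each unit as an array of values (one per processor), op is assumed associative, and the function names are hypothetical; no real communication takes place:

```ocaml
(* Sequential model only: units.(u).(i) is the value held by
   processor i of BSP unit u. *)

(* Step 1: prefix inside one BSP unit. *)
let local_prefix (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array =
  let r = Array.copy v in
  for i = 1 to Array.length r - 1 do
    r.(i) <- op r.(i - 1) r.(i)
  done;
  r

let departmental_prefix (op : 'a -> 'a -> 'a) (units : 'a array array)
    : 'a array array =
  let result = Array.map (local_prefix op) units in
  (* Step 2: carry stands for the value a unit receives, that is,
     the combination of all the values of the preceding units. *)
  let carry = ref None in
  Array.iteri
    (fun u pre ->
      (* Step 3: the unit finishes its computation with this value. *)
      (match !carry with
       | None -> ()
       | Some c -> result.(u) <- Array.map (fun x -> op c x) pre);
      let n = Array.length result.(u) in
      if n > 0 then carry := Some result.(u).(n - 1))
    result;
  result
```

For example, with op = (+) and units holding 1 2 | 3 4 5 | 6, the local prefixes are 1 3 | 3 7 12 | 6 and the final result is 1 3 | 6 10 15 | 21. Only one value per unit crosses unit boundaries, instead of one value per processor as in the naive method.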
47
Experiments
[Figure: time (s) against the size of the polynomials, for the naive algorithm, the BSP algorithm (one cluster) and the better algorithm.]
48
Conclusion and future work
49
Conclusion
Semantics of BSML: confluent and equivalent semantics; abstract machine; proof of BSML programs
Expressivity: parallel composition; parallel data structures; parallel IO
Meta-computing: desynchronization of BSML (MSPML); Departmental Meta-computing ML (DMML); in both cases, semantics, cost models and implementations
50
Future work in the Propac project
Cost prediction: static analysis of the programs; cost prediction of certified programs
Proofs of BSP imperative programs: extension with BSP operations; extension of the logical assertions
[Figure: Coq, program correctness, BSML, IMP, ML.]
51
Vérification efficace par Interaction de Techniques (VITE)
Design of parallel model checkers for high-level Petri nets
Using BSML to implement a toolkit:
Using the BSP model to load-balance dynamically
Using a modular and generic implementation to ease the use of this toolkit
Using the Propac tools to certify this implementation
52
Thank you for your attention
53
BSML and MSPML
[Figure: overview of BSML (BSP model) and MSPML (MPM model). Programming model: natural semantics, used for proofs of programs (with Coq). Small steps semantics: useful for costs. Execution model: distributed semantics and CAM, over PUB, MPI and TCP/IP.]
54
Petri nets
[Figure: a Petri net, with labels for state, place, transition, token and arc.]
55
Propac
[Figure: overview of the Propac project. High-level semantics (natural, small steps, distributed); parallel semantics and distributed evaluation of BSML; Coq axiomatisation; sequential implementation; abstract machines (design of the BSP-CAM); parallel implementation; proofs of BSML programs; performance model; dynamic cost analysis.]