1
Fortress
Aaron Becker, Abhinav Bhatele, Hassan Jafri
2 May 2006
2
Design Philosophy of Fortress
Goals:
– High performance
– Extensibility
– Mathematical notation
– Ease of use/productivity (type safety, type inference, abstraction, large standard library)
Note: most of this has nothing to do with parallel programming
3
C : Java :: Fortran : Fortress
– Security model with type safety
– Platform independence
– Large libraries
– Just-in-time compilation
4
Small Language, Big Libraries
Want to keep the language small, simple, and extensible but still provide lots of functionality.
Solution: big libraries with extensive operator overloading and templating.
5
Parallel Programming
Data model: global address space
Control flow: multithreaded, new threads spawned dynamically
Memory accesses are transactional:
– Atomic blocks (see the sketch below)
– No locks
– Explicit tests for failure/retry
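As a rough sketch of what this looks like in source (the atomic expression and the tryatomic/TryAtomicFailure variant come from the Fortress language specification; the counter variable is made up for illustration):

  hits: ZZ32 := 0
  atomic do
    hits += 1    (* the read and the write commit together as a single transaction *)
  end
  (* tryatomic expr is the variant that throws TryAtomicFailure when the
     transaction aborts, giving the explicit failure/retry test mentioned above *)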
6
Application Code is Simple
Goal: parallelism is mostly abstracted away in libraries in the common case (matrix multiply, FFT, etc.)
The runtime will dynamically choose data layout and algorithm based on program characteristics.
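As a hedged one-line illustration of that goal, in the mathematical-notation style of the CG example later in the deck (A, x, and y are assumed names, not library API):

  y = A x    (* matrix-vector product: no explicit parallel code; the library and
                runtime choose the data layout and the parallel algorithm *)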
7
Mathematical Notation
Goal: code that looks like math.
For computational code, want programs that look like the math they’re based on.
Example: NAS conjugate gradient
8
NAS CG Serial Code
9
NAS CG Specification
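As a sketch, the CG iteration being specified (reconstructed from the Fortress code on the next slide) is:

\begin{aligned}
& z \leftarrow 0,\quad r \leftarrow x,\quad p \leftarrow r,\quad \rho \leftarrow r^{T} r\\
& \text{for } j = 1,\dots,25:\\
&\qquad q \leftarrow A p,\quad \alpha \leftarrow \rho / (p^{T} q),\quad z \leftarrow z + \alpha p,\quad r \leftarrow r - \alpha q\\
&\qquad \rho_0 \leftarrow \rho,\quad \rho \leftarrow r^{T} r,\quad \beta \leftarrow \rho/\rho_0,\quad p \leftarrow r + \beta p\\
& \text{return } (z,\ \lVert x - A z \rVert)
\end{aligned}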
10
Fortress CG Code

conjGrad[/Elt extends Number, nat N,
         Mat extends Matrix[/Elt, N BY N/],
         Vec extends Vector[/Elt, N/] /]
        (A: Mat, x: Vec): (Vec, Elt) = do
  cgit_max = 25
  z: Vec := 0
  r: Vec := x
  p: Vec := r
  rho: Elt := r^T r
  for j <- seq(1:cgit_max) do
    q = A p
    alpha = rho / p^T q
    z := z + alpha p
    r := r - alpha q
    rho0 = rho
    rho := r^T r
    beta = rho / rho0
    p := r + beta p
  end
  (z, ||x - A z||)
end

(z, norm) = conjGrad(A, x)
11
Implicit Parallelism
In Fortress, loops are parallel by default:

  for (i,j) ← a.indices do
    a[i,j] := b[i] c[j]
  end

Reducers (sum, max, etc.) are defined in libraries and may have parallel or serial implementations:

  y = SUM[k ← 1:n] a[k] x^k
12
Parallelism
– Tuple expressions: each component of the tuple is evaluated in a separate implicit thread (see the sketch after this list).
– Method calls: the receiver and each of the arguments is evaluated in a separate implicit thread.
– Function applications: the function and each of the arguments is evaluated in a separate implicit thread.
– for loops: each iteration of the loop is evaluated in a separate implicit thread, unless a sequential generator is used.
– Comprehensions: each element of the comprehension is evaluated in a separate implicit thread, unless a sequential generator is used. The comprehension as a whole corresponds to a reduction.
– Sums and other big operators: each element of the sum (etc.) is evaluated in a separate implicit thread; the sum as a whole corresponds to a reduction.
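A minimal sketch of the tuple-expression case (f and g are hypothetical functions; the point is only that the two components may run in separate implicit threads):

  (u, v) = (f(a), g(b))    (* f(a) and g(b) may be evaluated in parallel *)
  result = u + v           (* both implicit threads have completed before the tuple's value is used *)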
13
Parallelism
Generators
– Express parallel iterations
– Generator lists appear in for loops, comprehensions, and reductions:
    (v1, v2, …) <- g
  where g is a generator
– Common generators:
    l : u            any range expression
    a.indices()      the index set of an array a
    {0, 1, 2, 3}     the elements of an aggregate expression
    sequential(g)    a sequential version of generator g
– Example:
    for i <- sequential(1:n) do ... end
14
Data Distribution
“Compiler, RTE, profiler, library (and programmer?) cooperate to compute optimized data layouts”
The programmer doesn’t even know the number of processors!
Many data distributions are implemented at the library level.
15
Distributions

conjGrad[[E extends Numeric]]
        (A: Sparse[[E, n×n]], x: E[n]): (E[n], E) = do
  cgitmax = 25
  z: E[n] := A.distribution.array[[n]](0)
  r: E[n] := x.copy(distribution = A.distribution)
  p: E[n] := r.copy()
  rho: E := r·r
  for j ← sequential(0#cgitmax) do
    q = A p
    alpha = rho / p·q
    z := z + alpha p
    r := r - alpha q
    rho0 = rho
    rho := r·r
    beta = rho / rho0
    p := r + beta p
  end
  (z, ||x - A z||)
end
16
Regions
Regions describe the hierarchical structure of the machine (each node is a region).
The tree is limited:
– It cannot describe all architectures (e.g., a grid layout)
17
Distributed Arrays
– Arrays are spread out across the machine.
– The default distribution is determined by the Fortress library.
– Distributions can be used for customized parallelism.
18
Shared and Local Data
– Objects should be considered to be local by default.
– The sharedness of an object may change on the fly.
– If an object is transitively reachable from more than one running thread at a time, it must be shared.
– When a reference to a local object is stored into a field of a shared object, the local datum must be published: its sharedness is changed to shared, and all of the data to which it refers is also published.
– Local variables referenced by a thread must be published before that thread may be run in parallel with the thread which spawned it (see the sketch after this list).
– A field with value type is assigned by copying, and thus has the sharedness of the containing object or closure.
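A minimal sketch of publication (Node, registry, and process are hypothetical; the spawn expression and Thread wait() come from the language specification):

  node = Node(42)                  (* freshly allocated, so local to this thread *)
  registry.head := node            (* registry is shared, so node and everything it
                                      references is published, i.e. becomes shared *)
  t = spawn do process(node) end   (* node must be published before the spawned thread
                                      may run in parallel with its spawner *)
  t.wait()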
19
Current State of the Language
– Language still in a state of flux
– Running simple programs in an interpreter
– Working on running Fortress on the Java™ Virtual Machine
– Prototype of distributions as a Java library
– Spinning up library effort
– Proving type soundness