Yelick 1 ILP98, Titanium
Titanium: A High Performance Java-Based Language
Katherine Yelick
Alex Aiken, Phillip Colella, David Gay, Susan Graham, Paul Hilfinger, Arvind Krishnamurthy, Ben Liblit, Carleton Miyamoto, Geoff Pike, Luigi Semenzato,

Talk Outline
Motivation
Extensions for uniprocessor performance
Extensions for parallelism
A framework for domain-specific languages
Status and performance

Programming Challenges on Millennium
Large-scale computations
Optimized simulation algorithms are complex
Use of hierarchical parallel machine
Cost-conscious programming
Unstructured meshes
Adaptive meshes
Minimization algorithms

Titanium Approach
Performance is the primary goal
– High uniprocessor performance
– Designed for shared and distributed memory
– Parallelism constructs with programmer control
– Optimizing compiler for caches, communication scheduling, etc.
Expressiveness is a secondary goal
– Based on a safe language: Java
– Safety simplifies programming and compiler analysis
– Framework for domain-specific language extensions

New Language Features
Immutable classes
Multidimensional arrays
– also: points and index sets as first-class values
– multidimensional iterators
Memory management
– semi-automated zone-based allocation
Scalable parallelism
– SPMD model of execution with global address space
Language-level synchronization
Support for grid-based computation

Java Objects
Primitive scalar types: boolean, double, int, etc.
– access is fast
Objects: user-defined and from the standard library
– have an implicit level of indirection (a pointer to the object)
– arrays are objects
– all objects can be checked for equality and a few other operations
[Figure: primitive values 3 and true; a reference to an object with fields r: 7.1, i: 4.3]
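
The difference between primitives and objects can be shown in plain Java. Copying an int copies the value. Copying an object variable copies only the pointer, so two variables can alias one object, and `==` on objects compares identity rather than contents. `Cell` and `ReferenceDemo` are hypothetical names used only for this sketch.

```java
// Cell is a hypothetical ordinary (mutable) Java class used for illustration.
class Cell {
    double r, i;
    Cell(double r, double i) { this.r = r; this.i = i; }
}

class ReferenceDemo {
    // Returns {n, a.r, a == b, a == c} after the assignments below.
    static Object[] run() {
        int n = 3;
        int m = n;                    // copies the value 3
        m = 4;                        // changing the copy leaves n untouched

        Cell a = new Cell(7.1, 4.3);
        Cell b = a;                   // copies the reference: a and b alias one object
        b.r = 1.0;                    // the change is visible through a
        Cell c = new Cell(1.0, 4.3);  // equal field values, but a distinct object

        return new Object[] { n, a.r, a == b, a == c };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));  // prints [3, 1.0, true, false]
    }
}
```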

Immutable Classes in Titanium
For small objects, we would sometimes prefer to
– avoid the level of indirection
– pass by value
– extend the idea of primitive values (1, 4.2, etc.) to user-defined values
Titanium introduces immutable classes
– all fields are (implicitly) final
– they cannot inherit from (extend) or be inherited by other classes
– they need a 0-argument constructor, e.g., Complex()
    immutable class Complex { ... }
    Complex c = new Complex(7.1, 4.3);
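
Plain Java can only approximate an immutable class. A `final` class with `final` fields gets the "no mutation, no subclassing" part and value-based equality, but Java still allocates each instance on the heap behind a reference. The pass-by-value layout is what Titanium adds. This is a sketch, not Titanium's implementation, and the method names `plus` and `times` are illustrative.

```java
// Java approximation of "immutable class Complex": final class, final fields,
// operations that return new values, and equality by contents.
final class Complex {
    final double re;
    final double im;

    Complex() { this(0.0, 0.0); }  // 0-argument constructor, as the slide requires
    Complex(double re, double im) { this.re = re; this.im = im; }

    Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }
    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    // Value semantics: two Complex values are equal when their fields are equal.
    @Override public boolean equals(Object x) {
        if (!(x instanceof Complex)) return false;
        Complex o = (Complex) x;
        return Double.compare(re, o.re) == 0 && Double.compare(im, o.im) == 0;
    }
    @Override public int hashCode() { return 31 * Double.hashCode(re) + Double.hashCode(im); }
    @Override public String toString() { return "(" + re + ", " + im + ")"; }
}
```

For example, `new Complex(1, 2).times(new Complex(3, 4))` equals `new Complex(-5, 10)`, even though the two are distinct heap objects.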

Arrays in Java
Arrays in Java are objects
Only 1D arrays are directly supported
Array bounds are checked (as in Fortran)
Multidimensional arrays, built as arrays of arrays, are slow and cannot be transformed into contiguous memory
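
A Java `double[][]` is an array of row objects, and each row is allocated separately, so the compiler cannot assume the rows are adjacent or even of equal length. The usual workaround is to store the grid in one `double[]` and compute row-major offsets by hand. `FlatGrid` is a hypothetical name for this sketch.

```java
// A 2D grid stored contiguously in row-major order, next to the
// array-of-arrays form that Java provides directly.
class FlatGrid {
    final int rows, cols;
    final double[] data;  // one block of rows * cols doubles

    FlatGrid(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }
    double get(int i, int j) { return data[i * cols + j]; }  // row-major offset
    void set(int i, int j, double v) { data[i * cols + j] = v; }

    double sum() {
        double s = 0;
        for (double v : data) s += v;  // one linear sweep over contiguous memory
        return s;
    }

    // Same sum over double[][]: each a[i] is a separate heap object that must be
    // loaded first, and rows may even have different lengths (ragged arrays).
    static double sumNested(double[][] a) {
        double s = 0;
        for (double[] row : a) for (double v : row) s += v;
        return s;
    }
}
```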

Titanium Arrays
Fast, expressive arrays
– multidimensional
– lower bound, upper bound, stride
– concise indexing: A[p] instead of A(i, j, k)
Points
– tuple of integers as a primitive type
Domains
– rectangular sets of points (bounds and stride)
– arbitrary sets of points
Multidimensional iterators

Example: Point, RectDomain, Array
    Point lb = [1, 1];
    Point ub = [10, 20];
    RectDomain R = [lb : ub : [2, 2]];
    double [2d] A = new double[R];
    …
    foreach (p in A.domain()) {
        A[p] = B[2 * p];
    }
Standard optimizations:
– strength reduction
– common subexpression elimination
– invariant code motion
– removing bounds checks from the loop body
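
To make the semantics concrete, here is a Java emulation of the example. `Point2`, `RectDomain2`, `Grid2` and `DomainDemo` are hypothetical classes standing in for Titanium's built-in `Point`, `RectDomain` and `double [2d]`. Bounds are taken as inclusive, so `[1, 1] : [10, 20] : [2, 2]` contains x in {1, 3, 5, 7, 9} and y in {1, 3, ..., 19}, which is 50 points. The per-access `contains` check in `offset` is the kind of work the optimizations listed above would move out of the loop.

```java
import java.util.ArrayList;
import java.util.List;

final class Point2 {
    final int x, y;
    Point2(int x, int y) { this.x = x; this.y = y; }
    Point2 scale(int k) { return new Point2(k * x, k * y); }  // models 2 * p
}

final class RectDomain2 {
    final Point2 lb, ub, stride;  // [lb : ub : stride], bounds inclusive
    RectDomain2(Point2 lb, Point2 ub, Point2 stride) {
        this.lb = lb; this.ub = ub; this.stride = stride;
    }
    int extentX() { return (ub.x - lb.x) / stride.x + 1; }
    int extentY() { return (ub.y - lb.y) / stride.y + 1; }
    int size() { return extentX() * extentY(); }

    // Points in row-major order, as a foreach over the domain might visit them.
    List<Point2> points() {
        List<Point2> ps = new ArrayList<>();
        for (int i = lb.x; i <= ub.x; i += stride.x)
            for (int j = lb.y; j <= ub.y; j += stride.y)
                ps.add(new Point2(i, j));
        return ps;
    }

    boolean contains(Point2 p) {
        return p.x >= lb.x && p.x <= ub.x && p.y >= lb.y && p.y <= ub.y
            && (p.x - lb.x) % stride.x == 0 && (p.y - lb.y) % stride.y == 0;
    }
}

final class Grid2 {
    final RectDomain2 dom;
    final double[] data;  // contiguous storage, one slot per domain point
    Grid2(RectDomain2 dom) { this.dom = dom; this.data = new double[dom.size()]; }

    // Bounds check plus offset computation on every access.
    private int offset(Point2 p) {
        if (!dom.contains(p)) throw new IndexOutOfBoundsException("point outside domain");
        return ((p.x - dom.lb.x) / dom.stride.x) * dom.extentY()
             + (p.y - dom.lb.y) / dom.stride.y;
    }
    double get(Point2 p) { return data[offset(p)]; }
    void set(Point2 p, double v) { data[offset(p)] = v; }
}

class DomainDemo {
    // foreach (p in A.domain()) { A[p] = B[2 * p]; }
    static void gather(Grid2 a, Grid2 b) {
        for (Point2 p : a.dom.points()) a.set(p, b.get(p.scale(2)));
    }
}
```

B's domain is not given on the slide. A domain that contains every `2 * p`, such as `[0, 0] : [20, 40]` with unit stride, is assumed when using `gather`.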

Memory Management
Java is implemented with garbage collection
– distributed GC is too unpredictable
– compile-time analysis can improve performance
Zone-based memory management
– extends the existing model
– good performance
– safe
– easy to use

Zone-Based Memory Management
Allocate objects in zones; release zones manually.
    Zone Z1 = new Zone();
    Zone Z2 = new Zone();
    T x = new(Z1) T();
    T y = new(Z2) T();
    x.field = y;
    x = y;
    delete Z1;
    delete Z2;  // error: x and y still refer to the object in Z2
[Figure: zones Z1 and Z2 holding the objects referenced by x and y]
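
One way to make `delete` safe is to count references into each zone and refuse to free a zone while any remain. The sketch below does that in plain Java. This is an assumption about the mechanism, not Titanium's runtime. Java cannot intercept ordinary assignments, so every store goes through the hypothetical helpers `RefVar.set` and `ZObj.setField`. `Zone`, `ZObj` and `RefVar` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class Zone {
    private final List<ZObj> objects = new ArrayList<>();
    private int incoming = 0;  // references into this zone from outside it
    private boolean deleted = false;

    ZObj alloc() {  // models: new(Z) T()
        if (deleted) throw new IllegalStateException("zone already deleted");
        ZObj o = new ZObj(this);
        objects.add(o);
        return o;
    }
    void retain()  { incoming++; }
    void release() { incoming--; }

    void delete() {  // models: delete Z
        if (incoming > 0) throw new IllegalStateException("zone still referenced");
        for (ZObj o : objects)  // dying objects drop their cross-zone references
            if (o.field != null && o.field.zone != this) o.field.zone.release();
        objects.clear();  // free every object in the zone at once
        deleted = true;
    }
}

final class ZObj {
    final Zone zone;
    ZObj field;  // may point into another zone
    ZObj(Zone zone) { this.zone = zone; }

    void setField(ZObj target) {  // models: x.field = y
        if (field != null && field.zone != zone) field.zone.release();
        if (target != null && target.zone != zone) target.zone.retain();
        field = target;
    }
}

// A local variable; the reference it holds counts against the target's zone.
final class RefVar {
    private ZObj value;
    ZObj get() { return value; }
    void set(ZObj o) {  // models: x = y (retain first, so self-assignment is safe)
        if (o != null) o.zone.retain();
        if (value != null) value.zone.release();
        value = o;
    }
}
```

Replaying the slide's code, `delete Z1` succeeds because nothing refers into Z1 once `x = y` has run. `delete Z2` throws because `x` and `y` still hold references into Z2.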

Sequential Performance
Times in seconds (lower is better).

Sequential Performance
On an Ultrasparc:
Benchmark       C/C++/FORTRAN   Java Arrays   Titanium Arrays   Overhead
DAXPY           1.4s            6.8s          1.5s              7%
3D multigrid    12s             n/a           22s               83%
2D multigrid    5.4s            n/a           6.2s              15%
EM3D            0.7s            1.8s          1.0s              42%

On a Pentium II:
Benchmark       C/C++/FORTRAN   Java Arrays   Titanium Arrays   Overhead
DAXPY           1.8s            n/a           2.3s              27%
3D multigrid    23.0s           n/a           20.0s             -13%
2D multigrid    7.3s            n/a           5.5s              -25%
EM3D            1.0s            n/a           1.6s              60%
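
The Overhead column is consistent with (Titanium time − C/C++/FORTRAN time) / C/C++/FORTRAN time, so negative values mean the Titanium version ran faster. That reading is inferred from the numbers rather than stated on the slide. Because the times are rounded, recomputed percentages can be off by a point: EM3D on the Ultrasparc gives 43% rather than 42%. `Overhead.percent` is a hypothetical helper.

```java
class Overhead {
    // Relative slowdown of Titanium versus the C/C++/FORTRAN baseline, in percent.
    static long percent(double baselineSeconds, double titaniumSeconds) {
        return Math.round(100.0 * (titaniumSeconds - baselineSeconds) / baselineSeconds);
    }

    public static void main(String[] args) {
        System.out.println(percent(12.0, 22.0));  // 3D multigrid, Ultrasparc: prints 83
        System.out.println(percent(7.3, 5.5));    // 2D multigrid, Pentium II: prints -25
    }
}
```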

Model of Parallelism
Single Program, Multiple Data
– fixed number of processes
– each process has its own local data
– global synchronization (barrier)
[Figure: n processes run from start, meet at a barrier, and continue to the end]
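
The SPMD pattern can be sketched with Java threads standing in for Titanium processes. The number of threads is fixed, every thread runs the same code on its own share of the data, and a `java.util.concurrent.CyclicBarrier` plays the role of `barrier()`. The barrier also guarantees that writes made before `await()` are visible to the other threads after it. The class `Spmd` and the sum-of-1..100 workload are illustrative choices.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class Spmd {
    // Returns the total each process computed; all entries should match.
    static long[] run(int nProcs) {
        long[] partial = new long[nProcs];
        long[] total = new long[nProcs];
        CyclicBarrier barrier = new CyclicBarrier(nProcs);
        Thread[] procs = new Thread[nProcs];
        for (int id = 0; id < nProcs; id++) {
            final int me = id;  // plays the role of thisProc()
            procs[id] = new Thread(() -> {
                try {
                    long s = 0;  // phase 1: local work on this process's share of 1..100
                    for (int k = me + 1; k <= 100; k += nProcs) s += k;
                    partial[me] = s;
                    barrier.await();  // no one reads partial[] until all have written
                    long sum = 0;  // phase 2: every process combines all partial sums
                    for (long p : partial) sum += p;
                    total[me] = sum;
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            procs[id].start();
        }
        for (Thread t : procs) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return total;
    }
}
```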

Global Address Space
Each process has its own heap
References can span process boundaries
    class T { … }
    T gv;
    T lv = null;
    if (thisProc() == 0) {
        lv = new T();  // allocate locally
    }
    gv = broadcast lv from 0;  // distribute
    … gv.field …
[Figure: variables lv and gv on process 0 and on the other processes, each process with its own local heap]
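
`broadcast lv from 0` can be mimicked with threads. Process 0 publishes its reference into a shared slot, all processes wait at a barrier, and then every process reads the same reference. All threads share one JVM heap here, so the sketch shows only the data flow, not remote references. The slot-plus-two-barriers design, and the names `Broadcast`, `Shared` and `BroadcastDemo`, are assumptions for illustration.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class Broadcast<V> {
    private final CyclicBarrier barrier;
    private V slot;

    Broadcast(int nProcs) { barrier = new CyclicBarrier(nProcs); }

    V from(int root, int me, V value) {
        try {
            if (me == root) slot = value;  // the root publishes its reference
            barrier.await();  // wait until it has been published
            V result = slot;
            barrier.await();  // keep slot unchanged until everyone has read it
            return result;
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new RuntimeException(e);
        }
    }
}

class Shared { int field = 42; }  // stands in for the slide's class T

class BroadcastDemo {
    static Shared[] run(int nProcs) {
        Broadcast<Shared> bc = new Broadcast<>(nProcs);
        Shared[] received = new Shared[nProcs];
        Thread[] procs = new Thread[nProcs];
        for (int id = 0; id < nProcs; id++) {
            final int me = id;
            procs[id] = new Thread(() -> {
                Shared lv = null;
                if (me == 0) lv = new Shared();     // allocate locally on process 0
                received[me] = bc.from(0, me, lv);  // gv = broadcast lv from 0
            });
            procs[id].start();
        }
        for (Thread t : procs) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return received;
    }
}
```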

Global vs. Local References
Global references may be slow
– distributed memory: a few instructions of overhead when a global reference is used to access a local object
– shared memory: no performance implications
Solution: use the local qualifier
– statically restricts references to local objects
– example: T local lv = null;
– use only in performance-critical sections
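
The "few instructions of overhead" can be pictured by assuming a global reference is a (process id, address) pair. Every dereference then tests whether the owner is the current process, while a `local` reference skips the test because locality is known statically. This representation and the names `GlobalRef`, `deref` and `toLocal` are hypothetical. Remote access is only counted here, where a real runtime would send a message.

```java
// Hypothetical global reference: owner process plus a stand-in for the address.
final class GlobalRef<V> {
    final int owner;  // process whose heap holds the object
    final V addr;     // stands in for the address in that heap
    GlobalRef(int owner, V addr) { this.owner = owner; this.addr = addr; }

    // The locality test below is the extra work paid on every global access.
    V deref(int me, int[] remoteCount) {
        if (owner == me) return addr;  // local: plain access after the check
        remoteCount[0]++;              // remote: a runtime would communicate here
        return addr;
    }

    // Narrowing to a local reference, checked at run time; afterwards the
    // caller can use the object with no per-access locality test.
    V toLocal(int me) {
        if (owner != me) throw new IllegalStateException("object is not local to process " + me);
        return addr;
    }
}
```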

Global Synchronization Analysis
In Titanium, processes must synchronize at the same textual instances of barrier()
    doThis();
    barrier();
    boolean x = someCondition();
    if (x) {
        doThat();
        barrier();
    }
    doSomeMore();
    barrier();

Global Synchronization Analysis
In Titanium, processes must synchronize at the same textual instances of barrier()
Singleness analysis statically guarantees correctness by restricting the values of variables that control program flow: a variable declared single, like x below, must have the same value on every process, so all processes take the same branch.
    doThis();
    barrier();
    boolean single x = someCondition();
    if (x) {
        doThat();
        barrier();
    }
    doSomeMore();
    barrier();
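
The rule can be checked by simulating the slide's code once per process and recording which textual `barrier()` calls each process reaches. Here they are labelled B1, B2 and B3, an assumption of this sketch. If `x` has the same value everywhere, as `single` requires, every process produces the same sequence. If `x` depends on the process id, the sequences diverge, which on a real machine means mismatched barriers or deadlock. This is only a run-time stand-in; singleness analysis proves the property at compile time. `BarrierTrace` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class BarrierTrace {
    // Barriers reached by process 'me' when someCondition() is evaluated on it.
    static List<String> trace(int me, IntPredicate someCondition) {
        List<String> t = new ArrayList<>();
        // doThis();
        t.add("B1");  // barrier();
        boolean x = someCondition.test(me);
        if (x) {
            // doThat();
            t.add("B2");  // barrier();
        }
        // doSomeMore();
        t.add("B3");  // barrier();
        return t;
    }

    // True if all processes reach the same textual barriers in the same order.
    static boolean aligned(int nProcs, IntPredicate someCondition) {
        List<String> first = trace(0, someCondition);
        for (int p = 1; p < nProcs; p++)
            if (!trace(p, someCondition).equals(first)) return false;
        return true;
    }
}
```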

Support for Grid-Based Computation
    Point lb = [0, 0];
    Point ub = [6, 4];
    RectDomain R = [lb : ub : [2, 2]];
    …
    Domain red = R + (R + [1, 1]);
    foreach (p in red) { … }
[Figure: R spans (0, 0) to (6, 4); R + [1, 1] spans (1, 1) to (7, 5); red spans (0, 0) to (7, 5)]
Gauss-Seidel relaxation with red-black ordering
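
Here `R + [1, 1]` is read as translating R by the point [1, 1], and the outer `+` as the union of two domains, matching the slide's figure. On that reading, `red` is exactly the checkerboard of points with x + y even on the 8 × 6 grid from (0, 0) to (7, 5). Each red point's four neighbours are black, which is why all red points can be relaxed independently before the black ones. Points are encoded as `List<Integer>` pairs, and `RedBlack` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RedBlack {
    // All points of [x0, y0] : [x1, y1] : [sx, sy], bounds inclusive.
    static Set<List<Integer>> rect(int x0, int y0, int x1, int y1, int sx, int sy) {
        Set<List<Integer>> s = new LinkedHashSet<>();
        for (int x = x0; x <= x1; x += sx)
            for (int y = y0; y <= y1; y += sy)
                s.add(Arrays.asList(x, y));
        return s;
    }

    // Translation of a domain by a point, as in R + [1, 1].
    static Set<List<Integer>> shift(Set<List<Integer>> d, int dx, int dy) {
        Set<List<Integer>> s = new LinkedHashSet<>();
        for (List<Integer> p : d) s.add(Arrays.asList(p.get(0) + dx, p.get(1) + dy));
        return s;
    }

    // red = R + (R + [1, 1]), with the outer + taken as set union.
    static Set<List<Integer>> red() {
        Set<List<Integer>> r = rect(0, 0, 6, 4, 2, 2);
        Set<List<Integer>> u = new LinkedHashSet<>(r);
        u.addAll(shift(r, 1, 1));
        return u;
    }

    // black = the remaining points of the full grid [0, 0] : [7, 5].
    static Set<List<Integer>> black() {
        Set<List<Integer>> b = rect(0, 0, 7, 5, 1, 1);
        b.removeAll(red());
        return b;
    }
}
```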

Implementation
Strategy
– compile Titanium into C (currently C++)
– POSIX threads for SMPs (currently Solaris threads)
– Lightweight Active Messages for communication
Status
– runs on a Sun Enterprise 8-way SMP
– runs on the Berkeley NOW
– trivial ports to half a dozen other architectures
– tuning for sequential performance

Titanium Status
Titanium language definition complete.
Titanium compiler running.
Compiles for uniprocessors and the NOW; others soon.
Application development ongoing.
Many research opportunities.

Applications
Three-D AMR Poisson Solver (AMR3D)
– block-structured grids with multigrid computation on each
– 2000-line program
– algorithm not yet fully implemented in other languages
– tests performance and effectiveness of language features
Three-D Electromagnetic Waves (EM3D)
– unstructured grids
Several smaller benchmarks

Parallel Performance
Numbers from an Ultrasparc SMP
Parallel efficiency is good
– EM3D (unstructured kernel)
– 3D AMR limited by the algorithm
[Figure: speedup versus number of processors]

New Compiler Analyses for Parallelism
Analysis of synchronization
– finds unmatched barriers and parallel code blocks
– extends traditional control flow analysis
Analysis of communication
– reorders and pipelines memory operations without observable effect
– extends traditional dependence analysis
Analyses extended to domain-specific constructs
– arrays indexed by domains of points
– looping constructs provide summary information

Future Directions
Use of the framework for domain-specific languages
– fluids and AMR done
– unstructured meshes and sparse solvers
Better programming tools
– debuggers, performance analysis
Optimizations
– analysis of parallel code and synchronization done
– optimizations for caches on uniprocessors and SMPs underway
– load balancing on clusters of SMPs

Conclusions
Performance
– sequential performance consistently close to C/FORTRAN
  » currently: 80% slower to 25% faster
– sequential efficiency very high
Expressiveness
– safety of Java with a small set of performance features
– extensible to new application domains
Portability, compatibility, etc.
– no gratuitous departures from the Java standard
– compilation model easily supports new platforms