
1 Moderator: John Mellor-Crummey
Department of Computer Science, Rice University
Programming Languages/Models and Compiler Technologies
Microsoft Manycore Workshop, June 21, 2007

2 Panelists
- David August, Princeton University
- Saman Amarasinghe, Massachusetts Institute of Technology
- Guy Blelloch, Carnegie Mellon University
- Charles Leiserson, Massachusetts Institute of Technology
- Uzi Vishkin, University of Maryland, College Park

3 Architectural Challenges
- Significant parallelism
- Multiple kinds of parallelism
  - cores
  - ILP
  - SIMD
- Diversity of cores
- Run-time throttling of cores for power management
- Memory hierarchy
  - bandwidth
    - near term: will continue to be a significant bottleneck
    - long term: 3D stacked memory?
  - long and often non-uniform memory latencies
  - scratch pads

4 Roles of Parallel Programming Models
- Enhance programmer productivity through abstraction
- Manage platform resources to deliver performance
- Provide standard interface for platform portability

5 The Goal
- Simpler ways of conceptualizing, expressing, debugging, and tuning scalable parallel programs
- Multiple models will be necessary
- Models will necessarily trade off simplicity, expressivity, relevance to legacy code, and performance

6 To Succeed, Parallel Programming Models Must …
- Be ubiquitous
  - cross platform
  - at a minimum: laptops, SMP servers
  - distributed memory clusters?
- Be expressive
- Be productive
  - easy to write
  - easy to read and maintain
  - easy to reuse
- Have a promise of future availability and longevity
- Be efficient
- Be supported by tools

7 Simplifying Parallel Programming
A high-level parallel language should …
- Provide global address space
  - beware exposed buffering …
- Separate concerns: partitioning, mapping, and synchronization vs. algorithm specification
  - “viscosity” comes from premature mingling of these issues
- Enable programmer to manage locality at a high level
  - locality = performance
  - affinity between data and computation
    - e.g. HPF’s “ON HOME” declarations

8 Design Issues I
- Ultimate control vs. simplicity of use
  - “library developers” vs. “productivity users”
    - should it be the same language for both?
      - extensible language model (Sun’s Fortress)
      - kitchen sink model (X10)
- Implicit vs. explicit parallelism
  - implicit parallelism is often more malleable
  - better supports dynamic adaptation
- Compiler assisted vs. compiler-centric
  - Co-array Fortran and UPC
    - user control over work decomposition, data movement, and synchronization
  - HPF: compiler must deliver or all is lost
- Lazy vs. eager parallelism
  - Cilk’s lazy parallelism provides a model for “scalable” binaries
  - eager parallelism adds unnecessary overhead

9 Design Issues II
- Deterministic vs. non-deterministic models
  - deterministic “clocked final model”
    - Saraswat et al. (www.saraswat.org/cf.pdf)
- Static vs. dynamic scheduling
  - dynamic scheduling will be increasingly important
    - irregular computations, task parallelism
    - adaptive scheduling in response to “core throttling”
- Cooperative vs. independent scheduling of work
  - does benefit of shared cache outweigh difficulty of using it?
    - tightly synchronous vs. more loosely synchronous
- Scalable to distributed-memory ensembles?
  - broad community probably only cares about tightly-coupled platforms
  - some government and industry clients will always have extreme needs
- Importance of managing affinity between cores and data
  - important for highest efficiency for library developers
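The static vs. dynamic scheduling point can be illustrated with a small simulation. This is only a sketch, not something from the slides: the function names, the task costs, and the worker speeds are all assumptions chosen for illustration, and a throttled core is modeled simply as a worker running at half speed.

```python
import heapq


def static_makespan(costs, speeds):
    """Finish time when tasks are split into equal contiguous blocks up front."""
    n, p = len(costs), len(speeds)
    block = (n + p - 1) // p  # tasks per worker, rounded up
    finish = []
    for w, speed in enumerate(speeds):
        work = sum(costs[w * block:(w + 1) * block])
        finish.append(work / speed)
    return max(finish)


def dynamic_makespan(costs, speeds):
    """Finish time when each worker pulls the next task as soon as it is idle."""
    idle_at = [(0.0, w) for w in range(len(speeds))]  # (time worker is free, id)
    heapq.heapify(idle_at)
    end = 0.0
    for cost in costs:
        t, w = heapq.heappop(idle_at)  # earliest idle worker takes the task
        t += cost / speeds[w]
        end = max(end, t)
        heapq.heappush(idle_at, (t, w))
    return end


# Hypothetical workload: 16 unit tasks, 4 cores, one core throttled to half speed.
costs = [1.0] * 16
speeds = [1.0, 1.0, 1.0, 0.5]
print(static_makespan(costs, speeds))   # 8.0: everyone waits for the slow core
print(dynamic_makespan(costs, speeds))  # 5.0: fast cores absorb the extra work
```

In this toy setup the static partition is held back by the throttled core, while the dynamic schedule shifts work to the faster cores. Real schedulers also pay queueing and locality costs that the model ignores.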

10 Transactions are not “THE” Answer
- Transactions are a piece of the puzzle: atomicity
- Other aspects of the parallel programming problem
  - identifying concurrency
  - partitioning work
  - ordering actions
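A minimal sketch of what atomicity does and does not provide. Python has no built-in transactional memory, so a single lock stands in for a transaction here. The `Accounts` class and `run_transfers` helper are hypothetical names invented for this illustration.

```python
import threading


class Accounts:
    """Toy shared state; one lock stands in for a transaction (an assumption)."""

    def __init__(self, balances):
        self.balances = list(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        # The check and both updates happen as one atomic step.
        with self._lock:
            if self.balances[src] >= amount:
                self.balances[src] -= amount
                self.balances[dst] += amount
                return True
            return False


def run_transfers(accounts, n_threads=4, n_ops=1000):
    """Run concurrent transfers and return the total balance afterwards."""
    n = len(accounts.balances)

    def worker(tid):
        # Partitioning and ordering of the work are decided here by hand;
        # atomicity alone says nothing about either.
        for i in range(n_ops):
            accounts.transfer((tid + i) % n, (tid + i + 1) % n, 1)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(accounts.balances)


accounts = Accounts([100, 100, 100])
print(run_transfers(accounts))  # total is conserved: 300
```

The atomic section guarantees that the total is conserved and no balance goes negative. The final distribution across accounts still depends on thread interleaving, and the decision about which thread does which transfers had to be made separately.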

11 Autotuning
- Seductive idea
- Very successful as a library-based approach
  - FFTW, ATLAS, OSKI, …
- Much work needed to apply to applications rather than kernels
  - huge search space
    - progress in effective truncated search
  - model guidance can be effective
  - autotuning for parallelism
    - dangerously close to automatic parallelization
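The idea of empirical search over a tuning parameter can be sketched as follows. This is an illustrative toy, not how FFTW, ATLAS, or OSKI are implemented: the kernel (`blocked_transpose`), the `autotune` driver, and the `budget` cutoff standing in for a truncated search are all assumptions made for this example.

```python
import time


def blocked_transpose(a, n, block):
    """Transpose an n x n row-major matrix stored as a flat list, tile by tile."""
    out = [0.0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


def autotune(kernel, candidates, budget, *args):
    """Time the kernel for at most `budget` candidate values; return the fastest."""
    best, best_time = None, float("inf")
    for param in candidates[:budget]:  # truncated search: stop after `budget` trials
        start = time.perf_counter()
        kernel(*args, param)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = param, elapsed
    return best, best_time


n = 128
a = [float(i) for i in range(n * n)]
block, seconds = autotune(blocked_transpose, [4, 8, 16, 32, 64, 128], 4, a, n)
print(f"fastest block size among those tried: {block} ({seconds:.4f} s)")
```

Even this one-parameter search needs a budget. Application-level tuning multiplies the parameters, which is where the huge search space on the slide comes from. The winning block size is machine-dependent, so no expected output is shown.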

12 Rice Experience: Lessons from HPF
- Good data and computation partitionings are essential
  - without good partitionings, parallelism suffers
  - flexible user-control is essential
- Excess communication undermines scalability
  - both frequency and volume must be right
  - embrace user hints to guide communication placement and optimization
    - e.g. HPF/JA directives: REFLECT, LOCAL, PIPELINE, etc.
- Single processor efficiency is critical
  - must use caches effectively on microprocessors
  - Icache: beware of complex machine-generated code
  - Dcache: beware of communication footprint
- Optimizing tightly-coupled algorithms can be hard
  - if the compiler doesn’t optimize it, performance may be doomed!

13 Rice Experience: HPF vs. Co-array Fortran
- Rice dHPF: a decade of investment in compiler technology
  - not quite, government cut funding here too, just like architecture
  - polyhedral code generation models (like Lethin described)
- Co-array Fortran for clusters
  - a few years’ effort by a pair of students
- Result: Co-array Fortran bests HPF
  - more expressive
  - higher performance
  - shorter time to solution
  - currently, can be HARDER to program than MPI

14 Principal Compiler and Runtime Challenges
- Exploiting multiple levels of heterogeneous parallelism
- Choreographing parallelism, data movement, synchronization
- Managing memory hierarchy
  - cache
  - scratch pad
- Warning: Don’t try this at home.

15 Programming Model Ecosystem Issues
- Semantic mismatch between programming model and execution model
- Debugging: data races and non-determinism
- Performance analysis: why isn’t performance scaling?
  - insufficient parallelism
  - parallelism is too fine-grained to be efficient
  - architecture-level issues, e.g., false sharing
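One common first step in asking why performance is not scaling is to estimate how much of a run behaves as if it were serial. The sketch below uses two standard formulas that do not appear in the slides: Amdahl's law and the Karp-Flatt experimentally determined serial fraction. The function names and the sample numbers are assumptions for illustration only.

```python
def amdahl_speedup(serial_fraction, p):
    """Upper bound on speedup with p cores when a fixed fraction is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def karp_flatt(speedup, p):
    """Serial fraction implied by a measured speedup on p cores (p > 1)."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


def efficiency(speedup, p):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / p


# Hypothetical measurement: a program reaches 4.7x on 8 cores.
s = 4.7
print(f"efficiency: {efficiency(s, 8):.2f}")
print(f"implied serial fraction: {karp_flatt(s, 8):.3f}")
```

If the implied serial fraction stays roughly constant as cores are added, insufficient parallelism is a likely suspect. If it grows with the core count, overheads such as fine-grained synchronization or false sharing are more plausible.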

16 A Path Forward
- Kernel-, benchmark-, and application-driven studies
  - assess strengths and weaknesses of models
- Explore alternatives & evaluate their effects on
  - simplicity
  - expressiveness
  - correctness
  - performance

