1
CS 267 Spring 2008, Horst Simon, UC Berkeley
May 15, 2008
Code Generation Framework for Process Network Models onto Parallel Platforms
Man-Kit Leung, Isaac Liu, Jia Zou
Final Project Presentation
2
Leung, Liu, Zou | CS267 Sp 08 Final Presentation | UC Berkeley
Outline
- Motivation
- Demo
- Code Generation Framework
- Application and Results
- Conclusion
3
Motivation
Parallel programming is difficult:
- Functional correctness
- Performance debugging and tuning (basically trial and error)
Code generation as a tool:
- Systematically explore the implementation space
- Rapid development / prototyping
- Optimize performance
- Maximize (programming) reusability
- Correct-by-construction [E. Dijkstra '70]
- Minimize human errors (bugs)
- Eliminate the need for low-level testing
- Otherwise, manual coding is too costly, especially for multiprocessor and distributed platforms
4
Higher-Level Programming Model
[Figure labels: Source Actor1, Source Actor2, Sink Actor, Implicit Buffers]
A Kahn Process Network (KPN) is a distributed model of computation (MoC) in which a group of processing units is connected by communication channels to form a network of processes.
- The communication channels are FIFO queues.
- "The Semantics of a Simple Language For Parallel Programming" [GK '74]
Properties:
- Deterministic
- Inherently parallel
- Expressive
5
MPI Code Generation Workflow
Given a (KPN) model:
- Partitioning (Mapping): analyze & annotate the model
  - Assume weights on edges & nodes
  - Generate cluster info (buffer & grouping)
- Code Generation: generate MPI code
  - SIMD (Single Instruction Multiple Data)
- Executable: execute the code
  - Obtain execution statistics for tuning
6
Demo
The codegen facility is in the Ptolemy II nightly release - http://chess.eecs.berkeley.edu/ptexternal/nightly/
7
Role of Code Generation
[Figure labels: Models, Partitioning (Mapping), Code Generation, Executable, Ptolemy II]
Platform-based Design [AS '02]
8
Implementation Space for Distributed Environment
- Mapping
  - # of logical processing units
  - # of cores / processors
- Network costs
  - Latency
  - Throughput
- Memory constraint
  - Communication buffer size
- Minimization metrics
  - Costs
  - Power consumption
  - …
9
Partition
- Uses node and edge weights as abstractions
- The weights are annotations on the model
- From the model, the input file to Chaco is generated.
- After Chaco produces the output file, the partitions are automatically annotated onto the model.
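The annotation step described on this slide can be sketched as follows. This is a minimal illustration, not the Ptolemy II implementation: it assumes the partitioner's output lists one partition number per line, with line i belonging to actor i, which is our understanding of Chaco's default assignment file. The names `ActorInfo` and `annotate_partitions` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-actor record; only the partition field is filled in here. */
typedef struct {
    const char *name;
    int partition;
} ActorInfo;

/* Assumed format: one partition number per line, line i for actor i.
 * Returns 0 on success, -1 if the text holds fewer than n numbers. */
int annotate_partitions(ActorInfo *actors, int n, const char *text) {
    assert(actors != NULL && text != NULL);
    const char *p = text;
    for (int i = 0; i < n; i++) {
        char *end;
        long value = strtol(p, &end, 10);  /* skips leading whitespace and newlines */
        if (end == p)
            return -1;                     /* ran out of numbers */
        actors[i].partition = (int)value;
        p = end;
    }
    return 0;
}
```

For a three-actor model, passing the text "0\n1\n0\n" would place the first and third actors on partition 0 and the second on partition 1.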
10
Multiprocessor Architectures
Shared Memory vs. Message Passing
- We want to generate code that will run on both kinds of architectures
- Message passing: Message Passing Interface (MPI) as the implementation
- Shared memory: Pthread implementation available for comparison
- UPC and OpenMP as future work
11
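Pthread Implementation

    #include <pthread.h>

    /* Each actor body runs in its own thread. */
    void *Actor1(void *arg) { ... return NULL; }
    void *Actor2(void *arg) { ... return NULL; }

    /* The model starts one thread per actor and waits for all of them. */
    void Model(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, Actor1, NULL);
        pthread_create(&t2, NULL, Actor2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
    }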
12
MPI Code Generation
- Local buffers
- MPI send/recv
- MPI tag matching
- KPN scheduling:
  - Determine when actors are safe to fire
  - Actors can't block other actors on the same partition
  - Termination based on a firing count
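The scheduling rules listed above can be illustrated without MPI. The sketch below is an assumption-laden stand-in, not the generated code: small local ring buffers play the role of every channel, three actors share one partition, and all names (`Buffer`, `Actor`, `try_fire`, `run_partition`, `BUF_CAP`, `FIRING_LIMIT`) are hypothetical. An actor fires only when it has input and room for output, it returns immediately otherwise so that it never blocks its neighbors, and the loop ends once every actor has reached its firing count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_CAP 2       /* hypothetical channel capacity */
#define FIRING_LIMIT 5  /* hypothetical firing count per actor */

typedef struct {
    int data[BUF_CAP];
    int head, count;
} Buffer;

static bool buf_full(const Buffer *b)  { return b->count == BUF_CAP; }
static bool buf_empty(const Buffer *b) { return b->count == 0; }

static void buf_push(Buffer *b, int v) {
    assert(!buf_full(b));
    b->data[(b->head + b->count) % BUF_CAP] = v;
    b->count++;
}

static int buf_pop(Buffer *b) {
    assert(!buf_empty(b));
    int v = b->data[b->head];
    b->head = (b->head + 1) % BUF_CAP;
    b->count--;
    return v;
}

typedef struct {
    Buffer *in;   /* NULL for a source actor */
    Buffer *out;  /* NULL for a sink actor */
    int scale;    /* stand-in for the actor's local computation */
    int firings;
    long acc;     /* a sink accumulates the tokens it receives */
} Actor;

/* Fire once if it is safe; otherwise return false without blocking. */
static bool try_fire(Actor *a) {
    if (a->firings >= FIRING_LIMIT) return false;
    if (a->in && buf_empty(a->in)) return false;
    if (a->out && buf_full(a->out)) return false;
    int token = a->in ? buf_pop(a->in) : a->firings + 1;  /* source emits 1, 2, ... */
    if (a->out) buf_push(a->out, token * a->scale);
    else a->acc += token;
    a->firings++;
    return true;
}

/* Round-robin over the actors of one partition until all firing counts are met. */
long run_partition(void) {
    Buffer b01 = {{0}, 0, 0}, b12 = {{0}, 0, 0};
    Actor actors[3] = {
        { NULL, &b01, 1, 0, 0 },  /* source */
        { &b01, &b12, 2, 0, 0 },  /* doubles each token */
        { &b12, NULL, 1, 0, 0 },  /* sink */
    };
    bool done = false;
    while (!done) {
        done = true;
        for (int i = 0; i < 3; i++) {
            try_fire(&actors[i]);
            if (actors[i].firings < FIRING_LIMIT) done = false;
        }
    }
    return actors[2].acc;
}
```

In an MPI setting, the `buf_empty` and `buf_full` checks would presumably be replaced by tests on outstanding nonblocking receives and sends, but that mapping is not shown on the slides.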
13
Sample MPI Program

    main() {
        if (rank == 0) {
            Actor0();
            Actor1();
        }
        if (rank == 1) {
            Actor2();
        }
        ...
    }

    Actor#() {
        MPI_Irecv(input);                     /* nonblocking receive */
        if (hasInput && !sendBufferFull) {    /* fire only when it is safe */
            output = localCalc();
            MPI_Isend(1, output);             /* nonblocking send */
        }
    }
14
Application
15
Execution Platform
16
Preliminary Results
Execution time (ms) by number of cores and iteration count:

# cores | MPI 500 | MPI 1000 | MPI 2500 | MPI 5000 | Pthread 500 | Pthread 1000 | Pthread 2500 | Pthread 5000
2       | 23.0    | 49.0     | 137.6    | 304.0    | 17.9        | 47.1         | 182.0        | 406.0
3       | 18.8    | 37.4     | 95.4     | 195.0    |             |              |              |
4       | 19.4    | 38.3     | 97.5     | 193.0    |             |              |              |
17
Conclusion & Future Work
Conclusion
- Framework for code generation to parallel platforms
- Generate scalable MPI code from Kahn Process Network models
Future Work
- Target more platforms (UPC, OpenMP, etc.)
- Additional profiling techniques
- Support more partitioning tools
- Improve performance of generated code
18
Acknowledgments
- Edward Lee
- Horst Simon
- Shoaib Kamil
- Ptolemy II developers
- NERSC
- John Kubiatowicz
19
Extra slides
20
Why MPI
- Message passing
  - Good for distributed (shared-nothing) systems
- Very generic
  - Easy to set up
  - Setup (e.g. mpicc) is required only on one "master"
  - Worker nodes only need to have SSH
- Flexible (explicit)
  - Nonblocking + blocking send/recv
- Cons: requires explicit syntax modification (as opposed to OpenMP, Erlang, etc.)
  - Solution: automatic code generation
21
Actor-oriented design: a formalized model of concurrency
[Figure labels: object oriented, actor oriented]
- Actor-oriented design hides the state of each actor and makes it inaccessible to other actors
- The emphasis on data flow over control flow leads to conceptually concurrent execution of actors
- The interaction between actors happens in a highly disciplined way
- Threads and mutexes become an implementation mechanism instead of part of the programming model
22
Pthread implementation
- Each actor runs as a separate thread
- Implicit buffers
  - Each buffer has a read count and a write count
  - Condition variable: puts threads to sleep and wakes them up
  - Capacity of the buffer
- A global notion of scheduling exists
  - OS level
  - If all actors are in blocking-read mode, the model should terminate
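A buffer of the kind described above might look like the sketch below. It is a hedged illustration, not the generated code: the capacity, the `Fifo` type, and the functions `fifo_put`, `fifo_get`, and `run_pipeline` are all hypothetical names. The buffer keeps a read count and a write count, and a condition variable puts a thread to sleep when the buffer is empty (on read) or full (on write).

```c
#include <assert.h>
#include <pthread.h>

#define CAPACITY 4  /* hypothetical buffer capacity */

typedef struct {
    int data[CAPACITY];
    unsigned long read_count;   /* total tokens consumed so far */
    unsigned long write_count;  /* total tokens produced so far */
    pthread_mutex_t lock;
    pthread_cond_t changed;     /* signaled whenever either count moves */
} Fifo;

void fifo_init(Fifo *f) {
    f->read_count = 0;
    f->write_count = 0;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->changed, NULL);
}

/* Blocking write: sleeps while the buffer is full. */
void fifo_put(Fifo *f, int token) {
    pthread_mutex_lock(&f->lock);
    while (f->write_count - f->read_count == CAPACITY)
        pthread_cond_wait(&f->changed, &f->lock);
    f->data[f->write_count % CAPACITY] = token;
    f->write_count++;
    pthread_cond_broadcast(&f->changed);
    pthread_mutex_unlock(&f->lock);
}

/* Blocking read: sleeps while the buffer is empty. */
int fifo_get(Fifo *f) {
    pthread_mutex_lock(&f->lock);
    while (f->write_count == f->read_count)
        pthread_cond_wait(&f->changed, &f->lock);
    int token = f->data[f->read_count % CAPACITY];
    f->read_count++;
    pthread_cond_broadcast(&f->changed);
    pthread_mutex_unlock(&f->lock);
    return token;
}

typedef struct { Fifo *out; int n; } SourceArgs;

/* Source actor: writes the tokens 1..n into its output buffer. */
static void *source_actor(void *arg) {
    SourceArgs *a = arg;
    for (int i = 1; i <= a->n; i++)
        fifo_put(a->out, i);
    return NULL;
}

/* Runs the source in its own thread and a sink loop in the caller;
 * returns the sum of the tokens the sink received. */
long run_pipeline(int n) {
    Fifo f;
    fifo_init(&f);
    SourceArgs args = { &f, n };
    pthread_t source;
    int rc = pthread_create(&source, NULL, source_actor, &args);
    assert(rc == 0);
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += fifo_get(&f);
    pthread_join(source, NULL);
    pthread_mutex_destroy(&f.lock);
    pthread_cond_destroy(&f.changed);
    return sum;
}
```

The sketch terminates by counting tokens; detecting that all actors are stuck in a blocking read, as the slide describes, would need extra bookkeeping that is not shown here.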
23
MPI Implementation
Mapping of actors to cores is needed.
- Classic graph partitioning problem
- Nodes: actors
- Edges: messages
- Node weights: computations on each actor
- Edge weights: amount of messages communicated
- Partitions: processors
Chaco chosen as the graph partitioner.
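The graph described above could be written out for the partitioner roughly as follows. This sketch assumes our reading of Chaco's text graph format: a header line with the vertex count, the edge count, and a format code of 11 (vertex weights and edge weights present), then one line per vertex holding its weight followed by (neighbor, edge weight) pairs, with vertices numbered from 1 and every edge listed under both endpoints. The `Edge` type and `write_chaco_graph` are hypothetical names, and the format details should be checked against the Chaco user guide.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* An undirected edge between two actors (0-based indices) with a message weight. */
typedef struct { int a, b, weight; } Edge;

/* Appends formatted text at out + *used; returns -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *used, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(out + *used, cap - *used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *used) return -1;
    *used += (size_t)w;
    return 0;
}

/* Writes n actors and m edges in the assumed Chaco format.
 * Returns the number of characters written, or -1 if out is too small. */
int write_chaco_graph(char *out, size_t cap, const int *node_weight, int n,
                      const Edge *edges, int m) {
    size_t used = 0;
    if (append(out, cap, &used, "%d %d 11\n", n, m) < 0) return -1;
    for (int v = 0; v < n; v++) {
        if (append(out, cap, &used, "%d", node_weight[v]) < 0) return -1;
        for (int e = 0; e < m; e++) {
            assert(edges[e].a >= 0 && edges[e].a < n);
            assert(edges[e].b >= 0 && edges[e].b < n);
            int other;
            if (edges[e].a == v) other = edges[e].b;
            else if (edges[e].b == v) other = edges[e].a;
            else continue;
            /* Vertex numbers are 1-based in the file. */
            if (append(out, cap, &used, " %d %d", other + 1, edges[e].weight) < 0)
                return -1;
        }
        if (append(out, cap, &used, "\n") < 0) return -1;
    }
    return (int)used;
}
```

KPN channels are directed, so a caller would first collapse each channel into an undirected edge; that step is left out of the sketch.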
24
Partition Profiling
Challenge: providing the user with enough information so that node weights and edge weights can be annotated and modified to achieve load balancing.
- Solution 1: Static analysis
- Solution 2: Simulation
- Solution 3: Dynamic load balancing
- Solution 4: Profile the current run and feed the information back to the user