Published by Lanny Dharmawijaya. Modified over 6 years ago.
1
CSE4102/5102 Team Project: A Study of Parallel Programming Languages
Bineet Kumar Ghosh Yanting Chen Saumya Gupta Team B
2
HPC (High Performance Computing) and HTC (High-Throughput Computing)
Related landscape: HPC, HTC, GPU computing, grid computing, cloud computing; message passing and shared memory; Condor, MPI, OpenMP, Cilk/Cilk++. Today, Intel Cilk Plus offers a competitive alternative for parallel programming that is far easier to use than MPI. It is intended to complement OpenMP, enabling programmers to parallelize applications that may be too complex to fit into one of these frameworks.
3
Introduction of Cilk Cilk (pronounced "silk") is an extension of C/C++
Cilk programs are translated to C and linked with a runtime library. Cilk is a multithreaded parallel programming language, effective for exploiting dynamic, asynchronous parallelism (e.g., chess programs).
Cilk motivation: the compiler and runtime system are charged with the responsibility of scheduling the computation to run efficiently on the given platform. The programmer identifies elements that can safely be executed in parallel:
Nested loops → data parallelism → Cilk threads
Divide-and-conquer algorithms → task parallelism → Cilk threads
The runtime environment decides how to actually divide the work between processors, so a program can run without rewriting on any number of processors.
4
History of Cilk technology
Before we overview the development and implementation of Cilk, we first give a brief history of the Cilk technology. The first implementation of Cilk, Cilk-1, arose from three separate projects at MIT.
April: The three projects were combined and named Cilk.
March: Cilk-5 was first released; it included a provably efficient runtime scheduler like its predecessor, plus a source-to-source compiler.
September: MIT spun out the Cilk technology to Cilk Arts, Inc., founded by technical leader Charles E. Leiserson. Cilk Arts developed an entirely new codebase for a C++ product named Cilk++.
July: Cilk Arts was sold to Intel Corporation, which continued developing the technology.
September: Intel released its ICC compiler with Intel Cilk Plus. The product included Cilk support for C and C++, and the runtime system provided transparent integration with legacy binary executables.
Timeline: Cilk-1 → Cilk-5 → Cilk++ → Intel Cilk Plus
5
Important Application
Parallel chess programs: Cilkchess, *Socrates; parallel gomoku.
The Cilkchess parallel chess program: computer chess provides a good testbed for understanding dynamic multithreaded computations. Cilk-5 incorporates inlets and aborts, features that turned out to be necessary for cleanly expressing the speculative parallelism of a chess program.
6
Performance of Applications
Performance data:
Tserial = time for the serial C program
T1 = time for the Cilk program on 1 processor
Tserial / T1 = efficiency of the Cilk program. Efficiency is close to 1 for programs with long threads, because the Cilk overhead is small. On a single processor, the Cilk program is somewhat slower than the equivalent C program.
T1 / TP = speedup on P processors
7
Paradigm of Cilk/Cilk++
Parallel programming is really hard, primarily for reasons such as task/data partitioning, synchronization, and communication. The main motivation behind the development of Cilk is that the programmer should focus on structuring the program to expose parallelism, leaving Cilk's runtime system with the responsibility of scheduling the computation to run efficiently on a given platform. Cilk is based on C and C++: a general-purpose programming language designed for multithreaded parallel computing, with constructs extended to parallel loops and the fork-join idiom.
8
Spawn and Sync: Cilk's main addition to C and C++
The spawn keyword, when preceding a function call (spawn f(x)), indicates that the call (f(x)) can safely run in parallel with the statements following it in the calling function. Note that the scheduler is not obligated to run this procedure in parallel; the keyword merely alerts the scheduler that it may do so. A sync statement indicates that execution of the current function cannot proceed until all previously spawned function calls have completed:

    x = spawn fib(n-1);
    y = spawn fib(n-2);
    sync;
    return (x+y);

Here the fib(n-1) and fib(n-2) threads run in parallel with the main thread, and sync waits until both spawned calls have completed before the sum is returned.
9
Inlets: child procedures need to return values to the parent. Inlets guarantee that those return values are incorporated into the parent's frame atomically, so no lock is required to avoid data races.

    cilk int fib (int n) {
        int rst = 0;
        inlet void summer (int result) {
            rst += result;
            return;
        }
        if (n < 2) return n;
        else {
            summer(spawn fib(n-1));
            summer(spawn fib(n-2));
            sync;
            return (rst);
        }
    }
10
Abort and Parallel Loop
The abort keyword can only be used inside an inlet; it tells the scheduler that any other procedures spawned off by the parent procedure can safely be aborted:

    inlet void fun() {
        if (some condition)
            abort;   /* aborts the sibling threads spawned by the parent */
    }

Cilk++ added an additional construct, the parallel loop, denoted cilk_for in Cilk Plus. This implements the parallel-map idiom. These loops look like:

    void loop(int *a, int n) {
        cilk_for (int i = 0; i < n; i++) {
            a[i] = f(a[i]);
        }
    }
11
Cilk Scheduler: the Cilk scheduler maps Cilk threads onto processors dynamically at run time. A Cilk thread is ready if all its predecessors have executed. The scheduler uses a policy called "work-stealing" to divide procedure execution efficiently among multiple processors: each processor (say P1) uses its own task deque as a stack, pushing and popping at the top, whereas other processors (P2, P3, P4) whose individual tasks are finished treat that deque as a queue, stealing work from the front.
12
Cilk Releases and Versions
Cilk-1: C-based package for continuation-passing-style threads on Thinking Machines Corporation's Connection Machine Model CM-5 supercomputer. Uses work-stealing as a general scheduling policy to improve the load balance and locality of the computation. Adaptively parallel and fault-tolerant network-of-workstations implementation.
Cilk-2: Includes full type checking and supports all of ANSI C in its C-language subset. Offers call-return semantics for writing multithreaded procedures. More portable runtime; the base release includes support for several architectures other than the CM-5.
Cilk-3: Implementation of dag-consistent distributed shared memory, which lets Cilk solve a much wider class of applications. Dag-consistency is a useful consistency model, and its relaxed semantics allows an efficient, low-overhead software implementation.
Cilk-4: Change of primary development platform from the CM-5 to the Sun Microsystems SPARC SMP. The compiler and runtime system were reimplemented, eliminating continuation passing as the basis of the scheduler and instead embedding scheduling decisions directly into the compiled code. Overheads were scaled down: the cost to spawn a parallel thread in Cilk-4 was typically less than 3 times the cost of an ordinary C procedure call.
13
Current Cilk Release Cilk-5
The runtime system was rewritten to be more flexible and portable than in Cilk-4.
Cilk 5.0: Uses operating-system threads as well as processes to implement the individual Cilk "workers" that schedule Cilk threads.
Cilk 5.2: Includes a debugging tool called the Nondeterminator, which can help Cilk programmers localize data-race bugs in their code. (However, the Nondeterminator is not included in the present Cilk-5.3 release.)
Cilk 5.3: Current Cilk release. Can be used by programmers who are not necessarily parallel-processing experts. Integrated with the gcc compiler, and the runtime system runs identically on all platforms. Many bugs in the Cilk compiler were fixed, and the runtime system was simplified.
14
Cilk: Supported Platforms
The latest official MIT Cilk release is distributed under the GNU General Public License. It is intended to run on Unix-like systems that support POSIX threads. Cilk runs on GNU systems on top of the Linux kernel and integrates nicely with the Linux development tools. It works on GNU/Linux (IA32, AMD64, PowerPC, probably IA64), Mac OS X (Intel, probably PowerPC as well), and MS Windows (under Cygwin), wherever gcc, POSIX threads, and the GNU toolchain are available.
15
Cilk Compiling Flow
Source-to-source compiler (cilk2c): Fib.cilk → Fib.c. C compiler (gcc): Fib.c → Fib.o. Linking loader (ld): Fib.o + Cilk runtime system → Fib.
cilkc [options] filename.cilk: produces the executable. For example: $ cilkc -O2 fib.cilk -o fib
./filename --nproc THRDNUM <arguments>: runs the program. For example, fib --nproc 4 30 starts fib on 4 processors to compute the 30th Fibonacci number.
-cilk-profile and -cilk-span: used to collect performance data of a Cilk program. For example: $ cilkc -cilk-profile -cilk-span -O2 fib.cilk -o fib
16
Cilk:Storage Allocation
Cilk supports both stacks and heaps. It uses a cactus stack for stack-allocated storage, which behaves much like an ordinary stack and supports C's rules for pointers: ptr = Cilk_alloca(size). The heap works exactly the same as in C: malloc(size) and free(). Cilk supports shared memory and global variables, like C, and provides mutual-exclusion locks that allow the programmer to create atomic regions of code.
17
Introduction of ParaSail
ParaSail (Parallel Specification and Implementation Language) is a completely new language, but it borrows liberally from other programming languages, including the ML series (SML, CAML, OCaml, etc.), the Algol/Pascal family (Algol, Pascal, Ada, Modula, Eiffel, Oberon, etc.), the C family (C, C++, Java, C#), and the region-based languages (especially Cyclone). ParaSail is intended to avoid "fine-granule" garbage collection in favor of stack- and region-based storage management. To a programmer, ParaSail looks like a modified form of Java or C#, two leading languages. The difference is that it automatically splits a program into thousands of smaller tasks that can then be spread across cores, maximizing the number of tasks carried out in parallel regardless of the number of cores.
18
Biographies of the Founder
Tucker Taft: Mr. Taft graduated from Harvard College with a bachelor's in Chemistry. He joined Intermetrics, where he participated in the development of the Ada Integrated Environment for the Air Force. Beginning in 1990, Mr. Taft led the Ada 9X language design team, culminating in the February 1995 approval of Ada 95 as the first ISO-standardized object-oriented programming language. Since then he has been a member of the ISO Rapporteur Group that developed Ada 2005 and is finalizing Ada 2012. Most recently, Tucker has been designing and implementing the parallel programming language ParaSail.
19
Why Design A New Language for Safety-Critical Systems?
80+% of safety-critical systems are developed in C and C++, two of the least safe languages invented in the last 40 years.
Every 40 years you should start from scratch.
Computers have stopped getting faster; by 2020, most chips will have many cores.
Advanced static analysis has come of age: it is time to get the benefit at compile time.
20
How does ParaSail Compare to…
C/C++: built-in safety; built-in parallelism; much simpler.
Ada: eliminates race conditions, increases parallelism, eliminates run-time checks, simplifies the language.
Java: eliminates race conditions, increases parallelism, avoids garbage collection, does all checking at compile time, no run-time exceptions.
Cilk: the ParaSail run-time system uses the work-stealing model (similar to what is used by Cilk). Example: setting up the parallel algorithms. The Cilk compiler does this at points where you write "spawn"; the ParaSail compiler does this for you automatically.
21
Paradigm of ParaSail: Parallel Specification and Implementation Language (ParaSail) is an object-oriented parallel programming language. ParaSail uses a pointer-free programming model, where objects can grow and shrink and value semantics are used for assignment. It has no global garbage-collected heap; instead, region-based memory management is used throughout. Any possible race conditions are detected at compile time. Both an interpreter using the ParaSail virtual machine and an LLVM-based ParaSail compiler are available. Work stealing is used for scheduling ParaSail's lightweight threads.
22
ParaSail Model: ParaSail has five basic concepts: Module, Type, Object, Operation, and Container.
Module: has an interface, and classes that implement it: interface M <Formal is Int<>> is ... Supports inheritance of interface and code.
Type: an instance of a module: type T is M <Actual>; "T+" is a polymorphic type for types inheriting from T's interface.
Object: an instance of a type: var Obj : T := T::Create(...); Objects can be declared resizable (mutable).
Operation: defined in a module, and operates on one or more objects of specified types.
Container: containers are used instead of pointers; generalized indexing into containers replaces pointer dereferences: region[index] is equivalent to *index.
23
ParaSail Virtual Machine
The ParaSail Virtual Machine (PSVM) is designed for prototype implementations of ParaSail. The PSVM supports "pico" threading with parallel-block, parallel-call, and parallel-wait instructions. Heavier-weight "server" threads serve a queue of lightweight picothreads, each of which represents a sequence of PSVM instructions (a parallel block) or a single parallel call, similar to Intel's Cilk run-time model with work stealing. While waiting to be served, a picothread needs only a handful of words of memory, so a single ParaSail program can easily involve thousands of picothreads. The PSVM is instrumented to show the degree of parallelism achieved.
24
Why and How do we Formalize
Assertions help catch bugs sooner rather than later, and parallelism makes bugs much more expensive to find and fix. ParaSail therefore integrates assertions (annotations) into the syntax everywhere, as pre/postconditions, invariants, etc. The compiler disallows potential race conditions, checks the assertions, and rejects the program if it can't prove them. No run-time checking implies better performance, and there are no run-time exceptions to worry about.

    foo() {
        x = fun(8);
        assert(x > 7);   // Precondition
        x = x + fun(9);
        assert(x < 7);   // Postcondition
    }
25
What makes ParaSail interesting
Pervasive (implicit and explicit) parallelism, supported by the ParaSail Virtual Machine.
Inherently safe: preconditions, postconditions, constraints, etc., integrated throughout the syntax; no global variables; no dangling references; value semantics; no run-time checks or exceptions (all checking at compile time); storage management based on optional and extensible objects.
A small number of flexible concepts: Modules, Types, Objects, Operations, Containers.
User-defined literals, indexing, aggregates, and physical-units checking.
26
ParaSail Releases and Compilers
The initial release of ParaSail includes an interpreter built on the ParaSail Virtual Machine (PSVM). With ParaSail 5.1, the parallel constructs of ParaSail were adapted to other syntaxes, producing Java-like, Python-like, and Ada-like parallel languages, dubbed, respectively, Javallel, Parython, and Sparkel; compilers and interpreters for these languages are included with the ParaSail implementation. With ParaSail 6.0, an LLVM-based ParaSail compiler became available. In the current release, ParaSail 7.0, the LLVM compiler works with the PSVM front end.
27
ParaSail Supported Platforms
ParaSail is supported on Linux, Mac, and Windows. The current release of ParaSail includes executable binaries for Mac and Linux. For Windows, executable binaries are not yet provided; the user has to build them from the source release.
28
ParaSail:Storage Allocation
Eliminates global variables: operations may only access variables passed as parameters.
Eliminates the global heap: no explicit allocate/free of storage and no garbage collector; replaced by region-based storage management (local heaps), so all objects conceptually live in a local stack frame.
Eliminates pointers: adopts the notion of "optional" objects that can grow and shrink.
Eliminates explicit threads, lock/unlock, and signal/wait: concurrent objects are synchronized automatically.
29
Implementation Plans: Parallel Memory Clearing. N = M^k shared memory cells, cleared using Groote's algorithm with P = N^k processes. (Figure: measurements over the input and work areas for varying thread counts.)
30
Thank You