WSE 187: INTRODUCTION TO PARALLEL PROGRAMMING*
Lecture 2
Jesmin Jahan Tithi
*Prepared with the help of free online resources.
LOGIN TO SSH
Steps (Windows):
1. Connect to the host.
2. Enter the provided password when prompted.
3. First-time users only:
   a. Accept the security keys.
   b. Change your password: enter the old password, then type the new password, then repeat the new password.
   Do not be alarmed if no characters appear on screen while you type a password; this is normal, so type carefully.
4. Save the host information in SSH when prompted.
Mac users: use Terminal to connect to the server directly. The other steps are the same. You may try FileZilla to transfer files from the Mac to the server.
HELLO PARALLEL WORLD! Intel Cilk Plus
CILK PLUS
Intel® Cilk™ Plus is an add-on to C and C++, implemented by the Intel® C++ Compiler. It adds three keywords to C and C++: cilk_spawn, cilk_sync, and cilk_for.
cilk_spawn: specifies that a function call may execute asynchronously, without requiring the caller to wait for it to return. This expresses an opportunity for parallelism; it does not mandate parallelism. The Intel Cilk Plus runtime chooses whether to run the function in parallel with its caller.
cilk_sync: specifies that all calls spawned so far in the current function must complete before execution continues. There is an implied cilk_sync at the end of every function that contains a cilk_spawn.
cilk_for: allows iterations of the loop body to be executed in parallel.
The cilk_spawn and cilk_for keywords express opportunities for parallelism.
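CILK_SPAWN
    #include <cstdio>
    #include <cilk/cilk.h>

    #define DELAY 100000000   // placeholder: the original delay-loop bound is missing from the slide

    static void hello(){
        for (int i = 0; i < DELAY; i++) printf("");   // busy-wait delay
        printf("Hello ");
    }
    static void world(){
        for (int i = 0; i < DELAY; i++) printf("");   // busy-wait delay
        printf("world! ");
    }
    int main(){
        cilk_spawn hello();
        cilk_spawn world();
        //cilk_sync;   // uncomment to make main wait for hello() and world() before printing "Done!"
        printf("Done! ");
        return 0;
    }
Compile: icc -O3 -o hello Hello_parallel_world.cpp
Run: ./hello
Run with 4 workers: CILK_NWORKERS=4 ./hello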
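CILK_SPAWN EXERCISE
Order of placement: Wheels, Chassis, Engine, Frame, Steering wheel
    #include <cstdio>
    #include <cilk/cilk.h>

    #define DELAY 100000000   // placeholder: the original delay-loop bound is missing from the slide

    void make(const char* str){
        for (int i = 0; i < DELAY; i++) printf("");   // busy-wait delay
        printf("%s has/have been created.\n", str);
    }
    void place(const char* str){
        for (int i = 0; i < DELAY; i++) printf("");   // busy-wait delay
        printf("%s has/have been placed.\n", str);
    }
    int main(){
        // Place your code here
    }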
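CILK_FOR
    #include <iostream>
    #include <cilk/cilk.h>
    #include "cilktime.h"   // provides cilk_getticks() and cilk_ticks_to_seconds()
    using namespace std;

    #define n 100000   // placeholder: the original array size is missing from the slide

    int main(){
        int A[n];   // first input vector
        int B[n];   // second input vector
        int C[n];   // sum vector
        // Note: stack arrays this size or larger may overflow the stack; heap storage is safer.

        // Initialize the vectors with input (valid indices are 0 .. n-1, hence i < n).
        cilk_for (int i = 0; i < n; i++){
            A[i] = i;
            B[i] = i + 1;
        }
        // Compute the sum.
        unsigned long long tstart = cilk_getticks();   // beginning time stamp
        cilk_for (int i = 0; i < n; i++){
            C[i] = A[i] + B[i];
        }
        unsigned long long tend = cilk_getticks();     // end time stamp
        cout << "Time to run: " << cilk_ticks_to_seconds(tend - tstart) << endl;

        // Check the sum to verify.
        int pos;
        cout << "Enter position of element to inspect" << endl;
        cin >> pos;
        if (pos >= 0 && pos < n) cout << C[pos] << endl;
        return 0;
    }
Spawning one call per iteration also works:
    for (int i = 0; i < 8; ++i) {
        cilk_spawn do_work(i);
    }
    cilk_sync;
A better approach is to use a cilk_for loop, which divides the iterations among workers recursively instead of spawning them one at a time from a single strand:
    cilk_for (int i = 0; i < 8; ++i) {
        do_work(i);
    }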
CILK_FOR
    #include <cstdio>
    #include <cilk/cilk.h>

    #define N 1000000   // placeholder: the original loop bound is missing from the slide

    int main(){
        long int sum = 0;
        cilk_for (int i = 0; i <= N; i++)
            sum += i;   // wrong! race condition
        printf("%ld\n", sum);
        return 0;
    }
Race condition: sum += i reads, adds, and writes a shared variable. Two workers can read the same old value, and one of their updates is lost, so the printed sum can be wrong and can change from run to run.
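CILK_FOR
    #include <cstdio>
    #include <pthread.h>    // pthread library
    #include <cilk/cilk.h>

    #define N 1000000   // placeholder: the original loop bound is missing from the slide

    int main(){
        long int sum = 0;
        pthread_mutex_t m;               // define the lock
        pthread_mutex_init(&m, NULL);    // initialize the lock
        cilk_for (int i = 0; i <= N; i++){
            pthread_mutex_lock(&m);      // lock: prevents other workers from running this code
            sum += i;
            pthread_mutex_unlock(&m);    // unlock: allows other workers to access this code
        }
        pthread_mutex_destroy(&m);       // release the lock when it is no longer needed
        printf("%ld\n", sum);
        return 0;
    }
There are several ways of dealing with race conditions. First option: use locks! The result is now correct, but every iteration must wait for the lock, so the additions run one at a time. We will learn more later.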
CHANGING THE NUMBER OF CORES/THREADS
Run with:
    CILK_NWORKERS=4 ./executable
Or change it inside the main program, before any cilk_spawn or cilk_for runs:
    #include <cilk/cilk_api.h>   // declares __cilkrts_set_param and __cilkrts_get_nworkers

    if (0 != __cilkrts_set_param("nworkers", "16")) {
        cout << "Failed to set worker count" << endl;
        return 1;
    }
Check to verify:
    int num_threads = __cilkrts_get_nworkers();
    cout << num_threads << endl;