1
CS 284a Lecture
Wednesday, 5 November 1997
Copyright (c) 1997-98, John Thornley
2
Timing a Multithreaded for Loop

What is the overhead of thread creation/termination?

Time the following null loop:

    #pragma multithreadable chunk_size(1) mapping(simple)
    for (i = 0; i < n; i++)
        ;

Time the loop on one processor.
Time the loop for different values of n.
3
Tempting... but this won't work

    #include <...>
    double elapsed;

    set_processor_usage(1);
    start_timing();
    #pragma multithreadable chunk_size(1) mapping(simple)
    for (i = 0; i < n; i++)
        ;
    finish_timing(&elapsed);
    printf("Multithreaded loop took %f seconds\n", elapsed);

The granularity of the clock is milliseconds, and the loop may run in only a small number of milliseconds, so a single measurement is dominated by clock resolution.
4
Need to time many iterations

    #include <...>
    double elapsed;

    set_processor_usage(1);
    start_timing();
    for (k = 0; k < num_iterations; k++)
        #pragma multithreadable chunk_size(1) mapping(simple)
        for (i = 0; i < n; i++)
            ;
    finish_timing(&elapsed);
    printf("Multithreaded loop took %f seconds\n", elapsed / num_iterations);

Experiment to find a large enough num_iterations.
In general, the time for the k loop itself should be subtracted out.
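The "experiment to find a large enough num_iterations" step can be sketched in portable C. This is only an illustration under assumptions: it uses the standard `clock()` function in place of the course's `start_timing()`/`finish_timing()`, and the names `total_seconds`, `calibrate_reps`, `mean_seconds`, and `null_body` are hypothetical helpers, not part of the lecture's library.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: total CPU seconds for reps calls of body(). */
double total_seconds(void (*body)(void), long reps)
{
    clock_t start = clock();
    for (long k = 0; k < reps; k++)
        body();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Double reps until one whole run lasts at least min_seconds,
   or until a safety cap is reached. */
long calibrate_reps(void (*body)(void), double min_seconds)
{
    const long max_reps = 1L << 30;   /* assumed cap, avoids overflow */
    long reps = 1;

    assert(min_seconds > 0.0);
    while (reps < max_reps && total_seconds(body, reps) < min_seconds)
        reps *= 2;
    return reps;
}

/* Mean seconds per call, amortized over reps calls. */
double mean_seconds(void (*body)(void), long reps)
{
    assert(reps > 0);
    return total_seconds(body, reps) / (double)reps;
}

/* Stand-in for the code being timed; the volatile store keeps the
   compiler from deleting the call loop entirely. */
volatile int sink;
void null_body(void) { sink = 0; }
```

A call such as `calibrate_reps(null_body, 0.01)` returns a repetition count whose total run time is well above the clock granularity; `mean_seconds` then gives the per-call figure. Timing an empty body the same way gives an estimate of the k-loop cost to subtract.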
5
Timing a Barrier

What is the overhead of threads passing a barrier?

Time the barrier implementation of the following null loops:

    #pragma multithreadable chunk_size(1) mapping(simple)
    for (i = 0; i < n; i++)
        ;
    #pragma multithreadable chunk_size(1) mapping(simple)
    for (i = 0; i < n; i++)
        ;

Subtract out the time for one multithreaded null loop.
Time the loops on one processor.
Time the loops for different values of n.
6
All-Pairs Shortest-Paths Problem

Fully-connected graph with n vertices.

Given: for all i, j: edge[i][j] = length of edge from vertex i to vertex j.
Want: for all i, j: path[i][j] = length of shortest path from vertex i to vertex j.

    void seq_shortest_paths(
        int n,
        const unsigned int edge[N][N],
        unsigned int path[N][N]);
7
All-Pairs Shortest-Paths Example

[Figure: example graph on vertices 0, 1, 2, shown with its 3 x 3 edge and path matrices (rows and columns labeled 0, 1, 2). Entries as extracted, layout not recoverable: edge 811 602 357; path 411 502 346.]
8
All-Pairs Shortest-Paths Algorithm

    void seq_shortest_paths(
        int n,
        const unsigned int edge[N][N],
        unsigned int path[N][N])
    {
        int i, j, k;
        unsigned int new_path;

        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                path[i][j] = edge[i][j];

        for (k = 0; k < n; k++)
            for (i = 0; i < n; i++)
                for (j = 0; j < n; j++) {
                    new_path = path[i][k] + path[k][j];
                    if (new_path < path[i][j])
                        path[i][j] = new_path;
                }
    }

Invariant: After outer-loop iteration k, for all i, j: path[i][j] = length of the shortest path from vertex i to vertex j with intermediate vertices in the set { 0, 1, ..., k }.
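To see the algorithm run on concrete data, here is a minimal sketch on a small graph. The 3-vertex edge matrix `example_edge` and the choice `N = 3` are hypothetical values picked for illustration; they are not the lecture's own example.

```c
#include <assert.h>

#define N 3   /* array capacity; an assumption for this example */

/* Sequential all-pairs shortest paths, following the lecture's loop nest. */
void seq_shortest_paths(int n,
                        const unsigned int edge[N][N],
                        unsigned int path[N][N])
{
    int i, j, k;
    unsigned int new_path;

    assert(n >= 0 && n <= N);

    /* Start from the direct edge lengths. */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            path[i][j] = edge[i][j];

    /* Allow vertex k as an intermediate vertex, one k at a time. */
    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                new_path = path[i][k] + path[k][j];
                if (new_path < path[i][j])
                    path[i][j] = new_path;
            }
}

/* Hypothetical input: edge[i][j] is the length of the edge i -> j. */
const unsigned int example_edge[N][N] = {
    {0, 8, 1},
    {6, 0, 2},
    {3, 5, 0},
};
```

With this input, `seq_shortest_paths(3, example_edge, path)` shortens two entries: `path[0][1]` becomes 6 (going 0 -> 2 -> 1 instead of the direct edge of length 8) and `path[1][0]` becomes 5 (going 1 -> 2 -> 0 instead of 6). All other entries keep their edge values.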
9
Multithreaded All-Pairs Shortest Paths
Is This OK?

    void multi_shortest_paths(
        int n,
        const unsigned int edge[N][N],
        unsigned int path[N][N],
        int t)
    {
        int i, j, k;
        unsigned int new_path;

        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                path[i][j] = edge[i][j];

        #pragma multithreadable chunk_size(1) mapping(blocked(t))
        for (k = 0; k < n; k++)
            for (i = 0; i < n; i++)
                for (j = 0; j < n; j++) {
                    new_path = path[i][k] + path[k][j];
                    if (new_path < path[i][j])
                        path[i][j] = new_path;
                }
    }

No! Erroneous sharing of variables!
Different k iterations write to the same path[i][j] variables.
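One way to see why the k iterations cannot simply run at the same time is to model a single bad schedule sequentially. The sketch below is an illustration under assumptions, not a real multithreaded run: `stale_k_shortest_paths` is a hypothetical function in which every k iteration reads only the initial edge values, as if all threads loaded their operands before any other thread's store became visible. The chain graph `chain_edge` is also made up for the example.

```c
#include <assert.h>

#define N 4   /* array capacity; an assumption for this example */

/* Sequential reference: the in-place loop nest from the lecture. */
void seq_shortest_paths(int n,
                        const unsigned int edge[N][N],
                        unsigned int path[N][N])
{
    int i, j, k;
    unsigned int new_path;

    assert(n >= 0 && n <= N);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            path[i][j] = edge[i][j];
    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                new_path = path[i][k] + path[k][j];
                if (new_path < path[i][j])
                    path[i][j] = new_path;
            }
}

/* Hypothetical model of one bad interleaving: each k iteration sees
   only the original edge lengths, never another iteration's update. */
void stale_k_shortest_paths(int n,
                            const unsigned int edge[N][N],
                            unsigned int path[N][N])
{
    int i, j, k;
    unsigned int new_path;

    assert(n >= 0 && n <= N);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            path[i][j] = edge[i][j];
    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                new_path = edge[i][k] + edge[k][j];   /* stale operands */
                if (new_path < path[i][j])
                    path[i][j] = new_path;
            }
}

/* Hypothetical chain 0 -> 1 -> 2 -> 3 with cheap links (length 1);
   every other edge has length 100. */
const unsigned int chain_edge[N][N] = {
    {0,   1,   100, 100},
    {100, 0,   1,   100},
    {100, 100, 0,   1},
    {100, 100, 100, 0},
};
```

On this input the sequential version finds `path[0][3] == 3` (via vertices 1 and 2), while the stale model leaves `path[0][3] == 100`, because a path with two intermediate vertices needs iteration k = 2 to see the result of iteration k = 1.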
10
Multithreaded All-Pairs Shortest Paths
Is This OK?

    void multi_shortest_paths(
        int n,
        const unsigned int edge[N][N],
        unsigned int path[N][N],
        int t)
    {
        int i, j, k;
        unsigned int new_path;

        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                path[i][j] = edge[i][j];

        for (k = 0; k < n; k++)
            #pragma multithreadable chunk_size(1) mapping(blocked(t))
            for (i = 0; i < n; i++)
                for (j = 0; j < n; j++) {
                    new_path = path[i][k] + path[k][j];
                    if (new_path < path[i][j])
                        path[i][j] = new_path;
                }
    }

No! Erroneous sharing of variables!
Different i iterations read and write the same path variables: iteration i = k writes path[k][j], which every other i iteration reads.
11
Multithreaded All-Pairs Shortest-Paths
This is OK (But Inefficient)

    void multi_shortest_paths(
        int n,
        const unsigned int edge[N][N],
        unsigned int path[N][N],
        int t)
    {
        int i, j, k;
        unsigned int old_path[N][N];
        unsigned int new_path;

        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                path[i][j] = edge[i][j];

        for (k = 0; k < n; k++) {
            copy(n, old_path, path);  /* old_path = path; */
            #pragma multithreadable chunk_size(1) mapping(blocked(t))
            for (i = 0; i < n; i++)
                for (j = 0; j < n; j++) {
                    new_path = old_path[i][k] + old_path[k][j];
                    if (new_path < old_path[i][j])
                        path[i][j] = new_path;
                }
        }
    }
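The slide calls a `copy` helper without showing it. A minimal sketch, assuming the semantics suggested by the slide's comment (`old_path = path` for the first n rows and columns), could look like this; the parameter names and the value of `N` are assumptions.

```c
#include <assert.h>

#define N 4   /* array capacity; an assumption for this example */

/* Assumed semantics: dst[i][j] = src[i][j] for all 0 <= i, j < n.
   Elements outside the n-by-n corner are left untouched. */
void copy(int n, unsigned int dst[N][N], unsigned int src[N][N])
{
    int i, j;

    assert(n >= 0 && n <= N);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            dst[i][j] = src[i][j];
}
```

Seen this way, the cost of the approach is visible in the code: each k iteration performs an extra n * n element copy, and that copy sits outside the multithreaded loop.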
12
Multithreaded All-Pairs Shortest-Paths
Outline of Efficient Solution

Two arrays, path and temp, as in the inefficient solution.
Swap pointers on each k iteration, instead of copying arrays.

Should we create/destroy a new set of threads on each k iteration, or use a barrier implementation?
Time for n = 1000 is 45 sec, i.e. 45 msec per k iteration (45 sec / 1000 iterations).
A barrier implementation is necessary if the multithreading overhead is a significant fraction of 45 msec.
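The pointer-swapping idea can be sketched sequentially as follows. This is one possible reading of the outline, not the lecture's own code: the function name `swap_shortest_paths`, the `cur`/`next` pointers, and the value of `N` are assumptions, and the multithreading pragma is shown only as a comment. Note one consequence of swapping: every element of the destination array has to be written on every k iteration, because it otherwise still holds values from two iterations earlier.

```c
#include <assert.h>

#define N 4   /* array capacity; an assumption for this example */

void swap_shortest_paths(int n,
                         const unsigned int edge[N][N],
                         unsigned int path[N][N])
{
    unsigned int temp[N][N];
    unsigned int (*cur)[N] = path;    /* read during iteration k     */
    unsigned int (*next)[N] = temp;   /* written during iteration k  */
    unsigned int (*swap)[N];
    int i, j, k;

    assert(n >= 0 && n <= N);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            path[i][j] = edge[i][j];

    for (k = 0; k < n; k++) {
        /* The i loop is where the multithreadable pragma would go:
           iterations read only cur and write disjoint rows of next. */
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++) {
                unsigned int via_k = cur[i][k] + cur[k][j];
                next[i][j] = (via_k < cur[i][j]) ? via_k : cur[i][j];
            }
        swap = cur;                   /* swap roles, no array copy   */
        cur = next;
        next = swap;
    }

    /* After an odd number of iterations the result lives in temp. */
    if (cur != path)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                path[i][j] = cur[i][j];
}
```

The single copy-back at the end replaces the n copies of the previous version. Whether the threads are recreated on each k iteration or kept alive across a barrier is a separate choice that this sketch does not address.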