Published by Ειδοθεα Μπλέτσας. Modified over 6 years ago.
1
Measuring Program Performance: Matrix Multiply
CSCE 513 Computer Architecture
Topics: Linux times; Matrix multiplication
Readings:
November 20, 2017
2
Times in Unix
File times: ls -l shows a file's modification date (stored as the number of seconds since Jan 1, 1970)
Process times:
    struct timeval {
        long tv_sec;    /* seconds */
        long tv_usec;   /* microseconds */
    };
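To make the "seconds since Jan 1, 1970" point concrete, here is a minimal sketch that reads a file's modification time with the POSIX stat() call. The helper names file_mtime_seconds and print_mtime are hypothetical and do not come from the slides.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: modification time of `path` in seconds since
 * the Epoch (Jan 1, 1970), or -1 if stat() fails. */
long file_mtime_seconds(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (long)sb.st_mtime;
}

/* Hypothetical helper: print the raw count and the calendar form
 * that ls -l would derive from it. Returns 0 on success, -1 on failure. */
int print_mtime(const char *path)
{
    long secs = file_mtime_seconds(path);
    time_t t;

    if (secs < 0) {
        perror(path);
        return -1;
    }
    t = (time_t)secs;
    /* ctime() already ends its string with a newline. */
    printf("%s: %ld seconds since the Epoch = %s", path, secs, ctime(&t));
    return 0;
}
```

Calling print_mtime("pthread1.c") would print both the raw second count and the human-readable date for that file.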
3
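The time command
cocsce-l1d39-11> time gcc pthread1.c -l pthread -o pthread1
real    0m0.077s
user    0m0.052s
sys     0m0.012s
cocsce-l1d39-11>
Note: real is wall-clock time. For a single-threaded process, real time >= user time + system time. A multithreaded process running on several cores can accumulate more CPU time than wall-clock time.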
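The gap between real time and CPU time can also be observed from inside a program. The sketch below, which assumes a POSIX system providing clock_gettime() and nanosleep(), sleeps for a given number of milliseconds and reports both clocks; the type elapsed_t and the function sleep_and_measure are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

typedef struct {
    double wall;  /* elapsed wall-clock ("real") seconds */
    double cpu;   /* CPU seconds consumed by this process */
} elapsed_t;

/* Difference end - start in seconds. */
static double timespec_diff(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e9;
}

/* Sleep for `millis` milliseconds and measure both clocks around it.
 * While the process sleeps it uses almost no CPU, so cpu stays far
 * below wall. */
elapsed_t sleep_and_measure(long millis)
{
    struct timespec w0, w1, c0, c1, req;
    elapsed_t e;

    req.tv_sec = millis / 1000;
    req.tv_nsec = (millis % 1000) * 1000000L;

    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
    nanosleep(&req, NULL);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);

    e.wall = timespec_diff(w0, w1);
    e.cpu = timespec_diff(c0, c1);
    return e;
}
```

A sleeping process is the extreme case: real time keeps advancing while user and system time barely move, which is why real can be much larger than user + sys.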
4
cocsce-l1d39-11> gcc pthread1.c -l pthread -o pthread1
cocsce-l1d39-11> ./pthread1
In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
Hello World! It's me, thread 0!
In main: creating thread 3
Hello World! It's me, thread 1!
Hello World! It's me, thread 2!
In main: creating thread 4
Hello World! It's me, thread 3!
Hello World! It's me, thread 4!
5
TIME(7)                Linux Programmer's Manual                TIME(7)
NAME
    time - overview of time and timers
DESCRIPTION
    Real time and process time
    Real time is defined as time measured from some fixed point, either from a standard point in the past (see the description of the Epoch and calendar time below), or from some point (e.g., the start) in the life of a process (elapsed time).
    Process time is defined as the amount of CPU time used by a process. This is sometimes divided into user and system components. User CPU time is the time spent executing code in user mode. System CPU time is the time spent by the kernel executing in system mode on behalf of the process (e.g., executing system calls). The time(1) command can be used to determine the amount of CPU time consumed during the execution of a program. A program can determine the amount of CPU time it has consumed using times(2), getrusage(2), or clock(3).
    The hardware clock …
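Of the three calls the manual page names, the rest of the deck uses getrusage(2). For comparison, here is a small sketch of the times(2) route, assuming a POSIX system; the function name cpu_seconds_times is hypothetical.

```c
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: user + system CPU time of the calling process
 * in seconds, obtained with times(2). Returns -1.0 on failure. */
double cpu_seconds_times(void)
{
    struct tms t;
    long ticks_per_second = sysconf(_SC_CLK_TCK);

    if (ticks_per_second <= 0 || times(&t) == (clock_t)-1)
        return -1.0;

    /* times() reports clock ticks; divide by the tick rate to get seconds. */
    return (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_second;
}
```

The tick rate is commonly 100 per second, so this call is coarser than the microsecond fields that getrusage() exposes.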
6
Getrusage
    struct rusage {
        struct timeval ru_utime;  /* user CPU time used */
        struct timeval ru_stime;  /* system CPU time used */
        long ru_maxrss;           /* maximum resident set size */
        long ru_ixrss;            /* integral shared memory size */
        long ru_idrss;            /* integral unshared data size */
        long ru_isrss;            /* integral unshared stack size */
        long ru_minflt;           /* page reclaims (soft page faults) */
        long ru_majflt;           /* page faults (hard page faults) */
        long ru_nswap;            /* swaps */
        long ru_inblock;          /* block input operations */
        long ru_oublock;          /* block output operations */
        long ru_msgsnd;           /* IPC messages sent */
        long ru_msgrcv;           /* IPC messages received */
        long ru_nsignals;         /* signals received */
        long ru_nvcsw;            /* voluntary context switches */
        long ru_nivcsw;           /* involuntary context switches */
    };
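As an illustration of reading these fields, the following sketch calls getrusage() for the current process and prints a few of them. The function name print_usage_summary is hypothetical. Note that on Linux several fields (for example ru_ixrss, ru_nswap, ru_msgsnd) are not maintained and read as zero, and ru_maxrss is reported in kilobytes.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print a short resource summary for the
 * calling process. Returns 0 on success, -1 if getrusage() fails. */
int print_usage_summary(FILE *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    fprintf(out, "user CPU:    %ld.%06ld s\n",
            (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    fprintf(out, "system CPU:  %ld.%06ld s\n",
            (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    fprintf(out, "max RSS:     %ld\n", ru.ru_maxrss);
    fprintf(out, "soft faults: %ld\n", ru.ru_minflt);
    fprintf(out, "hard faults: %ld\n", ru.ru_majflt);
    fprintf(out, "ctx switches (voluntary/involuntary): %ld/%ld\n",
            ru.ru_nvcsw, ru.ru_nivcsw);
    return 0;
}
```

Calling print_usage_summary(stdout) at the end of a run gives a quick view of page faults and context switches alongside the CPU times.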
7
struct timeval
    struct timeval {
        long tv_sec;    /* seconds */
        long tv_usec;   /* microseconds */
    };
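Because a struct timeval splits a time into whole seconds and microseconds, arithmetic on it is easiest after converting to a single double. A minimal sketch, with hypothetical helper names:

```c
#include <sys/time.h>

/* Convert a timeval to seconds as a double. */
double timeval_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1.0e6;
}

/* Elapsed seconds from `start` to `end`. The fields are subtracted
 * separately so that large tv_sec values do not cost precision. */
double timeval_diff(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1.0e6;
}
```

For example, the interval from {1 s, 500000 us} to {3 s, 250000 us} is 2 + (-250000 / 1e6) = 1.75 seconds.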
8
Matmult.c: example
Headers / declarations
Initialize arrays A and B
Multiplication: $C_{i,j} = \sum_{k=0}^{n-1} A_{i,k} \, B_{k,j}$
9
3 nested loops to compute the product
    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols2; ++j) {
            for (k = 0; k < cols; ++k) {
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
Note: the loops perform rows * cols2 * cols multiplications and the same number of additions. For square matrices with rows = cols2 = cols = n, that is n^3 multiplications.
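A small worked case helps check the loop nest. The sketch below uses flat row-major arrays rather than the double ** matrices of the slides (an assumption made only to keep the example self-contained), and the function name matmul is illustrative.

```c
#include <stddef.h>

/* C = A * B, where A is rows x cols, B is cols x cols2 and C is
 * rows x cols2, all stored row-major in flat arrays. */
void matmul(int rows, int cols, int cols2,
            const double *A, const double *B, double *C)
{
    int i, j, k;

    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols2; ++j) {
            double sum = 0.0;  /* accumulates the dot product of row i and column j */
            for (k = 0; k < cols; ++k) {
                sum += A[(size_t)i * cols + k] * B[(size_t)k * cols2 + j];
            }
            C[(size_t)i * cols2 + j] = sum;
        }
    }
}
```

With A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], the result is C = [[19, 22], [43, 50]]; for instance C[0][0] = 1*5 + 2*7 = 19. The 2 x 2 case performs 2^3 = 8 multiplications, matching the n^3 count.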
10
Headers
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <assert.h>
    #include <time.h>
    #include <sys/resource.h>

    double **allocmatrix(int, int);
    int freematrix(double **, int, int);
    void nerror(char *error_text);
    double seconds(int nmode);
    double rand_gen(double fmin, double fmax);
    void SetSeed(int flag);
11
Main: args
    int main(int argc, char** argv)
    {
        int l, rows, cols2, cols;
        int i, j, k;
        double temp;
        double **A, **B, **C;
        double tstart, tend;

        /*
         * The following allows matrix parameters to be entered on the
         * command line to take advantage of dynamically allocated
         * memory. You may modify or remove it as you wish.
         */
        if (argc != 4) {
            nerror("Usage: <executable> <rows-value> <cols-value> <cols2-value>");
        }
        rows  = atoi(argv[1]);  /* A is a rows x cols matrix */
        cols  = atoi(argv[2]);  /* B is a cols x cols2 matrix */
        cols2 = atoi(argv[3]);  /* So C=A*B is a rows x cols2 matrix */
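atoi() cannot report a malformed argument; it simply returns 0 for input such as "abc". If stricter checking is wanted, one option is a strtol()-based helper like the sketch below. The name parse_dim is hypothetical and is not part of the original program.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive matrix dimension from `s`.
 * Stores the value in *out and returns 0 on success; returns -1 if
 * the text is empty, has trailing characters, is out of range, or
 * is not strictly positive. */
int parse_dim(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    if (v <= 0 || v > INT_MAX)
        return -1;

    *out = (int)v;
    return 0;
}
```

In main, a call such as parse_dim(argv[1], &rows) could replace atoi(argv[1]), with the usage message printed when it returns -1.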
12
Initializing the arrays
    A = (double **) allocmatrix(rows, cols);
    /*
     * Initialize matrix elements so compiler does not optimize out
     */
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            A[i][j] = rand_gen(1.0, 2.0);
            /* if (i == j) A[i][j] = 1.0; else A[i][j] = 0.0; */
        }
    }
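The slides declare allocmatrix and freematrix but do not show their bodies. The following is one possible sketch that matches the declared signatures; the layout (one contiguous zero-filled block plus an array of row pointers) and the return conventions are assumptions, not the author's implementation.

```c
#include <stdlib.h>

/* Assumed implementation: allocate a rows x cols matrix of doubles as
 * one contiguous zero-filled block plus a table of row pointers, so
 * that m[i][j] works. Returns NULL on bad sizes or allocation failure. */
double **allocmatrix(int rows, int cols)
{
    double **m;
    int i;

    if (rows <= 0 || cols <= 0)
        return NULL;

    m = malloc((size_t)rows * sizeof *m);
    if (m == NULL)
        return NULL;

    /* calloc zero-fills, so C can be accumulated into directly. */
    m[0] = calloc((size_t)rows * (size_t)cols, sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (i = 1; i < rows; ++i)
        m[i] = m[0] + (size_t)i * (size_t)cols;
    return m;
}

/* Assumed implementation: release a matrix from allocmatrix().
 * rows and cols are unused with this layout. Returns 0 on success,
 * -1 if m is NULL. */
int freematrix(double **m, int rows, int cols)
{
    (void)rows;
    (void)cols;
    if (m == NULL)
        return -1;
    free(m[0]);
    free(m);
    return 0;
}
```

Keeping all rows in one block means consecutive rows are adjacent in memory, which makes the address traces later in the deck easier to interpret. Zero-filling matters for C, because the multiplication loop adds into C[i][j] without first assigning it.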
13
Rand_gen
    /* generate a random double between fmin and fmax */
    double rand_gen(double fmin, double fmax)
    {
        return fmin + (fmax - fmin) * drand48();
    }

    /*
     * The drand48() and erand48() functions return nonnegative
     * double-precision floating-point values uniformly distributed
     * over the interval [0.0, 1.0).
     */
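Since drand48() draws from a generator that can be seeded with srand48(), a fixed seed makes a timing run reproducible. The sketch below repeats rand_gen from the slide and adds a hypothetical helper, seed_fixed; the slides' own SetSeed function is not shown, so nothing here should be read as its implementation.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>

/* Same function as on the slide: a random double in [fmin, fmax). */
double rand_gen(double fmin, double fmax)
{
    return fmin + (fmax - fmin) * drand48();
}

/* Hypothetical helper: seed the drand48() generator with a fixed
 * value so that repeated runs fill the matrices identically. */
void seed_fixed(long seed)
{
    srand48(seed);
}
```

Calling seed_fixed(12345) before filling A and B yields the same matrix contents on every run, so timing differences come from the loop order rather than from the data.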
14
Seconds: a function to combine all the times into one double
    /* Returns the total cpu time used in seconds. */
    double seconds(int nmode)
    {
        struct rusage buf;
        double temp;

        getrusage(nmode, &buf);
        /* Get system time and user time in micro-seconds. */
        temp = (double)buf.ru_utime.tv_sec * 1.0e6 + (double)buf.ru_utime.tv_usec
             + (double)buf.ru_stime.tv_sec * 1.0e6 + (double)buf.ru_stime.tv_usec;
        /* Return the sum of system and user time in SECONDS. */
        return (temp * 1.0e-6);
    }
15
Timing a section of code
    tstart = seconds(RUSAGE_SELF);
    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols2; ++j) {
            for (k = 0; k < cols; ++k) {
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
    tend = seconds(RUSAGE_SELF);
16
Timing a section of code: kij variation
    tstart = seconds(RUSAGE_SELF);
    for (k = 0; k < cols; ++k) {
        for (i = 0; i < rows; ++i) {
            for (j = 0; j < cols2; ++j) {
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
    tend = seconds(RUSAGE_SELF);
17
Timing a section of code: kji variation
    tstart = seconds(RUSAGE_SELF);
    for (k = 0; k < cols; ++k) {
        for (j = 0; j < cols2; ++j) {
            for (i = 0; i < rows; ++i) {
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
    tend = seconds(RUSAGE_SELF);
18
Performance variations
cocsce-l1d39-11> gcc kij.c -o kij
cocsce-l1d39-11> gcc kji.c -o kji
cocsce-l1d39-11> ./matmul
The total CPU time is: seconds
cocsce-l1d39-11> ./kij
The total CPU time is: seconds
cocsce-l1d39-11> ./kji
The total CPU time is: seconds
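The three programs can also be compared from a single harness. The sketch below is an assumption-laden stand-in for matmul.c, kij.c and kji.c: it uses flat row-major square matrices, deterministic fill values, and hypothetical names (loop_order, matmul_order, time_matmul). It returns the CPU time of one loop order and a checksum, so the caller can confirm that every order computes the same product.

```c
#include <stdlib.h>
#include <sys/resource.h>

enum loop_order { ORDER_IJK, ORDER_KIJ, ORDER_KJI };

/* User + system CPU seconds of this process, via getrusage(). */
static double cpu_seconds(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    return (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1.0e6;
}

/* C += A * B for n x n row-major matrices, using the given loop order. */
void matmul_order(enum loop_order order, int n,
                  const double *A, const double *B, double *C)
{
    int i, j, k;

    switch (order) {
    case ORDER_IJK:
        for (i = 0; i < n; ++i)
            for (j = 0; j < n; ++j)
                for (k = 0; k < n; ++k)
                    C[i * n + j] += A[i * n + k] * B[k * n + j];
        break;
    case ORDER_KIJ:  /* inner loop walks C and B along a row */
        for (k = 0; k < n; ++k)
            for (i = 0; i < n; ++i)
                for (j = 0; j < n; ++j)
                    C[i * n + j] += A[i * n + k] * B[k * n + j];
        break;
    case ORDER_KJI:  /* inner loop walks C and A down a column */
        for (k = 0; k < n; ++k)
            for (j = 0; j < n; ++j)
                for (i = 0; i < n; ++i)
                    C[i * n + j] += A[i * n + k] * B[k * n + j];
        break;
    }
}

/* Time one loop order on n x n matrices. Returns CPU seconds, or -1.0
 * on allocation failure; *checksum receives the sum of all C entries. */
double time_matmul(enum loop_order order, int n, double *checksum)
{
    size_t count = (size_t)n * (size_t)n, idx;
    double *A = malloc(count * sizeof *A);
    double *B = malloc(count * sizeof *B);
    double *C = calloc(count, sizeof *C);
    double t0, t1, sum = 0.0;

    if (A == NULL || B == NULL || C == NULL) {
        free(A); free(B); free(C);
        return -1.0;
    }
    for (idx = 0; idx < count; ++idx) {
        A[idx] = 1.0 + (double)(idx % 7);         /* small exact values */
        B[idx] = 2.0 - 0.25 * (double)(idx % 5);
    }
    t0 = cpu_seconds();
    matmul_order(order, n, A, B, C);
    t1 = cpu_seconds();

    for (idx = 0; idx < count; ++idx)
        sum += C[idx];
    *checksum = sum;
    free(A); free(B); free(C);
    return t1 - t0;
}
```

Because C stores each row contiguously, the kij order touches memory with stride 1 in its inner loop, while kji jumps a whole row between consecutive accesses. On cached machines this typically makes kji the slowest of the three for large n, though the actual times depend on the machine and the compiler flags.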
19
Address Trace
&x: the address-of operator & applied to x yields the address of x
    if ((mytracefile = fopen("trace", "w")) == NULL)
        fprintf(stderr, "Could not open file %s!\n", "trace");
    else
        fprintf(mytracefile, "address of x is %p\n", (void *)&x);
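The same pattern extends from one variable to a whole array. The sketch below writes the address of every element of an array to a trace file; the function name write_trace and the line format are illustrative assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: write the address of each of the n elements
 * of v to the file at `path`. Returns n on success, -1 on failure. */
int write_trace(const char *path, double *v, int n)
{
    FILE *f = fopen(path, "w");
    int i;

    if (f == NULL) {
        fprintf(stderr, "Could not open file %s!\n", path);
        return -1;
    }
    for (i = 0; i < n; ++i) {
        /* %p expects a void pointer, hence the cast. */
        fprintf(f, "address of v[%d] is %p\n", i, (void *)&v[i]);
    }
    return (fclose(f) == 0) ? n : -1;
}
```

In the resulting file, consecutive addresses differ by sizeof(double), which is 8 bytes on common platforms.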
20
    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols2; ++j) {
            for (k = 0; k < cols; ++k) {
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
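One way to produce an address trace for this loop nest is to log every memory reference made by the innermost statement. The sketch below assumes flat row-major n x n matrices and an invented record format ("R" for a read, "W" for a write, followed by the address); neither the format nor the function name trace_matmul comes from the slides.

```c
#include <stdio.h>

/* Multiply n x n row-major matrices in ijk order, logging each memory
 * reference of the inner statement to `out`. Returns the number of
 * trace records written. C must be zero-filled by the caller. */
long trace_matmul(FILE *out, int n, double *A, double *B, double *C)
{
    long records = 0;
    int i, j, k;

    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            for (k = 0; k < n; ++k) {
                double *a = &A[i * n + k];
                double *b = &B[k * n + j];
                double *c = &C[i * n + j];

                /* C[i][j] = C[i][j] + A[i][k] * B[k][j] reads three
                 * locations and writes one. */
                fprintf(out, "R %p\n", (void *)c);
                fprintf(out, "R %p\n", (void *)a);
                fprintf(out, "R %p\n", (void *)b);
                fprintf(out, "W %p\n", (void *)c);
                *c = *c + *a * *b;
                records += 4;
            }
        }
    }
    return records;
}
```

For n = 2 the trace holds 2 * 2 * 2 * 4 = 32 records. An optimizing compiler would usually keep C[i][j] in a register across the k loop, so a trace like this describes the source-level references rather than the exact loads and stores the hardware sees.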