With RTAI, MPICH2, MPE, Jumpshot, Sar and hopefully soon OProfile or VTune Dawn Nelson CSC523.


1 With RTAI, MPICH2, MPE, Jumpshot, sar and hopefully soon OProfile or VTune. Dawn Nelson, CSC523

2
• We already know that parallel programming makes good use of multiple processors, but we don't always know if our programs are making the most efficient use of MPI.
• Knowing how to use something doesn't necessarily mean you know how it works.
• There are many different ways to get data from point A to point B.
• If you don't know which way is better, try it, time it, profile it.
• When adding calls to the RTAI API, you can find out whether you are adding overhead or benefit.
• You can determine other bottlenecks or inefficiencies.

3
• Within your code
  • For overall timing, add a gettimeofday() call at the beginning and end of your program.
  • Use MPI_Wtime() to time MPI calls.
  • Add MPE tags for user-programmed events. MPE already keeps track of send, receive, wait, barrier and other MPI calls.
• Without changing code
  • Use the sysstat command sar (yum install sysstat).
  • Use an open source profiler such as OProfile.
  • Use a commercial profiler such as VTune.

4
• MPE tags in your code look like this:
    #include <mpe.h>
    int event1a, event1b;
    MPE_Log_get_state_eventIDs(&event1a, &event1b);
    MPE_Describe_state(event1a, event1b, "slave compute", "orange");
    MPE_Start_log();
    MPE_Log_event(event1a, 0, "slave_compute_start"); // start of code section to profile
    …… // your code block here
    MPE_Log_event(event1b, 0, "slave_compute_end");   // end of code section to profile
    // logging environment ends with MPI_Finalize();
• A clog2 file is generated.
• Use Jumpshot to graphically display the file:
    > jumpshot filename.clog2

5 Matrix Multiply, using 4 nodes, master-slave scenario

6 Matrix Multiply from the midterm with MPI_Barrier() call

7
• Real-time computing is done at the highest priority (SCHED_FIFO 99).
• Real-time computing is non-preemptible.
• Real-time computing blocks interrupts and interference from the OS and from users.
• Reduces jitter and latency: processes finish at closer to the same time.
• For high compute-to-I/O ratios, real time improves overall processing speed.

8 RTAI-LXRT Architecture RTAI-MPI Architecture

9 Soft LXRT Real Time implementation.

10 Matrix Multiply without RTAI

11
• Jitter is the variation in completion times of MPI workers, which affects the timing of the overall process.
• Jitter is caused by interference from interrupts, the operating system, other processes and other users.
• By running a process 100 times and collecting the overall time of each run, jitter can be measured.
• Jitter was measured using an MPI application with and without injected interference from a CPU hog.
• Jitter was measured using an RTAI-MPI application with and without injected interference from a CPU hog.

12

13

14
• MPE shows chronologically when MPI calls start and complete, but not the effects of programs on the CPU, memory, interrupts, swap, I/O and TCP.
• The sysstat command sar collects information about the items listed above.
• The open source kSar Java program parses the text file generated by sar and graphs everything.

15

16

17 Notice the big drop in the number of interrupts while the lxrt-soft program is running. [Graph panels: "RTAI lxrt-soft real time" and "No RTAI"]

18
• Finish testing the I/O hog program and implement it.
• Current kSar graph data was obtained from my home system and not the C107 cluster. kSar could not import a text file, and it did not work remotely via ssh from home to the cluster.
• Implement OProfile or VTune for additional profiling on the master and one or more worker nodes.
• Obtain sar and kSar data for the master and one or more worker nodes on the cluster.
• Obtain MPE data while using a CPU hog and an interrupt hog.
• Obtain sar and kSar data while using a CPU hog and an interrupt hog.
• Determine what the interrupt levels mean: i014 etc.

19
• Perform system benchmarking.
• Write the corresponding paper for this project.
• Determine the scalability of a real-time and parallel application. There is a little bit of overhead when using high priority and RTAI.

20
• http://oprofile.sourceforge.net
• http://www.bitmover.com/lmbench/
• http://www.cyberciti.biz/tips/identifying-linux-bottlenecks-sar-graphs-with-ksar.html
• http://www.cyberciti.biz/tips/linux-resource-utilization-to-detect-system-bottlenecks.html
• http://www.cyberciti.biz/tips/top-linux-monitoring-tools.html
• http://www.howtoforge.com/sysstat_monitoring_centos
• http://www.opersys.com/ftp/pub/LTT/Documentation/p-a06_yaghmour.pdf
• http://www.opersys.com/LTT/documentation.html
• http://www.linux-mag.com/cache/7728/1.html
• http://www.rtai.org

