Presentation is loading. Please wait.

Presentation is loading. Please wait.

Using the VTune Analyzer on Multithreaded Applications

Similar presentations


Presentation on theme: "Using the VTune Analyzer on Multithreaded Applications"— Presentation transcript:

1 Using the VTune Analyzer on Multithreaded Applications
Intel Software College

2 Using the VTune Analyzer on Multithreaded Applications
Objectives Two uses for VTune Performance Analyzer Improving the efficiency of computation Improving your threading model Example: mandelbrot3.exe Using the VTune Analyzer on Multithreaded Applications

3 Using the VTune Analyzer on Multithreaded Applications
Agenda Determine if a sample section of code can be threaded Determine if the implemented threads are balanced or not Using the VTune Analyzer on Multithreaded Applications

4 Use VTune Analyzer on Serial Applications
Determine what parts of your application (if any) that, when threaded, will speed up your app CPU-bound apps can potentially run twice as fast on dual-core processors Memory bound apps may potentially run 50% faster I/O bound applications may not run any faster Find main performance bottlenecks in your application (sampling) Determine whether or not it makes sense to thread there If not, look further up the program’s calling sequence to find a more appropriate place to consider (callgraph) Using the VTune Analyzer on Multithreaded Applications

5 Use VTune Analyzer on Multithreaded Applications
Determine if your current threading model is balanced On the thread view, each of the threads should be consuming the same amount of time If not, you must shift the amount of work between the threads Use “Samples Over Time” view Look for idle CPU time Could more threads utilize idle resources? In the process view this can show up as ntoskrnl, or intelppm.sys Sometimes the process view actually says idle task Using the VTune Analyzer on Multithreaded Applications

6 Example: mandelbrot3.exe
Single threaded app, analyze to see if it can be made Demo machine: Four socket, dual core, hyper-threaded (HT) machine (4 sockets * 2 cores per socket * 2 HT CPUs per core) = the OS sees 16 processors Using the VTune Analyzer on Multithreaded Applications

7 Using the VTune Analyzer on Multithreaded Applications
PROCESS VIEW: Analyzer shows much idle time which we’d expect on a 16 CPU system running not very much except our example program. Using the VTune Analyzer on Multithreaded Applications

8 Using the VTune Analyzer on Multithreaded Applications
Module view: show us the executables that that consumed time in the Mandelbrot process Using the VTune Analyzer on Multithreaded Applications

9 Using the VTune Analyzer on Multithreaded Applications
THREAD VIEW: Show us the threads in the Mandelbrot application (only 1 thread does any significant computation). Using the VTune Analyzer on Multithreaded Applications

10 Using the VTune Analyzer on Multithreaded Applications
SOURCE VIEW: 99% of CPU time used by that application was spent in the GenerateScanLine function. Specifically, in CalculatePixels. If we threaded here, we’d be creating, synchronizing and destroying a pixel for each thread on the screen. Thread management would dominate the execution time. The same would be true for creating a thread per scan line. We need to look further up the calling sequence to find a better place to create threads. Using the VTune Analyzer on Multithreaded Applications

11 Using the VTune Analyzer on Multithreaded Applications
CALLGRAPH: Best spot, Generate_Display which calls GenerateScanline. The callgraph shows us that both of these functions are on the critical path, this is a good sign for our plan. Using the VTune Analyzer on Multithreaded Applications

12 Using the VTune Analyzer on Multithreaded Applications
SOURCE VIEW for modified code. Speedup, sure, but now that we’ve gone multithreaded, let’s make sure each of the threads is doing the same amount of work. Using the VTune Analyzer on Multithreaded Applications

13 Using the VTune Analyzer on Multithreaded Applications
CHECK THE LOAD BALANCING using SAMPLES OVER TIME. Each square is a slice of time, and red squares indicate a larger number of amount of CPU usage in that slice of time. Many squares for each thread are red at the same instant in time, which indicates the threads are running in parallel. BUT, threads are finishing at different times, which means some threads have more work to do than other threads. This means there will be idle CPUs when the first threads finish. Using the VTune Analyzer on Multithreaded Applications

14 Using the VTune Analyzer on Multithreaded Applications
We saw different threads are doing different amounts of work. Some threads are drawing parts of the screen that aren’t covered by much of the fractal so they don’t consume much CPU time. SO, we interleaved the threads and scanlines (on a 16 CPU system) so that the first thread does the first line, the 17th line, the 33rd line, and so forth. The second thread does the second line, the 18th, and so on. This should cause a more even computational distribution over all the threads. The above is the sample over time view of the amended code. (2156 ms to 297 ms) Using the VTune Analyzer on Multithreaded Applications

15 Using the VTune Analyzer on Multithreaded Applications
This should always be the last slide of all presentations. Using the VTune Analyzer on Multithreaded Applications


Download ppt "Using the VTune Analyzer on Multithreaded Applications"

Similar presentations


Ads by Google