Published by Melissa Richardson. Modified over 9 years ago.
1
Profiling Tools: VTune & gprof
Introduction to Computer System, Fall 2015 (PPI, FDU)
2
Profiling
In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.
Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
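To make "frequency and duration of function calls" concrete, here is a minimal sketch of hand-written source instrumentation in C. It is not how any particular profiler works internally; the names `prof_record`, `sum_to`, `profiled_sum_to`, and `prof_report` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-function record: call count and accumulated CPU ticks. */
struct prof_record {
    const char *name;
    unsigned long calls;
    clock_t ticks;
};

/* Function under measurement (illustrative workload only). */
unsigned long sum_to(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Instrumented wrapper: records one call and the CPU time it took. */
unsigned long profiled_sum_to(struct prof_record *rec, unsigned long n)
{
    assert(rec != NULL);
    clock_t start = clock();
    unsigned long result = sum_to(n);
    rec->ticks += clock() - start;
    rec->calls += 1;
    return result;
}

/* Report frequency (calls) and duration (CPU seconds) for one function. */
void prof_report(const struct prof_record *rec)
{
    assert(rec != NULL);
    printf("%s: %lu calls, %.6f s CPU\n", rec->name, rec->calls,
           (double)rec->ticks / CLOCKS_PER_SEC);
}
```

A caller would create a `struct prof_record`, route calls through `profiled_sum_to`, and print the record with `prof_report` at exit. A real profiler automates this bookkeeping, or avoids it entirely by sampling.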
3
Performance Tuning for Intel® Xeon Phi™ Coprocessors Visualizing Performance Opportunities using Intel® VTune™ Amplifier
4
Introduction
– Can profile host, offload, or native coprocessor applications
– Host-based profiling may be sufficient to identify vectorization/parallelism/offload candidates
– Call stacks currently available for host only
– Start with representative/reasonable workloads!
– Use Intel® VTune™ Amplifier XE to gather hot spot data
– Tells what functions account for most of the run time
– Often, this is enough
– But it does not tell you much about program structure
– Move on to more detailed analyses
Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
5
Intel® VTune™ Amplifier XE
Tune Applications for Scalable Multicore Performance
Fast, Accurate Performance Profiles
– Hotspot (Statistical call tree)
– Hardware-Event Based Sampling
Thread Profiling
– Visualize thread interactions on timeline
– Balance workloads
Easy set-up
– Pre-defined performance profiles
– Use a normal production build
Find Answers Fast
– Filter out extraneous data
– View results tied to source/assembly lines
– Event multiplexing
Compatible
– Microsoft*, GCC*, Intel compilers
– C/C++, Fortran, Assembly, .NET*
– Latest Intel processors and compatible processors (1)
Windows* or Linux*
– Visual Studio* Integration (Windows)
– Standalone user interface and command line
– 32 and 64-bit
(1) IA-32 and Intel® 64 architectures. Many features work with compatible processors. Event based sampling requires a genuine Intel Processor.
6
A Quick Tour Through Intel® VTune™ Amplifier
– Setting up a project
– Execution file, command line arguments, working directory
– Search directories (standard binary libraries for Intel MPSS 3)
– Quick tour of advanced setup dialog
– Selecting a collector
– Host versus native event collection
– Launching a collection
– Viewing results, source and assembly
7
VTune™ Amplifier XE visualizes performance
8
VTune™ Amplifier XE visualizes performance
[Screenshot callouts, Toolbar: Instructions, Navigator, New, Compare, Open Result, Open, Properties, Project]
9
VTune™ Amplifier XE visualizes performance
[Screenshot callout: Grid Pane]
10
VTune™ Amplifier XE visualizes performance
[Screenshot callouts: Grid Pane, Grouping pull-down]
11
VTune™ Amplifier XE visualizes performance
Source View / Per line localization
12
VTune™ Amplifier XE visualizes performance
Source View / Hot spot navigation controls
Small data files can also be copied onto the card, but they will need to be recopied after a reboot. Suggestion: create /tmp/usrname as the working directory.
13
VTune™ Amplifier XE visualizes performance
Assembly View / Hot spot navigation controls
14
For event collection the coprocessor is treated as a special HW architecture
15
General Exploration runs a set of events to drive top-down analysis
16
VTune™ Amplifier XE visualizes performance
Assembly View / Hot spot navigation controls
17
VTune™ Amplifier
Advantages
– Both command line and GUI; easy to use
– Multiple predefined analysis types
– Supports hardware events, such as cache and memory access analysis
– Multithreaded profiling is well supported
18
VTune™ Amplifier
Limitations
– Expensive for enterprise use
– Hardware event-based sampling requires a genuine Intel processor, so the tool is of limited use on non-Intel machines
20
GPROF
Gprof is a performance analysis tool for Unix applications. It uses a hybrid of instrumentation and sampling and was created as an extended version of the older "prof" tool. Unlike prof, gprof can collect and print a limited call graph.
21
Usage
Instrumentation code is automatically inserted into the program during compilation (for example, with the '-pg' option of the gcc compiler) to gather caller-function data. A call to the monitor function 'mcount' is inserted at the entry of each function compiled this way.
    gcc -Wall -g -pg -lc_p example.c -o example
    ./example                    (running the program creates gmon.out)
    gprof -b example gmon.out
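The slide does not show what example.c contains. A minimal sketch of a workload worth profiling might look like the following; the function names `fast_step`, `slow_step`, and `run_workload` are hypothetical, and the loop counts are arbitrary.

```c
#include <assert.h>

/* Cheap function: one multiply and one add per call. */
unsigned long fast_step(unsigned long x)
{
    return x * 2654435761UL + 1;
}

/* Expensive function: its loop should account for most of the self time. */
unsigned long slow_step(unsigned long x, unsigned long rounds)
{
    for (unsigned long i = 0; i < rounds; i++)
        x = x * 31 + i;
    return x;
}

/* Driver: calls both functions so they appear as callees in the call graph. */
unsigned long run_workload(unsigned long iterations)
{
    assert(iterations > 0);
    unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        acc += fast_step(i);
        acc ^= slow_step(i, 100000);
    }
    return acc;
}
```

To turn this into a complete example.c, add a `main` that calls `run_workload` with a few thousand iterations and prints the result, so the compiler cannot discard the work. Built without optimization, as in the command above, each function should show up as its own row in the flat profile.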
22
Result
Gprof output consists of two parts: the flat profile and the call graph. The flat profile gives the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.
23
Result
Fields of the flat profile:
    % time              the percentage of the total running time of the program used by this function.
    cumulative seconds  a running sum of the number of seconds accounted for by this function and those listed above it.
    self seconds        the number of seconds accounted for by this function alone. This is the major sort for this listing.
    calls               the number of times this function was invoked, if this function is profiled, else blank.
    self ms/call        the average number of milliseconds spent in this function per call, if this function is profiled, else blank.
    total ms/call       the average number of milliseconds spent in this function and its descendants per call, if this function is profiled, else blank.
    name                the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parentheses, it shows where it would appear in the gprof listing if it were to be printed.
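The relationship between these columns can be sketched as plain arithmetic. The helper names below are hypothetical, and the functions only restate the field definitions above; gprof derives its own figures from sampled data and call counts.

```c
#include <assert.h>

/* "% time": share of the total run time spent in the function itself. */
double percent_time(double self_seconds, double total_seconds)
{
    assert(total_seconds > 0.0);
    return 100.0 * self_seconds / total_seconds;
}

/* "self ms/call": average milliseconds per call in the function alone. */
double self_ms_per_call(double self_seconds, unsigned long calls)
{
    assert(calls > 0);
    return self_seconds * 1000.0 / (double)calls;
}

/* "total ms/call": also counts time spent in the function's descendants. */
double total_ms_per_call(double self_seconds, double child_seconds,
                         unsigned long calls)
{
    assert(calls > 0);
    return (self_seconds + child_seconds) * 1000.0 / (double)calls;
}
```

With made-up numbers: a function with 0.5 self seconds in a 2.0 second run, called 1000 times, has a % time of 25, a self ms/call of 0.5, and, if its descendants add another 0.25 seconds, a total ms/call of 0.75.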
24
Advantages
– Part of the GNU ("GNU's Not Unix") toolchain and supported by GNU
– Not tied to specific hardware
25
Limitations
– Gprof cannot measure time spent in kernel mode (syscalls, waiting for CPU, or waiting for I/O); only user-space code is profiled.
– Gprof profiles only the main thread of a multi-threaded application.
– Instrumentation code must be inserted at compile time, so the program has to be rebuilt.
– No hardware event collection.
26
More
– man gprof
– https://sourceware.org/binutils/docs/gprof/
27
Open topic
29
Attack: Stack Buffer Overflow
A Typical Buffer Overflow Attack
– Inject malicious code into the buffer
– Overwrite the return address so that it points to the buffer
– Once the function returns, the malicious code runs

    void function(char *str) {
        char buf[16];
        strcpy(buf, str);   /* no bounds check: input longer than 15 characters overflows buf */
    }

[Figure: stack layout showing return addr, saved ebp (ebp points here), and buf]
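For contrast, a length-checked variant of the same function is sketched below. This is one possible fix, not something taken from the slides; `bounded_copy` and `checked_function` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if src would overflow dst (dst is left empty).
 */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* Same shape as function() above, with the length checked before copying. */
int checked_function(const char *str)
{
    char buf[16];
    return bounded_copy(buf, sizeof buf, str);
}
```

A 15-character input still fits; a 16-character input is rejected instead of spilling past buf toward the saved ebp and the return address.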
30
Defense: DEP (Data Execution Prevention)
Execute Code, not Data
– Data areas marked non-executable
  – Stack marked non-executable
– Hardware enforced (NX)
– You can load your shellcode in the stack … but you can't jump to it
31
How to pwn?
Describe ways of pwning other than buffer overflow. Focus on how to divert the program from its normal execution path.
32
Debugging
33
How to Debug?
– The program gets wrong results
– Run the program in debug mode
– Execute the code line by line to find the cause
Can this always work well for a multi-threaded program? If not, why? What is the difference between sequential bugs and parallel ones? And how do you debug a tricky multi-threaded program?
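One way to see why parallel bugs differ is that the result depends on how the threads' steps interleave. The sketch below does not use real threads: it is a single-threaded simulation, with hypothetical names, of two "threads" that each increment a shared counter as a separate load and store.

```c
#include <assert.h>

/* One simulated thread: a private register plus a step counter. */
struct sim_thread {
    int reg;    /* value loaded from the shared counter */
    int step;   /* 0 = load next, 1 = store next, 2 = finished */
};

/* Execute the next step (load, or increment-and-store) of thread t. */
void sim_step(struct sim_thread *t, int *shared)
{
    assert(t->step < 2);
    if (t->step == 0)
        t->reg = *shared;        /* load the shared value */
    else
        *shared = t->reg + 1;    /* add 1 and store it back */
    t->step++;
}

/*
 * Run two increments under a schedule: schedule[i] names the thread
 * (0 or 1) that executes step i. Returns the final counter value.
 */
int run_schedule(const int schedule[4])
{
    struct sim_thread threads[2] = { {0, 0}, {0, 0} };
    int shared = 0;
    for (int i = 0; i < 4; i++)
        sim_step(&threads[schedule[i]], &shared);
    return shared;
}
```

The schedule {0, 0, 1, 1} runs the increments one after the other and yields 2. The schedule {0, 1, 0, 1} lets both threads load 0 before either stores, so one update is lost and the result is 1. Stepping line by line in a debugger changes the timing and tends to produce the first kind of schedule, which is one reason such bugs can vanish under observation.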
34
Cache
35
Cache locality
Cache locality is the key to achieving high levels of performance. We can improve cache locality either by optimizing our program or by changing the cache strategy or implementation.
Introduce some methods for improving cache locality from a particular perspective and present how they work.
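As one example of the "optimizing our program" route, the sketch below sums the same matrix in two loop orders. The names and the 512 x 512 size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row-major traversal: consecutive accesses touch adjacent memory. */
long sum_row_major(int m[ROWS][COLS])
{
    assert(m != NULL);
    long total = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}

/* Column-major traversal: each access jumps COLS * sizeof(int) bytes. */
long sum_col_major(int m[ROWS][COLS])
{
    assert(m != NULL);
    long total = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}
```

Both functions return the same sum. Because C stores arrays row by row, the first version uses every element of each cache line it fetches, while the second usually touches one element per line before moving on. Timing the two, for example with one of the profilers above, is a simple way to observe the effect on a given machine.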
36
Requirement
– Each student picks one topic and gives a presentation with PPT slides.
– Any techniques or methods are acceptable, as long as you can finish the presentation within 6 minutes.
– 2015/10/30, 6-7; the classroom will be announced later.
– PPT slides should be emailed to your TA before 2015/10/29 23:59.
37
How to score high?
– Illustrate your ideas clearly; you may refer to the Internet or present your own solution.
– Remember that time is limited; try to be precise and concise.
– Your presentation has three parts: the PPT, your oral delivery, and your content. All of them count in grading.