Profiling Tools: VTune & Gprof
Introduction to Computer System, Fall 2015 (PPI, FDU)


Profiling
In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization. Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
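As a concrete picture of instrumenting source code, the sketch below inserts timing probes by hand at the entry and exit of one function and accumulates a call count and a total time, which is roughly the information a profiler collects automatically. It assumes a POSIX system with clock_gettime; the names work, now_seconds, work_calls, and work_seconds are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime */
#include <stdio.h>
#include <time.h>

/* Hypothetical counters filled in by the hand-written probes below. */
static long   work_calls   = 0;    /* how many times work() has run       */
static double work_seconds = 0.0;  /* total wall-clock time inside work() */

/* Current monotonic time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* The function being profiled: returns 0 + 1 + ... + (n - 1). */
static long work(long n) {
    double start = now_seconds();            /* probe at function entry */
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;
    work_seconds += now_seconds() - start;   /* probe at function exit  */
    work_calls++;
    return sum;
}

/* Prints what the probes collected. */
static void report(void) {
    printf("work: %ld calls, %.6f s total\n", work_calls, work_seconds);
}
```

Calling work() a few times and then report() prints the accumulated counts. A real profiler avoids editing the source by having the compiler or a binary rewriter insert equivalent hooks.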

Performance Tuning for Intel® Xeon Phi™ Coprocessors: Visualizing Performance Opportunities using Intel® VTune™ Amplifier

Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
Introduction
- Can profile host, offload, or native coprocessor applications
- Host-based profiling may be sufficient to identify vectorization, parallelism, and offload candidates
  - Call stacks are currently available for the host only
- Start with representative, reasonable workloads!
- Use Intel® VTune™ Amplifier XE to gather hot spot data
  - Tells which functions account for most of the run time
  - Often, this is enough
  - But it does not tell you much about program structure
- Move on to more detailed analyses
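To try a hot spot analysis, it helps to have a workload whose answer is known in advance. The hypothetical program below has one cheap O(n) phase and one O(n^2) phase, so for large n a hot spot report should attribute most of the run time to pairwise_products. All names are illustrative; building with optimization plus debug information (-g) lets the profiler map samples back to source lines.

```c
#include <stddef.h>

/* Cheap phase: a single pass over the array. */
static double sum_array(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Expensive phase: O(n^2) pairwise products, expected to dominate for large n. */
static double pairwise_products(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i] * a[j];
    return s;
}

/* Sets every a[i] to 1.0, runs both phases, and returns n + n*n. */
static double run_workload(double *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = 1.0;
    return sum_array(a, n) + pairwise_products(a, n);
}
```

Calling run_workload from main with a large n (and a heap-allocated array) gives a run long enough to sample; profiling the same binary with gprof should point at the same function.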

Intel® VTune™ Amplifier XE: Tune Applications for Scalable Multicore Performance
Fast, Accurate Performance Profiles
- Hotspot (statistical call tree)
- Hardware-event based sampling
Thread Profiling
- Visualize thread interactions on a timeline
- Balance workloads
Easy Set-up
- Pre-defined performance profiles
- Use a normal production build
Compatible
- Microsoft*, GCC*, Intel compilers
- C/C++, Fortran, Assembly, .NET*
- Latest Intel processors and compatible processors (1)
Find Answers Fast
- Filter out extraneous data
- View results tied to source/assembly lines
- Event multiplexing
Windows* or Linux*
- Visual Studio* integration (Windows)
- Standalone user interface and command line
- 32- and 64-bit
(1) IA-32 and Intel® 64 architectures. Many features work with compatible processors. Event-based sampling requires a genuine Intel processor.
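The "Balance workloads" item is easiest to see with a cost model. The sketch below assumes that iteration i of a loop costs i units of work (as in a triangular matrix loop). Splitting the iterations into contiguous blocks then gives the last thread far more work than the first, while dealing them out round-robin (cyclic scheduling) evens the load. On a thread timeline such imbalance shows up as threads that finish early and sit idle. The function names are hypothetical.

```c
#include <stddef.h>

/* Cost model (assumption): iteration i costs i units of work. */

/* Work given to thread t when n iterations are split into contiguous blocks. */
static unsigned long block_cost(size_t n, size_t threads, size_t t) {
    size_t lo = t * n / threads;
    size_t hi = (t + 1) * n / threads;
    unsigned long cost = 0;
    for (size_t i = lo; i < hi; i++)
        cost += i;
    return cost;
}

/* Work given to thread t when iterations are dealt out round-robin (cyclic). */
static unsigned long cyclic_cost(size_t n, size_t threads, size_t t) {
    unsigned long cost = 0;
    for (size_t i = t; i < n; i += threads)
        cost += i;
    return cost;
}
```

For n = 8 and two threads, the block split gives costs 6 and 22, while the cyclic split gives 12 and 16.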

A Quick Tour Through Intel® VTune™ Amplifier
- Setting up a project
  - Execution file, command-line arguments, working directory
  - Search directories (standard binary libraries for Intel MPSS 3)
  - Quick tour of the advanced setup dialog
- Selecting a collector
  - Host versus native event collection
- Launching a collection
- Viewing results, source and assembly

VTune™ Amplifier XE visualizes performance

VTune™ Amplifier XE visualizes performance: Project Toolbar (New, Open Result, Compare, Open Properties, Navigator, Instructions)

VTune™ Amplifier XE visualizes performance: Grid Pane

VTune™ Amplifier XE visualizes performance: Grid Pane, Grouping pull-down

VTune™ Amplifier XE visualizes performance: Source View / per-line localization
Intel Confidential, 6/29/2014

VTune™ Amplifier XE visualizes performance: Source View / hot spot navigation controls
Small data files can also be copied onto the card, but they will need to be recopied after a reboot. Suggestion: create /tmp/usrname as the working directory.

VTune™ Amplifier XE visualizes performance: Assembly View / hot spot navigation controls

For event collection, the coprocessor is treated as a special hardware architecture

General Exploration runs a set of events to drive top-down analysis

VTune™ Amplifier advantages
- Both command line and GUI; easy to use
- Multiple predefined analysis suites
- Supports hardware events, such as cache and memory access analysis
- Multithreaded profiling is well supported

VTune™ Amplifier limitations
- Expensive for enterprise use (commercial license)
- Event-based sampling requires a genuine Intel processor, so on other machines only part of the functionality is available

Gprof
Gprof is a performance analysis tool for Unix applications. It uses a hybrid of instrumentation and sampling and was created as an extended version of the older "prof" tool. Unlike prof, gprof can collect and print a limited call graph.
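The sampling half of gprof can be sketched with a profiling timer: the kernel delivers SIGPROF each time the process has used another interval of CPU time, and the handler records what the program was doing. gprof records the program counter into a histogram over the code; the sketch below, which assumes a POSIX system providing setitimer(ITIMER_PROF), uses a hypothetical current_phase variable as a stand-in for the program counter. All names are illustrative.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction, SA_RESTART and setitimer */
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

enum phase { PHASE_IDLE, PHASE_LIGHT, PHASE_HEAVY, PHASE_COUNT };

/* What the program is doing right now; stands in for the program counter. */
static volatile sig_atomic_t current_phase = PHASE_IDLE;
/* Sample histogram with one bucket per phase. */
static volatile sig_atomic_t samples[PHASE_COUNT];

/* SIGPROF handler: charge this tick to whatever is currently running. */
static void on_sigprof(int sig) {
    (void)sig;
    samples[current_phase]++;
}

/* Spins until roughly `seconds` of CPU time are used (clock() measures CPU time). */
static void burn_cpu(double seconds) {
    clock_t start = clock();
    while ((double)(clock() - start) / CLOCKS_PER_SEC < seconds)
        ;  /* busy-wait */
}

/* Installs the handler and requests SIGPROF every interval_us of CPU time. */
static int start_sampling(long interval_us) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;
    struct itimerval every = {{0, interval_us}, {0, interval_us}};  /* {interval, first expiry} */
    return setitimer(ITIMER_PROF, &every, NULL);
}

/* Disarms the profiling timer. */
static void stop_sampling(void) {
    struct itimerval off = {{0, 0}, {0, 0}};
    setitimer(ITIMER_PROF, &off, NULL);
}
```

Starting the sampler, running burn_cpu(0.05) under PHASE_LIGHT and burn_cpu(0.25) under PHASE_HEAVY, then stopping it should leave roughly five times as many samples in the heavy bucket; exact counts vary with timer granularity.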

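Usage
Instrumentation code is inserted into the program automatically at compile time (for example, with the -pg option of gcc) to gather caller-function data: a call to the monitor function mcount is inserted at the entry of each function, where it records which function made the call.
    gcc -Wall -g -pg example.c -o example
    ./example                   (writes gmon.out to the current directory when the program exits normally)
    gprof -b example gmon.out   (-b: brief output, without the explanatory text)
To also link a profiled C library, add -lc_p at the end of the gcc command where such a library (libc_p) is installed.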
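The instrumentation half works by recording call-graph arcs at function entry: each time a function starts, the hook notes which function called it. gprof's mcount finds the caller from the return address; the sketch below is a hypothetical, hand-written imitation that keeps its own stack of function names instead, with profile_enter and profile_leave called explicitly.

```c
#include <string.h>

#define MAX_ARCS  16
#define MAX_DEPTH 16

/* One call-graph arc: caller -> callee, and how many times it was taken. */
struct arc { const char *caller; const char *callee; long count; };

static struct arc arcs[MAX_ARCS];
static int n_arcs = 0;
static const char *call_stack[MAX_DEPTH];  /* names of the active functions */
static int depth = 0;

/* Hypothetical stand-in for mcount, called at the entry of each function.
   Calls nested deeper than MAX_DEPTH are attributed to "<spontaneous>". */
static void profile_enter(const char *callee) {
    const char *caller = (depth > 0 && depth <= MAX_DEPTH) ? call_stack[depth - 1]
                                                           : "<spontaneous>";
    int found = 0;
    for (int i = 0; i < n_arcs && !found; i++) {
        if (strcmp(arcs[i].caller, caller) == 0 && strcmp(arcs[i].callee, callee) == 0) {
            arcs[i].count++;
            found = 1;
        }
    }
    if (!found && n_arcs < MAX_ARCS)
        arcs[n_arcs++] = (struct arc){caller, callee, 1};
    if (depth < MAX_DEPTH)
        call_stack[depth] = callee;
    depth++;
}

/* Called just before each instrumented function returns. */
static void profile_leave(void) {
    if (depth > 0)
        depth--;
}

/* A small instrumented call tree: driver -> compute -> helper. */
static long helper(long x) {
    profile_enter(__func__);
    long r = 2 * x;
    profile_leave();
    return r;
}

static long compute(long x) {
    profile_enter(__func__);
    long r = helper(x) + helper(x + 1);
    profile_leave();
    return r;
}

static long driver(void) {
    profile_enter(__func__);
    long total = 0;
    for (long i = 0; i < 3; i++)
        total += compute(i);
    profile_leave();
    return total;
}

/* How many times caller called callee (0 if never). */
static long arc_count(const char *caller, const char *callee) {
    for (int i = 0; i < n_arcs; i++)
        if (strcmp(arcs[i].caller, caller) == 0 && strcmp(arcs[i].callee, callee) == 0)
            return arcs[i].count;
    return 0;
}
```

After driver() runs, the arc table holds driver -> compute with count 3 and compute -> helper with count 6, which is the kind of caller/callee information gprof prints in its call graph.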

Result
Gprof output consists of two parts: the flat profile and the call graph. The flat profile gives the total execution time spent in each function and its percentage of the total running time; function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.

Result: flat profile columns
% time              the percentage of the total running time of the program used by this function.
cumulative seconds  a running sum of the number of seconds accounted for by this function and those listed above it.
self seconds        the number of seconds accounted for by this function alone. This is the major sort for this listing.
calls               the number of times this function was invoked, if this function is profiled, else blank.
self ms/call        the average number of milliseconds spent in this function per call, if this function is profiled, else blank.
total ms/call       the average number of milliseconds spent in this function and its descendants per call, if this function is profiled, else blank.
name                the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parentheses, it shows where the function would appear in the gprof listing if it were to be printed.
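The flat-profile columns are simple arithmetic over per-function measurements. The sketch below takes hypothetical inputs (self seconds, seconds including descendants, and call counts), sorts by self seconds with the name as the tie-breaker, and derives % time, cumulative seconds, self ms/call, and total ms/call. Real gprof obtains the descendant time by propagating through the call graph; here it is simply supplied as an input.

```c
#include <stdlib.h>
#include <string.h>

/* Per-function measurements (hypothetical inputs) plus derived flat-profile columns. */
struct fn_stats {
    const char *name;
    double self_s;         /* seconds spent in the function itself         */
    double total_s;        /* seconds in the function plus its descendants */
    long   calls;          /* number of calls (0 if unknown)               */
    double pct_time;       /* % time                                        */
    double cumulative_s;   /* cumulative seconds                            */
    double self_ms_call;   /* self ms/call                                  */
    double total_ms_call;  /* total ms/call                                 */
};

/* Major sort: self seconds, descending. Minor sort: name, ascending. */
static int flat_order(const void *pa, const void *pb) {
    const struct fn_stats *a = pa, *b = pb;
    if (a->self_s != b->self_s)
        return a->self_s < b->self_s ? 1 : -1;
    return strcmp(a->name, b->name);
}

/* Sorts the rows and fills in the derived columns in place. */
static void flat_profile(struct fn_stats *rows, size_t n) {
    double run_time = 0.0;
    for (size_t i = 0; i < n; i++)
        run_time += rows[i].self_s;
    qsort(rows, n, sizeof rows[0], flat_order);
    double cumulative = 0.0;
    for (size_t i = 0; i < n; i++) {
        struct fn_stats *r = &rows[i];
        cumulative += r->self_s;
        r->pct_time      = run_time > 0.0 ? 100.0 * r->self_s / run_time : 0.0;
        r->cumulative_s  = cumulative;
        r->self_ms_call  = r->calls > 0 ? 1000.0 * r->self_s / r->calls : 0.0;
        r->total_ms_call = r->calls > 0 ? 1000.0 * r->total_s / r->calls : 0.0;
    }
}
```

With inputs A (0.25 s self, 0.75 s total, 4 calls), B (0.5 s, 0.5 s, 2 calls) and C (0.25 s, 0.25 s, 1 call), the sorted order is B, A, C; A gets 25% time, 0.75 cumulative seconds, 62.5 self ms/call and 187.5 total ms/call.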

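Advantages
- Free software supported by the GNU project (GNU's Not Unix), distributed with GNU binutils
- Not limited by hardware: it needs no processor-specific performance counters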

Limitations
- Gprof cannot measure time spent in kernel mode (system calls, waiting for the CPU, or waiting for I/O); only user-space code is profiled.
- In a multithreaded application, gprof profiles only the main thread.
- Instrumentation code is inserted at compile time, so the program must be rebuilt with -pg.
- No hardware events.
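The waiting part of this limitation can be seen by comparing two clocks. On glibc systems, gprof's samples are driven by the process's CPU time, so a program blocked in sleep or I/O accumulates wall-clock time without accumulating samples. The sketch below, assuming a POSIX system with clock_gettime and nanosleep, sleeps for 0.3 s and reports how far each clock advanced; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime and nanosleep */
#include <time.h>

/* Wall-clock time in seconds (advances whether or not the process runs). */
static double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU time used by this process in seconds (advances only while it runs). */
static double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Blocks in the kernel for 0.3 s and reports how much each clock advanced. */
static void measure_sleep(double *wall, double *cpu) {
    struct timespec nap = {0, 300000000L};   /* 0.3 s */
    double w0 = wall_seconds(), c0 = cpu_seconds();
    nanosleep(&nap, NULL);
    *wall = wall_seconds() - w0;
    *cpu  = cpu_seconds() - c0;
}
```

Typically the wall time comes out close to 0.3 s while the CPU time stays close to zero, so a CPU-time sampler sees almost nothing of the sleep.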

More
    man gprof

Open topic

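Attack: Stack Buffer Overflow
A typical buffer overflow attack:
- Inject malicious code into the buffer
- Overwrite the return address so that it points to the buffer
- Once the function returns, the malicious code runs
Stack frame diagram: return addr, saved ebp (ebp points here), buf
    #include <string.h>

    void function(char *str) {
        char buf[16];
        strcpy(buf, str);   /* no bounds check: a str of 16 or more chars overflows buf */
    }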
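One common fix for the vulnerable function is to bound the copy by the size of the destination. The sketch below uses a hypothetical helper, bounded_copy, similar in spirit to BSD strlcpy (which is not part of ISO C): it truncates long input and always writes a terminating NUL, so a long str can no longer reach the saved ebp or the return address.

```c
#include <string.h>

#define BUF_SIZE 16

/* Hypothetical helper: copies at most dst_size - 1 bytes of src and always
   NUL-terminates dst. Returns the number of bytes copied. */
static size_t bounded_copy(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0)
        return 0;
    size_t len = strlen(src);
    if (len >= dst_size)
        len = dst_size - 1;          /* truncate instead of overflowing */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* function() from the slide with the unchecked strcpy replaced. */
static size_t function_fixed(const char *str) {
    char buf[BUF_SIZE];
    size_t copied = bounded_copy(buf, sizeof buf, str);
    /* ... work with buf here ... */
    return copied;
}
```

function_fixed("hello") copies 5 bytes, while any input of 16 or more characters is truncated to 15 bytes plus the terminator.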

Defense: DEP (Data Execution Prevention)
Execute code, not data:
- Data areas are marked non-executable
  - The stack is marked non-executable
- Hardware enforced (NX)
You can load your shellcode onto the stack… but you can't jump to it.
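On Linux, the effect of marking the stack non-executable can be checked from inside a process: the [stack] line of /proc/self/maps lists the mapping's permissions, typically rw-p, with no x. The sketch below is Linux-specific (an assumption about the platform) and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Scans /proc/self/maps for the [stack] mapping.
   Returns 1 if it is executable, 0 if not, -1 if it could not be determined. */
static int stack_is_executable(void) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;
    char line[512];
    int result = -1;
    while (fgets(line, sizeof line, maps)) {
        if (strstr(line, "[stack]")) {
            char perms[5] = "";
            /* Line format: start-end perms offset dev inode path */
            if (sscanf(line, "%*s %4s", perms) == 1)
                result = (perms[2] == 'x');
            break;
        }
    }
    fclose(maps);
    return result;
}
```

With a default modern toolchain the function is expected to report a non-executable stack; building with an executable-stack option would change the permissions shown.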

How to pwn?
Give other ways of pwning a program besides buffer overflow, focusing on how to divert the program from its normal execution path.

Debugging

How to Debug?
When the program gets wrong results, run it in debug mode and execute the code line by line to find the cause. Does this always work well for a multithreaded program? If not, why? What is the difference between sequential bugs and parallel ones? And how do you debug a tricky multithreaded program?
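A minimal illustration of why parallel bugs differ: in the sketch below, several POSIX threads increment a shared counter. Without synchronization the read-modify-write steps interleave and updates can be lost (formally a data race, which C treats as undefined behavior), and how many are lost depends on scheduling and the optimization level. Single-stepping one thread in a debugger changes that interleaving, so the bug may disappear while you watch it. Protecting the counter with a mutex makes the result deterministic. Names and constants are illustrative.

```c
#include <pthread.h>

#define N_THREADS    4
#define N_INCREMENTS 100000

static long racy_counter = 0;   /* updated without synchronization */
static long safe_counter = 0;   /* updated only while holding counter_lock */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *racy_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++)
        racy_counter++;          /* data race: read-modify-write is not atomic */
    return NULL;
}

static void *safe_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        safe_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs N_THREADS copies of worker and joins them; returns 0 if all started. */
static int run_threads(void *(*worker)(void *)) {
    pthread_t tids[N_THREADS];
    int started = 0;
    for (int i = 0; i < N_THREADS; i++) {
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == N_THREADS ? 0 : -1;
}
```

run_threads(safe_worker) always leaves safe_counter at N_THREADS * N_INCREMENTS, while run_threads(racy_worker) may leave racy_counter lower. With gcc, building with -pthread is the portable way to compile and link it.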

Cache

Cache locality
Cache locality is the key to achieving high performance. We can improve cache locality either by optimizing our program or by changing the cache strategy or its implementation. Introduce some methods that improve cache locality from a certain perspective and present how they work.
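Loop order is one standard perspective. C stores a two-dimensional array row by row, so the row-major loop in the sketch below touches consecutive addresses and uses each cache line fully, while the column-major loop jumps COLS * sizeof(double) bytes (4096 bytes here) between accesses and tends to miss in the cache much more often. Both compute the same sum; timing them, or comparing cache-miss events in a profiler such as VTune, shows the difference. Array sizes and names are illustrative.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

static double grid[ROWS][COLS];   /* C stores this row by row (row-major) */

/* Fills grid[i][j] = i + j so the expected sum is known exactly. */
static void fill_grid(void) {
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            grid[i][j] = (double)(i + j);
}

/* Cache-friendly: consecutive accesses are adjacent in memory. */
static double sum_row_major(void) {
    double s = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += grid[i][j];
    return s;
}

/* Cache-unfriendly: consecutive accesses are COLS * sizeof(double) bytes apart. */
static double sum_column_major(void) {
    double s = 0.0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            s += grid[i][j];
    return s;
}
```

After fill_grid(), both functions return 133955584, since every partial sum is an integer that a double represents exactly.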

Requirement
Each student picks one topic and gives a presentation with PPT slides. Any techniques or methods may be used, as long as you can finish the presentation within 6 minutes. Presentation date: 2015/10/; the classroom will be announced later. PPT slides should be sent to your TA before 2015/10/29 23:59.

How to score high?
Illustrate your ideas clearly; you may refer to the Internet or give your own solution. Time is limited, so be precise and concise. Your presentation consists of three parts: PPT slides, oral delivery, and content. All of these are important in grading.