Modern Linux Tracing Landscape


Modern Linux Tracing Landscape Sasha Goldshtein github.com/goldshtn CTO, Sela Group @goldshtn

Agenda
- Overview of kernel tracing technologies
- Modern tracing tools
- BPF: The next Linux tracing superpower

What Is Tracing, Exactly?
- Inspect function execution, arguments, call graph
- Print lightweight log messages (kernel/user)
- Aggregate statistics (min/max/avg, histogram)
- Low overhead
- Continuous monitoring
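As a rough illustration of the "aggregate statistics" point, the sketch below summarizes a list of latency samples into min/max/avg plus a power-of-two histogram, which is the style of summary many tracing tools print. It is a user-space toy under stated assumptions: the function names (log2_bucket, summarize, render) are made up for this example, and a real tracer would do the bucketing in the kernel rather than on a Python list.

```python
from collections import Counter


def log2_bucket(value):
    """Index of the power-of-two bucket for a non-negative integer.

    Bucket i covers [2**i, 2**(i+1) - 1]; zero is folded into bucket 0.
    """
    return 0 if value < 1 else int(value).bit_length() - 1


def summarize(samples):
    """Return count, min, max, avg and a log2 histogram for the samples."""
    hist = Counter(log2_bucket(s) for s in samples)
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "avg": sum(samples) / len(samples),
        "hist": dict(sorted(hist.items())),
    }


def render(hist):
    """Format the histogram as text rows, one row per bucket."""
    rows = []
    for bucket in range(max(hist) + 1):
        low, high = 2 ** bucket, 2 ** (bucket + 1) - 1
        n = hist.get(bucket, 0)
        rows.append(f"{low:>8} -> {high:<8} : {n:<5} |{'*' * n}")
    return "\n".join(rows)
```

Calling print(render(summarize(samples)["hist"])) on a list of integer latencies gives a quick text histogram. Only the per-bucket counters need to be kept, not every sample, which is why this shape of summary suits low-overhead continuous monitoring.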

Linux Tracing Tools
[Figure: tracing tools plotted on two axes, "Ease of use" and "Level of detail, features". Tools shown: dtrace for Linux, ply/BPF, SysDig, ktap, LTTng, perf, SystemTap, ftrace, bcc/BPF, C/BPF, custom .ko.]
Shading: green = relatively new, not fully mature, progressing quickly; red = dead, dying, or has issues; blue = mature, stable. Arrows indicate the direction of development (towards ease of use and/or features).

Tracepoints
- Trace statements compile to a function that does nothing
- Optionally attached to a probe handler that prints, counts, ...
- Declared with TRACE_EVENT, DECLARE_EVENT_CLASS, DEFINE_EVENT
- Documentation/trace/tracepoints.txt
- Also available for user mode with USDT, #include <sys/sdt.h>

    TRACE_EVENT(sched_switch,
        TP_PROTO(bool preempt, struct task_struct *prev, ...),
        TP_ARGS(preempt, prev, next),
        ...

ftrace
- Kernel functions are instrumented with calls to mcount (gcc -pg)
- Tracer calls are replaced with nops at boot time
- Patched back to call ftrace on demand
- Can get function execution trace, call graph, call stack
- Main interface is through /sys/kernel/debug/tracing
- Documentation/trace/ftrace.txt

kprobes and uprobes
- Place a probe on any instruction in any function
- The instruction is replaced with a breakpoint, or with a jump if possible
- Handler (typically a .ko) can run before and after
- kprobe, jprobe, kretprobe (same for user)
- Documentation/kprobes.txt
- Documentation/trace/uprobetracer.txt

    push ebp
    mov ebp, esp
    sub esp, 8
    ...
    mov esp, ebp
    pop ebp
    ret

Demo
Poor-man's opensnoop:

    cd /sys/kernel/debug/tracing
    echo 1 > tracing_on
    echo 'p:myprobe do_sys_open filename=+0(%si):string' > kprobe_events
    echo 1 > events/kprobes/myprobe/enable
    cat trace_pipe
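The probe definition written to kprobe_events in the demo has a small grammar: a probe type (p for entry, r for return), an event name, the target symbol, and optional NAME=FETCHARG:TYPE arguments. The sketch below composes such a string and, optionally, installs it. The helper names kprobe_spec and install are hypothetical, the tracefs path is the one used in the demo, and install needs root and is deliberately not called here. Whether do_sys_open can be probed may depend on the kernel version, and %si is the x86-64 register the demo reads the filename pointer from.

```python
from pathlib import Path

# Path used in the demo; some systems also expose tracefs elsewhere.
TRACEFS = Path("/sys/kernel/debug/tracing")


def kprobe_spec(name, symbol, fetch_args=None, on_return=False):
    """Compose one kprobe_events definition line.

    fetch_args maps an argument name to a fetch expression,
    e.g. {"filename": "+0(%si):string"}.
    """
    kind = "r" if on_return else "p"
    parts = [f"{kind}:{name}", symbol]
    parts += [f"{arg}={expr}" for arg, expr in (fetch_args or {}).items()]
    return " ".join(parts)


def install(spec, name):
    """Register and enable the probe (requires root; not called here)."""
    with open(TRACEFS / "kprobe_events", "a") as events:
        events.write(spec + "\n")
    (TRACEFS / "events" / "kprobes" / name / "enable").write_text("1")


# Reproduces the definition from the demo.
spec = kprobe_spec("myprobe", "do_sys_open", {"filename": "+0(%si):string"})
```

After install(spec, "myprobe"), reading trace_pipe as in the demo would show one line per open call with the filename decoded as a string.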

perf_events
- Standard Linux profiler
- Provides the perf command
- Usually installed through a package such as linux-tools-common
- Many event sources:
  - Timer-based sampling
  - Hardware events (e.g. LLC misses)
  - Tracepoints (e.g. block:block_rq_complete)
  - Dynamic tracing (kprobes, uprobes)
- Can sample stacks of (almost) everything on CPU
- Can miss hard interrupt ISRs, but time spent there should be near zero and can be measured separately if needed

perf
- Developed in-tree and actively maintained, new features landing often
- Multi-tool for a variety of performance investigations
- Records into perf.data for post-processing

    # perf

     usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

     The most commonly used perf commands are:
       annotate        Read perf.data (created by perf record) and display annotated code
       archive         Create archive with object files with build-ids found in perf.data
       bench           General framework for benchmark suites
       buildid-cache   Manage build-id cache.
       buildid-list    List the buildids in a perf.data file
       config          Get and set variables in a configuration file.
       data            Data file related processing
       diff            Read perf.data files and display the differential profile
       evlist          List the event names in a perf.data file
       inject          Filter to augment the events stream with additional information
       kmem            Tool to trace/measure kernel memory properties
       kvm             Tool to trace/measure kvm guest os
       list            List all symbolic event types
       lock            Analyze lock events
       mem             Profile memory accesses
       record          Run a command and record its profile into perf.data
       report          Read perf.data (created by perf record) and display the profile
       sched           Tool to trace/measure scheduler properties (latencies)
       script          Read perf.data (created by perf record) and display trace output
       stat            Run a command and gather performance counter statistics
       test            Runs sanity tests.
       timechart       Tool to visualize total system behavior during a workload
       top             System profiling tool.
       probe           Define new dynamic tracepoints
       trace           strace inspired tool

     See 'perf help COMMAND' for more information on a specific command.

Flame Graphs
- A visual approach for summarizing stack traces
- x-axis: alphabetical stack sort, to maximize merging
- y-axis: stack depth
- color: random (default), or a dimension
- Currently made from Perl + SVG + JavaScript
- https://github.com/brendangregg/FlameGraph
- Multiple d3 versions are also being developed
- Easy to make
- Converters for many profilers

Demo
CPU profiling with flame graphs:

    perf record -F 97 -ag -- sleep 5
    perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flame.svg
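The merging that makes flame graphs compact can be sketched in a few lines: identical stacks are collapsed into one "folded" line of semicolon-joined frames followed by a sample count, which is the input format flamegraph.pl consumes. The function below is an illustrative stand-in, not the stackcollapse-perf.pl implementation. It assumes each sample is already a list of frames ordered root-first, whereas perf script prints the innermost frame first, so real input would need reversing.

```python
from collections import Counter


def fold(samples):
    """Collapse stack samples into sorted 'frame;frame;frame count' lines.

    samples: iterable of stacks, each a list of frame names, root first.
    Sorting the folded stacks alphabetically puts stacks that share a
    prefix next to each other, which is what lets the renderer merge them.
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, for illustration only.
example = [
    ["main", "parse", "read"],
    ["main", "render"],
    ["main", "parse", "read"],
]
folded = fold(example)
```

Here folded holds two lines, one for main;parse;read with a count of 2 and one for main;render with a count of 1. The width of each box in the final graph is proportional to that count.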

Berkeley Packet Filters (BPF)
- Originally designed for, well, packet filtering: dst port 80 and len >= 100
- Custom instruction set, interpreted/JIT compiled

    0: (bf) r6 = r1
    1: (85) call 14
    2: (67) r0 <<= 32
    3: (77) r0 >>= 32
    4: (15) if r0 == 0x49f goto pc+40
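To make the bytecode listing a little more concrete, the sketch below emulates instructions 1 to 4 on 64-bit registers in plain Python. Shifting left and then right by 32 keeps only the low 32 bits of r0, which are then compared with 0x49f. The emulation assumes helper call 14 returns a 64-bit value in r0; which helper that number refers to is not stated on the slide, so the caller supplies the value. run_fragment is a hypothetical name, and this is not how the kernel interpreter or JIT is implemented.

```python
MASK64 = (1 << 64) - 1  # BPF registers are 64 bits wide


def run_fragment(helper_result, target=0x49F):
    """Emulate instructions 1-4 of the listing.

    Returns (final r0, whether the branch at instruction 4 is taken).
    """
    r0 = helper_result & MASK64   # 1: call 14, result lands in r0
    r0 = (r0 << 32) & MASK64      # 2: r0 <<= 32, upper half shifted out
    r0 >>= 32                     # 3: r0 >>= 32, logical shift right
    return r0, r0 == target       # 4: if r0 == 0x49f goto pc+40


# Upper 32 bits are discarded, lower 32 bits match the target.
value, branch_taken = run_fragment((7 << 32) | 0x49F)
```

With the input above, value is 0x49f and branch_taken is True. Changing only the upper 32 bits of the input leaves the result unchanged.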

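Extended BPF
- Used for virtual networking, security, tracing
- Multiple front-ends: C, perf, SystemTap, bcc, ply, …

[Figure: a user program (1) generates BPF bytecode and (2) loads it into the kernel, where it passes through the verifier and is attached to kprobes, uprobes, and tracepoints. The BPF program returns per-event data to the user program through perf_output (3), and the user program asynchronously reads statistics from maps (3).]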

BCC: BPF Compiler Collection
- Library and Python/Lua module for compiling, loading, and executing BPF programs
- https://github.com/iovisor/bcc
- C + Python/Lua front-end for BPF
- Includes many tracing tools

[Figure: tracing layers. User space: bcc tools, built on bcc with Python and Lua front-ends. Kernel: BPF, attached to events.]
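A minimal sketch of the generate, load, attach, and read-map flow using the bcc Python module might look like the following. It counts clone syscalls per process ID in a BPF hash map and prints the totals after a few seconds. The map name counts, the function names on_clone and run, and the choice of syscall are illustrative assumptions, not taken from the talk. The bcc import is deferred into run() because it needs bcc installed, kernel headers, and root, so nothing here touches the kernel until run() is called.

```python
import time

# BPF program in restricted C; bcc compiles it to bytecode at load time.
PROGRAM = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);   // map: process id -> number of calls

int on_clone(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
"""


def run(seconds=5):
    """Load the program, attach it, wait, then read the map (needs root)."""
    from bcc import BPF  # deferred: requires bcc and root privileges

    b = BPF(text=PROGRAM)  # compile, verify, and load
    b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="on_clone")
    time.sleep(seconds)    # events are counted in the kernel meanwhile
    by_count = sorted(b["counts"].items(),
                      key=lambda item: item[1].value, reverse=True)
    for pid, count in by_count:
        print(pid.value, count.value)
```

Running run() as root prints one line per process that called clone during the interval. Because the counting happens in the map inside the kernel, user space only reads a small summary at the end rather than one record per event.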

BCC Tools

    $ ls *.py
    argdist.py       bashreadline.py  biolatency.py    biosnoop.py
    biotop.py        bitesize.py      btrfsdist.py     btrfsslower.py
    cachestat.py     cachetop.py      capable.py       cpudist.py
    dcsnoop.py       dcstat.py        execsnoop.py     ext4dist.py
    ext4slower.py    filelife.py      fileslower.py    filetop.py
    funccount.py     funclatency.py   gethostlat...py  hardirqs.py
    killsnoop.py     llcstat.py       mdflush.py       memleak.py
    offcputime.py    offwaketime.py   oomkill.py       opensnoop.py
    pidpersec.py     profile.py       runqlat.py       softirqs.py
    solisten.py      stackcount.py    stacksnoop.py    statsnoop.py
    syncsnoop.py     tcpaccept.py     tcpconnect.py    tcpconnlat.py
    tcpretrans.py    tplist.py        trace.py         vfscount.py
    vfsstat.py       wakeuptime.py    xfsdist.py       xfsslower.py
    zfsdist.py       zfsslower.py

BCC General Performance Checklist
- execsnoop
- opensnoop
- ext4slower (or btrfs*, xfs*, zfs*)
- biolatency
- biosnoop
- cachestat
- tcpconnect
- tcpaccept
- tcpretrans
- gethostlatency
- runqlat
- profile

Demo

    biolatency        (while running dd)
    fileslower 0
    execsnoop
    stackcount t:sched:sched_switch
    trace 'r:/usr/bin/bash:readline "%s", retval'
    trace -p $(pidof node) 'u:node:http__server__request "%s %s (from %s:%d)" arg5, arg6, arg3, arg4'
    ustat
    ucalls -SL 10 -l java
    uobjnew ruby $(pidof irb)

Summary
- Tracing can identify bugs and performance issues that no debugger or profiler can catch
- Tools make low-overhead, dynamic, production tracing possible
- Flame graphs help visualize complex stack trace information and other hierarchical data
- BPF is the next-generation backend for Linux tracing tools

References
- Perf and flame graphs
  https://perf.wiki.kernel.org/index.php/Main_Page
  http://www.brendangregg.com/flamegraphs.html
- BPF/BCC tutorials (by Brendan Gregg)
  https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
  https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
  https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
- ftrace, perf, and (mostly) BPF hands-on labs (by Sasha Goldshtein)
  https://github.com/goldshtn/linux-tracing-workshop
- BPF
  https://github.com/torvalds/linux/tree/master/samples/bpf
  https://www.kernel.org/doc/Documentation/networking/filter.txt
  https://github.com/iovisor/bpf-docs

Thank You! Sasha Goldshtein github.com/goldshtn CTO, Sela Group sashag@sela.co.il blog.sashag.net