1
Profiling Code Performance on a Distributed System
John DeGraaf (CMU 1995, ECE)
Tony DeLuca (Pitt 1995, CS)
2
About NetApp
$6.3B revenue (2014); Fortune 500 storage and data management company, founded in 1992.
12,300 employees, 150+ worldwide offices; HQ: Sunnyvale, CA; Pittsburgh: ~275.
Customers include energy, pharmaceutical, sports, entertainment, technology, cloud, and many more; 96% of Fortune 100 companies are customers of NetApp.
Consistently among "top places to work" lists internationally and locally.
3
NetApp Confidential - Internal Use Only
Uses of NetApp
The NFL and all of its teams run their business on NetApp.
Energy companies like Chevron and Shell use NetApp innovations to explore for oil that will fuel your car.
Movies such as "Avatar" and "The Lord of the Rings" were created using NetApp products.
Pharmaceutical companies like Genentech use NetApp storage to develop the medications on which many of us depend.
Popular services, telephone companies, and Internet sites run on NetApp.
4
Agenda
What is profiling?
How is it accomplished?
What are the benefits & challenges?
Requirements in Clustered Data ONTAP
Profiling a distributed call path
Example instrumentation code
Debug vs. production use
Analyzing profiling reports
Other use cases
5
What is profiling? Dynamic program analysis that measures the frequency and duration of function calls, specific instructions, and/or usage of system resources. Profiling is generally achieved by instrumenting either the program source code or its binary executable. Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
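As a concrete illustration of source-level instrumentation, here is a minimal, hand-rolled sketch (the names `time_call_us` and `busy_work` are hypothetical, not part of any profiler): a timer is wrapped around a call and the measured duration is recorded.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: time a callable and return its duration in microseconds.
template <typename Fn>
uint64_t time_call_us(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();  // the instrumented call
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

// A function we might want to profile.
static void busy_work() {
    volatile uint64_t sum = 0;
    for (int i = 0; i < 100000; i++) sum += i;
}
```

Real profilers automate exactly this insertion, either in the compiler, in the binary, or by sampling the program counter from outside.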
6
How is profiling accomplished?
Instrumentation: manual; compiler-assisted (e.g., Gprof); run-time (e.g., Valgrind)
Sampling (time- and/or event-based): e.g., Gperftools, OProfile, Apple Shark, ...; HW-specific: Intel VTune, AMD Code Analyzer
Run-time profiling: e.g., DTrace, LTTng, SystemTap, ...
7
What are the benefits?
Discover performance (and variability)
Discover frequency of use (and abuse)
Understand flow of code
Discover multipliers
Find performance (and other) regressions
...
8
Visualization Tools
9
Challenges with Profiling
Observer effects: changes timings; can be many times slower
Estimation problems: some minor, some major
Recursion causes accounting problems
Distributed system challenges
CPU vs. latency
Profiling specific calls
No perfect profiler: you should understand how a profiler works to use it well
10
Clustered Data ONTAP Requirements
Support per-request & per-method profiling
Handle limited recursion
Expose inter-process communication costs
Expose inter-node communication costs
Work without a special build, on an in-use system
Automatic for every public API (current & future)
Support custom scoped measurements
Aggregated API-level stats always available
Near-zero overhead for per-API stats
Per-request results available on completion
Very low per-request overhead when used
11
Profiling a distributed call path
Client can request tracing for any operation
Per-thread tracing based on the setting
IPC passes the local tracing setting to the server
Response packages tracing metrics back to the client
Client inserts the remote metrics into its local thread
Results are displayed to the user
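The middle steps can be sketched as follows. This is a simplified illustration, not ONTAP code: the types `rpc_request`, `rpc_response`, and the flat label-to-time map are hypothetical stand-ins for the real IPC messages and trace tree. The client copies its thread-local tracing flag into the outgoing request, the server collects metrics only when asked, and the client folds the returned remote metrics into its own thread's results.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical IPC message types.
struct rpc_request  { bool trace_enabled = false; };
struct rpc_response { std::map<std::string, uint64_t> remote_metrics; };  // label -> total us

thread_local bool g_trace_enabled = false;                  // per-thread tracing setting
thread_local std::map<std::string, uint64_t> g_local_tree;  // simplified local trace "tree"

// Client side: propagate the local tracing setting into the outgoing request.
rpc_request make_request() {
    rpc_request req;
    req.trace_enabled = g_trace_enabled;
    return req;
}

// Server side: only collect metrics when the caller asked for tracing.
rpc_response serve(const rpc_request& req) {
    rpc_response resp;
    if (req.trace_enabled) {
        resp.remote_metrics["volume::get_imp"] = 42;  // stand-in for a measured duration
    }
    return resp;
}

// Client side: insert the remote metrics under the local thread's results.
void merge_remote_metrics(const rpc_response& resp) {
    for (const auto& kv : resp.remote_metrics) {
        g_local_tree["remote/" + kv.first] += kv.second;
    }
}
```

The key design point is that tracing is opt-in per request, so untraced traffic pays none of the collection cost on the server.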
12
Example metrics class

class duration_metrics {
    uint64_t total_time_;   // in us
    uint64_t max_time_;     // in us
    uint64_t min_time_;     // in us
    uint64_t last_time_;    // in us
    uint64_t total_calls_;
public:
    duration_metrics()
        : total_time_(0), max_time_(0), min_time_(0),
          last_time_(0), total_calls_(0) {}

    uint64_t get_avg_time() const {
        // Guard against division by zero before any samples arrive
        return total_calls_ ? total_time_ / total_calls_ : 0;
    }
    uint64_t get_max_time()    const { return max_time_; }
    uint64_t get_min_time()    const { return min_time_; }
    uint64_t get_last_time()   const { return last_time_; }
    uint64_t get_total_calls() const { return total_calls_; }
    uint64_t get_total_time()  const { return total_time_; }

    void add_time(uint64_t diff) {
        total_calls_++;
        last_time_ = diff;
        if (total_calls_ == 1 || diff < min_time_) {
            min_time_ = diff;
        }
        max_time_ = std::max(max_time_, diff);
        total_time_ += diff;
    }
};
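To see how the counters behave, here is a condensed, runnable copy of the metrics class with a usage sketch; feeding samples of 10, 20, and 30 us yields min 10, max 30, and avg 20.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Condensed copy of the slide's duration_metrics, for a self-contained usage sketch.
class duration_metrics {
    uint64_t total_time_ = 0, max_time_ = 0, min_time_ = 0, last_time_ = 0, total_calls_ = 0;
public:
    void add_time(uint64_t diff) {
        total_calls_++;
        last_time_ = diff;
        if (total_calls_ == 1 || diff < min_time_) min_time_ = diff;  // first sample seeds min
        max_time_ = std::max(max_time_, diff);
        total_time_ += diff;
    }
    uint64_t get_avg_time()    const { return total_calls_ ? total_time_ / total_calls_ : 0; }
    uint64_t get_min_time()    const { return min_time_; }
    uint64_t get_max_time()    const { return max_time_; }
    uint64_t get_last_time()   const { return last_time_; }
    uint64_t get_total_calls() const { return total_calls_; }
};
```

Note that min/max/avg are maintained incrementally in O(1) per sample, which is what keeps the per-call overhead near zero.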
13
Example instrumentation
class trace_node {
    std::vector<trace_node*> children_;
    trace_node*              parent_;
    duration_metrics         metrics_;
    static __thread trace_node* current_node_;
public:
    static void trace_requests(bool b, const char* label);  // enable/disable thread tracing
    static trace_node* begin_trace(const char* label);      // Add child node if not present

    void end_trace(uint64_t diff) {
        metrics_.add_time(diff);
        current_node_ = parent_;
    }
};

class track_duration {
    uint64_t    start_;
    trace_node* trace_node_;
public:
    track_duration(const char* label) : start_(0), trace_node_(NULL) {
        start(label);
    }
    ~track_duration() { (void) stop(); }

    void start(const char* label) {
        if (!trace_node_) {
            trace_node_ = trace_node::begin_trace(label);
            start_ = rdtsc_in_us();  // Monotonic TSC in microseconds
        }
    }

    uint64_t stop() {
        uint64_t diff = rdtsc_in_us() - start_;
        if (trace_node_) {
            trace_node_->end_trace(diff);
            trace_node_ = NULL;  // avoid double-ending from the destructor
        }
        return diff;
    }
};
14
Example usage

#ifdef ENABLE_PROFILING
#define MEASURE_METHOD_DURATION   track_duration _method_duration(__FUNCTION__);
#define MEASURE_SCOPE_DURATION(x) track_duration _scope_duration(x);
#else
#define MEASURE_METHOD_DURATION
#define MEASURE_SCOPE_DURATION(x)
#endif

void some_function(int param) {
    MEASURE_METHOD_DURATION
    ...
    for (int i = 0; i < param; i++) {
        MEASURE_SCOPE_DURATION("some_function::loop1");
        ...
    }
    return;
}

// Non-virtual public interface that wraps virtual method
bool framework_class::get() {
    framework_track_duration metric(this);  // Tracks active calls
    if (metric.start(getMethodName(OP_GET))) {
        result_ = get_imp();  // call derived class implementation
    }
    return result_ == OK;
}
15
Uses in production code
Prevent poorly behaving code from consuming too many system resources
Track which features are used, how often, and how well they perform
Improved live diagnostic information

[Sample live diagnostics for node degraafcluster: a per-method table with columns Table, Method, Dist, Samples, Avg (us), Done/OK, Errs, and Retry, covering volume get_imp, next_imp, and nextseq_imp on the server side; the numeric values were lost in transcription.]
16
Trace Output for Code Flow
[TRACE RESULTS (times in us): a nested call tree with columns %Parent, Label, Method, Dst, Num, !Ok, Time, Max, Min, and Avg. The tree runs from ONTAPI (parse-xml, pre-processing, audit_logger, input-validation, api-execution) down through zapi_qtree_list_iter zsmf-populate, ZapiSmfMapping (list, parse_child_elem, list-record-loop), smdb_iterator next, qtree_snmp (next_imp, nextseq_imp), and clusterVolSnmpView (next_imp, clntRPC, callRPC, servRPC), with *self_time* rows at each level and client (clt) vs. server (svr) hops marked in the Dst column; the numeric values were lost in transcription.]
17
Trace Output sorted by self-time
[TRACE RESULTS (times in us), sorted by self-time: columns are Label, Method, Dst, Samples, Errors, >Self Time, %All, Total Time, %All, Avg Self, Avg Total, Min Time, and Max Time. Rows include d-qtree-list-info zapi (clt), qtree_snmp nextseq_imp, sequential_identifier_byname get_imp, vserver get_imp, sequential_identifier get_imp, system-get-ontapi-version health-check (clt), refIDVolumeTable nextseq_imp, vserver_by_name get_imp, ZapiSmfMapping list-record-loop, msidTable get_imp, export_rules_byid_table get_imp, ONTAPI extract-xml, ZapiSmfMapping constructxml and list, aggrTable get_imp, and several clusterVolSnmpView, qtree_export_table, and qtree_snmp methods; the numeric values were lost in transcription.]
18
Analyzing Results for Regressions
Report Totals

[Two TRACE RESULTS reports (times in us) compared side by side, with columns Label, Method, Dst, Self [0], Self [1], >Diff, Count [0], Count [1], and Diff. Rows cover d-qtree-list-info zapi (clt), vserver get_imp, sequential_identifier_byname get_imp, system-get-ontapi-version health-check (clt), vserver_by_name get_imp, sequential_identifier get_imp, smdb_iterator get and next, qtree_snmp next_imp, clusterVolSnmpView next_imp (clt), ONTAPI extract-xml and encode, ZapiSmfMapping constructxml and list, clusterVolSnmpView servRPC and callRPC (clt), qtree_export_table next_imp, and capability get_imp; the numeric values were lost in transcription.]
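The comparison above can be sketched as a simple diff over per-label self-times. This is an illustrative sketch, not the actual report tool: `report` and `find_regressions` are hypothetical names, and real analysis would also normalize by call counts.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-label self-time totals (in us) from two trace runs.
using report = std::map<std::string, int64_t>;

// Return labels whose self-time grew by more than threshold_us between runs.
std::vector<std::string> find_regressions(const report& before, const report& after,
                                          int64_t threshold_us) {
    std::vector<std::string> regressed;
    for (const auto& kv : after) {
        auto it = before.find(kv.first);
        int64_t old_time = (it != before.end()) ? it->second : 0;  // new labels count from 0
        if (kv.second - old_time > threshold_us) {
            regressed.push_back(kv.first);
        }
    }
    return regressed;
}
```

Sorting the output by the size of the diff surfaces the biggest regressions first, mirroring the >Diff column above.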
19
Profile use of other resources
Heap memory
File descriptors
Locks

Example report columns: Dataset, Time (s), Net Bytes, Max Bytes, Alloc Bytes, Freed Bytes, Allocs
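The same scoped-measurement pattern extends to resources other than time. Here is a minimal sketch of heap tracking that maintains the columns listed above; `heap_metrics`, `tracked_alloc`, and `tracked_free` are hypothetical names, and a real implementation would hook the allocator rather than require explicit wrapper calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical per-dataset heap counters, mirroring the report columns above.
struct heap_metrics {
    uint64_t alloc_bytes = 0, freed_bytes = 0, net_bytes = 0, max_bytes = 0, allocs = 0;
};

static heap_metrics g_heap;

// Allocate while updating the counters; max_bytes tracks the high-water mark.
void* tracked_alloc(size_t n) {
    g_heap.allocs++;
    g_heap.alloc_bytes += n;
    g_heap.net_bytes += n;
    if (g_heap.net_bytes > g_heap.max_bytes) g_heap.max_bytes = g_heap.net_bytes;
    return malloc(n);
}

// Free while updating the counters; the caller supplies the original size.
void tracked_free(void* p, size_t n) {
    g_heap.freed_bytes += n;
    g_heap.net_bytes -= n;
    free(p);
}
```

File descriptors and locks fit the same shape: count acquisitions and releases per scope and keep a high-water mark.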
20
Interested in NetApp?
University Graduate Hire
Internship / Co-Op
Capstone Projects / Senior Design Projects
Sign-in sheet