Workshop in Nizhny Novgorod State University Activity Report

Alexey Iliasov, Kyrgyz Russian Slavic University

Goals of the project
Research:
- implementation approaches
- applicability
- real-life applications targeting
Implement:
- simple profiler
- analysis tool

Implementation Approaches
levels of abstraction:
- hardware level
- machine instructions level
- assembly language level
- compiler level
- source code level
- library level

GNU Family Compilers
- support many languages
- support many targets
- provide lots of optimisation techniques
- open source
- available under the terms of the GPL

GNU Family Compilers
- machine independent
- ports exist for more than 30 platforms
- high code generation quality
- intensive optimisation
- RTL - Register Transfer Language
- reusability
- ,000 lines of language and platform independent routines

GNU Family Compilers
- weird internal structure
- written in a mix of C and C++
- modularity problems
- lack of good documentation

GCC infrastructure
[Diagram of the GCC pipeline. Labels: source, parser, language front-end, tree optimisation, RTL, 25 optimization passes + assembler generation, target back end, debug info, binary]

Mudflap
C/C++ bounds checker
- based on tree transformation
- instruments program to detect memory access errors
- tracks calls to many library functions
- provides replacements for common C library functions

Mudzzi
memory profiler for GCC
- based on mudflap approach
development considerations:
- high performance
- language independent
- large-scale applications
- minimization of inlined code
- multi-threading support
- online or post-mortem analysis

Mudzzi
memory profiler for GCC
tracked events:
- read/write memory accesses
- object declarations
- object destructions (for stack-frame objects)
- calls to malloc, calloc, realloc, mmap and free
- timing

Mudzzi records format
two record types:
- normal: prefix, record
- length prefixed: prefix, length, record
Memory Read/Write record:
- record type: 32 bits
- access address: 32 bits
- RDTSC CPU tick value: 64 bits
- source line number: 32 bits
- base pointer address: 32 bits
- size of accessed region: 32 bits
- coded source file and function name: 32 bits
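
The field widths listed for the memory read/write record add up to 256 bits, so one record could be laid out as a 32-byte C structure. The sketch below is only an illustration: the structure name, the field names and the field order are assumptions, since the slide gives only the field widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one memory read/write record.
   Field widths follow the slide; names and order are assumed. */
struct mem_access_record {
    uint32_t record_type;   /* record type                         */
    uint32_t access_addr;   /* access address                      */
    uint64_t tsc;           /* RDTSC CPU tick value                */
    uint32_t source_line;   /* source line number                  */
    uint32_t base_addr;     /* base pointer address                */
    uint32_t region_size;   /* size of accessed region             */
    uint32_t source_id;     /* coded source file and function name */
};

/* 6 x 32 bits + 64 bits = 256 bits; this order needs no padding. */
static_assert(sizeof(struct mem_access_record) == 32,
              "record should occupy exactly 32 bytes");
```

Under this assumed layout every traced access costs 32 bytes in the dump, so a run with one million memory accesses would already produce about 32 MB of records.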

Mudzzi code transformation

original:

    void foo()
    {
        int a = 3;
        int b[100];
        b[a] = 10;
        return;
    }

instrumented:

    void foo()
    {
        int a = 3;
        mpf_vardecl(&a, sizeof(int), 0, "a", .., ..);
        int b[100];
        mpf_vardecl(b, sizeof(int)*100, 0, "b", .., ..);
        b[a] = 10;
        mpf_add(b+a, a, b, 1, .., ..);
        mpf_varundecl(a, .., ..);
        mpf_varundecl(b, .., ..);
        return;
    }

Mudzzi
profiled code performance: ~20% of original

Mudzzi
dump file size problem: grows very fast

Visualization and analysis tool for memory profiler

features overview
1. Visualization of memory profiler dump
2. Cycles detection
3. Array access analysis inside detected cycles
4. Reuse distance calculation for arrays
5. Cache hit/miss rate, analysis and explanations

address/time diagram example: array access pattern
[Figure: two address/time plots (axes: addresses, time), one for array access by rows and one for access by columns]

address/time diagram array access pattern

cache config and report

Blocked Matrix Multiply
cache interference

    #define MIN(x, y) ((x) < (y) ? (x) : (y))

    void BlkMatrixMultiply (etype *X, etype *Y, etype *Z, int N, int B)
    {
      int w, q, i, j, k;
      etype r;

      for (w = 0; w < N; w += B)        /* block over the k dimension */
        for (q = 0; q < N; q += B)      /* block over the j dimension */
          for (i = 0; i < N; i++)
            for (k = w; k < MIN (w + B, N); k++)
              {
                r = *(X + i * N + k);
                for (j = q; j < MIN (q + B, N); j++)
                  *(Z + i * N + j) += *(Y + k * N + j) * r;
              }
    }

where N - matrix size, B - block size
we use N = 128, B = 32 and arrays are a[N][N], b[N][N], c[N][N]

Full view of cache utilization report

VARIABLE `a' hit rate: 81%
Replacement causes:
  `b' (0xbffe78d0:49152) replacements (62%)
  `c' (0xbffdb8d0:49152) - 57 replacements (3%)
Callouts on the slide: interference with b; self interference

VARIABLE `b' hit rate: 81%
Replacement causes:
  `a' (0xbfff38d0:49152) replacements (1%)
  `c' (0xbffdb8d0:49152) replacements (3%)

VARIABLE `c' hit rate: 99%
Replacement causes:
  `a' (0xbfff38d0:49152) replacements (7%)
  `b' (0xbffe78d0:49152) replacements (91%)
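
A per-variable report of this kind can be produced by replaying the recorded accesses through a cache model and noting, on every replacement, which variable owned the evicted line and which variable caused the eviction. The sketch below assumes a direct-mapped cache; LINE_SIZE, NUM_SETS and all names are hypothetical, and the slides do not state which cache model the tool actually uses.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u    /* bytes per cache line (assumed)            */
#define NUM_SETS  256u   /* number of lines, direct-mapped (assumed)  */
#define MAX_VARS  8      /* variables are numbered 0 .. MAX_VARS - 1  */

struct cache_sim {
    uint32_t tag[NUM_SETS];     /* line address held by each set        */
    int      owner[NUM_SETS];   /* variable that loaded it, -1 = empty  */
    unsigned hits[MAX_VARS];
    unsigned accesses[MAX_VARS];
    /* evicted_by[v][w]: lines of variable v replaced by variable w */
    unsigned evicted_by[MAX_VARS][MAX_VARS];
};

void cache_init(struct cache_sim *c)
{
    memset(c, 0, sizeof *c);
    for (unsigned s = 0; s < NUM_SETS; s++)
        c->owner[s] = -1;
}

/* Replay one access to `addr` made through variable `var`.
   Returns 1 on a hit, 0 on a miss. */
int cache_access(struct cache_sim *c, uint32_t addr, int var)
{
    uint32_t line = addr / LINE_SIZE;
    unsigned set = line % NUM_SETS;

    c->accesses[var]++;
    if (c->owner[set] != -1 && c->tag[set] == line) {
        c->hits[var]++;
        return 1;
    }
    if (c->owner[set] != -1)
        c->evicted_by[c->owner[set]][var]++;   /* record the cause */
    c->tag[set] = line;
    c->owner[set] = var;
    return 0;
}
```

In this sketch the hit rate of variable v is hits[v] / accesses[v], evicted_by[v][w] with v != w counts interference between two variables, and evicted_by[v][v] counts self interference.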

Reuse distance
number of distinct object references between two reuses

a d f e c e a c a a c e d e a c d f a e a c

- RD = 4 (d, f, e, c): between the first and the second reference to a
- RD = 4 (e, c, a, d): between the two references to f

- not a time but address distance measure
- closely related to hit rate for LRU/FIFO caches
- leads to an effective and easy to apply optimisation
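
The definition can be turned into code directly: walk backwards from a reference until the previous reference to the same object, counting distinct objects on the way. The function below is a minimal sketch with a hypothetical name; it treats each character of a string as one object, as in the a..f example, and makes no attempt at the efficiency a real trace analyser would need.

```c
#include <stddef.h>

/* Reuse distance of the reference at index `pos` (0-based) in `trace`:
   the number of distinct symbols referenced strictly between this
   reference and the previous reference to the same symbol.
   Returns -1 if the symbol has no earlier reference. */
int reuse_distance(const char *trace, size_t pos)
{
    unsigned char seen[256] = {0};
    unsigned char target = (unsigned char)trace[pos];
    int distinct = 0;

    for (size_t i = pos; i-- > 0; ) {
        unsigned char s = (unsigned char)trace[i];
        if (s == target)
            return distinct;      /* previous use found */
        if (!seen[s]) {
            seen[s] = 1;
            distinct++;
        }
    }
    return -1;                    /* first use of this symbol */
}
```

For the trace "adfeceacaacedeacdfaeac", reuse_distance(trace, 6) covers the second reference to a and reuse_distance(trace, 17) covers the second reference to f; both evaluate to 4, as on the slide.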

Clustering
finding groups of variables commonly used together

a d f e c e a c a a c e d e a c d f a e a c

[Figure: pairwise matrix over the variables a, b, c, d, e, f for this trace, followed by the reduced matrix over a-c, b, d, e, f after a and c are merged into the cluster a-c]
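
The slide does not say how the matrix entries are computed. One simple possibility, assumed here purely for illustration, is to count how often two variables are referenced one right after the other, and to merge two variables into a cluster by averaging their counts against every other variable. All names below are hypothetical.

```c
#include <stddef.h>

#define NVARS 6   /* variables a..f, as in the slide's example */

/* Fill `m` with symmetric adjacency counts: m[x][y] is the number of
   times variables x and y are referenced back to back in the trace.
   ASSUMPTION: this affinity measure is illustrative, not taken from
   the slides. The trace may contain only the letters 'a'..'f'. */
void affinity_counts(const char *trace, size_t len, double m[NVARS][NVARS])
{
    for (int i = 0; i < NVARS; i++)
        for (int j = 0; j < NVARS; j++)
            m[i][j] = 0.0;

    for (size_t i = 1; i < len; i++) {
        int x = trace[i - 1] - 'a';
        int y = trace[i] - 'a';
        if (x == y)
            continue;             /* skip a variable followed by itself */
        m[x][y] += 1.0;
        m[y][x] += 1.0;
    }
}

/* Affinity between the cluster {x, y} and another variable z,
   taken as the average of the two individual affinities. */
double merged_affinity(double m[NVARS][NVARS], int x, int y, int z)
{
    return (m[x][z] + m[y][z]) / 2.0;
}
```

For the 22-reference trace above this measure gives 5 for the pair a, c and 3.5 for the merged cluster a-c against e, which is consistent with the 5 and 3.5 that appear in the slide's matrices.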

Results of the project
profiler implementation (as GCC module)
Benefits:
- good analysis capabilities and binding to sources
- good performance
- ease of use
Problems:
- ineffective (full code coverage)
- part of another program

Results of the project
applicability
Benefits:
- instrumentation effectively works for large-scale applications
- reasonable performance penalty
- platform/OS independent
Problems:
- lack of remote analysis
- GCC-centric

Results of the project
analysis tool
Benefits:
- visual diagrams
- cache analysis
- binding to source level
- flexible
Problems:
- poor representation for long-running large applications
- too few analysis tools
- some tests/tools get stuck on large dump files

Results of the project
profiling the profiler

Results of the project
glance at future
- consider DIOTA as instrumentation basis
- implement remote analysis
- multiple specific profilers within one analysis tool
- add support for HT/SMP architectures

That's all
Thank you!
Iliasov Alexey
Kyrgyz Russian Slavic University
Kyrgyzstan
