
1 Perseus Design

2 Architecture
Lockheed Martin and Government Use Only
- Behavioral “signatures” are extracted from a baseline execution.
- The prototype will focus on support for x86 binaries on a Linux platform.
- Configuration plans define application-triggered system control (affinity and power).
- The plethora of variables presents a huge solution space, ideal for Genetic Algorithm approaches.
- Platform models define the number of cores and cache characteristics.
- A second phase of instrumentation “hooks” the configuration plan into the application.
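The slides name Genetic Algorithm approaches but do not show how a configuration plan is encoded. The following is a minimal sketch, assuming a plan is a flat list of genes: one core index per thread followed by one frequency step per core. The type `plan_t`, the constants, the function names, and the toy cost function (which only penalizes threads that share a core) are all hypothetical and stand in for whatever the real optimization engine evaluates.

```c
#define NUM_THREADS 4   /* assumed number of application threads */
#define NUM_CORES   8   /* assumed number of cores */
#define NUM_STEPS   2   /* assumed number of frequency steps per core */
#define PLAN_LEN    (NUM_THREADS + NUM_CORES)

/* Hypothetical chromosome: genes [0, NUM_THREADS) hold the core assigned to
 * each thread; the remaining genes hold the frequency step of each core. */
typedef struct {
    int gene[PLAN_LEN];
} plan_t;

/* Exclusive upper bound on the value gene i may take. */
static int gene_limit(int i)
{
    return i < NUM_THREADS ? NUM_CORES : NUM_STEPS;
}

/* Single-point crossover: genes before `point` come from a, the rest from b. */
plan_t plan_crossover(const plan_t *a, const plan_t *b, int point)
{
    plan_t child;
    for (int i = 0; i < PLAN_LEN; i++)
        child.gene[i] = (i < point) ? a->gene[i] : b->gene[i];
    return child;
}

/* Mutation: overwrite one gene with a caller-supplied random value,
 * reduced into the legal range for that gene. */
void plan_mutate(plan_t *p, int index, unsigned raw)
{
    p->gene[index] = (int)(raw % (unsigned)gene_limit(index));
}

/* Toy cost (lower is better): number of thread pairs pinned to the same
 * core. A real cost would come from measured time and power. */
int plan_cost(const plan_t *p)
{
    int cost = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        for (int j = i + 1; j < NUM_THREADS; j++)
            if (p->gene[i] == p->gene[j])
                cost++;
    return cost;
}
```

A search loop would repeatedly cross over and mutate the lowest-cost plans, passing a value such as `rand()` as `raw`.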

3 Behavioral Analysis Sub-system
Behavioral analysis is split in two:
- TEG (Temporal Execution Graph)
- TMAM (Temporal Memory Access Map)
Precise data is collected on a per-thread, per-call-site basis:
- Binary instrumentation is facilitated by Dyninst (University of Wisconsin-Madison).
- Accurate counting (e.g., processor cycles) and timing are facilitated through PAPI (University of Tennessee).

4 TEG Collection
- TEG collects information about how much time the application spends executing its different functions.
- Both cycle counts and timestamps are collected so that potential “slow-downs” can be identified.
- Per-thread, per-call-site timing and cycle count information is collected for selected function calls.
- Results provide timing distributions for functions, as opposed to the averages and counts reported by tools such as gprof and callgrind.
- Overhead depends on the density of instrumentation (i.e., number of functions + calls); in most cases it is negligible.
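To illustrate why a distribution is more informative than an average, here is a minimal sketch of a per-call-site histogram with power-of-two buckets. The type `callsite_hist_t` and the functions are hypothetical; in the system described above the cycle counts would come from PAPI, whereas here they are simply passed in as numbers.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS 32

/* Hypothetical per-thread, per-call-site record. Bucket b counts calls
 * whose duration d satisfies 2^b <= d < 2^(b+1) cycles. */
typedef struct {
    uint64_t count[NUM_BUCKETS];
    uint64_t total_cycles;
    uint64_t calls;
} callsite_hist_t;

void hist_init(callsite_hist_t *h)
{
    memset(h, 0, sizeof *h);
}

/* floor(log2(cycles)), clamped to the last bucket; 0 and 1 map to bucket 0. */
static int bucket_of(uint64_t cycles)
{
    int b = 0;
    while (cycles > 1 && b < NUM_BUCKETS - 1) {
        cycles >>= 1;
        b++;
    }
    return b;
}

/* Record one completed call of the given duration. */
void hist_record(callsite_hist_t *h, uint64_t cycles)
{
    h->count[bucket_of(cycles)]++;
    h->total_cycles += cycles;
    h->calls++;
}
```

Recording durations of 100, 110, 120, and 100000 cycles gives a mean of about 25000 cycles, which describes none of the calls; the histogram instead shows three calls in the 64 to 127 bucket and one outlier in the 65536 to 131071 bucket.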

5 TMAM Collection
- All application reads and writes to memory are captured via probes instrumented at the binary level. This data is essential for identifying cache false sharing.
- Data is collected via a shared-memory logger.
- Overhead is very high: on the order of 100x slower. At these levels, care is needed not to affect normal behavior. Dynamic probe placement and sampling could be used to alleviate this problem.
- Massive volumes of data result (e.g., a 20-second program can generate 100 Gb+).
- Two modes of operation: off-line analysis and real-time analysis.
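As an illustration of how a memory access log can reveal false sharing, the sketch below flags pairs of accesses where two different threads touch different addresses in the same cache line and at least one access is a write. The record layout `access_t`, the 64-byte line size, and the function names are assumptions, not the actual TMAM format; access sizes are ignored for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Hypothetical log record for one memory access. */
typedef struct {
    int       thread;    /* id of the accessing thread */
    uintptr_t addr;      /* address accessed */
    int       is_write;  /* nonzero for a write, zero for a read */
} access_t;

/* A pair is a false-sharing candidate when different threads touch
 * different addresses in the same line and at least one access writes. */
static int is_false_sharing_pair(const access_t *a, const access_t *b)
{
    return a->thread != b->thread
        && a->addr / CACHE_LINE == b->addr / CACHE_LINE
        && a->addr != b->addr
        && (a->is_write || b->is_write);
}

/* Count candidate pairs in a log of n accesses (quadratic, for clarity). */
size_t count_false_sharing_pairs(const access_t *log, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (is_false_sharing_pair(&log[i], &log[j]))
                pairs++;
    return pairs;
}
```

The pairwise scan is only suitable for small logs; at the data volumes mentioned above, an analysis would more plausibly aggregate accesses per cache line first.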

6 Platform Analysis
Micro-benchmarks implemented as part of the current solution empirically measure:
- Number of processors, and the number (and values) of frequency steppings
- Cost of thread migration (i.e., affinity change)
- Ratios of power to cycles at different frequencies
- Cost (in cycles) of frequency modulation
- Core topology
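The first item in the list above can be sketched as follows. This is not the micro-benchmark from the slides: it assumes a Linux system where the processor count is available through `sysconf` and where the frequency steppings are exposed as a whitespace-separated list of kHz values, as some cpufreq drivers provide in the sysfs file `scaling_available_frequencies`. The function names are hypothetical.

```c
#include <stdlib.h>
#include <unistd.h>

/* Number of processors currently online, as reported by the OS. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Parse a whitespace-separated list of frequencies in kHz, for example
 * "2400000 1600000 800000\n", into out[]. Returns how many values were
 * stored (at most max). */
size_t parse_freq_steps(const char *text, long *out, size_t max)
{
    size_t n = 0;
    char *end;
    while (n < max) {
        long khz = strtol(text, &end, 10);
        if (end == text)    /* no further number found */
            break;
        out[n++] = khz;
        text = end;
    }
    return n;
}
```

The text passed to `parse_freq_steps` would be read from the sysfs file when it exists; drivers that do not list discrete steps would need a different source.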

7 Example Platform Information
Example data was collected empirically through fine-grained on-chip timing and a micro-benchmark program. Data was collected from a dual-processor, quad-core Xeon running Debian Linux. Each matrix element is shaded according to the measured latency of the migration (darker is slower).

8 Design Optimization Engine

9 Example Deployment Data
Deployment results are made up of trigger locations and auto-generated trigger source code.

libControl.so
8048C07,Before_CS_8048C07
8048C98,Before_CS_8048C98
8048D92,Before_CS_8048D92
8048DB0,Before_CS_8048DB0

#include  /* header name missing in the transcript */
#include "affinity.h"
#include "fvctrl.h"
#include "triggeraux.h"

void Init_Frequency()
{
    modulate_cpu(0, 1, 0);
    modulate_cpu(1, 1, 0);
    modulate_cpu(2, 1, 0);
    modulate_cpu(3, 0, 0);
    modulate_cpu(4, 0, 0);
    modulate_cpu(5, 0, 0);
    modulate_cpu(6, 0, 0);
    modulate_cpu(7, 1, 0);
}

void Before_CS_8048D92()
{
    switch (GetThreadInstanceId()) {
    case 1: { affinize_thread(0, pthread_self()); break; }
    case 2: { affinize_thread(3, pthread_self()); break; }
    case 3: { affinize_thread(1, pthread_self()); break; }
    case 4: { affinize_thread(1, pthread_self()); break; }
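The trigger locations above pair a hexadecimal call-site address with the name of the trigger function to run before that call site. How the instrumentation phase actually consumes this data is not shown in the slides; the sketch below only illustrates parsing one such line and looking up a trigger by address. The type `trigger_t` and both functions are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define TRIGGER_NAME_MAX 64

/* Hypothetical in-memory form of one trigger location. */
typedef struct {
    unsigned long call_site;               /* address of the call site */
    char          name[TRIGGER_NAME_MAX];  /* trigger function name */
} trigger_t;

/* Parse a line such as "8048D92,Before_CS_8048D92".
 * Returns 1 on success and 0 if the line does not match. */
int parse_trigger(const char *line, trigger_t *out)
{
    return sscanf(line, "%lx,%63s", &out->call_site, out->name) == 2;
}

/* Linear lookup of the trigger registered for a call site, or NULL. */
const trigger_t *find_trigger(const trigger_t *table, size_t n,
                              unsigned long call_site)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].call_site == call_site)
            return &table[i];
    return NULL;
}
```

Lines that do not match the address,name pattern, such as the `libControl.so` label, are rejected by `parse_trigger` rather than stored.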

10 Power Measurement
Server-style ATX power feeds two 12V lines to each processor. Data is streamed to a host via USB.
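Given two 12V lines per processor, per-processor power and energy can be estimated from current readings. The sketch below assumes each streamed sample carries the current on both lines in amperes, that the rail voltage is the nominal 12 V, and that samples arrive at a fixed interval; the sample layout and function names are hypothetical, since the slides do not describe the data format.

```c
#include <stddef.h>

#define RAIL_VOLTS 12.0   /* nominal rail voltage; assumed, not measured */

/* Hypothetical sample: current on each of a processor's two 12V lines. */
typedef struct {
    double amps_a;
    double amps_b;
} sample_t;

/* Instantaneous power for one sample: P = V * (I_a + I_b), in watts. */
double sample_watts(const sample_t *s)
{
    return RAIL_VOLTS * (s->amps_a + s->amps_b);
}

/* Energy over n samples taken every dt_seconds, as a rectangle sum: joules. */
double energy_joules(const sample_t *s, size_t n, double dt_seconds)
{
    double joules = 0.0;
    for (size_t i = 0; i < n; i++)
        joules += sample_watts(&s[i]) * dt_seconds;
    return joules;
}
```

Dividing such an energy figure by a cycle count over the same interval would give a power-to-cycles style ratio of the kind listed under Platform Analysis.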

