1
ABACUS: A Hardware-Based Software Profiler for Modern Processors
Eric Matthews, Lesley Shannon (School of Engineering Science)
Sergey Blagodurov, Sergey Zhuravlev, Alexandra Fedorova (School of Computing Science)
Simon Fraser University, Vancouver, BC, Canada
2
Overview
- Legendary Introduction to ABACUS
- Delicious Profiling Units
- Epic Conclusion
3
Introduction to ABACUS
7
ABACUS
8
ASPLOS rocks!
9
ABACUS
10
Performance comparison
Memory Reuse Profile [Figure panels: ABACUS, Simics]
- ABACUS avg runtime: 48.5 seconds
- Simics avg runtime: 1 hour 6 minutes
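The two average runtimes quoted on this slide can be turned into a single ratio. The short Python sketch below only does that arithmetic; the variable names are illustrative, and the ratio compares two different platforms (FPGA hardware versus a software simulator), so it should be read as a rough indication rather than a controlled benchmark.

```python
# Average runtimes quoted on the slide, converted to seconds.
abacus_seconds = 48.5
simics_seconds = 1 * 3600 + 6 * 60  # 1 hour 6 minutes

# Ratio of Simics runtime to ABACUS runtime.
speedup = simics_seconds / abacus_seconds
print(f"Simics / ABACUS runtime ratio: {speedup:.1f}x")  # prints 81.6x
```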
11
Conclusion
- ABACUS is a generic profiler that can be easily integrated into modern processors
- It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments
12
Thank you! Questions?
13
Motivation
- Future systems will be multi-core and heterogeneous
- How does the OS place threads on this architecture?
- Characterize thread behaviour
  - Instruction Mix
  - Memory Reuse Profile
  - Effectiveness of pre-fetching
  - Memory bandwidth utilization
14
Motivation (cont'd)
- How are these metrics collected?
  - Offline analysis
  - Code Instrumentation
  - Simulation (e.g., Simics)
    - Software-based instruction set simulator
    - Models systems with full OS support
15
Motivation (cont'd)
- Why not use current hardware counters?
  - Architecture-specific
  - Not all desired metrics provided
  - Help detect symptoms, not causes
  - Limited in number and in concurrent use
16
Goal
- Create a hardware profiler to collect thread characteristics at runtime
- Imposed constraints
  - External to processor
  - Minimally invasive
  - Cycle accurate
  - OS controllable
17
ABACUS
- hArdware-Based Analyzer for the Characterization of User Software
- A collection of runtime-configurable profiling units
- Collects metrics useful for thread placement
- Controllable through the O/S
18
Hardware Platform
- Proof-of-concept System
  - LEON3
  - SPARC v8 Instruction Set Architecture
  - Single core, single threaded
- Test System
  - OpenSPARC Niagara T1 soft processor
  - 1 to 4 hardware threads
  - Multi-core
  - Multi-board support
19
Hardware Platform (cont'd)
20
ABACUS
21
External Interface
- Bus slave and master modules
- Processing required on processor signals
- Designed such that only the external interface changes with a different processor/system
22
Portability
- Previously integrated with a LEON3 (SPARC v8 ISA) based system
- Differences:
  - AMBA Advanced High-performance Bus (AHB) vs Processor Local Bus (PLB)
  - Processor internals
23
Controller
- Starts or stops profiling
- Can limit profiling to a specific address range
- DMA interface for retrieving collected data
- Linux device driver support
24
Profiling Units
- Operate on one or more processor signals:
  - Instruction
  - PC
  - Cache Reuse Distance
  - etc.
- Store data in a collection of counters
25
Profiling Units (cont'd)
- Focus on two-dimensional metrics
  - Gives bigger picture / greater insight
- Aim to be as architecture-independent as possible
26
Profile Unit
- Behaves like a traditional software profiler
- Operates on Program Counter
[Figure labels: Range Overlap Trace Range Non-Overlap Code Space]
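As a rough illustration of a profiler that operates on the program counter, the Python sketch below counts how many observed PC values fall into each configured address range. It is a software model under stated assumptions, not the ABACUS hardware design: the `RangeCounter` type, the half-open range convention, and the decision to let overlapping ranges each count a matching PC are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class RangeCounter:
    """Hypothetical counter tied to one code address range [start, end)."""
    start: int
    end: int
    hits: int = 0


def profile_pcs(pcs, ranges):
    """Increment every range counter whose address range contains each PC."""
    for pc in pcs:
        for r in ranges:
            if r.start <= pc < r.end:
                r.hits += 1
    return ranges


# Two overlapping ranges and a short, made-up PC trace.
ranges = [RangeCounter(0x1000, 0x2000), RangeCounter(0x1800, 0x2800)]
trace = [0x1000, 0x1804, 0x1FFC, 0x2400, 0x3000]
profile_pcs(trace, ranges)
for r in ranges:
    print(f"[{r.start:#x}, {r.end:#x}): {r.hits} hits")
```

In this model a PC inside the overlap region is counted by both ranges, and a PC outside every range (0x3000 here) is ignored.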
27
Memory Reuse Unit
- Collects a measure of code or data reuse
- Utilizes a Least Recently Used (LRU) stack
- Reuse distance is the movement in the LRU stack, or a miss
- Useful for cache contention management
28
Memory Reuse Unit
- Creates histogram of cache reuse pattern
- Range: [0, set associativity – 1] or cache miss
[Figure: 4-way set-associative reuse profile; axis: Reuse Distance]
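The reuse histogram described above can be sketched in software as follows. This is a minimal Python model, assuming a set-associative cache with true LRU replacement; the function name `reuse_histogram`, its parameters, and the default line size are illustrative and not taken from the ABACUS design. Each access is recorded either at its current LRU stack position (0 to associativity - 1) or as a miss.

```python
def reuse_histogram(addresses, num_sets=1, associativity=4, line_bytes=32):
    """Return (histogram of LRU stack positions, miss count) for a trace."""
    stacks = [[] for _ in range(num_sets)]  # index 0 = most recently used
    hist = [0] * associativity
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        stack = stacks[line % num_sets]
        tag = line // num_sets
        if tag in stack:
            depth = stack.index(tag)      # reuse distance within the set
            hist[depth] += 1
            stack.pop(depth)
        else:
            misses += 1
            if len(stack) == associativity:
                stack.pop()               # evict the least recently used line
        stack.insert(0, tag)              # accessed line becomes most recent
    return hist, misses


# Made-up trace of line numbers on a single 4-way set (line_bytes=1).
hist, misses = reuse_histogram([0, 1, 0, 0, 2, 3, 4, 1],
                               num_sets=1, associativity=4, line_bytes=1)
print(hist, misses)  # [1, 1, 0, 0] 6
```

In the example, the final access to line 1 is a miss because line 1 was evicted when line 4 was brought into the full set.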
29
Instruction Mix
- Identify current instruction subset in use
- Divide instructions into logical categories
  - Load/Store
  - Floating Point
  - Control Flow
- Opcode-based table lookup
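An opcode-based table lookup of this kind might look like the Python sketch below. It assumes 32-bit SPARC v8 instruction words and uses only the top-level `op`, `op2`, and `op3` fields; the category names, the `OP3_TABLE` contents, and the catch-all "other" class are assumptions for illustration, not the classification ABACUS actually implements. In this simplified model, floating-point loads and stores are counted as load/store.

```python
from collections import Counter

# Hypothetical category table for format-3 arithmetic-space instructions (op = 2).
OP3_TABLE = {
    0x34: "floating_point",  # FPop1
    0x35: "floating_point",  # FPop2
    0x38: "control_flow",    # JMPL
    0x39: "control_flow",    # RETT
}


def classify(instr):
    """Map a 32-bit SPARC v8 instruction word to a coarse category."""
    op = (instr >> 30) & 0x3
    if op == 3:                      # memory instructions
        return "load_store"
    if op == 1:                      # CALL
        return "control_flow"
    if op == 0:                      # branches and SETHI
        op2 = (instr >> 22) & 0x7
        return "control_flow" if op2 in (2, 6, 7) else "other"
    op3 = (instr >> 19) & 0x3F
    return OP3_TABLE.get(op3, "other")


def instruction_mix(instructions):
    """Count instructions per category."""
    return Counter(classify(i) for i in instructions)


# ld, FPop1, ba, call, nop (sethi 0, %g0)
mix = instruction_mix([0xC4004000, 0x81A00000, 0x10800000, 0x40000000, 0x01000000])
print(dict(mix))
```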
30
Latency Unit
- Break down miss latency into constituent sources
  - Bus contention
  - DRAM latency
  - etc.
- For each category create a histogram of latency in cycles
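A per-source latency histogram can be modelled as below. This Python sketch assumes fixed-width bins with a saturating last bin; the bin width, bin count, source names, and the `latency_histograms` function are hypothetical parameters chosen for the example rather than details from the slides.

```python
from collections import defaultdict


def latency_histograms(events, bin_width=16, num_bins=8):
    """Build one latency histogram (in cycles) per latency source.

    events: iterable of (source, latency_cycles) pairs.
    Latencies beyond the last bin are accumulated in the last bin.
    """
    hists = defaultdict(lambda: [0] * num_bins)
    for source, cycles in events:
        bin_index = min(cycles // bin_width, num_bins - 1)
        hists[source][bin_index] += 1
    return dict(hists)


# Made-up miss events: (source, latency in cycles).
events = [("bus_contention", 5), ("dram", 40), ("dram", 47), ("dram", 500)]
hists = latency_histograms(events)
for source, hist in hists.items():
    print(source, hist)
```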
31
Stall Unit
- Break down Cycles Per Instruction
- Attribute cycles to their sources
  - Cache miss
  - Translation Lookaside Buffer (TLB) miss
  - Floating Point busy stalls
  - etc.
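The CPI breakdown can be illustrated with a small calculation. The Python sketch below assumes that stall cycles have already been attributed to exactly one source each and that the remaining cycles count as a "base" component; the function name, the source labels, and the numbers are made up for illustration.

```python
def cpi_breakdown(total_cycles, instructions, stall_cycles):
    """Split overall CPI into a base component plus one component per stall source."""
    base_cycles = total_cycles - sum(stall_cycles.values())
    breakdown = {"base": base_cycles / instructions}
    for source, cycles in stall_cycles.items():
        breakdown[source] = cycles / instructions
    return breakdown


# Hypothetical counter values for one profiling interval.
stalls = {"cache_miss": 1000, "tlb_miss": 250, "fp_busy": 250}
breakdown = cpi_breakdown(total_cycles=2500, instructions=1000, stall_cycles=stalls)
print(breakdown)
print("total CPI:", sum(breakdown.values()))
```

The components sum to the overall CPI (2.5 in this example), which makes it easy to see which source dominates.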
32
Verification
- Run a subset of the SPEC CPU2006 benchmarks
  - Those with memory usage within board specs
- Collect metrics with ABACUS and Simics
- Profile for a few billion instructions
  - Limited by Simics performance
33
Test Platform
- Proof-of-concept System
- Single core, single threaded
- XUP V2Pro: 90% slice utilization
Processor: LEON3 (SPARC v8 ISA) (50 MHz)
Memory: 256 MB DDR RAM
OS: Debian Etch (4.0)
34
Simulation Platform
- Simics System:
Processor: UltraSPARC II (SPARC v9 ISA)
Memory: 256 MB DDR RAM
OS: Debian Etch (4.0)
- Differences:
  - SPARC v9 ISA (64-bit processor)
  - Local filesystem vs NFS
35
LEON3 Comparison
[Figure panels: ABACUS, Simics]
36
LEON3 Comparison (cont'd)
DC Memory Reuse Profile
[Figure panels: ABACUS, Simics]
37
Resource Usage
[Table columns: Default: 32-bit counters; 40-bit counters; 32-bit counters, Profile Unit added]
- 2-way LRU Instruction Cache
- 2-way LRU Data Cache
- 5 Instruction Types
38
Conclusion
- ABACUS is a generic profiler that can be easily integrated into modern processors
- It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments
39
Future Plans
- Move to multi-core/multi-threaded system
- Memory reuse distance independent of existing cache implementation
- Process tracking
- Integrate results into OS scheduler
40
Questions?