1 ABACUS: A Hardware-Based Software Profiler for Modern Processors. Eric Matthews and Lesley Shannon, School of Engineering Science; Sergey Blagodurov, Sergey Zhuravlev, and Alexandra Fedorova, School of Computing Science; Simon Fraser University, Vancouver, BC, Canada

2 Overview: Legendary Introduction to ABACUS; Delicious Profiling Units; Epic Conclusion

3 Introduction to ABACUS

7 ABACUS

8 ASPLOS rocks!

9 ABACUS

10 Performance comparison: memory reuse profile, ABACUS vs. Simics. Average runtime: ABACUS 48.5 seconds; Simics 1 hour 6 minutes.
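To put the two reported averages on a common scale, a quick calculation converts the Simics runtime to seconds and takes the ratio. This only divides the slide's own averages; it says nothing about variance or per-benchmark differences.

```python
# Average runtimes for the memory reuse profile, as reported on the slide.
abacus_seconds = 48.5
simics_seconds = 1 * 3600 + 6 * 60  # 1 hour 6 minutes = 3960 s

# Ratio of the two averages: how many times faster ABACUS finished.
speedup = simics_seconds / abacus_seconds
print(f"Simics: {simics_seconds} s, ABACUS: {abacus_seconds} s, speedup ~ {speedup:.1f}x")
```

By this measure ABACUS collected the profile roughly 80 times faster than the simulator.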

11 Conclusion: ABACUS is a generic profiler that can be easily integrated into modern processors. The OS can use it to obtain runtime information about a thread's behaviour and so make better thread assignments.

12 Thank you! Questions?

13 Motivation: Future systems will be multi-core and heterogeneous. How does the OS place threads on such an architecture? Characterize thread behaviour: instruction mix, memory reuse profile, effectiveness of prefetching, and memory bandwidth utilization.

14 Motivation (cont'd): How are these metrics collected? Offline analysis: code instrumentation, or simulation (e.g., Simics, a software-based instruction set simulator that models systems with full OS support).

15 Motivation (cont'd): Why not use current hardware counters? They are architecture-specific, do not provide all the desired metrics, help detect symptoms rather than causes, and are limited both in number and in how many can be used concurrently.

16 Goal: Create a hardware profiler that collects thread characteristics at runtime. Imposed constraints: external to the processor, minimally invasive, cycle accurate, and OS controllable.

17 ABACUS (hArdware-Based Analyzer for the Characterization of User Software): a collection of runtime-configurable profiling units that collects metrics useful for thread placement and is controllable through the OS.

18 Hardware Platform. Proof-of-concept system: LEON3 (SPARC v8 instruction set architecture), single core, single threaded. Test system: OpenSPARC Niagara T1 soft processor with 1 to 4 hardware threads, multi-core and multi-board support.

19 Hardware Platform (cont'd)

20 ABACUS

21 External Interface: bus slave and master modules, plus any processing required on the processor signals. Designed so that only the external interface changes when ABACUS is moved to a different processor or system.
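The design goal of confining processor-specific logic to the external interface can be illustrated with a small software analogy: an adapter turns raw processor signals into a processor-independent event, and the profiling units only ever consume that event. This is a hypothetical sketch, not the ABACUS hardware; `Event`, the adapter functions, and the raw signal names are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """Processor-independent view of one retired instruction (hypothetical)."""
    pc: int
    instruction: int


# Processor-specific adapters: the only code that changes between systems.
# The raw signal names ("pc", "inst", "fetch_addr", "opcode_word") are made up.
def adapter_a(raw: Dict[str, int]) -> Event:
    return Event(pc=raw["pc"], instruction=raw["inst"])


def adapter_b(raw: Dict[str, int]) -> Event:
    return Event(pc=raw["fetch_addr"], instruction=raw["opcode_word"])


class InstructionCounter:
    """Example profiling unit: counts retired instructions."""

    def __init__(self) -> None:
        self.total = 0

    def __call__(self, event: Event) -> None:
        self.total += 1


class Abacus:
    """Profiling units only ever see Event objects, never raw processor signals."""

    def __init__(self, adapter: Callable[[Dict[str, int]], Event],
                 units: List[Callable[[Event], None]]) -> None:
        self.adapter = adapter
        self.units = units

    def on_retire(self, raw: Dict[str, int]) -> None:
        event = self.adapter(raw)
        for unit in self.units:
            unit(event)
```

Porting to another processor in this model means writing a new adapter; `InstructionCounter` and any other unit stay unchanged.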

22 Portability: ABACUS was previously integrated with a LEON3 (SPARC v8 ISA) based system. Differences: AMBA Advanced High-performance Bus (AHB) vs. Processor Local Bus (PLB), and processor internals.

23 Controller: starts or stops profiling, can limit profiling to a specific address range, provides a DMA interface for retrieving collected data, and has Linux device driver support.
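The controller's gating behaviour (start/stop plus an address-range limit) can be modelled in a few lines. This sketch assumes the range is compared against the program counter and is half-open `[low, high)`; the class and method names are hypothetical and do not describe the real ABACUS register interface or its DMA path.

```python
class Controller:
    """Software model of controller gating: start/stop plus an address-range filter.

    Assumptions: the range is checked against the program counter and is half-open
    [low, high); method names are hypothetical, not the ABACUS register interface.
    """

    def __init__(self) -> None:
        self.running = False
        self.low = 0
        self.high = 2 ** 32  # default: the whole 32-bit address space

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def set_range(self, low: int, high: int) -> None:
        if not 0 <= low < high <= 2 ** 32:
            raise ValueError("range must be non-empty and within 32 bits")
        self.low, self.high = low, high

    def should_profile(self, pc: int) -> bool:
        # Profiling units only update their counters when this returns True.
        return self.running and self.low <= pc < self.high
```

For example, after `start()` and `set_range(0x2000, 0x3000)`, an instruction at `0x2FFF` is profiled while one at `0x3000` is not.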

24 Profiling Units: operate on one or more processor signals (instruction, PC, cache reuse distance, etc.) and store data in a collection of counters.

25 Profiling Units (cont'd): focus on two-dimensional metrics, which give a bigger picture and greater insight; aim to be as architecture-independent as possible.

26 Profile Unit: behaves like a traditional software profiler and operates on the program counter. Figure labels: Range Overlap, Range Non-Overlap, Trace, Code Space.
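A PC-based profile can be sketched as one counter per configured address range, incremented whenever the program counter falls inside that range. In this hypothetical model every matching range is incremented, so with non-overlapping ranges at most one counter advances per instruction, while overlapping ranges (for example, a function nested inside a larger module) are counted at each level. The half-open ranges and the class name are assumptions.

```python
from typing import List, Tuple


class ProfileUnit:
    """PC-range profiler: one counter per configured [low, high) address range.

    Every range containing the PC is incremented, so overlapping ranges
    accumulate counts at each level of nesting.
    """

    def __init__(self, ranges: List[Tuple[int, int]]) -> None:
        self.ranges = ranges
        self.counters = [0] * len(ranges)

    def sample(self, pc: int) -> None:
        for i, (low, high) in enumerate(self.ranges):
            if low <= pc < high:
                self.counters[i] += 1
```

With ranges `[(0x1000, 0x2000), (0x1400, 0x1500), (0x2000, 0x3000)]` and the trace `0x1000, 0x1404, 0x1408, 0x2000, 0x2ABC`, the counters end at `[3, 2, 2]`: the two PCs inside the nested range also count toward the enclosing one.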

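27 Memory Reuse Unit: collects a measure of code or data reuse using a least recently used (LRU) stack. The reuse distance of an access is the accessed line's depth in the LRU stack (how far it has moved down since its last use), or a miss if it is no longer in the stack. Useful for cache contention management.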

28 Memory Reuse Unit (cont'd): creates a histogram of the cache reuse pattern, with reuse distances in the range [0, set associativity - 1] plus a cache-miss bin. Figure: reuse profile (x-axis: reuse distance) of a 4-way set-associative cache.
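The reuse histogram described on these two slides can be modelled in software with one LRU stack per cache set: a hit at depth d increments bin d, a line not in the stack counts as a miss, and the accessed line moves to the top. This is a sketch assuming a conventional set-associative cache with true LRU replacement and address-to-set mapping by line index; the class and parameter names are illustrative, not taken from ABACUS.

```python
from typing import List


class MemoryReuseUnit:
    """Reuse-distance histogram for a set-associative cache with LRU replacement.

    Bins 0..ways-1 count hits at that LRU-stack depth (0 = most recently used);
    a separate counter records misses.
    """

    def __init__(self, num_sets: int, ways: int, line_size: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.stacks: List[List[int]] = [[] for _ in range(num_sets)]
        self.histogram = [0] * ways
        self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // self.line_size
        stack = self.stacks[line % self.num_sets]  # LRU stack of the indexed set
        tag = line // self.num_sets
        if tag in stack:
            depth = stack.index(tag)  # reuse distance within the set
            self.histogram[depth] += 1
            stack.pop(depth)
        else:
            self.misses += 1
            if len(stack) == self.ways:
                stack.pop()  # evict the least recently used line
        stack.insert(0, tag)  # accessed line becomes most recently used
```

For a single-set, 4-way model (`MemoryReuseUnit(1, 4, 1)`) and the line sequence `0, 1, 2, 0, 0, 3, 4, 1`, the result is a histogram of `[1, 0, 1, 0]` with 6 misses; the final access to line 1 misses because line 1 was evicted when line 4 arrived.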

29 Instruction Mix: identifies the instruction subset currently in use by dividing instructions into logical categories (load/store, floating point, control flow) via an opcode-based table lookup.
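An opcode-based lookup of this kind can be sketched for the SPARC v8 encoding used by the LEON3 proof of concept: the 2-bit `op` field (bits 31:30) separates calls, branches/SETHI, arithmetic, and memory instructions, and the `op2`/`op3` fields refine the choice. The category names and the exact table below are assumptions for illustration; the slides do not specify how ABACUS partitions the opcodes.

```python
from collections import Counter
from typing import Iterable

# Hypothetical category table keyed on SPARC v8 encoding fields:
#   op = bits 31:30, op2 = bits 24:22 (format 2), op3 = bits 24:19 (format 3).
FP_OP3 = {0x34, 0x35}             # FPop1, FPop2
CONTROL_OP3 = {0x38, 0x39, 0x3A}  # JMPL, RETT, Ticc
CONTROL_OP2 = {2, 6, 7}           # Bicc, FBfcc, CBccc


def classify(word: int) -> str:
    op = (word >> 30) & 0x3
    if op == 1:
        return "control"     # CALL
    if op == 3:
        return "load/store"  # all memory instructions, including FP loads/stores
    if op == 0:
        op2 = (word >> 22) & 0x7
        # SETHI (and therefore NOP) and UNIMP fall into "other".
        return "control" if op2 in CONTROL_OP2 else "other"
    op3 = (word >> 19) & 0x3F  # op == 2: arithmetic, logic, FP, jumps, traps
    if op3 in FP_OP3:
        return "floating point"
    if op3 in CONTROL_OP3:
        return "control"
    return "other"


def instruction_mix(words: Iterable[int]) -> Counter:
    """Histogram of instruction categories over a stream of instruction words."""
    return Counter(classify(w) for w in words)
```

For instance, `0x01000000` (the SPARC NOP, encoded as SETHI) lands in "other", while `0x81C3E008` (`retl`, a JMPL) counts as control flow.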

30 Latency Unit: breaks down miss latency into its constituent sources (bus contention, DRAM latency, etc.) and, for each category, creates a histogram of latency in cycles.
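A per-source latency histogram reduces to choosing a bin from the cycle count and incrementing that bin for the given source. In this sketch the bin width, the number of bins, the saturating last bin, and the source labels are all assumptions; the slide does not give ABACUS's bin layout.

```python
from collections import defaultdict
from typing import Dict, List


class LatencyUnit:
    """Per-source histograms of miss latency in cycles (bin layout is assumed)."""

    def __init__(self, bin_width: int = 10, num_bins: int = 8) -> None:
        self.bin_width = bin_width
        self.num_bins = num_bins
        self.histograms: Dict[str, List[int]] = defaultdict(lambda: [0] * num_bins)

    def record(self, source: str, cycles: int) -> None:
        # Latencies beyond the last bin are clamped into it (saturating bin).
        index = min(cycles // self.bin_width, self.num_bins - 1)
        self.histograms[source][index] += 1
```

With the defaults, two DRAM accesses of 35 and 37 cycles both land in bin 3 (30 to 39 cycles), and a 500-cycle bus stall is clamped into the last bin.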

31 Stall Unit: breaks down cycles per instruction (CPI) by attributing cycles to their sources: cache misses, Translation Lookaside Buffer (TLB) misses, floating-point busy stalls, etc.
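Once stall cycles are attributed to sources, each source's CPI component is its cycle count divided by the instruction count, and the components add up to the overall CPI. The sketch below assumes that cycles not attributed to any stall source form a "base" component; the function name and the example counts are made up for illustration.

```python
from typing import Dict


def cpi_stack(instructions: int, total_cycles: int,
              stall_cycles: Dict[str, int]) -> Dict[str, float]:
    """Split CPI into per-source components; unattributed cycles become 'base'."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    stalled = sum(stall_cycles.values())
    if stalled > total_cycles:
        raise ValueError("stall cycles exceed total cycles")
    breakdown = {source: cycles / instructions for source, cycles in stall_cycles.items()}
    breakdown["base"] = (total_cycles - stalled) / instructions
    return breakdown
```

With made-up counts of 1000 instructions in 2500 cycles, of which 900 are cache-miss stalls, 100 TLB-miss stalls, and 300 FP-busy stalls, the overall CPI of 2.5 splits into 0.9 + 0.1 + 0.3 + 1.2 (base).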

32 Verification: Run the subset of the SPEC CPU2006 benchmarks whose memory usage fits within the board's specifications; collect metrics with both ABACUS and Simics; profile for a few billion instructions (limited by Simics performance).

33 Test Platform: proof-of-concept system, single core, single threaded; XUP V2Pro board at 90% slice utilization. Processor: LEON3 (SPARC v8 ISA), 50 MHz. Memory: 256 MB DDR RAM. OS: Debian Etch (4.0).

34 Simulation Platform: Simics system. Processor: UltraSPARC II (SPARC v9 ISA). Memory: 256 MB DDR RAM. OS: Debian Etch (4.0). Differences from the test platform: SPARC v9 ISA (64-bit processor); local filesystem vs. NFS.

35 LEON3 Comparison: ABACUS vs. Simics (figure).

36 LEON3 Comparison (cont'd): data cache (DC) memory reuse profile, ABACUS vs. Simics (figure).

37 Resource Usage. Configurations: default (32-bit counters), 40-bit counters, and 32-bit counters with the Profile Unit added. Default setup: 2-way LRU instruction cache, 2-way LRU data cache, 5 instruction types.
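For a sense of scale on counter width, the worst case is a counter that can increment on every clock cycle: it wraps after 2^bits cycles. The calculation below assumes the 50 MHz LEON3 clock listed for the test platform; counters that increment less often than once per cycle take proportionally longer to overflow.

```python
CLOCK_HZ = 50_000_000  # LEON3 proof-of-concept clock (50 MHz) from the test platform


def seconds_to_overflow(counter_bits: int, clock_hz: int = CLOCK_HZ) -> float:
    # Worst case: a counter incremented once per cycle wraps after 2**bits cycles.
    return 2 ** counter_bits / clock_hz


for bits in (32, 40):
    print(f"{bits}-bit: {seconds_to_overflow(bits):,.1f} s")
# 32-bit: 85.9 s
# 40-bit: 21,990.2 s (about 6.1 hours)
```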

39 Future Plans: move to a multi-core/multi-threaded system; measure memory reuse distance independently of the existing cache implementation; add process tracking; integrate results into the OS scheduler.

40 Questions?