Energy saving in multicore architectures
Assoc. Prof. Adrian FLOREA, PhD; Prof. Lucian VINTAN, PhD (Research chair); Lecturer Arpad GELLERT, PhD; Horia CALBOREAN, PhD
Advanced Computer Architecture & Processing Systems Research Lab
Anticipatory Techniques in Advanced Processor Architectures (superscalar, SMT)
An Automatic Design Space Exploration Framework for Multicore Architecture Optimizations
Computing hardware
14 Intel compute nodes (two-processor HS21 blades with quad-core Intel Xeon processors)
2 Cell compute nodes (two-processor QS22 blades with IBM PowerXCell 8i processors)
Anticipatory Techniques in Advanced Processor Architectures (superscalar, SMT)
Issue: the data-flow bottleneck. Conventional processing models are limited in speed by the dynamic program's critical path (Amdahl).
Two solutions:
Dynamic Instruction Reuse (DIR), a non-speculative technique
Value Prediction (VP), a speculative technique
Common basis: value locality
Challenges:
Selective Instruction Reuse (MUL & DIV)
Selective Load Value Prediction ("critical loads")
Exploiting IR & VP in a superscalar / Simultaneous Multithreaded (SMT) architecture to anticipate long-latency instructions
Results
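A minimal sketch of selective Dynamic Instruction Reuse, assuming a PC-indexed reuse buffer that stores the last source operands and result of each MUL or DIV instruction. The class and method names are hypothetical and not taken from the simulator used in this work. Reuse is non-speculative: a stored result is returned only when the current operands exactly match the stored ones. All other instructions bypass the buffer, so they cost no lookups.

```python
from dataclasses import dataclass


@dataclass
class ReuseEntry:
    op1: int
    op2: int
    result: int


class SelectiveReuseBuffer:
    """Hypothetical reuse buffer: only MUL/DIV instructions are looked up and stored."""

    REUSABLE_OPS = {"MUL", "DIV"}

    def __init__(self) -> None:
        self.table: dict[int, ReuseEntry] = {}  # indexed by instruction PC
        self.lookups = 0
        self.hits = 0

    def execute(self, pc: int, opcode: str, op1: int, op2: int) -> int:
        if opcode not in self.REUSABLE_OPS:
            # Short-latency instructions bypass the buffer entirely.
            return self._compute(opcode, op1, op2)
        self.lookups += 1
        entry = self.table.get(pc)
        if entry is not None and entry.op1 == op1 and entry.op2 == op2:
            # Same operands as the last instance: reuse the result, skip execution.
            self.hits += 1
            return entry.result
        result = self._compute(opcode, op1, op2)
        self.table[pc] = ReuseEntry(op1, op2, result)
        return result

    @staticmethod
    def _compute(opcode: str, op1: int, op2: int) -> int:
        if opcode == "MUL":
            return op1 * op2
        if opcode == "DIV":
            return op1 // op2  # Python floor division, used here for illustration
        if opcode == "ADD":
            return op1 + op2
        raise ValueError(f"unsupported opcode: {opcode}")
```

Executing the same MUL three times with unchanged operands produces one real multiplication and two reuse hits. Restricting the buffer to long-latency operations keeps it small, because the latency saved on a hit is largest for those instructions.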
Exploiting Selective Value Prediction in Superscalar and SMT Architectures
Traditional value prediction techniques are increasingly challenged by mobile, battery-operated devices because of their significant energy consumption. This consumption comes mainly from the on-chip memory needed to compute predictions and from the total number of accesses to the predictor itself.
We introduce and analyze a selective value predictor that is triggered only on specific cache miss events.
Advantages:
Reduces the overall number of accesses and the energy consumed by the on-chip memory and logic dedicated to value speculation.
Improves on traditional value predictors in both performance and energy consumption.
Makes room for a smaller data cache while preserving performance, which reduces system cost.
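A minimal sketch comparing predictor access counts for a conventional last-value load predictor and a selective one. The sketch rests on several simplifying assumptions. The trigger condition is "any miss in a toy direct-mapped L1 data cache", whereas the slides specify particular miss events. Each predictor lookup or update counts as one access and serves as a rough stand-in for energy. All names are hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped data cache with one word per line (assumption)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        index, tag = addr % self.num_lines, addr // self.num_lines
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


class LastValuePredictor:
    """PC-indexed last-value table; every lookup or update counts as one access."""

    def __init__(self):
        self.table = {}
        self.accesses = 0  # rough proxy for predictor energy

    def predict(self, pc):
        self.accesses += 1
        return self.table.get(pc)

    def update(self, pc, value):
        self.accesses += 1
        self.table[pc] = value


def run(trace, selective, cache_lines=4):
    """Replay (pc, addr, value) loads; return (predictor accesses, useful predictions).

    A prediction counts as useful when it is correct for a load that missed in
    the cache, i.e. when it could hide the miss latency.
    """
    cache = DirectMappedCache(cache_lines)
    lvp = LastValuePredictor()
    useful = 0
    for pc, addr, value in trace:
        hit = cache.access(addr)
        if selective and hit:
            continue  # selective scheme: cache hits never touch the predictor
        predicted = lvp.predict(pc)
        if not hit and predicted == value:
            useful += 1
        lvp.update(pc, value)
    return lvp.accesses, useful
```

Consider a toy trace that interleaves a streaming load (always missing) with a load that hits after its first access: `[load for i in range(5) for load in ((0x10, 4 * i, 7), (0x20, 1, 3))]`. On this trace the conventional predictor makes 20 accesses and the selective one makes 12. Both yield the same 4 useful predictions, because predictions on hits save no latency.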
Tools, Metrics and Some Results
Figure: The M-SIM simulator. A hardware configuration and a SPEC benchmark drive the cycle-level performance simulator, which produces a performance estimation and hardware access counts; power models turn the access counts into a power estimation.
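The estimation flow above can be sketched as follows, assuming a simplified Wattch-style model. In this model, energy is the sum over hardware units of (access count × energy per access). The per-access energies below are made-up placeholders and not M-SIM's actual power models, which also account for effects such as clock gating.

```python
# Hypothetical per-access energies in nanojoules (placeholders, not M-SIM values).
ENERGY_PER_ACCESS_NJ = {
    "icache": 0.50,
    "dcache": 0.80,
    "regfile": 0.20,
    "value_predictor": 0.30,
}


def estimate(access_counts, instructions, cycles, freq_ghz):
    """Return (IPC, energy in nJ, average power in W) from simulator counters."""
    ipc = instructions / cycles  # performance estimation
    energy_nj = sum(ENERGY_PER_ACCESS_NJ[unit] * count  # power model
                    for unit, count in access_counts.items())
    runtime_ns = cycles / freq_ghz  # a GHz clock runs freq_ghz cycles per ns
    avg_power_w = energy_nj / runtime_ns  # nJ per ns equals W
    return ipc, energy_nj, avg_power_w
```

Because the value predictor appears as just another unit with its own access count, cutting its accesses (as the selective scheme does) lowers the estimated energy directly.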
Design space exploration (DSE) of a Selective Load Value Prediction scheme suitable for energy-aware Simultaneous Multithreaded (SMT) architectures
Figure panels: a) Superscalar; b) SMT
Automatic Design Space Exploration Framework for Multicore Architecture Optimizations
Multi-objective optimization of advanced computer architectures using experts' domain knowledge
Huge design space (more than 19 parameters): for M-SIM 2, about 2.5 million billion (2.5 × 10^15) configurations
Manual design space exploration is impossible.
Multi-objective optimization (processing performance, power consumption, integration area, thermal dissipation) makes the problem even harder.
Solution: heuristic algorithms (genetic algorithms, bio-inspired algorithms)
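Multi-objective heuristics such as genetic algorithms do not return a single "best" configuration. They return the set of non-dominated (Pareto-optimal) configurations. A minimal sketch of that dominance test follows, assuming every objective is minimized (for example CPI and energy). The naive quadratic filter is for illustration only; algorithms such as NSGA-II use faster non-dominated sorting.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the non-dominated subset of points (naive O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For hypothetical (CPI, energy) pairs `[(1.2, 30.0), (1.0, 40.0), (1.5, 35.0), (0.9, 55.0)]`, the front keeps the first, second and fourth points. The third is dominated by `(1.2, 30.0)`, which is better in both objectives.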
Framework for Automatic Design Space Exploration (FADSE)
It must:
Simulate many individuals (architectural configurations), which is slow: 24 hours per generation on 96 cores, with one generation = 100 individuals
Implement reliability mechanisms (bounded wait for clients, resending individuals, checkpointing, etc.)
Accelerating the process:
Simulate fewer configurations: database integration (up to 67% reuse), evaluating only 2,500 configurations
Parallelize (distributed evaluation)
Add computer architecture domain knowledge (constraints, hierarchical parameters, fuzzy rules)
After 30 generations
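The database-integration idea (skipping simulations of configurations that were already evaluated) can be sketched with Python's built-in `sqlite3` module. This is an illustrative assumption and not FADSE's actual implementation. `EvaluationCache` and the `simulate` callback are hypothetical names. Configurations are stored in a canonical JSON form so that identical parameter sets match regardless of key order.

```python
import json
import sqlite3


class EvaluationCache:
    """Hypothetical result store: configurations simulated once are reused."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS results "
                        "(config TEXT PRIMARY KEY, objectives TEXT)")
        self.simulations = 0
        self.reused = 0

    def evaluate(self, config, simulate):
        key = json.dumps(config, sort_keys=True)  # canonical configuration key
        row = self.db.execute("SELECT objectives FROM results WHERE config = ?",
                              (key,)).fetchone()
        if row is not None:
            self.reused += 1  # already simulated: no simulator run needed
            return tuple(json.loads(row[0]))
        self.simulations += 1
        objectives = tuple(simulate(config))  # the expensive simulator run
        self.db.execute("INSERT INTO results VALUES (?, ?)",
                        (key, json.dumps(list(objectives))))
        self.db.commit()
        return objectives
```

Genetic operators often regenerate individuals that already appeared in earlier generations. A persistent store like this one therefore pays off more as the search converges.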