Power and Temperature-Aware Microarchitecture: A preliminary look
Karthik Ramani
University of Utah, September 28th, 2004
Motivation
ITRS Roadmap:
- Reasons for increasing power consumption:
  - Higher chip operating frequencies
  - Increased gate leakage of transistors
  - Higher interconnect capacitances and resistances
- Lack of an interconnect architecture design tool until 2009
- Inability of the interconnect to scale for performance beyond 2009
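The power trends listed above can be read against the standard CMOS power decomposition. It is shown here only as a reference point, not as the model used in this work. Here $\alpha$ is the switching activity factor, $C$ the switched capacitance (which includes interconnect capacitance), $V_{dd}$ the supply voltage, $f$ the clock frequency, and $I_{\text{leak}}$ the total leakage current (including gate leakage):

```latex
P_{\text{total}} = P_{\text{dyn}} + P_{\text{leak}}, \qquad
P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f, \qquad
P_{\text{leak}} = V_{dd} \, I_{\text{leak}}
```

Higher operating frequencies and larger interconnect capacitance raise $P_{\text{dyn}}$ directly, and increased gate leakage raises $P_{\text{leak}}$. Wire resistance enters mainly through delay rather than through this expression.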
Heterogeneous Interconnects: A starting point
- Two sets of interconnects:
  - Low-delay, high-power wires
  - Low-power, high-delay wires
- Easy to target specific instructions to either set
- Augurs well for a more sophisticated model
Interconnect transfers: Types
- Bypassed register value
- Ready register value
- Address transfer
- Store value
- Load value
Bypassed Register Values
- Operands produced in one cluster that are immediately required by another cluster
- Criticality is based on two factors:
  - Operand arrival time at the consuming cluster
  - Actual issue time of the sourcing instruction
- Criticality changes at runtime, so it needs a dynamic predictor
[Figure: a Rename & Dispatch stage feeding three clusters, each with an IQ, a register file, and FUs; the consumer instruction is dispatched at cycle 100, while the producing instruction completes execution at cycle 120]
The Data Criticality Predictor
- A table indexed by the lower-order bits of the instruction address, updated dynamically to indicate the criticality of data
- For each operand of an instruction, the difference between its arrival time and its use time is calculated:
  - Difference < threshold: critical
  - Difference > threshold: non-critical
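The predictor described above can be sketched as a small table of one-bit entries. This is a minimal sketch, not the design evaluated here. The following are all assumptions: the table size (1024 entries), the 5-cycle threshold, 4-byte instructions, a single entry per instruction rather than per operand, a "critical" initial prediction, and the class and method names.

```python
class CriticalityPredictor:
    """Sketch of a PC-indexed data-criticality predictor.

    Assumed, not from the slides: 2**10 one-bit entries, a 5-cycle
    threshold, 4-byte instructions, one entry per instruction rather
    than per operand, and a "critical" initial prediction.
    """

    def __init__(self, index_bits: int = 10, threshold: int = 5) -> None:
        self.index_bits = index_bits
        self.threshold = threshold
        # True = critical, False = non-critical; start out conservatively critical.
        self.table = [True] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        # Drop the two always-zero bits of a 4-byte-aligned address, then keep
        # the lower-order bits as the table index (distinct PCs may alias).
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc: int) -> bool:
        """Return True if data for the instruction at pc is predicted critical."""
        return self.table[self._index(pc)]

    def update(self, pc: int, arrival_cycle: int, use_cycle: int) -> None:
        """Train the entry once an operand's arrival and use times are known."""
        difference = use_cycle - arrival_cycle
        # Small or negative slack means the consumer was waiting on the data.
        # Equality counts as non-critical here, an arbitrary choice.
        self.table[self._index(pc)] = difference < self.threshold


predictor = CriticalityPredictor()
predictor.update(0x1004, arrival_cycle=118, use_cycle=120)  # used 2 cycles after arrival
predictor.update(0x1008, arrival_cycle=100, use_cycle=140)  # waited 40 cycles before use
print(predictor.predict(0x1004), predictor.predict(0x1008))  # True False
```

Indexing by low-order address bits keeps the table small and fast to access. The trade-off is that unrelated instructions whose addresses share those bits will alias to the same entry.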
Summary of transfers
Critical | Non-critical
Load value | Store value
Effective address, unpredicted | Effective address, predicted
Bypassed register value | Ready register value
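One way the summary above could drive wire selection is a small steering function. This is a hypothetical sketch: the enum and function names are invented. Letting the predictor's verdict override the table for bypassed register values is an assumption, based on the earlier point that their criticality changes at runtime.

```python
from enum import Enum, auto


class Transfer(Enum):
    """Transfer types from the slides; addresses split by prediction status."""
    BYPASSED_REGISTER = auto()
    READY_REGISTER = auto()
    LOAD_VALUE = auto()
    STORE_VALUE = auto()
    ADDRESS_UNPREDICTED = auto()
    ADDRESS_PREDICTED = auto()


class WireSet(Enum):
    FAST = "low-delay, high-power wires"
    LOW_POWER = "low-power, high-delay wires"


# Static classification following the summary of transfers.
CRITICAL = {Transfer.LOAD_VALUE, Transfer.ADDRESS_UNPREDICTED, Transfer.BYPASSED_REGISTER}


def steer(kind: Transfer, predicted_critical: bool = True) -> WireSet:
    """Choose a wire set for one transfer (hypothetical helper).

    Bypassed register values are the runtime-dependent case, so the caller
    supplies the criticality predictor's verdict for them.
    """
    if kind is Transfer.BYPASSED_REGISTER:
        return WireSet.FAST if predicted_critical else WireSet.LOW_POWER
    return WireSet.FAST if kind in CRITICAL else WireSet.LOW_POWER


print(steer(Transfer.STORE_VALUE).value)  # low-power, high-delay wires
```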
Result summary
- Two kinds of non-critical transfers:
  - Data that are not immediately used: 36% of all data transfers
  - Verification of address predictions: 13% of all data transfers
- In the criticality-based case, 49% of all data transfers go through the power-optimized wires
- Performance penalty: only 2.5%
- Potential energy savings of around 50% in the interconnects
Things that are missing
- Power modeling for the processor as a whole
- Implications for transient temperature variations under varying workloads
- A good on-chip interconnect power/temperature simulator
- A complexity-effective design for the criticality predictor
Interconnect simulator: Problems
It should account for:
- The number of wires available in the particular process
- A 3-D space for routing the wires
- The design rule constraints
This makes it more of a layout optimization problem.
What we propose to do
- Wattch: incorporated into a scalable 16-cluster system
- HotSpot: transient temperature model
- HotLeakage: leakage power model
- Build a prototype layout that satisfies the simulator requirements above
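HotSpot models temperature with an equivalent circuit of thermal resistances and capacitances. The sketch below is a much-simplified illustration of what a transient model captures. It collapses that network to a single block with one thermal resistance to ambient and one thermal capacitance. The function name and all parameter values are assumptions, not HotSpot's interface or calibrated numbers.

```python
def simulate_temperature(power_trace, r_th, c_th, t_ambient, dt):
    """Single-node lumped thermal RC model, stepped with forward Euler.

    Solves C * dT/dt = P(t) - (T - T_ambient) / R for one block.
    r_th is in K/W, c_th in J/K, dt in seconds, power_trace in watts.
    Forward Euler stays stable and non-oscillating while dt < r_th * c_th.
    """
    temperature = t_ambient
    history = []
    for power in power_trace:
        heat_to_ambient = (temperature - t_ambient) / r_th  # watts leaving the block
        temperature += dt * (power - heat_to_ambient) / c_th
        history.append(temperature)
    return history


# Illustrative parameters only (time constant R*C = 10 ms); not measured values.
trace = [10.0] * 2000 + [2.0] * 2000  # a hot phase followed by a cool phase
temps = simulate_temperature(trace, r_th=0.5, c_th=0.02, t_ambient=45.0, dt=1e-4)
print(round(temps[1999], 2), round(temps[-1], 2))  # 50.0 46.0
```

When the workload phase changes, the temperature does not jump. It relaxes toward the new steady state $T_{\text{ambient}} + P \cdot R$ over a few time constants, which is the transient behavior a steady-state model would miss.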
Wattch
- Power model from Princeton University
- Simulates an out-of-order processor (Alpha 21264)
- Caveat: interconnects are not accurately simulated
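Wattch estimates power from per-structure activity, and its clock-gating options scale a structure's power with how much it is used. The sketch below shows one such policy for a single structure: power grows linearly with the number of ports accessed, and an idle structure still draws 10% of its peak. We believe the 10% figure mirrors Wattch's most aggressive gating option. Treat that figure, the function names, and the example numbers as assumptions rather than Wattch's code.

```python
def unit_power(max_power, ports_used, total_ports, idle_fraction=0.10):
    """Power of one structure in one cycle under activity-scaled clock gating.

    Power scales linearly with the number of ports accessed; an unused unit
    is assumed to still draw idle_fraction of its maximum power.
    """
    if not 0 <= ports_used <= total_ports:
        raise ValueError("ports_used must lie between 0 and total_ports")
    if ports_used == 0:
        return idle_fraction * max_power
    return max_power * ports_used / total_ports


def average_power(max_power, port_trace, total_ports):
    """Average the per-cycle power over a trace of per-cycle port counts."""
    return sum(unit_power(max_power, used, total_ports) for used in port_trace) / len(port_trace)


# Hypothetical 4-ported register file with a 2.0 W peak, observed over four cycles.
print(average_power(2.0, [0, 2, 4, 0], total_ports=4))
```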
Wattch modified
- Wattch uses a single, unified instruction window
- Modified issue queue model:
  - Separate Int and FP wakeup logic
  - Separate Int and FP selection logic
- Helps in distributing the logic efficiently
Wattch modified
- Original: a single result bus, set of FUs, and register file
- Modified: distributed units
  - Separate integer and floating-point register files
  - Separate integer and floating-point execution and result bus units
Wattch modified
- Wattch: a simple Alpha model, modified for a scalable 16-cluster system
- Modular: easy to adapt and test
- Caveat: there is a lot of scope for improvement
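The modular, distributed organization described on these slides can be sketched as per-cluster bookkeeping. Each of the 16 clusters keeps separate integer and floating-point structures, and energy is accumulated per access. The structure names, the class, and the per-access energies are placeholders for illustration, not values or code from the modified simulator.

```python
from collections import Counter

NUM_CLUSTERS = 16
DOMAINS = ("int", "fp")  # separate integer and floating-point structures
UNITS = ("wakeup", "select", "regfile", "fu", "result_bus")

# Placeholder per-access energies in arbitrary units, not Wattch outputs.
PLACEHOLDER_ENERGY = {(d, u): (1.0 if d == "int" else 1.5) for d in DOMAINS for u in UNITS}


class ClusteredEnergyModel:
    """Per-cluster, per-domain access counting for a clustered core (sketch)."""

    def __init__(self, energy_per_access, num_clusters=NUM_CLUSTERS):
        self.energy_per_access = energy_per_access  # {(domain, unit): energy}
        self.num_clusters = num_clusters
        self.accesses = Counter()  # {(cluster, domain, unit): access count}

    def record(self, cluster, domain, unit, count=1):
        """Count accesses to one structure in one cluster."""
        if not 0 <= cluster < self.num_clusters:
            raise ValueError(f"cluster {cluster} is out of range")
        if (domain, unit) not in self.energy_per_access:
            raise KeyError((domain, unit))
        self.accesses[(cluster, domain, unit)] += count

    def cluster_energy(self, cluster):
        """Energy spent so far by every structure in one cluster."""
        return sum(n * self.energy_per_access[(d, u)]
                   for (c, d, u), n in self.accesses.items() if c == cluster)

    def total_energy(self):
        """Energy summed over all clusters."""
        return sum(self.cluster_energy(c) for c in range(self.num_clusters))


model = ClusteredEnergyModel(PLACEHOLDER_ENERGY)
model.record(0, "int", "regfile", count=3)
model.record(15, "fp", "fu", count=2)
print(model.cluster_energy(0), model.cluster_energy(15), model.total_energy())  # 3.0 3.0 6.0
```

Keying counters by (cluster, domain, unit) keeps each cluster's accounting independent. Changing the cluster count or adding a structure therefore touches only the configuration, in line with the modularity goal above.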
Visual Feature Recognition
Elastic Bunch Graph Matching (EBGM)
History
- No single established algorithm for feature recognition
- Many algorithms for face and object recognition
- Few feature recognition benchmarks, such as FERET
- Eigenfaces: the traditional approach to face recognition
Motivation: EBGM
[Figure: Steps in face recognition: flesh toning → segmentation → face detection → face recognition]
- No segmentation needed in EBGM!
EBGM
[Figure: Steps involved in EBGM: normalization/preprocessing → face graph creation → face identification]
- Looks easy
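The face identification step compares a probe's face graph against stored graphs. A common similarity choice is the average magnitude-based jet similarity across landmarks. The sketch below assumes that landmarks are already located and that every graph has the same landmarks in the same order, with each landmark carrying a jet of magnitudes. It omits the elastic graph-fitting and geometry terms. The function names and the toy data are hypothetical.

```python
import math


def jet_similarity(a, b):
    """Normalized dot product of two jets' magnitude vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def graph_similarity(probe, model):
    """Average jet similarity over corresponding landmarks of two face graphs."""
    if len(probe) != len(model):
        raise ValueError("face graphs must have the same landmarks")
    return sum(jet_similarity(p, m) for p, m in zip(probe, model)) / len(probe)


def identify(probe, gallery):
    """Return the gallery identity whose face graph best matches the probe."""
    return max(gallery, key=lambda name: graph_similarity(probe, gallery[name]))


# Toy 2-landmark graphs with 3-element jets (real jets are much longer).
gallery = {
    "person_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "person_b": [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
}
probe = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
print(identify(probe, gallery))  # person_a
```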
EBGM: Mathematically
- Image descriptions are based on a (Gabor) wavelet transform
- Gabor jets are extracted at each landmark
- Local image information around each node is the key
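To make the jet idea concrete, the sketch below builds the complex Gabor kernel family commonly used in the EBGM literature and convolves it with a window around one landmark. The family has 5 scales and 8 orientations, with envelope width σ = 2π. Those parameters, the truncation radius of 8 pixels (much smaller than the envelope of the coarsest kernels), and all names are assumptions for illustration.

```python
import cmath
import math

SIGMA = 2 * math.pi  # assumed envelope width relative to wavelength


def gabor_kernel(nu, mu, x, y):
    """Complex Gabor wavelet at pixel offset (x, y) for scale nu and orientation mu."""
    k = 2 ** (-(nu + 2) / 2) * math.pi      # spatial frequency for this scale
    phi = mu * math.pi / 8                  # one of 8 orientations
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    envelope = (k * k / SIGMA ** 2) * math.exp(-k * k * (x * x + y * y) / (2 * SIGMA ** 2))
    # Subtracting exp(-sigma^2 / 2) removes the kernel's DC response.
    wave = cmath.exp(1j * (kx * x + ky * y)) - math.exp(-SIGMA ** 2 / 2)
    return envelope * wave


def extract_jet(image, x0, y0, radius=8, scales=5, orientations=8):
    """Convolve a truncated window around (x0, y0) with every kernel in the family."""
    height, width = len(image), len(image[0])
    jet = []
    for nu in range(scales):
        for mu in range(orientations):
            coeff = 0j
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    x, y = x0 + dx, y0 + dy
                    if 0 <= x < width and 0 <= y < height:
                        # Convolution: image(x0 + d) * kernel(-d)
                        coeff += image[y][x] * gabor_kernel(nu, mu, -dx, -dy)
            jet.append(coeff)
    return jet


# Toy 17x17 image of vertical stripes with a landmark at its center.
image = [[1.0 if (x // 2) % 2 else 0.0 for x in range(17)] for y in range(17)]
jet = extract_jet(image, 8, 8)
magnitudes = [abs(c) for c in jet]  # magnitude-based jet similarity uses these
print(len(jet))  # 40
```

Even this small, direct version evaluates 40 × 17 × 17 = 11,560 kernel samples per landmark. A practical implementation would likely precompute the kernels or use FFT-based convolution.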
EBGM: What is missing?
- Landmark localization is less reliable
- Small differences in face orientation are currently difficult to track
- Gabor jets are compute intensive
Questions? Thank you