Electrical and Computer Engineering Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results David J.

Presentation transcript:

Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results. David J. Lilja, Electrical and Computer Engineering, University of Minnesota.

The Problem: We can generate heaps of data, but it's noisy, and there is too much of it to understand or use efficiently. [Figure: benchmark programs producing heaps o' data]

A Solution: Statistical design of experiments techniques. Compress complex benchmark results. Exploit the results in interesting ways. Extract new insights. Demonstrate using microarchitecture-aware floorplanning and benchmark classification.

Why Do We Need Statistics? Draw meaningful conclusions in the presence of noisy measurements (noise filtering). Aggregate data into meaningful information (data compression). [Figure: heaps o' data]


Design of Experiments for Data Compression: The data are compressed into the effects of each input (A, B, C) and the effects of the interactions (AB, AC, BC, ABC). [Table residue: inputs A, B, C with values V1, V2, V3, V4]
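To make the idea of compressing results into effects concrete, here is a minimal Python sketch. It assumes a simplified case that the slide does not spell out: each input A, B, C takes only two levels, coded -1 and +1, and one response is measured per combination. The function name `factorial_effects` and the sample response are illustrative, not from the talk.

```python
from itertools import product


def factorial_effects(responses):
    """Estimate main and interaction effects from a 2-level full factorial.

    `responses` maps each tuple of coded levels (-1 or +1 for A, B, C)
    to one measured response. The effect of a term is the mean response
    where its sign is +1 minus the mean response where its sign is -1.
    """
    names = "ABC"
    runs = list(product((-1, 1), repeat=3))
    effects = {}
    # Every non-empty subset of factors: A, B, AB, C, AC, BC, ABC.
    for mask in range(1, 8):
        subset = [i for i in range(3) if mask & (1 << i)]
        label = "".join(names[i] for i in subset)
        total = 0.0
        for run in runs:
            sign = 1
            for i in subset:
                sign *= run[i]
            total += sign * responses[run]
        effects[label] = total / (len(runs) / 2)
    return effects


# Hypothetical response: A matters, B and C matter only together.
sample = {run: 10 + 2 * run[0] + 3 * run[1] * run[2]
          for run in product((-1, 1), repeat=3)}
effects = factorial_effects(sample)
```

Eight raw measurements reduce to seven effect values, and in this made-up case only A and BC are non-zero, which is the kind of summary the slide describes.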

Types of Designs of Experiments (example: m = 3 inputs A, B, C, each with v = 4 values V1 to V4): Full factorial design with replication: O(v^m) experiments = O(4^3). Fractional factorial designs: O(2^m) experiments = O(2^3). Multifactorial design (P&B, Plackett and Burman): O(m) experiments = O(3); main effects only, no interactions. m-factor resolution x designs: k O(2^m) experiments = k O(2^3); selected interactions.
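The experiment counts above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, replication is ignored, and the Plackett and Burman count uses the usual rule that the number of runs is the smallest multiple of 4 that exceeds the number of factors.

```python
def full_factorial_runs(values_per_factor, num_factors):
    """Runs for a full factorial design without replication: v^m."""
    return values_per_factor ** num_factors


def two_level_runs(num_factors):
    """Runs when every factor is restricted to two levels: 2^m."""
    return 2 ** num_factors


def plackett_burman_runs(num_factors):
    """Smallest multiple of 4 that is strictly greater than m."""
    return 4 * (num_factors // 4 + 1)


# The slide's example: m = 3 inputs with v = 4 values each.
counts = (full_factorial_runs(4, 3), two_level_runs(3), plackett_burman_runs(3))
```

For the slide's example this gives 64, 8, and 4 runs respectively, which shows how quickly the cheaper designs pull ahead as the number of inputs grows.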

Example: Architecture-Aware Floor-Planner. V. Nookala, S. Sapatnekar, D. Lilja, DAC 05.

Motivation: Imbalance between device and wire delays. Global wire delays > system clock cycle in nanometer technology. [Figure: a wire crossing the layout]

Solution: Wire-pipelining. If delay > a clock cycle, insert flip-flops (FFs) along a wire. Several methods exist for optimal FF insertion on a wire: Li et al. [DATE 02], Cocchini et al. [ICCAD 02], Hassoun et al. [ICCAD 02]. But what about the performance impact of the pipeline delays? [Figure: layout with FFs inserted along a wire]
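As a rough illustration of the rule "if delay > a clock cycle, insert flip-flops", the sketch below counts how many flip-flops a wire would need. It assumes the wire is split into equal segments that each fit in one cycle and ignores flip-flop setup and propagation overhead; the cited insertion methods are more sophisticated, and the function name is hypothetical.

```python
import math


def flip_flops_needed(wire_delay, cycle_time):
    """Flip-flops needed so that every wire segment fits in one cycle.

    Assumes equal-length segments and zero flip-flop overhead. Each
    inserted flip-flop adds one cycle of latency to the bus.
    """
    if wire_delay <= cycle_time:
        return 0
    return math.ceil(wire_delay / cycle_time) - 1


# Hypothetical delays in nanoseconds with a 1.0 ns clock.
short_wire = flip_flops_needed(0.8, 1.0)
long_wire = flip_flops_needed(2.5, 1.0)
```

Under these assumptions the 0.8 ns wire needs no flip-flop, while the 2.5 ns wire needs two and therefore adds two cycles of bus latency.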

Impact on Performance: Execution time = num-instr * cycles/instr (CPI) * cycle-time. [Annotation: wire-pipelining]

Impact on Performance: Key idea: some buses are critical; some can be freely pipelined without (much) penalty. Execution time = num-instr * cycles/instr (CPI) * cycle-time. [Annotation: wire-pipelining]

Change Objective Function: Traditional physical design objectives: minimize area, total wire length, etc. New objective: optimize only throughput-critical wires to maximize overall performance. Execution time = num-instr * cycles/instr (CPI) * cycle-time. [Annotation: wire-pipelining]
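The execution-time equation can be worked through with a small Python sketch. All numbers below are hypothetical and chosen only to show the trade-off: wire-pipelining allows a shorter cycle time but adds bus latency, which raises CPI.

```python
def execution_time(num_instr, cpi, cycle_time):
    """Execution time = num-instr * cycles/instr (CPI) * cycle-time."""
    return num_instr * cpi * cycle_time


# Hypothetical workload of one billion instructions.
baseline = execution_time(1e9, 1.0, 1.0e-9)    # no wire-pipelining
pipelined = execution_time(1e9, 1.1, 0.8e-9)   # higher CPI, faster clock
speedup = baseline / pipelined
```

In this made-up case the faster clock outweighs the CPI penalty. Pipelining a critical bus could raise CPI enough to cancel the gain, which is why the objective targets only throughput-critical wires.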

Conventional Microarchitecture Interaction with Floor Planner. [Diagram labels: Simulation Methodology, Physical Design, µ-arch, Benchmarks, CPI info, Frequency]

Microarchitecture-aware Physical Design: Incorporate wire-pipelining models into the simulator. Extra pipeline stages in the processor. The simulator needs to adjust operation latencies. [Diagram labels: Simulation Methodology, Physical Design, µ-arch, Benchmarks, CPI info, Frequency, Layout]

But There Are Problems: Simulation is too slow (many instructions per simulated instruction). Numerous benchmark programs to consider. Exponential search space: thousands of combinations tried in the physical design step. [Diagram labels: Simulation Methodology, Physical Design, µ-arch, Benchmarks, CPI info, Frequency, Layout]

Design of Experiments Methodology: MinneSPEC reduced input sets. The number of simulations is linear in the number of buses (if no interactions). [Diagram labels: Design of Experiments based Simulation Methodology, Floorplanning, Validation, µ-arch, benchmarks, Bus and interaction weights, Layout, Frequency]
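One way to picture how simulation results become bus weights is to take the main effect of each factor column on the measured response. The sketch below is an assumption about the general technique, not the authors' code: `main_effects`, the 4-run design, and the response values are all hypothetical.

```python
def main_effects(design, responses):
    """Main effect of each column of a 2-level design on the response.

    `design` is a list of rows of coded levels (-1 or +1), one row per
    simulation. `responses` holds one measured value per row, for
    example CPI. The effect is the mean response at +1 minus the mean
    response at -1.
    """
    runs = len(design)
    columns = len(design[0])
    return [
        sum(design[r][c] * responses[r] for r in range(runs)) / (runs / 2)
        for c in range(columns)
    ]


# Hypothetical 4-run design for three buses (third column = first * second).
design = [
    [-1, -1, +1],
    [-1, +1, -1],
    [+1, -1, -1],
    [+1, +1, +1],
]
responses = [8, 6, 12, 14]   # made-up measurements, one per simulation
weights = main_effects(design, responses)
```

Here the first bus has the largest effect and the second has none, so a floorplanner using these values as weights would work hardest on keeping the first bus short.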

Related Floorplanning Work: Simulated Annealing (SA). CPI look-up table [Liao et al, DAC 04]. Bus access ratios from simulation profiles; minimize the weighted sum of bus latencies [Ekpanyapong et al, DAC 04]. Throughput sensitivity models for a selected few critical paths; limited sampling for a large solution space [Jagannathan et al, ASPDAC 05]. Our approach: design of experiments to identify the criticality of each bus.

Microarchitecture and factors: 22 buses. 19 factors in the experimental design. Some factors model multiple buses. [Block diagram labels: Fetch, Decode, RUU, REG, BPRED, IL1, DL1, L2, ITLB, LSQ, DTLB, IADD1, IADD2, IADD3, IMULT, FMULT, FADD]

2-level Resolution III Design: 2 levels for each factor: the lowest and highest possible values (range). Latency range of buses: min = 0, max = chip corner-to-corner wire latency. 19 factors require 32 simulations (nearest power of 2). Captured by a design matrix (32x19): 32 rows for the 32 simulations, 19 columns for the factor values.
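A 32x19 two-level design matrix of this kind can be built as in the sketch below. The slides do not say which generators were used, so this is one possible construction under stated assumptions: five base factors are varied over all 32 combinations, and the remaining columns are products of base columns. Level -1 would stand for the minimum bus latency (0) and +1 for the maximum (corner-to-corner). The function name is hypothetical.

```python
from itertools import combinations, product


def two_level_design(num_factors):
    """Build a 2-level design with one -1/+1 column per factor.

    Uses k base factors, where 2**k - 1 >= num_factors, and fills the
    columns with products of base columns (singles first, then pairs,
    then triples, and so on). Columns built from pairs are aliased with
    two-factor interactions, so the result is at best resolution III.
    """
    k = 1
    while 2 ** k - 1 < num_factors:
        k += 1
    base_runs = list(product((-1, 1), repeat=k))
    generators = [
        combo
        for size in range(1, k + 1)
        for combo in combinations(range(k), size)
    ][:num_factors]
    matrix = []
    for run in base_runs:
        row = []
        for generator in generators:
            level = 1
            for index in generator:
                level *= run[index]
            row.append(level)
        matrix.append(row)
    return matrix


design = two_level_design(19)   # 32 rows (simulations) x 19 columns (factors)
```

Every column is balanced and any two columns are orthogonal, which is what lets each factor's main effect be estimated from the same 32 simulations.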

Experimental setup: Nine SPEC 2000 benchmarks with MinneSPEC reduced input sets. SimpleScalar simulator. Floorplanner: PARQUET, simulated annealing based. Objective function: minimize the weighted sum of bus latencies; secondarily minimize aspect ratio and area.
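The primary objective, a weighted sum of bus latencies, can be sketched in a few lines of Python. This is not PARQUET's interface; the function name, bus names, weights, and latencies are all hypothetical and serve only to show how the cost would rank two candidate floorplans.

```python
def weighted_bus_latency(weights, latencies):
    """Weighted sum of bus latencies, the primary cost to minimize.

    `weights` maps a bus name to its criticality weight and `latencies`
    maps the same bus name to its latency in cycles for one floorplan.
    """
    return sum(weights[bus] * latencies[bus] for bus in weights)


# Hypothetical weights and two candidate floorplans.
weights = {"critical_bus": 5, "tolerant_bus": 1}
floorplan_a = {"critical_bus": 2, "tolerant_bus": 0}
floorplan_b = {"critical_bus": 0, "tolerant_bus": 2}

cost_a = weighted_bus_latency(weights, floorplan_a)
cost_b = weighted_bus_latency(weights, floorplan_b)
```

Both floorplans contain the same total number of pipeline stages, but the second places them on the tolerant bus and therefore has the lower cost. A simulated-annealing floorplanner would evaluate a cost like this for each candidate move.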

Comparisons
Case    Description
SFP     Our statistical floorplanner
acc     Access ratios from [Ekpanyapong et al, DAC 04]
minWL   Traditional floorplanning

Typical Results for Single Benchmark

Averaged Over All Benchmarks: Compared to acc, an improvement of 3-7 percentage points. Better improvements over acc at higher frequencies. SFP-comb is within about 1-3 percentage points of SFP.

Summary: Use statistical design of experiments. Compress benchmark data into critical bus weights, which are used by the microarchitecture-aware floorplanner. Optimizes insertion of pipeline delays on wires to maximize performance. Extend the methodology to other critical objectives: power consumption, heat distribution.

Collaborators and Funders: Vidyasagar Nookala, Joshua J. Yi, Sachin Sapatnekar. Semiconductor Research Corporation (SRC), Intel, IBM.