Optimizations for a Simulator Construction System Supporting Reusable Components David A. Penry and David I. August The Liberty Architecture Research Group.

Presentation transcript:

Optimizations for a Simulator Construction System Supporting Reusable Components. David A. Penry and David I. August, The Liberty Architecture Research Group, Princeton University

2 Architectural Exploration: Architectural options are studied using simulators. More iterations = better decisions. Need a fast path to a simulator. Need a fast simulator. [Figure labels: Architecture Options, Architectural Simulator]

3 Simulator Construction Systems: Reuse simulator infrastructure. [Figure labels: Architecture Description, Simulator Builder, Architectural Simulator Instance] But it must still be possible to reuse descriptions: structural composition; medium-grained components; standard communication contracts; high parameterizability; separation of concerns.

4 The Reuse Penalty: Reusability leads to a speed penalty: more component instances, more signals, more general code. Therefore, reusable systems are often slower. How can we mitigate the reuse penalty?

5 Liberty Simulation Environment: Simulator construction system for high reuse. Two-tiered specifications: leaf module templates in C; netlisting language for instantiation and customization. Three-signal standard communications contract (Enable, Data, Ack) with overrides (control functions). Code is generated.

6 Contrast: SystemC. Simulator construction libraries (C++). Partially supports reuse: (+) structural composition; (+) module granularity varies; (?) communications contracts by convention; (-) low parameterizability; (-) separation of concerns. Description is a C++ program.

7 Models of Computation: SystemC uses Discrete Event (DE). LSE uses Heterogeneous Synchronous Reactive (HSR), Edwards (1997): unparsed code blocks (black boxes); values begin unresolved and resolve monotonically; chaotic scheduling. [Figure: block diagram with blocks A, B, C, D]
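The HSR idea on this slide (signals start unresolved, only ever move to a resolved value, and blocks may be invoked in any order) can be illustrated with a minimal sketch. This is not LSE code: UNRESOLVED, run_chaotic, and the three example blocks are hypothetical names made up for illustration, and each block is treated as a black-box function over a dictionary of signals.

```python
# Hypothetical sketch of chaotic scheduling over monotonically resolving
# signals. None of these names come from LSE; they are assumptions.
UNRESOLVED = None  # sentinel: the signal has no value yet in this cycle


def run_chaotic(blocks, signals):
    """Invoke black-box blocks in arbitrary order until nothing changes.

    blocks:  list of functions; each takes the signal dict and returns
             {signal_name: value} for the outputs it can resolve now.
    signals: dict mapping signal name -> value or UNRESOLVED.
    """
    changed = True
    while changed:
        changed = False
        for block in blocks:
            for name, value in block(signals).items():
                if value is UNRESOLVED:
                    continue
                if signals[name] is UNRESOLVED:
                    signals[name] = value  # unresolved -> resolved
                    changed = True
                elif signals[name] != value:
                    # A resolved value must never change within a cycle.
                    raise ValueError(f"non-monotonic update of {name}")
    return signals


def block_a(s):
    return {"a": 1}


def block_b(s):
    return {"b": s["a"] + 1} if s["a"] is not UNRESOLVED else {}


def block_c(s):
    ready = s["a"] is not UNRESOLVED and s["b"] is not UNRESOLVED
    return {"c": s["a"] + s["b"]} if ready else {}


# Deliberately "wrong" order: the fixed point is still reached.
result = run_chaotic([block_c, block_b, block_a],
                     {"a": UNRESOLVED, "b": UNRESOLVED, "c": UNRESOLVED})
```

Because every signal resolves at most once, the loop terminates and the final values do not depend on the order in which blocks are invoked; only the number of invocations does, which is what a static schedule tries to reduce.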

8 Potential HSR Benefits vs. DE: Static schedules possible. Lower per-signal overhead. Use of unresolved value to avoid redundant computation. [Figure: block diagram with blocks A, B, C, D]

9 Experimental Methodology: Three models of a 4-way out-of-order microprocessor: SystemC using custom speed-optimized components; LSE model using custom speed-optimized components; LSE model using standard reusable components. 9 benchmarks (CPU 2000/MediaBench). See paper for compiler, etc. [Table: columns Model, Instances, Signals, Non-edge signals; rows Custom SystemC, Custom LSE, Reusable LSE]

10 Custom LSE vs. SystemC: Custom LSE outperforms custom SystemC: reduction in overhead; use of unresolved signal value; static instantiation and code specialization. Dynamic schedule for both. [Table: columns Model, Cycles/sec, Speedup; rows Custom SystemC, Custom LSE]

11 Reuse Penalty: Reusable model suffers a large reuse penalty (0.26): many more signals; many more non-edge signals; more components. All dynamic schedules. [Table: columns Model, Cycles/sec, Speedup; rows Custom SystemC, Custom LSE, Reusable LSE]

12 Creating Static Schedules: Edwards' algorithm (1997): Construct a signal dependency graph. Break it into strongly connected components (SCCs). Schedule in topological order. Partition each SCC into a head and a tail. Schedule the tail recursively, then repeat the head (any order) and the tail's schedule. Coalesce. [Figure: block diagram with blocks A, B, C, D]

14 Creating Static Schedules (build step): [Figure: dependency graph with nodes a, b, c] Schedule: a b c

15 Creating Static Schedules (build step): [Figure: dependency graph with nodes a, b, c; SCC partitioned into head (H) and tail (T)] Schedule: 1 b 4

16 Creating Static Schedules (build step): [Figure: dependency graph with nodes a, b, c; SCC partitioned into head (H) and tail (T)]

17 Creating Static Schedules (final step): [Figure: blocks A, B, C, D with an SCC partitioned into head (H) and tail (T)] Schedule: A B C B (D). Choosing an optimal partition takes exponential time.
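The scheduling steps on these slides can be sketched as follows. This is a simplified illustration, not the LSE implementation: the function names, the four-block example graph, and the head-selection heuristic (pick_head=max, always a single-node head) are assumptions. With a one-node head, "repeat head and tail's schedule" is read here as tail schedule, head, tail schedule. The coalescing step is omitted.

```python
# Hypothetical sketch of SCC-based static scheduling in the style of
# Edwards' algorithm. Names and heuristics are illustrative assumptions.
def sccs_in_topological_order(nodes, edges):
    """Tarjan's algorithm restricted to `nodes`.

    edges maps a node to the nodes that depend on it. Returns a list of
    SCCs (lists of nodes) with producers before consumers.
    """
    index, low = {}, {}
    stack, on_stack, result = [], set(), []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in nodes:
                continue  # ignore edges leaving the current subgraph
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    result.reverse()  # Tarjan emits SCCs in reverse topological order
    return result


def schedule(nodes, edges, pick_head=max):
    """Return a static invocation order for the blocks in `nodes`."""
    order = []
    for comp in sccs_in_topological_order(set(nodes), edges):
        head = pick_head(comp)        # assumed heuristic: one-node head
        tail = set(comp) - {head}
        tail_sched = schedule(tail, edges, pick_head)
        # Tail, then head, then tail again (one round per head node).
        order += tail_sched + [head] + tail_sched
    return order


# Assumed example graph: A feeds B, B and C form a cycle, C feeds D.
example_edges = {"A": ["B"], "B": ["C"], "C": ["B", "D"]}
example_schedule = schedule({"A", "B", "C", "D"}, example_edges)
# -> ['A', 'B', 'C', 'B', 'D']
```

An acyclic block is simply an SCC with an empty tail, so it appears once. The extra invocation of B is the price of the cycle; a different head choice gives a different, possibly longer, schedule, which is why the choice of partition matters.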

18 Dynamic Sub-schedule Embedding: SCCs arise due to incomplete information. "Optimal" schedules are optimal only with respect to the available information. An "optimal" schedule may be worse than a dynamic one. When an SCC is "too big", just schedule that section dynamically. [Figure: blocks A, B, C]
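One way to picture the embedding is as a plan whose entries are either single statically ordered blocks or whole regions handed to a dynamic scheduler. The sketch below is only an assumption about how such a plan could be represented: embed_dynamic, the ("static", ...) and ("dynamic", ...) tags, and the size threshold are hypothetical and do not come from LSE.

```python
# Hypothetical sketch: mix static entries with dynamically scheduled regions.
def embed_dynamic(sccs, static_scheduler, max_static_size=4):
    """Build an execution plan from SCCs given in topological order.

    sccs:             list of SCCs, each a list of block names.
    static_scheduler: function returning a static order for one SCC.
    max_static_size:  assumed threshold above which an SCC is "too big".
    """
    plan = []
    for comp in sccs:
        if len(comp) > max_static_size:
            # Too big: let a dynamic scheduler resolve this region at run time.
            plan.append(("dynamic", frozenset(comp)))
        else:
            plan.extend(("static", block) for block in static_scheduler(comp))
    return plan


# Toy static scheduler for the example: keep the given order.
example_plan = embed_dynamic(
    [["A"], ["B", "C", "D", "E", "F"], ["G"]],
    static_scheduler=list,
)
```

Here A and G keep fixed positions in the schedule, while the five-block SCC is left as one dynamically scheduled section between them.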

19 Dependency Information Enhancement: In practice, we see big SCCs. Peek inside the black box: simple parsing of communication overrides (control functions); can ask the user to describe internal dependencies (not too painful, because the description is reused). [Figure: blocks A, B, C]

20 Evaluation of Information Enhancement: Control function parsing is more useful alone, though not principally through scheduling. It is important to have both kinds of enhancement. [Table: columns Optimization, Cycles/sec, Speedup; rows No static scheduling, With control function parsing, With internal dependencies, With both]

21 Reuse Penalty Revisited: Reuse penalty mitigated in part. Reusable LSE model is 6% faster than custom SystemC. [Table: columns Model, Cycles/sec, Speedup, Build time (s); rows Custom SystemC, Custom LSE, Reusable LSE w/o optimization, Reusable LSE with optimization]

22 Conclusions: A tradeoff exists between speed and reuse. The simulator construction system can help: a higher base speed makes the reuse penalty less painful. Optimizations are possible with the HSR model: the scheduler's ability to adapt to the available information is powerful; this adaptation is not possible with DE. You can have high reuse at reasonable speeds.

23 Future Work: Release of LSE Fall. Hybrid model of computation: embed HSR in DE, DE in HSR. Automatic extraction of HSR portions from DE.

24 Other Optimizations: Improved block coalescing (see paper). Code specialization: implementation of APIs depends upon environment.