Timing Analysis of Embedded Software for Families of Microarchitectures Jan Reineke, UC Berkeley Edward A. Lee, UC Berkeley Representing Distributed Sense.

Slides:



Advertisements
Similar presentations
Learning Cache Models by Measurements Jan Reineke joint work with Andreas Abel Uppsala University December 20, 2012.
Advertisements

Sungjun Kim Columbia University Edward A. Lee UC Berkeley
CS1104: Computer Organisation School of Computing National University of Singapore.
CS 258 Parallel Computer Architecture Lecture 15.1 DASH: Directory Architecture for Shared memory Implementation, cost, performance Daniel Lenoski, et.
Cache Coherent Distributed Shared Memory. Motivations Small processor count –SMP machines –Single shared memory with multiple processors interconnected.
Microarchitectural Approaches to Exceeding the Complexity Barrier © Eric Rotenberg 1 Microarchitectural Approaches to Exceeding the Complexity Barrier.
Embedded Software Optimization for MP3 Decoder Implemented on RISC Core Yingbiao Yao, Qingdong Yao, Peng Liu, Zhibin Xiao Zhejiang University Information.
Predictable Programming on a Precision Timed Architecture Hiren D. Patel UC Berkeley Joint work with: Ben Lickly, Isaac Liu, Edward.
Timing Predictability - A Must for Avionics Systems - Reinhard Wilhelm Saarland University, Saarbrücken.
Room: E-3-31 Phone: Dr Masri Ayob TK 2123 COMPUTER ORGANISATION & ARCHITECTURE Lecture 4: Computer Performance.
8th Biennial Ptolemy Miniconference Berkeley, CA April 16, 2009 Precision Timed (PRET) Architecture Hiren D. Patel, Ben Lickly, Isaac Liu and Edward A.
The Case for Precision Timed (PRET) Machines Edward A. Lee Professor, Chair of EECS UC Berkeley With thanks to Stephen Edwards, Columbia University. National.
1 Center for Embedded Systems Research (CESR) Department of Computer Science North Carolina State University Frank Mueller Timing Analysis: In Search of.
7th Biennial Ptolemy Miniconference Berkeley, CA February 13, 2007 Cyber-Physical Systems: A Vision of the Future Edward A. Lee Robert S. Pepper Distinguished.
State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.
Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.
February 21, 2008 Center for Hybrid and Embedded Software Systems Mapping A Timed Functional Specification to a Precision.
November 18, 2004 Embedded System Design Flow Arkadeb Ghosal Alessandro Pinto Daniele Gasperini Alberto Sangiovanni-Vincentelli
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in partitioned architectures Rajeev Balasubramonian Naveen.
ECE 510 Brendan Crowley Paper Review October 31, 2006.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU Presented by: Ahmad Lashgar ECE Department, University of Tehran.
Algorithm Analysis (Big O)
Computer performance.
- 1 - EE898-HW/SW co-design Hardware/Software Codesign “Finding right combination of HW/SW resulting in the most efficient product meeting the specification”
Course Outline DayContents Day 1 Introduction Motivation, definitions, properties of embedded systems, outline of the current course How to specify embedded.
Embedded Systems Design ICT Embedded System What is an embedded System??? Any IDEA???
- 1 - Embedded systems: processing Embedded System Hardware Embedded system hardware is frequently used in a loop („hardware in a loop“): actuators.
Intel Architecture. Changes in architecture Software architecture: –Front end (Feature changes such as adding more graphics, changing the background colors,
VOLTAGE SCHEDULING HEURISTIC for REAL-TIME TASK GRAPHS D. Roychowdhury, I. Koren, C. M. Krishna University of Massachusetts, Amherst Y.-H. Lee Arizona.
Pipelines for Future Architectures in Time Critical Embedded Systems By: R.Wilhelm, D. Grund, J. Reineke, M. Schlickling, M. Pister, and C.Ferdinand EEL.
Tufts Wireless Laboratory School Of Engineering Tufts University “Network QoS Management in Cyber-Physical Systems” Nicole Ng 9/16/20151 by Feng Xia, Longhua.
ParaScale : Exploiting Parametric Timing Analysis for Real-Time Schedulers and Dynamic Voltage Scaling Sibin Mohan 1 Frank Mueller 1,William Hawkins 2,
Advanced Computer Architecture, CSE 520 Generating FPGA-Accelerated DFT Libraries Chi-Li Yu Nov. 13, 2007.
Mahesh Sukumar Subramanian Srinivasan. Introduction Embedded system products keep arriving in the market. There is a continuous growing demand for more.
Time for High-Confidence Cyber-Physical Systems
Storage Allocation for Embedded Processors By Jan Sjodin & Carl von Platen Present by Xie Lei ( PLS Lab)
1 of 20 Phase-based Cache Reconfiguration for a Highly-Configurable Two-Level Cache Hierarchy This work was supported by the U.S. National Science Foundation.
ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.
Super computers Parallel Processing By Lecturer: Aisha Dawood.
Abdullah Aldahami ( ) March 23, Introduction 2. Background 3. Simulation Techniques a.Experimental Settings b.Model Description c.Methodology.
Computer Organization and Architecture Tutorial 1 Kenneth Lee.
CS5222 Advanced Computer Architecture Part 3: VLIW Architecture
By Edward A. Lee, J.Reineke, I.Liu, H.D.Patel, S.Kim
A Unified WCET Analysis Framework for Multi-core Platforms Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury National University of Singapore Timon.
CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.
ECE 720T5 Fall 2011 Cyber-Physical Systems Rodolfo Pellizzoni.
Lx: A Technology Platform for Customizable VLIW Embedded Processing.
CISC Machine Learning for Solving Systems Problems Microarchitecture Design Space Exploration Lecture 4 John Cavazos Dept of Computer & Information.
Application Domains for Fixed-Length Block Structured Architectures ACSAC-2001 Gold Coast, January 30, 2001 ACSAC-2001 Gold Coast, January 30, 2001.
1 of 14 Lab 2: Formal verification with UPPAAL. 2 of 14 2 The gossiping persons There are n persons. All have one secret to tell, which is not known to.
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
February 12, 2009 Center for Hybrid and Embedded Software Systems Timing-aware Exceptions for a Precision Timed (PRET)
1 of 14 Lab 2: Design-Space Exploration with MPARM.
ECE 720T5 Winter 2014 Cyber-Physical Systems Rodolfo Pellizzoni.
15-740/ Computer Architecture Lecture 12: Issues in OoO Execution Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/7/2011.
Timing Anomalies in Dynamically Scheduled Microprocessors Thomas Lundqvist, Per Stenstrom (RTSS ‘99) Presented by: Kaustubh S. Patil.
Block Cache for Embedded Systems Dominic Hillenbrand and Jörg Henkel Chair for Embedded Systems CES University of Karlsruhe Karlsruhe, Germany.
COMPUTATIONAL MODELS.
A Common Machine Language for Communication-Exposed Architectures
Architecture & Organization 1
A Precision Timed Architecture for Predictable and Repeatable Timing
CSCI1600: Embedded and Real Time Software
Hiren D. Patel Isaac Liu Ben Lickly Edward A. Lee
Architecture & Organization 1
Timing-aware Exceptions for a Precision Timed (PRET) Target
Worst-Case Execution Time
EE 445S Real-Time Digital Signal Processing Lab Spring 2014
Die Stacking (3D) Microarchitecture -- from Intel Corporation
CSCI1600: Embedded and Real Time Software
Presentation transcript:

Timing Analysis of Embedded Software for Families of Microarchitectures Jan Reineke, UC Berkeley Edward A. Lee, UC Berkeley Representing Distributed Sense and Control Systems (DSCS) theme of MuSyC Clark Kerr Campus, UC Berkeley November 17, 2011 With thanks to: Hiren Patel Alberto Sangiovanni-Vincentelli This work has received funding from NSF # and MURI #FA

Reineke et al., Berkeley 2 The Challenge: Microarchitecture Selection Embedded Software Timing Requirements with ? Family of Microarchitectures = Platform Choices: Processor frequency Sizes and latencies of local memories Latency and bandwidth of interconnect Presence of floating- point unit … Select a microarchitecture that a)satisfies all timing requirements, and b)minimizes cost/size/energy.

Reineke et al., Berkeley 3 Timing Analysis Today

Reineke et al., Berkeley 4 What does the execution time of a program depend on? Input-dependent control flow Microarchitectural State + Pipeline, Memory Hierarchy, Interconnect

Reineke et al., Berkeley 5 Example of Influence of Microarchitectural State Motorola PowerPC 755 Courtesy of Reinhard Wilhelm.

Reineke et al., Berkeley 6 Architecture Selection: How it might be done today 1.For every microarchitecture  a different timing model time-consuming, costly, error-prone, and still often inaccurate 2.Space of microarchitectures and their configurations is huge

Reineke et al., Berkeley 7 How it will be done tomorrow: Start out with the Timing Model Reverse process: 1.Design platform timing model for precise and efficient timing analysis. 2.Develop family of microarchitectures conforming to this model. But one platform timing model will not fit all realizations of the platform!

Reineke et al., Berkeley 8 Support Wide Range of Realizations: Parameterize the Timing Model 1.Parameterize the model to enable wide range of realizations. 2.Parametric timing analysis: symbolic computation of all microarchitectures that satisfy the timing requirements.

Reineke et al., Berkeley 9 Example of Parameters and Analysis Results: WCET in Terms of Parameters Floating-point unit present? |cache| >= 4096 Assume four parameters: 1. Presence/absence of floating-point unit 2.Size of cache |cache| in bytes 3.Processor period p=1/frequency in nanoseconds 4.Latency of main memory l in nanoseconds |cache| >= 4096 WCET = 10*p+3*l WCET = 13*p+5*l WCET = 12*p+3*l WCET = 15*p+5*l Yes No

Reineke et al., Berkeley 10 Example of Analysis Results: Assume Deadline D Floating-point unit present? |cache| >= *p+3*l <= D13*p+5*l <= D12*p+3*l <= D15*p+5*l <= D This enables us to determine the cheapest/most-energy efficient/… microarchitecture satisfying the constraints!

Reineke et al., Berkeley 11 Related Challenge: Configuration at Runtime Floating-point unit present? |cache| >= *p+3*l <= D13*p+5*l <= D12*p+3*l <= D15*p+5*l <= D Microarchitecture selected at design time eliminates some of the options, but not necessarily all: |cache| >= *p+3*l <= D13*p+5*l <= D |cache| >= *p+6 <= D13*p+10 <= D Given a model of the power consumption of the processor, we can derive energy optimal cache and processor configurations in terms of the deadline.

Reineke et al., Berkeley 12 Related Challenge: Configuration at Runtime Microarchitecture selection is based on worst-case assumptions at design time. At runtime we may have more knowledge: e = (time, input) e’ = (time+ET(A, input), input’, deadline) Execution time ET(A, input) may be less than worst- case execution time WCET(A). Deadline may be greater than deadline assumed at design time. We can exploit this knowledge to reconfigure a microarchitecture at runtime: Frequency/voltage scaling Distribution of shared resources: Shared cache/scratchpad Functional units

Reineke et al., Berkeley 13 Parameterized Timing Models: Formalization Execution time of program P under input i Parameters Program PInput i Worst case over all inputs

Reineke et al., Berkeley 14 Necessary and Desirable Properties of Parameterized Timing Models Necessary: Execution time should be monotone in parameters. “a higher frequency always yields a shorter execution time” “a smaller cache always yields a longer execution time” Desirable for efficiency: Execution time should depend linearly on parameters. “doubling the processor frequency will decrease execution time by a factor of two”

Reineke et al., Berkeley 15 Parametric Timing Analysis: Black-Box Approach Exploits properties of parameterized timing models to generalize from samples to parametric formula. “Reduce to known problem” and “Generalize from examples” We know how to select representative parameter valuations.

Reineke et al., Berkeley 16 Modern microarchitectures are optimized for average-case performance at the expense of timing predictability and repeatability. Context: Precision-Timed (PRET) Machines Instruction Set Architecture: Precise control over timing through deadline instructions. Microarchitecture: The Precision-Timed ARM adheres to a simple timing model without sacrificing performance: Thread-interleaved pipeline Scratchpad memories in place of caches Predictable DRAM controller The goal of the PRET project is to reintroduce timing predictability and repeatability without sacrificing performance. Black-box WCET analysis can be realized using GameTime (Sanjit Seshia).

Lee, Berkeley 17 Conclusions and Future Work Today, timing models originate from previously developed microarchitectures. Tomorrow, timing models will be carefully designed to enable precise and efficient parametric timing analysis, and microarchitectures will follow. Ongoing work: Proof-of-concept implementation for a parameterized PRET timing model. Raffaello Sanzio da Urbino – The Athens School