ECE 720T5 Winter 2014 Cyber-Physical Systems Rodolfo Pellizzoni.


Upcoming Deadlines
Course status
– All comments on research projects out by tonight.
– Hopefully all presentation comments out by Friday; sorry for the delay.
– As always, feel free to set up a meeting as needed.
Research Track: Literature Review March 3.
– Address comments in the project proposal.
– Add an expanded related work section (1.5 to 2 pages).
– More detailed than a normal related work section: show your review work and provide a more in-depth description of the state of the art in the specific area of the project.
Applied Track: preliminary results March 10.

Topic Today: Microarchitecture
Previously: system design. Next: microarchitecture.
Previous problem: determine the interference caused by multiple agents (tasks/cores) contending for access to shared resources.
This problem: compute the worst-case execution time (WCET) of a sequence of instructions.
In reality, the two problems are similar, because in modern microarchitectures instructions “contend” for multiple shared resources (virtual registers, execution units, etc.).

Microarchitectural Features and Predictability
Modern microarchitectures aggressively reduce average-case execution time at the cost of decreased predictability.
Processor state is very hard to predict when using:
– Deep pipelines
– Superscalar execution
– Out-of-order execution
– Virtual registers
– Branch predictors
– Hardware prefetchers
– Unpredictable replacement schemes for TLB/caches
– Basically, any sort of architectural trick…

Computing the WCET
As we already mentioned, there are two main mechanisms:
Static analysis
– Analyze the application code together with a model of the architecture.
– Provable worst case over the set of all possible input values and initial states of the processor.
– Very complex. Possibly very slow. Pessimistic.
Measurement
– Can fail to reveal the real worst case.
– Still very much used.
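To illustrate why measurement can miss the real worst case, here is a minimal sketch. The cost model `cycles` is purely hypothetical (it is not taken from the slides): it charges a base cost plus one cycle per set bit of an 8-bit input. A handful of measured inputs is then compared against an exhaustive sweep, which stands in for what a sound analysis would have to cover.

```python
def cycles(x: int) -> int:
    """Hypothetical cost model: 10 base cycles plus one per set bit of an 8-bit input."""
    return 10 + bin(x & 0xFF).count("1")


# Measurement: only the inputs we happened to test (an assumed test set).
measured_inputs = [0, 1, 2, 16, 100, 200]
measured_wcet = max(cycles(x) for x in measured_inputs)

# Exhaustive sweep over all 8-bit inputs: feasible here, rarely feasible in practice.
exhaustive_wcet = max(cycles(x) for x in range(256))

print(f"measured worst case:   {measured_wcet} cycles")
print(f"exhaustive worst case: {exhaustive_wcet} cycles")
```

In this toy model the measured bound underestimates the true worst case because no tested input sets all eight bits. A real processor adds the initial hardware state to the input space, which makes exhaustive measurement even less practical.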

Memory Hierarchies, Pipelines, and Buses for Future Architectures in Time-Critical Embedded Systems

Overview
In summary: the architecture should be designed to simplify timing analysis!
Several important concepts on static analysis and cache analysis.

Timing Analysis: How To

Control Flow Graph
Analyze the code (either source or binary).
Split the code into a sequence of basic blocks (BBs).
Basic blocks are typically terminated by jumps (or function calls/returns).
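The splitting step can be sketched with the classic "leader" approach: a basic block starts at the first instruction, at every jump target, and right after every block-terminating instruction. The instruction encoding and the `TERMINATORS` set below are assumptions made for illustration, not the format of any particular analyzer.

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction: (opcode, jump target index or None).
Instr = Tuple[str, Optional[int]]

# Assumed set of opcodes that end a basic block.
TERMINATORS = {"jmp", "br", "call", "ret"}


def split_basic_blocks(code: List[Instr]) -> List[Tuple[int, int]]:
    """Return basic blocks as (start, end) index ranges, with end exclusive."""
    if not code:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (op, target) in enumerate(code):
        if op in TERMINATORS:
            if i + 1 < len(code):
                leaders.add(i + 1)   # the instruction after a terminator
            if target is not None:
                leaders.add(target)  # the destination of the jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return list(zip(starts, ends))


program: List[Instr] = [
    ("load", None),   # 0
    ("cmp", None),    # 1
    ("br", 5),        # 2: conditional branch to index 5
    ("add", None),    # 3
    ("jmp", 6),       # 4
    ("sub", None),    # 5
    ("store", None),  # 6
    ("ret", None),    # 7
]
blocks = split_basic_blocks(program)
print(blocks)
```

The edges of the control flow graph would then connect each block to the blocks starting at its jump target and, for conditional branches, at its fall-through successor.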

Abstract State
The analyzer must maintain the state of the processor (pipeline, cache, etc.) to determine the duration of each basic block (BB).
Problem: the state can depend on all the preceding BBs.
Flow-sensitive analysis: the analysis depends on the specific sequence of instructions in the BB.
Context-sensitive analysis: the analysis depends on the preceding/calling BBs.

Abstract State
Solution: abstract state.
A collection (set) of possible processor states; if context-sensitive, subsets of the current abstract state are tagged based on BB history.
Whenever a new BB is analyzed, perform an abstract state merge based on the abstract states of all preceding BBs.
This loses precision but avoids an exponential blow-up in the analysis.
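The merge step can be sketched as a set union over the abstract states of the predecessors. The `ConcreteState` layout below (a stall count plus the set of cached lines) is a hypothetical stand-in; real analyzers typically use more compact abstractions than an explicit set of states.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical concrete processor state: (pending stall cycles, cached lines).
ConcreteState = Tuple[int, FrozenSet[str]]
AbstractState = FrozenSet[ConcreteState]


def merge(predecessor_states: Iterable[AbstractState]) -> AbstractState:
    """Join the abstract states of all preceding BBs by set union."""
    merged = set()
    for state in predecessor_states:
        merged |= state
    return frozenset(merged)


def worst_case_stall(state: AbstractState) -> int:
    """A safe bound must cover every concrete state in the abstract state."""
    return max(stall for stall, _ in state)


# Two predecessors (e.g. the two arms of a branch) reach the same BB.
after_then: AbstractState = frozenset({(0, frozenset({"a", "b"}))})
after_else: AbstractState = frozenset({(2, frozenset({"a"}))})

merged_state = merge([after_then, after_else])
print(len(merged_state), worst_case_stall(merged_state))
```

After the merge the analysis no longer knows which path produced which state, so it must assume the worst one. That is the precision loss, and it is traded for not having to analyze every path separately.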

Timing Anomalies

To Summarize…
Domino effect: a set of instructions can be repeated any number of times, but the timing of each iteration always depends on the processor state before the iteration starts. In other words, the analysis never converges on a loop.
1. Fully compositional architecture: no timing anomalies.
2. Compositional architecture with constant-bounded effects: just take the worst case for each component of the abnormal scenario (e.g., A misses and B executes before C).
3. Noncompositional architecture: domino effects mean we need to keep the whole context.

PLRU
[Figure: PLRU example showing a sequence of line loads and accesses.]

Example

Convergence of May and Must Set

How Important is the Cache State?

Solving the Abstract State Problem
Virtual interferences: timing penalties caused not by contention for shared resources, but by loss of precision in the abstract state.
Solution: reset the state at each basic block.
The naïve solution doesn’t work that well…
– We can’t do so for caches!
– We can only extract limited parallelism within a single basic block.
– Branch prediction becomes useless (together with a bunch of other prediction mechanisms).
Better solution: group multiple BBs together.
– Doesn’t solve the cache problem, but good for the microarchitecture state.

Virtual Traces
Time-Predictable Out-of-Order Execution for Hard Real-Time Systems
Virtual trace: a limited-length path through a set of BBs.
Superblock: a set of BBs with one entry and multiple exits.
– Main exit: WCET through the superblock.
– Side exit: quicker exit.

Virtual Traces in the Processor
ISA changed to signal the begin/end of traces.
State reset at trace exit.
The WCET of each trace is easy to compute!
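Because every trace starts from a reset state, its timing can be analyzed in isolation. The following is a minimal sketch of that idea for a superblock laid out as a straight sequence of BBs. The `BasicBlock` type, the per-BB cycle counts, and the exit naming are all hypothetical, and the sketch assumes each BB has a fixed cost once the entry state is known.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Dict, List


@dataclass
class BasicBlock:
    name: str
    cycles: int              # hypothetical cost, valid when the trace starts from a reset state
    side_exit: bool = False  # True if control may leave the superblock after this BB


def exit_times(superblock: List[BasicBlock]) -> Dict[str, int]:
    """Cycles from trace entry to each exit; the superblock must be non-empty."""
    totals = list(accumulate(bb.cycles for bb in superblock))
    times = {
        f"side:{bb.name}": total
        for bb, total in zip(superblock, totals)
        if bb.side_exit
    }
    times["main"] = totals[-1]  # running through every BB is the longest path
    return times


trace = [
    BasicBlock("A", 5),
    BasicBlock("B", 8, side_exit=True),
    BasicBlock("C", 4, side_exit=True),
    BasicBlock("D", 6),
]
times = exit_times(trace)
print(times)
```

In this layout the main exit gives the WCET of the trace, and every side exit is reached earlier, which matches the main-exit/side-exit distinction above.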

Results – Alpha ISA

Precision-Timed Architecture

System Design

PRET Pipeline
[Figure: thread-interleaved pipeline diagram over time t. Stages: FETCH, DECODE, REGACC, MEM, EXECUTE, EXCEPT. Rows are labeled THREAD#1 through THREAD#6, each offset by 1 clock; "Thread 1, Instruction 1" is followed by "Thread 1, Instruction 2" after the other threads have issued.]
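The interleaving in the figure can be sketched as a round-robin issue schedule. This is an illustrative model that assumes one instruction is fetched per clock, threads take turns in fixed order, and each instruction advances one stage per clock; it is not the actual PRET implementation.

```python
from typing import Dict, List, Tuple

STAGES = ["FETCH", "DECODE", "REGACC", "MEM", "EXECUTE", "EXCEPT"]
NUM_THREADS = 6  # as in the figure: THREAD#1 through THREAD#6


def issue_schedule(num_cycles: int) -> List[Tuple[int, int]]:
    """(thread, instruction) fetched at each clock, both numbered from 1."""
    return [(c % NUM_THREADS + 1, c // NUM_THREADS + 1) for c in range(num_cycles)]


def stage_occupancy(cycle: int) -> Dict[str, Tuple[int, int]]:
    """Map each occupied stage to the (thread, instruction) it holds at this clock."""
    occupancy = {}
    for depth, stage in enumerate(STAGES):
        issued_at = cycle - depth  # clock at which the instruction in this stage was fetched
        if issued_at >= 0:
            occupancy[stage] = (issued_at % NUM_THREADS + 1, issued_at // NUM_THREADS + 1)
    return occupancy


schedule = issue_schedule(7)
snapshot = stage_occupancy(6)
print(schedule)
print(snapshot)
```

With as many threads as stages, every stage in the snapshot belongs to a different thread, so under these assumptions two instructions of the same thread never overlap in the pipeline.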

Producer Consumer with Deadline Inst

Video Game App

Video Controller

Inner Loop