
Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing
Ali El-Haj-Mahmoud, Ahmed S. Al-Zawawi, Aravindh Anantaraman, and Eric Rotenberg
Center for Embedded Systems Research (CESR), Department of Electrical & Computer Engineering, North Carolina State University

NC STATE UNIVERSITY, CASES 2005, El-Haj-Mahmoud © 2005

Embedded Processor Trends
• Inheriting desktop high-performance features
• Examples
  ◦ ARM11: 8-stage pipeline, caches, dynamic branch prediction
  ◦ Ubicom IP3023: 8 hardware threads
  ◦ PowerPC 750: 2-way superscalar, out-of-order (OOO) execution

Real-Time Systems and Analyzability
• Schedulability of task-set determined a priori
  ◦ Requires worst-case execution times (WCET) → statically analyzable microarchitecture
• Dynamic microarchitecture features complicate real-time design
  ◦ A trade-off between performance and analyzability
[Figure: tasks A and B]

Multiple Simple Processors
+ Analyzable
+ Natural fit with real-time systems
− Rigid resource partitioning
− Higher cost/performance metric
[Figure: tasks A, B, and C on proc 1 and proc 2]

Simultaneous Multithreading (SMT)
+ Flexible resource sharing
+ Better cost/performance metric
− Unanalyzable
[Figure: tasks A, B, and C on one SMT processor]

Unanalyzability of SMT
− Violates single-task WCET assumption (tasks analyzed separately)
− Arbitrary periods → arbitrary overlap of tasks
− Dynamic interference
Cannot derive WCETs → cannot perform schedulability analysis

Real-Time Virtual Multiprocessor (RVMP)
• Combine MP analyzability and SMT flexibility
• Key idea: interference-free multithreading
  ◦ SMT performance
  ◦ WCET of each task independent of task-set
• RVMP substrate: two parts
  ◦ Highly reconfigurable multithreaded superscalar
    − Space: multiple arbitrary interference-free partitions
    − Time: rapidly reconfigure partitions
  ◦ Static schedule orchestrates partitioning

Big Picture
Co-design processor and real-time scheduling for analyzable high-performance

RVMP Architecture
• Superscalar “ways” are natural partitioning granularity
• Different-sized virtual processors carved out of single superscalar
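The way-level partitioning can be illustrated with a minimal sketch. It assumes the 4-way, 4-thread configuration the slides take as a starting point. The function name `carve_virtual_processors` is hypothetical, and assigning each virtual processor a contiguous block of ways is a simplifying assumption made here, not something the slides state.

```python
NUM_WAYS = 4  # assumed superscalar width (4-way starting point in the slides)
MAX_VPS = 4   # assumed number of hardware threads, one per virtual processor


def carve_virtual_processors(sizes):
    """Give each virtual processor (VP) its own disjoint set of ways.

    sizes[i] is the number of ways requested for VP i. Ways are handed out
    left to right in contiguous blocks (an assumption of this sketch).
    """
    if len(sizes) > MAX_VPS:
        raise ValueError("more virtual processors than hardware threads")
    if any(size < 1 for size in sizes):
        raise ValueError("every virtual processor needs at least one way")
    if sum(sizes) > NUM_WAYS:
        raise ValueError("requested ways exceed the superscalar width")

    partitions = []
    next_way = 0
    for size in sizes:
        # No way index is reused, so the partitions cannot overlap.
        partitions.append(tuple(range(next_way, next_way + size)))
        next_way += size
    return partitions


# One 2-way VP next to two 1-way VPs.
print(carve_virtual_processors([2, 1, 1]))
```

Because no way appears in two partitions, a task running on one virtual processor never competes for issue slots with a task on another, which is the property the slides call interference-free.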

Processor Architecture
• Starting point
  ◦ Alpha 21164: 4-way in-order superscalar
  ◦ Ubicom IP3023: 8 hardware threads (4 in RVMP → 4 VPs)
• Simplifications for analyzability
  ◦ In-order issue within VPs
  ◦ Software-managed scratchpads
  ◦ Static branch prediction
Not limitations of RVMP!

Processor Architecture
[Figure: block diagram. Labeled components: Fetch Unit, PC, Interleaved Instruction Scratchpad, Fetch Buffer, Decode, Slotter and Scoreboard (Issue Logic), Int RF, FP RF, Shadow Buffers, Data Scratchpad, HRT, RD, WR, and functional units FU0: INT, FU1: INT/MUL/DIV, FU2: INT/AGEN, FU3: INT/AGEN, FU4: FPU]

Instruction Fetch
[Figure: Instruction Scratchpad, Fetch Unit, Fetch Buffer, Decode, Issue, Backend, Shadow Buffers, and HRT, with FV and PV fields]

Real-Time Scheduling
• Too complicated
  ◦ Must schedule entire hyper-period
  ◦ Overwhelming # of possible space/time schedules
  ◦ High dedicated-storage cost for schedule
[Figure: tasks A, B, C, and D]

Task A
[Figure: labels WCET 1, WCET 2, WCET 3, WCET 4; period; # of ways (1 way to 4 ways); Round]

Task B
[Figure: labels period; # of ways (1 way to 4 ways)]

Task C
[Figure: labels period; # of ways (1 way to 4 ways)]

Task D
[Figure: labels period; # of ways (1 way to 4 ways)]

Cyclic schedule
[Figure: scheduling attempts labeled FAILED and SUCCEEDED]
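The task slides suggest that each task has one WCET per way count (1 to 4 ways) and that its period is divided into rounds, with a single round schedule repeated cyclically. The transcript does not give the scheduling algorithm, so the following is only a sketch under stated assumptions: each task's WCET is split evenly across the rounds in its period, and a brute-force search over way counts combined with first-fit shelf packing stands in for whatever search the authors actually use. All function names and the example numbers are hypothetical.

```python
from itertools import product
from math import ceil

NUM_WAYS = 4  # assumed superscalar width


def per_round_cycles(wcet, period, round_len):
    """Cycles needed in every round if the WCET is split evenly over one period."""
    rounds_per_period = period // round_len
    return ceil(wcet / rounds_per_period)


def shelf_pack(demands, round_len):
    """First-fit shelf packing of (ways, cycles) demands into one round.

    Tasks on the same shelf run side by side on disjoint ways; shelves run
    one after another. Returns the shelves, or None if they exceed the round.
    """
    shelves = []
    for task_index, (ways, cycles) in enumerate(demands):
        for shelf in shelves:
            if shelf["ways_used"] + ways <= NUM_WAYS:
                shelf["ways_used"] += ways
                shelf["cycles"] = max(shelf["cycles"], cycles)
                shelf["tasks"].append(task_index)
                break
        else:
            shelves.append({"ways_used": ways, "cycles": cycles, "tasks": [task_index]})
    total_cycles = sum(shelf["cycles"] for shelf in shelves)
    return shelves if total_cycles <= round_len else None


def find_cyclic_schedule(tasks, round_len):
    """Try every way-count choice; return (choice, shelves) for the first fit, else None."""
    for task in tasks:
        if task["period"] % round_len != 0:
            raise ValueError("this sketch assumes every period is a multiple of the round")
    for choice in product(range(1, NUM_WAYS + 1), repeat=len(tasks)):
        demands = [
            (ways, per_round_cycles(task["wcet"][ways], task["period"], round_len))
            for ways, task in zip(choice, tasks)
        ]
        shelves = shelf_pack(demands, round_len)
        if shelves is not None:
            return choice, shelves
    return None


# Hypothetical task-set: WCET in cycles, indexed by number of ways.
tasks = [
    {"period": 400, "wcet": {1: 200, 2: 120, 3: 100, 4: 90}},
    {"period": 200, "wcet": {1: 100, 2: 60, 3: 50, 4: 45}},
]
print(find_cyclic_schedule(tasks, round_len=100))
```

Scheduling one round instead of the whole hyper-period keeps the stored schedule small, which speaks to the storage-cost concern raised on the Real-Time Scheduling slide. Shelf packing is sufficient but not necessary, so this sketch can report failure for task-sets that a smarter packing would fit.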

Interaction: Scheduling and Architecture
[Figure: labels HRT, LTC, functional units FU0 to FU4 (shown twice), cycles, FV, PV, CVs, EOT, INVALID]

Experiments
• Tasks from C-lab and MiBench benchmarks
• 100 task-sets
  ◦ 4 tasks per task-set (also 8 tasks in paper)
  ◦ Grouped according to scalar utilization (U_scalar)
• Two experiments
  ◦ Worst-case schedulability analysis
  ◦ Run-time experiments
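The transcript does not define U_scalar. A plausible reading, offered here only as an assumption, is the usual processor utilization computed with each task's WCET on a single way (a scalar pipeline). The symbols below are introduced for illustration: C_i^{(1)} is the 1-way WCET of task i, T_i is its period, and n is the number of tasks in the task-set.

```latex
U_{\mathrm{scalar}} = \sum_{i=1}^{n} \frac{C_i^{(1)}}{T_i}
```

Under this reading, a task-set with U_scalar above 1 could not be scheduled on one scalar processor and would need either additional ways or additional processors.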

Worst-Case Schedulability Tests

RVMP Configurations

Run-Time Experiments

Unsafe Behavior of SMT

Summary
• Novel contributions
  ◦ Virtualize a single processor
    − Space: variable-size interference-free partitions
    − Time: rapid reconfiguration
  ◦ Simple real-time scheduling approach
• Analyzability of MP with flexibility of SMT
• Co-design processor and real-time scheduling for analyzable high-performance