NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g North Carolina State University Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing
NC STATE UNIVERSITY CASES 20052El-Haj-Mahmoud © 2005 Embedded Processor Trends Inheriting desktop high-performance features Examples ARM11: 8-stage pipeline, caches, dynamic br. pred. Ubicom IP3023: 8 hardware threads PowerPC 750: 2-way superscalar, OOO execution
NC STATE UNIVERSITY CASES 20053El-Haj-Mahmoud © 2005 Real-Time Systems and Analyzability Schedulability of task-set determined a priori Requires worst-case execution times (WCET) statically analyzable microarchitecture Dynamic microarchitecture features complicate real-time design A trade-off between performance and analyzability A B
NC STATE UNIVERSITY CASES 20054El-Haj-Mahmoud © 2005 Multiple Simple Processors + Analyzable + Natural fit with real-time systems − Rigid resource partitioning − Higher cost/performance metric proc 1proc 2 A B C
NC STATE UNIVERSITY CASES 20055El-Haj-Mahmoud © 2005 Simultaneous Multithreading (SMT) + Flexible resource sharing + Better cost/performance metric − Unanalyzable SMT AB C
NC STATE UNIVERSITY CASES 20056El-Haj-Mahmoud © 2005 Unanalyzability of SMT − Violates single-task WCET assumption (tasks analyzed separately) − Arbitrary periods arbitrary overlap of tasks − Dynamic interference Cannot derive WCETs Cannot perform schedulability
NC STATE UNIVERSITY CASES 20057El-Haj-Mahmoud © 2005 Real-Time Virtual Multiprocessor Combine MP analyzability and SMT flexibility Key idea: Interference-free multithreading SMT performance WCET of each task independent of task-set RVMP substrate: two parts Highly reconfigurable multithreaded superscalar Space: multiple arbitrary interference-free partitions Time: rapidly reconfigure partitions Static schedule orchestrates partitioning
NC STATE UNIVERSITY CASES 20058El-Haj-Mahmoud © 2005 Big Picture Co-design processor and real-time scheduling for analyzable high-performance
NC STATE UNIVERSITY CASES 20059El-Haj-Mahmoud © 2005 RVMP Architecture Superscalar “ways” are natural partitioning granularity Different-sized virtual processors carved out of single superscalar
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Processor Architecture Starting point Alpha 21164: 4-way in-order superscalar Ubicom IP3023: 8 hardware threads (4 in RVMP 4 VPs) Simplifications for analyzability In-order issue within VPs Software-managed scratchpads Static branch prediction Not limitations of RVMP!
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Processor Architecture Fetch Unit PC Interleaved Instruction Scratchpad Decode Slotter and Scoreboard (Issue Logic) Int RF Shadow Buffers FP RF Data Scratchpad FU4: FPU FU0: INT FU1: INT/MUL/DIV FU2: INT/AGEN FU3: INT/AGEN 44 1 Shadow Buffers RD WR HRT Fetch Buffer
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Fetch Buffer Instruction Scratchpad Fetch Unit Decode Issue Backend FV PV Shadow Buffers FV PV HRT Instruction Fetch
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Real-Time Scheduling Too complicated Must schedule entire hyper-period Overwhelming # of possible space/time schedules High dedicated-storage cost for schedule A B C D
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 WCET 2 WCET 1 Task A WCET 3 WCET 4 period # of ways 4 ways 3 ways 2 ways 1 way … … … … … Round
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Task B period # of ways 4 ways 3 ways 2 ways 1 way … … … … …
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Task C period # of ways 4 ways 3 ways 2 ways 1 way … … … … …
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Task D period # of ways 4 ways 3 ways 2 ways 1 way … … … … …
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 FAILEDSUCCEEDED Cyclic schedule
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Interaction: Scheduling and Architecture HRT LTC FU0 FU1 FU2 FU3 FU4 FU0 FU1 FU2 FU3 FU cycles 6040 FVPVCVs 0 EOT 1 INVALID
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Experiments Tasks from C-lab and MiBench benchmarks 100 task-sets 4 tasks per task-set (also 8 tasks in paper) grouped according to scalar utilization (U_scalar) Two experiments Worst-case schedulability analysis Run-time experiments
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Worst-Case Schedulability Tests
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 RVMP Configurations
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Run-Time Experiments
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Unsafe Behavior of SMT
NC STATE UNIVERSITY CASES El-Haj-Mahmoud © 2005 Summary Novel contributions Virtualize a single processor − Space: variable-size interference-free partitions − Time: rapid reconfiguration Simple real-time scheduling approach Analyzability of MP with flexibility of SMT Co-design processor and real-time scheduling for analyzable high-performance