Download presentation
Presentation is loading. Please wait.
Published byGerard Matthews Modified over 9 years ago
1
NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g North Carolina State University Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing
2
NC STATE UNIVERSITY CASES 20052El-Haj-Mahmoud © 2005 Embedded Processor Trends Inheriting desktop high-performance features Examples ARM11: 8-stage pipeline, caches, dynamic br. pred. Ubicom IP3023: 8 hardware threads PowerPC 750: 2-way superscalar, OOO execution
3
NC STATE UNIVERSITY CASES 20053El-Haj-Mahmoud © 2005 Real-Time Systems and Analyzability Schedulability of task-set determined a priori Requires worst-case execution times (WCET) statically analyzable microarchitecture Dynamic microarchitecture features complicate real-time design A trade-off between performance and analyzability A B
4
NC STATE UNIVERSITY CASES 20054El-Haj-Mahmoud © 2005 Multiple Simple Processors + Analyzable + Natural fit with real-time systems − Rigid resource partitioning − Higher cost/performance metric proc 1proc 2 A B C
5
NC STATE UNIVERSITY CASES 20055El-Haj-Mahmoud © 2005 Simultaneous Multithreading (SMT) + Flexible resource sharing + Better cost/performance metric − Unanalyzable SMT AB C
6
NC STATE UNIVERSITY CASES 20056El-Haj-Mahmoud © 2005 Unanalyzability of SMT − Violates single-task WCET assumption (tasks analyzed separately) − Arbitrary periods arbitrary overlap of tasks − Dynamic interference Cannot derive WCETs Cannot perform schedulability
7
NC STATE UNIVERSITY CASES 20057El-Haj-Mahmoud © 2005 Real-Time Virtual Multiprocessor Combine MP analyzability and SMT flexibility Key idea: Interference-free multithreading SMT performance WCET of each task independent of task-set RVMP substrate: two parts Highly reconfigurable multithreaded superscalar Space: multiple arbitrary interference-free partitions Time: rapidly reconfigure partitions Static schedule orchestrates partitioning
8
NC STATE UNIVERSITY CASES 20058El-Haj-Mahmoud © 2005 Big Picture Co-design processor and real-time scheduling for analyzable high-performance
9
NC STATE UNIVERSITY CASES 20059El-Haj-Mahmoud © 2005 RVMP Architecture Superscalar “ways” are natural partitioning granularity Different-sized virtual processors carved out of single superscalar
10
NC STATE UNIVERSITY CASES 200510El-Haj-Mahmoud © 2005 Processor Architecture Starting point Alpha 21164: 4-way in-order superscalar Ubicom IP3023: 8 hardware threads (4 in RVMP 4 VPs) Simplifications for analyzability In-order issue within VPs Software-managed scratchpads Static branch prediction Not limitations of RVMP!
11
NC STATE UNIVERSITY CASES 200511El-Haj-Mahmoud © 2005 Processor Architecture Fetch Unit PC Interleaved Instruction Scratchpad Decode Slotter and Scoreboard (Issue Logic) Int RF Shadow Buffers FP RF Data Scratchpad FU4: FPU FU0: INT FU1: INT/MUL/DIV FU2: INT/AGEN FU3: INT/AGEN 44 1 Shadow Buffers RD WR HRT Fetch Buffer
12
NC STATE UNIVERSITY CASES 200512El-Haj-Mahmoud © 2005 Fetch Buffer Instruction Scratchpad Fetch Unit Decode Issue Backend FV PV Shadow Buffers FV PV HRT Instruction Fetch
13
NC STATE UNIVERSITY CASES 200513El-Haj-Mahmoud © 2005 Real-Time Scheduling Too complicated Must schedule entire hyper-period Overwhelming # of possible space/time schedules High dedicated-storage cost for schedule A B C D
14
NC STATE UNIVERSITY CASES 200514El-Haj-Mahmoud © 2005 WCET 2 WCET 1 Task A WCET 3 WCET 4 period 1 2 3 4 # of ways 4 ways 3 ways 2 ways 1 way … … … … … Round
15
NC STATE UNIVERSITY CASES 200515El-Haj-Mahmoud © 2005 Task B period 1 2 3 4 # of ways 4 ways 3 ways 2 ways 1 way … … … … …
16
NC STATE UNIVERSITY CASES 200516El-Haj-Mahmoud © 2005 Task C period 1 2 3 4 # of ways 4 ways 3 ways 2 ways 1 way … … … … …
17
NC STATE UNIVERSITY CASES 200517El-Haj-Mahmoud © 2005 Task D period 1 2 3 4 # of ways 4 ways 3 ways 2 ways 1 way … … … … …
18
NC STATE UNIVERSITY CASES 200518El-Haj-Mahmoud © 2005 FAILEDSUCCEEDED Cyclic schedule
19
NC STATE UNIVERSITY CASES 200519El-Haj-Mahmoud © 2005 Interaction: Scheduling and Architecture HRT LTC FU0 FU1 FU2 FU3 FU4 FU0 FU1 FU2 FU3 FU4 60 40 100 cycles 6040 FVPVCVs 0 EOT 1 INVALID
20
NC STATE UNIVERSITY CASES 200520El-Haj-Mahmoud © 2005 Experiments Tasks from C-lab and MiBench benchmarks 100 task-sets 4 tasks per task-set (also 8 tasks in paper) grouped according to scalar utilization (U_scalar) Two experiments Worst-case schedulability analysis Run-time experiments
21
NC STATE UNIVERSITY CASES 200521El-Haj-Mahmoud © 2005 Worst-Case Schedulability Tests
22
NC STATE UNIVERSITY CASES 200522El-Haj-Mahmoud © 2005 RVMP Configurations
23
NC STATE UNIVERSITY CASES 200523El-Haj-Mahmoud © 2005 Run-Time Experiments
24
NC STATE UNIVERSITY CASES 200524El-Haj-Mahmoud © 2005 Unsafe Behavior of SMT
25
NC STATE UNIVERSITY CASES 200525El-Haj-Mahmoud © 2005 Summary Novel contributions Virtualize a single processor − Space: variable-size interference-free partitions − Time: rapid reconfiguration Simple real-time scheduling approach Analyzability of MP with flexibility of SMT Co-design processor and real-time scheduling for analyzable high-performance
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.