Predictable Programming on a Precision Timed Architecture
Hiren D. Patel, UC Berkeley
Joint work with:
Ben Lickly, Isaac Liu, Edward A. Lee - UC Berkeley
Sungjun Kim, Stephen A. Edwards - Columbia University

Patel, UC Berkeley, PRET

Edwards and Lee - Case for PRET
2007 – Edwards and Lee made a case for precision timed computers (PRET machines)
  – Predictability
  – Repeatability
S. A. Edwards and E. A. Lee, The case for the precision timed (PRET) machine. In Proceedings of the 44th Annual Conference on Design Automation (San Diego, California, June 2007). DAC '07. ACM, New York, NY.

Edwards and Lee - Case for PRET
Unpredictability
  – Difficulty in determining timing behavior through analysis
Non-repeatability
  – Lack of guarantee that every execution yields the same timing behavior
Brittleness
  – Small changes have big effects on timing behavior

Brittleness
Expensive affair
Tight coupling of software and hardware
Reliance on testing for validation
Upgrading difficult
Solution: stockpile

But wait …
Real-time scheduling
  – Worst-case execution time
      Detailed model of hardware
      Large engineering effort
      Valid for particular hardware models
  – Interrupts, inter-process communication, locks …
Bench testing
  – Brittle
Sebastian Altmeyer, Christian Hümbert, Björn Lisper, and Reinhard Wilhelm. Parametric Timing Analysis for Complex Architectures. In Proceedings of the 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'08), Kaohsiung, Taiwan, August 2008. IEEE Computer Society.

Precise Timing and High Performance
Traditional                    Alternative
Caches                         Scratchpads
Deep out-of-order pipelines    Thread-interleaved pipelines
Function-only ISAs             ISAs with timing instructions
Function-only languages        Languages and programming models with timing
Best-effort communication      Fixed-latency communication
Time-sharing                   Multiple independent processors

Outline
Introduction
Related Work
PRET Machine
Programming Example
Future Work
Conclusion

Related Work
Java Optimized Processor
  – Schoeberl et al. [2003]
Timing instructions
  – Ip and Edwards [2006]
Reactive processors
  – Von Hanxleden et al. [2005]
  – Salcic et al. [2005]
Virtual Simple Architecture
  – Mueller et al. [2003]

Semantics of Timing Instructions
Deadline instructions
  – Denote the required execution time of a block
When decoded
  – Stall instruction if timer value is not 0
  – Otherwise set timer value to new value

    deadi $t0, 10
    …
    deadi $t0, 8
    …
    deadi $t0, 0
    …
    L0: …
    deadi $t0, 10
    b L0
    …

[Figure labels: Straight Line Block 0, Straight Line Block 1, Loop Block]

Tracing A Program Fragment

    A: deadi $t0, 6
    B: sethi %hi(0x3f800000), %g1
    C: or %g1, 0x200, %g1
    D: st %g1, [ %fp ]
    E: deadi $t0, 8
    F: …

[Figure: value of $t0 at each cycle]

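The deadline semantics can be sketched in a few lines of Python by tracing instructions A to E of the fragment. This is a minimal model, not the PRET simulator: it assumes that every instruction occupies exactly one cycle and that the timer $t0 decrements once per cycle until it reaches 0. Neither assumption is stated on the slides, and the name trace_deadlines is hypothetical.

```python
def trace_deadlines(program):
    """Return [(label, issue_cycle)] for a list of (label, deadline) pairs.

    `deadline` is an int for a deadi instruction and None for any other
    instruction. Assumptions (illustrative only): one cycle per
    instruction, and the timer decrements once per cycle down to 0.
    """
    timer = 0
    cycle = 0
    issued = []
    for label, deadline in program:
        if deadline is not None:
            # deadi stalls while the previous deadline has not expired
            while timer != 0:
                cycle += 1
                timer -= 1
            # timer is 0 here, so load the new deadline
            timer = deadline
        issued.append((label, cycle))
        # the instruction itself occupies one cycle
        cycle += 1
        timer = max(0, timer - 1)
    return issued


fragment = [("A", 6), ("B", None), ("C", None), ("D", None), ("E", 8)]
for label, cycle in trace_deadlines(fragment):
    print(label, cycle)
```

Under these assumptions E issues 6 cycles after A, even though B, C, and D need only 3 cycles: the second deadi absorbs the slack.
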
Precision Timed Architecture
Thread-interleaved pipeline
Scratchpad memories
Time-triggered main memory access
Round-robin thread scheduling

Memory Hierarchy
Clocks
  – Main clock
  – Derived clocks
Instruction and data scratchpad memories
  – 1 cycle access latency
Main memory
  – 16 MB size
  – Latency of 50 ns
  – Frequency: 250 MHz, so ~13 cycles of latency
[Figure labels: Core, SPM, DMA, Main Mem.]

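The ~13-cycle figure follows from the 50 ns latency and the 250 MHz frequency. A small sketch of the arithmetic, rounding up to whole cycles (the function name latency_in_cycles is hypothetical):

```python
def latency_in_cycles(latency_ns, clock_mhz):
    """Whole clock cycles needed to cover latency_ns at clock_mhz.

    latency_ns * clock_mhz / 1000 is the latency in cycles; integer
    ceiling division avoids floating-point rounding.
    """
    return -((-latency_ns * clock_mhz) // 1000)


print(latency_in_cycles(50, 250))  # 50 ns * 250 MHz = 12.5, rounded up to 13
```
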
Thread-interleaved Pipeline
Thread stalls
  – Main memory access
  – Multi-cycle operations
  – Deadline instructions
Replay mechanism
  – Execute same PC next iteration
  – Multi-cycle ALU ops replay instructions
[Figure: pipeline stages Fetch, Decode, Reg. Access, Execute, Memory, WriteBack with pipeline registers F/D, D/R, R/E, E/M, M/W. Annotations: Decrement Deadline Timers; Stall if Deadline Instruction; Increment PC; Check main memory access]

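The replay mechanism can be illustrated with a toy round-robin model. This is a sketch under simplifying assumptions, not the PRET pipeline: each thread gets one issue slot per round, and an instruction that needs several passes (a stand-in for a memory access or multi-cycle operation) is simply re-issued at the same PC in the thread's next slot. The function name interleave and the pass counts are hypothetical.

```python
def interleave(programs, rounds):
    """Round-robin issue with replay.

    programs: one list per thread of (name, passes), where passes is the
    number of issue slots the instruction needs before it completes.
    Returns [(round, thread, name)] for every slot that does work.
    """
    pcs = [0] * len(programs)
    passes_done = [0] * len(programs)
    slots = []
    for rnd in range(rounds):
        for tid, prog in enumerate(programs):  # fixed round-robin order
            if pcs[tid] >= len(prog):
                continue  # this thread has finished
            name, passes = prog[pcs[tid]]
            slots.append((rnd, tid, name))
            passes_done[tid] += 1
            if passes_done[tid] == passes:
                pcs[tid] += 1  # instruction complete: increment PC
                passes_done[tid] = 0
            # otherwise the PC is unchanged and the instruction replays
    return slots


programs = [
    [("add", 1), ("sub", 1), ("or", 1)],  # thread 0: no stalls
    [("ld", 3), ("add", 1)],              # thread 1: ld replays twice
]
for slot in interleave(programs, rounds=4):
    print(slot)
```

In this model the stall in thread 1 never delays thread 0, which issues one instruction per round regardless of what the other thread is doing.
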
Time-Triggered Access through Memory Wheel
Decouple thread’s access pattern
Time-triggered access
Best-case access time
  – If accessed 1st cycle
Worst-case access time
  – If accessed 2nd cycle of window

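A minimal sketch of why the first cycle of a window is the best case and the second cycle the worst. The model assumes six threads (chosen only to match a six-stage pipeline), a 13-cycle window per thread (taken from the ~13-cycle main memory latency), and that an access can only start on the first cycle of the thread's own window. These parameters and the name access_time are assumptions for illustration.

```python
def access_time(request_cycle, thread_id, n_threads=6, window=13):
    """Cycles from request to completion of a main memory access.

    Assumed model: the wheel rotates through one window per thread, and
    an access may start only at the first cycle of the thread's own
    window, then takes `window` cycles to complete.
    """
    period = n_threads * window
    window_start = thread_id * window
    # cycles to wait until this thread's window next begins
    wait = (window_start - request_cycle) % period
    return wait + window


best = access_time(0, thread_id=0)   # request on the 1st cycle of the window
worst = access_time(1, thread_id=0)  # request on the 2nd cycle: just missed it
print(best, worst)
```

With these assumed parameters the access time is bounded between 13 and 90 cycles, and the bound depends only on the requesting thread's own timing, not on what the other threads do.
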
Tool Flow
GCC 3.4.4, SystemC 2.2, Python 2.4
GCC to compile boot code and program code
[Figure labels: Boot code, Motorola SREC files, C programs, timing instructions]

Simple Mutual Exclusion Example
Producer followed by Consumer and Observer
  – Consumer and Observer execute together
Loop rate of two rotations of memory wheel
  – 1st for Producer to write
  – 2nd for Consumer and Observer to read
[Figure labels: Write to shared data, Read from shared data, Write to output]

Video Game Example
[Figure: block diagram]
Threads: Main-Control Thread, Graphic Thread, VGA-Driver Thread
Queues: Even Queue, Odd Queue
Buffers: Even Buffer, Odd Buffer
Data: Command, Pixel Data
Transitions:
  – Update Screen (Sync request)
  – Swap (When sync requested and when Odd Queue empty)
  – Sync (After queue swapped)
  – Refresh (Sync request)
  – Swap (When sync requested and when Vertical blank)
  – Sync (After buffer swapped)

Timing Requirements
Signal             Timing Requirement    Pixel Cycles
V. Sync            64 µs                 1611
V. Back-porch      1.02 ms               25679
Draw 480 lines     15.25 ms
V. Front-porch     350 µs                8811
H. Sync            3.77 µs               96
H. Back-porch      1.89 µs               48
Draw 640 pixels    25.42 µs
H. Front-porch     0.64 µs               16

Timing Implementation
Pixel-clock using derived clock
  – 25.175 MHz
  – ~39.72 ns cycle period
Drawing 16 pixels

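The cycle period follows directly from the pixel-clock frequency, and the time to draw a run of pixels follows from the period. A short sketch of that arithmetic, assuming one pixel is emitted per pixel-clock cycle (the function names are hypothetical):

```python
PIXEL_CLOCK_MHZ = 25.175


def pixel_period_ns(clock_mhz=PIXEL_CLOCK_MHZ):
    """Length of one pixel-clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def duration_ns(pixels, clock_mhz=PIXEL_CLOCK_MHZ):
    """Time to emit `pixels` pixels, assuming one pixel per cycle."""
    return pixels * pixel_period_ns(clock_mhz)


print(round(pixel_period_ns(), 2))  # 39.72
print(round(duration_ns(16), 1))    # 635.6
```

Under that assumption a block of 16 pixels has to be produced in roughly 636 ns, which is the kind of requirement a deadline instruction can express.
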
Future Work
Architecture
  – DMA
  – DDR2 main memory model
  – Thread synchronization primitives
  – Shared data between threads
Real-time Benchmarks
  – With timing requirements
Programming models
  – Memory allocation schemes
  – Synchronizations

Conclusion
What we want …
  – Time as a first class citizen of embedded computing
  – Predictability
  – Repeatability
Where we are at …
  – PRET cycle-accurate simulator
  – Release …

Extras

More on Brittleness
Small changes may have big effects on timing behavior
Theorem (Richard’s anomalies): If a task set with fixed priorities, execution times, and precedence constraints is optimally scheduled on a fixed number of processors, then increasing the number of processors, reducing execution times, or weakening precedence constraints can increase the schedule length.
Richard L. Graham, “Bounds on the performance of scheduling algorithms”, in E. G. Coffman, Jr. (ed.), Computer and Job-Shop Scheduling Theory, John Wiley, New York, 1975.

Richard’s Anomalies
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
9 tasks, 3 processors, priority list, precedence order, execution times.
[Figure: precedence graph and schedule]

Richard’s Anomalies: Reducing Execution Times
eTime’ = eTime - 1
Tasks (name/execution time): T1/2, T2/1, T3/1, T4/1, T5/3, T6/3, T7/3, T8/3, T9/8
[Figure: precedence graph and schedule]

Richard’s Anomalies: More Processors
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
[Figure: precedence graph and schedule with more processors; final timeline mark 15]

Richard’s Anomalies: Changing Priority List
Tasks (name/execution time): T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9
L = (T1,T2,T4,T5,T6,T3,T9,T7,T8)
[Figure: precedence graph and schedule]

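The anomalies can be reproduced with a small list-scheduling simulation. The precedence graph did not survive in these slides, so the sketch assumes the constraints of Graham's classic example (T1 before T9; T4 before T5, T6, T7, and T8). It also assumes a baseline priority list of T1 through T9 in order and 4 processors for the "more processors" case. All of these are assumptions, as is the function name list_schedule.

```python
def list_schedule(times, preds, priority, n_procs):
    """Greedy list scheduling; returns the schedule length.

    Whenever a processor is idle, it takes the first task in `priority`
    that has not started and whose predecessors have all finished.
    """
    finish = {}               # task -> finish time (for started tasks)
    free_at = [0] * n_procs   # time at which each processor becomes idle
    remaining = list(priority)
    t = 0
    while remaining:
        for p in range(n_procs):
            if free_at[p] > t:
                continue      # processor still busy
            for task in remaining:
                ready = all(q in finish and finish[q] <= t
                            for q in preds.get(task, ()))
                if ready:
                    finish[task] = t + times[task]
                    free_at[p] = finish[task]
                    remaining.remove(task)
                    break
        if remaining:
            # jump to the next time a busy processor becomes idle
            t = min(f for f in free_at if f > t)
    return max(finish.values())


# Assumed precedence constraints (from Graham's classic example)
PREDS = {"T9": ["T1"], "T5": ["T4"], "T6": ["T4"], "T7": ["T4"], "T8": ["T4"]}
TIMES = {"T1": 3, "T2": 2, "T3": 2, "T4": 2, "T5": 4,
         "T6": 4, "T7": 4, "T8": 4, "T9": 9}
BASE = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9"]
CHANGED = ["T1", "T2", "T4", "T5", "T6", "T3", "T9", "T7", "T8"]
REDUCED = {task: time - 1 for task, time in TIMES.items()}

print("baseline:        ", list_schedule(TIMES, PREDS, BASE, 3))
print("reduced times:   ", list_schedule(REDUCED, PREDS, BASE, 3))
print("4 processors:    ", list_schedule(TIMES, PREDS, BASE, 4))
print("changed priority:", list_schedule(TIMES, PREDS, CHANGED, 3))
```

Under these assumptions the baseline schedule has length 12, while each supposed improvement makes it longer: 13 with reduced execution times, 15 with four processors, and 14 with the changed priority list.
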
Brittleness Again…
In general, all task scheduling strategies are brittle