ID 721C: Using an RTOS in SH Based Product Development John Carbone VP, Marketing 13 October 2010 Version: 3.2 Express Logic, Inc.
2 Abstract Many developers are intimidated by the prospect of using an RTOS, when in fact, an RTOS can help simplify their application development. This paper will explain just what an RTOS can do for developers and how to use it in a painless, productive manner. The objective of this paper is to de-mystify the use of an RTOS, and to use an illustrative example to show how an RTOS can simplify development and improve performance. The class demo will be available for download, for use on a SH7264 board. 2
3 Presenter: John A. Carbone 3 VP, Marketing, Express Logic, Inc. Responsible for product and corporate marketing, Renesas partner relationship, technical articles, and technical training. Presenter at Renesas DevCon 2008, and various industry conferences Authored technical papers on real-time multithreading, certification, and measurement of real-time performance PREVIOUS EXPERIENCE: VP, Marketing at Green Hills Software Embedded developer and FAE Member of the IEEE BA, Mathematics, Boston College
4 Renesas Technology and Solution Portfolio Solutions for Innovation Microcontrollers & Microprocessors: #1 market share worldwide* Analog and Power Devices: #1 market share in low-voltage MOSFET** ASIC, ASSP & Memory: Advanced and proven technologies * MCU: 31% revenue basis from Gartner "Semiconductor Applications Worldwide Annual Market Share: Database", 25 March 2010 ** Power MOSFET: 17.1% on unit basis from Marketing Eye 2009.
66 © 2010 Renesas Electronics America Inc. All rights reserved. 6 Microcontroller and Microprocessor Line-up Superscalar, MMU, Multimedia Up to 1200 DMIPS, 45, 65 & 90nm process Video and audio processing on Linux Server, Industrial & Automotive Up to 500 DMIPS, 150 & 90nm process 600uA/MHz, 1.5 uA standby Medical, Automotive & Industrial Legacy Cores Next-generation migration to RX High Performance CPU, FPU, DSC Embedded Security Up to 10 DMIPS, 130nm process 350 uA/MHz, 1uA standby Capacitive touch Up to 25 DMIPS, 150nm process 190 uA/MHz, 0.3uA standby Application-specific integration Up to 25 DMIPS, 180, 90nm process 1mA/MHz, 100uA standby Crypto engine, Hardware security Up to 165 DMIPS, 90nm process 500uA/MHz, 2.5 uA standby Ethernet, CAN, USB, Motor Control, TFT Display High Performance CPU, Low Power Ultra Low Power General Purpose
7-9 Microcontroller and Microprocessor Line-up (same line-up as slide 6, highlighting one family per slide) SuperH. V850: High Performance, Low Power, VERY Small Packages (pipeline diagram labels: Instruction Fetch, Operand Decode, Execute, Memory, Write Back, Data Forward, Branch/LD Pipe). RX: Ethernet, CAN, USB, UART, SPI, IIC.
10 Express Logic Innovation 10
11 Express Logic’s ThreadX RTOS ThreadX is the third RTOS developed by Bill Lamie: 1990: Nucleus® RTX; 1993: NucleusPLUS®; 1997: ThreadX®. Together, over 2 billion RTOS deployments! ThreadX is used in over 800 million electronic products: HP ink-jet printers; Mobile Devices (Baseband Radio, Bluetooth, WiFi, GPS); Medical, Industrial, Aerospace systems. ThreadX is for “When it Really Counts”: Commercial products, Field-proven, Full support. Supports full SuperH product family 11
12 Agenda Part I - Key Concepts What Is An RTOS Benefits of an RTOS RTOS services Types of Scheduling Multithreading Preemptive Scheduling Preemption-Threshold™ 12
13 Agenda (2) Part II – An Example Understanding The Problem Available Tools ThreadX ® RTOS A Test To See The Effect of Priority Assignment Building and Running the Test Cases Assessing The Results Summary and Conclusion Q/A More Information 13
14 Key Takeaways Attendees will be able to See how an RTOS can be used to build a real-time system Understand RTOS services and how to use them Build a demo application that uses an RTOS, using HEW Use a real-time event analysis tool to examine system events and measure system performance Understand the relationship between RTOS scheduling algorithms and context switching No longer fear the use of an RTOS 14
15 What Is An RTOS? What is an RTOS? – Kernel + X, Y, Z, … What does an RTOS do for us? – Manages real-time applications RTOS Services – Scheduler – Threads – Timers – Message Queues – Semaphores – Mutexes – Memory Pools (Diagram labels: RTOS; File System, Graphics, Networking, USB, etc.; Kernel; Scheduler; Interrupts; Hardware) 15
16 Benefits of An RTOS Reclaim CPU cycles – lower overhead Polling keeps CPU at 100% Sleep(), Intra-thread activation, Interrupts reclaim almost all polling cycles Easily add new threads Modular expansion through threads Provides platform for adding middleware TCP/IP Stack USB Stack Graphics Event Trace See Article, “Multitasking Mysteries” 16
17 Threads and Priorities Threads What is a thread? – Semi-independent program segment – Share same memory space – Run “concurrently” How are threads used? – Modularize a program – Minimize stalls Thread Services – Create, Suspend, Relinquish, Terminate, Exit, Prioritize Thread States – READY, RUNNING, SUSPENDED, TERMINATED Thread Priorities Often 0-n, with 0 highest Dynamic or Static Equal priorities – Multiple threads at same priority Unique priorities – Each thread has unique priority Process Thread … Process memory space Highest Lowest Priority … n 17
18 Context Switch Timers Unlimited, one-shot, repeatable Message Passing Queues, send, receive, pend Semaphores/Mutexes Priority inheritance optional Memory Management Byte and block pool allocation ThreadX RTOS Services 18 Thread-1 Context Thread-2 Context
19 Context Switch
Thread Context
  Information critical to thread’s operation
  Register Contents, Program Counter (PC), Stack Pointer (SP)
  Saved when thread is preempted
  Restored when thread is resumed
Context Switch
  Interrupt running thread and do something else
  Result of preemption, interrupt, or cooperative service
What’s involved in a context switch? (Table columns: Step, Operation, Cycles)
  1. Save the current thread’s context (i.e., GP and FP register values and PC) on the stack
  2. Save the current stack pointer in the thread's control block
  3. Switch to the system stack pointer
  4. Return to the scheduler
  5. Find the highest priority thread that is ready to run
  6. Switch to the new thread's stack
  7. Recover the new thread's context
  8. Return to the new thread at its previous PC
  Other processing
  TOTAL
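The save and restore steps listed on this slide can be sketched in portable C. This is only a simulation under stated assumptions: a real context switch is written in processor-specific assembly, and the names `sim_context_t`, `sim_tcb_t`, `sim_context_switch`, and the register count are hypothetical, not part of ThreadX.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NUM_REGS 16   /* assumed register count, for illustration only */

/* Simulated CPU state: the "context" a thread needs in order to resume. */
typedef struct {
    uint32_t regs[SIM_NUM_REGS];
    uint32_t pc;   /* program counter */
    uint32_t sp;   /* stack pointer */
} sim_context_t;

/* Simulated thread control block holding the saved context. */
typedef struct {
    const char *name;
    sim_context_t saved;
} sim_tcb_t;

/* Save the running thread's context, then restore the next thread's. */
void sim_context_switch(sim_context_t *cpu, sim_tcb_t *from, sim_tcb_t *to)
{
    assert(cpu != 0 && from != 0 && to != 0);
    from->saved = *cpu;   /* save registers, PC and SP of the outgoing thread */
    *cpu = to->saved;     /* recover the incoming thread's context */
}
```

Switching from Thread-1 to Thread-2 and back again leaves the simulated CPU with exactly the register, PC, and SP values Thread-1 had when it was interrupted, which is the property the real assembly routine has to guarantee.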
20 Timers What is a timer? Mechanism to enable applications to perform application C functions at specified intervals of time – One-shot – Repeating Derived from system clock Unlimited number of software timers Why use timers? Time-outs Periodic operations Watchdog services 20
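A minimal sketch of a one-shot or repeating software timer derived from the system tick. It assumes a periodic tick source that calls `soft_timer_tick` once per tick; all names here (`soft_timer_t`, `soft_timer_start`, `soft_timer_tick`, `count_expiration`) are hypothetical and are not the ThreadX timer API.

```c
#include <assert.h>
#include <stddef.h>

/* Application C function to run when the timer expires. */
typedef void (*timer_callback_t)(void *arg);

typedef struct {
    unsigned long remaining;    /* ticks until next expiration; 0 = inactive */
    unsigned long reload;       /* 0 = one-shot, otherwise the repeat period */
    timer_callback_t callback;
    void *arg;
} soft_timer_t;

/* Arm the timer: first expiration after initial_ticks, then every
   reload_ticks (or never again if reload_ticks is 0). */
void soft_timer_start(soft_timer_t *t, unsigned long initial_ticks,
                      unsigned long reload_ticks,
                      timer_callback_t callback, void *arg)
{
    assert(t != NULL && callback != NULL && initial_ticks > 0);
    t->remaining = initial_ticks;
    t->reload = reload_ticks;
    t->callback = callback;
    t->arg = arg;
}

/* Call once per system tick, for example from the tick interrupt handler. */
void soft_timer_tick(soft_timer_t *t)
{
    if (t->remaining == 0)
        return;                      /* timer is inactive */
    if (--t->remaining == 0) {
        t->remaining = t->reload;    /* re-arm if repeating, else stay at 0 */
        t->callback(t->arg);
    }
}

/* Example callback: counts how many times the timer has expired. */
void count_expiration(void *arg)
{
    (*(int *)arg)++;
}
```

A time-out maps to the one-shot form (reload of 0), while periodic operations and watchdog servicing map to the repeating form.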
21 What is a Message Queue? Data structure that holds messages A message is a 32-bit word, or a pointer to a larger array of information Means of message-passing among threads Messages usually are inserted at rear of queue (FIFO) but can be inserted at front of queue if desired (LIFO) Messages are removed from front of queue Public resource—any thread can access any queue Message Queues msg_n || … || msg_3 || msg_2 || msg_1 messages inserted at rear of queue messages removed from front of queue Why Use Message Queues? To send data from thread to thread To notify a thread that an event has occurred Threads will suspend on queue full and queue empty 21
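The queue behavior described on this slide (insert at the rear, optionally at the front, remove from the front) can be sketched as a ring buffer of 32-bit messages. This is an illustrative model, not the ThreadX implementation: the names and the capacity are assumptions, and where an RTOS would suspend the calling thread on a full or empty queue, this sketch simply returns -1.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8u   /* assumed depth, for illustration only */

typedef struct {
    uint32_t slots[QUEUE_CAPACITY];
    unsigned head;    /* index of the front (next message to remove) */
    unsigned count;   /* number of messages currently stored */
} msg_queue_t;

void queue_init(msg_queue_t *q)
{
    q->head = 0;
    q->count = 0;
}

/* Insert at the rear (FIFO order). Returns 0 on success, -1 if full. */
int queue_send(msg_queue_t *q, uint32_t msg)
{
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = msg;
    q->count++;
    return 0;
}

/* Insert at the front so this message is received next (LIFO order). */
int queue_front_send(msg_queue_t *q, uint32_t msg)
{
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->head = (q->head + QUEUE_CAPACITY - 1u) % QUEUE_CAPACITY;
    q->slots[q->head] = msg;
    q->count++;
    return 0;
}

/* Remove from the front. Returns 0 on success, -1 if empty. */
int queue_receive(msg_queue_t *q, uint32_t *msg)
{
    assert(msg != 0);
    if (q->count == 0)
        return -1;
    *msg = q->slots[q->head];
    q->head = (q->head + 1u) % QUEUE_CAPACITY;
    q->count--;
    return 0;
}
```

Sending 1, 2, 3 with `queue_send` and then 99 with `queue_front_send` yields 99, 1, 2, 3 on successive receives.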
22 Mutexes What is a Mutex? A mutex is a binary semaphore that usually incorporates extra features, such as: – Ownership – Priority Inheritance (Priority inversion protection) – Note: requires ability to have multiple threads at same priority Why use mutexes? Coordinate access to single-use resource – Critical section of code – Certain peripherals 22
23 Types of Schedulers Big Loop Scheduling Each thread is polled to see if it needs to run Polling proceeds sequentially, or in priority order Inefficient, lacks responsiveness RTOS Scheduler Controls which thread is allowed to run Performs context switches Provides thread services – Sleep – Relinquish – Terminate Round-Robin Scheduling Cycle through multiple “READY” threads And/or impose “time-slice” for each thread A bit better than the loop, but still not very responsive Thread-1 Thread-2 Thread-3 Thread-4 “Big Loop” Scheduler ? ? ? ? Thread-1 Thread-2 Thread-3 Thread-4 Round- Robin Scheduling 23
24 The Big Loop CPU time is spent checking to see if any activity (thread) has work to do A?, B?, C?, D?, … Time Thread C Thread A Thread F A? A?, B?, C? Thread F Has Work To Do Anyone Have Work To Do? Thread F Is Done or Stuck Anyone Have Work To Do? 24
25 Multithreading Enabling an activity to use the CPU while other activities don’t need it – I/O Delay Thread A Thread B Thread A I/O Start I/O Finish Time Thread A has to wait for I/O When I/O is done, Thread A can continue While Thread A is waiting for I/O, Thread B can use CPU Thread A Waits 25
26 RTOS Scheduling Implement multithreading by keeping track of thread states and activate threads with work to do RTOS Scheduler Thread A Thread B Thread C Thread F Time Thread C Thread B Thread A Thread F 26
27 “RTOS-izing” Code
Stand-alone code generally uses “event loops” to run functions:
    while (1) {
        if (condition_1)
            function_1();
        else if (condition_2)
            function_2();
        else if (condition_3)
            function_3();
        /* ... */
    }
The time to evaluate each “condition_n” expression, plus the decision and branch, adds up and delays the response to “condition_x”. In addition, any new condition or function changes the timing of the loop.
With an RTOS:
  Run the highest priority function (task/thread)
  When that thread must wait for an event, the thread suspends until the “event” occurs
  This enables other threads to get CPU cycles
  The event triggers an interrupt, and the ISR calls the scheduler
  The scheduler performs a context switch
  Result is faster response and better use of time
28 “RTOS-izing” Code
Stand-alone code often uses “delay loops”:
    for (i = 1; i < 10000; i++) {
        /* ... */
    }
A spin loop occupies the CPU 100% for the duration of the delay period.
Replace it with a call to tx_thread_sleep(n):
    tx_thread_sleep(1);
  Enables other (READY) threads to get CPU cycles
  In “n” timer ticks (a tick can be any user-defined duration), the suspended thread re-awakens
  The sleep call frees up the CPU for other threads or for low-power operation
  Result is better use of delay time by other threads
29 Results With Big Loop and Spin-Loop Delays – No RTOS 29
30 Using an RTOS for Multithreading Timer expires Application Resumes Sleep Background Runs 30
31 Results With RTOS Multithreading 31
32 Preemptive Scheduling Thread-1 Begins Thread-2 Runs Thread-1 Resumes Priority Time Context Switch Preemption Interruption for higher- priority activity – Interrupt – Thread Preemptive Scheduling Always run highest priority thread that is READY to run – Maximum responsiveness – No Polling, so more efficient – Always results in a context switch 32
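The rule "always run the highest priority thread that is READY to run" can be sketched as a selection over a thread table. This is a simplified model using the thread states named earlier in the deck; the type and function names are hypothetical, priority 0 is assumed to be the highest, and ties go to the first matching entry.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    THREAD_READY, THREAD_RUNNING, THREAD_SUSPENDED, THREAD_TERMINATED
} thread_state_t;

typedef struct {
    const char *name;
    unsigned priority;        /* 0 is the highest priority */
    thread_state_t state;
} sched_thread_t;

/* Return the runnable (READY or RUNNING) thread with the numerically lowest
   priority value, or NULL if no thread can run. */
sched_thread_t *schedule_next(sched_thread_t *threads, size_t count)
{
    sched_thread_t *best = NULL;
    assert(threads != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        sched_thread_t *t = &threads[i];
        if (t->state != THREAD_READY && t->state != THREAD_RUNNING)
            continue;                         /* suspended or terminated */
        if (best == NULL || t->priority < best->priority)
            best = t;
    }
    return best;
}
```

If the returned thread differs from the one currently running, the scheduler performs a context switch; this is the preemption shown in the diagram when Thread-2 becomes ready while Thread-1 is running.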
33 Preemptive Problems Thread Starvation If a higher-priority thread is always ready, the lower priority threads never execute Excessive Overhead From context switching The subject of our demo Priority Inversion Higher-priority thread can be suspended because a lower-priority thread has a needed resource – see following … Thread-1 Begins Thread-2 Preempts Thread- 1 and Runs … Priority Time Thread-1 may never get to run again 33
34 Priority Inversion
Thread-1 obtains Mutex-M
Thread-2 preempts Thread-1
Thread-3 preempts Thread-2, but suspends for Mutex-M
Thread-2 resumes … Thread-3 blocked!
Even though Thread-3 has the highest priority, it must wait for Thread-2. Thus, priorities have become inverted.
(Diagram axes: Priority vs. Time)
35 Priority Inheritance
Thread-1 obtains Mutex-M
Thread-2 preempts Thread-1
Thread-3 preempts Thread-2, but suspends for Mutex-M
Thread-1 assumes the priority of Thread-3 until it is finished with Mutex-M
Thread-1 runs at Thread-3's priority and releases the mutex
Thread-3 resumes
While Thread-1 holds the inherited priority, we have 2 threads at the same priority
(Diagram axes: Priority vs. Time)
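A minimal sketch of the inheritance step, assuming lower numbers mean higher priority. The types and functions (`pi_thread_t`, `pi_mutex_t`, `pi_mutex_get`, `pi_mutex_put`) are hypothetical; a real RTOS would also suspend the blocked thread and handle nested mutexes, which this model leaves out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned base_priority;   /* priority assigned by the application */
    unsigned priority;        /* current (possibly inherited) priority */
} pi_thread_t;

typedef struct {
    pi_thread_t *owner;       /* NULL when the mutex is free */
} pi_mutex_t;

/* Try to obtain the mutex. Returns 1 if acquired, 0 if the caller must wait.
   When a higher-priority caller has to wait, the owner inherits its priority. */
int pi_mutex_get(pi_mutex_t *m, pi_thread_t *caller)
{
    assert(m != NULL && caller != NULL);
    if (m->owner == NULL) {
        m->owner = caller;
        return 1;
    }
    if (caller->priority < m->owner->priority)
        m->owner->priority = caller->priority;   /* priority inheritance */
    return 0;
}

/* Release the mutex and drop any inherited priority. */
void pi_mutex_put(pi_mutex_t *m)
{
    assert(m != NULL && m->owner != NULL);
    m->owner->priority = m->owner->base_priority;
    m->owner = NULL;
}
```

With Thread-1 at an assumed priority of 3 and Thread-3 at 1, Thread-1 runs at priority 1 while it holds Mutex-M, so a Thread-2 at priority 2 can no longer preempt it and delay Thread-3.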
36 Preemption-Threshold™
A technique to avoid priority inversion and reduce context switches
Preemption-Threshold establishes a priority ceiling for disabling preemption: preemption requires a priority higher (lower number) than the ceiling
For example, assume a thread’s priority is 20, and its preemption-threshold is set to 15:
  Priority   Comment
  0 to 14    Preemption allowed for threads with these priorities (inclusive)
  15         Thread is assigned Preemption-threshold = 15 [this has the effect of disabling preemption for threads with priority values from 15 to 19 (inclusive)]
  20         Thread is assigned Priority = 20
  : to 31
Threads with priority lower than (larger number than) 14, even if higher than (smaller number than) the running thread’s priority (20), will not preempt the running thread
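The preemption test in this example reduces to one comparison against the running thread's threshold. A minimal sketch, with hypothetical names (`pt_thread_t`, `can_preempt`) and the deck's convention that a lower number means a higher priority:

```c
#include <assert.h>

typedef struct {
    unsigned priority;            /* the thread's own priority, e.g. 20 */
    unsigned preempt_threshold;   /* priority ceiling, e.g. 15 */
} pt_thread_t;

/* A newly ready thread may preempt the running thread only if its priority
   is higher (numerically lower) than the running thread's threshold. */
int can_preempt(const pt_thread_t *running, unsigned candidate_priority)
{
    /* The threshold can only raise the ceiling, never lower it. */
    assert(running->preempt_threshold <= running->priority);
    return candidate_priority < running->preempt_threshold;
}
```

For a running thread with priority 20 and threshold 15, candidates at priorities 0 through 14 preempt and candidates at 15 through 19 do not. Setting the threshold equal to the thread's own priority gives ordinary preemptive behavior.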
37 Example with/without Preemption-Threshold The shaded area identifies priority inversion Without Preemption-Threshold, High_thread must wait for Medium_thread, which has a lower priority With Preemption-Threshold, no priority inversion has been detected High_thread obtains the mutex without waiting for Medium_thread Without Preemption- Threshold With Preemption- Threshold 37
38 Preemption-Threshold Improves Efficiency © 2010 Express Logic, Inc.
Relative Time Between Mutex_Get and Mutex_Put Pairs for High_thread (table columns: Average Time, Minimum Time, Maximum Time, Number of Get-Put Pairs; rows: Without Preemption-Threshold, With Preemption-Threshold)
Number of Priority Inversions For Test Case (Number of Non-Deterministic Priority Inversions)
  Without Preemption-Threshold: 9
  With Preemption-Threshold: 0
For more information, see academic research:
39 Part II An Example Project Using ThreadX with HEW on a Renesas SH
40 Understanding The Problem The Problem – Determining the impact of thread priority on context switching Why? – There may be impact we’re unaware of – Understanding the impact enables better use of priorities – Not understanding invites accidental impact How? – Create an application with multiple threads, assign priorities in different ways, and measure the performance of each case – Use development/debug tools available for SH
41 Available Tools SH7264 HEW IDE ThreadX RTOS Test Code TraceX Graphical Event Analyzer 41
42 Renesas SH7264 Core: 144MHz/288DMIPS Superscalar core FPU: 32/64-bit FPU with up to 288MFLOPS at 144MHz RAM: Up to 1MB embedded RAM Display: Up to VGA LCDC, Video Input Media Interface: I2S, SPDIF, NANDC Connectivity: USB HS Host/Device, CAN Bootable from external Serial, NAND and NOR flash 42
43 Express Logic’s ThreadX RTOS Small, fast, easy-to-use RTOS for hard real-time applications. Widely used in Medical, Consumer, Industrial markets. FDA 510(k) and IEC approved. Footprint: 2KB – 15KB on SH-2A Automatically scales based on services used Speed: ~ 1-2µs context switch Most services cycles API: Intuitive, 66 functions in 8 categories 43
44 ThreadX Technology Picokernel Architecture Non-layered implementation for size & speed Deterministic processing, not affected by number of application objects Automatic scaling Event chaining Simplifies processing dependent on multiple events Reduces number of threads required Performance metrics Counts various system events and operations (context switches, etc.) Execution Profile Kit File system, Network stack, USB stack host/device/OTG, Graphics Full featured, fully integrated TraceX and StackX development tools Innovative tools for real-time systems Optimized Interrupt Processing Only scratch registers saved/restored if no preemption No idle thread, hence no context save/restore when system is idle Most of API available directly from ISRs Optional timer thread or direct timer processing in ISR 44
45 ThreadX For Fast Time-To-Market
46 ThreadX For High Performance Total Iterations
47 TraceX Analysis tool to examine system “events” RTOS logs “events” in trace buffer in target memory Events include RTOS services like “queue_send”, “queue_receive” Also internal RTOS operations like “internal_suspend” Upload Trace Buffer to host as a file Binary Hex Table TraceX reads file and converts to graphical representation Shows all threads Shows all logged events Shows time-ticks Shows context switches 47
48 The Application Equal or Unique: Does it matter? Is one method “better” than another? What are the consequences of each with respect to context switching? Construct a system to run and observe Producer-consumer application One producer, three consumers Continuous operation Log events Run to breakpoint View events Count context switches Measure throughput Draw conclusions Producer Consumer 48
49 A Test To See The Impact Receive Messages Send Messages to Thread A Send Messages to Thread B Send Messages to Thread C Send Messages to Thread A Send Messages to Thread B Send Messages to Thread C Thread A Thread B Thread C Thread D Time Cycle 1Cycle 2 49
50 Priority Assignments
Case-1: Equal Priorities (Round-Robin Scheduling)
  Thread A = 4, Thread B = 4, Thread C = 4, Thread D = 4
Case-2: Unique Priorities (Preemptive Scheduling)
  Thread A = 1, Thread B = 2, Thread C = 3, Thread D = 4
Case-2A: Preemption-Threshold
(Diagram: priority levels of Threads A to D for Case-1 and for Case-2/2A)
51 Building And Running the Test Cases Case-1 Review Code Build Download Set Breakpoint Run to Breakpoint – Use TraceX to view activity – Use TraceX to measure timer ticks – Other TraceX information Case-2 Modify code (change priorities) Re-build, etc. Case-2A Set Preemption-Threshold to 1 51
52 Live Demo 52
53 Examining The Events Case-1: Equal Priorities Thread D sends 3 messages to each queue Thread D then suspends (“relinquish”) Thread A reads its 3 messages then suspends (queue empty) Similarly, for Threads B and C Thread D then writes another set of messages 9 Messages 4 Context Switches (Diagram: Message-1, Message-2, Message-3 to Queues A, B, C) 53
54 Examining The Events Case-2: Unique Priorities Thread D sends a message to Thread A Thread A preempts Thread A reads its message then suspends (queue empty) Thread D sends message to Thread B Thread B preempts Similarly for Thread C 9 Messages 18 Context Switches
55 Examining The Events Case-2A: Preemption-Threshold Thread D sends 3 messages to each queue Thread D then suspends (“relinquish”) Thread A reads its 3 messages then suspends (queue empty) Similarly, for Threads B and C Thread D then writes another set of messages 9 Messages 4 Context Switches
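The context switch counts for the three cases follow from a simple counting model. The sketch below is an illustration of that reasoning, not the demo code: `count_context_switches` is a hypothetical helper that assumes one producer, and that a consumer drains its queue completely each time it runs.

```c
#include <assert.h>

/* Count context switches in one producer cycle.
   preempt_on_send != 0: each send wakes a higher-priority consumer, which
     preempts the producer, reads the message and suspends (Case-2).
   preempt_on_send == 0: the producer sends every message first, then
     relinquishes; each consumer runs once in turn (Case-1 and Case-2A). */
unsigned count_context_switches(unsigned consumers,
                                unsigned msgs_per_consumer,
                                int preempt_on_send)
{
    unsigned switches = 0;
    assert(consumers > 0 && msgs_per_consumer > 0);
    if (preempt_on_send) {
        for (unsigned c = 0; c < consumers; c++)
            for (unsigned m = 0; m < msgs_per_consumer; m++)
                switches += 2;      /* producer -> consumer -> producer */
    } else {
        switches += 1;              /* producer -> first consumer */
        switches += consumers - 1;  /* each consumer -> next consumer */
        switches += 1;              /* last consumer -> producer */
    }
    return switches;
}
```

With three consumers and three messages each, the model gives 4 switches without preemption on send and 18 with it, matching the counts reported on the slides for 9 messages.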
56 Compare Context Switches (Chart: number of context switches for Case-1 (Equal Priorities), Case-2A (Preemption-Threshold), and Case-2 (Unique Priorities))
57 Compare Timing Case-1 and 2A each show 4,420 ticks in a cycle Case-2 shows 7,531 ticks in a cycle 57
58 Assessing The Results
Context Switches
  Case                            Messages   Context Switches
  Case-1: Equal Priorities        9          4
  Case-2: Unique Priorities       9          18
  Case-2A: Preemption-Threshold   9          4
Throughput
  Measurement         Case-1 (Equal Priorities)   Case-2 (Unique Priorities)   Case-2A (Preemption-Threshold)   Ratio (Case 1 or 2A vs Case-2)
  Context Switches    4                           18                           4                                %
  Elapsed Time        4,420 ticks                 7,531 ticks                  4,420 ticks                      170%
  Messages Sent       9                           9                            9                                No Change
  Messages Received   9                           9                            9                                No Change
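The 170% figure is the Case-2 elapsed time expressed as a percentage of the Case-1 (or Case-2A) elapsed time. A small sketch of that arithmetic, using a hypothetical helper `ratio_percent` that rounds to the nearest whole percent:

```c
#include <assert.h>

/* Express value as a percentage of baseline, rounded to the nearest integer. */
unsigned ratio_percent(unsigned long value, unsigned long baseline)
{
    assert(baseline != 0);
    return (unsigned)((value * 100ul + baseline / 2ul) / baseline);
}
```

`ratio_percent(7531, 4420)` evaluates to 170, matching the elapsed-time ratio on the slide. Applying the same calculation to the context switch counts, `ratio_percent(18, 4)` gives 450; that value is derived here and is not stated in the original slide.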
59 Summary And Conclusions Using an RTOS is easy and efficient And, enables use of other tools How You Assign Priorities Has an Impact Priority assignment can have a significant impact on the number of context switches an application performs Throughput vs. Responsiveness Preemption is necessary for maximum responsiveness, but may have a negative impact on throughput Consider equal priorities for throughput Equal priorities allow for fewer preemptions, less overhead Consider Preemption-Threshold Preemption-threshold might offer a solution to excessive context switches, while retaining maximum responsiveness Consider a combination Equal priorities for a set of threads, unique priorities for others, preemption-threshold where appropriate 59
60 Q/A For further information Contact Express Logic, Inc THREADX ( ) 60
61 Express Logic Innovation 61
62 Thank You