© 2010 Renesas Electronics America Inc. All rights reserved. 131L: Optimizing RX Performance John Breitenbach President, Atlantex Corp. 14 October 2010.

1 © 2010 Renesas Electronics America Inc. All rights reserved. 131L: Optimizing RX Performance John Breitenbach President, Atlantex Corp. 14 October 2010 Version: 1.4

2 John Breitenbach
  President – Atlantex Corp.
  Contract Embedded Systems Design (Est. 1998)
  Renesas Alliance Partner
  Author of RX QDG, porting guides, app notes, demo code
  Embedded “cred”:
  – 25+ years embedded systems development
  – Dick Cheney’s UPS
  – Remote control cows
  Geek “cred”:
  – Patent #7,054,045 for Holographic HMI
  – First computers:
    – Atari 800
    – Timex Sinclair 1000
    – 16K!

3 Renesas Technology and Solution Portfolio
  Microcontrollers & Microprocessors: #1 market share worldwide*
  Analog and Power Devices: #1 market share in low-voltage MOSFET**
  ASIC, ASSP & Memory: advanced and proven technologies
  Solutions for Innovation
  * MCU: 31% revenue basis from Gartner "Semiconductor Applications Worldwide Annual Market Share: Database", 25 March 2010
  ** Power MOSFET: 17.1% on unit basis from Marketing Eye 2009

6 Microcontroller and Microprocessor Line-up
  Superscalar, MMU, Multimedia
  – Up to 1200 DMIPS, 45, 65 & 90nm process; video and audio processing on Linux; Server, Industrial & Automotive
  – Up to 500 DMIPS, 150 & 90nm process; 600 µA/MHz, 1.5 µA standby; Medical, Automotive & Industrial
  – Legacy Cores; next-generation migration to RX
  High Performance CPU, FPU, DSC
  Embedded Security
  – Up to 10 DMIPS, 130nm process; 350 µA/MHz, 1 µA standby; capacitive touch
  – Up to 25 DMIPS, 150nm process; 190 µA/MHz, 0.3 µA standby; application-specific integration
  – Up to 25 DMIPS, 180, 90nm process; 1 mA/MHz, 100 µA standby; crypto engine, hardware security
  – Up to 165 DMIPS, 90nm process; 500 µA/MHz, 2.5 µA standby; Ethernet, CAN, USB, Motor Control, TFT Display
  High Performance CPU, Low Power
  Ultra Low Power
  General Purpose
  RX: Ethernet, CAN, USB, UART, SPI, IIC

7 Innovation

8 The RX Solution
  Renesas Extreme
  RX architecture provides you best-in-class performance, with a rich set of intelligent peripherals enabling you to create innovative, interactive, connected devices.

9 Agenda
  Presentation: RX High-Performance Architecture
  – Core
  – Instruction set
  – Peripherals
  Lab: Measure & Maximize RX Performance
  – Basic benchmarking
  – Improve a real application
  Q & A

10 Key Takeaways
  By the end of this session you will be able to:
  – Perform a basic benchmark of the RX
  – Profile critical sections with the RX on-chip debug
  – Maximize your code’s performance with smart peripherals

11 RX Architecture: Enabling High-Performance

12 RX Architecture … CPU Core and Pipeline
  RX600 CISC CPU, 100MHz CPU core:
  – 16 x 32 bit General Purpose Registers
  – 9 x 32 bit Control Registers
  – 32 bit Floating Point Unit
  – 32 x 32 MAC to 48 bit or 80 bit result
  – 32 x 32 DIV or MULT, 32 bit or 64 bit result
  – Memory Protection Unit
  – Interrupt Control
  – On-Chip Debug
  5-stage pipeline:
  – F = Fetch instruction
  – D = Decode instruction
  – E = Execute instruction
  – M = Read or write memory
  – W = Write back to register
  Enhanced Harvard architecture:
  – Instructions: 64 bit, typically Flash memory
  – Operands (data): 32 bit, typically SRAM
  – Pre-Fetch Queue (PFQ): holds 4 to 32 instructions, for slower memory
  – Write buffer: 64 bits, buffer only for writes, for slower memory
  [Pipeline timing diagram: stages F, D, E, M, W per clock tick for successive overlapping instructions, including one instruction that spends several ticks in E]
  Achieves One Clock-Per-Instruction (CPI)
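
The one clock-per-instruction figure above can be sanity-checked with a small calculation. This is only a sketch under idealized assumptions that are not stated on the slide: no stalls, no branches, and every instruction spending exactly one tick in each stage. The function name `ideal_pipeline_cycles` is made up for illustration.

```c
/* Depth of the pipeline described on the slide: F, D, E, M, W. */
#define PIPELINE_STAGES 5u

/* Ideal cycle count for n back-to-back instructions with no stalls.
   The first instruction needs all five stages; each later instruction
   completes one cycle after its predecessor. */
static unsigned long ideal_pipeline_cycles(unsigned long n)
{
    return n == 0 ? 0 : (PIPELINE_STAGES - 1u) + n;
}
```

For a long instruction stream the four fill cycles become negligible, so the average approaches one cycle per instruction. A multi-cycle execute stage, like the one drawn in the timing diagram, adds stall cycles that this model ignores.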

13 RX Architecture … Memory Interface
  100 MHz Flash and SRAM means zero wait-state code and data access
  PFQ minimizes stalls from slower memory, such as external memory
  CPU is bus master of Internal Bus 1
  Internal Bus 2 connects to peripherals…
  Bus matrix allows CPU to concurrently fetch instructions or access data from any of 3 sources:
  – Flash Memory
  – SRAM
  – Internal Main Bus 1
  [Block diagram labels: RX600 MCU; RX600 CPU 100MHz (pipeline, PFQ, buffer; 64b inst, 32b data), bus master of Internal Main Bus 1; bus matrix; SRAM, 100MHz access, 64 bits; Flash Memory, 100MHz access, 64 bits; Internal Main Bus 1, 32 bits; External Bus Controller (BSC), 32 bits; external bus pins for CPU; bus bridge; peripherals]

14 RX Architecture … System Interface
  Multiple peripheral busses to spread bandwidth loading
  Peripheral groups:
  – Communication (USB, CAN, SCI, SPI, I2C)
  – Timers (MTU, TPU, TMR, CMT)
  – Analog (DAC, ADC, PGA)
  – GPIO
  – System Control (DMA, E2P, ICU, LVD, RTC, WDG, CLKS)
  – Ethernet MAC
  Bus masters:
  – CPU (bus master of Internal Main Bus 1)
  – DTC
  – DMAC
  – Ethernet DMAC
  – EXDMA (external bus master)
  Bus matrix allows CPU to concurrently fetch instructions or access data from any of 3 sources:
  – Flash Memory
  – SRAM
  – Internal Main Bus 1
  [Block diagram labels: RX600 MCU; RX600 CPU 100MHz (pipeline, PFQ, buffer; 64b inst, 32b data); bus matrix; SRAM, 100MHz access; Flash Memory, 100MHz access; 64 bits; Internal Main Bus 1, 32 bits; Internal Main Bus 2, 32 bits; bus bridges; CNTL; External Bus Controller (BSC), 32 bits; external bus pins for CPU; one external device; another external device]

15 RX Floating Point Unit

16 Question: Who Said It?
  “Microprocessor manufacturers unfortunately seem to feel that floating-point math is not very important in embedded systems. This has not been my experience.”
  Jean Labrosse, President, Micrium; author, μC/OS operating system

17 RX Floating Point Unit
  IEEE 754 single precision 32 bits data format
  Subtract, Multiply, Divide and Integer Conversion directly from CPU registers
  IEEE 754 Exceptions
  Floating Point Instructions:
  – FADD - Floating-point ADD
  – FCMP - Floating-point COMPare
  – FDIV - Floating-point DIVide
  – FMUL - Floating-point MULtiply
  – FSUB - Floating-point SUBtract
  – FTOI - Float TO Integer
  – ITOF - Integer TO Floating-point
  – ROUND - ROUND floating-point to integer
  Example:
    MOV.L R3,R4
    FMUL #4104100000H,R4
  8-tap FIR: 0.949 µs
  [Diagram labels: Floating Point Unit, 410410000, Memory map]

18 Under the Hood: FPU Code Generation
  Sample floating point operation: temperature conversion
    float Degrees_C, Degrees_F ;
    Degrees_C = 22.1 ;
    Degrees_F = (Degrees_C * 9.0/5.0) + 32.0;
  Variables: single precision floats
  Floating point constants
  Floating point operations
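
For readers who want to try the conversion on a host machine, here is a minimal sketch of the same calculation wrapped in a function. The function name `celsius_to_fahrenheit` and the `f` suffixes are additions, not part of the slide. In standard C the unsuffixed constants `9.0`, `5.0` and `32.0` have type `double`, so whether the slide's expression stays in single precision depends on toolchain settings; the suffixes simply make the single-precision intent explicit.

```c
/* Convert degrees Celsius to degrees Fahrenheit in single precision.
   The 'f' suffix makes every constant a float, so the expression does
   not require double-precision intermediates. */
static float celsius_to_fahrenheit(float degrees_c)
{
    return (degrees_c * 9.0f / 5.0f) + 32.0f;
}
```

Calling `celsius_to_fahrenheit(22.1f)` gives a value close to 71.78, matching the slide's example input.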

19 Under the Hood: FPU Code Generation
  Source:
    float Degrees_C, Degrees_F ;
    Degrees_C = 22.1 ;
    Degrees_F = (Degrees_C * 9.0/5.0) + 32.0;
  Code emitted by compiler:
    Degrees_C = 22.1 ;
        MOV.L #41B0CCCDH,R3
    Degrees_F = (Degrees_C * 9.0/5.0) + 32.0;
        MOV.L R3,R4
        FMUL #41100000H,R4
        FDIV #40A00000H,R4
        FADD #42000000H,R4
  Constants stored in IEEE 754 format
  RX floating point instructions…
  …operate directly on registers & memory
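
The hexadecimal immediates in the emitted code are the IEEE 754 single-precision encodings of the source constants. The sketch below shows one way to confirm that on a host machine; it assumes the host `float` is IEEE 754 binary32, and the helper name `float_bits` is invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float.
   memcpy is used instead of pointer casts to avoid undefined behaviour. */
static uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

With this helper, `float_bits(22.1f)` is `0x41B0CCCD`, `float_bits(9.0f)` is `0x41100000`, `float_bits(5.0f)` is `0x40A00000` and `float_bits(32.0f)` is `0x42000000`, the same values that appear as operands of `MOV.L`, `FMUL`, `FDIV` and `FADD`.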

20 Under the Hood: FPU Code Generation
  Comparison: 1 FPU instruction = 100+ SW instructions
  With FPU:
    Degrees_C = 22.1 ;
        MOV.L #41B0CCCDH,R3
    Degrees_F = (Degrees_C * 9.0/5.0) + 32.0;
        MOV.L R3,R4
        FMUL #41100000H,R4
        FDIV #40A00000H,R4
        FADD #42000000H,R4
  Software divide routine:
    _COM_DIVf
        MOV.L R1,R15
        XOR R2,R15
        SHLL #1,R1
        MOV.L R1,R3
        SHLR #24,R3
        SHLL #8,R1
        SHLL #1,R2
        MOV.L R2,R4
        SHLR #24,R4
        SHLL #8,R2
        CMP #0FFH,R3
        BEQ.W exception1
        CMP #0FFH,R4
        BEQ.W exception2
        CMP #0H,R3
        BEQ.W exception3
    exception_return3
        CMP #0H,R4
        BEQ.B exception4
    exception_return4
        SUB R4,R3
        ADD #7FH,R3,R3
        OR #1H,R1
        ROTR #1,R1
        RORC R2
        SHLR #1,R1
        MOV.L #0H,R5
        MOV.L #0H,R4
        MOV.L #1AH,R14
        BRA.S div_loop_entry
    div_loop
        SHLL #1,R5
        BTST #0,R4
        BNE.S div_loop_entry
        BSET #0,R5
    div_loop_entry
        SHLL #1,R1
        ROLC R4
        BTST #1,R4
        BNE.S div_1
        SUB R2,R1
        BC.B div_2
        XOR #01H,R4
        BRA.S div_2
    div_1
        ADD R2,R1
        BNC.B div_2
        XOR #01H,R4
    div_2
        SUB #1H,R14
        BNE.B div_loop
    div_loop_exit
        AND #1H,R4
        BEQ.S make_result
        ADD R2,R1
    make_result
        MOV.L R5,R2
        SHLL #1,R2
        XOR #01H,R4
        OR R4,R2
        SHLL #6,R2
        CMP #0H,R1
        BEQ.S end_calc_sticky
        OR #20H,R2
    end_calc_sticky
        MOV.L R2,R4
        BTST #31,R4
        BNE.S end_normalize
        SHLL #1,R2
        SUB #1H,R3
    end_normalize
        CMP #0FFH,R3
        BLT.B 0FFFF885EH
        BRA.W return_inf
        CMP #-17H,R3
        BGE.B 0FFFF8866H
        BRA.W return_zero
        CMP #0H,R3
        BGT.B end_denormal
    denormalize_loop
        SHLR #1,R2
        BNC.B next_loop
        BSET #0,R2
    next_loop
        CMP #0H,R3
        BGE.B round_denormal
        ADD #1H,R3
        BRA.B denormalize_loop
    round_denormal
        BTST #7,R2
        BEQ.B end_round_d
        MOV.L R2,R4
        AND #017FH,R4
        BEQ.S end_round_d
        ADD #0100H,R2,R2
        BPZ.B end_round_d
        ADD #1H,R3
    end_round_d
        BRA.B end_round
    end_denormal
        BTST #7,R2
        BEQ.B end_round
        MOV.L R2,R4
        AND #017FH,R4
        BEQ.B end_round
        ADD #0100H,R2,R2
        BNC.B end_round
    round_carry
        RORC R2
        ADD #1H,R3
    end_round
        SHLL #1,R2
        SHLR #8,R2
        SHLL #24,R3
        OR R2,R3
        MOV.L R3,R1
        SHLL #1,R15
        RORC R1
        RTS

21 RX Instruction Set

22 Question: What programming language do you use?
  – If it doesn’t run Python, I won’t use it
  – C/C++
  – Real men program in assembler
  – No, real men use a hex editor & opcodes
  – I once programmed a database using only 1’s & 0’s
  – You had 0’s !?!?

23 Another Question… How big is your code?
  – <10K
  – 10K – 64K
  – 64K – 128K
  – Under one megabyte
  – Under 4 megabytes
  – I write code for Microsoft… and own stock in Seagate!

24 Instruction Set
  Target: Improve code density, support for High Level Langs
  – Analyze code from real-world customer applications
  – Adopt variable byte-length instruction
  – Assign most used instructions to short instruction codes
  – Add addressing modes
  – Benchmark, refine, benchmark, refine…
  Result: 30% Code Size Reduction
  [Bar chart: code size (relative, 1.0 marked) for RX600 vs. Cortex-M3 based MCU across data communication, motor control, data conversion, real-time control and system control; bars labelled 28% less, 19% less, 17% less and 25% less]
  Note: Internal benchmark test, your results may vary

25 RX Smart Peripherals

26 Why Smart Peripherals?
  “I don’t care what it is, when it has an LCD screen, it makes it better.”
  Kevin Rose, Diggnation

27 Peripherals
  Targets: Offload the CPU, ease migration, reduce power
  – Cherry pick from extensive portfolio
  – Add intelligent DMAC, Data Transfer Controller, new Timers, Ethernet, USB, CAN
  Result: Only 5% CPU loading for 60 Hz refresh of static image

28 DMAC vs Data Transfer Control (DTC)
  Similarities:
  – Registers (SAR, DAR, Xfer & Block Count)
  – Byte, words, long words
  – Auto Increment/Decrement SAR/DAR
  – Normal/Repeat/Block modes
  – Interrupt generation
  Differences:
  – DMA faster, 1 transfer/cycle
  – DMA dedicated registers for each channel
  – DMA channels limited
  – DTC many virtual channels
  – DTC channels can be chained
  – DTC much more flexible
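
To make the idea of chained transfers concrete, here is a host-side sketch that models a transfer descriptor in plain C. It is purely illustrative: the struct layout, the `next` pointer and the names `transfer_desc` and `run_chain` are assumptions made for this example and do not reflect the actual RX register map or transfer-information format. Only the field concepts (source address, destination address, transfer count, unit size) come from the slide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of one transfer descriptor.
   This is NOT the hardware layout used by the DMAC or DTC. */
typedef struct transfer_desc {
    const uint8_t *sar;               /* source address */
    uint8_t *dar;                     /* destination address */
    size_t count;                     /* number of units to transfer */
    size_t unit_size;                 /* 1 = byte, 2 = word, 4 = long word */
    const struct transfer_desc *next; /* chained descriptor, or NULL */
} transfer_desc;

/* Walk a chain of descriptors, copying each block in turn.
   Returns the number of descriptors processed. */
static size_t run_chain(const transfer_desc *d)
{
    size_t processed = 0;
    for (; d != NULL; d = d->next) {
        memcpy(d->dar, d->sar, d->count * d->unit_size);
        processed++;
    }
    return processed;
}
```

In this model a single trigger services a whole list of transfers, which is the flexibility the slide attributes to chaining; on real hardware the controller, not the CPU, would perform the copies.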

29 Lab Technique: Measuring Performance

30 Measuring System Performance
  Performance Counters
  – Hardware-supported high-resolution timer
  – Counts execution cycles & number of passes for two sections of user code
  – No effect on your code
  – Two 32-bit timers or one 64-bit timer
  – Triggered by complex events
  – Selectively record all execution cycles, interrupts, exceptions
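
Since the counters report total cycles and number of passes for a code section, the usual next step is to turn those two numbers into an average per pass and a time. The sketch below assumes the counter values have already been read out through the debugger; the function names `cycles_per_pass` and `cycles_to_ns` are invented for illustration, and the arithmetic uses integer division, so results are truncated.

```c
#include <stdint.h>

/* Average cycles per pass; returns 0 when the section never ran. */
static uint64_t cycles_per_pass(uint64_t total_cycles, uint32_t passes)
{
    return passes ? total_cycles / passes : 0;
}

/* Convert a cycle count to nanoseconds at a given core clock in MHz.
   Returns 0 when the clock is given as 0 to avoid dividing by zero. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_mhz)
{
    return clock_mhz ? (cycles * 1000u) / clock_mhz : 0;
}
```

For example, 30,300 cycles over 100 passes averages 303 cycles per pass, which at a 100 MHz core clock is 3,030 ns.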

31 Performance Analysis Setup

32 RX Performance Labs: Goals
  RX Core Benchmarking
  – Dhrystone
  Optimize a real application
  – Benchmark application – polled mode
  – Use timers & interrupts
  – Use DMAC to read & buffer ADC readings
  “90% CPU loading doubles the schedule, 95% triples it.” Alan M. Davis, “201 Principles of Software Development”

33 Start your lab!

34 Start the Lab
  Keep your dice turned to the section of the lab you are on. (Instructions are provided in the lab handout)
  Please refer to the Lab Handout and let’s get started!

35 Checking Progress
  We are using the die to keep track of where everyone is in the lab. Make sure to update it as you change sections.
  When done with the lab, your die will have the 6 pointing up as shown here.

36 RX Performance Labs: Review
  RX Core Benchmarking
  – Dhrystone: 1.65 DMIPS/MHz
  – Performance scales linearly to 100 MHz
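
The two review figures combine into a headline number with simple arithmetic. The sketch below uses the conventional Dhrystone normalization, where 1 DMIPS corresponds to 1757 Dhrystones per second (the VAX 11/780 reference score); the function names are invented for illustration, and the linear scaling is the slide's claim rather than something the code verifies.

```c
/* Dhrystones per second of the VAX 11/780, the conventional 1 DMIPS reference. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystone score (iterations per second) to DMIPS. */
static double dmips_from_dhrystones(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Scale a DMIPS/MHz rating to a clock frequency, assuming linear scaling. */
static double dmips_at_clock(double dmips_per_mhz, double clock_mhz)
{
    return dmips_per_mhz * clock_mhz;
}
```

At 1.65 DMIPS/MHz and 100 MHz this gives about 165 DMIPS, which corresponds to roughly 289,905 Dhrystone iterations per second.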

37 RX Performance Labs: Sample Application
  Collect samples from ADC
  Scale reading to J thermocouple range, convert °C
  Version 1: Polled
  – No optimization, do everything as fast as possible
  Version 2: Interrupts
  – Post-process while ADC is sampling
  – One interrupt/sample
  Version 3: DMAC
  – DMAC buffers 5,000 readings
  – One interrupt/buffer

38 Checking Progress
  We are using the die to keep track of where everyone is in the lab. Make sure to update it as you change sections.
  When done with the lab, your die will have the 6 pointing up as shown here.

39 RX Performance Labs: Review
  Reduce CPU overhead w/ smart peripherals
  – Benchmark: 100% of CPU @ 330 kHz
  – Timer & interrupts: 97% of CPU @ 500 kHz
  – Plus DMAC: 71% of CPU @ 500 kHz
  Take advantage of smart peripherals!
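
One way to compare the three results is to convert each into CPU cycles spent per sample. The sketch below assumes that the kHz figures are ADC sample rates, that the percentages are CPU load at that rate, and that the core runs at the 100 MHz quoted earlier in the deck; none of this is spelled out on the slide, and the function name `cycles_per_sample` is invented for illustration.

```c
/* CPU cycles consumed per sample, given the CPU load as a fraction (0.0 to 1.0),
   the core clock in Hz and the sample rate in Hz. */
static double cycles_per_sample(double cpu_load, double clock_hz, double sample_rate_hz)
{
    return cpu_load * clock_hz / sample_rate_hz;
}
```

Under those assumptions the polled version costs about 303 cycles per sample (100% at 330 kHz), the interrupt version about 194 (97% at 500 kHz), and the DMAC version about 142 (71% at 500 kHz).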

40 Questions?

41 Innovation

42 Feedback Form
  Please fill out the feedback form!
  If you do not have one, please raise your hand

43 Thank You!
