Presentation is loading. Please wait.

Presentation is loading. Please wait.

SOC Consortium Course Material Lab2 Debugging and Evaluation Speaker: Sun-Rise Wu Directed by Prof. Tien-Fu Chen October 23, 2003 National Chung Cheng.

Similar presentations


Presentation on theme: "SOC Consortium Course Material Lab2 Debugging and Evaluation Speaker: Sun-Rise Wu Directed by Prof. Tien-Fu Chen October 23, 2003 National Chung Cheng."— Presentation transcript:

1 SOC Consortium Course Material Lab2 Debugging and Evaluation Speaker: Sun-Rise Wu Directed by Prof. Tien-Fu Chen October 23, 2003 National Chung Cheng University Adopted from NCTU & NTU SOC Course Material

2 SOC Consortium Course Material 2 Goal of This Lab ARM Debug Target –Usage of different ARM debug architecture Debug skills Debug skills to be used to debug both software of processor and memory-mapped hardware design running at the target platform. Software cost estimation –Estimation of code sizes and performance of benchmark Profiling utility –Can be used to estimate percentage time of each function in an application Memory configuration –For performance/cost trade-off

3 SOC Consortium Course Material 3 Outline Introduction ARM Debug Target Debugging Skills Software Quality Measurement (Evaluation) AppendixReference

4 SOC Consortium Course Material 4 Introduction (1/2) A debugger is software that enables you to make use of a debug agent in order to examine and control the execution of software running on a debug target. The debugger issues instructions that can: 1.Load software into memory on the target 2.Start and stop execution of that software 3.Display contents of memory, registers, variables 4.Enable you to change stored values. 5.Software Quality Measurement ( code size, performance, profiling.. )

5 SOC Consortium Course Material 5 Introduction (2/2) ARM support two methods to do debugging. GUI : ARM eXtended Debugger (AXD). DOS : ARM Symbolic Debugger (armsd). AXD armsd

6 SOC Consortium Course Material 6 ARM Debug Target AXD can debug design through: –ARMulator (software) –Multi-ICE (hardware) use JTAG –Angel (hardware) Use COM port

7 SOC Consortium Course Material 7 Multi-ICE Arch. (1/3) Multi-ICE connection

8 SOC Consortium Course Material 8 Multi-ICE Arch. (2/3) Debugging software can be run on different computer through Network.

9 SOC Consortium Course Material 9 Multi-ICE Arch. (3/3) To support network connections, an additional application must be running on the windows workstation that runs the The multi-ICE server.  the portmapper allows software on other computers on the network to locate the The multi-ICE server.

10 SOC Consortium Course Material 10 Angel (1/5) Angel system –Debugger: Running on the host computer, giving instructions to Angel and displaying the results obtained from it. –Angel debug monitor: Running alongside the application being debugged on the target platform. –Armsd: The command line must be of the form: armsd –adp –port s=1 –linespeed 38400 image.axf

11 SOC Consortium Course Material 11 Angel (2/5) Debug support –Reporting and modifying memory and processor status –Downloading applications to the target system –Setting breakpoints C library semihosting support –Enabling applications linked the ARM C and C++ libraries to make semihosting requests by SWI Communications support –Using ADP for communicates –Providing an error-correcting communications protocol.

12 SOC Consortium Course Material 12 Angel (3/5) Angel’s communications diagram

13 SOC Consortium Course Material 13 Angel (4/5) Task management –Ensuring that only a single operation is carried out at any time –Assigning task priorities and schedules tasks accordingly –Controlling the Angel environment processor mode

14 SOC Consortium Course Material 14 Angel (5/5) Exception handling SWIInstalling it to support semihosting requests, to allow applications and Angel to enter Supervisor mode UndefinedUsing 3 undefined instructions to set breakpoints in code Data, Prefetch Abort Reporting the exception to the debugger, suspend the application, and pass control back to the debug FIQ,IRQEnabling Angel communications to run off, or both types of interrupt.

15 SOC Consortium Course Material 15 Debugging Skills Control of program execution –set breakpoints on interesting instructions –set watchpoints on interesting data accesses –single step through code Examine and change processor state –read and write register values Examine and change system state –access to system memory Interleaving source code –show C/C++ code and assemble code together

16 SOC Consortium Course Material 16 Watch / break point Watchpoints are taken when the data being watchpointed has changed. Breakpoints are taken when the instruction being breakpointed reaches the execution stage. the program counter is not updated, and retains the address of the breakpointed instruction.

17 SOC Consortium Course Material 17 Software Quality Measurement Memory requirement of the program –Data type: Volatile (RAM), non-volatile (ROM) –Memory performance: access speed, data width, size and range Profiling –build up a picture of the percentage of time spent in each procedure. Performance benchmarking –Evaluate software performance prior to implement on hardware Writing efficient C for ARM cores –ARM/Thumb interworking –Coding styles

18 SOC Consortium Course Material 18 Application Code and Data Size armlink offers two options to provide the relevant information: -info sizes (sizes of all objects) -info totals (summary only) ============================================================ Image component sizes Code RO Data RW Data ZI Data Debug 25840 3444 0 0 104344 Object Totals 22680 762 0 300 9104 Library Totals ============================================================= Code RO Data RW Data ZI Data Debug 48520 4206 0 300 113448 Grand Totals ============================================================= Total RO Size(Code + RO Data) 52726 ( 51.49kB) Total RW Size(RW Data + ZI Data)300 ( 0.29kB) Total ROM Size(Code + RO Data + RW Data) 52726 ( 51.49kB) ============================================================= The size of code/data in –an ELF image can be viewed using fromelf –z –a library can be viewed using armar –sizes

19 SOC Consortium Course Material 19 ARM and Thumb Code Size Simple C routine if (x>=0) return x; else return -x; The equivalent ARM assembly IabsCMPr0,#0;Compare r0 to zero RSBLTr0,r0,#0;If r0<0 (less than=LT) then do r0= 0-r0 MOVpc,lr;Move Link Register to PC (Return) The equivalent Thumb assembly CODE16 ;Directive specifying 16-bit (Thumb) instructions labsCMPr0,#0;Compare r0 to zero BGEreturn;Jump to Return if greater or ;equal to zero NEGr0,r0;If not, negate r0 returnMOVpc,lr;Move Link register to PC (Return)

20 SOC Consortium Course Material 20 Memory Map and Size Considerations The linker calculates the ROM and RAM requirements for code and data as follows: –ROM: Code size + RO data + RW data –RAM: RW Data + ZI data. You may wish to copy code from ROM into faster RAM, which will also increase the RAM requirements Placing the stacks in zero-wait state, 32-bit memory on-chip will significantly improve over 8 or 16- bit off-chip memory Default memory map ROM RAM

21 SOC Consortium Course Material 21 Profiling (1/3) About Profiling: –Profiler samples the program counter and computes the percentage time of each function spent. –Flat Profiling: If only pc-sampling info. is present. It can only display the time percentage spent in each function excluding the time in its children. Flat profiling accumulates limited information without altering the image –Call graph Profiling: If function call count info. is present. It can show the approximations of the time spent in each function including the time in its children. Extra code is added to the image

22 SOC Consortium Course Material 22 Profiling (2/3) Profiling Limitations: –Profiling is NOT available for code in ROM, or for scatter loaded images. –No data is gathered for programs that are too small. Call graph Profiling Flat Profiling

23 SOC Consortium Course Material 23 Profiling (3/3) The Profiler command syntax is as follows: armprof [-parent|-noparent] [-child|-nochild] [-sort options] prf_file Call graph Profiling Sample Output Namecum%self%desc%calls --------------------------------------------------------------------- main17.69%60.06%1 insert_sort77.76%17.69%60.06%1 strcmp60.06%0.00%243432 --------------------------------------------------------------------- qs_string_compare 3.21%0.00%13021 shell_sort3.46%0.00%14059 insert_sort60.06%0.00%243432 strcmp66.75%66.75%0.00%270512 --------------------------------------------------------------------- cumulative self descendants calls

24 SOC Consortium Course Material 24 Performance benchmarking (1/4) Execution time ( real-time vs. emulated ) –$sys_clock –Execution time = Total Cycle count / Cycle Frequency

25 SOC Consortium Course Material 25 Performance benchmarking (2/4) When ARM processor executes program, it will change these clock types according to demand of operating. –increase performance of data access –efficient mechanism of lower power N-cycles (Non-sequential cycle) The ARM core requests a transfer to or from an address which is unrelated to the address used in the preceding cycle. S-cycles (Sequential cycle) The ARM core requests a transfer to or from an address which is either the same, or one word or one-half-word greater than the preceding address. I-cycles (Internal cycle or Idle cycle) The ARM core does not require a transfer, as it is performing an internal function. C-cycles (Coprocessor register transfer cycle) Total clock cycle = (N + S + I + C)-cycles

26 SOC Consortium Course Material 26 Performance benchmarking (3/4) Estimation using different Memory model If no map file is specified: –ARMulator will use a 4GB bank of ‘ideal’ memory, i.e., no wait states. The map file defines regions of memory, and, for each region: –The address range to which that region is mapped. –The data bus width (in bytes). –The access times for the memory region (in ns) mapfile typically contains something like: 00000000 00020000 ROM 2 R 150/100 150/100 10000000 00008000 RAM 4 RW 100/65 100/65 start address, length, label, width, access, read time, write time type non-s/seq non-s/seq

27 SOC Consortium Course Material 27 Performance benchmarking (4/4) Benchmarking cached cores Cache efficiency –Avg. memory access time = hit time +Miss rate x Miss Penalty –Cache Efficiency = Core-Cycles / Total Bus Cycles

28 SOC Consortium Course Material 28 Writing efficient C for ARM cores ARM/Thumb interworking –ARM : Bottleneck, interrupt handle –Thumb: others Compiler optimization: –Space or speed (e.g, -Ospace or -Otime ) –Debug or release version (e.g., -O0,-O1 or -O2) –Instruction scheduling Coding style –Variable type and size –Parameter passing –Loop termination –Division operation and modulo arithmetic

29 SOC Consortium Course Material 29 Data Layout Default char a; short b; char c; int d; Optimized char a; char c; short b; int d; apadb c d abc d occupies 12 bytes, with 4 bytes of padding occupies 8 bytes, without any padding Group variables of the same type together. This is the best way to ensure that as little padding data as possible is added by the compiler.

30 SOC Consortium Course Material 30 Variable Types – Size Examples wordinc ADD a1,a1,#1 MOV pc,lr char charinc (char a) { return a + 1; } int wordinc (int a) { return a + 1; } short shortinc (short a) { return a + 1; } shortinc ADD a1,a1,#1 MOV a1,a1,LSL #16 MOV a1,a1,ASR #16 MOV pc,lr charinc ADD a1,a1,#1 AND a1,a1,#&ff MOV pc,lr

31 SOC Consortium Course Material 31 Stack Usage C/C++ code uses the stack intensively. The stack is used to hold: –Return addresses for subroutines –Local arrays & structures To minimize stack usage: –Keep functions small (few variables, less spills)minimize the number of ‘live’ variables (I.e., those which contain useful data at each point in the function) –Avoid using large local structures or arrays (use malloc/free instead) –Avoid recursion

32 SOC Consortium Course Material 32 Global Data Issues When declaring global variables in source code to be compiled with ARM Software, three things are affected by the way you structure your code: –How much space the variables occupy at run time. This determines the size of RAM required for a program to run. The ARM compilers may insert padding bytes between variables, to ensure that they are properly aligned. –How much space the variables occupy in the image. This is one of the factors determining the size of ROM needed to hold a program. Some global variables which are not explicitly initialized in your program may nevertheless have their initial value (of zero, as defined by the C standard) stored in the image. –The size of the code needed to access the variables. Some data organizations require more code to access the data. As an extreme example, the smallest data size would be achieved if all variables were stored in suitably sized bitfields, but the code required to access them would be much larger.

33 SOC Consortium Course Material 33 Loop termination … int acc(int n) { int i; //loop index int sum=0; for (i=1; i<=n ;i++) for (i=1; i<=n ;i++) sum+=i; return sum; } … int acc(int n) { int i; //loop index int sum=0; for (i=n; i!=0 ;i--) sum+=i; return sum; } … loop.cloop_opt.c

34 SOC Consortium Course Material 34 Division operation and modulo arithmetic The remainder operator ‘%’ is commonly used in modulo arithmetic. –This will be expensive if the modulo value is not a power of two –This can be avoid by rewriting C code to use if () statement heck unsigned counter1 (unsigned counter) { return (++counter % 60); } ============================= counter1 STMFBsp!, {lr} ADDr1, r0, #1 MOVr0, #0x3C BL__rt_udiv MOVr0, r1 LDMIAsp!, {pc} unsigned counter2 (unsigned counter) { if (++counter >= 60) counter=0; return counter } ============================= counter2 ADDr0, r0, #1 CMPr0, #0x3C MOVCSr0, #0 MOVpc, lr modulo.c modulo_opt.c

35 SOC Consortium Course Material 35 Appendix Content of JTAG Content of Embedded ICE

36 SOC Consortium Course Material 36 Content of JTAG (I) JTAG Arch. –Serial scan path from one cell to another –Controlled by TAP controller

37 SOC Consortium Course Material 37 Content of JTAG (II)

38 SOC Consortium Course Material 38 Content of JTAG (III) Pin NameFunction 1SPUSystem powered up, pin connected to Vdd through a 33 ohm resistor 3nTRSTTest reset, active low 5TDITest data in 7TMSTest mode select 9TCKTest clock 11TDOTest data out 12nICERSTTarget System Reset (sometimes referred to nSYSRST or nRSTOUT) 13SPUSystem powered up, pin connected to Vdd through a 33 ohm resistor 2, 4, 6, 8,10,14 VSSSystem ground reference (All VSS pins should be con- nected Boldface represents that these pins are JTAG Signals

39 SOC Consortium Course Material 39 Content of Embedded ICE (I) Debug extensions to the ARM core –The extensions consist of a number of scan chains around the processor core and some additional signals that are used to control the behavior of the core for debug purposes : BREAKPT: enables external hardware to halt processor execution for debug purposes.active high DBGRQ: is a level-sensitive input that causes the CPU to enter debug state when the current instruction has completed. DBGACK: is an output from the CPU that goes high when the core is in debug state

40 SOC Consortium Course Material 40 Content of Embedded ICE (II) The EmbeddedICE logic –This logic is the integrated onchip logic that provides JTAG debug support for ARM core. –This logic is accessed through the TAP controller on the ARM core using the JTAG interface. Consists of: Two watchpoint units A control register A status register A set of registers implementing the Debug Communications Channel link

41 SOC Consortium Course Material 41 Reference Profiling: “Application Note 93: Benchmarking with ARMulator” Efficient C programming: “Application Note 34: Writing Efficient C for ARM” Multi-ICE.pdf ADS_DebuggersGuide.pdf ADS_GettingStarted.pdf AFS_Referece_Guide.pdf Using EmbeddedICE.pdf


Download ppt "SOC Consortium Course Material Lab2 Debugging and Evaluation Speaker: Sun-Rise Wu Directed by Prof. Tien-Fu Chen October 23, 2003 National Chung Cheng."

Similar presentations


Ads by Google