
1 DSP/BIOS System Integration Workshop
Copyright © 2008 Texas Instruments. All rights reserved.
1. Introduction
2. Real-Time System Design Considerations
3. Hardware Interrupts (HWI)
4. Software Interrupts (SWI)
5. Task Authoring (TSK)
6. Data Streaming (SIO)
7. Multi-Threading (CLK, PRD)
8. BIOS Instrumentation (LOG, STS, SYS, TRC)
9. Static Systems (GCONF, TCONF)
10. Cache (BCACHE)
11. Dynamic Systems (MEM, BUF)
12. Flash Programming (HexAIS, Flashburn)
13. Inter-Thread Communication (MSGQ, ...)
14. DSP Algorithm Standard (XDAIS)
15. Input Output Mini-Drivers (IOM)
16. Direct Memory Access (DMA)
17. Review
DSP Texas Instruments Technology

2 C6000 Memory Considerations
1. Use internal RAM
   + Fastest, lowest power, best bandwidth
   − Limited amount of internal RAM available
2. Add external memory
   + Allows much greater code and data sizes
   − Much lower performance than internal memory access
3. Enable cache
   + Near-optimal performance for loops and iterative data access
   + Improves speed, power, EMIF availability
   − No benefit for non-looping code or once-used data
4. Use internal RAM and external memory
   + External memory for low-demand or highly looped items
   + Internal memory for highest-demand or DMA-shared memory
5. Tune code for cache usage
   + Assure optimal fit to cache
   + Avoid CPU/DMA contention problems

3 C6000 Memory Considerations (agenda slide repeated; see slide 2)

4 Option 1: Use Internal Memory
- When possible, place all code and data into internal RAM
  - Select all internal memory to be mapped as RAM
  - Add IRAM(s) to the memory map
  - Route code/data to the IRAM(s)
- Ideal choice for initial code development
  - Defines the optimal performance possible
  - Avoids all concerns of using external memory
  - Fast and easy to do – just download and run from CCS
- In production systems
  - Add a ROM-type resource externally to hold code and initial data
  - Use DMA (or CPU transfer) to copy runtime code/data to internal RAM
  - Boot routines are available on most TI DSPs
- Limited range
  - Usually not enough IRAM for a complete system
  - Often need to add external memory and route resources there

5 C6000 Internal Memory Topology
(Figure: CPU with SPLOOP buffer; L1P controller with L1P RAM/cache; L1D controller with L1D RAM/cache; L2 controller with L2 ROM and L2 IRAM/cache; on-chip bus widths of 8–32 bytes.)
- Level 1 – or “L1” – RAM
  - Highest performance of any memory in a C6000 system
  - Two banks are provided: L1P (for program) and L1D (for data)
  - Single-cycle memory with wide bus widths to the CPU
- Level 2 – or “L2” – RAM
  - Second-best performance in the system; can approach single cycle in bursts
  - Holds both code and data
  - Usually larger than the L1 resources
  - Wide bus widths to the CPU, via the L1 controllers

6 Configure IRAM via GCONF
To obtain maximum IRAM, zero the internal caches, which share this memory.

7 Define IRAM Usage via GCONF

8 Define IRAM Usage via GCONF
Here, L1D is used for the most critical storage, and all else is routed to L2 “IRAM”. A variety of options can be quickly tested, and the best kept in the final revision.

9 Sample of C6000 On-Chip Memory Options

  Device   CPU   L1P  L1D  L2    $
  C6416T   64    16   16   1024  250
  C6455    64+   32   32   2048  300
  DM6437   64+   32   80   128   30
  C6727    67+   32   0    256   23
  C6747    67+   32   32*  256   15

Notes:
- Memory sizes are in KB
- Prices are approximate, at 100-piece volume
- The 6747 also has 128KB of L3 IRAM

10 C6000 Memory Considerations (agenda slide repeated; see slide 2)

11 Option 2: Use External Memory
- For larger systems, place code and data into external memory
  - Define the available external memories
  - Route code/data to the external memories
- Essential for systems with environments larger than the available internal memory
  - Allows systems with sizes ranging from megabytes to gigabytes
  - Often realized when a build fails for exceeding the internal memory range
- Reduced performance
  - Off-chip memory has wait states
  - Lots of setup and routing time to get data on chip
  - Competition for the off-chip bus: data, program, DMA, ...
  - Increased power consumption

12 C6000 Memory Topology
(Figure: the internal topology of slide 5, plus an external memory controller and external memory; the external interface path is 16 bytes on chip and 4–8 bytes off chip.)
- The external memory interface has narrower bus widths
- CPU access to external memory costs many cycles
- The exact cycle count varies greatly depending on the state of the system at the time

13 Define External Memory via GCONF

14 Define External Usage via GCONF

15 C6000 Memory Considerations (agenda slide repeated; see slide 2)

16 Option 3: Use Cache & External Memory
- Improves performance in code loops or with re-used data values
  - First access to an external resource is ‘normal’
  - Subsequent accesses are from on-chip caches, with:
    - Much higher speed
    - Lower power
    - Reduced external bus contention
- Not helpful for non-looping code or once-used data
  - Cache holds recent data/code for re-use
  - Without looping or re-access, cache cannot provide a benefit
- Not for use with ‘devices’
  - Inhibits re-reads from ADCs and writes to DACs
- Must be careful when the CPU and DMA are active in the same RAMs
- Enabling the cache:
  - Select the maximum amounts of internal memory to be mapped as cache
  - Remove IRAM(s) from the memory map
  - Route code/data to off-chip (or possibly remaining on-chip) resources
  - Map off-chip memory as cacheable

17 C6000 Memory Topology
(Figure: same topology as slide 12.)
- Caches automatically collect data and code brought in from the EMIF
- If requested again, the caches provide the information, saving many cycles over repeated EMIF activity
- Writes to external memory are also cached, to reduce cycles and free the EMIF for other usage
- Writeback occurs when a cache needs to mirror new addresses
- Write buffers on the EMIF reduce the need for the CPU to wait on writes

18 Configure Cache via GCONF
For best cache results, maximize the internal cache sizes.

19 Memory Attribute Registers: MARs
- 256 MAR bits define the cacheability of 4GB of addresses as 16MB groups
- Many 16MB areas are not used by the chip or present on a given board
- Example: usable 6437 EMIF addresses:

  Start Address  End Address  Size   Space
  0x4200 0000    0x42FF FFFF  16MB   CS2_
  0x4400 0000    0x44FF FFFF  16MB   CS3_
  0x4600 0000    0x46FF FFFF  16MB   CS4_
  0x4800 0000    0x48FF FFFF  16MB   CS5_
  0x8000 0000    0x8FFF FFFF  256MB  DDR2

  MAR  MAR Address  EMIF Address Range
  66   0x0184 8108  4200 0000 – 42FF FFFF
  128  0x0184 8200  8000 0000 – 80FF FFFF
  129  0x0184 8204  8100 0000 – 81FF FFFF
  130  0x0184 8208  8200 0000 – 82FF FFFF
  131  0x0184 820C  8300 0000 – 83FF FFFF
  132  0x0184 8210  8400 0000 – 84FF FFFF
  133  0x0184 8214  8500 0000 – 85FF FFFF
  134  0x0184 8218  8600 0000 – 86FF FFFF
  135  0x0184 821C  8700 0000 – 87FF FFFF

- EVM6437 memory is:
  - 128MB of DDR2 starting at 0x8000 0000
  - FLASH, NAND flash, or SRAM (selected via jumpers) in CS2_ space at 0x4200 0000
- Note: with the C64x+, program memory is always cached regardless of MAR settings

20 Configure MAR via GCONF
MAR66 and MAR128–135 turned ‘on’.

21 BCACHE API
IRAM modes and MARs can be set in code via the BCACHE API:
- In projects where GCONF is not being used
- To allow an active run-time reconfiguration option

Cache size management:
- BCACHE_getSize(*size) – return the sizes of all caches
- BCACHE_setSize(*size) – set the sizes of all caches

MAR bit management:
- marVal = BCACHE_getMar(base) – return the MAR value for a given address
- BCACHE_setMar(base, length, 0/1) – set the MARs for the stated address range

typedef struct BCACHE_Size {
    BCACHE_L1_Size l1psize;
    BCACHE_L1_Size l1dsize;
    BCACHE_L2_Size l2size;
} BCACHE_Size;

Size codes:
  #  L1 (kB)  L2 (kB)
  0  0        0
  1  4        32
  2  8        64
  3  16       128
  4  32       256

22 C6000 Memory Considerations (agenda slide repeated; see slide 2)

23 Option 4: IRAM & Cached External Memory
- Let some IRAM be cache, to improve external memory performance
  - First access to an external resource is ‘normal’
  - Subsequent accesses are from on-chip caches – better speed, power, and EMIF loading
- Keep some IRAM as normal addressed internal memory
  - Most critical data buffers (optimal performance in key code)
  - Target for DMA arrays routed to/from peripherals (2x EMIF savings)
  - Internal program RAM
    - Must be initialized via DMA or the CPU before it can be used
    - Provides optimal code performance
- Setting the internal memory properties:
  - Select the desired amounts of internal memory to be mapped as cache
  - Define the remainder as IRAM(s) in the memory map
  - Route code/data to the desired on- and off-chip memories
  - Map off-chip memory as cacheable
- To determine the optimal settings:
  - Profile and/or use STS on various settings to see which is best
  - A late-stage tuning process, done when almost all coding is completed

24 Select Desired IRAM Configuration
(Device memory/price table as on slide 9; size-code table as on slide 21.)
- Define the desired amount of IRAM to be cache (GCONF or BCACHE)
- The balance of the available IRAM is ‘normal’ internal mapped-address RAM
  - Any IRAM beyond the cache limits is always address-mapped RAM
- Single-cycle access to the L1 memories
- L2 access time can be as fast as a single cycle
- Regardless of size, the L2 cache is always 4-way set associative

25 Set Cache Size via GCONF or BCACHE
Cache size management:
- BCACHE_getSize(*size)
- BCACHE_setSize(*size)

typedef struct BCACHE_Size {
    BCACHE_L1_Size l1psize;
    BCACHE_L1_Size l1dsize;
    BCACHE_L2_Size l2size;
} BCACHE_Size;

Size codes:
  #  L1 (kB)  L2 (kB)
  0  0        0
  1  4        32
  2  8        64
  3  16       128
  4  32       256

26 C64x+ L1D Memory Banks
(Figure: L1D banks 0, 2, 4, 6, each 512 x 32 bits.)
- Only one L1D access per bank per cycle
- Use the DATA_MEM_BANK pragma to begin paired arrays in different banks
- Note: sequential data are not placed down a bank; they run along a horizontal line across the banks, then on to the next horizontal line
- Only even banks (0, 2, 4, 6) can be specified

#pragma DATA_MEM_BANK(a, 4)
short a[256];
#pragma DATA_MEM_BANK(x, 0)
short x[256];

for (i = 0; i < count; i++) {
    sum += a[i] * x[i];
}

27 C6000 Memory Considerations (agenda slide repeated; see slide 2)

28 Option 5: Tune Code for Cache Optimization
- Align key code and data for maximal cache usage
  - Match code/data to fit cache lines fully – align to 128 bytes
- Clear caches when the CPU and DMA are both active in a given memory
  - Keeps the cache from presenting out-of-date values to the CPU or DMA
- Size and align cache usage where the CPU and DMA are both active
  - Avoids the risk of having neighboring data affected by cache-clearing operations
- Freeze the cache to maintain its contents
  - Lock in desired cache contents to maintain performance
  - Ignore new collecting until the cache is ‘thawed’ for reuse

There are many ways in which caching can lead to data errors, but a few simple techniques provide the ‘cure’ for all of these problems.

29 Cache Coherency
(Figure: A/D converter feeding the DMA, which fills Buf A and Buf B in external RAM; the DSP reads them through the cache.)

Example of a read coherency problem:
1. DMA collects Buf A.
2. CPU reads Buf A; the buffer is copied into the cache. DMA collects Buf B.
3. CPU reads Buf B; the buffer is copied into the cache. DMA collects Buf C over “A”.
4. CPU reads Buf C... but the cache sees “A” addresses and provides “A” data – error!
5. Solution: invalidate the cache range before reading a new buffer.

Write coherency example:
1. CPU writes Buf A. The cache holds the written data.
2. DMA reads non-updated data from external memory – error!
3. Solution: write back the cache range after writing a new buffer.

Program coherency:
1. A host processor puts new code into external RAM.
2. Solution: invalidate the program cache before running the new code.

Note: there are NO coherency issues between L1 and L2!

30 Managing Cache Coherency
Cache invalidate:
- BCACHE_inv(blockPtr, byteCnt, wait)
- BCACHE_invL1pAll()
Cache writeback:
- BCACHE_wb(blockPtr, byteCnt, wait)
- BCACHE_wbAll()
Invalidate & writeback:
- BCACHE_wbInv(blockPtr, byteCnt, wait)
- BCACHE_wbInvAll()
Sync to cache:
- BCACHE_wait()

Parameters:
- blockPtr: start address of the range to be invalidated or written back
- byteCnt: number of bytes to be invalidated or written back
- wait: 1 = wait until the operation is completed

31 Coherence Side Effect – False Addresses
(Figure: a buffer spanning several cache lines, with ‘neighbor’ data sharing its first and last lines.)
- False address: ‘neighbor’ data that is in the cache but outside the buffer range
- Reading data from the buffer re-reads the entire line
  - If ‘neighbor’ data changed externally before the CPU was done using the prior state, the old data will be lost/corrupted as new data replaces it
- Writing data to the buffer causes the entire line to be written to external memory
  - External neighbor memory could be overwritten with old data
- False-address problems can be avoided by aligning the start and end of buffers on cache-line boundaries
  - Align memory on 128-byte boundaries
  - Allocate memory in multiples of 128 bytes

#define BUF 128
#pragma DATA_ALIGN(in, BUF)
short in[2][20*BUF];

32 Cache Freeze (C64x+)
- Freezing the cache prevents data that is currently cached from being evicted
- Cache freeze:
  - Responds to read and write hits normally
  - No updating of the cache on a miss
- Freeze is supported on the C64x+ L2/L1P/L1D
- Commonly used with interrupt service routines, so that one-use code does not replace real-time algorithm code
- Other cache modes: Normal, Bypass

Cache mode management:
- mode = BCACHE_getMode(level) – return the state of the specified cache
- oldMode = BCACHE_setMode(level, mode) – set the state of the specified cache

typedef enum {
    BCACHE_NORMAL,
    BCACHE_FREEZE,
    BCACHE_BYPASS
} BCACHE_Mode;

typedef enum {
    BCACHE_L1D,
    BCACHE_L1P,
    BCACHE_L2
} BCACHE_Level;

33 BCACHE-Based Cache Setup Example
This BCACHE example shows how to put the EVM6437 in the default power-up mode. (Note: code such as this will be required for stand-alone boot-up, where CCS GEL files are not present.)

#include "myWorkcfg.h"   // most BIOS headers provided by config tool
#include <bcache.h>      // header for DSP/BIOS cache functions

#define DDR2BASE 0x80000000  // base of DDR2 area on DM6437 EVM
#define DDR2SZ   0x07D00000  // size of external memory

void setCache()
{
    BCACHE_Size cachesize;                      // L1 and L2 cache sizes

    cachesize.l1dsize = BCACHE_L1_32K;          // L1D cache size 32K bytes
    cachesize.l1psize = BCACHE_L1_32K;          // L1P cache size 32K bytes
    cachesize.l2size  = BCACHE_L2_0K;           // L2 cache size ZERO bytes
    BCACHE_setSize(&cachesize);                 // set the cache sizes

    BCACHE_setMode(BCACHE_L1D, BCACHE_NORMAL);  // set L1D cache mode to normal
    BCACHE_setMode(BCACHE_L1P, BCACHE_NORMAL);  // set L1P cache mode to normal
    BCACHE_setMode(BCACHE_L2,  BCACHE_NORMAL);  // set L2 cache mode to normal

    BCACHE_inv(DDR2BASE, DDR2SZ, TRUE);         // invalidate DDR2 cache region
    BCACHE_setMar(DDR2BASE, DDR2SZ, 1);         // set DDR2 to be cacheable
}

34 C64x+ Cache Controller Review
(Figure: the full C64x+ memory topology, annotated with cache line organization and bus widths.)
- Select how much IRAM and cache is needed
- Enable caching via the MARs
- Align to 128; allocate in multiples of 128
- Invalidate the cache before reads from memory under external control
- Write back the cache after writes to RAM under external control

35 TI Technical Training Organization

36 C6000 Memory Considerations (agenda slide repeated; see slide 2)

