Copyright © 2008 Texas Instruments. All rights reserved.

DSP/BIOS System Integration Workshop
1. Introduction
2. Real-Time System Design Considerations
3. Hardware Interrupts (HWI)
4. Software Interrupts (SWI)
5. Task Authoring (TSK)
6. Data Streaming (SIO)
7. Multi-Threading (CLK, PRD)
8. BIOS Instrumentation (LOG, STS, SYS, TRC)
9. Static Systems (GCONF, TCONF)
10. Cache (BCACHE)
11. Dynamic Systems (MEM, BUF)
12. Flash Programming (HexAIS, Flashburn)
13. Inter-Thread Communication (MSGQ, ...)
14. DSP Algorithm Standard (XDAIS)
15. Input/Output Mini-Drivers (IOM)
16. Direct Memory Access (DMA)
17. Review

C6000 Memory Considerations
1. Use internal RAM
   + Fastest, lowest power, best bandwidth
   − Limited amount of internal RAM available
2. Add external memory
   + Allows much greater code and data sizes
   − Much lower performance than internal memory access
3. Enable cache
   + Near-optimal performance for loops and iterative data access
   + Improves speed, power, and EMIF availability
   − No benefit for non-looping code or single-use data
4. Use internal RAM and external memory
   + External memory for low-demand or highly looped items
   + Internal memory for highest-demand or DMA-shared memory
5. Tune code for cache usage
   + Assure an optimal fit to the cache
   + Avoid CPU/DMA contention problems


Option 1: Use Internal Memory
- When possible, place all code and data into internal RAM
  - Select all internal memory to be mapped as RAM
  - Add IRAM(s) to the memory map
  - Route code/data to the IRAM(s)
- Ideal choice for initial code development
  - Defines the optimal performance possible
  - Avoids all concerns of using external memory
  - Fast and easy to do: just download and run from CCS
- In production systems
  - Add an external ROM-type resource to hold code and initial data
  - Use DMA (or a CPU transfer) to copy runtime code/data to internal RAM
  - Boot routines are available on most TI DSPs
- Limited range
  - Usually not enough IRAM for a complete system
  - Often need to add external memory and route resources there

C6000 Internal Memory Topology
[Diagram: the CPU (with SPLOOP buffer) connects to the L1P controller (32B path) and L1D controller (8B path); L1P RAM/Cache and L1D RAM/Cache connect through the L2 controller (32B paths) to L2 ROM and L2 IRAM/Cache]
- Level 1 ("L1") RAM
  - Highest performance of any memory in a C6000 system
  - Two banks are provided: L1P (for program) and L1D (for data)
  - Single-cycle memory with wide bus widths to the CPU
- Level 2 ("L2") RAM
  - Second-best performance in the system; can approach single-cycle in bursts
  - Holds both code and data
  - Usually larger than the L1 resources
  - Wide bus widths to the CPU, via the L1 controllers

Configure IRAM via GCONF
To obtain maximum IRAM, zero the internal caches, which share this memory.

Define IRAM Usage via GCONF

Define IRAM Usage via GCONF
Here, L1D is used for the most critical storage, and all else is routed to L2 "IRAM". A variety of options can be quickly tested, and the best kept in the final revision.

Sample of C6000 On-Chip Memory Options
[Table: Device / CPU speed / L1P / L1D / L2 / price columns for several C6000 devices, including the C6416T; the numeric values were lost in transcription]
Notes:
- Memory sizes are in KB
- Prices are 100-piece volume
- The C6747 also has 128KB of L3 IRAM

C6000 Memory Considerations (agenda recap; next: 2. Add external memory)

Option 2: Use External Memory
- For larger systems, place code and data into external memory
  - Define the available external memories
  - Route code/data to the external memories
- Essential for systems with environments larger than the available internal memory
  - Allows systems ranging in size from megabytes to gigabytes
  - The need is often discovered when a build fails for exceeding the internal memory range
- Reduced performance
  - Off-chip memory has wait states
  - Lots of setup and routing time to get data on chip
  - Competition for the off-chip bus: data, program, DMA, ...
  - Increased power consumption

C6000 Memory Topology
[Diagram: the internal topology from before, plus an External Memory Controller between the L2 controller (16B path) and external memory (4-8B path)]
- The external memory interface has narrower bus widths
- CPU access to external memory costs many cycles
- The exact cycle count varies greatly depending on the state of the system at the time

Define External Memory via GCONF

Define External Usage via GCONF

C6000 Memory Considerations (agenda recap; next: 3. Enable cache)

Option 3: Use Cache & External Memory
- Improves performance for code loops or re-used data values
  - The first access to an external resource is "normal"
  - Subsequent accesses are from the on-chip caches, with:
    - Much higher speed
    - Lower power
    - Reduced external bus contention
- Not helpful for non-looping code or single-use data
  - Cache holds recent data/code for re-use
  - Without looping or re-access, a cache cannot provide a benefit
- Not for use with 'devices'
  - Caching inhibits re-reads from ADCs and writes to DACs
  - Must be careful when the CPU and DMA are active in the same RAMs
- Enabling the cache:
  - Select the maximum amounts of internal memory to be mapped as cache
  - Remove the IRAM(s) from the memory map
  - Route code/data to off-chip (or possibly remaining on-chip) resources
  - Map off-chip memory as cacheable

C6000 Memory Topology
[Diagram: the same topology, with the cache paths through the external memory controller highlighted]
- Caches automatically collect data and code brought in from the EMIF
- If requested again, the caches provide the information, saving many cycles over repeated EMIF activity
- Writes to external memory are also cached, to reduce cycles and free the EMIF for other usage
  - Writeback occurs when a cache needs to mirror new addresses
  - Write buffers on the EMIF reduce the need for the CPU to wait on writes

Configure Cache via GCONF
For best cache results, maximize the internal cache sizes.

Memory Attribute Registers (MARs)

Start Address   End Address    Size    Space
0x4200 0000     0x42FF FFFF    16MB    CS2_
0x4400 0000     0x44FF FFFF    16MB    CS3_
0x4600 0000     0x46FF FFFF    16MB    CS4_
0x4800 0000     0x48FF FFFF    16MB    CS5_
0x8000 0000     0x8FFF FFFF    256MB   DDR2

MAR    EMIF Address Range
 66    0x4200 0000 – 0x42FF FFFF
128    0x8000 0000 – 0x80FF FFFF
129    0x8100 0000 – 0x81FF FFFF
130    0x8200 0000 – 0x82FF FFFF
131    0x8300 0000 – 0x83FF FFFF
132    0x8400 0000 – 0x84FF FFFF
133    0x8500 0000 – 0x85FF FFFF
134    0x8600 0000 – 0x86FF FFFF
135    0x8700 0000 – 0x87FF FFFF
(The MAR register-address column of the original table was lost in transcription.)

- 256 MAR bits define the cacheability of 4GB of address space as 16MB groups
- Many 16MB areas are not used by the chip or present on a given board
- Example: usable 6437 EMIF addresses above
- EVM6437 memory is:
  - 128MB of DDR2 starting at 0x8000 0000
  - FLASH, NAND flash, or SRAM (selected via jumpers) in the CS2_ space at 0x4200 0000
- Note: with the C64x+, program memory is always cached regardless of MAR settings

Configure MAR via GCONF
MAR 66, turned 'on'.

BCACHE API
IRAM modes and MARs can be set in code via the BCACHE API:
- In projects where GCONF is not being used
- To allow an active run-time reconfiguration option

Cache Size Management
  BCACHE_getSize(*size)    returns the sizes of all caches
  BCACHE_setSize(*size)    sets the sizes of all caches

MAR Bit Management
  marVal = BCACHE_getMar(base)        returns the MAR value for the given address
  BCACHE_setMar(base, length, 0/1)    sets the MARs for the stated address range

typedef struct BCACHE_Size {
  BCACHE_L1_Size l1psize;
  BCACHE_L1_Size l1dsize;
  BCACHE_L2_Size l2size;
} BCACHE_Size;

[Table: cache-size options (# / L1 in kB / L2 in kB); values lost in transcription]

C6000 Memory Considerations (agenda recap; next: 4. Use internal RAM and external memory)

Option 4: IRAM & Cached External Memory
- Let some IRAM be cache, to improve external memory performance
  - The first access to an external resource is "normal"
  - Subsequent accesses come from the on-chip caches: better speed, power, and EMIF loading
- Keep some IRAM as normal, addressed internal memory
  - Most critical data buffers (optimal performance in key code)
  - Target for DMA arrays routed to/from peripherals (2x EMIF savings)
  - Internal program RAM
    - Must be initialized via DMA or the CPU before it can be used
    - Provides optimal code performance
- Setting the internal memory properties:
  - Select the desired amounts of internal memory to be mapped as cache
  - Define the remainder as IRAM(s) in the memory map
  - Route code/data to the desired on- and off-chip memories
  - Map off-chip memory as cacheable
- To determine the optimal settings:
  - Profile and/or use STS on various settings to see which is best
  - A late-stage tuning process, done when almost all coding is completed

Select Desired IRAM Configuration
[Table: per-device CPU speed and L1P / L1D / L2 sizes, plus the L1/L2 cache-size options table; numeric values lost in transcription. Notes: memory sizes are in KB; prices are 100-piece quantity; the C6747 also has 128KB of L3 IRAM]
- Define the desired amount of IRAM to be cache (via GCONF or BCACHE)
- The balance of the available IRAM is 'normal' internal mapped-address RAM
- Any IRAM beyond the cache limits is always address-mapped RAM
- Single-cycle access to the L1 memories
- L2 access time can be as fast as a single cycle
- Regardless of size, the L2 cache is always 4-way set associative

Set Cache Size via GCONF or BCACHE

Cache Size Management
  BCACHE_getSize(*size)
  BCACHE_setSize(*size)

typedef struct BCACHE_Size {
  BCACHE_L1_Size l1psize;
  BCACHE_L1_Size l1dsize;
  BCACHE_L2_Size l2size;
} BCACHE_Size;

[Table: cache-size options (# / L1 in kB / L2 in kB); values lost in transcription]

C64x+ L1D Memory Banks
- Eight 512x32 banks; only one L1D access per bank per cycle
- Use the DATA_MEM_BANK pragma to begin paired arrays in different banks
- Note: sequential data does not run down a bank; it runs along a horizontal line across the banks, then on to the next horizontal line
- Only even banks (0, 2, 4, 6) can be specified

#pragma DATA_MEM_BANK(a, 4)
short a[256];
#pragma DATA_MEM_BANK(x, 0)
short x[256];

for (i = 0; i < count; i++) {
    sum += a[i] * x[i];
}

C6000 Memory Considerations (agenda recap; next: 5. Tune code for cache usage)

Option 5: Tune Code for Cache Optimization
- Align key code and data for maximal cache usage
  - Match code/data to fully fit cache lines: align to 128 bytes
- Clear the caches when the CPU and DMA are both active in a given memory
  - Keeps the cache from presenting out-of-date values to the CPU or DMA
- Size and align cache usage where the CPU and DMA are both active
  - Avoids the risk of having neighboring data affected by cache clearing operations
- Freeze the cache to maintain its contents
  - Lock in desired cache contents to maintain performance
  - Ignore new collecting until the cache is 'thawed' for reuse

There are many ways in which caching can lead to data errors, but a few simple techniques provide the cure for all of these problems.

Cache Coherency
[Diagram: an A/D converter feeds Buf A and Buf B in external RAM via DMA; the DSP reads the buffers through its cache]

Example of a read coherency problem:
1. DMA collects Buf A
2. CPU reads Buf A; the buffer is copied into the cache; DMA collects Buf B
3. CPU reads Buf B; the buffer is copied into the cache; DMA collects Buf C over "A"
4. CPU reads Buf C... but the cache sees the "A" addresses and provides the "A" data: error!
5. Solution: invalidate the cache range before reading a new buffer

Write coherency example:
1. CPU writes Buf A; the cache holds the written data
2. DMA reads the non-updated data from external memory: error!
3. Solution: write back the cache range after writing a new buffer

Program coherency:
1. A host processor puts new code into external RAM
2. Solution: invalidate the program cache before running the new code

Note: there are NO coherency issues between L1 and L2!

Managing Cache Coherency

blockPtr: start address of the range to be invalidated
byteCnt:  number of bytes to be invalidated
wait:     1 = wait until the operation is completed

Cache Invalidate        BCACHE_inv(blockPtr, byteCnt, wait)
                        BCACHE_invL1pAll()
Cache Writeback         BCACHE_wb(blockPtr, byteCnt, wait)
                        BCACHE_wbAll()
Writeback & Invalidate  BCACHE_wbInv(blockPtr, byteCnt, wait)
                        BCACHE_wbInvAll()
Sync to Cache           BCACHE_wait()

Coherency Side Effect: False Addresses
- False address: 'neighbor' data that is in the cache but outside the buffer range
- Reading data from the buffer re-reads the entire line
  - If the 'neighbor' data changed externally before the CPU was done using the prior state, the old data will be lost/corrupted as new data replaces it
- Writing data to the buffer causes the entire line to be written to external memory
  - External neighbor memory could be overwritten with old data
- False address problems can be avoided by aligning the start and end of buffers on cache line boundaries:
  - Align memory on 128-byte boundaries
  - Allocate memory in multiples of 128 bytes

#define BUF 128
#pragma DATA_ALIGN(in, BUF)
short in[2][20*BUF];

Cache Freeze (C64x+)
- Freezing the cache prevents data that is currently cached from being evicted
- Cache freeze:
  - Responds to read and write hits normally
  - No updating of the cache on a miss
- Freeze is supported on the C64x+ L2/L1P/L1D
- Commonly used with interrupt service routines, so that one-use code does not replace real-time algorithm code
- Other cache modes: Normal, Bypass

Cache Mode Management
  mode = BCACHE_getMode(level)           returns the state of the specified cache
  oldMode = BCACHE_setMode(level, mode)  sets the state of the specified cache

typedef enum {
  BCACHE_NORMAL,
  BCACHE_FREEZE,
  BCACHE_BYPASS
} BCACHE_Mode;

typedef enum {
  BCACHE_L1D,
  BCACHE_L1P,
  BCACHE_L2
} BCACHE_Level;

BCACHE-Based Cache Setup Example
This BCACHE example shows how to put the EVM6437 into its default power-up mode. (Note: code such as this will be required for stand-alone boot-up, where CCS GEL files are not present.)

#include "myWorkcfg.h"                  // most BIOS headers provided by the config tool
#include <bcache.h>                     // header for the DSP/BIOS cache functions

#define DDR2BASE  0x80000000            // base of the DDR2 area on the DM6437 EVM
#define DDR2SZ    0x07D00000            // size of the external memory

setCache()
{
    BCACHE_Size cacheSize;                         // L1 and L2 cache sizes

    cacheSize.l1dsize = BCACHE_L1_32K;             // L1D cache size: 32KB
    cacheSize.l1psize = BCACHE_L1_32K;             // L1P cache size: 32KB
    cacheSize.l2size  = BCACHE_L2_0K;              // L2 cache size: zero bytes
    BCACHE_setSize(&cacheSize);                    // set the cache sizes

    BCACHE_setMode(BCACHE_L1D, BCACHE_NORMAL);     // set L1D cache mode to normal
    BCACHE_setMode(BCACHE_L1P, BCACHE_NORMAL);     // set L1P cache mode to normal
    BCACHE_setMode(BCACHE_L2,  BCACHE_NORMAL);     // set L2 cache mode to normal

    BCACHE_inv(DDR2BASE, DDR2SZ, TRUE);            // invalidate the DDR2 cache region
    BCACHE_setMar(DDR2BASE, DDR2SZ, 1);            // set DDR2 to be cacheable
}

C64x+ Cache Controller Review
[Diagram: the full memory topology with bus widths: CPU to the L1 controllers (32B program, 8B data), L1P and L1D RAM/Cache, L2 controller with L2 ROM and L2 IRAM/Cache (32-64B paths), and the external memory controller (16B) to external memory (4-8B)]
- Select how much IRAM and cache is needed
- Enable caching via the MARs
- Align buffers to 128 bytes, and allocate them in multiples of 128 bytes
- Invalidate the cache before reads from memory under external control
- Write back the cache after writes to RAM under external control


C6000 Memory Considerations (closing review of all five options)