EDMA3 Keystone SoC Devices

Agenda
What is DMA?
EDMA Architecture
Definition of EDMA3 Terminology
Synchronization
Indexing
Example to Summarize
Trigger Mechanisms
Action Mechanisms
Linking
Chaining
QDMA
EDMA3 LLD
Review

What is DMA?

Why Use DMA?
The primary function of DMA is to move data without direct CPU involvement.
What information does a DMA controller need to perform a transfer?
- Source address
- Destination address
- Length (or size)
What options might be useful when performing the transfer?
- Do you want to interrupt the CPU when the transfer is complete?
- Is the transfer synchronized to an event (such as the McBSP receive buffer being full)?
- How do the source and destination addresses update (stay the same, +1, -1, +4)?

DMA in KeyStone Devices
There are MANY forms of DMA (Direct Memory Access) in the KeyStone architecture, so "DMA" can mean different things: EDMA3, IDMA, or a peripheral's own DMA.
- EDMA3 (Enhanced DMA) handles M DMA channels and X QDMA channels:
  - DMA: M channels that can be triggered manually, by events, or by chaining.
  - QDMA: X channels of Quick DMA, triggered by writing to a trigger word.
  Transfer requests are placed in queues (Q0 to Qn) that feed the transfer controllers (TC0 to TCn), which move data across the TeraNet to any resource connected to it.
- IDMA (Internal DMA): 2 channels. Channel 0 handles peripheral configuration; channel 1 transfers data between L1 and L2 memory.
- Peripheral DMAs: each master device attached to the TeraNet (e.g., SRIO, EMAC) has its own DMA (the Packet DMA, PKTDMA).

EDMA Architecture

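EDMA3 Architecture (block diagram)
The channel controller (CC) latches peripheral events E0 to En in the Event Register (ER), gated by the Event Enable Register (EER). Manual triggers arrive through the Event Set Register (ESR) and chained triggers through the Chain Event Register (CER). A triggered channel's parameter set (PSET 0 to PSET X) is submitted as a transfer request (TR) to one of the queues Q0 to Qm, which feed the transfer controllers TC0 to TCm; the TCs move the data across the TeraNet. Completion detection (early or normal TCC) sets the Interrupt Pending Register (IPR), which, together with the Interrupt Enable Register (IER), drives the global interrupt and the region interrupts (0 to n). Memory protection guards access to the CC.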

Shadow Regions and Memory Protection
EDMA3 provides multi-level protection:
- Shadow regions restrict which channels each peripheral master can access.
- Memory protection restricts access to the different memory spaces within the device.
Each shadow region has its own copy of the channel registers, and the channels allocated to a region are selected with its region access enable registers (DRAEn and DRAEHn for DMA channels, QRAEn for QDMA channels). In addition to the shadow regions, there is a global region that accesses the channel controller directly. Memory protection is configured by setting the privilege level, requestor, and types of access allowed for each region (MPPAn and MPPAG). Each shadow region also has its own completion interrupt, which can be tied to different interrupt events.
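To make the DRAEn/DRAEHn split concrete, here is a minimal sketch, assuming one enable bit per channel with DRAEn covering channels 0 to 31 and DRAEHn covering channels 32 to 63. The function name `region_owns_channel` is hypothetical; it models the bit test only and does not read real registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a shadow region own DMA channel ch?
 * DRAE holds one enable bit for channels 0-31, DRAEH for channels 32-63. */
static bool region_owns_channel(uint32_t drae, uint32_t draeh, unsigned ch)
{
    assert(ch < 64);  /* an EDMA3 CC supports at most 64 DMA channels */
    if (ch < 32)
        return (drae >> ch) & 1u;
    return (draeh >> (ch - 32)) & 1u;
}
```

For example, a region whose DRAE is 0x00000001 may use channel 0 only; channels 32 and up depend solely on DRAEH.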

Shadow Region

Definition of EDMA3 Terminology

Direct Memory Access (DMA)
Goal: copy from memory to memory, in effect a HARDWARE memcpy(dst, src, len). This is faster than CPU loads/stores, and it costs one interrupt per block instead of one interrupt per sample.
Examples: import raw data from off-chip to on-chip memory before processing; export results from on-chip to off-chip memory afterward.
Controlled by: a transfer configuration (Parameter Set, also called PaRAM or PSET) that describes the source, the destination, and the length (ACNT, BCNT) of the data block to copy. The transfer configuration consists primarily of 8 control registers.
On ARM+DSP devices, EDMA resources are shared: the ARM owns some channels and the DSP owns others. If you are using Codec Engine (CE), this is handled for you; if you are NOT using CE, it is up to you to manage the resources.

How Much to Move?
- Element (array): ACNT = A Count, the element size in contiguous bytes.
- Frame: BCNT = B Count, the number of elements (Elem 1 ... Elem N).
- Block: CCNT = C Count, the number of frames (Frame 1 ... Frame M).
In the transfer configuration, A Count occupies bits 15-0 and B Count bits 31-16 of the Transfer Count word, and C Count shares a word with a reserved field. The remaining words hold Options, Source, Destination, Index, and Link Address / Count Reload.
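One way to picture a PSET is as a C struct overlaying the 8 words. This is a sketch, assuming the word order given in the EDMA3 user guide and a little-endian core (so the low 16-bit field of each word comes first); the type and function names are hypothetical, and real code should use the LLD or CSL definitions instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one 32-byte PaRAM set (8 x 32-bit words). */
typedef struct {
    uint32_t opt;      /* word 0: Options */
    uint32_t src;      /* word 1: source address */
    uint16_t acnt;     /* word 2 [15:0]: bytes per array (element) */
    uint16_t bcnt;     /* word 2 [31:16]: arrays per frame */
    uint32_t dst;      /* word 3: destination address */
    int16_t  srcbidx;  /* word 4 [15:0]: source offset between arrays */
    int16_t  dstbidx;  /* word 4 [31:16]: destination offset between arrays */
    uint16_t link;     /* word 5 [15:0]: low 16 bits of linked PSET address (0xFFFF = null) */
    uint16_t bcntrld;  /* word 5 [31:16]: BCNT reload value */
    int16_t  srccidx;  /* word 6 [15:0]: source offset between frames */
    int16_t  dstcidx;  /* word 6 [31:16]: destination offset between frames */
    uint16_t ccnt;     /* word 7 [15:0]: frames per block */
    uint16_t rsvd;     /* word 7 [31:16]: reserved */
} param_set_t;

_Static_assert(sizeof(param_set_t) == 32, "a PaRAM set is 8 words");

/* Total bytes moved once the PSET has been fully triggered. */
static uint32_t param_total_bytes(const param_set_t *p)
{
    assert(p != NULL);
    return (uint32_t)p->acnt * p->bcnt * p->ccnt;
}
```

The index fields are declared signed because ‘BIDX and ‘CIDX are signed 16-bit values.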

Example: How to VIEW the Transfer
Let's start with a simple example: transfer 12 bytes of 8-bit data, stored in contiguous memory locations, from “here” to “there.” What are ACNT, BCNT, and CCNT? You can view the transfer several ways, as long as ACNT × BCNT × CCNT = 12:
- ACNT = 1, BCNT = 4, CCNT = 3
- ACNT = 2, BCNT = 2, CCNT = 3
- ACNT = 12, BCNT = 1, CCNT = 1
Which “view” is best? That depends on what your system needs and on the type of sync and indexing (covered later).
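The three views above are not the only ones. The sketch below (a hypothetical helper, illustration only) counts every ordered (ACNT, BCNT, CCNT) triple whose product equals the block size, optionally printing each one.

```c
#include <assert.h>
#include <stdio.h>

/* Count every (ACNT, BCNT, CCNT) with ACNT * BCNT * CCNT == total_bytes.
 * If print is nonzero, each view is printed as well. */
static unsigned count_views(unsigned total_bytes, int print)
{
    unsigned n = 0;
    assert(total_bytes > 0);
    for (unsigned a = 1; a <= total_bytes; ++a) {
        if (total_bytes % a != 0)
            continue;                      /* ACNT must divide the total */
        unsigned rest = total_bytes / a;
        for (unsigned b = 1; b <= rest; ++b) {
            if (rest % b != 0)
                continue;                  /* BCNT must divide what is left */
            if (print)
                printf("ACNT=%u BCNT=%u CCNT=%u\n", a, b, rest / b);
            ++n;
        }
    }
    return n;
}
```

For 12 bytes there are 18 such views; the slide's three are simply the ones that match common sync and indexing choices.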

Synchronization

A – Synchronization
Each event (e.g., McBSP receive register full) triggers the transfer of exactly one array of ACNT bytes (here, 2 bytes). A block of CCNT frames, each holding BCNT arrays, therefore needs BCNT × CCNT events (EVTx) to complete.
Example: a McBSP tied to a codec, where you want each transfer of a 16-bit word synchronized to the receive buffer being full or the transmit buffer being empty.

AB – Synchronization
Each event triggers a two-dimensional transfer: one frame of BCNT arrays of ACNT bytes (ACNT × BCNT bytes). A block of CCNT frames therefore needs CCNT events.
Example: a line of video pixels, where each line has BCNT pixels of 3 bytes each (Y, Cb, Cr), so ACNT = 3.
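The practical difference between the two modes is how many sync events a PSET consumes. A minimal sketch, with hypothetical names, following the rules above (A-sync: one array per event; AB-sync: one frame per event):

```c
#include <assert.h>

typedef enum { SYNC_A, SYNC_AB } sync_mode_t;

/* Events needed to finish one PSET: A-sync moves one ACNT array per
 * event, AB-sync moves one frame (BCNT arrays) per event. */
static unsigned events_needed(sync_mode_t mode, unsigned bcnt, unsigned ccnt)
{
    assert(bcnt > 0 && ccnt > 0);
    return mode == SYNC_A ? bcnt * ccnt : ccnt;
}
```

With the 12-byte view ACNT = 1, BCNT = 4, CCNT = 3, A-sync needs 12 events and AB-sync needs 3.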

Indexing

Indexing: ‘BIDX, ‘CIDX
EDMA3 has two types of indexing, ‘BIDX and ‘CIDX, and each can be set separately for the source and the destination (SRC/DST):
- ‘BIDX is the offset in bytes between consecutive ACNT arrays. It means the same thing in A-sync and AB-sync.
- ‘CIDX is the offset in bytes between consecutive frames, and its meaning differs between A-sync and AB-sync. It is measured from the starting address of the previously transferred block (the last array for A-sync, the frame for AB-sync) to the start of the next frame.
Both indexes are signed 16-bit values, from -32768 to +32767.
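Writing the rule above as address formulas shows how the two ‘CIDX values relate. This is a derivation from the definitions on this slide, for array b (0 ≤ b < BCNT) of frame c (0 ≤ c < CCNT), with ‘BIDX and ‘CIDX written without the SRC/DST prefix:

```latex
\begin{aligned}
\text{AB-sync:}\quad & \mathrm{addr}(b,c) = \mathrm{SRC} + c\cdot\mathrm{CIDX}_{AB} + b\cdot\mathrm{BIDX} \\
\text{A-sync:}\quad  & \mathrm{addr}(b,c) = \mathrm{SRC} + c\cdot\bigl((\mathrm{BCNT}-1)\cdot\mathrm{BIDX} + \mathrm{CIDX}_{A}\bigr) + b\cdot\mathrm{BIDX} \\
\text{same layout} \iff\quad & \mathrm{CIDX}_{AB} = \mathrm{CIDX}_{A} + (\mathrm{BCNT}-1)\cdot\mathrm{BIDX}
\end{aligned}
```

For instance, with ACNT = 1, BCNT = 4, ‘BIDX = 1 and ‘CIDXA = 1, the AB-sync index must be ‘CIDXAB = 1 + 3 × 1 = 4.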

Indexed Transfers
Because each index can be set separately for the source and the destination, EDMA3 has 4 indexes, giving more flexibility for complex transfers. In this example (8-bit data, contiguous source, CCNT = 4):
- SRCBIDX = bytes between source arrays (SRCBIDX = 2)
- SRCCIDX = bytes between source frames (SRCCIDXA = 2, SRCCIDXAB = 4)
- DSTBIDX = bytes between destination arrays (DSTBIDX = 3)
- DSTCIDX = bytes between destination frames (DSTCIDXA = 5, DSTCIDXAB = 8)
Note: ‘CIDX depends on the synchronization used, “A” or “AB.”
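The sketch below generates the byte offset of every array in transfer order, so the A-sync and AB-sync index values from this example can be checked against each other. It assumes ACNT = 1 and BCNT = 2, which the figure's numbering (source bytes 1, 3, 5, ..., 15) suggests but the slide does not state; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SYNC_A, SYNC_AB } sync_mode_t;

/* Write the byte offset of every array, frame by frame, into out[].
 * Returns the number of offsets written. */
static size_t array_offsets(sync_mode_t mode, unsigned bcnt, unsigned ccnt,
                            int bidx, int cidx, long out[], size_t cap)
{
    size_t n = 0;
    long frame_start = 0;  /* offset of the first array in the current frame */
    assert(bcnt > 0 && ccnt > 0 && (size_t)bcnt * ccnt <= cap);
    for (unsigned c = 0; c < ccnt; ++c) {
        long last = frame_start;
        for (unsigned b = 0; b < bcnt; ++b) {
            last = frame_start + (long)b * bidx;
            out[n++] = last;
        }
        /* A-sync: CIDX is added to the last array moved.
           AB-sync: CIDX is added to the start of the frame just moved. */
        frame_start = (mode == SYNC_A ? last : frame_start) + cidx;
    }
    return n;
}
```

With SRCBIDX = 2, both SRCCIDXA = 2 and SRCCIDXAB = 4 visit offsets 0, 2, 4, ..., 14; with DSTBIDX = 3, both DSTCIDXA = 5 and DSTCIDXAB = 8 visit 0, 3, 8, 11, 16, 19, 24, 27.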

Example: Using Indexing
Remember the 12-byte example (8-bit data in contiguous memory locations)? For each “view,” these are the proper SOURCE index values:
- ACNT = 1, BCNT = 4, CCNT = 3: ‘BIDX = 1, ‘CIDXA = 1, ‘CIDXAB = 4
- ACNT = 2, BCNT = 2, CCNT = 3: ‘BIDX = 2, ‘CIDXA = 2, ‘CIDXAB = 4
- ACNT = 12, BCNT = 1, CCNT = 1: ‘BIDX, ‘CIDXA, ‘CIDXAB = N/A (only one array is moved)
Which “view” is best? That depends on what you are transferring from/to and which sync mode is used.

Example to Summarize

Parameters for a Single Block Transfer
Source: 8-bit pixels in contiguous memory, laid out as an image 6 pixels wide (pixel_0 to pixel_29, five rows of six).
Goals:
- Transfer a block of 8-bit pixels starting at &pixel_7 to &myDest: four pixels from each of three rows (pixels 7-10, 13-16, and 19-22), packed contiguously at the destination.
- Transfer all pixels as quickly as possible: a single event (EVTx) transfers all the data, using AB-sync.
Questions: Why can't we use ACNT = 1? How does this transfer work inside the EDMA? What happens when the transfer completes? How do you program this transfer?
Solution (active Param Set):
- Options: AB-sync
- Source: &pixel_7
- Destination: &myDest
- ACNT = 4, BCNT = 3
- SRCBIDX = 6, DSTBIDX = 4
- LINK = 0xFFFF (null link), BCNTRLD = 3 (BCNT, or any value)
- SRCCIDX, DSTCIDX: not used, because CCNT = 1
- CCNT = 1
Why not ACNT = 1? The goals say that a single event transfers ALL the data. With ACNT = 1, BCNT would have to be 4 and CCNT would have to be 3, so an AB-sync transfer would need 3 events, each waiting for a new EVT that never occurs. So ACNT has to be 4 and BCNT has to be 3: one event moves A × B bytes, which is the whole transfer.
Why not A-sync? An A-sync event moves only one array, so a single event would require ACNT = 12. The source rows are not contiguous, however: 12 contiguous bytes starting at &pixel_7 would copy pixels 7 to 18 instead of the intended block.
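A small software reference model can confirm the solution. This sketch assumes, as in the slide figure, a row-major source image 6 pixels wide where pixel_N holds the value N; `ab_sync_frame` is a hypothetical name, and it models what one AB-sync event does rather than how the hardware is programmed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference model of one AB-sync event (one frame): copy BCNT arrays of
 * ACNT bytes, stepping SRCBIDX/DSTBIDX bytes between arrays. */
static void ab_sync_frame(uint8_t *dst, const uint8_t *src,
                          unsigned acnt, unsigned bcnt,
                          int srcbidx, int dstbidx)
{
    assert(dst != NULL && src != NULL);
    for (unsigned b = 0; b < bcnt; ++b)
        memcpy(dst + (long)b * dstbidx, src + (long)b * srcbidx, acnt);
}
```

Calling it with ACNT = 4, BCNT = 3, SRCBIDX = 6, DSTBIDX = 4 from &pixel_7 yields 7-10, 13-16, 19-22 at the destination, while a single contiguous 12-byte array (the A-sync idea) yields pixels 7 to 18.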

Channel OPTions Register
The Options (OPT) register contains bit fields that configure how the channel operates. Each field has a corresponding description in the Param Setup code comments.
- PRIV = privilege level of the host that programmed the PSET (set by hardware)
- PRIVID = privilege ID of the host that programmed the PSET (set by hardware)
- ITCCHEN = Intermediate Transfer Completion Chaining Enable
- TCCHEN = Transfer Completion Chaining Enable
- ITCINTEN = Intermediate Transfer Completion Interrupt Enable
- TCINTEN = Transfer Completion Interrupt Enable
- TCC = Transfer Completion Code, used to signal completion
- TCCMODE = point at which the transfer is considered complete (normal or early)
- FWID = FIFO Width
- STATIC = static PSET: when set, the PSET is not updated or linked after the transfer request is submitted
- SYNCDIM = A-sync or AB-sync
- SAM = Source Address Mode
- DAM = Destination Address Mode
SAM/DAM are typically INCR (0) for normal EDMA transfers. They are set to 1 (constant addressing) only for a peripheral that supports FIFO mode; this setting is NOT for internal FIFOs.
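As a sketch of how these fields combine into one OPT word, the macros below use the bit positions documented in the EDMA3 user guide (SAM bit 0, DAM bit 1, SYNCDIM bit 2, STATIC bit 3, TCCMODE bit 11, TCC bits 17-12, TCINTEN bit 20, and so on; FWID in bits 10-8 is omitted). The macro and function names are hypothetical; verify the positions against your device's documentation or use the LLD's own definitions.

```c
#include <assert.h>
#include <stdint.h>

#define OPT_SAM       (1u << 0)    /* source address mode: 0 = INCR, 1 = CONST */
#define OPT_DAM       (1u << 1)    /* destination address mode */
#define OPT_SYNCDIM   (1u << 2)    /* 0 = A-sync, 1 = AB-sync */
#define OPT_STATIC    (1u << 3)    /* PSET not updated/linked after TR submission */
#define OPT_TCCMODE   (1u << 11)   /* 0 = normal, 1 = early completion */
#define OPT_TCINTEN   (1u << 20)   /* interrupt on final completion */
#define OPT_ITCINTEN  (1u << 21)   /* interrupt on intermediate completion */
#define OPT_TCCHEN    (1u << 22)   /* chain on final completion */
#define OPT_ITCCHEN   (1u << 23)   /* chain on intermediate completion */

/* Place a 6-bit transfer completion code in OPT[17:12]. */
static uint32_t opt_tcc(unsigned tcc)
{
    assert(tcc < 64);
    return (uint32_t)tcc << 12;
}
```

For example, `OPT_SYNCDIM | OPT_TCINTEN | opt_tcc(14)` describes an AB-sync transfer that sets IPR bit 14 when it completes, which evaluates to 0x0010E004.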

Trigger Mechanisms

EDMA3 Basics Revisited
An EDMA3 transfer has three parts: T (transfer), E (event), and A (action).
- T (xfer config): describes the transfers to be executed when triggered.
  - Count: how many items to move (A, B, and C counts)
  - Addresses: the source and destination addresses
  - Index: how far to increment the source/destination after each transfer
  These are held in the PSET fields Options, Source, Destination, Index, Link Address / Count Reload, Transfer Count, and C Count.
- E (event): triggers the transfer to begin.
- A (action): what you want to happen after the transfer is complete (“Done”).

How to TRIGGER a Transfer
There are 3 ways to trigger an EDMA transfer:
1. Event sync from a peripheral: for example, the SPI receive and transmit events (SPIREVT, SPIXEVT) set a flag in the Event Register (ER). If the channel is enabled in the Event Enable Register (EER, set by the user), the channel transfer starts.
2. Manual trigger: the application sets the bit for channel y in the Event Set Register (ESR), which starts the channel transfer.
3. Chain event from another channel (more details later): when channel x completes with TCCHEN (TC Chain Enable, in OPT) enabled and TCC = channel y, the bit for channel y is set in the Chain Event Register (CER), which starts channel y's transfer.
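The three trigger paths can be summarized as one bit test. This is a simplified model, not driver code: each register is shown as a 64-bit mask (on hardware each is split into a low/high 32-bit pair such as ER/ERH), and it relies on the EDMA3 behavior that EER gates only peripheral events, while manual (ESR) and chained (CER) triggers do not depend on EER. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: would DMA channel ch start a transfer?
 * Peripheral events (ER) count only if enabled in EER;
 * manual (ESR) and chained (CER) triggers are not gated by EER. */
static bool channel_triggered(uint64_t er, uint64_t eer, uint64_t esr,
                              uint64_t cer, unsigned ch)
{
    assert(ch < 64);
    uint64_t bit = (uint64_t)1 << ch;
    return ((er & eer) | esr | cer) & bit;
}
```

A peripheral event on channel 5 is ignored until EER bit 5 is set, whereas a write to ESR starts the channel regardless of EER.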

Action Mechanisms

Generate EDMA Interrupt (Setting IER Bits)
Each channel's Options contain TCINTEN (OPT bit 20) and TCC (OPT bits 17:12). When a channel with TCINTEN = 1 completes, the EDMA sets the IPR bit selected by its TCC; if the matching IER bit is also set, EDMA3CC_INT is asserted.
[Figure: channels 0, 1, ..., N with TCINTEN = 0 and IER bits = 0 generate no interrupt; the channel with TCINTEN = 1 and TCC = 14 sets IPR14, and because IER14 = 1, it drives EDMA3CC_INT.]
IER = EDMA Interrupt Enable Register (NOT the CPU IER)
IPR = EDMA Interrupt Pending Register (set according to TCC)
Use the EDMA3 Low-Level Driver (LLD) to program the EDMA IER bits.
N channels and ONE interrupt? How do you determine WHICH channel completed?

EDMA Interrupt Dispatcher
Here is the interrupt chain from beginning to end:
1. An interrupt occurs.
2. The Interrupt Selector routes EDMA3CC_GINT to a CPU interrupt.
3. HWI_INT5 properties: HWI_INT5 is configured for EDMA3CC_GINT.
4. EDMA dispatcher function: reads the IPR bits, determines which one is set, and calls the corresponding handler (ISR) in the function table.
5. ISR (interrupt handler), for example:
void edma_rcv_isr (void) { SEM_post (&semaphore); }
How does the ISR function table (in step 4 above) get loaded with the proper handler function names? Use the EDMA3 LLD to program the proper callback function for this HWI.

Linking

Linking – “Action” – Overview
Alias: “Re-load” or “Auto-init”
Need: auto-reload a channel with a new configuration.
Ex1: do the same transfer again.
Ex2: ping/pong system.
Solution: use linking to reload the channel configuration.
Concept: Linking two or more PSETs together allows the EDMA to auto-reload a new configuration when the current transfer is complete. Linking still requires a “trigger” to start the transfer (manual, chain, or event). You can link as many PSETs as you like; the only limit is the number of PSETs on the device.
How does linking work? The user must specify the LINK field in the configuration to point to another PSET. When the current transfer (Config 0) is complete, the EDMA auto-reloads the new configuration (Config 1) from the linked PSET.
[Figure: Config 0 (LINK = 1) is reloaded from Config 1 (LINK = NULL).]
NOTE: The reload does NOT start the transfer!

Chaining

Triggering Transfers Revisited
There are three ways to trigger an EDMA transfer:
1. Event sync from a peripheral: a peripheral event (e.g., McASP0 RRDY or XRDY) sets its bit in the Event Register (ER, flag); if the matching bit in the Event Enable Register (EER, user) is set, the channel transfer starts.
2. Manual trigger: the application sets bit y in the Event Set Register (ESR, user) to start channel y.
3. Chain event from another channel: channel x, with TCCHEN (TC Chain Enable, in OPT) enabled and TCC = y, sets bit y in CER on completion, which starts channel y.

Chaining – “Action” & “Event” – Overview
Need: when one transfer completes, trigger another transfer to run.
Ex: ChX completes and kicks off ChY.
Solution: use chaining to kick off the next transfer.
Concept: Chaining refers to both an action and an event: the completion ‘action’ of the first channel is the ‘event’ for the next channel. You can chain as many channels as you like; the only limit is the number of channels on the device. Chaining does NOT reload the current channel configuration; only linking can do that. It simply triggers another channel to run.
How does chaining work? Set the TCC field to the number of the next (i.e., chained) channel and turn ON chaining (TCCHEN). When the current transfer (Ch X) is complete, it triggers the next channel (Ch Y) to run.
[Figure: Ch X has TCC = Y and chaining enabled; when Ch X is done, Ch Y runs. Ch Y's own TCC and chain enable are set independently.]

QDMA

Quick DMA (QDMA)
QDMA is used for simple transfers where syncing to an event is not required. A transfer can be triggered in two ways: (1) writing to a trigger word, or (2) using the EDMA3 LLD. It is “quick” because the CPU can initiate a transfer with as few as ONE write to a channel register.
How does it work?
The QDMA channel is “auto-triggered” when the CPU writes to the “trigger” word. This eliminates the need to write the PSET and then kick off the transfer with a separate write to ESR.
Selecting the trigger word allows the CPU to modify only the words of interest in a PSET.
Assumes OPT.STATIC = 1: count and address updates and linking are NOT performed. CCNT = 1 (single-event transfer).
Only ONE QDMA transfer is allowed in a queue at a time.
Example: if ACNT/BCNT/CCNT are static for a given algorithm but SRC is different for each transfer, SRC can be defined as the trigger word. The CPU can then initiate each transfer with a single write to the SRC field of the specified PSET.

QDMA Mapping

EDMA3 LLD Review

Programming EDMA3
The EDMA3 Low Level Driver (LLD) is the recommended way to program EDMA3. It implements synchronized DMA transfers and consists of libraries that manage the EDMA3 peripheral:
Resource Manager (EDMA3 RM): manages all EDMA3 hardware resources and interrupts.
Driver (EDMA3 DRV): handles all EDMA3 configuration and allocates resources (via the RM).
[Figure: software stack, top to bottom: Application Code (Drivers), LLD (DRV), Resource Mgr (RM), EDMA3 Hardware]

Programming EDMA3
EDMA3_DRV_create (edma3InstanceId, &globalConfig, &miscParam);
hEdma = EDMA3_DRV_open (edma3InstanceId, (void *) &initCfg, &edma3Result);
EDMA3_DRV_requestChannel (hEdma, nChannel, nTransferControl, ..);
EDMA3_DRV_setSrcParams (hEdma, nChannel, srcAddr, addrMode, width);
EDMA3_DRV_setDestParams (hEdma, nChannel, dstAddr, addrMode, width);
EDMA3_DRV_setTransferParams (hEdma, nChannel, acnt, bcnt, ccnt, bcntReload, syncType);
EDMA3_DRV_enableTransfer (hEdma, nChannel, trigMode);

Program Flow
1. Identify all the channels that the application will use.
2. Develop the corresponding service routines for these events.
3. Register these ISRs with the underlying OS.
4. Initialize the Resource Manager to obtain all the available resources.
5. Create and open the EDMA3 instance.
6. Set the parameters for the transfers.
7. Enable the transfer.

For More Information
Refer to the Enhanced Direct Memory Access 3 (EDMA3) for KeyStone Devices User's Guide. Device-specific Data Manuals for the KeyStone SoCs can be found at TI.com/multicore. Multicore articles, tools, and software are available at the Embedded Processors Wiki for the KeyStone Device Architecture. View the complete C66x Multicore SOC Online Training for KeyStone Devices, including details on the individual modules. For questions regarding topics covered in this training, visit the support forums at the TI E2E Community website.