Memory
10/ INF5060: Multimedia data communication using network processors
2004 Carsten Griwodz & Pål Halvorsen, INF5060 – multimedia communication using network processors

Overview
- Memory on the IXP cards
  - Kinds of memory
  - Its features
  - Its accessibility
- Microengine assembler
- Memory management
Kinds of Memory

Memory | Size | Location
Microengine general purpose registers | 128 registers | on chip
StrongARM instruction cache | 16 Kbytes | on chip
StrongARM data cache | 8 Kbytes | on chip
StrongARM mini cache | 512 bytes | on chip
Scratch(pad) | 4 Kbytes | on chip
Instruction store | 64 Kbytes | on chip
FlashROM | 8 Mbytes | off chip
SRAM | 8 Mbytes | off chip
SDRAM | 256 Mbytes | off chip
IXP Functional Units
[Figure: Inside the IXP network processor, the StrongARM core, the microengines, the SRAM unit, the SDRAM unit, the PCI bus unit and the IX bus unit are connected by various busses. Outside the chip: SDRAM (up to 256 MB), SRAM (up to 8 MB), Flash ROM (up to 8 MB) and memory-mapped I/O devices; an Ethernet MAC (and other IX devices) on the IX bus; the host machine, reached over the PCI bus through a PCI-to-PCI bridge. Bus widths and clocks shown: 64 bit/33 MHz, 64 bit/116 MHz, 32 bit/116 MHz, 64 bit/104 MHz.]
Kinds of Memory
- Physical memory on the IXP1200 is contiguous
- Memory in general is not byte-addressable
  - The memory units emulate byte addressing for the StrongARM
- Big-endian architecture
  - StrongARM: runs in big-endian mode
  - Microengines are big-endian

Memory type | Addressable data unit (bytes) | Relative access time (cycles)
Scratch(pad) | 4 | …
SRAM | 4 | …
SDRAM | 8 | 32-40
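Because both the StrongARM (in big-endian mode) and the microengines are big-endian, the most significant byte of a longword sits at the lowest byte address. A minimal C sketch of that byte order, written so it behaves the same on any host; the helper names load_be32 and store_be32 are hypothetical:

```c
#include <stdint.h>

/* Assemble a 32-bit longword from four bytes stored big-endian:
   byte 0 (lowest address) is the most significant byte. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store a 32-bit longword big-endian, most significant byte first. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Shifting explicitly, instead of casting a byte pointer to a uint32_t pointer, avoids depending on the host's own byte order.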
Terms
Careful! The terminology is inconsistent.
- Wording in Intel IXP manuals
  - Word: 16 bit
  - Longword: 32 bit
  - Quadword: 64 bit
- Wording in StrongARM and other ARM manuals
  - Halfword: 16 bit
  - Word: 32 bit
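One way to keep the two vocabularies apart in C code is to name fixed-width types after each manual's terms. The type names below are hypothetical and do not come from any Intel or ARM header:

```c
#include <stdint.h>

/* Intel IXP manual terminology */
typedef uint16_t ixp_word_t;      /* "word"     = 16 bit */
typedef uint32_t ixp_longword_t;  /* "longword" = 32 bit */
typedef uint64_t ixp_quadword_t;  /* "quadword" = 64 bit */

/* ARM (StrongARM) manual terminology */
typedef uint16_t arm_halfword_t;  /* "halfword" = 16 bit */
typedef uint32_t arm_word_t;      /* "word"     = 32 bit */

/* The same name "word" means different sizes in the two manuals. */
_Static_assert(sizeof(ixp_word_t) * 2 == sizeof(arm_word_t),
               "an ARM word is twice the size of an IXP word");
_Static_assert(sizeof(ixp_longword_t) == sizeof(arm_word_t),
               "an IXP longword is an ARM word");
```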
Kinds of Memory
- Memory accessible to the StrongARM
  - Mapped into a single address space
- Memory accessible to the microengines
  - Each kind is mapped individually
  - Separate assembler instructions for each kind

[Figure: StrongARM address space, divided into Device 0 SRAM Unit, Device 1 PCI Unit, Device 2 Reserved, Device 3 StrongARM Core System, Device 4 Reserved, Device 5 AMBA Translation Unit and Device 6 SDRAM Unit; beside it, the separate microengine address spaces for SDRAM, Scratchpad, microengine registers and SRAM.]
Memory
- Memory, cache memory, registers
  - StrongARM core caches
  - Microengine registers
  - SDRAM
  - SRAM
  - IX Bus Unit: Scratch(pad) memory
StrongARM
StrongARM Core Features
- A general purpose processor with an MMU
- 16 Kbytes instruction cache
  - Round-robin replacement
- 8 Kbytes data cache
  - Round-robin replacement
  - Write-back cache; cache lines are allocated on a read miss, not on a write miss
- 512 byte mini-cache for data that is used once and then discarded
  - Reduces flushing of the main data cache
- Instruction code is stored in SDRAM
StrongARM Core Access
- Full access to
  - SDRAM Unit
  - SRAM Unit, incl. FlashROM
  - PCI Bus Unit
- Access to the microengines'
  - Program code
  - Status registers
  - Program counters
- Access to the IX Bus Unit's
  - Status registers
  - Scratch memory

[Figure: the StrongARM core connected to the SRAM unit, SDRAM unit, PCI bus unit, IX bus unit and the microengines.]
Microengines
Microengine Features
- 4 hardware contexts
- 2K x 32 bit instruction control store
  - Every instruction is 32 bits long
  - No instruction cache
  - Instructions are downloaded onto the microengine by the StrongARM, not loaded from RAM on demand
- 5-stage instruction pipeline
  - Blocks for reference operations
  - Deferred execution to reduce the context switch penalty
- 256 registers
  - 32 bit registers
- Load and store architecture
  - Data must be brought into registers, processed there, and written to the destination
  - Single cycle access to registers
  - A "reference command" fetches data into registers
  - The thread yields/sleeps while the fetch executes
Microengine Access
- Full access to
  - SDRAM Unit
  - SRAM Unit
  - IX Bus Unit
- Access to the StrongARM
  - Interrupts
  - Trigger status register reads
- Access to the PCI Bus Unit
  - Initiate DMA with SDRAM
- Access to other microengines
  - None
- Access to itself
  - Inter-thread signaling
  - No access to its own instruction code

[Figure: a microengine connected to the SRAM unit, SDRAM unit, PCI bus unit, IX bus unit and the StrongARM core.]
Microengine Registers
[Figure: Microengine registers. From: IXP1200 Family Hardware Reference Manual]
Microengine Registers
- 256 registers
- 128 general purpose registers
  - Arranged in two banks, A and B
  - Instructions with 2 input registers must take them from different banks; otherwise the assembler issues a warning
- 128 transfer registers
  - Transfer registers are not general purpose registers
  - They are ports to the neighboring functional units
  - 64 SDRAM transfer registers
    - Transfer to and from SDRAM
    - 32 read / 32 write
  - 64 SRAM transfer registers
    - Transfer to and from everything but SDRAM
    - 32 read / 32 write
- 4 busses can be used in parallel
  - By different threads
- Loading transfer registers
  - 64 bytes at once from one functional unit to another
  - 128 bytes at once from the IX bus
SDRAM
General Features
- Recommended use
  - StrongARM instruction code
  - Large data structures
  - Packets during processing
- 64-bit addressed (8 byte aligned, quadword aligned)
- 256 Mbytes
- 928 Mbytes/s peak bandwidth
  - Higher bandwidth than SRAM
  - Higher latency than SRAM
- Access
  - StrongARM
  - Microengines
  - StrongARM takes precedence
  - PCI DMA on behalf of the microengines
  - Direct access to the IX Bus Unit's Transmit and Receive FIFOs
Special Features
- Byte, word and longword access is supported through a read-modify-write access to quadwords
  - Speed penalty
- Direct path from SDRAM to the IX Bus Transmit and Receive FIFOs
  - Controlled by microengines
  - Up to 64 bytes transferable without microengine involvement
- Byte aligner between SDRAM and the IX Bus
  - For sending to the Transmit FIFO
  - Shifts bytewise, e.g. when the header length has changed
  - Can only be used by microengines, in the t_fifo_wr command
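The read-modify-write emulation of byte access can be modelled in C. This is a sketch of the idea, not of the SDRAM unit's actual logic: the uint64_t array standing in for SDRAM and the function names are hypothetical, and big-endian byte lanes are assumed to match the architecture described earlier:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: memory that can only be accessed in whole 64-bit
   quadwords. Byte lanes are numbered big-endian, so byte 0 of a quadword
   is its most significant byte. */
static unsigned shift_for(size_t byte_addr)
{
    return (unsigned)(7 - (byte_addr % 8)) * 8;
}

static uint8_t qw_read_byte(const uint64_t *mem, size_t byte_addr)
{
    uint64_t q = mem[byte_addr / 8];           /* read the whole quadword */
    return (uint8_t)(q >> shift_for(byte_addr));
}

/* A byte write costs a full quadword read plus a full quadword write,
   which is where the speed penalty comes from. */
static void qw_write_byte(uint64_t *mem, size_t byte_addr, uint8_t value)
{
    unsigned sh = shift_for(byte_addr);
    uint64_t q = mem[byte_addr / 8];           /* read                 */
    q &= ~((uint64_t)0xFF << sh);              /* modify: clear lane   */
    q |= (uint64_t)value << sh;                /*         insert byte  */
    mem[byte_addr / 8] = q;                    /* write back           */
}
```

Writing whole, 8-byte-aligned quadwords avoids the extra read entirely, which is why packet data is best kept quadword aligned in SDRAM.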
SRAM
General Features
- Recommended use
  - Lookup tables
  - Free buffer lists
  - Data buffer queue lists
- 32-bit addressed (4 byte aligned, longword aligned)
- 8 Mbytes
- 464 Mbytes/s peak bandwidth
  - Lower bandwidth than SDRAM
  - Lower latency than SDRAM
- Access
  - StrongARM
  - Microengines
  - StrongARM takes precedence
Accessing SRAM
- StrongARM access
  - Byte, word and longword access
  - Bit operations through the SRAM Alias Address Space
  - Bit, byte and word writes supported through read-modify-write
- Microengine access
  - Bit and longword access only
  - Up to 8 longwords with one command
  - Bit writes supported through read-modify-write
  - Bit operations within instructions
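The four bit operations behave like atomic read-modify-write updates of one longword. A hypothetical C11 model using stdatomic.h; the function names mirror the bit_op names but are illustrative only, and the "previous value" of the test_ variants is returned here, whereas on the microengine it comes back through the transfer register that held the mask:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of bit_wr on one longword. atomic_fetch_or and
   atomic_fetch_and provide the indivisibility the SRAM unit guarantees. */
static void set_bits(_Atomic uint32_t *lw, uint32_t mask)
{
    atomic_fetch_or(lw, mask);
}

static void clear_bits(_Atomic uint32_t *lw, uint32_t mask)
{
    atomic_fetch_and(lw, ~mask);
}

/* The test_ variants also report the value seen before the update. */
static uint32_t test_and_set_bits(_Atomic uint32_t *lw, uint32_t mask)
{
    return atomic_fetch_or(lw, mask);
}

static uint32_t test_and_clear_bits(_Atomic uint32_t *lw, uint32_t mask)
{
    return atomic_fetch_and(lw, ~mask);
}
```

A simple spin lock follows directly: a thread owns the lock when `(test_and_set_bits(&w, 1u) & 1u) == 0`, and releases it with `clear_bits(&w, 1u)`.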
Special Features
- Atomic push/pop operations
  - For maintaining lists
  - 8-entry push/pop register list
  - Microengines: named commands
  - StrongARM: dedicated memory addresses; don't cache these memory areas
- Atomic bit test, set and clear
  - For synchronized access
  - Microengine
    - Uses a write transfer register to specify the bits to test, read or write
    - A test operation overwrites the write transfer register with the result
  - StrongARM
    - Special macros for read-modify-write operations
    - Blocks until the operation is completed
    - Don't cache this memory
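Push/pop lists are typically used for free buffer lists: a thread pops a buffer address, fills the buffer, and pushes the address back when done. The C model below is a single-threaded sketch under assumptions: each list is a LIFO threaded through memory (the first longword of a pushed buffer stores the previous head), address 0 marks an empty list, and the memory is a tiny array. All names are hypothetical; the real unit performs each push or pop atomically:

```c
#include <stdint.h>

#define NUM_LISTS      8      /* lists 0-7, as in the memory map */
#define SRAM_LONGWORDS 1024   /* hypothetical, tiny SRAM for the model */
#define LIST_EMPTY     0u     /* assumption: address 0 marks an empty list */

/* Each list is a LIFO; the first longword of every pushed buffer
   stores the previous list head. */
struct sram_model {
    uint32_t mem[SRAM_LONGWORDS];
    uint32_t head[NUM_LISTS];
};

static void sram_push(struct sram_model *s, unsigned list, uint32_t addr)
{
    s->mem[addr] = s->head[list];     /* link new entry to old head */
    s->head[list] = addr;             /* new entry becomes the head */
}

static uint32_t sram_pop(struct sram_model *s, unsigned list)
{
    uint32_t addr = s->head[list];
    if (addr != LIST_EMPTY)
        s->head[list] = s->mem[addr]; /* unlink the head */
    return addr;                      /* LIST_EMPTY if nothing was queued */
}
```

Storing the link inside the free buffer itself means a free list needs no separate bookkeeping memory.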
Special Features
- 8-entry CAM (content addressable memory) for read locks
  - For synchronized access
  - 8 concurrent locks on memory
  - Protects memory against concurrent access by the StrongARM and the microengines
  - Operations: read_lock, unlock and write_unlock
- Microengines
  - sram assembler command
  - Waits until the lock is released
- StrongARM
  - 3 separate 8 MByte mapped memory regions
  - A failed lock is indicated by flags; the read itself always succeeds
  - Don't cache these memory areas
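The lock CAM can be pictured as a table of at most 8 locked addresses. This single-threaded C sketch models only the locking, not the data read that accompanies read_lock; the struct and function names are hypothetical, and the behaviour when all 8 entries are in use (the lock simply fails) is an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

#define CAM_ENTRIES 8   /* 8 concurrent locks */

/* Hypothetical single-threaded model of the SRAM lock CAM. */
struct lock_cam {
    bool     valid[CAM_ENTRIES];
    uint32_t addr[CAM_ENTRIES];
};

static int cam_find(const struct lock_cam *c, uint32_t a)
{
    for (int i = 0; i < CAM_ENTRIES; i++)
        if (c->valid[i] && c->addr[i] == a)
            return i;
    return -1;
}

/* Returns true if the lock was taken. A false result corresponds to the
   StrongARM's failure flag; a microengine would wait and retry instead. */
static bool cam_read_lock(struct lock_cam *c, uint32_t a)
{
    if (cam_find(c, a) >= 0)
        return false;                 /* address is already locked */
    for (int i = 0; i < CAM_ENTRIES; i++) {
        if (!c->valid[i]) {
            c->valid[i] = true;
            c->addr[i] = a;
            return true;
        }
    }
    return false;                     /* assumption: all 8 entries in use */
}

static void cam_unlock(struct lock_cam *c, uint32_t a)
{
    int i = cam_find(c, a);
    if (i >= 0)
        c->valid[i] = false;
}

/* write_unlock: store the value, then release the lock. */
static void cam_write_unlock(struct lock_cam *c, uint32_t *mem, uint32_t a,
                             uint32_t value)
{
    mem[a] = value;
    cam_unlock(c, a);
}
```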
StrongARM Core Memory Map
[Figure: StrongARM address space divided into Device 0 SRAM Unit, Device 1 PCI Unit, Device 2 Reserved, Device 3 StrongARM Core System, Device 4 Reserved, Device 5 AMBA Translation Unit and Device 6 SDRAM Unit, with Device 0 expanded as follows.]

Device 0 (SRAM Unit) region | Upper address
Slow Port | 385F FFF
Command FIFO Test | … FF
SRAM CSRs | …
List 7 Pop operations | 27FF FFFF
List 6 Pop operations | 277F FFFF
List 5 Pop operations | 26FF FFFF
List 4 Pop operations | 267F FFFF
List 3 Pop operations | 25FF FFFF
List 2 Pop operations | 257F FFFF
List 1 Pop operations | 24FF FFFF
List 0 Pop operations | 247F FFFF
List 7 Push operations | 23FF FFFF
List 6 Push operations | 237F FFFF
List 5 Push operations | 22FF FFFF
List 4 Push operations | 227F FFFF
List 3 Push operations | 21FF FFFF
List 2 Push operations | 217F FFFF
List 1 Push operations | 20FF FFFF
List 0 Push operations | 207F FFFF
Bit Test & Set | 19FF FFFF
Bit Test & Clear | 197F FFFF
Bit Write Set | 18FF FFFF
Bit Write Clear | 187F FFFF
CAM Unlock | 167F FFFF
Write Unlock | 147F FFFF
Read Lock | 127F FFFF
Read/Write | 107F FFFF
BootROM | 007F FFFF
Memory Map for SRAM Addresses

Physical device | Function | StrongARM address space (byte addressing), upper address | Microengine SRAM instruction command | Microengine address space (longword addressing), upper address
SlowPort | … | 385F FFF | read/write | 7F FFFF
SRAM | CSRs | … | read/write | …
SRAM | Pop operations | 27FF FFFF | pop | 1F FFFF
SRAM | Push operations | 23FF FFFF | push | 1F FFFF
SRAM | Bit Test & Set | 19FF FFFF | bit_wr (test_and_set_bits) | 1F FFFF
SRAM | Bit Test & Clear | 197F FFFF | bit_wr (test_and_clear_bits) | 1F FFFF
SRAM | Bit Write Set | 18FF FFFF | bit_wr (set_bits) | 1F FFFF
SRAM | Bit Write Clear | 187F FFFF | bit_wr (clear_bits) | 1F FFFF
SRAM | Unlock | 167F FFFF | unlock | 1F FFFF
SRAM | Write Unlock | 147F FFFF | write_unlock | 1F FFFF
SRAM | Read Lock | 127F FFFF | read_lock | 1F FFFF
SRAM | Read/Write | 107F FFFF | read/write | 1F FFFF
BootROM | … | 007F FFFF | read/write | 3F FFFF
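Every SRAM command region in this map spans 8 MB (0x80 0000 bytes) on the StrongARM side and 0x20 0000 longwords on the microengine side, so translating between the two views is a subtraction and a shift. A sketch, assuming each region starts 8 MB below the upper address listed (e.g. Read/Write from 0x1000 0000 to 0x107F FFFF); the helper name is hypothetical and it applies only to the SRAM rows, not to SlowPort or BootROM:

```c
#include <stdbool.h>
#include <stdint.h>

#define REGION_BYTES 0x00800000u   /* each SRAM command region spans 8 MB */

/* Hypothetical helper: translate a StrongARM byte address inside one SRAM
   command region (identified by its upper address from the table) into
   the microengine's longword address for the same SRAM location.
   Returns false if the address is outside the region or not aligned. */
static bool sa_to_me_sram(uint32_t sa_addr, uint32_t region_end,
                          uint32_t *me_addr)
{
    uint32_t region_start = region_end - (REGION_BYTES - 1u);
    if (sa_addr < region_start || sa_addr > region_end)
        return false;
    if (sa_addr & 3u)
        return false;              /* microengines address whole longwords */
    *me_addr = (sa_addr - region_start) >> 2;
    return true;
}
```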
IX Bus Unit
"FBI" Engine Interface
[Figure: The IX Bus Unit inside the IXP network processor, containing the Transmit FIFO, Receive FIFO, hash units, status registers and the scratchpad, and connecting the microengines, the SDRAM unit and the StrongARM to the IX bus, to which the Ethernet MAC (and other IX devices) attach.]
Scratch Memory: General Features
- Recommended use
  - Passing messages between processors and between threads
  - Semaphores, mailboxes, other IPC
- 32-bit addressed (4 byte aligned, longword aligned)
- 4 Kbytes
- Has an atomic autoincrement instruction
  - Only usable by microengines
StrongARM Core Memory Map
[Figure: StrongARM address space (Device 0 SRAM Unit, Device 1 PCI Unit, Device 2 Reserved, Device 3 StrongARM Core System, Device 4 Reserved, Device 5 AMBA Translation Unit, Device 6 SDRAM Unit), with the region at addresses beginning with B expanded: Scratchpad Memory (up to B004 4FFF), IX Bus Unit CSRs, ME0 to ME5 transfer registers, and ME0 to ME5 CSRs. ME = microengine.]
Microengine Assembler
Using Microengine Registers
- Programming
  - Context-relative addressing
    - Each thread has its own window of registers (one fourth of the total), so threads cannot overwrite each other's registers
  - Absolute addressing
    - A register is visible to all threads
  - Context-relative vs. absolute addressing is decided on a per-instruction basis
- Assembler
  - Supports symbolic names
  - Assigns registers of the different kinds
  - The programmer must keep track of the number of registers used
  - The assembler can be hinted to assign (transfer) registers contiguously
- Context-relative addressing of the registers
  - Threads can only address their own register share
  - This is the more typical use
  - Assembler notations
    - symbolic_register_name: general purpose register
    - $symbolic_register_name: SRAM transfer register
    - $$symbolic_register_name: SDRAM transfer register
- Absolute addressing
  - Threads can use more than their share of registers
  - Threads can communicate via registers
  - Assembler notations
    - …: general purpose register
    - …: SRAM transfer register
    - …: SDRAM transfer register
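To make context-relative addressing concrete, the sketch below maps a thread's relative register number to an absolute one. The numbering (bank A as 0 to 63, bank B as 64 to 127, each context owning 16 consecutive registers per bank) is an assumption for illustration, consistent with 128 GPRs in two banks split into four equal windows; it is not a statement of the hardware's encoding, and abs_gpr is a hypothetical name:

```c
#include <assert.h>

#define GPRS_PER_BANK 64                       /* 128 GPRs in banks A and B */
#define CONTEXTS       4
#define WINDOW        (GPRS_PER_BANK / CONTEXTS) /* 16 per bank per thread */

enum gpr_bank { BANK_A = 0, BANK_B = 1 };

/* Illustrative mapping (assumption): absolute index 0..127, bank A first. */
static int abs_gpr(enum gpr_bank bank, int ctx, int rel)
{
    assert(ctx >= 0 && ctx < CONTEXTS);
    assert(rel >= 0 && rel < WINDOW);   /* a thread cannot leave its window */
    return (int)bank * GPRS_PER_BANK + ctx * WINDOW + rel;
}
```

Under this model, two threads using the same symbolic name always land on different absolute registers, which is exactly why context-relative code cannot clobber another thread's state.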
Microengine Assembler
- ALU
  - alu[dest_reg, A_operand, alu_op, B_operand]
  - Performs addition, subtraction, bit operations
  - dest_reg: transfer register (TR), general purpose register (GPR) or nothing
  - A_operand: TR, GPR, immediate data, or nothing
  - B_operand: TR, GPR, or immediate data
- ALU_SHF
  - alu_shf[dest_reg, A_operand, alu_op, B_operand, B_op_shift_cnt]
  - Like ALU, but shifts B_operand before evaluation
  - dest_reg: context-relative TR, GPR, or nothing
  - A_operand: TR, GPR, immediate data, or nothing
  - B_operand: TR, GPR, or immediate data
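The value of alu_shf is that scaling and combining happen in one instruction, for example computing base + (index << 2) to address a table of longwords. A hypothetical C model of that evaluation order; only a left shift and a handful of operations are modelled, and the enum names are illustrative rather than assembler syntax:

```c
#include <stdint.h>

/* Hypothetical C model of alu_shf: B is shifted first, then combined
   with A. Only a left shift and a few operations are modelled. */
enum alu_op { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_XOR };

static uint32_t alu_shf_left(uint32_t a, enum alu_op op, uint32_t b,
                             unsigned shift)
{
    uint32_t bs = b << (shift & 31u);   /* shift B before evaluation */
    switch (op) {
    case ALU_ADD: return a + bs;
    case ALU_SUB: return a - bs;
    case ALU_AND: return a & bs;
    case ALU_OR:  return a | bs;
    case ALU_XOR: return a ^ bs;
    }
    return 0;   /* not reached for valid op values */
}
```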
Microengine Assembler
- BR_BCLR, BR_BSET
  - br_bclr[reg, bit_position, label#]
  - Branch if the given bit (0-31) in register reg is cleared or set, respectively
  - reg: context-relative TR or GPR
- BR=BYTE, BR!=BYTE
  - br=byte[reg, byte_spec, byte_compare_value, label#]
  - Branch if the indicated byte (0-3) of register reg equals the constant byte_compare_value, or does not, respectively
  - reg: context-relative TR or GPR
Microengine Access to SDRAM
- Read, write, Receive FIFO read, Transmit FIFO write
- sdram[sdram_cmd, $$sdram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
- Parameters
  - sdram_cmd
    - read: read from SDRAM into TRs
    - write: write from TRs to SDRAM
    - r_fifo_rd: read from the Receive FIFO into SDRAM
    - t_fifo_wr: write to the Transmit FIFO from SDRAM
  - $$sdram_xfer_reg
    - The first of a set of contiguous TRs for read and write operations
    - Each ref_count unit (one quadword) requires two TRs
  - source_op_1/2
    - Specify the address to read from or to write to
  - ref_count
    - Values between 1 and 8 are valid
  - optional_token
    - ctx_arb: lets other threads run until the memory operation is complete
    - ctx_swap: switches context to the next thread
- The (complicated) indirect_ref option must be used with r_fifo_rd and t_fifo_wr
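The ref_count rules reduce to a few lines of arithmetic. A hypothetical set of C helpers, assuming (as the slides state) that an SDRAM reference unit is one quadword held in two 32-bit transfer registers, while an SRAM or scratch unit is one longword in one transfer register:

```c
#include <stdbool.h>

#define MAX_REF_COUNT 8   /* ref_count is valid from 1 to 8 */

/* Hypothetical helpers: how many 32-bit transfer registers (and bytes)
   a single memory reference occupies. */
static bool valid_ref_count(int ref_count)
{
    return ref_count >= 1 && ref_count <= MAX_REF_COUNT;
}

static int sdram_xfer_regs(int ref_count) { return 2 * ref_count; } /* quadwords */
static int sdram_bytes(int ref_count)     { return 8 * ref_count; }
static int sram_xfer_regs(int ref_count)  { return ref_count; }     /* longwords */
```

The maximum SDRAM reference therefore moves 64 bytes through 16 transfer registers, matching the 64-byte transfer size mentioned for the transfer registers.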
Microengine Access to SRAM (1/2)
- Read, write, read and lock, write and unlock, unlock, …
- sram[sram_cmd, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
  - sram_cmd: read or write
  - $sram_xfer_reg: the first of ref_count contiguous TRs
  - source_op_1 + source_op_2: specifies the address to read from or to write to
  - ref_count: the number of longwords read or written
- sram[read_lock, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
  - Like sram[read, …], but locks the address source_op_1 + source_op_2
- sram[write_unlock, $sram_xfer_reg, source_op_1, source_op_2, 1], optional_token
  - Writes one TR to source_op_1 + source_op_2 and unlocks the address
- sram[unlock, --, source_op_1, source_op_2, 1], optional_token
  - Unlocks the address specified by source_op_1 + source_op_2
Microengine Access to SRAM (2/2)
- …, bit operations, push, pop
- sram[bit_wr, $bit_mask, source_op_1, source_op_2, bit_op], optional_token
  - As with scratch memory, but with the larger address space
  - $bit_mask is a write TR; it holds the mask on input and the optional results
- sram[push, --, source_op_1, source_op_2, queue_num], optional_token
  - Adds source_op_1 and source_op_2 to get an address
  - Pushes the address onto queue queue_num
- sram[pop, $popped_list, --, --, queue_num], optional_token
  - Pops an address from queue queue_num
  - Stores the pointer in the TR $popped_list
Microengine Access to Scratch Memory
- Read, write, bit operations, in-place increment
- scratch[bit_wr, $sram_xfer_reg, source_op_1, source_op_2, bit_op], optional_token
  - Bit operations
- scratch[read, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
  - Read into transfer registers
- scratch[write, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
  - Write from transfer registers
- scratch[incr, --, source_op_1, source_op_2, 1], optional_token
  - In-place increment by 1
- Parameters
  - source_op_1/2
    - Context-relative transfer registers (TRs) or immediate values
    - Their sum (the longword address) must be between 0 and 1023
  - $sram_xfer_reg
    - For read and write: the first of a set of contiguous TRs to be read or written
    - For bit_wr: a TR containing a bit mask
  - ref_count
    - Number of longwords read or written, between 1 and 8
  - bit_op
    - set_bits, clear_bits, test_and_set_bits, test_and_clear_bits
    - For the test_ operations, the write TR is modified
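The parameter rules for scratch commands (address = source_op_1 + source_op_2 within 0 to 1023, ref_count from 1 to 8) can be captured in a small model. This is a hypothetical, single-threaded C sketch; the array and function names are illustrative, and the real incr command is atomic, which a sequential model does not need to emulate:

```c
#include <stdbool.h>
#include <stdint.h>

#define SCRATCH_LONGWORDS 1024u  /* 4 Kbytes = 1024 longwords */
#define MAX_REF_COUNT     8

/* Hypothetical model of scratch memory, addressed in longwords. */
static uint32_t scratch_mem[SCRATCH_LONGWORDS];

static bool scratch_range_ok(uint32_t addr, int ref_count)
{
    return ref_count >= 1 && ref_count <= MAX_REF_COUNT &&
           addr < SCRATCH_LONGWORDS &&
           (uint32_t)ref_count <= SCRATCH_LONGWORDS - addr;
}

/* Models scratch[read, $xfer, op1, op2, ref_count]. */
static bool scratch_read(uint32_t op1, uint32_t op2, uint32_t *xfer,
                         int ref_count)
{
    uint32_t addr = op1 + op2;
    if (!scratch_range_ok(addr, ref_count))
        return false;
    for (int i = 0; i < ref_count; i++)
        xfer[i] = scratch_mem[addr + (uint32_t)i];
    return true;
}

/* Models scratch[write, $xfer, op1, op2, ref_count]. */
static bool scratch_write(uint32_t op1, uint32_t op2, const uint32_t *xfer,
                          int ref_count)
{
    uint32_t addr = op1 + op2;
    if (!scratch_range_ok(addr, ref_count))
        return false;
    for (int i = 0; i < ref_count; i++)
        scratch_mem[addr + (uint32_t)i] = xfer[i];
    return true;
}

/* Models scratch[incr, --, op1, op2, 1]. */
static bool scratch_incr(uint32_t op1, uint32_t op2)
{
    uint32_t addr = op1 + op2;
    if (!scratch_range_ok(addr, 1))
        return false;
    scratch_mem[addr] += 1u;
    return true;
}
```

A shared packet counter, for instance, is then one scratch_incr per packet from any thread, with no lock needed on the real hardware.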
Microengine Assembler
- Ordering problems
  - Example:
        immed[$$temp, 0x1234]
        sdram[write, $$temp, base, 0, 1], ctx_swap, defer[1]
        immed[$$temp, 0x5678]
  - The wrong value may be written
    - Writing and context swapping are deferred: defer[1] lets the next instruction execute before the context swap, while the write is still pending
    - The register modification may therefore overtake the write
- Address of a register
  - It is possible to determine the address of a register:
        .local a_gp_reg
        immed[a_gp_reg, &$an_sram_reg]
        .endlocal
Memory Management
Resource Manager
- Task
  - Used by StrongARM code
  - Lets microACEs and microACE applications interface with the microengines
- API
  - Load code into microengines
  - Enable/disable microengines
  - Get/set microengine configuration and resource assignment
  - Send and receive packets to and from microcode blocks
  - Allocate and access uncached SRAM, SDRAM and Scratch memory
Resource Manager
- Data structures
  - RmMemoryHandle: an opaque handle identifying memory allocated by the resource manager
        typedef int RmMemoryHandle;
Resource Manager
- RmMalloc
  - Allocates a particular kind of memory: RM_SRAM, RM_SDRAM or RM_SCRATCH
  - Some SRAM and SDRAM is already used by the ASL, and some SDRAM is used by Linux; the rest can be used freely by microACEs for data structures of their choosing
  - The memory is not cached
  - The memory is not protected by an MMU, and the virtual address is the same for all processes
  - Returned pointers are always aligned (SDRAM to 8 bytes, SRAM and Scratch to 4 bytes)
  - Requested sizes are rounded up to a multiple of the alignment
  - This allocation is not efficient: microACEs should allocate all the memory they need at once and manage it themselves
        ix_error RmMalloc(
            RmMemoryType in_memory_type,
            unsigned char** out_mem_handle_ptr,
            int in_size_in_bytes );
- RmFree
  - Releases memory allocated by RmMalloc
        ix_error RmFree( unsigned char* ptr );
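Following the advice to allocate once and manage the memory yourself, a microACE can carve one large block into fixed-size buffers. A hypothetical pool sketch: malloc() and free() stand in for RmMalloc() and RmFree() so it runs anywhere, buffer sizes are rounded up to 8 bytes to mimic SDRAM alignment, and all names are illustrative:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size buffer pool carved from ONE large allocation.
   On the IXP the block would come from RmMalloc; malloc stands in here. */
struct pool {
    unsigned char *base;   /* start of the single big allocation */
    size_t buf_size;       /* rounded up to 8 bytes (SDRAM alignment) */
    size_t nbufs;
    size_t *free_stack;    /* indices of free buffers */
    size_t nfree;
};

static size_t round_up8(size_t n) { return (n + 7u) & ~(size_t)7u; }

static int pool_init(struct pool *p, size_t buf_size, size_t nbufs)
{
    p->buf_size = round_up8(buf_size);
    p->nbufs = nbufs;
    p->base = malloc(p->buf_size * nbufs);          /* stand-in for RmMalloc */
    p->free_stack = malloc(nbufs * sizeof *p->free_stack);
    if (p->base == NULL || p->free_stack == NULL) {
        free(p->base);
        free(p->free_stack);
        return -1;
    }
    for (size_t i = 0; i < nbufs; i++)
        p->free_stack[i] = nbufs - 1 - i;           /* buffer 0 handed out first */
    p->nfree = nbufs;
    return 0;
}

static void *pool_get(struct pool *p)
{
    if (p->nfree == 0)
        return NULL;
    return p->base + p->free_stack[--p->nfree] * p->buf_size;
}

static void pool_put(struct pool *p, void *buf)
{
    size_t idx = (size_t)((unsigned char *)buf - p->base) / p->buf_size;
    p->free_stack[p->nfree++] = idx;
}

static void pool_destroy(struct pool *p)
{
    free(p->base);                                  /* stand-in for RmFree */
    free(p->free_stack);
}
```

Getting and returning a buffer is then constant time, and only one (slow) allocator call is made for the lifetime of the microACE.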
Resource Manager
- Translating between virtual and physical addresses
  - The microengines map memory into their address space differently than the StrongARM does
  - StrongARM addresses are meaningless to the microengines and must be translated to offsets from the start of each particular kind of memory (and back)
- RmGetPhysOffset
        ix_error RmGetPhysOffset(
            RmMemoryType in_memory_type,
            unsigned char* in_data_ptr,
            unsigned int* out_offset );
  - Translates the address in_data_ptr in RmMalloc'd memory to its offset from the start of the given memory type
  - The offset is in longwords (4-byte units) for SRAM and Scratch, and in quadwords (8-byte units) for SDRAM
- RmGetVirtualAddress
        ix_error RmGetVirtualAddress(
            RmMemoryType in_memory_type,
            unsigned char** out_buffer_ptr,
            unsigned int in_offset );
  - Takes the physical offset from the base of the given memory type and translates it into a virtual address valid for the StrongARM
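The arithmetic behind these two calls is a subtraction and a division by the unit size of the memory type. A hypothetical model: `base` stands for where the StrongARM sees the start of that memory type (the real resource manager knows this mapping internally), and mem_kind, virt_to_offset and offset_to_virt are illustrative names, not the SDK's:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of RmGetPhysOffset / RmGetVirtualAddress arithmetic. */
enum mem_kind { MEM_SRAM, MEM_SDRAM, MEM_SCRATCH };

static size_t unit_bytes(enum mem_kind k)
{
    return k == MEM_SDRAM ? 8u : 4u;   /* quadwords vs. longwords */
}

static uint32_t virt_to_offset(enum mem_kind k, const unsigned char *base,
                               const unsigned char *ptr)
{
    return (uint32_t)((size_t)(ptr - base) / unit_bytes(k));
}

static unsigned char *offset_to_virt(enum mem_kind k, unsigned char *base,
                                     uint32_t offset)
{
    return base + (size_t)offset * unit_bytes(k);
}
```

The same byte distance therefore yields different offsets per memory type: 64 bytes into SDRAM is offset 8, but 64 bytes into SRAM is offset 16.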
Summary
- Memory on the IXP cards
  - Kinds of memory
  - Its features
  - Its accessibility
- Microengine assembler
- Resource Manager functions