Published by Nelson Perry. Modified over 9 years ago.
1
Memory
10/9 - 2004
INF5060: Multimedia data communication using network processors
2
2004 Carsten Griwodz & Pål Halvorsen, INF5060 – multimedia communication using network processors

Overview
- Memory on the IXP cards
  - Kinds of memory
  - Its features
  - Its accessibility
- Microengine assembler
- Memory management
3
Kinds of Memory

Memory                                  Size            Location
Microengine general purpose registers   128 registers   On chip
StrongARM instruction cache             16 Kbytes       On chip
StrongARM data cache                    8 Kbytes        On chip
StrongARM mini cache                    512 bytes       On chip
Scratch(pad)                            4 Kbytes        On chip
Instruction store                       64 Kbytes       On chip
FlashROM                                8 Mbytes
SRAM                                    8 Mbytes
SDRAM                                   256 Mbytes
4
IXP Functional Units
[Block diagram. Inside the IXP Network Processor: StrongARM Core, Microengine, SRAM Unit, SDRAM Unit, PCI Bus Unit and IX Bus Unit, connected by various busses. Outside: SDRAM (up to 256 MB), SRAM (up to 8 MB), Flash ROM (up to 8 MB), memory mapped I/O devices, Ethernet MAC (other IX devices) on the IX Bus, host machine and PCI-to-PCI bridge on the PCI Bus. Bus labels: 64 bit/33 MHz, 64 bit/116 MHz, 32 bit/116 MHz, 64 bit/104 MHz.]
5
Kinds of Memory
- Physical memory on the IXP1200 is contiguous
- Memory in general is not byte-addressable
  - Memory units emulate byte addressing for the StrongARM
- Big endian architecture
  - StrongARM: big endian mode
  - Microengines are big endian

Memory type    Addressable data unit (bytes)   Relative access time (cycles)
Scratch(pad)   4                               12-14
SRAM           4                               16-20
SDRAM          8                               32-40
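The addressing granularity in the table can be made concrete with a small model. This is a sketch in Python, not IXP code: the names `UNIT_BYTES`, `split_byte_address` and `big_endian_byte` are introduced here for illustration only, and the code merely shows the arithmetic a memory unit has to perform when it emulates byte addressing on top of 4-byte or 8-byte units in a big endian layout.

```python
# Addressable data unit in bytes, taken from the table on this slide.
UNIT_BYTES = {"scratch": 4, "sram": 4, "sdram": 8}


def split_byte_address(memory_type, byte_addr):
    """Split a byte address into (unit index, byte offset inside the unit).

    Hypothetical helper: it models only the address arithmetic, not the
    hardware behaviour of the memory units.
    """
    unit = UNIT_BYTES[memory_type]
    return byte_addr // unit, byte_addr % unit


def big_endian_byte(unit_value, unit_bytes, offset):
    """Return the byte at `offset` of a big endian unit (offset 0 is the most significant byte)."""
    shift = 8 * (unit_bytes - 1 - offset)
    return (unit_value >> shift) & 0xFF
```

For example, byte address 0x13 falls into longword 4 of SRAM but into quadword 2 of SDRAM, at byte offset 3 in both cases.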
6
Terms
Careful! Inconsistencies!
- Wording in Intel IXP manuals
  - Word: 16 bit
  - Longword: 32 bit
  - Quadword: 64 bit
- Wording in StrongARM and other ARM manuals
  - Halfword: 16 bit
  - Word: 32 bit
7
Kinds of Memory
- Memory accessible to StrongARM
  - Mapped into a single address space
- Memory accessible to microengines
  - Individually mapped
  - Separate assembler instructions for each kind

StrongARM address space (region start addresses; the map ends at FFFF FFFF)
Start address   Device
0000 0000       Device 0: SRAM Unit
4000 0000       Device 1: PCI Unit
8000 0000       Device 2: Reserved
9000 0000       Device 3: StrongARM Core System
A000 0000       Device 4: Reserved
B000 0000       Device 5: AMBA Translation Unit
C000 0000       Device 6: SDRAM Unit

Diagram labels: SDRAM, Scratchpad, Microengine registers, SRAM
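The single StrongARM address space can be read as a lookup from address to device. The sketch below assumes that the address labels in the map are the region start addresses 0000 0000, 4000 0000, 8000 0000, 9000 0000, A000 0000, B000 0000 and C000 0000, with the map ending at FFFF FFFF; `DEVICE_MAP` and `device_for` are hypothetical names used only for this illustration.

```python
# Region start addresses and device names. The boundaries are an assumption
# read off the address column of the memory map diagram.
DEVICE_MAP = [
    (0x00000000, "Device 0: SRAM Unit"),
    (0x40000000, "Device 1: PCI Unit"),
    (0x80000000, "Device 2: Reserved"),
    (0x90000000, "Device 3: StrongARM Core System"),
    (0xA0000000, "Device 4: Reserved"),
    (0xB0000000, "Device 5: AMBA Translation Unit"),
    (0xC0000000, "Device 6: SDRAM Unit"),
]


def device_for(addr):
    """Return the device whose region contains the 32-bit StrongARM address."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    name = None
    for start, label in DEVICE_MAP:
        if addr >= start:
            name = label  # keep the last region that starts at or below addr
    return name
```

Under these assumptions, an address such as B004 4000 resolves to the AMBA Translation Unit, and any address from C000 0000 upwards resolves to the SDRAM Unit.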
8
Memory: memory, cache memory, registers
- StrongARM core: caches
- Microengine: registers
- SDRAM
- SRAM
- IX Bus Unit: Scratch(pad) memory
9
StrongARM
10
StrongARM Core Features
- A general purpose processor
  - With MMU
- 16 Kbytes instruction cache
  - Round robin replacement
- 8 Kbytes data cache
  - Round robin replacement
  - Write-back cache, cache replacement on read, not on write
- 512 byte mini-cache for data that is used once and then discarded
  - To reduce flushing of the main data cache
- Instruction code stored in SDRAM
11
StrongARM Core Access
- Full access to
  - SDRAM Unit
  - SRAM Unit incl. FlashROM
  - PCI Bus Unit
- Access to the microengines’
  - Program code
  - Status registers
  - Program counters
- Access to the IX Bus Unit’s
  - Status registers
  - Scratch memory
[Block diagram: StrongARM Core, SRAM Unit, SDRAM Unit, PCI Bus Unit, IX Bus Unit, Microengine]
12
Microengines
13
Microengine Features
- 4 hardware contexts
- 2K x 32 bit instruction control store
  - Every instruction is 32 bits long
  - No instruction cache
  - Instructions downloaded onto the microengine by the StrongARM
  - Not loaded from RAM on demand
- 5-stage instruction pipeline
  - Blocks for reference operations
  - Deferred execution to reduce context switch penalty
- 256 registers
  - 32 bit registers
- Load and store architecture
  - Must bring data into registers, work, write to destination
  - Single cycle access in registers
  - Use “reference command” to fetch into registers
  - Yield/sleep during fetch execution
14
Microengine Access
- Full access to
  - SDRAM Unit
  - SRAM Unit
  - IX Bus Unit
- Access to StrongARM
  - Interrupts
  - Trigger status register reads
- Access to PCI bus unit
  - Initiate DMA with SDRAM
- Access to other microengines
  - None
- Access to self
  - Inter-thread signaling
  - No access to own instruction code
[Block diagram: Microengines, SRAM Unit, SDRAM Unit, PCI Bus Unit, IX Bus Unit, StrongARM Core]
15
Microengine Registers
[Figure from: IXP1200 Family Hardware Reference Manual]
16
Microengine Registers
- 256 registers
- 128 general purpose registers
  - Arranged in two banks A and B
  - Instructions with 2 input registers
    - From different banks
    - Otherwise assembler warning
- 128 transfer registers
  - Transfer registers are not general purpose registers
  - Ports to their neighboring functional unit
  - 64 SDRAM transfer registers
    - Transfer to and from SDRAM
    - 32 read / 32 write
  - 64 SRAM transfer registers
    - Transfer to and from everything but SDRAM
    - 32 read / 32 write
  - 4 busses can be used in parallel
    - By different threads
- Loading transfer registers
  - 64 bytes at once from one functional unit to another
  - 128 bytes at once from the IX bus
17
SDRAM
18
General features
- Recommended use
  - StrongARM instruction code
  - Large data structures
  - Packets during processing
- 64-bit addressed (8 byte aligned, quadword aligned)
- 256 Mbytes
- 928 Mbytes/s peak bandwidth
- Higher bandwidth than SRAM
- Higher latency than SRAM
- Access
  - StrongARM
  - Microengines
  - StrongARM takes precedence
  - PCI DMA on behalf of microengines
- Direct access to IX Bus Unit’s Transmit and Receive FIFO
19
Special features
- Byte, word, longword access supported through a read-modify-write access to quadwords
  - Speed penalty
- Direct path from SDRAM to IX Bus Transmit and Receive FIFOs
  - Controlled by microengines
  - Up to 64 bytes transferable without microengine involvement
- Byte aligner between SDRAM and IX Bus
  - For sending to the Transmit FIFO
  - Shift bytewise when e.g. header length has changed
  - Can only be used by microengines in the t_fifo_wr command
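The read-modify-write emulation of sub-quadword access can be sketched as follows. This is a Python model under stated assumptions, not the behaviour of the actual SDRAM unit: memory is represented as a list of 64-bit integers, bytes are numbered big endian within a quadword, and `write_byte_rmw` is a hypothetical name. The three steps in the body are what causes the speed penalty mentioned on the slide.

```python
QUADWORD_MASK = 0xFFFFFFFFFFFFFFFF


def write_byte_rmw(quadwords, byte_addr, value):
    """Write one byte into quadword-only memory via read-modify-write.

    `quadwords` is a list of 64-bit integers standing in for SDRAM.
    Returns the quadword value that was read before the modification.
    """
    index, offset = divmod(byte_addr, 8)
    old = quadwords[index]                    # 1. read the whole quadword
    shift = 8 * (7 - offset)                  # big endian: offset 0 is the top byte
    cleared = old & ~(0xFF << shift)          # 2. modify: clear the target byte ...
    new = cleared | ((value & 0xFF) << shift)  # ... and insert the new one
    quadwords[index] = new & QUADWORD_MASK    # 3. write the whole quadword back
    return old
```

A single byte store therefore costs one full quadword read plus one full quadword write.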
20
SRAM
21
General features
- Recommended use
  - Lookup tables
  - Free buffer lists
  - Data buffer queue lists
- 32-bit addressed (4 byte aligned, word aligned)
- 8 Mbytes
- 464 Mbytes/s peak bandwidth
- Lower bandwidth than SDRAM
- Lower latency than SDRAM
- Access
  - StrongARM
  - Microengines
  - StrongARM takes precedence
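The two peak bandwidth figures (928 Mbytes/s for SDRAM, 464 Mbytes/s for SRAM) are consistent with a simple width-times-clock calculation. The sketch below assumes a 64-bit SDRAM bus and a 32-bit SRAM bus, both clocked at 116 MHz, which is how the bus labels on the functional-unit diagram can be read; the function name is hypothetical.

```python
def peak_bandwidth_mbytes_per_s(bus_width_bits, clock_mhz):
    """Peak bandwidth in Mbytes/s, assuming one bus-wide transfer per clock cycle."""
    return (bus_width_bits // 8) * clock_mhz


# Assumed bus parameters (see lead-in): 64 bit/116 MHz and 32 bit/116 MHz.
SDRAM_PEAK = peak_bandwidth_mbytes_per_s(64, 116)
SRAM_PEAK = peak_bandwidth_mbytes_per_s(32, 116)
```

These are peak values only; the relative access times given earlier (16-20 cycles for SRAM, 32-40 for SDRAM) explain why SRAM is still preferred for small, latency-sensitive structures.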
22
Accessing SRAM
- StrongARM access
  - Byte, word and longword access
  - Bit operations through SRAM Alias Address Space
  - Bit, byte, word write supported through read-modify-write
- Microengine access
  - Bit and longword access only
  - Up to 8 longwords with one command
  - Bit write supported through read-modify-write
  - Bit operations within instructions
23
Special features
- Atomic push/pop operations
  - For maintaining lists
  - 8 entry push/pop register list
  - Microengines
    - Named commands
  - StrongARM
    - Dedicated memory addresses
    - Don’t cache these memory areas
- Atomic bit test, set and clear
  - For synchronized access
  - Microengine
    - Use a write transfer register
    - Specify bits to test, read, or write
    - Reading the bit changes the write transfer register
  - StrongARM
    - Special macros for read-modify-write operations
    - Blocks until operation is completed
    - Don’t cache this memory
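Atomic bit test-and-set is what makes a simple lock possible. The following Python model is only a sketch: the class `SramBitOps` and the helper `try_acquire` are hypothetical, a `threading.Lock` stands in for the atomicity that the SRAM unit provides in hardware, and returning the old longword value is an assumption about how the test result is reported.

```python
import threading


class SramBitOps:
    """Toy model of SRAM longwords with atomic bit operations."""

    def __init__(self, n_longwords):
        self._mem = [0] * n_longwords
        self._atomic = threading.Lock()  # stands in for hardware atomicity

    def test_and_set_bits(self, addr, mask):
        """Set the bits in `mask`; return the longword as it was before."""
        with self._atomic:
            old = self._mem[addr]
            self._mem[addr] = (old | mask) & 0xFFFFFFFF
            return old

    def test_and_clear_bits(self, addr, mask):
        """Clear the bits in `mask`; return the longword as it was before."""
        with self._atomic:
            old = self._mem[addr]
            self._mem[addr] = old & ~mask & 0xFFFFFFFF
            return old


def try_acquire(sram, addr, bit):
    """Return True if this caller changed the lock bit from 0 to 1."""
    mask = 1 << bit
    return (sram.test_and_set_bits(addr, mask) & mask) == 0
```

Only the caller that observes the bit as previously clear owns the lock; everyone else sees it already set and has to retry.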
24
Special features
- 8 entry CAM (content addressable memory) for read locks
  - For synchronized access
  - 8 concurrent locks on memory
  - Protect from StrongARM and microengines
  - Read, unlock and write_unlock
  - Microengines
    - sram assembler command
    - Waits until the lock is released
  - StrongARM
    - 3 separate 8 MByte mapped memory regions
    - Failed locking is indicated by flags, read always successful
    - Don’t cache these memory areas
25
StrongARM Core Memory Map
[Device map: Device 0 SRAM Unit, Device 1 PCI Unit, Device 2 Reserved, Device 3 StrongARM Core System, Device 4 Reserved, Device 5 AMBA Translation Unit, Device 6 SDRAM Unit; region boundaries at 0000 0000, 4000 0000, 8000 0000, 9000 0000, A000 0000, B000 0000, C000 0000, FFFF FFFF]

Detail of Device 0, SRAM Unit
Function                 StrongARM address range
Slow Port                3840 0000 – 385F FFF
Command FIFO Test        3800 0080 – 3800 00FF
SRAM CSRs                3800 0000 – 3800 0028
List 7 Pop operations    2780 0000 – 27FF FFFF
List 6 Pop operations    2700 0000 – 277F FFFF
List 5 Pop operations    2680 0000 – 26FF FFFF
List 4 Pop operations    2600 0000 – 267F FFFF
List 3 Pop operations    2580 0000 – 25FF FFFF
List 2 Pop operations    2500 0000 – 257F FFFF
List 1 Pop operations    2480 0000 – 24FF FFFF
List 0 Pop operations    2400 0000 – 247F FFFF
List 7 Push operations   2380 0000 – 23FF FFFF
List 6 Push operations   2300 0000 – 237F FFFF
List 5 Push operations   2280 0000 – 22FF FFFF
List 4 Push operations   2200 0000 – 227F FFFF
List 3 Push operations   2180 0000 – 21FF FFFF
List 2 Push operations   2100 0000 – 217F FFFF
List 1 Push operations   2080 0000 – 20FF FFFF
List 0 Push operations   2000 0000 – 207F FFFF
Bit Test & Set           1980 0000 – 19FF FFFF
Bit Test & Clear         1900 0000 – 197F FFFF
Bit Write Set            1880 0000 – 18FF FFFF
Bit Write Clear          1800 0000 – 187F FFFF
CAM Unlock               1600 0000 – 167F FFFF
Write Unlock             1400 0000 – 147F FFFF
Read Lock                1200 0000 – 127F FFFF
Read/Write               1000 0000 – 107F FFFF
BootROM                  0000 0000 – 007F FFFF
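The push and pop regions follow a regular pattern: the push regions start at 2000 0000, the pop regions at 2400 0000, and each of the 8 lists occupies 0080 0000 bytes. A sketch of that arithmetic, with hypothetical function names and with the constants taken from the table:

```python
PUSH_BASE = 0x20000000    # start of "List 0 Push operations"
POP_BASE = 0x24000000     # start of "List 0 Pop operations"
LIST_STRIDE = 0x00800000  # size of each per-list region (8 Mbytes)
NUM_LISTS = 8


def push_region_start(list_num):
    """StrongARM start address of the push region for the given list."""
    if not 0 <= list_num < NUM_LISTS:
        raise ValueError("list number must be 0..7")
    return PUSH_BASE + list_num * LIST_STRIDE


def pop_region_start(list_num):
    """StrongARM start address of the pop region for the given list."""
    if not 0 <= list_num < NUM_LISTS:
        raise ValueError("list number must be 0..7")
    return POP_BASE + list_num * LIST_STRIDE
```

The code only reproduces the address layout; how an access inside a region is turned into a push or pop is defined by the hardware and is not modelled here.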
26
Memory Map for SRAM addresses

Physical device   Function           StrongARM address space   Microengine sram instruction    Microengine address space
                                     (byte addressing)         command                         (longword addressing)
SlowPort                             3840 0000 – 385F FFF      read/write                      70 0000 – 7F FFFF
SRAM CSRs                            3800 0000 – 3800 0013     read/write                      60 0000 – 60 0080
SRAM              Pop operations     2400 0000 – 27FF FFFF     pop                             00 0000 – 1F FFFF
SRAM              Push operations    2000 0000 – 23FF FFFF     push                            00 0000 – 1F FFFF
SRAM              Bit Test & Set     1980 0000 – 19FF FFFF     bit_wr (test_and_set_bits)      00 0000 – 1F FFFF
SRAM              Bit Test & Clear   1900 0000 – 197F FFFF     bit_wr (test_and_clear_bits)    00 0000 – 1F FFFF
SRAM              Bit Write Set      1880 0000 – 18FF FFFF     bit_wr (set_bits)               00 0000 – 1F FFFF
SRAM              Bit Write Clear    1800 0000 – 187F FFFF     bit_wr (clear_bits)             00 0000 – 1F FFFF
SRAM              Unlock             1600 0000 – 167F FFFF     unlock                          00 0000 – 1F FFFF
SRAM              Write Unlock       1400 0000 – 147F FFFF     write_unlock                    00 0000 – 1F FFFF
SRAM              Read Lock          1200 0000 – 127F FFFF     read_lock                       00 0000 – 1F FFFF
SRAM              Read/Write         1000 0000 – 107F FFFF     read/write                      00 0000 – 1F FFFF
BootROM                              0000 0000 – 007F FFFF     read/write                      20 0000 – 3F FFFF
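The Read/Write row shows how the two address spaces relate: the StrongARM sees 8 Mbytes of byte addresses starting at 1000 0000, while the microengines see the same memory as longword addresses 00 0000 to 1F FFFF. A minimal sketch of the conversion, assuming a plain linear mapping between the two ranges (the function names are hypothetical):

```python
SRAM_RW_BASE = 0x10000000   # StrongARM byte address of the SRAM Read/Write region
SRAM_RW_SIZE = 0x00800000   # 8 Mbytes
ME_SRAM_LAST = 0x1FFFFF     # last microengine longword address


def strongarm_to_me_sram(sa_addr):
    """Convert a StrongARM byte address into a microengine longword address."""
    offset = sa_addr - SRAM_RW_BASE
    if not 0 <= offset < SRAM_RW_SIZE:
        raise ValueError("address outside the SRAM Read/Write region")
    if offset % 4:
        raise ValueError("address is not longword aligned")
    return offset >> 2


def me_to_strongarm_sram(me_addr):
    """Convert a microengine longword address into a StrongARM byte address."""
    if not 0 <= me_addr <= ME_SRAM_LAST:
        raise ValueError("longword address out of range")
    return SRAM_RW_BASE + (me_addr << 2)
```

The division by 4 is the reason why an 8 Mbyte byte range corresponds to only 20 0000 (hex) longword addresses on the microengine side.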
27
IX Bus Unit
28
IX Bus Unit
[Block diagram. Inside the IXP Network Processor, the IX Bus Unit contains the “FBI” Engine Interface, Transmit FIFO, Receive FIFO, Hash Units, Status Registers and Scratchpad. It connects to the StrongARM, the Microengines and the SDRAM Unit, and via the IX Bus to the Ethernet MAC (other IX devices).]
29
Scratch Memory: General Features
- Recommended use
  - Passing messages between processors and between threads
  - Semaphores, mailboxes, other IPC
- 32-bit addressed (4 byte aligned, word aligned)
- 4 Kbytes
- Has an atomic autoincrement instruction
  - Only usable by microengines
30
StrongARM Core Memory Map
[Device map: Device 0 SRAM Unit, Device 1 PCI Unit, Device 2 Reserved, Device 3 StrongARM Core System, Device 4 Reserved, Device 5 AMBA Translation Unit, Device 6 SDRAM Unit; region boundaries at 0000 0000, 4000 0000, 8000 0000, 9000 0000, A000 0000, B000 0000, C000 0000, FFFF FFFF]

Detail (addresses in the B000 0000 region)
Function             StrongARM address
Scratchpad Memory    B004 4000 – B004 4FFF
IX Bus Unit CSR      B004 0000
ME5 Transfer Regs    B000 6800
ME4 Transfer Regs    B000 6000
ME3 Transfer Regs    B000 5800
ME2 Transfer Regs    B000 5000
ME1 Transfer Regs    B000 4800
ME0 Transfer Regs    B000 4000
ME5 CSR              B000 2800
ME4 CSR              B000 2000
ME3 CSR              B000 1800
ME2 CSR              B000 1000
ME1 CSR              B000 0800
ME0 CSR              B000 0000
ME = microengine
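The per-microengine entries in this map are evenly spaced: both the CSR blocks and the transfer register blocks advance by 0800 (hex) per microengine. A short sketch of that pattern, using hypothetical function names and only the addresses listed on the slide:

```python
ME_CSR_BASE = 0xB0000000    # ME0 CSR
ME_XFER_BASE = 0xB0004000   # ME0 Transfer Regs
ME_STRIDE = 0x800           # spacing between consecutive microengines
NUM_MICROENGINES = 6        # ME0 .. ME5


def me_csr_address(me):
    """StrongARM address of the CSR block of microengine `me`."""
    if not 0 <= me < NUM_MICROENGINES:
        raise ValueError("microengine number must be 0..5")
    return ME_CSR_BASE + me * ME_STRIDE


def me_transfer_regs_address(me):
    """StrongARM address of the transfer register block of microengine `me`."""
    if not 0 <= me < NUM_MICROENGINES:
        raise ValueError("microengine number must be 0..5")
    return ME_XFER_BASE + me * ME_STRIDE
```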
31
Microengine Assembler
32
Using Microengine Registers
- Programming
  - Context-relative addressing
    - Each thread can have its own window of registers (one fourth of the total), so threads can’t overwrite each other
  - Absolute addressing
    - Register is visible to all threads
  - Context-relative vs. absolute addressing
    - Decided on a per-instruction basis
- Assembler
  - Supports symbolic names
  - Assigns registers from the different kinds
  - Programmer
    - must take care concerning the number of registers used
    - can hint the assembler to assign (transfer) registers contiguously
- Context-relative addressing of the registers
  - Threads are only able to address their own register share
  - This is more typically used
  - Assembler notations
      symbolic_register_name      general purpose register
      $symbolic_register_name     SRAM transfer register
      $$symbolic_register_name    SDRAM transfer register
- Absolute addressing
  - Threads can use more than their share of registers
  - Threads can communicate via registers
  - Assembler notations
      @symbolic_register_name     general purpose register
      @$symbolic_register_name    SRAM transfer register
      @$$symbolic_register_name   SDRAM transfer register
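The difference between context-relative and absolute addressing can be illustrated with the register counts given earlier (128 general purpose registers, 4 hardware contexts). The model below is a deliberate simplification: it assumes each context owns one contiguous quarter of the register file and ignores the split into banks A and B, so the actual hardware numbering may differ. All names are hypothetical.

```python
TOTAL_GPRS = 128
CONTEXTS = 4
WINDOW = TOTAL_GPRS // CONTEXTS  # 32 registers per context


def absolute_index(context, relative_index):
    """Map a context-relative register index to an absolute one.

    Assumption: windows are contiguous slices of the register file.
    """
    if not 0 <= context < CONTEXTS:
        raise ValueError("context must be 0..3")
    if not 0 <= relative_index < WINDOW:
        raise ValueError("relative index outside the context's window")
    return context * WINDOW + relative_index


def window_of(context):
    """Return the set of absolute register indices owned by a context."""
    return {absolute_index(context, r) for r in range(WINDOW)}
```

In this model the four windows are disjoint, which is why context-relative names cannot collide between threads, while an absolute (`@`) name denotes one specific register that every thread sees.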
33
Microengine Assembler
- ALU
    alu[dest_reg, A_operand, alu_op, B_operand]
  - Perform addition, subtraction, bit operations
  - dest_reg: transfer register (TR), general purpose register (GPR) or nothing
  - A_operand: TR, GPR, immediate data, or nothing
  - B_operand: TR, GPR, or immediate data
- ALU_SHF
    alu_shf[dest_reg, A_operand, alu_op, B_operand, B_op_shift_cnt]
  - Like ALU, but shift B_operand before evaluation
  - dest_reg: context-relative TR, GPR, or nothing
  - A_operand: TR, GPR, immediate data, or nothing
  - B_operand: TR, GPR, or immediate data
34
Microengine Assembler
- BR_BCLR, BR_BSET
    br_bclr[reg, bit_position, label#]
  - Branch if the given bit (0-31) in register reg is cleared or set, respectively
  - reg: context-relative TR or GPR
- BR=BYTE, BR!=BYTE
    br=byte[reg, byte_spec, byte_compare_value, label#]
  - Branch if the indicated byte (0-3) of register reg is equal to the constant value byte_compare_value, or not, respectively
  - reg: context-relative TR or GPR
35
Microengine Access to SDRAM
- Read, write, Receive FIFO read, Transmit FIFO write
    sdram[sdram_cmd, $$sdram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
- Parameters
  - sdram_cmd
    - read: read from SDRAM to TRs
    - write: write from TRs to SDRAM
    - r_fifo_rd: read from Receive FIFO to SDRAM
    - t_fifo_wr: write to Transmit FIFO from SDRAM
  - $$sdram_xfer_reg
    - The first of a set of contiguous TRs for read and write operations
    - One ref_count requires two TRs
  - source_op_1/2
    - Specifies the address to read from or to write to
  - ref_count
    - Values between 1 and 8 are valid
  - optional_token
    - ctx_arb allows other threads to run until memory operation is complete
    - ctx_swap switches context to the next thread
    - The (complicated) indirect_ref option must be used with r_fifo_rd and t_fifo_wr
36
Microengine Access to SRAM (1/2)
- Read, write, read and lock, write and unlock, unlock, …
- Read or write
    sram[sram_cmd, $sram_xfer_reg, source_op_1, source_op_2, ref_count] optional_token
  - sram_cmd: read or write
  - $sram_xfer_reg: the first of ref_count contiguous TRs
  - source_op_1 + source_op_2: specifies the address to read from or to write to
  - ref_count: the number of longwords read or written
- Read and lock
    sram[read_lock, $sram_xfer_reg, source_op_1, source_op_2, ref_count] optional_token
  - Like sram[read, …]
  - But lock the address source_op_1 + source_op_2
- Write and unlock
    sram[write_unlock, $sram_xfer_reg, source_op_1, source_op_2, 1] optional_token
  - Write one TR to source_op_1 + source_op_2 and unlock the address
- Unlock
    sram[unlock, --, source_op_1, source_op_2, 1] optional_token
  - Unlock the address specified by source_op_1 + source_op_2
37
Microengine Access to SRAM (2/2)
- …, bit operations, push, pop
- Bit operations
    sram[bit_wr, $bit_mask, source_op_1, source_op_2, bit_op] optional_token
  - As with scratch memory but with the larger address space
  - $bit_mask is a write TR that holds the mask on input and, optionally, the result
- Push
    sram[push, --, source_op_1, source_op_2, queue_num] optional_token
  - Add source_op_1 and source_op_2 to get an address
  - Push the address onto queue queue_num
- Pop
    sram[pop, $popped_list, --, --, queue_num] optional_token
  - Pop an address from queue queue_num
  - Store the pointer in the TR $popped_list
38
Microengine Access to Scratch Memory
- Read, write, bit operations, in-place increment
    scratch[bit_wr, $sram_xfer_reg, source_op_1, source_op_2, bit_op], optional_token
  - Bit operations
    scratch[read, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
  - Read into transfer registers
    scratch[write, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
  - Write from transfer registers
    scratch[incr, --, source_op_1, source_op_2, 1], optional_token
  - In-place increment by 1
- Parameters
  - source_op_1/2
    - Context-relative transfer registers (TRs) or immediate values
    - Sum between 0 and 1023
  - $sram_xfer_reg
    - For read and write: the first of a set of contiguous TRs to be read or written
    - For bit_wr: a TR containing a bit mask
  - ref_count
    - Number of longwords read or written
    - Between 1 and 8
  - bit_op
    - set_bits, clear_bits, test_and_set_bits, test_and_clear_bits
    - For the test_ operations, the write TR is modified
39
Microengine Assembler
- Ordering problems
  - Example
      immed[$$temp, 0x1234]
      sdram[write, $$temp, base, 0, 1], ctx_swap, defer[1]
      immed[$$temp, 0x5678]
  - The wrong value may be written
    - Writing and context swapping are deferred
    - The register modification may overtake the write
- Address of a register
  - It is possible to determine the address of a register
      .local a_gp_reg
      immed[a_gp_reg, &$an_sram_reg]
      .endlocal
40
Memory Management
41
Resource Manager
- Task
  - Used by StrongARM code
  - For microACEs and microACE applications to interface with microengines
- API
  - Load code into microengines
  - Enable/disable microengines
  - Get/set microengine configuration and resource assignment
  - Send and receive packets to and from microcode blocks
  - Allocate and access uncached SRAM, SDRAM and Scratch memory
42
Resource Manager
- Data structures
  - RmMemoryHandle
    - Opaque handle identifying memory allocated by the resource manager
      typedef int RmMemoryHandle;
43
Resource Manager
- RmMalloc
  - Allocate a particular kind of memory
    - RM_SRAM
    - RM_SDRAM
    - RM_SCRATCH
  - Some SRAM and SDRAM is already used by the ASL, some SDRAM is used by Linux; the rest can be used freely by microACEs for data structures of their choosing
  - The memory is not cached
  - The memory is not protected by an MMU, and the virtual address is the same for all processes
  - Returned pointers are always aligned (SDRAM to 8 bytes, SRAM and Scratch to 4 bytes)
  - Requested sizes are rounded to alignment
  - This allocation is not efficient
    - microACEs should allocate all memory they need at once and manage it themselves
      ix_error RmMalloc( RmMemoryType in_memory_type,
                         unsigned char* out_mem_handle_ptr,
                         int in_size_in_bytes );
- RmFree
  - Release memory allocated by RmMalloc
      ix_error RmFree( unsigned char* ptr );
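The advice to allocate once and manage the memory yourself can be sketched as a simple bump allocator over one large block. This is a Python model, not Resource Manager code: `rounded_size` and `Pool` are hypothetical names, offsets stand in for pointers, and rounding requests up (rather than down) to the alignment is an assumption.

```python
# Alignment per memory type, as stated on the slide.
ALIGNMENT = {"RM_SRAM": 4, "RM_SDRAM": 8, "RM_SCRATCH": 4}


def rounded_size(memory_type, size_in_bytes):
    """Round a request up to the alignment of the memory type (assumed: round up)."""
    align = ALIGNMENT[memory_type]
    return (size_in_bytes + align - 1) // align * align


class Pool:
    """Hypothetical bump allocator over one block obtained with a single RmMalloc call."""

    def __init__(self, memory_type, size_in_bytes):
        self.memory_type = memory_type
        self.size = rounded_size(memory_type, size_in_bytes)
        self.used = 0

    def alloc(self, size_in_bytes):
        """Return the byte offset of a new aligned chunk, or None if the pool is full."""
        need = rounded_size(self.memory_type, size_in_bytes)
        if self.used + need > self.size:
            return None
        offset = self.used
        self.used += need
        return offset
```

Because every chunk size is a multiple of the alignment, all returned offsets stay aligned as long as the block itself is aligned, which RmMalloc guarantees according to the slide.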
44
Resource Manager
- Translating between virtual and physical addresses
  - The microengines map memory differently into their address space than the StrongARM
  - StrongARM addresses make no sense to the microengines and have to be translated to offsets from the start of each particular kind of memory (and back)
- RmGetPhysOffset
      ix_error RmGetPhysOffset( RmMemoryType in_memory_type,
                                unsigned char* in_data_ptr,
                                unsigned int* out_offset );
  - Translate address in_data_ptr in RmMalloc’d memory to its offset from the start of the given memory type
  - The offset is in words (4 byte units) for SRAM and Scratch, and in quadwords (8 byte units) for SDRAM
- RmGetVirtualAddress
      ix_error RmGetVirtualAddress( RmMemoryType in_memory_type,
                                    unsigned char** out_buffer_ptr,
                                    unsigned int in_offset );
  - Take the physical offset from the base of the given memory type and translate it into a virtual address valid for the StrongARM
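The unit conversion behind these two calls can be modelled in a few lines. The sketch below is not the Resource Manager implementation: `phys_offset` and `virtual_address` are hypothetical stand-ins, and the virtual base address of each memory type, which the real Resource Manager knows internally, is passed in explicitly as an assumed parameter.

```python
# Offset unit in bytes per memory type, as stated on the slide.
OFFSET_UNIT = {"RM_SRAM": 4, "RM_SCRATCH": 4, "RM_SDRAM": 8}


def phys_offset(memory_type, virt_addr, virt_base):
    """Model of RmGetPhysOffset: byte distance from the base, in memory-type units.

    `virt_base` is an assumed parameter; the real call takes no such argument.
    """
    unit = OFFSET_UNIT[memory_type]
    byte_offset = virt_addr - virt_base
    if byte_offset < 0 or byte_offset % unit:
        raise ValueError("address below base or not aligned to the unit")
    return byte_offset // unit


def virtual_address(memory_type, offset, virt_base):
    """Model of RmGetVirtualAddress: the inverse of phys_offset."""
    return virt_base + offset * OFFSET_UNIT[memory_type]
```

The same byte distance thus yields different offsets depending on the memory type: 0x40 bytes is offset 16 in SRAM or Scratch but offset 8 in SDRAM.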
45
Summary
- Memory on the IXP cards
  - Kinds of memory
  - Its features
  - Its accessibility
- Microengine assembler
- Resource Manager functions