1
Stretch Wide Data
Witawas Srisa-an
CSCE 496: Embedded Systems Design and Implementation
2
Objectives
Provide more information about Wide Data Registers in Stretch Processors
– aligned load and store
– unaligned load and store
Examples will be provided later on
3
Topics
Wide data support
– WR data type
– WR register file
Managing the WR register file
Aligned and unaligned loads/stores
[Figure: block diagram with the labels RISC DP, Instruction Set Extension Fabric (ISEF), WR, AR, and Memory; data paths are marked 128 and 32]
4
Managing WR Data
The Stretch Software Configurable Processor (SCP) supports 128-bit wide data
– WR: a new unsigned integer data type
    WR X, Y;
Advantages of wide data
– Efficient data transfers between memory and register file
– Parallel operations on packed data (e.g. SIMD, MIMD)
5
Managing WR Data (cont.)
A typical application has the following 3 parts
– Load WR input data from memory into the WR register file
– Operate on WR data (Extension Instructions only)
– Store WR data back to memory
Load and store instructions are accessed by calling their corresponding intrinsics
Intrinsics are C functions defined by the compiler; they usually map to a single assembly instruction
6
Introducing Wide Data
Scalar version:
    short x[COUNT], y[COUNT];
    short *xPtr, *yPtr;
    short scale;
    for (i = 0; i < COUNT; i++) {
        *yPtr++ = *xPtr++ * scale;
    }
Wide data version (8 shorts per iteration):
    WR X, Y;
    for (i = 0; i < COUNT/8; i++) {
        LOAD_INCR(&X, &xPtr, 8*2);     /* 8 shorts x 2 bytes = 16 bytes */
        V_SCALE8(X, scale, &Y);
        STORE_INCR(Y, &yPtr, 8*2);
    }
WR data is explicitly managed by the application
– User loads input data from memory into a WR variable
– User stores WR variable results back to memory
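The two loops above can be compared on an ordinary compiler with a small emulation. This is only a sketch: `wr128_t`, `scale_scalar`, and `scale_wide` are hypothetical names, and the inner lane loop merely stands in for whatever an extension instruction such as V_SCALE8 does in hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit WR value: eight 16-bit lanes. */
typedef struct { int16_t lane[8]; } wr128_t;

/* Scalar reference: one element per iteration. */
void scale_scalar(const int16_t *x, int16_t *y, int count, int16_t scale)
{
    for (int i = 0; i < count; i++)
        y[i] = (int16_t)(x[i] * scale);
}

/* Wide version: load 16 bytes, scale eight lanes, store 16 bytes. */
void scale_wide(const int16_t *x, int16_t *y, int count, int16_t scale)
{
    assert(count % 8 == 0);                 /* whole WR values only */
    for (int i = 0; i < count / 8; i++) {
        wr128_t X, Y;
        memcpy(&X, x, sizeof X);            /* plays the role of LOAD_INCR */
        x += 8;
        for (int k = 0; k < 8; k++)         /* plays the role of V_SCALE8 */
            Y.lane[k] = (int16_t)(X.lane[k] * scale);
        memcpy(y, &Y, sizeof Y);            /* plays the role of STORE_INCR */
        y += 8;
    }
}
```

Both functions should produce identical output for any count that is a multiple of 8; the wide version simply runs one eighth as many loop iterations.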
7
Example: A Simple Memory Copy
    #include
    #define COUNT 18
    #define align(n) __attribute__ ((aligned ((n))))

    int main()
    {
        int i;
        WRA wrData;
        unsigned char align(16) memSrc[COUNT*16];
        unsigned char align(16) memDst[COUNT*16];
        unsigned char *memSrcPtr, *memDstPtr;

        memSrcPtr = memSrc;
        memDstPtr = memDst;
        for (i = 0; i < COUNT; i++) {
            WRAL128IU(&wrData, (WRA **) &memSrcPtr, 16);
            WRAS128IU(wrData, (WRA **) &memDstPtr, 16);
        }
        return (0);
    }
8
Aligned Loads
WRL128I
– Load 128 bits from an aligned memory address into WR with an immediate (constant) byte offset
    WR X;
    int inArray[1024*4];
    WRL128I (&X, (WR*) inArray, 3*16);    // X = 16 bytes at byte offset 3*16 from inArray
WRL128IU
– Same as WRL128I except that the pointer is post incremented
    int *inPtr;
    WRL128IU (&X, (WR**) &inPtr, 1*16);   // X = 16 bytes at inPtr; inPtr then advances by 16 bytes
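The difference between the immediate-offset and post-increment forms can be sketched in portable C. The names `wr128_t`, `emu_load128_imm`, and `emu_load128_imm_update` are hypothetical, and the model assumes that offsets and increments are counted in bytes, as the slide states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t byte[16]; } wr128_t;  /* hypothetical WR stand-in */

/* Load 16 bytes from base + byte_offset (models WRL128I). */
void emu_load128_imm(wr128_t *dst, const void *base, int byte_offset)
{
    const uint8_t *addr = (const uint8_t *)base + byte_offset;
    assert((uintptr_t)addr % 16 == 0);  /* aligned loads need a 16-byte address */
    memcpy(dst, addr, 16);
}

/* Load 16 bytes from *ptr, then advance *ptr by byte_incr bytes (models WRL128IU). */
void emu_load128_imm_update(wr128_t *dst, const uint8_t **ptr, int byte_incr)
{
    assert((uintptr_t)*ptr % 16 == 0);
    memcpy(dst, *ptr, 16);
    *ptr += byte_incr;
}
```

In this model the immediate form leaves the base pointer untouched, while the update form is what a loop over consecutive WR values would call on every iteration.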
9
Aligned Loads (cont.)
Immediate offsets and increments must always be multiples of the load/store size (in bytes)
The execution of an aligned instruction with an unaligned memory address will result in a hardware fault (MMU exception)
Simple MIPS code
    lui $1, 4000
    ori $1, $1, 2
    lw  $5, 0($1)
    sw  $5, 8($1)
What will be the content of the memory?
[Figure: memory diagram with addresses 0x10000000 through 0x10000014 in 4-byte steps; the values shown are 0x4444FFFF and 0x00000000]
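The alignment rule can be written as a pair of small checks. This is a sketch with hypothetical helper names (`is_aligned`, `aligned_access_ok`); it only tests the arithmetic condition and does not model the fault itself. In the MIPS snippet above, the `ori` sets the low bits of the address to 2, so a check like this would reject it for a 4-byte access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of size_bytes (size_bytes must be a power of two). */
bool is_aligned(uintptr_t addr, unsigned size_bytes)
{
    assert(size_bytes != 0 && (size_bytes & (size_bytes - 1)) == 0);
    return (addr & (size_bytes - 1)) == 0;
}

/* True when the base is aligned and the offset is a multiple of the access size. */
bool aligned_access_ok(uintptr_t base, long offset, unsigned size_bytes)
{
    return is_aligned(base, size_bytes) && offset % (long)size_bytes == 0;
}
```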
10
Aligned Stores
Each load instruction has a corresponding store instruction
WRS128I
– Store 128 bits from WR to an aligned memory address with an immediate (constant) byte offset
    WR X;
    int outArray[256*4];
    WRS128I (X, (WR*) outArray, 3*16);
WRS128IU
– Same as WRS128I except that the pointer is post incremented
    int *outPtr;
    WRS128IU (X, (WR**) &outPtr, 1*16);
11
WR Register File – Block Diagram
Wide register file is used for holding WR data
– 32 WR registers (128 bits each)
– Divided into 2 banks of 16 registers (WRA and WRB)
The WRA / WRB types associate a variable with WR bank A/B
    WRA v1, v2, v3;
    WRB w1, w2, w3;
The WR type defaults to WRA
– Use WRA / WRB to avoid unnecessary register moves between the two WR banks
12
Summary of Aligned Loads/Stores
SCP supports a rich set of load and store instructions with multiple addressing modes
– Data sizes: 1/2/4/8/16 bytes
– Addressing modes: Immediate/Register/Increment, Circular & Bit-reverse
Aligned load/store instruction naming convention: WR[bank](operation)(size)(mode)
Example: WRAL64IU()

Bank   Operation   Size (bits)   Mode
WR     L (load)    128           I  (immediate offset)
WRA    S (store)   64            IU (immediate post increment)
WRB                32            X  (register offset)
                   16            XU (register offset post increment)
                   8             CU (circular buffer)
                                 RU (bit-reverse)
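One way to internalize the naming convention is to decode a name back into its four fields. The parser below is a sketch: `wr_name_t` and `parse_wr_name` are hypothetical, and it only checks that a name fits the WR[bank](operation)(size)(mode) pattern from the table, not that every combination exists as a real instruction.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    char bank;      /* 'A', 'B', or 0 when the name uses plain WR */
    char op;        /* 'L' for load, 'S' for store */
    int  size_bits; /* 128, 64, 32, 16, or 8 */
    char mode[3];   /* "I", "IU", "X", "XU", "CU", or "RU" */
} wr_name_t;

/* Returns 1 and fills *out when name follows WR[bank](op)(size)(mode), else 0. */
int parse_wr_name(const char *name, wr_name_t *out)
{
    static const char *modes[] = { "I", "IU", "X", "XU", "CU", "RU" };
    int size = 0;

    assert(name != NULL && out != NULL);
    if (strncmp(name, "WR", 2) != 0) return 0;
    name += 2;

    out->bank = 0;
    if (*name == 'A' || *name == 'B') out->bank = *name++;

    if (*name != 'L' && *name != 'S') return 0;
    out->op = *name++;

    /* Read the decimal size field, stopping early on implausibly long input. */
    while (*name >= '0' && *name <= '9' && size < 1000)
        size = size * 10 + (*name++ - '0');
    if (size != 128 && size != 64 && size != 32 && size != 16 && size != 8)
        return 0;
    out->size_bits = size;

    /* Whatever remains must be exactly one of the mode suffixes. */
    for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++) {
        if (strcmp(name, modes[i]) == 0) {
            strcpy(out->mode, modes[i]);
            return 1;
        }
    }
    return 0;
}
```

For the slide's example, `parse_wr_name("WRAL64IU", &n)` yields bank A, a load, 64 bits, and immediate post-increment mode.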
13
Streaming Data
A stream is used for loading (GET) or storing (PUT) sequential data in memory
Stream operation includes 3 parts
– Initialization
– GET/PUT/advance number of bytes
– Flush (for PUT only)
Enables unaligned and variable-sized loads and stores
14
Aligned vs. Unaligned Loads
15
Stream Instructions
16
SCP Streaming Data
A stream is initialized with a
– Start address
– Post-increment / pre-decrement mode
SCP supports
– 3 concurrent input byte streams: 0, 1, and 2 (GETs) set during initialization
– 1 output byte stream (PUT)
– Only a single GET or PUT instruction executes on any given cycle
17
GET Bytes
WRGET0I
– Load 1-16 bytes from memory into WR
– Example:
    #define DIR 0                 // if DIR == 1 decrement, else increment
    int inArray[1024];
    WR X;
    WRGET0INIT(DIR, inArray);     // init0 (mode, addr), init1
    …
    WRGET0I(&X, 6);               // load next 6 bytes into X
    …
    WRGET0I(&X, 9);               // load next 9 bytes into X
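The GET sequence above can be mimicked in portable C to show how the stream position moves. This is a sketch for the incrementing direction only; `get_stream_t`, `get_init`, and `get_bytes` are hypothetical names, and placing the loaded bytes in the low positions of the WR value with the rest zeroed is an assumption, not something the slide specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t byte[16]; } wr128_t;          /* hypothetical WR stand-in */
typedef struct { const uint8_t *next; } get_stream_t;  /* hypothetical stream state */

/* Models WRGET0INIT for an incrementing stream: remember the start address. */
void get_init(get_stream_t *s, const void *start)
{
    s->next = (const uint8_t *)start;
}

/* Models WRGET0I: copy the next n bytes (1..16) into dst and advance by n.
   Assumption: bytes land in the low positions and the rest are zeroed. */
void get_bytes(get_stream_t *s, wr128_t *dst, int n)
{
    assert(n >= 1 && n <= 16);
    memset(dst, 0, sizeof *dst);
    memcpy(dst->byte, s->next, (size_t)n);
    s->next += n;
}
```

Note that neither the start address nor the byte counts need to be multiples of 16, which is what distinguishes streams from the aligned loads.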
18
PUT Bytes
WRPUTI
– Store 1-16 bytes from a WR into memory
– Example:
    #define DIR 0
    int outArray[1024];
    WR X;
    WRPUTINIT(DIR, outArray);     // init 0
    …
    WRPUTI(X, 6);                 // store 6 bytes from X
    …
    WRPUTI(X, 9);                 // store 9 bytes from X
    …
    WRPUTFLUSH();                 // flush0, flush1
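A buffered model gives one plausible reason why a PUT stream needs a final flush. The sketch below assumes that stored bytes are staged in a 16-byte buffer and written out only when the buffer fills or the stream is flushed; that buffering scheme, and the names `put_stream_t`, `put_init`, `put_bytes`, and `put_flush`, are assumptions for illustration rather than a description of the hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t byte[16]; } wr128_t;  /* hypothetical WR stand-in */

typedef struct {
    uint8_t *dst;         /* next memory address to be written */
    uint8_t  pending[16]; /* staged bytes not yet written to memory */
    int      count;       /* number of valid bytes in pending */
} put_stream_t;

/* Models WRPUTINIT for an incrementing stream. */
void put_init(put_stream_t *s, void *start)
{
    s->dst = (uint8_t *)start;
    s->count = 0;
}

/* Models WRPUTFLUSH: write any staged bytes to memory. */
void put_flush(put_stream_t *s)
{
    memcpy(s->dst, s->pending, (size_t)s->count);
    s->dst += s->count;
    s->count = 0;
}

/* Models WRPUTI: stage the low n bytes (1..16) of src, writing full buffers out. */
void put_bytes(put_stream_t *s, const wr128_t *src, int n)
{
    assert(n >= 1 && n <= 16);
    for (int i = 0; i < n; i++) {
        s->pending[s->count++] = src->byte[i];
        if (s->count == 16)
            put_flush(s);
    }
}
```

Under this model, storing 6 and then 9 bytes leaves 15 bytes staged and memory unchanged until the flush is called, so omitting the flush would lose the tail of the output.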
19
Incrementing vs. Decrementing Streams
20
Summary of GET/PUT Instructions
SCP also supports a rich set of instructions for streaming data (unaligned load/store instructions)
GET/PUT instructions specify
– The number of bytes to load/store from/to the stream
– The number of bytes to advance the stream
Naming convention: WR[bank](operation)(stream no.)(mode)
For example: WRGET0I()

Bank   Operation     Stream No. (GET only)   Mode
WR     GET (load)    0                       I  (immediate byte count and advance amount)
WRA    PUT (store)   1                       X  (register byte count and advance amount)
WRB                  2                       XX (register specifies separate byte count and advance amounts)
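The XX mode separates the number of bytes loaded from the number of bytes the stream advances. The sketch below models that idea for an incrementing input stream; `in_stream_t` and `get_window` are hypothetical names, and using the mode for overlapping windows is only one guess at how it might be applied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { const uint8_t *next; } in_stream_t;  /* hypothetical stream state */

/* Copy `count` bytes (1..16) from the stream head into out, zero the rest,
   then advance the stream by `advance` bytes, which may differ from count. */
void get_window(in_stream_t *s, uint8_t out[16], int count, int advance)
{
    assert(count >= 1 && count <= 16 && advance >= 0);
    memset(out, 0, 16);
    memcpy(out, s->next, (size_t)count);
    s->next += advance;
}
```

Calling `get_window(&s, w, 4, 1)` repeatedly reads a 4-byte window that slides forward one byte at a time, something the I and X modes cannot express because they use a single value for both amounts.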