A Retargetable Preprocessor for Multimedia Instructions* (work in progress) INRIA F. Bodin, G. Pokam, J. Simonnet *partially supported by ST Microelectronics.

A Retargetable Preprocessor for Multimedia Instructions* (work in progress) INRIA F. Bodin, G. Pokam, J. Simonnet *partially supported by ST Microelectronics

Multimedia Instructions n Instruction set extension to achieve high performance –many different ones –crucial for embedded systems –difficult to use n Retargetability is an issue

Multimedia Instructions n Exploits sub word parallelism

An Example (Trimedia) char *back, *forward, *idct, *destination; for (i = 0; i<64; i += 1){ destination[i] = ((back[i]+ forward[i] + 1) >> 1) destination[i] = ((back[i]+ forward[i] + 1) >> 1) + idct[i]; + idct[i]; } char *back, *forward, *idct, *destination; for (i = 0; i<64; i += 1){ destination[i] = ((back[i]+ forward[i] + 1) >> 1) destination[i] = ((back[i]+ forward[i] + 1) >> 1) + idct[i]; + idct[i]; } int *i_back = (int *) back; int *i_forward = (int *) forward; int *i_idct = (int *) idct; int *i_dest = (int *) destination; for (i = 0; i<16; i += 1){ temp = QUADAVG(i_back[i], i_forward[i]); temp = QUADAVG(i_back[i], i_forward[i]); i_dest[i] = DSPUQUADADDUI(temp, i_idct[i]); i_dest[i] = DSPUQUADADDUI(temp, i_idct[i]);} int *i_back = (int *) back; int *i_forward = (int *) forward; int *i_idct = (int *) idct; int *i_dest = (int *) destination; for (i = 0; i<16; i += 1){ temp = QUADAVG(i_back[i], i_forward[i]); temp = QUADAVG(i_back[i], i_forward[i]); i_dest[i] = DSPUQUADADDUI(temp, i_idct[i]); i_dest[i] = DSPUQUADADDUI(temp, i_idct[i]);}

MMI Automatic Exploitation Srccode Vectorization[Bik01][Krall00][...] Loop Unrolling [Larsen00][Leupers2000] Idioms/MMIRecognition Alignment CodeGeneration machineindependent machinedependent Pre-processing

The MMI Recognition Phase n Find the instruction available on the machine –after vectorization –after unrolling n User interaction –fast retargetability –not only for compiler writer –no compiler recompilation needed

A MMI Example temp[i] =(back[i]+forward[i]+1)>>1; t1[i] = (back[i] + forward[i]); t2[i] = t1[i] + 1; temp[i] = t2[i] >> 1; t1[i] = (back[i] + forward[i]); t2[i] = t1[i] + 1; temp[i] = t2[i] >> 1; rather than

SWARecog : a Retargetable Engine for MMI n Front-end independent –CoSy –Sage++ n Retargetable –configurable intermediate form n Uses a rewriting system based on U. Assmann’s work [Assmann96]

An Overview of SWARecog CoSySage++ IR description based

The Intermediate Format n Identical for code and rules n Attributes declaration n Node declaration n Edge declaration [NODES] Operator:ENUM = {cast, mul, add, sright, assg,...}; [NODES] VariableName:STRING = {}; [EDGES] distance:INTEGER = {} DEFAULT 0; [NODES] Operator:ENUM = {cast, mul, add, sright, assg,...}; [NODES] VariableName:STRING = {}; [EDGES] distance:INTEGER = {} DEFAULT 0; NODELabel Assign: NodeType = {operator} NodeType = {operator} Operator = {assg} Operator = {assg} ValueType = {int} ValueType = {int} NODELabel Assign: NodeType = {operator} NodeType = {operator} Operator = {assg} Operator = {assg} ValueType = {int} ValueType = {int} (flowdep ObjectAddr:14 Assign:8 1) (flowdep Plus:9 Assign:8 2) [negated] (same ObjectAddr:14 ObjectAddr:11 0) (flowdep ObjectAddr:14 Assign:8 1) (flowdep Plus:9 Assign:8 2) [negated] (same ObjectAddr:14 ObjectAddr:11 0)

NODELabel v: NodeType = {scalar} NodeType = {scalar} Operator = {obj} Operator = {obj} Aliased = {0} Aliased = {0} ValueType = {int} ValueType = {int} VariableName = {*} VariableName = {*} LoopSector = {body} LoopSector = {body} NODELabel v: NodeType = {scalar} NodeType = {scalar} Operator = {obj} Operator = {obj} Aliased = {0} Aliased = {0} ValueType = {int} ValueType = {int} VariableName = {*} VariableName = {*} LoopSector = {body} LoopSector = {body} A Rule Description Example RULE [1] MulToShift: (flowdep v:1 Plus:6 0) (flowdep v:2 Plus:6 0) (flowdep Plus:6 Exp:0 *) 1 (same v:1 v:2 0) (same v:2 v:1 0) RULE [1] MulToShift: (flowdep v:1 Plus:6 0) (flowdep v:2 Plus:6 0) (flowdep Plus:6 Exp:0 *) 1 (same v:1 v:2 0) (same v:2 v:1 0) (flowdep v:1 Shift:7 1) (flowdep IntConst1:8 Shift:7 2) (flowdep Shift:7 Exp:0 *) 1 (flowdep v:1 Shift:7 1) (flowdep IntConst1:8 Shift:7 2) (flowdep Shift:7 Exp:0 *) 1 b = a+a b = a<<1 v + * 1 v << * v

Example-1 /*$pragma[VectorLoop("NoAlias")]*/ for (i = xa; i < xb; i = i+4){ sum = sum + (s[i] * om[i]); sum = sum + (s[i] * om[i]); sum = sum + (s[i+1] * om[i+1]); sum = sum + (s[i+1] * om[i+1]); sum = sum + (s[i+2] * om[i+2]); sum = sum + (s[i+2] * om[i+2]); sum = sum + (s[i+3] * om[i+3]); sum = sum + (s[i+3] * om[i+3]);}/*$pragma[VectorLoop("NoAlias")]*/ for (i = xa; i < xb; i = i+4){ sum = sum + (s[i] * om[i]); sum = sum + (s[i] * om[i]); sum = sum + (s[i+1] * om[i+1]); sum = sum + (s[i+1] * om[i+1]); sum = sum + (s[i+2] * om[i+2]); sum = sum + (s[i+2] * om[i+2]); sum = sum + (s[i+3] * om[i+3]); sum = sum + (s[i+3] * om[i+3]);} for (i = xa ; i < xb ; i = i + 4){ sum = sum + dualDotProd(packCont(s[i], s[i + 1]), sum = sum + dualDotProd(packCont(s[i], s[i + 1]), packCont(om[i], om[i + 1])); sum = sum + dualDotProd(packCont(s[i + 2], s[i + 3]), sum = sum + dualDotProd(packCont(s[i + 2], s[i + 3]), packCont(om[i + 2], om[i + 3])); } for (i = xa ; i < xb ; i = i + 4){ sum = sum + dualDotProd(packCont(s[i], s[i + 1]), sum = sum + dualDotProd(packCont(s[i], s[i + 1]), packCont(om[i], om[i + 1])); sum = sum + dualDotProd(packCont(s[i + 2], s[i + 3]), sum = sum + dualDotProd(packCont(s[i + 2], s[i + 3]), packCont(om[i + 2], om[i + 3])); }

Example-2 for (i = xa ; i < xb ; i = i + 4){ NEWVAR_temp2_1 = dualAdd(packCont(s[i + 2], s[i + 3]), NEWVAR_temp2_1 = dualAdd(packCont(s[i + 2], s[i + 3]), packCont(om[i + 2], om[i + 3])); packCont(om[i + 2], om[i + 3])); NEWVAR_temp2_2 = dualAdd(packCont(s[i], s[i + 1]), NEWVAR_temp2_2 = dualAdd(packCont(s[i], s[i + 1]), packCont(om[i], om[i + 1])); packCont(om[i], om[i + 1])); NEWVAR_temp1_1 = unpackCont(NEWVAR_temp2_1, 0); NEWVAR_temp1_1 = unpackCont(NEWVAR_temp2_1, 0); d[i + 2] = j + 4 + (NEWVAR_temp1_1); d[i + 2] = j + 4 + (NEWVAR_temp1_1); NEWVAR_temp3_1 = unpackCont(NEWVAR_temp2_1, 1); NEWVAR_temp3_1 = unpackCont(NEWVAR_temp2_1, 1); d[i + 3] = j + 4 + (NEWVAR_temp3_1); d[i + 3] = j + 4 + (NEWVAR_temp3_1); NEWVAR_temp1_2 = unpackCont(NEWVAR_temp2_2, 0); NEWVAR_temp1_2 = unpackCont(NEWVAR_temp2_2, 0); d[i] = j + 4 + (NEWVAR_temp1_2); d[i] = j + 4 + (NEWVAR_temp1_2); NEWVAR_temp3_2 = unpackCont(NEWVAR_temp2_2, 1); NEWVAR_temp3_2 = unpackCont(NEWVAR_temp2_2, 1); d[i + 1] = j + 4 + (NEWVAR_temp3_2); d[i + 1] = j + 4 + (NEWVAR_temp3_2); } for (i = xa ; i < xb ; i = i + 4){ NEWVAR_temp2_1 = dualAdd(packCont(s[i + 2], s[i + 3]), NEWVAR_temp2_1 = dualAdd(packCont(s[i + 2], s[i + 3]), packCont(om[i + 2], om[i + 3])); packCont(om[i + 2], om[i + 3])); NEWVAR_temp2_2 = dualAdd(packCont(s[i], s[i + 1]), NEWVAR_temp2_2 = dualAdd(packCont(s[i], s[i + 1]), packCont(om[i], om[i + 1])); packCont(om[i], om[i + 1])); NEWVAR_temp1_1 = unpackCont(NEWVAR_temp2_1, 0); NEWVAR_temp1_1 = unpackCont(NEWVAR_temp2_1, 0); d[i + 2] = j + 4 + (NEWVAR_temp1_1); d[i + 2] = j + 4 + (NEWVAR_temp1_1); NEWVAR_temp3_1 = unpackCont(NEWVAR_temp2_1, 1); NEWVAR_temp3_1 = unpackCont(NEWVAR_temp2_1, 1); d[i + 3] = j + 4 + (NEWVAR_temp3_1); d[i + 3] = j + 4 + (NEWVAR_temp3_1); NEWVAR_temp1_2 = unpackCont(NEWVAR_temp2_2, 0); NEWVAR_temp1_2 = unpackCont(NEWVAR_temp2_2, 0); d[i] = j + 4 + (NEWVAR_temp1_2); d[i] = j + 4 + (NEWVAR_temp1_2); NEWVAR_temp3_2 = unpackCont(NEWVAR_temp2_2, 1); NEWVAR_temp3_2 = unpackCont(NEWVAR_temp2_2, 1); d[i + 1] = j + 4 + (NEWVAR_temp3_2); d[i + 1] = j + 4 + (NEWVAR_temp3_2); } for (i = xa; i < xb; i = i+4) /*$pragma[VectorLoop("NoAlias")]*/{ for (i = xa; i < xb; i = i+4) /*$pragma[VectorLoop("NoAlias")]*/{ d[i] = j + 4 + (s[i] + om[i]); d[i] = j + 4 + (s[i] + om[i]); d[i+1] = j + 4 + (s[i+1] + om[i+1]); d[i+1] = j + 4 + (s[i+1] + om[i+1]); d[i+2] = j + 4 + (s[i+2] + om[i+2]); d[i+2] = j + 4 + (s[i+2] + om[i+2]); d[i+3] = j + 4 + (s[i+3] + om[i+3]); d[i+3] = j + 4 + (s[i+3] + om[i+3]); } for (i = xa; i < xb; i = i+4) /*$pragma[VectorLoop("NoAlias")]*/{ for (i = xa; i < xb; i = i+4) /*$pragma[VectorLoop("NoAlias")]*/{ d[i] = j + 4 + (s[i] + om[i]); d[i] = j + 4 + (s[i] + om[i]); d[i+1] = j + 4 + (s[i+1] + om[i+1]); d[i+1] = j + 4 + (s[i+1] + om[i+1]); d[i+2] = j + 4 + (s[i+2] + om[i+2]); d[i+2] = j + 4 + (s[i+2] + om[i+2]); d[i+3] = j + 4 + (s[i+3] + om[i+3]); d[i+3] = j + 4 + (s[i+3] + om[i+3]); } instance number ($INSTANCE )

Combining the Rules n Strata or alternative based –normalization based IR Form C code IR Form Rule Desc. RewritingEngine RewritingEngine RewritingEngine....... RewritingEngine IR Form Rule Desc. C code IR Form RewritingEngine

Rule Generation n C rules description LHSGeneratorLHSGenerator RHSGeneratorRHSGenerator SWARecogSWARecog C code Rule description Front-endFront-end Front-endFront-end

A Rule Description Example for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[LHS()]*/ { for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[LHS()]*/ { ROOT_1(LEAF_3(tab1[i]) + LEAF_4(tab2[i])); ROOT_1(LEAF_3(tab1[i]) + LEAF_4(tab2[i])); ROOT_2(LEAF_5(tab1[i+1]) + LEAF_6(tab2[i+1])); ROOT_2(LEAF_5(tab1[i+1]) + LEAF_6(tab2[i+1])); } for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[RHS()]*/ { for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[RHS()]*/ { NEWVAR_temp2 = dualAdd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), NEWVAR_temp2 = dualAdd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1]))); packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1]))); NEWVAR_temp1 = unpackCont(NEWVAR_temp2,0); NEWVAR_temp1 = unpackCont(NEWVAR_temp2,0); NEWVAR_temp3 = unpackCont(NEWVAR_temp2,1); NEWVAR_temp3 = unpackCont(NEWVAR_temp2,1); ROOT_1(NEWVAR_temp1); ROOT_1(NEWVAR_temp1); ROOT_2(NEWVAR_temp3); ROOT_2(NEWVAR_temp3); } for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[LHS()]*/ { for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[LHS()]*/ { ROOT_1(LEAF_3(tab1[i]) + LEAF_4(tab2[i])); ROOT_1(LEAF_3(tab1[i]) + LEAF_4(tab2[i])); ROOT_2(LEAF_5(tab1[i+1]) + LEAF_6(tab2[i+1])); ROOT_2(LEAF_5(tab1[i+1]) + LEAF_6(tab2[i+1])); } for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[RHS()]*/ { for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[RHS()]*/ { NEWVAR_temp2 = dualAdd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), NEWVAR_temp2 = dualAdd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1]))); packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1]))); NEWVAR_temp1 = unpackCont(NEWVAR_temp2,0); NEWVAR_temp1 = unpackCont(NEWVAR_temp2,0); NEWVAR_temp3 = unpackCont(NEWVAR_temp2,1); NEWVAR_temp3 = unpackCont(NEWVAR_temp2,1); ROOT_1(NEWVAR_temp1); ROOT_1(NEWVAR_temp1); ROOT_2(NEWVAR_temp3); ROOT_2(NEWVAR_temp3); } defines the properties of the leaf expressions to match. the engine generates same_address_+1

A Rule Description Example for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[LHS()]*/ { ROOT_1(LEAF_7(sum) = LEAF_8(sum) + ROOT_1(LEAF_7(sum) = LEAF_8(sum) + (LEAF_3(tab1[i]) * LEAF_4(tab2[i]))); ROOT_2(LEAF_9(sum) = LEAF_10(sum) + ROOT_2(LEAF_9(sum) = LEAF_10(sum) + (LEAF_5(tab1[i+1]) * LEAF_6(tab2[i+1]))); } for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[LHS()]*/ { ROOT_1(LEAF_7(sum) = LEAF_8(sum) + ROOT_1(LEAF_7(sum) = LEAF_8(sum) + (LEAF_3(tab1[i]) * LEAF_4(tab2[i]))); ROOT_2(LEAF_9(sum) = LEAF_10(sum) + ROOT_2(LEAF_9(sum) = LEAF_10(sum) + (LEAF_5(tab1[i+1]) * LEAF_6(tab2[i+1]))); } for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[RHS()]*/ { ROOT_2(LEAF_9(sum) = LEAF_10(sum) + ROOT_2(LEAF_9(sum) = LEAF_10(sum) + dualDotProd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), dualDotProd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1])))); packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1])))); } for (i = 0; i < LOOP_BOUND1 -1; i = i+2) /*$pragma[RHS()]*/ { ROOT_2(LEAF_9(sum) = LEAF_10(sum) + ROOT_2(LEAF_9(sum) = LEAF_10(sum) + dualDotProd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), dualDotProd(packCont(LEAF_3(tab1[i]),LEAF_5(tab1[i+1])), packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1])))); packCont(LEAF_4(tab2[i]),LEAF_6(tab2[i+1])))); }

Conclusion and Perspectives n The prototype is running n Vectorization and alignment phases are under development n Next step : study the tradeoff between unrolling and vectorization

Bibliography n [Assmann96] Graph Rewrite Systems for Program Optimization, U. Assman, Technical Report RR2955, INRIA Rocquencourt, 1996 n [bik01] Experiments with Automatic Vectorization for the Pentium 4 Processor, A. Bik, M. Girkar, P. Grey and X. Tian, CPC, 2001 n [Cheong97] An Optimizer for Multimedia Instruction Sets, G. Cheong and M. Lam, Proceedings of the Second SUIF Compiler Workshop, 1997

Bibliography (cont.) n [Krall00] Compilation Technique for Multimedia Processors, A. Krall and S. Lelait, IJPP, vol. 28, No 4, 2000 n [Larsen00] Exploiting Superword Level Parallelism with Multimedia Instruction Sets, S. Larsen and S. Amarasinghe, PLDI 2000 n [Leupers2000] Code Selection for Media Processors with SIMD Instructions, R. Leupers, DATE 2000 n [Sreraman00] A Vectorizing Compiler for Multimedia Extensions, N. Sreraman and R. Govindarajan, IJPP, vol. 28, No 4, 2000

A Retargetable Preprocessor for Multimedia Instructions* (work in progress) INRIA F. Bodin, G. Pokam, J. Simonnet *partially supported by ST Microelectronics.

Similar presentations

Presentation on theme: "A Retargetable Preprocessor for Multimedia Instructions* (work in progress) INRIA F. Bodin, G. Pokam, J. Simonnet *partially supported by ST Microelectronics."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

A Retargetable Preprocessor for Multimedia Instructions* (work in progress) INRIA F. Bodin, G. Pokam, J. Simonnet *partially supported by ST Microelectronics.

Similar presentations

Presentation on theme: "A Retargetable Preprocessor for Multimedia Instructions* (work in progress) INRIA F. Bodin, G. Pokam, J. Simonnet *partially supported by ST Microelectronics."— Presentation transcript:

Similar presentations

About project

Feedback