Seoul National University
Memory Efficient Software Synthesis from Dataflow Graph
Wonyong Sung, Junedong Kim, Soonhoi Ha
Codesign and Parallel Processing Lab., Seoul National University
Contents
  Introduction
  Code Generation from Block Diagram Specification
  Synchronous Data Flow and Single Appearance Schedule
  Proposed Strategies
  Optimization 1: code sharing optimization
  Optimization 2: minimize buffer requirement
  Experiments
  Conclusions
Introduction
  Motivations
    Embedded systems have a limited amount of memory
      large program = memory cost, performance penalty, power consumption
    New trend in software development: high-level design methodology
      growing complexity, fast design turn-around time, limited budget, etc.
  Goal of Research
    Reduce the code and data size of automatically generated software
    in an automatic software synthesis environment
    Specification = dataflow graph with SDF (Synchronous DataFlow) semantics
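Software Synthesis from SDF graph
  [Figure: SDF graph with nodes A, B, C, D]
  Possible schedules:
    AABCABACDABABCD
    (6A)(4B)(3C)(2D)
    (2(3A2B))(3C)(2D)
  The last two are Single Appearance Schedules (SAS).
  Code for (6A)(4B)(3C)(2D), where A, B, C, D stand for the code blocks of the nodes:
    int main(void) {
      int i;
      for (i = 0; i < 6; i++) { A }
      for (i = 0; i < 4; i++) { B }
      for (i = 0; i < 3; i++) { C }
      for (i = 0; i < 2; i++) { D }
    }
  Code for (2(3A2B))(3C)(2D):
    int main(void) {
      int i, j;
      for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) { A }
        for (j = 0; j < 2; j++) { B }
      }
      for (i = 0; i < 3; i++) { C }
      for (i = 0; i < 2; i++) { D }
    }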
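The two looped schedules above can be checked against each other with a small sketch. It replaces each node's code block with a firing counter, so it only shows that both loop structures fire every node the same number of times. All names here (`fire`, `run_flat_sas`, `run_nested_sas`, `counts_match`) are hypothetical and not part of the generated code on the slide.

```c
#include <assert.h>
#include <string.h>

enum { ACT_A, ACT_B, ACT_C, ACT_D, NUM_ACTORS };

/* One counter per node; incrementing it stands in for the node's code block. */
int fired[NUM_ACTORS];

void fire(int actor) {
    assert(actor >= 0 && actor < NUM_ACTORS);
    fired[actor]++;
}

/* Flat SAS: (6A)(4B)(3C)(2D) */
void run_flat_sas(void) {
    int i;
    memset(fired, 0, sizeof fired);
    for (i = 0; i < 6; i++) fire(ACT_A);
    for (i = 0; i < 4; i++) fire(ACT_B);
    for (i = 0; i < 3; i++) fire(ACT_C);
    for (i = 0; i < 2; i++) fire(ACT_D);
}

/* Nested SAS: (2(3A2B))(3C)(2D) */
void run_nested_sas(void) {
    int i, j;
    memset(fired, 0, sizeof fired);
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) fire(ACT_A);
        for (j = 0; j < 2; j++) fire(ACT_B);
    }
    for (i = 0; i < 3; i++) fire(ACT_C);
    for (i = 0; i < 2; i++) fire(ACT_D);
}

/* Returns 1 when the counters equal the given repetition counts. */
int counts_match(int a, int b, int c, int d) {
    return fired[ACT_A] == a && fired[ACT_B] == b &&
           fired[ACT_C] == c && fired[ACT_D] == d;
}
```

Calling `run_flat_sas()` or `run_nested_sas()` and then `counts_match(6, 4, 3, 2)` should succeed for both, since each schedule fires A six times, B four times, C three times and D twice. The schedules differ only in loop structure, which is what changes the buffer sizes.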
Previous Efforts
  Single Appearance Schedule (SAS): APGAN, RPMC [Bhattacharyya et al., Ptolemy group]
    SAS guarantees the minimum code size (without code sharing)
    APGAN, RPMC: heuristics to find a SAS that minimizes data memory
  ILP formulation for data memory minimization [Ritz et al., Meyr group]
    flat single appearance schedule + sharing of data buffers
  Rate-optimal compile-time schedule [Govindarajan et al., Gao group]
    tried to minimize the buffer requirement using linear programming
  An algorithm to compute the smallest data buffer size [Ade et al., GRAPE group]
Proposed Strategies
  Coding style
    not tied to one coding style: hybrid approach
    generated code is a mixture of inlined code and function calls
  Optimization 1: Code Sharing
    Multiple instances of the same kernel are treated as different nodes in a SAS
    Code sharing has a gain (block size) and a cost (context size)
  Optimization 2: Schedule Adjustment
    give up the single appearance schedule to reduce the data size
    (1) represent the schedule information with the BTLC data structure
    (2) find possible locations for adjustment
    (3) adjust the schedule
Flowchart of Optimization Procedure
  Get SAS schedule [RPMC, APGAN]
  -> Code sharing optimization (code-block size, context size)
  -> BTLC
  -> Schedule Adjustment
  -> C code generation
Example of Code Sharing (CD2DAT)
  [Figure: CD2DAT graph with nodes ramp, ramp', sine, sine', fir1, fir2, fir3, fir4, xgraph]
  Code before sharing
    for(int i=0;i<2;i++) {
      { /* code for fir1 */
        ...
        out = tap*input[i];
        ...
      }
    /* code for fir2 */
    ...
  Code after sharing
    for(int i=0;i<2;i++) fir(1);
    for(int i=0;i<3;i++) fir(2);
    ...
    void fir(int context){
      ...
      context_FIR[context].out...
      ...
    }
  Context definition
    typedef struct{
      double *out;
      int output_ofs;
      int output_bs;
      int output_nx;
      ...
      double decimation;
      double tap;
    }context_FIR;
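A minimal, compilable sketch of the same idea follows. It assumes a heavily simplified kernel (a single tap, so each output is just `tap * input[i]`) and a hypothetical `FirContext` layout that is much smaller than the `context_FIR` struct on the slide. The point is only that one function body serves several instances, with every data reference going through the selected context.

```c
#include <assert.h>

#define NUM_FIR 2
#define BLOCK_LEN 3

/* Hypothetical, simplified context: one input port, one output port, one tap. */
typedef struct {
    const double *input;
    double *out;
    int length;
    double tap;
} FirContext;

/* One context per kernel instance. */
FirContext fir_contexts[NUM_FIR];

/* Shared kernel: the code exists once, the state is selected by index. */
void fir(int context) {
    assert(context >= 0 && context < NUM_FIR);
    const FirContext *ctx = &fir_contexts[context];
    for (int i = 0; i < ctx->length; i++) {
        ctx->out[i] = ctx->tap * ctx->input[i];
    }
}

/* Chains two instances of the shared kernel and returns the last output sample. */
double run_shared_fir_demo(void) {
    static const double samples[BLOCK_LEN] = {1.0, 2.0, 3.0};
    static double stage0[BLOCK_LEN];
    static double stage1[BLOCK_LEN];
    fir_contexts[0] = (FirContext){samples, stage0, BLOCK_LEN, 0.5};
    fir_contexts[1] = (FirContext){stage0, stage1, BLOCK_LEN, 4.0};
    fir(0); /* instance 0 scales by 0.5 */
    fir(1); /* instance 1 scales the result by 4.0 */
    return stage1[BLOCK_LEN - 1];
}
```

With these values the last sample is 3.0 * 0.5 * 4.0 = 6.0. Every access inside `fir` pays for the indirection through `ctx`, which is the per-reference overhead that the sharing decision has to weigh against the saved code block.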
Code Size Overhead (on SPARC/Solaris)
  Without context: 4 bytes
    ..... = value;
      ldd   [%fp ], %o0
  With context: 40 bytes
    ..... = *(context_CGCRamp[context].value);
      sethi %hi(0x20800), %o1
      ld    [%o1+0x3c8], %o0
      mov   %o0, %o2
      sll   %o2, 2, %o1
      add   %o1, %o0, %o1
      sll   %o1, 3, %o0
      add   %fp, -424, %o1
      add   %o1, %o0, %o2
      ld    [%o2 + 0x1c], %o0
      ldd   [%o0], %o2
  Reference overhead = 40 - 4 = 36 bytes!
Optimization 1: Code sharing
  Multiple instances of the same kernel have their own contexts
  Kernel code must be transformed into a shared-version function
    In the shared version, references go only through the context variable
  Gain and cost of sharing
    Gain = (#instances - 1) x (code block size)
    Cost = (#instances) x (context variable size) + (code block overhead)
  Code sharing is performed only when the gain is larger than the cost
Decision Formula
  [Inequality chain comparing gain and cost; its symbols were lost in extraction. Surviving terms: "context + reference" and "context + shared"]
  (1) code sharing overhead = context + reference
  (2) $\text{context} = \sum_{p_i \in \text{ports}} f(p_i)$, where $f(x) = 3 \cdot \text{sizeof(int)} + \text{sizeof(pointer)}$
  (3) $\text{reference} = \sum_{t \in \{S, C, AS, AP\}} \text{count}(t) \cdot \text{unit}(t)$
        count(t) = reference count
        unit(t) = unit overhead
        t = type of reference
  (4) code block size
  (5) number of instances
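The gain and cost terms can be put together as a small sketch. How the terms combine is an assumption here: gain and cost follow the "Gain and cost of sharing" slide, and its "code block overhead" is taken to be the reference overhead of definition (3). The `KernelInfo` struct and all function names are hypothetical, and the slide does not spell out what the reference types S, C, AS, AP mean, so they appear only as array indices.

```c
#include <assert.h>
#include <stddef.h>

enum { REF_S, REF_C, REF_AS, REF_AP, NUM_REF_TYPES };

/* Hypothetical description of one kernel that has several instances. */
typedef struct {
    size_t block_size;                   /* code block size in bytes */
    int instances;                       /* number of instances */
    int ports;                           /* number of ports */
    int ref_count[NUM_REF_TYPES];        /* reference count per type */
    size_t unit_overhead[NUM_REF_TYPES]; /* bytes added per reference of that type */
} KernelInfo;

/* Definition (2): each port costs three ints and one pointer in the context. */
size_t context_overhead(const KernelInfo *k) {
    assert(k->ports >= 0);
    return (size_t)k->ports * (3 * sizeof(int) + sizeof(void *));
}

/* Definition (3): sum over reference types of count times unit overhead. */
size_t reference_overhead(const KernelInfo *k) {
    size_t total = 0;
    for (int t = 0; t < NUM_REF_TYPES; t++) {
        total += (size_t)k->ref_count[t] * k->unit_overhead[t];
    }
    return total;
}

/* Gain = (#instances - 1) x (code block size). */
size_t sharing_gain(const KernelInfo *k) {
    assert(k->instances >= 1);
    return (size_t)(k->instances - 1) * k->block_size;
}

/* Cost = (#instances) x (context size) + overhead added to the shared block. */
size_t sharing_cost(const KernelInfo *k) {
    return (size_t)k->instances * context_overhead(k) + reference_overhead(k);
}

/* Share only when the gain is larger than the cost. */
int should_share(const KernelInfo *k) {
    return sharing_gain(k) > sharing_cost(k);
}
```

For example, a 2000-byte kernel with four instances and ten references at 36 bytes each has a gain of 6000 bytes against a cost of a few hundred bytes, so it would be shared. A 20-byte kernel with the same references would not, because the reference overhead alone exceeds the code it saves.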
Optimization 2: Adjusting SAS
  Adjusting the Single Appearance Schedule
    2(7A3B)5C   ==> buffer size 51
    2(7A3B2C)C  ==> buffer size 39
    give up the single appearance schedule
  BTLC (Binary Tree with Leaf Chain)
    [Figure: chain of nodes A, B, C with rates 3, 7, 5, 6 and its BTLC with root G. Each node carries a triple [input, inside, output], e.g. [6,0,0] for C. Other triples shown: [7,0,5], [21,0,15], [0,0,3], [0,0,21]]
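The two buffer totals can be reproduced by simulating token counts on the edges. The sketch below assumes the rates read off the figure: A produces 3 tokens per firing, B consumes 7 and produces 5, and C consumes 6. It also assumes the buffer size of a schedule is the sum of the peak token counts on edges A-B and B-C. All names are hypothetical.

```c
#include <assert.h>

enum { ACT_A, ACT_B, ACT_C };
enum { EDGE_AB, EDGE_BC, NUM_EDGES };

typedef struct {
    int tokens[NUM_EDGES]; /* tokens currently stored on each edge */
    int peak[NUM_EDGES];   /* largest token count seen on each edge */
} Buffers;

static void produce(Buffers *b, int edge, int n) {
    b->tokens[edge] += n;
    if (b->tokens[edge] > b->peak[edge]) b->peak[edge] = b->tokens[edge];
}

static void consume(Buffers *b, int edge, int n) {
    assert(b->tokens[edge] >= n); /* the schedule must never underflow a buffer */
    b->tokens[edge] -= n;
}

/* Fires one node with the assumed rates 3, 7, 5, 6. */
void fire(Buffers *b, int actor) {
    switch (actor) {
    case ACT_A: produce(b, EDGE_AB, 3); break;
    case ACT_B: consume(b, EDGE_AB, 7); produce(b, EDGE_BC, 5); break;
    case ACT_C: consume(b, EDGE_BC, 6); break;
    default: assert(0);
    }
}

void repeat(Buffers *b, int actor, int count) {
    for (int i = 0; i < count; i++) fire(b, actor);
}

int total_peak(const Buffers *b) {
    return b->peak[EDGE_AB] + b->peak[EDGE_BC];
}

/* Schedule 2(7A3B)5C */
int buffer_size_sas(void) {
    Buffers b = {{0}, {0}};
    for (int i = 0; i < 2; i++) {
        repeat(&b, ACT_A, 7);
        repeat(&b, ACT_B, 3);
    }
    repeat(&b, ACT_C, 5);
    return total_peak(&b);
}

/* Schedule 2(7A3B2C)C */
int buffer_size_adjusted(void) {
    Buffers b = {{0}, {0}};
    for (int i = 0; i < 2; i++) {
        repeat(&b, ACT_A, 7);
        repeat(&b, ACT_B, 3);
        repeat(&b, ACT_C, 2);
    }
    repeat(&b, ACT_C, 1);
    return total_peak(&b);
}
```

Under these assumptions `buffer_size_sas()` yields 21 + 30 = 51 and `buffer_size_adjusted()` yields 21 + 18 = 39. Firing C inside the loop drains the B-C buffer early, so its peak drops from 30 to 18 while the A-B buffer is unchanged.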
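Computation of Buffer Requirements
  For a node with left child L and right child R, the triple [I, W, O] is
    $W = |O_L \cap I_R|$
    $I = |I_L \cup I_R - W|$
    $O = |O_L \cup O_R - W|$
  In general W LR [relation symbols lost in extraction]
  [Figure: fragment "A 3 B" and the BTLC for nodes A, B, C with root G. Triples shown: [7,0,5], [21,0,15], [0,0,3], [0,0,21], [0,21,30], [30,0,0], [0,30,0]; edge labels 21 and 30]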
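The triples in the figure can be reproduced with a short sketch. It works on token counts rather than sets, which is an assumption: the tokens matched between the left output and the right input are taken as `min(O_L, I_R)`, and a loop is assumed to multiply `I` and `O` by the loop count while leaving `W` unchanged. Those rules fit the numbers on this slide for a simple chain; they are not claimed to be the general definition. The names `Triple`, `loop_triple`, `combine` and `triple_equals` are hypothetical.

```c
#include <assert.h>

/* [I, W, O] = [input, inside, output] buffer amounts of a BTLC node. */
typedef struct {
    int in;
    int inside;
    int out;
} Triple;

/* Assumed loop rule: count iterations scale input and output, not inside. */
Triple loop_triple(Triple t, int count) {
    assert(count > 0);
    Triple r = { t.in * count, t.inside, t.out * count };
    return r;
}

/* Assumed merge rule for a chain: left output feeds right input. */
Triple combine(Triple left, Triple right) {
    int w = left.out < right.in ? left.out : right.in;
    Triple r = { left.in + right.in - w, w, left.out + right.out - w };
    return r;
}

int triple_equals(Triple t, int in, int inside, int out) {
    return t.in == in && t.inside == inside && t.out == out;
}

/* Sums the inside values for schedule 2(7A3B)5C. */
int sas_total_inside(void) {
    Triple a = {0, 0, 3}, b = {7, 0, 5}, c = {6, 0, 0};
    Triple ab = combine(loop_triple(a, 7), loop_triple(b, 3)); /* [0,21,15] */
    Triple root = combine(loop_triple(ab, 2), loop_triple(c, 5)); /* [0,30,0] */
    return ab.inside + root.inside;
}
```

With the leaf triples A = [0,0,3], B = [7,0,5] and C = [6,0,0], the cluster (7A3B) becomes [0,21,15], its doubled form is [0,21,30], and the root is [0,30,0]. The inside values 21 and 30 add up to the 51 tokens required by 2(7A3B)5C.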
Flowchart of Schedule Adjustment
  SAS schedule
  -> Construct BTLC
  -> Compute buffer requirement
  -> Find candidate for adjustment
       found? yes: Adjust schedule (split a chain) and repeat
       found? no: Done, code generation
Splitting A Chain
  [Figure: BTLC for Schedule = 2(7A3B)5C with root G and the split point marked. Triples shown: [0,0,3], [7,0,5], [6,0,0], [0,0,21], [21,0,15], [0,21,30], [30,0,0], [0,30,0]]
  Finding a split candidate
    select the chain with the largest number
    in this example, BC is selected
  Schedule after splitting: 2(7A3B2C)C
  In general, for a schedule with two clusters a C_a b C_b (a and b are loop counts), the new schedule is defined as
    a(C_a (b/a)C_b) (b%a)C_b      if a < b
    (a%b)C_a b((a/b)C_a C_b)      otherwise
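The general splitting rule can be sketched as a function that returns the new loop counts. Two assumptions are made: `/` and `%` are integer division and remainder, and the `a >= b` branch uses `a/b` inside the loop so that C_a still fires `a` times in total. The `SplitSchedule` struct and the function names are hypothetical.

```c
#include <assert.h>

/* Loop counts of the schedule  pre_a C_a  outer(inner_a C_a  inner_b C_b)  post_b C_b. */
typedef struct {
    int pre_a;   /* firings of C_a before the merged loop */
    int outer;   /* iteration count of the merged loop */
    int inner_a; /* firings of C_a per merged iteration */
    int inner_b; /* firings of C_b per merged iteration */
    int post_b;  /* firings of C_b after the merged loop */
} SplitSchedule;

/* Splits the chain a C_a b C_b into one merged loop plus a remainder. */
SplitSchedule split_chain(int a, int b) {
    assert(a > 0 && b > 0);
    SplitSchedule s;
    if (a < b) {
        /* a(C_a (b/a)C_b) (b%a)C_b */
        s.pre_a = 0;
        s.outer = a;
        s.inner_a = 1;
        s.inner_b = b / a;
        s.post_b = b % a;
    } else {
        /* (a%b)C_a b((a/b)C_a C_b) */
        s.pre_a = a % b;
        s.outer = b;
        s.inner_a = a / b;
        s.inner_b = 1;
        s.post_b = 0;
    }
    return s;
}

/* Total firings of each cluster; they must stay equal to a and b. */
int total_a(const SplitSchedule *s) { return s->pre_a + s->outer * s->inner_a; }
int total_b(const SplitSchedule *s) { return s->outer * s->inner_b + s->post_b; }
```

For the example, C_a = (7A3B) with a = 2 and C_b = C with b = 5, so `split_chain(2, 5)` gives a merged loop of 2 iterations with 2 firings of C each and 1 firing of C afterwards, that is 2(7A3B2C)C. The totals stay at 2 and 5, so the adjusted schedule fires every node as often as the original.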
Decision Formula
  [Figure: BTLC after splitting, for the new schedule 2(7A3B2C)C, with nodes A, B, C, C and root G. Triples shown: [0,0,21], [12,0,0], [0,12,6], [0,6,0], [0,0,3], [7,0,5], [6,0,0], [21,0,15], [0,21,15], [6,0,0]; edge labels 12, 6, 21; rates 7, 3]
  Gain = 12
  |Cluster| = |W| value of the cluster
Experiment: CD2DAT
  [Figure: BTLC of the CD2DAT graph at successive steps of schedule adjustment, with each node annotated by its [input, inside, output] triple. The root G is labeled [0,280,0] in the first tree and [0,35,0] in the last.]
Experimental Result
  [Chart or table: program size after each optimization. Columns: CD2DAT, Filter Bank. Rows: SAS, Code Sharing, Schedule Adjustment. Values not preserved.]
  [Chart or table: memory behavior of CD2DAT on ARM7. Columns: Fetches, Miss. Rows: SAS, Code Sharing, Schedule Adjustment. Values not preserved.]
Conclusion
  Our environment
    PeaCE: Ptolemy extension as Codesign Environment
  Optimization techniques in software synthesis
    for automatic code generation from dataflow graphs
    joint minimization of code and data size
    selective application of code sharing and schedule adjustment to a SAS
  Future work
    Clustering: merge multiple fine-grain nodes into a large one to increase the chance of code sharing
    Buffer sharing: further reduce the buffer size and improve cache behavior
Thank You!