1
Software Synthesis and Analysis
SoC Design Automation, School of EECS, Seoul National University
2
Introduction: System-level design
- Chip-level design and/or board-level design
- Board-level design
  - Partition HW into multiple chips (off-the-shelf ICs and ASICs)
  - Partition SW into multiple processors
  - Synthesize interface logic
  - Place and route
- Chip-level design
  - Synthesis of the HW part
  - Synthesis of the SW part
- The HW synthesis problem alone is relatively well defined --> exploit conventional high-level synthesis and logic synthesis methodology
- We focus on chip-level SW synthesis
3
Introduction: Software synthesis
- Generate SW (possibly C code) from the partitioned internal representation for SW
- Generate machine code running on the target processor
- SW compilation
  - Generally takes an input specification in an imperative, non-concurrent language such as C
  - The input already describes the implementation at a fairly detailed level
- SW synthesis
  - Input is an implementation-independent functional description, such as FSMs
  - Optimized translation from the input to an implementation in C or assembly code
  - Scheduling (static and/or dynamic) must be optimized for system performance
  - Synthesizing efficient SW for a wide range of applications is difficult
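To illustrate the kind of translation SW synthesis performs, an implementation-independent FSM can be mapped mechanically into C, for example as a switch over the current state with one call per reaction. The sketch below assumes a hypothetical two-state FSM (IDLE/BUSY) with inputs start and done and output ack. It is not the output of any particular synthesis tool.

```c
#include <assert.h>

/* Hypothetical FSM: IDLE --start--> BUSY --done / emit ack--> IDLE */
typedef enum { IDLE, BUSY } fsm_state;

/* One reaction: read the inputs, update the state, return the output (1 = emit ack). */
int fsm_react(fsm_state *s, int start, int done)
{
    int ack = 0;
    assert(s != 0);
    switch (*s) {
    case IDLE:
        if (start)
            *s = BUSY;
        break;
    case BUSY:
        if (done) {
            *s = IDLE;
            ack = 1;
        }
        break;
    }
    return ack;
}
```

A synthesis tool is free to choose other encodings, such as tables or branch structures optimized for code size. The switch form is simply the most direct one.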
4
Software Synthesis Approaches
Cosyma
- J. Henkel, Th. Benner, R. Ernst, "Hardware generation and partitioning effects in the COSYMA system," Proc. Int. Workshop on Hardware-Software Codesign, Oct. 1993.
- Mutual exclusion between hardware and software execution
- In some cases, co-design is not successful because of communication overhead
[Figure: SW and HW execution alternating in time, separated by communication overhead]
5
Software Synthesis Approaches
Vulcan
- R. Gupta and G. De Micheli, System Synthesis via Hardware-Software Co-Design, CSL-TR , Stanford Univ., Oct
- Allows concurrent execution of hardware and software only for multiple-output inter-thread control dependency
[Figure: SW and HW flow graphs, program threads, and program routines]
6
Software Synthesis Approaches
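SNU
- Y. Shin and K. Choi, "Software synthesis through task decomposition by dependency analysis," Proc. ICCAD, Nov
- Thread generation
[Figure: data-flow graph of assignment (=), arithmetic (+, *), and comparison (<) nodes around a while loop, decomposed into threads Ts, Tc, Tb, and T(nr); nodes nw and nr]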
7
Software Synthesis Approaches
Thread clustering
- T(nr): T5
- ST(nr): T1, T6
[Figure: the flow graph clustered into threads T1 to T7, with node nr]
8
Software Synthesis Approaches
Scheduling of threads
[Figure: thread schedules combining static scheduling (T1, T2, T3 and T6, T7) with dynamic scheduling of T5 and T(nr), with ST(nr) marked]
9
Software Synthesis Approaches
Scheduling overhead
- Ts: T1, T2
- Tc (conditional of the while loop): T3
- Tb (body of the while loop): T4, T5, T(nr)
- ST(nr): T2, T5
[Figure: two execution timelines of T1 to T5 and T(nr) overlapping the H/W delay, showing context-switching and polling overhead]
10
Software Synthesis Approaches
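The code of node nr is converted into scheduler code and thread code.

Scheduler code:
    do {
        Ti;                      /* run the next ready thread */
        _read_hw(&done);         /* poll the HW completion flag */
    } while (!done && there are some Tis);
    Tnr;
    remaining Tis;

Thread code (part of T(nr)), polling:
    while (1) {
        _read_hw(&done);
        if (!done) continue;     /* HW not finished yet: keep polling */
        _read_hw(&data1);
        _read_hw(&data2);
        .....
        _read_hw(&datan);
        break;                   /* all results read: leave the polling loop */
    }

Thread code (part of T(nr)), sequences of straight-line code:
    _read_hw(&data1);
    _read_hw(&data2);
    .....
    _read_hw(&datan);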
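The interleaving performed by the scheduler code can be simulated on a host. In this sketch, `_read_hw(&done)` is replaced by a hypothetical stub `read_hw` whose flag becomes true after a fixed number of polls (modeling the H/W delay). The ready threads Ti are represented by integer ids that are only recorded in a trace. All names are assumptions for illustration.

```c
#include <assert.h>

#define MAX_THREADS 8
#define TNR (-1)                 /* marker for thread Tnr in the trace */

static int polls, hw_delay;      /* hypothetical HW model: done after hw_delay polls */
static int trace[MAX_THREADS + 1];
static int trace_len;

static void read_hw(int *done)   /* stands in for _read_hw(&done) */
{
    polls++;
    *done = (polls >= hw_delay);
}

static void run(int id) { trace[trace_len++] = id; }   /* "execute" a thread */

/* Overlap independent threads T0..Tn-1 with the HW delay, then run Tnr. */
static int schedule(int n_threads, int delay)
{
    int done = 0, next = 0;
    assert(n_threads <= MAX_THREADS);
    polls = 0; hw_delay = delay; trace_len = 0;
    do {
        if (next < n_threads)
            run(next++);             /* Ti */
        read_hw(&done);
    } while (!done && next < n_threads);
    while (!done)                    /* no Ti left: T(nr) polls on its own */
        read_hw(&done);
    run(TNR);                        /* Tnr consumes the HW results */
    while (next < n_threads)         /* remaining Tis */
        run(next++);
    return polls;                    /* number of polls spent waiting */
}
```

For example, `schedule(3, 2)` runs T0 and T1 while the hardware is busy, then Tnr, then T2. With only one ready thread and a longer delay, the remaining wait is spent in plain polling.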
11
Software Synthesis Approaches
Lempel-Ziv data compression: execution time (sec)

                                       File 1    File 2
    Size (bytes)                         1320      2293
    All S/W solution                     0.28      0.44
    Co-design with mutual exclusion      0.17      0.27
    Co-design by our algorithm           0.11
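The speedups implied by the File 1 column can be checked with a few lines of C. The helper name below is ours, and the inputs are the times from the table.

```c
#include <assert.h>

/* Speedup of a design relative to the all-software baseline. */
static double speedup(double t_baseline, double t_design)
{
    assert(t_design > 0.0);
    return t_baseline / t_design;
}
```

For File 1, `speedup(0.28, 0.17)` is about 1.65 for co-design with mutual exclusion, and `speedup(0.28, 0.11)` is about 2.55 for the proposed co-design.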
12
Software Synthesis Approaches
POLIS
- F. Balarin, M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, A. Sangiovanni-Vincentelli, E. Sentovich, K. Suzuki, "Synthesis of software programs for embedded control applications," IEEE Tr. on CAD, June 1999.
- For the design of real-time control systems
- Starts with a network of CFSMs
[Figure: POLIS design flow: formal languages (Esterel) --> translators --> CFSMs --> partitioning --> partitioned CFSMs --> HW synthesis (BLIF, logic synthesis, optimized hardware), SW synthesis (S-graph, C code), interface synthesis (HW interface), and OS synthesis (scheduler template + timing constraints) --> integration; simulation and formal verification through an intermediate format translator]
13
Software Synthesis Approaches
- Start from a CFSM (Esterel)
- Generate an s-graph (BDD)
- Optimize through variable ordering
[Figure: SW synthesis flow: CFSMs --> partitioning --> SW synthesis (S-graph, SW estimation of execution cycles and code size) and OS synthesis (scheduler template + timing constraints) --> C code]

Example CFSM:
    module simple:
    input c: integer;
    output y;
    var a: integer in
      loop
        await c;
        if a=?c then
          a:=0;
          emit y;
        else
          a:=a+1;
        end if
      end loop
    end var
    end module

Corresponding s-graph:
    BEGIN
    test present_c=1
      F: a':=a;     emit_y:=0
      T: test a=?c
           T: a':=0;     emit_y:=1
           F: a':=a+1;   emit_y:=0
    END
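Read as C, the s-graph above corresponds to a single reaction function. It tests whether c is present, compares a with the value of c, and then assigns the next value a' and the output flag. The following is a minimal hand-written sketch. It assumes the event c is passed as a presence flag plus a value, and the function and struct names are ours, not POLIS output.

```c
#include <assert.h>

typedef struct { int a; } simple_state;   /* the CFSM's internal variable a */

/* One CFSM transition following the s-graph; returns emit_y. */
static int simple_react(simple_state *s, int present_c, int c)
{
    int emit_y;
    assert(s != 0);
    if (!present_c) {          /* present_c=1 is F: a' := a, emit_y := 0 */
        emit_y = 0;
    } else if (s->a == c) {    /* a=?c is T: a' := 0, emit_y := 1 */
        s->a = 0;
        emit_y = 1;
    } else {                   /* a=?c is F: a' := a+1, emit_y := 0 */
        s->a = s->a + 1;
        emit_y = 0;
    }
    return emit_y;
}
```

Each s-graph test node becomes an if, and each assignment node becomes a statement on the path to END. This is why the variable ordering of the underlying BDD directly affects code size and execution cycles.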
14
Software Synthesis Approaches
- Generate an RTOS
  - Schedules the sw-CFSMs
  - Can bypass the RTOS and chain certain executions of CFSMs
- Handle events
  - Between sw-CFSMs: one flag for each input
  - sw-CFSM to hw-CFSM: memory-mapped I/O
  - hw-CFSM to sw-CFSM: polling or interrupt
- A commercial RTOS can also be used but is less efficient
[Figure: SW synthesis flow: CFSMs --> partitioning --> SW synthesis (S-graph, SW estimation of execution cycles and code size) and OS synthesis (scheduler template + timing constraints) --> C code]
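The "one flag for each input" scheme between sw-CFSMs can be sketched as a run-to-completion loop. Emitting an event sets the receiver's flag, and the scheduler runs any CFSM whose flag is set, clearing the flag first. The two CFSMs, the round-robin policy, and all names below are hypothetical. The synthesized RTOS may use a different scheduling policy.

```c
#include <assert.h>

#define N_CFSM 2

static int input_flag[N_CFSM];   /* one event flag per sw-CFSM input */
static int runs[N_CFSM];         /* how often each CFSM reacted */
static int outputs;              /* events sent to the environment */

static void emit(int target) { input_flag[target] = 1; }

/* Hypothetical CFSMs: CFSM 0 forwards each event to CFSM 1, which produces an output. */
static void cfsm0(void) { runs[0]++; emit(1); }
static void cfsm1(void) { runs[1]++; outputs++; }

static void (*const cfsm[N_CFSM])(void) = { cfsm0, cfsm1 };

/* Round-robin scheduler: run every CFSM with a pending input until none is left. */
static void run_scheduler(void)
{
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < N_CFSM; i++) {
            if (input_flag[i]) {
                input_flag[i] = 0;   /* consume the event before reacting */
                cfsm[i]();
                progress = 1;
            }
        }
    }
}
```

Chaining, as mentioned above, would correspond to calling `cfsm1` directly from `cfsm0` instead of setting its flag and returning to the scheduler.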
15
Software Synthesis Approaches
Picasso
- B. Lin, "Software synthesis of process-based concurrent programs," Proc. DAC, June 1998.
- Typical embedded software applications are specified in terms of communicating processes
- An OS manages the run-time schedule of the processes --> overhead in program size, run-time memory requirement, and execution time
- Other approaches
  - Compile-time scheduling
  - Hybrid (run-time scheduling for conditional or non-deterministic computations)
  - ...
- This paper proposes static scheduling based on a Petri net theoretic technique
- For an RC5 encryption example, a speedup of about 78X is achieved on a SUN workstation running Solaris
16
Software Synthesis Approaches
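Petri net model

    ping (input chan(int) a, output chan(int) b) {
        int x;
        for (;;) {
            x = <-a;          /* receive (channel place c2) */
            if (x < 0)
                x = 10 - x;
            else
                x = 10 + x;
            b <-= x;          /* send (channel place c1) */
        }
    }

    pong (input chan(int) c, output chan(int) d) {
        int y, z = 0;
        d <-= 10;             /* send (channel place c2) */
        y = <-c;              /* receive (channel place c1) */
        z = (z + y) % 345;
    }

    system () {
        chan (int) c1, c2;
        par {
            ping (c2, c1);
            pong (c1, c2);
        }
    }

[Figure: Petri nets of ping (place p1) and pong (place p2), connected by channel places c1 and c2; transitions b, c, d, and f label the statements]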
17
Software Synthesis Approaches
Parallel composition
[Figure: parallel composition of the ping and pong nets (places p1, p2) through the shared channel places c1 and c2]
18
Software Synthesis Approaches
Maximal expansion
- Maximal expansion: the largest unrolling (cut of cycles) from the initial marking
- Initial place: a place without an input transition
- Cut-off place: a place without an output transition
[Figure: a net with places p1 to p4 and transitions a to l, and its maximal expansion ending in cut-off places P1', P2', P3']
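The definitions of initial and cut-off places can be computed directly from the net structure. In this sketch, `pre[t][p] = 1` means place p is an input of transition t, and `post[t][p] = 1` means p is an output. The four-place, three-transition net is hypothetical and is not the net in the figure.

```c
#include <assert.h>

#define NP 4   /* places p0..p3 */
#define NT 3   /* transitions t0..t2 */

/* Hypothetical net: t0: p0 -> p1, t1: p1 -> p2, t2: p1 -> p3 (p1 is a choice place). */
static const int pre[NT][NP]  = { {1,0,0,0}, {0,1,0,0}, {0,1,0,0} };
static const int post[NT][NP] = { {0,1,0,0}, {0,0,1,0}, {0,0,0,1} };

/* Initial place: no transition puts tokens into it (no input transition). */
static int is_initial_place(int p)
{
    assert(p >= 0 && p < NP);
    for (int t = 0; t < NT; t++)
        if (post[t][p]) return 0;
    return 1;
}

/* Cut-off place: no transition takes tokens out of it (no output transition). */
static int is_cutoff_place(int p)
{
    assert(p >= 0 && p < NP);
    for (int t = 0; t < NT; t++)
        if (pre[t][p]) return 0;
    return 1;
}
```

In this net, p0 is the only initial place, and p2 and p3 are cut-off places.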
19
Software Synthesis Approaches
Cut-off marking
- Cut-off marking: a marking that is reachable from the initial marking and enables no transition
[Figure: the same net and its maximal expansion, with the cut-off marking on P1', P2', P3']
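In an acyclic expansion, a cut-off marking can be reached by firing enabled transitions until none is enabled. The sketch uses a hypothetical ordinary net with unit arc weights: a chain p0 -t0-> p1 -t1-> p2 and an independent branch p3 -t2-> p4. The firing order used here (lowest-numbered enabled transition first) is just one possible choice.

```c
#include <assert.h>

#define NP 5
#define NT 3

/* Hypothetical acyclic net: t0: p0 -> p1, t1: p1 -> p2, t2: p3 -> p4. */
static const int pre[NT][NP]  = { {1,0,0,0,0}, {0,1,0,0,0}, {0,0,0,1,0} };
static const int post[NT][NP] = { {0,1,0,0,0}, {0,0,1,0,0}, {0,0,0,0,1} };

static int enabled(const int m[NP], int t)
{
    for (int p = 0; p < NP; p++)
        if (m[p] < pre[t][p]) return 0;
    return 1;
}

static void fire(int m[NP], int t)
{
    for (int p = 0; p < NP; p++)
        m[p] += post[t][p] - pre[t][p];
}

/* Fire enabled transitions (lowest index first) until none is enabled.
   This terminates because the net is acyclic; returns the number of firings. */
static int run_to_cutoff(int m[NP])
{
    int fired = 0, t = 0;
    while (t < NT) {
        if (enabled(m, t)) {
            fire(m, t);
            fired++;
            t = 0;          /* rescan from the first transition */
        } else {
            t++;
        }
    }
    return fired;
}
```

Starting from the marking {p0, p3}, three firings lead to {p2, p4}, which enables no transition and is therefore a cut-off marking.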
20
Software Synthesis Approaches
Generate all maximal expansions from cut-off markings
[Figure: further maximal expansions generated from the cut-off marking of the net with places p1 to p4, starting from place p3']
21
Software Synthesis Approaches
Static scheduling for each maximal expansion
[Figure: a static schedule (sequence of transitions a to l) derived from each maximal expansion, ending in cut-off places P1', P2', P3']
22
Software Synthesis Approaches
Multi-rate communication
- J. Cortadella, A. Kondratyev, L. Lavagno, M. Massot, S. Moral, C. Passerone, Y. Watanabe, A. Sangiovanni-Vincentelli, "Task generation and compile-time scheduling for mixed data-control embedded software," Proc. DAC, June 2000.
- Example
[Figure: GetData reads port IN and writes channel DATA; Filter reads DATA and COEF and writes port OUT]

    PROCESS Filter (InPort DATA, InPort COEF, OutPort OUT) {
        float c, d;
        int j;
        c = 1; j = 0;
        while (1) {
            SELECT (DATA, COEF) {
            case DATA:
                READ(DATA, d, 1);
                if (j == N) {
                    j = 0;
                    d = d * c;
                    WRITE(OUT, d, 1);
                } else
                    j++;
                break;
            case COEF:
                READ(COEF, c, 1);
                break;
            }
        }
    }

    PROCESS GetData (InPort IN, OutPort DATA) {
        float sample, sum;
        int i;
        while (1) {
            sum = 0;
            for (i = 0; i < N; i++) {
                READ(IN, sample, 1);
                sum += sample;
                WRITE(DATA, sample, 1);
            }
            WRITE(DATA, sum/N, 1);
        }
    }
23
Software Synthesis Approaches
Petri net model
- Each port is modeled as a place
- Channel connections between ports are modeled by merged places
[Figure: Petri net of GetData and Filter with places p1 to p7, port places IN, COEF, and DATA, and environment transitions Tin and Tcoef. Transitions:
  t1: sum=0; i=0
  t2: WRITE(DATA,sum/N,1)   (i<N false)
  t3: i<N true
  t4: READ(IN,sample,1); sum +=sample; WRITE(DATA,sample,1)
  t5: i++
  t6: c=1; j=0
  t7: READ(DATA,d,1)
  t8: j=0; d=d*c; WRITE(OUT,d,1)   (j==N true)
  t9: j++   (j==N false)
  t10: READ(COEF,c,1)]
24
Software Synthesis Approaches
Scheduling
- Generate a subtree of the reachability tree
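A reachability tree can be built by depth-first search from the initial marking. A branch is closed when its marking has been found already. The sketch below explores the full reachability set of a small hypothetical cyclic net. The cited approach generates only a subtree selected by its scheduling criteria, and that selection is not modeled here.

```c
#include <assert.h>
#include <string.h>

#define NP 3
#define NT 3
#define MAX_MARKINGS 64

/* Hypothetical cyclic net: t0: p0 -> p1, t1: p1 -> p0, t2: p1 -> p2. */
static const int pre[NT][NP]  = { {1,0,0}, {0,1,0}, {0,1,0} };
static const int post[NT][NP] = { {0,1,0}, {1,0,0}, {0,0,1} };

static int seen[MAX_MARKINGS][NP];
static int n_seen;
static int n_found_already;   /* leaves closed because the marking repeated */

static int lookup(const int m[NP])
{
    for (int i = 0; i < n_seen; i++)
        if (memcmp(seen[i], m, sizeof seen[i]) == 0) return 1;
    return 0;
}

/* Depth-first construction of the reachability tree. */
static void explore(const int m[NP])
{
    if (lookup(m)) { n_found_already++; return; }
    assert(n_seen < MAX_MARKINGS);
    memcpy(seen[n_seen++], m, sizeof seen[0]);
    for (int t = 0; t < NT; t++) {
        int next[NP], ok = 1;
        for (int p = 0; p < NP; p++) {
            if (m[p] < pre[t][p]) ok = 0;          /* t is not enabled */
            next[p] = m[p] + post[t][p] - pre[t][p];
        }
        if (ok) explore(next);
    }
}
```

From {p0}, the search reaches {p1} and {p2}. The branch that fires t1 back to {p0} ends as a "found already" leaf, and that leaf is where a cycle of the schedule closes.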
25
Software Synthesis Approaches
Code generation
- Minimize code size
[Figure legend: await node; transition; node found already]
26
Software Synthesis Approaches
Code generation
- Minimize code size
- Code segments cs1 to cs5, annotated with the Petri net transitions and places they implement:

    sum=0; i=0;                           -- t1
    c=1; j=0;                             -- t6
    if (i<N) return;                      -- p2, t3
    DATA = sum/N;                         -- t2
    d = DATA;                             -- t7
    if (j==N) {                           -- p7
        j=0; d=d*c;                       -- t8
        WRITE(OUT,d,1);
    } else j++;                           -- t9
    READ(COEF,c,1)                        -- t10
    READ(IN,sample,1);                    -- t4
    sum += sample; DATA = sample; i++;    -- t5
27
Software Synthesis Approaches
Generated code
- No scheduler overhead --> 4 to 10 times faster (video application)

    Init () {
        sum=0; i=0;                           -- t1
        c=1; j=0;                             -- t6
    }

    Tcoef () {
        READ(COEF,c,1)                        -- t10
    }

    Tin () {
        READ(IN,sample,1);                    -- t4
        sum += sample; DATA = sample; i++;    -- t5
    L0:
        d = DATA;                             -- t7
        if (j==N) {                           -- p7
            j=0; d=d*c;                       -- t8
            WRITE(OUT,d,1);
        } else j++;                           -- t9
        if (i<N) return;                      -- p2, t3
        DATA = sum/N;                         -- t2
        sum=0; i=0;                           -- t1
        goto L0;
    }
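To check that the generated tasks preserve the behavior of GetData and Filter, the code can be executed with the channel operations stubbed out. In this sketch, N is fixed at 2, READ on IN and COEF is replaced by function arguments, WRITE(OUT,...) appends to an array, and the internal channel DATA is a plain variable. All of these are assumptions for a host-side test. The control flow follows one plausible reading of the label L0 and goto L0 in the listing: the Filter segment (t7 to t9) is shared between the transfer of a sample and the transfer of the average.

```c
#include <assert.h>

#define N 2                      /* assumed block size of GetData */
#define MAX_OUT 16

static float sum, sample, c, d, DATA;  /* DATA: internal channel turned into a variable */
static int i, j;
static float out[MAX_OUT];             /* values written to port OUT (test stub) */
static int n_out;

static void WRITE_OUT(float v) { assert(n_out < MAX_OUT); out[n_out++] = v; }

/* t1, t6; resetting n_out is only for this host-side test. */
static void Init(void) { sum = 0; i = 0; c = 1; j = 0; n_out = 0; }

static void Tcoef(float coef) { c = coef; }   /* t10: READ(COEF,c,1) */

/* Task triggered by one input sample on port IN. */
static void Tin(float in)
{
    sample = in;                          /* t4: READ(IN,sample,1) */
    sum += sample; DATA = sample; i++;    /* t5 */
L0:
    d = DATA;                             /* t7 */
    if (j == N) {                         /* p7 */
        j = 0; d = d * c;                 /* t8 */
        WRITE_OUT(d);
    } else {
        j++;                              /* t9 */
    }
    if (i < N) return;                    /* p2, t3: wait for the next sample */
    DATA = sum / N;                       /* t2 */
    sum = 0; i = 0;                       /* t1 */
    goto L0;                              /* let the Filter segment consume the average */
}
```

With N = 2 and c = 1, the inputs 1 and 3 produce a single output 2 (the average). After `Tcoef(10)`, the inputs 5 and 7 produce 60. This matches the original processes, where Filter writes every (N+1)-th value it reads from DATA.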
28
Software Synthesis Approaches
Code partitioning
- A. Nacul and T. Givargis, "Synthesis of time-constrained multitasking embedded software," ACM TODAES, Oct
- Phantom compiler
- Multitasking supported by POSIX
29
Software Synthesis Approaches
CFG transformation
[Figure legend: atomic execution block, synchronization point, basic block, setup, function frame, cleanup]
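The idea behind this transformation can be sketched in plain C. Each task keeps its live state in a frame, its code is split into blocks that run up to the next synchronization point, and a dispatcher interleaves the blocks of different tasks in a single thread. The task body, frame layout, and round-robin dispatcher below are hypothetical illustrations and are not the code that Phantom emits.

```c
#include <assert.h>

/* Per-task frame: the resume point plus the task's live variables. */
typedef struct {
    int next_block;   /* which atomic execution block to run next; -1 = finished */
    int counter;
} frame;

#define MAX_LOG 16
static int log_buf[MAX_LOG];   /* task id of each executed block, in order */
static int log_len;

/* Task body split at its synchronization points into numbered blocks.
   Each call executes exactly one block and records where to resume. */
static void task_step(int id, frame *f)
{
    assert(log_len < MAX_LOG);
    log_buf[log_len++] = id;
    switch (f->next_block) {
    case 0:                   /* setup */
        f->counter = 0;
        f->next_block = 1;
        break;
    case 1:                   /* loop body, ending at a synchronization point */
        f->counter++;
        f->next_block = (f->counter < 2) ? 1 : 2;
        break;
    case 2:                   /* cleanup */
        f->next_block = -1;
        break;
    }
}

/* Round-robin dispatcher: one block per task per turn until all tasks finish. */
static int dispatch(frame *frames, int n)
{
    int active = 1, steps = 0;
    while (active) {
        active = 0;
        for (int t = 0; t < n; t++) {
            if (frames[t].next_block >= 0) {
                task_step(t, &frames[t]);
                steps++;
                active = 1;
            }
        }
    }
    return steps;
}
```

With two tasks, each runs setup, two loop-body blocks, and cleanup, so the dispatcher executes 8 blocks while alternating between task 0 and task 1. No OS-level context switch is involved.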
30
Software Synthesis Approaches
Generated code
31
Software Synthesis Approaches
Experimental result
32
Software Timing Analysis
Performance metrics
- Extreme-case performance
  - HARD real-time systems (strict timing constraints), e.g., automotive engine control unit
  - Typically the worst case is of interest
- Probabilistic performance
  - SOFT real-time systems, e.g., cellular phone
- Average-case performance
  - Systems without real-time constraints, e.g., printer
33
Software Timing Analysis
Analysis components
- Path analysis
- Utilization of system resources
  - Micro-architectural resources, memory behavior
- Input characterization
34
Software Timing Analysis
Path analysis for hard real-time systems
- Extreme-case selection
  - Select the longer branch of each conditional
  - Too pessimistic
- Path enumeration
  - Regular expressions are used
  - $A_p$: regular expression for the statically feasible execution paths
    - if B then S1 else S2: $B(S_1 + S_2)$
    - while B do S: $B(SB)^n$
  - $I_p$: path information provided by the user
    - samepath(S1, S3), i.e., S1 and S3 are either both executed or both skipped: $\overline{(*S_1*)}\;\overline{(*S_3*)} + (*S_1*)(*S_3*)$
  - Intersection $A_p \cap I_p$: all feasible execution paths
  - Limitation: the intersection operation is very expensive --> pessimistic approximations are used:
    $T(A_p \cap I_p) = T(A_p \cap ((*S_1*)(*S_3*))) \le \min(T(A_p \cap (*S_1*)), T(A_p \cap (*S_3*)))$
- Bounding techniques
  - Implicit path enumeration: using ILP, determine the bound of the execution count of each basic block
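Extreme-case selection can be written as a timing schema evaluated bottom-up over the program structure. A sequence adds the bounds of its parts, and a conditional takes its test plus the longer branch, matching $B(S_1 + S_2)$. A loop with at most n iterations takes n+1 tests plus n bodies, matching $B(SB)^n$. The following is a minimal sketch with a hypothetical tree representation and made-up cycle counts.

```c
#include <assert.h>

typedef enum { BASIC, SEQ, IF, WHILE } kind;

/* Hypothetical program tree; costs are in cycles. */
typedef struct node {
    kind k;
    int cost;                       /* BASIC: block cost; IF/WHILE: cost of test B */
    int bound;                      /* WHILE: maximum iteration count n */
    const struct node *a, *b;       /* SEQ: a;b  IF: then a, else b  WHILE: body a */
} node;

/* Worst-case bound by extreme-case selection. */
static int wcet(const node *s)
{
    assert(s != 0);
    switch (s->k) {
    case BASIC: return s->cost;
    case SEQ:   return wcet(s->a) + wcet(s->b);
    case IF: {
        int t = wcet(s->a), e = wcet(s->b);
        return s->cost + (t > e ? t : e);     /* keep the longer branch */
    }
    case WHILE:                                /* (n+1) tests, n bodies */
        return (s->bound + 1) * s->cost + s->bound * wcet(s->a);
    }
    return 0;
}
```

Because each conditional picks its longer branch independently, the bound can include combinations of branches that never execute together, such as two branches constrained by samepath. This is why the method is called too pessimistic.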
35
Software Timing Analysis
Implicit path enumeration
- Y.-T. Li and S. Malik, "Performance analysis of embedded software using implicit path enumeration," Proc. DAC, June 1995. (Cinderella)
- Objective function
  - Total execution time $= \sum_{i=1}^{N} c_i x_i$ for a program with N basic blocks, where $x_i$ is the execution count of basic block $B_i$ and $c_i$ is the execution time of basic block $B_i$
- Program structural constraints
  - Derived from the program's control flow graph
- Program functionality constraints
  - Provided by the user
  - Specify loop bounds and other path information
36
Software Timing Analysis
Structural constraints

    /* k >= 0 */
    s = k;
    while (k < 10) {
        if (ok)
            j++;
        else {
            j = 0;
            ok = true;
        }
        k++;
    }
    r = j;

[Figure: the code, its CFG, and the derived structural constraints]
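The structural constraints are flow-conservation equations on the CFG: a block executes as many times as control enters it and as many times as control leaves it. The blocks here use our own numbering: B1 (s = k), B2 (the while test), B3 (the if test), B4 (j++), B5 (j = 0; ok = true), B6 (k++), and B7 (r = j). With that numbering, one way to write the constraints at block level is:

```latex
\begin{aligned}
x_1 &= 1             && \text{(program entered once)}\\
x_2 &= x_1 + x_6     && \text{(loop test reached from } B_1 \text{ or after } B_6\text{)}\\
x_2 &= x_3 + x_7     && \text{(loop test exits to the body or to } B_7\text{)}\\
x_3 &= x_4 + x_5     && \text{(if test splits into the two branches)}\\
x_6 &= x_4 + x_5     && \text{(both branches join at } B_6\text{)}
\end{aligned}
```

These equations alone allow any number of loop iterations. The functionality constraints supply the missing bound.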
37
Software Timing Analysis
Functionality constraints
- If k >= 0, the loop body is iterated 0 to 10 times
- B5 is executed at most once
[Figure: the CFG and the corresponding constraints]
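Combining the objective with the structural and functionality constraints gives a small integer program. For this loop, the program is small enough to solve by enumeration instead of an ILP solver. The sketch numbers the blocks B1 to B7 as s = k, the while test, the if test, j++, the else branch, k++, and r = j (our numbering, with B5 as the else branch). The flow equations then leave only x3 (loop iterations) and x5 (else-branch executions) free. The cycle counts c_i are made up for illustration.

```c
#include <assert.h>

/* Hypothetical execution times c_i (cycles) of blocks B1..B7 (index 0 unused). */
static const int c[8] = { 0, 2, 3, 2, 1, 4, 2, 1 };

static int best_x3, best_x5;

/* Maximize sum c_i x_i subject to the structural and functionality constraints. */
static int ipet_bound(void)
{
    int best = -1;
    for (int x3 = 0; x3 <= 10; x3++) {                 /* 0 <= x3 <= 10 x1, x1 = 1 */
        for (int x5 = 0; x5 <= 1 && x5 <= x3; x5++) {  /* x5 <= x1 and x4 >= 0 */
            int x[8];
            x[1] = 1;                 /* program entered once */
            x[3] = x3;
            x[5] = x5;
            x[4] = x3 - x5;           /* x3 = x4 + x5 */
            x[6] = x[4] + x[5];       /* branches join at B6 */
            x[2] = x[1] + x[6];       /* loop test: entry plus back edges */
            x[7] = x[2] - x[3];       /* x2 = x3 + x7 */
            int t = 0;
            for (int i = 1; i <= 7; i++)
                t += c[i] * x[i];
            if (t > best) { best = t; best_x3 = x3; best_x5 = x5; }
        }
    }
    return best;
}
```

With these costs, the objective reduces to 6 + 8·x3 + 3·x5, so the bound is 89 cycles at x3 = 10 and x5 = 1. An ILP solver reaches the same optimum without enumerating paths, which is what makes the method implicit.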
38
Software Timing Analysis
Instruction cache analysis
- Y.-T. Li, S. Malik, and A. Wolfe, "Performance estimation of embedded software with instruction cache modeling," Proc. ICCAD, Nov
- Restricted to direct-mapped caches
- Definitions
  - l-block: a contiguous sequence of instructions within the same basic block that are mapped to the same line in the instruction cache
  - Conflict: two l-blocks that map to the same cache line conflict with each other if the execution of one l-block displaces the cache content of the other
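The l-block definition can be made concrete by splitting a basic block's address range at cache-line boundaries. Each piece then lies in one memory line and maps to one line of the direct-mapped cache. The sketch assumes byte addresses, a 16-byte line, and a 4-line cache (hypothetical parameters). Two l-blocks are treated as conflicting when they map to the same cache line but belong to different memory lines.

```c
#include <assert.h>

#define LINE_SIZE 16u
#define N_LINES   4u
#define MAX_LBLOCKS 16

typedef struct {
    unsigned start, end;   /* address range [start, end) */
    unsigned cache_line;   /* index in the direct-mapped cache */
    unsigned mem_line;     /* memory line number (start / LINE_SIZE) */
} lblock;

/* Split basic block [start, end) into l-blocks; returns how many were produced. */
static int split_lblocks(unsigned start, unsigned end, lblock *out)
{
    int n = 0;
    while (start < end) {
        unsigned line_end = (start / LINE_SIZE + 1) * LINE_SIZE;
        assert(n < MAX_LBLOCKS);
        out[n].start = start;
        out[n].end = line_end < end ? line_end : end;
        out[n].mem_line = start / LINE_SIZE;
        out[n].cache_line = out[n].mem_line % N_LINES;
        start = out[n].end;
        n++;
    }
    return n;
}

/* Two l-blocks conflict if they share a cache line but not a memory line. */
static int conflicts(const lblock *x, const lblock *y)
{
    return x->cache_line == y->cache_line && x->mem_line != y->mem_line;
}
```

A basic block at 0x08 to 0x30 yields three l-blocks in cache lines 0, 1, and 2. A second block at 0x48 to 0x54 falls into cache lines 0 and 1, so it conflicts with the first two l-blocks of the first block but not with the third.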
39
Software Timing Analysis
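New cost function
- Total execution time $= \sum_{i} \sum_{j} \left( c^{hit}_{i.j}\, x^{hit}_{i.j} + c^{miss}_{i.j}\, x^{miss}_{i.j} \right)$, where $x^{hit}_{i.j}$ and $x^{miss}_{i.j}$ are the cache hit and miss counts of l-block j of basic block $B_i$, and $c^{hit}_{i.j}$ and $c^{miss}_{i.j}$ are the corresponding execution times
- Cache constraints use a cache conflict graph (CCG), one for each cache line
  - The CCG includes a start node s and an end node e
  - The start node s accounts for cache content due to previous program execution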
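To build intuition about the split of execution counts into hits and misses, the sketch below simulates one line of a direct-mapped cache over a hypothetical sequence of accesses to l-blocks that share that line. It counts hits and misses and prices them with made-up hit and miss costs. The ILP method bounds these counts with CCG constraints over all paths, rather than simulating a single trace as done here.

```c
#include <assert.h>

/* Simulate one direct-mapped cache line over a trace of memory-line numbers
   (all mapping to this cache line). The initial content is unknown (-1). */
static void count_hits(const int *trace, int n, int *hits, int *misses)
{
    int tag = -1;                       /* memory line currently cached */
    assert(n >= 0);
    *hits = *misses = 0;
    for (int k = 0; k < n; k++) {
        if (trace[k] == tag) {
            (*hits)++;
        } else {
            (*misses)++;
            tag = trace[k];             /* the new l-block displaces the old one */
        }
    }
}

/* Cost of one l-block sequence in the form hits * c_hit + misses * c_miss. */
static int cost(int hits, int misses, int c_hit, int c_miss)
{
    return hits * c_hit + misses * c_miss;
}
```

For the trace 0, 0, 4, 0, 4, 4, the alternation between memory lines 0 and 4 causes four misses and only two hits. With c_hit = 1 and c_miss = 10, the cost is 42 cycles.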