
Séminaire COSI ’01 (Roscoff): Power Driven Processor Array Partitioning for FPGA SoC, S. Derrien, S. Rajopadhye.



2 Content
- Context and motivations: silicon compilation tools, target architectures, power consumption, related work
- Partitioning
- Modeling power
- Experimental results
- Conclusion

3 Silicon compilation tools
- Parallel processor array architectures: regular and scalable (well suited to FPGAs), with a specialized high-performance datapath
- Restricted class of loops: SUREs (systems of uniform recurrence equations, i.e. uniform dependencies) over a static polyhedral loop domain
- Compute-intensive nested loops: image processing (motion estimation, stereo vision), signal processing (QR factorization, DLMS)

4 Power consumption
- General model and motivations: $P = P_{stat} + V_{dd}^2 \cdot C_d \cdot D \cdot f$ (gate-level model, with $C_d$ the switched capacitance, $D$ the toggle rate and $f$ the clock frequency), estimated at RTL level with entropy-based models
- Mainly dictated by on-chip area cost and activity, and by off-chip I/O volume
- System-level power model? Estimate from the specification and the target architecture

5 Target architecture
[Diagram: FPGA, CPU, system memory, external world]
- Embedded CPU: PowerPC, NIOS
- SoC bus: AMBA, CoreConnect; plug-and-play IP cores
- Shared memory: low latency, high bandwidth

6 Related work
- Compiler transformations to reduce memory accesses [Kandemir]: loop fusion, loop tiling, loop reordering
- Design-space exploration for custom memory systems [Imec]: systematic exploration of multi-level memory hierarchies; the approach is brute force

7 Content
- Context and motivations
- Target architectures
- Partitioning: clustering (LSGP), tiling (LPGS), co-partitioning
- Modeling power
- Experimental results
- Conclusion

8 Tiling (LPGS: locally parallel, globally sequential)
- Partition the PE array into tiles: tiles are executed sequentially, and intermediate results are stored in off-chip memory; this requires unidirectional communications
- Tile shape is rectangular: edges parallel to the PE-space basis vectors, giving a perfect tiling of the processor space
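Here is a minimal sketch of the LPGS index split. It assumes a 2-D virtual PE array and rectangular tiles aligned with the PE axes. The function names and sizes are hypothetical and do not come from the authors' tool. Each virtual PE maps to two things: a tile index (tiles run one after another) and a physical PE inside the tile. Lexicographic tile order is also an assumption. It is one order that respects unidirectional, non-negative inter-tile communications.

```python
from itertools import product


def tile_of(pe, t):
    """Split a virtual PE coordinate into (tile index, PE inside the tile),
    for rectangular tiles of size t aligned with the PE-space axes."""
    tile = tuple(p // s for p, s in zip(pe, t))
    local = tuple(p % s for p, s in zip(pe, t))
    return tile, local


def lpgs_order(n, t):
    """Tiles in sequential execution order (lexicographic here); within
    one tile, all PEs run in parallel on the physical array."""
    n_tiles = [-(-ni // ti) for ni, ti in zip(n, t)]  # ceil(ni / ti)
    return list(product(*(range(k) for k in n_tiles)))


if __name__ == "__main__":
    n = (6, 6)  # virtual PE array size (illustrative)
    t = (2, 3)  # tile sizes along each PE axis (illustrative)
    print(tile_of((5, 4), t))
    print(lpgs_order(n, t))
```

The physical array only ever holds one tile's worth of PEs. Larger tiles therefore mean more hardware but fewer sequential passes through off-chip memory.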

9 Tiling (LPGS)
[Figure: PE domain tiled with tile sizes 2 and 3; the tiling matrix is diagonal, with determinant equal to N_pe; the domain height is marked]

10 Clustering (LSGP: locally sequential, globally parallel)
- Regroups PEs into clusters: operations within a cluster are executed sequentially, which reduces I/O accesses
- Cluster shape is rectangular: edges parallel to the PE-space basis vectors, giving a perfect tiling of the processor space
- Scheduling is axes-major: several schedules are possible (a sequence of clusterings along each axis), which simplifies the control logic
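The axes-major cluster schedule can be sketched as follows. The sketch assumes rectangular clusters aligned with the PE axes, and the names `cluster_of` and `cluster_schedule` are hypothetical. Each physical PE emulates the virtual PEs of its cluster one after another. Here the last axis varies fastest, which is just one of the possible axes-major orders.

```python
from itertools import product


def cluster_of(pe, c):
    """Map a virtual PE to (physical PE, local step) for rectangular
    clusters of size c aligned with the PE-space axes."""
    phys = tuple(p // s for p, s in zip(pe, c))
    local = tuple(p % s for p, s in zip(pe, c))
    # Axes-major order with the last axis varying fastest (one of the
    # several possible axes-major schedules).
    step = 0
    for off, s in zip(local, c):
        step = step * s + off
    return phys, step


def cluster_schedule(c):
    """Local PE offsets in the order one physical PE emulates them."""
    return list(product(*(range(s) for s in c)))


if __name__ == "__main__":
    print(cluster_of((3, 4), (2, 3)))
    print(cluster_schedule((2, 3)))
```

Because the step is a simple mixed-radix counter, the controller only needs one nested counter per axis. This is the kind of control simplification the slide refers to.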

11 Clustering (LSGP)
[Figure: original space-time mapping (PE index vector, iteration index vector) and its clustering, with example factors 3 and 2; the clustering matrix is diagonal and its determinant, the product of the per-axis clustering factors, gives the cluster size]

12 Clustering (LSGP)
[Figure: original PE array, then clustered with factor 2 along x, then with factors 2 along x and 3 along y; annotated with a resource usage estimate]

13 Hybrid partitioning
- Step 1: the array is tiled, which tunes the I/O volume
- Step 2: each tile is clustered, which tunes the resource usage
- Trade-off: off-chip I/O volume vs. local memory sizes
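This sketch gives the resulting counts for the two-step hybrid partitioning. It assumes tile sizes are multiples of cluster sizes, and the helper `hybrid_partition` is hypothetical. Tiling sets how many tiles run sequentially, and hence how much data goes off chip. Clustering sets how many physical PEs are instantiated and how many virtual PEs each one emulates.

```python
from math import prod


def hybrid_partition(n, t, c):
    """Counts for tiling an n-sized PE array with tile sizes t, then
    clustering each tile with cluster sizes c (c_i must divide t_i)."""
    if any(ti % ci for ti, ci in zip(t, c)):
        raise ValueError("cluster sizes must divide tile sizes")
    n_tiles = prod(-(-ni // ti) for ni, ti in zip(n, t))  # run sequentially
    n_phys = prod(ti // ci for ti, ci in zip(t, c))       # physical PEs on chip
    steps = prod(c)                                        # virtual PEs per physical PE
    return n_tiles, n_phys, steps


if __name__ == "__main__":
    print(hybrid_partition((12, 12), (6, 4), (2, 2)))
```

For example, take a 12×12 array with 6×4 tiles and 2×2 clusters. That gives 6 tiles, 6 physical PEs and 4 local steps per tile, which covers all 144 virtual PEs.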

14 Content
- Context and motivations
- Target architectures
- Partitioning
- Modeling power: I/O power model, core power model, putting it all together
- Experimental results
- Conclusion

15 Dynamic I/O energy model
- I/O energy depends on the I/O volume (at a given RAM clock speed), the operation (read or write) and the port toggle rate:
  $E_{io} = K_{rd} \cdot V_{rd} + K_{wr} \cdot V_{wr}$
  where $V_{wr}$ is the number of write I/O operations ($V_{rd}$ likewise for reads) and $K_{rd}$, $K_{wr}$ are technological constants
- Determine the I/O volume for all loop variables, given the tiling parameters
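The I/O energy model is a plain weighted sum. This is a minimal sketch that assumes every tile has the same read and write footprint. The function names are hypothetical, and K_rd and K_wr would have to be calibrated on the target board.

```python
def io_energy(v_rd, v_wr, k_rd, k_wr):
    """Dynamic I/O energy E_io = K_rd * V_rd + K_wr * V_wr.

    v_rd, v_wr: read / write I/O volumes; k_rd, k_wr: technological
    constants (energy per read / per write operation)."""
    return k_rd * v_rd + k_wr * v_wr


def loop_io_energy(n_tiles, v_rd_tile, v_wr_tile, k_rd, k_wr):
    """Whole-loop I/O energy, assuming every tile has the same footprint."""
    return n_tiles * io_energy(v_rd_tile, v_wr_tile, k_rd, k_wr)
```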

16 I/O volume estimate (1/2)
- The I/O volume of a tile is called its "footprint"
- Estimate of this footprint [Arg95] using the spread vector of the dependencies: the estimate involves the determinant of the tile matrix with its i-th row substituted by the spread vector

17 I/O volume estimate (2/2)
- Total tile I/O volume: summed over all variables, each weighted by its byte width; it depends on the tile size parameters and on the spread vectors
- Example:
  - A: d_A = [1 0 0], a_A = [1 0 0], l_A = 2
  - B: d_B = [0 1 0], a_B = [1 0 0], l_B = 2
  - C: d_C = [0 0 1], a_C = [1 0 0], l_C = 4
  where d is the dependence vector, a the spread vector and l the byte width of each variable
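The footprint estimate can be sketched as below. We assume the [Arg95] estimate has the form |det T| + Σ_i |det T_{i←a}|. Here T is the tile matrix and T_{i←a} is T with its i-th row replaced by the spread vector a, which is our reading of "substituting i-th row with spread vector". The total is then weighted by each variable's byte width. The byte widths and spread vectors in the usage come from slide 17. The tile matrix and the function names are illustrative.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    d = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            d = -d  # a row swap flips the sign
        d *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return d


def footprint(T, a):
    """Assumed estimate: |det T| + sum_i |det(T with row i replaced by a)|."""
    total = abs(det(T))
    for i in range(len(T)):
        Ti = [list(row) for row in T]
        Ti[i] = list(a)
        total += abs(det(Ti))
    return int(total)  # integer inputs give an integer result


def tile_io_volume(T, variables):
    """Sum over variables (spread vector a_k, byte width l_k) of l_k * footprint."""
    return sum(width * footprint(T, a) for a, width in variables)


if __name__ == "__main__":
    T = [[4, 0, 0], [0, 4, 0], [0, 0, 8]]  # illustrative diagonal tile matrix
    variables = [([1, 0, 0], 2), ([1, 0, 0], 2), ([1, 0, 0], 4)]  # A, B, C
    print(footprint(T, [1, 0, 0]), tile_io_volume(T, variables))
```

Take T = diag(4, 4, 8) and a = [1 0 0]. The estimate is 128 + 32 = 160 elements per variable. With byte widths 2, 2 and 4, variables A, B and C then give 1280 bytes per tile.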

18 Core power model (1/4)
- FPGA power dissipation model: $P_{core} = P_{stat} + K_c \cdot D_{lc} \cdot n_{lc} \cdot f$, not suited to our target FPGA architecture
- Distinguish between LCs used as memory and as logic: $P_{core} = P_{stat} + K_c \cdot D_{lc} \cdot n_{lc} \cdot f + K_m \cdot D_m \cdot n_m \cdot f$
  where $K$ is a technology constant, $D$ the average toggle rate, $n$ the number of logic cells and $f$ the design operating frequency
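Here is a direct transcription of the two-term core power model. The parameter names are ours, and the constants are placeholders to be fitted. Leaving the memory term at zero recovers the single-term model, which the slide finds unsuited to the target FPGA.

```python
def core_power(p_stat, f, k_c, d_lc, n_lc, k_m=0.0, d_m=0.0, n_m=0):
    """P_core = P_stat + K_c*D_lc*n_lc*f + K_m*D_m*n_m*f.

    The logic term uses technology constant k_c, average toggle rate d_lc
    and n_lc logic cells; the memory term does the same for the n_m LCs
    used as distributed RAM. With the memory term at zero this is the
    single-term model."""
    return p_stat + (k_c * d_lc * n_lc + k_m * d_m * n_m) * f
```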

19 Core power model (2/4)
- Control logic is not modeled: it is too complex to estimate and does not contribute significantly to power
- Core power depends on:
  - the number of PEs, which depends on the tiling and clustering parameters
  - the area used by each PE, which depends on the clustering parameters
  - the average toggle rate of the PE datapath and local memory (an application constant)

20 Core power model (3/4)
- Memory resource usage: LCs used as distributed memory (16×1 bits); the datapath is a design constant (library based)
- Area cost for a PE array, expressed in terms of the number of PEs, the datapath functional cost, the clustering parameter along processor-space axis j and the register width along processor-space axis k
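The exact area formula did not survive the transcript, so this is a hypothetical sketch only. It assumes each PE keeps one buffer per PE-space axis k, made of w_k-bit registers. The buffer depth is assumed to be the product of the clustering factors along the other axes j ≠ k, and the buffer is mapped onto 16×1-bit LC RAMs. Only the 16×1 distributed-memory granularity comes from the slide.

```python
from math import ceil, prod

LUTRAM_BITS = 16  # one LC configured as a 16x1-bit distributed RAM


def lutram_cells(depth, width):
    """LCs needed for a depth x width buffer built from 16x1 RAMs."""
    return width * ceil(depth / LUTRAM_BITS)


def pe_array_area(n_pe, datapath_cost, c, w):
    """Hypothetical area model (in LCs): each PE holds, for every
    PE-space axis k, a buffer of w[k]-bit registers whose depth is the
    product of the clustering factors along the other axes j != k."""
    mem = 0
    for k, width in enumerate(w):
        depth = prod(cj for j, cj in enumerate(c) if j != k)
        mem += lutram_cells(depth, width)
    return n_pe * (datapath_cost + mem)
```

The 16-deep granularity means small clusters waste part of each LC. Memory cost therefore grows in steps rather than smoothly with the clustering factors.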

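21 Core power model (4/4)
- Energy cost for the whole loop nest: $E_c = P_c \cdot n_{cycle} \cdot T_{cycle}$, where we consider $n_{cycle} = V_{calc} / n_p$, with $V_{calc}$ the total loop computation volume
- Total core energy cost, a function of the average toggle rate and $V_{calc}$: the energy does not depend on $n_p$, because the dynamic part of $P_c$ grows linearly with $n_p$ while $n_{cycle}$ shrinks as $1/n_p$ (the static term aside)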
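This small sketch shows why the dynamic core energy does not depend on n_p. It assumes each PE dissipates a fixed energy per cycle (`e_pe_cycle`, hypothetical) and that T_cycle = 1/f. Only the static term, which is paid on every cycle, still shrinks as more PEs shorten the run.

```python
def core_energy(p_stat, e_pe_cycle, n_p, v_calc, f):
    """E_c = P_c * n_cycle * T_cycle, with n_cycle = V_calc / n_p and
    T_cycle = 1 / f. The dynamic power is modeled as n_p PEs, each
    dissipating e_pe_cycle joules per cycle (an assumption)."""
    p_c = p_stat + n_p * e_pe_cycle * f  # static + dynamic core power
    n_cycle = v_calc / n_p               # cycles to run the whole loop nest
    t_cycle = 1.0 / f
    return p_c * n_cycle * t_cycle


if __name__ == "__main__":
    for n_p in (4, 16, 64):
        # With p_stat = 0 each line prints e_pe_cycle * v_calc (up to rounding).
        print(n_p, core_energy(0.0, 2e-9, n_p, 1e6, 5e7))
```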

22 Content
- Context and motivations
- Target architectures
- Partitioning
- Modeling power
- Experimental results: model validation, extrapolations
- Conclusion

23 I/O power model results

24 Core power model results

25 System power model

26 Content
- Context and motivations
- Target architectures
- Partitioning
- Modeling power
- Experimental results
- Conclusion: solving the optimization problem (Lagrange multipliers), custom cache for embedded CPUs, extension to SAREs (affine dependencies)

27 Conclusion
- Models match experiments: cheap measurement setup; many components contribute to the current dissipation (LEDs, PCI, etc.)
- Observations: the trade-off evolves with technology; more sensitive for ASICs?

28 Future work (1/2)
- Formulation of the optimization problem: minimize energy per iteration under constraints on performance and area
- Analytical solution? Lagrange multipliers: no closed form for n > 3, but fast numerical methods exist
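The slide points to Lagrange multipliers and fast numerical methods. As a stand-in, this sketch uses plain exhaustive search over integer tile sizes, which is a different, brute-force technique. The cost Σ_i w_i / t_i is a hypothetical surface-to-volume proxy for the I/O energy per iteration. The bounds on the number of PEs stand in for the area and performance constraints, and `best_tiling` is a hypothetical name.

```python
from itertools import product
from math import prod


def best_tiling(weights, max_pes, min_pes, max_side):
    """Exhaustive search over integer tile sizes t: minimise the proxy
    cost sum_i weights[i] / t_i subject to min_pes <= prod(t) <= max_pes.
    Returns (cost, t), or None if no tiling is feasible."""
    best = None
    for t in product(range(1, max_side + 1), repeat=len(weights)):
        if not (min_pes <= prod(t) <= max_pes):
            continue  # violates the area or performance bound
        cost = sum(w / ti for w, ti in zip(weights, t))
        if best is None or cost < best[0]:
            best = (cost, t)
    return best


if __name__ == "__main__":
    print(best_tiling((1, 4), max_pes=16, min_pes=1, max_side=16))
```

For this proxy, the continuous Lagrange condition makes w_i / t_i equal for all i, so t_i ∝ w_i. With weights (1, 4) and at most 16 PEs the search returns (2, 8), which matches that rule. Exhaustive search grows exponentially with the number of dimensions, which is why numerical methods matter beyond toy sizes.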

29 Future work (2/2)
- Model for embedded CPUs: trade-off between cache size and memory accesses; determine the optimal cache size and the associated tiling parameters
- Extension to SAREs? Affine dependencies, more general loops

