Tile Size Selection for Low-Power Tile-based Architectures Michael Brown
Observation ● Some types of applications run better over multiple cores than other types
Goal ● Given one or more applications and an embedded performance target ● Find the computational granularity that minimizes power consumption
Refresh of Embedded DSPs ● No need to exceed performance target ● (paper) Not concerned with area ● Want lowest power possible
Steps ● Generate a set of tile architectures ● Determine power efficiency of each ● Parallelize and profile algorithm ● Compare costs
Computational Granularity ● Defined as maximum arith. ops/cycle (where sources are local) ● Example configurations, written as (# of tiles):(# of operations/cycle/tile): 1:32, 4:8, 32:1
Generating Architectures ● CMP with 1-32 cores (tiles) that maintains a constant computational width ● Used Synchroscalar to create tiles based on the Blackfin DSP (Analog Devices)
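A minimal sketch of the design space described above: every configuration keeps the same total computational width while the tile count varies from 1 to 32. The function name and the restriction to power-of-two tile counts are assumptions made for illustration; the slides only show 1:32, 4:8 and 32:1 explicitly.

```python
def tile_configurations(total_width=32):
    """Return (tiles, ops_per_cycle_per_tile) pairs with constant total width.

    Assumption: tile counts are powers of two between 1 and total_width.
    """
    configs = []
    tiles = 1
    while tiles <= total_width:
        # Total width stays constant: tiles * ops_per_tile == total_width
        configs.append((tiles, total_width // tiles))
        tiles *= 2
    return configs


if __name__ == "__main__":
    for tiles, ops in tile_configurations():
        print(f"{tiles}:{ops}")
```

Under these assumptions the list includes the three configurations named on the granularity slide (1:32, 4:8, 32:1) along with the intermediate points.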
Determine Power Efficiency ● Large tiles have high switching capacitance per operation ● Small tiles have poor data locality requiring extra cycles
Parallelize Algorithms ● Choose static media-based apps. ● Allows a data flow graph to be built that maximizes parallelism ● Data flow graph also allows profiling communication between parallel elements (recursive bisection algorithm)
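To illustrate how a data flow graph can be split across tiles and its inter-tile communication counted, here is a deliberately naive sketch. It halves an ordered node list recursively and counts edges that cross tile boundaries. This is not the partitioner used in the work: a real recursive bisection would choose each split to minimize the cut. The graph, function names and node ordering are all hypothetical.

```python
def recursive_bisection(nodes, parts):
    """Split an ordered node list into `parts` groups (parts must be a power of two).

    Assumption: the split point is simply the midpoint of the ordering.
    """
    if parts == 1:
        return [list(nodes)]
    mid = len(nodes) // 2
    return (recursive_bisection(nodes[:mid], parts // 2)
            + recursive_bisection(nodes[mid:], parts // 2))


def cut_edges(edges, groups):
    """Count edges whose endpoints land in different groups (tiles)."""
    owner = {node: i for i, group in enumerate(groups) for node in group}
    return sum(1 for src, dst in edges if owner[src] != owner[dst])


# Hypothetical data flow graph: an 8-node chain
nodes = list(range(8))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]

if __name__ == "__main__":
    for parts in (1, 2, 4, 8):
        groups = recursive_bisection(nodes, parts)
        print(parts, "tiles:", cut_edges(edges, groups), "inter-tile edges")
```

For this toy chain the number of inter-tile edges rises as the tile count grows, which is the communication cost that the profiling step is meant to expose.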
Compare Costs ● Cost of hardware – SRAM, control logic, computational units – Register file – Interconnect ● Cost of software – Inter-tile communication
Compare Costs - Hardware ● SRAM, control logic, comp. units – Area and power grow linearly with tile size ● Register file – Capacity and number of ports grow linearly – Area and power grow quadratically ● Interconnect wiring – Area and power grow quadratically like the register file
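The scaling rules above can be turned into a rough relative cost model. The sketch below assumes a fixed total width of 32 ops/cycle split evenly across tiles, with one term that grows linearly with tile width and one that grows quadratically. The coefficients are hypothetical placeholders, not measured values, and inter-tile communication is ignored here.

```python
def relative_hardware_cost(tiles, total_width=32,
                           linear_coeff=1.0, quadratic_coeff=0.05):
    """Relative hardware cost of `tiles` equal tiles with constant total width.

    Assumptions: coefficients are illustrative only; software
    (inter-tile communication) costs are not modeled.
    """
    width = total_width / tiles
    # SRAM, control logic, computational units: linear in tile width
    linear_part = linear_coeff * width
    # Register file and interconnect wiring: quadratic in tile width
    quadratic_part = quadratic_coeff * width ** 2
    return tiles * (linear_part + quadratic_part)


if __name__ == "__main__":
    for tiles in (1, 4, 32):
        print(tiles, "tiles:", round(relative_hardware_cost(tiles), 2))
```

With constant total width the linear term sums to the same value for every configuration, so in this model the difference between configurations comes entirely from the quadratic term, which shrinks as tiles get smaller.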
Compare Costs - Software ● Case 1: minimal communication – Power savings gained by using smaller tiles are passed on as system power savings ● Case 2: heavy communication – Power savings gained by using smaller tiles are lost to the extra cycles required to maintain cache coherency between tiles – Needs a higher frequency to meet the same wall-clock time – The higher frequency in turn requires voltage scaling
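The two cases can be compared with the standard dynamic power relation P ∝ C·V²·f. The sketch below assumes that supply voltage must scale linearly with frequency, which is a simplification, and uses a hypothetical switching-capacitance ratio for the smaller tiles. None of the numbers come from the slides.

```python
def relative_dynamic_power(extra_cycle_fraction, capacitance_ratio=1.0):
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    Assumptions: voltage scales linearly with frequency; capacitance_ratio
    is the switched capacitance per operation relative to the baseline.
    """
    # Extra communication cycles must fit in the same wall-clock time,
    # so the clock frequency rises by the same fraction.
    freq_ratio = 1.0 + extra_cycle_fraction
    # Simplifying assumption: voltage tracks frequency one-to-one.
    volt_ratio = freq_ratio
    return capacitance_ratio * volt_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Case 1: minimal communication, smaller tiles switch less capacitance
    print("case 1:", relative_dynamic_power(0.0, capacitance_ratio=0.6))
    # Case 2: heavy communication adds 25% extra cycles (hypothetical)
    print("case 2:", relative_dynamic_power(0.25, capacitance_ratio=0.6))
```

Under these assumptions, a 25% cycle overhead is enough to turn a 40% capacitance saving into a net power increase, because power grows with the cube of the frequency ratio.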
Compare Costs
Comparing Additional Parameters - Communication ● Figure: application communication for differing interconnects and tile sizes
Comparing Additional Parameters - Power ● Figure: application power usage for differing interconnects and tile sizes
Results