
Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs Sri Hari Krishna Narayanan, Mahmut Kandemir, Ozcan Ozturk Embedded Mobile Computing.


1 Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs
Sri Hari Krishna Narayanan, Mahmut Kandemir, Ozcan Ozturk
Embedded Mobile Computing Center (EMC²), The Pennsylvania State University
International Symposium on Quality Electronic Design, March 27-29, 2006, San Jose

2 Introduction to the Problem
Increasing transistor counts and rising clock frequencies lead to increased power dissipation. Continued scaling, coupled with increased power dissipation, has led to increased power density. Increased power density in turn causes growing thermal problems that require solutions.

3 Solutions to Thermal Issues in Multiprocessor Environments
Dynamic thermal management:
- Heo et al., ISLPED 2003: activity migration between two processors.
- Shang et al., MICRO 2003: communication is routed away from a potential hotspot; upon a thermal emergency, communication is throttled.

4 Problems with the Current Solutions
- Repeated suspension of execution or communication leads to performance loss.
- It is therefore beneficial to reduce the number of suspensions. How?
  - Reduce the number of thermal emergencies by reducing the power density.
  - Reduce the density by changing which processors are active and how much computation each one performs, within certain bounds.

5 Default Code Mapping
[Figure: Code -> Default Code Mapping Module -> Default (performance-oriented) mapping]

Code:

    #include <stdio.h>

    #define N 5000
    #define ITER 1

    int du1[N], du2[N], du3[N];
    int au1[N][N][2], au2[N][N][2], au3[N][N][2];
    int a11 = 1, a12 = -1, a13 = -1;
    int a21 = 2, a22 = 3, a23 = -3;
    int a31 = 5, a32 = -5, a33 = -2;
    int l;
    int sig = 1;

    /* Assumed prototype: mp_numthreads() is provided by the parallelizing
       compiler's runtime library, not by standard C. */
    int mp_numthreads(void);

    int main(void)
    {
        int kx, ky, kz;

        printf("Thread:%d\n", mp_numthreads());

        /* Initialization loop */
        for (kx = 0; kx < N; kx = kx + 1) {
            for (ky = 0; ky < N; ky = ky + 1) {
                for (kz = 0; kz <= 1; kz = kz + 1) {
                    au1[kx][ky][kz] = 1;
                    au2[kx][ky][kz] = 1;
                    au3[kx][ky][kz] = 1;
                }
            }
        }
        return 0;
    } /* main */

Default (performance-oriented) mapping:
- Active processors are close to each other.
- Communication cost is lower.
- Power density is higher, so thermal emergencies are more frequent.
- We propose to change this mapping into a temperature-aware one.
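A minimal sketch of the trade-off on this slide, assuming unit power per active processor, a hypothetical chain of four communicating processors, and hop counts under x-y routing. The names Tile, hops, comm_cost, bbox_area and power_density are illustrative only, not from the paper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tile coordinate on a 2-D mesh NoC. */
typedef struct { int x, y; } Tile;

/* Hop count between two tiles under x-y routing (Manhattan distance). */
static int hops(Tile a, Tile b) {
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Communication cost: sum over processor pairs of volume * hop count.
   vol is an n x n matrix stored row-major; only the upper triangle is read. */
static long comm_cost(const Tile *map, int n, const int *vol) {
    long cost = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            cost += (long)vol[i * n + j] * hops(map[i], map[j]);
    return cost;
}

/* Area, in tiles, of the smallest rectangle enclosing all active tiles. */
static int bbox_area(const Tile *map, int n) {
    assert(n > 0);
    int x0 = map[0].x, x1 = map[0].x, y0 = map[0].y, y1 = map[0].y;
    for (int i = 1; i < n; i++) {
        if (map[i].x < x0) x0 = map[i].x;
        if (map[i].x > x1) x1 = map[i].x;
        if (map[i].y < y0) y0 = map[i].y;
        if (map[i].y > y1) y1 = map[i].y;
    }
    return (x1 - x0 + 1) * (y1 - y0 + 1);
}

/* Overall power density: total active power divided by bounding-box area. */
static double power_density(const Tile *map, int n, double power_per_proc) {
    return n * power_per_proc / bbox_area(map, n);
}
```

For a chain with volume 10 between neighbours, a compact 2 × 2 placement costs 30 with density 1.0, while placing the same four processors at the corners of a 3 × 3 box doubles the cost to 60 and lowers the density to 4/9.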

6 Integer Linear Programming Model: Phase 1
- Enlarges the bounding box of the active processors, subject to a limit on communication cost, and hence reduces the overall power density.
[Figure: initial mapping vs. mapping after Phase 1]

7 Integer Linear Programming Model: Constraints*
- The number of active processors remains constant.
- The communication cost of the new mapping must not exceed the communication cost of the old mapping plus the allowed relaxation.
- The area of the bounding box must be maximized.
* Exact mathematical expressions are given in the paper.
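The Phase 1 formulation can be illustrated on a toy instance. This sketch replaces the ILP with exhaustive search, which is only practical for a handful of processors on a 5 × 5 mesh; the chain-shaped communication pattern, the processor count and all helper names are assumptions, not the paper's model.

```c
#include <assert.h>
#include <stdlib.h>

#define MESH  5   /* 5 x 5 mesh, as in the parameter table */
#define NPROC 4   /* active processors; the count stays constant (assumed value) */

typedef struct { int x, y; } Tile;

/* x-y routing hop count between two tiles (Manhattan distance). */
static int hops(Tile a, Tile b) { return abs(a.x - b.x) + abs(a.y - b.y); }

/* Volume-weighted communication cost of a mapping. */
static long comm_cost(const Tile *m, int vol[NPROC][NPROC]) {
    long c = 0;
    for (int i = 0; i < NPROC; i++)
        for (int j = i + 1; j < NPROC; j++)
            c += (long)vol[i][j] * hops(m[i], m[j]);
    return c;
}

/* Area, in tiles, of the bounding box of the active processors. */
static int bbox_area(const Tile *m) {
    int x0 = m[0].x, x1 = m[0].x, y0 = m[0].y, y1 = m[0].y;
    for (int i = 1; i < NPROC; i++) {
        if (m[i].x < x0) x0 = m[i].x;
        if (m[i].x > x1) x1 = m[i].x;
        if (m[i].y < y0) y0 = m[i].y;
        if (m[i].y > y1) y1 = m[i].y;
    }
    return (x1 - x0 + 1) * (y1 - y0 + 1);
}

/* Try every assignment of processors k..NPROC-1 to free tiles and keep the
   feasible mapping (cost <= budget) with the largest bounding box. */
static void search(int k, Tile *cur, int used[MESH][MESH], int vol[NPROC][NPROC],
                   long budget, Tile *best, int *best_area) {
    if (k == NPROC) {
        int a = bbox_area(cur);
        if (a > *best_area && comm_cost(cur, vol) <= budget) {
            *best_area = a;
            for (int i = 0; i < NPROC; i++) best[i] = cur[i];
        }
        return;
    }
    for (int x = 0; x < MESH; x++)
        for (int y = 0; y < MESH; y++) {
            if (used[x][y]) continue;
            used[x][y] = 1;
            cur[k] = (Tile){x, y};
            search(k + 1, cur, used, vol, budget, best, best_area);
            used[x][y] = 0;
        }
}

/* Phase-1-style remapping: maximize bounding-box area subject to
   cost(new) <= cost(old) + relax. Returns the best area; best gets the mapping. */
static int phase1_remap(const Tile *old, int vol[NPROC][NPROC], long relax, Tile *best) {
    assert(relax >= 0);
    int used[MESH][MESH] = {{0}};
    Tile cur[NPROC];
    int best_area = 0;
    search(0, cur, used, vol, comm_cost(old, vol) + relax, best, &best_area);
    return best_area;
}
```

Starting from a compact 2 × 2 chain mapping (cost 30 with volume 10 per link) and zero relaxation, the search finds a 6-tile L-shaped placement at the same cost; a relaxation of 10 grows the box to 9 tiles.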

8 Phase 1
[Figure: Code (the benchmark fragment from slide 5) -> Default Code Mapping Module -> Default (performance-oriented) mapping -> ILP Module -> Phase 1 mapping with reduced overall power density]
- Overall density is reduced.
- Communication cost is increased.

9 Integer Linear Programming Model: Phase 2
- Given the Phase 1 mapping, which has reduced overall power density, a new mapping with reduced local power density is generated.
[Figure: mapping after Phase 1 vs. mapping after Phase 2]

10 Integer Linear Programming Model: Constraints*
- Each previously active processor with high power density is split.
- Each split processor performs the same communication as the original processor.
- The area of the bounding box remains constant.
- The total power spent within the bounding box is minimized by minimizing the communication path lengths.
* Exact mathematical expressions are given in the paper.
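A sketch of the splitting step for a single hot processor, again using brute force in place of the ILP. It assumes each half exchanges the same volume with each partner as the original processor did, that the hot processor's old tile may be reused, and that occupied marks tiles held by other active processors; Box, half_cost and phase2_split are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MESH 5

typedef struct { int x, y; } Tile;
typedef struct { int x0, y0, x1, y1; } Box;   /* inclusive bounding box */

/* x-y routing hop count (Manhattan distance). */
static int hops(Tile a, Tile b) { return abs(a.x - b.x) + abs(a.y - b.y); }

/* Cost of one half at tile t: it talks to every partner with the original volume. */
static long half_cost(Tile t, const Tile *partners, const int *vol, int np) {
    long c = 0;
    for (int i = 0; i < np; i++) c += (long)vol[i] * hops(t, partners[i]);
    return c;
}

/* Places the two halves of a split hot processor on the two cheapest free tiles
   inside box, so the bounding box is not enlarged. Returns the total
   communication cost, or -1 if the box has fewer than two free tiles. */
static long phase2_split(Box box, int occupied[MESH][MESH], const Tile *partners,
                         const int *vol, int np, Tile *half_a, Tile *half_b) {
    assert(box.x0 >= 0 && box.y0 >= 0 && box.x1 < MESH && box.y1 < MESH);
    long best1 = LONG_MAX, best2 = LONG_MAX;
    Tile t1 = {0, 0}, t2 = {0, 0};
    for (int x = box.x0; x <= box.x1; x++)
        for (int y = box.y0; y <= box.y1; y++) {
            if (occupied[x][y]) continue;
            Tile t = {x, y};
            long c = half_cost(t, partners, vol, np);
            if (c < best1) { best2 = best1; t2 = t1; best1 = c; t1 = t; }
            else if (c < best2) { best2 = c; t2 = t; }
        }
    if (best2 == LONG_MAX) return -1;
    *half_a = t1;
    *half_b = t2;
    return best1 + best2;
}
```

Because the halves do not communicate with each other in this simplified model, their placements are independent, so picking the two cheapest free tiles is optimal here; the paper's ILP handles the general case.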

11 Phase 1 and Phase 2
[Figure: Code (the benchmark fragment from slide 5) -> Default Code Mapping Module -> Default (performance-oriented) mapping -> ILP Module (Phase 1) -> Overall power density reduced mapping -> ILP Module (Phase 2) -> Thermal-aware mapping]

12 Profiling
[Figure: Code (the benchmark fragment from slide 5) -> Profiling -> cycle times, chunk sizes, processor energy, communication, router energy -> HotSpot + Shutdown]
Implementation:
- HotSpot: temperature estimation tool developed by Skadron at UVa.
  T(i+Δ) = HS(T(i), floorplan, power, cycles, Δ)
- Shutdown: any processor or router that is too hot must be turned off to allow it to cool down.
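HotSpot itself solves a detailed thermal RC network over the floorplan. As a rough stand-in for the call HS(T(i), floorplan, power, cycles, Δ), this sketch advances a single lumped RC node by forward Euler and applies the shutdown rule. ThermalBlock, hs_step, stays_on and the parameter values are assumptions, and lateral heat flow between blocks is ignored.

```c
#include <assert.h>

/* Hypothetical lumped-RC parameters for one block (processor or router). */
typedef struct {
    double r_th;   /* thermal resistance to ambient, K/W (assumed) */
    double c_th;   /* thermal capacitance, J/K (assumed) */
} ThermalBlock;

/* Stand-in for HS(T(i), floorplan, power, cycles, delta): advances one block's
   temperature by delta_cycles at clock freq_hz using forward-Euler integration
   of C dT/dt = P - (T - T_amb) / R. Not HotSpot's actual model. */
static double hs_step(double t_now, ThermalBlock b, double power_w,
                      double t_amb, long delta_cycles, double freq_hz) {
    assert(b.r_th > 0 && b.c_th > 0 && freq_hz > 0);
    double dt = (double)delta_cycles / freq_hz;                  /* seconds */
    double dTdt = (power_w - (t_now - t_amb) / b.r_th) / b.c_th;
    return t_now + dt * dTdt;
}

/* Shutdown rule: a block hotter than the threshold is turned off (returns 0). */
static int stays_on(double temp_c, double threshold_c) {
    return temp_c <= threshold_c;
}
```

Forward Euler is only accurate (and stable) when the step dt is small relative to the time constant R·C, so Δ should be chosen accordingly. At 300 MHz, a Δ of 3 million cycles corresponds to 10 ms.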

13 Algorithm
1. Initially mark processors as active.
2. While (not all execution is completed) {
   2.a Time_Taken = Time_Taken + 1
   2.b If a processor was active:
       2.b.i Reduce the number of chunks it has to execute by 1.
   2.c Calculate the new current temperature for all processors:
       T(i+Δ) = HS(T(i), floorplan, power, cycles, Δ)
   2.d If a processor is too hot:
       2.d.i Mark it as inactive.
   2.e If a router is too hot:
       2.e.i Mark all processors communicating through it as inactive.
   2.f Determine all the active processors and routers for the next scheduling step.
   }
3. Return Time_Taken
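The scheduling loop above can be sketched in C as follows. To keep it self-contained, the HotSpot call in step 2.c is replaced by a toy rule (an active block heats by a fixed amount per step, an idle one cools toward ambient), and router temperatures are updated alongside processor temperatures so that step 2.e can be evaluated. The processor and router counts, the heating and cooling constants, and all names are assumptions; only the 86.12 °C threshold comes from the parameter table.

```c
#include <assert.h>

#define NP 4   /* processors in this toy example (assumed) */
#define NR 2   /* routers in this toy example (assumed) */

/* Toy thermal rule standing in for HotSpot (assumed constants, degrees C). */
#define AMBIENT   45.0
#define HEAT       2.0   /* rise per step while busy */
#define COOL       3.0   /* drop per step while idle */
#define THRESHOLD 86.12  /* threshold temperature from the parameter table */

typedef struct {
    int    chunks[NP];    /* chunks each processor still has to execute */
    int    uses[NP][NR];  /* 1 if processor p communicates through router r */
    double tp[NP];        /* processor temperatures */
    double tr[NR];        /* router temperatures */
} Sim;

static double next_temp(double t, int busy) {
    if (busy) return t + HEAT;
    t -= COOL;
    return t < AMBIENT ? AMBIENT : t;
}

/* Returns Time_Taken, or -1 if the work is not finished within max_steps. */
static long simulate(Sim *s, long max_steps) {
    assert(s && max_steps >= 0);
    int active[NP];
    long time_taken = 0;
    for (int p = 0; p < NP; p++) active[p] = s->chunks[p] > 0;   /* step 1 */
    for (;;) {
        int remaining = 0;
        for (int p = 0; p < NP; p++) remaining += s->chunks[p];
        if (remaining == 0) return time_taken;                    /* loop test */
        if (time_taken >= max_steps) return -1;
        time_taken++;                                              /* 2.a */
        int router_busy[NR] = {0};
        for (int p = 0; p < NP; p++) {
            if (!active[p]) continue;
            s->chunks[p]--;                                        /* 2.b.i */
            for (int r = 0; r < NR; r++)
                if (s->uses[p][r]) router_busy[r] = 1;
        }
        for (int p = 0; p < NP; p++) s->tp[p] = next_temp(s->tp[p], active[p]);  /* 2.c */
        for (int r = 0; r < NR; r++) s->tr[r] = next_temp(s->tr[r], router_busy[r]);
        for (int p = 0; p < NP; p++) {
            int ok = s->chunks[p] > 0 && s->tp[p] <= THRESHOLD;    /* 2.d */
            for (int r = 0; r < NR; r++)                            /* 2.e */
                if (s->uses[p][r] && s->tr[r] > THRESHOLD) ok = 0;
            active[p] = ok;                                        /* 2.f */
        }
    }
}
```

A processor that starts at 86 °C with two chunks needs three steps instead of two: it overheats after its first chunk, idles for one step to cool down, then finishes. The same happens when the processor is cool but its router starts at 86 °C.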

14 NoC Multi-core Model
- Routers are roughly 1/5th the area of a processor.
- Processors communicate using x-y routing, which is used to estimate the cost of communication.
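A small sketch of x-y (dimension-ordered) routing. A packet first travels along the x dimension and then along y, so the number of links crossed equals the Manhattan distance between source and destination. Weighting that by data volume gives one simple communication-cost estimate. The Node type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } Node;

/* Writes the routers visited by a packet from src to dst under x-y routing
   (first along x, then along y), including both end routers, into path.
   Returns the number of routers written; path must hold |dx|+|dy|+1 entries. */
static int xy_route(Node src, Node dst, Node *path) {
    int n = 0;
    Node cur = src;
    path[n++] = cur;
    while (cur.x != dst.x) { cur.x += (dst.x > cur.x) ? 1 : -1; path[n++] = cur; }
    while (cur.y != dst.y) { cur.y += (dst.y > cur.y) ? 1 : -1; path[n++] = cur; }
    assert(n == abs(dst.x - src.x) + abs(dst.y - src.y) + 1);
    return n;
}

/* Cost estimate: data volume times the number of links crossed. */
static long xy_cost(Node src, Node dst, long volume) {
    return volume * (abs(dst.x - src.x) + abs(dst.y - src.y));
}
```

Because the route is deterministic, the routers a communication passes through are known at compile time, which is what lets step 2.e of the algorithm identify the processors affected by a hot router.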

15 Parameters Used
Parameter               Brief Explanation
Processor               300 MHz, single issue
Chip area               8.2 mm × 7 mm
Threshold temperature   86.12 °C
W1
Mesh size               5 × 5 grid
Processor area          1.4 mm × 1.4 mm
Router area             0.24 mm × 1.44 mm
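The floorplan numbers are mutually consistent under one assumed layout, in which every row of the 5 × 5 mesh places a 0.24 mm-wide router beside each 1.4 mm processor: the width is then 5 × (1.4 + 0.24) = 8.2 mm and the height 5 × 1.4 = 7 mm. This layout is an inference, not something stated on the slide. The same table gives a router-to-processor area ratio of about 0.18, in line with the "roughly 1/5th" on slide 14. The function names below are hypothetical.

```c
#include <assert.h>

/* Values from the parameter table (mm). */
#define GRID     5
#define PROC_W   1.40
#define PROC_H   1.40
#define ROUTER_W 0.24
#define ROUTER_H 1.44

/* Assumed layout: each mesh row places a router beside every processor. */
static double chip_width_mm(void)  { return GRID * (PROC_W + ROUTER_W); }
static double chip_height_mm(void) { return GRID * PROC_H; }

/* Router area as a fraction of processor area: 0.3456 / 1.96. */
static double router_to_proc_area(void) {
    return (ROUTER_W * ROUTER_H) / (PROC_W * PROC_H);
}

/* Checks that the assumed layout reproduces the 8.2 mm x 7 mm chip. */
static void check_floorplan(void) {
    assert(chip_width_mm() > 8.19 && chip_width_mm() < 8.21);
    assert(chip_height_mm() > 6.99 && chip_height_mm() < 7.01);
}
```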

16 Benchmarks Used
Columns: Benchmark, Cycles (millions), Processor Energy (uJ), Router Energy (uJ)
adi      4381239551.1604697
eflux    5680918.11696502
tsf      17992548001.1515800
syntc1   4381239551.10
syntc2   5680918.185917071

17 Results: Thermal Emergencies

18 Results: Performance

19 Conclusions
- Dynamic thermal management leads to suspension of execution.
- We propose a novel compiler-directed mechanism to reduce the occurrence of thermal emergencies.
- Reducing the number of thermal emergencies improves performance.

20 Thank you!

