CoNA: Dynamic Application Mapping for Congestion Reduction in Many-Core Systems
2012 IEEE 30th International Conference on Computer Design (ICCD)
M. Fattah, M. Ramirez, M. Daneshtalab, P. Liljeberg, J. Plosila
Outline
- Introduction
- Mapping Problem and Evaluation Metrics
- Contiguous Neighborhood Allocation Mapping
- Experimental Setup
- Results and Analysis
- Conclusion
Introduction
- An efficient algorithm for the run-time application mapping problem
- Three novel contributions:
  - First node selection
  - First task selection
  - Mapping the remaining tasks onto the nearest neighborhood
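Mapping Problem and Evaluation Metrics
- Applications: $A_p = TG(T, E)$, with tasks $t_i \in T$ and edges $e_{i,j} \in E$
- Communication platform: $AG(\tilde{N}, L)$, with nodes $\tilde{n}_{i,j} = \{(r_{i,j}, pe_{i,j}) \mid \tilde{n}_{i,j} \in \tilde{N},\ 0 \le i < M,\ 0 \le j < N\}$
- Manhattan distance: $MD(\tilde{n}_{i,j}, \tilde{n}_{m,n}) = |i - m| + |j - n|$
- Mapping function: $map: T \to \tilde{N}$, s.t. $map(t_i) = \tilde{n}_{m,n}$; $\forall t_i \in T,\ \exists\, \tilde{n}_{m,n} \in \tilde{N}$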
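The definitions above can be sketched in a few lines of Python. This is only an illustration: encoding a node as an `(i, j)` tuple, the helper name `is_valid_mapping`, and the example mapping are assumptions, not part of the original slides.

```python
from typing import Dict, Iterable, Tuple

Node = Tuple[int, int]  # (i, j) mesh coordinates; an assumed encoding of a node


def manhattan_distance(a: Node, b: Node) -> int:
    """MD between two mesh nodes: |i - m| + |j - n|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_valid_mapping(mapping: Dict[str, Node], tasks: Iterable[str],
                     rows: int, cols: int) -> bool:
    """Check that every task is mapped to a node inside the rows x cols mesh."""
    return all(
        t in mapping
        and 0 <= mapping[t][0] < rows
        and 0 <= mapping[t][1] < cols
        for t in tasks
    )


# Hypothetical mapping of three tasks onto a 3 x 4 mesh.
example_mapping = {"t0": (0, 0), "t1": (0, 1), "t2": (2, 3)}
print(manhattan_distance(example_mapping["t0"], example_mapping["t2"]))  # prints 5
```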
Evaluation Metrics
- Packet latency
- Average Manhattan Distance
- Average Weighted Manhattan Distance
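The slide names the two distance metrics without giving their formulas. The sketch below assumes one plausible reading: the average Manhattan distance is the mean distance over an application's edges, and the weighted variant weights each edge's distance by its communication volume. The function names and the example data are hypothetical.

```python
def manhattan_distance(a, b):
    """MD between two (i, j) mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_md(edges, mapping):
    """Mean Manhattan distance over all (src, dst, weight) edges (assumed definition)."""
    dists = [manhattan_distance(mapping[s], mapping[d]) for s, d, _ in edges]
    return sum(dists) / len(dists)


def average_weighted_md(edges, mapping):
    """Weight-averaged Manhattan distance over all edges (assumed definition)."""
    total_weight = sum(w for _, _, w in edges)
    weighted = sum(w * manhattan_distance(mapping[s], mapping[d])
                   for s, d, w in edges)
    return weighted / total_weight


# Hypothetical application: two edges with weights in flits.
edges = [("t0", "t1", 4), ("t1", "t2", 12)]
mapping = {"t0": (0, 0), "t1": (0, 1), "t2": (2, 1)}
print(average_md(edges, mapping))           # prints 1.5
print(average_weighted_md(edges, mapping))  # prints 1.75
```

With these numbers the heavier edge is also the longer one, so the weighted average (1.75) exceeds the plain average (1.5).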
Evaluation Metrics (cont.)
- Mapped Region Dispersion
- Internal Congestion Ratio (ICR): the number of an application's edges that use the same channel, relative to its total number of edges
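One possible reading of the ICR definition can be sketched as follows. The slide does not state the routing algorithm or the exact counting rule, so this sketch assumes deterministic XY routing on the mesh and counts an edge as congested when its route shares at least one directed channel with another edge of the same application. All names are hypothetical.

```python
from collections import Counter


def xy_route(src, dst):
    """Directed channels used when routing first along x, then along y (assumed)."""
    channels = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        channels.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        channels.append(((x, y), (x, ny)))
        y = ny
    return channels


def internal_congestion_ratio(edges, mapping):
    """Fraction of edges whose route shares a channel with another edge."""
    routes = [xy_route(mapping[s], mapping[d]) for s, d in edges]
    usage = Counter(ch for route in routes for ch in route)
    shared = sum(1 for route in routes if any(usage[ch] > 1 for ch in route))
    return shared / len(edges)


# Hypothetical application: t0->t2 and t1->t2 share the channel (1,0)->(2,0).
mapping = {"t0": (0, 0), "t1": (1, 0), "t2": (2, 0), "t3": (0, 1)}
edges = [("t0", "t2"), ("t1", "t2"), ("t0", "t3")]
icr = internal_congestion_ratio(edges, mapping)  # 2 of 3 edges share a channel
```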
Contiguous Neighborhood Allocation Mapping (CoNA)
- Three steps:
  - First node selection
  - Choosing the first task of the application
  - Contiguous neighborhood allocation
CoNA (cont.)
- First node selection
  - The nearest node to the central manager among the nodes with the largest number of available neighbors
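A minimal sketch of this selection rule, under stated assumptions: neighbors are the four directly adjacent mesh nodes, "nearest" means smallest Manhattan distance, and ties are broken by coordinate order. None of these details are given on the slide, and the function names are hypothetical.

```python
def available_neighbors(node, free):
    """Count free nodes among the four direct mesh neighbors (assumed neighborhood)."""
    i, j = node
    return sum((i + di, j + dj) in free
               for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))


def select_first_node(free, cm):
    """Nearest free node to the central manager among those with most free neighbors."""
    best = max(available_neighbors(n, free) for n in free)
    candidates = [n for n in free if available_neighbors(n, free) == best]
    # Tie-break on coordinates so the result is deterministic (an assumption).
    return min(candidates,
               key=lambda n: (abs(n[0] - cm[0]) + abs(n[1] - cm[1]), n))


# Hypothetical 4 x 4 mesh with the central manager occupying node (0, 0).
cm = (0, 0)
free = {(i, j) for i in range(4) for j in range(4)} - {cm}
first_node = select_first_node(free, cm)  # (1, 1): four free neighbors, closest to CM
```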
CoNA (cont.)
- Choosing the first task of the application
  - Selects the task with the largest number of edges
  - The most intensive communication
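The first-task rule amounts to picking the task with the highest degree in the task graph. The sketch below uses a hypothetical six-task graph; the tie-breaking rule (lowest task name) is an assumption, since the slide does not specify one.

```python
from collections import Counter


def select_first_task(edges):
    """Return the task that appears in the largest number of edges."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # Iterate in sorted order so ties resolve to the lowest task name (assumed).
    return max(sorted(degree), key=degree.__getitem__)


# Hypothetical task graph in which t4 has three edges, more than any other task.
edges = [("t4", "t1"), ("t4", "t2"), ("t4", "t5"), ("t1", "t0"), ("t2", "t3")]
first_task = select_first_task(edges)  # "t4"
```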
CoNA (cont.)
- Contiguous neighborhood allocation
  - The task graph is traversed in breadth-first order, and each task is paired with its predecessor. For the example graph, the resulting list is: {(t1, t4), (t2, t4), (t5, t4), (t0, t1), (t3, t2)}
  - For each task, select the node that fits in the smallest square with the first node
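The allocation step can be sketched as below. The task graph is a hypothetical one chosen so that the breadth-first traversal from t4 reproduces the pair list on the slide. The placement rule is one reading of the slide, not a confirmed detail: candidates are the free nodes closest to the predecessor's node, and among them the one inside the smallest square centered on the first node is taken, with ties broken by coordinate order.

```python
from collections import deque


def bfs_pairs(adjacency, first_task):
    """Breadth-first traversal yielding (task, predecessor) pairs."""
    visited = {first_task}
    queue = deque([first_task])
    pairs = []
    while queue:
        parent = queue.popleft()
        for child in adjacency[parent]:
            if child not in visited:
                visited.add(child)
                pairs.append((child, parent))
                queue.append(child)
    return pairs


def place_task(parent_node, first_node, free):
    """Pick a free node nearest to the parent that fits the smallest square (assumed rule)."""
    def md(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def square(n):
        # Half-width of the smallest square centered on first_node that contains n.
        return max(abs(n[0] - first_node[0]), abs(n[1] - first_node[1]))

    nearest = min(md(n, parent_node) for n in free)
    candidates = [n for n in free if md(n, parent_node) == nearest]
    return min(candidates, key=lambda n: (square(n), n))


def cona_map(adjacency, first_task, first_node, free):
    """Map all tasks, starting from first_task placed on first_node."""
    free = set(free) - {first_node}
    mapping = {first_task: first_node}
    for task, parent in bfs_pairs(adjacency, first_task):
        node = place_task(mapping[parent], first_node, free)
        mapping[task] = node
        free.remove(node)
    return mapping


# Hypothetical task graph consistent with the pair list on the slide.
adjacency = {
    "t4": ["t1", "t2", "t5"], "t1": ["t4", "t0"], "t2": ["t4", "t3"],
    "t5": ["t4"], "t0": ["t1"], "t3": ["t2"],
}
pairs = bfs_pairs(adjacency, "t4")
mesh = {(i, j) for i in range(4) for j in range(4)}
mapping = cona_map(adjacency, "t4", (1, 1), mesh)
```

On an empty 4 x 4 mesh with the first node at (1, 1), all six tasks end up inside the 3 x 3 square around the first node.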
Experimental Setup
- NoC platform
- Plasma processor
- Local memory
- DMA controller
- Tra-NI interface
- Central manager (CM)
- The maximum number of applications that can be injected into the system per second is denoted as $\lambda_{full}$
Experimental Setup (cont.)
- Simulation: to extract packet latency
- FPGA: to investigate CoNA time complexity
  - Xilinx ML605
Experimental Setup (cont.)
- Application set
  - Task graphs are randomly generated (set1) using the Task graph generator
  - Number of nodes: 4–11
  - Weight of edges: 4–16 flits
  - For set16, the edge weights of every application are uniformly multiplied by 16
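To make the two application sets concrete, the sketch below generates a random task graph within the stated ranges and derives a set16-style variant from it. It is a stand-in written with Python's `random` module, not the Task graph generator tool used in the experiments; the tree-shaped connectivity and all function names are assumptions.

```python
import random


def random_task_graph(rng):
    """Random connected task graph: 4-11 tasks, edge weights of 4-16 flits."""
    n = rng.randint(4, 11)
    edges = {}
    for task in range(1, n):
        parent = rng.randrange(task)  # link to an earlier task, keeping the graph connected
        edges[(parent, task)] = rng.randint(4, 16)
    return n, edges


def scale_weights(edges, factor=16):
    """Multiply every edge weight by the same factor (set1 -> set16)."""
    return {edge: weight * factor for edge, weight in edges.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
num_tasks, set1_edges = random_task_graph(rng)
set16_edges = scale_weights(set1_edges)
```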
Results and Analysis
- Packet latency evaluation
- Time complexity evaluation
Packet latency evaluation
Time complexity evaluation
Conclusion
- An efficient run-time task allocation algorithm is proposed
- It reduces both internal and external congestion
- Three novel contributions
Thank you!