Smart Hill Climbing for Agile Dynamic Mapping in Many-Core Systems
Design Automation Conference (DAC), pp. 1-6, May 29-June 7 2013, Austin, TX, USA
M. Fattah,


Smart Hill Climbing for Agile Dynamic Mapping in Many-Core Systems
Design Automation Conference (DAC), pp. 1-6, May 29-June 7 2013, Austin, TX, USA
M. Fattah, M. Daneshtalab, P. Liljeberg, J. Plosila
Reporter: Hsuan-Ru Li

Outline
• Introduction
• Related Work
• Definitions
• Smart First Node Selection
• Results and Analysis
• Conclusions

Introduction
• Many-core systems will feature an extremely dynamic workload.
• An unpredictable sequence of different applications enters and leaves the system at run-time.
• A run-time system manager is required to efficiently map each incoming application onto the system resources.

Introduction (cont.)
• The central manager (CM) of the system decides on the appropriate node for each task.
• System performance is significantly influenced by the mapping approach used.
• A contiguous application mapping offers:
  • Relatively close nodes.
  • No fragmentation.

Introduction (cont.)
• Finding a convex region of nodes is a polynomial, $O(n^3)$, problem.
• The CoNA method significantly decreases this complexity.

Introduction (cont.)
• CoNA starts from a first node and attempts to map the application tasks onto a set of contiguous nodes around it.
• The first node should be selected so that it leads to the least fragmentation of the remaining nodes.
• A hill climbing search heuristic is adapted to find the optimum first node rapidly among all available nodes.

Introduction (cont.)
• Smart Hill Climbing (SHiC): HiC (hill climbing) with intelligence.
• $n$ is the given network size.
• Best case: $O(\sqrt{n})$
• Worst case: $O(n^2)$


Related Work
• Existing work proposes different methods for selecting the first node or, more generally, the mapping area.
• First node selection:
  • Nearest Neighbor (NN)
  • Best Neighbor (BN)
• Region selection:
  • Incremental (INC) approach
  • CoNA
  • VIP-supported approach


Definitions
• A homogeneous mesh-based NoC is considered in the definitions and experiments.
• Several evaluation metrics are defined as assessment tools to compare different algorithms.
• Mapping algorithms try to allocate system resources in an optimal way.

Definitions (cont.)
• Each application in the system is represented by a directed task graph $A_p = TG(T, E)$.
• Each vertex $t_i \in T$ is a task, and each edge $e_{i,j} \in E$ connects task $t_i$ to task $t_j$.
• The weight $w_{i,j}$ of edge $e_{i,j}$ is the amount of data transferred.
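The task graph definition can be made concrete with a small data structure. This is only an illustrative sketch: the class name `TaskGraph`, its fields, and the example weights are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Directed task graph TG(T, E); weights[(i, j)] plays the role of w_{i,j}."""
    tasks: set = field(default_factory=set)
    weights: dict = field(default_factory=dict)  # (i, j) -> data volume

    def add_edge(self, i, j, w):
        # Adding the edge e_{i,j} implicitly adds both endpoint tasks to T.
        self.tasks.update((i, j))
        self.weights[(i, j)] = w

    def total_volume(self):
        # Sum of w_{i,j} over all edges.
        return sum(self.weights.values())


# Hypothetical three-task application.
tg = TaskGraph()
tg.add_edge(0, 1, 4)
tg.add_edge(0, 2, 16)
tg.add_edge(1, 2, 2)
```

Storing weights in a dictionary keyed by `(i, j)` keeps the edge direction explicit, which matters because the graph is directed.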

Definitions (cont.)
• The architecture graph $AG(N, L)$ is a simple $M \times M$ 2D-mesh NoC with XY routing.
• It consists of a set of nodes $n_{x,y} \in N$ connected through communication links $l_k \in L$.
• Each node $n_{x,y}$ contains a 5-port router $r_{x,y}$ connected to the local processing element $pe_{x,y}$ through its local port.
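XY routing on a 2D mesh is dimension-ordered: a packet first travels along the X axis until it reaches the destination column, then along the Y axis. The sketch below illustrates this; the function name `xy_route` and the `(x, y)` tuple representation of nodes are assumptions made for the example.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst under XY routing.

    Nodes are (x, y) tuples; the X coordinate is corrected first, then Y.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

The number of hops on such a path equals the Manhattan distance between the two nodes, which is why Manhattan distance appears in the evaluation metrics.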



Definitions (cont.)
• Mapping is a one-to-one function from the set of application tasks $T$ to the set of NoC nodes $N$.

Definitions (cont.)
• The mapping function is started if and only if there are enough available nodes to map the application onto.

Definitions (cont.)
• The node onto which a task $t_i$ is mapped is denoted $nt_i$.
• The packet corresponding to the edge $e_{i,j}$ is denoted $pck_{i,j}$; it is sent from $nt_i$ to $nt_j$.
• The set of applications running on the system is denoted APPS.
• $|APPS|$ is the number of running applications.

Definitions (cont.)
Average Weighted Manhattan Distance (AWMD)
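The formula on this slide did not survive in the transcript. One natural reading of the name, stated here purely as an assumption, is the Manhattan distance between communicating tasks averaged with the communication volumes $w_{i,j}$ as weights. The sketch below computes that quantity; `awmd`, `weights`, and `placement` are hypothetical names.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def awmd(weights, placement):
    """Weighted average Manhattan distance (assumed reading of AWMD).

    weights:   {(i, j): w_ij} for each task-graph edge
    placement: {task: (x, y)} giving the node each task is mapped onto
    """
    total_weight = sum(weights.values())
    weighted_distance = sum(
        w * manhattan(placement[i], placement[j])
        for (i, j), w in weights.items()
    )
    return weighted_distance / total_weight


# Hypothetical application with three tasks placed on neighbouring nodes.
weights = {(0, 1): 4, (0, 2): 16, (1, 2): 2}
placement = {0: (0, 0), 1: (1, 0), 2: (0, 1)}
result = awmd(weights, placement)  # (4*1 + 16*1 + 2*2) / 22
```

Under this reading, a lower value means that heavily communicating tasks sit close together, which is consistent with the slides using AWMD as a power-related metric.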

Definitions (cont.)
• Congestion dramatically increases network latency and also increases the dynamic power consumption of the network.
• External congestion occurs when packets of different applications contend for a network channel.
• Internal congestion is caused by packets of the same application.

Definitions (cont.)
• To decrease congestion, the mapped area of an application should be as convex as possible and minimally fragmented.
• A metric over the allocated nodes is used to assess the Mapped Region Dispersion (MRD).

Definitions (cont.)
• The area with the smallest MRD is almost circular.
• However, a circular region generates more area fragmentation in the long term.
• The best mapped area is therefore a square.

Definitions (cont.)
• MRD of a square with $|T|$ nodes
• Normalized MRD (NMRD) metric

Definitions (cont.)
• An NMRD value of 1 means a square area.
• As NMRD increases, the mapped area becomes more fragmented and less similar to a square.
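The MRD and NMRD formulas are missing from the transcript, so the following is a sketch under stated assumptions rather than the authors' definition. It assumes that MRD is the average pairwise Manhattan distance between the allocated nodes, and that NMRD divides it by the MRD of a reference square holding the same number of nodes. The helper `square_region`, which fills a block of side $\lceil\sqrt{|T|}\rceil$ row by row, is also hypothetical.

```python
from itertools import combinations
from math import isqrt


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mrd(nodes):
    """Average pairwise Manhattan distance (assumed MRD); needs >= 2 nodes."""
    pairs = list(combinations(nodes, 2))
    return sum(manhattan(a, b) for a, b in pairs) / len(pairs)


def square_region(count):
    """First `count` nodes of a block with side ceil(sqrt(count)), row by row."""
    side = isqrt(count - 1) + 1
    return [(k % side, k // side) for k in range(count)]


def nmrd(nodes):
    """MRD normalized by the MRD of the reference square (assumed NMRD)."""
    return mrd(nodes) / mrd(square_region(len(nodes)))


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
```

With these assumptions a 2×2 block scores exactly 1, while four nodes stretched along a line score 1.25, matching the statement that NMRD grows as the region becomes less square.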


Smart First Node Selection
• An approximate model estimates the number of available nodes in a square shape around a given node (the first node).
• The proposed model is used in the adapted hill climbing search heuristic.
• The goal is to find the appropriate first node in an agile and smart manner.

Smart First Node Selection (cont.)
• The square factor of a given node, $SF(n_{i,j})$, is the estimated number of contiguous, almost square-shaped, available nodes around that node.

Smart First Node Selection (cont.)
• First, find the largest square centered on $n_{i,j}$: $SQ_{max} = (n_{i,j}, r_{max})$.
• Some additional nodes beyond the square's borders that do not belong to system rectangles (marked with asterisks in the figure) are then taken into account.
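The first step, finding the largest square centered on a node, can be illustrated with a brute-force grid scan. This is not the calculation the slides refer to, which works on per-application rectangles; it is a simplified sketch that assumes the set of occupied nodes is known explicitly and that the square must lie entirely inside the mesh. The function name and parameters are hypothetical.

```python
def largest_free_square(center, occupied, m):
    """Largest radius r such that the (2r+1) x (2r+1) square around `center`
    lies inside the m x m mesh and contains no occupied node.

    Returns -1 if the center itself is occupied or outside the mesh.
    """
    cx, cy = center
    r = -1
    while True:
        nr = r + 1
        lo_x, hi_x = cx - nr, cx + nr
        lo_y, hi_y = cy - nr, cy + nr
        # Stop growing once the square would leave the mesh.
        if lo_x < 0 or lo_y < 0 or hi_x >= m or hi_y >= m:
            return r
        # Stop growing once the square would cover an occupied node.
        if any((x, y) in occupied
               for x in range(lo_x, hi_x + 1)
               for y in range(lo_y, hi_y + 1)):
            return r
        r = nr
```

A radius of `r` corresponds to `(2 * r + 1) ** 2` free nodes, which gives a lower bound on the square factor before any nodes beyond the square's borders are considered.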

Smart First Node Selection (cont.)
• The algorithm also calculates a direction, called the open direction (openDir).
• It indicates which neighbor of the node is estimated to have a larger SF.
• An openDir value of zero means that no specific direction is predicted to result in a larger square factor.

Smart First Node Selection (cont.)
• The SF calculation for one node has a linear time complexity of $O(|APPS|)$.
• However, calculating it for every node of the mesh (exhaustive search) takes $O(M^2 |APPS|)$ time.

Smart First Node Selection (cont.)
• Smart Hill Climbing (SHiC) starts from a randomly selected node and walks smartly through the network nodes.
• This significantly reduces the number of traversed nodes, resulting in an agile mapping algorithm.

Smart First Node Selection (cont.)
• SHiC looks for the node with the optimum SF according to a preference function:
  $(SF(n_{cur}) < |T| \text{ AND } SF(n_{next}) > SF(n_{cur}))$ OR $(SF(n_{next}) \geq |T| \text{ AND } SF(n_{next}) < SF(n_{cur}))$
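The preference function can be written as a small predicate. The first clause is garbled in the transcript; the sketch assumes it reads "$SF(n_{cur}) < |T|$ AND $SF(n_{next}) > SF(n_{cur})$", which mirrors the intact second clause. The function and parameter names are hypothetical.

```python
def prefers_next(sf_cur, sf_next, num_tasks):
    """Return True if the next node is preferred over the current one.

    sf_cur, sf_next: square factors of the current and candidate nodes
    num_tasks:       |T|, the number of tasks of the incoming application
    """
    # Assumed first clause: the current area is too small, so grow.
    grow = sf_cur < num_tasks and sf_next > sf_cur
    # Second clause: the candidate still fits the application but is tighter.
    tighten = sf_next >= num_tasks and sf_next < sf_cur
    return grow or tighten
```

Read this way, the function steers toward the smallest square factor that still accommodates all $|T|$ tasks, a best-fit choice that leaves larger free areas for later applications.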


Smart First Node Selection (cont.)
Find the Largest SF



Smart First Node Selection (cont.)
• The stochastic hill climbing approach is executed several times, starting from different randomly chosen nodes.
• The algorithm is repeated $(2 + \sqrt{|APPS|})$ times (line 1).
• The SHiC outer loop is executed $O(\sqrt{|APPS|})$ times, and the while loop is executed $O(M)$ times.
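The algorithm listing itself is not in the transcript, so the loop structure described above can only be sketched. The version below is a simplified hill climb with random restarts, not the authors' SHiC: it tries the four mesh neighbors in a fixed order instead of following openDir, rounds $\sqrt{|APPS|}$ down, caps each walk at $2M$ steps, and assumes a garbled clause of the preference function reads "$SF(n_{cur}) < |T|$ AND $SF(n_{next}) > SF(n_{cur})$". All names are hypothetical, and `sf` is a caller-supplied square factor estimate.

```python
import math
import random


def prefers(sf_cur, sf_next, num_tasks):
    # Preference function; the first clause is an assumed reconstruction.
    return ((sf_cur < num_tasks and sf_next > sf_cur)
            or (sf_next >= num_tasks and sf_next < sf_cur))


def hill_climb_first_node(sf, m, num_tasks, num_apps, rng):
    """Restarted hill climbing over an m x m mesh; returns the chosen node."""
    best = None
    restarts = 2 + math.isqrt(num_apps)  # 2 + floor(sqrt(|APPS|))
    for _ in range(restarts):
        cur = (rng.randrange(m), rng.randrange(m))
        for _ in range(2 * m):  # bounded walk, O(M) steps
            x, y = cur
            neighbors = [(x + dx, y + dy)
                         for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= x + dx < m and 0 <= y + dy < m]
            # Move to the first neighbor the preference function accepts.
            step = next((n for n in neighbors
                         if prefers(sf(cur), sf(n), num_tasks)), None)
            if step is None:
                break  # no preferred neighbor: local optimum reached
            cur = step
        if best is None or prefers(sf(best), sf(cur), num_tasks):
            best = cur
    return best


# Toy square factor that peaks at node (3, 3); purely illustrative.
def toy_sf(node):
    return 100 - 10 * (abs(node[0] - 3) + abs(node[1] - 3))


chosen = hill_climb_first_node(toy_sf, 8, 200, 4, random.Random(0))
```

In the toy run every walk climbs to the single peak at `(3, 3)` because the requested size (200) exceeds every square factor, so only the "grow" clause ever fires.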

Smart First Node Selection (cont.)
• $SF(n_{next})$ is calculated in each iteration in $O(|APPS|)$.
• The time complexity of SHiC is therefore $O(M \times |APPS|^{3/2})$.
• This is significantly faster than the exhaustive search.
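The stated bound follows from multiplying the three factors given on the slides. The worked step below spells this out and compares it with the exhaustive search; the comparison relies on the observation, not stated in the slides, that each running application occupies at least one of the $M^2$ nodes, so $|APPS| \le M^2$.

```latex
\underbrace{O\!\left(\sqrt{|APPS|}\right)}_{\text{restarts}}
\times
\underbrace{O(M)}_{\text{steps per walk}}
\times
\underbrace{O(|APPS|)}_{\text{SF per step}}
= O\!\left(M \times |APPS|^{3/2}\right)

\frac{M^2 \, |APPS|}{M \, |APPS|^{3/2}} = \frac{M}{\sqrt{|APPS|}} \ge 1
\quad \text{since } |APPS| \le M^2
```

The ratio $M / \sqrt{|APPS|}$ indicates how much work the exhaustive search does relative to SHiC; it is largest when few applications are running on a large mesh.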


Results and Analysis
• Applications with 4 to 35 tasks are generated using TGG.
• The communication volumes ($w_{i,j}$) are randomly distributed between 2 and 16 flits of data.
• Experiments run on a SystemC many-core platform that uses a pruned version of Noxim as its communication architecture.

Results and Analysis (cont.)
• Different mapping and first node selection methods are evaluated.
• The network size varies from $8 \times 8$ to $20 \times 20$ nodes.
• A random sequence of applications is entered into the scheduler FIFO according to the desired rate, $\lambda$.
• The maximum possible scheduling rate is called $\lambda_{full}$.

Results and Analysis (cont.)
• An allocation request for the scheduled application is sent to the CM of the platform, which resides in node $n_{0,0}$.


Results and Analysis (cont.)
• To assess the accuracy of the square factor, an exhaustive search is performed to select the node.
• Network size: $16 \times 16$.
• Applications enter the system at a rate of $0.8\lambda_{full}$.


Results and Analysis (cont.)
• When searching on the SF, exhaustive search gives the best result.
• SHiC shows an increase of only 4% on the dispersion (NMRD) and power (AWMD) metrics.
• The SHiC approach significantly enhances the performance of the system under the same mapping algorithm.

Results and Analysis (cont.)
• CoNA/SHiC versus CoNA/NN.
• Both the dispersion and the power dissipation of applications increase as $\lambda$ increases.
• SHiC keeps its advantage over the other approaches and outperforms them by 10 to 30 percent.

Results and Analysis (cont.)
• The effect of $\lambda$ on system performance is studied.

Results and Analysis (cont.)
• CoNA/SHiC versus NN/NN.
• $\lambda$ is kept at $0.8\lambda_{full}$.
• SHiC scales well as the network size increases and keeps the system performance at the same level.



Conclusions
• The algorithm uses an approximate model that quickly estimates the available area around a given node.
• The open direction it provides helps the climbing algorithm reach the optimum node faster by taking smart steps.

Conclusions (cont.)
• The results emphasize the significant impact of convex mapping on reducing network congestion.

Thank You