Distributed Sequencing for Resource Sharing in Multi-Applicative Heterogeneous NoC Platforms 林鼎原 Department of Electrical Engineering National Cheng Kung University

1 Distributed Sequencing for Resource Sharing in Multi-Applicative Heterogeneous NoC Platforms
林鼎原
Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
2016/7/1

2 Abstract (1/1)
• While static reconfiguration between applications has already been extensively studied, we propose increasing hardware resource utilization by enabling concurrent or overlapping applications on top of a heterogeneous NoC platform.
• In this paper, we describe a distributed sequencing protocol that allows hardware resources to be shared between several applications.
• This protocol ensures correct synchronization of the processing between hardware resources without the need for a global fine-grain scheduler, thus relieving the pressure on the run-time system.

3 Introduction
• While the previous decade's Application Specific Integrated Circuits (ASICs) were designed to fulfill a single well-defined application, the present decade has been, for consumer electronics, the decade of Systems-on-Chip (SoC).
• There has been a race to integrate more and more functionality, and the interconnect architecture soon became the limiting factor.
• The development of Networks-on-Chip (NoC), however, has broken through this wall by allowing more complex interconnect topologies, over which information is carried in data packets.

4 Introduction
• Using NoC technology, it is no longer rare to find more than a hundred cores in the larger SoCs, for example:
  ◦ massively parallel homogeneous single-chip multiprocessors;
  ◦ heterogeneous manycore SoCs.
• To maximize the utilization of these flexible resources, several tasks can be mapped onto the same reconfigurable hardware.

5 Application segmenting & composition (1/4)
Spatial segmenting of an application:
• In various cases, a larger buffering capacity may be needed: in data-dependent processing, where previously decoded data determine the parameterization of the current processing, or in large data reorderings, e.g. the transposition of large matrices.
• For that reason, the targeted data-flow applications contain specific PEs whose essential function is large-scale buffering; we call them memory buffers (MBs).
• Multiple applications can coexist on the network-on-chip only if the other applications can be stalled in MBs while access to the shared resources is granted only to the elected application.
• Hence, applications should be segmented using MBs, so that mutually exclusive portions of applications are confined in space.
• Each MB is dedicated to storing the data of a single task.
• Each component of the task graph that becomes disconnected once the graph is cut at the MBs, together with the MBs on its boundary, will be called a "segment" in the following.
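The segmenting rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the task graph is assumed to be a list of (source, destination) arcs with string node names, and a segment is taken to be a connected component of the graph with the MBs removed (arc direction ignored), plus the MBs that touch it. The function `segments` and all node names are hypothetical.

```python
from collections import defaultdict


def segments(edges, mbs):
    """Split one application's task graph into segments at its MBs.

    edges: iterable of (src, dst) arcs; mbs: set of memory-buffer nodes.
    Returns one frozenset per segment: a connected group of PEs (arc
    direction ignored) plus the MBs on its boundary.
    """
    adj = defaultdict(set)  # undirected adjacency
    nodes = set()
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        nodes.update((u, v))

    seen, result = set(), []
    for start in sorted(nodes - mbs):  # sorted only for deterministic output
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first search that never crosses an MB
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            for m in adj[n]:
                if m in mbs:
                    comp.add(m)  # boundary MB belongs to this segment
                elif m not in comp:
                    stack.append(m)
        seen |= comp
        result.append(frozenset(comp))
    return result


# Hypothetical chain MB0 -> P1 -> P2 -> MB1 -> P3 -> MB2
edges = [("MB0", "P1"), ("P1", "P2"), ("P2", "MB1"), ("MB1", "P3"), ("P3", "MB2")]
for seg in segments(edges, {"MB0", "MB1", "MB2"}):
    print(sorted(seg))
# prints ['MB0', 'MB1', 'P1', 'P2'] then ['MB1', 'MB2', 'P3']
```

Note that MB1 appears in both segments: it is the output buffer of the first and the input buffer of the second, which is what lets the two segments be scheduled independently.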

6 Application segmenting & composition (2/4)
Compound task graph:
• Given the mapping of each application's tasks onto the hardware resources of the NoC platform, a composition of several task graphs can be derived.
• The result is a directed graph in which:
  ◦ nodes are labeled with the set of applications using the corresponding hardware resource;
  ◦ square nodes are MBs and round nodes are PEs.
• When the segments defined on each application are considered together, they may intersect and form "multi-segments", defined as the union of the application segments that share at least one common node, as shown in Figure 1.
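The composition and the multi-segment definition can be sketched as follows, as an illustration only. Assumptions: each application's mapped task graph is given as arcs between hardware-resource names, segments are given as (application, node set) pairs, and "sharing a common node" is tested on PEs only, because consecutive segments of the same application always meet at a boundary MB and would otherwise be merged. The union-find grouping and the names `compound_graph` and `multi_segments` are hypothetical choices, not taken from the slides.

```python
def compound_graph(app_edges):
    """Merge per-application task graphs mapped onto shared hardware.

    app_edges: {app: [(src, dst), ...]}, nodes named after the hardware
    resource each task is mapped on. Returns (labels, arcs) where labels
    maps every resource to the set of applications using it.
    """
    labels, arcs = {}, set()
    for app, edges in app_edges.items():
        for u, v in edges:
            arcs.add((u, v))
            labels.setdefault(u, set()).add(app)
            labels.setdefault(v, set()).add(app)
    return labels, arcs


def multi_segments(app_segments, mbs):
    """Union application segments that share at least one PE.

    app_segments: list of (app, set_of_nodes). MBs are ignored when testing
    for overlap, since consecutive segments of one application meet there.
    Returns a list of (nodes, apps) pairs, one per multi-segment.
    """
    parent = list(range(len(app_segments)))

    def find(i):  # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # PE -> index of the first segment containing it
    for i, (_, seg) in enumerate(app_segments):
        for node in seg - mbs:
            if node in first_owner:
                parent[find(i)] = find(first_owner[node])
            else:
                first_owner[node] = i

    groups = {}
    for i, (app, seg) in enumerate(app_segments):
        nodes, apps = groups.setdefault(find(i), (set(), set()))
        nodes |= seg
        apps.add(app)
    return [(frozenset(n), frozenset(a)) for n, a in groups.values()]


# Hypothetical example: a1 and a2 both use PE P2.
a1 = [("A_in", "P1"), ("P1", "P2"), ("P2", "A_mid"), ("A_mid", "P4"), ("P4", "A_out")]
a2 = [("B_in", "P2"), ("P2", "P3"), ("P3", "B_out")]
labels, arcs = compound_graph({"a1": a1, "a2": a2})
mbs = {"A_in", "A_mid", "A_out", "B_in", "B_out"}
segs = [("a1", {"A_in", "P1", "P2", "A_mid"}),
        ("a1", {"A_mid", "P4", "A_out"}),
        ("a2", {"B_in", "P2", "P3", "B_out"})]
for nodes, apps in multi_segments(segs, mbs):
    print(sorted(apps), sorted(nodes))
```

Here the first segment of a1 and the only segment of a2 merge into one multi-segment through P2, while the second segment of a1 stays on its own.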

7 Application segmenting & composition (3/4)
• Square nodes are MBs (memory buffers).
• Round nodes are PEs (processing elements).
Figure 1. Labeled compound task graph for the composition of two (similar) applications a1 and a2, with the associated segmenting

8 Application segmenting & composition (4/4)
Inter-segment spatial task parallelization:
• In a well-segmented system containing several concurrent applications, the data of each application may be stalled in the MBs at the boundaries of each segment or multi-segment, so the scheduling of applications on each multi-segment is independent of the others.
• Indeed, provided that sufficient data is present in the input MBs of a multi-segment and sufficient free space is available in its output MBs, there is no dependency outside the multi-segment.
• For example, multi-segments 1 and 3 could be running application a1 while multi-segment 2 runs application a2 (Figure 1).
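The independence condition above reduces to a local test at the boundary MBs of a multi-segment. A minimal sketch, assuming MB occupancy is counted in abstract "tokens" and that the number of tokens one session reads and writes per MB is known in advance; `MemoryBuffer`, `can_run`, and the token model are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryBuffer:
    capacity: int  # tokens the MB can hold (hypothetical unit)
    fill: int = 0  # tokens currently stored

    @property
    def free(self) -> int:
        return self.capacity - self.fill


def can_run(inputs, outputs, consumed, produced):
    """True if a multi-segment can run one session without outside help.

    inputs/outputs: {name: MemoryBuffer} at the multi-segment boundary.
    consumed/produced: tokens one session reads from each input MB and
    writes to each output MB.
    """
    enough_data = all(inputs[m].fill >= n for m, n in consumed.items())
    enough_room = all(outputs[m].free >= n for m, n in produced.items())
    return enough_data and enough_room


inputs = {"in": MemoryBuffer(capacity=8, fill=4)}
outputs = {"out": MemoryBuffer(capacity=8, fill=6)}
print(can_run(inputs, outputs, {"in": 4}, {"out": 2}))  # True
print(can_run(inputs, outputs, {"in": 4}, {"out": 3}))  # False: only 2 free
```

Because the test only reads the multi-segment's own boundary MBs, each multi-segment can evaluate it locally, which is why no global scheduler is needed at this level.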

9 Intra-segment task synchronization protocol (1/8)
Scheduling issue for resource sharing:
• When multiple applications compete for several shared resources that the applications may acquire in different orders, several issues may induce deadlocks.
• We must guarantee that all shared resources switch to the same application together.
• Risk of deadlock when multiple shared resources are accessed in different orders:
  ◦ (1) cyclic dependencies;
  ◦ (2) flow divergence.

10 Intra-segment task synchronization protocol (2/8)
1) Cyclic dependencies:
• In the example of Figure 2, there may be a schedule in which the upper shared resource is switched to application a1 while the lower shared resource is switched to application a2, forming a deadly embrace that prevents either application from acquiring the second shared resource it needs.
Figure 2. Example combination of task graphs leading to a cyclic dependency
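The deadly embrace of Figure 2 can be made concrete with a wait-for graph, a standard deadlock-detection device that is not part of the slides. In this sketch (all names hypothetical), each shared resource records the application it is currently switched to, each application lists the shared resources its segment needs, and a cycle among blocked applications signals the deadlock.

```python
def wait_for_graph(holder, needs):
    """Build a wait-for graph between applications.

    holder: {resource: application it is currently switched to}.
    needs:  {application: shared resources it must hold to run its segment}.
    Returns {app: app that blocks it}.
    """
    graph = {}
    for app, resources in needs.items():
        for r in resources:
            owner = holder.get(r)
            if owner is not None and owner != app:
                graph[app] = owner  # blocked by the first foreign holder
                break
    return graph


def find_cycle(graph):
    """Return a cycle of blocked applications, or None.

    Each application waits on at most one other here, so following the
    chain from any start either terminates or loops back.
    """
    for start in graph:
        path, node = [], start
        while node in graph and node not in path:
            path.append(node)
            node = graph[node]
        if node in path:
            return path[path.index(node):]
    return None


needs = {"a1": ["upper", "lower"], "a2": ["upper", "lower"]}
# Figure 2 scenario: upper switched to a1, lower switched to a2.
print(find_cycle(wait_for_graph({"upper": "a1", "lower": "a2"}, needs)))  # ['a1', 'a2']
# Both switched to a1: a2 simply waits, no deadlock.
print(find_cycle(wait_for_graph({"upper": "a1", "lower": "a1"}, needs)))  # None
```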

11 Intra-segment task synchronization protocol (3/8)
2) Flow divergence:
• In this example, there is a potential race condition in the propagation of applications a1 and a2: one can find a particular schedule in which the fork in a1 propagates first to the upper shared node and only afterwards tries to access the lower shared node, which has meanwhile already switched to application a2.
• Since each application has acquired only one of the shared resources, neither can propagate over the whole segment.
Figure 3. Example nonlinear task graphs leading to a race condition
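How likely the race is can be checked by brute force. Under a deliberately simple, hypothetical model, each application's fork delivers one arrival event to each of the two shared nodes, every interleaving of the four events is possible, and each node switches to the first application that reaches it ("first-come" switching, an assumption). Counting the interleavings in which the two nodes end up on different applications:

```python
from itertools import permutations

# Hypothetical arrival events (application, shared node): after the fork,
# each application's data reaches both shared nodes in some order.
EVENTS = [("a1", "upper"), ("a1", "lower"), ("a2", "upper"), ("a2", "lower")]


def owners(order):
    """First-come switching: each shared node takes the first application to arrive."""
    owner = {}
    for app, node in order:
        owner.setdefault(node, app)
    return owner


def count_split_schedules(events=EVENTS):
    """Return (split, total): interleavings where the shared nodes disagree."""
    split = total = 0
    for order in permutations(events):
        total += 1
        if len(set(owners(order).values())) > 1:
            split += 1
    return split, total


print(count_split_schedules())  # (12, 24)
```

Half of the 24 interleavings leave the shared nodes switched to different applications, so the divergent case is not a corner case that could be ignored when nodes decide independently.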

12 Intra-segment task synchronization protocol (4/8)
• Our proposal is to constrain the naive concurrent scheduling of applications along their respective data paths with a few added virtual dependencies that enforce a consistent switching of the PEs to the same application.
• The key element is that the decision to switch to an application should be taken at a single point and then propagated downstream to all shared nodes in the segment:
  ◦ identify the border of the shared subset;
  ◦ keep a single entry point in the border;
  ◦ create virtual dependencies to and from this entry point.
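The single-point decision principle can be sketched as follows. The sketch assumes that the virtual arcs have already made every shared node reachable from one hypothetical sequencing node `S`; only `S` arbitrates between applications, and every other node adopts the decision it receives from upstream. The breadth-first propagation and the function names are illustrative choices, not the protocol's actual message format.

```python
from collections import deque
from itertools import permutations


def propagate_decision(arcs, sequencing_node, app):
    """Switch every node reachable from the sequencing node to `app`.

    arcs: {node: [downstream nodes]} for the shared subset, virtual arcs included.
    Returns {node: app} for every node reached (breadth-first).
    """
    switched = {sequencing_node: app}
    queue = deque([sequencing_node])
    while queue:
        node = queue.popleft()
        for nxt in arcs.get(node, []):
            if nxt not in switched:  # each node switches once per session
                switched[nxt] = app
                queue.append(nxt)
    return switched


def elect_and_propagate(arcs, sequencing_node, arrivals):
    """Only the sequencing node arbitrates: the first arrival wins the session."""
    return propagate_decision(arcs, sequencing_node, arrivals[0])


arcs = {"S": ["upper", "lower"], "upper": ["join"], "lower": ["join"]}
for arrivals in permutations(["a1", "a2"]):
    result = elect_and_propagate(arcs, "S", list(arrivals))
    print(arrivals[0], set(result.values()))  # a single application every time
```

Whatever the arrival order, all shared nodes receive the same application, in contrast with the first-come switching of the flow-divergence example.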

13 Intra-segment task synchronization protocol (5/8)
Shared subset within a multi-segment:
• Identify the border of the shared subset.
• The shared subset of the graph on the multi-segment is defined as the complementary component.
• It may also be defined as the union of the fanout cones of all the shared nodes of the multi-segment.
Figure 4. Shared subset and border identification
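The second definition of the shared subset (union of fanout cones) translates directly into code. A sketch under assumptions: the multi-segment is given as a successor map, a node is "shared" if its label holds more than one application, cones stop at MBs, and the border is taken to be the nodes of the subset entered by an arc from outside it. The border definition and all names here are hypothetical readings of Figure 4.

```python
def fanout_cone(succ, start, mbs):
    """Nodes reachable downstream from `start` without entering an MB."""
    cone, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n in cone or n in mbs:
            continue
        cone.add(n)
        stack.extend(succ.get(n, []))
    return cone


def shared_subset(succ, labels, mbs):
    """Union of the fanout cones of all nodes used by more than one application."""
    subset = set()
    for n, apps in labels.items():
        if len(apps) > 1 and n not in mbs:
            subset |= fanout_cone(succ, n, mbs)
    return subset


def border(succ, subset):
    """Nodes of the subset entered by at least one arc from outside it."""
    return {v for u, vs in succ.items() if u not in subset for v in vs if v in subset}


succ = {"MBa_in": ["Pa"], "MBb_in": ["Pb"],
        "Pa": ["S1", "S2"], "Pb": ["S1", "S2"],  # both applications fork
        "S1": ["J"], "S2": ["J"], "J": ["MBa_out", "MBb_out"]}
labels = {"Pa": {"a1"}, "Pb": {"a2"}, "S1": {"a1", "a2"},
          "S2": {"a1", "a2"}, "J": {"a1", "a2"}}
mbs = {"MBa_in", "MBb_in", "MBa_out", "MBb_out"}
subset = shared_subset(succ, labels, mbs)
print(sorted(subset), sorted(border(succ, subset)))  # ['J', 'S1', 'S2'] ['S1', 'S2']
```

The border here has two entry nodes, S1 and S2, which is exactly the situation where the two applications can each grab one of them.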

14 Intra-segment task synchronization protocol (6/8)
Sequencing node selection:
• Once the shared subset and the multi-applicative border have been identified, what remains for single-point synchronization is to modify the graph so that the new multi-applicative border is limited to a single node.
• This "sequencing node" will then have a fanout cone that covers the whole shared subset:
  ◦ the decision to switch to either application will propagate to the whole shared subset.
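One possible way to perform this selection is sketched below; the slides do not say which border node is elected or how the arcs are chosen, so the policy here is an assumption. Each border node is tried in sorted order, virtual arcs are added from it to every other border node, and the first candidate whose augmented fanout cone, restricted to the shared subset, covers the whole subset is returned as the sequencing node. The graph is the hypothetical one from the previous sketch, trimmed to the PEs.

```python
def reachable(succ, start, within):
    """Nodes of `within` reachable from `start` through nodes of `within`."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n in seen or n not in within:
            continue
        seen.add(n)
        stack.extend(succ.get(n, []))
    return seen


def select_sequencing_node(succ, subset, border_nodes):
    """Elect one border node and add virtual arcs from it to the other
    border nodes, so that it becomes the single entry point.

    Returns (sequencing_node, virtual_arcs) for the first candidate whose
    augmented fanout cone covers the whole shared subset, else None.
    """
    for cand in sorted(border_nodes):  # sorted: deterministic choice
        virtual = [(cand, b) for b in sorted(border_nodes) if b != cand]
        augmented = {u: list(vs) for u, vs in succ.items()}
        for u, v in virtual:
            augmented.setdefault(u, []).append(v)
        if reachable(augmented, cand, subset) >= subset:
            return cand, virtual
    return None


succ = {"Pa": ["S1", "S2"], "Pb": ["S1", "S2"], "S1": ["J"], "S2": ["J"]}
print(select_sequencing_node(succ, {"S1", "S2", "J"}, {"S1", "S2"}))
# ('S1', [('S1', 'S2')])
```

With the virtual arc S1 -> S2, S2 can no longer switch on its own: it waits for the decision taken at S1, so S1 is the only point where the choice between a1 and a2 is made.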

15 Intra-segment task synchronization protocol (7/8)
• A final compound task graph can then be derived for the multi-segment, in which a sequencing node has been elected among the shared nodes and which includes additional arcs (arrows) to enforce the desired scheduling on the multi-segment, as shown in Figure 5.
Figure 5. Sequencing node and added dependencies

16 Intra-segment task synchronization protocol (8/8)
• In this example, the dashed arrows are new dependencies that were added to create a sequencing node.
• Arcs should be added from the output MBs to the first nodes in the segment; signaling messages are sent on these arcs when the MB has enough free storage to absorb all data produced during the session.
• The dotted arrows are the (optional) arcs that ensure the end of the propagation of the sequencing node's decision over the entire shared subgraph:
  ◦ additional arcs should be added to notify the sequencing node at the end of the propagation.
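The two kinds of added arcs can be read as a small session handshake at the sequencing node. The sketch below is a hypothetical state machine, not the protocol's specified behavior: "space" messages from an application's output MBs enable an election for that application, and "end" acknowledgments from the nodes at the end of the shared subgraph (the optional dotted arcs) close the session so that a new election may occur. The class, its methods, and the one-credit-per-session rule are assumptions.

```python
class SequencingNode:
    """Hypothetical session controller for the sequencing node of one multi-segment."""

    def __init__(self, output_mbs, sink_nodes):
        # output_mbs: {app: MBs that must have room for a whole session of app}
        self.output_mbs = {app: set(m) for app, m in output_mbs.items()}
        # sink_nodes: nodes that report the end of the decision's propagation
        self.sink_nodes = set(sink_nodes)
        self.space_ok = set()  # MBs that have signaled enough free space
        self.acks = set()      # sinks that have acknowledged the current session
        self.current = None    # application of the running session, if any

    def on_space(self, mb):
        """Message on an added MB -> first-node arc: `mb` can absorb a session."""
        self.space_ok.add(mb)

    def on_end_ack(self, node):
        """Message on an added (dotted) arc: the decision has reached `node`."""
        self.acks.add(node)
        if self.acks >= self.sink_nodes:
            self.current = None  # propagation finished; a new election may occur

    def try_start(self, app):
        """Elect `app` if idle and all of its output MBs have signaled space."""
        needed = self.output_mbs[app]
        if self.current is not None or not needed <= self.space_ok:
            return False
        self.current = app
        self.space_ok -= needed  # the space signals are consumed by this session
        self.acks.clear()
        return True


seq = SequencingNode({"a1": ["MBa_out"], "a2": ["MBb_out"]}, sink_nodes=["J"])
seq.on_space("MBa_out")
print(seq.try_start("a1"))  # True
seq.on_space("MBb_out")
print(seq.try_start("a2"))  # False: a1's decision is still propagating
seq.on_end_ack("J")
print(seq.try_start("a2"))  # True
```

Without the optional end-of-propagation arcs, the sequencing node would need some other guarantee that its previous decision has reached the whole shared subgraph before switching again.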

17 Conclusion
• To overcome potential deadlocks in concurrent scheduling, an algorithm that introduces virtual dependencies between PEs was proposed.
• This algorithm prevents the system from making conflicting scheduling choices in different places, at the cost of a minimal increase in synchronization between PEs.