Review of OS Controlled NoC from IMEC Jim Stevens RC Reading Group 01/30/2008.



Today’s Papers
Operating-system controlled network on chip. Nollet, V.; Marescaux, T.; Verkest, D. Proceedings of the 41st Design Automation Conference (DAC), 2004.
Centralized run-time resource management in a network-on-chip containing reconfigurable hardware tiles. Nollet, V.; Marescaux, T.; Avasare, P.; Verkest, D.; Mignolet, J.-Y. Design, Automation and Test in Europe (DATE), Proceedings, 7-11 March 2005, Vol. 1.

Operating-System Controlled Network on Chip V. Nollet, T. Marescaux, and D. Verkest DAC 04

Abstract
Managing NoC is challenging
OS needs to control NoC
Tight integration allows for efficiency
OS can
–Optimize communication resource usage
–Reduce interference between applications

Introduction
Future systems will consist of tiles of processing elements (PEs)
Tiles connected by NoC
Mapping tasks onto tiles and dynamically managing communication is extremely challenging
Goals
–Ensure that compute power matches communication needs
–Provide required QoS

Multiprocessor Emulation
System consists of a StrongARM processor in a Compaq iPAQ PDA connected to an FPGA through the iPAQ expansion port
Two NoCs built in FPGA
–Packet-switched 3x3 bidirectional mesh called data NoC
–Another network for OS control messages
Both networks run at 30 MHz
StrongARM runs at 206 MHz

Transport Layer

Data Network Interface
PEs connect to data NoC with data Network Interface Component (dNIC)
dNIC responsibilities:
–Buffer I/O messages for PE
–Provide higher level interface to data router
–Collect statistics
Blocked message count: number of received messages that were blocked in the data router while waiting for the PE input buffer to be released
Injection rate control mechanism: throttles rate of messages being sent from PE
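The blocked message count can be pictured with a toy model. This is only a sketch under assumptions: the class name `DataNic`, the buffer capacity `input_slots`, and the choice to count every refused delivery attempt are illustrative and do not come from the paper.

```python
from collections import deque


class DataNic:
    """Toy dNIC: buffers input for a PE and counts blocked messages."""

    def __init__(self, input_slots):
        self.input_slots = input_slots   # assumed capacity of the PE input buffer
        self.input_buffer = deque()
        self.blocked_count = 0           # statistic the OS would later read

    def deliver(self, message):
        """Router offers a message to the PE; False means it stays in the router."""
        if len(self.input_buffer) >= self.input_slots:
            self.blocked_count += 1      # buffer full: message is blocked
            return False
        self.input_buffer.append(message)
        return True

    def pe_read(self):
        """PE consumes the oldest message, releasing one buffer slot."""
        return self.input_buffer.popleft() if self.input_buffer else None
```

A rising `blocked_count` on a tile suggests that its PE is not draining the input buffer as fast as traffic arrives.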

Control Network Interface
Connected to control network by cNIC
Provides OS with unified view of communication resources
Collects stats from dNIC
Allows OS to:
–Dynamically set routing tables
–Manage injection rate of dNIC

Operating System
One PE is designated as master
–Monitors system and assigns tasks to slave PEs
Slaves contain a basic RPC-like mechanism to execute OS functions for master
Slaves can also call back to the OS using similar functionality for tasks such as synchronization

Operating System Diagram

NoC Control Tools
Dynamic Statistics Collection
–OS polls cNICs to get traffic stats
–Collects the blocked message count to see if congestion is occurring
Dynamic Injection Rate Control
–Modifies the send window of PE to reduce congestion
–Setting the send window is deterministic and fast (57 μs)
OS-Controlled Adaptive Routing
–Modify routing tables to reduce congestion
–Complex operation: temporarily stop messages on a channel by sending sync messages, update routing tables using cNIC OS interface, and finally notify all relevant tasks to resume sending
–Non-deterministic because it depends on network traffic and complexity of table update
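One way to picture how statistics collection feeds injection rate control is a small decision function that the OS could call after each poll. The slides do not say how the OS picks a new window, so the policy here (halve on congestion, add 10 points when calm) and every name and threshold are assumptions made for illustration.

```python
def next_send_window(current_window, blocked_before, blocked_now,
                     threshold=10, min_window=1, max_window=100):
    """Return a new send window (percent of bandwidth) for one PE.

    blocked_before and blocked_now are the blocked message counts read
    from the cNIC at the previous and the current poll. The halve/add-10
    policy is hypothetical, not the rule used in the paper.
    """
    newly_blocked = blocked_now - blocked_before
    if newly_blocked > threshold:
        # Congestion seen since the last poll: throttle the sender.
        return max(min_window, current_window // 2)
    if newly_blocked == 0:
        # No blocking at all: hand some bandwidth back.
        return min(max_window, current_window + 10)
    # Mild blocking: leave the window alone.
    return current_window
```

For example, `next_send_window(100, 0, 50)` halves a full window to 50, and a later quiet poll, `next_send_window(50, 50, 50)`, raises it to 60.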

Send Window Parameters

Case Study
Tested system with MJPEG decoder
Consists of four tasks running on PEs
Two tasks run on StrongARM (tile 3)
Two other tasks are hardware blocks:
–Huffman decoder/dequantisation
–2D-IDCT and YUV to RGB converter
Added message generator/sink modules to put traffic on the network to interfere with channel from node 7 to node 6
OS samples cNICs every 20 ms

Decoder Communication
Played same sequence with two different windowing techniques (window spreading and allocating continuous blocks) with no interference
Decreased the window size from 100% to ~0.02%
For window spreading, throughput of the video decoder does not decrease until effective window is less than 2% of bandwidth; half throughput occurs at 1.5% of bandwidth
For continuous allocation, half throughput occurs at 75% of bandwidth
When interference is enabled, window spreading helps reduce jitter because communication is more evenly spread
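The difference between the two windowing techniques can be sketched by treating time as a repeating frame of slots in which a PE may send only during its allotted slots. The frame and slot model and all function names are assumptions for illustration. The actual send window parameters are on a slide whose figure is not in this transcript.

```python
def contiguous_window(frame_slots, send_slots):
    """Continuous allocation: one block of slots at the start of the frame."""
    return list(range(send_slots))


def spread_window(frame_slots, send_slots):
    """Window spreading: the same number of slots, evenly distributed."""
    return [(i * frame_slots) // send_slots for i in range(send_slots)]


def longest_wait(frame_slots, slots):
    """Longest run of consecutive slots (wrapping around) with no sending."""
    ordered = sorted(slots)
    gaps = [b - a - 1 for a, b in zip(ordered, ordered[1:])]
    gaps.append(frame_slots - ordered[-1] - 1 + ordered[0])  # wrap-around gap
    return max(gaps)
```

With a 100-slot frame and 4 send slots, the contiguous window leaves the PE silent for up to 96 slots in a row, while the spread window never makes it wait more than 24. Shorter waits are consistent with the slide's observation that spreading sustains throughput at much smaller windows.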

Decoder with interference

Centralized Run-Time Resource Management in a Network-on-Chip Containing Reconfigurable Hardware Tiles V. Nollet, T. Marescaux, P. Avasare, D. Verkest, J-Y. Mignolet DATE 05

Introduction
Same assumptions and system setup as previous paper
This paper focuses on the task assignment heuristic and dynamic task migration
Claims to be first paper to address run-time task migration in an NoC context

System Description
Same as before
Task mapping heuristic must find the best PE for each task
Want to reduce internal fragmentation and optimize communication paths

Resource Management Heuristic
Requires application specification, user requirements, and current resource usage as input
Specification is given by a task graph that contains properties of each task such as computation and communication needs
User requirements given by a simple QoS specification

Heuristic Steps
Calculate requested resource load
Calculate task execution variance
Calculate task communication weight
Sort tasks according to mapping importance
Sort PEs for most important unmapped tasks
Map task to best computing resource
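The steps above can be sketched as a greedy loop. The slides do not give the ranking formulas, so this sketch assumes that mapping importance is load plus communication weight and that the "best" PE is the one left with the least spare capacity (a tightest-fit rule that limits fragmentation). Task execution variance is left out. All names are hypothetical.

```python
def map_tasks(tasks, pe_free):
    """Greedy mapping sketch.

    tasks:   {task_name: (load, comm_weight)}
    pe_free: {pe_name: free capacity}
    Returns {task_name: pe_name}, or None if some task cannot be placed.
    """
    free = dict(pe_free)
    # Assumed importance metric: heaviest and most talkative tasks first.
    order = sorted(tasks, key=lambda t: tasks[t][0] + tasks[t][1], reverse=True)
    mapping = {}
    for task in order:
        load = tasks[task][0]
        candidates = [pe for pe in free if free[pe] >= load]
        if not candidates:
            return None  # no valid mapping without backtracking
        # Assumed ranking: tightest fit first, PE name breaks ties.
        best = min(candidates, key=lambda pe: (free[pe] - load, pe))
        mapping[task] = best
        free[best] -= load
    return mapping
```

For instance, with `{"huffman": (60, 30), "idct": (50, 20), "display": (20, 5)}` and PEs `{"pe0": 100, "pe1": 60}`, the sketch places `huffman` on `pe1` and the other two tasks on `pe0`.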

Backtracking
Some inputs will result in no valid mapping
Use backtracking to attempt to find another mapping
–Undo previous N steps, select second best PE instead of best PE, then remap remaining N-1 steps
–If that fails, try again with N+1
If backtracking fails, options are:
–Use run-time task migration
–Use hierarchical configuration
–Restart heuristic with reduced user requirements
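A sketch of one reading of the backtracking rule: when task i cannot be placed, go back N tasks, give that earlier task its second best PE, remap everything after it greedily, and grow N if that still fails. The tightest-fit ranking, the `max_steps` cap, and all names are assumptions, and the paper's exact procedure may differ.

```python
def ranked_pes(load, free):
    """PEs that can hold the load, best (tightest fit) first."""
    fits = [pe for pe in free if free[pe] >= load]
    return sorted(fits, key=lambda pe: (free[pe] - load, pe))


def place(order, loads, pe_free, choices):
    """Map tasks in order; choices[i] is the rank (0 = best) forced on task i.

    Returns (mapping, failed_index); failed_index is None on success.
    """
    free = dict(pe_free)
    mapping = {}
    for i, task in enumerate(order):
        ranked = ranked_pes(loads[task], free)
        rank = choices.get(i, 0)
        if rank >= len(ranked):
            return mapping, i
        mapping[task] = ranked[rank]
        free[ranked[rank]] -= loads[task]
    return mapping, None


def map_with_backtracking(order, loads, pe_free, max_steps=3):
    mapping, failed = place(order, loads, pe_free, {})
    if failed is None:
        return mapping
    for n in range(1, max_steps + 1):
        undo_to = failed - n          # undo the previous n placements
        if undo_to < 0:
            break
        # Second best PE for the task at undo_to, greedy for the rest.
        mapping, again = place(order, loads, pe_free, {undo_to: 1})
        if again is None:
            return mapping
    return None                       # caller falls back to the other options
```

With loads `{"a": 50, "b": 60, "c": 50}` on PEs `{"pe0": 100, "pe1": 60}`, the greedy pass strands task `c`. Undoing two steps and moving `a` to its second best PE yields a valid mapping.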

RH Add-ons
For reconfigurable hardware (RH), must take into account internal fragmentation of reconfigurable area
If both the first and second best PEs are reconfigurable, then want to pick the one with lowest internal fragmentation
Also consider if a regular PE could be used for this task instead of RH
–Want to map only computationally intensive tasks to RH
Can also create softcore processors on RH
–They refer to this as “hierarchical configuration”

Heuristic Performance
Compared to algorithm that explores full solution space for a multimedia pipeline application
Defined LIGHT, MEDIUM, and HEAVY computational loads for previous load of the platform
–If more than 50% of a PE’s resources are used, then the PE is considered used, otherwise free
Table 1 shows the success rate for the heuristic with varying number of backtracking steps with respect to searching the full mapping solution space
Demonstrates that RH add-ons to algorithm improve performance
Use hop-bandwidth product to show mapping quality
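The hop-bandwidth product can be computed as the sum, over all communicating task pairs, of the bandwidth between them times the number of hops between their tiles, so lower is better. The sketch below assumes row-major tile numbering on the 3x3 mesh and minimal (Manhattan-distance) routes. Neither assumption is stated on the slides, and the function names are illustrative.

```python
def hops(tile_a, tile_b, width=3):
    """Hop count between two tiles of a mesh, assuming minimal routing."""
    ax, ay = tile_a % width, tile_a // width
    bx, by = tile_b % width, tile_b // width
    return abs(ax - bx) + abs(ay - by)


def hop_bandwidth_product(edges, mapping, width=3):
    """edges: (source_task, destination_task, bandwidth); mapping: task -> tile."""
    return sum(bw * hops(mapping[src], mapping[dst], width)
               for src, dst, bw in edges)
```

For edges `[("a", "b", 10), ("b", "c", 4)]`, placing the tasks on tiles 0, 1, 2 gives 14, while placing `c` on tile 8 instead gives 22, so the first mapping keeps heavy traffic on shorter paths.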

Mapping Success

Hop-bandwidth Mapping Quality

Run-Time Task Migration
Goal is to move a task from source tile to destination tile
Can only migrate at predefined checkpoints in execution (we’ve seen this before)
They cite a paper to discuss moving between a PE and RH
Must assure communication consistency
–Current methods (buffering or dropping) are not well suited to NoC for various reasons

Migration Process
OS issues a migration request
Wait until process reaches a checkpoint
–OS does not know how long this will take
When checkpoint is reached, OS is signaled
OS tells other processes to stop sending to process
–Last message sent by a process has a tag
OS migrates the process, but does not delete the original process
OS tells other processes, including the source tile, to update their routing tables (DLT) to contain the new task location
When all tagged messages have been received at the new task location, the OS tells the other processes to start sending again and it frees the original location
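The ordering of these steps can be sketched as a single function that walks through them and returns an event log. This is a simplified model under assumptions: one shared table stands in for the per-tile DLTs, messages are plain tuples, and the function and parameter names are hypothetical.

```python
from collections import deque


def migrate_task(task, dest_tile, dlt, senders, in_flight):
    """Walk through the migration steps for one task; return (log, delivered).

    dlt:       table mapping each task to its tile (the DLT on the slide)
    senders:   names of tasks that send messages to `task`
    in_flight: deque of (sender, payload) messages not yet received
    """
    log = []
    src_tile = dlt[task]
    log.append(f"checkpoint reached on tile {src_tile}")

    # Each sender stops; the last message it sends carries a tag.
    for sender in senders:
        in_flight.append((sender, "TAG"))
    log.append("senders paused")

    # The task is copied; the original is kept until hand-over completes.
    log.append(f"task copied from tile {src_tile} to tile {dest_tile}")

    # Routing information now points at the new location.
    dlt[task] = dest_tile
    log.append("DLT updated")

    # Drain outstanding traffic until one tag per sender has arrived.
    tags_seen = set()
    delivered = []
    while tags_seen != set(senders):
        sender, payload = in_flight.popleft()
        if payload == "TAG":
            tags_seen.add(sender)
        else:
            delivered.append((sender, payload))
    log.append(f"{len(delivered)} outstanding messages delivered")

    log.append("senders resumed")
    log.append(f"tile {src_tile} freed")
    return log, delivered
```

Because each sender's tag is queued behind its earlier messages, seeing every tag means nothing older is still in the network, which is why the original location can be freed only at that point.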

Migration Process for Pipelines
Based on assumption that pipelined apps have stateless points (units of work in the pipeline are independent)
To start migration, flush the pipeline with a tagged message
Move pipeline task that needs to be migrated
Update routing tables for pipeline tasks and restart pipeline

Benchmarking Migration
Reaction time: time from migration request until task is ready to migrate (checkpoint or stateless point)
Freeze time: amount of time the migrating task is suspended
If free resources are available, can start migration during reaction time for pipelines

Conclusion (Both Papers)
IMEC has developed a NoC prototype that allows the OS to control network traffic
dNIC and cNIC provide network interface to PEs and RH
OS can control injection rate, change routing tables, and migrate tasks at run-time to reduce congestion
Static task-mapping heuristic based on specification can find efficient ways to take advantage of both communication and computation resources