P ROCESSING WHILE ROUTING : A NETWORK - ON - CHIP - BASED PARALLEL SYSTEM S.R. Fernandes 1 B.C. Oliveira 2 M. Costa 2 I.S. Silva 2 Computers & Digital.

Slides:



Advertisements
Similar presentations
Evaluation of On-Chip Interconnect Architectures for Multi-Core DSP Students : Haim Assor, Horesh Ben Shitrit 2. Shared Bus 3. Fabric 4. Network on Chip.
Advertisements

Computer Architecture
A Novel 3D Layer-Multiplexed On-Chip Network
Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis.
REAL-TIME COMMUNICATION ANALYSIS FOR NOCS WITH WORMHOLE SWITCHING Presented by Sina Gholamian, 1 09/11/2011.
Miguel Gorgues, Dong Xiang, Jose Flich, Zhigang Yu and Jose Duato Uni. Politecnica de Valencia, Spain School of Software, Tsinghua University, China, Achieving.
Spring 2008 Network On Chip Platform Instructor: Yaniv Ben-Itzhak Students: Ofir Shimon Guy Assedou.
Network based System on Chip Performed by: Medvedev Alexey Supervisor: Walter Isaschar (Zigmond) Winter-Spring 2006.
Network based System on Chip Students: Medvedev Alexey Shimon Ofir Supervisor: Walter Isaschar (Zigmond) Winter-Spring 2006.
Multiprocessors Andreas Klappenecker CPSC321 Computer Architecture.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Multi - Core fast Communication for SoPC Multi - Core fast Communication for SoPC Technion – Israel Institute of Technology Department of Electrical.
Chapter Hardwired vs Microprogrammed Control Multithreading
Rotary Router : An Efficient Architecture for CMP Interconnection Networks Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio University.
Modern trends in computer architecture and semiconductor scaling are leading towards the design of chips with more and more processor cores. Highly concurrent.
1 Evgeny Bolotin – ICECS 2004 Automatic Hardware-Efficient SoC Integration by QoS Network on Chip Electrical Engineering Department, Technion, Haifa, Israel.
1 Fast Communication for Multi – Core SOPC Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab.
Parallel Computer Architectures
Introduction to Parallel Processing Ch. 12, Pg
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Back-end Timing Models Core Models.
A Scalable, Cache-Based Queue Management Subsystem for Network Processors Sailesh Kumar, Patrick Crowley Dept. of Computer Science and Engineering.
Yao Wang, Yu Wang, Jiang Xu, Huazhong Yang EE. Dept, TNList, Tsinghua University, Beijing, China Computing System Lab, Dept. of ECE Hong Kong University.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
1 The Turn Model for Adaptive Routing. 2 Summary Introduction to Direct Networks. Deadlocks in Wormhole Routing. System Model. Partially Adaptive Routing.
Blue Gene / C Cellular architecture 64-bit Cyclops64 chip: –500 Mhz –80 processors ( each has 2 thread units and a FP unit) Software –Cyclops64 exposes.
Introduction to Interconnection Networks. Introduction to Interconnection network Digital systems(DS) are pervasive in modern society. Digital computers.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
Report Advisor: Dr. Vishwani D. Agrawal Report Committee: Dr. Shiwen Mao and Dr. Jitendra Tugnait Survey of Wireless Network-on-Chip Systems Master’s Project.
A Low-Power CAM Design for LZ Data Compression Kun-Jin Lin and Cheng-Wen Wu, IEEE Trans. On computers, Vol. 49, No. 10, Oct Presenter: Ming-Hsien.
Author : Jing Lin, Xiaola Lin, Liang Tang Publish Journal of parallel and Distributed Computing MAKING-A-STOP: A NEW BUFFERLESS ROUTING ALGORITHM FOR ON-CHIP.
Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb Alpha Development Group, Compaq HOT Interconnects 9 (2001) Presented by.
CHAPTER 12 INTRODUCTION TO PARALLEL PROCESSING CS 147 Guy Wong page
Chapter 2 Parallel Architecture. Moore’s Law The number of transistors on a chip doubles every years. – Has been valid for over 40 years – Can’t.
Course Wrap-Up Miodrag Bolic CEG4136. What was covered Interconnection network topologies and performance Shared-memory architectures Message passing.
O1TURN : Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks DaeHo Seo, Akif Ali, WonTaek Lim Nauman Rafique, Mithuna Thottethodi School of.
A Decompression Architecture for Low Power Embedded Systems Lekatsas, H.; Henkel, J.; Wolf, W.; Computer Design, Proceedings International.
Department of Computer Science and Engineering The Pennsylvania State University Akbar Sharifi, Emre Kultursay, Mahmut Kandemir and Chita R. Das Addressing.
CSE 661 PAPER PRESENTATION
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Axel Jantsch 1 NOCARC Network on Chip Architecture KTH, VTT Nokia, Ericsson, Spirea TEKES, Vinnova.
Performed by: Guy Assedou Ofir Shimon Instructor: Yaniv Ben-Yitzhak המעבדה למערכות ספרתיות מהירות High speed digital systems laboratory הטכניון - מכון.
Performance Analysis of a JPEG Encoder Mapped To a Virtual MPSoC-NoC Architecture Using TLM 林孟諭 Dept. of Electrical Engineering National Cheng Kung.
The Alpha Network Architecture Mukherjee, Bannon, Lang, Spink, and Webb Summary Slides by Fred Bower ECE 259, Spring 2004.
Axel Jantsch 1 Networks on Chip Axel Jantsch 1 Shashi Kumar 1, Juha-Pekka Soininen 2, Martti Forsell 2, Mikael Millberg 1, Johnny Öberg 1, Kari Tiensurjä.
Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.
Yu Cai Ken Mai Onur Mutlu
1 Presenter: Min Yu,Lo 2015/12/21 Kumar, S.; Jantsch, A.; Soininen, J.-P.; Forsell, M.; Millberg, M.; Oberg, J.; Tiensyrja, K.; Hemani, A. VLSI, 2002.
Teaching The Principles Of System Design, Platform Development and Hardware Acceleration Tim Kranich
BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs Socrates Demetriades and Sangyeun Cho Computer Frontiers.
ICC Module 3 Lesson 1 – Computer Architecture 1 / 13 © 2015 Ph. Janson Information, Computing & Communication Computer Architecture Clip 2 – Von Neumann.
Virtual-Channel Flow Control William J. Dally
Network On Chip Cache Coherency Final presentation – Part A Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter.
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
HAT: Heterogeneous Adaptive Throttling for On-Chip Networks Kevin Kai-Wei Chang Rachata Ausavarungnirun Chris Fallin Onur Mutlu.
Corse Overview Miodrag Bolic ELG7187 Topics in Computers: Multiprocessor Systems on Chip.
Network On Chip Cache Coherency Midterm presentation Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter Isaschar.
VIRTUAL NETWORK PIPELINE PROCESSOR Design and Implementation Department of Communication System Engineering Presented by: Mark Yufit Rami Siadous.
COMP8330/7330/7336 Advanced Parallel and Distributed Computing Communication Costs in Parallel Machines Dr. Xiao Qin Auburn University
Microarchitecture.
Embedded Systems Design
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
Chapter 2 – Computer hardware
Azeddien M. Sllame, Amani Hasan Abdelkader
On-Time Network On-chip
Using Packet Information for Efficient Communication in NoCs
Verilog to Routing CAD Tool Optimization
Mihir Awatramani Lakshmi kiran Tondehal Xinying Wang Y. Ravi Chandra
RECONFIGURABLE NETWORK ON CHIP ARCHITECTURE FOR AEROSPACE APPLICATIONS
Course Outline for Computer Architecture
Presentation transcript:

P ROCESSING WHILE ROUTING : A NETWORK - ON - CHIP - BASED PARALLEL SYSTEM S.R. Fernandes 1 B.C. Oliveira 2 M. Costa 2 I.S. Silva 2 Computers & Digital Techniques, IET,2009 Reporter: 陳健豪

OUTLINE Introduction Related works IPNoSys architecture Results Conclusions

I NTRODUCTION Technology integration has increased to the point where the development of multi-core processor architectures is a market reality nowadays. Bus-based design remains useful while the number of cores in the processor is kept to a limit. More powerful interconnections, such as network-on-chip(NoC).

I NTRODUCTION NoC requires more chip area and more power. This paper proposes IPNoSys system, where the routers are also responsible for the execution of operations,besides the routing process.

R ELATED WORKS NoC

IPN O S YS ARCHITECTURE The NoC is not only interconnection mechanism but also becomes an active element in the execution of applications. square 2D mesh XY routing policy virtual-cut-through (VCT) and wormhole switching scheme virtual channel credit-based control flow distributed arbitration and input buffering

IPN O S YS ARCHITECTURE

An arithmetic logic unit (ALU) allowing the router to perform the most common logic- arithmetic operations usually found in applications. Routing packets and processing,being called routing and processing unit (RPU). The memory modules are accessed by memory access cores (MAC).

IPN O S YS ARCHITECTURE

spiral complement routing algorithm:

IPN O S YS ARCHITECTURE Deadlock treatment:

The number of virtual channels is the number of times that the packet should pass through the same physical channel in the same direction. In our case the maximum is three times (Fig. 3) Thus, the IPNoSys system treats the deadlock through a solution called local execution

IPN O S YS ARCHITECTURE Packet format

IPN O S YS ARCHITECTURE Routing and processing unit (RPU)

IPN O S YS ARCHITECTURE

Memory access core The MACs placed in the corners are responsible for reading the packets from memory and to injecting them into the NoC,

R ESULTS implemented in cycle-accurate SystemC Different NoC dimensions Three simulation cases Simple counter DCT RLE

R ESULTS Simple counter sequential and a parallel execution

IPNoSys system allowed to reduce the maximum number of performed instructions around 80% comparing the sequential and parallel execution

R ESULTS DCT The 2D-DCT is largely used in compression process of images.

The DCT application has much data dependencies,which is the worst case in ILP.

Required memory for IPNoSys is slightly increased with more parallelism because of the rise of the communication.

R ESULTS RLE RLE is suited for compressing any type of data regardless of its information content. For example, an uncompressed string formed by 15‘A’characters would normally require 15 bytes to store:AAAAAAAAAAAAAAA.

It means the number of packets decrease,on average, at the end of its execution.

D ETAILED COMPARISON (STORM X IPN O S YS ) STORM instances with one, two, four or 15 SPARC V8 processors 2D-mesh NoC two, three, five and 16 routers,respectively cache coherent directory-based MP-SoC platform XY routing scheme

C ONCLUSIONS This paper presented an innovative NoC-based architecture that does not use traditional processors, IPNoSys. Architecture’s execution capability independent of the number of application instructions and NoC dimensions. In DCT,the execution time in the IPNoSys is 3.5 times smaller than the STORM best case that shows the efficiency of the parallelism in this system. In RLE,the IPNoSys performance also was better than STORM.