Data/Frame Memory PE 0 PE 1 PE 2 PE 3 PE N … Control Instruction Memory Interconnect The SIMD Concept.

Presentation transcript:

The SIMD Concept
Diagram labels: Control, Instruction Memory, PE 0, PE 1, PE 2, PE 3, …, PE N, Interconnect, Data/Frame Memory
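
The SIMD concept on this slide can be read as a single control unit that fetches one instruction stream and broadcasts each instruction to all PEs, each of which applies it to its own data element. A minimal Python sketch of that lockstep behaviour follows. The function name `simd_execute`, the three-opcode instruction set and the neighbour-shift model of the interconnect are illustrative assumptions, not details taken from the slides.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, immediate operand).
Instruction = Tuple[str, int]


def simd_execute(program: List[Instruction], data: List[int]) -> List[int]:
    """Run one instruction stream over N PEs, one data element per PE."""
    regs = list(data)  # one accumulator register per PE
    for opcode, imm in program:  # control unit: a single fetch/decode per step
        if opcode == "add":
            regs = [r + imm for r in regs]  # every PE performs the same add
        elif opcode == "mul":
            regs = [r * imm for r in regs]  # every PE performs the same multiply
        elif opcode == "shift_left":
            # Assumed interconnect: each PE takes its right-hand neighbour's
            # value; the last PE receives 0. The immediate is ignored.
            regs = regs[1:] + [0]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs


if __name__ == "__main__":
    print(simd_execute([("add", 1), ("mul", 2)], [0, 1, 2, 3]))
```

The point of the sketch is that the loop over instructions runs once, regardless of the number of PEs: control cost is shared, and only the data path is replicated.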

Embedded Computer Architecture 5KK73, TU/e
Henk Corporaal, Bart Mesman
SIMD: XETAL and IMAP

Xetal-II (Philips, NXP, TU/e)

Xetal-II board photo labels: XETAL-II, SRAM, 8051, ZigBee, CPLD

Xetal-II Processor
Features: 600 mW, 90 nm CMOS, 53.5 GOPS (arithmetic), 84MHz
Best computational efficiency among programmable silicon in 2007 [Kleihorst, et al. 2007]
Block diagram labels: In, Out, GPI, GPO, I2C, DIP, DOP, Program (16k x 56b), Data (2k x 16b), Linear Processor Array (320 PEs), Sequential I/O Memory (2 lines x 320 words), Frame Memory (2048 lines x 320 words), IMEM (240 kb), OMEM, LUT (240 kb)
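
The efficiency claim on this slide can be checked with simple arithmetic on the quoted figures. The sketch below assumes that the 600 mW and 53.5 GOPS numbers apply to the same operating point, which the slide does not state explicitly; the function names are hypothetical.

```python
def gops_per_watt(gops: float, watts: float) -> float:
    """Computational efficiency in GOPS/W."""
    return gops / watts


def picojoules_per_op(gops: float, watts: float) -> float:
    """Energy per operation: W / (1e9 op/s) is nJ/op, times 1000 gives pJ/op."""
    return watts / gops * 1000.0


def frame_memory_words(lines: int, words_per_line: int) -> int:
    """Total number of words in a line-organised frame memory."""
    return lines * words_per_line


if __name__ == "__main__":
    # Figures quoted on the slide: 53.5 GOPS, 600 mW, 2048 lines x 320 words.
    print(round(gops_per_watt(53.5, 0.6), 1))      # about 89.2 GOPS/W
    print(round(picojoules_per_op(53.5, 0.6), 1))  # about 11.2 pJ/op
    print(frame_memory_words(2048, 320))           # 655360 words
```

Under that assumption the chip delivers roughly 89 GOPS/W, or about 11 pJ per operation.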

Xetal-II Memory Access
All PEs access the same line of the Frame Memory.
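
Shared-line access means the control unit issues one line address per step, and PE i receives word i of that line. A small Python sketch of this scheme, assuming the frame memory is modelled as a list of lines with one word per PE; `read_line` and `vertical_sum` are hypothetical names used only for illustration.

```python
from typing import List


def read_line(frame_memory: List[List[int]], line: int) -> List[int]:
    """All PEs read the same line; PE i receives word i of that line."""
    return list(frame_memory[line])


def vertical_sum(frame_memory: List[List[int]], top: int, height: int) -> List[int]:
    """Each PE sums its own column over `height` consecutive lines."""
    num_pes = len(frame_memory[0])
    acc = [0] * num_pes
    for line in range(top, top + height):  # one shared line address per step
        words = read_line(frame_memory, line)
        acc = [a + w for a, w in zip(acc, words)]
    return acc


if __name__ == "__main__":
    fm = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(vertical_sum(fm, 0, 2))
```

Column-wise operations such as vertical filters fit this model naturally, because every PE needs the same line at the same time.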

Integrated Memory Array Processor IMAP (NEC)

IMAP Series Processors (NEC) [Shorin Kyo, et al. 2005]
Chart: Peak Performance (GOPS) versus Year, with publication markers ISSCC'95, CAMP'97 and ISSCC'03
IMAP-1: 15MHz, 8PE/Chip
IMAP-VISION: MHz, 32PE/Chip
IMAP-2: 40MHz, 64PE/Chip
IMAP-CE: 100MHz, 128PE/Chip, 4-Way VLIW, 50GOPS, 0.18um, 2 ~ 4Watt
IMAPCAR: 100MHz, 128PE/Chip, 4-Way VLIW+MAC, 100GOPS (-40℃ ~ 85℃), 0.13um, <2Watt
Die photo: IMAP-CE (32.7M Tr, 0.18um), 11.0mm, with blocks PE8, CP, EXTIF, DPLL (PE8: eight PEs integration block)
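
The peak figure quoted for IMAP-CE is consistent with a simple product of PE count, clock rate and issue width. The sketch below assumes that each VLIW slot counts as one operation per cycle, which is a common convention but is not stated on the slide; `peak_gops` is a hypothetical helper.

```python
def peak_gops(num_pes: int, clock_mhz: float, ops_per_cycle: int) -> float:
    """Peak throughput in GOPS: PEs x clock (MHz) x ops per PE per cycle / 1000."""
    return num_pes * clock_mhz * ops_per_cycle / 1000.0


if __name__ == "__main__":
    # IMAP-CE figures from the slide: 128 PEs, 100 MHz, 4-way VLIW.
    print(peak_gops(128, 100, 4))
```

With those inputs the product is 51.2 GOPS, in line with the rounded 50GOPS on the slide. The 100GOPS quoted for IMAPCAR at the same PE count and clock presumably reflects how the added MAC is counted; the slide does not give the accounting.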

IMAPCAR Block Diagram and Features
Features: 1) 128 4-Way VLIW PEs 2) < 100MHz 3) 130nm CMOS 4) 128 individual RAM blocks
Block diagram labels: Video IN, Video OUT, SR0, SR1, SR2, SR3, Host Processor, Control Processor (CP), P$, D$, STK RAM, PE (x128), IMEM, EMEM, External Mem. I/F, 12.8 GByte/s, 0.8 GByte/s
4 Way VLIW PE detail: ALUx1, MULx1, LOGx1, LSUx1; 24 x 8b General Purpose Registers; units ADD, MUL, LOG, RDU, LSU, COMM; links To/Fr other PEs, To/Fr IMEM, To/Fr CP

IMAPCAR Memory Access: local addressing support
Each PE can access a different line of the memory.
This requires a separate memory module per PE.
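
The difference between a shared line address and local addressing can be sketched in a few lines of Python. The layout `mem_per_pe[pe][line]` models the separate memory module per PE mentioned on the slide; the function names `read_global` and `read_local` are assumptions made for this illustration.

```python
from typing import List


def read_global(mem_per_pe: List[List[int]], line: int) -> List[int]:
    """Shared addressing: every PE reads the same line of its own module."""
    return [mem[line] for mem in mem_per_pe]


def read_local(mem_per_pe: List[List[int]], lines: List[int]) -> List[int]:
    """Local addressing: PE i reads line lines[i] of its own module."""
    if len(lines) != len(mem_per_pe):
        raise ValueError("need exactly one line address per PE")
    return [mem[line] for mem, line in zip(mem_per_pe, lines)]


if __name__ == "__main__":
    mem = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
    print(read_global(mem, 1))         # same line for all PEs
    print(read_local(mem, [2, 0, 1]))  # a different line per PE
```

Data-dependent accesses such as per-PE table lookups need the second form, which is why each PE must own an independently addressable memory block.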

IMAPCAR2: XC core (NEC)

XC Core: SIMD/MIMD, 90nm CMOS, 108MHz [Shorin Kyo, et al. 2009]

XC Core: SIMD Mode

XC Core: MIMD Support

4 SIMD PE -> 1 MIMD FPU

Xetal-Pro (TU/e, 2010)

Xetal-Pro Memory Access
All PEs access the same line of the Scratchpad Memory or Frame Memory.
Characteristics: 1) 320 single-issue PEs 2) 125MHz 3) 65nm CMOS 4) 1pJ/op at sub-threshold