CATA 06, © 2006 Wayne Wolf. Multiprocessor Systems-on-Chips. Wayne Wolf, Dept. of Electrical Engineering, Princeton University.


Outline
• Applications of MPSoCs.
• What makes MPSoCs different?
• Example MPSoCs.
• Design methodologies.
• Networks-on-chips.

Billion-transistor chips
• More transistors are manufactured in California per year than raindrops fall.
• We will soon be able to manufacture in volume chips with one billion transistors.
• We can manufacture, but can we design?
[Source: Sematech]

MPSoC applications
• Sophisticated markets:
  - High volume.
  - Demanding performance, power requirements.
  - Strict price restrictions.
• Often standards-driven.
• Examples:
  - Communications.
  - Multimedia.
  - Networking.

Approximate market segments
  Cell phone            600 million
  PC                    120 million
  CD                    30 million
  DVD                   40 million
  Digital television    6 million (US)
  Digital camera        24 million (US)

Standards-based embedded systems
• Many product categories rely on standards.
• Standards body provides reference implementation.
  - Reduces development time.
  - Don’t want to introduce bugs.
• Reference implementation may not be well-suited to implementation:
  - No task structure;
  - Not optimized.
[Photo: MPEG Tampere meeting]

MPEG 1/2-style compression engine
[Block diagram: motion estimator, +, DCT, Q, variable length coder, buffer, Q^-1, DCT^-1, +, picture store/predictor]
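
A minimal Python sketch of the loop in the diagram above, under loud assumptions: a 1-D block of eight samples stands in for the 8x8 image blocks, the motion estimator is replaced by a prediction supplied by the caller, the quantizer is a single uniform step (`qstep`), and the variable length coder and buffer are left out. The function names are illustrative and do not come from any MPEG reference code. What it is meant to show is why the encoder contains Q^-1 and DCT^-1: it rebuilds the picture the way a decoder would, so that the picture store holds the decoder's view rather than the original.

```python
import math

N = 8  # block length (assumed 1-D stand-in for an 8x8 block)


def _scale(k):
    # Orthonormal DCT scaling factor for coefficient index k.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct(block):
    """Orthonormal DCT-II of a length-N list."""
    return [
        _scale(k) * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                        for n, x in enumerate(block))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of dct() (DCT-III with the same scaling)."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def encode_block(current, prediction, qstep):
    """One trip around the loop for one block.

    Returns the quantized levels (what would go to the variable length
    coder) and the reconstructed block (what would go to the picture store).
    """
    residual = [c - p for c, p in zip(current, prediction)]      # first "+"
    levels = [round(c / qstep) for c in dct(residual)]           # DCT, Q
    recon_residual = idct([lv * qstep for lv in levels])         # Q^-1, DCT^-1
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]  # second "+"
    return levels, reconstructed
```

A larger `qstep` produces smaller levels to code but a reconstructed block that drifts further from the input, which is the quality/bit-rate trade the quantizer controls.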

H.264/AVC
• Relatively new video compression standard.
  - Many modes to improve image quality.
  - Combines broadcast, videoconference approaches.
  - Supports displays from cell phone to HDTV.
• Reference implementation includes over 720,000 lines of C.

Ogg Vorbis audio compression
• Window sizing trades quality, computational cost.
• Modern audio encoders change window size dynamically.
  - Loop characteristics are harder to predict.

What makes MPSoCs different?
• Multi-tasking.
  - Higher levels of parallelism help, but make homogeneous architectures less attractive.
• Real-time operation.
  - Traditional memory systems gave huge, unpredictable differences in access time.
• Low-power operation.
  - Everyone worries about power, but MPSoC designers worry more.
• Low cost.

Consumer electronics prices Best Buy November 2003:

Scientific multiprocessing
• Traditional scientific algorithms perform numerical computations.
  - Single algorithm on large amounts of data.
• Scientific multiprocessors emphasize easy programming of a single data set over multiple CPUs.
[Diagram labels: interconnect, CPU, mem, data array]

Embedded vs. scientific applications
• Embedded applications provide task-level parallelism.
• Embedded applications run many different types of algorithms at once.
[Diagram labels: CPU 1, CPU 2, mem, a1, a2, a3, +]

Architectures for real time
• Real time means computing to deadlines.
  - Requires careful resource management.
• Can’t stop the pipeline.
• Must size buffers to maintain throughput, minimize power and cost.
[Diagram labels: a1, a2, a3, +]
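
One way to picture the buffer-sizing point is a small Python sketch. It assumes a hypothetical trace of how many items a producer stage writes on each tick and a consumer that drains a fixed number of items per tick; neither the trace format nor the function name comes from the slides. The peak occupancy seen over the trace is the smallest buffer that lets the producer run without ever stalling.

```python
def min_buffer_depth(produced_per_tick, consumed_per_tick):
    """Smallest buffer depth (in items) that never overflows on this trace.

    Assumed model: on each tick the producer writes first, then the
    consumer removes up to consumed_per_tick items.
    """
    occupancy = 0
    peak = 0
    for produced in produced_per_tick:
        occupancy += produced                 # producer writes its burst
        peak = max(peak, occupancy)           # worst case is right after the write
        occupancy = max(0, occupancy - consumed_per_tick)  # consumer drains
    return peak


# Bursty producer, steady consumer: the same average rate needs more
# storage when the bursts arrive back to back.
spread_out = min_buffer_depth([4, 0, 0, 4, 0, 0], 2)
back_to_back = min_buffer_depth([4, 4, 0, 0, 0, 0], 2)
```

A buffer larger than the peak costs area and power without improving throughput, and a smaller one forces a stall, which is why the depth is chosen from the worst-case trace rather than from the average rate.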

Mudge et al: mobile supercomputers
• Mobile speech recognition, video, etc. require high performance and low energy.

Mudge et al: energy gap

Generic MPSoC architecture
• Rely on external bulk memory.
• Heterogeneous internal architecture:
  - Heterogeneous CPUs.
  - Heterogeneous interconnect.
  - Heterogeneous memory.
  - Heterogeneous programming environment.
[Diagram label: bulk memory]

Philips Nexperia set-top box
[Block diagram labels: MIPS, Trimedia, off-chip SDRAM, MC bridge, TC bridge, bus ctrl, clocks, DMA, reset, debug, I2C, Smcard, PCI, USB, 1394, MMI bus, MBS, 2D, AICP, MPEG, SPDIF, GPIO, C bridge]

TI OMAP
• Targets communications, multimedia.
• Multiprocessor with DSP, RISC.
[Block diagram labels, OMAP 5910: C55x DSP, ARM9, MMU, memory ctrl, MPU interface, system DMA control, bridge, I/O]

ST Nomadik
• Targets mobile multimedia.
• A multiprocessor-of-multiprocessors.
[Block diagram labels: ARM9, memory system, I/O bridges, audio accelerator, video accelerator (heterogeneous multiprocessors)]

Nomadik video accelerator
[Block diagram labels: MMDSP+, data RAM, instr RAM, Xbus, interrupt controller, picture post processing, video codec, picture input processing, local data bus, master AHB, DMA]

Nomadik audio processor
[Block diagram labels: slave AHB, timers, GPIO, etc., MMDSP+, Y RAM, X RAM, instr cache, ARM DMA, DMA1, DMA2, master AHB, X bus, Y bus]

MediaWorks ISI media platform
• Designed for MPEG-4.
• Five Tensilica processors.
  - All different instruction sets.
• I/O, memory control, etc.
[Figure: MediaWorks ISI]

ARM MPCore
• Up to 4 ARM11 cores.
• Each processor has its own cache.
• Shared memory.
  - Configurable memory access.
[Block diagram labels: four CPU/vector units each with L1 $, interrupt distributor, snoop controller, ictrl]

Cell processor
• IBM/Sony/Toshiba for PS3, other consumer devices.
• First implementation has:
  - 8 Cell processors (no cache).
  - Ring network.
  - 1 PowerPC.
[Diagram labels: PowerPC, Cell]

How many platforms are there?
• Large markets encourage diversity:
  - Unique requirements that must be met by platform customization.
  - Standards set the parameters within which customization can occur.
• Aggressive requirements encourage diversity:
  - Battery operation.
  - Low heat dissipation.
  - Small physical size.

What makes MPSoCs different for chip designers?
• Chip design process must include lots of software.
• Software must be designed to hardware-like constraints:
  - Real-time.
  - Low-power.
  - Area-constrained.
• Computation never stops.
  - Stalling is not an option, so buffer design is important.

Methodology challenges
• IP-based design.
• Memory system design.
• Interconnect.
• Hardware/software co-design.
• Design verification.

Design productivity gap
• From ITRS 99:

Processors
• CPU or hardwired unit?
• What instruction set?
  - Configurable processors provide power/performance advantages.
  - Standard instruction sets provide compatibility.
• How many CPUs?
• How many accelerators?

Memory system design
• Homogeneous or heterogeneous memory?
• Required memory bandwidth, latency?
• Caching structure?
• Memory consistency?

Interconnect
• What communication topology?
  - A few general-purpose topologies?
  - Or application-specific topologies?
• What protocols?
  - Custom protocols for MPSoC?
  - Quality-of-service is an important requirement for many applications.

Development environment and tools
• Development environment includes the entire multiprocessor:
  - Debugging.
  - Interprocessor communication.
  - Interconnect and memory system optimization.
• Simulation is an important tool.

RTOS and middleware
• Need very fast communication primitives.
  - Features cannot come at the expense of performance/power.
• Board-level RTOSs target a different design point:
  - More features.
  - Not worried about energy consumption.
• Middleware provides application-specific services built upon scheduling and IPC primitives.
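
As an illustration of what a lightweight communication primitive can look like, here is a Python sketch of a fixed-size, single-producer/single-consumer mailbox built on a ring buffer. The class and method names are hypothetical, and the slides do not name this structure; it is shown because it allocates nothing after construction and never blocks, which fits the "features cannot come at the expense of performance/power" constraint. On real multiprocessor hardware the index updates would also need the platform's memory-ordering guarantees, which this sketch does not model.

```python
class RingMailbox:
    """Bounded single-producer/single-consumer message queue (illustrative)."""

    def __init__(self, capacity):
        # One slot is kept empty so that full and empty states differ.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def try_send(self, message):
        """Store a message; return False instead of blocking when full."""
        next_tail = (self._tail + 1) % len(self._slots)
        if next_tail == self._head:
            return False
        self._slots[self._tail] = message
        self._tail = next_tail
        return True

    def try_receive(self):
        """Return the oldest message, or None when the mailbox is empty."""
        if self._head == self._tail:
            return None
        message = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return message


mailbox = RingMailbox(capacity=2)
sent = [mailbox.try_send(m) for m in ("frame0", "frame1", "frame2")]
received = [mailbox.try_receive() for _ in range(3)]
```

Because each index has exactly one writer, no lock is needed in this model; the caller decides what to do when `try_send` reports a full mailbox.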

Why middleware?
• Resources must be dynamically allocated for efficiency.
• Resource allocation in a multiprocessor requires a middleware layer above the operating systems.
• Challenge: low-power, high-throughput middleware services.
• ST Micro provides hardware support:
  - CORBA.
  - MPI.

Verification problems
• Functional:
  - Buffer overflow/underflow.
    · Buffers may be very large.
  - State-based behavior.
    · May take many cycles to get into the right state.
• Performance:
  - Clock period.
    · May depend upon details of memory state.
  - Real-time performance.
    · Software performance in the presence of busses, caches, etc.

Networks-on-chips
• Build single-chip multiprocessors using a packet-switched network.
  - Better design partitioning, decouples physical and architectural design.
• Design levels:
  - Network topology.
  - Routing.
  - Flow control.
• Systems:
  - Dally.
  - KTH Nostrum.
  - SPIN.
  - Slim-Spider.
  - QNoC.
  - Philips.
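
To make the routing design level concrete, here is a Python sketch of dimension-order (XY) routing on a 2-D mesh. Both the mesh topology and the choice of XY routing are assumptions made for illustration; the slides do not say which routing scheme any of the listed systems uses. A packet first travels along X until its column matches the destination, then along Y.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers a packet visits from src to dst.

    Assumes a 2-D mesh in which every router links to its four
    neighbours; coordinates are hypothetical router addresses.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                  # resolve the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                  # then resolve the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
hops = len(route) - 1
```

The route depends only on the source and destination addresses, so each router needs very little state; the cost is that traffic cannot steer around a congested link, which is where flow control and topology choices come in.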

Design challenges
• Generic vs. custom.
  - Customized designs provide better power/performance.
  - Hard to justify design effort for full custom network.
• Network design parameters.
  - Packet size, buffer size, etc.
• Layer design.
  - What to optimize away, what to keep flexible.

Smart Camera system-on-chip: Behavior model & computation architecture
• Real-time gesture recognition
• 150 frames/sec
• Dual-pipeline computation architecture

Smart Camera system-on-chip: RAW vs. Application-specific Networks-on-Chip
[Figure panels: RAW, ASNoC]
• ASNoC has three local networks
• RAW is implemented based on its design documentation
• Positions of computation nodes are optimized in RAW
• The same group of computation nodes
• Different communication architectures
• ASNoC has fewer switches and links

Smart Camera system-on-chip: Results and comparison
• Higher performance: 196%
• Lower power: 40%
• Less area: 36% metal area, 49% silicon area
• Less network resource: 38% switch capacity, 33% link capacity
• Higher network utilization: 227% switch utilization, 316% link utilization

Grand unified application and SoCs
• Gesture recognition, face recognition, facial expression analysis, speech recognition, non-speech sound recognition, etc.
• Algorithms + architecture.
[Diagram labels: CPU, video]