CATA 06, © 2006 Wayne Wolf. Multiprocessor Systems-on-Chips. Wayne Wolf, Dept. of Electrical Engineering, Princeton University.


1 Multiprocessor Systems-on-Chips
Wayne Wolf, Dept. of Electrical Engineering, Princeton University
CATA 06, © 2006 Wayne Wolf

2 Outline
- Applications of MPSoCs.
- What makes MPSoCs different?
- Example MPSoCs.
- Design methodologies.
- Networks-on-chips.

3 Billion-transistor chips
- More transistors are manufactured in California per year than raindrops fall.
- We will soon be able to manufacture in volume chips with one billion transistors.
- We can manufacture, but can we design?
[Figure credit: Sematech]

4 MPSoC applications
- Sophisticated markets:
  - High volume.
  - Demanding performance, power requirements.
  - Strict price restrictions.
- Often standards-driven.
- Examples:
  - Communications.
  - Multimedia.
  - Networking.

5 Approximate market segments
Cell phone: 600 million
PC: 120 million
CD: 30 million
DVD: 40 million
Digital television: 6 million (US)
Digital camera: 24 million (US)

6 Standards-based embedded systems
- Many product categories rely on standards.
- Standards body provides reference implementation.
  - Reduces development time.
  - Don't want to introduce bugs.
- Reference implementation may not be well-suited to implementation:
  - No task structure;
  - Not optimized.
[Photo: MPEG Tampere meeting]

7 MPEG 1/2-style compression engine
[Block diagram: motion estimator, +, DCT, Q, variable length coder, buffer, Q^-1, DCT^-1, +, picture store/predictor]
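The feedback path in the diagram (Q^-1, DCT^-1, picture store/predictor) can be illustrated with a minimal sketch. It is not taken from the slides or from any MPEG reference code: it uses a 1-D, 8-sample block instead of 8x8 pixels, a single uniform quantizer step, and takes the prediction as an input instead of running motion estimation. The names `encode_block`, `quantize`, and `dequantize` are hypothetical.

```python
import math

N = 8  # assumed block length (real codecs use 8x8 two-dimensional blocks)

def _scale(k):
    # Orthonormal DCT-II scaling factor for coefficient k.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct(block):
    """Orthonormal 1-D DCT-II of an N-sample block."""
    return [
        _scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n, x in enumerate(block))
        for k in range(N)
    ]

def idct(coeffs):
    """Inverse of dct() (the DCT-III with the same scaling)."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]

def quantize(coeffs, step):
    # "Q" box: uniform quantizer, one step size for all coefficients (assumption).
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    # "Q^-1" box.
    return [level * step for level in levels]

def encode_block(current, predicted, step):
    """Return (levels for the variable length coder, block for the picture store)."""
    residual = [c - p for c, p in zip(current, predicted)]        # first "+" node
    levels = quantize(dct(residual), step)                        # DCT, Q
    recon_residual = idct(dequantize(levels, step))               # Q^-1, DCT^-1
    reconstructed = [p + r for p, r in zip(predicted, recon_residual)]  # second "+"
    return levels, reconstructed
```

The encoder stores `reconstructed`, not `current`, so that its predictor sees the same picture the decoder will rebuild from the quantized levels.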

8 H.264/AVC
- Relatively new video compression standard.
  - Many modes to improve image quality.
  - Combines broadcast, videoconference approaches.
  - Supports displays from cell phone to HDTV.
- Reference implementation includes over 720,000 lines of C.

9 Ogg Vorbis audio compression
- Window sizing trades quality, computational cost.
- Modern audio encoders change window size dynamically.
  - Loop characteristics are harder to predict.
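A rough sketch of why dynamic window sizing makes loop behavior data-dependent is given below. It is not the Vorbis encoder's actual decision rule: the energy-ratio test, the threshold, and the two window sizes are assumptions chosen only for illustration, and `choose_window` is a hypothetical name.

```python
SHORT_WINDOW = 256    # assumed short block size, in samples
LONG_WINDOW = 2048    # assumed long block size, in samples

def choose_window(prev_frame, frame, ratio=4.0):
    """Pick a short window when signal energy jumps sharply (a transient).

    Short windows localize transients in time; long windows give better
    frequency resolution on steady signals.
    """
    prev_energy = sum(x * x for x in prev_frame) + 1e-12  # avoid divide-by-zero
    energy = sum(x * x for x in frame)
    return SHORT_WINDOW if energy / prev_energy > ratio else LONG_WINDOW
```

Because the result depends on the audio content, the trip count of the transform loop that follows is not known until run time, which is the prediction difficulty the slide points to.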

10 What makes MPSoCs different?
- Multi-tasking.
  - Higher levels of parallelism help, but make homogeneous architectures less attractive.
- Real-time operation.
  - Traditional memory systems gave huge, unpredictable differences in access time.
- Low-power operation.
  - Everyone worries about power, but MPSoC designers worry more.
- Low cost.

11 Consumer electronics prices Best Buy November 2003:

12 Scientific multiprocessing
- Traditional scientific algorithms perform numerical computations.
  - Single algorithm on large amounts of data.
- Scientific multiprocessors emphasize easy programming of a single data set over multiple CPUs.
[Diagram: interconnect, CPU, mem, data array]

13 Embedded vs. scientific applications
- Embedded applications provide task-level parallelism.
- Embedded applications run many different types of algorithms at once.
[Diagram: CPU 1, CPU 2, mem; tasks a1, a2, a3, +]

14 Architectures for real time
- Real time means computing to deadlines.
  - Requires careful resource management.
- Can't stop the pipeline.
- Must size buffers to maintain throughput, minimize power and cost.
[Diagram: tasks a1, a2, a3, +]
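As a hedged illustration of the buffer-sizing point, the sketch below finds the smallest buffer depth that never overflows for a given arrival trace and a fixed drain rate. The discrete-tick model, the rule that items arrive before the consumer drains in each tick, and the name `min_buffer_depth` are assumptions and do not come from the slides.

```python
def min_buffer_depth(arrivals, drain_per_tick):
    """Peak occupancy of a FIFO fed by `arrivals` (items per tick) and
    drained at a constant `drain_per_tick`.

    A buffer of this depth never overflows on this trace; anything deeper
    costs extra area and power without improving throughput.
    """
    occupancy = 0
    peak = 0
    for produced in arrivals:
        occupancy += produced                              # producer writes first
        peak = max(peak, occupancy)
        occupancy = max(0, occupancy - drain_per_tick)     # then consumer drains
    return peak
```

For example, a producer that emits bursts of 4 items every third tick into a consumer that removes 2 items per tick needs a depth of 4: `min_buffer_depth([4, 0, 0, 4, 0, 0], 2)` evaluates to 4.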

15 Mudge et al: mobile supercomputers
- Mobile speech recognition, video, etc. require high performance and low energy.

16 Mudge et al: energy gap

17 Generic MPSoC architecture
- Rely on external bulk memory.
- Heterogeneous internal architecture:
  - Heterogeneous CPUs.
  - Heterogeneous interconnect.
  - Heterogeneous memory.
  - Heterogeneous programming environment.
[Diagram: bulk memory]

18 Philips Nexperia set-top box
[Block diagram: MIPS; Trimedia; off-chip SDRAM; MMI bus; MC bridge; TC bridge; C bridge; bus ctrl; clocks, DMA, reset, debug; I2C, Smcard; PCI; USB, 1394; MBS; 2D; AICP; MPEG; SPDIF; GPIO]

19 TI OMAP
- Targets communications, multimedia.
- Multiprocessor with DSP, RISC.
[Block diagram, OMAP 5910: C55x DSP, ARM9, MMU, memory ctrl, MPU interface, system DMA control, bridge, I/O]

20 ST Nomadik
- Targets mobile multimedia.
- A multiprocessor-of-multiprocessors.
[Block diagram: ARM9, memory system, I/O bridges, audio accelerator, video accelerator; the accelerators are labeled as heterogeneous multiprocessors]

21 Nomadik video accelerator
[Block diagram: MMDSP+, data RAM, instr RAM, Xbus, interrupt controller, picture post processing, video codec, picture input processing, local data bus, master AHB, DMA]

22 Nomadik audio processor
[Block diagram: slave AHB; timers, GPIO, etc.; MMDSP+; Y RAM; X RAM; instr cache; ARM DMA; DMA1; DMA2; master AHB; X bus; Y bus]

23 MediaWorks ISI media platform
- Designed for MPEG-4.
- Five Tensilica processors.
  - All different instruction sets.
- I/O, memory control, etc.
[Figure: MediaWorks ISI]

24 ARM MPCore
- Up to 4 ARM11 cores.
- Each processor has its own cache.
- Shared memory.
  - Configurable memory access.
[Block diagram: four CPU/Vector units, each with an L1 $; interrupt distributor; snoop controller; ictrl. Source: www.arm.com]

25 Cell processor
- IBM/Sony/Toshiba for PS3, other consumer devices.
- First implementation has:
  - 8 Cell processors (no cache).
  - Ring network.
  - 1 PowerPC.
[Diagram: PowerPC, Cell]

26 How many platforms are there?
- Large markets encourage diversity:
  - Unique requirements that must be met by platform customization.
  - Standards set the parameters within which customization can occur.
- Aggressive requirements encourage diversity:
  - Battery operation.
  - Low heat dissipation.
  - Small physical size.

27 What makes MPSoCs different for chip designers?
- Chip design process must include lots of software.
- Software must be designed to hardware-like constraints:
  - Real-time.
  - Low-power.
  - Area-constrained.
- Computation never stops.
  - Stalling is not an option; buffer design is important.

28 Methodology challenges
- IP-based design.
- Memory system design.
- Interconnect.
- Hardware/software co-design.
- Design verification.

29 Design productivity gap
- From ITRS 99:

30 Processors
- CPU or hardwired unit?
- What instruction set?
  - Configurable processors provide power/performance advantages.
  - Standard instruction sets provide compatibility.
- How many CPUs?
- How many accelerators?

31 Memory system design
- Homogeneous or heterogeneous memory?
- Required memory bandwidth, latency?
- Caching structure?
- Memory consistency?

32 Interconnect
- What communication topology?
  - A few general-purpose topologies?
  - Or application-specific topologies?
- What protocols?
  - Custom protocols for MPSoC?
  - Quality-of-service is an important requirement for many applications.

33 Development environment and tools
- Development environment includes the entire multiprocessor:
  - Debugging.
  - Interprocessor communication.
  - Interconnect and memory system optimization.
- Simulation is an important tool.

34 RTOS and middleware
- Need very fast communication primitives.
  - Features cannot come at the expense of performance/power.
- Board-level RTOSs target a different design point:
  - More features.
  - Not worried about energy consumption.
- Middleware provides application-specific services built upon scheduling and IPC primitives.

35 Why middleware?
- Resources must be dynamically allocated for efficiency.
- Resource allocation in a multiprocessor requires middleware layer above the operating systems.
- Challenge: low-power, high-throughput middleware services.
- ST Micro provides hardware support:
  - CORBA.
  - MPI.

36 Verification problems
- Functional:
  - Buffer overflow/underflow.
    - Buffers may be very large.
  - State-based behavior.
    - May take many cycles to get into the right state.
- Performance:
  - Clock period.
    - May depend upon details of memory state.
  - Real-time performance.
    - Software performance in the presence of busses, caches, etc.

37 Networks-on-chips
- Build single-chip multiprocessors using packet-switched network.
  - Better design partitioning, decouples physical and architectural design.
- Design levels:
  - Network topology.
  - Routing.
  - Flow control.
- Systems:
  - Dally.
  - KTH Nostrum.
  - SPIN.
  - Slim-Spider.
  - QNoC.
  - Philips.
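The slide names topology and routing as design levels without picking a specific scheme. As one possible illustration, the sketch below assumes a 2-D mesh topology and dimension-order (XY) routing, a common deterministic choice; neither is claimed to be what the listed systems use, and `route_xy` is a hypothetical name.

```python
def route_xy(src, dst):
    """Dimension-order routing on a 2-D mesh.

    A packet first travels along X until its column matches the destination,
    then along Y. Returns the list of (x, y) switches visited, including
    the source and destination.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # resolve the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then resolve the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

For instance, `route_xy((0, 0), (2, 1))` visits `(0, 0)`, `(1, 0)`, `(2, 0)`, `(2, 1)`. The hop count always equals the Manhattan distance, and the fixed X-then-Y order is what keeps this scheme deadlock-free on a mesh.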

38 Design challenges
- Generic vs. custom.
  - Customized designs provide better power/performance.
  - Hard to justify design effort for full custom network.
- Network design parameters.
  - Packet size, buffer size, etc.
- Layer design.
  - What to optimize away, what to keep flexible.

39 Smart Camera system-on-chip: Behavior model & computation architecture
- Real-time gesture recognition.
- 150 frames/sec.
- Dual-pipeline computation architecture.

40 Smart Camera system-on-chip: RAW vs. Application-specific Networks-on-Chip
[Diagram: RAW, ASNoC]
- ASNoC has three local networks.
- RAW is implemented based on its design documentation.
- Positions of computation nodes are optimized in RAW.
- The same group of computation nodes.
- Different communication architectures.
- ASNoC has fewer switches and links.

41 Smart Camera system-on-chip: Results and comparison
- Higher performance: 196%
- Lower power: 40%
- Less area: 36% metal area, 49% silicon area
- Less network resource: 38% switch capacity, 33% link capacity
- Higher network utilization: 227% switch utilization, 316% link utilization

42 Grand unified application and SoCs
- Gesture recognition, face recognition, facial expression analysis, speech recognition, non-speech sound recognition, etc.
- Algorithms + architecture.
[Diagram: CPU, video]

