Spring 2007, Lecture 16: Heterogeneous Systems (Thanks to Wen-Mei Hwu for many of the figures)


1 Heterogeneous Systems (Thanks to Wen-Mei Hwu for many of the figures)

2 What are Heterogeneous Systems?
Programmable -- not restricted to one particular application, though possibly heavily optimized for a class of applications.
Multi-core -- multiple, independent execution units on a chip.
  – Some people are starting to use the term “many-core” for architectures with enough cores that a non-sequential programming model is needed to get full performance out of the system.
Heterogeneous -- the cores are different.
  – Cores can be optimized for specific types of applications.
  – Work can be scheduled for performance or for power.

3 Why are they Interesting?
Embedded applications have tough performance and power requirements.
Example: a GSM decoder requires 10 Minst/second in software.
The Motorola V70 GSM cell phone has a power budget of approximately 0.8 watts total when in use.
  – Includes both encode and decode
  – Includes microphone, speaker, radio

4 Application-Specific Integrated Circuits
[Figure: block diagram. Labels: CPU, Control, Input Data, Custom Logic, Buffer, Custom Logic, Output Data]

5 Why Not Keep Using ASICs?
Decreasing Product Cycles
Design Time/Cost
  – Transistors/chip rising at 50%/year
  – Transistors/designer-day rising at 10%/year
      Re-usable cores helping some, but not enough
  – Mask cost greater than $1M
      Need to fabricate many chips to justify a design
Lack of Flexibility
  – More and more, consumers want multifunction devices (e.g., cell phone with camera)
  – Increases design time, cost
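The two growth rates on this slide compound into a widening design gap. The following is a minimal sketch of that arithmetic, assuming both rates stay constant and normalizing year 0 to 1.0; the function name `design_effort_ratio` is hypothetical and not from the lecture.

```c
#include <assert.h>

/* Relative design effort (designer-days per chip) after `years` years,
 * normalized to 1.0 at year 0.
 * Assumption: constant compound growth at the rates quoted on the slide:
 *   transistors per chip          +50% per year
 *   transistors per designer-day  +10% per year
 * Effort per chip scales as capacity / productivity. */
double design_effort_ratio(int years)
{
    assert(years >= 0);
    double capacity = 1.0;      /* transistors per chip */
    double productivity = 1.0;  /* transistors per designer-day */
    for (int y = 0; y < years; y++) {
        capacity *= 1.50;
        productivity *= 1.10;
    }
    return capacity / productivity;
}
```

Under these assumptions the effort per chip grows by roughly 36% a year (1.5 / 1.1), which comes to about 4.7x over five years.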

6 Why Heterogeneous Systems?
Different parts of programs have different requirements
  – Control-intensive portions need good branch predictors, speculation, and big caches to achieve good performance
  – Data-processing portions need lots of ALUs and have simpler control flow
Power Consumption
  – Features like branch prediction and out-of-order execution tend to have very high power/performance ratios
  – Applications often have time-varying performance requirements
Observation: Much of the performance and power advantage of ASICs comes from application-specific memory, not application-specific processing
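One way to read the power points above: when the required performance varies over time, a scheduler can pick the cheapest core that still meets the current demand. The sketch below is only an illustration of that idea; `struct core`, `pick_core`, and any throughput or power figures passed to it are hypothetical and do not come from the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one core in a heterogeneous chip. */
struct core {
    double minst_per_s;  /* sustained throughput, millions of instructions/sec */
    double watts;        /* power drawn while running */
};

/* Return the index of the lowest-power core whose throughput meets the
 * current demand, or -1 if no core is fast enough. */
int pick_core(const struct core *cores, size_t n, double demand_minst_per_s)
{
    assert(cores != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (cores[i].minst_per_s < demand_minst_per_s)
            continue;  /* too slow for the current demand */
        if (best < 0 || cores[i].watts < cores[(size_t)best].watts)
            best = (int)i;
    }
    return best;
}
```

For example, with a small low-power core and a large fast core in the table, a light task such as a 10 Minst/second decoder would land on the small core, and the large core would be chosen only when the demand exceeds what the small one can deliver.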

7 Changing Memory to Communication
Code shown on the slide:
    Weight_Ai (Az, F_ga3, Ap3)
    Weight_Ai (Az, F_g4, Ap4)
    Residu (Ap3, &syn_subfr[i],)
    Copy (Ap3, h, 11)
    Set_zero (&h[11], 11)
    Syn_filt (Ap4, h, h, 22, &h)
    tmp = h[0] * h[0];
    for (i = 1 ; i < 22 ; i++) tmp = tmp + h[i] * h[i];
    tmp1 = tmp >> 8;
    tmp = h[0] * h[1];
    for (i = 1 ; i < 21 ; i++) tmp = tmp + h[i] * h[i+1];
    tmp2 = tmp >> 8;
    if (tmp2 <= 0) tmp2 = 0; else tmp2 = tmp2 * MU; tmp2 = tmp2/tmp1;
    preemphasis (res2, temp2, 40)
    Syn_filt (Ap4, res2, &syn_p), 40, mem_syn_pst, 1);
    agc (&syn[i_subfr], &syn) 29491, 40)
[Figure: two versions of the system. CPU version: CPU, DRAM, and the data items res2, m_syn, F_g3, F_g4, Az_4, synth, syn, Ap3, Ap4, h, tmp, tmp1, tmp2. PE version: PEs labeled Weight_Ai, Copy + Set_zero, Residu, Syn_filt, Corr0/Corr1, preemph, agc, Syn_filt, with the same data items and DRAM.]
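The Corr0/Corr1 PE appears to correspond to the two correlation loops over `h` in the code on this slide. Below is a minimal, self-contained sketch of that kernel. It makes several assumptions not stated on the slide: `h` holds 16-bit samples, accumulation is done in 64 bits to avoid overflow, the final multiply and divide are grouped under the `else` branch, and a zero denominator yields zero. `MU` is passed in as the parameter `mu` because the slide does not give its value; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define H_LEN 22  /* number of samples of h used by the loops on the slide */

/* Lag-0 correlation (energy) of h, scaled down by 2^8 as on the slide. */
int64_t corr0(const int16_t h[H_LEN])
{
    int64_t tmp = 0;
    for (int i = 0; i < H_LEN; i++)
        tmp += (int64_t)h[i] * h[i];
    return tmp >> 8;  /* tmp is a sum of squares, so never negative */
}

/* Lag-1 correlation of h, scaled down by 2^8 and clamped at zero. */
int64_t corr1(const int16_t h[H_LEN])
{
    int64_t tmp = 0;
    for (int i = 0; i < H_LEN - 1; i++)
        tmp += (int64_t)h[i] * h[i + 1];
    if (tmp <= 0)
        return 0;     /* clamp first, so a negative value is never shifted */
    return tmp >> 8;
}

/* tmp2 * MU / tmp1 from the slide, with a guard against dividing by zero. */
int64_t tilt_factor(const int16_t h[H_LEN], int32_t mu)
{
    assert(mu >= 0);
    int64_t tmp1 = corr0(h);
    int64_t tmp2 = corr1(h);
    if (tmp1 == 0 || tmp2 == 0)
        return 0;
    return (tmp2 * mu) / tmp1;
}
```

The kernel reads `h` once per loop and produces a single scalar, so in the PE version only `h` needs to arrive and only one value needs to leave.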

8 View from source code
Note how memory operations dominate
Note presence of “expensive” instructions

9 Not as Easy as it Looks
[Figure: Residu, preemphasis, and Syn_filt drawn as multiply (*) and add (+) operations that access res through MEM over time; the index ranges shown are [0:39], [39:0], [0:39].]
Order of access to data may make transforming memory ops into communication hard
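The index ranges [0:39] and [39:0] on this slide suggest that one stage walks `res` in the opposite direction from another. A small simulation can show why that matters for streaming. This is a sketch under assumptions: the producer emits one element per step, the consumer takes elements as soon as they are available in its own order, and both orders are permutations of 0..n-1. The name `peak_buffer` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_N 64  /* largest array size this sketch handles */

/* Peak number of elements that must be buffered between a producer that
 * emits indices in the order prod[0..n-1] and a consumer that needs them
 * in the order cons[0..n-1]. */
int peak_buffer(const int *prod, const int *cons, int n)
{
    assert(n > 0 && n <= MAX_N);
    bool present[MAX_N] = { false };
    int occupied = 0, peak = 0, next = 0;

    for (int t = 0; t < n; t++) {
        assert(prod[t] >= 0 && prod[t] < n);
        present[prod[t]] = true;  /* element arrives in the buffer */
        occupied++;
        if (occupied > peak)
            peak = occupied;

        /* Drain everything the consumer can take in its own order. */
        while (next < n) {
            assert(cons[next] >= 0 && cons[next] < n);
            if (!present[cons[next]])
                break;
            present[cons[next]] = false;
            occupied--;
            next++;
        }
    }
    return peak;
}
```

When both stages use the same order, one buffered element is enough and the data can flow as a stream. When the consumer reads in reverse, the whole array has to be buffered first, which is the memory operation the transformation was trying to eliminate.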

10 Compilers to the Rescue!

11 Heterogeneous Processor Vision
[Figure labels: GPP, MTM, MAIN MEMORY, and ACC blocks with LOCAL MEMORY]
Memory transfer module (MTM) schedules system-wide bulk data movement
General-purpose processor (GPP) orchestrates activity
Accelerators (ACC) can use scheduled, streaming communication… or can operate on locally-buffered data pushed to them in advance
Accelerated activities and associated private data are localized for bandwidth, power, efficiency
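The division of labor on this slide can be sketched as a double-buffered loop: the GPP orchestrates, the MTM pushes the next block into an accelerator's local memory, and the accelerator touches only local data. The code below is a sequential simulation under assumptions, not a description of any real hardware interface. `LOCAL_WORDS`, `mtm_push`, `acc_sum`, and `gpp_run` are hypothetical names, and the accelerator's "kernel" is just a sum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_WORDS 64  /* hypothetical size of one local-memory buffer */

/* Stand-in for the MTM: one bulk copy from main memory to local memory. */
static void mtm_push(int *local, const int *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof *local);
}

/* Stand-in for an accelerator kernel: reads only its local buffer. */
static long acc_sum(const int *local, size_t count)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += local[i];
    return sum;
}

/* Stand-in for the GPP: walks main memory block by block, keeping one
 * local buffer in use by the accelerator while the other is filled. */
long gpp_run(const int *main_mem, size_t total)
{
    assert(main_mem != NULL || total == 0);
    int local[2][LOCAL_WORDS];
    long result = 0;
    size_t done = 0;
    int cur = 0;
    size_t cur_count = total < LOCAL_WORDS ? total : LOCAL_WORDS;

    if (cur_count > 0)
        mtm_push(local[cur], main_mem, cur_count);

    while (cur_count > 0) {
        size_t next_off = done + cur_count;
        size_t remaining = total - next_off;
        size_t next_count = remaining < LOCAL_WORDS ? remaining : LOCAL_WORDS;

        /* On real hardware this push would overlap with the kernel below. */
        if (next_count > 0)
            mtm_push(local[1 - cur], main_mem + next_off, next_count);

        result += acc_sum(local[cur], cur_count);

        done = next_off;
        cur = 1 - cur;
        cur_count = next_count;
    }
    return result;
}
```

The accelerator code never sees a main-memory address, which mirrors the slide's point that private data is localized and bulk movement is scheduled separately.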

12 Intel Network Processor -- Existing Example
[Figure labels: XScale Core, Hash Engine, Scratchpad SRAM, RFIFO, TFIFO, Micro engine (x16), QDR SRAM (x4), RDRAM, PCI, CSRs, SPI4 / CSIX]

13 STI Cell Processor -- Emerging Example
[Figure labels: SPE1 through SPE8, PPE, EIB, two I/O Controllers, two Memory Controllers, RAM]
Power Processor Element (PPE): simplified 64-bit PowerPC with VMX
Synergistic Processing Element (SPE)
Element Interconnect Bus (EIB): internal communication system
Dual configurable high-speed channels (38.4 GB/sec ea.)
Dual 12.8 GB/sec memory busses
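A quick back-of-the-envelope calculation with the figures on this slide shows how much off-chip memory bandwidth each SPE could expect. The sketch assumes the two memory busses add up and that the total is split evenly among the active SPEs; the even split is a simplifying assumption rather than something the slide states, and the function names are hypothetical.

```c
#include <assert.h>

/* Figures quoted on the slide. */
#define MEM_BUS_GB_PER_S 12.8
#define MEM_BUS_COUNT    2
#define SPE_COUNT        8

/* Total off-chip memory bandwidth across both memory busses, in GB/sec. */
double total_mem_bandwidth(void)
{
    return MEM_BUS_COUNT * MEM_BUS_GB_PER_S;
}

/* Hypothetical even split of that bandwidth among `active_spes` SPEs. */
double mem_bandwidth_per_spe(int active_spes)
{
    assert(active_spes >= 1 && active_spes <= SPE_COUNT);
    return total_mem_bandwidth() / active_spes;
}
```

With all eight SPEs streaming from RAM at once, each would see about 3.2 GB/sec under this assumption, out of 25.6 GB/sec in total.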

14 Overview of the Rest of the Semester
This is the last formal lecture
  – If we haven’t covered it already, we can’t really expect you to use it on your projects
Final project proposal due Tuesday in class
I’ll be in my office (208 CSL) during class on 3/27 to provide an opportunity to discuss project issues
Quiz 2 is 3/29
Final project demos are 5/3

