Spring 2007, Lecture 16: Heterogeneous Systems (Thanks to Wen-Mei Hwu for many of the figures)

What are Heterogeneous Systems?
Programmable: not restricted to one particular application, though possibly heavily optimized for a class of applications
Multi-core: multiple independent execution units on a chip
  – Some people are starting to use the term "many-core" for architectures with so many cores that you must use a non-sequential programming model to get full performance out of the system
Heterogeneous: the cores are different
  – Optimize cores for specific types of applications
  – Can schedule for performance or power

Why are they Interesting?
Embedded applications have tough performance and power requirements.
Example: a GSM decoder requires 10 Minst/second (10 million instructions per second) in software.
The Motorola V70 GSM cell phone has a total power budget of approximately 0.8 W when in use.
  – Includes both encode and decode
  – Includes microphone, speaker, radio
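
Dividing the phone's power budget by the decoder's instruction rate gives a rough ceiling on the energy available per instruction. This is a back-of-the-envelope bound that assumes, unrealistically, that the whole 0.8 W goes to executing decoder instructions; because the budget also covers encoding, the microphone, the speaker, and the radio, the real allowance is smaller.

```latex
E_{\text{inst}} \le \frac{P}{R}
  = \frac{0.8\ \text{W}}{10 \times 10^{6}\ \text{inst/s}}
  = 8 \times 10^{-8}\ \text{J/inst}
  = 80\ \text{nJ per instruction}
```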

Application-Specific Integrated Circuits
[Figure: ASIC block diagram. Input data passes through custom logic, a buffer, and a second custom-logic stage to produce output data; a CPU supplies control.]

Why Not Keep Using ASICs?
Decreasing product cycles
Design time/cost
  – Transistors per chip rising at 50%/year
  – Transistors per designer-day rising at 10%/year
  – Reusable cores help some, but not enough
  – Mask cost greater than $1M, so many chips must be fabricated to justify a design
Lack of flexibility
  – More and more, consumers want multifunction devices (e.g., a cell phone with a camera)
  – This increases design time and cost
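
To make the growth rates concrete: if transistors per chip grow 1.5x per year while transistors per designer-day grow only 1.1x, the design effort per chip grows by their ratio every year. The sketch below compounds that ratio and amortizes a mask set over a production run. The helpers design_gap and mask_cost_per_chip are hypothetical, and the sketch assumes both growth rates stay constant.

```c
#include <assert.h>

/* Ratio of transistors per chip to transistors per designer-day after
   `years` years, normalized to 1.0 at year 0. Assumes the slide's rates
   (50%/year and 10%/year) hold for every year. */
double design_gap(int years)
{
    double gap = 1.0;

    assert(years >= 0);
    for (int y = 0; y < years; y++)
        gap *= 1.5 / 1.1;   /* chip size grows 1.5x, productivity 1.1x */
    return gap;
}

/* Mask cost amortized over a production run, in dollars per chip. */
double mask_cost_per_chip(double mask_cost, long units)
{
    assert(units > 0);
    return mask_cost / (double)units;
}
```

After five years the gap is about 4.7x. A $1M mask set spread over 10,000 chips adds $100 to each chip; since the slide gives $1M as a lower bound, this per-chip figure is a lower bound too.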

Why Heterogeneous Systems?
Different parts of programs have different requirements
  – Control-intensive portions need good branch predictors, speculation, and big caches to achieve good performance
  – Data-processing portions need lots of ALUs and have simpler control flow
Power consumption
  – Features like branch prediction and out-of-order execution tend to have very high power/performance ratios: they cost a lot of power for the performance they add
  – Applications often have time-varying performance requirements
Observation: much of the performance and power advantage of ASICs comes from application-specific memory, not application-specific processing
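
One simple reading of "schedule for performance or power" with time-varying requirements: for each task, pick the core that meets the current deadline with the least energy. The sketch below is a minimal illustration under stated assumptions: energy is power times run time, each core has a fixed sustained rate, and migration and idle costs are ignored. The core_t type, pick_core function, and all core numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one core type; the values passed in are
   illustrative, not measurements of any real chip. */
typedef struct {
    const char *name;
    double mips;   /* sustained rate, millions of instructions per second */
    double watts;  /* active power while running the task */
} core_t;

/* Return the index of the core that finishes `minst` million instructions
   within `deadline_s` seconds using the least energy, or -1 if none can. */
int pick_core(const core_t *cores, int n, double minst, double deadline_s)
{
    int best = -1;
    double best_energy = 0.0;

    assert(cores != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        double t = minst / cores[i].mips;     /* run time in seconds */
        double energy = cores[i].watts * t;   /* joules = watts * seconds */
        if (t > deadline_s)
            continue;                         /* too slow for this deadline */
        if (best < 0 || energy < best_energy) {
            best = i;
            best_energy = energy;
        }
    }
    return best;
}
```

For example, with made-up cores {"big", 2000 MIPS, 2.0 W} and {"little", 200 MIPS, 0.1 W}, a 10-Minst task with a 0.1 s deadline goes to the little core, a 0.01 s deadline forces the big core, and a 0.001 s deadline cannot be met by either.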

Changing Memory to Communication

    Weight_Ai (Az, F_g3, Ap3);
    Weight_Ai (Az, F_g4, Ap4);
    Residu (Ap3, &syn_subfr[i], /* ... */);   /* remaining arguments lost in extraction */
    Copy (Ap3, h, 11);
    Set_zero (&h[11], 11);
    Syn_filt (Ap4, h, h, 22, &h);

    /* Corr0: energy of h[] */
    tmp = h[0] * h[0];
    for (i = 1 ; i < 22 ; i++)
        tmp = tmp + h[i] * h[i];
    tmp1 = tmp >> 8;

    /* Corr1: lag-1 correlation of h[] */
    tmp = h[0] * h[1];
    for (i = 1 ; i < 21 ; i++)
        tmp = tmp + h[i] * h[i+1];
    tmp2 = tmp >> 8;
    if (tmp2 <= 0)
        tmp2 = 0;
    else
        tmp2 = tmp2 * MU;
    tmp2 = tmp2 / tmp1;

    preemphasis (res2, tmp2, 40);
    Syn_filt (Ap4, res2, &syn_p/* ... */, 40, mem_syn_pst, 1);   /* argument partly lost in extraction */
    agc (&syn[i_subfr], &syn/* ... */, 29491, 40);               /* argument partly lost in extraction */

[Figure: left, the routine running on a CPU with all of its arrays and scalars (res2, m_syn, F_g3, F_g4, Az_4, synth, syn, Ap3, Ap4, h, tmp, tmp1, tmp2) held in DRAM; right, the same routine mapped onto PEs for Weight_Ai, Copy+Set_zero, Residu, Syn_filt, Corr0/Corr1, preemph, agc, and Syn_filt, with the same data items communicated among the PEs, backed by DRAM.]
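
The Corr0/Corr1 step above can be pulled out into a self-contained function. Every loop iteration loads from h[] to feed a single multiply-accumulate, which is the memory traffic this slide is concerned with. In the sketch below, tilt_factor is a hypothetical name, `mu` stands in for the slide's MU constant (its value is not shown), and 64-bit accumulators are an added assumption so full-scale 16-bit samples cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L_H 22  /* length of the impulse response h[] on the slide */

/* Hypothetical stand-alone version of the slide's Corr0/Corr1 step. */
int32_t tilt_factor(const int16_t h[L_H], int32_t mu)
{
    int64_t tmp, tmp1, tmp2;
    int i;

    assert(h != NULL);

    /* Corr0: energy of h[], 22 products */
    tmp = (int64_t)h[0] * h[0];
    for (i = 1; i < L_H; i++)
        tmp = tmp + (int64_t)h[i] * h[i];
    tmp1 = tmp >> 8;

    /* Corr1: lag-1 correlation of h[], 21 products */
    tmp = (int64_t)h[0] * h[1];
    for (i = 1; i < L_H - 1; i++)
        tmp = tmp + (int64_t)h[i] * h[i + 1];

    /* Same result as the slide's "if (tmp2 <= 0) tmp2 = 0", but tested
       before shifting so no negative value is right-shifted. */
    if (tmp <= 0)
        return 0;
    tmp2 = tmp >> 8;

    /* Guard the division, which the slide leaves unchecked. */
    if (tmp1 == 0)
        return 0;
    return (int32_t)(tmp2 * mu / tmp1);
}
```

For h[] = {256, 256, 0, ...} and mu = 1000, Corr0 gives 512 after the shift, Corr1 gives 256, and the function returns 256 * 1000 / 512 = 500.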

View from source code
Note how memory operations dominate.
Note the presence of "expensive" instructions.

Not as Easy as it Looks
[Figure: timeline of accesses to res through memory (MEM): Residu writes res[0:39], preemphasis accesses it as [39:0], and Syn_filt reads [0:39].]
The order of access to data may make transforming memory operations into communication hard.
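
One way to see why access order matters: if a producer writes elements in one order and a consumer reads them in another, a stream between them must buffer everything produced but not yet consumable. The sketch below measures that peak. The function peak_buffer and the MAX_ELEMS limit are hypothetical, and it assumes one element is produced per step and the consumer takes elements as soon as they become available in its own order.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ELEMS 64

/* prod[t] is the element written at step t; cons[k] is the k-th element the
   consumer needs. Both are permutations of 0..n-1. Returns the largest
   number of elements that must be held at once. */
int peak_buffer(const int *prod, const int *cons, int n)
{
    bool ready[MAX_ELEMS] = { false };
    int next = 0;   /* index into cons[] of the next element wanted */
    int held = 0;   /* produced but not yet consumed */
    int peak = 0;

    assert(n >= 0 && n <= MAX_ELEMS);
    for (int t = 0; t < n; t++) {
        assert(prod[t] >= 0 && prod[t] < n);
        ready[prod[t]] = true;      /* producer writes one element */
        held++;
        if (held > peak)
            peak = held;
        /* consumer drains every element now available in its order */
        while (next < n && ready[cons[next]]) {
            next++;
            held--;
        }
    }
    return peak;
}
```

Same-order streaming needs a single slot. A producer writing res[0:39] feeding a consumer that reads res[39:0] must hold all 40 elements before the consumer can start, which is essentially the memory the transformation was meant to remove.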

Compilers to the Rescue!

Heterogeneous Processor Vision
[Figure: a general-purpose processor (GPP), a memory transfer module (MTM), and accelerators (ACC), some with their own local memory, arranged around a main memory]
Memory transfer module schedules system-wide bulk data movement
General-purpose processor orchestrates activity
Accelerators can use scheduled, streaming communication… or can operate on locally buffered data pushed to them in advance
Accelerated activities and associated private data are localized for bandwidth, power, and efficiency
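
Operating on "locally buffered data pushed in advance" is commonly realized with double buffering: while an accelerator works on one local buffer, the transfer engine fills the other. The sketch below simulates that rotation sequentially. The names mtm_transfer, acc_process, run_double_buffered, and BLOCK are hypothetical stand-ins (memcpy for the MTM, a summation for the accelerator kernel), not the API of any real system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 16  /* hypothetical local-memory block size, in elements */

/* Stand-in for the memory transfer module: copies one block from main
   memory into an accelerator's local buffer. */
static void mtm_transfer(int *local, const int *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof *local);
}

/* Stand-in for an accelerator kernel that touches only its local buffer. */
static long acc_process(const int *local, size_t count)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += local[i];
    return sum;
}

long run_double_buffered(const int *main_mem, size_t n)
{
    int local[2][BLOCK];   /* one buffer being filled, one being processed */
    size_t nblocks = (n + BLOCK - 1) / BLOCK;
    long total = 0;

    assert(main_mem != NULL || n == 0);
    if (nblocks == 0)
        return 0;

    /* Push block 0 to the accelerator in advance. */
    mtm_transfer(local[0], main_mem, n < BLOCK ? n : BLOCK);
    for (size_t b = 0; b < nblocks; b++) {
        size_t cur = b % 2;
        size_t len = (b + 1 < nblocks) ? BLOCK : n - b * BLOCK;
        /* Fill the other buffer with block b+1 (concurrent on real
           hardware, sequential in this simulation). */
        if (b + 1 < nblocks) {
            size_t next_len = (b + 2 < nblocks) ? BLOCK : n - (b + 1) * BLOCK;
            mtm_transfer(local[1 - cur], main_mem + (b + 1) * BLOCK, next_len);
        }
        total += acc_process(local[cur], len);
    }
    return total;
}
```

On real hardware the transfer for block b+1 would overlap with acc_process on block b; here the calls simply run in order, so the result is identical while the buffer rotation stays visible.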

Intel Network Processor: Existing Example
[Figure: block diagram with an XScale core, 16 microengines, a hash engine, scratch-pad SRAM, RFIFO and TFIFO, four QDR SRAM interfaces, RDRAM, PCI, CSRs, and an SPI4/CSIX interface]

STI Cell Processor: Emerging Example
[Figure: Cell block diagram with the PPE, eight SPEs (SPE1 to SPE8), two I/O controllers, two memory controllers, RAM, and the EIB]
Power Processor Element (PPE): simplified 64-bit PowerPC with VMX
Synergistic Processing Element (SPE): eight on the chip
Element Interconnect Bus (EIB): internal communication system
Dual configurable high-speed channels (38.4 GB/s each)
Dual 12.8 GB/s memory buses
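
Summing the per-channel figures gives the chip's aggregate off-chip bandwidth. These are peak numbers, assuming both channels of each kind run at full rate at the same time.

```latex
\begin{aligned}
B_{\text{I/O}} &= 2 \times 38.4\ \text{GB/s} = 76.8\ \text{GB/s} \\
B_{\text{mem}} &= 2 \times 12.8\ \text{GB/s} = 25.6\ \text{GB/s} \\
B_{\text{I/O}} / B_{\text{mem}} &= 3
\end{aligned}
```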

Overview of the Rest of the Semester
This is the last formal lecture
  – If we haven't covered it already, we can't really expect you to use it on your projects
Final project proposal due Tuesday in class
I'll be in my office (208 CSL) during class on 3/27 to discuss project issues
Quiz 2 is on 3/29
Final project demos are on 5/3