© 2000 Morgan Kaufmann. Overheads for Computers as Components: Accelerators

Accelerators
- Accelerated systems.
- System design:
  - performance analysis;
  - scheduling and allocation.

Accelerated systems
- Use additional computational unit dedicated to some functions?
  - Hardwired logic.
  - Extra CPU.
- Hardware/software co-design: joint design of hardware and software architectures.

Accelerated system architecture
[Figure: block diagram with CPU, accelerator, memory, and I/O; arrows labeled request, data, result, data.]

Accelerator vs. co-processor
- A co-processor executes instructions.
  - Instructions are dispatched by the CPU.
- An accelerator appears as a device on the bus.
  - The accelerator is controlled by registers.

Accelerator implementations
- Application-specific integrated circuit.
- Field-programmable gate array (FPGA).
- Standard component.
  - Example: graphics processor.

System design tasks
- Design a heterogeneous multiprocessor architecture.
  - Processing element (PE): CPU, accelerator, etc.
- Program the system.

Why accelerators?
- Better cost/performance.
  - Custom logic may be able to perform an operation faster than a CPU of equivalent cost.
  - CPU cost is a non-linear function of performance.
[Figure: plot with axes labeled cost and performance.]

Why accelerators? cont'd.
- Better real-time performance.
  - Put time-critical functions on less-loaded processing elements.
  - Remember RMS utilization: extra CPU cycles must be reserved to meet deadlines.
[Figure: plot with axes labeled cost and performance, marking the deadline and the deadline with RMS overhead.]
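
The RMS remark above can be made concrete with a small calculation. The sketch below assumes the slide refers to the Liu and Layland utilization bound for rate-monotonic scheduling, n(2^(1/n) - 1); the task set is hypothetical and is not taken from the slides.

```python
def rms_utilization_bound(n: int) -> float:
    """Liu and Layland least upper bound on CPU utilization for n periodic tasks under RMS."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1 / n) - 1)


def utilization(tasks) -> float:
    """Total utilization of (execution_time, period) pairs."""
    return sum(c / p for c, p in tasks)


# Hypothetical task set: (execution time, period) in the same time unit.
tasks = [(1, 4), (2, 8), (1, 10)]

u = utilization(tasks)
bound = rms_utilization_bound(len(tasks))
# The bound is sufficient, not necessary: staying under it guarantees deadlines are met.
print(f"utilization={u:.3f}, bound={bound:.3f}, guaranteed={u <= bound}")
```

Because the bound falls below 1 as soon as there is more than one task, some CPU capacity has to stay idle to guarantee deadlines. Moving a time-critical function to an accelerator lowers the load the CPU has to fit under that bound.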

Why accelerators? cont'd.
- Good for processing I/O in real-time.
- May consume less energy.
- May be better at streaming data.
- May not be able to do all the work on even the largest single CPU.

Accelerated system design
- First, determine that the system really needs to be accelerated.
  - How much faster is the accelerator on the core function?
  - How much data transfer overhead?
- Design the accelerator itself.
- Design CPU interface to accelerator.

Performance analysis
- Critical parameter is speedup: how much faster is the system with the accelerator?
- Must take into account:
  - Accelerator execution time.
  - Data transfer time.
  - Synchronization with the master CPU.

Accelerator execution time
- Total accelerator execution time:
  - $t_{accel} = t_{in} + t_x + t_{out}$
  - $t_{in}$: data input; $t_x$: accelerated computation; $t_{out}$: data output.

Data input/output times
- Bus transactions include:
  - flushing register/cache values to main memory;
  - time required for CPU to set up transaction;
  - overhead of data transfers by bus packets, handshaking, etc.

Accelerator speedup
- Assume loop is executed n times.
- Compare accelerated system to non-accelerated system:
  - $S = n(t_{CPU} - t_{accel}) = n[t_{CPU} - (t_{in} + t_x + t_{out})]$
  - $t_{CPU}$: execution time on CPU.
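
The two formulas above can be checked with a few lines of Python. This is a minimal sketch: the function names and all timing values are hypothetical, chosen only to show how transfer overhead can erase the benefit of a fast accelerator core. Note that S, as defined on the slide, is a time difference (total time saved over n iterations), not a ratio.

```python
def accel_time(t_in: float, t_x: float, t_out: float) -> float:
    """t_accel = t_in + t_x + t_out for one invocation of the accelerator."""
    return t_in + t_x + t_out


def speedup(n: int, t_cpu: float, t_in: float, t_x: float, t_out: float) -> float:
    """S = n * (t_CPU - t_accel), following the slide's definition."""
    return n * (t_cpu - accel_time(t_in, t_x, t_out))


# Hypothetical numbers (arbitrary time units), n = 100 loop iterations.
cheap_transfers = speedup(n=100, t_cpu=50, t_in=5, t_x=10, t_out=5)
costly_transfers = speedup(n=100, t_cpu=50, t_in=30, t_x=10, t_out=30)

print(cheap_transfers)   # positive: the accelerator saves time
print(costly_transfers)  # negative: data transfer dominates and the accelerator loses
```

The accelerated computation time is the same in both cases; only the input and output times differ, which is why the data transfer overhead has to be estimated before committing to an accelerator.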

Single- vs. multi-threaded
- One critical factor is available parallelism:
  - single-threaded/blocking: CPU waits for accelerator;
  - multithreaded/non-blocking: CPU continues to execute along with accelerator.
- To multithread, CPU must have useful work to do.
  - But software must also support multithreading.

Total execution time
[Figure: single-threaded and multi-threaded execution diagrams, each with nodes P1, P2, A1, P3, P4.]

Execution time analysis
- Single-threaded:
  - Count execution time of all component processes.
- Multi-threaded:
  - Find longest path through execution.
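
The two analyses above can be sketched as a sum versus a longest-path computation over a dependency graph. The node names follow the preceding figure, but the durations and the dependency edges below are assumptions made for illustration; the original figure's exact structure is not recoverable from the transcript.

```python
from functools import lru_cache

# Hypothetical durations (arbitrary time units).
durations = {"P1": 3, "P2": 4, "A1": 6, "P3": 2, "P4": 3}

# Hypothetical dependencies: P1 starts the accelerator (A1) and then the CPU
# continues with P2 and P3; P4 needs both the accelerator result and P3.
preds = {"P1": [], "P2": ["P1"], "A1": ["P1"], "P3": ["P2"], "P4": ["A1", "P3"]}


def single_threaded_time(durations):
    """Blocking case: every process runs one after another, so times add up."""
    return sum(durations.values())


def multi_threaded_time(durations, preds):
    """Non-blocking case: total time is the longest path through the graph."""

    @lru_cache(maxsize=None)
    def finish(node):
        # A node starts once all of its predecessors have finished.
        start = max((finish(p) for p in preds[node]), default=0)
        return start + durations[node]

    return max(finish(node) for node in durations)


print(single_threaded_time(durations))
print(multi_threaded_time(durations, preds))
```

The longest-path figure is only valid if nodes that run in parallel do not compete for the same processing element. In this graph the CPU processes form a single chain, so the assumption holds.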

Sources of parallelism
- Overlap I/O and accelerator computation.
  - Perform operations in batches, read in second batch of data while computing on first batch.
- Find other work to do on the CPU.
  - May reschedule operations to move work after accelerator initiation.
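
The batching idea above can be estimated with a simple timing model. This is a sketch under stated assumptions: every batch has the same input time and compute time, input of the next batch can proceed fully in parallel with computation on the current one (double buffering), and output time is ignored. The function names and numbers are hypothetical.

```python
def serial_time(batches: int, t_in: float, t_x: float) -> float:
    """No overlap: each batch is read in, then computed, before the next starts."""
    return batches * (t_in + t_x)


def overlapped_time(batches: int, t_in: float, t_x: float) -> float:
    """Double buffering: read batch k+1 while computing on batch k.

    The first read and the last computation cannot be hidden; in between,
    each step costs whichever of the two activities is slower.
    """
    if batches == 0:
        return 0
    return t_in + (batches - 1) * max(t_in, t_x) + t_x


# Hypothetical numbers: 4 batches, 2 time units to read, 5 to compute.
print(serial_time(4, 2, 5))
print(overlapped_time(4, 2, 5))
```

With overlap, the slower of the two activities sets the steady-state rate, so speeding up the faster one any further brings no benefit.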

Accelerated systems
- Several off-the-shelf boards are available for acceleration in PCs:
  - FPGA-based core;
  - PC bus interface.

Accelerator/CPU interface
- Accelerator registers provide control registers for CPU.
- Data registers can be used for small data objects.
- Accelerator may include special-purpose read/write logic.
  - Especially valuable for large data transfers.
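
A register-based interface of the kind described above can be modeled in software before any hardware exists. The sketch below is a toy simulation, not a real device: the register offsets, the START and DONE bits, and the squaring function are all hypothetical, chosen only to show the write-data, start, poll-status, read-result sequence.

```python
class ToyAccelerator:
    """Simulated accelerator seen by the CPU as a small block of registers."""

    # Hypothetical register map (byte offsets) and bit definitions.
    CTRL, STATUS, DATA_IN, DATA_OUT = 0x0, 0x4, 0x8, 0xC
    START = 0x1  # write to CTRL to start a computation
    DONE = 0x1   # set in STATUS when the result is ready

    def __init__(self):
        self.regs = {self.CTRL: 0, self.STATUS: 0, self.DATA_IN: 0, self.DATA_OUT: 0}

    def write(self, offset: int, value: int) -> None:
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == self.CTRL and value & self.START:
            self._run()

    def read(self, offset: int) -> int:
        return self.regs[offset]

    def _run(self) -> None:
        # Stand-in for the accelerated function: square the input (32-bit wrap).
        x = self.regs[self.DATA_IN]
        self.regs[self.DATA_OUT] = (x * x) & 0xFFFFFFFF
        self.regs[self.STATUS] |= self.DONE


def square_on_accelerator(dev: ToyAccelerator, x: int) -> int:
    """CPU-side driver: small operands travel through the data registers."""
    dev.write(dev.STATUS, 0)        # clear any stale DONE flag
    dev.write(dev.DATA_IN, x)       # pass the operand
    dev.write(dev.CTRL, dev.START)  # kick off the computation
    while not (dev.read(dev.STATUS) & dev.DONE):
        pass                        # blocking (single-threaded) wait
    return dev.read(dev.DATA_OUT)


print(square_on_accelerator(ToyAccelerator(), 12))
```

The polling loop corresponds to the single-threaded/blocking style. A non-blocking driver would return after writing CTRL and check STATUS later, or rely on an interrupt.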

Caching problems
- Main memory provides the primary data transfer mechanism to the accelerator.
- Programs must ensure that caching does not invalidate main memory data.
  - CPU reads location S.
  - Accelerator writes location S.
  - CPU writes location S. (BAD)
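
The hazard above can be reproduced with a toy model of a CPU cache in front of main memory. This is a sketch with assumed behavior: a write-back cache with no hardware coherence, and an accelerator that writes main memory directly. The class and method names are hypothetical.

```python
class ToySystem:
    """Main memory plus a CPU-side write-back cache with no hardware coherence."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.cache = {}
        self.dirty = set()

    def cpu_read(self, addr):
        if addr not in self.cache:          # miss: fill the line from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def cpu_write(self, addr, value):
        self.cache[addr] = value            # write-back: memory updated later
        self.dirty.add(addr)

    def flush(self, addr):
        """Write a dirty line back so the accelerator can see it."""
        if addr in self.dirty:
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Drop the cached copy so the next read goes to main memory."""
        self.cache.pop(addr, None)
        self.dirty.discard(addr)

    def accel_write(self, addr, value):
        self.memory[addr] = value           # bypasses the CPU cache entirely


S = 0x100
system = ToySystem({S: 1})

first = system.cpu_read(S)     # CPU reads S and caches it
system.accel_write(S, 99)      # accelerator writes S in main memory
stale = system.cpu_read(S)     # CPU still sees the old cached value

system.invalidate(S)           # software fix: discard the cached copy
fresh = system.cpu_read(S)     # now the accelerator's result is visible

print(first, stale, fresh)
```

In the same model, a CPU write to S followed by a later write-back would overwrite the accelerator's result in main memory, which is one way to read the sequence marked BAD on the slide.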

Synchronization
- As with cache, main memory writes to shared memory may cause invalidation:
  - CPU reads S.
  - Accelerator writes S.
  - CPU reads S.

Partitioning
- Divide functional specification into units.
  - Map units onto PEs.
  - Units may become processes.
- Determine proper level of parallelism: f3(f1(),f2()) as one unit vs. f1(), f2(), and f3() as separate units.

Partitioning methodology
- Divide CDFG into pieces, shuffle functions between pieces.
- Hierarchically decompose CDFG to identify possible partitions.

Partitioning example
[Figure: graph with Block 1, Block 2, Block 3, cond 1, and cond 2; partitions labeled P1 through P5.]

Scheduling and allocation
- Must:
  - schedule operations in time;
  - allocate computations to processing elements.
- Scheduling and allocation interact, but separating them helps.
  - Alternatively allocate, then schedule.

Example: scheduling and allocation
[Figure: task graph with processes P1, P2, P3 and communications d1, d2; hardware platform with processing elements M1 and M2.]

Example process execution times

Example communication model
- Assume communication within PE is free.
- Cost of communication from P1 to P3 is d1 = 2; cost of P2->P3 communication is d2 = 4.

First design
- Allocate P1, P2 -> M1; P3 -> M2.
[Figure: schedule over time for M1, M2, and the network; Time = 19.]

Second design
- Allocate P1 -> M1; P2, P3 -> M2.
[Figure: schedule over time for M1, M2, and the network; Time = 18.]
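
Comparing allocations like the two designs above can be automated with a small schedule evaluator. This is a sketch under stated assumptions: the communication costs d1 = 2 and d2 = 4 and the free intra-PE communication come from the slides, but the per-process execution times are placeholders (the table of execution times did not survive in this transcript), so the totals are not expected to match the figures. The single shared network and the function name `makespan` are also assumptions.

```python
def makespan(alloc, exec_time, edges, order):
    """Finish time of the last process for a given allocation.

    alloc:     process -> processing element (PE)
    exec_time: (process, PE) -> execution time
    edges:     (producer, consumer) -> communication cost across PEs
    order:     processes in a valid topological order
    """
    pe_free = {}    # time at which each PE becomes idle
    net_free = 0    # one shared network, transfers are serialized
    finish = {}
    for p in order:
        pe = alloc[p]
        ready = 0
        for (src, dst), cost in edges.items():
            if dst != p:
                continue
            if alloc[src] == pe:
                arrival = finish[src]            # communication within a PE is free
            else:
                start = max(finish[src], net_free)
                arrival = start + cost
                net_free = arrival
            ready = max(ready, arrival)
        start = max(ready, pe_free.get(pe, 0))
        finish[p] = start + exec_time[(p, pe)]
        pe_free[pe] = finish[p]
    return max(finish.values())


# Communication costs from the slides: d1 (P1 -> P3) = 2, d2 (P2 -> P3) = 4.
edges = {("P1", "P3"): 2, ("P2", "P3"): 4}
order = ["P1", "P2", "P3"]

# Placeholder execution times, NOT the values from the original table.
exec_time = {("P1", "M1"): 5, ("P1", "M2"): 5,
             ("P2", "M1"): 5, ("P2", "M2"): 6,
             ("P3", "M2"): 5}

first_design = {"P1": "M1", "P2": "M1", "P3": "M2"}
second_design = {"P1": "M1", "P2": "M2", "P3": "M2"}

print(makespan(first_design, exec_time, edges, order))
print(makespan(second_design, exec_time, edges, order))
```

Keeping P2 and P3 on the same PE removes the more expensive transfer d2 from the schedule, which is the effect the second design exploits.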

System integration and debugging
- Try to debug the CPU/accelerator interface separately from the accelerator core.
- Build scaffolding to test the accelerator.
- Hardware/software co-simulation can be useful.