Presentation transcript:

SCORE: Stream Computations Organized for Reconfigurable Execution
Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek (University of California, Berkeley, BRASS group)
André DeHon (California Institute of Technology, Dept. of Computer Science)

BRASS, FPL 2000 (8/30/00)
Goal: Software Survival
- Software for microprocessors survives on new devices
  - Binary compatibility
  - Automatic improvement
- Software for reconfigurable devices does not
  - Substantial effort to port/redeploy

Outline
- Problem: Software Survival
- A New Compute Model
- SCORE Components
- Preliminary Results
- Future Work

Why Can't Reconfig. Software Survive?
- Resource constraints/sizes are exposed:
  - to programmer
  - in low-level representation (netlist)
- Design revolves around device size
  - Algorithmic structure
  - Exploited parallelism

The SCORE Approach
- A compute model with unbounded resources
- Efficient hardware virtualization
- Demand paging

Page-Compatible Devices
- Family of devices with:
  - Common page definition
  - Varying number of pages
- Binary compatibility
- Automatic performance improvement

Virtualizing a Netlist (is bad)
- Netlist is sensitive to timing
- Disallow asynchronous features (e.g. busses)
- Synchronous
- WASMII [Ling+Amano, FCCM '93]
  - Page I/O via registers
  - Execute each cycle of every page
  - Huge reconfiguration overhead!
[Figure: page execution timeline over time, showing Execute and Reconfigure phases]

Previous Attempts at Virtualization
- Multi-context
  - DPGA [DeHon, FPGA '94]
  - TM-FPGA [Xilinx, FCCM '97]
  - Configuration cache
- Striped
  - PipeRench [CMU, FPGA '98]
  - Pipelined reconfiguration
  - Restricted to feed-forward pipelines

Streams
- Goal
  - Less frequent reconfiguration
  - Batch process block of inputs
  - Amortize reconfiguration cost over large data set
- Stream is:
  - Unidirectional page-to-page link
  - FIFO queue of data tokens
  - Unbounded depth
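The amortization goal above can be illustrated with a small counting model. This is only a sketch under stated assumptions: the function name `reconfigurations`, the single shared physical page, and the rule that each virtual page is loaded once per batch are illustrative choices, not details from the slides.

```python
import math


def reconfigurations(num_tokens: int, num_pages: int, batch_size: int) -> int:
    """Count page loads when num_pages virtual pages share one physical page.

    Assumption (not from the slides): every batch of tokens is pushed
    through all pages in order, and each page is loaded once per batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = math.ceil(num_tokens / batch_size)
    return batches * num_pages


# Hypothetical workload: 1000 tokens flowing through 4 virtual pages.
per_token = reconfigurations(num_tokens=1000, num_pages=4, batch_size=1)
batched = reconfigurations(num_tokens=1000, num_pages=4, batch_size=250)
print(per_token, batched)
```

In this model, buffering 250 tokens in a stream before swapping cuts the number of page loads from 4000 to 16, which is the sense in which streams make reconfiguration less frequent.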

Stream Implementation
- Only one endpoint (page) loaded
  - Stream = memory buffer
  - Desire distributed, on-chip memory
- Both endpoints (pages) loaded
  - Stream = wire
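A minimal sketch of the two cases above, written as a software model rather than hardware. The names `StreamImpl`, `choose_impl`, and `Stream` are hypothetical, and treating the "neither endpoint loaded" case as a memory buffer is an assumption the slide does not state.

```python
from collections import deque
from enum import Enum


class StreamImpl(Enum):
    WIRE = "wire"
    MEMORY_BUFFER = "memory buffer"


def choose_impl(producer_loaded: bool, consumer_loaded: bool) -> StreamImpl:
    """Pick how a stream is realized from the residency of its endpoints."""
    if producer_loaded and consumer_loaded:
        return StreamImpl.WIRE
    # Slide case: only one endpoint loaded. Assumption: if neither endpoint
    # is loaded, pending tokens also have to live in a memory buffer.
    return StreamImpl.MEMORY_BUFFER


class Stream:
    """FIFO of tokens; a deque stands in for the memory buffer."""

    def __init__(self) -> None:
        self._tokens = deque()

    def write(self, token) -> None:
        self._tokens.append(token)

    def read(self):
        return self._tokens.popleft()

    def __len__(self) -> int:
        return len(self._tokens)


# The producer runs while the consumer is swapped out, so tokens are buffered.
s = Stream()
for token in (1, 2, 3):
    s.write(token)
print(choose_impl(True, False), len(s))
```

Either way the consumer sees tokens in FIFO order, so the operator's behavior does not depend on which implementation the run-time picks.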

Execution Example: Spatial
[Figure: dataflow graph of the stages DCT, Zig-zag, Quantize / ZLE, and Huffman Enc.]

Execution Example: Time-Multiplexed
[Figure: the stages DCT, Zig-zag, Quant / ZLE, and Huffman Enc.]

SCORE Components
- Graph-based Compute Model
- Hardware Support
- Scheduler
- Run-time Support

SCORE Compute Model
- Computation = graph of compute nodes
  - Concretely: compute pages
  - Abstractly: operators with local state (FSM)
- Communication = streaming data flow
- Storage =
  - Streams
  - Memory segments, accessed through streams
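The compute model above can be sketched in software as operators that fire when input tokens are available. Everything named here (`Operator`, `Scale`, `Accumulate`, `run`) is hypothetical, and the "fire whenever the input is non-empty" rule is a simplifying assumption rather than the SCORE firing semantics.

```python
from collections import deque


class Operator:
    """A compute node that reads one input stream and writes one output stream."""

    def __init__(self, inp: deque, out: deque) -> None:
        self.inp = inp
        self.out = out

    def can_fire(self) -> bool:
        return bool(self.inp)

    def fire(self) -> None:
        raise NotImplementedError


class Scale(Operator):
    """Stateless operator: multiplies each token by a constant."""

    def __init__(self, inp: deque, out: deque, factor: int) -> None:
        super().__init__(inp, out)
        self.factor = factor

    def fire(self) -> None:
        self.out.append(self.inp.popleft() * self.factor)


class Accumulate(Operator):
    """Operator with local state: emits the running sum of its input."""

    def __init__(self, inp: deque, out: deque) -> None:
        super().__init__(inp, out)
        self.total = 0

    def fire(self) -> None:
        self.total += self.inp.popleft()
        self.out.append(self.total)


def run(operators) -> None:
    """Fire operators until no operator has input left."""
    progress = True
    while progress:
        progress = False
        for op in operators:
            if op.can_fire():
                op.fire()
                progress = True


a, b, c = deque([1, 2, 3]), deque(), deque()
run([Scale(a, b, 2), Accumulate(b, c)])
print(list(c))
```

The deques play the role of streams: operators communicate only through them, so the same graph could run with both operators resident or with one swapped out while the other fills the stream.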

SCORE Hardware Model
- Paged FPGA
  - Compute Page (CP)
    - Fixed-size slice of RC hardware
    - Fixed number of I/O ports
  - Distributed, on-chip memory
    - Configurable Memory Block (CMB)
    - Stream access
  - High-level interconnect
- Microprocessor
  - Run-time support + user code

SCORE Run-Time Support
- Mechanics of run-time reconfiguration
  - Page swap [context save/load]
  - Reconfigure interconnect
- Page scheduling
  - Which page to run where, when
  - Static … Dynamic
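As a rough illustration of the page-scheduling question ("which page to run where, when"), the sketch below time-multiplexes virtual pages onto a smaller number of physical pages. Round-robin is an assumption chosen for simplicity; the slides do not say which policy the SCORE scheduler uses, and the function and page names are hypothetical.

```python
def round_robin_schedule(virtual_pages, num_physical, num_slices):
    """Return, for each time slice, the virtual pages resident in that slice.

    Assumption: plain round-robin rotation that ignores stream occupancy
    and data dependencies, which a real scheduler would consider.
    """
    n = len(virtual_pages)
    if n == 0 or num_physical < 1:
        return [[] for _ in range(num_slices)]
    count = min(num_physical, n)
    schedule = []
    start = 0
    for _ in range(num_slices):
        resident = [virtual_pages[(start + i) % n] for i in range(count)]
        schedule.append(resident)
        start = (start + count) % n
    return schedule


pages = ["DCT", "ZigZag", "Quant", "Huffman"]
for time_slice, resident in enumerate(round_robin_schedule(pages, 2, 3)):
    print(time_slice, resident)
```

With as many physical pages as virtual pages, every slice holds the whole graph and no swapping is needed, which corresponds to the fully spatial case.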

Functional Simulation
- FPGA based on HSRA [Berkeley, FPGA '99]
  - CP: 512 4-LUTs
  - CMB: 2 Mbit DRAM
  - Area for CP-CMB pair:
    - .25 µm: 12.9 mm² (1/9 of PII-450)
    - .18 µm: 6.7 mm² (1/16 of PIII-600)
  - Page reconfiguration: 5000 cycles (from CMB)
  - Synchronous operation (same clock speed as processor)
- x86 microprocessor
- Page Scheduler task
  - Swap on timer interrupt (every 250,000 cycles)
  - Fully dynamic scheduling
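The two timing parameters above give a quick back-of-the-envelope bound on swap overhead. This is a sketch under assumptions not stated on the slide: the 5000-cycle reconfiguration is paid once per 250,000-cycle time slice, pages reload concurrently from their own CMBs, and context-save and scheduler time are ignored.

```python
RECONFIG_CYCLES = 5_000      # page reconfiguration time from the slide
TIMESLICE_CYCLES = 250_000   # timer-interrupt period from the slide


def reconfig_overhead(reconfig_cycles: int, timeslice_cycles: int) -> float:
    """Fraction of a time slice spent reconfiguring (assumed once per slice)."""
    return reconfig_cycles / timeslice_cycles


def useful_cycles(reconfig_cycles: int, timeslice_cycles: int) -> int:
    """Cycles left for computation in one slice under the same assumption."""
    return timeslice_cycles - reconfig_cycles


print(reconfig_overhead(RECONFIG_CYCLES, TIMESLICE_CYCLES))
print(useful_cycles(RECONFIG_CYCLES, TIMESLICE_CYCLES))
```

Under these assumptions reconfiguration takes 2% of each slice, leaving 245,000 cycles for computation.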

Applications
- Multimedia processing applications
  - Hand-partitioned into 512-LUT pages
- Good applications
  - Primarily feed-forward (feedback loops fit in HW)
- Bad applications
  - Large, tight feedback loops (e.g. ADPCM)

Application      Pages  Segments
JPEG Encode         13         6
JPEG Decode         13         4
MPEG Encode         45       102
Wavelet Encode      14         6
Wavelet Decode      15         6

Application: JPEG Encode

Scaling Results: JPEG Encode
[Figure: total time (makespan, in millions of cycles) versus number of physical compute pages]

Summary
- SCORE enables software survival on reconfigurable systems
  - Binary compatibility
  - Automatic performance scaling
- Virtual hardware
- Requirements:
  - Graph-based compute model
  - Paged FPGA hardware
  - Run-time support for RTR/scheduling

Future Work
- Compilation/CAD
  - Partitioning FSM operators into pages
- Study architectural parameters
  - Page size
  - CMB size
  - Tolerable reconfiguration time
- Scheduling
  - Static scheduling

More Info on the Web
- SCORE project:
- Tutorial: score_tutorial.html