1
SCORE: Stream Computations Organized for Reconfigurable Execution
  Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek
    University of California, Berkeley – BRASS group
  André DeHon
    California Institute of Technology – Dept. Computer Science
  http://brass.cs.berkeley.edu/SCORE/
2
BRASS FPL 2000 (8/30/00)
Goal: Software Survival
  Software for microprocessors survives on new devices
    Binary compatibility
    Automatic improvement
  Software for reconfigurable devices does not
    Substantial effort to port/redeploy
3
Outline
  Problem: Software Survival
  A New Compute Model
  SCORE Components
  Preliminary Results
  Future Work
4
Why Can’t Reconfigurable Software Survive?
  Resource constraints/sizes are exposed:
    To the programmer
    In the low-level representation (netlist)
  Design revolves around device size
    Algorithmic structure
    Exploited parallelism
5
The SCORE Approach
  A compute model with unbounded resources
  Efficient hardware virtualization
  Demand paging
6
Page-Compatible Devices
  Family of devices with:
    Common page definition
    Varying number of pages
  Binary compatibility
  Automatic performance improvement
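As a rough illustration of why a common page definition can give automatic improvement, the sketch below counts how many rounds of time-multiplexing one design needs on devices with different numbers of physical pages. The round-based model, the function name `rounds_needed`, and the 13-page design are assumptions made for this example, not part of SCORE itself.

```python
import math


def rounds_needed(virtual_pages: int, physical_pages: int) -> int:
    """Rounds of time-multiplexing needed to load every virtual page once.

    Hypothetical model: each round loads up to `physical_pages` virtual
    pages, and every virtual page is loaded exactly once.
    """
    if virtual_pages < 0 or physical_pages <= 0:
        raise ValueError("virtual_pages must be >= 0 and physical_pages > 0")
    return math.ceil(virtual_pages / physical_pages)


# The same 13-page design, unchanged, on devices of increasing size.
for physical in (1, 4, 13, 16):
    print(physical, rounds_needed(13, physical))
```

Under this simplified model the design itself never changes; only the number of rounds shrinks as the device grows, until the whole design fits at once.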
7
Virtualizing a Netlist (is bad)
  Netlist is sensitive to timing
    Disallow asynchronous features (e.g. busses)
    Synchronous
  WASMII [Ling+Amano, FCCM ’93]
    Page I/O via registers
    Execute each cycle of every page
    Huge reconfiguration overhead!
  [Figure: page execution timeline showing Execute and Reconfigure phases over time]
8
Previous Attempts at Virtualization
  Multi-context
    DPGA [DeHon, FPGA ’94]
    TM-FPGA [Xilinx, FCCM ’97]
    Configuration cache
  Striped
    PipeRench [CMU, FPGA ’98]
    Pipelined reconfiguration
    Restricted to feed-forward pipelines
9
Streams
  Goal
    Less frequent reconfiguration
    Batch process block of inputs
    Amortize reconfiguration cost over large data set
  Stream is:
    Unidirectional page-to-page link
    FIFO queue of data tokens
    Unbounded depth
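The stream properties listed above (unidirectional, FIFO, unbounded depth) can be modelled in a few lines of software. This is a minimal sketch for intuition only; the class name `Stream` and its `write`/`read` methods are hypothetical and do not reflect the actual SCORE API.

```python
from collections import deque


class Stream:
    """Unidirectional FIFO of data tokens with no fixed depth limit."""

    def __init__(self):
        # A deque grows as needed, which models "unbounded depth".
        self._tokens = deque()

    def write(self, token):
        """Producer side: append a token at the tail."""
        self._tokens.append(token)

    def read(self):
        """Consumer side: remove and return the oldest token."""
        if not self._tokens:
            raise LookupError("stream is empty; the consumer would stall here")
        return self._tokens.popleft()

    def __len__(self):
        return len(self._tokens)


s = Stream()
for token in range(3):  # the producer runs first and batches a block of inputs
    s.write(token)
print([s.read() for _ in range(len(s))])  # [0, 1, 2]
```

Because the producer can run to completion before the consumer starts, the two endpoints never need to be loaded at the same time, which is what allows a block of inputs to be processed per reconfiguration.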
10
Stream Implementation
  Only one endpoint (page) loaded
    Stream = memory buffer
    Desire distributed, on-chip memory
  Both endpoints (pages) loaded
    Stream = wire
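The two cases above amount to a small decision rule. The sketch below writes it out; the function name `stream_binding` is hypothetical, and the handling of the case where neither endpoint is loaded is an assumption, since the slide does not cover it.

```python
def stream_binding(producer_loaded: bool, consumer_loaded: bool) -> str:
    """Return how a stream would be realized for the current page placement."""
    if producer_loaded and consumer_loaded:
        # Both endpoints are resident, so tokens can flow directly.
        return "wire"
    if producer_loaded or consumer_loaded:
        # Only one endpoint is resident, so tokens are held in memory.
        return "memory buffer"
    # Assumption (not stated on the slide): tokens already produced stay
    # buffered in memory while neither endpoint is resident.
    return "memory buffer"


print(stream_binding(True, True))   # wire
print(stream_binding(True, False))  # memory buffer
```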
11
Execution Example: Spatial
  [Figure: pipeline of DCT, Zig-zag, Quantize / ZLE, and Huffman Enc. pages]
12
Execution Example: Time-Multiplexed
  [Figure: DCT, Zig-zag, Quant / ZLE, and Huffman Enc. pages]
13
SCORE Components
  Graph-based Compute Model
  Hardware Support
  Scheduler
  Run-time Support
14
SCORE Compute Model
  Computation = graph of compute nodes
    Concretely: compute pages
    Abstractly: operators with local state (FSM)
  Communication = streaming data flow
  Storage =
    Streams
    Memory segments, accessed through streams
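To make the model concrete, here is a toy software rendering of a two-node graph: operators with local state that communicate only through FIFO streams. It is a sketch for intuition, not SCORE's programming interface; the names `Operator`, `connect`, `scale`, and `running_sum` are invented for this example.

```python
from collections import deque


class Operator:
    """Compute node: a step function plus local state, one input, one output."""

    def __init__(self, step, state=None):
        self.step = step      # (state, token) -> (new_state, result)
        self.state = state    # local state, private to this operator
        self.inp = deque()    # input stream (FIFO of tokens)
        self.out = None       # output stream, set when connected

    def fire(self) -> bool:
        """Consume one input token if available; report whether it fired."""
        if not self.inp:
            return False
        self.state, result = self.step(self.state, self.inp.popleft())
        if self.out is not None:
            self.out.append(result)
        return True


def connect(src: Operator, dst: Operator) -> None:
    """Make src's output stream the same queue as dst's input stream."""
    src.out = dst.inp


def scale(state, x):
    return state, 2 * x       # stateless: the state passes through unchanged


def running_sum(state, x):
    total = state + x
    return total, total       # stateful: remembers the sum so far


a = Operator(scale)
b = Operator(running_sum, state=0)
sink = deque()
connect(a, b)
b.out = sink

a.inp.extend([1, 2, 3])
while a.fire() or b.fire():   # run until no operator can fire
    pass
print(list(sink))  # [2, 6, 12]
```

Note that `a` drains all of its input before `b` ever fires, yet the result is the same as if they had run side by side. That insensitivity to firing order is what lets a scheduler load the nodes at different times.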
15
SCORE Hardware Model
  Paged FPGA
    Compute Page (CP)
      Fixed-size slice of RC hardware
      Fixed number of I/O ports
    Distributed, on-chip memory
      Configurable Memory Block (CMB)
      Stream access
    High-level interconnect
  Microprocessor
    Run-time support + user code
16
SCORE Run-Time Support
  Mechanics of run-time reconfiguration
    Page swap [context save/load]
    Reconfigure interconnect
  Page scheduling
    Which page to run where, when
    Static … Dynamic
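One way to picture the "which page to run" decision at the dynamic end of the spectrum is a policy that, at each scheduling point, loads the virtual pages with the most buffered input. The policy, the function name `pick_pages`, and the token counts below are all hypothetical; the slides do not specify the scheduler's actual heuristic.

```python
def pick_pages(pending_tokens: dict, physical_pages: int) -> list:
    """Choose which virtual pages to load for the next time slice.

    Hypothetical policy: prefer pages with the most buffered input tokens
    and skip pages with nothing to do. Ties break by page name so the
    result is deterministic.
    """
    runnable = [page for page, n in pending_tokens.items() if n > 0]
    runnable.sort(key=lambda page: (-pending_tokens[page], page))
    return runnable[:physical_pages]


# Illustrative token counts on each page's input streams.
pending = {"dct": 64, "zigzag": 0, "quantize": 16, "huffman": 16}
print(pick_pages(pending, 2))  # ['dct', 'huffman']
```

A static scheduler would instead fix the page order ahead of time, trading adaptability for lower run-time cost.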
17
Functional Simulation
  FPGA based on HSRA [Berkeley, FPGA ’99]
    CP: 512 4-LUTs
    CMB: 2 Mbit DRAM
    Area for CP-CMB pair:
      .25 µm: 12.9 mm² (1/9 of PII-450)
      .18 µm: 6.7 mm² (1/16 of PIII-600)
    Page reconfiguration: 5000 cycles (from CMB)
    Synchronous operation (same clock speed as processor)
  x86 microprocessor
  Page Scheduler task
    Swap on timer interrupt (every 250,000 cycles)
    Fully dynamic scheduling
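The two timing figures above give a feel for the reconfiguration overhead in this setup. The calculation below is a back-of-the-envelope sketch: it assumes one page reconfiguration per scheduling period, counted inside the period. The function name `overhead_fraction` and the second, single-cycle scenario are assumptions added for contrast.

```python
def overhead_fraction(reconfig_cycles: int, period_cycles: int) -> float:
    """Fraction of each scheduling period spent reconfiguring a page.

    Assumes (for illustration) one page reconfiguration per period, with
    the reconfiguration cycles counted inside the period.
    """
    if period_cycles <= 0 or not 0 <= reconfig_cycles <= period_cycles:
        raise ValueError("need period_cycles > 0 and 0 <= reconfig_cycles <= period_cycles")
    return reconfig_cycles / period_cycles


# Figures from the simulation setup: 5000-cycle page reconfiguration,
# swap on a timer interrupt every 250,000 cycles.
print(f"{overhead_fraction(5_000, 250_000):.1%}")  # 2.0%

# Hypothetical contrast: reconfiguring after every single executed cycle.
print(f"{overhead_fraction(5_000, 5_001):.2%}")  # 99.98%
```

Under these assumptions, a long time slice keeps reconfiguration to a small share of run time, whereas swapping after every cycle would leave almost no time for useful work.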
18
Applications
  Multimedia processing applications
  Hand-partitioned into 512-LUT pages
  Good applications
    Primarily feed-forward (feedback loops fit in HW)
  Bad applications
    Large, tight feedback loops (e.g. ADPCM)

  Application        Pages   Segments
  JPEG     Encode    13      6
           Decode    13      4
  MPEG     Encode    45      102
  Wavelet  Encode    14      6
           Decode    15      6
19
Application: JPEG Encode
20
Scaling Results: JPEG Encode
  [Figure: total time (makespan, in millions of cycles) versus number of physical compute pages]
21
Summary
  SCORE enables software survival on reconfigurable systems
    Binary compatibility
    Automatic performance scaling
  Virtual Hardware Requirements:
    Graph-based compute model
    Paged FPGA hardware
    Run-time support for RTR/Scheduling
22
Future Work
  Compilation/CAD
    Partitioning FSM operators into pages
  Study architectural parameters
    Page size
    CMB size
    Tolerable reconfiguration time
  Scheduling
    Static scheduling
23
More Info on the Web
  SCORE project: http://brass.cs.berkeley.edu/SCORE/
  Tutorial: http://brass.cs.berkeley.edu/documents/score_tutorial.html