Slide 1: Storage Assignment during High-level Synthesis for Configurable Architectures
Wenrui Gong, Gang Wang, Ryan Kastner
Department of Electrical and Computer Engineering, University of California, Santa Barbara
{gong, wanggang, kastner}@ece.ucsb.edu
http://express.ece.ucsb.edu
November 7, 2005
11/7/2005, GONG et al: Storage Assignment

Slide 2: What are we dealing with?
- FPGA-based reconfigurable architectures with distributed block RAM modules
- Synthesizing high-level programs into designs for these architectures
- (Figure: block RAM modules and configurable logic blocks)
Slide 3: Options of Storage Assignment
- Given the same storage and logic resources, different storage assignments exist
- (Figure: two alternative designs, each with control logic, a datapath, and a MUX)
Slide 4: Objective
- Different storage arrangements achieve different performance
- Objective: achieve the best performance (throughput) under the resource constraints, improve resource utilization, and meet design goals (timing, clock frequency, etc.)
Slide 5: Outline
- Target architectures
- Data partitioning problem
- Memory optimizations
- Experimental results
- Concluding remarks
Slide 7: Target Architecture
- An FPGA-based, fine-grained reconfigurable computing architecture with distributed block RAM modules
Slide 8: Memory Access Latencies
- Memory access delay = BRAM access delay + interconnect delay
- The BRAM access time is fixed by the architecture
- Interconnect delays are variable: one clock cycle to access nearby data, but two or more cycles to access data far from the CLB
- This makes it difficult to estimate execution time precisely
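The latency formula on this slide can be sketched as a toy model. The cycle counts below are illustrative assumptions, not values measured on the target architecture:

```python
# Toy model of the slide's formula:
#   memory access delay = BRAM access delay + interconnect delay.
# Delay parameters are illustrative assumptions, not measured values.

def access_latency_cycles(distance, bram_delay=1, cycles_per_hop=1):
    """Estimate cycles to access a BRAM `distance` hops from the CLB.

    distance == 0 models a nearby BRAM; each additional hop adds an
    interconnect cycle, matching the slide's "one clock cycle to access
    near data, or two or even more" for far-away data.
    """
    return bram_delay + distance * cycles_per_hop

near = access_latency_cycles(0)   # nearby BRAM: 1 cycle
far = access_latency_cycles(2)    # distant BRAM: 3 cycles
```

The model also shows why estimation is hard: the interconnect term depends on placement, which is only known after layout.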
Slide 9: Outline
- Target architectures
- Data partitioning problem: problem formulation; data partitioning algorithm
- Memory optimizations
- Experimental results
- Concluding remarks
Slide 10: Problem Formulation
- Inputs: an l-level nested loop L, a set of n data arrays N, and an architecture with a set of block RAM modules M
- Partitioning problem: partition the data arrays N into a set of data portions P, and find an assignment from P to the block RAM modules M
- Objective: minimize latency
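The inputs and output of the formulation can be sketched as data structures. The names (`Array`, `assign_portions`) and the round-robin policy are illustrative stand-ins, not the paper's latency-optimizing assignment:

```python
# Minimal sketch of the partitioning problem's shape: arrays N are split
# into portions P, and P is mapped onto BRAM modules M. The greedy
# round-robin policy here is a placeholder for the real objective.

from dataclasses import dataclass

@dataclass
class Array:
    name: str
    size: int  # number of elements

def assign_portions(arrays, num_brams, bram_capacity):
    """Split each array into capacity-sized portions and map the portions
    round-robin onto BRAM modules."""
    assignment = {}  # (array name, portion index) -> BRAM id
    next_bram = 0
    for a in arrays:
        portions = max(1, -(-a.size // bram_capacity))  # ceiling division
        for p in range(portions):
            assignment[(a.name, p)] = next_bram % num_brams
            next_bram += 1
    return assignment

mapping = assign_portions([Array("A", 1024), Array("B", 512)],
                          num_brams=4, bram_capacity=512)
```

Here array A needs two portions (1024 elements, 512 per BRAM) and B needs one, so three of the four BRAM modules are used.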
Slide 11: Overview of the Data Partitioning Algorithm
- Code analysis: determine possible partitioning directions
- Architectural-level synthesis: resource allocation, scheduling, and binding; discover the design's properties
- Granularity adjustment: use an empirical cost function to estimate performance
Slide 12: Code Analysis
- The iteration space and the data spaces
- Index functions determine data access footprints
- (Figure: mapping from the iteration space to the data space of array S)
Slide 13: Iteration/Data Space Partitioning
- A partitioning of the iteration space derives a corresponding partitioning of the data spaces, using the index functions
- Communication-free partitioning: the induced data footprints do not overlap
- (Figure: iteration space mapped to the data space of array S)
Slide 14: Iteration/Data Space Partitioning (continued)
- Communication-efficient partitioning: the data access footprints overlap
- Overlapped footprints are the source of remote memory accesses when the shared data are not placed together
- (Figure: overlapping footprints in the data space of array S)
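The two cases on slides 13 and 14 can be illustrated with toy affine index functions (these functions are examples of mine, not from the paper):

```python
# A block partition of the iteration space induces data footprints via the
# index functions. Disjoint footprints => communication-free partitioning;
# overlapping footprints => remote accesses unless shared data are
# co-located. Index functions below are illustrative.

def footprint(iter_block, index_fn):
    """Set of data elements accessed by a block of iterations."""
    return {x for i in iter_block for x in index_fn(i)}

# Communication-free: iteration i reads only A[i]
point = lambda i: [i]
f0 = footprint(range(0, 8), point)
f1 = footprint(range(8, 16), point)
disjoint = f0.isdisjoint(f1)       # True: no shared data, no communication

# Communication-efficient: a stencil reading A[i] and A[i+1] overlaps
# at the partition boundary
stencil = lambda i: [i, i + 1]
g0 = footprint(range(0, 8), stencil)
g1 = footprint(range(8, 16), stencil)
overlap = g0 & g1                  # element 8 is shared by both blocks
```

The partitioner's job is to pick directions and placements that make such overlaps cheap, i.e. served by nearby BRAM modules.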
Slide 15: Architectural-level Synthesis
- Synthesize the innermost iteration body
- Pipeline the designs
- Collect performance results: execution time T, initiation interval II, and resource utilizations u_mul, u_BRAM, and u_CLB
Slide 16: Estimating the Execution Time
- Resource utilization determines the performance of the pipelined designs
- Execution time is linear in the initiation interval and the granularity
- When more resources are left unoccupied, more operations can be performed simultaneously
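The linear relationship on this slide matches the standard pipelined-loop estimate, sketched below. The formula and parameter values are the textbook model, assumed here rather than taken from the paper:

```python
# Standard pipelined-loop time estimate (assumed, not from the paper):
#   T ≈ (iterations - 1) * II + depth
# so T is linear in the initiation interval II and in the granularity
# (iterations assigned to each partitioned portion).

def pipelined_time(iterations, ii, depth):
    """Cycles for `iterations` loop iterations through a pipeline with
    initiation interval `ii` and pipeline depth `depth` (cycles)."""
    return (iterations - 1) * ii + depth

# Halving the granularity roughly halves the per-portion execution time:
t_full = pipelined_time(1024, ii=2, depth=10)   # one portion
t_half = pipelined_time(512, ii=2, depth=10)    # two portions in parallel
```

This is why finer partitions can win: if portions run on parallel datapaths, total time tracks the per-portion iteration count.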
Slide 17: Granularity Adjustment
- For each possible partitioning direction, check different granularities to find the best performance
- Coarsest: use as few block RAM modules as possible
- (Figure: control logic and datapath)
Slide 18: Granularity Adjustment (continued)
- Finest: distribute the data across all block RAM modules
- (Figure: control logic and datapath)
Slide 19: Cost Function
- An empirical formulation based on our architectural-level synthesis results
- Estimate the number of remote (global) memory accesses m_r and the total memory accesses m_t, and take their ratio
- The factor favors memory accesses to nearby block RAM modules
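The ratio-based cost can be sketched as below. The slide calls the real formulation empirical and does not give it, so the penalty weight here is an assumption of mine:

```python
# Sketch of a ratio-based cost: estimate remote (global) accesses m_r and
# total accesses m_t, and prefer partitionings with a small m_r / m_t.
# The remote_penalty weight is an illustrative assumption; the paper's
# actual formula is empirical and not reproduced here.

def access_cost(m_r, m_t, remote_penalty=2.0):
    """Lower is better: the remote-access fraction, scaled by an assumed
    penalty for accesses that must leave the nearby BRAM module."""
    assert 0 <= m_r <= m_t and m_t > 0
    return remote_penalty * (m_r / m_t)

# A partitioning that keeps accesses local scores lower (better):
local_heavy = access_cost(m_r=10, m_t=1000)
remote_heavy = access_cost(m_r=300, m_t=1000)
```

Any monotone function of m_r / m_t would rank candidates the same way; the weight only matters when trading the ratio against other terms.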
Slide 20: Outline
- Target architectures
- Data partitioning problem
- Memory optimizations: scalar replacement; data prefetching
- Experimental results
- Concluding remarks
Slide 21: Scalar Replacement
- Scalar replacement increases data reuse and reduces memory accesses
- Data already fetched in a previous iteration are kept in registers
- Reuse the contents already in registers rather than accessing memory again
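The transformation on this slide can be shown on a 3-point stencil; the array and function names are illustrative:

```python
# Scalar replacement on a 3-point averaging stencil: values read in the
# previous iteration are carried in scalars ("registers") instead of
# being re-read from memory each iteration.

def smooth_naive(a):
    # 3 memory reads per iteration
    return [(a[i - 1] + a[i] + a[i + 1]) / 3 for i in range(1, len(a) - 1)]

def smooth_scalar_replaced(a):
    out = []
    prev, cur = a[0], a[1]           # carried in scalars across iterations
    for i in range(1, len(a) - 1):
        nxt = a[i + 1]               # only 1 new memory read per iteration
        out.append((prev + cur + nxt) / 3)
        prev, cur = cur, nxt         # rotate registers instead of re-reading
    return out

data = [1.0, 2.0, 3.0, 4.0, 5.0]
# Both versions compute the same result with 3x fewer memory reads
# in the scalar-replaced loop.
```

On the target architecture this turns two of every three BRAM reads into register reads, which is exactly the reuse the slide describes.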
Slide 22: Data Prefetching and Buffer Insertion
- Buffer insertion shortens critical paths and improves clock frequencies
- Schedule each global memory access one cycle earlier (one, two, or more cycles, depending on the chip size and the number of BRAM modules)
- This reduces the length of the critical paths
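The prefetching idea can be sketched as a software analogue; in the actual designs the buffering happens as registers inserted in the netlist, and the names below are mine:

```python
# Toy analogue of the slide's prefetching: issue each global (slow)
# memory read one iteration early and hold the result in a buffer, so
# the compute step never waits on the long interconnect path.

def compute_with_prefetch(mem, n):
    """Run n iterations of `result = 2 * mem(i)` with a one-step prefetch.

    `mem` is a hypothetical memory-read function; each iteration computes
    on the value fetched during the previous iteration.
    """
    out = []
    buf = mem(0)                                  # prefetch for iteration 0
    for i in range(n):
        nxt = mem(i + 1) if i + 1 < n else None   # fetch next, one step early
        out.append(buf * 2)                       # compute on buffered value
        buf = nxt
    return out

result = compute_with_prefetch(lambda i: i + 1, 4)
```

In hardware terms, the buffer register splits the path "BRAM read + interconnect + compute" into two shorter paths, which is what raises the achievable clock frequency.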
Slide 23: Outline
- Target architectures
- Data partitioning problem
- Memory optimizations
- Experimental results
- Concluding remarks
Slide 24: Experimental Setup
- Target architecture: Xilinx Virtex II FPGA
- Target frequency: 150 MHz
- Benchmarks: image processing and DSP applications: SOBEL edge detection, bilinear filtering, 2D Gauss blurring, 1D Gauss filter, and the SUSAN principle
Slide 25: Collected Results
- Pre-layout and post-layout timing and area results were collected for three design variants:
- Original: one block RAM module holds the entire data array
- Partitioned: the iteration/data spaces are partitioned under the resource constraints
- Optimized: memory optimizations applied to the partitioned designs
Slide 26: Results: Execution Time
- Under the given resources, the designs were partitioned into 4 portions
- Average speedup from partitioning: 2.75x
- After further memory optimizations: 4.80x faster than the original designs
Slide 27: Results: Achievable Clock Frequencies
- The partitioned designs are about 10 percent slower than the original ones
- After the memory optimizations, clock frequencies are about 7 percent faster than those of the partitioned designs
Slide 28: Outline
- Target architectures
- Data partitioning problem
- Memory optimizations
- Experimental results
- Concluding remarks
Slide 29: Concluding Remarks
- A data and iteration space partitioning approach for homogeneous block RAM modules
- Integrated with existing architectural-level synthesis techniques
- Parallelizes input designs and dramatically improves system performance
Slide 30: Thank You
- Prof. Ryan Kastner and Gang Wang
- The reviewers
- The audience
Slide 31: Questions