Architectural Exploration: 802.11a Transmitter. Arvind, Nirav Dave, Steve Gerding, Mike Pellauer. Computer Science & Artificial Intelligence Laboratory. June 5, 2006.


1 Architectural Exploration: 802.11a Transmitter. Arvind, Nirav Dave, Steve Gerding, Mike Pellauer. Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology. MIT-Nokia Architecture Group, Helsinki, June 5, 2006.

2 Why architectural exploration? Architects are clever people and can think of a variety of designs, but they often cannot determine which design is best for a given metric (e.g., power). There is too little time and manpower to take several designs far enough for proper evaluation. The result: guesswork instead of architectural exploration. New design tools can change all that.

3 This talk: Architectural exploration of an 802.11a transmitter. The goal is to show that it is easy and economical to do so in Bluespec. You don't have to know 802.11a or Bluespec to understand the talk.

4 802.11a Transmitter Overview. [Block diagram: Controller, Scrambler, Encoder, Interleaver, Mapper, IFFT, Cyclic Extend; inputs labeled "headers" and "data"; labels "24 uncoded bits" and "one OFDM symbol (64 complex numbers)".] The IFFT transforms 64 (frequency-domain) complex numbers into 64 (time-domain) complex numbers and accounts for > 95% of the area. The transmitter must produce one OFDM symbol every 4 μs. Depending upon the transmission rate, it consumes 1, 2 or 4 tokens to produce one OFDM symbol.

5 Combinational IFFT. [Diagram: inputs in0 ... in63; three stages of Radix-4 nodes (x16 per stage), each followed by a permutation (Permute_1, Permute_2, Permute_3); outputs out0 ... out63. Inset of a Radix-4 node with labels *, *j, t0, t1, t2, t3.] All numbers are complex and represented as two sixteen-bit quantities. Fixed-point arithmetic is used to reduce area, power, ...

6 Design Tradeoffs
1. We can decrease the area by multiplexing some circuits. It may be a win if the throughput requirements can be met without increasing the frequency.
2. Power can be lowered by lowering the frequency, which can be adjusted by changing the voltage: power ∝ (voltage)²

7 Combinational IFFT: Opportunity for reuse. [Diagram: the same three-stage combinational IFFT, in0 ... in63 through Radix-4 nodes (x16 per stage) and Permute_1, Permute_2, Permute_3 to out0 ... out63.] Reuse the same circuit three times.

8 Circular pipeline: Reusing the Pipeline Stage. [Diagram labels: in0 ... in63; Radix 4; Permute_1, Permute_2, Permute_3; Stage Counter; out0 ... out63; 64 4-way muxes.] The 16 Radix-4s can be shared but not the three permutations, hence the need for muxes.

9 Superfolded circular pipeline: Just one Radix-4 node! [Diagram labels: in0 ... in63; Radix 4; Permute_1, Permute_2, Permute_3; Stage Counter 0 to 2; Index Counter 0 to 15; 64 4-way muxes; 4 16-way muxes; 4 16-way demuxes; out0 ... out63.] Designs with 2, 4, and 8 Radix-4 modules make sense too!
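The folding trade-off can be put in numbers with a small Python sketch (Python is used here only as a calculator; it is not part of the talk). It assumes that each clock cycle performs one pass through the available Radix-4 nodes, that there are no overhead cycles, and that one symbol must complete every 4 μs (4000 ns). The function names are illustrative.

```python
SYMBOL_PERIOD_US = 4.0   # one OFDM symbol every 4 microseconds (from the talk)
STAGES = 3               # Stage Counter runs 0 to 2
NODES_PER_STAGE = 16     # Index Counter runs 0 to 15


def cycles_per_symbol(radix4_units: int) -> int:
    """Clock cycles needed per symbol when `radix4_units` nodes are shared.

    Assumption: every cycle does one pass through all available nodes.
    """
    if radix4_units < 1 or NODES_PER_STAGE % radix4_units != 0:
        raise ValueError("radix4_units must divide 16")
    return STAGES * (NODES_PER_STAGE // radix4_units)


def min_clock_mhz(radix4_units: int) -> float:
    """Lowest clock frequency (MHz) that still meets the symbol deadline."""
    # cycles per microsecond is the same number as MHz
    return cycles_per_symbol(radix4_units) / SYMBOL_PERIOD_US


if __name__ == "__main__":
    for units in (16, 8, 4, 2, 1):
        print(units, cycles_per_symbol(units), min_clock_mhz(units))
```

Under these assumptions a single shared node needs 48 cycles per symbol, so the clock must run at 12 MHz or faster. Halving the hardware doubles the required frequency, which is why area and power have to be evaluated together.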

10 Which design consumes the least energy to transmit a symbol? Can we quickly code up all the alternatives? From a single source with parameters? This is not practical in traditional hardware description languages like Verilog/VHDL.

11 Expressing the designs in Bluespec

12 Bluespec code: Radix-4 Node

    function Vector#(4,Complex) radix4(Vector#(4,Complex) t,
                                       Vector#(4,Complex) k);
       Vector#(4,Complex) m = newVector(),
                          y = newVector(),
                          z = newVector();

       m[0] = k[0] * t[0];
       m[1] = k[1] * t[1];
       m[2] = k[2] * t[2];
       m[3] = k[3] * t[3];

       y[0] = m[0] + m[2];
       y[1] = m[0] - m[2];
       y[2] = m[1] + m[3];
       y[3] = i*(m[1] - m[3]);

       z[0] = y[0] + y[2];
       z[1] = y[1] + y[3];
       z[2] = y[0] - y[2];
       z[3] = y[1] - y[3];

       return(z);
    endfunction

Polymorphic code: works on any type of numbers for which *, + and - have been defined. [Diagram labels: *, *, *, *, *j]
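As a cross-check of the arithmetic in the Radix-4 node, here is a Python transcription of the same dataflow using Python's built-in complex numbers. It assumes the `i` on the slide denotes the imaginary unit; the helper `idft4` is not from the talk and is added only for comparison. With all four multiplier inputs set to 1, the node computes an unnormalized 4-point inverse DFT.

```python
def radix4(t, k):
    """Python transcription of the slide's radix4 dataflow.

    t and k are length-4 sequences of complex numbers; they are multiplied
    element-wise first, exactly as on the slide.
    """
    m = [k[n] * t[n] for n in range(4)]
    # The slide writes i*(m[1] - m[3]); assumed to be the imaginary unit.
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    return [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]


def idft4(x):
    """Reference (not from the talk): unnormalized 4-point inverse DFT,
    out[n] = sum over q of x[q] * j**(q*n)."""
    powers = [1, 1j, -1, -1j]  # j**0 .. j**3
    return [sum(x[q] * powers[(q * n) % 4] for q in range(4)) for n in range(4)]


if __name__ == "__main__":
    data = [1 + 2j, 3 - 1j, -2 + 1j, 0 + 4j]
    print(radix4([1, 1, 1, 1], data))
    print(idft4(data))
```

The two printed lists agree, which shows the three layers of additions and the single multiplication by j are enough for the 4-point butterfly; the four leading multiplications apply the per-stage factors.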

13 Combinational IFFT: can be used as a reference. [Diagram: in0 ... in63 through three stages of Radix-4 nodes (x16 per stage) and Permute_1, Permute_2, Permute_3 to out0 ... out63.] Each stage is one stage_f function; repeat it three times.

14 Bluespec Code for Combinational IFFT

Stage function:

    function SVector#(64, Complex) stage_f(Bit#(2) stage,
                                           SVector#(64, Complex) stage_in);
       // declarations of stage_temp and stage_out are not shown on the slide
       begin
          for (Integer i = 0; i < 16; i = i + 1)
          begin
             Integer idx = i * 4;
             let twid = getTwiddle(stage, fromInteger(i));
             let y = radix4(twid, stage_in[idx:idx+3]);
             stage_temp[idx]     = y[0];
             stage_temp[idx + 1] = y[1];
             stage_temp[idx + 2] = y[2];
             stage_temp[idx + 3] = y[3];
          end
          //Permutation
          for (Integer i = 0; i < 64; i = i + 1)
             stage_out[i] = stage_temp[permute[i]];
       end
       return(stage_out);
    endfunction

    function SVector#(64, Complex) ifft (SVector#(64, Complex) in_data);
       //Declare vectors
       SVector#(4,SVector#(64, Complex)) stage_data = replicate(newSVector);
       stage_data[0] = in_data;
       // loop index is `stage`; the slide indexed stage_data with an undefined `i`
       for (Integer stage = 0; stage < 3; stage = stage + 1)
          stage_data[stage+1] = stage_f(fromInteger(stage), stage_data[stage]);
       return(stage_data[3]);
    endfunction

The code is unfolded to generate a combinational circuit.

15 Synchronous pipeline

    rule sync_pipeline (True);
       inQ.deq();
       sReg1 <= f1(inQ.first());
       sReg2 <= f2(sReg1);
       outQ.enq(f3(sReg2));
    endrule

[Diagram: x, inQ, f1, sReg1, f2, sReg2, f3, outQ.] This is real IFFT code; just replace f1, f2 and f3 with stage_f code.
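The timing behavior of this rule can be seen in a cycle-level Python sketch. It is a behavioral model, not Bluespec: it assumes the rule fires once per dequeued input, that all register reads see the values from before the firing, and that the registers start at an arbitrary `fill` value. The slide's rule carries no valid bits, so the first two outputs in this model are junk and two extra dummy inputs are needed to flush the last real results.

```python
def sync_pipeline(inputs, f1, f2, f3, fill=0):
    """Model one firing of the rule per input; returns everything enqueued."""
    s_reg1 = fill
    s_reg2 = fill
    out_q = []
    for x in inputs:
        # All right-hand sides read the old register values, as in a rule.
        out_q.append(f3(s_reg2))
        s_reg1, s_reg2 = f1(x), f2(s_reg1)
    return out_q


if __name__ == "__main__":
    f1 = lambda v: v + 1
    f2 = lambda v: v * 2
    f3 = lambda v: v - 3
    data = [1, 2, 3]
    outs = sync_pipeline(data + [0, 0], f1, f2, f3)  # two dummy flush tokens
    print(outs[2:])                                   # results for 1, 2, 3
    print([f3(f2(f1(v))) for v in data])              # combinational reference
```

After the two-cycle fill, one result leaves the pipeline per firing, and each result equals the combinational composition f3(f2(f1(x))).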

16 Folded pipeline

    rule folded_pipeline (True);
       if (stage==1)
          begin inQ.deq(); sxIn = inQ.first(); end
       else sxIn = sReg;
       sxOut = f(stage, sxIn);
       if (stage==3) outQ.enq(sxOut);
       else sReg <= sxOut;
       stage <= (stage==3) ? 1 : stage+1;
    endrule

    function f (stage, sx);
       case (stage)
          1: return f1(sx);
          2: return f2(sx);
          3: return f3(sx);
       endcase
    endfunction

[Diagram: x, inQ, f (containing f1, f2, f3), sReg, stage, outQ.] This is real IFFT code too...
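A cycle-level Python sketch of the folded rule shows what folding costs in time. It is a behavioral model under the assumption that the rule fires every cycle while work remains; the cycle counter and the function names are additions for illustration, not part of the talk.

```python
from collections import deque


def make_f(f1, f2, f3):
    """Stage-selected function, mirroring the slide's `f (stage, sx)`."""
    table = {1: f1, 2: f2, 3: f3}
    return lambda stage, sx: table[stage](sx)


def folded_pipeline(inputs, f):
    """Run the folded rule until all inputs are consumed.

    Returns (outputs, cycles), where cycles counts rule firings.
    """
    in_q = deque(inputs)
    out_q = []
    stage = 1
    s_reg = None
    cycles = 0
    while in_q or stage != 1:
        sx_in = in_q.popleft() if stage == 1 else s_reg
        sx_out = f(stage, sx_in)
        if stage == 3:
            out_q.append(sx_out)
        else:
            s_reg = sx_out
        stage = 1 if stage == 3 else stage + 1
        cycles += 1
    return out_q, cycles


if __name__ == "__main__":
    f = make_f(lambda v: v + 1, lambda v: v * 2, lambda v: v - 3)
    print(folded_pipeline([1, 2, 3], f))
```

The outputs match the three-stage composition, but each input now occupies the single stage for three firings, so the same function hardware is reused at one third of the per-cycle throughput.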

17 Expressing these designs in Bluespec is easy. All these designs were done in less than one day! Designs: Combinational; Pipelined; Folded (16 Radices); Super-Folded (8 Radices); Super-Folded (4 Radices); Super-Folded (2 Radices); Super-Folded (1 Radix). Area and power estimates? How long will it take to write these designs in Verilog? VHDL? SystemC?

18 Bluespec Tool flow. [Flow diagram labels: Bluespec SystemVerilog source; Bluespec Compiler; Verilog 95 RTL; Verilog sim; VCD output; Debussy Visualization; RTL synthesis; gates; FPGA; C; Bluespec C sim (cycle accurate); power estimation tool (Sequence Design PowerTheater).]

19 802.11a Transmitter: Synthesis results for various IFFT designs.
Columns: IFFT Design; Area (mm²); Min. CLK Period (ns); Latency (clks/Sym); ns/output (req. 4000).
Rows: Combinational; Pipelined; Folded (16 Radices); Super-Folded (8 Radices); SF (4 Radices); SF (2 Radices); SF (1 Radix).
[The numeric entries of this table are not preserved in the transcript.]
TSMC 0.18 micron; numbers reported are before place and route. Some areas will be larger after layout.

20 Algorithmic Improvements. [Diagram: in0 ... in63 through three stages of Radix-4 nodes (x16 per stage) and Permute_1, Permute_2, Permute_3 to out0 ... out63.]
1. All three permutations can be made identical, giving more savings in area.
2. One multiplication can be removed from Radix-4.

21 802.11a Transmitter: Synthesis results, old vs. new IFFT designs.
Columns: IFFT Design; Old Area (mm²); New Area (mm²).
Rows: Combinational; Pipelined; Folded (16 Radices); Super-Folded (8 Radices); SF (4 Radices); SF (2 Radices); SF (1 Radix).
[The numeric entries of this table are not preserved in the transcript. Slide annotation: "??? expected".]
TSMC 0.18 micron; numbers reported are before place and route.

22 802.11a Transmitter: Synthesis results with new IFFT designs.
Columns: IFFT Design; Area (mm²); Min. CLK Period (ns); Latency (clks/Symbol); Min. ns/output; Permitted Clock scaling.
Rows: Combinational; Pipelined; Folded (16 Radices); Super-Folded (8 Radices); SF (4 Radices); SF (2 Radices); SF (1 Radix).
[The numeric entries of this table are not preserved in the transcript.]
TSMC 0.18 micron; numbers reported are before place and route.

23 802.11a Transmitter with new IFFT designs: Power Estimates (work in progress).
Columns: c1 = IFFT Design; c2 = Area (mm²); c3 = Min Freq.; c4 = 100 MHz; c5 = Min Freq.; c6 = Energy/Symb (nJ). Only c2 and c3 survive in the transcript, and not for every row.

    c1 IFFT Design    | c2 Area (mm²) | c3 Min Freq.
    Combinational     | 5.91          | 1 MHz
    Pipeline (48 R-4) | 6.26          | 1 MHz
    Folded (16 R-4)   | 4.61          | 1 MHz
    SF (8 R-4)        |               |
    SF (4 R-4)        | 2.75          | 3 MHz
    SF (2 R-4)        | 2.21          | 6 MHz
    SF (1 R-4)        | 1.67          | 12 MHz

c3 = min clock x scaling factor; c4 is raw data collected by the Sequence Design PowerTheater; c5 = c4 x c3 / 100 MHz / voltage scaling (= 10); c6 = c5 x 4 μs.
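The column formulas in the footnote can be written out as a short Python sketch. It assumes c4 and c5 are powers in mW (the unit is not preserved in the transcript), takes the voltage-scaling factor of 10 from the footnote, and uses a made-up 200 mW figure purely to exercise the arithmetic; it is not a measured value from the talk.

```python
def power_at_min_freq_mw(power_100mhz_mw, min_freq_mhz, voltage_scaling=10.0):
    """c5 = c4 x c3 / 100 MHz / voltage scaling.

    power_100mhz_mw: c4, power reported at 100 MHz (unit assumed to be mW).
    min_freq_mhz:    c3, the lowest clock that still meets the deadline.
    """
    return power_100mhz_mw * (min_freq_mhz / 100.0) / voltage_scaling


def energy_per_symbol_nj(power_mw, symbol_period_us=4.0):
    """c6 = c5 x 4 microseconds; mW times microseconds gives nJ."""
    return power_mw * symbol_period_us


if __name__ == "__main__":
    hypothetical_c4 = 200.0  # mW at 100 MHz, illustrative only
    c5 = power_at_min_freq_mw(hypothetical_c4, min_freq_mhz=12.0)
    print(c5, energy_per_symbol_nj(c5))
```

The point of the formula is that a smaller design is not automatically a lower-energy one: its power at 100 MHz is lower, but it must run at a higher minimum frequency to finish a symbol in time.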

24 Summary It is essential to do architectural exploration for better (area, power, performance,...) designs. It is possible to do so with new design tools and methodologies. Better and faster tools for estimating area, timing and power would dramatically increase our capability to do architectural exploration. Thanks