March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter Arvind Computer Science.

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Advertisements

BSV execution model and concurrent rule scheduling Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February.
Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Computer Architecture: A Constructive Approach Six Stage Pipeline/Bypassing Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
An EHR based methodology for Concurrency management Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Combinational Circuits in Bluespec
a By Yasir Ateeq. Table of Contents INTRODUCTION TASKS OF TRANSMITTER PACKET FORMAT PREAMBLE SCRAMBLER CONVOLUTIONAL ENCODER PUNCTURER INTERLEAVER.
November 2, 2006http://csg.csail.mit.edu/6.827/L15-1 An hardware inspired model for parallel programming Arvind Computer Science & Artificial Intelligence.
June 5, Architectural Exploration: a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
March 4, 2009L13-1http://csg.csail.mit.edu/6.375 Multiple Clock Domains Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of.
Distributed computing using Projective Geometry: Decoding of Error correcting codes Nachiket Gajare, Hrishikesh Sharma and Prof. Sachin Patkar IIT Bombay.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 15, 2010.
Multiple Clock Domains (MCD) Continued … Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology November.
Parallel architecture Technique. Pipelining Processor Pipelining is a technique of decomposing a sequential process into sub-processes, with each sub-process.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
March, 2007http://csg.csail.mit.edu/arvindIFFT-1 Combinational Circuits: IFFT, Types, Parameterization... Arvind Computer Science & Artificial Intelligence.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
Modeling a Multicarrier Wireless Communication Transceiver Embedded Software Systems Literature Survey March 24,2004 By Hunaid Lotia.
September 8, 2009http://csg.csail.mit.edu/koreaL03-1 Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Folding complex combinational circuits to save area Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
March 1, 2006http://csg.csail.mit.edu/6.375/L09-1 Bluespec-3: Architecture exploration using static elaboration Arvind Computer Science & Artificial Intelligence.
Simple Inelastic and Folded Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 14, 2011L04-1.
Computer Architecture: A Constructive Approach Pipelining combinational circuits Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*, Massachusetts.
a/g Kernel Identification Saba Zia Bilal Saqib.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 9, 2011L03-1
Introduction to OFDM and Cyclic prefix
A question of science Circuit Symbols
Folded Combinational Circuits as an example of Sequential Circuits
Folded “Combinational” circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Sequential Circuits Constructive Computer Architecture Arvind
Architectural Exploration:
Sequential Circuits: Constructive Computer Architecture
Combinational Circuits in Bluespec
FFT: An example of complex combinational circuits
Combinational Circuits in Bluespec
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
Sequential Circuits Constructive Computer Architecture Arvind
Pipelining combinational circuits
Combinational Circuits and Simple Synchronous Pipelines
Combinational Circuits and Simple Synchronous Pipelines
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Multirule systems and Concurrent Execution of Rules
Combinational Circuits in Bluespec
Multiple Clock Domains
FFT: An example of complex combinational circuits
Constructive Computer Architecture: Guards
Simple Synchronous Pipelines
Multirule systems and Concurrent Execution of Rules
Pipelining combinational circuits
Architectural Exploration:
Simple Synchronous Pipelines
11ac 80MHz Transmission Flow
Presentation transcript:

March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in a Transmitter Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

March, a-2http://csg.csail.mit.edu/arvind a Transmitter Overview ControllerScramblerEncoderInterleaverMapper IFFT Cyclic Extend headers data IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers accounts for 85% area 24 Uncoded bits One OFDM symbol (64 Complex Numbers) Must produce one OFDM symbol every 4 sec Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol

March, a-3http://csg.csail.mit.edu/arvind Preliminary results [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind Design Lines of Relative Block Code (BSV)Area Controller 49 0% Scrambler 40 0% Conv. Encoder 113 0% Interleaver 76 1% Mapper % IFFT 95 85% Cyc. Extender 23 3% Complex arithmetic libraries constitute another 200 lines of code

March, a-4http://csg.csail.mit.edu/arvind Combinational IFFT in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute Reuse the same circuit three times to reduce area

March, a-5http://csg.csail.mit.edu/arvind fg Design Alternatives Reuse a block over multiple cycles we expect: Throughput to Area to ff g decrease – less parallelism The clock needs to run faster for the same throughput  hyper-linear increase in energy decrease – reusing a block

March, a-6http://csg.csail.mit.edu/arvind Circular pipeline: Reusing the Pipeline Stage in0 … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 … Bfly4 Permute Stage Counter

March, a-7http://csg.csail.mit.edu/arvind Superfolded circular pipeline: Just one Bfly-4 node! in0 … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 Bfly4 Permute Index == 15? Index: 0 to 15 64, 2-way Muxes 4, 16-way Muxes 4, 16-way DeMuxes Stage 0 to 2

March, a-8http://csg.csail.mit.edu/arvind Pipelining a block inQoutQ f2f1 f3 Combinational C inQoutQ f2f1 f3 Pipeline P inQoutQ f Folded Pipeline FP Clock? Area?Throughput? Clock: C < P  FP Area: FP < C < PThroughput: FP < C < P

March, a-9http://csg.csail.mit.edu/arvind Synchronous pipeline x sReg1inQ f1f2f3 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f1(inQ.first()); sReg2 <= f2( sReg1 ); outQ.enq(f3(sReg2)); endrule This is real IFFT code; just replace f1, f2 and f3 with stage_f code This rule can fire only if Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated - inQ has an element - outQ has space

March, a-10http://csg.csail.mit.edu/arvind Stage functions f1, f2 and f3 function f1(x); return (stage_f(1,x)); endfunction function f2(x); return (stage_f(2,x)); endfunction function f3(x); return (stage_f(3,x)); endfunction The stage_f function was given earlier

March, a-11http://csg.csail.mit.edu/arvind Problem: What about pipeline bubbles? x sReg1inQ f1f2f3 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f1(inQ.first()); sReg2 <= f2( sReg1 ); outQ.enq(f3(sReg2)); endrule Red and Green tokens must move even if there is nothing in the inQ! Modify the rule to deal with these conditions Also if there is no token in sReg2 then nothing should be enqueued in the outQ Valid bits or the Maybe type

March, a-12http://csg.csail.mit.edu/arvind The Maybe type data in the pipeline typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline (True); if ( inQ.notEmpty()) begin sReg1 <= Valid f1(inQ.first()); inq.deq(); end else sReg1 <= Invalid; case (sReg1) matches tagged Valid.sx1: sReg2 <= Valid f2(sx1); tagged Invalid: sReg2 <= Invalid; case (sReg2) matches tagged Valid.sx2: outQ.enq(f3(sx2)); endrule

March, a-13http://csg.csail.mit.edu/arvind Folded pipeline rule folded-pipeline (True); if (stage==0) begin sxIn= inQ.first(); inQ.deq(); end else sxIn= sReg; sxOut = f(stage,sxIn); if (stage==n-1) outQ.enq(sxOut); else sReg <= sxOut; stage <= (stage==n-1)? 0 : stage+1; endrule x sReg inQ f outQ stage The same code will work for superfolded pipelines by changing n and stage function f notice stage is a dynamic parameter now! no for- loop Need type declarations for sxIn and sxOut

March, a-14http://csg.csail.mit.edu/arvind a Transmitter Synthesis results (Only the IFFT block is changing) IFFT DesignArea (mm 2 ) Throughput Latency (CLKs/sym) Min. Freq Required Pipelined MHz Combinational MHz Folded (16 Bfly-4s) MHz Super-Folded (8 Bfly-4s) MHz SF(4 Bfly-4s) MHz SF(2 Bfly-4s) MHz SF (1 Bfly4) MHZ TSMC.18 micron; numbers reported are before place and route. The same source code All these designs were done in less than 24 hours!

March, a-15http://csg.csail.mit.edu/arvind Why are the areas so similar Folding should have given a 3x improvement in IFFT area BUT a constant twiddle allows low- level optimization on a Bfly-4 block a 2.5x area reduction!

March, a-16http://csg.csail.mit.edu/arvind Parameterize the synchronous pipeline Vector#(n, Reg#(t)) sReg <- replicateM(mkReg(Invalid)); rule sync-pipeline (True); if ( inQ.notEmpty()) begin (sReg[1]) <= Valid f(0,inQ.first()); inq.deq(); end else ( sReg[1]) <= Invalid; for (Integer stage = 1; stage < n-1; stage = stage+1) case (sReg[stage]) matches tagged Valid.sx: ( sReg[stage+1]) <= Valid f(stage,sx); tagged Invalid : ( sReg[stage+1]) <= Invalid; endcase case (sReg[n-1]) matches tagged Valid.sx: outQ.enq(f(n-1,sx)); endcase endrule x sReg[1]inQ f1f1 fnfn sReg[n-1]outQ n and stage are static parameters

March, a-17http://csg.csail.mit.edu/arvind Syntax: Vector of Registers Register suppose x and y are both of type Reg. Then x <= y means x._write(y._read()) Vector of (say) Int x[i] means sel(x,i) x[i] = y[j] means x = update(x,i, sel(y,j)) Vector of Registers x[i] <= y[j] does not work. The parser thinks it means (sel(x,i)._read)._write(sel(y,j)._read), which will not type check (x[i]) <= y[j] does work!