Evaluating the Raw microprocessor Michael Bedford Taylor Raw Architecture Group Computer Science and AI Laboratory Massachusetts Institute of Technology
Evaluating the Raw microprocessor Brief Overview of Raw Architecture Avenues of Evaluation Empirical- Comparison with P3 Analytical- Modeling Large scale ILP Experiential- Experimental Systems
The Raw Architecture Divide the silicon into an array of identical, programmable tiles. (A signal can get through a small amount of logic and to the next tile in one cycle.)
Raw Architecture Compute Processor Routers On-chip networks
Raw Architecture Compute Processor Routers On-chip networks
IFRFD ATL M1M2 FP E U TV F4WB r26 r27 r25 r24 Input FIFOs from Static Router r26 r27 r25 r24 Output FIFOs to Static Router Inside the compute processor – networks are integrated directly into the bypass paths
pval5=seed.0*6.0 pval4=pval5+2.0 tmp3.6=pval4/3.0 tmp3=tmp3.6 v3.10=tmp3.6-v2.7 v3=v3.10 v2.4=v2 pval3=seed.o*v2.4 tmp2.5=pval3+2.0 tmp2=tmp2.5 pval6=tmp1.3-tmp2.5 v2.7=pval6*5.0 v2=v2.7 seed.0=seed pval1=seed.0*3.0 pval0=pval1+2.0 tmp0.1=pval0/2.0 tmp0=tmp0.1 v1.2=v1 pval2=seed.0*v1.2 tmp1.3=pval2+2.0 tmp1=tmp1.3 pval7=tmp1.3+tmp2.5 v1.8=pval7*3.0 v1=v1.8 v0.9=tmp0.1-v1.8 v0=v0.9 pval5=seed.0*6.0 pval4=pval5+2.0 tmp3.6=pval4/3.0 tmp3=tmp3.6 v3.10=tmp3.6-v2.7 v3=v3.10 v2.4=v2 pval3=seed.o*v2.4 tmp2.5=pval3+2.0 tmp2=tmp2.5 pval6=tmp1.3-tmp2.5 v2.7=pval6*5.0 v2=v2.7 seed.0=seed pval1=seed.0*3.0 pval0=pval1+2.0 tmp0.1=pval0/2.0 tmp0=tmp0.1 v1.2=v1 pval2=seed.0*v1.2 tmp1.3=pval2+2.0 tmp1=tmp1.3 pval7=tmp1.3+tmp2.5 v1.8=pval7*3.0 v1=v1.8 v0.9=tmp0.1-v1.8 v0=v0.9 Raw’s bypass-integrated on-chip networks serve as a Scalar Operand Network, or SON. Multiple Raw tiles Program graph
Empirical Evaluation Comparison to P3 ParameterRaw (IBM ASIC)P3 (Intel) Litho180 nm ProcessCMOS 7SFP858 Metal LayersCu 6Al 6 FO1 Delay23 ps11 ps Dielectric k Design StyleStandard CellFull custom Initial Freq425 MHz MHz Die Area331 mm mm 2
Analytical Evaluation Scalar Operand Network Research (SONs). (See HPCA 2003 and future.)
Scalar Operand Network The network and the associated algorithms that are responsible for matching operands and operations In space.
SON Performance Metric: 5-tuple conventional distributed multiprocessor Superscalar (not scalable)
Raw: a new point in the region. conventional distributed multiprocessor Raw SON Superscalar SON (not scalable)
Impact of Receive Occupancy, 64 tiles, i.e.,
Experiential Evaluation (i.e., Real Hardware, Real Systems) Systems Online or in Pipeline Workstation Microphone Array Fabric System (Software Radio on Raw) (IP Routing on Raw)
Raw Chip Specifications IBM SA27E Process 180 nm, 6-metal copper ASIC process 16 Tile RAW Processor 18.23mm x 18.23mm 1657 pin CCGA package 1152 HSTL signal pins Clock and Power 420MHz (actual) 10 watts (power save mode) 18 watts typical 35 watts max
.. twenty-eight 32-bit buses connecting Raw Chip to I/O and Memory System Raw Motherboard
2 Microphone Board 2 Microphones 1 A-to-D 1 CPLD 2 Connectors
1020 Element Microphone Array
Fabric System Architecture Design: two distinct board types Board 1: Quad Raw Board Board 2: I/O & Memory Board Replicate and connect
Summary