




1 Stanford Streaming Supercomputer (SSS) Project Meeting
Bill Dally, Pat Hanrahan, and Ron Fedkiw
Computer Systems Laboratory, Stanford University
October 2, 2001

2 Agenda
– Introductions (now)
– Vision – subset of ASCI review slides
– Goals for the quarter
– Schedule of meetings for the quarter

3 Computation is inexpensive and plentiful
– NVIDIA GeForce3: ~80 GFLOPS, ~800 Gops
– Velio VC3003: 1 Tb/s I/O bandwidth
– DRAM: < $0.20/MB

4 But supercomputers are very expensive
– Cost more per GFLOPS, GUPS, and GByte than low-end machines
– Hard to achieve a high fraction of peak performance on global problems
– Based on clusters of CPUs that are scaling at only 20%/year vs. 50% historically

5 Microprocessors no longer realize the potential of VLSI
[Chart: growth-rate trends of 52%/year, 74%/year, and 19%/year, with resulting gaps of 30:1, 1,000:1, and 30,000:1]

6 Streaming processors leverage emerging technology
– A streaming supercomputer can achieve
  - $20/GFLOPS, $2/M-GUPS
  - Scalable to PFLOPS and 10^13 GUPS
– Enabled by
  - Stream architecture
    - Exposes and exploits parallelism and locality
    - High arithmetic intensity (ops/BW)
    - Hides latency
  - Efficient interconnection networks
    - High global bandwidth
    - Low latency

7 What is stream processing?
[Diagram: Image 0 and Image 1 each pass through a convolve kernel; the two results feed a SAD kernel that produces a Depth Map]
– Operations within a kernel operate on local data
– Streams expose data parallelism
– Kernels can be partitioned across chips to exploit control parallelism
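The convolve/SAD pipeline on this slide can be sketched in ordinary Python. This is an illustrative model only, not the SSS toolchain: the kernel bodies, the 1-D filter weights, and the toy image data are all assumptions, chosen to show how kernels touch only local data while every stream element is independent (data parallelism) and the two convolve kernels could run concurrently (control parallelism).

```python
# Toy model of the stream/kernel pipeline: two images are convolved,
# then a SAD (sum of absolute differences) kernel yields a depth map.
# All names and data here are hypothetical, for illustration only.

def convolve(image, kernel=(0.25, 0.5, 0.25)):
    """Kernel: 1-D convolution over a stream of pixel values.

    Each output element depends only on a small local window,
    so every element can be computed in parallel.
    """
    padded = [image[0]] + list(image) + [image[-1]]  # clamp edges
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(image))]

def sad(stream0, stream1):
    """Kernel: per-record absolute difference of two streams."""
    return [abs(a - b) for a, b in zip(stream0, stream1)]

image0 = [10, 12, 11, 9, 8]
image1 = [11, 11, 12, 10, 7]

# The two convolve kernels are independent (control parallelism);
# their results stream into the SAD kernel.
depth_map = sad(convolve(image0), convolve(image1))
```

In the real machine each kernel would run on a stream processor's ALU clusters with the streams held in the stream register file; here the lists simply stand in for those streams.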

8 Why does it get good performance – easily?
[Diagram: bandwidth hierarchy – SDRAM at 2 GB/s, Stream Register File at 32 GB/s, ALU Cluster at 544 GB/s]
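The bandwidth numbers on this slide imply how much data reuse each level of the hierarchy must capture. The figures below are taken from the slide; the "reuse factor" interpretation is our gloss, a back-of-the-envelope sketch rather than a statement of the actual machine's requirements.

```python
# Bandwidth hierarchy from the slide (GB/s).
sdram_bw = 2    # off-chip SDRAM
srf_bw   = 32   # stream register file
alu_bw   = 544  # ALU cluster local bandwidth

# For the ALUs to stay busy, each word fetched from SDRAM must be
# reused ~16x out of the SRF, and each SRF word ~17x inside the
# clusters -- ~272x reuse overall (our interpretation of the slide).
srf_reuse     = srf_bw / sdram_bw
cluster_reuse = alu_bw / srf_bw
total_reuse   = alu_bw / sdram_bw
```

This is exactly what "high arithmetic intensity (ops/BW)" on slide 6 buys: kernels that do many operations per word moved, so the cheap, plentiful local bandwidth does most of the work.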

9 Architecture of a Streaming Supercomputer

10 Streaming processor

11 A layered software system simplifies stream programming

12 Domain-specific language example: Marble shader in RTSL

surface shader float4 shiny_marble_imagine(texref noise) {
  float4 Cd = lightmodel_diffuse({ 0.4, 0.4, 0.4, 1 }, { 0.5, 0.5, 0.5, 1 });
  float4 Cs = lightmodel_specular({ 0.35, 0.35, 0.35, 1 }, Zero, 20);
  fragment float y;
  fragment float4 pos = Pobj * {10, 10, 10, 1};
  y = pos[1] + 3.0 * turbulence4_imagine_scalar(noise, pos);
  y = sin(y * pi);
  return ({marble_color(y), 1.0f} * Cd + Cs);
}

float turbulence4_imagine_scalar(texref noise, float4 pos) {
  fragment float4 addr1 = pos;
  fragment float4 addr2 = pos * {2, 2, 2, 1};
  fragment float4 addr3 = pos * {4, 4, 4, 1};
  fragment float4 addr4 = pos * {8, 8, 8, 1};
  fragment float val;
  val = (0.5) * texture(noise, addr1)[0];
  val = val + (0.25) * texture(noise, addr2)[0];
  val = val + (0.125) * texture(noise, addr3)[0];
  val = val + (0.0625) * texture(noise, addr4)[0];
  return val;
}

float3 marble_color(float x) {
  float x2;
  x = sqrt(x + 1.0) * .7071;
  x2 = sqrt(x);
  return { .30 + .6*x2, .30 + .8*x, .60 + .4*x2 };
}

13 Stream-level application description example: SHARP Raytracer
– Computation expressed as streams of records passing through kernels
– Similar to the computation required for Monte-Carlo radiation transport
[Diagram: pipeline of kernels – Ray Gen, Traverser, Intersector, Shader – with Camera, Grid, Triangles & Materials, and Lights as inputs; streams of Rays, VoxIDs, Hits, Normals, and Pixels flow between the kernels]
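The "streams of records passing through kernels" structure above can be sketched with Python generators, one per kernel. This is a drastically simplified model of the SHARP pipeline: the record fields, the toy intersection test, and the shading rule are all invented for illustration, and the Traverser stage is folded into the intersector.

```python
# Sketch of kernels passing streams of records (hypothetical record
# contents; not the actual SHARP raytracer).
from dataclasses import dataclass

@dataclass
class Ray:
    pixel: int
    direction: float

def ray_gen(width):
    """Kernel: the camera emits one ray record per pixel."""
    for x in range(width):
        yield Ray(pixel=x, direction=x / width)

def intersector(rays):
    """Kernel: turn ray records into hit records (toy hit test)."""
    for r in rays:
        yield (r.pixel, r.direction > 0.5)  # (pixel, hit?)

def shader(hits):
    """Kernel: hit records become pixel-color records."""
    for pixel, hit in hits:
        yield (pixel, 1.0 if hit else 0.0)

# Kernels compose into a pipeline; on the SSS each stage could be
# partitioned across chips, with the streams carried over the
# interconnection network instead of Python generators.
pixels = list(shader(intersector(ray_gen(8))))
```

The point of the sketch is the shape of the computation: each kernel reads a stream of records, touches only that record's data, and emits a new stream, which is also why the slide compares it to Monte-Carlo radiation transport.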

14 Expected application performance
– Arithmetic-limited applications
  - Includes applications where domain decomposition can be applied, like TFLO and LES
  - Expected to achieve a large fraction of peak performance
– Communication-limited applications
  - Such as applications requiring matrix solution Ax = b
  - At the very least will benefit from high global bandwidth
  - We hope to find new methods to solve matrix equations using streaming

15 Conclusion
– Computation is cheap, yet supercomputing is expensive
– Streams enable supercomputing to exploit the advantages of emerging technology
  - by exposing locality and concurrency
  - Order-of-magnitude cost/performance improvement for both arithmetic-limited and communication-limited codes: $20/GFLOPS and $2/M-GUPS
  - Scalable from desktop (1 TFLOPS) to machine room (1 PFLOPS)
– A layered software system using domain-specific languages simplifies stream programming
  - MCRT, ODEs, PDEs
– Early results on graphics and image processing are encouraging

16 Plan for AY2001-2002

17 Project Goals for Fall Quarter AY2001-2002
– Map two applications to the stream model
  - Fluid flow (TFLO) and molecular dynamics candidates
– Define a high-level stream programming language
  - Generalize stream access without destroying locality
– Draft a strawman SSS architecture and identify key issues

18 Meeting Schedule, Fall Quarter AY2001-2002
Goal: shared knowledge base and vision across the project
– 10/9 – TFLO (Juan)
– 10/16 – RTSL (Bill M.)
– 10/23 – Molecular Dynamics (Eric)
– 10/30 – Imagine and its programming system (Ujval)
– 11/6 – C*, ZPL, etc. + SPL brainstorming (Ian)
– 11/13 – Metacompilation (Ben C.)
– 11/20 – Application follow-up (Ron/Heinz)
– 11/27 – Strawman architecture (Ben S.)
– 12/4 – Streams vs. CMP (Blue Gene/Light, etc.) (Bill D.)

