




1 Stanford Streaming Supercomputer (SSS) Project Meeting
Bill Dally, Pat Hanrahan, and Ron Fedkiw
Computer Systems Laboratory, Stanford University
October 2, 2001

2 Agenda
– Introductions (now)
– Vision – subset of ASCI review slides
– Goals for the quarter
– Schedule of meetings for the quarter

3 Computation is inexpensive and plentiful
– NVIDIA GeForce3: ~80 GFLOPS, ~800 Gops
– Velio VC3003: 1 Tb/s I/O bandwidth
– DRAM: < $0.20/MB

4 But supercomputers are very expensive
– Cost more per GFLOPS, GUPS, and GByte than low-end machines
– Hard to achieve a high fraction of peak performance on global problems
– Based on clusters of CPUs that are scaling at only 20%/year vs. 50% historically

5 Microprocessors no longer realize the potential of VLSI
[Chart: growth-rate trends of 52%/year, 74%/year, and 19%/year, with resulting gaps of 30:1, 1,000:1, and 30,000:1]

6 Streaming processors leverage emerging technology
– A streaming supercomputer can achieve
  - $20/GFLOPS, $2/M-GUPS
  - Scalable to PFLOPS and 10^13 GUPS
– Enabled by
  - Stream architecture
    - Exposes and exploits parallelism and locality
    - High arithmetic intensity (ops/BW)
    - Hides latency
  - Efficient interconnection networks
    - High global bandwidth
    - Low latency

7 What is stream processing?
[Diagram: Image 0 and Image 1 each pass through a convolve kernel; the two results feed a SAD kernel that produces a Depth Map]
– Operations within a kernel operate on local data
– Streams expose data parallelism
– Kernels can be partitioned across chips to exploit control parallelism
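The convolve/SAD pipeline on this slide can be sketched in ordinary Python. This is an illustrative model only, not the SSS toolchain: the kernel bodies, the 1-D filter weights, and the toy image data are all assumptions, chosen to show how kernels touch only local data while every stream element is independent (data parallelism) and the two convolve kernels could run concurrently (control parallelism).

```python
# Toy model of the stream/kernel pipeline: two images are convolved,
# then a SAD (sum of absolute differences) kernel yields a depth map.
# All names and data here are hypothetical, for illustration only.

def convolve(image, kernel=(0.25, 0.5, 0.25)):
    """Kernel: 1-D convolution over a stream of pixel values.

    Each output element depends only on a small local window,
    so every element can be computed in parallel.
    """
    padded = [image[0]] + list(image) + [image[-1]]  # clamp edges
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(image))]

def sad(stream0, stream1):
    """Kernel: per-record absolute difference of two streams."""
    return [abs(a - b) for a, b in zip(stream0, stream1)]

image0 = [10, 12, 11, 9, 8]
image1 = [11, 11, 12, 10, 7]

# The two convolve kernels are independent (control parallelism);
# their results stream into the SAD kernel.
depth_map = sad(convolve(image0), convolve(image1))
```

In the real machine each kernel would run on a stream processor's ALU clusters with the streams held in the stream register file; here the lists simply stand in for those streams.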

8 Why does it get good performance – easily?
[Diagram: bandwidth hierarchy – SDRAM at 2 GB/s, Stream Register File at 32 GB/s, ALU Cluster at 544 GB/s]
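The bandwidth numbers on this slide imply how much data reuse each level of the hierarchy must capture. The figures below are taken from the slide; the "reuse factor" interpretation is our gloss, a back-of-the-envelope sketch rather than a statement of the actual machine's requirements.

```python
# Bandwidth hierarchy from the slide (GB/s).
sdram_bw = 2    # off-chip SDRAM
srf_bw   = 32   # stream register file
alu_bw   = 544  # ALU cluster local bandwidth

# For the ALUs to stay busy, each word fetched from SDRAM must be
# reused ~16x out of the SRF, and each SRF word ~17x inside the
# clusters -- ~272x reuse overall (our interpretation of the slide).
srf_reuse     = srf_bw / sdram_bw
cluster_reuse = alu_bw / srf_bw
total_reuse   = alu_bw / sdram_bw
```

This is exactly what "high arithmetic intensity (ops/BW)" on slide 6 buys: kernels that do many operations per word moved, so the cheap, plentiful local bandwidth does most of the work.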

9 Architecture of a Streaming Supercomputer

10 Streaming processor

11 A layered software system simplifies stream programming

12 Domain-specific language example: Marble shader in RTSL

surface shader float4 shiny_marble_imagine(texref noise) {
  float4 Cd = lightmodel_diffuse({ 0.4, 0.4, 0.4, 1 }, { 0.5, 0.5, 0.5, 1 });
  float4 Cs = lightmodel_specular({ 0.35, 0.35, 0.35, 1 }, Zero, 20);
  fragment float y;
  fragment float4 pos = Pobj * {10, 10, 10, 1};
  y = pos[1] + 3.0 * turbulence4_imagine_scalar(noise, pos);
  y = sin(y * pi);
  return ({marble_color(y), 1.0f} * Cd + Cs);
}

float turbulence4_imagine_scalar(texref noise, float4 pos) {
  fragment float4 addr1 = pos;
  fragment float4 addr2 = pos * {2, 2, 2, 1};
  fragment float4 addr3 = pos * {4, 4, 4, 1};
  fragment float4 addr4 = pos * {8, 8, 8, 1};
  fragment float val;
  val = (0.5) * texture(noise, addr1)[0];
  val = val + (0.25) * texture(noise, addr2)[0];
  val = val + (0.125) * texture(noise, addr3)[0];
  val = val + (0.0625) * texture(noise, addr4)[0];
  return val;
}

float3 marble_color(float x) {
  float x2;
  x = sqrt(x + 1.0) * .7071;
  x2 = sqrt(x);
  return { .30 + .6*x2, .30 + .8*x, .60 + .4*x2 };
}

13 Stream-level application description example: SHARP Raytracer
– Computation expressed as streams of records passing through kernels
– Similar to the computation required for Monte-Carlo radiation transport
[Diagram: pipeline of kernels – Ray Gen, Traverser, Intersector, Shader – with Camera, Grid, Triangles & Materials, and Lights as inputs; streams of Rays, VoxIDs, Hits, Normals, and Pixels flow between the kernels]
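The "streams of records passing through kernels" structure above can be sketched with Python generators, one per kernel. This is a drastically simplified model of the SHARP pipeline: the record fields, the toy intersection test, and the shading rule are all invented for illustration, and the Traverser stage is folded into the intersector.

```python
# Sketch of kernels passing streams of records (hypothetical record
# contents; not the actual SHARP raytracer).
from dataclasses import dataclass

@dataclass
class Ray:
    pixel: int
    direction: float

def ray_gen(width):
    """Kernel: the camera emits one ray record per pixel."""
    for x in range(width):
        yield Ray(pixel=x, direction=x / width)

def intersector(rays):
    """Kernel: turn ray records into hit records (toy hit test)."""
    for r in rays:
        yield (r.pixel, r.direction > 0.5)  # (pixel, hit?)

def shader(hits):
    """Kernel: hit records become pixel-color records."""
    for pixel, hit in hits:
        yield (pixel, 1.0 if hit else 0.0)

# Kernels compose into a pipeline; on the SSS each stage could be
# partitioned across chips, with the streams carried over the
# interconnection network instead of Python generators.
pixels = list(shader(intersector(ray_gen(8))))
```

The point of the sketch is the shape of the computation: each kernel reads a stream of records, touches only that record's data, and emits a new stream, which is also why the slide compares it to Monte-Carlo radiation transport.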

14 Expected application performance
– Arithmetic-limited applications
  - Includes applications where domain decomposition can be applied, like TFLO and LES
  - Expected to achieve a large fraction of peak performance
– Communication-limited applications
  - Such as applications requiring matrix solution Ax = b
  - At the very least will benefit from high global bandwidth
  - We hope to find new methods to solve matrix equations using streaming

15 Conclusion
– Computation is cheap, yet supercomputing is expensive
– Streams enable supercomputing to exploit the advantages of emerging technology
  - by exposing locality and concurrency
  - Order-of-magnitude cost/performance improvement for both arithmetic-limited and communication-limited codes: $20/GFLOPS and $2/M-GUPS
  - Scalable from desktop (1 TFLOPS) to machine room (1 PFLOPS)
– A layered software system using domain-specific languages simplifies stream programming
  - MCRT, ODEs, PDEs
– Early results on graphics and image processing are encouraging

16 Plan for AY2001-2002

17 Project Goals for Fall Quarter AY2001-2002
– Map two applications to the stream model
  - Fluid flow (TFLO) and molecular dynamics candidates
– Define a high-level stream programming language
  - Generalize stream access without destroying locality
– Draft a strawman SSS architecture and identify key issues

18 Meeting Schedule, Fall Quarter AY2001-2002
Goal: shared knowledge base and vision across the project
– 10/9 – TFLO (Juan)
– 10/16 – RTSL (Bill M.)
– 10/23 – Molecular Dynamics (Eric)
– 10/30 – Imagine and its programming system (Ujval)
– 11/6 – C*, ZPL, etc. + SPL brainstorming (Ian)
– 11/13 – Metacompilation (Ben C.)
– 11/20 – Application follow-up (Ron/Heinz)
– 11/27 – Strawman architecture (Ben S.)
– 12/4 – Streams vs. CMP (Blue Gene/Light, etc.) (Bill D.)

