December 10, 2002 SS-FQ02-W
Stanford Streaming Supercomputer (SSS)
Fall Quarter 2002 Wrapup Meeting
Bill Dally
Computer Systems Laboratory, Stanford University

Overview
Where we are today
  – First-year goal was met: demonstrated feasibility on a single node
  – Feedback from the site visit team was very positive
  – Potential for a big impact on scientific computing
  – But still much to do!
Key FY03 goals
  – Get long-term software infrastructure in place: select an approach, implement a baseline Brook-to-SSS compiler
  – Multi-node versions that scale: language, compiler, simulator
  – Tackle hard problems (3-D, irregular neighborhoods/sparse matrix solve): language support, numerics support, evaluation on the simulator
  – Refine the architecture: cluster organization, aspect ratio, register organization, memory organization
  – Industrial partner: start serious discussions, do outreach to build support, close a partnership in FY04

But first, let's review our overall goal:
Exploit the capabilities of VLSI to realize cost-effective scientific computing.

The Big Picture
VLSI technology enables us to put TeraOPS on a chip
  – Conventional general-purpose architectures cannot exploit this
  – The problem is bandwidth
Streams expose locality and concurrency
  – Perform operations in record order (not in operation order, as vector machines do)
  – Enables compiler optimization at a larger scale than scalar processing
A stream architecture achieves high arithmetic intensity
  – Intensity = arithmetic rate / bandwidth
  – Bandwidth hierarchy, compound stream operations
A streaming supercomputer is feasible
  – 100 GFLOPS (64-bit) on a chip, 1 TFLOPS single-board computer, PFLOPS systems
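A toy Python illustration of the record-order point, not SSS or Brook code: the kernel y = (a·x + b)², the function names, and the traffic model are all hypothetical. The model deliberately counts every full-length intermediate array as memory traffic, a simplification of real vector register files. Operation order sends each intermediate back to memory; record order keeps intermediates in locals (standing in for local registers), so the same 3n operations move fewer words and the intensity rises.

```python
def operation_order(xs, a, b):
    """Vector style: each operation sweeps the whole array before the next starts.

    Under the assumed model, every intermediate array is written to memory and
    read back, so each of the 3 operations reads n words and writes n words.
    """
    if not xs:
        raise ValueError("need at least one record")
    t = [a * x for x in xs]  # op 1: memory -> memory
    u = [v + b for v in t]   # op 2: memory -> memory
    y = [v * v for v in u]   # op 3: memory -> memory
    n = len(xs)
    ops, words = 3 * n, 6 * n  # counted by construction, not measured
    return y, ops / words


def record_order(xs, a, b):
    """Stream style: the whole kernel runs on one record before the next.

    Only the input record is read and the output record written (2n words);
    the intermediates t and u live in locals, standing in for local registers.
    """
    if not xs:
        raise ValueError("need at least one record")
    y = []
    for x in xs:
        t = a * x        # op 1, stays local
        u = t + b        # op 2, stays local
        y.append(u * u)  # op 3, result written out
    n = len(xs)
    ops, words = 3 * n, 2 * n  # counted by construction, not measured
    return y, ops / words


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    y_vec, i_vec = operation_order(data, 2.0, 1.0)
    y_str, i_str = record_order(data, 2.0, 1.0)
    print(y_vec == y_str, i_vec, i_str)  # True 0.5 1.5
```

Both versions process the same n records, so over the same interval the ops/words ratio equals arithmetic rate divided by bandwidth, the slide's definition of intensity.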

Review: What Is the SSS Project About?
Exploit streams to give a 100x improvement in performance/cost for scientific applications vs. "cluster" supercomputers
  – From 100 GFLOPS PCs to TFLOPS single-board computers to PFLOPS supercomputers
Use a layered programming system to simplify development and tuning of applications
  – Stream languages
  – Streaming virtual machine
Demonstrated feasibility of streaming scientific computing in year 1
Refine the architecture and programming system in year 2
  – Demonstrate realistic applications (3-D, irregular)
  – Build a usable compiler
  – Resolve architecture questions: aspect ratio, conditional execution, sparse clusters, register organization, memory system, etc.
Build a prototype and demonstrate CITS applications in years 3-6
  – With industrial and government partners
  – Broaden our base of support

Software Infrastructure
Compiler
  – Decide on the flow from Brook -> SVM -> SSS
  – Select a base compiler: ORC, GNU, SUIF, TenDRA, others...
  – "Spike" a simple program from Brook -> SSS
  – Optimizations
SVM
Simulator

3-D Applications
StreamFLO
StreamFEM
StreamMD/GROMACS

Irregular Grids
Need an application
Brook support for variable degree
Architecture/run-time support
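One standard way to store variable-degree neighborhoods is compressed sparse row (CSR) form, in which each row's neighbor list has its own length. The sketch below is an assumption for illustration, not the SSS or Brook representation: it computes y = A·x over CSR rows, and the inner loop's trip count changes from record to record, which is the kind of irregularity the slide asks Brook and the run-time to support.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form.

    Row i's nonzeros are vals[row_ptr[i]:row_ptr[i + 1]], located in columns
    col_idx[row_ptr[i]:row_ptr[i + 1]]; rows may have different lengths.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Variable trip count: this row's neighborhood size.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    # 3x3 example [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]; rows hold 2, 3, 2 entries.
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
    print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 2.0, 3.0]
```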

Multi-Node Execution
Brook support
Manual partitioning as a first step
Simple application on the SVM simulator
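A minimal sketch of manual partitioning, assuming a 1-D stream of records split into contiguous blocks, one per node, plus optional halo (ghost) records so a stencil kernel can read its neighbors. The slides do not specify a partitioning scheme; the function names and the halo width here are hypothetical.

```python
def block_partition(n_records, n_nodes):
    """Split record indices [0, n_records) into n_nodes contiguous (start, stop) ranges.

    The first n_records % n_nodes nodes receive one extra record each.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(n_records, n_nodes)
    ranges = []
    start = 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def with_halo(ranges, halo, n_records):
    """Widen each owned range by `halo` records per side, clamped to the stream bounds.

    The added records are read-only copies a node needs for stencil neighbors.
    """
    return [(max(0, lo - halo), min(n_records, hi + halo)) for lo, hi in ranges]


if __name__ == "__main__":
    owned = block_partition(10, 3)
    print(owned)                    # [(0, 4), (4, 7), (7, 10)]
    print(with_halo(owned, 1, 10))  # [(0, 5), (3, 8), (6, 10)]
```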

Industrial Partner
Candidates
  – Cray, IBM, Sun, HP, SGI, Intel
Initial discussion
  – Present the SSS project and results to date
  – Discuss collaboration models
  – Identify next steps

Outreach
National labs
  – Los Alamos
  – Livermore
  – Sandia
Other government
  – NASA
  – DARPA
  – DoD (Charlie Holland)
  – AFOSR
User communities

Software Fall 02 Goals
Brook
  – Multi-node issues: synchronization primitives, data partitioning
  – Variable-length records
SVM
  – Multi-node simulator
  – Performance numbers for 3 apps
Compilation
  – Pick new infrastructure and design the compiler (Reservoir)
  – Generate SVM code from Brook (StreamC to SVM)
  – SVM to {SMP, graphics, SSS} (SVM is SMP)
Run-time (software services)
  – Identify issues
Issues
  – Variable-length records? With stencils?

Software Win 02 Goals (1 of 3)
Brook
  – Carefully define the semantics of the operators
  – Work on a "views of memory" abstraction
  – Support for partitioning, shared memory, naming, fitting into the stream abstraction
  – Support for irregular neighborhoods
  – Multithreaded version (Christos)
  – Concrete winter goals [Ian/Frank]
      Review of the language [Pat]
      Partitioning (UPC)
      Multi-node/multithreaded version
      Irregular support, with an application
      PPoPP paper
      MD on BRT

Software Win 02 Goals (2 of 3)
SVM
  – Finish prototype single-node implementation [Done]
  – Compiler issue
  – Implement a multi-node version with a multi-node app
      Start with one that runs on one processor [Francois]
      Multithreaded on SMP, on SGI [+]
      Cluster version [++]
  – SVM-to-simulator path
      Mattan: not an intermediate between Brook and SSS

Software Win 02 Goals (3 of 3)
Start regular meetings
Compiler
  – Decide on the flow from Brook -> SVM -> SSS [Mattan]
      Requirements
  – Select a base compiler [Jayanth]
      ORC, GNU, SUIF, TenDRA, others...
  – "Spike" a simple program from Brook -> SSS [Mattan/Jayanth ++]
  – Brook to NVIDIA
  – Optimizations [Spring]
Run time
  – Write a white paper

Application Fall 02 Goals
StreamMD
  – Migrate to GROMACS
StreamFLO
  – Complete
  – 3-D
StreamFEM
  – 3-D
  – Sparse LA
Scalability: multiple nodes
Look at Sierra and Purple benchmarks: ppm, Sweep3D

Application Win 02 Goals
StreamFLO [Fatica]
  – Partitioned version; scalable
  – Convert to 3-D
StreamFEM [Barth]
  – Partitioned version; scalable
  – Convert to 3-D
  – Sparse LA
StreamMD [Eric/student]
  – Migrate to GROMACS [Vijay Pande/Michael Levitt groups]
  – Redo inner (force) and outer (neighbor) loops
  – Partitioned version; scalable
  – Finish port to NV30: build cluster and folding@home
Model applications [Ron/Frank]
  – Model PDEs with sparse matrix solves
An irregular application [Ron/Frank]
Look at Sierra and Purple benchmarks: ppm, Sweep3D [delay]

Architecture Fall 02 Goals
Simulator
  – Multi-node working
  – Indexable SRF
  – Scalar processor
Point studies
  – Conditionals
  – Aspect ratio
  – Indexable SRF
  – Add & store (remote ops in general)
  – Iterative operations & extended precision
  – Network spec
  – Flesh out I/O
App studies
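The add-and-store point study concerns remote read-modify-write operations. The sketch below assumes the semantics memory[i] += v for each (index, value) pair, so colliding indices accumulate instead of overwriting, and contrasts that with a plain indexed store. The function names are hypothetical. Collisions of this kind arise, for example, when several interactions contribute forces to the same molecule.

```python
def scatter_store(memory, indices, values):
    """Plain indexed store: when indices collide, the last write wins."""
    out = list(memory)
    for i, v in zip(indices, values):
        out[i] = v
    return out


def scatter_add(memory, indices, values):
    """Assumed add-and-store semantics: every value is added into its target,
    so colliding indices accumulate rather than overwrite."""
    out = list(memory)
    for i, v in zip(indices, values):
        out[i] += v
    return out


if __name__ == "__main__":
    mem = [0.0, 0.0, 0.0]
    idx = [0, 2, 0]
    val = [1.0, 2.0, 3.0]
    print(scatter_store(mem, idx, val))  # [3.0, 0.0, 2.0]
    print(scatter_add(mem, idx, val))    # [4.0, 0.0, 2.0]
```

A sequential loop like this only pins down the intended result; it says nothing about how hardware would order or combine concurrent updates to the same address.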

Architecture Win 02 Goals
Single-node simulator [Jung-Ho, Knight]
  – 64-bit support, MULADD, scalar processor
Multi-node simulator [Jung-Ho, Abhishek]
  – Network model
  – Multi-node mechanisms
Point studies
  – Aspect ratio: SSE vs. VLIW
  – Conditional execution [Mattan/Ujval]
  – Sparse clusters
  – SRF organization [Nuwan]
  – Cache alternatives [Jung Ho]
  – Add-and-store study [Jung Ho]
  – I/O
  – Iterative operations [Francois]
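For the conditional-execution study, one common technique on SIMD lanes is predication: every lane computes both arms of a branch and a per-lane predicate selects the result. This is an assumption for illustration, not necessarily what the study evaluates; the hypothetical sketch below checks that a predicated version matches the branchy scalar version.

```python
def branchy(xs, threshold):
    """Scalar reference: each element takes exactly one arm of the branch."""
    out = []
    for x in xs:
        if x > threshold:
            out.append(2.0 * x)
        else:
            out.append(-x)
    return out


def predicated(xs, threshold):
    """Predicated version: all lanes evaluate both arms, then select per lane.

    Control flow is uniform across lanes; the cost is that both arms are
    computed for every element.
    """
    taken = [2.0 * x for x in xs]       # arm used where the predicate is true
    not_taken = [-x for x in xs]        # arm used where the predicate is false
    pred = [x > threshold for x in xs]  # per-lane predicate
    return [t if p else f for t, f, p in zip(taken, not_taken, pred)]


if __name__ == "__main__":
    data = [1.0, 5.0, -2.0]
    print(predicated(data, 0.0))  # [2.0, 10.0, 2.0]
```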

Special Win 02 Goals
Fix website [Pat]
  – Public and private websites
Name that computer
  – Mississippi
  – Axios
  – Submit names to Mattan
  – Bill, Pat, Bill to choose
Project party

Winter Quarter Meeting Schedule
Date   Speaker               Topic
1/7    Ron                   Anything
1/14   Francois/Mattan       What is SVM
1/21   Fatica                3-D FLO
1/28   Pat                   RTSL partitioning
2/4    Bill Carlson [Pat]    UPC
2/11   Francois/Ian          Discussion of targets: SSS/CG/MPI
2/18   Tim B.                Irregular grid
2/25   Mattan                Compilation infrastructure
3/4    Jung Ho               Add & store
3/11   Bill                  Wrapup

Papers
Architecture
  – Indexable SRFs (Nuwan)
  – Streaming Supercomputer Overview (Tim K.)
  – Streaming on Conventional CPUs (Mattan)
  – Conditionals (Ujval)
  – Remote Ops (Jung Ho)
  – Aspect Ratio (?)
  – Data Parallel (SSE) vs. ILP (VLIW)
Software
  – Design of Brook (Ian)
  – Data-Parallel Programming on Graphics HW (Pat)
  – Brook-to-CG Compiler
Apps
  – GROMACS
  – StreamFEM (Tim²)
Overview (Bill and Pat)