1 LCSE – Unisys Collaboration Paul Woodward, LCSE Unisys Briefing June 7, 2005

2 Unisys Donation, March 2003: Unisys donated a 32-processor ES7000 to the LCSE & one to MSI. Microsoft donated software: DataCenter 2003 and SQL Server. Intel donated chips.

3 Unisys was initiating an HPC program. The LCSE could demonstrate the power of the ES7000 on scientific problems using the Windows OS. The LCSE could also explore the possibility of supporting graphics applications on this machine.

4 Performance Study for Computational Fluid Dynamics: LCSE codes ported to ES7000. Computational kernel performance measured, with excellent results. Parallel performance study identified issues that were addressed successfully with Unisys assistance.

5 This is the best performance per CPU that we have obtained anywhere to date. To achieve this, we did not compromise our code implementation strategy – we can still do completely out-of-core computations on problems of any size. We worked with Dave Johnson of Unisys to pin our processes down to their CPUs, while we allowed the data read and written to come from and go to any place in the machine. We are now working to get both 16-CPU partitions computing this efficiently together. Unisys now offers a larger shared-memory configuration that solves this problem, but the same effort would still be needed to get multiple such machines to work together.
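The slides do not show how the process pinning was done; below is a minimal sketch of one way to express it with the standard Win32 affinity call on the Windows-based ES7000 (the choice of CPU 3 is an arbitrary illustration, not taken from the slides).

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Illustrative sketch only: bind this process to logical CPU 3 (an
           arbitrary choice), while the data it reads and writes may still
           come from, and go to, any place in the machine.                  */
        DWORD_PTR mask = (DWORD_PTR)1 << 3;      /* one bit per logical CPU */
        if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
            fprintf(stderr, "SetProcessAffinityMask failed: %lu\n",
                    (unsigned long)GetLastError());
            return 1;
        }
        /* ... the computational kernel would run here, pinned to CPU 3 ... */
        return 0;
    }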

6 Bottom Line: LCSE performance figures are triple those achieved by the NCSA job mix, or by the applications represented at the Natl. Academy “Future of Supercomputing” meeting. Focusing on running a small job fast exploits the unique SMP advantage. Also, SHMOD allows out-of-core, billion-cell simulations.

7 ES7000 doing Billion-Cell Simulations & Many Smaller Ones: Large memory is a great advantage. Many fast attached disks. Highly reliable system. We like Windows. Serves as the central hub of the LCSE. White papers for Unisys & acknowledgements in scientific papers.

8 First large-scale multifluid PPM simulation. Billion-Cell Simulation Underway. New interface tracking method implemented for Los Alamos.

9 A parameter study of turbulent shear layer flows was made possible by the ES7000.

10 Plans: Integrate the ES7000 as the data analysis and central control engine of the newly funded prototype system. Explore the possibility of greater Unisys participation in a proposal next January for a full-up system.

11 New NSF Major Research Instrumentation Project: $300,000 for 1-year prototyping. Goal is truly interactive visualization of a 2 TB data set on the PowerWall at full resolution. The prototype will handle only 1 panel. Plan a January proposal for 10 panels. Data replication to avoid contention, SATA disks, Infiniband networking.

12 Motivation: Move from data presentation to data exploration. Generate PowerWall movies under interactive user control (just roll the mouse wheel and travel). Pipeline the data analysis and visualization process, so that it no longer takes days for each step, but is instead immediate.

13 Motivation for the Motivation: We need to do this because of the data explosion implied by national supercomputing installations. The largest machines at NSF centers can easily generate 5 to 10 TB/day (60–120 MB/sec) of useful fluid flow simulation data. A LambdaRail 10 Gbit/s connection can bring this directly to the LCSE.
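As a rough arithmetic check of those rates (only the seconds-per-day conversion is added here):

\[
10~\mathrm{TB/day} \;\approx\; \frac{10\times10^{12}~\mathrm{B}}{86{,}400~\mathrm{s}} \;\approx\; 116~\mathrm{MB/s},
\qquad
10~\mathrm{Gbit/s} \;=\; 1.25~\mathrm{GB/s},
\]

so the 10 Gbit/s LambdaRail link has roughly a factor of ten of headroom over the peak simulation output rate.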

14 Two Modes of Interactive Use: (A) Pre-computed bricks of bytes are replicated on the disks at each node, and the user travels through this 4-D data volume in a virtual vehicle. (B) Upon a button click, a raw data snapshot is drawn into the large shared memory, and the user travels through this 3-D volume, looking at any desired variable, in a virtual vehicle.
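To make Mode A a little more concrete, here is a minimal sketch of how a viewer might map a voxel coordinate and time step to the pre-computed brick that holds it. The 64³ brick size is an assumed illustration value; only the 2048³ volume size is taken from the later slides, and the actual LCSE brick-of-bytes layout is not specified here.

    #include <stdio.h>

    #define NB 64                      /* assumed brick edge, in voxels     */
    #define NX 2048                    /* volume is NX^3 voxels per step    */
    #define BPA (NX / NB)              /* bricks per axis                   */

    /* Linear index of the brick containing voxel (x, y, z) at time step t. */
    static long brick_index(int x, int y, int z, int t)
    {
        long bx = x / NB, by = y / NB, bz = z / NB;
        long bricks_per_step = (long)BPA * BPA * BPA;
        return (long)t * bricks_per_step + (bz * BPA + by) * BPA + bx;
    }

    int main(void)
    {
        /* As the user "travels", the viewer would fetch the bricks around
           the current viewpoint; here we just locate one voxel's brick.    */
        printf("brick %ld\n", brick_index(1000, 512, 30, 7));
        return 0;
    }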

15 Mode A Requirement (full-up system): 2 TB replicated at each node. The local disk system streams data into the graphics card at 400 MB/sec. 80 graphics engines each render at 400 MVoxels/sec to 10 PWall panels. A peak rendering rate of 32 GVoxel/s produces 2 frames/sec of 8.6 GVoxel/frame on the 10-panel PWall. (The prototype system does only 1 panel.)
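Those Mode A numbers fit together as follows, assuming each 8.6 GVoxel frame is the full 2048³ volume:

\[
80 \times 400~\mathrm{MVoxel/s} \;=\; 32~\mathrm{GVoxel/s\ (peak)},
\qquad
2048^{3} \;\approx\; 8.6\times10^{9}~\mathrm{voxels/frame},
\]

so 2 frames/sec calls for about 17 GVoxel/s of sustained rendering, a bit over half of the 32 GVoxel/s peak.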

16 Mode B Requirement (full-up system): SMP memory holds a raw data snapshot of 6×2×2048³ B = 96 GB. SMP memory holds a 32-bit single-variable array of 4×2048³ B = 32 GB. The SMP processes data at 80 Gflop/s. IB4X streams 400 MB/sec to each node simultaneously from the SMP. 80 graphics engines render at 400 MVoxels/sec to 10 PWall panels.
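The memory figures correspond to a 2048³ grid, with 2048³ B = 2³³ B = 8 GiB and the slide's GB read as binary gigabytes; the 6×2 is presumably six variables at two bytes per cell:

\[
6 \times 2~\mathrm{B} \times 2048^{3} \;=\; 96~\mathrm{GiB},
\qquad
4~\mathrm{B} \times 2048^{3} \;=\; 32~\mathrm{GiB}.
\]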

17 New NSF Major Research Instrumentation Project: The ES7000 will be the data analysis end of the data processing pipeline. The goal is interactive visualization directly from raw data, rather than from pregenerated voxel bricks. This is much more I/O intensive. Large memory shared among many CPUs allows rapid voxel generation.

18 System now under construction in the LCSE. Dell PC nodes can act as intelligent storage servers and also as image generation engines. Dell 670n: dual 3.6 GHz Xeon EM64T, 8 GB DDR2 SDRAM.

19 Key issues for Unisys: Should the SATA disks be directly attached, or reached only through IB from the PC servers? PCI Express x16 for high-end graphics engines, or put these into the PC nodes & use PCI-e for IB__X? An Infiniband network integrating with other machines and their storage. Bigger shared memory & more CPUs than the present 16.

20 Potential Unisys Role: The memory network is 10X faster than the cluster network, so it can work from the raw data, which is 10X larger. Then any quantity can be seen on demand. This is an entirely new capability for interactive data exploration. The Unisys SMP would need to drive 80 rendering engines & 960 disks, either directly or in PC nodes on an IB switch.

21 Opportunity (keeping options open for multiple possible suppliers): NSF encourages us to come back for ~$1M after proof of concept. The schedule gives Unisys time to integrate any essential new technologies: PCI Express x16, Infiniband, SATA. We can be a testbed, working with Unisys. A major opportunity is on the horizon.

22 Prototyping Effort Now: Proposed Linux cluster to NSF; have 12 Dell nodes, each with: dual P4 Xeon @ 3.6 GHz, 8 GB memory, nVidia Quadro 4400 graphics card, 12 Seagate 400 GB SATA disks, 3Ware 12-channel SATA controller, and Infiniband 4X (Topspin) HCA. 10 IB4X links to the Unisys ES7000.
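As a rough check that each prototype node can cover the Mode A targets above (the ~50 MB/s sustained rate per SATA drive is an assumed, era-typical figure, not one from the slides):

\[
12 \times 400~\mathrm{GB} \;=\; 4.8~\mathrm{TB} \;>\; 2~\mathrm{TB\ replicated\ data},
\qquad
12 \times 50~\mathrm{MB/s} \;\approx\; 600~\mathrm{MB/s} \;>\; 400~\mathrm{MB/s\ to\ the\ graphics\ card}.
\]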

23 Near-Term Goals for ES7000: Infiniband drivers. Get all 32 CPUs cooperating over IB. Improve performance of the A3D data analysis application. Integrate with the Linux cluster (we are fine with Windows, but government sponsors insist on Linux). Pipeline data from A3D on the ES7000 to HVR on the PCs for raw-data rendering.
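The slides do not say how the A3D-to-HVR pipelining would be implemented; the fragment below is a minimal, hypothetical sketch of streaming one brick of raw bytes from the analysis host to a rendering node over TCP with POSIX sockets. The address, port, and brick size are made-up illustration values and are not part of A3D or HVR.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        const char *render_node = "192.168.0.42";   /* illustration only        */
        const int   port        = 9000;             /* illustration only        */
        enum { BRICK_BYTES = 128 * 128 * 128 };     /* one assumed 128^3 brick  */

        unsigned char *brick = malloc(BRICK_BYTES);
        if (!brick) return 1;
        memset(brick, 0, BRICK_BYTES);              /* stand-in for A3D output  */

        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(port);
        inet_pton(AF_INET, render_node, &addr.sin_addr);
        if (s < 0 || connect(s, (struct sockaddr *)&addr, sizeof addr) != 0) {
            perror("connect");
            free(brick);
            return 1;
        }

        /* Send the brick; a real pipeline would loop over many bricks/steps. */
        size_t sent = 0;
        while (sent < BRICK_BYTES) {
            ssize_t n = send(s, brick + sent, BRICK_BYTES - sent, 0);
            if (n <= 0) { perror("send"); break; }
            sent += (size_t)n;
        }
        close(s);
        free(brick);
        return 0;
    }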

24 Middle-Term Goals for ES7000: 3Ware SATA controller drivers? Attach SATA drives directly? Measure I/O performance. In preparation for the January NSF proposal, experiment with IB on a more recent Unisys model and with its potential to drive Nvidia graphics (both still open questions). Experiment with resource sharing and on-demand (preemptive) visualization.

25 We have not settled on the scaled-up architecture: A scale-up of the present system is possible. An IB cluster of Dell nodes is possible. An SMP cluster of Unisys nodes is possible. Time is short, so options other than the first two are handicapped. Other vendors are unlikely.

26 Things that now seem definite: SATA disks (a Seagate partnership is under negotiation; buying 200 now). Programmable graphics engine(s) on PCI Express x16 (nVidia, perhaps with SLI, or perhaps even the IBM Cell). Infiniband 4X (12X in the full-up system?). Linux (our sponsors are determined). Intel CPUs.

27 Our Guess at a Best-Fit Role for Unisys: Scale up the present system with IB__X. Dell PC nodes act as storage servers. Dell PC nodes host programmable graphics engines that cooperatively render images to the PowerWall display. The Unisys SMP provides the large shared memory and 80 Gflop/s of processing power to enable interactive visualization from raw data.

