Workshop on Commodity-Based Visualization Clusters Learning From the Stanford/DOE Visualization Cluster Mike Houston, Greg Humphreys, Randall Frank, Pat Hanrahan

Workshop on Commodity-Based Visualization Clusters 2 Outline
Stanford’s current cluster
–Design decisions
–Performance evaluation
–Bottleneck evaluation
Cluster “Landscape”
–General classification
–Bottleneck evaluation
Stanford’s next cluster
–Design goals
–Research directions

Workshop on Commodity-Based Visualization Clusters 3 Stanford/DOE Visualization Cluster
The Chromium Cluster

Workshop on Commodity-Based Visualization Clusters 4 Cluster Configuration (Jan. 2000)
Cluster: 32 graphics nodes + 4 server nodes
Computer: Compaq SP750
–2 processors (800 MHz PIII Xeon, 133 MHz FSB)
–i840 core logic (big issue for vis-clusters): simultaneous fast graphics and networking
–Network: 64-bit, 66 MHz PCI
–Graphics: AGP-4x
–256 MB memory
–18 GB SCSI-160 disk (+ 3×36 GB on servers)
Graphics (Sept. 2002)
–16 NVIDIA GeForce3 w/ DVI (64 MB)
–16 NVIDIA GeForce4 Ti 4200 w/ DVI (128 MB)
Network
–Myrinet 64-bit, 66 MHz (LANai 7)

Workshop on Commodity-Based Visualization Clusters 5 Graphics Evaluation
NVIDIA GeForce3
–25 MTri/s triangle rate observed
–680 MPix/s fill rate observed
NVIDIA GeForce4
–60 MTri/s triangle rate observed
–800 MPix/s fill rate observed
Read Pixels performance
–35 MPix/s (140 MB/s) RGBA
–22 MPix/s (87 MB/s) Depth
Draw Pixels performance
–45 MPix/s (180 MB/s) RGBA
–21 MPix/s (85 MB/s) Depth
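The readback and draw-pixels rates above are the kind of figures a small OpenGL microbenchmark produces: repeat the transfer many times and divide pixels moved by wall-clock time. Below is a minimal sketch of such a measurement (not the benchmark actually used for these numbers), assuming a Linux box with GLUT and OpenGL headers; the 512x512 window size and frame count are arbitrary choices for the demo.

```c
/* Minimal glReadPixels throughput sketch -- illustrative only, not the
 * benchmark behind the numbers above.
 * Build (Linux):  gcc readback.c -o readback -lglut -lGLU -lGL          */
#include <GL/glut.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define W      512
#define H      512
#define FRAMES 100

static double now(void)                       /* wall-clock seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DEPTH | GLUT_DOUBLE);
    glutInitWindowSize(W, H);
    glutCreateWindow("readback benchmark");   /* gives us a GL context;
                                                 reading before the main
                                                 loop is a simplification */

    unsigned char *buf = malloc((size_t)W * H * 4);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glFinish();                               /* drain pending GL work   */

    double t0 = now();
    for (int i = 0; i < FRAMES; i++)
        glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, buf);
    glFinish();
    double secs = now() - t0;

    double mpix = (double)W * H * FRAMES / 1e6;
    printf("RGBA readback: %.1f MPix/s (%.1f MB/s)\n",
           mpix / secs, 4.0 * mpix / secs);

    free(buf);
    return 0;
}
```

Swapping glReadPixels for glDrawPixels (and GL_RGBA for GL_DEPTH_COMPONENT) gives the other three rows of the table above.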

Workshop on Commodity-Based Visualization Clusters 6 Network Evaluation
Myrinet LANai 7 PCI64A boards
–Theoretical Limit: 160 MB/s
–142 MB/s observed peak under Linux
–~100 MB/s observed sustained under Linux
ServerNet not chosen
–Driver support
–Large switching infrastructure required
Gigabit Ethernet
–Performance and scalability concerns
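The peak-versus-sustained distinction comes from streaming tests: push a large volume of data point to point and divide the bytes delivered by wall-clock time. The sketch below illustrates that method over ordinary TCP sockets; the Myrinet figures above were measured with Myrinet's own software stack, so this is a stand-in for the technique, not the tool that produced them. Port, chunk size, and transfer size are arbitrary, and error handling is omitted.

```c
/* Point-to-point bandwidth sketch over TCP sockets (method illustration
 * only). Usage:  ./bw                (receiver)
 *                ./bw client <host>  (sender)                           */
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define PORT  5001
#define CHUNK (1 << 20)            /* 1 MB per send/recv  */
#define TOTAL (256L << 20)         /* 256 MB per run      */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    char *buf = malloc(CHUNK);
    memset(buf, 0xab, CHUNK);

    if (argc > 2 && strcmp(argv[1], "client") == 0) {          /* sender */
        struct hostent *he = gethostbyname(argv[2]);
        struct sockaddr_in a = { .sin_family = AF_INET,
                                 .sin_port   = htons(PORT) };
        memcpy(&a.sin_addr, he->h_addr_list[0], he->h_length);
        int s = socket(AF_INET, SOCK_STREAM, 0);
        connect(s, (struct sockaddr *)&a, sizeof a);
        for (long sent = 0; sent < TOTAL; sent += CHUNK)
            write(s, buf, CHUNK);                  /* stream the data    */
        close(s);
    } else {                                                 /* receiver */
        struct sockaddr_in a = { .sin_family      = AF_INET,
                                 .sin_addr.s_addr = INADDR_ANY,
                                 .sin_port        = htons(PORT) };
        int ls = socket(AF_INET, SOCK_STREAM, 0);
        bind(ls, (struct sockaddr *)&a, sizeof a);
        listen(ls, 1);
        int s = accept(ls, NULL, NULL);
        long got = 0;
        double t0 = now();
        for (ssize_t n; (n = read(s, buf, CHUNK)) > 0; )
            got += n;                              /* count bytes landed */
        double secs = now() - t0;
        printf("%.1f MB/s sustained\n", got / secs / (1 << 20));
        close(s);
        close(ls);
    }
    free(buf);
    return 0;
}
```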

Workshop on Commodity-Based Visualization Clusters 7 Myrinet Issues
Fairness: clients starved of network resources
–Implemented a credit scheme to minimize congestion (sketched below)
Lack of buffering in the switching fabric
–Causes poor performance under high load
–Open issue
(Figures: partitioned cluster vs. unpartitioned cluster)
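As a rough illustration of what a credit scheme buys (this is a toy simulation, not the actual WireGL/Chromium protocol), the sketch below gives the receiver a fixed number of buffers; the sender may transmit only while it holds credits, and credits flow back as the receiver drains buffers, so a busy or starved server is never flooded. The buffer count and packet loop are made-up values for the demo.

```c
/* Self-contained toy simulation of sender-side credit flow control. */
#include <stdio.h>

#define RECV_BUFFERS 2          /* buffers the server advertises         */

typedef struct {
    int credits;                /* buffers the sender may still fill     */
    int queued;                 /* packets sitting at the receiver       */
} link_t;

/* Sender side: transmit only when a credit is available. */
static int try_send(link_t *l, int packet_id)
{
    if (l->credits == 0)
        return 0;               /* would overrun the receiver: hold off  */
    l->credits--;
    l->queued++;
    printf("sent packet %d (credits left: %d)\n", packet_id, l->credits);
    return 1;
}

/* Receiver side: consume one buffered packet and return its credit. */
static void drain_one(link_t *l)
{
    if (l->queued == 0)
        return;
    l->queued--;
    l->credits++;               /* credit flows back to the sender       */
    printf("drained one packet (credits now: %d)\n", l->credits);
}

int main(void)
{
    link_t link = { RECV_BUFFERS, 0 };

    /* Try to push 8 packets while the receiver drains only half as fast. */
    for (int i = 0; i < 8; i++) {
        while (!try_send(&link, i))
            drain_one(&link);   /* in real code: block for a credit ack  */
        if (i % 2 == 0)
            drain_one(&link);
    }
    return 0;
}
```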

Workshop on Commodity-Based Visualization Clusters 8 i840 Chipset Evaluation
66 MHz, 64-bit PCI performance not full speed:
–210 MB/s PCI read (40% of theoretical peak)
–288 MB/s PCI write (54% of theoretical peak)
–Combined read/write: ~121 MB/s
AGP
–Fast Writes / Side Band Addressing unstable under Linux
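For reference (not spelled out on the slide), the percentages are consistent with the theoretical peak of a 64-bit, 66 MHz PCI bus: 8 bytes × 66.66 MHz ≈ 533 MB/s, so 210 MB/s ≈ 40% and 288 MB/s ≈ 54% of that peak.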

Workshop on Commodity-Based Visualization Clusters 9 Sort-First Performance
Configuration
–Application runs on the client
–Primitives distributed to servers (routing sketched below)
Tiled Display, 1024x768 per tile
–Total resolution: 4096x2304 (9 Megapixel)
Quake 3
–50 fps
Atlantis
–450 fps
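In a sort-first configuration each server owns one screen tile and receives only the primitives whose screen-space bounds overlap that tile. The sketch below shows that routing step in miniature; it is not Chromium's tilesort code, and the 4x3 arrangement of 1024x768 tiles is simply what the resolutions above imply.

```c
/* Sort-first routing sketch: map a primitive's screen-space bounding box
 * to the set of tile servers that must render it.                        */
#include <stdio.h>

#define TILE_W  1024
#define TILE_H  768
#define TILES_X 4
#define TILES_Y 3

typedef struct { float xmin, ymin, xmax, ymax; } bbox_t;   /* in pixels */

/* Call send(server_index) for every tile the bounding box touches. */
static void route_primitive(bbox_t b, void (*send)(int server))
{
    int tx0 = (int)(b.xmin / TILE_W), tx1 = (int)(b.xmax / TILE_W);
    int ty0 = (int)(b.ymin / TILE_H), ty1 = (int)(b.ymax / TILE_H);

    /* clamp to the display wall */
    if (tx0 < 0) tx0 = 0;
    if (ty0 < 0) ty0 = 0;
    if (tx1 >= TILES_X) tx1 = TILES_X - 1;
    if (ty1 >= TILES_Y) ty1 = TILES_Y - 1;

    for (int ty = ty0; ty <= ty1; ty++)
        for (int tx = tx0; tx <= tx1; tx++)
            send(ty * TILES_X + tx);
}

static void print_server(int server) { printf("-> server %d\n", server); }

int main(void)
{
    /* A triangle spanning the boundary between tiles 0 and 1. */
    bbox_t tri = { 900.0f, 100.0f, 1200.0f, 300.0f };
    route_primitive(tri, print_server);
    return 0;
}
```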

Workshop on Commodity-Based Visualization Clusters 10 Sort-Last Performance
Configuration
–Parallel rendering on multiple nodes
–Composite to final display node (compositing sketched below)
Volume Rendering on 16 nodes
–1.57 GVox/s [Humphreys 02]
–1.82 GVox/s (tuned) 9/02
–256x256x1024 volume¹ rendered twice
¹ Data courtesy of G.A. Johnson, G.P. Cofer, S.L. Gewalt, and L.W. Hedlund from the Duke Center for In Vivo Microscopy (an NIH/NCRR National Resource)
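Sort-last compositing merges the servers' partial frames on the display node pixel by pixel. The sketch below shows the depth-test composite used for opaque geometry; volume rendering instead blends the partial images in depth order with "over" compositing, but the per-pixel data flow is the same. This is an illustration, not the cluster's compositing code.

```c
/* Per-pixel sort-last depth composite sketch. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t *color;    /* RGBA, one word per pixel  */
    float    *depth;    /* one depth value per pixel */
    int       npixels;
} frame_t;

/* Fold one incoming partial frame into the accumulated result:
 * at each pixel, the nearer fragment wins. */
static void z_composite(frame_t *accum, const frame_t *incoming)
{
    for (int i = 0; i < accum->npixels; i++) {
        if (incoming->depth[i] < accum->depth[i]) {
            accum->depth[i] = incoming->depth[i];
            accum->color[i] = incoming->color[i];
        }
    }
}

int main(void)
{
    enum { N = 4 };
    uint32_t ca[N] = {0}, cb[N];
    float    da[N], db[N];
    frame_t  a = { ca, da, N }, b = { cb, db, N };

    for (int i = 0; i < N; i++) {              /* toy partial frames */
        da[i] = 1.0f;
        cb[i] = 0xff00ff00u;
        db[i] = (i % 2) ? 0.5f : 2.0f;         /* b wins on odd pixels */
    }
    z_composite(&a, &b);
    for (int i = 0; i < N; i++)
        printf("pixel %d: color %08" PRIx32 " depth %.1f\n", i, ca[i], da[i]);
    return 0;
}
```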

Workshop on Commodity-Based Visualization Clusters 11 Cluster Accomplishments
Development Platform
–WireGL
–Chromium
Cluster configuration replicated
Interactive Performance
–256x512x… @ … fps
–9 Megapixel @ 50 fps

Workshop on Commodity-Based Visualization Clusters 12 Sources of Bottlenecks
Sort-First
–Packing speed (processor)
–Primitive distribution (network and bus)
–Rendering (processor and graphics chip)
Sort-Last
–Rendering (graphics chip)
–Composite (network, bus, and read/draw pixels)

Workshop on Commodity-Based Visualization Clusters 13 Bottleneck Evaluation – Stanford
Sort-First: Processor and Network
Sort-Last: Network and Read/Draw

Workshop on Commodity-Based Visualization Clusters 14 The Landscape of Graphics Clusters
Many Options
–Low End: <$2500/node
–Mid End: ~$5000/node
–High End: >$7500/node
Tradeoffs
–Different bottlenecks
–Price/Performance
–Scalability
–Usage
Evaluation
–Based on published benchmarks and specs

Workshop on Commodity-Based Visualization Clusters 15 Cluster Interconnect Options
Many choices
–GigE: ~100 MB/s
–Myrinet 2000: 245 MB/s
–SCI/Dolphin: 326 MB/s
–Quadrics: 340 MB/s
Future options
–10 GigE
–InfiniBand
–HyperTransport

Workshop on Commodity-Based Visualization Clusters 16 Low End
General Definition
–Single CPU
–Consumer mainboard
–Integrated graphics
–High-speed commodity network
Example Node Configuration
–NVIDIA nForce2
–AMD Athlon
–512 MB DDR
–GigE and 10/100
–1U rack chassis
–Estimated Price: $1500

Workshop on Commodity-Based Visualization Clusters 17 Bottleneck Evaluation – Low End
Bus/Network limited

Workshop on Commodity-Based Visualization Clusters 18 Mid End
General Definition
–Dual processor
–“Workstation” mainboard
–High-performance bus: 64-bit PCI or PCI-X
–High-speed commodity / low-end cluster interconnect
–High-end consumer graphics board
Example Node Configuration
–Intel i860
–Dual Intel P4 Xeon 2.4 GHz
–2 GB RDRAM
–ATI Radeon 9700
–GigE onboard + Myrinet 2000
–2U rack chassis
–Estimated Price: $4000

Workshop on Commodity-Based Visualization Clusters 19 Bottleneck Evaluation – Mid End
Sort-First: Network limited
Sort-Last: Read/Draw and Network limited

Workshop on Commodity-Based Visualization Clusters 20 High End
General Definition
–Dual or quad processor
–Cutting-edge bus: PCI-X, HyperTransport, PCI Enhanced
–High-speed commodity / high-end cluster interconnect
–“Professional” graphics board
–RAID system
Example Node Configuration
–ServerWorks GC-WS
–Dual P4 Xeon 2.6 GHz
–NVIDIA Quadro4 900 XGL
–4 GB DDR
–GigE onboard + InfiniBand
–Estimated Price: $7500

Workshop on Commodity-Based Visualization Clusters 21 Bottleneck Evaluation – High End
Sort-First: Well balanced
Sort-Last: Read/Draw limited

Workshop on Commodity-Based Visualization Clusters 22 Balanced System is Key
Only as fast as the slowest component
–Spend money where it matters!
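A concrete illustration using the deck's own numbers: a 9-Megapixel RGBA frame is about 36 MB, so at the ~100 MB/s sustained Myrinet rate a single link can carry fewer than three full frames per second of sort-last composite traffic, no matter how fast each graphics card renders. The slowest stage sets the frame rate.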

Workshop on Commodity-Based Visualization Clusters 23 Goals for Next Cluster
Performance
–Sort-Last: 5 GVox/s, 1 GTri/s
–Sort-First at 4096x2304: >100 fps
Research
–Remote visualization
–Time-varying datasets
–Compositing
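For scale (simple arithmetic on the stated targets): spread over a 16-node machine like the one described on the next slide, 5 GVox/s works out to roughly 312 MVox/s per node, and >100 fps at 4096x2304 (about 9.4 Megapixel) amounts to delivering on the order of 1 GPix/s to the display wall.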

Workshop on Commodity-Based Visualization Clusters 24 What we plan to build
16-node cluster, 1U nodes
Mainboard chipsets
–Intel Placer
–ServerWorks GC-WS
–AMD Hammer
Memory
–2-4 GB
Graphics Chip
–NVIDIA NV30
–ATI R300/350
Interconnect
–InfiniBand, Quadrics
Disk
–IDE RAID or SCSI

Workshop on Commodity-Based Visualization Clusters 25 Continuing Chipset Issues
Why do chipsets perform so poorly?
“Workstation”
–Intel i860: 215 MB/s read (40% of theoretical), 300 MB/s write (56% of theoretical)
–AMD 760MPX: 300 MB/s read (56% of theoretical), 312 MB/s write (59% of theoretical)
“Server”
–ServerWorks ServerSet III LE: 423 MB/s read (79% of theoretical), 486 MB/s write (91% of theoretical)
Why can’t a “server” have an AGP slot?
Performance numbers from
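These figures are again consistent with the ≈533 MB/s theoretical peak of 64-bit, 66 MHz PCI noted earlier: for example, 423/533 ≈ 79% and 486/533 ≈ 91%.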

Workshop on Commodity-Based Visualization Clusters 26 Ongoing Bottlenecks
Readback performance
–Will be fixed “soon”
–Hardware compositing?
Chipset Performance
–Achieve only a fraction of theoretical
–Need faster busses in commodity chipsets
Network Performance
–Scalability
–Fast is VERY expensive

Workshop on Commodity-Based Visualization Clusters 27 Conclusions
What we still need
–More vendors
–More chipsets
–More performance
Graphics Clusters are getting better
–Chipsets
–Interconnects
–Form factor
–Processing
–Graphics Chips
Things are really starting to get interesting!