Ultimate Integration Joseph Lappa Pittsburgh Supercomputing Center ESCC/Internet2 Joint Techs Workshop

Agenda
Supercomputing 2004 Conference
Application – Ultimate Integration
Resource Overview
Did it work?
What did we take from it?

Supercomputing 2004
Annual Conference
– Supercomputers
– Storage
– Network hardware
– Original reason for the application
Bandwidth Challenge
– Didn't apply due to time

Application Requirements
– Runs on Lemieux (PSC's supercomputer)
– Uses Application Gateways (AGWs)
– Uses the Cisco CRS-1
  – 40 Gb/sec OC-768 cards
  – Few exist
– Is a single application
– Can be used with another demo on the show floor if possible

Ultimate Integration Application
Checkpoint Recovery System
– Program: a garden-variety Laplace solver instrumented to save its memory state in checkpoint files
– Checkpoints memory to remote network clients
– Runs on 34 Lemieux nodes
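The slides do not show the checkpoint code itself; the sketch below is a minimal illustration, assuming a plain Jacobi/Laplace iteration and a hypothetical checkpoint receiver at CHECKPOINT_HOST:CHECKPOINT_PORT, of what "checkpoints memory to remote network clients" looks like in practice.

```python
# Minimal sketch (not PSC's code): a Jacobi/Laplace iteration that
# periodically streams its in-memory state to a remote checkpoint server.
import socket
import struct
import numpy as np

CHECKPOINT_HOST = "agw01.example.org"  # hypothetical checkpoint receiver
CHECKPOINT_PORT = 9000                 # hypothetical port
N, ITERS, CKPT_EVERY = 512, 1000, 100

def send_checkpoint(grid: np.ndarray, step: int) -> None:
    """Ship the solver's memory state over TCP: a small header, then the raw bytes."""
    payload = grid.tobytes()
    with socket.create_connection((CHECKPOINT_HOST, CHECKPOINT_PORT)) as sock:
        sock.sendall(struct.pack("!IQ", step, len(payload)))  # step number, byte count
        sock.sendall(payload)

def laplace_solve() -> np.ndarray:
    grid = np.zeros((N, N))
    grid[0, :] = 100.0  # fixed boundary value along one edge
    for step in range(1, ITERS + 1):
        # Jacobi update: each interior point becomes the mean of its four neighbours.
        grid[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                                   grid[1:-1, :-2] + grid[1:-1, 2:])
        if step % CKPT_EVERY == 0:
            send_checkpoint(grid, step)
    return grid
```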

Lemieux TCS System
750 Compaq Alphaserver ES45 nodes
– SMP: four 1 GHz Alpha processors
– 4 GB of memory
Interconnection
– Quadrics cluster interconnect
– Shared memory library

Application Gateways
750 GigE connections are very expensive
Reuse the Quadrics network to attach cheap Linux boxes with GigE
– 15 AGWs, each a single-processor Xeon with 1 Quadrics card and 2 Intel GigE NICs
– Each GigE card maxes out at 990 Mb/sec
– Only need 30 GigE links to fill the link to the TeraGrid
– Web100 kernel
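The sizing here is simple arithmetic; a quick sanity check of the aggregate bandwidth the AGW pool could deliver, using the per-NIC ceiling from the slide:

```python
# Back-of-the-envelope check of the AGW sizing on this slide.
PER_GIGE_MBPS = 990      # measured ceiling of one GigE NIC (from the slide)
NICS_PER_AGW = 2         # each AGW carries two Intel GigE cards
AGWS = 15                # Linux boxes attached to the Quadrics fabric

aggregate_gbps = AGWS * NICS_PER_AGW * PER_GIGE_MBPS / 1000
print(aggregate_gbps)    # -> 29.7, i.e. roughly the 30 Gb/s needed toward the TeraGrid link
```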

Application Gateways

Network
Cisco 6509
– Sup720
– WS-X6748-SFP
– Two WS-X GE cards
– Used 4 10GE interfaces
OSPF load balancing was my real worry
– >30 GE streams over 4 links
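The OSPF worry comes from the fact that equal-cost multipath balances per flow, not per packet: 30+ GigE streams are hashed onto the 4 x 10GE links, and an unlucky hash can overload one link. A rough illustration of per-flow hashing (the hash function below is illustrative, not Cisco's actual CEF algorithm):

```python
# Illustration of per-flow ECMP: ~32 GigE streams hashed onto 4 equal-cost 10GE links.
from collections import Counter
import zlib

LINKS = 4
flows = [("10.0.0.%d" % src, 5001, "192.168.0.10", 6000 + src) for src in range(1, 33)]

def pick_link(src_ip, src_port, dst_ip, dst_port):
    """Pick an output link from a hash of the flow's addresses/ports (illustrative hash)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % LINKS

load = Counter(pick_link(*f) for f in flows)
print(load)  # flows per link; at ~990 Mb/s each, much more than 10 flows on one 10GE link means congestion
```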

Network
Cisco CRS-1
– 40 Gb/sec per slot
– 16 slots
For the demo:
– Two OC-768 cards (Ken Goodwin's and Kevin McGratten's big worry was the OC-768 transport)
– Two 8-port 10 GE cards
– Running production IOS-XR code; had problems with tracking hardware
– Ran both routers without 2 of the switching fabrics, with no effect on traffic

Network
Cisco CRS-1
– One at the Westinghouse machine room
– One on the show floor
A forklift was needed to place it
– 7 feet tall
– 939 lbs empty
– 1657 lbs fully loaded

The Magic Box
Stratalight OTS 4040 transponder
– "Compresses" the 40 Gb/s signal to fit into the spectral bandwidth of a traditional 10G wave
– Uses proprietary encoding techniques
The Stratalight transponder was connected to the Mux/DMUX of the DWDM system as an alien wavelength

Time Dependencies
The OC-768 link wasn't worked on until one week before the conference

OC-768

Where Does the Data Land?
Lustre Filesystem
– Developed by Cluster File Systems
– POSIX-compliant, open-source, parallel file system
– Separates metadata and data objects to allow for speed and scaling
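Lustre's speed comes from striping file data across the OSTs while the MDS handles only names and attributes. A minimal sketch of the striping idea (illustrative, not Lustre's actual layout code), assuming a 1 MiB stripe size across the 5 OSTs described on the show-floor slide:

```python
# Sketch of round-robin striping: which OST holds a given byte offset of a file.
STRIPE_SIZE = 1 << 20    # 1 MiB stripe size (assumed for illustration)
STRIPE_COUNT = 5         # the show-floor setup used 5 OSTs

def ost_for_offset(offset: int) -> int:
    """Return the index of the OST that stores this byte of the file."""
    return (offset // STRIPE_SIZE) % STRIPE_COUNT

# A large checkpoint write therefore spreads across all 5 OSTs instead of hitting one server:
for mib in (0, 1, 5, 6, 100):
    print(f"offset {mib} MiB -> OST {ost_for_offset(mib * (1 << 20))}")
```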

The Show Floor
– 8 checkpoint servers with 10 GigE and InfiniBand connections
– 5 Lustre OSTs connected via InfiniBand with 2 SCSI disk shelves (RAID5)
– Lustre metadata server (MDS) connected via InfiniBand

The Show Floor

The Demo

How Well Did It Run?
Laplace solver with checkpoint recovery
– Using 16 Application Gateways (32 GigE connections): 31.1 Gb/s
– Only 32 Lemieux nodes were available
IPERF
– Using 17 Application Gateways + 3 single-GigE-attached machines: 35 Gb/s
Zero SONET errors reported on the interface
Over 44 TB were transferred
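The headline numbers hang together; a quick consistency check (TB taken as 10^12 bytes, purely illustrative):

```python
# Sanity-check the reported results.
gige_links = 32
per_link_mbps = 31.1e3 / gige_links              # ~972 Mb/s per GigE, close to the 990 Mb/s ceiling
tb_moved = 44                                    # TB transferred over the demo
hours_at_35_gbps = tb_moved * 8e12 / 35e9 / 3600
print(round(per_link_mbps), round(hours_at_35_gbps, 1))  # -> 972 Mb/s and ~2.8 hours at line rate
```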

The Team

Just Demoware?
AGWs
– The qsub command now has an AGW option
  – Can do accounting (and possibly billing)
  – MySQL database with Web100 stats
– Validated that the AGW was a cost-effective solution
OC-768
– Metro OC-768 can be done by mere mortals
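The accounting path is only named on the slide; below is a hedged sketch of the idea: per-connection Web100-style byte counters rolled into a database table keyed by job. The table name, columns, and stats source are hypothetical, and SQLite stands in for the MySQL database mentioned above.

```python
# Hypothetical sketch: roll Web100-style per-connection counters into an accounting table.
import sqlite3  # stand-in for the MySQL database named on the slide

conn = sqlite3.connect("agw_accounting.db")
conn.execute("""CREATE TABLE IF NOT EXISTS agw_usage (
                  job_id TEXT, agw TEXT, bytes_out INTEGER, bytes_in INTEGER)""")

def record_connection(job_id: str, agw: str, stats: dict) -> None:
    """stats holds Web100-style counters read for one finished connection (names assumed)."""
    conn.execute("INSERT INTO agw_usage VALUES (?, ?, ?, ?)",
                 (job_id, agw, stats.get("DataBytesOut", 0), stats.get("DataBytesIn", 0)))
    conn.commit()

# Example: one GigE stream from a checkpoint run (numbers are made up).
record_connection("lemieux.12345", "agw07", {"DataBytesOut": 123_456_789_012, "DataBytesIn": 4_096})
```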

Just Demoware??
Application receiver
– Laplace solver ran at PSC
– Checkpoint receiver program tested / run at both NCSA and SDSC
  – Ten IA64 compute nodes as receivers
– ~10 Gb/sec network to network (to /dev/null)
  – 990 Mb/sec * 10 streams
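The receiver program that ran at NCSA and SDSC is not shown; below is a minimal sketch of a checkpoint sink, assuming the simple step/length framing used in the sender sketch earlier, that drains each connection to /dev/null:

```python
# Minimal checkpoint sink (illustrative): accept connections and drain them to /dev/null.
import socket
import struct

LISTEN_PORT = 9000  # hypothetical; matches the sender sketch earlier in this deck

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def serve() -> None:
    with socket.create_server(("", LISTEN_PORT)) as srv:
        while True:
            sock, _peer = srv.accept()
            with sock, open("/dev/null", "wb") as sink:
                step, length = struct.unpack("!IQ", recv_exact(sock, 12))
                remaining = length
                while remaining:
                    chunk = sock.recv(min(1 << 20, remaining))
                    if not chunk:
                        break
                    sink.write(chunk)
                    remaining -= len(chunk)

if __name__ == "__main__":
    serve()
```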

Thank You