Commodity Flash-Based Systems at 40GbE – FIONA. Philip Papadopoulos*, Tom DeFanti, Larry Smarr, John Graham. Qualcomm Institute, UCSD. *Also San Diego Supercomputer Center

FIONA Internal Block Diagram (some channels close to saturation)
[Block diagram residue: Intel Haswell CPU with 32GB RAM at 68GB/s; LSI 9300 HBA with 16 x 1200MB/s SAS3/SATA3 drive channels; 2 x 40GbE NIC on a PCIe x8 Gen3 link; I/O controller on x8 Gen2. Link rates shown: PCIe Gen3 – 1GB/s per lane, SATA3 – 600MB/s, SAS3 – 1200MB/s.]
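As a rough check of the "close to saturation" note (back-of-the-envelope arithmetic using only the rates shown on the diagram; slot widths are as labeled there): an x8 Gen3 link carries roughly 8 x 1GB/s = 8GB/s, while 2 x 40GbE is 80Gb/s ≈ 10GB/s, so the NIC side can already outrun its slot. On the storage side, 16 SAS3 channels at 1200MB/s could deliver 19.2GB/s in aggregate, and even the realistic load of 8 SSDs at roughly 540MB/s each (≈ 4.3GB/s) plus the hard drives pushes toward the HBA's PCIe limit.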

FIONA – Classic. Approximate cost $6K: 4TB flash, 32TB hard disk
- Motherboard: Supermicro X10SRL-F, Socket 2011-v3
- CPU: Xeon E5-1620v3 (3.50GHz, 68GB/s memory bandwidth, ECC memory)
- SAS Controller: LSI 9300-16i (12Gb/s x 16 ports, PCIe Gen3 x8)
- Network: Myricom (2 x 10GbE) OR Mellanox (1 x 40GbE), PCIe Gen3 x8
- 8 x SSD: Intel 535 (540MB/s read, 520MB/s write each)
- 8 x Hard Drive: Western Digital 4TB RED (SATA)
- SAS-to-SATA cables (SFF)
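The slides mention ZFS (see the Software slide below) but do not give the pool layout over these 8 SSDs and 8 hard drives. A minimal sketch, assuming two separate raidz2 pools and Linux device names sdb..sdq; the pool names, layout, and device names are all assumptions, not the authors' configuration:

# flash pool across the 8 Intel SSDs (layout and names are assumptions)
zpool create -o ashift=12 flash raidz2 sdb sdc sdd sde sdf sdg sdh sdi
# bulk pool across the 8 x 4TB WD RED drives
zpool create -o ashift=12 tank raidz2 sdj sdk sdl sdm sdn sdo sdp sdq
# lightweight compression is a common default for data pools
zfs set compression=lz4 tank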

Performance Anomalies – where we WERE a year ago: very inconsistent performance
- iperf testing: 10GbE (C = FIONA Classic), 10GbE (Rocks = 5-year-old Rocks node), 40GbE (R = FIONA Rackmount)
- Network is isolated
- Most performance within expectation EXCEPT 40GbE → 10GbE when thread count is > 3
- The inconsistency also shows up in 40GbE-to-40GbE testing
- Labels: R-to-C(2) == Rackmount-to-Classic (2 threads)
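The exact iperf invocations are not on the slide; a sketch of the sort of multi-threaded test implied by the labels above, using classic iperf's -P flag for parallel client threads (the receiver address is a placeholder):

# on the receiving node (e.g., the FIONA Classic box)
iperf -s
# on the sending node: 4 parallel threads for 30 seconds
iperf -c <receiver-ip> -P 4 -t 30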

Long Story – The Short Version
- Mellanox 40GbE performance was very erratic on dual-socket systems: 100Mbps to 39Gbps of variation over a few minutes
- "Good" cores and "bad" cores? Not the problem
- Masking interrupts to certain cores (affinity)? Not the problem
- Tried just about everything in the Mellanox tuning guide (30 pages). No help.
- Spent months and months trying to understand what was going on
- Threw the "tuning guide" in the trash and kept searching
- Looked at memory performance (STREAM). Not an issue
- On a whim: turned OFF TCP offloading → performance evened out at 25Gbps. Not peak speed, but consistent performance
- This turned out to be a driver memory buffer issue
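The slide does not say exactly which offloads were disabled; a hedged sketch of how TCP offloading is typically turned off with ethtool, assuming the 40GbE interface is eth0 as on the later slide:

# list the current offload settings
ethtool -k eth0
# disable the usual TCP offloads (TSO, GSO, GRO, LRO) for the experiment
ethtool -K eth0 tso off gso off gro off lro off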

The Core Issue
- Definitively characterized the issue to Mellanox in Nov 2014, with 100% reproducibility
- "Patched" driver tested in December: problem fixed
- Mellanox published the updated driver in Jan 2015
- Each new release requires retesting to ensure no regressions
[Diagram: two sockets (CPU0, CPU1) joined by QPI, each with its own memory, with the NIC's PCIe attached to one socket. Pin the iperf receiver process to any core on one socket == 8Gbps; pin it to any core on the other socket == 39Gbps.]
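The pinning experiment in the diagram can be reproduced with standard tools; a sketch (not necessarily the authors' exact commands), assuming the NIC is eth0:

# find which NUMA node the NIC's PCIe slot is local to (-1 means unknown)
cat /sys/class/net/eth0/device/numa_node
# run the iperf receiver with CPU and memory bound to NUMA node 0
numactl --cpunodebind=0 --membind=0 iperf -s
# or move an already-running receiver onto a specific core (core 2 chosen arbitrarily)
taskset -cp 2 <iperf-pid>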

Performance Today with Latest Driver
Compared: # ethtool -C eth0 adaptive-rx off  vs.  reboot (default)
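For reference, the adaptive-rx state can be inspected with ethtool, and the setting does not persist across reboots (presumably why the slide compares against the post-reboot default); a sketch, again assuming eth0:

# show current interrupt-coalescing settings, including adaptive-rx
ethtool -c eth0
# re-apply the non-adaptive setting at boot if desired, e.g. from /etc/rc.local
ethtool -C eth0 adaptive-rx off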

Rackmount and Desktop Versions. Options for GPUs. Online parts spreadsheet.

Software
- CentOS 6.6/6.7
- Updated Mellanox driver
- ZFS
- Command-line tools included in the perfSONAR toolkit: iperf3, nuttcp, bwctl, owamp, ...
- Rational sysctl settings for TCP windows/buffers
Data Access
- Standard Linux tools for local data access (NFS, Samba, SCP, ...)
Data Transfer Tools (pick from your favorites)
- FDT
- GridFTP
- UDT-based tools
- XRootD
- Custom code
We manage via the Rocks toolkit, but that is not absolutely essential.
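The slide does not list the actual sysctl values; a sketch of the kind of large-window settings commonly used on 10/40GbE data transfer nodes (the file name and numbers below are illustrative assumptions, not the authors' configuration):

# /etc/sysctl.d/90-fiona-network.conf (illustrative values)
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_congestion_control = htcp

Apply with: sysctl -p /etc/sysctl.d/90-fiona-network.conf (the htcp congestion-control module must be available; cubic is the stock default).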