NCCS User Forum June 25, 2013

Agenda
- Introduction: Recent Accomplishments, Increase in Capacity, Staffing
- NCCS Updates: Discover Updates (including Intel Phi), Remote Visualization, Data Portal, Update on NCCS Response to User Survey, Resource Manager Analysis of Alternatives
- NCCS Operations & User Services Updates: Upcoming and Status, Ongoing Investigations, Brown Bag and SSSO Seminars
- Questions & Answers

Recent Accomplishments
- 3 months of increasing utilization: rising to 74% utilization in May (based on PBS).
- 3 months of very high availability: at or near 100% availability (does not include scheduled maintenance).
- Discover SCU9 (more later): Intel Xeon SandyBridge, 480 nodes.
- In-depth Intel training and Brown Bag seminars (more to come).
- Earth System Grid Federation (ESGF) downloads: over 326 TB and 10 million data sets, April 2011 – May 2013.

Iowa Flood Studies Support
- IFloodS GPM Ground Validation Field Campaign support.
- Assisted Christa Peters-Lidard and her team in running NU-WRF forecasts: two forecasts per day during the field campaign (run on the SCU8 SandyBridge).
- Tailored services (compute queues and storage) to meet the requirements of the campaign.
- No downtime during the campaign, and no forecasts were missed.

NCCS Compute Capacity Evolution, September 2007 – September 2013
Summer 2013: SCU9 addition, 480 Intel Xeon "Sandy Bridge" nodes.

Staff Additions
Welcome to new members of the NCCS team: Garrison Vaughn, Julien Peters, and Lyn Gerner (consultant).
Welcome to summer interns: Jordan Robertson, Dennis Lazar, and Winston Zhou.

NCCS Updates Dan Duffy, HPC Lead and NCCS Lead Architect

Discover Updates
SCU5/SCU6 decommissioned:
- Nehalem processors, 8 cores per node, 24 GB of RAM.
- Space, power, and cooling will be used for SCU9.
- Part of the system will be reused internally, and part will go to UMBC.
SCU8:
- SandyBridge processors, 16 cores per node, 32 GB of RAM.
- One Intel Phi accelerator per node.
- Available for general access; special queue for native access by request (more on this later).

SCU9 Status
SCU9:
- 480 SandyBridge nodes.
- 4 GB of RAM per core; total of 64 GB per node.
- Does NOT contain any accelerators, but there is room for additional accelerators (Intel Phi or NVIDIA GPUs).
- Upgrades to all the Discover I/O nodes.
- Additional Discover service nodes with SandyBridge processors.
Schedule:
- Integration and testing for the next 2 to 3 weeks.
- Pioneer usage during the month of July.
- General access in August.

GPFS Metadata
Discover GPFS metadata storage upgrade:
- The goal is to dramatically increase the aggregate metadata performance for GPFS: this cannot speed up a single metadata query, but it can speed up the combination of all queries.
- Acquisition is in progress for solid-state disks; responses are being evaluated now.
- Installation is planned for later this year, and the NCCS will coordinate closely with the user community during the integration of this new capability.

Data Portal Upgrade
Disk capacity:
- Adding ~100 TB of usable disk.
Additional servers:
- Servers with 10 GbE capability.
- Support higher-speed access between Discover and the Data Portal.

Update on NCCS Response to User Survey
External data transfer performance and convenience:
- Focus has been on upgrading the Data Portal servers with 10 GbE capabilities to facilitate faster transfer performance.
- Analysis of the GISS-to-NCCS network and recommendations for upgrades.
- Upgrade of the SEN-to-CNE link to 10 GbE.
More timely notifications of problems or unplanned outages:
- A web dashboard for system status is under development.
"Architecting for More Resiliency," especially the Discover storage file systems:
- Initial architecture thoughts and requirements have been captured.
- Creation of a tiger team of NCCS and non-NCCS team members to look at how to architect for higher resiliency.
- Evaluation of alternative computing platforms, including cloud.

Resource Manager Analysis of Alternatives
Is there an alternative to PBS that better meets NCCS and user community requirements?
- NCCS, with the support of Lyn Gerner, has generated an Analysis of Alternatives for the resource management software, including a mapping of capabilities to requirements and a cost/benefit analysis.
- A recommendation has been made to NCCS management; a decision will be made within weeks.
- Users will be notified of any changes as they occur; the goal is to make any change as transparent to users as possible.
Stay tuned!

Discover Intel Phi Many Integrated Core (MIC) Coprocessor
- Discover's 480 SCU8 nodes have one Intel Phi coprocessor per node.
- Direct use of the Phi is now available via the "native" queue.
- Offload use is available on all other SCU8 nodes.
- Coming soon: a method to specify SCU8 nodes for Intel Phi offload use.
Training:
- A number of NCCS Brown Bags so far; content is available on the NCCS web site.
- Training will be repeated upon request; contact support@nccs.nasa.gov.
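For reference, a minimal job-script sketch for targeting the SCU8 "native" Phi queue is shown below. Only the queue name comes from this slide; the walltime, node-selection syntax, binary name, and launch mechanism are illustrative assumptions rather than NCCS-published settings, so check with support@nccs.nasa.gov for the exact syntax.

    #!/bin/bash
    # Sketch of a PBS job targeting the SCU8 Intel Phi "native" queue.
    # Queue name is from the slide; select/walltime values are placeholders.
    #PBS -q native
    #PBS -l select=1:ncpus=16
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR

    # A natively built binary (compiled with Intel's -mmic flag) runs on the
    # coprocessor itself; one common launcher of that era is Intel's
    # micnativeloadex utility (assumed here to be installed on the node).
    micnativeloadex ./my_mic_app.mic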

NCCS Code Porting Efforts for Discover Intel Phi Many Integrated Core (MIC)
NCCS staff, SSSO, vendor, and external community members are currently working on the following codes:
- GEOS-5 components (GMAO, NOAA/GFDL, et al.)
- GRAIL (high-degree/high-order solutions of the lunar gravity field)
- Ice Melt code (Kwo-Sen Kuo et al.)
- WRF (with NOAA/ESL, NCAR, et al.)
Contact support@nccs.nasa.gov.
Note: the GRAIL code uses MKL libraries and yielded some good lessons learned; we believe Doris included these in the Brown Bag.

Remote Visualization Prototype
- A prototype remote visualization platform is being investigated for the UVCDAT advanced visualization application.
- Goal: applications such as UVCDAT would run on NCCS resources, with displays and controls on the user's desktop. This should dramatically speed up remote visualization for users.
- Requires careful network and security configuration to safeguard NCCS resources and the user's desktop.
- Tests of supporting technologies are underway.
- Following an evaluation of alternatives, the service will move into deployment and a "pioneer" phase; stay tuned.

NCCS Operations & User Services Update
Ellen Salmon
- Upcoming & Status
- Ongoing Investigations
- NCCS Brown-Bag and SSSO Seminars

Upcoming (1 of 2)
Discover resources:
SCU8 Sandy Bridge and SCU8 Intel Phi Coprocessors (MICs):
- Intel Phi native use is now available via the "native" queue.
- Want help porting code to the Intel Phis (either "offload" or "native")? Contact support@nccs.nasa.gov.
SCU9 Sandy Bridge:
- Somewhat reduced total Discover compute cores until late July/August (when pilot usage starts).
- Targeting August for general availability.
Discover GPFS nobackup:
- Following May's GPFS parameter changes, continuing to deploy additional "NetApp" nobackup disk space in a measured fashion.
- Moving nobackup directories to the disk array types best suited for their workloads.
- Watch for more information on the GPFS metadata storage upgrade as the acquisition progresses.

Upcoming (2 of 2)
Discover InfiniBand OFED (software stack) changes:
- Continuing the rolling migration to the required, upgraded InfiniBand OFED (software stack).
- 2/3 of Discover is already on the new OFED release: all computational nodes on InfiniBand Fabric 2 (SCU7 Westmere nodes, and all SCU8 and SCU9 Sandy Bridge nodes).
- Rolling, announced, gradual changeovers of the other parts of Discover (e.g., via PBS queues or properties): SCUs 1 through 4 and the handful of remaining interactive and Dali nodes.
- A recompile is recommended: some codes work fine without a recompile, but others require a recompile to take advantage of some advanced features.
Planned outages (to date):
- Discover downtime (full day) sometime during the field campaign hiatus (July 16 – August 4), to upgrade the remaining (InfiniBand Fabric 1) I/O nodes to Intel Xeon Sandy Bridge nodes.
- Other NCCS systems may also "piggyback" on this downtime; stay tuned.

Discover Compiler / Library Recommendations
Use current libraries and compilers to get many benefits:
- Executables created with older versions can experience problems running on Discover's newest hardware. Often, simply rebuilding with current compilers and libraries (especially Intel MPI 4.x and later) fixes these issues.
- Current versions can enable use of new features, such as the Sandy Bridge nodes' Advanced Vector Extensions (AVX), for improved performance.
- Use of current compilers and libraries also greatly increases the ability of NCCS and vendor staff to track down other problems, and especially to get vendor attention and support to fix them.
Background: the Sandy Bridge processor family features Intel Advanced Vector eXtensions (AVX), a new, wider 256-bit instruction set extension to Intel SSE (128-bit Streaming SIMD Extensions), hence higher peak FLOPS with good power efficiency; it is designed for applications that are floating-point intensive.
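As a concrete illustration, a rebuild with current tools might look like the sketch below. The module names and versions are hypothetical placeholders (check "module avail" or ask support@nccs.nasa.gov for Discover's current names); only the general pattern of loading a recent Intel compiler plus Intel MPI 4.x or later and recompiling is the point.

    # Hypothetical rebuild sketch; module names/versions are placeholders.
    module purge
    module load comp/intel-13.0        # assumed current Intel compiler module
    module load mpi/impi-4.0.3         # assumed Intel MPI 4.x module

    # Rebuild so the executable links against current MPI/runtime libraries;
    # -axavx adds an AVX code path for Sandy Bridge nodes while the binary
    # still runs on older Discover processors.
    mpiifort -O2 -axavx -xsse4.2 -o my_model my_model.f90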

Archive Tape Media Issue
- Crinkled/damaged archive tapes caused a number of "Please examine/replace these archive files" tickets in the last several months. The damage is no longer occurring.
- Oracle identified a faulty tape motor on a single tape drive as the cause; it replaced that tape drive and 11 others to proactively remediate the problem (prior to pinpointing the cause), is providing data recovery services to extract usable data from damaged tapes, and is replacing all tape media affected by the problem.
- ~12 tapes have been sent, so far, to Oracle Tape Recovery Services. So far 5 have been returned: all data were recovered on 2 of the tapes, and much of the data on the other 3.
- A larger list of tapes was damaged, but NCCS staff was able to recover files from those because second copies of the files still existed on separate (unaffected) tapes.
- Reminder: use dmtag -t 2 <filename> to get two tape copies of archive files, where needed.
Note (Fred Reitz, via 21 June 2013 email): We've received 5 of the 12 tapes back now. The latest of these didn't have quite as much data recovered as the first four. We'll probably send the vendor one more damaged tape, but we don't want to do that until they make progress on the remaining tapes we've already sent them, since some of the data on the 13th tape is usable.
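A minimal shell sketch of the two-copy reminder above is shown below; only the "dmtag -t 2 <filename>" usage comes from the slide, while the archive path and file pattern are hypothetical placeholders for files you consider irreplaceable.

    # Hypothetical example: request two tape copies for selected archive files.
    # Only "dmtag -t 2 <filename>" is from the slide; the path and pattern
    # below are placeholders.
    cd /archive/u/$USER/important_results
    for f in *.tar; do
        dmtag -t 2 "$f"
    done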

Ongoing Discover Investigations
GPFS and I/O:
- GPFS slowness due to heavy parallel I/O activity. Significant GPFS parameter changes were made in May to help address the issues, but much additional work remains (e.g., a many-month effort to "rebalance" data among filesystems in the background to better accommodate workloads).
- The new Sandy Bridge I/O nodes' capabilities will help: more cores per I/O node (16 rather than 8) for improved concurrency, more memory channels (4 rather than 3 per socket) for better data movement, and more total I/O "lanes" per I/O node.
- Heavy GPFS metadata workloads: an acquisition is in progress for new metadata storage. Target: improve responsiveness in handling many concurrent small, random I/O actions (e.g., for directories, filenames, etc.).
PBS "ghost jobs":
- Extremely rare due to a successful mitigation strategy; report the jobid if you see one!
Efforts are continuing on the GPFS and I/O performance issues (also mentioned a few slides ago).

NCCS Brown Bag Seminars
- Held ~twice monthly in GSFC Building 33 (as available).
- Content is available on the NCCS web site following each seminar: https://www.nccs.nasa.gov/list_brown_bags.html
- Current emphasis: using Intel Phi (MIC) coprocessors.
Current/potential Intel Phi Brown Bag topics:
- Intro to Intel Phi (MIC) Programming Models
- Programming on the Intel MIC, Part 2: How to Run MPI Applications
- Advanced Offload Techniques for Intel Phi
- Maximum Vectorization
- Performance Analysis via the VTune™ Amplifier
- Performance Tuning for the Intel Phi

Questions & Answers
NCCS User Services: support@nccs.nasa.gov
301-286-9120
https://www.nccs.nasa.gov

Contact Information
NCCS User Services: support@nccs.nasa.gov
301-286-9120
https://www.nccs.nasa.gov
http://twitter.com/NASA_NCCS
Thank you

Supporting Slides

NCCS Brown Bag: Climate Data Analysis and Visualization using UVCDAT
Climate scientist Jerry Potter (606.2) and analysis/visualization expert Tom Maxwell (606.2) demonstrated how climate scientists can use the open-source Ultrascale Visualization Climate Data Analysis Tools (UVCDAT) to explore and analyze climate model output, such as the NetCDF-formatted model output files produced by the GEOS-5 system and MERRA.
A UVCDAT VisTrails demonstration screenshot displayed (clockwise from top center): the VisTrails workflow builder, a VisTrails "spreadsheet" of visualizations, a console window, and the UVCDAT modules in use (e.g., vtDV3D and the matplotlib Python module). PoC: Thomas.Maxwell@nasa.gov
The UVCDAT tools feature workflow interfaces, interactive 3D data exploration, automated provenance generation, parallel task execution, and streaming data-parallel pipelines, and can enable hyperwall and stereo visualization. UVCDAT is the new Earth System Grid analysis framework designed for climate data analysis, and it combines VisTrails, CDAT, and ParaView.
Tom Maxwell developed vtDV3D, a new module included with recent UVCDAT and VisTrails releases, which provides user-friendly workflow interfaces for advanced visualization and analysis of climate data via a simple GUI designed for scientists who have little time to invest in learning complex visualization frameworks.
Slides from NCCS Brown Bag presentations can be downloaded at: http://www.nccs.nasa.gov/list_brown_bags.html
Links:
- Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT): http://uvcdat.llnl.gov/
- Interactive 3D data exploration (vtDV3D): http://portal.nccs.nasa.gov/DV3D/
- VisTrails: http://www.vistrails.org/
- ParaView, open-source scientific visualization: http://www.paraview.org/
August 10, 2012

NASA Center for Climate Simulation Supercomputing Environment
Supported by NASA HQ's Science Mission Directorate.
Discover Linux Supercomputer, June 2013:
- Intel Xeon nodes: ~3,200 nodes, ~43,000 cores, peak ~624 TFLOPS general purpose, 97 TB memory (2 or 4 GB per core).
- Coprocessors: Intel Phi MIC, 480 units, ~485 TFLOPS; NVIDIA GPUs, 64 units, ~33 TFLOPS.
- Shared disk: 7.2 PB, InfiniBand SAN, GPFS I/O servers.
[Architecture diagram: Discover batch SCUs 1-4 (~139 TF peak), SCU 7 (~161 TF peak), SCU 8 Sandy Bridge (~160 TF peak) with "Phi" MIC (~485 TF peak), SCU 9, and Base (offline, ~3 TF peak); Discover and JIBB login nodes; Dali and Dali-GPU analysis nodes (interactive); Warp (GPU) nodes; JIBB JCSDA batch cluster (~39 TF peak, Westmere); Dirac archive with parallel DMF cluster and tape libraries; Data Portal; Storage Area Networks (SAN); InfiniBand (IB); 10 GbE links.]
Notes:
- Discover Linux Cluster, JIBB, Data Portal, and Dirac all use Intel Xeon processors.
- SCU: Scalable Computational Unit. DMF: SGI's Data Migration Facility software (Hierarchical Storage Management software for the data archive). JCSDA: Joint Center for Satellite Data Assimilation. TF and TFLOPS: trillion floating-point operations per second.
March 2013: Discover uses several different versions of Xeon chips:
- Currently offline: "Base" ("Dempsey", circa 2006/2007, 4 cores per dual-socket, dual-core node, 2 floating-point operations (FLOPS) per clock, 3.2 GHz).
- SCU1 & SCU2: "Westmere" (circa 2011/12, 12 cores per dual-socket, six-core node, 4 FLOPS per clock, 2.8 GHz, improved memory controller; replaced previous ca. 2007 "Woodcrest").
- SCU3 & SCU4: "Westmere" (circa 2011, 12 cores per dual-socket, six-core node, 4 FLOPS per clock, 2.8 GHz, improved memory controller; replaced previous ca. 2008 "Harpertown").
- SCU5 & SCU6: "Nehalem" (circa 2009, 8 cores per dual-socket, quad-core node, 4 FLOPS per clock, 2.8 GHz, improved memory controller).
- SCU7: "Westmere" (circa 2010/11, 12 cores per dual-socket, six-core node, 4 FLOPS per clock, 2.8 GHz, improved memory controller).
- "Warp" nodes (circa 2011): Intel Xeon "Nehalem" with 2 NVIDIA general-purpose Graphical Processing Units (GPUs) per node.
- SCU8: "Sandy Bridge" (circa 2012), 16 cores per dual-socket, 8-core node, 2.6 GHz, 8 FLOPS per clock (compiled with the -avx flag).
- SCU8: Intel Phi Coprocessors (Many Integrated Core/MICs), November 2012, 60 cores each, 1 GHz, ~1 TFLOP per coprocessor unit.
- Initial Dali nodes use "Dunnington" Xeon chips (circa 2008, eight 16-core nodes, 2.4 GHz, 256 GB memory per node, 10 Gbps Ethernet).
- Dali-GPU nodes use "Westmere" and NVIDIA general-purpose Graphical Processing Units (GPUs).
- JCSDA (JIBB) uses Westmere (circa 2010/2011).
- Seven servers for the Dirac server cluster, to accommodate additional I/O load from model resolution increases, CMIP5 simulations, and increased computational capacity on Discover.
Other systems:
- Dali and Dali-GPU Analysis: 12- and 16-core nodes, 16 GB memory per core; Dali-GPU has NVIDIA GPUs.
- Dirac Archive: 0.9 PB disk, ~70 PB robotic tape library, Data Management Facility (DMF) space management.
- Data Portal: data-sharing services, Earth System Grid, OPeNDAP, data download (http, https, ftp), Web Mapping Services (WMS) server.
- JIBB: Linux cluster for the Joint Center for Satellite Data Assimilation community.
March 1, 2013

NCCS Metrics Slides (Through May 31, 2013)

NCCS Discover Linux Cluster Utilization, Normalized to 30-Day Month
Chart annotations:
- Mar. 2013: SCU8 initial Sandy Bridge use in special-purpose queues.
- Oct. 2012: SCU8 initial Sandy Bridge "burn-in" testing.
TFLOPS: Trillion Floating-Point Operations Per Second
June 12, 2013

Discover Linux Cluster Expansion Factor
Expansion Factor = (Queue Wait + Runtime) / Runtime
June 12, 2013
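For example, under this definition a job that waits 2 hours in the queue and then runs for 4 hours has an expansion factor of (2 + 4) / 4 = 1.5, while a job that starts immediately has an expansion factor of 1.0; these numbers are purely illustrative, not values from the chart.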

Discover Linux Cluster Downtime
- October 2012 scheduled unavailability reflected downtime and recovery from multi-building electrical work and precautions for Post-Tropical Cyclone Sandy.
- July 2012 unavailability reflected downtime and recovery from a multi-day, weather-related, Goddard-wide power outage.
(Chart categories: Scheduled, Unscheduled)
June 12, 2013

NCCS Mass Storage
As of late May 2012, NCCS changed the Mass Storage default so that two tape copies are made only for files for which two copies have been explicitly requested. NCCS is gradually reclaiming second-copy tape space from legacy files for which two copies have not been requested.
Explanation of plots on the NCCS Mass Storage graph (all units are Petabytes = 10^15 bytes):
- Total Media Capacity: media capacity defined to DMF (used + unused), plus capacity of any media that is not yet defined to DMF, including media not yet in tape libraries. Media capacity for unused cartridges is the uncompressed capacity (on NCCS user data, tape drives typically achieve 1.5x compression).
- Tape Library Available Capacity: Total Media Capacity on hand, plus the potential capacity of empty slots if filled with the densest media that current NCCS tape drives can write in that tape library. Empty slots are obtained via the ACSLS "query LSM" command; free slots were counted only from tape libraries that will continue to be used (the old Bldg 28 E220 STK libraries will be excessed). As of May 31, 2012, we are also no longer counting the capacity of free slots in the old Bldg 32 STK "Powderhorn" libraries, because these, too, will be excessed when data are copied from them to the new libraries.
- Media Capacity Defined to DMF Archive: total DMF data stored, plus the capacity of available media defined to DMF, written at the capacity of the appropriate current tape drives.
- Total Data Stored: includes primary (unique) copies and risk-mitigation duplicates; from the DMF ls (library server) report of total stored.
- Unique Tape Data Stored: includes "soft-deleted" data within the 28-day grace period before "hard deletion"; from the DMF ls (library server) utility report.
- Unique Data Stored: total data from the file system perspective; from DMF dmscanfs utility output.
June 12, 2013

NCCS Earth System Grid Federation Services for NASA's and Peer Organizations' Climate Simulations, Selected NASA Observations, and Selected Analyses
GISS and GMAO researchers are using the NCCS Discover cluster for simulations in support of the fifth phase of the Coupled Model Intercomparison Project (CMIP5), which supports the Intergovernmental Panel on Climate Change's Fifth Climate Assessment (IPCC AR5) and related research. The research community accesses data via the NCCS's Earth System Grid Federation (ESGF) node: http://esgf.nccs.nasa.gov/
NASA climate simulation CMIP5 data (from GISS and GMAO when available) are available to the climate community via the Earth System Grid Data Node on the NCCS Data Portal: http://esg.nccs.nasa.gov/
Background:
- CMIP5 is the 5th phase of the Coupled Model Intercomparison Project, established by the World Climate Research Programme's Working Group on Coupled Modeling.
- Key customers for CMIP5: researchers supporting the Intergovernmental Panel on Climate Change's Fifth Climate Assessment (IPCC AR5). The IPCC was established in 1988 by the United Nations Environment Programme and the UN's World Meteorological Organization.
- CMIP5 includes near-term (also called decadal) and long-term-focus experiments. CMIP5 long-term experiment baselines: (1) pre-industrial control runs; (2) historical runs; (3) AMIP simulations (systematic Atmospheric Model Intercomparison (Project) to evaluate atmospheric model performance and identify errors).
- GISS is providing long-term-focus experiments for CMIP5: GISS uses ModelE coupled atmosphere-ocean simulations at 2x2.5 degrees, running multiple ensembles; future additional special high-resolution runs will use a 1° cubed-sphere dynamical core. ModelE2-R uses the GISS Dynamic Ocean Model (Russell); ModelE2-H uses the Hybrid Coordinate Ocean Model (HYCOM).
- GMAO is providing near-term-focused decadal simulations.
- PCMDI ESG Gateway for CMIP5/AR5 data: http://pcmdi3.llnl.gov/esgcet/home.htm
- Increases in download activity in May and June 2012 reflect downloads by researchers working to meet the July 31, 2012 deadline for papers supporting IPCC AR5 Working Group 1, "The Physical Science Basis" (https://www.ipcc-wg1.unibe.ch/).
(Chart axes: Terabytes (trillion bytes); thousands of files.)
The NCCS Data Portal serves data in CF-compliant format to support these Earth System Grid Federation projects:
- CMIP5: long-term NASA GISS simulations, and decadal simulations from NASA's GMAO, NOAA NCEP, and COLA (Center for Ocean-Land-Atmosphere Studies).
- Obs4MIPs: selected satellite observations from NASA's GPCP, TRMM, CERES-EBAF, and Terra MODIS.
- Ana4MIPs: analyses from NASA/GMAO's Modern Era Retrospective-Analysis for Research and Applications (MERRA).
- NEX-DCP30: bias-corrected, statistically downscaled (0.8-km) CMIP5 climate scenarios for the conterminous United States.
June 12, 2013

Dataportal Utilization – File Downloads

  Download     Downloaded Data Files                  Available
  Mechanism    March       April       May            Data (TB)
  Web          5,068,450   3,776,964   4,771,425       89
  ESGF           838,721     493,281   4,219,467      153
  FTP            517,941     160,465      98,902      143
  GDS          2,396,618   1,827,154   1,143,213       91

Dataportal Utilization – Data Accessed

  Download     Data Accessed (TB)          Available
  Mechanism    March    April    May       Data (TB)
  Web          11.7      8.3      9.8       89
  ESGF         22.7     11.6     29.1      153
  FTP          15.1     11.0      5.4      143

Some Discover Updates Slides (Intel Sandy Bridge and Intel Phi MIC) from the September 25, 2012 NCCS User Forum

Discover SCU8 Sandy Bridge: AVX
The Sandy Bridge processor family features Intel Advanced Vector eXtensions (AVX):
- Intel AVX is a new, wider 256-bit instruction set extension to Intel SSE (128-bit Streaming SIMD Extensions), hence higher peak FLOPS with good power efficiency.
- Designed for applications that are floating-point intensive.

Discover SCU8 Sandy Bridge: User Changes
Compiler flags to take advantage of Intel AVX (for Intel compilers 11.1 and up):
- -xavx: generate an optimized executable that runs on the Sandy Bridge processors ONLY.
- -axavx -xsse4.2: generate an executable that runs on any SSE4.2-compatible processor but includes an additional specialized code path optimized for AVX-compatible processors (i.e., it runs on all Discover processors). Application performance is affected slightly compared with "-xavx" due to the run-time checks needed to determine which code path to use.
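For illustration, the two options above might be used as in the sketch below; the source and executable names are placeholders, and the invocation is a generic Intel-compiler example rather than an NCCS-prescribed command.

    # Option 1: AVX-only build; runs ONLY on Sandy Bridge (SCU8/SCU9) nodes.
    ifort -O2 -xavx -o mymodel.avx mymodel.f90

    # Option 2: portable build with an extra AVX code path; runs on any
    # SSE4.2-capable Discover processor, dispatching to AVX where available.
    ifort -O2 -axavx -xsse4.2 -o mymodel.multi mymodel.f90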

Sandy Bridge vs. Westmere: Application Performance Comparison – Preliminary
Sandy Bridge execution speedup compared to Westmere, for the same executable and for a different executable compiled with -xavx on Sandy Bridge:

  Application              Same executable             Different executable (-xavx)
                           Core to Core  Node to Node   Core to Core  Node to Node
  WRF NMM 4km              1.15          1.50           1.35          1.80
  GEOS5 GCM half degree    1.23          1.64           1.26          1.68

Discover SCU8 – Sandy Bridge Nodes
480 IBM iDataPlex nodes, each configured with:
- Dual Intel SandyBridge 2.6 GHz processors (E5-2670), 20 MB cache.
- 16 cores per node (8 cores per socket).
- 32 GB of RAM (maintains the ratio of 2 GB/core).
- 8 floating-point operations per clock cycle.
- Quad Data Rate InfiniBand.
- SLES11 SP1.
- Advanced Vector Extensions (AVX), a new instruction set (http://software.intel.com/en-us/avx/); you just have to recompile.
Notes (January 2012 system overview):
- Discover Linux Cluster, JIBB, Data Portal, and Dirac all use Intel Xeon processors. SCU: Scalable Computational Unit. DMF: SGI's Data Migration Facility software (Hierarchical Storage Management software for the data archive). JCSDA: Joint Center for Satellite Data Assimilation. TFLOP: trillion floating-point operations per second.
- As of January 2012, Discover used several versions of Xeon chips ("Base"/Dempsey; Westmere in SCU1-4 and SCU7; Nehalem in SCU5/SCU6; Nehalem "warp" nodes with 2 NVIDIA GPUs each; Dunnington-based initial Dali nodes; Westmere-based Dali-GPU and JCSDA nodes), as detailed in the supercomputing environment slide earlier in these supporting slides.
- Seven additional servers for the Dirac server cluster, to accommodate additional I/O load from model resolution increases, CMIP5 simulations, and increased computational capacity on Discover.

Discover SCU8 – Many Integrated Cores (MIC)
The NCCS will be integrating 240 Intel MIC processors later this year (October):
- ~1 TFLOP per coprocessor unit, PCI-E Gen3 connected.
- Will start with 1 per node in half of SCU8.
How do you program for the MIC?
- Full suite of Intel compilers.
- Doris Pan and Hamid Oloso have access to a prototype version and have developed experience over the past 6 months or so.
- Different usage modes; the common ones are "offload" and "native".
- Expectation: significant performance gain for highly parallel, highly vectorizable applications.
- Easier code porting using native mode, but potential for better performance using offload mode.
NCCS/SSSO will host Brown Bags and training sessions soon!
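As a rough illustration of the two modes named above, the compile commands might look like the sketch below; the source file names are placeholders, and only the general pattern (Intel's -mmic flag for a native MIC build, versus a normal host build whose offload regions are marked with Intel offload directives in the source) is intended, not NCCS-specific settings.

    # Native mode (assumed example): cross-compile the whole application to run
    # directly on the Phi coprocessor using Intel's -mmic flag.
    ifort -O2 -mmic -o myapp.mic myapp.f90

    # Offload mode (assumed example): build normally for the host; regions
    # marked in the source with Intel offload directives (e.g., !DIR$ OFFLOAD)
    # execute on the coprocessor at run time.
    ifort -O2 -o myapp.host myapp_offload.f90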

Sandy Bridge Memory Bandwidth Performance
STREAM Copy benchmark comparison of the last three processor generations:
- Nehalem (8 cores/node)
- Westmere (12 cores/node)
- SandyBridge (16 cores/node)
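For anyone wanting to reproduce this kind of comparison, a generic sketch of building and running the standard McCalpin STREAM benchmark with OpenMP threads is shown below; the compiler flags, array size, and thread count are illustrative assumptions, not the settings NCCS used for the chart.

    # Generic STREAM build/run sketch (assumed settings, not NCCS's exact ones).
    # stream.c is the standard McCalpin STREAM source.
    icc -O3 -openmp -DSTREAM_ARRAY_SIZE=80000000 stream.c -o stream

    # Use one thread per physical core on the node under test,
    # e.g., 16 for a Sandy Bridge node.
    export OMP_NUM_THREADS=16
    ./stream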

SCU8 Finite Volume Cubed-Sphere Performance
Systems compared: JCSDA (Jibb, Westmere); Discover SCU3+/SCU4+ (Westmere); Discover SCU8 (Sandy Bridge).
Comparison of the performance of the GEOS-5 FV-CS Benchmark 4 shows an improvement of 1.3x to 1.5x over the previous systems' processors.

Discover: Large "nobackup" Augmentation
Discover NOBACKUP disk expansion:
- 5.4 Petabytes raw (about 4 Petabytes usable); doubles the disk capacity in Discover NOBACKUP.
- NetApp E5400 (http://www.netapp.com/us/products/storage-systems/e5400/).
- 3 racks and 6 controller pairs (2 per rack).
- 1,800 x 3 TB disk drives (nearline SAS).
- 48 x 8 Gb FC connections.
- A significant amount of performance testing has been performed on these systems.
- The first file systems go live this week.
If you need some space or have an outstanding request waiting, please let us know (email support@nccs.nasa.gov).