NCCS User Forum December 7, 2010
Agenda – December 7, 2010
Welcome & Introduction (Phil Webster, CISTO Chief)
Current System Status (Fred Reitz, NCCS Operations Manager)
SCU7 and Other NCCS Systems Updates (Dan Duffy, NCCS Lead Architect)
Analysis Software Update (Tom Maxwell, NCCS Analysis Lead)
User Services Update (Tyler Simon, NCCS User Services Group)
Questions & Wrap-Up (Phil Webster)
Accomplishments
Discover Linux cluster: SCU7 coming online (159 TFLOPs peak)
– Power and cooling issues being addressed by Dell at high levels
Dirac Mass Storage archive (DMF)
– Disk cache nearly quadrupled, to 480 TB
– Server moved to a distributed DMF cluster
Science support
– GRIP field campaign (Genesis and Rapid Intensification Processes), completed September 2010
  Provided monitoring and troubleshooting for timely execution of jobs
  Supported the forecast team via image and data download services on the Data Portal
– IPCC AR5 (ongoing) – Intergovernmental Panel on Climate Change Fifth Assessment Report
  Climate jobs running on Discover
  Data publication via the Earth System Grid “Data Node” on the DataPortal
Operations and Maintenance
NASA Center for Climate Simulation (NCCS) Project
Fred Reitz, December 7, 2010
Accomplishments
Discover
– Completed GRIP mission support
– Continued SCU7 preparations
  Installed SCU7 Intel Xeon “Westmere” hardware (159 TF peak)
  Installed system image (SLES 11)
  Integrated SCU7 into the existing InfiniBand fabric with SCU5 and SCU6
  Troubleshooting power and cooling issues
Mass Storage
– Archive upgrade: placed new hardware into production
  New machines with fast processors:
    Interactive logins: two Nehalem servers
    Parallel Data Movers serving NCCS’s tape drives: three Nehalem servers
    DMF daemon processes: two Westmere servers in a high-availability configuration
    NFS serving to Discover: two Nehalem servers
  Scalable (Palm was too expensive to maintain)
  Increased disk storage
– Troubleshooting issues: one of the first distributed DMF clusters deployed in production, and it has the most demanding workload to date
DataPortal
– New database service (upgraded hardware)
– Database service accessible via Dali and Discover login nodes
– Additional storage (including datastage)
Discover Total CPU Consumption Past 12 Months (CPU Hours)
Discover Total Utilization Past 12 Months (Percentage)
Discover Utilization by Architecture Past 12 Months
Discover Workload Distribution and Utilization by Architecture – October
Discover CPU Consumption by Group Past 12 Months
Discover CPU Consumption by Top Queues Past 12 Months
Discover Job Analysis by Job Size and Queue – October
Discover Availability Past 12 Months
Archive Data Stored
NCCS pays an SGI license based on data stored (upgrading from 15 to 20 PB).
Please remove any data you know you do not need.
DataPortal File Downloads
NCCS Network Availability Past 12 Months
Archive Issues
1. Two CXFS bugs (server panic; CXFS failover does not complete)
– Sending diagnostic information to SGI
– SGI providing recommendations
2. Slow failover during actual problems (causes delayed resource availability)
– Sending diagnostic information to SGI
– SGI providing recommendations
3. System-level token lock (causes filesystem hangs)
– Fix applied
– Next step: address new issue on Discover (see next slide), then reactivate NFS edge servers, then mount archive filesystems on Dali
4. Inappropriate failover when nothing was wrong
– Automated monitoring errors
– Implemented SGI-provided monitoring script changes
– No further occurrences
General SGI support-related comments
– Eager to resolve problems
– Actual SGI code developers providing assistance
– Support personnel very responsive
Discover Issues
1. Intermittent slow I/O
– Can lead to system hangs
– GPFS is designed for larger files and streaming I/O
– NCCS is working with IBM, monitoring the system and sending diagnostics to IBM as problems occur
– Plan to implement a partial fix on 15 December
– Please work with NCCS and SIVO if your application exhibits high file open/close activity or uses many small files (see the sketch after this list)
2. Jobs take longer to begin execution (insufficient capacity)
– SCU7 will help
3. Users running nodes out of memory can cause GPFS hangs
– Please work with NCCS and SIVO if your application runs nodes out of memory
4. New intermittent data corruption issue
– Copying data from the archive to Discover via NFS sometimes results in null blocks
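To illustrate the small-files point, here is a minimal, hypothetical sketch (not NCCS-provided code; it assumes the Python netCDF4 and numpy packages, and the file name, variable, and sizes are made up) of aggregating many per-timestep arrays into a single file, so GPFS sees one open/close instead of thousands:

    # Hypothetical sketch: aggregate many small per-timestep arrays into one
    # netCDF file instead of writing thousands of tiny files on GPFS.
    import numpy as np
    from netCDF4 import Dataset

    n_steps, ny, nx = 100, 181, 360                  # example dimensions

    with Dataset("aggregated_output.nc", "w") as ds:  # one open/close total
        ds.createDimension("time", None)              # unlimited record dimension
        ds.createDimension("lat", ny)
        ds.createDimension("lon", nx)
        var = ds.createVariable("temperature", "f4", ("time", "lat", "lon"))

        for t in range(n_steps):
            field = np.random.rand(ny, nx).astype("f4")  # stand-in for model output
            var[t, :, :] = field                         # append one record per step

Writing one record-oriented file per run instead of one file per step keeps metadata traffic low and suits GPFS's preference for larger files and streaming I/O.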
Upcoming Changes
LDAP hardware upgrade
Data Exploration Theater performance and functionality
Discover
– I/O performance changes
– PBS v10
– SLES 11
– SCU7
DataPortal – iRODS
Mass Storage – additional updates for increased stability
Security – firewall hardware, software, and rule set changes
Network – IP address changes
SCU7 Updates
Current status
– System is physically installed, configured, and attached to the Discover cluster
– Running the SLES 11 operating system (an upgrade over the current version on Discover)
– Will be running PBS v10 when generally accessible to all users
– System is not in production; it is in a dedicated pioneer state running large-scale GEOS runs
Current issues
– Power and cooling concerns must be addressed prior to production
Performance Looks Good!
Running 8 cores per node gives performance equivalent to the Nehalem nodes (as expected).
Running 12 cores per node results in about a 20% slowdown versus the same number of cores on Nehalem nodes (also as expected from previous measurements).
No difference in performance across different versions of MPI.
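As a back-of-the-envelope reading of these numbers (illustrative Python only; it assumes the roughly 20% per-core slowdown applies uniformly and the workload scales across all cores):

    # Rough throughput comparison implied by the measurements above.
    nehalem_cores_per_node = 8
    westmere_cores_per_node = 12
    relative_per_core_speed = 0.8        # ~20% slower per core at 12 cores/node

    node_throughput_8 = nehalem_cores_per_node * 1.0                        # 8.0
    node_throughput_12 = westmere_cores_per_node * relative_per_core_speed  # 9.6

    print(f"12 cores/node vs 8 cores/node: {node_throughput_12 / node_throughput_8:.2f}x")

Under that assumption, using all 12 cores still yields roughly 20% more work per node than leaving 4 cores idle, at the cost of longer per-core runtimes.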
What’s Next with SCU7?
NCCS and Dell are actively working on the power and cooling issues.
The system will be maintained in a dedicated pioneer phase for now
– Needed to minimize the disruption caused by changes to the system
Target general availability by the end of January.
Ultra-Analysis Requirements
Parallel streaming analysis pipelines (see the conceptual sketch after this list).
Data parallelism.
Task parallelism.
Parallel IO.
Remote interactive execution.
Advanced visualization.
Provenance capture.
Interfaces for scientists.
Workflow construction tools.
Visualization interfaces.
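For orientation only, here is a toy Python sketch of what a streaming analysis pipeline means in practice (synthetic data and generator stages; this is not UV-CDAT or ParaView code):

    # Conceptual sketch: data flows through stages one chunk at a time,
    # so no stage needs the full dataset in memory.
    import numpy as np

    def read_chunks(n_chunks, chunk_shape=(10, 90, 180)):
        """Stage 1: stream synthetic data chunks (stand-in for file reads)."""
        for _ in range(n_chunks):
            yield np.random.rand(*chunk_shape)

    def zonal_mean(chunks):
        """Stage 2: reduce each chunk as it arrives (a data-parallel candidate)."""
        for chunk in chunks:
            yield chunk.mean(axis=-1)      # average over the last (longitude) axis

    def running_max(chunks):
        """Stage 3: accumulate a running statistic across the stream."""
        result = None
        for chunk in chunks:
            peak = chunk.max()
            result = peak if result is None else max(result, peak)
        return result

    # Compose the pipeline: each stage pulls lazily from the previous one.
    print("max zonal mean:", running_max(zonal_mean(read_chunks(5))))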
BER Earth System Modeling Proposals
Advanced Scientific Visualization of Ultra-Large Datasets

Topic                  PI              Location
UV-CDAT development    Dean Williams   LLNL, PCMDI
VisIt development      Wes Bethel      LBNL
NCO parallelization    Robert Jacob    ANL
Climate Data Analysis Toolkit (CDAT)
Integrated environment for data processing, visualization, and analysis.
Integrates numerous software modules in a Python shell.
Open source with a large, diverse set of contributors.
Analysis environment for the Earth System Grid (ESG); developed at LLNL.
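As a flavor of the CDAT Python shell, a minimal sketch of a typical session (the file name and variable below are hypothetical, not from the slides):

    # Minimal CDAT sketch: read a variable with cdms2 and plot it with vcs.
    import cdms2, vcs

    f = cdms2.open("ta_sample.nc")        # open a CF-compliant netCDF file (hypothetical)
    ta = f("ta", time=slice(0, 1))        # read the first time step of variable "ta"
    canvas = vcs.init()                   # create a VCS plotting canvas
    canvas.plot(ta)                       # default boxfill plot with map and axes
    canvas.png("ta_first_timestep")       # save the plot to a PNG file
    f.close()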
Earth System Grid
UV-CDAT Architecture
ParaView
Open-source, multi-platform visualization application.
– Developed by Kitware, Inc. (authors of VTK).
Designed to process large data sets.
Built on parallel VTK.
Client-server architecture:
– Client: Qt-based desktop application.
– Data Server: MPI-based parallel application.
Parallel streaming I/O and pipeline for data processing.
Large library of existing filters.
Highly extensible using plugins.
No existing climate-specific tools or algorithms.
Data Server being integrated into ESG.
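For a flavor of scripting that pipeline, here is a minimal sketch using ParaView's Python interface, paraview.simple (the data file and array names are hypothetical, and this is not an NCCS-specific workflow):

    # Sketch of driving the ParaView pipeline from Python; run with pvpython.
    from paraview.simple import (OpenDataFile, Contour, Show, Render,
                                 SaveScreenshot)

    reader = OpenDataFile("fields.vtk")            # create a reader source (hypothetical file)
    contour = Contour(Input=reader)                # add a contour filter to the pipeline
    contour.ContourBy = ["POINTS", "temperature"]  # hypothetical point-data array
    contour.Isosurfaces = [280.0, 290.0, 300.0]    # isosurface values to extract
    Show(contour)                                  # add the result to the active view
    Render()                                       # draw the scene
    SaveScreenshot("contours.png")                 # write an image to disk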
ParaView Client: Qt desktop application that controls data access, processing, analysis, and visualization.
ParaView Applications
– Polar Vortex Breakdown Simulation
– Cross Wind Fire Simulation
– Golevka Asteroid Explosion Simulation
– 3D Rayleigh-Bénard problem
Python VTK Tools
Integrate VTK 3D visualization into GrADS and CDAT for climate science applications.
Develop high-level Python/Qt interfaces to simplify common scientific visualization tasks.
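A minimal sketch of what calling VTK from Python looks like (generic VTK only; the sphere source is a stand-in for real climate data and is not from the slides):

    # Minimal VTK-from-Python sketch: build a render pipeline and display it.
    import vtk

    source = vtk.vtkSphereSource()                 # placeholder geometry
    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInputConnection(source.GetOutputPort())

    actor = vtk.vtkActor()                         # drawable object backed by the mapper
    actor.SetMapper(mapper)

    renderer = vtk.vtkRenderer()
    renderer.AddActor(actor)

    window = vtk.vtkRenderWindow()                 # on-screen render window
    window.AddRenderer(renderer)

    interactor = vtk.vtkRenderWindowInteractor()   # mouse/keyboard interaction
    interactor.SetRenderWindow(window)

    window.Render()
    interactor.Start()                             # interactive 3D browsing loop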
Analysis Workflows
Analysis Workflow Configuration
Configure a parallel streaming pipeline for data analysis
UV-CDAT Interface
NCDAS
Task-parallel interactive data analysis
Multiple views of a dataset
DV3D Visualization & analysis application. Built in python on MayaVi / VTK. Tailored for climate scientists. Simple intuitive interface. Integrated analysis frameworks: pyGrads, UVCDAT. Multiple views via hyperwall. Interactive 3D data browsing. Available on dali, DET, ford1 39
DV3D
Interactive 3D visualization of simulation data
DV3D Display Panel
DV3D Volume Rendering Panel
Analysis Tools Under Development
Integrate DV3D into UV-CDAT / VisTrails / ParaView / ESG
Remote analysis and visualization of climate data on dali
– Clients: hyperwall, desktop, SC-11
In-situ analysis and real-time visualization of GEOS-5 runs
3D stereo visualization of climate data in the DET
UV-CDAT use cases:
– Hurricane tracking with high-resolution GEOS-5 data
– Seeking scientific use cases
User Services
Intel MPI 4.0 and MVAPICH2 1.6?
Discover node profiles:

Processor type   Cores/node   Frequency, GHz (flops/clock)   Memory per node         Total cores   PBS selector
Dempsey          4            3.2 (2)                        4 GB (1 GB per core)    520           proc=demp
Woodcrest        4            2.66 (4)                       4 GB (1 GB per core)    2,064         proc=wood
Harpertown       8            2.5 (4)                        16 GB (2 GB per core)   4,128         proc=harp
Nehalem          8            2.8 (4)                        24 GB (3 GB per core)   8,256         proc=neha
Westmere         12           2.8 (4)                        24 GB (2 GB per core)   14,400        proc=west?

Example node request: #PBS -l select=64:ncpus=8:proc=neha
User Services
Matlab licenses?
Expansion Factor = (Wait time + Job Runtime) / Job Runtime
– Minimum expansion factor is 1, when wait time = 0.
Future scheduling enhancements?
– Time to Solution (TTS) = Wait time + Job Runtime
– Does anyone have a problem with this?
– You want to know the TTS, not the expansion factor, right?
– Naturally we want to focus on providing and minimizing Time to Solution for every job.
NCCS Job Monitor
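A quick worked example of the two metrics, with made-up job times:

    # Worked example of the scheduling metrics above (hypothetical times).
    wait_hours = 2.0       # time the job sat in the queue
    run_hours = 4.0        # time the job actually ran

    time_to_solution = wait_hours + run_hours                  # 6.0 hours
    expansion_factor = (wait_hours + run_hours) / run_hours    # 1.5

    print(f"Time to Solution: {time_to_solution:.1f} h")
    print(f"Expansion Factor: {expansion_factor:.2f}")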
Contact Information
NCCS User Services:
Thank you