NCCS User Forum, 22 September 2009

Agenda
– Welcome & Introduction (Phil Webster, CISTO Chief)
– Current System Status (Fred Reitz, Operations Lead)
– NCCS Compute Capabilities (Dan Duffy, Lead Architect)
– PoDS (Jules Kouatchou, SIVO)
– User Services Updates (Bill Ward, User Services Lead)
– Analysis System Updates (Tom Maxwell, Analysis Lead)
– Discover Job Monitor (Tyler Simon, User Services)
– SSP Test (Matt Koop, User Services)
– Questions and Comments (Phil Webster, CISTO Chief)

Key Accomplishments
– Incorporation of SCU5 processors into the general queue pool
– Capability to run large jobs (4,000+ cores) on SCU5
– Analysis nodes placed in production
– Migrated DMF from Dirac (IRIX) to Palm (Linux)

New NCCS Staff Members
– Lynn Parnell (Ph.D., Engineering Mechanics), High Performance Computing Lead
– Matt Koop (Ph.D., Computer Science), User Services
– Tom Maxwell (Ph.D., Physics), Analysis System Lead

Key Accomplishments
Discover/Analysis Environment:
– Added SCU5 (cluster totals: 10,840 compute CPUs, 110 TF)
– Placed analysis nodes (dali01-dali06) in production status
– Implemented storage area network (SAN)
– Implemented GPFS multicluster feature
– Upgraded GPFS
– Implemented RDMA
– Implemented InfiniBand token network
Discover/Data Portal:
– Implemented NFS mounts for select Discover data on the Data Portal
Data Portal:
– Migrated all users/applications to HP blade servers
– Upgraded GPFS
– Implemented GPFS multicluster feature
– Implemented InfiniBand IP network
– Upgraded SLES10 operating system to SP2
DMF:
– Migrated DMF from IRIX to Linux
Other:
– Migrated non-compliant AUIDs
– Transitioned SecurID operations from NCCS to ITCD
– Enhanced NCCS network redundancy

Discover 2009 Daily Utilization Percentage [chart]
– 2/4/09: SCU4 (544 cores added)
– 2/19/09: SCU4 (240 cores added)
– 2/27/09: SCU4 (1,280 cores added)
– 8/13/09: SCU5 (4,128 cores added)

Discover Daily Utilization Percentage by Group, May – August 2009 [chart]
– 8/13/09: SCU5 (4,128 cores added)

Discover Total CPU Consumption, Past 12 Months (CPU Hours) [chart]
– 9/4/08: SCU3 (2,064 cores added)
– 2/4/09: SCU4 (544 cores added)
– 2/19/09: SCU4 (240 cores added)
– 2/27/09: SCU4 (1,280 cores added)
– 8/13/09: SCU5 (4,128 cores added)

Discover Job Analysis – August 2009 [charts]

Discover Availability
Scheduled maintenance (Jun–Aug):
– 10 Jun: 17 hrs 5 min, GPFS (token and subnets)
– 24 Jun: 12 hrs, GPFS (RDMA, multicluster, SCU5 integration)
– 29 Jul: 12 hrs, GPFS, OFED 1.4, DDN firmware
– 30 Jul: 2 hrs 20 min, DDN controller replacement
– 19 Aug: 4 hrs, NASA AUID transition
Unscheduled outages (Jun–Aug):
– 16 Jun: 3 hrs 35 min, nodes out of memory
– 24 Jun: 4 hrs 39 min, maintenance extension
– 6-7 Jul: 4 hrs 18 min, internal switch error
– 13 Jul: 2 hrs 59 min, GPFS error
– 14 Jul: 26 min, nodes out of memory
– 20 Jul: 2 hrs 2 min, GPFS error
– 29 Jul: 55 min, maintenance extension
– 19 Aug: 2 hrs 45 min, maintenance extension

Current Issues on Discover: Login Node Hangs
– Symptom: Login nodes become unresponsive.
– Impact: Users cannot log in.
– Status: Developing and testing a solution. The issue arose during critical security patch installation.

Current Issues on DMF: Post-Migration Clean-Up
– Symptoms: Various.
– Impact: Various.
– Status: Issues addressed as they are encountered and reported.

Future Enhancements
Discover Cluster:
– PBS v10
– Additional storage
– SLES10 SP2
Data Portal:
– GDS OPeNDAP performance enhancements
– Use of GPFS-CNFS for improved NFS mount availability

I/O Study Team
– Dan Kokron, Bill Putman, Dan Duffy, Bill Ward, Tyler Simon, Matt Koop, Harper Pryor
– Building on work by SIVO and GMAO (Brent Swartz)

Representative GEOS Output
Dan Kokron has generated many runs to characterize GEOS I/O:
– 720-core, quarter-degree GEOS with YOTC-like history
– Number of processes that write: 67
– Total amount of data: ~225 GB (written to multiple files)
– Average write size: ~1.7 MB
– Running in dnb33
– Using Nehalem cores (GPFS with RDMA)
Average bandwidth:
– Timing the entire CFIO calls results in a bandwidth of 3.8 MB/sec
– Timing just the NetCDF ncvpt calls results in a bandwidth of 44.4 MB/sec
Why is this so slow?

Kernel Benchmarks
Used the open-source I/O kernel benchmarks xdd and IOzone:
– Achieved over 1 GB/sec to all the new nobackup file systems
Wrote two representative one-node C-code benchmarks (see the sketch below):
– Using C writes, appending to files
– Using NetCDF writes with chunking, appending to files
Ran these benchmarks writing out exactly the same data as process 0 in the GEOS run:
– C writes: average bandwidth of around 900 MB/sec (consistent with the kernel benchmarks)
– NetCDF writes: average bandwidth of around 600 MB/sec
So why is GEOS I/O running so slow?
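
A minimal sketch of the kind of one-node append-write benchmark described above, written here in Python rather than C. The target path, the write size (~1.7 MB, matching the average GEOS write), and the total volume are illustrative assumptions, not the team's actual benchmark code.

    import os, time

    WRITE_SIZE = 1_700_000          # ~1.7 MB per write, matching the average GEOS write (assumption)
    N_WRITES = 1000                 # total volume is illustrative, not the full 225 GB
    buf = os.urandom(WRITE_SIZE)

    path = "bench.dat"              # hypothetical target file on the file system under test
    t0 = time.perf_counter()
    with open(path, "ab", buffering=0) as f:   # append, unbuffered, like the C benchmark
        for _ in range(N_WRITES):
            f.write(buf)
        os.fsync(f.fileno())        # force data to disk so the timing is honest
    elapsed = time.perf_counter() - t0
    mb = WRITE_SIZE * N_WRITES / 1e6
    print(f"wrote {mb:.0f} MB in {elapsed:.1f} s -> {mb/elapsed:.1f} MB/sec")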

Effect of NetCDF Chunking
How does changing the NetCDF chunk size affect overall performance?
– The table showed runs varying the chunk size, averaged over 10 runs per chunk size, using the NetCDF kernel benchmark
– The smallest chunk size reproduces the GEOS bandwidth; as best we can tell, it is roughly equivalent to the default chunk size
– The best chunk size turned out to be about the size of the array being written, ~3 MB
[Table: Chunk size (# floats), Chunk size (KB), Average bandwidth (MB/sec); chunk sizes ranged from 1K floats to tens of MB, but the numeric values did not survive transcription]
References: "NetCDF-4 Performance Report" (Lee et al.); the NetCDF on-line tutorial; "Benchmarking I/O Performance with GEOSdas" and other Modeling Guru posts
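
As an illustration of the experiment behind the table, here is a hedged Python/netCDF4 sketch that creates a variable with an explicit chunk shape roughly matching one 2-D write (1080 x 721, about 3 MB of 32-bit floats). The file and variable names are hypothetical, and netCDF4-python stands in for the Fortran CFIO layer GEOS actually uses.

    from netCDF4 import Dataset
    import numpy as np

    nlon, nlat = 1080, 721                      # one 2-D level, ~3 MB of 32-bit floats
    ds = Dataset("chunk_test.nc", "w", format="NETCDF4")   # hypothetical output file
    ds.createDimension("time", None)
    ds.createDimension("lev", 1)
    ds.createDimension("lat", nlat)
    ds.createDimension("lon", nlon)

    # Chunk shape equal to one write, (time, lev, lat, lon) = (1, 1, 721, 1080),
    # mirroring the "best chunk size ~ array being written" result above.
    var = ds.createVariable("t", "f4", ("time", "lev", "lat", "lon"),
                            chunksizes=(1, 1, nlat, nlon))

    field = np.random.rand(nlat, nlon).astype("f4")
    for t in range(10):                         # a few records stand in for a history stream
        var[t, 0, :, :] = field
    ds.close()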

Setting Chunk Size in GEOS
– Dan Kokron ran several baseline runs to make sure we were measuring things correctly
– Turned on chunking and set the chunk size equal to the write size (1080 x 721 x 1 x 1)
– Dramatic improvement in ncvpt bandwidth
– Why was the last run so slow? Because we had a file system hang during that run
[Table: Run, Description, ncvpt bandwidth (MB/sec); only the last value, 45.17, survived transcription]
– Baseline 1: baseline run with time stamps at each "wrote" statement
– Baseline 2: printed time stamps before and after the call to ncvpt
– Baseline 3: printing of the time stamps moved after the call to ncvpt
– Using NetCDF chunking: initial run with NetCDF chunking turned on
– Using NetCDF chunking and Fortran buffering (1): I/O buffering in the Intel I/O library on top of NetCDF chunking
– Using NetCDF chunking and Fortran buffering (2): same as the previous run, with very different results (45.17 MB/sec; the file system hang)

What Next?
Further explore chunk sizes in NetCDF:
– What is the best chunk size?
– Do you set the chunk size for write performance or for read performance?
– Once a file has been written with a given chunk size, it cannot be changed without rewriting the file.
Better understand the variability seen in file system performance:
– It is not uncommon to see a 2x or greater difference in performance from run to run.
Turn the NetCDF kernel benchmark into a multi-node benchmark:
– Use this benchmark for testing system changes and potential new systems.
Compare performance across NCCS and NAS systems.
Write up the results.

Ticket Closure Percentiles, 1 March to 31 August 2009 [chart]

Issue: Parallel Jobs > 1,500 CPUs
– Original problem: Many jobs wouldn't run at more than 1,500 CPUs
– Status at last forum: Resolved using a different version of the DAPL library
– Current status: Now able to run large jobs (4,000+ cores) using MVAPICH on SCU5

Issue: Getting Jobs into Execution
Long waits for queued jobs before launching. Reasons:
– SCALI=TRUE is restrictive
– Per-user and per-project limits on the number of eligible jobs (use qstat -is)
– Scheduling policy: first-fit on the job list, ordered by queue priority and queue time
User Services will be contacting users of SCALI=TRUE to assist them in migrating away from this feature.

Future User Forums
NCCS User Forum schedule:
– 8 Dec 2009, 9 Mar 2010, 8 Jun 2010, 14 Sep 2010, and 7 Dec 2010
– All on Tuesdays
– All 2:00-3:30 PM
– All in Building 33, Room H114
Published on the NCCS web site and on the GSFC-CAL-NCCS-Users calendar.

Sustained System Performance (SSP)
What is the overall system performance?
– Many different benchmarks and peak numbers are available, but they are often unrealistic or not relevant.
– SSP refers to a set of benchmarks that evaluates performance relative to real workloads on the system (a sketch of the idea follows below).
– The SSP concept originated at NERSC (LBNL).
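
A minimal sketch of the SSP idea, assuming the common NERSC-style formulation in which the composite number is the geometric mean of per-application performance rates scaled by the system size. The application names, rates, and core count below are made up for illustration and are not NCCS measurements.

    from math import prod

    # Hypothetical per-core rates measured from representative user workloads;
    # these numbers are illustrative only.
    rates = {
        "geos_like_app":  1.8,
        "ocean_model":    2.4,
        "chem_transport": 1.1,
    }

    cores = 4128   # e.g., the size of SCU5; any fixed count works for trending over time

    def ssp(per_core_rates, ncores):
        """Composite SSP: geometric mean of per-core rates, scaled to the system size."""
        vals = list(per_core_rates.values())
        geo_mean = prod(vals) ** (1.0 / len(vals))
        return geo_mean * ncores

    print(f"SSP = {ssp(rates, cores):.0f} (arbitrary units; track this over time)")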

Performance Monitoring
Not just for evaluating a new system. Ever wonder whether a system change has affected performance?
– Changes can often be subtle and go undetected by normal system validation tools (silent corruption, slowness).
– Find out immediately, instead of after running the application and getting an error.

Performance Monitoring (cont'd)
Run real workloads (SSP) to determine performance changes over time:
– Quickly determine if something is broken or slow
– Perform data verification
Run automatically on a regular basis as well as after system changes:
– e.g., a change to a compiler, MPI version, or OS update
[Chart: NERSC SSP example]

Meaningful Measurements
How you can help:
– We need your application and a representative dataset for your application
– Ideally it should take ~20-30 minutes to run at various processor counts
Your benefits:
– Changes to the system that affect your application will be noticed immediately
– Data will be placed on the NCCS website to show system performance over time

Discover Job Monitor
All data is presented as a current system snapshot, updated at 5-minute intervals. The monitor:
– Displays system load as a percentage
– Displays the number of running jobs and running cores
– Displays queued jobs and job wait times
– Displays current qstat -a output
– Provides an interactive historical utilization chart
– Displays the message of the day
– Displays the average number of cores per job
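
A hedged sketch of the kind of snapshot such a monitor might take, using Python to call qstat -a (a real PBS command) and count running versus queued jobs. The column layout of qstat -a varies by site, so the parsing here is an assumption to be checked against local output.

    import subprocess

    def qstat_snapshot():
        """Return (running, queued) job counts from `qstat -a`.
        Assumes the job state is the next-to-last column ('R' or 'Q'),
        which is typical for PBS but not guaranteed everywhere."""
        out = subprocess.run(["qstat", "-a"], capture_output=True, text=True, check=True).stdout
        running = queued = 0
        for line in out.splitlines():
            fields = line.split()
            if not fields or not fields[0][0].isdigit():   # skip headers and separators
                continue
            state = fields[-2] if len(fields) >= 2 else ""
            if state == "R":
                running += 1
            elif state == "Q":
                queued += 1
        return running, queued

    if __name__ == "__main__":
        r, q = qstat_snapshot()
        print(f"running jobs: {r}, queued jobs: {q}")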

Climate Data Analysis
Climate models are generating ever-increasing amounts of output data. Larger datasets are making it increasingly cumbersome for scientists to perform analyses on their desktop computers. Server-side analysis of climate model results is quickly becoming a necessity.

Parallelizing Application Scripts
Many data-processing shell scripts (MATLAB, IDL, etc.) can be easily parallelized:
– Use task parallelism to process multiple files in parallel
– Each file is processed on a separate core within a single dali node
– Limit load on dali (16 cores per node): max 10 compute-intensive processes per node
Serial version:
    while ( … )
      …                # process another file
      run.grid.qd.s
      …
    end
Parallel version:
    while ( … )
      …                # process another file
      run.grid.qd.s &
      …
    end
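
For workflows driven from Python instead of csh, the same task parallelism can be expressed with a worker pool capped at 10 processes, matching the per-node load limit above. The input file pattern and the run.grid.qd.s helper invocation are illustrative assumptions.

    import glob
    import subprocess
    from multiprocessing import Pool

    def process_file(path):
        # Each task runs the same serial analysis script on one file (hypothetical script name).
        subprocess.run(["./run.grid.qd.s", path], check=True)
        return path

    if __name__ == "__main__":
        files = sorted(glob.glob("output/*.nc"))      # hypothetical input file pattern
        # Cap at 10 workers so a 16-core dali node is not saturated by compute-intensive tasks.
        with Pool(processes=10) as pool:
            for done in pool.imap_unordered(process_file, files):
                print("finished", done)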

ParaView
– Open-source, multi-platform visualization application developed by Kitware, Inc. (authors of VTK)
– Designed to process large data sets; built on parallel VTK
– Client-server architecture:
  – Client: Qt-based desktop application
  – Data server: MPI-based parallel application on dali
– Parallel streaming filters for data processing
– Large library of existing filters
– Highly extensible using plugins (plugin development required for HDF, NetCDF, and OBS data)
– No existing climate-specific tools or algorithms
– Data server being integrated into ESG
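
A hedged pvpython sketch of the client-server pattern described above: connecting a client to a remote ParaView data server and opening a dataset. paraview.simple is ParaView's standard scripting interface, but the host name, port, file path, and server setup on dali are assumptions, not NCCS configuration.

    from paraview.simple import Connect, OpenDataFile, Show, Render

    # Connect to a pvserver assumed to be running on an analysis node (hypothetical host/port).
    Connect("dali01", 11111)

    # Open a dataset on the server side; the reader runs inside the MPI data server.
    reader = OpenDataFile("/path/to/history_file.vtk")   # hypothetical path; the format needs a reader/plugin

    Show(reader)   # build a render pipeline for the client
    Render()       # render; the heavy data stays on the server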

ParaView Client
Qt desktop application that controls data access, processing, analysis, and visualization.

ParaView Client Features [figure]

Analysis Workflow Configuration
Configure a parallel streaming pipeline for data analysis.

ParaView Applications
– Polar vortex breakdown simulation
– Cross-wind fire simulation
– Golevka asteroid explosion simulation
– 3D Rayleigh-Bénard problem

Climate Data Analysis Toolkit (CDAT)
– Integrated environment for data processing, visualization, and analysis
– Integrates numerous software modules in a Python shell
– Open source with a large, diverse set of contributors
– Analysis environment for ESG (developed at LLNL)

Data Manipulation
– Exploits the NumPy array and masked array
– Adds persistent climate metadata
– Exposes NumPy, SciPy, and RPy mathematical operations: clustering, FFT, image processing, linear algebra, interpolation, maximum entropy, optimization, signal processing, statistical functions, convolution, sparse matrices, regression, spatial algorithms
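
A small illustration of the masked-array foundation mentioned above, using plain NumPy. CDAT's own variable objects wrap this with persistent climate metadata (axes, units); that is only mimicked here with an ordinary dictionary, and the values are made up.

    import numpy as np
    import numpy.ma as ma

    # A toy 3 x 4 temperature field with two missing cells.
    data = np.array([[280.1, 281.4, 282.0, 283.3],
                     [279.5, 280.2, 281.1, 282.6],
                     [278.9, 279.8, 280.7, 281.9]])
    mask = np.zeros_like(data, dtype=bool)
    mask[0, 2] = mask[2, 1] = True              # pretend these values are missing

    field = ma.MaskedArray(data, mask=mask)

    # Stand-in for the persistent metadata CDAT attaches to its variables.
    metadata = {"name": "tas", "units": "K", "lat": [10, 20, 30], "lon": [0, 90, 180, 270]}

    print("zonal mean (masked values ignored):", field.mean(axis=1))
    print("metadata:", metadata["name"], metadata["units"])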

Grid Support
Spherical Coordinate Remapping and Interpolation Package (SCRIP):
– Remapping and interpolation between grids on a sphere
– Maps between any pair of lat-lon grids
GridSpec:
– Standard description of earth system model grids
– To be implemented in the NetCDF CF convention
– Implemented in CMOR
MoDAVE:
– Grid visualization

Climate Analysis
genutil & cdutil (PCMDI):
– General utilities for climate data analysis: statistics, array and color manipulation, selection, etc.
– Climate utilities: time extraction, averages, bounds, interpolation, masking/regridding, region extraction
PyClimate:
– Toolset for analyzing climate variability
– Empirical Orthogonal Function (EOF) analysis (see the sketch below)
– Analysis of coupled data sets: Singular Value Decomposition (SVD), Canonical Correlation Analysis (CCA)
– Linear digital filters
– Kernel-based probability density function estimation
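
As a minimal sketch of what an EOF analysis boils down to (independent of PyClimate's own API, which is not shown here): remove the time mean from a (time x space) data matrix and take its SVD. The right singular vectors are the EOF spatial patterns, and the left singular vectors scaled by the singular values are the principal-component time series. The random data stands in for a real anomaly field.

    import numpy as np

    rng = np.random.default_rng(0)
    ntime, nspace = 120, 500                 # e.g., 10 years of monthly fields on 500 grid points
    X = rng.standard_normal((ntime, nspace)) # stand-in for a climate field (time x space)

    anom = X - X.mean(axis=0)                # anomalies: remove the time mean at each point
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)

    eofs = Vt                                # rows are EOF spatial patterns
    pcs = U * s                              # columns are the corresponding PC time series
    explained = s**2 / np.sum(s**2)          # fraction of variance per mode

    print("variance explained by first 3 modes:", np.round(explained[:3], 3))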

CDAT Climate Diagnostics
– Provides a common environment for climate research
– Uniform diagnostics for model evaluation and comparison: Taylor diagram, thermodynamic plot, performance portrait plot, Wheeler-Kiladis analysis

Contributed Packages
PyGrADS (potential), AsciiData, BinaryIO, ComparisonStatistics, CssGrid, DsGrid, Egenix, EOF, EzTemplate, HDF5Tools, IOAPITools, Ipython, Lmoments, MSU, NatGrid, ORT, PyLoapi, PynCl, RegridPack, ShGrid, SP, SpanLib, SpherePack, Trends, Twisted, ZonalMeans, ZopeInterface

Visualization
Visualization and Control System (VCS):
– Standard CDAT 1D and 2D graphics package
Integrated contributed 2D packages:
– Xmgrace, Matplotlib, IaGraph
Integrated contributed 3D packages:
– ViSUS, VTK, NcVTK, MoDAVE

Visual Climate Data Analysis Tools (VCDAT)
CDAT GUI that facilitates:
– Data access
– Data processing & analysis
– Data visualization
Accepts Python input (commands and scripts)
Saves state (converts keystrokes to Python)
Online help

MoDAVE
– Visualization of mosaic grids (e.g., cubed-sphere visualization)
– Parallelized using MPI
– Integration into CDAT in progress
– Developed by Tech-X & LLNL

ViSUS in CDAT
– Data streaming application: progressive processing & visualization of large scientific datasets
– Future capabilities for petascale dataset streaming
– Simultaneous visualization of multiple (1D, 2D, 3D) data representations

VisTrails
– Scientific workflow and provenance management system
– Interface for the next version of CDAT: history trees, data pipelines, visualization spreadsheet, provenance capture

Background
– Scientists generate large data files
– Processing the files consists of executing a series of independent tasks
– Ensemble runs of models
– All the tasks are run on one CPU

PoDS
– Task parallelism tool that takes advantage of distributed architectures as well as multi-core capabilities
– For running serial, independent tasks across nodes
– Makes no assumptions about the underlying applications to be executed
– Can be ported to other platforms

PoDS Features
– Dynamic assessment of resource availability
– Each task is timed
– A summary report is provided

Task Assignment
[Diagram: Commands 1-9 (and beyond) from the execution file are distributed across Node 1, Node 2, and Node 3]

PoDS Usage
    pods.py [-help] [execFile] [CpusPerNode]
– execFile: file listing all the independent tasks to be executed
– CpusPerNode: number of CPUs per node. If not provided, PoDS will automatically use the number of CPUs available in each node.
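
For concreteness, a hypothetical execution file and invocation in the style described above. The script name, input files, and CPU count are made up; only the pods.py calling convention comes from the slide.

    # run_tasks.lst -- one independent command per line (hypothetical contents)
    ./process_month.py jan.nc
    ./process_month.py feb.nc
    ./process_month.py mar.nc

    # Launch PoDS with 8 CPUs per node (omit the count to let PoDS detect it):
    pods.py run_tasks.lst 8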

Simple Example
– The application randomly generates an integer n between 0 and 10^9
– It loops over n to perform some basic operations
– Each time the application is called, a different n is obtained
– We want to run the application 150 times
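
A hedged Python rendition of this toy example (the original's language and exact operations are not specified in the slides): a function that does the random-n busy work, plus a helper that writes the 150 corresponding task lines into an execution file for PoDS. The wrapper script name busy_work.py is hypothetical.

    import random

    def busy_work():
        """Stand-in for the serial application: pick a random n, then do some basic operations."""
        n = random.randint(0, 10**9)
        total = 0
        for i in range(n % 1_000_000):   # bound the loop so each task stays short (assumption)
            total += i * i
        return n, total

    def write_execution_file(path="tasks.lst", ntasks=150):
        """Write 150 task lines, one per run of the application, for PoDS to distribute."""
        with open(path, "w") as f:
            for _ in range(ntasks):
                f.write("python busy_work.py\n")   # hypothetical wrapper script name

    if __name__ == "__main__":
        print(busy_work())
        write_execution_file()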

Timing Numbers
[Table: Nodes, Cores/Node, Time (s); values not preserved in the transcript]

More Information
– User's Guide on ModelingGuru
– Package available at /usr/local/other/pods

Important Contacts
– NCCS Support
– Analysis Lead
– I/O Improvements
– PoDS Info
– User Services Lead