
Scheduling a 100,000 Core Supercomputer for Maximum Utilization and Capability
September 2010
Phil Andrews, Patricia Kovatch, Victor Hazlewood, Troy Baer

Outline
- Intro to NICS and Kraken
- Weekly utilization averages >90% for 6+ weeks
- How 90% utilization was accomplished on Kraken
  – System scheduling goals
  – Policy change based on past work
  – Influencing end user behavior
  – Scheduling and utilization details: a closer look at three specific weeks
- Conclusion and Future Work

National Institute for Computational Sciences
- JICS and NICS are a collaboration between UT and ORNL
- UT was awarded the NSF Track 2B award ($65M)
- Phased deployment of Cray XT systems, reaching 1 PF in 2009
- Total JICS funding ~$100M

Kraken in October 2009
- #4 fastest machine in the world (Top500, June 2010)
- First academic petaflop system
- Delivers over 60% of all NSF cycles
  – 8,256 dual-socket nodes with 16 GB of memory each
  – 2.6 GHz six-core AMD Istanbul processor per socket
  – 1.03 petaflops peak performance (99,072 cores)
  – Cray SeaStar2 torus interconnect
  – 3.3 petabytes of DDN disk (raw)
  – 129 terabytes of memory
  – 88 cabinets
  – 2,200 sq ft
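The 1.03 PF peak figure is consistent with the core count and clock rate if one assumes four double-precision floating-point operations per core per cycle (typical for this AMD generation; the assumption is ours, not stated on the slide). A minimal sanity-check sketch:

```python
# Rough peak-performance check for Kraken.
# Assumes 4 DP flops/core/cycle (usual figure for AMD Istanbul; not from the slides).
cores = 99_072
clock_hz = 2.6e9          # 2.6 GHz
flops_per_cycle = 4       # assumed: SSE add + multiply pipes, 2 DP lanes each

peak_flops = cores * clock_hz * flops_per_cycle
print(f"Peak: {peak_flops / 1e15:.2f} PFLOPS")   # ~1.03 PFLOPS
```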

[Chart: Kraken Cray XT5 weekly utilization (percent) by date, October 2009 – June 2010]

Kraken Weekly Utilization
- The previous slide shows:
  – Weekly utilization over 90% for 7 of the last 9 weeks. Excellent!
  – Weekly utilization over 80% for 18 of the last 21 weeks. Very good!
  – Weekly utilization over 70% each week since implementing the new scheduling policy in mid-January (red vertical line)
- How was this accomplished?
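For reference, weekly utilization in these charts can be read as delivered core-hours divided by the core-hours available in the week. A minimal sketch of that calculation, with an illustrative delivered total (not a number from the slides):

```python
# Weekly utilization = delivered core-hours / available core-hours.
# The delivered total below is illustrative only.
cores = 99_072
hours_per_week = 7 * 24                      # 168 h, ignoring any PM outage

available_core_hours = cores * hours_per_week
delivered_core_hours = 15_000_000            # hypothetical delivered total

utilization = delivered_core_hours / available_core_hours
print(f"Weekly utilization: {utilization:.1%}")   # ~90.1%
```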

How was 90% utilization accomplished?
- Taking a closer look at Kraken:
  – Scheduling goals
  – Policy
  – Influencing user behavior
  – Analysis of three specific weeks:
    - Nov 9: one month into production with the new configuration
    - Jan 4: during a typical slow month
    - Mar 1: after implementation of the policy change

System Scheduling Goals
1. Capability computing: allow "hero" jobs that run at or near the 99,072-core maximum size in order to produce new scientific results
2. Capacity computing: deliver as many floating-point operations as possible to Kraken users (keep utilization high)
- These are typically antagonistic aspirations for a single system; scheduling algorithms for capacity computing can lead to inefficiencies
- Goal: improve utilization of a large system while still allowing large capability runs. Attempt to do both capability and capacity computing!
- Prior experience at SDSC led to a new approach
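To make the tension concrete, every drain for a capability job idles a growing fraction of the machine while running work finishes. A back-of-the-envelope sketch, with the drain time and idle fraction assumed purely for illustration:

```python
# Illustrative cost of draining a 99,072-core system before a capability run.
# Drain time and idle fraction are assumptions, not Kraken measurements.
cores = 99_072
avg_drain_hours = 12        # assumed: time for running jobs to finish
avg_idle_fraction = 0.5     # assumed: nodes sit idle ~half the drain on average

idle_core_hours = cores * avg_drain_hours * avg_idle_fraction
week_core_hours = cores * 7 * 24
print(f"Idle core-hours per drain: {idle_core_hours:,.0f}")
print(f"Utilization lost per weekly drain: {idle_core_hours / week_core_hours:.1%}")
```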

Policy
- The normal approach to capability computing is to accept large jobs and include a priority weighting factor that increases with queue wait time, leading to an eventual drain of the system to run the large capability job
- The major drawback is that this can reduce the overall usage of the system
- The next slide illustrates this
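A minimal sketch of this conventional wait-time weighting, with hypothetical weights and job sizes (this is not the actual scheduler configuration used on Kraken): as the capability job's wait grows, it rises to the top of the queue and the scheduler must drain nodes to start it.

```python
from dataclasses import dataclass

# Hypothetical priority function: priority grows with queued wait time,
# so a large "hero" job eventually outranks everything else and the
# scheduler drains the machine to start it.
@dataclass
class Job:
    name: str
    cores: int
    wait_hours: float

def priority(job: Job, size_weight: float = 1.0, wait_weight: float = 100.0) -> float:
    # Both weights are illustrative; real schedulers expose similar knobs.
    return size_weight * job.cores + wait_weight * job.wait_hours

queue = [
    Job("hero", 99_072, wait_hours=36.0),     # full-machine capability job
    Job("midsize", 12_000, wait_hours=2.0),
    Job("small", 256, wait_hours=0.5),
]

for job in sorted(queue, key=priority, reverse=True):
    print(f"{job.name:8s} priority={priority(job):12,.0f}")
```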

[Chart: Typical large-system utilization over time; red arrows indicate system drains for capability jobs]

Policy Change
- Based on past experience at SDSC, the new approach is to drain the system on a periodic basis and run the capability jobs in succession
- Allow "dedicated" runs: full-machine jobs with the job owner given sole access to Kraken; this was needed for file system performance
- Allow "capacity" runs: near-full-machine jobs without dedicated system access
- Coincide the dedicated and capacity runs with the weekly Preventative Maintenance (PM) window

Policy Change (continued)
- A reservation is placed so that the scheduler drains the system prior to the PM window
- After PM, dedicated jobs are run in succession, followed by capacity jobs run in succession
- In weeks with no PM, no dedicated jobs are run
- In weeks with no PM, capacity jobs are limited to a specific time period
- This had a drastic effect on system utilization, as we will show!
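A minimal sketch of the weekly cycle described on these two slides: reserve and drain ahead of PM, run dedicated jobs back to back, then capacity jobs, then return to normal scheduling. All times, durations, and job names are illustrative assumptions, not Kraken's actual configuration:

```python
from datetime import datetime, timedelta

# Illustrative PM-aligned weekly cycle; times and durations are assumed.
PM_START = datetime(2010, 3, 3, 8, 0)   # hypothetical Wednesday 08:00 PM window
PM_HOURS = 4
MAX_DRAIN_HOURS = 12                    # longest-running job allowed before PM

def weekly_schedule(dedicated_jobs, capacity_jobs):
    """Yield (time, event) pairs for one PM-aligned weekly cycle."""
    t = PM_START - timedelta(hours=MAX_DRAIN_HOURS)
    yield t, "reservation begins: no new jobs that would overlap the PM window"
    t = PM_START
    yield t, "preventative maintenance"
    t += timedelta(hours=PM_HOURS)
    for job in dedicated_jobs:          # full machine, owner-only access
        yield t, f"dedicated run: {job['name']} ({job['hours']} h)"
        t += timedelta(hours=job["hours"])
    for job in capacity_jobs:           # near-full machine, shared access
        yield t, f"capacity run: {job['name']} ({job['hours']} h)"
        t += timedelta(hours=job["hours"])
    yield t, "system returns to normal capacity scheduling"

dedicated = [{"name": "hero_climate", "hours": 6}]
capacity = [{"name": "big_md", "hours": 4}, {"name": "big_cfd", "hours": 3}]
for when, event in weekly_schedule(dedicated, capacity):
    print(when.strftime("%a %H:%M"), "-", event)
```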

Influencing User Behavior
- To encourage capability computing, NICS instituted a 50% discount for running dedicated and capacity jobs
- Discounts were applied after job completion
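A minimal sketch of how such a post-completion discount might be applied to a job's service-unit (SU) charge. The 50% rate is from the slide; the 1 SU = 1 core-hour charging rule and the function below are assumptions:

```python
# Hypothetical SU accounting with a 50% capability discount applied
# after the job completes (credited back to the allocation).
DISCOUNT = 0.50   # from the slides: 50% off dedicated and capacity runs

def charge_su(cores: int, wall_hours: float, capability: bool) -> float:
    """Return the SUs charged for a job; capability runs get the discount."""
    base = cores * wall_hours                # assumed: 1 SU = 1 core-hour
    return base * (1.0 - DISCOUNT) if capability else base

# Example: a 12-hour full-machine dedicated run.
full_machine = charge_su(99_072, 12.0, capability=True)
print(f"Charged SUs: {full_machine:,.0f}")   # half of 99,072 * 12 core-hours
```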

Utilization Analysis
- The following weekly utilization charts show the dramatic effect of running such a large system and of implementing the policy change for successive capability runs

[Chart: Utilization prior to the policy change; 55% average]

[Chart: Utilization during a slow period; 34% average]

[Chart: Utilization after the policy change; 92% average, with only one system drain]

Conclusions
- Running a large computational resource while allowing capability computing can coincide with high utilization if the right balance among goals, policy, and user influences is struck

Future Work
- Automate this type of scheduling policy
- Develop methods to evaluate the storage requirements of capability jobs prior to execution, to help prevent job failures due to file system exhaustion
- Automate the setup of dedicated runs
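One possible shape for the storage pre-check mentioned above: compare a job's declared scratch requirement against free space on the file system before launch. A hedged sketch, with the mount point, safety margin, and declared size purely illustrative:

```python
import shutil

# Hypothetical pre-execution check: hold a capability job if its declared
# scratch requirement would not fit in the file system's free space.
SCRATCH_MOUNT = "/lustre/scratch"     # illustrative mount point
SAFETY_MARGIN = 0.10                  # keep 10% headroom (assumed policy)

def scratch_ok(declared_bytes: int, mount: str = SCRATCH_MOUNT) -> bool:
    usage = shutil.disk_usage(mount)  # named tuple: total, used, free
    headroom = usage.total * SAFETY_MARGIN
    return declared_bytes <= usage.free - headroom

# Example: a job declaring 500 TB of scratch output.
if not scratch_ok(500 * 10**12):
    print("Job held: insufficient scratch space for declared requirement")
```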