HTCondor at Syracuse University – Building a Resource Utilization Strategy
Eric Sedore, Associate CIO
HTCondor Week 2017

Research Computing Philosophy @ Syracuse
- Good to advance research, best to transform research (though transformation is not always related to scale)
- Entrepreneurial approach to collaboration and ideas
- Computing resources are only one part of supporting research
- Strive to use computational resources at 100% utilization, 100% of the time
- Computational resources must support multiple academic areas

Computational Resources @ Syracuse
- Academic Virtual Hosting Environment (AVHE) – private cloud
  - 1,000 cores, 25 TB of memory
  - Individual VMs (students, faculty, staff), small clusters
  - 2 PB of storage (NFS, SMB, DAS per VM), multiple performance tiers
- OrangeGrid – high-throughput computing pool
  - Scavenged desktop grid: 13,000 cores, 17 TB of memory
- Crush – compute-focused cloud
  - Coupled with the AVHE to provide HPC and HTC environments
  - Heterogeneous hardware; different areas within Crush are focused on different needs (high IO, latency/bandwidth, high memory requirements…)
  - 12,000 cores (24,000 slots with HT), 50 TB of memory
- SUrge – GPU-focused compute cloud
  - 240 commodity NVIDIA GPUs
  - Individual VMs / nodes scheduled via HTCondor
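A natural companion to an inventory like this is a quick utilization check across the pools. The sketch below is only illustrative (it is not the Syracuse tooling): it uses the HTCondor Python bindings to tally claimed versus total cores per pool, and the collector hostnames in POOLS are placeholder assumptions.

```python
# Minimal utilization sketch using the HTCondor Python bindings.
# The collector hostnames below are placeholders, not Syracuse's real ones.
import htcondor

POOLS = {
    "OrangeGrid": "og-collector.example.edu",    # hypothetical collector addresses
    "Crush": "crush-collector.example.edu",
    "SUrge": "surge-collector.example.edu",
}

for pool, collector_host in POOLS.items():
    coll = htcondor.Collector(collector_host)
    slots = coll.query(htcondor.AdTypes.Startd,
                       projection=["Name", "State", "Cpus"])
    total = sum(ad.get("Cpus", 1) for ad in slots)
    claimed = sum(ad.get("Cpus", 1) for ad in slots if ad.get("State") == "Claimed")
    pct = 100.0 * claimed / total if total else 0.0
    print(f"{pool}: {claimed}/{total} cores claimed ({pct:.1f}%)")
```

A report like this, run periodically, is one simple way to track progress toward the "100% utilization, 100% of the time" goal.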

Resource Deployment
- Researchers can utilize existing "standard" environments or build a unique environment
- "Virtual clusters" – network, data, scheduling
- Tools for deploying and managing 10,000+ VMs across 4 virtual environments (KVM, Hyper-V, vSphere, VirtualBox)
- Virtualize everything – systems for building nodes, no affiliation, everything loosely coupled (i.e., researchers never touch bare metal)
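The "deploying and managing 10,000+ VMs" bullet is where most of the in-house scripting lives. Purely as an illustration of the KVM portion (the actual Syracuse tooling is not shown in the slides), a minimal libvirt-based sketch might look like the following; the domain XML, image paths, sizing, and naming scheme are all assumptions.

```python
# Hedged sketch of scripted worker-VM deployment on the KVM side only
# (the talk's tooling also drives Hyper-V, vSphere, and VirtualBox).
import libvirt

DOMAIN_XML_TEMPLATE = """
<domain type='kvm'>
  <name>{name}</name>
  <memory unit='GiB'>{mem_gib}</memory>
  <vcpu>{vcpus}</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/{name}.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

def deploy_worker(conn, name, vcpus=4, mem_gib=8):
    """Define and start a worker VM that joins the HTCondor pool on boot."""
    xml = DOMAIN_XML_TEMPLATE.format(name=name, vcpus=vcpus, mem_gib=mem_gib)
    dom = conn.defineXML(xml)   # register the domain with libvirt
    dom.create()                # power it on
    return dom

if __name__ == "__main__":
    conn = libvirt.open("qemu:///system")
    for i in range(4):
        deploy_worker(conn, f"crush-worker-{i:04d}")
    conn.close()
```

Because the workers only ever join the pool as HTCondor startds, the same pattern can be repeated per hypervisor without researchers ever touching bare metal.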

Allocation of Resources
- Public Science (E@H...)
- Open Science Grid (OSG)
- Hybrid and Opportunistic
- Syracuse Researchers
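One way to see how that split plays out in practice is to ask the collector who currently holds the claimed slots. The sketch below is illustrative only: the owner names in CATEGORY_BY_OWNER are hypothetical, and a production setup would more likely lean on HTCondor accounting groups rather than this ad-hoc mapping.

```python
# Rough sketch of monitoring how the pool is split among the consumer
# groups on this slide. Owner-to-category mapping is hypothetical.
from collections import Counter
import htcondor

CATEGORY_BY_OWNER = {            # hypothetical submit-side owners
    "osgpilot": "Open Science Grid (OSG)",
    "boinc": "Public Science",
}

coll = htcondor.Collector()      # default (local) collector
ads = coll.query(htcondor.AdTypes.Startd,
                 constraint='State == "Claimed"',
                 projection=["RemoteOwner", "Cpus"])

usage = Counter()
for ad in ads:
    owner = str(ad.get("RemoteOwner", "")).split("@")[0]
    category = CATEGORY_BY_OWNER.get(owner, "Syracuse Researchers")
    usage[category] += ad.get("Cpus", 1)

for category, cores in usage.most_common():
    print(f"{category}: {cores} cores")
```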

What resources should Syracuse provide?
- "Large scale / specialized" research – accomplished in national infrastructure
  - 10,000+ cores, 100s of TBs of memory, PBs of data
  - Provided by national resources
  - Not enough need (today) to invest at this level
- "Medium scale" research – accomplished in clusters
  - 1,000s of cores, 10s of TBs of memory, TBs of data
- "Small / medium scale" research – accomplished in the cloud
  - 1–200 cores, 1 GB–2 TB of memory, TBs of data
  - Individual virtual machines to small clusters
  - Provided by Syracuse
  - Utilization at 85+% (from an IT perspective)
- "Small scale" research – accomplished on desktops/laptops
  - 1–4 cores, 1–16 GB of memory, GBs of data

Core Elements
- HTCondor
  - Primary tool for resource scheduling – everything (almost) else is a pain!
  - Node advertising capabilities
  - Simplicity of addition/removal of nodes (part of its scavenging roots)
  - Flexibility – from small, simple environments to larger, more complicated ones
- Virtualization (KVM, Hyper-V, vSphere, VirtualBox)
  - Abstraction – the shim allows us to easily reallocate resources, including networking and storage
  - Flexibility – easy to run multiple kinds of workload (Windows/Linux)
- In-house coding / scripting – primarily in management / deployment, interacting with hypervisors
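As a concrete but purely illustrative view of the "simplicity of addition/removal of nodes" point, the snippet below uses the HTCondor Python bindings to peacefully retire one worker, roughly what `condor_off -peaceful` does from the command line. The hostname is a placeholder and this is not the Syracuse automation itself.

```python
# Hedged sketch: retiring a worker node without killing its running jobs.
import htcondor

NODE = "crush-worker-0001.example.edu"   # hypothetical worker hostname

coll = htcondor.Collector()
master_ad = coll.locate(htcondor.DaemonTypes.Master, NODE)

# Ask the node's condor_master to shut its daemons down peacefully,
# i.e. let running jobs finish and accept no new matches.
htcondor.send_command(master_ad, htcondor.DaemonCommands.DaemonsOffPeaceful)
print(f"Requested peaceful shutdown of {NODE}")
```

Adding a node back is the reverse: start the VM and its startd advertises itself to the collector with no central reconfiguration required.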

Pain Points
- VM management – we have ~20 VM environments within Crush alone
  - Versioning, automation, best-of-breed VM vs. monolith VM
  - What do we need? Singularity / Docker. When do we need it? Now!
- Staff expertise
  - Complexity, staff resources, single-person dependencies – systems designed to be operated by a fraction of a staff member
  - Nuance/elegance is lost; often the "right way" is set aside out of necessity to move on to the next task

Musings on Our HTCondor Experience
- The law of unintended consequences is alive and well – changes always have impact
- There is a knob for everything…
- Logging is spectacular, deep, and voluminous – "a blessing and a curse"
- You can have multiple versions of HTCondor components in your environment, but anecdotally you will occasionally find "odd" interactions