GPU Computing with Condor at The Hartford, Condor Week 2012, Bob Nordlund

Presentation transcript:

GPU Computing with Condor at The Hartford
Condor Week 2012
Bob Nordlund

Grid Computing at The Hartford…
Using Condor in our production environment since 2004
Computing Environment
– Two pools (Hartford, CT and Boulder, CO)
– Linux central managers and schedulers
– Windows execute nodes (~7000 cores)
– CycleServer from Cycle Computing LLC
Workload
– Mix of off-the-shelf tools and in-house custom software
– Actuarial modeling
– Financial reporting
– Compliance
– Enterprise risk management
  – Hedging
  – Stress testing

The Challenge…
Compress the time it takes to compute market sensitivities to enable rapid response to large market movements
– Current compute time: ~8 hours on ~3000 cores
– Target: A.S.A.P. (P being practical)
Compress the time it takes to simulate our hedging program
– Current compute time: ~5 days on ~5000 cores
– Target: 1 day
Create a mechanism to calculate specific sensitivities in near real time
Support entire model portfolio: ~20 models
Maintain accuracy and precision
Enterprise IT targets
– Reduce datacenter footprint
– Reduce costs

The Approach… (everything’s on the table)
Modeling
– Variance reduction (see the sketch after this slide)
– Optimize algorithms
– Eliminate redundant or unnecessary work
Processes
– Optimize submission pipeline
– Reduce file transfers
– Implement Master/Worker framework
Models
– Optimize code
  – Caching
  – Dynamic scenario generation
– CUDA/OpenCL/OpenMP
Infrastructure
– Improve storage
– GPUs
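As a hedged illustration of the variance-reduction item above: a minimal antithetic-variates sketch in CUDA, assuming a curand-based Monte Carlo loop. The kernel name, parameters, and the toy Black-Scholes-style payoff are illustrative assumptions, not The Hartford's actual actuarial models.

```
// Sketch only: antithetic variates on the GPU. Each thread prices one
// draw and its mirror image (-z) and averages the two payoffs, which
// lowers variance at roughly the same cost per draw.
#include <curand_kernel.h>

__global__ void price_paths_antithetic(float *payoffs, unsigned long long seed,
                                       float s0, float r, float sigma,
                                       float t, float strike, int n_paths)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_paths) return;

    curandState rng;
    curand_init(seed, tid, 0, &rng);

    float z     = curand_normal(&rng);              // one standard normal draw
    float drift = (r - 0.5f * sigma * sigma) * t;
    float vol   = sigma * sqrtf(t);

    float s_up   = s0 * expf(drift + vol * z);      // path built from +z
    float s_down = s0 * expf(drift - vol * z);      // antithetic path from -z
    float p_up   = fmaxf(s_up   - strike, 0.0f);
    float p_down = fmaxf(s_down - strike, 0.0f);

    payoffs[tid] = expf(-r * t) * 0.5f * (p_up + p_down);
}
```

A host-side reduction over payoffs then gives the estimate; the same pairing idea carries over to larger liability projections.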

The Plan…
Modeling
– Test convergence with low-discrepancy sequences (see the sketch after this slide)
– Evaluate closed-form or replicating-portfolio approaches
– Remove unnecessary workload
Processes
– Interleave scenario/liability/asset submissions
– Improve nested stochastic analysis
– Develop Master/Worker scheduling
Models
– Port the model portfolio to CUDA
– Optimize algorithms
Purchase GPU Infrastructure
– 250 NVIDIA Tesla M2070s
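The slide does not say which low-discrepancy sequence was tested; as one hedged possibility, a base-2 Halton (radical-inverse) generator could replace pseudo-random uniforms in scenario generation. The function and kernel names below are hypothetical.

```
// Sketch only: radical-inverse (Halton) low-discrepancy draws.
// Point i is reproducible from its index, and for smooth integrands
// convergence is typically faster than with pseudo-random sampling.
__host__ __device__ float halton(unsigned int index, unsigned int base)
{
    float f = 1.0f, result = 0.0f;
    while (index > 0) {
        f      /= (float)base;
        result += f * (index % base);
        index  /= base;
    }
    return result;   // uniform in (0, 1)
}

// Fill a buffer with the first n points of the base-2 sequence,
// one point per thread, as a drop-in replacement for uniform draws.
__global__ void fill_quasi_uniforms(float *u, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) u[i] = halton(i + 1, 2);
}
```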

The Results…
Modeling
– Convergence achieved faster with low-discrepancy sequences
  – 2x improvement
– Removed non-essential tasks
  – 2x improvement
Processes
– Streamlined submission pipeline for scenarios/liabilities/assets
– Eliminated ~1TB/run of file transfers
– Using Work Queue for Master/Worker
– 4-6x improvement

The Results (cont.)…
Models
– Developed code generator for CUDA
  – Automated development and end-user automation (priceless!)
  – Directly compiled spec models
– Ported entire model portfolio to CUDA (GPU) and C++ (CPU) (see the sketch below)
– 40-60x improvement
Infrastructure
– 125 servers with 250 M2070s
– 3x reduction in data center footprint
– 50% cost reduction
Summary
– Success!
  – Improved performance
  – Reduced cost
– Improved our long-term capabilities
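The code generator itself is not shown in the deck; as a hedged sketch of one dual-target pattern it could emit, a single __host__ __device__ model body can be compiled into both a CUDA kernel and a plain CPU loop. The model body below is a placeholder, not the actual actuarial logic.

```
// Sketch only: one generated model body, two execution paths.
__host__ __device__ float model_value(float scenario)
{
    // Generated model code would go here; a placeholder stands in.
    return scenario * scenario;
}

// GPU path: one scenario per thread.
__global__ void value_on_gpu(const float *scenarios, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = model_value(scenarios[i]);
}

// CPU path: the same model body in an ordinary loop, so results can be
// cross-checked between the C++ and CUDA builds.
void value_on_cpu(const float *scenarios, float *out, int n)
{
    for (int i = 0; i < n; ++i) out[i] = model_value(scenarios[i]);
}
```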

What’s Next? Complete integration of GPUs into our Condor environment
Quickly find the GPU nodes
– GPU = “None”
– SLOT1_GPU = “NVIDIA”
– SLOT2_GPU = “NVIDIA”
– STARTD_EXPRS = $(STARTD_EXPRS), GPU
Identify GPGPU submissions
– +GPGPU = True
Reserve slots for GPGPU jobs
– START = ( ( ( SlotID 2 ) && ( GPGPU =!= True ) ) )
Work with Todd on GPU wish list
– Benchmarking
– Monitoring (corrupt memory, etc.)

What’s Next? Refine our job scheduling architecture
Minimize scheduling overhead
– Continue development on our Work Queue implementation
– Leverage new Condor features – keep_claim_idle?
Optimize work distribution
– Need to prevent starvation of fast GPU resources while still leveraging existing dedicated and scavenged CPUs
– Integrate with CycleServer
High availability/disaster recovery
– Persistent queues
– Support for multiple resource pools

What’s Next? Expand Condor at The Hartford
Condor for server utilization monitoring
– Install Condor on all servers
– Improved reporting, and…
– A foot-in-the-door for scavenging!
Condor in the Cloud
Condor interoperability (MS HPC Server)
Evangelize Condor to ISVs

Bob Nordlund
Enterprise Risk Management Technology
The Hartford
Thank you!