A Cost-Benefit Analysis of a Campus Computing Grid
Preston Smith, Rosen Center for Advanced Computing, Purdue University
Condor Week 2011

Overview
- Introduction
  - The Problem
  - Significance of the Problem
- Methodology
  - Costs
  - Benchmarking
  - Capacity
  - Utility
- Findings
- Conclusions

Background
Purdue University Campus Grid
- Large, high-throughput computation resource – 42,000 processor cores
- Frequently linked to efforts to reduce IT costs
- Claims include:
  - Power savings, maximizing the investment in IT
  - An HPC resource built from existing equipment
  - No marginal cost increase

The Problem
What is the additional cost of having a campus grid?
- On top of the existing IT investment
- People say it's basically zero – but how close is it in reality?
An institution needs this information for designing an HPC resource
- Therefore, I define a model for identifying the costs and benefits of building a campus grid

Significance of the Problem

Appropriate Computations
An IU study reports:
- 66% of all jobs on the TeraGrid were single-CPU jobs
- 80% of those jobs ran for two hours or less
Purdue University:
- 35.4 million single-core serial jobs
- Average runtime of 1.35 hours
- This is 21% of all HPC hours consumed at Purdue
A large amount of work is appropriate for a campus grid
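
A quick arithmetic sketch of the serial-workload figures above; the implied total of Purdue HPC core-hours is derived from the slide's numbers, not stated in the deck:

```python
# Rough arithmetic behind the serial-workload share (job count and runtime from the slide).
serial_jobs = 35.4e6           # single-core serial jobs at Purdue
avg_runtime_hours = 1.35       # average runtime per job, in hours

serial_hours = serial_jobs * avg_runtime_hours    # ~47.8 million core-hours
implied_total_hours = serial_hours / 0.21         # if that is 21% of all HPC hours consumed

print(f"Serial core-hours: {serial_hours / 1e6:.1f} M")
print(f"Implied total HPC core-hours at Purdue: {implied_total_hours / 1e6:.1f} M")
```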

Significance of the Problem: Size of the Grid Resource
- 27,000 desktop machines at Purdue
- 2 cores per machine – 54,000 cores on desktops
- 30,000 cores of HPC clusters
- 84,000 cores potentially usable by the grid
- 40,000 used by the grid today
Only 17 systems on the 2010 Top 500 have more than 40,000 cores!
- 200 TF theoretical performance – a top-20 machine

Significance of the Problem: Power Cost of Desktop Computers
- 111 W idle
- 160 W at full load
Purdue's 27,000 desktops:
- 2.99 MW of draw at idle, for a total of 26,253.7 MWh per year
Idle to fully loaded:
- Estimated additional cost of $393,… per year
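
A minimal sketch of the power arithmetic, using the per-desktop wattages above. The electricity rate is an assumption (it is not given in the deck), picked only so the result lands near the quoted annual figure:

```python
# Desktop fleet power arithmetic (fleet size and wattages from the slide).
desktops = 27_000
idle_w, loaded_w = 111, 160        # watts per desktop
hours_per_year = 8_760

idle_mw = desktops * idle_w / 1e6                # ~2.99 MW drawn while idle
idle_mwh_per_year = idle_mw * hours_per_year     # ~26,254 MWh per year

# Going from idle to fully loaded adds (160 - 111) W per machine.
extra_mwh_per_year = desktops * (loaded_w - idle_w) / 1e6 * hours_per_year

rate_per_kwh = 0.034               # assumed electricity rate in $/kWh (not from the slides)
extra_cost_per_year = extra_mwh_per_year * 1_000 * rate_per_kwh

print(f"Idle draw: {idle_mw:.2f} MW, {idle_mwh_per_year:,.1f} MWh/year")
print(f"Idle-to-loaded: {extra_mwh_per_year:,.0f} MWh/year, roughly ${extra_cost_per_year:,.0f}/year")
```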

Methodology
- Identify and calculate baseline costs
  - Clusters
  - Desktop, student lab IT
- Identify and calculate additional costs
  - Staff, power, hardware
- Measure capacity of the grid
  - Sample the state of the grid over a 2-week period
- Benchmark
  - Condor nodes
  - Amazon EC2
- Normalize costs
  - To Amazon EC2
- Collect and report output of the grid
  - Cost per productivity metric

Pre-Normalized Costs
Per-hour cost:
- Labs: $…
- Steele: $…
- Coates: $…
- Condor: $0.03
- EC2: $0.17
Labs, Steele, and Coates are all derived from Purdue TCO data
"Condor" is the average of all three
"EC2" is the retail price (per core) of an EC2 "Large" instance

Normalization
Internal benchmarks
- Condor runs and presents two predefined benchmarks:
  - KFlops (LINPACK)
  - MIPS (Dhrystone)
- Are these benchmarks meaningful enough to normalize cost?
Application benchmark
- Use a benchmark that relates to the real performance of an application
- NAS Parallel Benchmarks
- Single-CPU BT and SP, Class C

Normalization - Benchmarks
- EC2 is 17.1% faster than the average Purdue Condor node
- 0.829 cost scaling factor

Normalized Cost Model
- C_n: normalized per-core-hour cost
- C_h: pre-normalized per-core-hour cost
- F_n: a constant representing the normalizing factor of one hour on the grid to one hour on EC2
Normalized core-hour cost: $…
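
The formula itself did not survive the transcript, so the direction of the normalization below is an assumption: treating F_n as the fraction of an EC2 hour's work delivered by one grid hour, the normalized cost is the grid's cost per EC2-equivalent core-hour. A minimal sketch using the $0.03 Condor figure and the 0.829 benchmark factor:

```python
# Normalized per-core-hour cost (the division is an assumed reading of the model).
c_h = 0.03     # pre-normalized Condor per-core-hour cost, from the cost table
f_n = 0.829    # normalizing factor: one grid hour ~ 0.829 EC2 hours of work

c_n = c_h / f_n   # cost per EC2-equivalent core-hour
print(f"Normalized core-hour cost C_n ~ ${c_n:.4f}")
```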

Additional Costs
Item: total yearly cost
- Systems Engineering (1 FTE): $73,…
- User Support (0.75 FTE): $55,…
- Distributed IT Staff (0.1 FTE): $11,…
- Additional Power Load: $290,…
Amortized over 5 years:
- Submit Nodes: $6,…
- Checkpoint Servers, etc.: $8,…
Total: $433,502.01

Additional Costs – Per Core Hour
- C_a: additional per-core-hour cost
- E_y: total yearly additional cost of operating the grid
- S_a: total available slots in the grid
- H_y: total hours in a year
C_a = E_y / (S_a × H_y)
$433,502.01 spread over the grid's available slot-hours: tenths of one cent!
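
A sketch of the additional-cost calculation. E_y is the total from the table above; the slot count is a placeholder, since the slide's exact S_a value is not legible in this transcript (40,000 is the grid-slot figure mentioned earlier in the deck):

```python
# Additional per-core-hour cost: C_a = E_y / (S_a * H_y).
e_y = 433_502.01   # total yearly additional cost, from the Additional Costs table
s_a = 40_000       # available slots -- placeholder; the slide's exact value is not legible
h_y = 8_760        # hours in a year

c_a = e_y / (s_a * h_y)
print(f"Additional per-core-hour cost C_a ~ ${c_a:.5f} (about {c_a * 100:.2f} cents)")
```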

Power Costs Actual additional power expense

Total Cost
- C_t: total per-core-hour cost
- C_n: total normalized base cost of the campus grid
- C_a: total additional cost per core hour
C_t = C_n + C_a
Total per-core-hour cost: $0.03985

Scientific Output – Raw Metrics
[Table, from Rosen Center usage metrics: per-year unique users, unique PIs, unique PI departments, fields of science, jobs (millions), and hours. Most cell values did not survive transcription; the recoverable yearly hour totals grow from 4.61 M and 8.17 M in the earlier years to 16.6 M, 17.9 M, and 18.6 M in the most recent three years.]

Solutions or Publications as Metrics
Solutions per unit of time is one metric recommended in the literature
- How much good computation was done in those millions of hours?
- But, from the perspective of the institution, this is hard to obtain
- Only the user knows how many of those jobs were scientifically useful!
Publications are the end goal of research, so they are an excellent measure of output
- Unfortunately, no data exists on publications directly attributable to the campus grid

Per-Metric Costs
- C_tot: the total yearly cost per unit of M_u
- H_y: total hours provided in a year
- C_t: total cost of an hour of use in the campus grid
- M_u: metric of use (such as users or PIs)
C_tot = (C_t × H_y) / M_u
Example, 2005: … users and …,945,723 hours work out to roughly $3,… per user
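
A sketch of the per-metric calculation using the formula implied by the definitions above. The 2005 figures on the slide are only partially legible, so the hour and user counts below are illustrative placeholders, chosen so the result lands in the roughly $3,… per user range the slide reports:

```python
# Cost per unit of a usage metric: C_tot = (C_t * H_y) / M_u.
c_t = 0.03985        # total per-core-hour cost, from the Total Cost slide
h_y = 2_000_000      # hours delivered that year -- illustrative placeholder
m_u = 25             # unique users that year -- illustrative placeholder

c_tot = (c_t * h_y) / m_u
print(f"Cost per user ~ ${c_tot:,.2f}")
```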

Additional Costs, Per Metric

Year      Per User   Per PI      Per PI Dept   Per Field of Science
2005      $284.75    $889.83     $1,423.73     $1,…
2006      $241.13    $625.14     $1,…          …
2007      $259.87    $597.69     $1,867.78     $1,…
2008      $528.17    $1,012.32   $4,672.24     $3,…
2009      $403.96    $774.66     $3,658.12     $4,…
2010      $469.54    $861.81     $3,404.15     $4,…
Average   $364.57    $793.58     $2,760.08     $2,…

Averages: 11.3 million hours, 105 users, 107,300 hours per user, $… a user

Additional Costs, Per Metric

Summary
Measured relative performance of grid nodes
- 0.829 relative to Amazon EC2
Developed models and calculated per-core-hour costs
- Normalized: $…
- Additional: $…
- Total: $0.03985
Calculated costs per unit of several metrics
- For example, for each user of the grid in 2010, the additional cost to Purdue is $469.54
- On average, each user costs Purdue an extra $364.57

Recommendation
A campus grid is indeed a cost-effective way to create a useful HPC resource
- Any institution with a substantial investment in an IT infrastructure should consider a campus grid to support HPC
Questions?

The End
Questions?