Simulating Condor Stephen McGough, Clive Gerrard & Jonathan Noble Newcastle University Paul Robinson, Stuart Wheater Arjuna Technologies Limited Condor.

Presentation transcript:

Simulating Condor Stephen McGough, Clive Gerrard & Jonathan Noble Newcastle University Paul Robinson, Stuart Wheater Arjuna Technologies Limited Condor Week 2012

Overview Motivation and Background Condor Simulation Power Management Evaluation Conclusion

Motivation
Newcastle University has a strong desire to reduce energy consumption
  – Currently powering down computers and buying low-power PCs
  – “If a computer is not ‘working’ it should be powered down”
Can we go further to reduce wasted time?
  – Reduce computer idle time
  – Identify wasteful work sooner?
We have a number of policies we’d like to evaluate
  – Difficult on a running system, measuring power
Aims
  – Investigate policies for reducing energy consumption
  – Determine the impact on high-throughput users

Condor At Newcastle
Comprises ~1300 open-access computers based around campus in 35 ‘clusters’
All computers at least dual core, moving to quad / 8 core
[Charts: Job Submissions; User Logins]

Cluster Locations
Old Library Basement
  – Cluster room needs heating all year
  – PUE < 1 (offset heat from computers against room heating)
  – Average idle time between users < 5 hours
MSc Computing Cluster
  – South-facing cluster room in high tower
  – PUE > 1 (needs air-con all year)
  – Average idle time between users < 8 hours
Robinson Library
  – Very high turnover and usage of computers; room is hot and sunny
  – PUE > 1
  – Average idle time between users < 2 hours
School of Chemistry (Chart)
  – Very low usage of computers
  – PUE ~ 1
  – Average idle time between users ~23 hours
Power Usage Effectiveness (PUE) – depends on location of computer (and time)
Power Efficiency: efficiency = flops / (PUE × watts)
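
As a worked illustration of the power efficiency formula above, the following minimal Python sketch compares the same machine placed in two rooms. The flops, wattage and PUE figures are hypothetical values chosen for illustration and are not taken from the slides.

```python
def power_efficiency(flops: float, pue: float, watts: float) -> float:
    """Return efficiency = flops / (PUE * watts)."""
    if pue <= 0 or watts <= 0:
        raise ValueError("pue and watts must be positive")
    return flops / (pue * watts)


# Hypothetical figures: an identical PC in a room that needs heating
# (PUE < 1) and in a room that needs air-conditioning (PUE > 1).
heated_room = power_efficiency(flops=40e9, pue=0.9, watts=100.0)
cooled_room = power_efficiency(flops=40e9, pue=1.4, watts=100.0)

# The lower the PUE, the more useful work per effective watt.
print(heated_room > cooled_room)
```

Under this formula a computer in a room with PUE < 1 scores higher than an identical computer in a room with PUE > 1, which is why location matters when ranking computers by energy efficiency.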

Condor Simulation
High Level Simulation of Condor
  – Trace logs from the last year are used as input
  – User Logins / Logouts (computer used)
  – Condor Job Submission times (and duration)
  – Cluster open times and policy
[State diagram labels: Active User / Condor, Idle, Sleep]
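
One way to picture a trace-driven simulation of this kind is a function that classifies a computer's state at a given time from its login and job traces. The sketch below is an assumption-laden illustration, not the authors' simulator: the `Interval` type, the state names and the `sleep_after` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float  # minutes since the start of the trace
    end: float


def state_at(t, user_sessions, condor_jobs, sleep_after):
    """Classify one computer at time t as active, idle or asleep."""
    if any(s.start <= t < s.end for s in user_sessions):
        return "active_user"
    if any(j.start <= t < j.end for j in condor_jobs):
        return "active_condor"
    # Most recent moment the machine stopped being busy (0 if never busy).
    busy_ends = [i.end for i in user_sessions + condor_jobs if i.end <= t]
    last_busy = max(busy_ends, default=0.0)
    return "sleep" if t - last_busy >= sleep_after else "idle"


users = [Interval(0, 60)]   # hypothetical interactive login
jobs = [Interval(70, 100)]  # hypothetical Condor job
for minute in (30, 65, 80, 140):
    print(minute, state_at(minute, users, jobs, sleep_after=30))
```

Replaying such a function over a year of traces, with a power figure attached to each state, is one plausible way to estimate energy use under different policies.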

Power State Policy
P1: Computers are always on
P2: On during cluster open hours and off otherwise, no mechanism to wake up
P3: Computers sleep after n minutes of inactivity with no remote wake up
P4: Sleep after n minutes of inactivity but can be remotely woken up
P5: Sleep after n minutes of inactivity but Condor is only informed every m minutes
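
To make the difference between an always-on policy and a sleep-after-n-minutes policy concrete, here is a minimal sketch of the energy used during one idle gap. The wattages and the default n are hypothetical, and the sketch deliberately ignores remote wake-up, so P3 and P4 behave identically here.

```python
def idle_gap_energy_wh(gap_min, policy, n_min=15.0, idle_w=50.0, sleep_w=2.0):
    """Energy (Wh) used by one computer over an idle gap of gap_min minutes.

    idle_w and sleep_w are assumed power draws, not measured values.
    """
    if policy == "P1":            # always on
        awake = gap_min
    elif policy in ("P3", "P4"):  # sleep after n minutes of inactivity
        awake = min(gap_min, n_min)
    else:
        raise ValueError(f"policy not modelled in this sketch: {policy}")
    asleep = gap_min - awake
    return (awake * idle_w + asleep * sleep_w) / 60.0


print(idle_gap_energy_wh(120, "P1"))
print(idle_gap_energy_wh(120, "P3"))
```

With these assumed figures, a two-hour gap costs far less energy under P3 than under P1, since the machine draws idle power for only the first n minutes.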

Computer Selection Policy
S1: No preference (random)
S2: Target most energy efficient computers
S3: Target least used computers
  – Least number of interactive logins
  – Largest intervals between logouts and logins
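
The three selection policies can be sketched as a single selection function. This is an illustrative sketch only: the `Computer` fields and the `select` function are hypothetical names, and S3 is approximated here by the login count alone.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Computer:
    name: str
    efficiency: float  # assumed to be flops / (PUE * watts)
    logins: int        # interactive logins seen in the trace


def select(computers, policy, rng=None):
    """Pick a computer for the next job under policy S1, S2 or S3."""
    if not computers:
        raise ValueError("no computers available")
    if policy == "S1":  # no preference: choose at random
        return (rng or random).choice(computers)
    if policy == "S2":  # most energy efficient
        return max(computers, key=lambda c: c.efficiency)
    if policy == "S3":  # least used by interactive users
        return min(computers, key=lambda c: c.logins)
    raise ValueError(f"unknown policy: {policy}")


pool = [
    Computer("basement-01", efficiency=4.4e8, logins=900),
    Computer("chemistry-07", efficiency=4.0e8, logins=40),
    Computer("library-12", efficiency=2.9e8, logins=5000),
]
print(select(pool, "S2").name, select(pool, "S3").name)
```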

Management Policy
M1: Computer is idle for at least n minutes before a Condor job can run on it
M2: If a job is started more than n times mark it as ‘miscreant’ and don’t re-start
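
The two management policies can be sketched as a simple idle check and a start counter. This sketch assumes one reading of M2, namely that a job becomes ‘miscreant’ once its start count exceeds n; the class and function names are hypothetical.

```python
from collections import Counter


def can_run_job(idle_minutes: float, n: float) -> bool:
    """M1: a computer must be idle for at least n minutes first."""
    return idle_minutes >= n


class StartTracker:
    """M2: count job starts and flag jobs started more than n times."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.starts = Counter()

    def record_start(self, job_id: str) -> None:
        self.starts[job_id] += 1

    def is_miscreant(self, job_id: str) -> bool:
        return self.starts[job_id] > self.n


tracker = StartTracker(n=2)
for _ in range(3):  # the same job is evicted and started three times
    tracker.record_start("job-42")
print(tracker.is_miscreant("job-42"), can_run_job(idle_minutes=10, n=15))
```

A scheduler using this tracker would refuse to re-start any job for which `is_miscreant` returns true.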

Cluster Change Policy
C1: Dedicated computers for ‘miscreant’ jobs
  – Run these jobs on computers where they can’t be evicted
C2: High-throughput jobs defer nightly reboots
C3: High-throughput jobs use computers at the same time as interactive users

We can save energy (with minimal user impact)
  – P4 is the best power state policy
  – S3: greater impact on overhead
  – S2: greater impact on power consumption
  – These could be merged
  – M2 can kill off lots of good jobs; fix this by using C1
  – Benefits of C2 and C3 are lost due to the number of miscreant jobs; need a better way to identify these
  – Policies are not mutually exclusive: could save ~70 MWh (~60% of current usage) without significant impact on high-throughput users
  – Powering down clusters saves the most energy
Looking for other uses
  – Already simulated running jobs on Cloud
  – Do others have data we could use?

Questions?