Energy Prediction for I/O Intensive Workflow Applications. MASc Exam. Hao Yang, NetSysLab, The Electrical and Computer Engineering Department, The University of British Columbia
Background - Workflow Applications. Figure: Montage workflow (legend: computation, file dependency). Characteristics: file-based communication; large number of tasks; large amount of I/O; common data access patterns.
Background - Application Execution. Figure: the workflow runtime engine runs application tasks on compute nodes, each with local storage; the tasks exchange files through a central storage system (e.g., GPFS, NFS). File-based communication and large I/O volume lead to an I/O bottleneck.
Background - Intermediate Storage System. Figure: application tasks on the compute nodes, each with local storage, share an intermediate storage layer; the workflow runtime engine stages data in from, and out to, the central storage system (e.g., GPFS, NFS).
Background - Context of this thesis. This work focuses on workflow application execution on intermediate storage systems.
Research Problem – Energy Consumption. The pursuit of performance used to dominate conventional computing. Energy efficiency is the new concern. Figure labels: Computing Equipment, Energy Bill.
Research Problem - Configuration Decisions. Configuring the runtime system is complex (example: resource allocation decision). Figure: Energy-Delay Product (EDP) for the Montage workload.
Research Problem - Questions. Q1: What performance optimizations in storage systems lead to energy savings? Q2: What is the performance and energy impact of power-centric tuning techniques? Q3: How can users balance time-to-solution and energy consumption when given a target application?
Outline: Background; Research Problem; Methodology; Evaluation; Conclusion.
Methodology – Building Energy Consumption Predictor. The goal of this work is to build an energy consumption predictor to aid system configuration and provisioning decisions. It should answer what-if questions (e.g., is configuration A better than configuration B from the energy perspective?) and support a customizable optimization metric (e.g., energy consumption, performance-energy product).
Methodology – Energy Model. Figure: application tasks on compute nodes with local storage, the intermediate storage, and the workflow runtime engine. Execution states: Idle; Network Transfer; Storage I/O; Task Processing. Each execution state has its own power profile.
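Methodology – Energy Model. Execution states: Idle; Network Transfer; I/O ops (read, write); Task Processing. Energy: $E = \sum_{s} P_s \times T_s$, where the sum runs over the execution states, $P_s$ is the power profile of state $s$, and $T_s$ is the predicted time spent in state $s$.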
Methodology – Energy Model. How to seed the energy model? Power states: use synthetic benchmarks to measure the power consumption in each state. Time estimates: augment a performance predictor to track the time spent in each state.
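A minimal sketch of how the seeded model could be evaluated: the predicted energy is the sum, over execution states, of that state's power draw multiplied by the predicted time spent in it. The function name, the dictionary layout, and every numeric value here are illustrative assumptions, not code or measurements from this work. Whether per-state power is an absolute draw or an increment over idle is also an assumption left to the caller.

```python
# Hypothetical sketch: energy = sum over states of (power * predicted time).
# State names follow the slides; all numbers below are made-up placeholders.
STATES = ("idle", "network_transfer", "storage_io", "task_processing")


def predict_energy(power_profile_w, predicted_times_s):
    """Return predicted energy in joules for one node.

    power_profile_w: watts per state (seeded from synthetic benchmarks).
    predicted_times_s: seconds per state (from the performance predictor).
    """
    missing = [s for s in STATES
               if s not in power_profile_w or s not in predicted_times_s]
    if missing:
        raise KeyError(f"missing states: {missing}")
    return sum(power_profile_w[s] * predicted_times_s[s] for s in STATES)


# Illustrative values only (not measured on Taurus or Sagittaire).
power = {"idle": 100.0, "network_transfer": 110.0,
         "storage_io": 115.0, "task_processing": 150.0}
times = {"idle": 10.0, "network_transfer": 5.0,
         "storage_io": 4.0, "task_processing": 20.0}
energy_j = predict_energy(power, times)
print(f"predicted energy: {energy_j:.1f} J")
```

Keeping the power profile and the time estimates as separate inputs mirrors the two seeding steps: the profile is measured once per platform, while the times change with every configuration being explored.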
Methodology – Building Energy Consumption Predictor. Sources of inaccuracies: homogeneity; power meter; time prediction; model simplification (metadata, scheduling, …).
L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing, ICS'14, June (Acceptance Rate: 20%).
L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings PDSW'13.
Evaluation Outline. Synthetic benchmarks: workflow patterns; real workflow applications; predicting the energy impact of power-tuning techniques; predicting energy-performance tradeoffs.
Evaluation - Platform (Grid5000 Lyon site). Taurus cluster (11 nodes): two 2.3GHz Intel Xeon E CPUs (each with 6 cores), 32GB memory, 10 Gbps NIC. Sagittaire cluster (16 nodes): two 2.4GHz AMD Opteron CPUs (each with one core), 2GB RAM, 1 Gbps NIC. Power measurement: one SME Omegawatt power meter per node, 0.01W power resolution at 1Hz sampling rate. Figure (states shown): Idle, App, Storage I/O, Net transfer.
Evaluation – Synthetic benchmarks: Workflow Patterns. Figure panels: Montage Workflow, Pipeline, Reduce.
Evaluation – Synthetic benchmarks: Workflow Patterns
Evaluation – Synthetic benchmarks: Workflow Patterns. Using the Default Storage System Configuration (DSS): average 88% accuracy; 20-30x faster than running the actual benchmark; 200x-300x fewer resources (machines * runtime).
Evaluation – Synthetic benchmarks: Workflow Patterns. Pipeline energy consumption. DSS – Default Storage System Configuration; WOSS – Workflow Optimized Storage System Configuration. Q1: What are the energy savings that performance optimizations in storage can bring? The predictor is accurate in both configurations and suggests the better configuration from the energy perspective.
S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), under review, submitted in June 2014.
L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, Journal of Grid Computing.
Evaluation – Real Workflow Applications. Figure panels: BLAST workflow, Montage workflow.
Evaluation – Real Workflow Applications. BLAST result: energy 89%, time 95%. Montage result: energy 84%, time 86%.
Evaluation – CPU Throttling. CPU throttling is an important technique in which processors run at less than their maximum frequency to conserve power. It lowers instantaneous power but can prolong execution time. Q2: What is the energy and performance impact of CPU throttling? Is it application-specific? CPU-bound application: BLAST. I/O-bound application: pipeline benchmark.
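The trade-off behind Q2 can be sketched with simple arithmetic: throttling lowers average power but may lengthen the run, so the net energy change depends on which effect dominates. The helper name and all numbers below are hypothetical and chosen only to show both outcomes; they are not the measured BLAST or pipeline results.

```python
# Hypothetical sketch: net energy effect of CPU throttling.
# energy = average power * execution time; all inputs are made-up examples.

def energy_change_pct(base_power_w, base_time_s,
                      throttled_power_w, throttled_time_s):
    """Percent change in energy when throttled (negative means savings)."""
    base_energy = base_power_w * base_time_s
    throttled_energy = throttled_power_w * throttled_time_s
    return 100.0 * (throttled_energy - base_energy) / base_energy


# Assumed I/O-bound run: power drops, runtime barely changes.
io_bound = energy_change_pct(200.0, 100.0, 170.0, 102.0)

# Assumed CPU-bound run: power drops, runtime grows substantially.
cpu_bound = energy_change_pct(200.0, 100.0, 170.0, 190.0)

print(f"I/O-bound example: {io_bound:+.1f}% energy")
print(f"CPU-bound example: {cpu_bound:+.1f}% energy")
```

With these assumed inputs the first case saves energy and the second costs energy, which is why the answer to Q2 has to be predicted per application rather than assumed.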
Evaluation – CPU Throttling. Figure panels: BLAST result (energy, time) and pipeline result (energy, time). Frequency levels: 1200MHz, 1800MHz, 2300MHz. Reported effects: 17% savings when using maximum throttling; 96% cost when using maximum CPU throttling. Conclusion: an application's computational and I/O characteristics determine whether throttling brings energy savings or energy costs. The predictor can be used to make these decisions.
Evaluation – Predicting Energy Delay Product. Q3: How can users balance time-to-solution and energy consumption when given a target application? User’s optimization metric: performance (use more machines); energy; Energy-Delay Product (EDP, energy * time). Consider the allocation decision. Use the Montage workload on two clusters to demonstrate prediction.
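As a sketch of how predictions could drive the allocation decision, the snippet below picks the node count that minimizes a user-chosen metric. The function names and the (energy, time) values per allocation are hypothetical placeholders, not predictions for Montage on Taurus or Sagittaire.

```python
# Hypothetical sketch: choose an allocation by a user-selected metric.
# predictions maps node count -> (predicted energy in J, predicted time in s).

def edp(energy_j, time_s):
    """Energy-Delay Product: energy * time."""
    return energy_j * time_s


def best_allocation(predictions, metric):
    """Return the node count whose prediction minimizes the given metric."""
    return min(predictions, key=lambda n: metric(*predictions[n]))


# Made-up predictions for three candidate allocations.
predictions = {
    2: (9000.0, 300.0),
    4: (10000.0, 160.0),
    8: (18000.0, 100.0),
}

fastest = best_allocation(predictions, lambda e, t: t)
lowest_energy = best_allocation(predictions, lambda e, t: e)
lowest_edp = best_allocation(predictions, edp)

print("fastest:", fastest, "nodes")
print("lowest energy:", lowest_energy, "nodes")
print("lowest EDP:", lowest_edp, "nodes")
```

In this made-up case the three metrics select three different allocations, which illustrates why the metric is left to the user.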
Evaluation – Predicting Energy Delay Product. Figure panels: Montage EDP at Taurus, Montage EDP at Sagittaire.
Conclusion. This thesis presents an energy consumption predictor for the workflow application domain. The proposed energy model and prediction framework achieve adequate accuracy to be useful for the energy-oriented configuration decisions this work targets.
Resulting Publications

Energy Prediction
H. Yang, L. B. Costa and M. Ripeanu, “Energy Prediction for I/O Intensive Workflows Applications”, submitted to the 7th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS) 2014 (co-located with Supercomputing/SC 2014), under review.

Performance Prediction and Provisioning
L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration and Provisioning for I/O Intensive Workflows”, in preparation.
L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of ICS'14, June (acceptance rate: 20%).
L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings PDSW'13.

Evaluating Storage Systems for Scientific Data in the Cloud
K. Maheshwari, J. Wozniak, H. Yang, D. S. Katz, M. Ripeanu, V. Zavala, M. Wilde, “Evaluating Storage Systems for Scientific Data in the Cloud”, In Proceedings of the 5th Workshop on Scientific Cloud Computing (ScienceCloud), co-located with ACM HPDC 2014 (Best Paper Award).

A Workflow-Optimized Storage System
S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “A Software Defined Storage for Scientific Workflow Applications”, in preparation.
S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), under review, submitted in June 2014.
L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, accepted by Journal of Grid Computing, 2014.
Backup Slides. The system model: model seeding. Workload description: I/O traces; task dependency graph. System deployment configuration: number of storage nodes; number of client nodes; chunk size; replication level; … Platform performance parameters: manager service time; storage service time; client service time; remote network service time; local network service time.
L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing, ICS'14, June.
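One way to picture the three groups of seeding inputs is as plain data records. This is a sketch under assumptions: the class names, field names, units, and example values are hypothetical and do not reflect the predictor's actual input format.

```python
# Hypothetical sketch of the model-seeding inputs as data records.
# Field names and units are assumptions made for illustration only.
from dataclasses import dataclass, field


@dataclass
class DeploymentConfig:
    num_storage_nodes: int
    num_client_nodes: int
    chunk_size_bytes: int
    replication_level: int


@dataclass
class PlatformParameters:
    manager_service_time_s: float
    storage_service_time_s: float
    client_service_time_s: float
    remote_network_service_time_s: float
    local_network_service_time_s: float


@dataclass
class WorkloadDescription:
    io_traces: list = field(default_factory=list)          # per-task I/O records
    task_dependencies: dict = field(default_factory=dict)  # task -> prerequisite tasks


@dataclass
class PredictorInput:
    deployment: DeploymentConfig
    platform: PlatformParameters
    workload: WorkloadDescription


# Placeholder values, not taken from any measured platform.
example = PredictorInput(
    deployment=DeploymentConfig(4, 8, 1024 * 1024, 1),
    platform=PlatformParameters(0.001, 0.002, 0.001, 0.0005, 0.0001),
    workload=WorkloadDescription(
        io_traces=[("task_b", "read", "a.out", 4096)],
        task_dependencies={"task_b": ["task_a"]},
    ),
)
print(example.deployment)
```

Separating the deployment configuration from the platform parameters lets a what-if exploration vary the former while reusing the latter, which is seeded once per platform.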
Limitations: simplification of the model; short tasks / small workload; not validated using new devices (e.g., SSD).
Alternative Approaches: utilization; detailed simulation; machine learning.
Combined states
Energy Composition (pipeline benchmark): idle energy 64%; app processing 9.2%; storage operations 15.8%; network transfer 10.6%.
Sagittaire power profiles. Figure values: 175W, 25W, 8W, 7W.