Experience with Using a Performance Predictor During Development: A Distributed Storage System Tale
Lauro Beltrão Costa *, João Brunet +, Lile Hattori #, Matei Ripeanu *
* NetSysLab/ECE, UBC (University of British Columbia)
+ DSC, UFCG (Federal University of Campina Grande)
# Microsoft Corp.
How Is It (Typically) Done?
Profilers monitor behaviour; the code regions they pinpoint as taking too long receive attention
Developers decide when they have reached "good-enough" efficiency
High performance must be reached while keeping resource cost low
An Example
[Figure: application time (seconds, log scale) on a 20-node cluster, trading more storage nodes against more application nodes]
The target performance is not obvious
There is wide performance variation among configurations
Experience with using a performance predictor during the software development process:
What are the limitations and challenges of using a performance predictor as part of the development process?
Context: A Distributed Storage System
MosaStore, a distributed storage system
One manager, several clients, several storage servers
Approximately 11,000 lines of code; around 15 developers involved over time
Code & papers at: MosaStore.net
Sources of Complexity
Multiple components with complex interactions
Complex data and control paths
Contention (network, component level)
Variability in the environment
Deployment choices (configuration, provisioning)
Performance Predictor
Supporting Storage Configuration for I/O Intensive Workflows, L. B. Costa, S. Al-Kiswany, H. Yang, M. Ripeanu, ICS '14
Energy Prediction for I/O Intensive Workflow Applications, H. Yang, L. B. Costa, M. Ripeanu, MTAGS '14
Development Flow
Performance Anomalies
Case 1: Lack of Randomness
Case 2: Lock Overhead
Case 3: Connection Timeout
[Figure: benchmark time (seconds), actual vs. predicted: large mismatch]
Case 3: Connection Timeout
Context: A client tries to establish a TCP connection
Problem: Too many clients try to connect at once; SYN packets are dropped; the OS waits 3 seconds before retrying
Detection: The developers logged and verified the service time of each component
Fix: A different implementation allowing a custom timeout
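The fix above can be sketched as follows. This is a minimal illustration, not MosaStore's actual code (which is not shown in the talk): instead of letting a dropped SYN stall the client for the OS's ~3-second retransmission backoff, the client connects with a short application-level timeout and retries. The function name and parameter values are hypothetical.

```python
import socket

def connect_with_timeout(host, port, timeout=0.2, retries=5):
    """Retry TCP connects with a short application-level timeout
    instead of waiting out the OS's ~3 s SYN-retransmission backoff."""
    for _ in range(retries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)          # custom timeout, not the OS default
        try:
            s.connect((host, port))
            s.settimeout(None)         # back to blocking mode for normal I/O
            return s
        except OSError:                # covers timeout and refused connects
            s.close()                  # SYN likely dropped; retry quickly
    raise ConnectionError(f"could not connect to {host}:{port}")
```

Under heavy connection storms this trades a few extra SYNs for a bounded, predictable connect latency, which is what brought the benchmark back in line with the prediction.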
Case 3: Impact
[Figure: benchmark time (seconds) before and after the fix]
Use of the predictor made the performance improvements possible
Some Other Cases
Pipeline and Reduce patterns
Up to 30% performance improvement
Up to 10x smaller variance
[Figure: benchmark time (seconds)]
Limitations and Challenges
1. Obtaining accurate predictions
A well-known challenge in the area
2. Using the predictor during development
Lack of interest after the initial improvements
There is still a decision related to overhead: it takes too long
Benefits of Integrating a Performance Predictor
Brings confidence in the performance results obtained
Successful in pointing out scenarios that needed improvement
Supports the improvement effort
Code & papers at: NetSysLab.ece.ubc.ca
Concluding Remarks
Every tool reflects a trade-off between the cost and the benefit of employing it
Our study provides information to support these decisions
The predictor helps with this non-functional requirement: up to 30% improvement, 10x less variability
The predicted target performance is still not perfect: it offers guidance, not a perfect final target
Backup Slides
Debugging Support
Case 1: Lack of Randomness
Case 2: Lock Overhead
Synthetic Benchmarks
Storage System Model
MosaStore Deployment
MosaStore Execution Path
Synthetic Benchmarks
Common patterns in the structure of workflows
I/O-only, to stress the storage system
Debugging Support
The granularity of the predictor is per component (storage, client, manager)
Developers turn on a logging option that measures the time from the reception of a request until its response
Once the buggy component and request are spotted, regular debugging starts
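The logging option described above can be sketched as a per-component wrapper that times each request from reception to response. This is a hedged illustration only; the decorator, component names, and handler are hypothetical and not taken from MosaStore.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("service-time")

def log_service_time(component):
    """Log the time from the reception of a request until its response,
    tagged with the component name, so per-component service times can
    be compared against the predictor's expectations."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(request, *args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(request, *args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                log.info("%s handled %r in %.6f s", component, request, elapsed)
        return wrapper
    return decorator

@log_service_time("manager")          # hypothetical manager-side handler
def handle_lookup(request):
    return f"metadata-for-{request}"
```

A component whose logged service times diverge from the predicted ones is the one to debug further with regular tools.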
Case 1: Lack of Randomness
Context: The client obtains the list of storage nodes from the manager
Problem: The manager used the same seed, so the list of storage nodes was not shuffled; clients accessed storage nodes in the same order; some nodes became hot-spots while others sat idle
Detection: The developers logged and verified the service time of each component
Fix: Change the algorithm that shuffles the list of storage nodes to use a different seed every time it is invoked
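The bug and its fix can be sketched as follows. This is an illustrative reconstruction, not MosaStore's code: the function names and node list are made up, and the constant seed 42 merely stands in for "the same seed every time".

```python
import random

STORAGE_NODES = ["node-a", "node-b", "node-c", "node-d"]

def allocation_buggy(nodes):
    # Re-seeding with a constant reproduces the bug: every client
    # receives the list in the same "shuffled" order, so the first
    # nodes in the order become hot-spots and the rest sit idle.
    rng = random.Random(42)
    order = list(nodes)
    rng.shuffle(order)
    return order

def allocation_fixed(nodes):
    # The fix: draw a fresh seed (here: OS entropy, the default) on
    # every invocation, so successive clients see different orders
    # and the load spreads across the storage nodes.
    rng = random.Random()
    order = list(nodes)
    rng.shuffle(order)
    return order
```

With the buggy version, two calls always return the identical order; with the fix, each call returns an independent permutation of the same nodes.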
Case 2: Lock Overhead
Context: Clients access the manager for a file's metadata
Problem: Too many clients were accessing the metadata, and a lock was held over large portions of the code
Detection: The developers logged and verified the service time of each component
Fix: Reduce the lock scope
Storage System Model
[Diagram: queueing model linking an application driver and scheduler to client, manager, and storage network services over a network core, each with in, out, and service queues]
Properties: general, uniform, coarse
MosaStore Deployment
[Diagram: compute nodes each run an application task over local storage, aggregated into a shared workflow-optimized storage; a workflow runtime engine stages data in/out of a backend filesystem (e.g., GPFS, NFS); storage hints (e.g., location information) and application hints (e.g., indicating access patterns) flow through the POSIX API]
MosaStore Execution Path