Experience with Using a Performance Predictor During Development: a Distributed Storage System Tale

Presentation transcript:

Experience with Using a Performance Predictor During Development: a Distributed Storage System Tale
Lauro Beltrão Costa *, João Brunet +, Lile Hattori #, Matei Ripeanu *
* NetSysLab/ECE, UBC (University of British Columbia)
+ DSC, UFCG (Federal University of Campina Grande)
# Microsoft Corp.

How is it (typically) done?
• Profilers monitor behaviour
• They pinpoint code regions that take too long; those regions then receive attention
Developers decide when they have reached “good-enough” efficiency.
High performance must be reached while keeping resource cost low.

An Example
[Figure: application time (seconds, log scale) for configurations of a 20-node cluster, ranging from more storage nodes to more application nodes]
• Target performance is not obvious
• Wide performance variation among configurations

• Experience with using a performance predictor during the software development process
• What are the limitations and challenges of using a performance predictor as part of the development process?

Context: A Distributed Storage System
• MosaStore, a distributed storage system
• A manager, several clients, several storage servers
• Approximately 11,000 lines of code
• Around 15 developers involved over time
Code & papers at: MosaStore.net

Sources of complexity
• Multiple interacting components with complex interactions
• Complex data and control paths
• Contention (network, component level)
• Variability in the environment
• Deployment choices (configuration, provisioning)

Performance Predictor
• Supporting Storage Configuration for I/O Intensive Workflows, L. B. Costa, S. Al-Kiswany, H. Yang, M. Ripeanu, ICS ’14
• Energy Prediction for I/O Intensive Workflow Applications, H. Yang, L. B. Costa, M. Ripeanu, MTAGS ’14

Development Flow

Performance Anomalies
• Case 1: Lack of Randomness
• Case 2: Lock Overhead
• Case 3: Connection Timeout

Actual vs. Predicted: Large Mismatch
[Figure: benchmark time (seconds), actual vs. expected]
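The comparison behind this slide can be sketched as a simple check: compute how far the measured time exceeds the predicted time and flag benchmarks where the gap is large. This is only an illustration. The function names and the 20% tolerance are assumptions, not values from the talk.

```python
def relative_gap(actual_s, predicted_s):
    """Fraction by which the measured time exceeds the predicted time."""
    if predicted_s <= 0:
        raise ValueError("predicted time must be positive")
    return (actual_s - predicted_s) / predicted_s


def flag_mismatches(results, tolerance=0.2):
    """Return the names of benchmarks that run slower than predicted.

    results maps a benchmark name to an (actual_s, predicted_s) pair.
    tolerance is a hypothetical threshold; 0.2 means "20% slower".
    """
    return sorted(
        name
        for name, (actual_s, predicted_s) in results.items()
        if relative_gap(actual_s, predicted_s) > tolerance
    )
```

A benchmark flagged by such a check is a candidate for closer inspection; the check itself says nothing about which component is responsible.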

Case 3: Connection Timeout
Context: Client tries to establish a TCP connection
Problem: Too many clients try to connect and SYN packets are dropped; the OS waits for a timeout (3 seconds) before retrying
Detection: The developers logged and verified the service time of each component
Fix: Different implementation allowing a custom timeout
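One way a custom timeout could look is sketched below in Python, using the standard library's `socket.create_connection`. This is not MosaStore's code: the function name, the 0.5-second default, the attempt count, and the linear back-off are all assumptions made for illustration.

```python
import socket
import time


def connect_with_timeout(host, port, timeout_s=0.5, max_attempts=4,
                         connect=socket.create_connection, sleep=time.sleep):
    """Open a TCP connection, abandoning each attempt after timeout_s seconds.

    connect and sleep are injectable so the retry logic can be exercised
    without a network; by default they are the standard-library functions.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # The application-level timeout bounds how long one attempt blocks.
            return connect((host, port), timeout=timeout_s)
        except OSError as exc:  # covers timeouts and refused connections
            last_error = exc
            if attempt < max_attempts:
                sleep(timeout_s * attempt)  # linear back-off before retrying
    raise ConnectionError(
        f"could not connect to {host}:{port} after {max_attempts} attempts"
    ) from last_error
```

With a short per-attempt timeout, a client whose first attempt goes unanswered tries again after a fraction of a second instead of waiting for the OS-level retry.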

Case 3: Impact
[Figure: benchmark time (seconds)]
Use of the predictor made the performance improvements possible

Some Other Cases
[Figure: benchmark time (seconds); panels: Pipeline, Reduce]
• Up to 30% performance improvement
• Up to 10x smaller variance

Limitations and Challenges
1. Have accurate predictions
   • Well-known challenge in the area
2. Use of predictor during development
   • Lack of interest after initial improvements
   • There still is a decision related to overhead
   • Takes too long

Benefits of integrating a performance predictor
• Brings confidence in the performance results obtained
• Successful in pointing out scenarios that needed improvement
• Supports the improvement effort
Code & papers at: NetSysLab.ece.ubc.ca

Concluding Remarks
• Every tool reflects a decision between the cost and the benefits of employing it
   • Our study gives information to support these decisions
• Predictor helps with this non-functional requirement
   • Up to 30% improvement, 10x less variability
• Target performance is still not perfect
   • It offers guidance, but not a perfect final target

Backup Slides
• Debugging Support
• Case 1: Lack of Randomness
• Case 2: Lock Overhead
• Synthetic Benchmarks
• Storage System Model
• MosaStore Deployment
• MosaStore Execution Path

Synthetic Benchmarks
• Common patterns in the structure of workflows
• I/O only, to stress the storage system

Debugging Support
• Granularity of the predictor is per component (storage, client, manager)
• Developers can turn on a logging option that measures the time from the reception of a request until its response
• Once the buggy component and request are spotted, regular debugging starts
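The logging option described above can be sketched as a small timer that records, per component and request type, the time between receiving a request and producing its response. The class and method names here are hypothetical and do not come from MosaStore.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ServiceTimeLog:
    """Collects service times keyed by (component, request type)."""

    def __init__(self, clock=time.perf_counter, enabled=True):
        self._clock = clock          # injectable so tests can use a fake clock
        self.enabled = enabled       # the "logging option" developers turn on
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, component, request):
        """Time the body of a with-block, from request reception to response."""
        if not self.enabled:
            yield
            return
        start = self._clock()
        try:
            yield
        finally:
            self.samples[(component, request)].append(self._clock() - start)

    def slowest(self):
        """Return the (component, request) pair with the highest mean time."""
        if not self.samples:
            return None
        return max(
            self.samples,
            key=lambda k: sum(self.samples[k]) / len(self.samples[k]),
        )
```

A request handler would wrap its work in `with log.measure("manager", "get_metadata"):`. Once `slowest()` points at a component and request type, regular debugging can start there.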

Case 1: Lack of Randomness
Context: Client obtains list of storage nodes from manager
Problem: Manager used the same seed, so the list of storage nodes was effectively not shuffled; clients accessed storage nodes in the same order; some nodes were hot-spots while others were idle
Detection: The developers logged and verified the service time of each component
Fix: Change the algorithm that shuffles the list of storage nodes to use a different seed every time it is invoked
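The effect of reusing a seed can be reproduced in a few lines. The sketch below is in Python and is not the manager's actual code; the function names and the constant `FIXED_SEED` are made up for illustration.

```python
import random

FIXED_SEED = 1234  # hypothetical constant standing in for the reused seed


def storage_nodes_fixed_seed(nodes):
    """Buggy variant: re-seeding identically yields the same order every call."""
    rng = random.Random(FIXED_SEED)
    order = list(nodes)
    rng.shuffle(order)
    return order


def storage_nodes_fresh_seed(nodes):
    """Fixed variant: each call uses a generator with a new seed."""
    # With no argument, Random() seeds itself from OS entropy when available
    # (falling back to the system time), so calls no longer share a seed.
    rng = random.Random()
    order = list(nodes)
    rng.shuffle(order)
    return order
```

With the fixed seed, every client receives the same "shuffled" list and contacts the same node first, which is how hot-spots form. With a fresh seed per call, the first node varies across clients.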

Case 2: Lock Overhead
Context: Clients access manager for file’s metadata
Problem: Too many clients accessing the metadata; a lock was held for large portions of the code
Detection: The developers logged and verified the service time of each component
Fix: Reduce the lock scope
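Reducing lock scope generally means holding the lock only while shared state is touched and doing everything else outside it. The sketch below contrasts the two in Python. `MetadataManager` and its methods are hypothetical, and JSON serialization merely stands in for any work that does not need the lock.

```python
import json
import threading


class MetadataManager:
    """Toy metadata store used to contrast coarse and narrow lock scope."""

    def __init__(self):
        self._lock = threading.Lock()
        self._metadata = {}

    def put(self, path, attrs):
        with self._lock:
            self._metadata[path] = dict(attrs)

    def get_coarse(self, path):
        """Coarse scope: serialization happens while the lock is held."""
        with self._lock:
            attrs = self._metadata.get(path)
            return json.dumps(attrs)

    def get_narrow(self, path):
        """Narrow scope: copy under the lock, serialize after releasing it."""
        with self._lock:
            attrs = self._metadata.get(path)
            snapshot = dict(attrs) if attrs is not None else None
        return json.dumps(snapshot)
```

Both methods return the same result; the narrow variant simply lets other clients acquire the lock while one client's response is being serialized.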

Storage System Model
[Figure: queue-based model; labels: Application Driver, Scheduler, Client Service, Manager Service, Storage Service, Net, Network core, In queue, Out queue, Service queue]
Properties:
• General
• Uniform
• Coarse
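To give a feel for what a coarse, queue-based model computes, the sketch below treats one service as a single-server FIFO queue with a fixed service time. It is a generic illustration of the idea, not the predictor's actual model. The function name and the fixed service time are assumptions.

```python
def fifo_completion_times(arrivals, service_time):
    """Completion time of each request at a single-server FIFO queue.

    arrivals are request arrival times in seconds; every request takes
    service_time seconds once it reaches the head of the queue.
    """
    completions = []
    free_at = 0.0  # time at which the server next becomes idle
    for arrival in sorted(arrivals):
        start = max(arrival, free_at)  # wait if the server is still busy
        free_at = start + service_time
        completions.append(free_at)
    return completions
```

For example, three requests arriving together at time 0 with a 2-second service time complete at 2, 4, and 6 seconds, which is the kind of contention effect a queue model captures.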

MosaStore Deployment
[Figure: compute nodes, each running an application task with local storage; a shared workflow-optimized storage layer exposed through the POSIX API; a backend filesystem (e.g., GPFS, NFS) used for stage in/out; a workflow runtime engine; storage hints (e.g., location information) and application hints (e.g., indicating access patterns)]

MosaStore Execution Path