StorPerf - Cinder Storage Performance Measurement
Mark Beierl, EMC
Where Did We Start?
SNIA[1] Solid State Storage Performance Test Specification:
“Manufacturers need to set, and customers need to compare, the performance of Solid State Storage (SSS) devices. This Specification defines a set of device level tests and methodologies intended to enable comparative testing of SSS devices in Enterprise systems”
A guide to obtaining reliable and comparable measurements.
Tests and methodologies: use known and repeatable test stimulus; based on guidelines from the SNIA SSS Technical Working Group; isolate the device being tested from the test platform.
[1] Storage Networking Industry Association: http://www.snia.org
Goals of the StorPerf Project
Provide a report based on SNIA’s Performance Test Specification.
Test Cinder block storage using specified Cinder drivers.
Measure latency, throughput in bytes/second, and IOPS across a matrix of block sizes and queue depths.
The SNIA specification was designed for testing SSDs, but we have extended it and applied it to this test case because of the insidious effects of SSD behavior between initialization and steady state. We also see SSD as an important technology for deployment in remote points of presence (POPs) because it reduces maintenance. Even so, the way we have extended and applied the testing makes it relevant to HDD testing as well, as is currently the case in the Pharos pods.
Architecture
A single Docker container controls the test run.
Using Heat, it creates: volumes in Cinder; VMs in Nova; a private network and security group for the VMs; floating IP addresses.
The container has a REST API (Swagger) for control and reports.
Graphite can also be exposed for ad hoc queries.
Carbon DB stores all metrics.
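Carbon ingests metrics through its plaintext protocol: one line per data point in the form "<metric path> <value> <timestamp>", conventionally on TCP port 2003. The sketch below only illustrates that wire format; the metric path, host and helper names are hypothetical and are not StorPerf's own naming or code.

```python
import socket
import time
from typing import Iterable, Optional


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in Carbon's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Push pre-formatted plaintext lines to a Carbon listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path modeled on the report keys; not StorPerf's actual naming.
    print(carbon_line("job1234.agent0.rw.queue-depth.1.block-size.16384.read.iops", 51.52, 1467000000), end="")
```

Keeping formatting separate from sending makes the line format easy to check without a running Carbon instance.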
Architecture
[Figure: StorPerf architecture diagram]
Challenges
SSD and HDD have very different performance characteristics: SSD exhibits transient elevated performance when fresh, while HDD performs better after cache warm-up.
Logical volumes: we are not testing raw disk, so we cannot control cache or initialization, and each storage implementation has its own performance characteristics.
Time: initializing large Cinder volumes can take a long time.
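A back-of-the-envelope estimate shows why time is a challenge: the fill time grows with volume size and agent count, while the backend's aggregate write throughput stays fixed. This is a sketch under stated assumptions; the throughput figure is hypothetical, and the model ignores per-VM limits and lazy-allocation overhead.

```python
def fill_time_hours(volume_gib: float, agents: int, aggregate_mib_per_s: float) -> float:
    """Estimate hours to sequentially fill every agent's volume once.

    Assumes all agents write in parallel against a shared backend whose
    total sequential write throughput is aggregate_mib_per_s (hypothetical).
    """
    total_mib = volume_gib * 1024 * agents
    return total_mib / aggregate_mib_per_s / 3600


# Hypothetical numbers: 8 agents with 100 GiB each, 200 MiB/s aggregate.
print(f"{fill_time_hours(100, 8, 200):.2f} hours")  # 1.14 hours
```

Doubling either the volume size or the number of agents doubles the estimate, before a single measured test round has run.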
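Test Definitions
Steady State: a device is in steady state when, for a given tracked variable y (for example IOPS) over the measurement window:
Max(y) - Min(y) is less than 20% of Avg(y), and
the slope of the best-fit line of y, taken as its total excursion across the window, is less than 10% of Avg(y).
Transition Zone: the state in which the volume’s performance is changing, such as during the initial data fill.
[Figure: measurement window annotated with Max, Min and Slope]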
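For a measurement window of N rounds with samples y_1, …, y_N, the two criteria can be written as below. This is a sketch that reads "slope" as the excursion of the least-squares line across the window, in the way the SNIA PTS phrases it; the symbols are introduced here for illustration.

```latex
\begin{aligned}
\bar{y} &= \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad
\hat{m} = \frac{\sum_{i=1}^{N}\left(i - \tfrac{N+1}{2}\right)\left(y_i - \bar{y}\right)}{\sum_{i=1}^{N}\left(i - \tfrac{N+1}{2}\right)^{2}} \\
\max_i y_i - \min_i y_i &< 0.20\,\bar{y} \\
\lvert \hat{m} \rvert \,(N - 1) &< 0.10\,\bar{y}
\end{aligned}
```

The factor (N - 1) converts the per-round slope into the rise or fall of the fitted line between the first and last round of the window.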
Test Concepts: Steady State Verification
Ensure the initial performance or transition states are not reported as typical: do not report warm-up or volume data fill.
Capture a meaningful indication of the volume’s performance during the bulk of its operating life.
Verification: by inspection, while issuing I/O requests, and by statistical analysis of the last N rounds of sample data to determine when steady state has been reached.
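The statistical check over the last N rounds can be sketched as follows. This is an illustrative implementation of the SNIA-style criteria (range under 20% of the average, best-fit slope excursion under 10% of the average), not StorPerf's own code; the function names and the sample series are hypothetical.

```python
from typing import Optional, Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against round index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def is_steady(window: Sequence[float]) -> bool:
    """Range < 20% of average and best-fit line excursion < 10% of average."""
    avg = sum(window) / len(window)
    if avg <= 0:
        return False
    data_excursion = max(window) - min(window)
    slope_excursion = abs(slope(window)) * (len(window) - 1)
    return data_excursion < 0.20 * avg and slope_excursion < 0.10 * avg


def first_steady_round(samples: Sequence[float], n: int = 5) -> Optional[int]:
    """Index of the first round at which the last n rounds satisfy is_steady, else None."""
    if n < 2:
        raise ValueError("a slope needs at least two rounds")
    for end in range(n, len(samples) + 1):
        if is_steady(samples[end - n:end]):
            return end - 1
    return None


# Hypothetical IOPS per round: elevated while fresh, then settling.
rounds = [900, 700, 520, 470, 455, 450, 448, 452, 449, 451]
print(first_steady_round(rounds))  # 7
```

The window ending at round 6 already passes the range test but fails the slope test, which is why both criteria are needed.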
Test Concepts: Active Range, Data Patterns, Queue Depth
Active Range: the definition of how much of the underlying Cinder pool is under test; it can span from 1GB to 100% of all available block storage.
Data Patterns: all tests are run with random data patterns; FIO is the data generation utility. See: http://freecode.com/projects/fio
Queue Depth: the number of outstanding I/O operations kept in flight throughout the test duration.
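Queue depth, latency and IOPS are tied together by Little's Law: with a steady number of I/Os in flight, IOPS is roughly queue depth divided by average latency. A minimal sketch for sanity-checking results; it is an idealized model that ignores client-side overhead, and the function names are ours.

```python
def expected_iops(queue_depth: int, avg_latency_s: float) -> float:
    """Little's Law: in-flight requests = completion rate x average time in system."""
    return queue_depth / avg_latency_s


def expected_latency_s(queue_depth: int, iops: float) -> float:
    """The same relation solved for average latency."""
    return queue_depth / iops


# Hypothetical device: 1 ms average latency at queue depth 1.
print(expected_iops(1, 0.001))  # 1000.0
```

Raising the queue depth only raises IOPS while latency stays roughly flat; once the device saturates, extra outstanding I/Os show up as added latency instead.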
Test Concepts: Latency, IOPS, Workload, Caching
Latency: the duration between the issuance of an I/O request and its completion.
IOPS: I/O operations per second, regardless of block size.
Workload: the generation of I/O requests; it specifies read, write, or a mix of both operations, and sequential or random volume data access.
Caching: StorPerf relies on the Cinder driver for tuning.
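One cell of the test matrix maps naturally onto an fio job file: the workload becomes fio's rw mode (plus rwmixread for a mix), the block size becomes bs, and the queue depth becomes iodepth. A sketch under stated assumptions: the workload names, device path and runtime are hypothetical, and this is not StorPerf's actual job generator.

```python
# Hypothetical workload names mapped to real fio 'rw' modes and read percentages.
WORKLOADS = {
    "random-read": ("randread", None),
    "random-write": ("randwrite", None),
    "random-mix": ("randrw", 50),
    "sequential-read": ("read", None),
    "sequential-write": ("write", None),
}


def fio_job(workload: str, block_size: int, queue_depth: int,
            device: str = "/dev/vdb", runtime_s: int = 60) -> str:
    """Render an fio job file for one (workload, block size, queue depth) cell."""
    rw_mode, read_pct = WORKLOADS[workload]
    lines = [
        "[global]",
        "ioengine=libaio",         # asynchronous engine so iodepth > 1 is honoured
        "direct=1",                # bypass the guest page cache
        f"filename={device}",      # attached Cinder volume (assumed device path)
        f"rw={rw_mode}",
        f"bs={block_size}",
        f"iodepth={queue_depth}",  # outstanding I/Os kept in flight
        f"runtime={runtime_s}",
        "time_based",
    ]
    if read_pct is not None:
        lines.append(f"rwmixread={read_pct}")
    lines.append(f"[{workload}.bs{block_size}.qd{queue_depth}]")  # the job section itself
    return "\n".join(lines) + "\n"


print(fio_job("random-mix", 16384, 1))
```

On an agent, the rendered file could then be run with fio --output-format=json job.fio to obtain machine-readable latency, IOPS and bandwidth figures.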
Overview of Test Flow
Configure the environment:
Number of agents. This controls the width of the run and simulates heavier load on Cinder.
Size of the Cinder volumes per agent.
Floating IP network to use.
Create the Cinder volumes (one per agent).
Create the agent VMs.
Attach the volumes to the agents.
Boot an Ubuntu 14.04 stock distribution.
Copy FIO to the agents.
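The per-agent resources can be expressed as a Heat (HOT) template: one OS::Cinder::Volume, one OS::Nova::Server and one OS::Cinder::VolumeAttachment per agent. This sketch builds such a template as a Python dict; resource names, image, flavor and network are hypothetical, the floating IP and security group resources are omitted for brevity, and it is not StorPerf's own template.

```python
import json
from typing import Any, Dict


def heat_template(agent_count: int, volume_gb: int, image: str,
                  flavor: str, network: str) -> Dict[str, Any]:
    """Build a minimal HOT template with one volume + VM + attachment per agent."""
    resources: Dict[str, Any] = {}
    for i in range(agent_count):
        vol, vm, att = f"volume_{i}", f"agent_{i}", f"attach_{i}"
        resources[vol] = {"type": "OS::Cinder::Volume",
                          "properties": {"size": volume_gb}}
        resources[vm] = {
            "type": "OS::Nova::Server",
            "properties": {"image": image, "flavor": flavor,
                           "networks": [{"network": network}]},
        }
        resources[att] = {
            "type": "OS::Cinder::VolumeAttachment",
            "properties": {"instance_uuid": {"get_resource": vm},
                           "volume_id": {"get_resource": vol}},
        }
    return {"heat_template_version": "2015-04-30", "resources": resources}


# Hypothetical values; JSON output is also valid YAML, so Heat can read it.
print(json.dumps(heat_template(2, 10, "ubuntu-14.04", "m1.small", "storperf-net"), indent=2))
```

Scaling the run wider is then a matter of changing agent_count; each extra agent adds one volume, one VM and one attachment.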
Overview of Test Flow
Initial Volume Fill: execute a sequential write to fill 100% of each volume. Volumes may be lazily created, so read tests run before the volume is filled are invalid. This is a transitional state; do not report these numbers.
Run Test: a matrix of block size, queue depth and workload specifiers. Run each workload, up to 100% of the volume size, across all agents simultaneously, and record intermediate results. Examine the past 5 test results for steady state: if it has been reached, terminate the workload and move on to the next; if 100% of the volume is reached first, record the lack of steady state and move on.
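The run-test loop can be sketched as below. It is a hypothetical outline, not StorPerf's scheduler: run_round stands in for one measured round across all agents, is_steady for the steady-state check, and max_rounds for the number of rounds that covers 100% of the volume.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, int, int]  # (workload, block size, queue depth)


def run_matrix(
    workloads: Sequence[str],
    block_sizes: Sequence[int],
    queue_depths: Sequence[int],
    run_round: Callable[[Key, int], float],
    is_steady: Callable[[Sequence[float]], bool],
    max_rounds: int,
    window: int = 5,
) -> Dict[Key, Tuple[List[float], bool]]:
    """Run each matrix cell round by round until steady state or max_rounds."""
    results: Dict[Key, Tuple[List[float], bool]] = {}
    for key in itertools.product(workloads, block_sizes, queue_depths):
        samples: List[float] = []
        steady = False
        for round_no in range(max_rounds):
            samples.append(run_round(key, round_no))
            # Only the most recent `window` rounds are examined.
            if len(samples) >= window and is_steady(samples[-window:]):
                steady = True
                break
        # steady == False records that max_rounds was reached without steady state.
        results[key] = (samples, steady)
    return results


if __name__ == "__main__":
    def fake_round(key: Key, round_no: int) -> float:
        # Simulated IOPS: elevated for the first rounds, then settling.
        return 900.0 if round_no < 3 else 500.0

    def range_check(window: Sequence[float]) -> bool:
        avg = sum(window) / len(window)
        return max(window) - min(window) < 0.20 * avg

    out = run_matrix(["rw"], [4096], [1], fake_round, range_check, max_rounds=20)
    print(out[("rw", 4096, 1)])  # ([900.0, 900.0, 900.0, 500.0, 500.0, 500.0, 500.0, 500.0], True)
```

Passing the check in as a function keeps the loop independent of the exact steady-state definition in use.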
What is Available Today?
Ability to execute tests across: workloads (read, write, mix) and access patterns (random, sequential); queue depths; block sizes (note: Ceph is limited to a 16K maximum block size).
Reports on average latencies: a 3D plot for the matrix, and JSON for external reporting.
Built-in Graphite browser for custom reporting of all raw data.
Statistics are gathered once every 60 seconds.
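Beyond the browser, raw series can be pulled with Graphite's render API, which accepts target, from, until and format parameters and can return JSON. A minimal sketch that only builds the request URL; the host, port and metric path are hypothetical.

```python
from urllib.parse import urlencode


def graphite_render_url(base: str, target: str, start: str = "-1h", end: str = "now") -> str:
    """Build a Graphite render API request that returns the raw series as JSON."""
    query = urlencode({"target": target, "from": start, "until": end, "format": "json"})
    return f"{base.rstrip('/')}/render?{query}"


# Hypothetical host/port and metric path.
print(graphite_render_url("http://storperf:8000", "job1234.agent0.rw.read.iops"))
```

With samples arriving every 60 seconds, a one-hour window returns about 60 points per series.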
Existing Reports
OPNFV Test DB
Text results (http://storperf:5000/api/v1.0/job?id=NNNN):
"rw.queue-depth.1.block-size.16384.duration": 1841,
"rw.queue-depth.1.block-size.16384.read.iops": 51.52010714285715,
"rw.queue-depth.1.block-size.16384.read.latency": 1505.8304166666665,
"rw.queue-depth.1.block-size.16384.read.throughput": 823.8154761904763,
Graph results (http://storperf:5000/results/NNNN)
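The text-result keys encode workload, queue depth, block size and metric, so they can be split back into a table. Multiplying the read IOPS above by the 16384-byte block size gives about 824 KiB/s, close to the reported 823.8, which suggests the throughput figure is in KB/s; that is an inference from this sample, not a documented unit. The parser below is a hypothetical sketch that assumes workload names contain no dots.

```python
from typing import Dict, Tuple

MetricKey = Tuple[str, int, int, str]  # (workload, queue depth, block size, metric)


def parse_report(report: Dict[str, float]) -> Dict[MetricKey, float]:
    """Split keys such as 'rw.queue-depth.1.block-size.16384.read.iops'."""
    parsed: Dict[MetricKey, float] = {}
    for key, value in report.items():
        parts = key.split(".")
        if len(parts) < 6 or parts[1] != "queue-depth" or parts[3] != "block-size":
            continue  # skip keys that do not follow this layout
        workload, qd, bs = parts[0], int(parts[2]), int(parts[4])
        parsed[(workload, qd, bs, ".".join(parts[5:]))] = value
    return parsed


report = {
    "rw.queue-depth.1.block-size.16384.duration": 1841,
    "rw.queue-depth.1.block-size.16384.read.iops": 51.52010714285715,
    "rw.queue-depth.1.block-size.16384.read.latency": 1505.8304166666665,
    "rw.queue-depth.1.block-size.16384.read.throughput": 823.8154761904763,
}
m = parse_report(report)
iops = m[("rw", 1, 16384, "read.iops")]
# IOPS x block size in KiB/s, next to the reported throughput.
print(round(iops * 16384 / 1024, 1), round(m[("rw", 1, 16384, "read.throughput")], 1))  # 824.3 823.8
```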
Where to From Here?
Linux Foundation Intern Project: StorPerf is helping to define the framework for LF interns. Tim Rault at CENGN is paving the way by implementing steady state analysis, and the project is generating interest from other students.
Further adoption of SNIA reports: steady state convergence plot; steady state verification plot; per-measurement value plots.
Steady State Detection
Currently: StorPerf reports the average across the entire run, so the transitional state may be included in the final result.
Target: report only once steady state has been reached. SNIA has guidelines on what determines steady state: data from the steady-state window is statistically valid, while warm-up, volume fill, etc. are statistically invalid.
Steady State Report
Steady State Convergence Report: when 5 samples are found that fit the definition of steady state, plot the proof of convergence:
Max / Min values for the series
Average values
Calculated slope of the series
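The data behind such a convergence plot can be computed directly from the 5-sample window. A hypothetical sketch: it returns the max, min, average, least-squares slope and fitted line, plus 90% and 110% of the average as one way to draw the 20% range criterion as a band; it is not StorPerf's report code.

```python
from typing import Dict, List, Sequence, Union


def convergence_report(window: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Summary series for plotting proof of convergence over one window."""
    n = len(window)
    avg = sum(window) / n
    x_mean = (n - 1) / 2
    m = (sum((i - x_mean) * (y - avg) for i, y in enumerate(window))
         / sum((i - x_mean) ** 2 for i in range(n)))
    b = avg - m * x_mean  # least-squares line passes through (x_mean, avg)
    return {
        "max": max(window),
        "min": min(window),
        "average": avg,
        "slope": m,
        "fit_line": [b + m * i for i in range(n)],
        "upper_band": 1.10 * avg,
        "lower_band": 0.90 * avg,
    }


# Hypothetical steady window of IOPS samples.
print(convergence_report([470, 455, 450, 448, 452])["slope"])
```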
More Reports
Measurement Tabular Report: read/write mix vs. block sizes, for IOPS, throughput or latency.
2D Measurement Plot: IOPS, throughput or latency.
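A measurement tabular report is essentially a pivot of results into read/write mix rows and block-size columns. A minimal sketch; the mix labels (read%/write%) and values are hypothetical, and missing cells are shown as "-".

```python
from typing import Dict, Sequence, Tuple


def tabular_report(values: Dict[Tuple[str, int], float],
                   mixes: Sequence[str], block_sizes: Sequence[int]) -> str:
    """Render one metric as rows of R/W mix and columns of block size."""
    rows = ["R/W mix".ljust(10) + "".join(f"{bs:>10}" for bs in block_sizes)]
    for mix in mixes:
        cells = []
        for bs in block_sizes:
            v = values.get((mix, bs))
            cells.append(f"{v:>10.1f}" if v is not None else f"{'-':>10}")
        rows.append(mix.ljust(10) + "".join(cells))
    return "\n".join(rows)


# Hypothetical IOPS values keyed by (read%/write%, block size in bytes).
print(tabular_report({("100/0", 4096): 1200.0, ("0/100", 4096): 800.5},
                     ["100/0", "0/100"], [4096, 16384]))
```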
In Summary
StorPerf … what’s in your storage subsystem?
“Standing on the shoulders”... a common theme! Based on SNIA, using accepted methodologies.
Characterization of virtual vs. physical: testing from inside the OpenStack infrastructure.
Still more work ahead: refinement of reporting and statistically relevant data points; Yardstick integration and OPNFV test database dashboards.
Paying it forward: student and intern friendly; open and welcoming to new contributors.
These tests are marginally helpful to Pod testing today, but should be of significant value when standing up larger data centers, when applying tests such as those defined by Bottlenecks, or when run standalone in any larger-scale lab or pre-production deployment.
More Info
Wiki: http://wiki.opnfv.org/display/StorPerf
Developer guide
Installation guide
Performance test execution guide
SNIA references and targets
Mailing list tag: [StorPerf]
IRC channel: #opnfv-storperf
Bi-weekly meetings: Wednesdays at 7:00 Pacific. Next meeting July 6.