StorPerf: Using OpenStack to Measure OpenStack Cinder Performance
Goals of the Project
- Tests: the OpenStack storage back end using different Cinder drivers
- Measures: latency, throughput, IOPS
- User specifies: workloads, block sizes, I/O loads
- Provides: a report of performance metrics
Architecture
- Master Docker container to control test execution
- Leverages OpenStack Heat to create:
  - Networks, security groups and floating IPs
  - Volumes in Cinder
  - VMs with volumes attached
- REST API (with SwaggerUI) for control and reports
Metrics Collection
- Uses Carbon with the Whisper DB to store disk performance samples
- Uses Graphite for averaging and summation of metrics
- Graphite can be exposed for ad hoc queries
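Because the standard Graphite render API can be exposed, metrics can be pulled out for ad hoc analysis. A minimal sketch, assuming the Graphite port and the StorPerf metric path shown below (both are illustrative, not confirmed values):

    # Ad hoc query against the Graphite render API exposed by the StorPerf container.
    # The host/port and the metric path are assumptions for illustration only.
    import requests

    GRAPHITE = "http://storperf-host:8000"       # assumed Graphite endpoint
    METRIC = "storperf.*.jobs.*.write.iops"      # hypothetical metric path

    resp = requests.get(
        f"{GRAPHITE}/render",
        params={"target": METRIC, "from": "-10min", "format": "json"},
        timeout=30,
    )
    resp.raise_for_status()

    for series in resp.json():
        points = [v for v, _ts in series["datapoints"] if v is not None]
        if points:
            print(series["target"], "avg IOPS:", sum(points) / len(points))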
Measuring Storage Performance
Considerations:
- How many VMs to run at the same time?
- Workloads: sequential or random; read, write or a mix of both?
- Data block size?
- I/O load per VM?
Test Overview
Phases:
1. Creation of Heat stack
2. Initialization of Cinder volumes
3. Execution of one or more performance runs
4. Deletion of Heat stack
Creation of Heat Stack
- POST to the /configurations API
- Number of agent VMs and volumes to create
- Size of volumes
- Name of public network (for floating IPs and router)
- Name of Glance image to use for booting the agent VMs
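A minimal sketch of this call using Python's requests library. The base URL, port and field names are assumptions derived from the parameters listed on this slide, not a definitive StorPerf payload:

    # Sketch of creating the Heat stack via StorPerf's REST API.
    # Endpoint and field names below are assumptions for illustration.
    import requests

    STORPERF = "http://storperf-host:5000/api/v1.0"   # assumed base URL

    payload = {
        "agent_count": 2,                 # number of agent VMs / volumes
        "volume_size": 10,                # GB per Cinder volume
        "public_network": "external",     # network for floating IPs / router
        "agent_image": "Ubuntu 16.04",    # Glance image for the agent VMs
    }

    resp = requests.post(f"{STORPERF}/configurations", json=payload, timeout=600)
    resp.raise_for_status()
    print("Stack created:", resp.json())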
Initialization of Cinder Volumes
- Optional, but recommended
- Pre-loads 100% of the volume with random data
- Read performance can be artificially elevated when no data is present
- Allocation of data blocks can hamper write performance
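An illustrative pre-fill of the attached volume with random data, roughly what this initialization phase does. The device path and fio options are assumptions; StorPerf's own initialization command may differ:

    # Write random data across the whole device once, so later reads and writes
    # hit allocated blocks. Device path and options are assumptions.
    import subprocess

    DEVICE = "/dev/vdb"   # assumed Cinder volume attachment point

    subprocess.run(
        [
            "fio",
            "--name=fill",
            f"--filename={DEVICE}",
            "--rw=write",            # one sequential pass over the device
            "--bs=1M",
            "--direct=1",
            "--refill_buffers",      # keep the data stream random
        ],
        check=True,
    )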
Execution of Performance Run
Parameters:
- Workload (read, write, sequential, random, mix)
- Block size
- Queue depths
- Number of samples required to determine steady state
- Maximum run duration
Copies FIO to the agent VMs
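A sketch of starting a run over the REST API. The /jobs endpoint and the field names are assumptions modelled on the parameters above:

    # Kick off a performance run. Field names mirror the parameters on this
    # slide but are assumptions, not a definitive StorPerf payload.
    import requests

    STORPERF = "http://storperf-host:5000/api/v1.0"   # assumed base URL

    job = {
        "workload": "rw",             # random write; could also be a mix
        "block_sizes": "4096,16384",
        "queue_depths": "1,4",
        "steady_state_samples": 10,   # samples examined for steady state
        "deadline": 60,               # maximum run duration in minutes
    }

    resp = requests.post(f"{STORPERF}/jobs", json=job, timeout=60)
    resp.raise_for_status()
    print("Started job:", resp.json())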
Why not dd?
- Limited: can only support sequential I/O workloads
- Does not support an I/O load greater than 1
- Hard to generate random data
Flexible I/O Tester: FIO
- Widely used and recognized; supports our needs
- Support for different I/O workloads
- Support for different block sizes
- Support for varying I/O loads
- Support for random data without a CPU penalty
- Support for periodic metrics reporting
Examples from FIO

    ./fio --direct=1 --bs=16384 --ioengine=libaio --size=1G --rw=randwrite \
        --name=test --iodepth=4 --filename=/dev/vdb --numjobs=1

- Run 1: WRITE: io=1024.0MB, aggrb=1728KB/s
- Run 2: WRITE: io=1024.0MB, aggrb=2006KB/s
- Run 3: WRITE: io=1024.0MB, aggrb=2049KB/s
Why Such Variance?
- Ceph has overhead when allocating new blocks
- Different back ends may behave differently
- We do not know the underlying disk type or topology
- Are these new SSDs? Higher performance until wear-levelling kicks in, etc.
When is the Data Valid?
- Do I need to pre-fill the target with random data?
- When is Ceph done allocating blocks?
- Should I run the test N times and take the average?
- Hasn’t someone figured this out yet?
Avoiding Transitional Performance (https://www.snia.org)
- Ceph performs ‘lazy’ allocation of blocks
- Write performance increases once the volume is fully populated
- Do not want to count warm-up data (warm up, volume fill, etc.) in the final result
[Figure: performance over time, with the warm-up / volume-fill region marked statistically invalid and the steady-state region marked statistically valid]
SNIA – Performance Test Specification
- Test the disk over and over again
- Note the performance values
- Once the values start to “flat line” and stay that way, it’s good!
[Figure: performance samples plotted over time, with max, min and slope lines bounding the steady-state window]
During a Test Run
- FIO is executed on each target VM
- Every 60 seconds, full FIO statistics are collected and stored in Carbon
- The history of the past 10 samples for latency, IOPS and bandwidth is examined:
  - Variance (max - min) within 20% of the average
  - Slope of the values less than 10% of the average
- The run terminates automatically when steady state is detected
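A minimal sketch of the two steady-state criteria above (range within 20% of the average, least-squares slope within 10% of the average) over a window of samples; this is an illustration, not StorPerf's exact implementation:

    # Steady-state check over a window of samples, as described on this slide.
    def is_steady_state(samples, range_limit=0.20, slope_limit=0.10):
        n = len(samples)
        if n < 2:
            return False
        average = sum(samples) / n
        if average == 0:
            return False

        # Range criterion: variance (max - min) within 20% of the average.
        if (max(samples) - min(samples)) > range_limit * abs(average):
            return False

        # Slope criterion: least-squares slope within 10% of the average.
        x_mean = (n - 1) / 2
        numer = sum((x - x_mean) * (y - average) for x, y in enumerate(samples))
        denom = sum((x - x_mean) ** 2 for x in range(n))
        slope = numer / denom
        return abs(slope) <= slope_limit * abs(average)

    # Example: the last 10 one-minute IOPS samples for one workload.
    print(is_steady_state([2010, 1995, 2003, 1988, 2001, 1999, 2005, 1992, 2000, 1997]))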
Deletion of Heat Stack
- Clean up after the run
- DELETE to the /configurations API requests deletion of the stack
- Nothing left behind: volumes are deleted
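A sketch of the teardown call, using the same assumed base URL as the earlier examples:

    # Request deletion of the Heat stack once testing is finished.
    import requests

    STORPERF = "http://storperf-host:5000/api/v1.0"   # assumed base URL

    resp = requests.delete(f"{STORPERF}/configurations", timeout=600)
    resp.raise_for_status()
    print("Stack deletion requested")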
Not Just Cinder
- StorPerf can profile ephemeral storage
- The target to profile can be:
  - /dev/vdb for a Cinder volume
  - /home/storperf/storperf.dat for Nova ephemeral storage
- Uses the guest OS filesystem vs. direct block I/O
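An illustrative fio invocation showing how the same profile can target either the raw Cinder device or a file on the guest filesystem; the fio options here are assumptions, only the two target paths come from this slide:

    # Run the same workload against a block device or a file in the guest FS.
    import subprocess

    def run_profile(target):
        subprocess.run(
            [
                "fio",
                "--name=profile",
                f"--filename={target}",
                "--rw=randwrite",
                "--bs=4k",
                "--direct=1",
                "--size=1G",        # needed when the target is a plain file
                "--runtime=60",
                "--time_based",
            ],
            check=True,
        )

    run_profile("/dev/vdb")                        # Cinder volume (block device)
    run_profile("/home/storperf/storperf.dat")     # Nova ephemeral (file in guest FS)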
What’s Coming Next
- Intern projects
- Focus on results processing and reporting
- Broaden support for filesystem profiling
- Deeper integration with other testing projects
Container Decomposition
- Monolithic container today: Graphite, Carbon, SwaggerUI
- Move towards a lighter base: Alpine
- Statically linked FIO to use with different slave VMs
Steady State Graphs
- New Docker container for graphs and reporting
- Use the existing web server as a base
- New front end for all web services
  - Authentication if desired
  - Pass-through to Swagger and the existing Flask REST API
Questions?
StorPerf Wiki: https://wiki.opnfv.org/display/storperf