
Sophia: Online Reconfiguration of Clustered NoSQL Databases for Time-Varying Workloads
Ashraf Mahgoub1, Paul Wood2, Alexander Medoff1, Subrata Mitra3, Folker Meyer4, Somali Chaterji1, Saurabh Bagchi1
1: Purdue University; 2: Johns Hopkins University; 3: Adobe Research; 4: Argonne National Laboratory
Supported by NIH R01 AI ,

Why Do Online Tuning of NoSQL Databases?
1. Database Management Systems (DBMS) have a plethora of performance-related parameters. For example, Cassandra has 56 performance tuning parameters and Redis has 46.
2. The exact setting of these parameters determines DBMS performance.
3. The optimal setting is specific to the application. For example, Cassandra out of the box is tuned well for a write-heavy workload pattern, but the default configuration suffers badly when the read percentage increases.
4. Application characteristics change over time, so a desirable configuration may become sub-optimal. Many applications show different mixes of database operations depending on user requests or on the execution phase of the application.
Our Target: Clustered NoSQL Databases (Examples: Cassandra, Redis, MongoDB, ScyllaDB)
In this work, we target the online tuning of clustered NoSQL databases in the face of changing workload characteristics within an application. The database server is distributed and runs as multiple server instances in a cluster. The data is typically replicated among the cluster instances.

Challenges of Online Tuning
1. Large configuration parameter search space.
1.1. Complex interdependencies exist among the parameters. For example, two of Cassandra's performance-determining parameters are Compaction Method (CM) and Concurrent Writes (CW). The first controls how multiple tables are compacted over time, while the second controls how many threads service concurrent writes. Changing the value of one changes the optimal value of the other.
2. A new workload pattern does not necessarily mean switching to a new configuration.
2.1. Performance degrades during the reconfiguration process.
2.2. The new workload pattern may be short-lived.
When the application's workload characteristics change, the critical question is whether to switch the DBMS to the new optimal parameter setting. There is a transient drop in performance while a server is being reconfigured. If the workload pattern is too short-lived, the cost of the reconfiguration is never recouped, and we would have been better off staying with the earlier configuration.
3. Data availability must be maintained during reconfiguration.
3.1. Many parameters need a server restart.
3.2. A staggered restart of servers is needed, through a distributed protocol, to meet availability and consistency requirements.
For many parameter changes, the server instance being reconfigured has to be shut down and then restarted, so the data on that instance is temporarily unavailable. If all the server instances, or a large number of them, are shut down and restarted without careful planning, parts of the data become unavailable, which is typically a show stopper for NoSQL database deployments. Further, users specify a data consistency requirement that may also be violated by a naïve reconfiguration process.

Look Before You Leap Change
Setup: Cassandra DBMS, MG-RAST production trace; # servers = 2, Replication Factor (RF) = 2, Consistency Level (CL) = 1.
1. The default configuration can be switched to a read-optimized one for an increase in throughput (40 Kops/s → 50 Kops/s).
2. Throughput drops temporarily because server instances are transiently unavailable while they undergo reconfiguration, one instance at a time. Reconfiguring one server at a time keeps the data available at the given consistency level (RF = 2 and CL = 1).
3. The dashed line gives the gain over time in terms of the total # operations served relative to the default configuration.
3.1. There is a cross-over point, marked by the red cross: if the new workload pattern lasts longer than this threshold, then reconfiguring is worthwhile.
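The cross-over point can be illustrated with a small back-of-the-envelope calculation. The sketch below is not Sophia's cost-benefit model; it assumes throughput is constant before, during, and after reconfiguration. The 40 Kops/s and 50 Kops/s figures come from the slide, while the throughput during reconfiguration (20 Kops/s) and the reconfiguration duration (120 s) are made-up values used only for illustration, and the function names are hypothetical.

```python
def crossover_time(t_old, t_new, t_during, reconf_secs):
    """Seconds after reconfiguration starts at which the total number of
    operations served catches up with simply keeping the old configuration.

    t_old, t_new, t_during are throughputs in ops/s (old configuration,
    new configuration, and while servers are being restarted).
    """
    if t_new <= t_old:
        return float("inf")  # the new configuration never pays off
    # Operations lost relative to the old configuration during the switch.
    ops_lost = (t_old - t_during) * reconf_secs
    # After the switch, the deficit shrinks at (t_new - t_old) ops per second.
    return reconf_secs + ops_lost / (t_new - t_old)


def worth_reconfiguring(t_old, t_new, t_during, reconf_secs, pattern_secs):
    """True if the new workload pattern is predicted to outlast the cross-over."""
    return pattern_secs > crossover_time(t_old, t_new, t_during, reconf_secs)


# 40 Kops/s -> 50 Kops/s is from the slide; 20 Kops/s and 120 s are assumptions.
threshold = crossover_time(40_000, 50_000, 20_000, 120)
```

Under these assumed numbers the threshold is 360 seconds: a workload pattern expected to last 5 minutes would not justify the switch, while one expected to last 10 minutes would.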

Technical Contributions: Sophia
1. We show that today's state-of-the-art static tuners degrade throughput below the default configuration and degrade data availability when applied to applications with dynamic workload characteristics, as most applications are.
2. Sophia performs cost-benefit analysis to achieve long-horizon optimized performance for clustered NoSQL DBMS in the face of dynamic workload changes. Its cost-benefit analysis includes ML models that predict the future workload, as well as ML models that estimate what the throughput would be under a variety of configurations. It uses these estimates to search efficiently through the space of possible configurations. Sophia can achieve this even for rather unpredictable and fast changes to the workload.
3. Sophia executes a distributed protocol to gracefully switch over the cluster to the new configuration while meeting data consistency and availability guarantees. The protocol reconfigures the server instances in rounds while maintaining the user-specified requirement for data consistency and keeping the data continuously available.
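One way to picture reconfiguration in rounds is the sketch below. It is a simplified illustration, not Sophia's actual protocol: it assumes the only constraint is that every data range keeps at least CL replicas up at all times, and it groups servers into rounds greedily. The names `is_safe`, `plan_rounds`, and the replica map layout are hypothetical.

```python
def is_safe(down, replicas, cl):
    """True if every data range still has at least `cl` replicas up
    when the servers in `down` are shut down together."""
    return all(
        sum(server not in down for server in holders) >= cl
        for holders in replicas.values()
    )


def plan_rounds(servers, replicas, cl):
    """Greedily group servers into restart rounds.

    servers:  list of server names to reconfigure
    replicas: dict mapping a data range to the servers holding its replicas
    cl:       number of replicas that must stay up (consistency level)
    """
    pending = list(servers)
    rounds = []
    while pending:
        current = set()
        for server in pending:
            # Add the server to this round only if availability still holds.
            if is_safe(current | {server}, replicas, cl):
                current.add(server)
        if not current:
            raise ValueError("no server can be restarted without violating CL")
        rounds.append(sorted(current))
        pending = [s for s in pending if s not in current]
    return rounds


# Two servers, RF = 2, CL = 1: each server is restarted in its own round.
two_node_plan = plan_rounds(["s1", "s2"], {"r1": ["s1", "s2"]}, cl=1)
```

With more servers than replicas, several servers can share a round as long as no data range loses too many replicas at once, which shortens the total reconfiguration time.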

Evaluation
1. We implement and evaluate Sophia on two NoSQL databases, Cassandra and Redis.
2. We use three application traces: MG-RAST, the largest metagenomics portal, hosted at Argonne; a bus-tracking application trace; and data analytics jobs as would be submitted to an HPC cluster.
3. We show improvements over four baselines:
(1) Default: The default configuration that ships with Cassandra. This configuration favors write-heavy workloads by design [48].
(2) Static Optimized: This baseline resembles the static tuner (RAFIKI) when queried to provide the one constant configuration that optimizes for the entire future workload. It is impractically ideal in that the future workload is assumed to be known perfectly, but non-ideal in that no dynamic configuration changes are allowed.
(3) Naïve Reconfiguration: When the workload changes, RAFIKI's provided configuration is always applied, by concurrently shutting down all server instances, changing their configuration parameters, and restarting all of them. In practice, this makes data unavailable and may not be tolerable in many deployments, such as user-facing applications.
(4) Commercial auto-tuning NoSQL database (ScyllaDB).

Talk and Poster Info
Track: “Big-Data Programming Models & Frameworks”, July 10th (Wednesday), 2:20-3:40 pm, Track II
Poster Session: July 10th (Wednesday), 6:00-7:30 pm
