Backlog Estimation and Management for Real-Time Data Services. Kyoung-Don Kang, Jisu Oh, and Yan Zhou, Department of Computer Science, State University of New York at Binghamton. 20th Euromicro Conference on Real-Time Systems (ECRTS '08), July 2-4, 2008.
Real-time data service is challenging. Real-time database (RTDB) applications: e-commerce, traffic monitoring. Requirements: processing user requests in a timely manner; maintaining the freshness of temporal data. Challenges: dynamic database workloads due to data/resource contention; difficulty in precisely analyzing the amount of data the database will need to process; conflicts between transaction timeliness and data freshness.
Feedback Control. A promising approach to managing the performance of RTDBs in the presence of dynamic workloads. Required procedure: system modeling (SYSID) using a linear relationship; controller design (e.g., P, PI, PID); controller tuning (Root Locus tool in Matlab) to support the performance requirements and the closed-loop system stability requirements; actuator design.
Shortcomings of existing work. System modeling problem: an inaccurate metric for measuring the data service workload (a queue length vs. service delay model). This coarse-grained approach cannot correctly measure the workload when the amount of data accessed by each request varies. Existing work is mostly based on simulations due to a lack of real-time database testbeds, and simulations reveal limitations in modeling real system behaviors.
Our Approach. Goal: supporting the target service delay bound and enhancing the data service throughput. Key technologies: 1) a database backlog estimation mechanism; 2) modeling of the system dynamics based on the relation between the estimated backlog and the service delay; 3) systematic backlog adaptation via fine-grained feedback-based admission control built on the backlog model; 4) workload smoothing using a hint-based scheme.
Implemented and evaluated in a real database system. Chronos: a soft real-time database testbed built on top of Berkeley DB. It processes thousands of clients' data service requests for stock quotes and stock trading, and periodically updates 3,000 stock prices for data freshness management, which is critical for real-time data services.
RTDB Architecture (figure). Components: user requests, ready queue, dispatch, user request service threads, database, commit; stock price update server and stock update threads; backlog estimator with metadata; performance monitor; feedback controller; admission control and traffic smoothing (AC & TS). Signals: service delay $s(k)$, target service delay, backlog adjustment $\delta d(k)$, and target backlog $d_t(k) = d_t(k-1) + \delta d(k)$.
Backlog: the amount of data for the database to process. For backlog estimation, use the database schema and the semantics of queries and transactions, and maintain metadata: system statistics (number of rows per table, row sizes, etc.) and materialized portfolio information (number of stock items each client owns). About 1.2% CPU utilization overhead.
Backlog estimation: example (1), the "view-stock" query. A user query about stock prices for a set of companies, e.g., "view-stock, IBM, DELL, Microsoft". Chronos accesses two tables to answer this query: the STOCKS table to read the companies' information and the QUOTES table to get the associated stock prices. The backlog for this query is $n_c \cdot \{r(\mathrm{STOCKS}) + r(\mathrm{QUOTES})\}$, where $n_c$ is the number of companies in the query and $r(x)$ is the average size of a row in table $x$.
Backlog estimation: example (2), the "view-portfolio" transaction. A client transaction about the stock information in the client's portfolio, e.g., "view-portfolio, client-id 150". Chronos accesses two tables to process this transaction: the PORTFOLIOS table to look up the IDs of the stocks owned by the client and the QUOTES table to get the associated stock prices. The backlog for this transaction is $|\mathrm{portfolio}(id)| \cdot \{r(\mathrm{PORTFOLIOS}) + r(\mathrm{QUOTES})\}$, where $|\mathrm{portfolio}(id)|$ is the number of stock items in the portfolio owned by the client whose ID is $id$.
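The two backlog formulas in the examples can be sketched in a few lines of Python. This is only an illustration: the row sizes, the portfolio size, and the function names are made-up assumptions, not values or interfaces from Chronos.

```python
# Hypothetical metadata: every number below is invented for illustration.
AVG_ROW_SIZE = {"STOCKS": 120, "QUOTES": 48, "PORTFOLIOS": 64}  # bytes per row
PORTFOLIO_SIZE = {150: 7}  # client id -> number of stock items owned


def r(table):
    """Average row size r(x) of a table, read from the metadata."""
    return AVG_ROW_SIZE[table]


def view_stock_backlog(companies):
    """n_c * (r(STOCKS) + r(QUOTES)) for a view-stock query."""
    return len(companies) * (r("STOCKS") + r("QUOTES"))


def view_portfolio_backlog(client_id):
    """|portfolio(id)| * (r(PORTFOLIOS) + r(QUOTES)) for a view-portfolio transaction."""
    return PORTFOLIO_SIZE[client_id] * (r("PORTFOLIOS") + r("QUOTES"))


stock_estimate = view_stock_backlog(["IBM", "DELL", "Microsoft"])
portfolio_estimate = view_portfolio_backlog(150)
```

Both estimates need only the request's parameters and the maintained metadata, so they can be computed before the request touches the database.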
The database backlog at time $t$: $d(t) = \sum_{i=1}^{q(t)} n_i$, where $d(t)$ is the database backlog at time $t$, $n_i$ is the amount of data to be processed by a request $T_i$ in the ready queue, and $q(t)$ is the number of requests in the ready queue at time $t$.
The relation between database backlog $d(k)$ and service delay $s(k)$. A fine-grained model, because it can closely capture the system dynamics when the transaction size varies. Derive the RTDB model using the ARX (Auto Regressive eXternal) model and apply the recursive least squares (RLS) method.
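As a rough illustration of fitting an ARX model with recursive least squares, the sketch below assumes a second-order structure $s(k) = a_1 s(k-1) + a_2 s(k-2) + b_1 d(k-1) + b_2 d(k-2)$ and recovers its coefficients from synthetic, noise-free data. The coefficient values, the input signal, and all function names are assumptions made for this example, not the identified Chronos model.

```python
import random


def rls_fit(regressors, outputs, n_params, p0=1e6):
    """Recursive least squares: update theta one sample at a time."""
    theta = [0.0] * n_params
    # Covariance matrix P starts as a large multiple of the identity.
    P = [[p0 if i == j else 0.0 for j in range(n_params)] for i in range(n_params)]
    for phi, y in zip(regressors, outputs):
        p_phi = [sum(P[i][j] * phi[j] for j in range(n_params)) for i in range(n_params)]
        denom = 1.0 + sum(phi[i] * p_phi[i] for i in range(n_params))
        gain = [v / denom for v in p_phi]
        error = y - sum(phi[i] * theta[i] for i in range(n_params))
        theta = [theta[i] + gain[i] * error for i in range(n_params)]
        # P stays symmetric, so phi^T P equals (P phi)^T.
        P = [[P[i][j] - gain[i] * p_phi[j] for j in range(n_params)]
             for i in range(n_params)]
    return theta


def arx2_regressors(s, d):
    """Build rows [s(k-1), s(k-2), d(k-1), d(k-2)] with target s(k)."""
    rows = [[s[k - 1], s[k - 2], d[k - 1], d[k - 2]] for k in range(2, len(s))]
    return rows, s[2:]


def simulate(a1, a2, b1, b2, d):
    """Generate delays from a known second-order ARX model (synthetic data)."""
    s = [0.0, 0.0]
    for k in range(2, len(d)):
        s.append(a1 * s[k - 1] + a2 * s[k - 2] + b1 * d[k - 1] + b2 * d[k - 2])
    return s


TRUE_PARAMS = (0.5, -0.2, 0.3, 0.1)  # made-up, stable coefficients
rng = random.Random(0)
d = [rng.uniform(0.0, 10.0) for _ in range(200)]  # synthetic backlog samples
s = simulate(*TRUE_PARAMS, d)
rows, targets = arx2_regressors(s, d)
theta = rls_fit(rows, targets, 4)  # estimates of a1, a2, b1, b2
```

On real measurements the estimates would be noisy, which is why the fit is judged with a separate accuracy metric rather than by comparison with known coefficients.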
System Identification (SYSID). System settings: no feedback control, admission control, or traffic smoothing; number of clients: 1200; inter-request time: [0.5 s, 2.5 s]; experimental duration: 3600 seconds. Every 10 seconds, the recursive least squares estimator predicts the response time.
System Identification (cont.). Use the square roots of the performance data to reduce the impact of burstiness. Model evaluation metric: $R^2$. If $R^2 > 0.8$, a model is acceptable. Choose the second-order model, with $R^2 = 0.86$ (1st << 2nd ≈ 3rd ≈ 4th order).
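The slide's formula for the metric did not survive, so the sketch below assumes one common definition, $R^2 = 1 - \mathrm{var}(\text{prediction error}) / \mathrm{var}(\text{measured delay})$, and applies the 0.8 acceptance threshold stated above. The function names are hypothetical.

```python
from statistics import pvariance


def r_squared(measured, predicted):
    """Assumed definition: 1 - var(measured - predicted) / var(measured)."""
    errors = [m - p for m, p in zip(measured, predicted)]
    return 1.0 - pvariance(errors) / pvariance(measured)


def is_acceptable(measured, predicted, threshold=0.8):
    """Apply the acceptance rule R^2 > 0.8 from the slide."""
    return r_squared(measured, predicted) > threshold


measured = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(measured, measured)          # model reproduces the data
useless = r_squared(measured, [2.5, 2.5, 2.5, 2.5])  # model predicts only the mean
```

A model that explains all of the variation scores 1, and one that predicts only the mean scores 0, so the threshold asks for a model that explains most of the variation in the measured delay.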
Target Performance, based on a fine-grained RTDB model using the database backlog. Target service delay bound ($S_t$): 2 s E*Trade ( Service delay overshoot ($S_v$): 2.5 s. Settling time ($T_v$): 10 s.
Design of Feedback Controller and Backlog Adaptation. Closed loop for the data service delay: design a PI controller using the Root Locus method in Matlab. Controller input: $e(k) = S_t - s(k)$. Controller output: $\delta d(k)$, the backlog adjustment. Closed-loop poles: , $0.52 \pm 0.106i$. Gains: $K_P = 2.77$, $K_I = 5.28$. New target backlog: $d_t(k) = d_t(k-1) + \delta d(k)$.
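A minimal sketch of the control step follows. The slides give the gains and the backlog update but not the exact discrete PI form, so the textbook positional form $\delta d(k) = K_P e(k) + K_I \sum_{j \le k} e(j)$ is assumed here; the class name, the clamp at zero, and the numeric inputs are also assumptions for illustration.

```python
class PIController:
    """Textbook discrete PI law (assumed form, not confirmed by the slides)."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.error_sum = 0.0  # running sum of e(j), the integral term

    def step(self, target_delay, measured_delay):
        """Return the backlog adjustment delta_d(k) for e(k) = S_t - s(k)."""
        error = target_delay - measured_delay
        self.error_sum += error
        return self.kp * error + self.ki * self.error_sum


def next_target_backlog(prev_target, adjustment):
    """d_t(k) = d_t(k-1) + delta_d(k); the clamp at 0 is an added assumption."""
    return max(0.0, prev_target + adjustment)


controller = PIController(kp=2.77, ki=5.28)
delta = controller.step(target_delay=2.0, measured_delay=2.5)  # delay too high
new_target = next_target_backlog(100.0, delta)  # 100.0 is a made-up backlog
```

When the measured delay exceeds the target, the error is negative and the target backlog shrinks, so fewer requests pass admission control in the next sampling period.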
Admission Control. $n_{new}$: the amount of data to be processed for a new user request $T_{new}$; $n_i$: the amount of data to be processed by a request $T_i$ in the ready queue; $q(t)$: the number of requests in the ready queue at time $t$; current backlog: $d(t) = \sum_{i=1}^{q(t)} n_i$. A transaction $T_{new}$ arriving at time $t$ during the $(k+1)$th sampling period is admitted if $d(t) + n_{new} \le d_t(k)$, where $d_t(k)$ is the desired backlog; otherwise it is rejected.
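The admission test can be sketched as below, with the ready queue represented simply as a list of per-request data amounts. The representation, the function names, and the numbers are assumptions made for this example.

```python
def current_backlog(ready_queue):
    """d(t): sum of n_i over the requests waiting in the ready queue."""
    return sum(ready_queue)


def admit(ready_queue, n_new, target_backlog):
    """Admit T_new only if d(t) + n_new <= d_t(k); enqueue it when admitted."""
    if current_backlog(ready_queue) + n_new <= target_backlog:
        ready_queue.append(n_new)
        return True
    return False


queue = [504, 784]                 # made-up backlog estimates of queued requests
first = admit(queue, 200, 1500)    # 1288 + 200 <= 1500, so admitted
second = admit(queue, 100, 1500)   # 1488 + 100 > 1500, so rejected
```

Because the test compares amounts of data rather than request counts, one large request can be rejected while several small ones are still admitted.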
Load Smoothing. Purpose: reducing the burstiness of the workload. Key idea: used only under overload; each client voluntarily delays the submission of its request by an additional period of time, which increases the chances of the request being admitted.
Load Smoothing (cont.). Mechanism: Chronos provides the rejection rate $p(k)$ as the current server status. With probability $p(k)$, a client delays its request submission by an extra time in a predefined range $[t_a, t_b]$, in addition to the arbitrary inter-request time in $[t_1, t_2]$. Benefits: enhances the stability of the system; distributing traffic smoothing across the clients reduces overhead.
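On the client side, the mechanism might look like the sketch below. It assumes that both the inter-request time and the extra delay are drawn uniformly from their ranges, which the slides do not state, and the function name is hypothetical.

```python
import random


def next_submission_gap(p, t1, t2, ta, tb, rng=random):
    """Time to wait before the next request, given the rejection-rate hint p."""
    gap = rng.uniform(t1, t2)      # regular inter-request time
    if rng.random() < p:           # back off with probability p(k)
        gap += rng.uniform(ta, tb) # extra voluntary delay
    return gap


rng = random.Random(1)
no_overload = [next_submission_gap(0.0, 0.1, 0.5, 0.1, 0.3, rng) for _ in range(100)]
full_overload = [next_submission_gap(1.0, 0.1, 0.5, 0.1, 0.3, rng) for _ in range(100)]
```

With $p(k) = 0$ the client behaves exactly as without smoothing, so the scheme only takes effect while the server is rejecting requests.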
Load Smoothing: example. When the inter-request time is in the range $[t_1, t_2] = [0.1\,\mathrm{s}, 0.5\,\mathrm{s}]$, a client submits $1/0.3 \approx 3.3$ requests/s on average. If $p(k) > 0$ (e.g., 0.5) and the predefined extra delay is in the range $[t_a, t_b] = [0.1\,\mathrm{s}, 0.3\,\mathrm{s}]$, a client submits $1/(0.3 + 0.5 \cdot 0.2) = 2.5$ requests/s. If every client behaves this way, the total arrival rate is expected to be reduced by about 25%.
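The rates in this example can be checked with a short calculation. It assumes that the average rate is the reciprocal of the mean gap between requests and that both ranges are sampled uniformly; the function name is hypothetical.

```python
def mean_rate(t1, t2, ta=0.0, tb=0.0, p=0.0):
    """Average requests/s: 1 / (mean inter-request time + p * mean extra delay)."""
    mean_gap = (t1 + t2) / 2 + p * (ta + tb) / 2
    return 1.0 / mean_gap


base_rate = mean_rate(0.1, 0.5)                       # no extra delay
smoothed_rate = mean_rate(0.1, 0.5, 0.1, 0.3, p=0.5)  # extra delay half the time
reduction = 1.0 - smoothed_rate / base_rate           # fractional drop per client
```

Under these assumptions the per-client rate drops from about 3.3 to 2.5 requests/s, a reduction of roughly a quarter.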
Experimental Environments. Chronos server: Dell laptop, 1.66 GHz dual core, 1 GB of RAM, Linux kernel. Clients: Dell desktop, 3 GHz CPU, 4 GB of RAM, Linux kernel. Stock price update server: Dell desktop, 3 GHz CPU, 2 GB of RAM, Linux kernel. Network: 1 Gbps Ethernet switch.
Workload Settings. 1200 clients: 60% queries (view-stock), 40% transactions (view-portfolio, purchase, sale). Chronos server: 3000 stock prices. Timeline (figure; marks at 0, 5 min, 10 min, 15 min): the inter-request time changes from [3.5 s, 4.0 s] to [0.1 s, 0.5 s] to generate a bursty workload, increasing the workload by 7 to 40 times.
Tested Approaches. Open: pure Berkeley DB. AC: ad-hoc admission control. FC-Q: feedback control with the queue length vs. delay model. FC-D: feedback control with the database backlog vs. delay model. FC-TS: feedback control of the database backlog plus traffic smoothing.
Performance Metrics. Data service delay. Data service throughput: the average amount of data processed by the committed transactions and queries, reported as total and as timely (data processed by the timely transactions completed within $S_t$). Each experimental run is 15 minutes long; results are the average of 10 runs with 90% confidence intervals.
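The slides do not say how the intervals were computed. One standard way, assumed here, is a Student's t interval over the 10 runs, using the two-sided 90% critical value 1.833 for 9 degrees of freedom. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

T_90_DF9 = 1.833  # two-sided 90% Student's t critical value, 9 degrees of freedom


def mean_with_ci90(runs):
    """Return (mean, half-width) of a 90% confidence interval for 10 runs."""
    if len(runs) != 10:
        raise ValueError("T_90_DF9 is only valid for exactly 10 runs")
    half_width = T_90_DF9 * stdev(runs) / sqrt(len(runs))
    return mean(runs), half_width


# Made-up per-run measurements, only to exercise the function.
sample_runs = [float(i) for i in range(1, 11)]
avg, half = mean_with_ci90(sample_runs)
```

A result would then be reported as the mean plus or minus the half-width, the same form as the service delay figures in the evaluation.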
Average Service Delay. FC-D: 1.52 ± 0.07 s, by adapting the database backlog systematically. FC-TS: 1.24 ± 0.03 s, by further reducing the workload burstiness. Baselines: fail to support the 2 s target service delay.
Transient Service Delay. FC-D and FC-TS outperform the baselines. Overshoots decay in less than three sampling periods, satisfying $T_v$ < 10 s.
Data Service Throughput. FC-D: processes more than 18,680,000 data; around 86% of the data were processed within the desired delay bound. FC-TS: processes around 35% more data than FC-D by further smoothing the incoming workload. Baselines: show similar throughput. Metric: the average amount of data processed by the committed transactions, normalized to the corresponding number for FC-D.
Conclusions. To enhance RTDB performance without degrading data freshness: predict the database backlog as the amount of data for the database to process, using the metadata extracted from the database schema and transaction semantics; adjust the database backlog via fine-grained closed-loop admission control based on the backlog model to support the desired service delay; reduce the burstiness of incoming data service requests via hint-based load smoothing.
Questions?