Published by Ferdinand O’Connor. Modified over 6 years ago.
1
Applying Control Theory to Stream Processing Systems
Wei Xu, Bill Kramer, Peter Bodik
2
Just for fun
Jan. 7, 2005: VOD system on an Airbus 330…
“We need 20 minutes to reboot the system”
3
Outline
System log as data streams
Applying control theory
  Accurate data source
  Controlling queue length
Lessons learned
4
Introduction
Goal of our project
  A flexible and scalable architecture for system log processing
  Explore general techniques of applying control theory to systems
Problem
  Data rate up to 1 TB a day
  Data are very complex
Notes: The main goal of our project is to analyze log data of online services such as Amazon or eBay. These systems are very complex and they often fail. ... However, due to the very high data rate and the complexity of the logs, we had problems processing the data.
5
Example of system log data
Request data: Apache log, etc.
Performance data: CPU, memory, etc.
Failure data: detected problems / error messages, reports from operators
450 attributes, 11,000 requests a second
6
Preprocessing
Sanitize the data
  Remove/encrypt sensitive information before the data get into permanent storage
  Sanitize at different levels
Put logs into a common format
  Merge information from various sources
Sampling
Notes: All the required preprocessing should be done outside the algorithm ... As new data streams -> as new input
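The three preprocessing steps on this slide (sampling, sanitizing, and conversion to a common format) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the field names, the key, and the envelope layout are hypothetical and do not come from the slides, and a keyed hash stands in for whatever "remove/encrypt" meant in the real system.

```python
import hashlib
import hmac

# Hypothetical field names and key; the slides do not specify any of these.
SENSITIVE_FIELDS = {"user_id", "client_ip"}
SECRET_KEY = b"example-key"


def sanitize(record):
    """Replace sensitive values with a keyed hash (joinable, but not readable)."""
    clean = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
            clean[field] = digest.hexdigest()[:16]
        else:
            clean[field] = value
    return clean


def to_common_format(source, record):
    """Wrap a source-specific record in one shared envelope (assumed layout)."""
    return {"source": source, "timestamp": record.get("timestamp"), "attributes": record}


def sample(records, keep_every):
    """Deterministic 1-in-N sampling, applied first to cut the data rate."""
    for index, record in enumerate(records):
        if index % keep_every == 0:
            yield record


def preprocess(source, records, keep_every=1):
    """Sample, then sanitize, then convert each surviving record."""
    for record in sample(records, keep_every):
        yield to_common_format(source, sanitize(record))
```

Sampling runs first here so that the more expensive steps only see the reduced stream.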
7
The big picture
[Diagram labels: Production System, Data Collection, raw log data, preprocessing, Sanitized Data Repository, Automatic analysis, Failure Detection, feedback loop, ?]
Notes: Add an “AOL” box in front of the orange arrows. Add a feedback loop back into AOL. How would this be used in real life? It is in the critical path of failure recovery, so the speed of data analysis is critical for recovery. The speed of preprocessing is also critical ... Also, how do we evaluate this framework? How much delay do we introduce? What happens if a node in the preprocessing step fails? Can we handle that? Put data sanitizing functions into TCQ!!
8
Early Experiences
Ad-hoc scripts
  Tedious
  Hard to change
Relational databases
  Static schema
  Hard to support temporal queries
  Have to store all the data
  Efficient query evaluation requires prior knowledge of the data
9
Stream processing – Telegraph Continuous Query (TCQ)
System log data are data streams
Preprocessing is a continuous query
Telegraph Continuous Query (TCQ)
  Data stream processing engine
  SQL queries
  Sliding time window
  Adaptive: execution optimized on-the-fly
  Performance doesn’t depend on the number of queries
[Diagram labels: query Q, SLT]
Notes: We think that stream processing is a good data model for system log data.
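To make the "sliding time window" idea concrete, here is a minimal sketch of a continuous count per key over the most recent window of stream time. It is not TCQ's query syntax or API; the class name and the assumption that events arrive in timestamp order are illustrative only.

```python
from collections import deque


class SlidingWindowCount:
    """Count events per key over the last `window_seconds` of stream time.

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key), oldest first
        self.counts = {}

    def add(self, timestamp, key):
        """Add one event and return the per-key counts for the current window."""
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        # Expire events at or before (timestamp - window_seconds).
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]
        return dict(self.counts)
```

Each call to `add` emits an updated result, which is the sense in which the query is continuous: the answer is maintained incrementally as tuples arrive rather than recomputed over stored data.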
10
Data preprocessing architecture
[Diagram labels: load splitter, TCQ query Q, combiner, SLT 1, TCQ query R, SLT 2; numbered tuples 1 to 6 are split across nodes and recombined, with groups 3+2+1 and 6+5+4]
Can be easily distributed over a cluster of machines
Linear scaling
Performance of a TCQ node depends on the data rate, not on the number of queries running => can generate many streams
Can be extended/reorganized in any way
Notes: “One machine running TCQ can’t handle 1 TB of data a day, so we need to distribute the processing. At the same time, we also want to extract temporal information from the data, and thus we need to process the data in sequence. These contradicting goals ...” Why (at least) two tiers? Sampling should be the first thing to do in the pipeline to reduce the data rate (that’s why we need parallelism). How to support off-line algorithms?
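The splitter/combiner arrangement can be sketched as follows, assuming round-robin distribution and sequence numbers to restore order; the slides do not say how the real splitter and combiner were implemented, and `process_on_node` is only a stand-in for a TCQ node, not its API. All function names are hypothetical.

```python
import heapq


def split_round_robin(tuples, num_nodes):
    """Tag each tuple with a sequence number and deal it to a node round-robin."""
    partitions = [[] for _ in range(num_nodes)]
    for seq, item in enumerate(tuples):
        partitions[seq % num_nodes].append((seq, item))
    return partitions


def process_on_node(partition, query):
    """Stand-in for one first-tier node: apply `query` per tuple, keep the tag."""
    return [(seq, query(item)) for seq, item in partition]


def combine_in_order(node_outputs):
    """Merge per-node outputs back into one stream ordered by sequence number."""
    # Each node's output is already sorted by sequence number,
    # so a k-way merge restores the global order.
    merged = heapq.merge(*node_outputs, key=lambda pair: pair[0])
    return [item for _, item in merged]


def sum_in_groups(stream, group_size):
    """Stand-in for a second-tier query that needs tuples in sequence."""
    return [sum(stream[i:i + group_size]) for i in range(0, len(stream), group_size)]
```

With three nodes, tuples 1 to 6 land on the nodes as (1, 4), (2, 5), and (3, 6), and the second tier sees them in order again, so grouping by three gives 3+2+1 and 6+5+4.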
11
Outline
System log as data streams
Applying control theory
  Accurate data source
  Controlling queue length
Lessons learned
12
Why do we need control?
Data source does not provide an accurate data rate
13
Control Problems
The data source is not accurate for various reasons
  Scheduling
  Time spent on I/O
  Etc.
Providing an accurate data source using feedback control
  By controlling the input of “desired rate”
14
The Control Architecture
[Diagram: control loop; values shown: 1500, 1900, 1600]
P Controller (with precompensation): $u(k) = K_P \, e(k)$
PI Controller: $u(k) = u(k-1) + (K_P + K_I)\, e(k) - K_P\, e(k-1)$
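A minimal sketch of the two control laws on this slide, u(k) = Kp e(k) and u(k) = u(k-1) + (Kp + Ki) e(k) - Kp e(k-1). The simulated data source is an assumption made only for illustration: it delivers a fixed fraction (`plant_gain`) of whatever rate it is asked for. The gains are arbitrary and precompensation is not modeled.

```python
def p_control(kp, setpoint, measured):
    """Proportional law: u(k) = Kp * e(k)."""
    return kp * (setpoint - measured)


class PIController:
    """Incremental PI law: u(k) = u(k-1) + (Kp + Ki) * e(k) - Kp * e(k-1)."""

    def __init__(self, kp, ki, u0=0.0):
        self.kp = kp
        self.ki = ki
        self.prev_u = u0   # u(k-1)
        self.prev_e = 0.0  # e(k-1)

    def update(self, setpoint, measured):
        error = setpoint - measured
        u = self.prev_u + (self.kp + self.ki) * error - self.kp * self.prev_e
        self.prev_u = u
        self.prev_e = error
        return u


def simulate(controller, setpoint, plant_gain, steps):
    """Drive a hypothetical data source whose actual rate is plant_gain * requested rate."""
    measured = 0.0
    for _ in range(steps):
        requested = controller.update(setpoint, measured)
        measured = plant_gain * requested
    return measured
```

For example, `simulate(PIController(0.2, 0.5), 1500.0, 0.8, 60)` settles at the desired rate even though the source only delivers 80% of what is requested, because the integral term keeps raising the request until the error is zero.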
15
Result – An accurate data source
[Plots: P Controller with Pre-compensation; PI Controller]
16
Zoom In
A lot of small disturbances in a Java program
  Incremental garbage collection
[Plots: P Controller; PI Controller]
17
Outline
System log as data streams
Applying control theory
  Accurate data source
  Controlling queue length
Lessons learned
18
Problem: performance disturbance
Significant network traffic
Memory leak
System process interference
Packets dropped while transferring the stream
Other failures
Also, the performance of a node depends on the SELECTIVITY of the relational operator
  Depends on the input data
19
Description of the system
[Diagram labels: Controlled Data Source, Input Buffer, TCQ]
TCQ
  Complex internal structure
  TCQ drops tuples silently if the result queue is full
20
Why do we need control?
A TCQ node drops tuples when the result queue fills up
[Figure labels: Source, Buffer, TCQ, Result Q]
21
Control Problems
Regulate queue length on the TCQ node
  By controlling the buffer output rate
Prevent dropping tuples
Maximize throughput
Tolerate disturbance
22
System with Control
[Diagram labels: Data Source, Controller, Controlled Output Rate, Queue Length Monitor]
23
Controller
$u(k) = u(k-1) + (K_P + K_I)\, e(k) - K_P\, e(k-1)$
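The effect of this controller on queue length can be explored with a toy simulation. Everything about the plant here is an assumption for illustration: the queue is modeled as a simple bounded counter that gains the buffer's output rate and loses a service rate each step, and the capacity, target, and gains are made up. Only the control law, u(k) = u(k-1) + (Kp + Ki) e(k) - Kp e(k-1), is taken from the slide.

```python
def run_queue(steps, capacity, service_rate, choose_rate):
    """Simulate a bounded result queue; returns (final_length, tuples_dropped)."""
    queue, dropped = 0.0, 0.0
    for k in range(steps):
        rate = max(0.0, choose_rate(queue))            # buffer output rate this step
        queue = max(0.0, queue + rate - service_rate(k))
        if queue > capacity:                           # overflow is silently dropped
            dropped += queue - capacity
            queue = capacity
    return queue, dropped


def make_pi_rate(target, kp, ki):
    """Build a rate chooser using u(k) = u(k-1) + (Kp + Ki) e(k) - Kp e(k-1)."""
    state = {"u": 0.0, "e": 0.0}

    def choose_rate(queue_length):
        error = target - queue_length
        u = state["u"] + (kp + ki) * error - kp * state["e"]
        u = max(0.0, u)  # a rate cannot be negative
        state["u"], state["e"] = u, error
        return u

    return choose_rate


def service_with_contention(k):
    """Hypothetical disturbance: the node slows from 20 to 10 tuples/step at k = 100."""
    return 20.0 if k < 100 else 10.0
```

In this toy model, `run_queue(200, 100.0, service_with_contention, make_pi_rate(50.0, 0.5, 0.1))` holds the queue near the target of 50 through the slowdown without dropping tuples, while a fixed rate of 20 (`lambda q: 20.0`) overflows the queue once the node slows down.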
24
Result – regulating queue length
[Figure labels: Source, Buffer, TCQ, Result Q]
25
Result – Under CPU Contention
[Figure labels: Source, Buffer, TCQ, Result Q]
26
Outline
System log as data streams
Applying control theory
  Accurate data source
  Controlling queue length
Lessons learned
27
Why is theory useful?
One of my implementations .. What happened?
[Figure labels: Source, Buffer, TCQ, Result Q]
28
What is going on?
[Diagram labels: Desired Queue Length, Queue Length Controller, Controlled Output Thread (Code Reuse), Data Rate to TCQ, Actual Queue Length]
29
Theory meets reality
[Plot: output Y from simulation; axes: Queue length, Time]
30
Conclusion
Advantages of feedback control
  Makes the system more robust under disturbance
  Allows more time for failure detection
  Treats complex systems as black boxes
  Copes with the system characteristics instead of having to change them
  Theoretical analysis
  Implementation is easy
  System statistics can also be used for SLT
31
Future Work
Load balancer
Load control across multiple tiers
Scheduling of multiple streams
32
Backup Slides
33
Tricky part of parameter estimation
Model evaluation – Making the system operate in the desired range
[Plot: data rate vs. free space; labels: Free Space, Non-Linear range]
Easy for data source, but queue length ..
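One common way to estimate model parameters from measurements is a least-squares fit of a first-order model, y(k+1) = a y(k) + b u(k). The slides do not state which model or fitting method was used, so the sketch below is an assumption; it only illustrates why samples from the non-linear range (for example, a saturated queue) should be left out of the fit. The function and parameter names are hypothetical.

```python
def fit_first_order(u, y, y_min=float("-inf"), y_max=float("inf")):
    """Least-squares fit of y[k+1] = a*y[k] + b*u[k]; returns (a, b).

    Expects len(y) == len(u) + 1. Only transitions whose start and end
    values lie strictly inside (y_min, y_max) are used, so saturated
    samples can be excluded.
    """
    ks = [k for k in range(len(u))
          if y_min < y[k] < y_max and y_min < y[k + 1] < y_max]
    s_yy = sum(y[k] * y[k] for k in ks)
    s_uu = sum(u[k] * u[k] for k in ks)
    s_yu = sum(y[k] * u[k] for k in ks)
    s_ny = sum(y[k + 1] * y[k] for k in ks)
    s_nu = sum(y[k + 1] * u[k] for k in ks)
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0:
        raise ValueError("samples do not determine a and b uniquely")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (s_ny * s_uu - s_nu * s_yu) / det
    b = (s_yy * s_nu - s_yu * s_ny) / det
    return a, b
```

Passing the saturation level as `y_max` keeps the fit inside the range where the linear model is expected to hold.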