Applying Control Theory to Stream Processing Systems

1 Applying Control Theory to Stream Processing Systems
Wei Xu Bill Kramer Peter Bodik

2 Just for fun Jan. 7, 2005 VOD system on an AirBus 330…
“We need 20 minutes to reboot the system”

3 Outline System log as data streams Applying control theory
Accurate data source Controlling queue length Lessons learned

4 Introduction
Goal of our project
  A flexible and scalable architecture for system log processing
  Explore general techniques of applying control theory to systems
Problem
  Data rate up to 1 TB a day
  Data are very complex
Notes: The main goal of our project is to analyze log data of online services such as Amazon or eBay. These systems are very complex and they often fail. ... However, due to the very high data rate and the complexity of the logs, we had problems processing the data.

5 Example of system log data
Request data: Apache log, etc.
Performance data: CPU, memory, etc.
Failure data: detected problems / error messages, reports from operators
450 attributes, 11,000 requests a second

6 Preprocessing
Sanitize the data
  Remove/encrypt sensitive information before the data get into permanent storage
  Sanitize at different levels
Put logs into common format
Merge information from various sources
Sampling
Notes: All the required preprocessing should be done outside the algorithm ... As new data streams -> as new input
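The sanitizing step on this slide could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the field names, the key, and the `sanitize` helper are hypothetical, and keyed hashing (pseudonymization) stands in for the "encrypt" option named on the slide.

```python
import hashlib
import hmac

# Hypothetical configuration: none of these names come from the slides.
SECRET_KEY = b"replace-with-a-real-secret"
DROP_FIELDS = {"password", "credit_card"}      # removed entirely
PSEUDONYMIZE_FIELDS = {"user", "client_ip"}    # replaced by a keyed hash


def sanitize(record, key=SECRET_KEY):
    """Return a copy of a log record that is safe to store permanently."""
    clean = {}
    for name, value in record.items():
        if name in DROP_FIELDS:
            continue  # sensitive value never reaches permanent storage
        if name in PSEUDONYMIZE_FIELDS:
            # Keyed hash: the same input maps to the same token, so requests
            # from one user can still be correlated without exposing the user.
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            clean[name] = digest.hexdigest()[:16]
        else:
            clean[name] = value
    return clean


raw = {"user": "alice", "client_ip": "10.0.0.7", "url": "/cart", "password": "hunter2"}
safe = sanitize(raw)
```

Because the mapping is deterministic, later analysis can still group requests by the pseudonymized fields.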

7 The big picture
[Diagram labels: Production System, Data Collection, raw log data, preprocessing, Sanitized Data Repository, Automatic analysis, Failure Detection, feedback loop]
Notes: Add “AOL” box in front of the orange arrows. Add a feedback loop back into AOL. How would this be used in real life? It is in the critical path of failure recovery. Speed of “data analysis” is critical for recovery; speed of preprocessing is also critical ... Also, how do we evaluate this framework? How much delay do we introduce? What happens if a node in the preprocessing step fails? Can we handle that? Put data sanitizing functions into TCQ!

8 Early Experiences
Ad-hoc scripts
  Tedious
  Hard to change
Relational databases
  Static schema
  Hard to support temporal queries
  Have to store all the data
  Efficient query evaluation requires prior knowledge of the data

9 Stream processing – Telegraph Continuous Query (TCQ)
System log data are data streams
Preprocessing is a continuous query
Telegraph Continuous Query (TCQ)
  Data stream processing engine
  SQL queries
  Sliding time window
  Adaptive: execution optimized on-the-fly
  Performance doesn’t depend on #queries
[Diagram labels: SLT, query Q]
Notes: We think that stream processing is a good data model for system log data.

10 Data preprocessing architecture
[Diagram labels: load splitter, TCQ query Q, combiner, TCQ query R, SLT 1, SLT 2; numbered tuples 1 to 6, with groups 6+5+4 and 3+2+1]
Can be easily distributed over a cluster of machines
Linear scaling
Performance of a TCQ node depends on the data rate, not on the number of queries running => can generate many streams
Can be extended/reorganized in any way
Notes: “One machine running TCQ can’t handle 1 TB of data a day, so we need to distribute the processing. At the same time, we also want to extract temporal information from the data and thus we need to process the data in sequence. These contradicting goals ...” Why (at least) two tiers? Sampling should be the first thing to do in the pipeline to reduce the data rate (that’s why we need parallelism). How to support off-line algorithms?
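A minimal sketch of the splitter/combiner idea from this slide, assuming round-robin splitting and sequence numbers to restore order. The function names are hypothetical and plain Python lists stand in for TCQ nodes; the slides do not say how the real splitter or combiner is implemented.

```python
import heapq


def split(records, n_workers):
    """Tag each record with a sequence number and deal it out round-robin."""
    lanes = [[] for _ in range(n_workers)]
    for seq, rec in enumerate(records):
        lanes[seq % n_workers].append((seq, rec))
    return lanes


def process(lane, fn):
    """Stand-in for one first-tier node: apply fn, keep the sequence number."""
    return [(seq, fn(rec)) for seq, rec in lane]


def combine(lanes):
    """Merge the per-node outputs back into the original order."""
    # Each lane is already sorted by sequence number, so a k-way merge
    # restores the global order without re-sorting everything.
    return [rec for _, rec in heapq.merge(*lanes)]


records = [1, 2, 3, 4, 5, 6]
lanes = split(records, 3)
processed = [process(lane, lambda x: x * 10) for lane in lanes]
merged = combine(processed)
```

Keeping the sequence number with each tuple is what lets a later tier see the data in order even though the first tier ran in parallel.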

11 Outline System log as data streams Applying control theory
Accurate data source Controlling queue length Lessons learned

12 Why do we need control? Data source does not provide accurate data rate

13 Control Problems
Not accurate for various reasons
  Scheduling
  Time spent on I/O
  Etc.
Providing an accurate data source using feedback control
  By controlling the input of “desired rate”

14 The Control Architecture
[Diagram values: 1500 1900 1600]
P Controller (with precompensation): u(k) = Kp * e(k)
PI Controller: u(k) = u(k-1) + (Kp + KI) * e(k) - Kp * e(k-1)
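The two control laws on this slide can be tried out against a toy data source. The sketch below is illustrative only: the plant model (the source delivers a fixed fraction `gain` of the requested rate), the gains, and the class names are assumptions, and precompensation is taken to mean scaling the reference so that the P controller's steady-state output equals the target.

```python
class PController:
    """u(k) = Kp * e(k), with the reference scaled by a precompensation factor."""

    def __init__(self, kp, precomp=1.0):
        self.kp = kp
        self.precomp = precomp

    def __call__(self, target, measured):
        return self.kp * (self.precomp * target - measured)


class PIController:
    """u(k) = u(k-1) + (Kp + KI) * e(k) - Kp * e(k-1)."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.u_prev = 0.0
        self.e_prev = 0.0

    def __call__(self, target, measured):
        e = target - measured
        u = self.u_prev + (self.kp + self.ki) * e - self.kp * self.e_prev
        self.u_prev, self.e_prev = u, e
        return u


def simulate(controller, target, gain=0.8, steps=200):
    """Toy plant (assumed): the source delivers gain * requested rate."""
    y = 0.0
    for _ in range(steps):
        u = controller(target, y)   # requested ("desired") rate
        y = gain * u                # rate actually delivered
    return y


GAIN, KP = 0.8, 0.5
precomp = (1 + GAIN * KP) / (GAIN * KP)   # needs the plant gain to be known

rate_p_plain = simulate(PController(KP), 1500.0)
rate_p_precomp = simulate(PController(KP, precomp), 1500.0)
rate_pi = simulate(PIController(0.2, 0.5), 1500.0)
```

In this toy model the plain P controller settles well below the target, the precompensated P controller reaches it only because the plant gain is known, and the PI controller reaches it without that knowledge.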

15 Result – An accurate data source
P Controller with Pre-compensation PI Controller

16 Zoom In
A lot of small disturbances in a Java program
Incremental garbage collection
[Plot panels: P Controller, PI Controller]

17 Outline System log as data streams Applying control theory
Accurate data source Controlling queue length Lessons learned

18 Problem: performance disturbance
Significant network traffic
Memory leak
System process interference
Packets dropped during transferring stream
Other failures
Also, performance of a node depends on SELECTIVITY of relational operator, which depends on input data

19 Description of the system
TCQ Complex internal structure TCQ drops tuples silently if result queue is full Controlled Data Source Input Buffer

20 Why do we need control?
TCQ node drops tuples when the result queue fills up
[Diagram labels: Source Buffer TCQ Result Q]

21 Control Problems Regulate queue length on TCQ node
By controlling buffer output rate Prevent dropping tuples Maximize throughput Tolerate disturbance

22 System with Control Controlled Output Rate Data Source Controller
Queue Length Monitor

23 Controller
u(k) = u(k-1) + (Kp + KI) * e(k) - Kp * e(k-1)
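A rough simulation of how this control law could regulate a queue length. Everything about the plant is assumed for illustration: the queue is modeled as a simple accumulator with a hard capacity, the service rate drops from 100 to 60 tuples per step at step 100 to mimic CPU contention, and the gains, target, and limits are made-up values rather than the ones used in the talk.

```python
def run(controlled, steps=300, capacity=200.0, target=50.0, kp=0.5, ki=0.1):
    """Simulate a bounded result queue fed by a (possibly controlled) buffer."""
    q = 0.0          # current queue length
    u = 0.0          # buffer output rate (tuples per step)
    e_prev = 0.0
    dropped = 0.0
    for k in range(steps):
        service = 100.0 if k < 100 else 60.0   # assumed disturbance at k = 100
        if controlled:
            e = target - q
            # Incremental PI law: u(k) = u(k-1) + (Kp + KI) e(k) - Kp e(k-1)
            u = u + (kp + ki) * e - kp * e_prev
            u = min(max(u, 0.0), 150.0)        # rate cannot be negative or unbounded
            e_prev = e
        else:
            u = 100.0                          # fixed rate, no feedback
        q = max(q + u - service, 0.0)
        if q > capacity:                       # queue full: tuples are dropped
            dropped += q - capacity
            q = capacity
    return q, dropped


q_ctrl, dropped_ctrl = run(controlled=True)
q_open, dropped_open = run(controlled=False)
```

In this model the controlled run returns to the target queue length after the disturbance without dropping tuples, while the fixed-rate run fills the queue and starts dropping.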

24 Result – regulating queue length
Source Buffer TCQ Result Q

25 Result – Under CPU Contention
Source Buffer TCQ Result Q

26 Outline System log as data streams Applying control theory
Accurate data source Controlling queue length Lessons learned

27 Why is theory useful? One of my implementations .. What happened?
Source Buffer TCQ Result Q

28 Output Thread (Code Reuse)
What is going on? Controlled Output Thread (Code Reuse) Desired Queue length Queue Length Controller Data Rate to TCQ Actual Queue Length

29 Theory meets reality Output Y from simulation Queue length Time

30 Conclusion
Advantages of feedback control
  Makes the system more robust under disturbance
  Allows more time for failure detection
  Treats complex systems as black boxes
  Copes with the system characteristics instead of having to change them
Theoretical analysis
Implementation is easy
System statistics can also be used for SLT

31 Future Work Load balancer Load control across multiple tiers
Scheduling of multiple streams

32 Backup Slides

33 Tricky part of parameter estimation
Model evaluation – Making the system operate in desired range Data rate vs free space Free Space Non-Linear range Easy for data source, but queue length ..

