Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management Author: Raul Castro Fernandez, Matteo Migliavacca, et al.

1 Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management Author: Raul Castro Fernandez, Matteo Migliavacca, et al. Published conference: SIGMOD’13 Reporter: Ma Yuanwen

2 Introduction
Stream
– A sequence of tuples
– Examples: sensor networks, stock trading systems
Query plan
– A query is specified as a directed acyclic graph
Distributed stream processing
– A query is deployed on a set of nodes
State management
– Checkpoint, back up, restore, partition
Scale out
– Split an operator instance when it becomes overloaded
Fault tolerance
– Recover from failures without affecting processing results

3 Outline
Background
– Problem statement
– System model
State management
– Query state
– State operations
Scale out and fault tolerance
– Fault-tolerant scale out algorithm
– System architecture
– Bottleneck detection and scaling policy
Evaluation
Conclusions

4 Problem Statement
Operators
– Stateless operators (e.g. filter or map) and stateful operators (e.g. join or aggregation)
– Sliding-window-based state
– Entire-history-based state
Intra-query parallelism
– Query graph and execution graph
Fault tolerance
– Passive standby strategy
– Active standby strategy
– Upstream backup strategy
Example query: report word frequencies over the most recent hour, about every 10 minutes
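The word-frequency query on this slide is an instance of sliding-window state. The sketch below is one minimal way to express it, not the system's implementation. It assumes tuples are `(timestamp, word)` pairs with timestamps in seconds arriving in non-decreasing order; the class name `SlidingWordCount` and its methods are hypothetical.

```python
from collections import Counter, deque

WINDOW_SECONDS = 60 * 60   # window length: 1 hour
SLIDE_SECONDS = 10 * 60    # report interval: 10 minutes


class SlidingWordCount:
    """Stateful operator (illustrative): word counts over a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS, slide=SLIDE_SECONDS):
        self.window = window
        self.slide = slide
        self.events = deque()     # (timestamp, word), oldest first
        self.counts = Counter()   # processing state: word -> count in window
        self.next_report = slide

    def _expire(self, now):
        # Drop tuples with timestamp <= now - window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def process(self, timestamp, word):
        """Consume one tuple; return the (report_time, counts) reports now due."""
        reports = []
        # Emit every report whose time has passed before adding the new tuple,
        # so a report at time t covers tuples with timestamps before t.
        while timestamp >= self.next_report:
            self._expire(self.next_report)
            reports.append((self.next_report, dict(self.counts)))
            self.next_report += self.slide
        self.events.append((timestamp, word))
        self.counts[word] += 1
        return reports
```

The `events` queue and the `counts` map together form the operator's processing state, which is what would have to be checkpointed or partitioned.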

5 System model (1)

name     age  sex
Li Lei   16   male
Han Mei  15   female
Jim      17   male

name     age  sex
Li Lei   16   male

name     age  sex
Han Mei  15   female

name     age  sex
Jim      17   male

6 System model(2)

7 Query state (1)

8 Query state (2)

9 State operations
Operator state backup and restore
– Checkpoint the state of an operator and back it up to an upstream operator
– Restore the state after a failure or during scale out
Operator state partitioning
– When a stateful operator scales out, its processing state must be split across the new partitioned operators
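The backup and restore operations can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's code: the downstream operator is a word counter, a checkpoint records the timestamp of the last tuple it reflects, and the upstream operator trims its output buffer when it stores a backup. The names `WordCounter`, `Upstream`, `store_backup` and `recover_downstream` are all hypothetical.

```python
import copy


class WordCounter:
    """Downstream stateful operator (illustrative): counts words seen so far."""

    def __init__(self):
        self.counts = {}     # processing state
        self.last_ts = -1    # timestamp of the last tuple reflected in the state

    def process(self, ts, word):
        self.counts[word] = self.counts.get(word, 0) + 1
        self.last_ts = ts

    def checkpoint(self):
        # Deep copy so later processing cannot mutate the snapshot.
        return {"counts": copy.deepcopy(self.counts), "last_ts": self.last_ts}

    @classmethod
    def restore(cls, snapshot):
        op = cls()
        op.counts = copy.deepcopy(snapshot["counts"])
        op.last_ts = snapshot["last_ts"]
        return op


class Upstream:
    """Upstream operator: buffers its output and stores the downstream backup."""

    def __init__(self):
        self.buffer = []     # output tuples not yet covered by a backup
        self.backup = None   # latest checkpoint of the downstream operator

    def send(self, downstream, ts, word):
        self.buffer.append((ts, word))
        downstream.process(ts, word)

    def store_backup(self, snapshot):
        self.backup = snapshot
        # Tuples already reflected in the checkpoint no longer need replaying.
        self.buffer = [(ts, w) for ts, w in self.buffer
                       if ts > snapshot["last_ts"]]

    def recover_downstream(self):
        # Restore from the last backup (or start empty), then replay the buffer.
        op = WordCounter.restore(self.backup) if self.backup else WordCounter()
        for ts, word in self.buffer:
            op.process(ts, word)
        return op
```

Replaying only the tuples newer than the checkpoint is what lets the restored operator reach the same state as the failed one without reprocessing the whole stream.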

10 Operator state backup and restore

12 Operator state partitioning
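A minimal sketch of state partitioning, assuming the processing state is a key-to-value map and that keys are assigned to partitions by hashing. The function names are hypothetical, and hashing is only one possible partitioning scheme.

```python
import zlib


def partition_of(key, n):
    """Map a key to one of n partitions with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % n


def partition_state(state, n):
    """Split a key -> value state dict into n disjoint dicts."""
    parts = [{} for _ in range(n)]
    for key, value in state.items():
        parts[partition_of(key, n)][key] = value
    return parts
```

After the split, the upstream operator would route each new tuple with the same `partition_of` function, so a tuple always reaches the partition that holds the state for its key. `zlib.crc32` is used instead of the built-in `hash` because string hashes in Python vary between processes.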

14 Scale out and Fault Tolerance
Scale out
– The stream processing system (SPS) partitions operators on demand in response to bottleneck operators
Fault tolerance
– If a node hosting an operator fails, the SPS must replace it with an operator on a new node
Operator recovery becomes a special case of scale out, in which a failed operator is scaled out to a parallelization level of 1
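The idea that recovery is scale out with a parallelization level of 1 can be illustrated with a single routine. The sketch assumes a word-count state, a backed-up checkpoint held upstream, and a list of buffered words to replay; `scale_out` and `recover` are hypothetical names, not the system's API.

```python
import zlib


def scale_out(backup, buffered, n):
    """Build n replacement partitions from a backed-up state and buffered tuples.

    backup   -- dict word -> count, the last checkpoint held upstream
    buffered -- list of words sent since that checkpoint (to be replayed)
    n        -- target parallelism; n == 1 is plain failure recovery
    """
    def route(word):
        return zlib.crc32(word.encode("utf-8")) % n

    parts = [{} for _ in range(n)]
    for word, count in backup.items():   # partition the checkpointed state
        parts[route(word)][word] = count
    for word in buffered:                # replay tuples newer than the checkpoint
        part = parts[route(word)]
        part[word] = part.get(word, 0) + 1
    return parts


def recover(backup, buffered):
    """Failure recovery expressed as scale out to a single partition."""
    return scale_out(backup, buffered, 1)[0]
```

With `n == 1` every key routes to partition 0, so the same code path restores a failed operator on one new node.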

15 Fault-tolerant scale out algorithm

16 System architecture
Query manager
– Maps query operators to nodes and maintains the execution graph
Deployment manager
– Uses the execution graph to initialize nodes, deploy operators, set up stream communication and start processing

17 Bottleneck detection and scaling policy
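The transcript preserves only the title of this slide, so the following is purely an illustrative assumption of what a scaling policy could look like: an operator is flagged for scale out after a number of consecutive CPU utilisation reports above a threshold. The class name, the 0.7 threshold and the count of 2 are hypothetical values, not taken from the presentation.

```python
from collections import defaultdict


class BottleneckDetector:
    """Hypothetical threshold policy over per-operator utilisation reports."""

    def __init__(self, threshold=0.7, consecutive=2):
        self.threshold = threshold       # assumed utilisation threshold (0..1)
        self.consecutive = consecutive   # assumed number of reports in a row
        self.streak = defaultdict(int)   # operator id -> current streak length

    def report(self, operator_id, cpu_utilisation):
        """Record one report; return True if the operator should scale out."""
        if cpu_utilisation > self.threshold:
            self.streak[operator_id] += 1
        else:
            self.streak[operator_id] = 0
        if self.streak[operator_id] >= self.consecutive:
            self.streak[operator_id] = 0  # start counting afresh after a decision
            return True
        return False
```

Requiring several consecutive reports avoids reacting to a single short spike in load.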

18 Goals and deployment of evaluation
The goals of the experimental evaluation are to investigate
– The effectiveness of the stateful operator scale out approach
– The recovery time of the stateful recovery mechanism
– The impact of the state management approach on tuple processing latency
Experiment deployment

19 Experiment data
Linear Road Benchmark (LRB)
– It models a road toll network
– Queries: (1) provide toll notifications to vehicles within 5 s; (2) detect accidents within 5 s; (3) answer account balance queries about paid toll amounts
– The input rate for a single expressway (L=1) begins at 15 tuples/s and increases to 1700 tuples/s
Wikipedia
– A map/reduce-style top-k query that outputs, every 30 seconds, a ranking of the most visited Wikipedia language versions, based on Wikipedia data traces
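A map/reduce-style top-k query of the kind described for the Wikipedia workload might be sketched as below. It assumes each input tuple is a language code for one page visit within the current 30-second window; the function names and the tie-breaking rule are illustrative assumptions.

```python
from collections import Counter


def map_partition(visits):
    """'Map' side: count visits per language within one partition."""
    return Counter(visits)


def reduce_top_k(partials, k):
    """'Reduce' side: merge partial counts and rank the k most visited languages."""
    total = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds counts together
    # Sort by count descending, then by language code so ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

The partial counters are the state of the parallel map operators, and the merged counter is the state of the reduce operator that emits the ranking at the end of each window.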

20 Dynamic scale out (1)

21 Dynamic scale out (2)

22 Failure recovery Word count

23 State management overhead

24 Conclusions
Provide state management for stateful operators
– Checkpoint, back up, restore, partition
Present an integrated approach for scale out and failure recovery
