Modeling Stream Processing Applications for Dependability Evaluation

1 Modeling Stream Processing Applications for Dependability Evaluation
Gabriela Jacques-Silva†‡, Zbigniew Kalbarczyk†, Bugra Gedik‡, Henrique Andrade‡, Kun-Lung Wu‡, Ravishankar K. Iyer† †University of Illinois at Urbana-Champaign ‡IBM Research – T. J. Watson Research Center

2 Outline
Streaming applications
Modeling a streaming application
Stream operator, stream connections and tuples
Representation of faults and error propagation
Extending model to include fault tolerance techniques
Evaluation

3 Extract knowledge from live data streams on-the-fly.
[Figure: application graph of stream operators exchanging tuples; output "Percentage of positive feedback": 9.57%, 5.42%, 3.16%, 2.52%, 1.28%]

4 Different approaches to fault tolerance have different resource consumption and performance impact.
Some techniques aim at providing no-data-loss and no-data-duplication guarantees
[Figure: output "Percentage of positive feedback": 9.57%, 5.42%, 3.16%, 2.52%, 1.28%]

5 Different approaches to fault tolerance have different resource consumption and performance impact.
Partial fault tolerance
Decreases time to achieve stable output as compared to no recovery
Achieves approximate results, which are tolerable by some streaming applications
[Figure: output "Percentage of positive feedback": 9.32%, 5.11%, 2.84%, 2.27%, 1.09%]

6 An evaluation framework helps to understand the relative merits of different techniques.
Previous approaches focus on performance evaluation
Fault injection may be expensive, especially when evaluating the application under different setups and parameters
[Figure labels: Checkpoint; Partial graph replication]

7 Summary of contributions
Modeling framework for evaluating streaming applications under faults that lead to data loss and data corruption
Considers consequences of error propagation
Based on generic models specified via Stochastic Activity Networks (SAN)
Abstractions for stream operators, stream connections, and tuples
Modeled three fault tolerance techniques: checkpointing, partial replication, and full replication

8 Modeling framework uses Stochastic Activity Network formalism.
SANs can express the non-deterministic behavior and parallel execution of a streaming application
Nomenclature
Place: container for a natural number
Activity: transition between places
Token: item in a place
Input gate: enforces a condition on an activity
Output gate: executes a function after an activity
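
The nomenclature above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the SAN formalism itself or the authors' models: activities here have no timing or probabilistic cases, and the names `Activity`, `move_token`, `waiting`, and `processing` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A marking maps each place to its number of tokens (a natural number).
Marking = Dict[str, int]


@dataclass
class Activity:
    name: str
    input_gate: Callable[[Marking], bool]   # condition that enables the activity
    output_gate: Callable[[Marking], None]  # function executed after the activity

    def enabled(self, marking: Marking) -> bool:
        return self.input_gate(marking)

    def fire(self, marking: Marking) -> None:
        if not self.enabled(marking):
            raise RuntimeError(f"activity {self.name!r} is not enabled")
        self.output_gate(marking)


def move_token(src: str, dst: str) -> Callable[[Marking], None]:
    """Build an output-gate function that moves one token from src to dst."""
    def gate(marking: Marking) -> None:
        marking[src] -= 1
        marking[dst] += 1
    return gate


# Hypothetical two-place example: one token waiting, none being processed.
marking = {"waiting": 1, "processing": 0}
start = Activity(
    name="start",
    input_gate=lambda m: m["waiting"] > 0,
    output_gate=move_token("waiting", "processing"),
)
start.fire(marking)
```

After `start.fire(marking)` the token has moved to `processing`, and the input gate no longer enables the activity.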

9 Framework is based on the abstraction of three key components of a SPA.
Stream operator: state transition model
Captures arity, selectivity and processing time
[Figure: operator F1 (int < 9); input gate IG1; state "Waiting for input"]

10 Framework is based on the abstraction of three key components of a SPA.
Stream operator: state transition model
Captures arity, selectivity and processing time
[Figure: operator F1 (int < 9); input gate IG1; states "Waiting for input" and "Processing tuple"]

12 Framework is based on the abstraction of three key components of a SPA.
Stream operator: state transition model
Captures arity, selectivity and processing time
[Figure: operator F1 (int < 9); input stream connections; input gate IG1; states "Waiting for input" and "Processing tuple"]

13 Framework is based on the abstraction of three key components of a SPA.
Stream operator: state transition model
Captures arity, selectivity and processing time
[Figure: operator F1 (int < 9); input stream connections; input gate IG1; output gate OG1; output buffer; states "Waiting for input", "Processing tuple" and "Sending output"]

14 Framework is based on the abstraction of three key components of a SPA.
Stream operator: state transition model
Captures arity, selectivity and processing time
[Figure: operator F1 (int < 9); input stream connections; output stream connections; input gate IG1; output gates OG1 and OG2; output buffer; states "Waiting for input", "Processing tuple" and "Sending output"]
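
A rough way to picture the operator abstraction is a small state machine. The sketch below is an assumption-laden illustration rather than the SAN model from the slides: it hard-codes a filter with one input stream, treats processing time as a constant per tuple, and uses hypothetical names (`FilterOperator`, `busy_time`, `selectivity`) and illustrative input values.

```python
from collections import deque
from enum import Enum


class State(Enum):
    WAITING_FOR_INPUT = "Waiting for input"
    PROCESSING_TUPLE = "Processing tuple"
    SENDING_OUTPUT = "Sending output"


class FilterOperator:
    """Toy operator: one input stream, n_outputs output streams."""

    def __init__(self, predicate, n_outputs=1, processing_time=1.0):
        self.predicate = predicate
        self.processing_time = processing_time  # assumed constant per tuple
        self.state = State.WAITING_FOR_INPUT
        self.input = deque()
        self.outputs = [deque() for _ in range(n_outputs)]
        self.busy_time = 0.0
        self.consumed = 0
        self.emitted = 0

    def step(self):
        """Consume one input tuple if available; return False when idle."""
        if not self.input:
            return False
        self.state = State.PROCESSING_TUPLE
        value = self.input.popleft()
        self.consumed += 1
        self.busy_time += self.processing_time
        if self.predicate(value):
            self.state = State.SENDING_OUTPUT
            for out in self.outputs:  # copy to every output stream connection
                out.append(value)
            self.emitted += 1
        self.state = State.WAITING_FOR_INPUT
        return True

    @property
    def selectivity(self):
        """Tuples emitted per tuple consumed (per output stream)."""
        return self.emitted / self.consumed if self.consumed else 0.0


# Filter with condition int < 9 and two output stream connections.
f1 = FilterOperator(lambda v: v < 9, n_outputs=2, processing_time=0.5)
f1.input.extend([10, 5, 6, 3])
while f1.step():
    pass
```

Here arity shows up as the number of output queues, selectivity as the ratio of emitted to consumed tuples, and processing time as the accumulated `busy_time`.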

15 Framework is based on the abstraction of three key components of a SPA.
Stream connections: state sharing between output and input streams

17 Framework is based on the abstraction of three key components of a SPA.
Tuples: tokens flying through input and output streams
Representation of tuple sizes, but no attribute values

18 Stream operator failure model considers crashes and SDCs.
Crash: data loss for partial fault tolerance techniques
[Figure: output values 9.32%, 5.11%, 2.84%, 2.27%, 1.09%]

19 Stream operator failure model considers crashes and SDCs.
Crash: data loss for partial fault tolerance techniques
Silent data corruption: corruption of attribute values
[Figure: output values 9.53%, 5.42%, 3.14%, 2.52%, 1.28%]

20 Base model is augmented to represent error propagation.
Once a failure occurs, operators may generate inaccurate data
Represented via tainted tuples and tainted stream connections
[Figure: operator model with input stream connection, output stream connection, and states "Waiting for input", "Processing tuple" and "Sending output"]

21 Base model is augmented to represent error propagation.
Once a fault occurs, operators may generate inaccurate data
Represented via tainted tuples and tainted stream connections
[Figure: operator model with input and output stream connections and states "Waiting for input", "Processing tuple" and "Sending output", extended with a tainted input stream connection, a tainted output stream connection, "Processing tainted tuple" and "is tainted"]
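
The propagation rule can be illustrated with a taint flag carried by each tuple. This is a minimal sketch under assumptions: the slides do not give the exact rule, so the code simply marks an output as tainted when its input was tainted or the operator itself is currently producing inaccurate data. `StreamTuple`, `emit`, and `run_pipeline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamTuple:
    size: int              # tuple size; attribute values are not modeled
    tainted: bool = False


def emit(inp: StreamTuple, operator_tainted: bool) -> StreamTuple:
    """Assumed rule: output is tainted if the input or the operator is tainted."""
    return StreamTuple(size=inp.size, tainted=inp.tainted or operator_tainted)


def run_pipeline(inp: StreamTuple, operator_taint_flags) -> StreamTuple:
    """Pass one tuple through a chain of operators, one flag per operator."""
    out = inp
    for flag in operator_taint_flags:
        out = emit(out, flag)
    return out


clean = run_pipeline(StreamTuple(size=64), [False, False, False])
dirty = run_pipeline(StreamTuple(size=64), [False, True, False])
```

In this toy chain, a fault in the second operator taints everything downstream of it, while the fault-free run stays clean.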

22 Stateless operators do not generate tainted tuples after crash and restore.
[Figure: operator F1 (int < 9) processing the same input stream in the "No crash" and "Crash" cases]
Once operator recovers, the data is accurate

23 Stateful operators generate tainted tuples after crash and restore.
[Figure: stateful operator F1 in the "No crash" and "Crash, after restore" cases]
After recovery, operator produces tainted tuples until its internal state refreshes
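
To make "until its internal state refreshes" concrete, the sketch below uses a sliding-window sum as the stateful operator. This choice is an assumption for illustration only (the slides do not say which operator is pictured), and `WindowSum`, `crash_and_restore`, and the input values are hypothetical.

```python
from collections import deque


class WindowSum:
    """Toy stateful operator: sum over the last `window` input values."""

    def __init__(self, window):
        self.window = window
        self.values = deque(maxlen=window)
        # Count of tuples seen since the state was last known-good, capped at
        # `window`. A fresh start counts as good because it matches a run
        # without crashes.
        self.fresh = window

    def crash_and_restore(self):
        """Lose the internal state; the operator restarts with an empty window."""
        self.values.clear()
        self.fresh = 0

    def push(self, value):
        """Return (output, tainted) for one input value."""
        self.values.append(value)
        self.fresh = min(self.fresh + 1, self.window)
        tainted = self.fresh < self.window
        return sum(self.values), tainted


op = WindowSum(window=3)
before = [op.push(v) for v in (1, 2, 3)]
op.crash_and_restore()
after = [op.push(v) for v in (4, 5, 6)]
```

The first two outputs after the restore are tainted because the window still misses pre-crash values; the third is computed from a fully refreshed window and equals what a crash-free run would produce.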

24 Stateful operators generate tainted data upon crash of any operator in the upstream set.
[Figure: "No crash" case; operator F1 downstream of a filter (int < 9); labeled "Change in internal state"]

25 Stateful operators generate tainted data upon crash of any operator in the upstream set.
[Figure: crash case; operator F1 downstream of a filter (int < 9); labeled "Internal state is unchanged"]
After the crashed operator recovers, the stateful operator produces tainted tuples until its internal state refreshes

26 Checkpoint of Operator State
Model is parameterized to capture how long it takes to produce good results after a failure
[Figure: stateful operator F1 in the "No crash" and "Crash, after restore" cases]
G. Jacques-Silva et al. “Language Level Checkpointing Support for Stream Processing Applications”. DSN 2009.

27 Partial Graph Replication
Replicated operators and stream connections on composed application model
Extra logic in replicated operators to perform replica failover
[Figure labels: active, op1,A, op1,B, backup, failover, deactivate]
G. Jacques-Silva et al. “Language Level Checkpointing Support for Stream Processing Applications”. DSN 2009.

28 Full graph replication
Extra logic for operators to perform de-duplication on tuples coming from redundant streams
Aims at no tuple loss and non-duplicate delivery
J.-H. Hwang et al. “Fast and highly-available stream processing over wide area networks”. ICDE 2008.
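
One simple way to picture the de-duplication logic is shown below. It is a sketch under a strong assumption: tuples carry a sequence number that is identical across replicas, which the slides do not state. The function name `deduplicate` and the sample data are hypothetical.

```python
def deduplicate(merged):
    """Yield the first copy of each (seq, payload) pair from merged replica streams.

    Assumes `seq` identifies a tuple across redundant streams.
    """
    seen = set()
    for seq, payload in merged:
        if seq in seen:
            continue  # a copy from another replica was already delivered
        seen.add(seq)
        yield seq, payload


# Interleaved output of two replicas of the same upstream operator.
merged = [(1, "a"), (1, "a"), (2, "b"), (3, "c"), (2, "b")]
delivered = list(deduplicate(merged))
```

Each tuple is delivered once as long as at least one replica produces it, which matches the stated goals of no tuple loss and non-duplicate delivery.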

29 Checkpoint vs. Partial Replication Under Crashes
Target: Bargain Discovery
Stateless: source, sink, 4 filters
Stateful: aggregate and join
Operator MTTF: 30, 50, 70 and 90 min
Model parameters taken from application executing in IBM System S
[Figure: application graphs for "Checkpoint" and "Partial replication + Checkpoint"]

30 Evaluation Metrics
Availability: all operators are alive and are not producing tainted data
Total number of tainted tuples: total number of tainted tuples stored by the sink operator
Percentage of tainted tuples: fraction of tainted tuples stored by the sink over total number of tuples produced by the golden run
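
The metrics above could be computed from a run roughly as follows. This is a sketch with assumptions: availability is read here as the fraction of observed time during which all operators are alive and none is producing tainted data, and the interval representation, function names, and sample numbers are hypothetical.

```python
from typing import Iterable, Tuple


def availability(intervals: Iterable[Tuple[float, bool, bool]]) -> float:
    """Fraction of time with all operators alive and no tainted output.

    Each interval is (duration, all_alive, producing_tainted).
    """
    total = 0.0
    up = 0.0
    for duration, all_alive, producing_tainted in intervals:
        total += duration
        if all_alive and not producing_tainted:
            up += duration
    if total == 0:
        raise ValueError("no observation time")
    return up / total


def percentage_tainted(tainted_at_sink: int, golden_total: int) -> float:
    """Tainted tuples stored by the sink, as a percentage of the golden run."""
    if golden_total <= 0:
        raise ValueError("golden run must produce at least one tuple")
    return 100.0 * tainted_at_sink / golden_total


# Hypothetical run: 2 min of downtime, then 8 min of tainted output.
intervals = [
    (50.0, True, False),
    (2.0, False, False),
    (8.0, True, True),
    (40.0, True, False),
]
avail = availability(intervals)
pct = percentage_tainted(tainted_at_sink=25, golden_total=1000)
```

Note that the recovery period counts against availability twice over: once while the operator is down and again while it is up but still emitting tainted tuples.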

31 Partial replication provides better availability than checkpoint.

32 Partial replication produces fewer tainted tuples than checkpoint.

33 Impact of SDC on Full Replication Technique
Target: Bargain Discovery
Operator MTTF: 120 min
[Figure: replicated application graph]

34 Impact of SDC on application availability is small.
[Plot: operator MTTF of 120 min]

35 Percentage of tainted tuples is small when compared to golden run.
[Plot: operator MTTF of 120 min]

36 SDC breaks non-duplication guarantee promised by full replication technique.
tainted tuples + non-tainted tuples > non-tainted tuples of golden run + confidence interval
[Plot: operator MTTF of 120 min]
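
The inequality above can be written as a small check. In this sketch, "confidence interval" is interpreted as the half-width of the interval around the golden-run count, which is an assumption; the function name and all numbers are hypothetical and are not results from the study.

```python
def exceeds_golden_run(tainted, non_tainted, golden_non_tainted, ci_half_width):
    """True if the sink stored more tuples than the golden run can explain.

    Mirrors: tainted + non-tainted > golden non-tainted + confidence interval.
    """
    return tainted + non_tainted > golden_non_tainted + ci_half_width


# Hypothetical counts at the sink versus a golden run of 1000 tuples.
violated = exceeds_golden_run(
    tainted=12, non_tainted=995, golden_non_tainted=1000, ci_half_width=5.0
)
within = exceeds_golden_run(
    tainted=3, non_tainted=995, golden_non_tainted=1000, ci_half_width=5.0
)
```

A `True` result means the surplus cannot be attributed to run-to-run variation, so some tuples must have been delivered more than once.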

37 Summary
Modeling framework to evaluate the dependability provided by different techniques
Assemble applications by composing stream operators, stream connections and tuples
Demonstrated framework with three fault tolerance techniques
Validation by comparing results with real fault injections and application executing in IBM System S
Future: automatic model composition based on application source code and physical deployment

