Consistent Regions: Guaranteed Tuple Processing in IBM Streams Gabriela Jacques da Silva, Fang Zheng, Daniel Debrunner, Kun-Lung Wu, Victor Dogaru, Eric.


1 Consistent Regions: Guaranteed Tuple Processing in IBM Streams Gabriela Jacques da Silva, Fang Zheng, Daniel Debrunner, Kun-Lung Wu, Victor Dogaru, Eric Johnson, Michael Spicer, Ahmet E. Sariyuce IBM T. J. Watson Research Center IBM Analytics Platform Sandia National Labs

2 IBM Streams is built from the ground up, has low latency and is flexible
Diagram (operator graph): processing element, TCP Source, Dashboard, Enricher, Aggregate, Continuous FileReader, Dedup, File Sink
Tuples are processed as they arrive
Streams can be function calls, threaded queues, or TCP sockets
Any logic is allowed (Java, C++, Python, and SPL)
Operator logic can be non-deterministic
Operators can be multi-threaded

3 The Streams Processing Language (SPL) was designed with reusability in mind
Reusable operator in standard relational toolkit
  stream<rstring name, uint32 age> Youngs = Filter(People) {
    param filter: age < 30u;
  }
Code generation template
  void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) {
    if (<%=$filterExpr%>) // tuple.getAge() < 30
      submit(tuple, 0);
  }
20 supported toolkits with several operators each and many others from external developers

4 Consistent regions enable a developer to select a sub-graph to have guaranteed tuple processing via distributed snapshots
Diagram (operator graph): TCP Source, Dashboard, Enricher, Aggregate, Directory Scan, File Source, Dedup, File Sink; edges marked "drain"; tuple labels A,1 B,3 C,2 D,4 E,5
No assumption on determinism and graph topology for correctness
Exposing new APIs mapping to protocol stages allows adaptation of toolkit operators
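The drain-then-snapshot idea on this slide can be sketched in plain C++. This is a toy, single-process illustration and not the IBM Streams runtime: `CountingStage`, `RegionCoordinator`, and `makeConsistent` are hypothetical names, and the "snapshot" is just a copy of each stage's counter kept in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <vector>

// Hypothetical stage: counts tuples and forwards them downstream.
struct CountingStage {
    std::queue<int> pending;            // tuples received but not yet processed
    long seen = 0;                      // state captured by the snapshot
    CountingStage* downstream = nullptr;

    void receive(int v) { pending.push(v); }

    // Drain: finish processing every tuple received so far.
    void drain() {
        while (!pending.empty()) {
            int v = pending.front();
            pending.pop();
            ++seen;
            if (downstream) downstream->receive(v);
        }
    }
};

// Hypothetical coordinator for one region of the graph.
struct RegionCoordinator {
    std::vector<CountingStage*> region;                  // listed upstream first
    std::map<std::size_t, std::vector<long>> committed;  // sequence id -> states
    std::size_t nextSeq = 1;

    // Drain every stage in upstream-to-downstream order, then record the
    // state of all stages as one snapshot and return its sequence id.
    std::size_t makeConsistent() {
        assert(!region.empty());
        for (CountingStage* op : region) op->drain();
        std::vector<long> snapshot;
        for (CountingStage* op : region) snapshot.push_back(op->seen);
        committed[nextSeq] = snapshot;
        return nextSeq++;
    }
};
```

Draining upstream stages first means that by the time a downstream stage drains, it has already received everything its predecessors had in flight, so the recorded states all describe the same set of tuples.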

5 On failure, committed snapshot is retrieved, and tuple flow is restarted from commit point
Diagram (operator graph): TCP Source, Dashboard, Enricher, Aggregate, Directory Scan, File Source, Dedup, File Sink
Tuple labels in diagram: A,1 B,3 C,2 D,4 E,5 F,6 G,7 H,8 and A,1 B,3 C,2 D,4 E,5
APIs to reset to a checkpoint and to reset to the initial state
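Recovery from a commit point can be illustrated with a small sketch, assuming a source that is able to replay its input from a saved offset. All names here (`ReplayableSource`, `Summer`, `Snapshot`, `Region`) are hypothetical and stand in for whatever the real runtime and operators provide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replayable source: can re-emit input from any saved offset.
struct ReplayableSource {
    std::vector<int> data;
    std::size_t offset = 0;  // index of the next element to emit
    bool hasNext() const { return offset < data.size(); }
    int next() { assert(hasNext()); return data[offset++]; }
};

// Hypothetical stateful operator keeping a running sum.
struct Summer {
    long sum = 0;
    void process(int v) { sum += v; }
};

// State of the whole region at one commit point.
struct Snapshot {
    std::size_t sourceOffset;
    long sum;
};

struct Region {
    ReplayableSource source;
    Summer summer;
    Snapshot committed{0, 0};  // before any commit: the initial state

    void step() { summer.process(source.next()); }

    // Record source position and operator state together.
    void commit() {
        committed.sourceOffset = source.offset;
        committed.sum = summer.sum;
    }

    // After a failure: restore the committed state, so that the source
    // replays every tuple emitted after the commit point.
    void recover() {
        source.offset = committed.sourceOffset;
        summer.sum = committed.sum;
    }

    void runToEnd() { while (source.hasNext()) step(); }
};
```

Because the source offset and the operator state are restored together, tuples processed after the commit are replayed and each tuple ends up reflected in the sum exactly once.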

6 Consistent regions are exposed to SPL developers via the @consistent and the @autonomous annotations

7 When operators are in consistent regions, the compiler automatically generates code for the different stages
  stream<int32 id, int32 counter> TupleCounter = Custom(Input) {
    logic
      state: { mutable int32 count = 0; }
      onTuple Input: {
        count++;
        submit({id = Input.id, counter = count}, TupleCounter);
      }
  }

8 Operator templates can directly implement the different stages of the protocol
  MY_OPERATOR::MY_OPERATOR() : numTuples_(0) {}
  void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) {
    AutoMutex am(mutex_);  // guard numTuples_ against concurrent access
    numTuples_++;
    OPort0Type otuple(tuple.getAttributeValue(0), numTuples_);
    submit(otuple, 0);
  }
  void MY_OPERATOR::drain() {}
  void MY_OPERATOR::checkpoint(Checkpoint & ckpt) {
    ckpt << numTuples_;  // persist the counter
  }
  void MY_OPERATOR::reset(Checkpoint & ckpt) {
    ckpt >> numTuples_;  // restore the counter
  }
Extra APIs for multithreaded operators and for code generation stage
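To see how the checkpoint and reset stages fit together outside the Streams runtime, the following standalone sketch may help. `MockCheckpoint` is a stand-in written for this example (a simple FIFO of 64-bit values), not the real `Checkpoint` class, and `TupleCounterOp` is a hypothetical operator that mirrors the counter logic above.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Stand-in for the runtime's Checkpoint type (an assumption for this
// sketch): values are read back in the order they were written.
class MockCheckpoint {
public:
    MockCheckpoint& operator<<(std::uint64_t v) {
        values_.push_back(v);
        return *this;
    }
    MockCheckpoint& operator>>(std::uint64_t& v) {
        assert(!values_.empty());
        v = values_.front();
        values_.pop_front();
        return *this;
    }
private:
    std::deque<std::uint64_t> values_;
};

// Hypothetical operator with one piece of state: a tuple counter.
class TupleCounterOp {
public:
    void process() { ++numTuples_; }       // handle one tuple
    void drain() {}                        // nothing buffered to flush
    void checkpoint(MockCheckpoint& ckpt) { ckpt << numTuples_; }
    void reset(MockCheckpoint& ckpt) { ckpt >> numTuples_; }
    void resetToInitialState() { numTuples_ = 0; }
    std::uint64_t count() const { return numTuples_; }
private:
    std::uint64_t numTuples_ = 0;
};
```

A typical sequence would be: process some tuples, call `drain()` and `checkpoint()`, keep processing, and on failure call `reset()` with the saved checkpoint so the counter returns to its value at the commit point.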

9 We have adapted over 70 existing operators so that developers can leverage consistent regions in legacy applications
Operators with in-memory state
  State in serializable variables ✔
  Operators with part of their state in libraries ✔
    Library has limited support for serialization (e.g., Decompress)
  Operators with blocking calls when handling tuples ✘
    Tuple handling method cannot finish without processing a new tuple (e.g., Gate)
Operators with external state
  Ingest operators ✔
    Non-replayable streams (e.g., TCPSource)
  Export operators ✔
    Systems whose writes cannot be retracted (e.g., UDPSink)

10 Performance impact of consistent regions depends on frequency and state size

11 Non-blocking checkpointing can further help reduce penalty on throughput
58% vs 94%

12 Lessons learned
The requirement to keep API compatibility makes support for fault tolerance harder
Support for legacy operators keeps stretching the limits of consistent regions
Integration testing with fault injection at scale is hard

13 Consistent regions achieve guaranteed processing for streaming application subgraphs
First steps at achieving guaranteed processing in IBM Streams
  Partial fault-tolerance as a first class concept
  Support for non-deterministic applications, cyclic topologies, and multi-threaded operators
  Adaptation of over 70 toolkit operators
Future work
  Release of non-blocking checkpointing
  Better support for large state checkpointing

