1
Scalable Distributed Stream Processing Presented by Ming Jiang
2
Centralized stream processing review
3
The distributed setting: a federation of participating nodes in different administrative domains; collaboration between the domains is required.
4
Two complementary efforts for this setting: Aurora* (intra-participant distribution) and Medusa (inter-participant distribution).
5
Three pieces to be shared: Aurora; an overlay network for communication; algorithms for high availability.
6
Three architectural issues: communications; load sharing; high availability in the presence of failures.
7
Communications
Naming: (participant, entity-name)
Routing:
1. A data source or an administrator registers a schema and a stream.
2. When the data source (DS) produces an event, it labels ...
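The naming and registration step can be sketched as a small catalog keyed by (participant, entity-name) pairs. This is a minimal illustration, not the system's actual interface: the `Name` and `Catalog` classes, their methods, and the participant and entity names in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    """A globally unique name: (participant, entity-name)."""
    participant: str
    entity: str


@dataclass
class Catalog:
    """Hypothetical catalog of registered schemas and streams."""
    schemas: dict = field(default_factory=dict)   # Name -> tuple of field names
    streams: dict = field(default_factory=dict)   # Name -> Name of its schema

    def register_schema(self, name: Name, fields: tuple) -> None:
        if name in self.schemas:
            raise ValueError(f"schema already registered: {name}")
        self.schemas[name] = fields

    def register_stream(self, name: Name, schema_name: Name) -> None:
        # A stream can only be registered against a known schema.
        if schema_name not in self.schemas:
            raise KeyError(f"unknown schema: {schema_name}")
        self.streams[name] = schema_name

    def schema_of(self, stream_name: Name) -> tuple:
        return self.schemas[self.streams[stream_name]]
```

Because the participant is part of the name, two participants can each register an entity called `quotes` without colliding, for example `Name("participant_a", "quotes")` and `Name("participant_b", "quotes")`.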
8
Communications. Message transport: multiplex all message streams over a single TCP connection. Remote definition: process migration is too complicated.
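One common way to multiplex several message streams over a single connection is to prefix every message with a stream identifier and a length. The sketch below shows that framing on an in-memory byte string rather than a real socket. The header layout (16-bit stream id, 32-bit length) and the function names `mux` and `demux` are assumptions made for illustration; the slides do not specify a wire format.

```python
import struct

# Assumed frame header: stream id (uint16) and payload length (uint32),
# both in network byte order.
HEADER = struct.Struct("!HI")


def mux(messages):
    """Frame (stream_id, payload) pairs into one byte string."""
    out = bytearray()
    for stream_id, payload in messages:
        out += HEADER.pack(stream_id, len(payload))
        out += payload
    return bytes(out)


def demux(data):
    """Split a framed byte string back into per-stream message lists."""
    streams = {}
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams.setdefault(stream_id, []).append(data[offset:offset + length])
        offset += length
    return streams
```

Messages from different streams can be interleaved on the wire, and per-stream order is preserved after demultiplexing.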
9
Load management: repartition Aurora networks based on load and resources, using box sliding and box splitting.
10
Box sliding: takes a box on the edge of a sub-network on one machine and shifts it to a neighboring machine. Figure: upstream box sliding.
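Box sliding can be sketched on a linear pipeline whose boxes are each assigned to a machine. This is a simplified model under stated assumptions: the pipeline is a plain list, the function name `slide_box` is hypothetical, and "upstream" here means moving the boundary box onto the upstream machine.

```python
def slide_box(placement, upstream, downstream, direction):
    """Move the box at the boundary between two neighboring machines.

    placement: list of (box_name, machine) pairs in pipeline order.
    direction: "downstream" moves the last box on `upstream` to `downstream`;
               "upstream" moves the first box on `downstream` to `upstream`.
    Returns a new placement; the input list is not modified.
    """
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    for i in range(len(placement) - 1):
        (left_box, left_m), (right_box, right_m) = placement[i], placement[i + 1]
        if left_m == upstream and right_m == downstream:
            moved = list(placement)
            if direction == "downstream":
                moved[i] = (left_box, downstream)
            else:
                moved[i + 1] = (right_box, upstream)
            return moved
    raise ValueError("machines are not neighbors in this pipeline")
```

Only a box on the edge of a machine's sub-network moves, so the pipeline order and the rest of the assignment stay unchanged.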
11
Box splitting: create a copy of a box to run on a second machine, offloading the first. A filter is needed to act as a router.
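The idea of a filter acting as a router can be sketched as follows, assuming a stateless box that handles one tuple at a time. The function name `split_box` and the tuple layout are hypothetical; tuples that satisfy the predicate go to the original box and all others go to the copy.

```python
def split_box(box, predicate, tuples):
    """Route tuples to the original box or to its copy, then run both.

    box: a function applied to each tuple (assumed stateless).
    predicate: the routing filter; True sends the tuple to the original box.
    Returns (outputs_of_original, outputs_of_copy).
    """
    routed_to_original = [t for t in tuples if predicate(t)]
    routed_to_copy = [t for t in tuples if not predicate(t)]
    # In a real deployment the two lists would be processed on two machines.
    return ([box(t) for t in routed_to_original],
            [box(t) for t in routed_to_copy])
```

For a stateless box, the union of the two outputs contains the same tuples as the unsplit box would produce, although their relative order may differ. Stateful boxes need an additional merge step.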
12
Box splitting (Tumble). Merge: box splitting has to be transparent.
13
Box splitting example. If the filter predicate is B < 3: A machine: 1, 2, 3, 4, 7; B machine: 5, 6. Figure: output on the A machine, output on the B machine, and the final result after the merge.
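Why a merge keeps the split transparent can be illustrated with a grouped count. In this deliberately simplified sketch each machine counts the tuples it receives per value of A, and the merge sums the partial counts. It ignores window boundaries and tuple ordering, which a real Tumble box has to respect; the data and function names are hypothetical.

```python
from collections import Counter


def count_by_group(tuples):
    """Count tuples per value of field A (a stand-in for an aggregate box)."""
    return Counter(t["A"] for t in tuples)


def merge_counts(*partials):
    """Combine partial counts from several machines by summing per group."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total


def split_and_merge(tuples, predicate):
    """Route by predicate, count on each side, then merge the partial counts."""
    side_a = [t for t in tuples if predicate(t)]
    side_b = [t for t in tuples if not predicate(t)]
    return merge_counts(count_by_group(side_a), count_by_group(side_b))
```

Downstream consumers see the same counts whether or not the box was split, which is the sense in which the split is transparent.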
14
Key partitioning challenges: choosing what to offload; choosing what to split; choosing filters; others…
15
High availability: utilize the push-based nature.
16
Failure detection and recovery:
1. Each server periodically sends heartbeat messages to its upstream neighbors.
2. If a server does not reply within a predefined time, it is assumed to have failed.
3. The recovery phase is initiated, emulating the processing of the failed server (load shedding can be used).
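The timeout rule in the detection steps can be sketched with explicit timestamps, so the logic is easy to follow without real timers or sockets. The `HeartbeatMonitor` class, its methods, and the server names in the usage note are hypothetical; recovery itself is not modeled.

```python
class HeartbeatMonitor:
    """Track heartbeat replies from upstream neighbors and flag timeouts."""

    def __init__(self, neighbors, timeout, start=0.0):
        self.timeout = timeout
        # Treat the start time as the last time each neighbor was heard from.
        self.last_reply = {n: start for n in neighbors}

    def record_reply(self, neighbor, now):
        """Call when a heartbeat reply from `neighbor` arrives at time `now`."""
        self.last_reply[neighbor] = now

    def failed(self, now):
        """Return neighbors that have been silent for longer than the timeout."""
        return sorted(n for n, t in self.last_reply.items()
                      if now - t > self.timeout)
```

With a timeout of 3 time units, a monitor for `up1` and `up2` that last heard from `up1` at time 2 and never from `up2` reports only `up2` as failed at time 4. The caller would then start the recovery phase for each reported neighbor.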
17
Thank you!