Viktor Prasanna, Yogesh Simmhan, Alok Kumbhare, Sreedhar Natarajan (04/20/2012)
Motivation
- Workflow and stream processing systems have both been used for pipeline-based applications.
- D3 Science: dynamic, distributed, data-intensive applications.
- Dynamism: data is not static but flows continuously, and data rates and sizes change with domain (QoS) requirements.
- Workflows have compositional characteristics but limit dynamism.
- Stream processing systems provide real-time processing but lack compositional and data-diversity support.
- The MapReduce framework supports dynamism in the data flow but severely lacks compositional flexibility.
- Needed: an architecture that provides compositional capability, allows real-time stream processing, and supports MapReduce-style key-value exchange.
Design Paradigms of Floe
- Data Flow Model
  - Workflows follow both control flow and data flow; for continuous data it is difficult to define a strict control flow.
  - Floe follows a data flow model, which allows pipelined execution.
- Dynamic Data Mapping
  - Decide whether an output is sent to a single output channel (round robin) or the same output is duplicated to every output channel (see the sketch after this slide).
  - The MapReduce framework wires all Mappers to Reducers; Floe dynamically maps data to reducers at runtime.
- Typed Output Channels
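The snippet below is a minimal Java sketch of the two dispatch modes mentioned above: a typed output channel that either round-robins each item to one downstream channel or duplicates it to every downstream channel. The names (OutputChannel, Mode, emit) are illustrative assumptions, not Floe's actual API.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical typed output channel with two dispatch modes:
// ROUND_ROBIN sends each message to exactly one successor in turn,
// DUPLICATE sends the same message to every successor.
public class OutputChannel<T> {
    public enum Mode { ROUND_ROBIN, DUPLICATE }

    private final List<Consumer<T>> successors;  // downstream input channels
    private final Mode mode;
    private int next = 0;                        // cursor for round robin

    public OutputChannel(List<Consumer<T>> successors, Mode mode) {
        this.successors = successors;
        this.mode = mode;
    }

    public void emit(T message) {
        if (mode == Mode.DUPLICATE) {
            for (Consumer<T> s : successors) {
                s.accept(message);               // same output to every channel
            }
        } else {
            successors.get(next).accept(message); // one channel per message
            next = (next + 1) % successors.size();
        }
    }
}
```

The type parameter T stands in for the typed output channel idea: the channel only carries items of one declared type, so downstream tasks know what to expect.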
Design Paradigms of Floe (contd.)
- Continuous Execution
  - The system should support continuous processing of data, in addition to batch processing that takes an input and runs once.
  - The framework should be able to pause and resume execution; for low-latency applications, resources stay provisioned and the workflow is executed for the next batch of input.
- Decentralized Orchestration
  - A centralized workflow engine becomes a bottleneck when data flows between distributed tasks.
  - Decentralized orchestration is better suited: each component is aware of the subsequent component (input connections, output connections, etc.); see the sketch after this slide.
- Dynamism in Data Rates & Latency Needs
  - Apart from dynamism in the data flow, dynamism occurs in data rates and data sizes.
  - The QoS requirements of the application determine the execution rate, e.g., by adding new resources at runtime; the framework should be able to handle this.
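A minimal Java sketch of the decentralized orchestration idea above, assuming each task keeps its own list of downstream connections and pushes results directly to them, so no central engine mediates the data transfer. TaskNode, connectTo, and receive are hypothetical names, not Floe's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// A task that knows its own output connections and forwards each result
// directly to its successors, instead of returning it to a central engine.
public class TaskNode<I, O> {
    private final Function<I, O> logic;                 // this task's processing logic
    private final List<TaskNode<O, ?>> outputs = new ArrayList<>();

    public TaskNode(Function<I, O> logic) { this.logic = logic; }

    public void connectTo(TaskNode<O, ?> downstream) { outputs.add(downstream); }

    // Called by the upstream task (or a data source) with one item.
    public void receive(I input) {
        O result = logic.apply(input);
        for (TaskNode<O, ?> next : outputs) {
            next.receive(result);                       // push directly downstream
        }
    }
}
```

Usage would look like wiring the pipeline once (for example, parse.connectTo(filter); filter.connectTo(sink);) and then feeding items to the first node; the data then flows task to task without a coordinator.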
Design Paradigms of Floe (contd.)
- Elastic Resources
  - The cloud inherently provides dynamic provisioning of resources, but resources need to be provisioned ahead of time given the latency of initialization.
  - The application should be resilient and able to overcome failures.
- Dynamic Task Update
  - With continuous data flow execution, pausing, updating the task logic, and resuming the workflow in place is costly, since in-flight data must be stored.
  - A nice feature would be an update tracer event that updates task logic without pausing the workflow (see the sketch after this slide).
- Dynamic Data Flow Updates
  - Depending on requirements, the structure of a data flow may change: tasks can be added or removed.
  - A similar update tracer could be used to update edge properties rather than task properties.
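The sketch below illustrates the update tracer idea under one assumption: the task consumes a single stream carrying both data items and control events, and when an UpdateTracer event arrives it swaps its logic in place and keeps processing. All class and method names are hypothetical, not Floe's actual implementation.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical task whose logic can be replaced by a control event in the
// stream, without pausing the workflow or draining in-flight data.
class UpdatableTask {
    /** Anything flowing through the channel. */
    interface Event {}
    /** A normal data item. */
    record Data(Object payload) implements Event {}
    /** A control event carrying the replacement task logic. */
    record UpdateTracer(Function<Object, Object> newLogic) implements Event {}

    private final AtomicReference<Function<Object, Object>> logic;

    UpdatableTask(Function<Object, Object> initialLogic) {
        this.logic = new AtomicReference<>(initialLogic);
    }

    void onEvent(Event event) {
        if (event instanceof UpdateTracer tracer) {
            logic.set(tracer.newLogic());            // swap logic in place, no pause
        } else if (event instanceof Data data) {
            Object result = logic.get().apply(data.payload());
            // ... emit 'result' on the task's output channels ...
        }
    }
}
```

The same pattern could carry edge updates instead of logic updates, which is the dynamic data flow update case on this slide: the tracer event would rewire the task's output connections rather than replace its function.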
Floe Architecture
Smart Grid Streaming Pipeline Use Case