IFLOW: Self-managing distributed information flows Brian Cooper Yahoo! Research Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan and others
2 Overview Motivation; case study: inTransit (architecture, flow graph deployment/reconfiguration, experiments); other aspects of the system
3 Motivation Lots of data produced in lots of places. Examples: operational information systems, scientific collaborations, end-user systems, web traffic data
4 Airline example [Figure: events feeding the airline's operational information system (flights arriving, flights departing, bags scanned, customer check-in, weather updates, catering updates, FAA updates), operations that consume them (check seats, rebook missed connections, shop for flights), and the displays they drive (concourse, gate, baggage, home user).]
5 Previous solutions Tools for managing distributed updates: pub/sub middlewares, transaction processing facilities, in-house solutions. Times have changed: how to handle larger data volumes? How to seamlessly incorporate new functionality? How to effectively prioritize service? How to avoid hand-tuning the system?
6 Approach Provide a self-managing distributed data flow graph. [Figure: flight, weather, and check-in data feed operators that correlate flights and reservations, select ATL data, predict delays, and generate customer messages for delivery to a terminal or the web.]
7 Approach Deploy operators in a network overlay Middleware should self-manage this deployment Provide necessary performance, availability Respond to business-level needs
8 IFLOW [ICAC ’06]
[Figure: two example IFLOW deployments. An airline flow graph joins FLIGHTS, WEATHER, and COUNTERS streams for an overhead display; a scientific collaboration flow graph takes coordinates and bonds from a molecular dynamics experiment, calculates distances, bonds, and radial distance, and delivers them to IPaq, X-Window, and ImmersaDesk clients.]
AirlineFlowGraph {
  Sources -> {FLIGHTS, WEATHER, COUNTERS}
  Sinks -> {DISPLAY}
  Flow-Operators -> {JOIN-1, JOIN-2}
  Edges -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
  Utility -> [Customer-Priority, Low Bandwidth Utilization]
}
CollaborationFlowGraph {
  Sources -> {Experiment}
  Sinks -> {IPaq, X-Window, ImmersaDesk}
  Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
  Edges -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
  Utility -> [Low-Delay, Synchronized-Delivery]
}
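The declarative specification above is essentially a small graph description. As a point of comparison only, here is a minimal Python sketch of how such a flow-graph description could be represented; the class and field names are invented for illustration and are not the IFLOW API.

from dataclasses import dataclass, field

@dataclass
class FlowGraph:
    """Illustrative container for an IFLOW-style flow-graph specification."""
    sources: set
    sinks: set
    operators: set
    edges: set                       # (producer, consumer) pairs
    utility: list = field(default_factory=list)

    def downstream(self, node):
        """Nodes that directly consume the output of `node`."""
        return {dst for (src, dst) in self.edges if src == node}

# The airline example from the slide, expressed with this structure.
airline = FlowGraph(
    sources={"FLIGHTS", "WEATHER", "COUNTERS"},
    sinks={"DISPLAY"},
    operators={"JOIN-1", "JOIN-2"},
    edges={("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
           ("JOIN-1", "JOIN-2"), ("COUNTERS", "JOIN-2"),
           ("JOIN-2", "DISPLAY")},
    utility=["Customer-Priority", "Low Bandwidth Utilization"],
)

print(airline.downstream("JOIN-1"))   # {'JOIN-2'}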
9 Case study inTransit Query processing over distributed event streams Operators are streaming versions of relational operators
10 IFLOW Architecture [ICDCS ’05] [Figure: layered architecture. Application layer: a data-flow parser accepts the query. Middleware layer: the inTransit Distributed Stream Management Infrastructure with flow-graph control, ECho pub-sub, PDS, and Stones. Underlay layer: messaging.]
11 Application layer Applications specify data flow graphs: directly, or using an SQL-like declarative language.
STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
WHEN N1.FLIGHTS.NUMBER='DL207'
  AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
  AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;
[Figure: the resulting flow graph, with joins over sources N1, N2, and N7, a selection on 'DL207', and delivery to sink N10.]
12 Middleware layer ECho – pub/sub event delivery: event channels for data streams; native operators (E-code for most operators, library functions for special cases). Stones – operator containers: queues and actions. [Figure: a join stone connecting event channels 1, 2, and 3.]
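To make the “queues and actions” idea concrete, below is a rough Python analogue of an operator container; the real stones execute E-code compiled to native code, so every name here is a stand-in chosen for illustration.

import queue

class Stone:
    """Toy analogue of an operator container: an input queue plus an action."""
    def __init__(self, action, outputs=None):
        self.inbox = queue.Queue()
        self.action = action          # callable applied to each event
        self.outputs = outputs or []  # downstream stones

    def enqueue(self, event):
        self.inbox.put(event)

    def process_one(self):
        event = self.inbox.get()
        result = self.action(event)
        if result is not None:        # filters may drop events
            for stone in self.outputs:
                stone.enqueue(result)

# A trivial filter-then-deliver pipeline.
sink = Stone(action=lambda e: print("deliver:", e))
atl_filter = Stone(action=lambda e: e if e.get("dest") == "ATL" else None,
                   outputs=[sink])

atl_filter.enqueue({"flight": "DL207", "dest": "ATL"})
atl_filter.process_one()
sink.process_one()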
13 Middleware layer PDS – resource monitoring: nodes update PDS with resource info; inTransit is notified when conditions change. [Figure: nodes reporting CPU availability to PDS.]
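The report-and-notify pattern the slide describes might look roughly like the Python sketch below; the directory class, the 20% change threshold, and the node names are all invented, and PDS itself is a separate monitoring service with its own interface.

class ResourceDirectory:
    """Toy stand-in for PDS: nodes report load, subscribers get change callbacks."""
    def __init__(self):
        self.load = {}          # node -> last reported CPU utilization
        self.subscribers = []   # callbacks invoked on significant change

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def report(self, node, cpu, threshold=0.2):
        old = self.load.get(node)
        self.load[node] = cpu
        # Only notify when utilization moved by more than the threshold.
        if old is None or abs(cpu - old) > threshold:
            for callback in self.subscribers:
                callback(node, cpu)

pds = ResourceDirectory()
pds.subscribe(lambda node, cpu: print(f"reconfigure? {node} now at {cpu:.0%} CPU"))
pds.report("N7", 0.35)   # first report -> notification
pds.report("N7", 0.40)   # small change -> no notification
pds.report("N7", 0.90)   # big jump -> notification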
14 Flow graph deployment Where to place operators?
15 Flow graph deployment Where to place operators? Basic idea: cluster physical nodes
16 Flow graph deployment Partition the flow graph among coordinators. Coordinators represent their cluster. Exhaustive search among coordinators. [Figure: the example query graph's operators being assigned to candidate clusters.]
17 Flow graph deployment Coordinator deploys its subgraph in its cluster. Uses exhaustive search to find the best deployment.
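A coordinator's exhaustive search can be pictured as scoring every assignment of operators to cluster nodes and keeping the cheapest one. The Python sketch below does this for two join operators and three nodes with made-up link delays; it illustrates the shape of the search, not the actual inTransit cost model.

from itertools import product

# Assumed per-link delays (ms) within one cluster; purely illustrative numbers.
delay = {("N1", "N2"): 5, ("N2", "N1"): 5,
         ("N1", "N3"): 12, ("N3", "N1"): 12,
         ("N2", "N3"): 7, ("N3", "N2"): 7}

def link(a, b):
    return 0 if a == b else delay[(a, b)]

nodes = ["N1", "N2", "N3"]
operators = ["JOIN-1", "JOIN-2"]
pinned = {"FLIGHTS": "N1", "WEATHER": "N2", "DISPLAY": "N3"}  # sources/sink fixed
edges = [("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
         ("JOIN-1", "JOIN-2"), ("JOIN-2", "DISPLAY")]

def cost(placement):
    """Sum of link delays along every flow-graph edge."""
    where = {**pinned, **placement}
    return sum(link(where[src], where[dst]) for src, dst in edges)

# Try every assignment of the two joins to the three nodes and keep the best.
best = min((dict(zip(operators, choice))
            for choice in product(nodes, repeat=len(operators))),
           key=cost)
print(best, cost(best))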
18 Flow graph reconfiguration Resource or load changes trigger reconfiguration. Clusters reconfigure locally; large changes require inter-cluster reconfiguration.
19 Hierarchical clusters Coordinators themselves are clustered, forming a hierarchy. May need to move operators between clusters; handled by moving up a level in the hierarchy.
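One way to read the escalation rule: a coordinator first looks for capacity inside its own cluster and only involves the next level of the hierarchy when that fails. The following Python sketch of a two-level hierarchy uses an invented capacity model and invented names; it is only meant to illustrate the control flow.

class Coordinator:
    """A cluster coordinator in a two-level hierarchy (root has no nodes of its own)."""
    def __init__(self, name, nodes=None, parent=None):
        self.name = name
        self.nodes = dict(nodes or {})   # node -> spare capacity
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)

    def place_locally(self, operator, demand):
        for node, spare in self.nodes.items():
            if spare >= demand:
                self.nodes[node] -= demand
                return (operator, node, self.name)
        return None

    def place(self, operator, demand):
        # Try the local cluster first; escalate to the parent on failure.
        local = self.place_locally(operator, demand)
        if local:
            return local
        if self.parent:
            # The parent retries the sibling clusters of this one.
            for sibling in self.parent.children:
                if sibling is not self:
                    found = sibling.place_locally(operator, demand)
                    if found:
                        return found
        return None

root = Coordinator("root")
east = Coordinator("east", {"N1": 0.1, "N2": 0.2}, parent=root)
west = Coordinator("west", {"N7": 0.8}, parent=root)
print(east.place("JOIN-2", demand=0.5))   # escalates: ('JOIN-2', 'N7', 'west')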
20 What do we optimize? Basic metrics: bandwidth used, end-to-end delay. Autonomic metrics: business value, infrastructure cost. [ICAC ’05]
21 Experiments Simulations: GT-ITM transit/stub Internet topology (128 nodes); NS-2 to capture a trace of delays between nodes; deployment simulator reacts to the delays. OIS case study: flight information from Delta Air Lines; weather and news streams; experiments on Emulab (13 nodes).
22 Approximation penalty Flow graphs on simulator
23 Impact of reconfiguration 10 node flow graph on simulator
24 Impact of reconfiguration 2-node flow graph on Emulab. [Figure panels: network congestion; increased processor load.]
25 Different utility functions Simulator, 128 node network
26 Different utility functions Delay/bandwidth utility: (150 − delay)² × availableBandwidth / requiredBandwidth − cost × streamRate. Cost utility: 1/cost. Delay utility: 1/delay.
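Spelled out in code, the three utilities compared on this slide might be written as below; only the formulas come from the slide, while the variable names and sample numbers are mine.

def delay_bandwidth_utility(delay_ms, available_bw, required_bw, cost, stream_rate):
    """Composite utility from the slide: reward low delay and bandwidth headroom,
    penalize infrastructure cost scaled by the stream rate."""
    return (150 - delay_ms) ** 2 * (available_bw / required_bw) - cost * stream_rate

def cost_utility(cost):
    return 1.0 / cost          # cheaper deployments are better

def delay_utility(delay_ms):
    return 1.0 / delay_ms      # faster deployments are better

# Comparing two candidate deployments under the composite utility.
print(delay_bandwidth_utility(30, available_bw=10, required_bw=5, cost=2, stream_rate=100))
print(delay_bandwidth_utility(80, available_bw=20, required_bw=5, cost=1, stream_rate=100))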
27 Query planning We can optimize the structure of the query graph A different join order may enable a better mapping But there are too many plan/deployment possibilities to consider Use the hierarchy for planning Plus: stream advertisements to locate sources and deployed operators Planning algorithms: top-down, bottom-up [IPDPS ‘07]
28 Planning algorithms Top down [Figure: the query A ⋈ B ⋈ C ⋈ D is split at the top of the hierarchy into subplans such as A ⋈ B and C ⋈ D, which are pushed down toward the sources A, B, C, D.]
29 Planning algorithms Bottom up [Figure: partial plans such as A ⋈ B are formed near the sources A, B, C, D and combined while moving up the hierarchy until the full query A ⋈ B ⋈ C ⋈ D is planned.]
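As a rough illustration of the bottom-up idea, the Python sketch below greedily merges the cheapest pair of partial plans until a single plan covers all sources; the pairwise costs are invented and the greedy rule is a simplification of the published algorithm.

# Assumed pairwise join costs between sources; illustrative only.
join_cost = {frozenset({"A", "B"}): 1, frozenset({"A", "C"}): 6,
             frozenset({"A", "D"}): 7, frozenset({"B", "C"}): 4,
             frozenset({"B", "D"}): 5, frozenset({"C", "D"}): 2}

def pair_cost(left, right):
    # Cost of joining two partial plans: cheapest source-to-source link between them.
    return min(join_cost[frozenset({a, b})] for a in left for b in right)

def bottom_up_plan(sources):
    """Greedily merge the two cheapest partial plans until one plan remains."""
    plans = [frozenset({s}) for s in sources]
    order = []
    while len(plans) > 1:
        left, right = min(((l, r) for i, l in enumerate(plans) for r in plans[i + 1:]),
                          key=lambda pair: pair_cost(*pair))
        plans.remove(left); plans.remove(right)
        plans.append(left | right)
        order.append((set(left), set(right)))
    return order

print(bottom_up_plan(["A", "B", "C", "D"]))
# e.g. joins A-B and C-D first, then combines them into the full four-way plan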
30 Query planning 100 queries, each over 5 sources, 64 node network
31 Availability management Goal is to achieve both performance and reliability. These goals often conflict! Spend scarce resources on throughput or on availability? Manage the tradeoff using a utility function.
32 Fault tolerance [Middleware ’06] Basic approach: passive standby. Log of messages can be replayed; periodic “soft-checkpoint” from active to standby. Performance versus availability (fast recovery): more soft-checkpoints mean faster recovery but higher overhead. Choose a checkpoint frequency that maximizes utility. [Figure: an active join operator failing over to its standby.]
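The checkpoint-frequency decision can be framed as a one-dimensional utility maximization: shorter intervals shrink the log that must be replayed on failure but add runtime overhead. The sketch below uses an invented cost model (failure rate, replay rate, checkpoint pause) purely to show the tradeoff; the actual model in the Middleware ’06 paper differs.

def utility(checkpoint_interval_s,
            failure_rate_per_s=1e-4,    # assumed failure probability per second
            replay_rate=5000.0,         # events/s the standby can replay
            event_rate=1000.0,          # events/s flowing through the operator
            checkpoint_cost_s=0.05):    # pause per soft-checkpoint
    """Toy utility: normal-case throughput share minus expected recovery penalty."""
    overhead = checkpoint_cost_s / checkpoint_interval_s
    # On failure, at most one interval of events must be replayed from the log.
    recovery_s = checkpoint_interval_s * event_rate / replay_rate
    return (1.0 - overhead) - failure_rate_per_s * recovery_s * event_rate

# Pick the checkpoint interval (in seconds) that maximizes this utility.
candidates = [0.5, 1, 2, 5, 10, 30, 60]
best = max(candidates, key=utility)
print(best, utility(best))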
33 Proactive fault tolerance Goal: predict system instability
34 Proactive fault tolerance
35 SPRT Early Alarms
36 SPRT Noisy process signal
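SPRT here refers to the sequential probability ratio test: it accumulates the log-likelihood ratio of a “degraded” model versus a “healthy” model of the noisy process signal and raises an early alarm once the ratio crosses a decision threshold. The Python sketch below is a textbook Gaussian mean-shift version with illustrative parameters, not the detector used in IFLOW.

import math
import random

def sprt_alarm(samples, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.01, beta=0.01):
    """Return the sample index where H1 (degraded, mean mu1) is accepted, else None."""
    upper = math.log((1 - beta) / alpha)    # accept H1 above this
    lower = math.log(beta / (1 - alpha))    # accept H0 below this
    llr = 0.0
    for i, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma) vs N(mu0, sigma) for one sample.
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return i
        if llr <= lower:
            llr = 0.0                        # restart after accepting "healthy"
    return None

random.seed(0)
healthy = [random.gauss(0.0, 1.0) for _ in range(200)]
drifting = [random.gauss(1.0, 1.0) for _ in range(50)]   # instability begins here
print(sprt_alarm(healthy + drifting))   # alarm index shortly after sample 200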
37 Recovery time series Benefit of successful operation: k1 × (k2 − delay)² × bandwidth / availableBW
38 Mean time to recovery
39 IFLOW beyond inTransit [Figure: the self-managing information flow layer supports inTransit, pub/sub, science applications, and more on top of a complex infrastructure.]
40 Related work Stream data processing engines: STREAM, Aurora, TelegraphCQ, NiagaraCQ, etc.; Borealis, TRAPP, Flux, TAG. Content-based pub/sub: Gryphon, ARMADA, Hermes. Overlay networks: P2P, multicast (e.g. Bayeux), Grid. Other overlay toolkits: P2, MACEDON, GridKit.
41 Conclusions IFLOW is a general information flow middleware: self-configuring and self-managing, based on application-specified performance and utility. inTransit is a distributed event management infrastructure: queries over streams of structured data, with resource-aware deployment of query graphs; IFLOW provides utility-driven deployment and reconfiguration. Overall goal: provide useful abstractions for distributed information systems whose implementation is self-managing, the key to scalability, manageability, and flexibility.
42 For more information http://www.brianfrankcooper.net cooperb@yahoo-inc.com