IFLOW: Self-managing distributed information flows Brian Cooper Yahoo! Research Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan and others
2 Overview Motivation; case study: inTransit (architecture, flow graph deployment/reconfiguration, experiments); other aspects of the system
3 Motivation Lots of data produced in lots of places. Examples: operational information systems, scientific collaborations, end-user systems, web traffic data
4 Airline example [Figure: events feeding the airline's operational information system (flights arriving, flights departing, bags scanned, customer check-in, weather updates, catering updates, FAA updates), operations that consume them (check seats, rebook missed connections, shop for flights), and the displays they drive (concourse, gate, baggage, home user).]
5 Previous solutions Tools for managing distributed updates: pub/sub middlewares, transaction processing facilities, in-house solutions. Times have changed: how to handle larger data volumes? How to seamlessly incorporate new functionality? How to effectively prioritize service? How to avoid hand-tuning the system?
6 Approach Provide a self-managing distributed data flow graph. [Figure: flight, weather, and check-in data feed operators that correlate flights and reservations, select ATL data, predict delays, and generate customer messages for delivery to a terminal or the web.]
7 Approach Deploy operators in a network overlay Middleware should self-manage this deployment Provide necessary performance, availability Respond to business-level needs
8 IFLOW [ICAC ’06]
[Figure: two example IFLOW deployments. An airline flow graph joins FLIGHTS, WEATHER, and COUNTERS streams for an overhead display; a scientific collaboration flow graph takes coordinates and bonds from a molecular dynamics experiment, calculates distances, bonds, and radial distance, and delivers them to IPaq, X-Window, and ImmersaDesk clients.]
AirlineFlowGraph {
  Sources -> {FLIGHTS, WEATHER, COUNTERS}
  Sinks -> {DISPLAY}
  Flow-Operators -> {JOIN-1, JOIN-2}
  Edges -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
  Utility -> [Customer-Priority, Low Bandwidth Utilization]
}
CollaborationFlowGraph {
  Sources -> {Experiment}
  Sinks -> {IPaq, X-Window, ImmersaDesk}
  Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
  Edges -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
  Utility -> [Low-Delay, Synchronized-Delivery]
}
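The declarative specification above is essentially a small graph description. As a point of comparison only, here is a minimal Python sketch of how such a flow-graph description could be represented; the class and field names are invented for illustration and are not the IFLOW API.

from dataclasses import dataclass, field

@dataclass
class FlowGraph:
    """Illustrative container for an IFLOW-style flow-graph specification."""
    sources: set
    sinks: set
    operators: set
    edges: set                       # (producer, consumer) pairs
    utility: list = field(default_factory=list)

    def downstream(self, node):
        """Nodes that directly consume the output of `node`."""
        return {dst for (src, dst) in self.edges if src == node}

# The airline example from the slide, expressed with this structure.
airline = FlowGraph(
    sources={"FLIGHTS", "WEATHER", "COUNTERS"},
    sinks={"DISPLAY"},
    operators={"JOIN-1", "JOIN-2"},
    edges={("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
           ("JOIN-1", "JOIN-2"), ("COUNTERS", "JOIN-2"),
           ("JOIN-2", "DISPLAY")},
    utility=["Customer-Priority", "Low Bandwidth Utilization"],
)

print(airline.downstream("JOIN-1"))   # {'JOIN-2'}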
9 Case study inTransit Query processing over distributed event streams Operators are streaming versions of relational operators
10 IFLOW Architecture [ICDCS ’05] [Figure: layered architecture. Application layer: a data-flow parser accepts the query. Middleware layer: the inTransit Distributed Stream Management Infrastructure with flow-graph control, ECho pub-sub, PDS, and Stones. Underlay layer: messaging.]
11 Application layer Applications specify data flow graphs: directly, or using an SQL-like declarative language.
STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
WHEN N1.FLIGHTS.NUMBER='DL207'
  AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
  AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;
[Figure: the resulting flow graph, with joins over sources N1, N2, and N7, a selection on 'DL207', and delivery to sink N10.]
12 Middleware layer ECho – pub/sub event delivery: event channels for data streams; native operators (E-code for most operators, library functions for special cases). Stones – operator containers: queues and actions. [Figure: a join stone connecting event channels 1, 2, and 3.]
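To make the “queues and actions” idea concrete, below is a rough Python analogue of an operator container; the real stones execute E-code compiled to native code, so every name here is a stand-in chosen for illustration.

import queue

class Stone:
    """Toy analogue of an operator container: an input queue plus an action."""
    def __init__(self, action, outputs=None):
        self.inbox = queue.Queue()
        self.action = action          # callable applied to each event
        self.outputs = outputs or []  # downstream stones

    def enqueue(self, event):
        self.inbox.put(event)

    def process_one(self):
        event = self.inbox.get()
        result = self.action(event)
        if result is not None:        # filters may drop events
            for stone in self.outputs:
                stone.enqueue(result)

# A trivial filter-then-deliver pipeline.
sink = Stone(action=lambda e: print("deliver:", e))
atl_filter = Stone(action=lambda e: e if e.get("dest") == "ATL" else None,
                   outputs=[sink])

atl_filter.enqueue({"flight": "DL207", "dest": "ATL"})
atl_filter.process_one()
sink.process_one()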
13 Middleware layer PDS – resource monitoring: nodes update PDS with resource info; inTransit is notified when conditions change. [Figure: nodes reporting CPU availability to PDS.]
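The report-and-notify pattern the slide describes might look roughly like the Python sketch below; the directory class, the 20% change threshold, and the node names are all invented, and PDS itself is a separate monitoring service with its own interface.

class ResourceDirectory:
    """Toy stand-in for PDS: nodes report load, subscribers get change callbacks."""
    def __init__(self):
        self.load = {}          # node -> last reported CPU utilization
        self.subscribers = []   # callbacks invoked on significant change

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def report(self, node, cpu, threshold=0.2):
        old = self.load.get(node)
        self.load[node] = cpu
        # Only notify when utilization moved by more than the threshold.
        if old is None or abs(cpu - old) > threshold:
            for callback in self.subscribers:
                callback(node, cpu)

pds = ResourceDirectory()
pds.subscribe(lambda node, cpu: print(f"reconfigure? {node} now at {cpu:.0%} CPU"))
pds.report("N7", 0.35)   # first report -> notification
pds.report("N7", 0.40)   # small change -> no notification
pds.report("N7", 0.90)   # big jump -> notification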
14 Flow graph deployment Where to place operators?
15 Flow graph deployment Where to place operators? Basic idea: cluster physical nodes
16 Flow graph deployment Partition the flow graph among coordinators. Coordinators represent their cluster. Exhaustive search among coordinators. [Figure: the example query graph's operators being assigned to candidate clusters.]
17 Flow graph deployment Coordinator deploys its subgraph in its cluster. Uses exhaustive search to find the best deployment.
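A coordinator's exhaustive search can be pictured as scoring every assignment of operators to cluster nodes and keeping the cheapest one. The Python sketch below does this for two join operators and three nodes with made-up link delays; it illustrates the shape of the search, not the actual inTransit cost model.

from itertools import product

# Assumed per-link delays (ms) within one cluster; purely illustrative numbers.
delay = {("N1", "N2"): 5, ("N2", "N1"): 5,
         ("N1", "N3"): 12, ("N3", "N1"): 12,
         ("N2", "N3"): 7, ("N3", "N2"): 7}

def link(a, b):
    return 0 if a == b else delay[(a, b)]

nodes = ["N1", "N2", "N3"]
operators = ["JOIN-1", "JOIN-2"]
pinned = {"FLIGHTS": "N1", "WEATHER": "N2", "DISPLAY": "N3"}  # sources/sink fixed
edges = [("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
         ("JOIN-1", "JOIN-2"), ("JOIN-2", "DISPLAY")]

def cost(placement):
    """Sum of link delays along every flow-graph edge."""
    where = {**pinned, **placement}
    return sum(link(where[src], where[dst]) for src, dst in edges)

# Try every assignment of the two joins to the three nodes and keep the best.
best = min((dict(zip(operators, choice))
            for choice in product(nodes, repeat=len(operators))),
           key=cost)
print(best, cost(best))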
18 Flow graph reconfiguration Resource or load changes trigger reconfiguration. Clusters reconfigure locally; large changes require inter-cluster reconfiguration.
19 Hierarchical clusters Coordinators themselves are clustered, forming a hierarchy. May need to move operators between clusters; handled by moving up a level in the hierarchy.
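One way to read the escalation rule: a coordinator first looks for capacity inside its own cluster and only involves the next level of the hierarchy when that fails. The following Python sketch of a two-level hierarchy uses an invented capacity model and invented names; it is only meant to illustrate the control flow.

class Coordinator:
    """A cluster coordinator in a two-level hierarchy (root has no nodes of its own)."""
    def __init__(self, name, nodes=None, parent=None):
        self.name = name
        self.nodes = dict(nodes or {})   # node -> spare capacity
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)

    def place_locally(self, operator, demand):
        for node, spare in self.nodes.items():
            if spare >= demand:
                self.nodes[node] -= demand
                return (operator, node, self.name)
        return None

    def place(self, operator, demand):
        # Try the local cluster first; escalate to the parent on failure.
        local = self.place_locally(operator, demand)
        if local:
            return local
        if self.parent:
            # The parent retries the sibling clusters of this one.
            for sibling in self.parent.children:
                if sibling is not self:
                    found = sibling.place_locally(operator, demand)
                    if found:
                        return found
        return None

root = Coordinator("root")
east = Coordinator("east", {"N1": 0.1, "N2": 0.2}, parent=root)
west = Coordinator("west", {"N7": 0.8}, parent=root)
print(east.place("JOIN-2", demand=0.5))   # escalates: ('JOIN-2', 'N7', 'west')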
20 What do we optimize? Basic metrics: bandwidth used, end-to-end delay. Autonomic metrics: business value, infrastructure cost. [ICAC ’05]
21 Experiments Simulations: GT-ITM transit/stub Internet topology (128 nodes); NS-2 to capture a trace of delays between nodes; deployment simulator reacts to the delays. OIS case study: flight information from Delta Air Lines; weather and news streams; experiments on Emulab (13 nodes).
22 Approximation penalty Flow graphs on simulator
23 Impact of reconfiguration 10 node flow graph on simulator
24 Impact of reconfiguration 2-node flow graph on Emulab. [Figure panels: network congestion; increased processor load.]
25 Different utility functions Simulator, 128 node network
26 Different utility functions Delay/bandwidth utility: (150 − delay)² × availableBandwidth / requiredBandwidth − cost × streamRate. Cost utility: 1/cost. Delay utility: 1/delay.
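Spelled out in code, the three utilities compared on this slide might be written as below; only the formulas come from the slide, while the variable names and sample numbers are mine.

def delay_bandwidth_utility(delay_ms, available_bw, required_bw, cost, stream_rate):
    """Composite utility from the slide: reward low delay and bandwidth headroom,
    penalize infrastructure cost scaled by the stream rate."""
    return (150 - delay_ms) ** 2 * (available_bw / required_bw) - cost * stream_rate

def cost_utility(cost):
    return 1.0 / cost          # cheaper deployments are better

def delay_utility(delay_ms):
    return 1.0 / delay_ms      # faster deployments are better

# Comparing two candidate deployments under the composite utility.
print(delay_bandwidth_utility(30, available_bw=10, required_bw=5, cost=2, stream_rate=100))
print(delay_bandwidth_utility(80, available_bw=20, required_bw=5, cost=1, stream_rate=100))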
27 Query planning We can optimize the structure of the query graph A different join order may enable a better mapping But there are too many plan/deployment possibilities to consider Use the hierarchy for planning Plus: stream advertisements to locate sources and deployed operators Planning algorithms: top-down, bottom-up [IPDPS ‘07]
28 Planning algorithms Top down [Figure: the query A ⋈ B ⋈ C ⋈ D is split at the top of the hierarchy into subplans such as A ⋈ B and C ⋈ D, which are pushed down toward the sources A, B, C, D.]
29 Planning algorithms Bottom up [Figure: partial plans such as A ⋈ B are formed near the sources A, B, C, D and combined while moving up the hierarchy until the full query A ⋈ B ⋈ C ⋈ D is planned.]
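As a rough illustration of the bottom-up idea, the Python sketch below greedily merges the cheapest pair of partial plans until a single plan covers all sources; the pairwise costs are invented and the greedy rule is a simplification of the published algorithm.

# Assumed pairwise join costs between sources; illustrative only.
join_cost = {frozenset({"A", "B"}): 1, frozenset({"A", "C"}): 6,
             frozenset({"A", "D"}): 7, frozenset({"B", "C"}): 4,
             frozenset({"B", "D"}): 5, frozenset({"C", "D"}): 2}

def pair_cost(left, right):
    # Cost of joining two partial plans: cheapest source-to-source link between them.
    return min(join_cost[frozenset({a, b})] for a in left for b in right)

def bottom_up_plan(sources):
    """Greedily merge the two cheapest partial plans until one plan remains."""
    plans = [frozenset({s}) for s in sources]
    order = []
    while len(plans) > 1:
        left, right = min(((l, r) for i, l in enumerate(plans) for r in plans[i + 1:]),
                          key=lambda pair: pair_cost(*pair))
        plans.remove(left); plans.remove(right)
        plans.append(left | right)
        order.append((set(left), set(right)))
    return order

print(bottom_up_plan(["A", "B", "C", "D"]))
# e.g. joins A-B and C-D first, then combines them into the full four-way plan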
30 Query planning 100 queries, each over 5 sources, 64 node network
31 Availability management Goal is to achieve both performance and reliability. These goals often conflict! Spend scarce resources on throughput or on availability? Manage the tradeoff using a utility function.
32 Fault tolerance [Middleware ’06] Basic approach: passive standby. Log of messages can be replayed; periodic “soft-checkpoint” from active to standby. Performance versus availability (fast recovery): more soft-checkpoints mean faster recovery but higher overhead. Choose a checkpoint frequency that maximizes utility. [Figure: an active join operator failing over to its standby.]
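The checkpoint-frequency decision can be framed as a one-dimensional utility maximization: shorter intervals shrink the log that must be replayed on failure but add runtime overhead. The sketch below uses an invented cost model (failure rate, replay rate, checkpoint pause) purely to show the tradeoff; the actual model in the Middleware ’06 paper differs.

def utility(checkpoint_interval_s,
            failure_rate_per_s=1e-4,    # assumed failure probability per second
            replay_rate=5000.0,         # events/s the standby can replay
            event_rate=1000.0,          # events/s flowing through the operator
            checkpoint_cost_s=0.05):    # pause per soft-checkpoint
    """Toy utility: normal-case throughput share minus expected recovery penalty."""
    overhead = checkpoint_cost_s / checkpoint_interval_s
    # On failure, at most one interval of events must be replayed from the log.
    recovery_s = checkpoint_interval_s * event_rate / replay_rate
    return (1.0 - overhead) - failure_rate_per_s * recovery_s * event_rate

# Pick the checkpoint interval (in seconds) that maximizes this utility.
candidates = [0.5, 1, 2, 5, 10, 30, 60]
best = max(candidates, key=utility)
print(best, utility(best))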
33 Proactive fault tolerance Goal: predict system instability
34 Proactive fault tolerance
35 SPRT Early Alarms
36 SPRT Noisy process signal
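SPRT here refers to the sequential probability ratio test: it accumulates the log-likelihood ratio of a “degraded” model versus a “healthy” model of the noisy process signal and raises an early alarm once the ratio crosses a decision threshold. The Python sketch below is a textbook Gaussian mean-shift version with illustrative parameters, not the detector used in IFLOW.

import math
import random

def sprt_alarm(samples, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.01, beta=0.01):
    """Return the sample index where H1 (degraded, mean mu1) is accepted, else None."""
    upper = math.log((1 - beta) / alpha)    # accept H1 above this
    lower = math.log(beta / (1 - alpha))    # accept H0 below this
    llr = 0.0
    for i, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma) vs N(mu0, sigma) for one sample.
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return i
        if llr <= lower:
            llr = 0.0                        # restart after accepting "healthy"
    return None

random.seed(0)
healthy = [random.gauss(0.0, 1.0) for _ in range(200)]
drifting = [random.gauss(1.0, 1.0) for _ in range(50)]   # instability begins here
print(sprt_alarm(healthy + drifting))   # alarm index shortly after sample 200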
37 Recovery time series Benefit of successful operation: k1 × (k2 − delay)² × bandwidth / availableBW
38 Mean time to recovery
39 IFLOW beyond inTransit [Figure: the self-managing information flow layer supports inTransit, pub/sub, science applications, and more on top of a complex infrastructure.]
40 Related work Stream data processing engines: STREAM, Aurora, TelegraphCQ, NiagaraCQ, etc.; Borealis, TRAPP, Flux, TAG. Content-based pub/sub: Gryphon, ARMADA, Hermes. Overlay networks: P2P, multicast (e.g. Bayeux), Grid. Other overlay toolkits: P2, MACEDON, GridKit.
41 Conclusions IFLOW is a general information flow middleware: self-configuring and self-managing, based on application-specified performance and utility. inTransit is a distributed event management infrastructure: queries over streams of structured data, with resource-aware deployment of query graphs; IFLOW provides utility-driven deployment and reconfiguration. Overall goal: provide useful abstractions for distributed information systems whose implementation is self-managing, the key to scalability, manageability, and flexibility.
42 For more information http://www.brianfrankcooper.net cooperb@yahoo-inc.com