IFLOW: Self-managing distributed information flows Brian Cooper Yahoo! Research Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai,

Presentation transcript:

IFLOW: Self-managing distributed information flows Brian Cooper Yahoo! Research Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan and others

2 Overview Motivation Case study: inTransit Architecture Flow graph deployment/reconfiguration Experiments Other aspects of the system

3 Lots of data produced in lots of places Examples: operational information systems, scientific collaborations, end-user systems, web traffic data Motivation

4 Airline example Flights arriving Flights departing Bags scanned Customers check-in Weather updates Catering updates Check seats FAA updates Rebook missed connections Shop for flights Concourse display Gate display Baggage display Home user display

5 Previous solutions Tools for managing distributed updates: pub/sub middlewares, Transaction Processing Facilities, in-house solutions. Times have changed: How to handle larger data volumes? How to seamlessly incorporate new functionality? How to effectively prioritize service? How to avoid hand-tuning the system?

6 Approach Provide a self-managing distributed data flow graph [Diagram labels: Flight data; Weather data; Check-in data; Correlate flights and reservations; Select ATL data; Predict delays; Generate customer messages; Terminal or web]

7 Approach Deploy operators in a network overlay Middleware should self-manage this deployment Provide necessary performance, availability Respond to business-level needs

8 IFLOW [ICAC ’06] Two example flow graphs running on the IFLOW middleware
[Diagram labels, airline: WEATHER, FLIGHTS, COUNTERS, OVERHEAD-DISPLAY]
[Diagram labels, collaboration: Molecular Dynamics Experiment, Coordinates, Calculates Distance and Bonds, Radial Distance, Coordinates + Bonds, IPaq Client, X-Window Client, ImmersaDesk]
AirlineFlowGraph {
  Sources -> {FLIGHTS, WEATHER, COUNTERS}
  Sinks -> {DISPLAY}
  Flow-Operators -> {JOIN-1, JOIN-2}
  Edges -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
  Utility -> [Customer-Priority, Low Bandwidth Utilization]
}
CollaborationFlowGraph {
  Sources -> {Experiment}
  Sinks -> {IPaq, X-Window, ImmersaDesk}
  Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
  Edges -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
  Utility -> [Low-Delay, Synchronized-Delivery]
}
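The AirlineFlowGraph specification above can be held in an ordinary data structure. The following is a minimal sketch in Python, not the IFLOW API: the `FlowGraph` class and its `undeclared_vertices` check are hypothetical names introduced here only to illustrate what such a specification contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowGraph:
    """Hypothetical container mirroring the fields of the slide's specification."""
    sources: frozenset
    sinks: frozenset
    operators: frozenset
    edges: tuple      # (from_vertex, to_vertex) pairs
    utility: tuple    # names of the utility goals, in priority order

    def undeclared_vertices(self):
        """Return vertices used in an edge but not declared as source, sink, or operator."""
        declared = self.sources | self.sinks | self.operators
        return {v for edge in self.edges for v in edge if v not in declared}


# Values copied from the AirlineFlowGraph on the slide.
airline = FlowGraph(
    sources=frozenset({"FLIGHTS", "WEATHER", "COUNTERS"}),
    sinks=frozenset({"DISPLAY"}),
    operators=frozenset({"JOIN-1", "JOIN-2"}),
    edges=(
        ("FLIGHTS", "JOIN-1"),
        ("WEATHER", "JOIN-1"),
        ("JOIN-1", "JOIN-2"),
        ("COUNTERS", "JOIN-2"),
        ("JOIN-2", "DISPLAY"),
    ),
    utility=("Customer-Priority", "Low Bandwidth Utilization"),
)
```

A check such as `airline.undeclared_vertices()` returns an empty set when every edge endpoint is declared, which is the kind of validation a deployment layer would plausibly run before placing operators.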

9 Case study inTransit Query processing over distributed event streams Operators are streaming versions of relational operators

10 IFLOW Architecture [ICDCS ’05] Layers: Application layer, Middleware layer, Underlay layer [Diagram labels: Query?; Data-flow parser; inTransit Distributed Stream Management Infrastructure; Flow-graph control; ECho pub-sub; PDS; Stones; Messaging]

11 Application layer Applications specify data flow graphs: they can specify them directly, or use a SQL-like declarative language
STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
WHEN N1.FLIGHTS.NUMBER='DL207'
  AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
  AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;
[Diagram labels: N1, N2, N7, 'DL207', N10, ⋈, ⋈]
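To make the meaning of the query concrete, here is a minimal Python sketch of what it computes: a selection on the flight number followed by two joins. It works on small finite batches of dictionaries rather than on live streams, and the function name `answer_query` and the record layout are assumptions made for illustration, not part of inTransit.

```python
def answer_query(flights, counters, weather, flight_number="DL207"):
    """Yield (TIME, WAITLISTED, TEMP) tuples for the selected flight.

    Assumption: each input is an iterable of dicts keyed by the column
    names used in the query. Later records overwrite earlier ones with
    the same join key, a simple "latest value" join.
    """
    latest_counters = {c["FLIGHT_NUMBER"]: c for c in counters}
    latest_weather = {w["LOCATION"]: w for w in weather}
    for f in flights:
        if f["NUMBER"] != flight_number:   # WHEN N1.FLIGHTS.NUMBER='DL207'
            continue
        c = latest_counters.get(f["NUMBER"])       # join with COUNTERS
        w = latest_weather.get(f["DESTINATION"])   # join with WEATHER
        if c is not None and w is not None:
            yield (f["TIME"], c["WAITLISTED"], w["TEMP"])


flights = [
    {"NUMBER": "DL207", "TIME": "10:05", "DESTINATION": "ATL"},
    {"NUMBER": "DL100", "TIME": "11:30", "DESTINATION": "JFK"},
]
counters = [{"FLIGHT_NUMBER": "DL207", "WAITLISTED": 4}]
weather = [{"LOCATION": "ATL", "TEMP": 71}]
rows = list(answer_query(flights, counters, weather))
```

In the deployed system each of these steps would be a separate operator that the middleware can place on different nodes; the sketch only shows the relational logic.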

12 Middleware layer ECho: pub/sub event delivery. Event channels for data streams. Native operators: E-code for most operators, library functions for special cases. Stones: operator containers, with queues and actions. [Diagram labels: Channel 1, Channel 2, Channel 3, ⋈]

13 Middleware layer PDS: resource monitoring. Nodes update PDS with resource info. inTransit is notified when conditions change. [Diagram labels: CPU, CPU?, CPU]

14 Flow graph deployment Where to place operators?

15 Flow graph deployment Where to place operators? Basic idea: cluster physical nodes

16 Flow graph deployment Partition flow graph among coordinators. Coordinators represent their cluster. Exhaustive search among coordinators. [Diagram labels: N1, N2, N7, 'DL207', ⋈, N10, ⋈, with candidate placements marked ?]

17 Flow graph deployment Coordinator deploys subgraph in its cluster. Uses exhaustive search to find best deployment. [Diagram labels: ⋈, ?]
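The exhaustive search within a cluster can be sketched as follows. This is an illustration under stated assumptions, not the inTransit implementation: the cost model (stream rate times inter-node delay, summed over edges), the function name `best_placement`, and the example topology are all hypothetical.

```python
from itertools import product


def best_placement(operators, nodes, pinned, edges, delay):
    """Try every assignment of operators to nodes and keep the cheapest.

    operators: list of operator names to place
    nodes:     list of candidate node names in the cluster
    pinned:    dict of fixed placements (sources and sinks)
    edges:     list of (from_vertex, to_vertex, stream_rate)
    delay:     delay[a][b] is the delay between nodes a and b
    Assumed cost: sum over edges of stream_rate * delay between endpoints.
    """
    best_cost, best = float("inf"), None
    for choice in product(nodes, repeat=len(operators)):
        placement = dict(pinned)
        placement.update(zip(operators, choice))
        cost = sum(rate * delay[placement[u]][placement[v]]
                   for u, v, rate in edges)
        if cost < best_cost:
            best_cost, best = cost, placement
    return best, best_cost


# Hypothetical three-node cluster: a high-rate source on A, a sink on C.
delay = {
    "A": {"A": 0, "B": 1, "C": 5},
    "B": {"A": 1, "B": 0, "C": 1},
    "C": {"A": 5, "B": 1, "C": 0},
}
placement, cost = best_placement(
    operators=["join"],
    nodes=["A", "B", "C"],
    pinned={"src": "A", "sink": "C"},
    edges=[("src", "join", 10), ("join", "sink", 1)],
    delay=delay,
)
```

The search space grows as the number of nodes raised to the number of operators, which is why restricting the search to one cluster, or to the set of coordinators, keeps it tractable.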

18 Flow graph reconfiguration Resource or load changes trigger reconfiguration. Clusters reconfigure locally. Large changes require inter-cluster reconfiguration.

19 Hierarchical clusters Coordinators themselves are clustered Coordinators form a hierarchy May need to move operators between clusters Handled by moving up a level in the hierarchy
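One way to read "moving up a level in the hierarchy" is that an operator move between two clusters is handled by the lowest coordinator that sits above both. The sketch below illustrates that reading; the `parent` map and the function name `lowest_common_coordinator` are assumptions for illustration and are not taken from the slides.

```python
def lowest_common_coordinator(parent, a, b):
    """Return the lowest coordinator above both a and b, or None.

    parent maps each node or coordinator to its coordinator; the root
    of the hierarchy has no entry.
    """
    ancestors = set()
    node = a
    while node is not None:          # collect a and everything above it
        ancestors.add(node)
        node = parent.get(node)
    node = b
    while node is not None:          # walk up from b until the paths meet
        if node in ancestors:
            return node
        node = parent.get(node)
    return None


# Hypothetical two-level hierarchy: clusters c1 and c2 under one root.
parent = {"n1": "c1", "n2": "c1", "n3": "c2", "c1": "root", "c2": "root"}
```

With this map, a move between `n1` and `n2` stays inside cluster `c1`, while a move between `n1` and `n3` has to be decided at `root`.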

20 What do we optimize Basic metrics Bandwidth used End to end delay Autonomic metrics Business value Infrastructure cost [ICAC ’05]

21 Experiments Simulations GT-ITM transit/stub Internet topology (128 nodes) NS-2 to capture trace of delay between nodes Deployment simulator reacts to delay OIS case study Flight information from Delta airlines Weather and news streams Experiments on Emulab (13 nodes)

22 Approximation penalty Flow graphs on simulator

23 Impact of reconfiguration 10 node flow graph on simulator

24 Impact of reconfiguration 2 node flow graph on Emulab Network congestion Increased processor load

25 Different utility functions Simulator, 128 node network

26 Different utility functions Utility: (150 - delay)^2 × availableBandwidth / requiredBandwidth - cost × streamrate. Cost: 1/cost. Delay: 1/delay.
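The three utility functions compared on this slide can be written down directly. This is a sketch: the operator precedence of the combined formula is read from the flattened slide text as (150 - delay) squared, times the bandwidth ratio, minus cost times stream rate, and the function names and sample numbers are illustrative assumptions.

```python
def combined_utility(delay, available_bandwidth, required_bandwidth, cost, stream_rate):
    """Utility as read from the slide (precedence is an assumption)."""
    return ((150 - delay) ** 2 * available_bandwidth / required_bandwidth
            - cost * stream_rate)


def cost_utility(cost):
    """The slide's cost-only alternative: 1/cost."""
    return 1 / cost


def delay_utility(delay):
    """The slide's delay-only alternative: 1/delay."""
    return 1 / delay


# Illustrative numbers only; they are not measurements from the experiments.
u = combined_utility(delay=50, available_bandwidth=2, required_bandwidth=1,
                     cost=3, stream_rate=10)
```

Under this reading, lower delay and spare bandwidth raise utility, while the cost term penalizes high-rate streams on expensive links.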

27 Query planning [IPDPS ’07] We can optimize the structure of the query graph. A different join order may enable a better mapping. But there are too many plan/deployment possibilities to consider. Use the hierarchy for planning. Plus: stream advertisements to locate sources and deployed operators. Planning algorithms: top-down, bottom-up.

28 Planning algorithms Top down [Diagram labels: A ⋈ B ⋈ C ⋈ D; A ⋈ B; C ⋈ D; sources A, B, C, D]

29 Planning algorithms Bottom up [Diagram labels: A ⋈ B ⋈ C ⋈ D; A ⋈ B; sources A, B, C, D]

30 Query planning 100 queries, each over 5 sources, 64 node network

31 Availability management Goal is to achieve both: Performance Reliability These goals often conflict! Spend scarce resources on throughput or availability? Manage tradeoff using utility function

32 Fault tolerance [Middleware ’06] Basic approach: passive standby. Log of messages can be replayed. Periodic “soft-checkpoint” from active to standby. Performance versus availability (fast recovery): more soft-checkpoints = faster recovery, higher overhead. Choose a checkpoint frequency that maximizes utility. [Diagram labels: ⋈, ⋈, ⋈, X]
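The trade-off between checkpoint overhead and recovery time can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: the slides do not give the actual utility model, so the cost terms, parameter names, and numbers are hypothetical. Maximizing utility is expressed here as minimizing an expected cost per second.

```python
def pick_checkpoint_interval(candidates, overhead_per_checkpoint,
                             failure_rate, replay_cost_per_second):
    """Pick the checkpoint interval (seconds) with the lowest expected cost.

    Assumed model:
      * checkpointing costs overhead_per_checkpoint every `interval` seconds
      * failures arrive at failure_rate per second
      * after a failure, on average half an interval of logged messages
        must be replayed, at replay_cost_per_second
    """
    def expected_cost(interval):
        checkpoint_cost = overhead_per_checkpoint / interval
        recovery_cost = failure_rate * replay_cost_per_second * interval / 2
        return checkpoint_cost + recovery_cost

    return min(candidates, key=expected_cost)


# Hypothetical parameters: frequent checkpoints are too expensive,
# rare ones make recovery too slow, so the middle value wins.
interval = pick_checkpoint_interval(
    candidates=[1, 10, 100],
    overhead_per_checkpoint=5,
    failure_rate=0.01,
    replay_cost_per_second=10,
)
```

The shape of the result matches the slide's statement: more soft-checkpoints shorten recovery but add overhead, so the best frequency sits between the extremes.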

33 Proactive fault tolerance Goal: predict system instability

34 Proactive fault tolerance

35 SPRT Early Alarms

36 SPRT Noisy process signal
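Assuming SPRT here refers to Wald's sequential probability ratio test, the idea behind the early alarms can be sketched as follows. The Gaussian signal model, the parameter names, and the thresholds are assumptions chosen for illustration; the slides do not state which signal model the system uses.

```python
import math


def sprt(samples, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Sequential test of mean mu0 (normal) against mean mu1 (degraded).

    Assumes independent Gaussian samples with known standard deviation
    sigma. alpha is the tolerated false-alarm rate, beta the tolerated
    missed-alarm rate. Returns (decision, samples_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it accepts mu1: raise alarm
    lower = math.log(beta / (1 - alpha))   # crossing it accepts mu0: all normal
    llr = 0.0                              # cumulative log-likelihood ratio
    for i, x in enumerate(samples, start=1):
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if llr >= upper:
            return "alarm", i
        if llr <= lower:
            return "normal", i
    return "undecided", len(samples)


# A signal sitting at the degraded mean triggers an alarm after a few samples.
decision, used = sprt([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0)
```

Because the test accumulates evidence across samples, a single noisy reading does not trigger an alarm, while a sustained shift is flagged after a bounded number of observations.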

37 Recovery time series Benefit of successful operation: k1 × (k2 - delay)^2 × bandwidth / availablebw

38 Mean time to recovery

39 IFLOW beyond inTransit Self-managing information flow Complex infrastructure inTransit Pub/sub Science app …

40 Related work Stream data processing engines: STREAM, Aurora, TelegraphCQ, NiagaraCQ, etc.; Borealis, TRAPP, Flux, TAG. Content-based pub/sub: Gryphon, ARMADA, Hermes. Overlay networks: P2P, Multicast (e.g. Bayeux), Grid. Other overlay toolkits: P2, MACEDON, GridKit.

41 Conclusions IFLOW is a general information flow middleware: self-configuring and self-managing, based on application-specified performance and utility. inTransit distributed event management infrastructure: queries over streams of structured data; resource-aware deployment of query graphs. IFLOW provides utility-driven deployment and reconfiguration. Overall goal: provide useful abstractions for distributed information systems; implementation of abstractions is self-managing; key to scalability, manageability, flexibility.

42 For more information