The Design of the Borealis Stream Processing Engine (CIDR 2005)
Brandeis University, Brown University, MIT
Presented by Seungwoo Kang, 2005-03-15. Ref.: Borealis.ppt

One-line comment: this paper presents an overview of the design of Borealis, a distributed stream processing engine, together with its new features and mechanisms.

Outline
- Motivation and Goal
- Requirements
- Technical issues
  - Dynamic revision of query results
  - Dynamic query modification
  - Flexible and scalable optimization
  - Fault tolerance
- Conclusion

Motivation and Goal
- Envision the second-generation SPE
  - Many streaming applications have fundamental requirements that first-generation SPEs do not support
  - Three fundamental requirements:
    - Dynamic revision of query results
    - Dynamic query modification
    - Flexible and scalable optimization
- Present the functionality and preliminary design of Borealis

Requirements
- Dynamic revision of query results
- Dynamic query modification
- Flexible and highly scalable optimization

Changes in the models
- Data model: revision support
  - Three types of messages (insertion, deletion, replacement)
- Query model: revision processing support
  - Time travel, CP views
  - Modification of operator box semantics via control lines
- QoS model: QoS metrics on a per-tuple basis
  - Each tuple (message) carries a VM (Vector of Metrics)
  - A score function ranks tuples by their VM

Architecture of Borealis
Borealis modifies and extends both of its predecessor systems, Aurora and Medusa, with a new set of features and mechanisms.

Dynamic revision of query results
- Problem
  - Data sources issue corrections of previously reported data
  - Data sources such as sensors are highly volatile and unpredictable: late arrivals, out-of-order data
  - First-generation SPEs simply accept approximate or imperfect results
- Goal
  - Process revisions and correct previously output results

Causes for Tuple Revisions
- Data sources revise input streams: "On occasion, data feeds put out a faulty price [...] and send a correction within a few hours" [MarketBrowser]
- Temporary overloads cause tuple drops
- Data arrives late and misses its processing window
(Figure: a 1-hour average-price operator over inputs (1pm,$10), (2pm,$12), (3pm,$11); later arrivals (4:05,$11), (4:15,$10), and a late correction (2:25,$9).)

New Data Model for Revisions
- Message layout: (time, type, id, a1, ..., an), where (time, type, id) is the header and (a1, ..., an) is the data
- time: tuple timestamp
- type: tuple type (insertion, deletion, or replacement)
- id: unique identifier of the tuple on its stream
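
To make the layout concrete, here is a minimal sketch of this message format in Python; the class and field names are illustrative, not the Borealis API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class MsgType(Enum):
    INSERTION = "insertion"      # a brand-new tuple
    DELETION = "deletion"        # retracts a previously sent tuple
    REPLACEMENT = "replacement"  # corrects a previously sent tuple

@dataclass(frozen=True)
class Message:
    # Header: (time, type, id)
    time: int       # tuple timestamp
    type: MsgType   # insertion, deletion, or replacement
    id: int         # unique identifier of the tuple on its stream
    # Data: (a1, ..., an)
    data: Tuple

# An original report and a later correction of the same logical tuple:
original = Message(time=1400, type=MsgType.INSERTION, id=42, data=("IBM", 12.0))
revision = Message(time=1400, type=MsgType.REPLACEMENT, id=42, data=("IBM", 9.0))
```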

Revision Processing in Borealis
- Closed model: revisions produce revisions
(Figure: a revision message (2:25,$9) flows into the 1-hour average-price operator, which had seen (1pm,$10), (2pm,$12), (3pm,$11); the operator recomputes the affected window and emits a replacement for its earlier output, revising (2pm,$12) to (2pm,$11).)
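
A sketch of the closed model for a tumbling-window average, under a simplified message format: any message that touches an already-emitted window causes that window to be recomputed from stored history, and a replacement is emitted downstream. Window semantics and names are assumptions, not the paper's implementation:

```python
from collections import defaultdict

WINDOW = 60  # window size in minutes

class AvgOperator:
    """Tumbling-window average operator under the closed revision model."""

    def __init__(self):
        self.history = defaultdict(dict)  # window start -> {tuple id: value}
        self.emitted = set()              # windows whose average was output

    def process(self, time, tid, value):
        wstart = (time // WINDOW) * WINDOW
        self.history[wstart][tid] = value  # insert, or overwrite on replacement
        vals = self.history[wstart].values()
        avg = sum(vals) / len(vals)
        # Closed model: touching an already-emitted window produces a
        # replacement message on the output stream, not a new insertion.
        kind = "replacement" if wstart in self.emitted else "insertion"
        self.emitted.add(wstart)
        return (wstart, kind, avg)

op = AvgOperator()
print(op.process(time=125, tid=1, value=12.0))  # (120, 'insertion', 12.0)
print(op.process(time=145, tid=1, value=9.0))   # (120, 'replacement', 9.0)
```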

Revision Processing in Borealis (cont.)
- Connection points (CPs) store history (the diagram history, up to a history bound)
- Operators pull from the CP the history they need to reprocess
(Figure: a CP on an operator's input stream buffers tuples from the oldest to the most recent within the bound; an incoming revision is matched against this history before entering the input queue.)

Discussion
- Runtime overhead
  - Assumptions: revision messages are under 1% of the input, and revision processing can be deferred
  - Processing cost and storage cost
- Application needs and requirements
  - Is the application interested in revision messages at all?
  - What should it do with them: rollback, compensation?
  - Application QoS requirements, e.g. a delay bound

Dynamic query modification
- Problem
  - Certain attributes of a query must change at runtime, e.g. in network monitoring
  - Manual query substitution is too heavyweight: high overhead, slow to take effect (the new query starts with an empty state)
- Goal: low-overhead, fast, and automatic modifications
- Mechanism: control lines
  - Each operator box is provided with a control line
  - Control lines carry messages with revised box parameters (window size, filter predicate) and new box functions
- Timing problem: specify precisely which data is to be processed with which control parameters
  - Data may be ready for processing too late or too early relative to a parameter change
  - Old data under a new parameter: process it with buffered old parameters
  - New data under an old parameter: correct via revision messages and time travel
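
A sketch of how a control line could interleave with the data stream: each parameter revision carries an effective-from timestamp, and the box keeps old parameter versions buffered so that late-arriving data is still processed under the parameters in force at its own timestamp. The FilterBox class and its versioning scheme are assumptions for illustration:

```python
import bisect

class FilterBox:
    """Filter operator whose predicate parameter can be revised at runtime."""

    def __init__(self, threshold):
        # Parameter history: parallel lists of (effective-from time, threshold).
        self.param_times = [0]
        self.param_vals = [threshold]

    def control(self, effective_time, threshold):
        """Control-line message: revise the box parameter from a given time on."""
        i = bisect.bisect(self.param_times, effective_time)
        self.param_times.insert(i, effective_time)
        self.param_vals.insert(i, threshold)

    def process(self, time, value):
        # Old data meets a new parameter: look up the buffered parameter
        # version that was in force at the tuple's own timestamp.
        i = bisect.bisect(self.param_times, time) - 1
        return value if value > self.param_vals[i] else None

box = FilterBox(threshold=10)
box.control(effective_time=200, threshold=5)  # loosen the predicate from t=200
print(box.process(150, 8))   # None: old data, old parameter (8 <= 10)
print(box.process(250, 8))   # 8: new data, new parameter (8 > 5)
```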

Time travel
- Motivation: rewind history and repeat it, or move forward into the future
- Based on Connection Points (CPs) and CP views
  - A CP view is an independent view of a CP
  - View range: start_time and max_time
  - Operations
    - Undo: a deletion message rolls the state back to time t
    - Replay: history is re-sent from that point
  - A prediction function enables travel into the future
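
A sketch of a CP view with undo and replay, under the assumption that undo is realized by emitting deletion messages for everything after the rollback point; the class is hypothetical, not the Borealis interface:

```python
class CPView:
    """Independent view over a connection point's stored history."""

    def __init__(self, start_time, max_time):
        self.start_time = start_time  # oldest point the view may rewind to
        self.max_time = max_time      # newest point the view covers
        self.history = []             # stored (time, id, value) tuples

    def append(self, time, tid, value):
        if self.start_time <= time <= self.max_time:
            self.history.append((time, tid, value))

    def undo(self, t):
        """Roll state back to time t by emitting deletion messages
        for every tuple after t (newest first)."""
        undone = [h for h in self.history if h[0] > t]
        return [("deletion", tid) for (time, tid, _) in reversed(undone)]

    def replay(self, t):
        """Re-send history from time t forward, e.g. after a parameter change."""
        return [h for h in self.history if h[0] >= t]

view = CPView(start_time=0, max_time=10_000)
for i, v in enumerate([10.0, 12.0, 11.0]):
    view.append(time=100 * (i + 1), tid=i, value=v)
print(view.undo(150))    # deletions for the tuples at t=300 and t=200
print(view.replay(200))  # the same tuples, replayed in order
```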

Discussion
- How do we detect that the timing problem occurs?
  - Too early: look up the data buffered at the CP
  - Too late: check the timestamp of the tuple
  - Either check adds overhead
- "Performed on a copy of some portion of the running query diagram, so as not to interfere with processing of the running query diagram"
  - Who makes the copy, when, and how? How is the copy managed?
- Future time travel: when is it useful? E.g. for alarming?

Optimization in a Distributed SPE
- Goal: optimized resource allocation
- Challenges
  - Wide variation in resources: high-end servers vs. tiny sensors
  - Multiple resources involved: CPU, memory, I/O, bandwidth, power
  - Dynamic environment: changing input load and resource availability
  - Scalability: query network size, number of nodes

Quality of Service
- A mechanism to drive resource allocation
- Aurora model
  - QoS functions at query end-points
  - Problem: the QoS at upstream nodes must be inferred
- An alternative model
  - A Vector of Metrics (VM) is carried in each tuple
    - Content metrics: the tuple's importance, or its age
    - Performance metrics: arrival time, total resources consumed, ...
  - Operators can change the VM
  - A score function ranks tuples based on their VM
  - Optimizers can keep and use statistics on the VM

Example Application: Warfighter Physiologic Status Monitoring (WPSM)
Physiologic models infer a state, with a confidence, for each monitored area:

  Area             Confidence
  Thermal          90%
  Hydration        60%
  Cognitive        100%
  Life Signs       90%
  Wound Detection  80%

Score Function
SF(VM) = VM.confidence x ADF(VM.age), where ADF is an age decay function.
(Figure: HRate, RRate, and Temp sensors feed Model1 and Model2; tuples of the form ([age, confidence], value) carry the VM through a Merge, and the models may change the confidence.)
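
The score function itself is given on the slide; below is a direct transcription, with an exponential decay chosen as an illustrative ADF (the slides do not fix a particular decay function):

```python
def adf(age, half_life=60.0):
    # Illustrative age decay function: weight halves every half_life seconds.
    return 0.5 ** (age / half_life)

def sf(vm):
    # SF(VM) = VM.confidence x ADF(VM.age)
    return vm["confidence"] * adf(vm["age"])

# Equal confidence, different ages: the fresher tuple ranks higher.
print(sf({"confidence": 0.9, "age": 0.0}))   # 0.9
print(sf({"confidence": 0.9, "age": 60.0}))  # 0.45
```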

Borealis Optimizer Hierarchy
(Figure: each Borealis node runs a local monitor and a local optimizer; neighborhood optimizers coordinate groups of nodes, and end-point monitors feed a global optimizer. Statistics flow upward and trigger decisions that flow back down to the nodes.)

Optimization Tactics
- Priority scheduling (local)
- Modification of query plans (local)
  - Changing the order of commuting operators
  - Using alternate operator implementations
- Allocation of query fragments to nodes (neighborhood, global)
- Load shedding (local, neighborhood, global)

Priority scheduling
- Schedule the box with the highest average QoS gradient
  - Evaluate the predicted-QoS score function on each message, using the values in its VM
  - The average QoS gradient of a box compares the average QoS-impact scores between its inputs and outputs
- Within that box, process the highest QoS-impact message in its input queue
  - This implies out-of-order processing
  - And an inherent load-shedding behavior: low-impact messages may never be processed
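
A sketch of the scheduling loop this implies: rank boxes by average QoS gradient (average output score minus average input score), run the best box, and within it take the highest-scoring message first, which is also what makes processing out-of-order. The gradient estimate is a simplification of whatever statistics the optimizer actually keeps:

```python
def avg(scores):
    return sum(scores) / len(scores) if scores else 0.0

def qos_gradient(box):
    # Average QoS impact of a box: compare scores at its outputs vs. inputs.
    return avg(box["out_scores"]) - avg(box["in_scores"])

def schedule_step(boxes):
    # Pick the box with the highest average QoS gradient ...
    box = max(boxes, key=qos_gradient)
    if not box["queue"]:
        return None
    # ... and, within it, the highest QoS-impact message in its input queue.
    # Low-score messages may starve: the inherent load-shedding behavior.
    msg = max(box["queue"], key=lambda m: m["score"])
    box["queue"].remove(msg)
    return box["name"], msg

boxes = [
    {"name": "join", "in_scores": [0.2], "out_scores": [0.9],
     "queue": [{"score": 0.7}, {"score": 0.3}]},
    {"name": "map", "in_scores": [0.5], "out_scores": [0.6], "queue": []},
]
print(schedule_step(boxes))  # ('join', {'score': 0.7})
```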

Correlation-based Load Distribution
- Setting: network bandwidth is abundant and network transfer delays are negligible
- Goal: minimize end-to-end latency
- Key ideas
  - Balance load across nodes to avoid overload
  - Group boxes with small load correlation together on the same node
  - Maximize load correlation among nodes
(Figure: with input rates r and 2r, the connected plan runs boxes A, B on node S1 and C, D on node S2, for loads 2cr and 4cr; the cut plan splits the chains across S1 and S2 so that the heavier node carries only 3cr.)
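
A sketch of the correlation computation behind this tactic: given per-box load time series, co-locate the pair of boxes whose loads are least correlated, so their peaks tend not to coincide. The greedy pair selection is a simplification of the published operator distribution algorithm:

```python
from math import sqrt
from itertools import combinations

def correlation(xs, ys):
    # Pearson correlation of two equally long load time series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(vx * vy)

def best_pair_to_colocate(load_series):
    """Pick the two boxes with the smallest load correlation: when one
    peaks the other tends to be idle, so their combined load stays smooth."""
    return min(combinations(load_series, 2),
               key=lambda p: correlation(load_series[p[0]], load_series[p[1]]))

loads = {
    "A": [1.0, 5.0, 1.0, 5.0],   # bursty
    "B": [5.0, 1.0, 5.0, 1.0],   # anti-correlated with A
    "C": [1.0, 5.0, 1.0, 5.0],   # correlated with A
}
print(best_pair_to_colocate(loads))  # ('A', 'B')
```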

Load Shedding
- Goal: remove excess load at all nodes and links
  - Shedding at a node relieves all of its descendants
- Distributed load shedding
  - Neighbors exchange load statistics
  - Parent nodes shed load on behalf of their children
  - CPU and bandwidth problems are treated uniformly
- Open question: load balancing or load shedding?

Local Load Shedding
- Goal: minimize total loss
(Figure: Node A runs boxes of cost 4C and C on streams r1 and r2; Node B runs boxes of cost C and 3C feeding App1 and App2. Node A's CPU load is 5Cr1 against capacity 4Cr1, so A must shed 20%; Node B's load is 4Cr2 against capacity 2Cr2, so B must shed 50%. Shedding locally, A drops 20%, after which B still sees load 3.75Cr2 against capacity 2Cr2 and must shed another 47%. The local plan's loss: 25% for App1 and 58% for App2.)

Distributed Load Shedding
- Goal: minimize total loss
(Figure: the same setup. In the distributed plan, A sheds on behalf of B as well, placing the drops upstream (reducing r2 to r2'' before it reaches B), so that B no longer overloads. The loss becomes 9% for App1 and 64% for App2: a smaller total loss than the local plan's 25% / 58%.)
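
The shed percentages on these two slides follow directly from load versus capacity; a minimal sketch of that arithmetic, reproducing the 20%, 50%, and 47% figures (the distributed plan's 9%/64% split comes out of the full optimization, which this sketch does not implement):

```python
def shed_fraction(load, capacity):
    """Fraction of input that must be dropped so that load <= capacity."""
    return max(0.0, 1.0 - capacity / load)

C, r1, r2 = 1.0, 1.0, 1.0  # unit box cost and stream rates

# Node A: load 5*C*r1 against capacity 4*C*r1 -> shed 20%
print(shed_fraction(5 * C * r1, 4 * C * r1))     # 0.2
# Node B: load 4*C*r2 against capacity 2*C*r2 -> shed 50%
print(shed_fraction(4 * C * r2, 2 * C * r2))     # 0.5
# Local plan: after A sheds, B still sees 3.75*C*r2 -> shed ~47%
print(shed_fraction(3.75 * C * r2, 2 * C * r2))  # ~0.467
```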

Extending Optimization to Sensor Nets
- A sensor proxy acts as the interface to the sensor network
- On control messages from the optimizer, the proxy can:
  - Move operators into and out of the sensor net
  - Adjust sensor sampling rates
  - Apply other data-control decisions at the sensors

Discussion
- QoS-based optimization
  - QoS score calculation and statistics maintenance happen on a per-tuple basis: fine-grained, but a lot of processing
- The QoS specification is the application's burden
  - How can this task be facilitated?

Fault-Tolerance through Replication
- Goal: tolerate node and network failures
(Figure: a query network of union and join operators over input streams s1-s4 is deployed on Nodes 1-3, with replicas on Nodes 1'-3'; each downstream node can draw its input from whichever upstream replica is available.)

Fault-Tolerance Approach
- If an input stream fails, find another replica of the upstream node
- If no replica is available, keep processing and produce tentative tuples
- Correct the tentative results after the failure heals
(State machine: STABLE, on missing or tentative inputs, moves to UPSTREAM FAILURE; when the failure heals, the node moves to STABILIZING, reconciles state, and emits corrected output on its way back to STABLE; another upstream failure in progress during stabilization returns it to UPSTREAM FAILURE.)
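
A sketch of this three-state protocol as a state machine; the state and transition names follow the diagram, and the rest is illustrative skeleton code:

```python
STABLE, UPSTREAM_FAILURE, STABILIZING = "STABLE", "UPSTREAM_FAILURE", "STABILIZING"

class ReplicaNode:
    def __init__(self):
        self.state = STABLE

    def on_input(self, missing_or_tentative):
        if self.state == STABLE and missing_or_tentative:
            # Try another upstream replica first; if none, emit tentative tuples.
            self.state = UPSTREAM_FAILURE

    def on_failure_heals(self):
        if self.state == UPSTREAM_FAILURE:
            self.state = STABILIZING  # reconcile state, emit corrected output

    def on_reconciled(self):
        if self.state == STABILIZING:
            self.state = STABLE

    def on_new_upstream_failure(self):
        if self.state == STABILIZING:
            self.state = UPSTREAM_FAILURE  # another failure during stabilization

node = ReplicaNode()
node.on_input(missing_or_tentative=True)
print(node.state)        # UPSTREAM_FAILURE
node.on_failure_heals()
print(node.state)        # STABILIZING
node.on_reconciled()
print(node.state)        # STABLE
```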

Conclusions
Next-generation streaming applications require a flexible processing model:
- Distributed operation
- Dynamic result and query modification
- Dynamic and scalable optimization
- Server and sensor network integration
- Tolerance to node and network failures

Discussion
- Distributed environment
  - Load management
  - Fault tolerance
  - What about data routing / forwarding issues?
- What other problems are left untackled?