A Heartbeat Mechanism and its Application in Gigascope Johnson, Muthukrishnan, Shkapenyuk, Spatscheck Presented by: Joseph Frate and John Russo.



Introduction  Unblocking aggregation, join and union operators Limit scope of output tuples that an input tuple can affect. Two techniques  Define queries over window of input stream Applicable to continuous query systems for monitoring  Timestamp mechanism Applicable to data reduction applications

Gigascope  A high-performance streaming database for network monitoring  Some fields of input stream behave as timestamps  Locality of input tuple Aggregation must have timestamp as one of group by fields Join query relates timestamp fields of both inputs Merge operator  Union operator  Preserves timestamp property of one of the fields

Timestamps and Gigascope  Effective as long as progress is made in all input streams  Join or merge operators can stall if one input stream stalls, causing system failure

Example  Sites monitored by Gigascope have multiple gigabit connections to the Internet.  One or more is a backup link through which traffic can be diverted if the primary link fails.  All links are monitored simultaneously.  Since the primary link carries gigabit traffic and the backup link carries none, the merge operator's buffer will quickly overflow.  Tuples that are present carry timestamps; the absence of tuples conveys no temporal information.

Focus of Paper  Authors propose use of heartbeats or punctuation to unblock operators Heartbeats originate at source query operators and propagate throughout query Timestamp punctuations are generated at source query nodes and inferred at every other operator Punctuated heartbeats unblock operators that would otherwise be blocked Focus is on multi-stream operators (joins and merges) Significantly reduces memory load for join and merge operators

Related Work  Heartbeat: widely used for fault tolerance in distributed systems. Remote nodes send periodic heartbeats to let other nodes know that they are alive. Heartbeats are also used in distributed DSMS.  Stream Punctuations: embed special marks in the stream to indicate the end of a subset of data.  Previous work on heartbeat mechanisms with streaming data has focused on enforcing ordering of timestamps of tuples before query processing.

Gigascope Architecture  Stream-only database  No continuous queries, thus no windows to unblock queries  Streams are labeled with timestampness, such as monotone increasing. Used by query planner to unblock blocking operators

Gigascope Architecture (Continued)  Aggregation query: one of the group-by attributes must have timestampness. When this attribute changes, all groups and aggregates are flushed to the operator's output. The interval over which the attribute keeps one value is an epoch; the flush occurs at the end of each epoch.  Example (here time is labeled as monotone increasing): select tb, srcIP, count(*) from TCP group by time/60 as tb, srcIP  In this query, we count the packets from each source IP address during 60 second epochs.
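The epoch-based flush described in this slide can be sketched in Python. This is only an illustration, not Gigascope code: the function name `count_per_epoch`, the `(time, src_ip)` input format, and the final flush at end of input (a live stream never ends) are all assumptions.

```python
from collections import Counter

def count_per_epoch(packets, epoch_len=60):
    """Yield (tb, src_ip, count) rows, flushing all groups when the epoch changes.

    `packets` is an iterable of (time, src_ip) pairs with non-decreasing time,
    mirroring "group by time/60 as tb, srcIP" from the slide.
    """
    current_tb = None
    groups = Counter()
    for time, src_ip in packets:
        tb = time // epoch_len
        if current_tb is not None and tb != current_tb:
            # The temporal group-by attribute changed: the epoch is over,
            # so every group and its aggregate is flushed to the output.
            for ip, n in sorted(groups.items()):
                yield (current_tb, ip, n)
            groups.clear()
        current_tb = tb
        groups[src_ip] += 1
    # Assumption for the sketch only: flush what is left when the input ends.
    for ip, n in sorted(groups.items()):
        yield (current_tb, ip, n)
```

Note that nothing is emitted for an epoch until a tuple from a later epoch arrives, which is exactly why a silent input blocks the operator.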

Gigascope Architecture (Continued)  Merge Operator Union of two streams: A and B A and B must have same schema Both streams must have a timestamp field, such as t, on which to merge If tuples on A have a larger value of t than those on B, tuples on A are buffered until B catches up
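To see why the merge operator must buffer, here is a minimal Python sketch of a two-stream merge on timestamp t. The names `make_merge` and `push` and the tuple layout `(t, stream, payload)` are hypothetical; the sketch assumes each stream delivers tuples in non-decreasing t.

```python
from collections import deque

def make_merge():
    """Return a push function and the buffers of a two-stream merge on t."""
    buffers = {"A": deque(), "B": deque()}

    def push(stream, t, payload=None):
        """Buffer one tuple and return every tuple that is now safe to emit."""
        buffers[stream].append((t, stream, payload))
        out = []
        # A head tuple can be emitted only while the other stream also has a
        # buffered tuple to compare against; otherwise a smaller t could
        # still arrive on the silent stream.
        while buffers["A"] and buffers["B"]:
            src = "A" if buffers["A"][0][0] <= buffers["B"][0][0] else "B"
            out.append(buffers[src].popleft())
        return out

    return push, buffers
```

If stream B stays silent, every call to `push("A", ...)` returns an empty list and the buffer for A grows without bound, which is the failure mode the heartbeat mechanism is meant to remove.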

Gigascope Architecture (Continued)  Join Query Must contain a join predicate such as A.tr=B.ts or A.tr/3 = B.ts+2  Relates timestamp field from A with B Input streams are buffered so that streams match up on the timestamp.

Two-Level Query Architecture  Low level used for data reduction  High level performs complex processing  Controls high streaming rates  Data streams from NICs are placed in a ring buffer.  These are called source streams

Two-Level Query Architecture (continued)  Since volumes are too large to provide copy to each query, Gigascope creates a subquery  For example: Query Q to be executed over source stream S Gigascope creates a subquery q which directly accesses S Q is transformed into Q0 which is executed over the stream output of q

Two-Level Query Architecture (continued)  Low-level subqueries are called LFTAs  Fast, lightweight data reduction queries  Objective is to quickly process high volume data stream in order to minimize buffer requirements  Expensive processing is performed on output of low level queries Smaller volume Easily buffered

Two-Level Query Architecture (continued)  Much of subquery processing can be performed on the NIC itself  Low-level aggregation uses a fixed-size hash table for maintaining groups in group by  If a collision occurs in hash table, old group is ejected as a tuple and new group replaces it in its slot  Similar methodology as subaggregates and superaggregates in data cube computations Higher level queries complete aggregation
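The split between low-level partial aggregation and high-level completion can be illustrated with a small Python sketch. It is not the Gigascope implementation: `lfta_partial_counts`, `hfta_complete`, the table size, and the `slot_of` parameter are assumptions, and epoch handling is left out to keep the collision behavior visible.

```python
def lfta_partial_counts(keys, num_slots=4, slot_of=hash):
    """Count keys in a fixed-size direct-mapped table, ejecting on collision.

    Yields (key, partial_count) tuples; the same key may be yielded more
    than once, so the results are partial aggregates.
    """
    table = [None] * num_slots            # each slot holds [key, count] or None
    for key in keys:
        i = slot_of(key) % num_slots
        slot = table[i]
        if slot is not None and slot[0] == key:
            slot[1] += 1                  # same group: update in place
        else:
            if slot is not None:
                yield tuple(slot)         # collision: eject the old group
            table[i] = [key, 1]           # new group takes over the slot
    for slot in table:                    # final flush of whatever remains
        if slot is not None:
            yield tuple(slot)

def hfta_complete(partials):
    """Higher-level step: combine partial counts into final counts per key."""
    totals = {}
    for key, n in partials:
        totals[key] = totals.get(key, 0) + n
    return totals
```

The fixed table bounds memory at the low level regardless of how many groups appear; the cost is that the higher level sees several partial rows per group and must sum them.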

Two-Level Query Architecture (continued)  Traffic shaping policies are implemented in order to spread out processing load Aggregation operator uses a slow flush to emit tuples when aggregation epoch changes

Heartbeats to Unblock Streaming Operators  Gigascope heartbeats are produced by low-level query operators Propagated throughout query DAG Incur same queuing delays as tuples System performance monitoring Aids in detecting failed nodes  Stream punctuation mechanism is implemented Injects special temporal update tuples into operator output streams Informs receiving operator of end of subset of data Authors first attempted on-demand generation of temporal update tuples Authors settled upon approach of using heartbeats to carry temporal update tuples

Schema of Temporal Update Tuple  Identical to regular tuple  All attributes marked as temporal are initialized with values that will not violate the ordering properties, e.g. timebucket is marked as temporal increasing: once a temporal update tuple with timebucket=t is received by an operator, all future tuples will have timebucket >= t  Non-temporal attributes are ignored by receiving operators  Goal is to generate temporal attribute values aggressively and set them to the highest possible value

Heartbeat Generation at LFTA Level Low-level Streaming Operators  Read data directly from source data streams (packets sniffed from NICs)  Use filtering, projection and aggregation to reduce amount of data in a stream before passing it to higher-level nodes  Two types of streaming operators: selection and aggregation  Multi-query optimization through prefilters

Heartbeat Generation at LFTA Level Low-level Streaming Operators  Normal mode of operation is to block waiting for new tuples to be posted to the NIC's ring buffer.  Once a tuple is in the buffer, it is processed  Wait is periodically interrupted to generate a punctuated heartbeat

Heartbeats in select LFTAs  Selection, projection and transformation are performed by LFTAs on packets arriving from the data stream source.  If the predicate is satisfied, an output tuple is generated according to the projection list.  A few additions: modify accept_tuple to save all temporal attribute values referenced in the select clause; whenever a request is received to generate a temporal update tuple, use the maximum of the saved value of the temporal attributes and a value saved by the prefilter to infer the value of the temporal update tuple
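A rough Python sketch of these two additions follows. Only `accept_tuple` is a name taken from the slide; the class `SelectLFTA`, the method `temporal_update`, the dictionary-based packets, and the single temporal attribute called `time` are assumptions made for illustration. The sketch also assumes the temporal value is saved for every packet that reaches `accept_tuple`, whether or not it passes the predicate.

```python
class SelectLFTA:
    """Selection operator that can also answer heartbeat requests (sketch)."""

    def __init__(self, predicate, projection):
        self.predicate = predicate      # packet dict -> bool
        self.projection = projection    # output field names; must include "time"
        self.saved_time = 0             # last temporal value seen by accept_tuple

    def accept_tuple(self, packet):
        """Process one packet; return the projected tuple or None."""
        self.saved_time = packet["time"]          # remember the temporal attribute
        if not self.predicate(packet):
            return None
        return {name: packet[name] for name in self.projection}

    def temporal_update(self, prefilter_time):
        """Build a temporal update tuple with the same schema as a data tuple."""
        out = {name: None for name in self.projection}   # non-temporal: ignored
        out["time"] = max(self.saved_time, prefilter_time)
        return out
```

Taking the maximum with the prefilter's value matters because packets rejected by the prefilter never reach the operator, yet they still show that time has advanced.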

Heartbeats in Aggregation LFTAs  Group by and aggregation functionality is implemented using direct- mapped hash table  Collision results in ejected tuple being sent to output stream High-level aggregation node completes aggregation of partial results produced by LFTA  Flushing occurs whenever incoming tuple advances the epoch.  Slow-flush is used to avoid overflow of buffers Gradually emits tuples as new tuples arrive  Last seen temporal values in input stream are saved, similar to select.  This value is used to generate temporal update tuples  Also maintains value of last temporal attribute of last tuple flushed

Heartbeats in Aggregation LFTAs  Whenever a request for a heartbeat is received, the following rule ensures that the heartbeat does not violate temporal attribute ordering properties: if there are unflushed tuples, use the value of the last flushed tuple; otherwise, use the max of the saved value of the temporal attributes and the value saved by the prefilter
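The rule on this slide is small enough to write out directly. The function and parameter names below are hypothetical, and the sketch assumes a single numeric temporal attribute.

```python
def aggregation_heartbeat_value(has_unflushed, last_flushed, last_seen, prefilter_value):
    """Pick the temporal value for a heartbeat from an aggregation LFTA.

    has_unflushed   -- True while a slow flush still has tuples to emit
    last_flushed    -- temporal value of the last tuple already flushed
    last_seen       -- last temporal value seen in the input stream
    prefilter_value -- temporal value saved by the prefilter
    """
    if has_unflushed:
        # Tuples from the old epoch are still pending, so the heartbeat
        # must not promise a time later than what was already emitted.
        return last_flushed
    return max(last_seen, prefilter_value)
```

For example, with `last_flushed=7`, `last_seen=9` and `prefilter_value=10`, the heartbeat carries 7 while a flush is in progress and 10 once it has finished.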

Using System Time for Temporal Values  When a link has no traffic for a long time, heartbeat is generated using inference from system time.  Skew between system clock and buffering has to be accounted for when setting up Gigascope system

High Level Query Nodes (HFTA)  Second level of query execution (two levels)  LFTA for low-level filtering and sub-queries  HFTA for complex query processing Selection Multiple types of aggregation Stream merge Inner and outer join  Data from multiple streams (HFTAs, LFTAs)

HFTA Execution  Normal mode of execution: Block while waiting for new tuples to arrive Determines which operator to apply to that stream Process the incoming tuple for that operator Route output (if operator required) Also regularly receives temporal update tuples  Operators interpret temporal update tuples  Operators use these tuples to unblock themselves

HFTA: Heartbeats in selection  Mostly identical to selection LFTAs: unpack field values referenced in the query predicate, check if the predicate is satisfied, and if satisfied, generate an output tuple according to the projection list  Difference: can receive temporal update tuples (in addition to regular data tuples). When a temporal update tuple is received, the operator updates the saved values of all temporal attributes referenced in the query's select clause, generates a new temporal update tuple, and bypasses the rest of normal tuple processing

HFTA: Heartbeats in aggregation and sampling  Non-blocking operator Aggregates data within a time window (epoch) Epoch defined by values of temporal group-by attributes  Required to keep all groups and corresponding aggregates until end of epoch Then flushes to output stream “Slow flush” to avoid overflow  One output tuple for each input tuple  Change in epoch  immediate flush

HFTA: Heartbeats in stream merge  Union of two streams R and S in a way that preserves the ordering properties of the temporal attributes  R and S must have same schema  Both must have a temporal field to merge on  Stream buffers until the other catches up (if difference in time fields)
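How temporal update tuples unblock the merge can be sketched in Python. The class `HeartbeatMerge` and its method names are hypothetical, and the sketch does not forward a temporal update tuple downstream, which a real operator would also need to do. It relies only on the property stated earlier: after a temporal update tuple with value t, all future tuples on that stream have a value >= t.

```python
import heapq

class HeartbeatMerge:
    """Merge streams ordered on t; heartbeats advance a stream's low-water mark."""

    def __init__(self, streams=("R", "S")):
        # mark[s]: no future tuple on stream s can have t below this value
        self.mark = {s: float("-inf") for s in streams}
        self.heap = []   # buffered data tuples ordered by (t, arrival order)
        self.seq = 0     # arrival counter; keeps heap entries comparable

    def _release(self):
        """Emit every buffered tuple that no stream can still precede."""
        safe = min(self.mark.values())
        out = []
        while self.heap and self.heap[0][0] <= safe:
            t, _, stream, payload = heapq.heappop(self.heap)
            out.append((t, stream, payload))
        return out

    def on_tuple(self, stream, t, payload=None):
        self.mark[stream] = max(self.mark[stream], t)
        heapq.heappush(self.heap, (t, self.seq, stream, payload))
        self.seq += 1
        return self._release()

    def on_heartbeat(self, stream, t):
        # A temporal update tuple carries no data, only a new lower bound.
        self.mark[stream] = max(self.mark[stream], t)
        return self._release()
```

With an idle stream S, tuples from R accumulate until `on_heartbeat("S", t)` arrives, at which point everything with a timestamp up to t is released in order.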

HFTA: Heartbeats in join  Relate two data streams by timestamp e.g., R.tr = 2 * S.ts  INNER, LEFT, RIGHT, and FULL OUTER  Minimum timestamps for R and S maintained  Buffers input streams to ensure they match up by timestamp predicate  May include temporal attributes

Other heartbeat applications  Initial goal of heartbeats was to collect statistics about nodes in a distributed setting  Main focus of paper is to use heartbeats to carry stream punctuation  Found that there are other uses for heartbeats

Fault tolerance  Heartbeat mechanism used to detect node failures in distributed systems  Nodes must “ping” to indicate still alive  Gigascope slightly different: Heartbeats periodically generated by low-level queries Propagated upward through query execution DAG  Constant flow of heartbeat tuples indicates if low- level sub-query is responding or not responding No heartbeat over specified amount of time  query has failed and recovery is initiated
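The timeout check in the last bullet could look like the following Python sketch. The function name, the dictionary of last-heartbeat times, and the query names are hypothetical; how recovery is initiated is outside the sketch.

```python
def failed_queries(last_heartbeat, now, timeout):
    """Return the low-level queries whose heartbeats have stopped arriving.

    last_heartbeat -- dict mapping query name to the time its last heartbeat
                      was received (same clock and units as `now`)
    timeout        -- maximum allowed silence before a query counts as failed
    """
    return sorted(q for q, seen in last_heartbeat.items() if now - seen > timeout)
```

For instance, `failed_queries({"q1": 100.0, "q2": 91.0}, now=102.0, timeout=10.0)` reports only `q2`, since it has been silent for 11 time units.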

System performance analysis  Heartbeat messages are subject to the same routing as other data in the network  Heartbeats visit all nodes  Statistics gathered on every node visit  Full trace of all nodes  Can be used to identify poor-performing nodes

Distributed query optimization  Automated tools to utilize collected statistics Re-optimize query execution plans (eliminate identified bottlenecks)  Other statistics: Predicate selectivities Data arrival rates Tuple processing costs

Performance evaluation  Experiments conducted on live network feed  Queries monitor three network interfaces: main1 and main2, DAG4.3GE Gigabit Ethernet interfaces (see bulk of traffic), and control, a 100Mbit interface  Approximately 100,000 packets per second (400 Mbits/sec)  Dual-processor 2.8 GHz P4, 4GB RAM, FreeBSD 4.10  Focus is to unblock streaming operators that combine high-rate main links with low-rate backup links (control)

Evaluation: Unblocking stream merge using heartbeats  Effect of heartbeats on memory usage of queries that use stream merge operator  Used the following query: SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) FROM DataProtocol GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort

Evaluation: Unblocking stream merge using heartbeats  Query computes the number of packets observed in different flows in 10 second time buckets  Query planner merges three streams:

Evaluation: Unblocking stream merge using heartbeats  When control link has no traffic, large amount of data buffered (from other streams)  Varied heartbeat interval From 1 second to 30 seconds 5 second increments

Evaluation: Unblocking stream merge using heartbeats  Heartbeats successfully unblock the stream merge operators  Higher intervals: more state maintained. Also results in more data being flushed when an epoch advances (could cause query failure when the system lacks traffic shaping, such as slow flush)

Evaluation: Unblocking join operators using heartbeats  How effectively do heartbeats… Unblock join queries Reduce overall query memory requirements

Evaluation: Unblocking join operators using heartbeats (continued) Query flow1: SELECT tb,protocol,srcIP,destIP, srcPort,destPort,count(*) as cnt FROM [main0_and_control].DataProtocol GROUP BY time/10 as tb,protocol,srcIP, destIP, srcPort, destPort; Query flow2: SELECT tb,protocol,srcIP,destIP, srcPort,destPort,count(*) as cnt FROM main1.DataProtocol GROUP BY time/10 as tb,protocol,srcIP, destIP, srcPort, destPort;

Evaluation: Unblocking join operators using heartbeats (continued)  Query full_flow: SELECT flow1.tb,flow1.protocol, flow1.srcIP, flow1.destIP, flow1.srcPort,flow1.destPort, flow1.cnt, flow2.cnt OUTER_JOIN FROM flow1, flow2 WHERE flow1.srcIP=flow2.srcIP and flow1.destIP=flow2.destIP and flow1.srcPort=flow2.srcPort and flow1.destPort=flow2.destPort and flow1.protocol=flow2.protocol and flow1.tb = flow2.tb

Evaluation: Unblocking join operators using heartbeats (continued)  Two sub-queries compute the flows aggregated in 10 second time-buckets: one over main1+control, one over main2  Results are combined using a full outer join for final output

Evaluation: Unblocking join operators using heartbeats (continued)  Varied interval of generated heartbeats from 1 second to 60 seconds 10 second increments

Evaluation: Unblocking join operators using heartbeats (continued)  Memory usage grows with the heartbeat interval (linear growth)  Short intervals avoid accumulating large state in blocking operators and decrease burstiness of output (since less data is dumped at the end of the heartbeat interval when the interval is low)

Evaluation: CPU overhead of heartbeat generation  Measure CPU overhead from heartbeats  Measure average CPU load of a merge query: SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) FROM DataProtocol GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort  Run on two high-rate interfaces  Compared 1 second heartbeat interval vs. identical system with heartbeats disabled

Evaluation: CPU overhead of heartbeat generation (continued)  Both monitored links have moderately high load, so merge operators are naturally unblocked  Heartbeats disabled: 37.3% CPU load  Heartbeats enabled: 37.5% CPU load  Insignificant difference Explained by variations in traffic load  Conclusion: overhead of heartbeat mechanism is immeasurably small

Conclusion  Heartbeats regularly generated Propagated to all nodes Attach temporal update tuples to unblock operators  Heartbeat mechanism can be used for other applications in distributed setting Detecting node failure Performance monitoring Query optimization  Performance evaluation Capable of working at multiple Gigabit line speeds Significantly decrease query memory utilization