CS 410/510 Data Streams, Lecture 15 (3/8/2012): How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams

Presentation transcript:

CS 410/510 Data Streams, Lecture 15 (3/8/2012)
How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams
Kristin Tufte, David Maier

How Soccer Players Would do Stream Joins
Handshake join
- Evaluates window-based stream joins
- Highly parallelizable
- Implemented on a multi-core machine and on an FPGA
Previous stream join execution strategies
- Sequential execution based on operational semantics

Let's talk about stream joins
Join the window of R with the window of S
- Focus on sliding windows here
- Scan, insert, invalidate
How might I parallelize?
- Partition and replicate
- Time-based windows vs. tuple-based windows
Figure credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
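To make the scan/insert/invalidate pattern concrete, here is a minimal sketch (not from the paper) of a symmetric time-based sliding-window join. Tuples are assumed to carry a ts timestamp attribute; all class and method names are illustrative.

from collections import deque

class WindowJoin:
    """Sketch of a symmetric sliding-window join over streams R and S."""

    def __init__(self, window_size, predicate):
        self.window_size = window_size   # window length in time units (assumption)
        self.predicate = predicate       # join predicate p(r, s) -> bool
        self.window_r = deque()          # R's window, oldest tuple first
        self.window_s = deque()          # S's window, oldest tuple first

    def _invalidate(self, window, now):
        # Drop tuples that have fallen out of the time window.
        while window and window[0].ts <= now - self.window_size:
            window.popleft()

    def on_r(self, r):
        self._invalidate(self.window_s, r.ts)            # invalidate
        results = [(r, s) for s in self.window_s         # scan
                   if self.predicate(r, s)]
        self.window_r.append(r)                          # insert
        return results

    def on_s(self, s):
        self._invalidate(self.window_r, s.ts)
        results = [(r, s) for r in self.window_r
                   if self.predicate(r, s)]
        self.window_s.append(s)
        return results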

So, Handshake Join…
(Figure: stream join over inputs A and B — handshake join vs. traditional stream join)
Handshake join
- An entering tuple pushes the oldest tuple out
- No central coordination
- Same semantics as a traditional stream join
- May introduce disorder
Traditional stream join
- Parallelization needs partitioning, and possibly replication
- Needs central coordination
Figure credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

Parallelization
- Each core gets a segment of each window
- Data-flow style: act locally on newly arrived data, then pass the data on
- Good for shared-nothing setups
- Simple communication: interact only with neighbors; avoids bottlenecks
Figure credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
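The following is an illustrative sketch (not the authors' code) of one handshake-join core. Stream R tuples flow left-to-right through the pipeline of cores, stream S tuples flow right-to-left; each core joins every tuple it receives against its opposite local segment and then forwards the oldest tuple to its neighbor. Segment capacity and callback names are assumptions.

class HandshakeCore:
    def __init__(self, segment_capacity, predicate, emit):
        self.capacity = segment_capacity   # max tuples kept per local segment
        self.predicate = predicate         # join predicate p(r, s) -> bool
        self.emit = emit                   # callback receiving result pairs
        self.seg_r = []                    # local segment of R's window
        self.seg_s = []                    # local segment of S's window

    def receive_r(self, r, forward_r):
        # Join the arriving R tuple against the local S segment ...
        for s in self.seg_s:
            if self.predicate(r, s):
                self.emit((r, s))
        # ... keep it locally, and push the oldest R tuple to the right neighbor.
        self.seg_r.append(r)
        if len(self.seg_r) > self.capacity:
            forward_r(self.seg_r.pop(0))

    def receive_s(self, s, forward_s):
        for r in self.seg_r:
            if self.predicate(r, s):
                self.emit((r, s))
        self.seg_s.append(s)
        if len(self.seg_s) > self.capacity:
            forward_s(self.seg_s.pop(0))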

Parallelization – Observations
- Handles tuple-based windows and non-equi-join predicates
- As written, compares all tuple pairs; could hash at each node to optimize
- Note the data transfer costs between cores, and that each tuple is processed at every core
- Soccer players have short arms; hardware is NUMA
Figure credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

Scalability
- Data flow plus point-to-point communication
- Additional cores allow larger window sizes or reduce the workload per core
- "directly turn any degree of parallelism into higher throughput or larger supported window sizes"
- "can trivially be scaled up to handle larger join windows, higher throughput rates, or more compute-intensive join predicates"
Figure credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

Encountering Tuples
- An item in either window encounters all current tuples in the other window
- Immediate scan strategy
- Flexible segment boundaries (across cores)
- Other local implementations are possible
Figure credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

Handshake Join with Message Passing
- Lock-step processing (tuple-based windows)
- FIFO queues with message passing
- Problem: a join pair can be missed

Two-Phase Forwarding
- Asymmetric synchronization (replication on one core only)
- Keep copies of forwarded tuples until an acknowledgment is received
- The ack for s4 must be processed between r5 and r6
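As a rough, assumption-laden illustration of the keep-until-ack idea (not the paper's implementation): a core retains a copy of each forwarded tuple until the neighbor acknowledges it, so tuples in flight can still participate in local joins and no pair at a segment boundary is missed. All names below are hypothetical.

class ForwardingCore:
    def __init__(self, send, local_segment):
        self.send = send            # send(tuple) to the neighbor's FIFO queue
        self.in_flight = {}         # tuple_id -> tuple, awaiting acknowledgment
        self.local = local_segment  # locally held window segment

    def forward(self, t):
        self.in_flight[t.id] = t    # phase 1: keep a copy, then send
        self.send(t)

    def on_ack(self, tuple_id):
        self.in_flight.pop(tuple_id, None)   # phase 2: drop the copy after the ack

    def visible_segment(self):
        # Joins must see both locally held and not-yet-acknowledged tuples.
        return self.local + list(self.in_flight.values())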

Load Balancing & Synchronization
- Even distribution is not needed for correctness
- Maintain mostly even-sized local S windows
- Synchronize at the pipeline ends to manage the windows

FPGA Implementation
- Tuple-based windows that fit into memory
- Common clock signal; lock-step processing
- Nested-loops join processing

Performance
Figures: scalability on a multi-core CPU; scalability on FPGAs (8 tuples/window)

Before we move on…
- The soccer-join work focuses on sliding windows
- How would their algorithm and implementation work for tumbling windows?
- What if we did tumbling windows only?

Query-Aware Partitioning for Monitoring Massive Network Data Streams
OC-768 networks
- 100 million packets/sec
- 2x40 Gbit/sec
Query plan partitioning
- Issues: "heavy" operators, non-uniform resource consumption
Data stream partitioning

Let's partition the data…
- The query computes per-flow packet summaries between source and destination for network monitoring
- Round-robin partitioning: in the worst case, a single flow is split into n partial flows

SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
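A small illustrative contrast between the two partitioning strategies (field names follow the query above; everything else is a sketch, not the GS implementation): round-robin can scatter one flow's packets over all n nodes, so every node ends up with a partial flow, while hashing on the flow key keeps each flow on a single node.

import itertools

def round_robin_partition(packets, n):
    counter = itertools.count()
    for pkt in packets:
        yield next(counter) % n, pkt        # a flow's packets scatter over nodes

def flow_hash_partition(packets, n):
    for pkt in packets:
        key = (pkt["srcIP"], pkt["destIP"], pkt["srcPort"], pkt["destPort"])
        yield hash(key) % n, pkt            # all packets of a flow go to one node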

And, we might want a HAVING…
- With round-robin partitioning, no node can apply the HAVING clause locally
- CPU and network load on the final aggregator is high

SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN

So, let's partition better…
- What about partitioning on srcIP, destIP, srcPort, destPort (i.e., partitioning flows)?
- Yes! Nodes can compute the aggregates and apply the HAVING clause locally
- … But what if I have more than one query?

SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN

But I need to run lots of queries…
- Large numbers of simultaneous queries are common (e.g., 50)
- Subqueries place different requirements on partitioning
- Dynamic repartitioning for each query?
  - That's what parallel DBs do…
  - But splitting 80 Gbit/sec requires specialized network hardware
  - So partition the stream once, and only once…

Partitioning Limitations
- Partitioning is programmed in FPGAs
  - TCP fields (src, dest IP): OK
  - Fields from HTTP: not OK
- Can't re-partition every time the workload changes

Query-Aware Partitioning
- Analysis framework: determine the optimal partitioning
- Partition-aware distributed query optimizer: takes advantage of existing partitions
- Compatible partitioning
  - Maximizes the amount of data reduction done locally
  - Formal definition of compatible partitioning
  - Compatible partitioning for aggregations & joins

GS Uses Tumbling Windows (only)
The time attribute is ordered (increasing).

SELECT tb, srcIP, destIP, SUM(len)
FROM PKT
GROUP BY time/60 AS tb, srcIP, destIP

SELECT time, PKT1.srcIP, PKT1.destIP, PKT1.len + PKT2.len
FROM PKT1 JOIN PKT2
WHERE PKT1.time = PKT2.time
  AND PKT1.srcIP = PKT2.srcIP
  AND PKT1.destIP = PKT2.destIP

Query Example

flows: SELECT tb, srcIP, destIP, COUNT(*) AS cnt
       FROM TCP
       GROUP BY time/60 AS tb, srcIP, destIP

heavy_flows: SELECT tb, srcIP, MAX(cnt) AS max_cnt
             FROM flows
             GROUP BY tb, srcIP

flow_pairs: SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
            FROM heavy_flows S1, heavy_flows S2
            WHERE S1.srcIP = S2.srcIP AND S1.tb = S2.tb + 1

Questions raised by this example:
- Which partitioning scheme is optimal for each of the queries?
- How do we reconcile potentially conflicting partitioning requirements?
- How can we use information about existing partitionings in a distributed query optimizer?

Figure credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008

What if we could only partition on destIP?
Figure credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008

Partition Compatibility
- Partitioning on (time/60, srcIP, destIP) lets each node execute the aggregation locally, then union the results
- Partitioning on (srcIP, destIP, srcPort, destPort) cannot aggregate locally
- Definition: P is compatible with Q if, for every time window, the output of Q equals the stream union of the outputs of Q running on the partitions produced by P

SELECT tb, srcIP, destIP, SUM(len)
FROM PKT
GROUP BY time/60 AS tb, srcIP, destIP

Should we partition on temporal attributes?
If we partition on temporal attributes:
- Processor allocation changes with time epochs
- May help avoid bad hash functions
- Might lead to incorrect results if using panes
- Tuples that are correlated in time tend to be correlated on the temporal attribute, which is bad for load balancing
Conclusion: exclude temporal attributes from partitioning

What partitionings work for aggregation queries?
- Group-bys on scalar expressions of the source input attributes
- Ignore grouping on aggregates computed in lower-level queries
- Any subset of a compatible partitioning is also compatible

SELECT expr_1, expr_2, ..., expr_n
FROM STREAM_NAME
WHERE tup_predicate
GROUP BY temp_var, gb_var_1, ..., gb_var_m
HAVING group_predicate
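A rough compatibility test for an aggregation query, as a sketch of my own simplification of this rule (not the paper's formal definition): the partitioning expressions must all appear among the query's non-temporal group-by scalar expressions.

def aggregation_compatible(partition_attrs, groupby_attrs, temporal_attrs):
    # Compatible if every partitioning attribute is a non-temporal group-by
    # scalar expression of the source stream; subsets remain compatible.
    usable = set(groupby_attrs) - set(temporal_attrs)
    return len(partition_attrs) > 0 and set(partition_attrs) <= usable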

What partitionings work for join queries?
- Equality predicates on scalar expressions of the source stream attributes
- Any non-empty subset of a compatible partitioning is also compatible
- Need to reconcile the partitionings of S and R

SELECT expr_1, expr_2, ..., expr_n
FROM STREAM1 AS S {LEFT|RIGHT|FULL} [OUTER] JOIN STREAM2 AS R
WHERE STREAM1.ts = STREAM2.ts
  AND STREAM1.var_11 = STREAM2.var_21
  ...
  AND STREAM1.var_1k = STREAM2.var_2k
  AND other_predicates
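A correspondingly rough sketch for equi-joins (my paraphrase, not the paper's formal definition): the partitioning expressions must come from the columns related by the join's equality predicates, and both input streams must be partitioned on the corresponding columns so that matching tuples land in the same partition.

def join_compatible(partition_exprs, join_equalities):
    # join_equalities: hypothetical dict mapping an S-side join column to the
    # R-side column it must equal, e.g. {"srcIP": "srcIP", "tb": "tb"}.
    # Both streams are then partitioned on the paired columns.
    if not partition_exprs:
        return False
    return all(col in join_equalities for col in partition_exprs)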

Now, multiple queries…

tcp_flows: SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len)
           FROM TCP
           GROUP BY time/60 AS tb, srcIP, destIP, srcPort, destPort

flow_cnt: SELECT tb, srcIP, destIP, COUNT(*)
          FROM tcp_flows
          GROUP BY tb, srcIP, destIP

Compatible partitionings:
- tcp_flows: {sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}
- flow_cnt: {sc_exp(srcIP), sc_exp(destIP)}
- Result: {sc_exp(srcIP), sc_exp(destIP)} — compatible with both queries
With many queries, a fully compatible partitioning set is likely to be empty; instead, partition to minimize the cost of execution.
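A small sketch of the multi-query reasoning above (illustrative only): the partitioning attributes usable for every query are the intersection of the per-query compatible sets; when that intersection is empty, the optimizer instead picks the partitioning that minimizes overall execution cost (cost model not shown).

def common_partitioning(per_query_sets):
    # per_query_sets: list of sets of compatible partitioning expressions,
    # one set per query; returns the expressions compatible with all of them.
    common = set(per_query_sets[0])
    for s in per_query_sets[1:]:
        common &= set(s)
    return common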

Query Plan Transformation
- Main idea: push the aggregation operator below the merge so that aggregations execute independently on the partitions
- The local aggregations produce partial aggregates (think panes)
Figure credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
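Here is a minimal sketch of the push-aggregation-below-merge idea, using the flows query's grouping key as an example (the function names and dict layout are assumptions): each partition computes partial aggregates locally, and the final node merges partials per group (COUNT and SUM add up).

from collections import defaultdict

def local_partial_agg(tuples):
    # Runs on each partition: partial COUNT(*) and SUM(len) per group.
    partial = defaultdict(lambda: {"cnt": 0, "sum_len": 0})
    for t in tuples:
        g = (t["tb"], t["srcIP"], t["destIP"])
        partial[g]["cnt"] += 1
        partial[g]["sum_len"] += t["len"]
    return partial

def merge_partials(partials):
    # Runs at the final aggregator: combine the per-partition partials.
    final = defaultdict(lambda: {"cnt": 0, "sum_len": 0})
    for partial in partials:
        for g, agg in partial.items():
            final[g]["cnt"] += agg["cnt"]
            final[g]["sum_len"] += agg["sum_len"]
    return final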

Performance
Figure credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008