Evaluating Window Joins over Punctuated Streams

Presentation transcript:

Evaluating Window Joins over Punctuated Streams
Luping Ding and Elke A. Rundensteiner, Database Systems Research Group, Worcester Polytechnic Institute, {lisading, rundenst}@cs.wpi.edu
Good afternoon. My name is Luping Ding, and I am from Worcester Polytechnic Institute. Today I am presenting our research on “Evaluating Window Joins over Punctuated Streams”. This is joint work with Prof. Elke Rundensteiner.
2018/11/27 CIKM'04

Stream Data Processing
Applications: online transaction management, sensor network monitoring, network usage analysis, online auction. Users register continuous queries; streaming data enters the stream query engine, which emits a streaming result.
Online processing and sensor network applications are becoming more and more popular. These applications need to process streaming data rather than persistently stored data. For example, an online transaction management system processes transaction streams to control real-time inventory and recommend discount policies. Network analysis applications process streams of network packets to monitor network usage and detect intrusions. In these applications, data arrives as continuous streams. Users pose long-standing queries and expect results to be streamed out in real time.

New Challenges in Stream Context
Potentially infinite data streams vs. stateful operators (e.g., join, distinct, …). Problem: potentially unbounded state. Reason: no hint on which data is no longer useful.
Many new challenges arise in this query context. One important challenge is the evaluation of queries that contain stateful operators. When processing potentially infinite data streams, a stateful operator such as the join may need to maintain potentially unbounded state to guarantee an exact query result, because there is no hint on which data is no longer useful. This potentially requires infinite storage. We consider the join operator in particular.

Example: Symmetric Hash Join [WA93]
Memory overflow resolution by state relocation; examples: XJoin [UF00], Hash-Merge Join [MLA04]. Problems: the join state still grows without bound; delivery of some join results may be highly deferred.
(Diagram: tuples from streams A and B are inserted into in-memory states SA and SB and probe the opposite state; memory overflows.)
To illustrate this problem, suppose we execute a symmetric hash join (SHJ) over two streams A and B. SHJ maintains two states to hold the tuples from the two streams. When a new tuple arrives from stream A, it is first inserted into state S_A. It is then used to probe state S_B and produce results. Tuples from stream B are handled symmetrically. As tuples continuously stream in, the state grows without bound and easily causes memory overflow. To handle memory overflow, several pipelined join solutions, such as XJoin and Hash-Merge Join, employ state relocation: whenever memory is full, part of the state is moved to disk. However, the join state still grows without bound. In addition, as more data is moved to disk, the delivery of some join results may be highly deferred.
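The insert-then-probe logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [WA93]: the class name, the single join key and the (key, A payload, B payload) result layout are assumptions made for the example. Note that nothing is ever removed, which is exactly the unbounded-state problem.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy symmetric hash join on a single join key (hypothetical names)."""

    def __init__(self):
        # One hash table per input stream: join key -> stored payloads.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}

    def process(self, side, key, payload):
        """Insert the tuple into its own state, then probe the opposite state."""
        other = "B" if side == "A" else "A"
        self.state[side][key].append(payload)
        matches = self.state[other].get(key, [])
        # Results are always emitted as (key, A payload, B payload).
        if side == "A":
            return [(key, payload, m) for m in matches]
        return [(key, m, payload) for m in matches]

    def size(self):
        """Number of stored tuples; it only ever grows in this sketch."""
        return sum(len(v) for table in self.state.values() for v in table.values())


shj = SymmetricHashJoin()
out = []
out += shj.process("A", 1, "a1")
out += shj.process("B", 1, "b1")
out += shj.process("B", 2, "b2")
out += shj.process("A", 1, "a2")
print(out, shj.size())
```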

Avoiding Unbounded State
Solution: exploit constraints to detect no-longer-useful data. Sliding window [MWA+03]: identifies a bounded set of input data based on time. K-constraint [BW03]: models clustered or ordered data arrival patterns. Punctuation [TMSF03]: dynamically announces the termination of certain values.
Therefore, a better way is to avoid unbounded state in the first place. An effective solution is to exploit appropriate constraints to detect and discard no-longer-useful data from the join state. This is also the focus of our work. Several types of constraints have been proposed in the literature for this purpose. For queries in which recent elements of a stream are more important than older ones, users can specify a time-based constraint in the query with a sliding window. A sliding window continuously identifies a bounded set of recent data for generating results. K-constraints are data-level static constraints that model clustered data arrival patterns. Punctuation is also a data-level constraint. It dynamically announces that a certain attribute value will no longer occur in the stream. We have observed that the punctuation model covers k-constraints, so in our work we only consider sliding window constraints and punctuations.

Sliding Window [KNV03]
(Diagram: windows Wa and Wb sliding along the timelines of stream A and stream B.)
Let’s see how the window join works. Suppose sliding windows W_a and W_b are specified on streams A and B respectively. When a new tuple arrives from stream A, it joins only with tuples from stream B that arrived within the last W_b time units. Tuples from stream B are handled symmetrically. Therefore, the join operator only needs to maintain the tuples in the current window. As the window moves, expired tuples can be removed from the state to release memory.
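A minimal Python sketch of this time-based window join follows. It is an illustration under stated assumptions, not the algorithm of [KNV03]: timestamps are assumed to be non-decreasing across both streams, a tuple is treated as expired once its timestamp is at or before `ts - window`, and the class and variable names are hypothetical.

```python
from collections import deque


class WindowJoin:
    """Toy time-based sliding-window join (names and boundary rule are assumptions)."""

    def __init__(self, window_a, window_b):
        self.window = {"A": window_a, "B": window_b}
        # Each state keeps (timestamp, key, payload) in arrival order.
        self.state = {"A": deque(), "B": deque()}

    def process(self, side, ts, key, payload):
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # Invalidate: drop opposite-stream tuples that fell out of their window.
        while opposite and opposite[0][0] <= ts - self.window[other]:
            opposite.popleft()
        # Probe the remaining (time-valid) tuples of the opposite stream.
        if side == "A":
            results = [(key, payload, p) for (_, k, p) in opposite if k == key]
        else:
            results = [(key, p, payload) for (_, k, p) in opposite if k == key]
        # Insert the new tuple into its own state.
        self.state[side].append((ts, key, payload))
        return results


wj = WindowJoin(window_a=5, window_b=5)
wj.process("B", 0, 1, "b0")
wj.process("B", 3, 1, "b3")
joined = wj.process("A", 6, 1, "a6")  # b0 has expired, b3 is still in the window
print(joined)
```

The linear scan in the probe step is kept for brevity; a hash index on the join key would avoid touching non-matching tuples.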

Punctuation
Meta-knowledge embedded inside data streams. An ordered set of patterns corresponding to the attributes of tuples: wildcard (*), constant (9), list ({1,2,3}), range ([1, 20]), empty (). Semantics: tuples after a punctuation p will NOT match p.
Bid stream example:
… 180 Marlie 820.00 Nov-13-03 11:02:00
182 Ultrasale 1000.00 Nov-13-03 11:05:00
180 Jocelyn 850.00 Nov-13-03 11:14:00
180 * * * (punctuation: no more tuples will contain item_id 180)
181 pcfan 50.00 Nov-13-03 11:36:00 …
Punctuations have a similar effect to sliding windows in bounding the join state. Punctuations are meta-knowledge embedded inside the data stream. A punctuation is specified as an ordered set of patterns, each corresponding to an attribute of the tuple. A pattern can be a wildcard, a constant, a list, or a range. The punctuation semantics are that tuples arriving after a punctuation p will not match p. Punctuations can be provided by a customized stream generator, such as a sensor. They can also be derived from application semantics or from static constraints, such as a clustered data arrival pattern. For example, in an online auction application, the Bid stream records the bids placed by users. Whenever an auction, for example auction 180, is closed, the auction system can insert a punctuation into the Bid stream to indicate that no future-arriving tuple in this stream will contain this item_id.
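The pattern kinds listed above can be made concrete with a small matcher. This is a sketch with an assumed encoding, not the representation used in [TMSF03]: the wildcard is the string "*", a list pattern is a Python set, a range is an inclusive (low, high) tuple, the empty pattern is an empty frozenset that matches nothing, and anything else is a constant.

```python
WILDCARD = "*"
EMPTY = frozenset()  # assumed encoding of the empty pattern: matches no value


def pattern_matches(pattern, value):
    """Return True if a single attribute value matches a single pattern."""
    if isinstance(pattern, str) and pattern == WILDCARD:
        return True
    if isinstance(pattern, (set, frozenset)):  # list pattern, e.g. {1, 2, 3}
        return value in pattern
    if isinstance(pattern, tuple):  # range pattern, e.g. (1, 20), inclusive
        low, high = pattern
        return low <= value <= high
    return pattern == value  # constant pattern


def punctuation_matches(punct, tup):
    """A tuple matches a punctuation if every attribute matches its pattern."""
    return len(punct) == len(tup) and all(
        pattern_matches(p, v) for p, v in zip(punct, tup)
    )


# The auction example: item 180 is closed, so no later bid may match this.
closed_180 = (180, "*", "*", "*")
print(punctuation_matches(closed_180, (180, "Jocelyn", 850.00, "Nov-13-03 11:14:00")))
print(punctuation_matches(closed_180, (181, "pcfan", 50.00, "Nov-13-03 11:36:00")))
```

A tuple that matches a punctuation already seen on its own stream would violate the punctuation semantics, so a matcher like this can also serve as a sanity check on the input.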

Punctuation-Aware Join [DMR+04]
(Diagram: a join on item_id over streams A and B with states SA and SB; a punctuation 175 * announces that no more tuples will have A = 175.)
Let’s see how punctuations help shrink the join state. When a punctuation is received from stream B, the join operator can purge the matching stream A tuples currently held in its state. These tuples have already joined with every tuple that has arrived from stream B and, according to the punctuation, they will not join with any future-arriving tuple, so they are no longer needed. In addition, any future tuple from stream A that matches this punctuation can be discarded right after it is processed, so it never needs to be inserted into the state. Punctuations on the join attribute can therefore shrink the join state.
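The purge and the insert-skipping described above might look as follows in Python. This is a simplified sketch, not PJoin from [DMR+04]: it assumes every punctuation is a constant pattern on the join attribute, and the class and method names are hypothetical.

```python
from collections import defaultdict


class PunctuatedJoin:
    """Toy join that purges state on constant join-key punctuations."""

    def __init__(self):
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}
        # Join keys that each stream has announced it will never send again.
        self.punctuated = {"A": set(), "B": set()}

    def on_tuple(self, side, key, payload):
        other = "B" if side == "A" else "A"
        matches = list(self.state[other].get(key, []))
        # If the opposite stream already closed this key, the new tuple has no
        # future join partner: process it, but do not store it.
        if key not in self.punctuated[other]:
            self.state[side][key].append(payload)
        if side == "A":
            return [(key, payload, m) for m in matches]
        return [(key, m, payload) for m in matches]

    def on_punctuation(self, side, key):
        """Purge opposite-state tuples that can no longer find a partner."""
        other = "B" if side == "A" else "A"
        self.punctuated[side].add(key)
        return len(self.state[other].pop(key, []))


pj = PunctuatedJoin()
pj.on_tuple("A", 175, 80.0)
first = pj.on_tuple("B", 175, "b")
purged = pj.on_punctuation("B", 175)   # stream B will send no more 175
late = pj.on_tuple("A", 175, 100.0)    # still joins with stored B tuples
print(first, purged, late, 175 in pj.state["A"])
```

The stored B tuples with key 175 are kept on purpose: stream A has not punctuated that key, so later A tuples may still need them.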

Window and Punctuation Occur Simultaneously
SELECT A.item_id, Count (*)
FROM Auction [Range 24 Hours] A, Bid B
WHERE A.item_id = B.item_id
GROUP BY A.item_id
(Diagram: the Auction stream and the Bid stream feed a join on item_id, whose output Out1 (item_id) feeds a group-by on item_id (count(*)) with output Out2 (item_id, count). The Bid stream contains punctuations on item_id; the query applies a 24-hour window on the Auction stream.)
So far we have seen that either a window or punctuations alone can be exploited to reduce resource usage and hence improve the result output rate. We have observed that in many cases the two constraint types occur simultaneously, which enables further optimizations. Here we show an example query from an online auction application that asks, for each auction, for the total number of bids placed within 24 hours of its opening. The Bid stream contains punctuations on closed auctions, and according to the query a 24-hour window is applied on the Auction stream. Therefore both constraints are available simultaneously to the join operator.

Optimization Opportunities
Maintain a smaller state than either a pure window join or a pure punctuation-exploiting join: Bid tuples that have been joined don’t need to be maintained in the state. Drop tuples without affecting the precision of the result: Bid tuples outside the 24-hour window of the corresponding Auction tuple don’t need to be processed. Produce some aggregate results earlier: the aggregate result for some Auction tuples can be produced in less than 24 hours.
By studying this example query, we observe that several optimization opportunities arise from exploiting the combined constraints rather than only one of them. First, we can achieve a smaller state than both the pure window join and the pure punctuation-exploiting join, because the additional constraint dimension allows more tuples to be purged. Second, we can drop tuples that are detected not to contribute to the join result. This reduces the join workload without harming the precision of the result.

Our Approach: PWJoin
Punctuation-exploiting Window Join. Features of PWJoin: includes the optimizations enabled by punctuations and by sliding windows individually; accomplishes the optimizations enabled by the interaction of the two constraint types; employs a state design that effectively facilitates constraint-exploiting optimizations.
In view of the great optimization opportunities brought by the combined constraints, we propose the punctuation-exploiting window join solution, which we call PWJoin. The features of PWJoin are as follows. It includes the optimizations enabled by punctuations and by sliding windows individually. It accomplishes the optimizations enabled by the interaction of the two constraint types. It employs a state design that effectively facilitates these optimizations.

PWJoin Basics and Issue
Receive a new tuple ta from stream A: probe B state; invalidate tuples from B state; insert ta into A state.
Receive a new punctuation pa from stream A: purge tuples from B state; insert pa into A state.
Issue: how to design the PWJoin state to facilitate all search-based operations? Invalidate conducts a time-based search; probe and purge need a value-based search.
Our PWJoin algorithm exploits both window constraints and punctuations. The basic execution logic distinguishes the processing of tuples from the processing of punctuations. It relies on three search-based operations: probe, purge and invalidate. Among these, invalidate conducts a time-based search, while probe and purge need a value-based search. The issue is therefore how to design the storage structure of the PWJoin state so that it facilitates both time-based and value-based search.

PWJoin State with Two-dimensional Index
(Diagram: a time list of T-Nodes running from window begin to window end; an I-Node index (hash table) whose I-Nodes hold Key, Head, Tail and PunctFlag, for example none or punctuated; each T-Node holds a tuple together with NextValueListTNode and NextTimeListTNode pointers; a punctuation time list of punctuation and timestamp pairs p1 T1, p2 T2, ….)
To tackle this issue, we design the PWJoin state structure with a two-dimensional index. We also maintain a punctuation time list.

Facilitating Search-based Operations
Invalidate: probe the time list and stop when encountering a time-valid tuple.
Probe: probe the I-Node index and join with the tuples in the value list of the matching I-Node.
Purge: probe the I-Node index and delete the tuples in the value list of the matching I-Node.
Access to irrelevant tuples is avoided: a time list probe only accesses expired tuples, while a value list probe only accesses matching tuples.
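The three operations can be sketched over a simplified version of the two-dimensional index. This Python sketch is an assumption-laden illustration, not the CAPE implementation: deques stand in for the linked time list and value lists, purged tuples are removed from the time list lazily through an `alive` flag, and all class, field and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TNode:
    ts: float
    key: object
    payload: object
    alive: bool = True  # lazy-deletion flag (an assumption of this sketch)


@dataclass
class INode:
    key: object
    values: deque = field(default_factory=deque)  # value list, oldest first
    punct_flag: str = "none"


class PWJoinState:
    def __init__(self):
        self.time_list = deque()  # all T-Nodes in arrival order
        self.index = {}           # join key -> I-Node

    def insert(self, ts, key, payload):
        node = TNode(ts, key, payload)
        self.time_list.append(node)
        self.index.setdefault(key, INode(key)).values.append(node)

    def probe(self, key):
        """Value-based search: touch only tuples with the matching key."""
        inode = self.index.get(key)
        return [n.payload for n in inode.values] if inode is not None else []

    def purge(self, key):
        """Value-based search: delete the value list of the matching I-Node."""
        inode = self.index.setdefault(key, INode(key))
        inode.punct_flag = "punctuated"
        purged = len(inode.values)
        for node in inode.values:
            node.alive = False  # unlinked from the time list lazily
        inode.values.clear()
        return purged

    def invalidate(self, window_begin):
        """Time-based search: stop at the first time-valid tuple."""
        removed = 0
        while self.time_list:
            head = self.time_list[0]
            if head.alive and head.ts >= window_begin:
                break
            self.time_list.popleft()
            if head.alive:
                head.alive = False
                self.index[head.key].values.popleft()  # oldest tuple of its key
                removed += 1
        return removed


s = PWJoinState()
s.insert(1, 8, "x")
s.insert(2, 10, "y")
s.insert(3, 8, "z")
before = s.probe(8)
purged = s.purge(10)
expired = s.invalidate(window_begin=3)
after = s.probe(8)
print(before, purged, expired, after)
```

Because both lists are kept in arrival order, the expired head of the time list is always the oldest live entry of its value list, which is why `invalidate` can pop from the left of both.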

Punctuation Propagation
An operator may propagate punctuations to benefit downstream operators.
(Diagram: the Auction stream and the Bid stream feed the join on item_id, which propagates punctuations on item_id, for example 180 * * over the attributes Item_id, Bidder_id, Bid_price; the group-by on item_id (count(*)) is unblocked by the punctuations propagated by the join operator.)
In some cases, an operator may propagate the punctuations it has received to benefit downstream operators. Consider again the query we discussed earlier: the group-by operator can be unblocked by the punctuations propagated by the join operator and can then produce partial results.

Optimizations Enabled by Combined Constraints
Early Punctuation Propagation; Tuple Dropping.
(Diagram: streams S1 and S2 carrying tuples with join values a1, a2, a3, … and punctuations on a3; propagation point 1 marks the arrival of the punctuation on a3 from S1, and propagation point 2 marks the moment the punctuation on a3 from S2 leaves the window.)
We observe that the interaction between punctuations and window constraints enables further optimization. The first optimization is called early punctuation propagation. In a regular join without windows, propagating a punctuation on the join attribute requires receiving this punctuation from both input streams, in order to guarantee that no join result matching the punctuation will be generated in the future. In this example, we cannot simply propagate a punctuation on join value a_3 when we receive it from stream S_2, because tuples containing this join value may still arrive from stream S_1, so future join results may still contain this join value. We need to wait until we receive the same punctuation from stream S_1, which we mark as propagation point 1. With window constraints, however, once the punctuation moves out of the window, we know that no tuple from stream S_2 containing this join value remains in the state or will appear in it again. Although such tuples may still arrive from S_1, no corresponding join result will be produced. Hence we can propagate at propagation point 2, which can be much earlier than propagation point 1. In addition, once early propagation has occurred, any future-arriving tuple from stream S_1 that matches this punctuation will not produce any join result, so it can be dropped without being processed. This reduces the join workload. The two-dimensional index design also facilitates these optimizations.

Achieving Optimizations by Combined Constraints
Early propagation: invalidate punctuations in the punctuation time list in the same way as tuples are invalidated; expired punctuations can be propagated.
Tuple dropping: when early propagation happens, set the PunctFlag of the matching I-Node to “propagated”; drop new tuples that match an I-Node whose PunctFlag is “propagated”.
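A minimal Python sketch of these two steps, for the punctuations received from one input stream, is shown below. It is an illustration only: punctuations are assumed to be constants on the join attribute, a punctuation is treated as expired once its timestamp is at or before `now - window`, a set of propagated keys stands in for the PunctFlag of the I-Nodes, and all names are hypothetical.

```python
from collections import deque


class EarlyPropagator:
    """Tracks punctuations of one stream and expires them with the window."""

    def __init__(self, window):
        self.window = window
        self.punct_time_list = deque()  # (timestamp, join key) in arrival order
        self.propagated = set()         # keys whose flag would be "propagated"

    def on_punctuation(self, ts, key):
        self.punct_time_list.append((ts, key))

    def advance(self, now):
        """Invalidate expired punctuations and return the keys to propagate."""
        ready = []
        while self.punct_time_list and self.punct_time_list[0][0] <= now - self.window:
            _, key = self.punct_time_list.popleft()
            self.propagated.add(key)
            ready.append(key)
        return ready

    def should_drop(self, key):
        """Tuple dropping: an opposite-stream tuple with this key joins nothing."""
        return key in self.propagated


ep = EarlyPropagator(window=10)
ep.on_punctuation(5, "a3")
early = ep.advance(now=12)        # the punctuation is still inside the window
drop_before = ep.should_drop("a3")
later = ep.advance(now=15)        # the punctuation has left the window
drop_after = ep.should_drop("a3")
print(early, drop_before, later, drop_after)
```

The reasoning behind `advance` is that every tuple with the punctuated key arrived before the punctuation, so by the time the punctuation expires those tuples have expired as well.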

Memory Cost Analysis
$$|S_b|_T = |S_b|_T^{insert} - |S_b|_T^{purge} = |S_b|_T^{arrive} - |S_b|_T^{purge} = \lambda_b T_b - \lambda_b T_b \left( \frac{\lambda_{p_a} T}{NK_{b,T}} \right)$$
$\lambda_b$: tuple input rate of stream B. $\lambda_{p_a}$: punctuation input rate of stream A. $NK_{b,T}$: number of distinct join values that have occurred in stream B up to the T’th time unit. $T_b$: time window on stream B. The first term, $\lambda_b T_b$, is the state of the window join; the second term is the saving by punctuation.
One significant achievement of PWJoin is the reduction in memory overhead. In a system with limited memory, or one running memory-consuming applications, reducing memory should be the first optimization goal. We now show the estimate of the PWJoin state size, measured in number of tuples. We apply the unit-time-basis cost model proposed in the literature, and we assume that in any time unit the number of arrived tuples equals the number of tuples inserted into the state. We then obtain this equation for estimating the number of tuples in state S_b in the T’th time unit. The formula for state S_a is similar because the execution logic is symmetric. Important factors: the punctuation arrival rate $\lambda_{p_a}$ and $NK_{b,T}$.
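To get a feel for the estimate, the formula can be evaluated numerically. The Python sketch below reads the slide's equation as: state size = (tuple rate of B × window on B) minus that same product scaled by (punctuation rate of A × T) / (distinct join values of B up to T). That reading, the clamping of the punctuated fraction at 1 and the sample numbers are all assumptions made for illustration, not measured values.

```python
def estimated_state_size(rate_b, window_b, punct_rate_a, t, distinct_keys_b):
    """Estimated number of tuples in state S_b at time unit t (sketch)."""
    window_only = rate_b * window_b  # state of a pure window join
    # Fraction of B's join values already punctuated by stream A; clamped at 1
    # so the estimate never goes negative (an assumption of this sketch).
    punctuated_fraction = min(1.0, punct_rate_a * t / distinct_keys_b)
    saving = window_only * punctuated_fraction  # saving by punctuation
    return window_only - saving


# Hypothetical numbers: 100 tuples per time unit, a window of 5 time units,
# 2 punctuations per time unit, 50 distinct join values seen by time unit 10.
size = estimated_state_size(
    rate_b=100, window_b=5, punct_rate_a=2, t=10, distinct_keys_b=50
)
print(size)
```

Under this reading, the saving grows with the punctuation rate and shrinks as the number of distinct join values grows, which matches the two factors the slide calls important.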

Experimental Setup
Experimental system: CAPE [RDS+04], a continuous query processing system; a stream benchmark that generates synthetic data streams; 733MHz Intel(R) Celeron CPU, 512MB RAM, Windows 2000.
Experiments: compare the memory overhead and tuple output rate of PWJoin with a pure window join; compare the punctuation output rate of PWJoin with PJoin.
To explore the effectiveness of PWJoin, we conducted an experimental study by evaluating PWJoin in a real continuous query system named CAPE, developed at WPI. We also employ a stream benchmark to generate synthetic data streams with control over the arrival characteristics of data and punctuations. The configuration of our test machine is listed here. In the following we show our experimental results comparing the memory overhead and tuple output rate of PWJoin with a pure window join, and comparing the punctuation output rate of PWJoin with PJoin, a pure punctuation-exploiting join.

PWJoin vs. WJoin – Memory and Tuple Output Rate
Settings: inter-arrival time 10 milliseconds. Stream labels: Cluster-order-clustersize, Punct-order-segmentsize-matchpercentage. Streams A and B: punct-asc-100-40.
The first result we want to show is the performance comparison of PWJoin and a pure window join regarding memory overhead and tuple output rate. We denote the pure window join as WJoin. In this experiment we vary the size of the window and plot, at each sampling step, the number of tuples in the join state and the number of result tuples output so far. The tuple arrival rate is 100 tuples/second. In the figure, PWJoin-1 denotes PWJoin with a 1 second sliding window. From these two figures, we can see that as the window becomes larger, the memory saving and the tuple output rate improvement of PWJoin become more and more significant. One interesting phenomenon is that when the window size is 5 seconds, the tuple output rate of PWJoin is slightly lower than that of WJoin. This is because the number of tuples purged by punctuations is small, so the purge cost exceeds the saving in probing. For very small windows, it may therefore be wiser not to exploit punctuations.

PWJoin vs. PJoin – Punctuation Output Rate
Settings: Stream A: punct-asc-100-40, Stream B: punct-random-30-40. Window: 1 second.
Another important result we want to show is the comparison of PWJoin with PJoin regarding punctuation output rate. By employing the early propagation strategy enabled by the combined constraints, PWJoin achieves a higher punctuation output rate than PJoin. This is very useful for downstream stateful or blocking operators, because they can then purge useless tuples or generate partial results earlier.

Related Work
Pipelined join solutions: Symmetric Hash Join [WA93], XJoin [UF00], Hash-Merge Join [MLA04], Ripple Joins [HH99].
Constraint-exploiting stream query optimization: window joins [KNV03, GO03, GGO04, HFA+03, ZRH04]; punctuation [TMS+03], PJoin [DMR+04]; k-constraint-exploiting algorithm [BW04].
There is existing research related to our PWJoin work.

Conclusion
Proposed the PWJoin algorithm. Designed a storage structure for the PWJoin state. Derived a cost model for PWJoin. Conducted an experimental study to explore the effectiveness of PWJoin.
To summarize, in this research we validate the performance gains, synergy and potential overhead of exploiting windows and punctuations together.

Thanks
Nishant Mehta (developing the stream generator); Prof. Leonidas Fegaras (feedback on the paper); CAPE group members; WPI Database Research Group.
CAPE Project: http://davis.wpi.edu/~dsrg/CAPE/
Finally, I would like to thank everybody who has contributed to this work: in particular, Nishant Mehta for developing the stream generator, Prof. Leonidas Fegaras for useful feedback on the paper, and the CAPE group members and the WPI Database Research Group for valuable comments. If you are interested in this PWJoin work or in our CAPE continuous query processing project, please visit this link. And thank you!

References
[KNV03] J. Kang, J. F. Naughton and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. ICDE’03.
[UF00] T. Urhan and M. Franklin. XJoin: A Reactively Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 23(2), 2000.
[HH99] P. Haas and J. Hellerstein. Ripple Joins for Online Aggregation. SIGMOD’99.
[GO03] L. Golab and M. T. Ozsu. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams. VLDB’03.
[GGO04] L. Golab, S. Garg and M. T. Ozsu. On Indexing Sliding Windows over On-line Data Streams. EDBT’04.
[RDS+04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. VLDB Demo, 2004.
[BW04] S. Babu and J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams.
[TMS+03] P. A. Tucker, D. Maier, T. Sheard and L. Fegaras. Exploiting Punctuation Semantics in Continuous Data Streams. TKDE, 15(3), 2003.
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman. Joining Punctuated Streams. EDBT’04.
[MWA+03] R. Motwani, J. Widom, A. Arasu et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR’03.

PWJoin vs. WJoin – Irrelevant Punctuations
Settings: Stream A: punct-asc-100-40, Stream B: punct-random-30-40. Window: 2 seconds.