Panel on Stream Query Languages: The Aurora View
Stan Zdonik, Brown University

Aurora Queries
We do not have an SQL-like language. We have a GUI for dataflow diagrams.
–Boxes = operators
–Arrows = streams
Rationale:
–Common subexpression elimination (CSE) is tough for thousands of queries.
–Workflow is more natural.
–Easier for users to extend what’s been done.
–Best to understand implementation first.

Aurora Operators
Very relational in spirit.
–Filter, Map, Union, Join, Aggregate
Adds Windows (everyone seems to agree).
… with some wrinkles that we will get to.
Adds a few operators.
–Wsort
–Resample

Simple Aggregation
Aggregate: Agg(init, incr, final), Window(on C, size = 2, offset = 1), GroupBy A, B
[Figure: input and output streams with columns A, B, C; the tuple values are not recoverable from the transcript]
Generalized aggregate:
–init: called when window opens
–incr: called for each new value
–final: called when window closes
One or more open windows per group.
Size and Offset given in: #tuples, attribute interval, or time interval
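The init / incr / final protocol above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Aurora's implementation: windows are counted in tuples (the slide's example windows on attribute C instead), and the function name `windowed_aggregate`, its parameters, and the sample tuples are all hypothetical.

```python
from collections import defaultdict

def windowed_aggregate(stream, key, size, offset, init, incr, final):
    """Tuple-count sliding windows per group (assumed semantics): a new window
    opens every `offset` tuples of a group and closes after `size` tuples."""
    seen = defaultdict(int)            # tuples seen so far, per group
    open_windows = defaultdict(list)   # group -> list of [state, tuple count]
    out = []
    for t in stream:
        g = key(t)
        if seen[g] % offset == 0:
            open_windows[g].append([init(), 0])    # init: window opens
        seen[g] += 1
        still_open = []
        for w in open_windows[g]:
            w[0] = incr(w[0], t)                   # incr: each new value
            w[1] += 1
            if w[1] == size:
                out.append(final(g, w[0]))         # final: window closes
            else:
                still_open.append(w)
        open_windows[g] = still_open
    return out

# Hypothetical (A, B, C) tuples; sum C over windows of 2 tuples, grouped by (A, B).
tuples = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 1, 3), (1, 2, 2)]
sums = windowed_aggregate(
    tuples, key=lambda t: (t[0], t[1]), size=2, offset=1,
    init=lambda: 0, incr=lambda s, t: s + t[2], final=lambda g, s: (g, s))
# sums == [((1, 1), 3), ((1, 1), 5), ((1, 2), 3)]
```

With offset smaller than size, each group can hold more than one open window at a time, which is the "one or more open windows per group" point on the slide.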

Query 1
Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour.
Input: (pID, length, time)
Map f(t): (t.ID, t.length, t.time)
Aggregate agg(init, incr, final), Window(on time, size = 1 hr, offset = 1 tuple)
–State = (sum int, num int, endtime int)
–init = {sum := 0; num := 0}
–incr(p) = {sum := sum + p.length; num := num + 1; endtime := p.time}
–final = emit (time2 = endtime, avgLen = sum/num)
Join Match (length > 2 * avgLen and time = time2)
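As a rough cross-check of what Query 1 computes, here is a Python sketch that fuses the Aggregate and Join boxes into one pass. It assumes packets arrive in non-decreasing time order with times in seconds, and it maintains the running sum incrementally rather than opening one window per tuple; `long_packets` and the sample data are hypothetical.

```python
from collections import deque

HOUR = 3600  # seconds; assumes packet times are in seconds

def long_packets(packets, window=HOUR):
    """packets: iterable of (pid, length, time) in non-decreasing time order.
    Returns packets longer than twice the average length over the window
    that ends at (and includes) the packet itself."""
    recent = deque()   # (length, time) pairs still inside the window
    total = 0          # sum of lengths in `recent`
    out = []
    for pid, length, time in packets:
        recent.append((length, time))
        total += length
        # Evict packets that are a full window or more older than this one.
        while recent and recent[0][1] <= time - window:
            old_length, _ = recent.popleft()
            total -= old_length
        avg_len = total / len(recent)
        if length > 2 * avg_len:
            out.append((pid, length, time))
    return out

sample = [(1, 100, 0), (2, 100, 10), (3, 1000, 20), (4, 100, 4000)]
flagged = long_packets(sample)
# flagged == [(3, 1000, 20)]
```

Including the current packet in its own average mirrors the slide, where final emits time2 = endtime and the join matches on time = time2.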

Query 2
Create an alert when more than 20 type 'A' squirrels are in Jennifer's backyard.
Input streams (labeled S and T in the diagram): (sID2, type), (sID1, region, time)
Join Match (sID1 = sID2)
Filter region = JWY and type = “A”
Aggregate agg(count), Window(on time, size = p sec, offset = p sec)
Filter count > 20
Assume squirrels report every p sec
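The Query 2 pipeline (join, filter, tumbling count, filter) can be approximated by the batch sketch below. It is not a streaming implementation: it assumes the type stream fits in a lookup table and that each squirrel reports once per p-second window, as the slide does. The name `backyard_alerts` and all sample values are hypothetical.

```python
from collections import Counter

def backyard_alerts(reports, squirrel_type, p, threshold=20,
                    region="JWY", wanted="A"):
    """reports: iterable of (sid, region, time).
    squirrel_type: dict mapping sid -> type (stands in for the join input).
    Returns the start times of p-second windows with more than `threshold`
    matching reports."""
    counts = Counter()
    for sid, reg, time in reports:
        # Join on sid, then filter on region and type.
        if reg == region and squirrel_type.get(sid) == wanted:
            counts[int(time // p)] += 1     # tumbling window: size = offset = p
    return [w * p for w in sorted(counts) if counts[w] > threshold]

types = {i: "A" for i in range(21)}
types[99] = "B"
reports = ([(i, "JWY", 0.0) for i in range(21)]       # 21 type-A squirrels
           + [(99, "JWY", 1.0)]                       # one type-B squirrel
           + [(i, "JWY", 10.0) for i in range(20)])   # only 20 in next window
alerts = backyard_alerts(reports, types, p=10)
# alerts == [0]
```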

Query 3
Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other.
Input: (sID, loc, time)
Join Match (1.sID not= 2.sID and dist(1.loc, 2.loc) < 5 m), Window (on time, size = 5 sec, offset = 1 tuple)
Join Match (dist(1.1.loc, 2.loc) < 5 m and dist(1.2.loc, 2.loc) < 5 m and 1.1.sID not= 2.sID and 1.2.sID not= 2.sID), Window (on time, size = 5 sec, offset = 1 tuple)
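For comparison with the two-join plan, a brute-force Python reference for the English statement of Query 3 is sketched below. It checks every combination of three chirps, so it is only suitable for small inputs, and it does not reproduce the 5-second join windows from the diagram. It reads "within 10 seconds of each other" as a spread of at most 10 seconds; that reading, the name `chirp_triples`, and the sample chirps are assumptions.

```python
from itertools import combinations
from math import dist

def chirp_triples(chirps, max_dist=5.0, max_gap=10.0):
    """chirps: list of (sid, (x, y), time), with distances in meters and
    times in seconds. Returns every triple of chirps from three different
    squirrels that are pairwise closer than max_dist and whose times span
    at most max_gap."""
    out = []
    for trio in combinations(chirps, 3):
        ids = {c[0] for c in trio}
        times = [c[2] for c in trio]
        close = all(dist(a[1], b[1]) < max_dist
                    for a, b in combinations(trio, 2))
        if len(ids) == 3 and close and max(times) - min(times) <= max_gap:
            out.append(trio)
    return out

chirps = [(1, (0, 0), 0), (2, (3, 0), 4), (3, (0, 3), 8), (4, (100, 100), 5)]
events = chirp_triples(chirps)
# events == [((1, (0, 0), 0), (2, (3, 0), 4), (3, (0, 3), 8))]
```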

Super-bonus Query
Create a log of flow information from a stream of packets. A flow (simple definition) from a source S to a destination D ends when no packet from S to D is seen for at least 2 minutes after the last packet from S to D. The next packet from S to D starts a new flow. The flow log contains the source, destination, count of packets, and total length of packets for each flow.
Are you kidding!!!!

Actually, it’s Pretty Easy
Input: (pID, src, dest, length, time)
Aggregate Aggr = (init1, incr1, final1), Window (size = 2 tuples, offset = 1), GroupBy (src, dest)
–State1 = (flow#: int, first: packet, second: packet)
–init1 = {flow# := 0; first := null; second := null}
–incr1(p) = {first := second; second := p; if second.time - first.time > 2 min then flow# := flow# + 1}
–final1 = emit (second.src, second.dest, second.length, second.time, flow#)
Aggregate Aggr = (init2, incr2, final2), Window (on flow#, size = 1, offset = 1), GroupBy (src, dest)
–State2 = (count int, len int)
–init2 = {count := 0; len := 0}
–incr2(p) = {count := count + 1; len := len + p.length}
–final2 = emit (src, dest, len, count)
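The two chained aggregates can be collapsed into a single pass for illustration. The Python sketch below numbers flows per (src, dest) by the gap between consecutive packets and sums count and length per flow. It assumes time-ordered input with times in seconds, uses the strict `>` comparison from incr1, and flushes open flows at end of input, which a real unbounded stream would not have. `flow_log` and the sample packets are hypothetical.

```python
def flow_log(packets, gap=120):
    """packets: iterable of (pid, src, dest, length, time) in time order.
    Returns (src, dest, packet count, total length) per flow, where a new
    flow starts when the gap to the previous packet of the same (src, dest)
    exceeds `gap` seconds."""
    open_flows = {}   # (src, dest) -> [count, total length, last packet time]
    log = []
    for _pid, src, dest, length, time in packets:
        key = (src, dest)
        flow = open_flows.get(key)
        if flow is not None and time - flow[2] > gap:
            log.append((src, dest, flow[0], flow[1]))   # previous flow ended
            flow = None
        if flow is None:
            flow = open_flows[key] = [0, 0, time]
        flow[0] += 1
        flow[1] += length
        flow[2] = time
    # Assumption: flush flows still open when the (finite) input ends.
    for (src, dest), flow in open_flows.items():
        log.append((src, dest, flow[0], flow[1]))
    return log

packets = [(1, "S", "D", 100, 0), (2, "S", "D", 50, 60),
           (3, "X", "Y", 10, 70), (4, "S", "D", 200, 300)]
flows = flow_log(packets)
# flows == [("S", "D", 2, 150), ("S", "D", 1, 200), ("X", "Y", 1, 10)]
```

The slide keeps the two steps as separate boxes, so the flow-numbering stage can feed a generic windowed aggregate; fusing them here only shortens the example.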

… but this is not enough!
What if it was really important that I know about the squirrels within 1 minute of the intrusion?
=> Queries need Quality-of-Service support.
In fact, QoS is an integral part of the declarative spec. of the query.

…but it gets worse!
Networks (e.g., mobile) can arbitrarily delay or lose tuples.
=> Operators can’t block arbitrarily waiting.
A corollary of latency-based QoS.

…and worse!
Tuples may not arrive at an operator in sort order.
–The network can reorder them.
–Operators themselves can shuffle them.
–Priority scheduling might force them out of order.
This complicates things.
–windows
–aggregates

Our Solution
Problem has to do with when to close windows.
Tradeoff: Latency (QoS) vs. Accuracy
Define additional parameters on windows that determine termination.
–might result in lost data.

Our Solution (cont.)
For disorder (early tuples) => Slack
For blocking (late tuples) => Timeout
[Figure: timelines with the labels "timeout interval (time)", "timeout interval (#tuples)", and "slack"]
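One way to picture how slack and timeout decide when a window closes is the Python sketch below. The transcript does not give exact definitions, so the semantics here are assumptions: a tumbling window on the tuple timestamp closes once `slack` tuples belonging to later windows have arrived, or once `timeout` time units have passed since it opened (checked only when a tuple arrives). Tuples for an already closed window are dropped, which is the "lost data" side of the tradeoff. The name `tumbling_sums` and the sample arrivals are hypothetical.

```python
def tumbling_sums(arrivals, size, slack, timeout):
    """arrivals: iterable of (arrival_time, ts, value) in arrival order.
    Window k covers timestamps in [k * size, (k + 1) * size).
    Returns (list of (window index, sum), number of dropped tuples).
    Windows still open when the input ends are not emitted."""
    open_w = {}      # k -> [running sum, arrival time at open, later tuples seen]
    closed = set()
    out = []
    dropped = 0

    def close(j):
        out.append((j, open_w.pop(j)[0]))
        closed.add(j)

    for now, ts, value in arrivals:
        # Timeout: stop waiting on windows that have been open too long.
        for j in sorted(open_w):
            if now - open_w[j][1] >= timeout:
                close(j)
        k = int(ts // size)
        if k in closed:
            dropped += 1     # arrived after its window was closed
        else:
            open_w.setdefault(k, [0, now, 0])[0] += value
        # Slack: tolerate a bounded number of tuples from later windows.
        for j in sorted(open_w):
            if j < k:
                open_w[j][2] += 1
                if open_w[j][2] >= slack:
                    close(j)
    return out, dropped

arrivals = [(0, 1, 5), (1, 12, 7), (2, 3, 1), (3, 15, 2), (4, 8, 9)]
result = tumbling_sums(arrivals, size=10, slack=2, timeout=100)
# result == ([(0, 6)], 1)
```

In the sample run, the tuple with ts = 3 arrives out of order but is still counted, while the tuple with ts = 8 arrives after the slack is used up and is dropped. A larger slack or timeout trades latency for accuracy.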

Status
Now:
–Users supply values for timeout and slack.
–As in examples, not always needed.
Goal:
–Automatically insert / adjust these values based on QoS specs.