UCLA, Fall CS240B Midterm Your Name: and your ID:

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

Semantics and Evaluation Techniques for Window Aggregates in Data Streams Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker SIGMOD.
CS240B Midterm Spring 2013 Your Name: and your ID: Problem Max scoreScore Problem 140% Problem 232% Problem 228% Total 100%
Analysis of : Operator Scheduling in a Data Stream Manager CS561 – Advanced Database Systems By Eric Bloom.
1 Efficient Temporal Coalescing Query Support in Relational Database Systems Xin Zhou 1, Carlo Zaniolo 1, Fusheng Wang 2 1 UCLA, 2 Simens Corporate Research.
CS4432: Database Systems II Query Operator & Algebraic Expressions 1.
What is Flow Control ? Flow Control determines how a network resources, such as channel bandwidth, buffer capacity and control state are allocated to packet.
The Design of the Borealis Stream Processing Engine Brandeis University, Brown University, MIT Magdalena BalazinskaNesime Tatbul MIT Brown.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
COMP 451/651 Optimizing Performance
ONE PASS ALGORITHM PRESENTED BY: PRADHYUMAN RAOL ID : 114 Instructor: Dr T.Y. LIN.
Unary Query Processing Operators Not in the Textbook!
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
ATLaS: A Complete Database Language for Streams Carlo Zaniolo, Haixun Wang Richard Luo,Jan-Nei Law et al. Documentation and software downloads:
Winter 2002Arthur Keller – CS 1807–1 Schedule Today: Jan. 24 (TH) u Subqueries, Grouping and Aggregation. u Read Sections Project Part 2 due.
1 Data Stream Management Systems Checkpoint CS240B Notes by Carlo Zaniolo UCLA CSD.
EECS 20 Chapter 3 Sections State Machines Continued Last time we Introduced the deterministic finite state machine Discussed the concept of state.
Avoiding Idle Waiting in the execution of Continuous Queries Carlo Zaniolo CSD CS240B Notes April 2008.
Finite State Machines – Page 1CSCI 1900 – Discrete Structures CSCI 1900 Discrete Structures Graphs and Finite State Machines Reading: Kolman, Sections.
STREAM The Stanford Data Stream Management System.
Lecture 11 Main Memory Databases Midterm Review. Time breakdown for Shore DBMS Source: “OLTP Under the Looking Glass”, SIGMOD 2008 Systematically removed.
7.5 Inverse Function 3/13/2013. x2x+ 3 x What do you notice about the 2 tables (The original function and it’s inverse)? The.
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
Chapter 16 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 13 – Query Evaluation.
CS4432: Database Systems II Query Processing- Part 2.
SCUHolliday - coen 1787–1 Schedule Today: u Subqueries, Grouping and Aggregation. u Read Sections Next u Modifications, Schemas, Views. u Read.
1 Semantics and Evaluation Techniques for Window Aggregates in Data Streams Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter Tucker This work.
SQL and Query Execution for Aggregation. Example Instances Reserves Sailors Boats.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
CS4432: Database Systems II Query Processing- Part 1 1.
SQL and Relational Algebra Edel Sherratt Nigel Hardy Horst Holstein.
1 Out of Order Processing for Stream Query Evaluation Jin Li (Portland State Universtiy) Joint work with Theodore Johnson, Vladislav Shkapenyuk, David.
Memory Buffering Techniques
Mining Data Streams (Part 1)
Practical Database Design and Tuning
Introduction to Analysis of Algorithms
Unary Query Processing Operators
Module 11: File Structure
Module 2: Intro to Relational Model
UCLA, Winter Sample from CS240B Past Midterms
CS257 Query Optimization.
Chapter 3 Introduction to SQL(3)
Databases : More about SQL
CPSC-310 Database Systems
Data stream as an unbounded table
Concept of Aggregation in SQL
Chapter 12: Query Processing
Chapter 2: Intro to Relational Model
CPSC-608 Database Systems
Lecture#7: Fun with SQL (Part 2)
Practical Database Design and Tuning
Approximate Frequency Counts over Data Streams
Chapter 2: Intro to Relational Model
Overview of Query Evaluation
Chapter 2: Intro to Relational Model
CS240B: Assignment1 Winter 2016.
Example of a Relation attributes (or columns) tuples (or rows)
Query Optimization Minimizing Memory and Latency in DSMS
Chapter 2: Intro to Relational Model
CS240B, Spring 2014 Task 2.2:  Using a syntax based on that of notes and reference 3 above, express a user-defined aggregate d_count to perform the exact.
Idle Waiting for slides
Continuous Query Languages for DSMS
CS240B Midterm: Winter 2017 Your Name: and your ID:
CPSC-608 Database Systems
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
Adaptive Query Processing (Background)
Performance.
Instructor: Zhe He Department of Computer Science
Chapter 8 Views and Indexes
Presentation transcript:

UCLA, Fall 2018. CS240B Midterm Your Name: and your ID: Problem Max score Score Problem A 40% Problem B 20% Problem C Problem D Total 100% Extra Credit: 5% Final score:

Problem A [20%+20%=40%]. Luggage items at airports are often delivered by conveyor belts, where sensors are used to monitor the presence/passage of each item through a checkpoint. The resulting data stream has the following form: sensed(Time, CPS, LID). This denotes that at a certain Time (in seconds) the sensor at checkpoint CPS had detected a luggage item with id LID (assume that we have a reading every second). The airport is interested in detecting the occurrences of the following two problems, along with the CPS of the checkpoint where they were detected: A1. A luggage item that seems to be stuck: i.e. has spent more than 5 minutes spent at the same CPS. Return the LID, the CPS and the current time A2. A luggage item has been on the conveyor belt for at least 6 minutes, during which the item has gone through the same CPS at least 2 times (an indication that the item is wondering in aimless cycles—but the cycles do not need to be consecutive or identical). Write an SQL-TS query to detect situation A1 and an SQL-TS query for situation A2 above. In both cases return the luggage LID, the CPS for the problem checkpoint, and the current time at the expiration of the 6 minutes. (PS: you can write SQL-MR if you prefer) Select D.Time, D.CPS, D.LID From sensed Partitioned by LID ordered by Time AS (D+) Where D.CPS=previous(D.CPS) AND D.Time> first(D.Time)+300 Select D.Time, D.CPS, D.LID From sensed Partitioned by LID ordered by Time AS (S1 O1+ E1 X* S2 O2+ E2 W* Z) Where O1.CPS <>S1.CPS AND E1.CPS=S1.CPS and O2.CPS <>S2.CPS AND E2.CPS=S2.CPS and Z.Time > S.Time+360

Problem B [20 %]: Define a UDA that detects the occurrence of condition 1 from Problem A. Show the definition of the UDA using the notation used in the homeworks. The UDA will be called with a “partition by LID, unlimited preceding” clause. Window stuck(TimeIn timestamp, CPSin char) : (Stime timestamp, Sensor char) { Table previous(STime timestamp, SCPS char); initialize: {insert into previous value(Time, CPS)} Iterate: { update previous set STime= Timein, SCPS= CPSin) where CPSin<>SCPS; insert into return (STime, SCPS) select from previous where CPSin=SCPS and Timein = STime+301 % i.e., 5 minutes +1sec }

We will break to optimize memory and the response time will suffer. Problem C. 20% Source2 B Source3 C Sink A Source1 The processing speeds of these operators are: A: 100 tuple/sec B: 100 tuples/sec C: 100 tuples/sec U: 200 tuples/sec Layout: ||| U ||| We have timestamped tuples. Also we assume that the buffers feeding A, B, and C contain respectively 1000 tuples, 400 tuples and 600 tuples. Also B and C deliver one output tuple for each input tuple, while A is a selection that eliminates about 50% of its input tuples. All tuples have the same size. C1: What is the time T required to process all tuples? In which order should the operators be scheduled to minimize the memory usage? (1) Illustrate your answer with a memory diagram and (2) estimate the resulting average number of tuples kept in memory during time T. You can also assume that the idle waiting caused by U is negligible. Time A B+C+U 10s 15s 2000 1000 25s C1: B and C have the same memory release rate and they must operate together because of the union. Thus B and C process their 600+400 =1000 tuples in 10 sec. along with U that takes 5 for a total of 15 s. So A+B+U releases 1000 tuple of memory in 15 secds. 100/15= 66.6 tples/sec. A releases 100 tuples/sec and should go first. It will take 10 secs. The total area of 1,000x10+ 1000x10/2 + 1000x15/2= 22,500. The average memory usage over the 25 seconds is 22,500/25= 900 tuples Answer: If there is no idle-waiting the time is all spent in processing tuples. Thus if N1=N1=0.5 N , these N tuples will be processed in time 0.5*N/1000+ 0.5*N/1000+ 0.5*N/500+ 0.5*N/1000= 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 =0 , so the union group process the N tuples in source1 in N/1000+N/1000= N/500 much faster. We will break to optimize memory and the response time will suffer. Now say that N1 =0 , so the union group process the N tuples in source2 in N/500+N/1000= 3*N/1000 1/333 . Then C

We will break to optimize memory and the response time will suffer. I i Problem D. 20% The processing speeds of these operators are: A: 100 tuples/sec, B: 100 tuples/sec, C: 25 tuple/sec Sink A Source B C B1 B2 Layout II: Problem D: Consider Layout II and assume that the buffers feeding A contains 1000 tuples. A is a selection operator that discards 50% of its input tuples. However, for each tuple in its input, B delivers one tuple to its output, and so does C. D1. We wan to minimize the average latency measured in the number of missing query answers. Should we then partition this network and, if so, how? Illustrate your answer with a diagram and estimate the resulting average latency. Time A+B C 15s 20s 1000 500 35s D1: to process 1000 tuples, A takes 10 sec. Then to output 500 tuples B takes 5 sec and C takes 20. So the total output a is is 1000 answers in 35 sec. Unbroken it delivers at the rate of 1000/35 tuples per sec. If we cut A+B delivers 500 answers in 10+5 sec. i.e. 1000/30 faster we should break it Then to deliver 500 tuples, C takes 500/25= 20s. Avg Latency, i.e. the time taken by a tuple on the average. Area/1000 Area= 500x15+500x15/2+ 500x20/2= 500x32.5 Avg. Latency= 500X32.5/1000= 16.25 s Answer: If there is no idle-waiting the time is all spent in processing tuples. Thus if N1=N1=0.5 N , these N tuples will be processed in time 0.5*N/1000+ 0.5*N/1000+ 0.5*N/500+ 0.5*N/1000= 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 =0 , so the union group process the N tuples in source1 in N/1000+N/1000= N/500 much faster. We will break to optimize memory and the response time will suffer. Now say that N1 =0 , so the union group process the N tuples in source2 in N/500+N/1000= 3*N/1000 1/333 . Then C

range 300 preceding) as SNs ) where SNs=1. I i Extra Credit. 5% Rather than defining a new aggregate, express problem A1 using standard SQL OLAP functions %Idea: a window of size 300 seconds, in which there is only one value of CPS select, LID, Time from ( select LID, Time, count(DISTINCT CPS) over (partition by LID order by Time, range 300 preceding) as SNs ) where SNs=1. Answer: If there is no idle-waiting the time is all spent in processing tuples. Thus if N1=N1=0.5 N , these N tuples will be processed in time 0.5*N/1000+ 0.5*N/1000+ 0.5*N/500+ 0.5*N/1000= 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 =0 , so the union group process the N tuples in source1 in N/1000+N/1000= N/500 much faster. We will break to optimize memory and the response time will suffer. Now say that N1 =0 , so the union group process the N tuples in source2 in N/500+N/1000= 3*N/1000 1/333 . Then C

Alternative serial Schedules: Memory Used B Time A B+C+U 10s 15s 2000 C 1000 10s 25s