UCLA, Winter Sample from CS240B Past Midterms

UCLA, Winter 2017. Sample from CS240B Past Midterms

Problem A. 36%

[Figure: query graph with two paths, Source → A → B → C and Source → D → E → F.]

The times required by these operators to process one tuple are: A, B, D, E: 10 ms; C: 20 ms; F: 30 ms. A, C and D are selection operators that send 50% of their input tuples to output and discard the rest. The other operators let 100% of their input through.

A1: Show how the execution is scheduled to minimize average tuple latency, and show the resulting latency diagram, assuming that initially you have 1000 tuples in each input.

Problem A1, latency minimization: A+B+C takes 10+5+10 = 25 s to complete and delivers 250 tuples, i.e., 10 tuples per second. D+E+F takes 10+5+15 = 30 s to complete and delivers 500 tuples, i.e., 16.6 tuples per second. Thus, to minimize latency, the second path is executed first.
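The rate comparison in A1 can be checked with a short Python sketch. The function name `path_output_rate` and the (cost, selectivity) encoding of the operators are illustrative assumptions, not part of the exam. The sketch also assumes each operator drains its whole input before the next one starts.

```python
def path_output_rate(n_tuples, operators):
    """Return (total_seconds, tuples_delivered, tuples_per_second) for one path.

    operators: list of (cost_ms_per_tuple, selectivity) in pipeline order.
    """
    seconds = 0.0
    remaining = float(n_tuples)
    for cost_ms, selectivity in operators:
        seconds += remaining * cost_ms / 1000.0  # time to process the current input
        remaining *= selectivity                 # tuples passed on to the next operator
    return seconds, remaining, remaining / seconds


# Hypothetical encoding of the two paths from Problem A.
top = path_output_rate(1000, [(10, 0.5), (10, 1.0), (20, 0.5)])     # A, B, C
bottom = path_output_rate(1000, [(10, 0.5), (10, 1.0), (30, 1.0)])  # D, E, F

# The path with the higher output rate is scheduled first.
first = "D+E+F" if bottom[2] > top[2] else "A+B+C"
```

Under these assumptions the top path delivers 10 tuples per second and the bottom path about 16.7, so `first` is `"D+E+F"`, matching the solution above.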

A2: Show how the execution will be scheduled to minimize memory. While memory release is the primary objective, you should still use latency reduction as your secondary one. Show the memory release diagram.

Problem A2: A has a memory release rate (mrr) of 500 tuples in 10 s, i.e., 50 tuples per second. B alone has mrr = 0, and B+C has mrr = 1000/30 = 500/15 = 33.33 tuples per second. So the path is broken after A. The bottom path is also broken after D (D has mrr = 50 tuples/s). E alone has mrr = 0, and E+F has mrr = 1000/40 = 500/20 = 25 tuples/s. Thus, the overall schedule that maximizes mrr is: A and D first (in either order), then B+C, and finally E+F.

Problem A3: Maximize memory release first and minimize latency second. The mrr determines the schedule, leaving no room for latency optimization. But if F took 20 ms per tuple, E+F would have the same mrr as B+C (500/15 = 33.33 tuples/s), and E+F would then go before B+C, which only returns 50% of its tuples.
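The memory release rates used in A2 can be reproduced with a small Python sketch. The helper `segment_mrr` and the `ends_at_sink` flag are hypothetical names introduced here. The sketch assumes that a tuple frees memory either when a selection discards it or when it leaves the system at the end of a path.

```python
def segment_mrr(n_in, operators, ends_at_sink):
    """Memory release rate (tuples/s) of a run of consecutive operators.

    operators: list of (cost_ms_per_tuple, selectivity) in pipeline order.
    ends_at_sink: True if the segment's output leaves the system.
    """
    seconds, remaining = 0.0, float(n_in)
    for cost_ms, selectivity in operators:
        seconds += remaining * cost_ms / 1000.0
        remaining *= selectivity
    # Tuples still buffered after the segment are not released yet.
    released = n_in if ends_at_sink else n_in - remaining
    return released / seconds


segments = {
    "A": segment_mrr(1000, [(10, 0.5)], False),
    "D": segment_mrr(1000, [(10, 0.5)], False),
    "B+C": segment_mrr(500, [(10, 1.0), (20, 0.5)], True),
    "E+F": segment_mrr(500, [(10, 1.0), (30, 1.0)], True),
}

# Greedy schedule: highest memory release rate first (ties keep insertion order).
schedule = sorted(segments, key=segments.get, reverse=True)
```

With these inputs the rates are 50, 50, 33.33 and 25 tuples/s, and `schedule` is `["A", "D", "B+C", "E+F"]`. Calling `segment_mrr(500, [(10, 1.0)], False)` for B alone gives 0, which is why the path is broken after A rather than after B.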

Problem B, 32%: TSQL2 introduced a new temporal aggregate that, given empsal(Eno, Sal, Tstart, Tend), returns for each employee the periods in which his/her salary was growing. Here, instead of those periods, we want to know the total raises during those periods. Salaries can increase or decrease, but an employee cannot be rehired.

B1. Write the aggregate totraise(Sal): DeltaSal to perform such a computation on the input stream empsal(Eno, Sal, Tstart, Tend), ordered by Tstart and partitioned by Eno.

Solution B1:

aggregate totraise(Sal real): (DeltaSal real)
table history(Previous real, Initial real)
Initialize: { insert into history values (Sal, Sal) }
Iterate: {
    update history set Initial=Previous, Previous=Sal
        where Sal > Previous and Initial=0;   % beginning of a growing phase, measured from the previous salary
    update history set Previous=Sal
        where Sal > Previous and Initial>0;   % a growing phase continues
    insert into return select Previous - Initial from history
        where Sal < Previous and Initial>0;   % a growing phase ends: return its total raise
    update history set Previous=Sal, Initial=0
        where Sal < Previous                  % a shrinking phase starts or continues
}
Terminate: { insert into return select Previous - Initial from history where Initial>0 }

[Picture: salary over time, with time on the X-axis, Sal on the Y-axis, and the time points TA, TB, TC, TD.]

B2. It is partially blocking (see the picture above). We want the difference between the salaries at TC and TA, and we can return that value when we see the third segment. But if that segment is missing, we have to wait, possibly until Terminate. Punctuation timestamps could be useful here.
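As a cross-check of the intended behavior of totraise, here is a minimal Python analogue for a single employee (one partition). It is a sketch, not the course's ESL code. The class name `TotRaise`, the use of `None` instead of 0 as the "not growing" marker, and the choice to suppress zero raises are all assumptions made for this illustration.

```python
class TotRaise:
    """Python analogue of the totraise aggregate for one employee."""

    def __init__(self, sal):
        # INITIALIZE: a growing phase may start at the very first salary.
        self.previous = sal
        self.initial = sal

    def iterate(self, sal):
        """ITERATE: process the next salary and return the raises emitted now."""
        out = []
        if sal > self.previous:
            if self.initial is None:
                # Growth restarts from the previous (lowest) salary.
                self.initial = self.previous
        elif sal < self.previous:
            if self.initial is not None and self.previous > self.initial:
                out.append(self.previous - self.initial)
            self.initial = None  # a shrinking phase starts or continues
        self.previous = sal
        return out

    def terminate(self):
        """TERMINATE: flush a growing phase that is still open."""
        if self.initial is not None and self.previous > self.initial:
            return [self.previous - self.initial]
        return []


def total_raises(salaries):
    """Run the aggregate over the salaries of one employee, ordered by Tstart."""
    agg = TotRaise(salaries[0])
    results = []
    for sal in salaries[1:]:
        results.extend(agg.iterate(sal))
    results.extend(agg.terminate())
    return results


print(total_raises([100, 120, 150, 130, 110, 140]))  # prints [50, 30]
```

The second raise (30) is only produced by `terminate`, which illustrates why the aggregate is partially blocking: the last growing phase cannot be closed until the end of the input is known.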

Problem C. 32%

Another solution to the previous problem is to have a stream of events such as evnt(Eno, Time, Sal), where Sal=0 denotes that the employee just quit the company, whereas Sal>0 denotes the salary of a just-hired employee, or the salary just updated for a current employee.

C1. Write a SQL-MR query to express the raise computation described in B. Remember that salaries can increase or decrease.

C2. Determine the blocking/non-blocking properties of your query, assuming that the history of each employee in evnt terminates with a quit (Sal=0) tuple.

Solution C1:

select eno, FnlSal - InitSal as Raise
from evnt match-recognize(
    partition by Eno
    order by Time
    measures X.Eno as eno, X.Sal as InitSal, previous(Z.Sal) as FnlSal
    one row per match
    after match skip past last row
    maximal match
    pattern (X Y* Z)
    define Y as Y.Sal > previous(Y.Sal)
)

Solution C2: In the previous problem, at time TC we do not know whether we are going to see another tuple for the same employee. But here, if Sal=0, the employee just quit and we know that no more tuples are coming: blocking is no longer needed.
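The non-blocking argument of C2 can be illustrated with a Python generator over the event stream. This is a sketch of the idea rather than a translation of the pattern query. The function name `raises_from_events` and the sample data are hypothetical, and a run is assumed to end at any tuple whose salary is not strictly higher than the previous one, including the quit tuple.

```python
def raises_from_events(events):
    """Yield (eno, emit_time, raise) as soon as a growing run is known to be over.

    events: iterable of (eno, time, sal) ordered by time; sal == 0 means the
    employee quit and no further tuples will arrive for that employee.
    """
    state = {}  # eno -> (initial salary of the current run, previous salary)
    for eno, time, sal in events:
        if eno not in state:
            state[eno] = (sal, sal)      # first tuple of this partition
            continue
        initial, previous = state[eno]
        if sal > previous:
            state[eno] = (initial, sal)  # the run keeps growing
            continue
        if previous > initial:
            # The run was ended by a decrease or by the quit tuple.
            yield eno, time, previous - initial
        if sal == 0:
            del state[eno]               # partition is finished
        else:
            state[eno] = (sal, sal)      # a new run may start here


# Hypothetical sample stream, ordered by time.
events = [
    (1, 1, 100), (2, 1, 200),
    (1, 2, 120), (2, 2, 180),
    (1, 3, 150), (2, 3, 0),
    (1, 4, 0),
]
print(list(raises_from_events(events)))  # prints [(1, 4, 50)]
```

The raise for employee 1 is emitted at time 4, when the quit tuple arrives, so nothing has to wait for the end of the stream.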

Extra Problem B

[Figure: Layout I, showing Source1, Source2, Source3, the operators A, B, C, and the Sink.]

The processing speeds of these operators are: A: 100 tuples/sec; B: 100 tuples/sec; C: 25 tuples/sec.

Consider Layout I and assume that the buffers feeding A, B, and C contain respectively 1000 tuples, 400 tuples and 600 tuples. Also, B and C deliver one output tuple for each input tuple, while A is a selection that eliminates about 50% of its input tuples.

C1: In which order should the operators be scheduled to minimize the average memory usage? Illustrate your answer with a memory diagram and estimate the resulting average memory usage, assuming that each tuple takes 100 bytes.

C2: In which order should the operators be scheduled to minimize the average latency, measured in the number of missing query answers? Illustrate your answer with a latency diagram that shows the missing tuples over time, and estimate the resulting average latency.

Solution C1: A and B have the same memory release rate, so they form a first segment that processes 1400 tuples in 14 seconds; C goes last.
Area(A+B) = 14*1400/2 = 9800
Area of rectangle (the 600 tuples of C waiting during those 14 sec) = 600*14 = 8400
C processes 600 tuples in 600*4/100 = 24 sec, so Area(C) = 24*600/2 = 7200
Total area = 7200+8400+9800 = 25400 tuples*sec.
Average: 25400/38 = 668 tuples, i.e., about 66,800 bytes at 100 bytes per tuple.

Solution C2: A delivers 50 answer tuples per sec (100 tuples/sec, of which 50% are eliminated), B delivers 100 per sec, and C delivers 25 per sec. So B goes first and completes in 4 sec, A goes next and completes in 10 sec, and C goes last and completes in 24 sec. The missing answers start at 500+400+600 = 1500 and drop to 1100 after B, to 600 after A, and to 0 after C:
4*(1100+200) + 10*(600+250) + 24*300 = 5200+8500+7200 = 20,900 tuples*sec.
On average, 20,900/38 = 550 tuples are missing over the 38 sec.

Answer: If there is no idle-waiting, the time is all spent in processing tuples. Thus, if N1=N2=0.5*N, these N tuples will be processed in time 0.5*N/1000 + 0.5*N/1000 + 0.5*N/500 + 0.5*N/1000 = 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2=0, so the union group processes the N tuples in source1 in N/1000+N/1000 = N/500, much faster. We will break to optimize memory, and the response time will suffer. Now say that N1=0, so the union group processes the N tuples in source2 in N/500+N/1000 = 3*N/1000, i.e., 1/333 sec per tuple. Then C
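The area computation behind the C1 memory diagram of Extra Problem B can be sketched in Python. The helper `average_backlog` is a hypothetical name. It assumes, as the C1 solution does, that the buffered tuples drain linearly during each stage and that A and B together drain 1400 tuples in 14 sec before C drains its 600 tuples in 24 sec.

```python
def average_backlog(stages):
    """Return (area, total_seconds, average) for a piecewise-linear backlog.

    stages: list of (tuples_drained, seconds) in schedule order; the backlog
    starts at the sum of all tuples and falls linearly during each stage.
    """
    backlog = sum(drained for drained, _ in stages)
    area = 0.0
    total_seconds = 0.0
    for drained, seconds in stages:
        # Trapezoid: mean of the backlog at the start and end of the stage.
        area += seconds * (backlog + (backlog - drained)) / 2.0
        backlog -= drained
        total_seconds += seconds
    return area, total_seconds, area / total_seconds


area, total_seconds, avg_tuples = average_backlog([(1400, 14), (600, 24)])
avg_bytes = avg_tuples * 100  # 100 bytes per tuple, as stated in C1
```

Here `area` is 25400 tuple-seconds over 38 seconds, so `avg_tuples` is about 668 and `avg_bytes` about 66,800, in line with the C1 figures. The first trapezoid (18200) is the sum of the triangle (9800) and the rectangle (8400) from the solution.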