dop, d1, d2, reconst, sop, P1, P2

Presentation transcript:

[Diagram labels: dop; d1, d2; reconst; sop; P1, P2]

[Figure: auction document. auction > item (id: 51, desc: Trek 59 Superlight Road Bike, bid (bidder: Joe, amt: $15)), item (id: 433, desc: 1971 Martin Guitar)]

[Figure: new bid. auction > item (id: 51) > bid (bidder: Sue, amt: $155)]

[Figure: merged document. auction > item (id: 51, desc: Trek 59 Superlight Road Bike, bid (bidder: Joe, amt: $15), bid (bidder: Sue, amt: $155)), item (id: 433, desc: 1971 Martin Guitar)]

Differential Nest

Each partial result is a pair (Old Value, New Value), where each value has the form (Subject, Title set).

Input: (Google, Title1), (Microsoft, Title2), (Microsoft, Title3)
Produce partial result: (null, null, Google, {Title1}), (null, null, Microsoft, {Title2, Title3})

Input: (Google, Title4)
Partial result: (Google, {Title1}, Google, {Title1, Title4})

Input: (Google, Title5)
Partial result: (Google, {Title1, Title4}, Google, {Title1, Title4, Title5})

But what you'd really like to send is (Google, {Title5}) and "merge" it with (Google, {Title1, Title4}).

Merge result: Subject: Google; Title: Title1; Title: Title4; Title: Title5
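The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `differential_nest` and `merge_delta`, the batch-per-partial-result model, and the use of lists for title sets are all hypothetical and not taken from the presentation.

```python
def differential_nest(batches):
    """Emit (old_subject, old_titles, new_subject, new_titles) per subject touched in a batch."""
    groups = {}          # subject -> titles seen so far
    results = []
    for batch in batches:
        touched = {}     # subject -> snapshot of its titles before this batch
        for subject, title in batch:
            if subject not in touched:
                old = groups.get(subject)
                touched[subject] = None if old is None else list(old)
            groups.setdefault(subject, []).append(title)
        for subject, old in touched.items():
            old_subject = None if old is None else subject
            results.append((old_subject, old, subject, list(groups[subject])))
    return results


def merge_delta(state, delta):
    """Alternative: ship only the new titles and merge them into the receiver's state."""
    subject, titles = delta
    current = state.setdefault(subject, [])
    for title in titles:
        if title not in current:
            current.append(title)
    return state


batches = [
    [("Google", "Title1"), ("Microsoft", "Title2"), ("Microsoft", "Title3")],
    [("Google", "Title4")],
    [("Google", "Title5")],
]
partials = differential_nest(batches)

receiver = {"Google": ["Title1", "Title4"]}
merge_delta(receiver, ("Google", ["Title5"]))
```

The differential form resends the whole old and new group on every update, while the delta form sends one small tuple and relies on the receiver to merge it.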

Merge Example

Auction Document:
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500
  item
    iid: 433
    desc: 1971 Martin Guitar

New Bid:
auction
  item
    iid: 501
    bid
      bidder: Sue
      amt: $1550

Merged Document:
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500
    bid
      bidder: Sue
      amt: $1550
  item
    iid: 433
    desc: 1971 Martin Guitar

Legend: Combined, Inserted, Used in Match

Merge Template (MT)

A Merge Template is an XML document consisting of a tree of Element Merge Templates (EMT).
An EMT is a triplet containing: (name, local key, content combine function).

(auction, [], NoContentNoAttrs)
  (item, [iid], NoContentNoAttrs)
    (iid, [], ExactMatch)
    (desc, [], ShallowContent - Replace)
    (bid, [bidder, amt], NoContentNoAttrs)
      (bidder, [], ExactMatch)
      (amt, [], ExactMatch)

Example document: auction > item (iid:501) > bid (bidder: Sue, amt: $1550)
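A minimal sketch of how a template of this shape could drive a merge, assuming a simple in-memory element model. The classes `Elem` and `EMT`, the helpers `leaf`, `key_of` and `merge`, and the exact behaviour of each combine function are illustrative assumptions, not the presentation's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    name: str
    text: str = None
    children: list = field(default_factory=list)


@dataclass
class EMT:
    """Element Merge Template: (name, local key, content combine function)."""
    name: str
    key: list
    combine: str
    children: list = field(default_factory=list)


def leaf(name, text):
    return Elem(name, text)


def key_of(elem, emt):
    """Local key value: text of the child elements named in the template key."""
    return tuple(c.text for k in emt.key for c in elem.children if c.name == k)


def merge(a, b, emt):
    """Merge b into a copy of a; a and b are assumed to match on name and key."""
    if emt.combine == "ExactMatch":
        assert a.text == b.text, "ExactMatch content must agree"
        text = a.text
    elif emt.combine == "ShallowContent-Replace":
        text = b.text if b.text is not None else a.text
    else:  # NoContentNoAttrs: the element carries no content of its own
        text = None
    templates = {t.name: t for t in emt.children}
    merged = list(a.children)
    for child in b.children:
        t = templates[child.name]
        for i, existing in enumerate(merged):
            if existing.name == child.name and key_of(existing, t) == key_of(child, t):
                merged[i] = merge(existing, child, t)   # same key: combine
                break
        else:
            merged.append(child)                        # no match: insert
    return Elem(a.name, text, merged)


MT = EMT("auction", [], "NoContentNoAttrs", [
    EMT("item", ["iid"], "NoContentNoAttrs", [
        EMT("iid", [], "ExactMatch"),
        EMT("desc", [], "ShallowContent-Replace"),
        EMT("bid", ["bidder", "amt"], "NoContentNoAttrs", [
            EMT("bidder", [], "ExactMatch"),
            EMT("amt", [], "ExactMatch"),
        ]),
    ]),
])

D1 = Elem("auction", None, [
    Elem("item", None, [
        leaf("iid", "501"), leaf("desc", "Trek Madone 5.9 Bike"),
        Elem("bid", None, [leaf("bidder", "Dave"), leaf("amt", "$1500")]),
    ]),
    Elem("item", None, [leaf("iid", "433"), leaf("desc", "1971 Martin Guitar")]),
])
D2 = Elem("auction", None, [
    Elem("item", None, [
        leaf("iid", "501"),
        Elem("bid", None, [leaf("bidder", "Sue"), leaf("amt", "$1550")]),
    ]),
])
D3 = merge(D1, D2, MT)
```

With these inputs the item keyed by iid 501 is matched and combined, and Sue's bid is inserted next to Dave's because its local key (bidder, amt) differs.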

View Merge as Least Upper Bound

D3 is the "smallest" document that "contains" D1 and D2.

Auction Document (D1): auction > item (iid:501, desc: Trek Madone 5.9 Bike, bid (bidder: Dave, amt: $1500)), item (iid:433, desc: 1971 Martin Guitar)

New Bid (D2): auction > item (iid:501) > bid (bidder: Sue, amt: $1550)

Merged Document (D3): auction > item (iid:501, desc: Trek Madone 5.9 Bike, bid (bidder: Dave, amt: $1500), bid (bidder: Sue, amt: $1550)), item (iid:433, desc: 1971 Martin Guitar)

What can go wrong?

No unique result (no Least Upper Bound (LUB)).
Keys in the Merge Template eliminate the ambiguity.
We know D4 is the correct result if we know iid is a key for item.
Use iid as the key to eliminate D3.

[Figure: D1 is auction > item (iid:501); D2 is auction > item (iid:433); D3 and D4 are the two candidate merge results, one with a single item holding both iid values and one with two separate items.]

Non-Key-Respecting Documents

Merge Template T:
(auction, [], NoContentNoAttrs)
  (item, [iid], NoContentNoAttrs)
    (iid, [], ExactMatch)

Containment with respect to T: D is contained in D′ if there is a structure-preserving mapping from D into D′.
D3 is not key-respecting with respect to T and should not be in LT.

[Figure: documents D1 (auction > item (iid:501)) and D2 (auction > item (iid:433)) with the candidate results D3 and D4.]

Merge-Lattice Theorem Overview

Associate each document D with a unique path set ρ(D).
ρ(D1) ∪ ρ(D2) is a Least Upper Bound (LUB) for ρ(D1) and ρ(D2).
ρ(D1) ∪ ρ(D2) is the "smallest" set that contains both ρ(D1) and ρ(D2).
Intuition: the merge of D1 and D2 should be the document associated with ρ(D1) ∪ ρ(D2).

[Diagram: D1, D2 and D3 in LT, mapped by ρ1 and ρ2 to the path sets ρ(D1), ρ(D2) and ρ(D1) ∪ ρ(D2).]

Document and Path Set

Use the Merge Template and a document to create a path set.
There is one element in the path set for each element in the document.
Each path is composed of a rooted key value and the element content.
Path set order (subset) is identical to document containment order.

Document: auction > item (iid:501, desc: Trek Madone 5.9 Bike, bid (bidder: Dave, amt: $1500))

Path set:
auction[]:
auction[].item[iid:501]:
auction[].item[iid:501].iid[]:501
auction[].item[iid:501].desc[]:Trek Madone 5.9 Bike
auction[].item[iid:501].bid[bidder:Dave,amt:$1500]:
auction[].item[iid:501].bid[bidder:Dave,amt:$1500].bidder[]:Dave
auction[].item[iid:501].bid[bidder:Dave,amt:$1500].amt[]:$1500

Example: in auction[].item[iid:501].desc[]:Trek Madone 5.9 Bike, the rooted key value is auction[].item[iid:501].desc[] and the element content is Trek Madone 5.9 Bike.
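The path-set construction, and the idea that the union of two path sets is their least upper bound under subset order, can be sketched as follows. The tuple-based document model, the `KEYS` table and the function name `path_set` are assumptions made for illustration; only the path syntax follows the slide.

```python
# Local keys per element name, taken from the Merge Template shown earlier.
KEYS = {
    "auction": [], "item": ["iid"], "iid": [], "desc": [],
    "bid": ["bidder", "amt"], "bidder": [], "amt": [],
}


def leaf(name, text):
    return (name, text, [])


def path_set(elem, prefix=""):
    """Return one path per element: rooted key value, a colon, then the content."""
    name, text, children = elem
    key = ",".join(f"{k}:{c[1]}" for k in KEYS[name] for c in children if c[0] == k)
    rooted = f"{prefix}{name}[{key}]"
    paths = {f"{rooted}:{text or ''}"}
    for child in children:
        paths |= path_set(child, rooted + ".")
    return paths


D1 = ("auction", None, [
    ("item", None, [
        leaf("iid", "501"),
        leaf("desc", "Trek Madone 5.9 Bike"),
        ("bid", None, [leaf("bidder", "Dave"), leaf("amt", "$1500")]),
    ]),
])
D2 = ("auction", None, [
    ("item", None, [
        leaf("iid", "501"),
        ("bid", None, [leaf("bidder", "Sue"), leaf("amt", "$1550")]),
    ]),
])

rho1 = path_set(D1)
rho2 = path_set(D2)
lub = rho1 | rho2   # smallest set containing both path sets
```

Because both documents share the rooted keys for auction, item and iid, those three paths coincide and appear once in the union, which is what makes the two bids end up under the same item.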

Proof that D3 is in L

Construct D3 from ρ(D1) ∪ ρ(D2), then show that D3 is compatible and key-respecting with respect to T.

[Diagram: D1, D2 and D3 are related to their path sets by the mappings ρ1, ρ2 and σ, together with the inverses ρ1⁻¹, ρ2⁻¹ and σ⁻¹ (= ρ3).]

Figure 2: Using Panes to Evaluate Query 1

Input tuples (item-id, bid-price, timestamp):
t1 = (10, $9.12, 12:15:52 PM)
t2 = (11, $8.93, 12:16:42 PM)
t3 = (11, $9.20, 12:16:49 PM)

Pane results (pane-max, pane-timestamp):
p1 = ($9.12, 12:15:52 PM)
p2 = ($9.20, 12:16:49 PM)

Window results (win-max, timestamp):
w1 = ($9.12, 12:15:52 PM)
w2 = ($9.20, 12:16:49 PM)

Plan: streamscan, then window-max (bid-price) with RANGE = 1 min, SLIDE = 1 min, WATTR = timestamp, then a window-level aggregate with RANGE = 4 min, WATTR = pane-timestamp.
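The two-level evaluation in this figure can be sketched as below, using the three tuples from the slide. The function names, the use of seconds since midnight as timestamps, and the choice to emit a window only for panes that contain data are simplifying assumptions, not details from the presentation.

```python
def pane_max(tuples, pane_len):
    """Pane level: max bid-price per tumbling pane of pane_len seconds."""
    panes = {}
    for ts, price in tuples:
        pane_id = ts // pane_len
        panes[pane_id] = max(panes.get(pane_id, price), price)
    return panes


def window_max(panes, panes_per_window):
    """Window level: max over the panes covered by each sliding window."""
    result = {}
    for last in sorted(panes):
        first = last - panes_per_window + 1
        members = [panes[p] for p in range(first, last + 1) if p in panes]
        result[last] = max(members)
    return result


def seconds(h, m, s):
    return h * 3600 + m * 60 + s


bids = [
    (seconds(12, 15, 52), 9.12),   # t1
    (seconds(12, 16, 42), 8.93),   # t2
    (seconds(12, 16, 49), 9.20),   # t3
]
panes = pane_max(bids, pane_len=60)                 # 1-minute panes
windows = window_max(panes, panes_per_window=4)     # 4-minute windows
```

Each tuple is touched once at the pane level, and each window then combines at most four pane results instead of rescanning every tuple in its range.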

Figure 5: Cost Ratio of Pane vs. Original Window-Id Approach

Figure 6: Band Disorder

Figure 7: Block-sorted Disorder

Figure 8: Latency vs. Accuracy Band-Disorder (average error percentage)

Figure 9: Latency vs. Accuracy Block-Sorted-Disorder (average error percentage)

Figure 10: Latency vs. Accuracy Block-Sorted-Disorder (percentage of wrong answers)

Figure 1: Four detection stations in a detection task: SUSPECT, PRESUMPTIVE, CONFIRMED, INTERCEPTED (from Yonnel Gardes, The Transpo Group, Kirkland, WA, with permission)

Figure 4: Example of insertion, initialization, and update of bins as new tuples arrive.

Figure 8 (b): Execution Time: WID versus Buffering – Zoom-in

Figure 10: Latency vs. Accuracy Block-Sorted-Disorder (percentage of incorrect answers), external punctuation

[Figure: query plans for count(*) using window-ids and panes.]

Operators:
streamscan
bucket: RANGE = 1 min, SLIDE = 1 min, WINATTR = timestamp
count(*) (group on pane-id, auction-site)
bucket: RANGE = 5, SLIDE = 1, WINATTR = pane-id
count(*) (group on window-id, ...)

Example tuples:
(sensor-id, room-id, timestamp, temperature): T0 = (2, C, 00:05:58PM, 80°), T1 = (2, C, 00:06:05PM, 82°)
(item-id, bid-price, auction-site, timestamp, pane-id): T0 = (3, 5, 00:05:58PM, 80°, 5-5), T1 = (2, C, 00:06:05 PM, 83°, 6-6), P1 = (*, C, *, *, 5)
(auction-site, pane-id, count, timestamp): M0 = (3, 5, 8, 00:05:42PM)
(auction-site, pane-id, count, timestamp, window-id): M0 = (3, 5, 8, 00:05:42PM, 6-10)