1
Data Streams
S. Sudarshan, CS632 Course, Mar 2004, IIT Bombay
2
Overview
Two approaches:
- Motwani et al. [CIDR03]: concentrate on query processing and approximations
- Cherniack et al. [CIDR03]: concentrate on system architecture, with data-flow style processing; users create a DAG of operators
3
Motwani et al.
- Query language (SQL extension)
- Semantics of stream queries
- Query plans with sharing and approximation
- Reducing memory requirements for query processing by exploiting constraints on data
- Techniques for static and dynamic approximation of query results
- Resource allocation to queries
- Techniques for data compression
4
Motwani: Language and Semantics
- Relations and streams
  - Streams are timestamped: a tuple s arriving at time t is represented as <t, s>
  - Istream and Dstream operators create a stream from the tuples inserted into / deleted from a relation
- The query language implicitly converts from streams to relations and vice versa
  - Any query with a stream at the outer level gives a stream as output
  - Window operations convert streams to relations
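To make the stream/relation conversions concrete, here is a minimal sketch, assuming discrete integer timestamps and relations modelled as bags (`collections.Counter`). The names `range_window`, `istream` and `dstream` are hypothetical, not STREAM's API: the window turns a stream into a relation, Istream emits the tuples inserted into a relation at each instant, and Dstream emits the tuples deleted.

```python
from collections import Counter

def range_window(stream, t, size):
    """Stream-to-relation: the bag of tuples whose timestamp lies in (t - size, t]."""
    return Counter(s for ts, s in stream if t - size < ts <= t)

def istream(snapshots):
    """Relation-to-stream: emit <t, s> for every tuple s inserted since the previous instant.
    snapshots: list of (t, Counter) holding the relation's contents at each instant, in time order."""
    out, prev = [], Counter()
    for t, rel in snapshots:
        for s, n in (rel - prev).items():   # bag difference keeps only the added tuples
            out.extend([(t, s)] * n)
        prev = rel
    return out

def dstream(snapshots):
    """Relation-to-stream: emit <t, s> for every tuple s deleted since the previous instant."""
    out, prev = [], Counter()
    for t, rel in snapshots:
        for s, n in (prev - rel).items():   # bag difference keeps only the removed tuples
            out.extend([(t, s)] * n)
        prev = rel
    return out
```

For the stream `[(1, "a"), (2, "b"), (4, "c")]` and a window of size 2 evaluated at instants 1 to 5, applying Istream to the windowed relation recovers the original stream, while Dstream reports each tuple at the instant it expires from the window.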
5
Motwani: Operators and Queues
- Operators: create, changemem, run
- Synopsis: summarizes the tuples seen on a stream
  - Interface: create, changemem, insert/delete, query
- Resource sharing: sharing of synopses
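The synopsis interface can be sketched as a small class. This is a hypothetical illustration, assuming a synopsis that simply keeps the most recent tuples of a stream under a memory budget counted in tuples; `WindowSynopsis` is not a name from the paper. Shrinking the budget with `changemem` evicts the oldest tuples, which is one way reduced memory turns exact answers into approximate ones.

```python
from collections import deque

class WindowSynopsis:
    """Hypothetical synopsis: keeps the latest tuples of a stream, bounded by a memory budget."""

    def __init__(self, mem):                  # create
        self.mem = mem                        # budget, measured in tuples for simplicity
        self.tuples = deque()

    def changemem(self, mem):
        """Change the memory budget; shrinking it evicts the oldest tuples."""
        self.mem = mem
        while len(self.tuples) > self.mem:
            self.tuples.popleft()

    def insert(self, t):
        self.tuples.append(t)
        if len(self.tuples) > self.mem:       # over budget: drop the oldest tuple
            self.tuples.popleft()

    def delete(self, t):
        try:
            self.tuples.remove(t)             # removes the first matching occurrence
        except ValueError:
            pass                              # tuple was already evicted

    def query(self, pred=lambda t: True):
        """Return the stored tuples that satisfy pred, oldest first."""
        return [t for t in self.tuples if pred(t)]
```

Since operators touch the synopsis only through `query`, two operators needing the same window could be handed the same object, which is one way to realize synopsis sharing.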
6
Motwani: Constraints
- Constraints:
  - Many-one join and referential integrity constraints between two streams
  - Clustered/ordered arrival constraints
- Can be exploited to avoid storing/examining history
- Implicit addition of a "now" window
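As an illustration of how an arrival constraint avoids storing history, the sketch below joins two streams on a key, assuming each stream delivers keys in nondecreasing order. This is a simple ordered-arrival example written for this note, not the STREAM constraint algorithm; `ordered_stream_join` is a hypothetical name. Once one side has reached key k, buffered tuples on the other side with smaller keys can never match again and are dropped.

```python
from collections import defaultdict

def ordered_stream_join(events):
    """Equi-join of streams L and R that exploits an ordered-arrival constraint.

    events: (side, key, value) tuples in arrival order, side in {"L", "R"}.
    Assumption: within each side, keys arrive in nondecreasing order.
    Returns (matches, peak_state): (left_value, right_value) pairs and the
    largest number of tuples held in join state at any point.
    """
    state = {"L": defaultdict(list), "R": defaultdict(list)}
    other = {"L": "R", "R": "L"}
    matches, peak = [], 0
    for side, key, value in events:
        opp = state[other[side]]
        # This side will never again produce a key below `key`, so buffered
        # opposite-side tuples with smaller keys are dead: discard them.
        for old in [k for k in opp if k < key]:
            del opp[old]
        for m in opp.get(key, []):
            matches.append((value, m) if side == "L" else (m, value))
        state[side][key].append(value)
        peak = max(peak, sum(len(v) for s in state.values() for v in s.values()))
    return matches, peak
```

On the six-event input used in the check below, the join produces the same three matches as a full join, but holds at most three tuples at once instead of all six.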
7
Motwani: Scheduling
- Greedy approach: schedule the operator that consumes the largest number of tuples per time unit and produces the fewest tuples
- Goal: minimize queue lengths
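One way to score the greedy rule, assuming each operator is described by a per-tuple processing cost and a selectivity (output tuples per input tuple), is by its net queue reduction per time unit, (1 - selectivity) / cost. This is a plausible formulation for illustration, not necessarily the paper's exact priority function; `pick_operator` is a hypothetical name.

```python
def pick_operator(ops, queues):
    """Greedy scheduling choice.

    ops: name -> (cost, selectivity), where cost is time units per input tuple
         and selectivity is output tuples produced per input tuple.
    queues: name -> number of tuples waiting in that operator's input queue.
    Returns the ready operator that shrinks total queue size fastest, or None.
    """
    ready = [name for name in ops if queues.get(name, 0) > 0]
    if not ready:
        return None

    def net_reduction_rate(name):
        cost, selectivity = ops[name]
        # Tuples consumed minus tuples produced, per unit of processing time.
        return (1 - selectivity) / cost

    return max(ready, key=net_reduction_rate)
```

With a cheap, selective filter (cost 1, selectivity 0.2), an expensive join (cost 4, selectivity 0.5) and a projection that passes every tuple through, the filter is run first whenever it has input.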
8
Motwani: Approximation
- Static and dynamic approximation
- Static: modify the query
  - Window reduction
  - Sampling rate reduction
- Dynamic:
  - Synopsis compression
  - Sampling
  - Load shedding
- Resource allocation to maximize precision
  - Simple precision metric: (FP, FN), the false positives and false negatives in the approximate result
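A minimal sketch of dynamic load shedding and the (FP, FN) metric, assuming shedding is done by independent random sampling and that FP counts result tuples absent from the exact answer while FN counts exact-answer tuples that were missed. `shed_load` and `precision` are hypothetical names.

```python
import random
from collections import Counter

def shed_load(stream, keep_fraction, seed=0):
    """Load shedding by sampling: keep each tuple independently with probability keep_fraction."""
    rng = random.Random(seed)
    return [t for t in stream if rng.random() < keep_fraction]

def precision(exact, approx):
    """Return (FP, FN) of an approximate answer against the exact answer, both treated as bags."""
    exact, approx = Counter(exact), Counter(approx)
    fp = sum((approx - exact).values())   # reported tuples that are not in the exact answer
    fn = sum((exact - approx).values())   # exact-answer tuples that were missed
    return fp, fn
```

For a selection query, shedding input tuples can only remove answers, so FP stays 0 and FN equals the number of qualifying tuples that were dropped. Other approximations, such as synopsis compression feeding a join, can also introduce false positives.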
9
Motwani: Implementation
- Entities: operators, queues, synopses
- Control tables: attribute-value pairs used to control an entity
- Query plan: a network of entities
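The entity and control-table structure can be pictured as follows. This is a hypothetical sketch: the class, the `build_plan` helper and attribute names such as `capacity`, `predicate` and `memory_kb` are illustrative assumptions, not the system's actual control attributes.

```python
class Entity:
    """Hypothetical query-plan entity (operator, queue or synopsis),
    steered at run time through a control table of attribute-value pairs."""

    def __init__(self, name, kind, **control):
        self.name = name
        self.kind = kind                 # "operator", "queue" or "synopsis"
        self.control = dict(control)     # control table: attribute -> value
        self.outputs = []                # edges to downstream entities in the plan

    def set(self, attribute, value):
        """Adjust the entity by updating one control-table entry."""
        self.control[attribute] = value

def build_plan(*entities):
    """Wire entities into a linear chain; a real plan is a general network."""
    for upstream, downstream in zip(entities, entities[1:]):
        upstream.outputs.append(downstream)
    return entities[0]
```

Keeping all tunable state in one attribute-value table gives a resource manager a uniform way to adjust operators, queues and synopses without entity-specific calls.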
10
Cherniack et al.
- Query model
  - Set of relational operations
  - Push-based query processing
  - “Tumble” aggregate operation outputs results whenever a window is complete
- Distribution model
  - Aurora* vs. Medusa
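A push-based tumbling aggregate can be sketched as below. This assumes, as in Aurora's Tumble box, that a window is a run of consecutive tuples sharing a group-by value, so a window is complete as soon as a tuple with a different value arrives. The `Tumble` class and its callback-style `emit` parameter are illustrative, not Aurora's interface.

```python
class Tumble:
    """Push-based tumbling aggregate over runs of equal group-by values."""

    def __init__(self, group_of, agg, emit):
        self.group_of = group_of   # extracts the group-by value from a tuple
        self.agg = agg             # aggregate function over one window's tuples
        self.emit = emit           # downstream callback: results are pushed, not pulled
        self.current = None
        self.buffer = []

    def push(self, tup):
        g = self.group_of(tup)
        if self.buffer and g != self.current:
            # A different group-by value closes the window: output its aggregate now.
            self.emit((self.current, self.agg(self.buffer)))
            self.buffer = []
        self.current = g
        self.buffer.append(tup)

    def flush(self):
        """Close the last open window, e.g. at end of input."""
        if self.buffer:
            self.emit((self.current, self.agg(self.buffer)))
            self.buffer = []
```

Pushing `("a", 1), ("a", 2), ("b", 5), ("a", 1)` with a sum aggregate emits `("a", 3)` when `("b", 5)` arrives and `("b", 5)` when the second `"a"` run starts; the final `"a"` window is emitted only by `flush`.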
11
Cherniack: Communication Infrastructure
- Naming and discovery
  - Catalogs to find streams/queries
  - Each Aurora network binds its inputs and outputs to streams
- Tuples are routed based on who consumes the stream
  - Single copy of a stream per participant
- Single TCP connection between sites
  - Streams are multiplexed within the connection
  - Allows control over QoS
- Remote definition of streams
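The routing and multiplexing points can be simulated in-process. In this hypothetical sketch (no real sockets; `Site` and `publish` are illustrative names), every frame on the single inter-site channel carries a stream tag, the receiver demultiplexes on that tag, and a stream is sent once to each site that has any consumer, no matter how many local consumers that site has.

```python
from collections import defaultdict

class Site:
    """Hypothetical participant; one channel carries all streams as tagged frames."""

    def __init__(self, name):
        self.name = name
        self.subscribers = defaultdict(list)   # stream name -> local consumer callbacks
        self.frames_received = 0

    def subscribe(self, stream, callback):
        self.subscribers[stream].append(callback)

    def receive(self, frame):
        stream, tup = frame                     # demultiplex on the stream tag
        self.frames_received += 1
        for callback in self.subscribers[stream]:
            callback(tup)                       # fan out locally, not over the network

def publish(stream, tup, sites):
    """Route a tuple only to sites that consume the stream: one copy per site."""
    for site in sites:
        if site.subscribers.get(stream):
            site.receive((stream, tup))
```

A site with two consumers of the same stream receives each tuple once and fans it out locally, while a site with no consumers receives nothing.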
12
Cherniack: Economic Model
- Economic model of load management and sharing
- Agoric system
- Source is paid, sink pays
13
Cherniack: Load Management
- Redistribution of computation while the system is active
- Periodic repartitioning of computation
14
Cherniack: Re-partitioning
- Operator nodes can be reallocated across processor boundaries when required (box sliding)
  - Pairwise interaction
- Box splitting to parallelize an operation
  - Filters decide which tuples go where, with subsequent re-merging of the streams
  - The merge has to handle groups that were partially aggregated before the split
  - Deciding on the split criterion
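The merge problem in box splitting can be shown with a batch sketch (not a streaming implementation). A filter predicate routes each tuple to one of two copies of a per-group aggregate. Since the predicate need not respect group boundaries, one group can be partially aggregated on both sides. `aggregate` and `split_and_merge` are hypothetical names.

```python
from collections import defaultdict

def aggregate(tuples):
    """The box being split: per-group (count, sum) over (group, value) tuples."""
    acc = defaultdict(lambda: [0, 0])
    for g, v in tuples:
        acc[g][0] += 1
        acc[g][1] += v
    return {g: tuple(cs) for g, cs in acc.items()}

def split_and_merge(tuples, route):
    """Split with a filter predicate, aggregate each half, then merge the partial results."""
    left = [t for t in tuples if route(t)]
    right = [t for t in tuples if not route(t)]
    merged = defaultdict(lambda: [0, 0])
    for partial in (aggregate(left), aggregate(right)):
        for g, (count, total) in partial.items():
            # The same group may appear in both partial results: combine them.
            merged[g][0] += count
            merged[g][1] += total
    return {g: tuple(cs) for g, cs in merged.items()}
```

Merging works here because count and sum are decomposable. An average would have to be carried as (count, sum) and divided only after the merge, since averaging the two partial averages gives the wrong result when the halves have different sizes.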
15
High Availability
- Basic idea:
  - Each server can act as a backup for its downstream server
  - Low overhead
  - A tuple that may be needed again is discarded only after it is known to be saved elsewhere
- K-safe:
  - Data is safe as long as fewer than k sites fail
  - Keep copies of in-transit tuples at k upstream servers
16
Cherniack: Availability
- Remote queue truncation technique
- Flow messages
- Process pair
- Sequence numbering of tuples
- Earliest tuple that a box depends on
  - E.g. for stateless boxes (such as filter) and stateful boxes (such as aggregates)
  - Stored and sent back periodically to the upstream server
- More complex with split operators
  - The split operator "merges" messages
- When to truncate: the minimum over all truncation messages
- Failure detection and recovery
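Remote queue truncation can be sketched as an upstream output log with sequence numbers. Under the assumptions here, each downstream consumer periodically sends a flow message naming the earliest sequence number it may still need: a filter would report just past its last processed tuple, while an aggregate would report the first tuple of its open window. The upstream server truncates only below the minimum across consumers. `UpstreamQueue` and its methods are hypothetical names.

```python
class UpstreamQueue:
    """Hypothetical upstream output log kept as a backup for downstream boxes."""

    def __init__(self, consumers):
        self.next_seq = 0
        self.log = []                              # retained (seq, tuple) pairs
        self.needed = {c: 0 for c in consumers}    # consumer -> earliest seq it may need

    def send(self, tup):
        """Number the tuple, keep a copy, and (conceptually) forward it downstream."""
        self.log.append((self.next_seq, tup))
        self.next_seq += 1

    def on_truncation_msg(self, consumer, earliest_needed):
        """Handle a flow message; truncate only what no consumer can need again."""
        self.needed[consumer] = earliest_needed
        cutoff = min(self.needed.values())
        self.log = [(s, t) for s, t in self.log if s >= cutoff]

    def replay(self, from_seq):
        """After a downstream failure, resend the retained tuples from from_seq on."""
        return [t for s, t in self.log if s >= from_seq]
```

Taking the minimum matters when one box is behind: a filter may have moved on, but an aggregate whose window is still open keeps older tuples pinned in the upstream log.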
17
Cherniack: Availability
- Process pair techniques
  - Traditional approach in distributed systems
  - More messages during normal operation, less work during recovery
  - Further extension to virtual machines
18
Cherniack: Policy Specification and Guidelines
- QoS-based control in Aurora*
- Economic-contract-based control in Medusa