Eddies for Continuous Queries
Sam Madden
CS286 Project, S01
Motivation
- Want many queries over continuous streams of data
- Current Eddies
  - Thread per query
  - Scanner per source
- Share common work between modules
  - Reduce memory burden
  - Intra-query scheduling
- (Not focusing on joins; new operators are needed to deal with endless streams)
Data Structures
- One Eddy per Telegraph instance
- One Source module for each source (over all queries)
- One Filter per Source field (over all queries)
- Per-Source State
  - Source -> Reachable modules
  - Query -> Completion bitmask
- Per-Tuple State
  - Output query mask
- Per-Query State
  - Output queues
  - Aggregate information
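The state listed on this slide could be laid out roughly as follows. This is a minimal sketch, not the Telegraph implementation: the class names, field names, and the `done_mask` bookkeeping field are assumptions introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SourceState:
    # Source -> reachable modules: filters that read fields of this source.
    reachable_modules: list
    # Query -> completion bitmask: bit i is set when reachable_modules[i]
    # must have been applied before the query can receive a tuple.
    completion: dict


@dataclass
class TupleState:
    source_id: str
    values: dict
    done_mask: int = 0    # assumed bookkeeping: modules already visited
    output_mask: int = 0  # output query mask: bit q set once query q is settled


@dataclass
class QueryState:
    output_queue: deque = field(default_factory=deque)
    aggregates: dict = field(default_factory=dict)


# Hypothetical instance: two filters over source "s" and two queries.
s_state = SourceState(
    reachable_modules=["filter_a", "filter_b"],
    completion={0: 0b01, 1: 0b11},  # q0 needs filter_a, q1 needs both
)
t = TupleState(source_id="s", values={"a": 40, "b": 10})
queries = {0: QueryState(), 1: QueryState()}
```

Bitmasks keep the per-tuple state to a couple of integers, which fits the stated goal of reducing the memory burden.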
Tuple Flow
- Tuple arrives
  - Tagged with source id
- Routing policy chooses a filter to route to, based on modules reachable from source
- Filter marks queries as “output” in the tuple's query mask when the tuple doesn’t pass their predicates
- Tuple is output to queries which have completed, using source
- If more filters to check, tuple is re-inserted into eddy
- Works for Joins Too (Somewhat Inefficiently?)
  - Extend reachability graph across joins
  - Project out unused sources when tuples are output
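The flow on this slide can be sketched as a small loop. This is an illustrative sketch under stated assumptions: the filter table, the three example queries, and the random routing choice are all hypothetical, and joins are left out.

```python
import random
from collections import deque

# Hypothetical setup: two filters over source "s", three queries.
# Each filter maps a query id to the predicate it applies for that query.
FILTERS = {
    "filter_a": {0: lambda v: v > 30, 1: lambda v: v > 30},
    "filter_b": {1: lambda v: v > 30, 2: lambda v: v < 5},
}
FIELD = {"filter_a": "a", "filter_b": "b"}
REACHABLE = {"s": ["filter_a", "filter_b"]}
NUM_QUERIES = 3


def completion_masks(source):
    """Query -> bitmask of the reachable modules that query depends on."""
    masks = [0] * NUM_QUERIES
    for i, name in enumerate(REACHABLE[source]):
        for q in FILTERS[name]:
            masks[q] |= 1 << i
    return masks


def run_eddy(tuples, source="s", seed=0):
    rng = random.Random(seed)
    modules = REACHABLE[source]
    needed = completion_masks(source)
    outputs = {q: [] for q in range(NUM_QUERIES)}
    all_settled = (1 << NUM_QUERIES) - 1
    # Each entry: (values, visited-module bitmask, output query bitmask).
    eddy = deque((t, 0, 0) for t in tuples)
    while eddy:
        values, done, out = eddy.popleft()
        # Routing policy (random here): pick an unvisited reachable module.
        pending = [i for i in range(len(modules)) if not done & (1 << i)]
        i = rng.choice(pending)
        name = modules[i]
        done |= 1 << i
        # The filter marks failing queries as "output" so they never see the tuple.
        for q, pred in FILTERS[name].items():
            if not pred(values[FIELD[name]]):
                out |= 1 << q
        # Deliver to queries whose required filters have all been applied.
        for q in range(NUM_QUERIES):
            if not out & (1 << q) and needed[q] & ~done == 0:
                outputs[q].append(values)
                out |= 1 << q
        # More filters to check: re-insert the tuple into the eddy.
        if out != all_settled:
            eddy.append((values, done, out))
    return outputs


t1 = {"a": 40, "b": 40}
t2 = {"a": 10, "b": 1}
result = run_eddy([t1, t2])
```

Whatever order the routing policy picks, `t1` reaches queries 0 and 1 and `t2` reaches only query 2; the order only changes how much work is done before a tuple is settled.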
Combining Filters
- Given a Filter F over some field S.a, with n predicates generalized to be over ranges [a,b] (plus not-equals)
- Interval tree for >, >=, <, <= predicates, inserting an interval (a, ∞], [a, ∞], [-∞, b), or [-∞, b]. (O(log n))
- When a tuple arrives, find intervals which it intersects. (O(n))
- For = and ≠, use a hash table
  - For ≠, output all tuples except those in table
- Saves routing, tuple parsing cost
- Simplifies optimization space
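A combined filter along these lines might look like the sketch below. Note the substitution: because each inequality predicate here is one-sided, a sorted list of thresholds searched with `bisect` stands in for the interval tree, which would be needed for general two-sided ranges. The class name, method names, and the restriction to `>`, `<`, `=`, `!=` with at most one predicate per query are assumptions made to keep the example short.

```python
import bisect
import math
from collections import defaultdict


class GroupedFilter:
    """All predicates over one field, indexed so a tuple is checked once."""

    def __init__(self):
        self.gt = []                 # sorted (threshold, query_id): field > threshold
        self.lt = []                 # sorted (threshold, query_id): field < threshold
        self.eq = defaultdict(set)   # value -> queries with field = value
        self.ne = defaultdict(set)   # value -> queries with field != value
        self.ne_all = set()          # every query holding a != predicate

    def add(self, query_id, op, const):
        if op == ">":
            bisect.insort(self.gt, (const, query_id))
        elif op == "<":
            bisect.insort(self.lt, (const, query_id))
        elif op == "=":
            self.eq[const].add(query_id)
        elif op == "!=":
            self.ne[const].add(query_id)
            self.ne_all.add(query_id)
        else:
            raise ValueError(f"unsupported operator: {op}")

    def matching(self, value):
        """Return the ids of queries whose predicate accepts this value."""
        hits = set()
        # `> c` holds for every threshold strictly below the value.
        k = bisect.bisect_left(self.gt, (value,))
        hits.update(q for _, q in self.gt[:k])
        # `< c` holds for every threshold strictly above the value.
        k = bisect.bisect_right(self.lt, (value, math.inf))
        hits.update(q for _, q in self.lt[k:])
        # Equality: one hash lookup.
        hits |= self.eq.get(value, set())
        # Not-equals: everyone except the queries that named this value.
        hits |= self.ne_all - self.ne.get(value, set())
        return hits


f = GroupedFilter()
f.add(0, ">", 30)
f.add(1, ">", 50)
f.add(2, "<", 10)
f.add(3, "=", 42)
f.add(4, "!=", 42)
print(sorted(f.matching(42)))  # [0, 3]
print(sorted(f.matching(5)))   # [2, 4]
```

Locating the split point is O(log n), while reporting the matching queries is linear in the number of matches, which is where the O(n) worst case on the slide comes from.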
Routing Policy
- Random policy routes to each module with equal probability
- Ticket policy: from Eddy paper
  - Route to the most selective modules (those that drop the most tuples)
  - Estimate selectivity based on ratio of in/out tuples
  - Use back-pressure to adjust delivery rates
- Multi-query Ticket policy
  - Estimate selectivity based on ratio of (number of applied predicates / number of passed predicates)
  - Based on Shankar’s implementation: back pressure not applied properly
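A ticket-based router could be sketched as below. The single-query bookkeeping follows the lottery idea from the Eddy paper (credit a module when it receives a tuple, debit it when the tuple comes back). The multi-query variant, which credits applied predicates and debits passed ones, is an assumed adaptation for illustration and not necessarily how Shankar's implementation counts; `TicketRouter` and its method names are hypothetical, and back-pressure is not modeled.

```python
import random


class TicketRouter:
    def __init__(self, modules, seed=0):
        # Start every module with one ticket so it can always be chosen.
        self.tickets = {m: 1 for m in modules}
        self.rng = random.Random(seed)

    def choose(self, eligible):
        """Hold a lottery among eligible modules, weighted by ticket count."""
        weights = [self.tickets[m] for m in eligible]
        return self.rng.choices(eligible, weights=weights, k=1)[0]

    def record(self, module, passed):
        """Single-query bookkeeping for one routed tuple."""
        self.tickets[module] += 1      # credit: module received a tuple
        if passed:
            self.tickets[module] -= 1  # debit: tuple returned to the eddy

    def record_multi(self, module, applied, passed):
        """Assumed multi-query variant: count predicates instead of tuples."""
        self.tickets[module] += applied - passed


router = TicketRouter(["f1", "f2"])
for _ in range(10):
    router.record("f1", passed=False)  # f1 drops every tuple
    router.record("f2", passed=True)   # f2 passes every tuple
print(router.tickets)  # {'f1': 11, 'f2': 1}
```

A module that drops tuples accumulates tickets and wins the lottery more often, so tuples tend to meet the most selective filter first.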
Preliminary Results
- Simple, four query test:
    from s select s.index where s.a > 30
    from s select s.index where s.b > 30 and s.a > 30
    from s select s.index where s.c > 30 and s.b > 30 and s.a > 30
    from s select s.index where s.d > 30 and s.c > 30 and s.b > 30 and s.a > 30
- Becomes five modules: one scanner and four filters
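The module count for this test follows from having one filter per source field. A small sketch of that grouping, assuming a hypothetical `build_modules` helper and a dictionary encoding of the four queries:

```python
from collections import defaultdict

# The four test queries, written as {field: (operator, constant)}.
QUERIES = [
    {"a": (">", 30)},
    {"b": (">", 30), "a": (">", 30)},
    {"c": (">", 30), "b": (">", 30), "a": (">", 30)},
    {"d": (">", 30), "c": (">", 30), "b": (">", 30), "a": (">", 30)},
]


def build_modules(queries, source="s"):
    """Group predicates by field: one scanner plus one filter per field."""
    filters = defaultdict(list)
    for qid, preds in enumerate(queries):
        for fld, (op, const) in preds.items():
            filters[fld].append((qid, op, const))
    return {"scanner": source, "filters": dict(filters)}


modules = build_modules(QUERIES)
print(1 + len(modules["filters"]))  # 5: one scanner and four filters
for fld, preds in sorted(modules["filters"].items()):
    print(fld, [qid for qid, _, _ in preds])
# a [0, 1, 2, 3]
# b [1, 2, 3]
# c [2, 3]
# d [3]
```

The filter on `s.a` serves all four queries, so a tuple that fails it is settled for every query after a single check.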