Adaptive Query Processing for Wide-Area Distributed Data
Michael Franklin, UC Berkeley
Joint work with Tolga Urhan, Laurent Amsaleg, and Anthony Tomasic

M. Franklin, 12/3/99

Motivation
- Pervasive network connectivity enables global-scale federated DBMSs.
- Improvements in heterogeneous DBMSs and emerging standards enable Internet query processing.
- Telegraph: Flow-based composition of data-intensive Internet services.

Wide-area + Wrapped sources: Unpredictability
- Sources may be unreachable or slow to respond.
- Data delivery may be:
  - slower than expected
  - bursty
  - interrupted
- Data statistics/cost estimates may be unavailable or unreliable.
Traditional, static query processing approaches cannot cope with such problems at run-time.

Some Solutions
- Adaptive Query Processing
  - Query Scrambling: “Reactive Query Execution”
  - XJoin: a non-blocking, reactive query operator.
  - and beyond!
- Risk-Aware Query Planning
  - Producing robust plans.
- Exploiting Alternative Sources
  - Mirrors or “not exactly”.
- Relaxing Query Semantics
  - Partial, Fuzzy, or Alternative answers

Query Scrambling - Introduction
- Goal: Overcome limitations of static QP for unexpected delays.
- A Reactive Approach:
  - Start with an optimized plan.
  - Modify the plan on-the-fly if problems are detected.
  - Hide delays by performing other useful work.
- Assumptions:
  - Focus on initial delay.
  - Query processing at client; iterator model.
  - No replication.

Query Scrambling - Overview
- An iterative algorithm.
- Monitor input and scramble when problems are detected.
- Phase 1: Reschedule “runnable” operators.
- Phase 2: Operator synthesis: create new operators.
[Figure: state diagram with states Normal Execution and Scrambling (Phase 1, Phase 2); transitions labeled “Source(s) delayed”, “Still delayed”, and “Source(s) responded”.]

Query Scrambling Example
[Figure: join trees over relations A, B, C, D, and E, with panels labeled “Initial Plan”, “Reschedule”, and “New Operators”.]

Building a Scrambling Engine
- A thread per operator.
- Monitoring and scheduling.
- A “smart” materialization operator.
- Multi-threaded query operators?
[Figure: operator state diagram with states Not Started, Active, Stalled, Suspended, and Closed; transitions labeled open, done, timeout, data_arrival, de-schedule, and resume.]

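The operator life cycle on this slide can be written down as a small transition table. The state and event names come from the slide, but the edges of the diagram were lost, so which event leads from which state to which is an assumption here: one plausible reading, not the engine's actual design. `OpState`, `TRANSITIONS`, and `step` are hypothetical names.

```python
from enum import Enum


class OpState(Enum):
    NOT_STARTED = "Not Started"
    ACTIVE = "Active"
    STALLED = "Stalled"
    SUSPENDED = "Suspended"
    CLOSED = "Closed"


# ASSUMPTION: the source diagram's edges are not recoverable; this table is
# one plausible assignment of the slide's event labels to state pairs.
TRANSITIONS = {
    (OpState.NOT_STARTED, "open"): OpState.ACTIVE,
    (OpState.ACTIVE, "timeout"): OpState.STALLED,
    (OpState.STALLED, "data_arrival"): OpState.ACTIVE,
    (OpState.ACTIVE, "de-schedule"): OpState.SUSPENDED,
    (OpState.SUSPENDED, "resume"): OpState.ACTIVE,
    (OpState.ACTIVE, "done"): OpState.CLOSED,
}


def step(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.value!r}"
        ) from None
```

A scheduler thread could call `step` whenever the monitor reports a timeout or a data arrival, and reject events that make no sense in the operator's current state.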
Directing Scrambling [SIGMOD 98]
- Original formulation [PDIS 96] was based on heuristics.
- It demonstrated the ability of QS to hide delays, but was susceptible to making bad choices.
- Query optimizers are able to choose good plans, but how can an optimizer be used to do scrambling?
  - Phase I
    - Issue: where to place the materialization operator?
    - Answer: Choose the subtree with the best overhead/useful work ratio.
  - Phase II is trickier.

Phase II - Operator Synthesis
- If there are no runnable subtrees, create new ones.
- Needed: an optimizer that: 1) is lightweight & incremental, and 2) understands delays.
- Most QP systems optimize for total work.
- But, delay is inherently a response-time issue.
- Response-time optimization can “magically” move delayed operators to the “best” point in the plan, but only if it knows the duration of the delay!

Include Delayed (ID) Algorithm
- Invokes the optimizer with a very large delay value.
- Optimizer pushes the delayed relation as far back as is useful.
- A large delay estimate makes ID aggressive.

Estimated Delay (ED) Algorithm
- Initially calls the RT optimizer with a small delay.
  - Small value = 25% of the RT of the original query.
- Successively increases the delay estimate.
  - 50% and then 100% of the original RT.
- Increasing estimates make ED adaptive.

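A minimal sketch of the escalating estimates, using the 25%, 50%, and 100% fractions from the slide. The slide does not say when each re-optimization is triggered or what happens after the 100% estimate, so the driver below simply re-checks a caller-supplied predicate and stops after the last estimate; `ed_delay_estimates`, `run_ed`, `still_delayed`, and `reoptimize` are hypothetical names, not the authors' interfaces.

```python
def ed_delay_estimates(original_rt, fractions=(0.25, 0.50, 1.00)):
    """Delay estimates as fractions of the original query's response time."""
    return [f * original_rt for f in fractions]


def run_ed(original_rt, still_delayed, reoptimize):
    """Call the response-time optimizer with successively larger estimates.

    still_delayed: callable returning True while the source has not responded.
    reoptimize:    callable taking a delay estimate and returning a plan.
    ASSUMPTION: the sketch stops after the last estimate; the slide does not
    say what the real algorithm does beyond 100% of the original RT.
    """
    plans = []
    for estimate in ed_delay_estimates(original_rt):
        if not still_delayed():
            break
        plans.append(reoptimize(estimate))
    return plans
```

For an original response time of 200 seconds the estimates are 50, 100, and 200 seconds, and no further plans are generated once the source responds.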
Experimental Environment
- Workload: Queries derived from the TPC-D benchmark
  - TPC-D(5), TPC-D(8), TPC-D(9) (1 GB base data)
- Optimizer (built from scratch):
  - Two Phase Randomized Optimizer
    - à la [Ioannidis 90].
  - Optimizes for Total Work or Response Time (GHK 92).
  - Search space = bushy plans
- Studied algorithms in a simulated environment
  - Network, remote sites, query engine, etc.
  - Subsequently validated with a Predator-based implementation.

National Market Share Query (TPC-D 8)
- Experiments with several memory sizes.
- Delayed relation (Part) is an important relation.
- Used hash joins only.
- Lineitem is the largest relation; Part is a “reducer”.
- Optimizer initially chooses to go left-to-right.
[Figure: join graph over Part, LineItem, Supplier, Nation, Customer, Region, and Order, annotated 1/150, 2/7, and 1/5.]

National Market Share Query (large memory, > 4 MB)
[Figure: plot with axis label “Delay” and a curve labeled “No Scramb.”]

National Market Share Query (Sm. memory)
- Scrambling becomes more expensive.
- Pair: Local decisions, lack of global view.
- IN: Poor performance for short delays.
- ED: Good for a wide range of delay values.
[Figure: plot with axis label “Delay” and a curve labeled “No Scramb.”]

Cost-Based Query Scrambling
Summary:
- Traditional static query processing does not scale to the wide-area environment.
- A reactive approach is needed.
- This requires a multi-threaded engine and a scrambling-enabled optimizer.
Experimental Results:
- Avoids many of the problems of heuristic algorithms.
- Response time-based optimization is needed.
- Fundamental tradeoffs arise in the absence of good delay predictions.

XJoin - Improving Responsiveness
- QS can speed up the delivery of the entire answer.
- But, its ability to hide delays is limited by the amount of useful work that can be done in the query.
- XJoin is a new query operator that:
  - Produces results incrementally as they become available.
  - Allows progress to be made in highly erratic situations.
  - Has a small memory footprint.
  - Tolerates bursty and slow behavior.

Symmetric Hash Join
- Traditional Hash Joins block when one input stalls.
- Symmetric Hash Join (SHJ) blocks only if both stall.
- Processes tuples as they arrive from sources.
- Produces all tuples in the join and no duplicates.
[Figure: a Hash Join with Build and Probe inputs from Source A and Source B, contrasted with a symmetric design using Hash Table A and Hash Table B.]

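The symmetric hash join described above can be sketched in a few lines. This is a minimal in-memory illustration of the general technique, not the PREDATOR implementation: each arriving tuple is inserted into its own side's hash table and then probes the other side's table. The `arrivals` encoding and the function name are assumptions made for the example.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals, key_a, key_b):
    """Yield (a, b) result pairs as tuples arrive from either source.

    arrivals: iterable of ("A", tup) or ("B", tup) in arrival order
              (a hypothetical encoding of the two interleaved inputs).
    key_a, key_b: functions extracting the join key from A and B tuples.
    """
    table_a = defaultdict(list)  # join key -> A tuples seen so far
    table_b = defaultdict(list)  # join key -> B tuples seen so far
    for side, tup in arrivals:
        if side == "A":
            k = key_a(tup)
            table_a[k].append(tup)             # build into own table
            for match in table_b.get(k, ()):   # probe the other table
                yield (tup, match)
        else:
            k = key_b(tup)
            table_b[k].append(tup)
            for match in table_a.get(k, ()):
                yield (match, tup)
```

Each matching pair is emitted exactly once, at the moment the later of its two tuples arrives, which is why the operator can keep producing output as long as either source delivers data.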
Memory Utilization
- As originally specified, SHJ requires both inputs to be memory resident.
- For a complex query, this means all intermediate results must be in memory.
- This is wasteful and can result in thrashing.
- XJoin extends SHJ to allow it to work with limited memory (like “Hybrid Hash”).
- Spilled tuples are processed by a reactively-scheduled background thread.

Partitioning
- XJoin is a partitioned hash join method.
- When allocated memory is exhausted, a partition is flushed to disk.
- Join processing continues on memory-resident data.
- Disk-resident tuples are handled in background.

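A rough sketch of the flush-on-overflow behavior for one join input. The slide does not say which partition is chosen for flushing, so picking the largest memory-resident partition is an assumption here; disk is simulated with plain lists, and `PartitionedInput` is a hypothetical name.

```python
class PartitionedInput:
    """One join input, hash-partitioned, with a bounded in-memory tuple count."""

    def __init__(self, num_partitions, memory_limit):
        self.memory = [[] for _ in range(num_partitions)]  # memory-resident portions
        self.disk = [[] for _ in range(num_partitions)]    # simulated disk portions
        self.memory_limit = memory_limit
        self.in_memory = 0

    def insert(self, key, tup):
        """Add a tuple to its partition; flush one partition if over the limit."""
        p = hash(key) % len(self.memory)
        self.memory[p].append((key, tup))
        self.in_memory += 1
        if self.in_memory > self.memory_limit:
            self.flush_largest()

    def flush_largest(self):
        """Move the largest memory-resident partition to disk.

        ASSUMPTION: victim selection by size is illustrative; the slide only
        says that "a partition" is flushed.
        """
        victim = max(range(len(self.memory)), key=lambda i: len(self.memory[i]))
        self.disk[victim].extend(self.memory[victim])
        self.in_memory -= len(self.memory[victim])
        self.memory[victim] = []
        return victim
```

After a flush, probes continue against whatever remains in `memory`, while the tuples in `disk` wait for the background processing described on the slide.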
The 3 Stages of XJoin
- Stage 1 - Symmetric hash join (memory-to-memory)
- Stage 2 - Disk-to-memory
  - Separate thread; runs when stage 1 blocks.
  - Stages 1 and 2 trade off until all input has been received.
- Stage 3 - Clean-up stage
  - Stage 1 misses pairs that were not in memory concurrently.
  - Stage 2 misses pairs when both are on disk, and may not get to run to completion.

XJoin - Details
- The asynchronous/multi-threaded nature of XJoin combined with its small footprint allows it to be fully pipelined, but…
- Duplicate result tuples can be introduced during stages 2 and 3. These are avoided using timestamps.
  - Each tuple is given an Arrival Timestamp (ATS) and a Departure Timestamp (DTS).
  - Two tuples with overlapping ATS-DTS ranges have already been matched in stage 1.
  - The timestamp of when a disk-resident partition was used allows detection of tuples matched during stage 2.
- The second stage can be further optimized, at the expense of a bit of memory and some additional duplicate detection.

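The stage-1 duplicate check amounts to an interval-overlap test on the two tuples' ATS-DTS ranges. The sketch below covers only that check (not the stage-2 partition timestamps), assumes all timestamps are distinct, and treats a tuple that is still in memory as having an infinite DTS; `Stamped` and `matched_in_stage1` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    """A tuple annotated with its memory-residency interval."""
    key: int
    ats: float             # Arrival Timestamp: when the tuple entered memory
    dts: float = math.inf  # Departure Timestamp: when it was flushed to disk


def matched_in_stage1(a, b):
    """True if a and b were memory-resident at the same time.

    Overlapping [ATS, DTS] ranges mean stage 1 already joined the pair, so a
    later stage must not emit it again. ASSUMPTION: timestamps are distinct,
    so strict comparisons are sufficient.
    """
    return a.ats < b.dts and b.ats < a.dts
```

A later stage would emit a matching pair only when `matched_in_stage1` returns False for it.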
XJoin - Performance
- We implemented XJoin in our multi-threaded version of the PREDATOR ORDBMS (from Cornell).
- We modeled network delays using traces obtained from accessing sites across the Internet.
  - Replaying these traces provides repeatable results.
- Focus on a “slow” (24.1 KB/sec) and a “fast” (132.8 KB/sec) trace; both exhibit bursty behavior.
- Workload is simple join queries on Wisconsin Benchmark relations.

Results - 2-Way Joins (Time in seconds to nth tuple)
[Figure: four plots, one per panel: “Fast Build, Slow Probe”, “Slow Build, Slow Probe”, “Fast Build, Fast Probe”, and “Slow Build, Fast Probe”; each has curves labeled XJoin, H, and XJ-2.]

Taming the Second Stage
- Impact of the second stage decreases during the execution of an XJoin.
- Scheduling can be adjusted to account for this.
[Figure: plot for “Fast Build, Fast Probe” with curves labeled H, XJoin, and XJoin-A.]

Results - Multiway Joins
[Figure: delivery times (in seconds), with panels labeled SLOW and FAST.]

XJoin - Summary
- A non-blocking, small footprint join operator.
- It is multi-threaded, consisting of three stages.
  - These stages allow XJoin to make progress when input blocks, but they can introduce duplicates.
- XJoin is optimized for streaming results to users as fast as they are created.
- Like QS, XJoin hides delays with useful work, but at the operator level rather than at the plan level.
- Experiments showed order-of-magnitude improvements in time to get initial results.

Eddy - Continuous Optimization
- Flow-based (“Rivers”)
- Tuples are routed via a ticket-based scheme and back-pressure.
- Hellerstein and Avnur 99
[Figure: an Eddy connected to Join ST and Join RS, with inputs R, S, and T.]

Adaptive Approaches
- Increased uncertainty argues for increased adaptivity.
  - Wide-area nets and admin domains introduce uncertainty.
  - Pesky users introduce uncertainty.
  - Non-traditional data sources introduce uncertainty.
- Implications for data-intensive Internet services.
[Figure: spectrum of adaptivity with points labeled “static plans”, “late binding”, “reopt.”, “continuous opt.”, and “anarchy”; systems placed along it: current DBMS; Dynamic, Parametric, Competitive, …; Query Scrambling, Kabra/DeWitt; Eddy; XJoin; ???]

The Telegraph Project
- Adaptive data management for Internet-scale composition of services.
  - Dataflow-based scheduling.
  - Cross-domain negotiation.
  - “User-in-the-loop”
  - Adaptation and learning over varying granularities
    - individual long-running jobs
    - many similar short jobs
    - continuous data flows and filters.

Conclusions
- Current static query processing technology cannot cope with the wide-area environment.
- A key concern is unpredictability.
  - Query Scrambling is a reactive execution approach.
  - XJoin is a pipelined operator that streams answers.
  - Even more adaptive approaches are possible.
- Complementary approaches:
  - Alternative sources, optimizing for robustness, relaxing semantics.
- These ideas extend to the composition of Internet services.

FINE
Future Work
- Investigating the properties of query plans that make them robust in the presence of network problems.
  - Will use these properties in the objective function for query optimization.
- Next step is to use alternative, but not necessarily equivalent, sources.
- Further progress will involve relaxing the guarantees on semantics that the query system provides.
  - The WWW has shown us that users will accept this!

QP on the Internet? - Issues
- Semantic Interoperability
  - Wrapper/Mediator Architecture.
  - XML, XMI, CWMI, OLE-DB, ...
- Source Discovery
  - Metadata Repositories and Directories.
- Performance
  - Distributed database technology, caching, etc.
- Responsiveness and Availability
  - Unpredictability: how to build responsive systems?
  - This is the focus of this talk.

Databases to the Rescue?
- DB query languages used to be navigational.
- Relational languages are more useful for many tasks.
  - Powerful, and (more or less) declarative.
  - Queries are written without regard to the physical structure/location/etc. of data. (Data Independence)
  - Easily extended to distributed systems.
- DB query languages and optimization techniques have been developed over decades.
- This technology is unavailable to the Internet user.

Distributed Query Processing (QP)
    SELECT eid, ename, title, salary
    FROM Emp, Proj, Assign
    WHERE Emp.eid = Assign.eid
      AND Proj.pid = Assign.pid
      AND Emp.loc <> Proj.loc
- System handles query plan generation & optimization; ensures correct execution.
- Originally conceived for corporate networks.
©1998 Ozsu and Valduriez

Conclusions
- Current Internet querying and data manipulation capabilities are too limited.
  - Unexpressive, too coarse grained, etc.
  - Do not support manipulating data from multiple sites.
- Distributed querying technology addresses these concerns but is not applicable on the Internet.
- A key concern is unpredictability.
  - Query Scrambling is a reactive execution approach.
  - XJoin is a pipelined operator that streams answers.
  - Lots more interesting work to be done in this area.