Fall 2008: Parallel Query Scheduling
Query Processing
Queries submitted to the system are queued and processed in two steps:
Compile time: each query is translated into a query tree that specifies the optimized order for executing the required database operators.
Run time: the operators are scheduled for execution on the processing nodes (PNs) so as to maximize system throughput while ensuring good response times.
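As a rough illustration of what the compile-time step hands to the run-time scheduler, here is a minimal query-tree sketch. The `OpNode` class and `ready_operators` function are hypothetical names, not part of the original system; the sketch assumes an operator becomes ready once all of its child operators have finished, so the ready operators are the current leaves of the remaining tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpNode:
    """One database operator in a compiled query tree (hypothetical structure)."""
    name: str
    children: List["OpNode"] = field(default_factory=list)
    done: bool = False  # set once the operator has finished executing


def ready_operators(node: OpNode) -> List[OpNode]:
    """Operators that can run now: not finished, and all of their inputs finished."""
    if node.done:
        return []
    pending = [child for child in node.children if not child.done]
    if not pending:
        return [node]  # a current leaf of the remaining tree
    ready: List[OpNode] = []
    for child in pending:
        ready.extend(ready_operators(child))
    return ready


scan_r = OpNode("select R")
scan_s = OpNode("select S")
join = OpNode("join", children=[scan_r, scan_s])
print([op.name for op in ready_operators(join)])  # ['select R', 'select S']
scan_r.done = scan_s.done = True
print([op.name for op in ready_operators(join)])  # ['join']
```

The join only becomes schedulable after both selects complete, which is why the schedulers below work from leaf nodes.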
Competition-Based Scheduling
In CB scheduling, a set of coordinator processes is pre-created, and a dispatcher process allocates them to queries on a First-Come-First-Serve (FCFS) basis.
A coordinator is responsible for scheduling the operators in its query tree, and it competes for operator servers on behalf of those operators.
When a coordinator has acquired all the operator servers needed for an operator, it coordinates these servers to execute the operation in parallel.
Partial execution of unary operations (e.g., select) is also allowed.
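The sketch below simulates the two CB mechanisms in a deterministic, single-threaded way: FCFS dispatch of queries to pre-created coordinators, and all-or-nothing acquisition of operator servers. The function names, the use of a Python set for idle servers, and the fixed attempt order standing in for concurrent competition are all assumptions made for illustration.

```python
from collections import deque


def dispatch_fcfs(query_queue, num_coordinators):
    """Dispatcher: hand queued queries to idle pre-created coordinators, first come first served."""
    queue = deque(query_queue)
    assignments = {}
    for coordinator_id in range(num_coordinators):
        if not queue:
            break
        assignments[coordinator_id] = queue.popleft()
    return assignments, list(queue)  # queries left over keep waiting


def try_acquire(free_servers, required):
    """All-or-nothing: a coordinator gets every server its operator needs, or none of them."""
    if required <= free_servers:
        free_servers.difference_update(required)
        return True
    return False


def compete(free_servers, requests):
    """Coordinators try to acquire servers one after another (this fixed order is an
    assumption standing in for concurrent competition); none knows the others' needs."""
    winners = []
    for query, required in requests:
        if try_acquire(free_servers, required):
            winners.append(query)  # this coordinator now runs its operator in parallel
    return winners


assignments, waiting = dispatch_fcfs(["Q1", "Q2", "Q3", "Q4"], num_coordinators=3)
print(assignments, waiting)  # {0: 'Q1', 1: 'Q2', 2: 'Q3'} ['Q4']

free = {1, 2, 3, 4}
print(compete(free, [("Q1", {1, 2, 3}), ("Q2", {3, 4}), ("Q3", {4})]))  # ['Q1', 'Q3']
```

Q2 loses only because Q1 happened to ask first; no coordinator considers whether a different combination would use the servers better.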
[Figure: CB Scheduling]
Advantages and Disadvantages of CB Scheduling
Advantage: An obvious advantage is simplicity: each query is given the same opportunity to compete for computing resources.
Disadvantage: System resources can be under-utilized because each coordinator is unaware of the other queries. As a result, CB scheduling cannot take advantage of techniques such as Best Fit, which can maximize system utilization.
Planning-Based Scheduling
All active queries share a single scheduler. Since the scheduler knows the resource requirements of all active queries, it can schedule operators according to how well their requirements match the current state of the computing system.
The requirement of an operator is defined as the set of operator servers needed to execute it.
The scheduler considers only the queued queries within a fixed-size window, called the scheduling window.
[Figure: PB Scheduling]
PB with Largest-Fit-First (PB-LF)
Step 1: Determine the requirement of each leaf node in the scheduling window, and insert these operators into a ready list.
Step 2: Sort the operators in the ready list in descending order of requirement size.
Step 3: Examine the operators in the ready list in sorted order until one whose requirement can be met is found.
Step 4: Create a coordinator process to coordinate its parallel execution.
Step 5: Repeat Steps 3 and 4 until no such operator is found.
Step 6: Execute the operators found.
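A minimal sketch of Steps 2 to 6, assuming the ready list from Step 1 already holds only leaf operators of queries inside the scheduling window. Each requirement is modeled as a set of server IDs, following the definition of a requirement as a set of operator servers; the function name and the `(name, requirement)` tuple layout are hypothetical.

```python
def pb_largest_fit_first(ready, free_servers):
    """
    ready: list of (operator_name, requirement) in arrival order, where a requirement
           is the set of operator-server IDs the operator needs (Step 1 output).
    free_servers: set of idle operator-server IDs; updated as servers are allocated.
    Returns the names of the operators selected for execution, in selection order.
    """
    # Step 2: largest requirement first (sort is stable, so ties keep arrival order)
    candidates = sorted(ready, key=lambda op: len(op[1]), reverse=True)
    selected = []
    while True:
        # Step 3: first operator, in sorted order, whose requirement can be met
        found = next((op for op in candidates if op[1] <= free_servers), None)
        if found is None:
            break  # Step 5: stop when no operator fits
        # Step 4: allocate its servers (a coordinator process would be created here)
        free_servers.difference_update(found[1])
        candidates.remove(found)
        selected.append(found[0])
    return selected  # Step 6: these operators are executed


ready = [("A", {1, 2}), ("B", {2, 3, 4, 5}), ("C", {6})]
free = {1, 2, 3, 4, 5, 6}
print(pb_largest_fit_first(ready, free), free)  # ['B', 'C'] {1}
```

Python's `sorted` stays stable with `reverse=True`, so equal-size requirements are still examined in arrival order. Note that A, the earliest arrival, is left waiting.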
PB with First-Fit-First (PB-FF)
A potential drawback of PB-LF is that operators with smaller requirements risk longer waiting times. PB-FF examines operators in arrival order instead:
Step 1: Determine the requirement of each leaf node in the scheduling window, and insert these operators into a ready list.
Step 2: Examine the operators in the ready list in arrival order until one whose requirement can be met is found.
Step 3: Create a coordinator process to coordinate its parallel execution.
Step 4: Repeat Steps 2 and 3 until no such operator is found.
Step 5: Execute the operators found.
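The same idea for PB-FF, as a hedged sketch under the same assumptions (hypothetical names, requirements modeled as sets of server IDs, ready list already restricted to the scheduling window). The only difference from PB-LF is that the ready list is scanned in arrival order rather than sorted by requirement size.

```python
def pb_first_fit_first(ready, free_servers):
    """
    ready: list of (operator_name, requirement) in arrival order; a requirement is
           the set of operator-server IDs the operator needs.
    free_servers: set of idle operator-server IDs; updated as servers are allocated.
    Returns the names of the operators selected for execution, in selection order.
    """
    candidates = list(ready)  # no sorting: keep arrival order
    selected = []
    while True:
        # scan in arrival order for the first operator whose requirement can be met
        found = next((op for op in candidates if op[1] <= free_servers), None)
        if found is None:
            break  # no remaining operator fits the idle servers
        # allocate its servers (a coordinator process would be created here)
        free_servers.difference_update(found[1])
        candidates.remove(found)
        selected.append(found[0])
    return selected  # these operators are then executed


ready = [("A", {1, 2}), ("B", {2, 3, 4, 5}), ("C", {6})]
free = {1, 2, 3, 4, 5, 6}
print(pb_first_fit_first(ready, free), free)  # ['A', 'C'] {3, 4, 5}
```

Here the small, early operator A is served immediately. Under largest-fit-first ordering, B would be placed first and A would have to wait, which is the drawback PB-FF targets.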
[Figure: Test Query Tree Structures]
[Figure: Effect of Query Arrival Rate]
[Figure: Effect of Number of PNs]
CR-Property
CR property: the Consecutive Retrieval property. It is used mostly for data allocation in database systems.
The basic application is to arrange all records relevant to a query in consecutive locations on linear storage, minimizing the query's access time.
Another approach uses the CR property as a file allocation scheme, distributing arbitrary well-constructed files over multiple disk systems to speed up parallel data access.
CR-Property Example
[Figure, left (Query requirements): queries Q1, Q2, Q3 against records R1 to R6, stored in the order R1, R2, R3, R4, R5, R6 across Page1 and Page2. Right (Data allocation with CR-property): the same records stored in the order R1, R3, R5, R2, R4, R6 across Page1 and Page2.]
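The property itself is easy to check mechanically. The sketch below tests whether a record ordering gives every query a consecutive run of records, and brute-forces an ordering that does. The function names are hypothetical, and the query requirements in the usage example are made up for illustration because the figure's actual requirement matrix is not recoverable from the text.

```python
from itertools import permutations


def has_cr_property(order, queries):
    """True if every query's records occupy consecutive positions in `order`."""
    position = {record: i for i, record in enumerate(order)}
    for records in queries.values():
        if not records:
            continue
        idx = sorted(position[r] for r in records)
        # distinct positions are consecutive exactly when their span equals count - 1
        if idx[-1] - idx[0] != len(idx) - 1:
            return False
    return True


def find_cr_order(records, queries):
    """Brute-force search for an ordering with the CR property (exponential; tiny inputs only)."""
    for order in permutations(records):
        if has_cr_property(order, queries):
            return list(order)
    return None  # no ordering of these records has the CR property


# Hypothetical query requirements (not the ones in the slide's figure)
queries = {"Q1": {"R1", "R3", "R5"}, "Q2": {"R2", "R5"}, "Q3": {"R4", "R6"}}
records = ["R1", "R2", "R3", "R4", "R5", "R6"]
print(has_cr_property(records, queries))                               # False
print(has_cr_property(["R1", "R3", "R5", "R2", "R4", "R6"], queries))  # True
print(find_cr_order(records, queries))
```

Not every set of queries admits such an ordering; for example, if one record is shared by three queries that each need one other distinct record, it cannot be adjacent to all three.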
Parallel Query Scheduling
[Figure: query queue Q1 to Q7, with the PNs (eight shown) needed by each query]
CRP Scheduling with Smallest First
[Figure: PN requirements of queries Q4, Q6, Q1, Q7, Q2, Q5, Q3, shown by level]
Q4, Q1, and Q5 will be scheduled first.
CRP Scheduling with Largest First
[Figure: PN requirements of queries Q6, Q4, Q7, Q1, Q2, Q5, Q3, shown by level]
Q6, Q2, and Q3 will be scheduled first.
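The slides do not spell out the CRP scheduling procedure, so the following is only a sketch under explicit assumptions: each query's PN requirement is a contiguous range of PN IDs (what the CR property would give), and queries are admitted greedily in smallest-first or largest-first order of requirement size, skipping any query whose range overlaps PNs already taken. The function name and the requirements in the example are hypothetical, not the ones in the figures.

```python
def crp_schedule(requirements, smallest_first=True):
    """
    requirements: dict query -> (lo, hi), an inclusive range of PN IDs the query needs.
    Assumption: the CR property makes each query's PNs one contiguous range.
    Queries are admitted greedily in size order; a query is skipped if its range
    overlaps PNs already taken. A skipped query cannot fit later in the same pass,
    because the set of busy PNs only grows.
    """
    def size(q):
        lo, hi = requirements[q]
        return hi - lo + 1

    # sorted() is stable, so equal-size queries keep their listed order either way
    order = sorted(requirements, key=size, reverse=not smallest_first)
    busy = set()
    scheduled = []
    for q in order:
        lo, hi = requirements[q]
        pns = set(range(lo, hi + 1))
        if not (pns & busy):
            busy |= pns
            scheduled.append(q)
    return scheduled


# Hypothetical requirements over PNs 1..8 (not taken from the figures)
reqs = {"Q1": (1, 2), "Q2": (2, 6), "Q3": (7, 8), "Q4": (3, 4), "Q5": (5, 8)}
print(crp_schedule(reqs, smallest_first=True))   # ['Q1', 'Q3', 'Q4']
print(crp_schedule(reqs, smallest_first=False))  # ['Q2', 'Q3']
```

In this made-up case smallest-first admits more queries at once, while largest-first gives the five-PN query Q2 its nodes immediately.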
[Figure: Effect of Number of PNs]