Presentation is loading. Please wait.

Presentation is loading. Please wait.

Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Similar presentations


Presentation on theme: "Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari."— Presentation transcript:

1 Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari

2 Overview Problem Definition Architectural View of Garlic Query Plan Generation Query Optimization Conclusions

3 Trying to solve The use of middleware systems. Optimize queries over sources with varying query processing capabilities. Use of cost based model. Implementation of garlic approach.

4 Cost-based (Disco, Garlic,... ) Quality-based (Object Globe, HiQIQ, … ) Adaptive query optimization (Telegraph, Tukwila, … ) Capability-based (Tsimmis, Infomaster )

5 Optimizing Queries Reasons to optimize People don’t like to wait, they want the programs to be fast. The response time should be smaller. Even if you use a faster server, this is proved wrong.

6 How to optimize? Filter as much as possible Where clause is most important in your query. Never write “select * “ specify the correct fields you want to know. Join the two tables by using all keys that are related to the tables.

7 How do they do Query Optimization? Processing costs (estimated from cost model of CPU, I/O) Communication costs (estimated using constants in catalog) Cost to initiate sub queries & methods (estimated using constants in catalog) Wrapper costs (estimated by wrapper) Plans are pruned upon enumeration Plan A not used as building block for more complex plan if cheaper alternatives available Plans with unique properties are not pruned

8 How to optimize? (cont..) Example: Query : Select * From Employees In Program : Add a filter on Dept or use command : if Dept = R&D Corrected : Select Name, Salary From Employees Where Dept = R&D For i = 1 to 2000 Call Query : Select salary From Employees Where EmpID = Parameter(i) Corrected: Select salary From Employees Where EmpID >= 1 and EmpID <= 2000

9 Garlic Architecture Wrapper acts as a interface between query services and data sources. Catalog contains local/global schemas.

10 Query services contains the query language processor and distributed query execution engine. Query language processor generatesexecution plan based on input. Query execution engine passes sub-queries to wrappers and assembles final result. Assembly may include performing joins, applying predicates, sorting, aggregates

11 What do wrappers do? Wrappers can wrap various types of data sources. Garlic wrappers are specific to Garlic provides interface to data source using Garlic’s internal protocols. Data described in an OO model, methods can be applied on data. Data source notifies wrapper of capabilities using rules. Wrapper does not have to reflect full query functionality of data source.

12 What are STARS? STARs = STrategy Alternative Rules Rules are high-level, declarative, compact specification of legal alternatives STARs define high-level constructs from low level database operators or other STARs. JoinRoot(T1,T2,p)={ Permuted Join(T1,T2,P) Permuted Join(T2,T1,P)

13 How are plans constructed? Tuples are operated upon by POPs (Plan Operators) A POP generally corresponds to one executable operator POPs include: join, sort, filter, fetch, temp, scan, pushdown (work to be performed by source) POPs have properties that describes the specifics of the operations. Source property records where output stream comes from (needed?)

14 Example Push Down POP performs operations on the data source Data sources only return OID Wrappers take Push Downs and performs them on sources by translation into query or API calls Source property shows where execution occurs Properties of POPs are functions of parent POP (I.e. predicates) Additional properties: cost, card

15

16 What do stars do? Stars can be viewed as a grammar with some set of rules. A Star determines how POPs can be combined in a plan. Here f1, f2 … are the name of the star or POPs.

17 Stars (cont…) A star can retrieve columns that are needed by another star. Access root STAR to create plans

18 How is plan enumeration performed? Access Root STAR is used to create plans to select all attributes used in query (no real variability in plans, performs a Push Down). Join Root STAR is used to create plans to perform joins. Finish Root STAR is used to include any missing parts of the query (i.e. projections, ordering) Pruning for query optimization performed throughout to minimize number of plans to enumerate.

19 How are data source capabilities determined? Wrapper implements STARs that describe the capability of each data source. STARs follow POP structure mentioned previously Simple STARs can model basic capabilities of data sources Complex capability is arguably not needed as Garlic’s query engine can make up for it. Wrapper can iteratively add STARs to: – Introduce source quickly into mediated schema – Improve performance

20 Modeling wrappers using stars University Offers a course, course description and online complaint mechanisms. That is (relational, text, mail) The mail has a sender, date, body and subject

21 Modeling wrapper using STARs The class objects have attributes like courses and professor.

22 Modeling Wrappers using STARs

23 Disco optimizer

24 Tsimmis

25 Future Trends

26 Any Questions? Thank you


Download ppt "Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari."

Similar presentations


Ads by Google