
Zhen Zhang Seung-won Hwang Kevin C. Chang Min Wang Christian A. Lang Yuan-chi Chang Presented ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006 Presented.

1 Zhen Zhang, Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi Chang
Presented at the ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006
Presented by: Pavan Kumar M.K. (1000618890), Aditya Mangipudi (1000649172)

2
• Introduction
• Motivation
• A* Search Algorithm
• A*-Driven State Space Construction
• Optimization-Driven Configuration
• OPT* Search Algorithm
• Experiments
• Conclusion

3
• The widespread use of databases for managing structured data, compounded by the expanding reach of the Internet, has brought interesting new data retrieval and analysis scenarios to RDBMSs.
• In such scenarios, only the top-k results are of interest to the user.

4
QUERY: Select the top-5 2nd-year students in CSE with the highest GPA.
• Ranking query (quantifying function) O: GPA, i.e., the top 5 ranked by GPA
• Boolean query (qualifying constraint) B: dept = CSE and year = 2
• Goal: find the top answers

5
• Query Q = (G, k)
◦ G: goal function, G = B · O
◦ k: retrieval size
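Read as a specification, Q = (G, k) can be answered by brute force: score every tuple with G = B · O and keep the k best. The minimal Python sketch below does this for the slide 4 query. The student records and the helper names B, O, G, and top_k are hypothetical illustrations, not the paper's code.

```python
import heapq

# Hypothetical student records (not from the paper).
students = [
    {"name": "s1", "dept": "CSE", "year": 2, "gpa": 3.9},
    {"name": "s2", "dept": "CSE", "year": 3, "gpa": 4.0},
    {"name": "s3", "dept": "EE", "year": 2, "gpa": 3.95},
    {"name": "s4", "dept": "CSE", "year": 2, "gpa": 3.5},
    {"name": "s5", "dept": "CSE", "year": 2, "gpa": 3.7},
]


def B(t):
    """Qualifying constraint: 1 if dept = CSE and year = 2, else 0."""
    return 1 if t["dept"] == "CSE" and t["year"] == 2 else 0


def O(t):
    """Quantifying function: the student's GPA."""
    return t["gpa"]


def G(t):
    """Goal function G = B * O; tuples failing B score 0."""
    return B(t) * O(t)


def top_k(tuples, goal, k):
    """Brute-force Q = (G, k): score every tuple, keep the k highest."""
    return heapq.nlargest(k, tuples, key=goal)


print([t["name"] for t in top_k(students, G, 2)])  # ['s1', 's5']
```

The scan touches every tuple, which is the cost that index-based processing aims to avoid. If fewer than k tuples satisfy B, zero-scored tuples fill the remaining slots, so a practical version would drop them.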

6 Ranking query + Boolean query: how to answer?

7
• If evaluated as separate operators (Boolean query B, then ranking query R, over D):
◦ Current techniques optimize only condition by condition.
• If searched by an overall goal function G used as the ranking function (R and B combined into G over D).
[Figure: separate operators B and R over D, versus one goal function G over D]

8 [Figure: axes Att 1 and Att 2]

9
• The Threshold Algorithm (TA) relies on the rigid assumption that the goal function G is monotonic.
• Monotonicity requires that G not increase when all of its parameters decrease.
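To make the dependence on monotonicity concrete, here is a minimal sketch of Fagin's Threshold Algorithm over in-memory data. It assumes one list per attribute sorted in descending order, plus random access by object id. The function name and data layout are illustrative assumptions, and the stopping test is sound only because G is monotonic.

```python
import heapq


def threshold_algorithm(objects, G, k):
    """Threshold Algorithm (TA) sketch for a monotonic goal function G.

    objects: dict mapping object id -> tuple of m attribute values.
    Sorted access is simulated by one descending list per attribute;
    random access is a dict lookup of the object's full tuple.
    """
    m = len(next(iter(objects.values())))
    lists = [sorted(objects, key=lambda o: objects[o][i], reverse=True)
             for i in range(m)]
    scores = {}
    for depth in range(len(objects)):
        last_seen = []
        for i in range(m):
            oid = lists[i][depth]              # sorted access on list i
            last_seen.append(objects[oid][i])
            if oid not in scores:              # random access, then score
                scores[oid] = G(objects[oid])
        # Because G is monotonic, no unseen object can score above this.
        threshold = G(tuple(last_seen))
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


objects = {"a": (0.9, 0.1), "b": (0.8, 0.8), "c": (0.1, 0.95),
           "d": (0.5, 0.5), "e": (0.2, 0.2)}
average = lambda t: sum(t) / len(t)
print([oid for oid, _ in threshold_algorithm(objects, average, 2)])  # ['b', 'c']
```

With a non-monotonic goal such as the one on slide 10, the threshold computed from the last-seen values no longer bounds the unseen objects, so this stopping rule can return wrong answers.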

10
• Consider the example query below. It finds houses priced outside the range from 200k to 400k and ranks them by size relative to how far their price is from 300k.
• The goal function G here is non-monotonic.
Select h.address from House h
where h.price ≤ 200k ∨ h.price ≥ 400k
order by h.size / |h.price − 300k|
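A quick numeric check shows why the ranking part of this query is non-monotonic in price. The helper R below is a hypothetical name for size/|price − 300k|, with prices in thousands. It is undefined at exactly 300k, which the where clause excludes.

```python
def R(price_k, size):
    """Ranking expression from the query: size / |price - 300k| (price in $k)."""
    return size / abs(price_k - 300)


# Below 300k, a higher price raises the score; above 300k, it lowers it.
print(R(200, 1000), R(250, 1000))  # 10.0 20.0
print(R(350, 1000), R(400, 1000))  # 20.0 10.0
```

Because increasing the same argument can move the score in either direction, no single sorted order on price bounds the scores of unseen tuples, which is what TA's threshold relies on.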

11 [Figure: axes Att 1 and Att 2]

12
• Existing algorithms build on problem-specific assumptions about the goal functions or the index traversals.
• For example, the Threshold Algorithm assumes that G is monotonic and that sorted accesses (interleaf navigation) are used; the search strategy is implicitly hard-wired on these assumptions.
• For a Boolean query such as B = price > 100K, such a search is straightforward, because the constraint expression B explicitly suggests how to carry out a focused search, e.g., visiting only the index nodes whose localities can potentially satisfy B.
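The focused search for B = price > 100K amounts to a range-pruned tree traversal. The minimal sketch below assumes a hypothetical index: the house prices come from the table on slide 16 and the node ranges from the price index on slide 22, but the grouping of houses under nodes, the Node class, and focused_search are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical index node covering prices in [lo, hi] (in $k)."""
    lo: int
    hi: int
    children: list = field(default_factory=list)
    tuples: list = field(default_factory=list)  # (tuple id, price) at leaves


def focused_search(node, threshold, visited):
    """Collect ids with price > threshold, descending only into children
    whose price range can still satisfy B."""
    visited.append((node.lo, node.hi))
    hits = [tid for tid, price in node.tuples if price > threshold]
    for child in node.children:
        if child.hi > threshold:  # locality may satisfy B: visit it
            hits += focused_search(child, threshold, visited)
    return hits


index = Node(0, 600, children=[
    Node(0, 250, children=[Node(0, 100, tuples=[(6, 80)]),
                           Node(100, 250, tuples=[(3, 150), (4, 250)])]),
    Node(250, 600, children=[Node(250, 350, tuples=[(5, 300), (2, 350)]),
                             Node(350, 600, tuples=[(1, 600)])]),
])
visited = []
print(sorted(focused_search(index, 100, visited)), len(visited))  # [1, 2, 3, 4, 5] 6
```

The node covering 0-100 is never visited, because its whole range fails B. No comparably simple pruning rule falls out of an arbitrary ranking function.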

13
• In contrast, consider a general k-constrained optimization query, which may combine arbitrary ranking with Boolean conditions and join multiple relations (e.g., a query Q maximizing a size/price ratio). For such a query it is no longer clear how to focus the search.
• By encoding the problem as a generic search that makes no assumptions about G, the search generalizes to arbitrary G over potentially multiple indices, with a combination of hierarchical and interleaf traversals.

14
• A* is a well-known search algorithm that finds the shortest path from an initial state to a designated goal state.
• It is widely used in the field of artificial intelligence.
• It uses best-first traversal.
• It uses heuristic information to guide the search.
• Given an admissible heuristic, A* is guaranteed to find the correct answer (correctness); given a consistent heuristic, it expands no more states than any comparable algorithm using the same heuristic (optimality).
• Examples: GPS navigation, Google Maps, many puzzles and games.
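A compact, textbook version of A* makes the best-first, heuristic-guided traversal concrete. This is a generic sketch on a hypothetical grid, not OPT* itself. The heuristic is Manhattan distance, which never overestimates on a 4-connected grid with unit step costs.

```python
import heapq


def a_star(start, goal, neighbors, h):
    """Textbook A*: expand states in order of f(n) = g(n) + h(n).

    neighbors(n) yields (successor, step_cost) pairs with non-negative costs,
    and h(n) must never overestimate the remaining cost (admissible).
    Returns (cost, path), or None if the goal is unreachable.
    """
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g[node]:
            continue  # stale entry: a cheaper route was already queued
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None


# Example: a 5x5 grid whose row y = 2 is blocked except at x = 4.
walls = {(0, 2), (1, 2), (2, 2), (3, 2)}


def grid_neighbors(p):
    x, y = p
    for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= q[0] < 5 and 0 <= q[1] < 5 and q not in walls:
            yield q, 1


def manhattan(p):
    return abs(p[0] - 0) + abs(p[1] - 4)  # distance to goal (0, 4)


cost, path = a_star((0, 0), (0, 4), grid_neighbors, manhattan)
print(cost, len(path))  # 12 13
```

The goal test happens when a state is popped, not when it is pushed. This is what lets an admissible heuristic guarantee that the first goal returned is on a shortest path.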

15 For a tuple t with m attribute values, the goal function G(t) maps the tuple to a non-negative numeric score:
$$G(t) = B(t) \cdot R(t) = \begin{cases} R(t) & \text{if } B(t) \text{ is true} \\ 0 & \text{if } B(t) \text{ is false (i.e., the lowest score)} \end{cases}$$

16
Select h.address from House h
where h.price ≤ 200k ∨ h.price ≥ 400k
order by h.size / |h.price − 300k|

#  Address             Price  Size  Score
1  Oak Park, Chicago   600K   4500  15
2  Mattis, Champaign   350K   2000  0
3  …                   150K   1000  6.67
4  …                   250K   2000  0
5  …                   300K   3500  0
6  …                   80K    500   2.27
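The Score column can be reproduced directly from the definition G = B · R. The minimal sketch below takes the house data from the table (prices in thousands) and uses hypothetical function names. B is tested first, so R is never evaluated at price = 300k, where it would divide by zero.

```python
# House data from the table: id -> (price in $k, size).
houses = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}


def B(price, size):
    """Boolean constraint: price <= 200k OR price >= 400k."""
    return price <= 200 or price >= 400


def R(price, size):
    """Ranking function: size / |price - 300k|."""
    return size / abs(price - 300)


def G(price, size):
    """G = B * R, with B checked first so R never sees price = 300k."""
    return R(price, size) if B(price, size) else 0


scores = {hid: round(G(*attrs), 2) for hid, attrs in houses.items()}
print(scores)  # {1: 15.0, 2: 0, 3: 6.67, 4: 0, 5: 0, 6: 2.27}
```

Sorting by these scores puts house 1 first and house 3 second, which matches the table.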

17
#  Address             Price  Size  Score
1  Oak Park, Chicago   600K   4500  15
2  Mattis, Champaign   350K   2000  0
3  …                   150K   1000  6.67
4  …                   250K   2000  0
5  …                   300K   3500  0
6  …                   80K    500   2.27

18
• To realize k-constrained optimization over databases, the paper develops the OPT* framework.
• Objective: optimize G using indices as access methods over the tuples in D.
• Discrete state search: from the perspective of using indices, we search for the maximizing tuples by treating index nodes as “discrete states”.
• Continuous function optimization: from the perspective of maximizing goal functions, we optimize G.

19 [Figure: OPT* optimizes G over D by combining function optimization of G with discrete state search over D]

20 [Figure panels: Indices; Value Space]

21
• States: states in the search graph represent “localities” of values at different granularities, from coarse to fine, eventually reaching the tuples in the database. There are two kinds: region states and tuple states.
• Transitions: while states give “locations” on the map, transitions capture the possible paths followed to reach our destination, the query answers. For two states u and v, there is a transition (u, v) if v ∈ Next(u).
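One way to picture states and transitions is as a small typed graph. The minimal sketch below assumes a single price index over the six houses of slide 16. The class names, the NEXT table, and the two-region split are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionState:
    """A locality of values: here, a price interval [lo, hi] in $k."""
    lo: int
    hi: int


@dataclass(frozen=True)
class TupleState:
    """A single tuple (house) in the database."""
    tid: int
    price: int


# Hypothetical single-index state graph: coarse region -> finer regions -> tuples.
root = RegionState(0, 600)
low, high = RegionState(0, 250), RegionState(250, 600)
NEXT = {
    root: [low, high],
    low: [TupleState(6, 80), TupleState(3, 150), TupleState(4, 250)],
    high: [TupleState(5, 300), TupleState(2, 350), TupleState(1, 600)],
}


def Next(u):
    """Successor states of u; tuple states have none."""
    return NEXT.get(u, [])


def transitions():
    """Every transition (u, v) such that v is in Next(u)."""
    return [(u, v) for u in NEXT for v in Next(u)]


print(len(transitions()), Next(TupleState(1, 600)))  # 8 []
```

Frozen dataclasses are hashable, so states can serve directly as dictionary keys in the transition table.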

22 [Figure: houses 1-6 plotted by Price (k) and size, with a price index (nodes b1, b2, b3, b6, b7, …; ranges 0-250, 250-600, 0-100, 100-250, 250-350, 350-600) and a size index (nodes a1, a2, a3, a6, a7, …; ranges 0-3000, 3000-4500, 0-1500, 1500-3000, 3000-4000, 4000-6000)]

23 [Figure: the same price and size indices, with combined states M11, M22, M23, M32, M33, M55, M56, M66, M67, M75, M76, M77 over houses 1-6]
$M_{ij} = (a_i, b_j)$

24 [Figure: as on slide 23, with combined states M11 … M77 over houses 1-6]
$M_{ij} = (a_i, b_j)$: conceptually, a combined space over both indices.
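The combined space can be generated lazily from the two indices. The minimal sketch below assumes simplified two-level size and price indices over houses 1-6, not the exact trees in the figure. It also uses a hypothetical Next rule: refine each index that can still be refined, and keep a pair only if both localities share at least one tuple.

```python
from itertools import product

# Hypothetical two-level indices over houses 1-6. Strings are index nodes,
# integers are tuple ids (leaves). Sizes 0-3000 vs 3000-4500; prices 0-250 vs 250-600 ($k).
SIZE_INDEX = {"a1": ["a2", "a3"], "a2": [6, 3, 2, 4], "a3": [5, 1]}
PRICE_INDEX = {"b1": ["b2", "b3"], "b2": [6, 3, 4], "b3": [5, 2, 1]}


def tuples_under(index, node):
    """Tuple ids below a node; a tuple id covers only itself."""
    if node not in index:
        return {node}
    return set().union(*(tuples_under(index, c) for c in index[node]))


def Next(state):
    """Successors of a combined state M = (a, b): refine each side that can
    still be refined, keeping only pairs whose localities share a tuple."""
    a, b = state
    a_kids = SIZE_INDEX.get(a, [a])
    b_kids = PRICE_INDEX.get(b, [b])
    if a_kids == [a] and b_kids == [b]:
        return []  # both sides are tuple ids: a tuple state
    return [(x, y) for x, y in product(a_kids, b_kids)
            if tuples_under(SIZE_INDEX, x) & tuples_under(PRICE_INDEX, y)]


print(Next(("a1", "b1")))  # [('a2', 'b2'), ('a2', 'b3'), ('a3', 'b3')]
print(Next(("a3", "b3")))  # [(5, 5), (1, 1)]
```

The pair (a3, b2) is never generated: no house is both large (3000-4500) and cheap (0-250k). The product space is only materialized where tuples actually exist.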

25 Challenge 1: What is the search mechanism?

26
• A* gives the shortest path to a testable goal.
• Our goal is to find the optimal tuple states, those with maximal G-score.
◦ K-constrained optimization: find a tuple with maximal score.
◦ A* shortest path: find a path with minimal distance.

27
• How to encode a tuple as a path?
◦ Add a virtual target t* that is reachable only through tuples.
• How to encode the maximal tuple as the minimal path?
◦ Make the quality of a path depend solely on the tuple it passes through.
◦ For a tuple state t, $D(t, t^*) = -G(t)$.
◦ For two other states r, u, $D(r, u) = 0$.
[Figure: state graph M11 … M77 over tuples 1, 5, 4, 2, all linked to t*; state-to-state links have distance 0, and the links from tuples 4 and 1 to t* have distances -G(4) and -G(1)]
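The encoding can be checked on a toy graph. With D(t, t*) = −G(t) and zero-distance links elsewhere, the minimum-distance path to t* passes through the maximum-G tuple. The minimal sketch below assumes the scores of slide 16 and a hypothetical two-region state graph. It enumerates paths exhaustively instead of running A*, because the point here is the encoding, not the search.

```python
# Scores G(t) for houses 1-6, from the table on slide 16.
G = {1: 15.0, 2: 0.0, 3: 6.67, 4: 0.0, 5: 0.0, 6: 2.27}

# Hypothetical state graph: root -> two regions -> tuples.
EDGES = {"root": ["r1", "r2"], "r1": [6, 3, 4], "r2": [5, 2, 1]}
TARGET = "t*"


def successors(u):
    """Index transitions; every tuple state (an int) links only to t*."""
    return [TARGET] if isinstance(u, int) else EDGES.get(u, [])


def D(u, v):
    """Distance: -G(t) on the hop from tuple t to t*, 0 on all other links."""
    return -G[u] if v == TARGET else 0.0


def shortest_to_target(u):
    """Exhaustive minimum-distance path from u to t* (fine for a tiny DAG)."""
    if u == TARGET:
        return 0.0, [TARGET]
    best = None
    for v in successors(u):
        sub = shortest_to_target(v)
        if sub is None:
            continue  # dead end: v cannot reach t*
        cost = D(u, v) + sub[0]
        if best is None or cost < best[0]:
            best = (cost, [u] + sub[1])
    return best


cost, path = shortest_to_target("root")
print(cost, path)  # -15.0 ['root', 'r2', 1, 't*']
```

The minimal distance equals −max G, so the last tuple on the shortest path is the top-1 answer.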

28 Challenge 2: How to guide the search?

29
• Function optimization measures the quality of states.
• Function optimization:
◦ defines proper heuristics;
◦ identifies a set of initial states from which to start the search.

30
• Input: $G(x_1, \ldots, x_m)$ and the domain of values $dom$, with $x_i \in [x_i^1, x_i^2]$
• Output: $\langle O, U \rangle = \mathrm{OPT}(G, dom)$, where O is the set of local optima and U is the upper-bound score
◦ OPTPOINT gives the O component of OPT.
◦ OPTMAX gives the U component of OPT.
• Approaches:
◦ Analytical methods
◦ Search-based (e.g., hill climbing)
◦ Template-based
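As an illustration of the search-based approach, here is a minimal hill-climbing sketch of OPTPOINT over a box domain dom. The coordinate-step scheme with step halving is one common variant, chosen here for brevity, and is not necessarily the paper's. Like any hill climber, it returns a local optimum reached from its start point.

```python
def hill_climb(G, lo, hi, start, step=1.0, min_step=1e-6):
    """Search-based OPTPOINT sketch: from `start`, move one coordinate at a
    time within the box [lo[i], hi[i]] while G improves; halve the step
    when no move helps, and stop once the step drops below min_step."""
    x = [float(v) for v in start]
    best = G(x)
    while step >= min_step:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] = min(max(y[i] + delta, lo[i]), hi[i])  # stay inside dom
                value = G(y)
                if value > best:
                    x, best, improved = y, value, True
        if not improved:
            step /= 2
    return x, best


# Maximizing -((x-3)^2 + (y-4)^2), i.e. a nearest-neighbor goal around (3, 4).
point, value = hill_climb(lambda p: -((p[0] - 3) ** 2 + (p[1] - 4) ** 2),
                          lo=[0, 0], hi=[10, 10], start=[0, 0])
print(point, value == 0)  # [3.0, 4.0] True
```

On a multimodal G, different start points can end at different local optima, which is why OPTPOINT returns a set O rather than a single point.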

31 The figure illustrates that different states hold different promise. The search should favor M77 over M67 because M77 is more promising. [Figure legend: High, Medium, Low]

32
• To guarantee completeness:
◦ A* requires admissible heuristics, i.e., heuristics that estimate optimistically.
• To ensure admissible heuristics:
◦ Function optimization gives the tightest upper bound, via analytical approaches or a numeric analysis package.
H(region) = OPTMAX(G, region), i.e., the maximal value of G in the region.
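For the house query of slide 10, OPTMAX over a rectangular region has a closed form, which yields an admissible H. The minimal analytical sketch below represents a region as a box of price (in thousands) and size intervals. The function name optmax and the box representation are hypothetical.

```python
def optmax(p_lo, p_hi, s_lo, s_hi):
    """Analytical OPTMAX of G = B * R over the box
    [p_lo, p_hi] x [s_lo, s_hi] (price in $k, size > 0), where
    B = price <= 200 or price >= 400 and R = size / |price - 300|.

    R grows with size and with closeness to 300k, so on each price piece
    that satisfies B the maximum sits at s_hi and the price nearest 300k.
    """
    best = 0.0  # G >= 0 everywhere, so 0 is always a valid floor
    if p_lo <= 200:  # qualifying piece [p_lo, min(p_hi, 200)]
        best = max(best, s_hi / (300 - min(p_hi, 200)))
    if p_hi >= 400:  # qualifying piece [max(p_lo, 400), p_hi]
        best = max(best, s_hi / (max(p_lo, 400) - 300))
    return best


print(optmax(0, 600, 0, 4500))        # 45.0 (whole space)
print(optmax(250, 350, 0, 4500))      # 0.0 (no price satisfies B)
print(optmax(600, 600, 4500, 4500))   # 15.0 (a single tuple: house 1)
```

A region that excludes every qualifying price gets the bound 0. This is how a state such as M67 can receive U = 0.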

33
• h(M67) gives U = 0.
• However, if we follow the link from M67 to M77, we can reach tuple 1 with score 15, so this link leads uphill: the bound rises from 0 to at least 15.
[Figure: houses 1-6 in the price-size space, with regions M67 and M77 marked]

34
• To guarantee optimality:
◦ A* requires descending heuristics.
• To ensure descending heuristics:
◦ Remove uphill links.
[Figure: state graph M11 … M77 over tuples 1, 5, 4, 2]
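Once every state has a heuristic value, removing uphill links is a simple filter. In the minimal sketch below, the h values are hypothetical, chosen to mirror slide 33: h(M67) = 0, while M77 leads to tuple 1 with score 15.

```python
def remove_uphill(edges, h):
    """Keep a transition (u, v) only if it does not go uphill: h(v) <= h(u).

    On the remaining links the heuristic never increases along a path,
    which is the 'descending' property the slide asks for.
    """
    return [(u, v) for u, v in edges if h[v] <= h[u]]


# Hypothetical bounds: region states use OPTMAX, tuple 1 uses its score G.
h = {"M11": 45.0, "M33": 20.0, "M67": 0.0, "M77": 15.0, 1: 15.0}
edges = [("M11", "M33"), ("M33", "M77"), ("M67", "M77"), ("M77", 1)]
print(remove_uphill(edges, h))
# [('M11', 'M33'), ('M33', 'M77'), ('M77', 1)]
```

The surviving path M11 → M33 → M77 → 1 is the top-down path of slide 37. Only the uphill link M67 → M77 is dropped.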

35
• To guarantee correctness:
◦ Every tuple state must be reachable from the start states.
◦ Taking only downhill links requires starting from high points.
• To ensure reachability:
◦ The initial states should contain all local optima.
[Figure: state graph M11 … M77 over tuples 1, 5, 4, 2]
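The reachability condition can be checked mechanically on the downhill-only graph. The minimal sketch below uses a hypothetical graph in which tuple 4 hangs below M67 after M67's incoming uphill link was removed. Here, "states that no kept link enters" stands in for the local optima the slide refers to; that correspondence is an assumption of this sketch.

```python
from collections import deque


def entry_states(nodes, edges):
    """States that no kept (downhill) link enters. The search can reach them
    only by starting there, so they must be among the initial states."""
    entered = {v for _, v in edges}
    return [n for n in nodes if n not in entered]


def reachable(starts, edges):
    """All states reachable from `starts` along kept links (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = set(starts), deque(starts)
    while queue:
        for v in adj.get(queue.popleft(), []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


nodes = ["M11", "M33", "M67", "M77", 1, 4]
kept = [("M11", "M33"), ("M33", "M77"), ("M77", 1), ("M67", 4)]
print(4 in reachable(["M11"], kept))  # False: tuple 4 would be missed
starts = entry_states(nodes, kept)
print(starts, set(nodes) <= reachable(starts, kept))  # ['M11', 'M67'] True
```

Starting only from the root would miss tuple 4. Adding M67, the high point that no downhill link enters, restores reachability.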

36 [Figure: state graph M11 … M77, now also showing M57, over tuples 1, 5, 4, 2]
Search is implemented as a priority-queue-driven, top-down traversal.
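A priority-queue-driven, top-down traversal can be sketched as best-first search on upper bounds: pop the most promising state, expand regions, and emit tuples as they surface. This is a simplified illustration under stated assumptions, not the paper's OPT* implementation. It uses a hypothetical two-region index over the houses of slide 16 and the closed-form OPTMAX bound for the slide 10 query.

```python
import heapq
from itertools import count

HOUSES = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}  # (price $k, size)
# Hypothetical index: region -> ((p_lo, p_hi, s_lo, s_hi), children).
REGIONS = {"root": ((0, 600, 0, 4500), ["low", "high"]),
           "low": ((0, 250, 0, 2000), [6, 3, 4]),
           "high": ((250, 600, 2000, 4500), [5, 2, 1])}


def G(price, size):
    """Exact goal for a tuple: size / |price - 300| if B holds, else 0."""
    return size / abs(price - 300) if price <= 200 or price >= 400 else 0.0


def optmax(p_lo, p_hi, s_lo, s_hi):
    """Closed-form upper bound of G over a box region."""
    best = 0.0
    if p_lo <= 200:
        best = max(best, s_hi / (300 - min(p_hi, 200)))
    if p_hi >= 400:
        best = max(best, s_hi / (max(p_lo, 400) - 300))
    return best


def score(state):
    """Exact G for a tuple state (int id), OPTMAX bound for a region state."""
    if isinstance(state, int):
        return G(*HOUSES[state])
    return optmax(*REGIONS[state][0])


def best_first_top_k(starts, k):
    """Pop states in descending score order; a popped tuple outranks every
    unexplored tuple, because each region's bound covers everything below it."""
    tie = count()  # tie-breaker so the heap never compares states directly
    queue = [(-score(s), next(tie), s) for s in starts]
    heapq.heapify(queue)
    answers = []
    while queue and len(answers) < k:
        neg_score, _, state = heapq.heappop(queue)
        if isinstance(state, int):
            answers.append((state, -neg_score))
        else:
            for child in REGIONS[state][1]:
                heapq.heappush(queue, (-score(child), next(tie), child))
    return answers


print([t for t, _ in best_first_top_k(["root"], 2)])  # [1, 3]
```

Negated scores turn Python's min-heap into a max-priority queue. The counter keeps heap entries comparable even though states mix strings and integers.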

37
• Example: given a set of states constructed from the set of index graphs I, the search should, in principle, follow the transitions to look for the tuple states that maximize the goal function. The search may follow either path:
◦ M11 → M33 → M77 → 1 (top-down search)
◦ M57 → M77 → 1 (bottom-up search)

38 [Figure: state graph M11 … M77 over tuples 1, 4, 2, 5]

39
• OPT* may incur different costs when started from different initial states.
• Top-down: more hops. Bottom-up: fewer hops.
• The preference goes to bottom-up, but consider a goal function such as $G = \frac{1}{(X - Y)^2 + 1}$: any value satisfying X = Y maximizes the function, so the maximizers form an entire line rather than a few isolated points.

40
• Comparison vs.:
◦ Boolean then ranking
◦ Ranking then Boolean
• Metric: nodes accessed = $N_l + N_t$
• Settings:
◦ Benchmark queries over a real dataset
◦ Controlled queries over synthetic datasets

41
• Dataset:
◦ 19,706 real estate listings crawled online
• Queries:
◦ Q1: size * bedrms / |price − 450k| : [40k ≤ price ≤ 50k]
◦ Q2: size * e^bedrms / |price − 350k| : [price 4000]
◦ Q3: size / price : [bedrms = 3 ∨ bedrms = 4]
[Chart: BR_unclustered, BR_clustered, and OPT* on Q1, Q2, Q3]

42
• Datasets:
◦ Three randomly generated datasets of 100k points: uniform, Gaussian, and log-normal
• Queries:
◦ Linear average queries (e.g., 0.4*a + 0.6*b)
◦ Nearest-neighbor queries (e.g., (x-3)^2 + (y-4)^2)
◦ Join queries (e.g., 0.4*R.a + 0.6*S.b : R.c = R.d)

43
• Problem:
◦ Study k-constrained optimization queries as Boolean + ranking
• Abstraction:
◦ Encode k-constrained optimization as a shortest-path problem
• Framework:
◦ Develop OPT* to process k-constrained optimization queries

44 References
• Z. Zhang, S. Hwang, K. C.-C. Chang, M. Wang, C. Lang, and Y. Chang. Boolean + Ranking: Querying a Database by K-Constrained Optimization. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD 2006), pages 359-370, Chicago, June 2006.
• www.wikipedia.org

45 Questions?

