“Artificial Intelligence” in my research
Seung-won Hwang, Department of CSE, POSTECH

2 Recap
Bridging the gap between under-/over-specified user queries.
We went through various techniques to support intelligent querying, implicitly/automatically, from data, prior users, a specific user, and domain knowledge.
My research shares the same goal, with some AI techniques applied (e.g., search, machine learning).

3 The Context
[Figure: query "top-3 houses" (e.g., realtor.com) -> Rank Formulation -> select * from houses order by [ranking function F] limit 3 -> Rank Processing -> ranked results]

4 Overview
[Figure: query "top-3 houses" (e.g., realtor.com) -> Rank Formulation -> select * from houses order by [ranking function F] limit 3 -> Rank Processing -> ranked results]
Usability: Rank Formulation
Efficiency: Processing Algorithms

5 Part I: Rank Processing
Essentially a search problem (which you studied in AI).

6 Limitation of Naïve Approach
Query: F = min(new, cheap, large), k = 1
Predicates:
  new (search predicate): x
  cheap (expensive predicate): p_c
  large (expensive predicate): p_l
Naïve algorithm: sort step, then merge step
  x:   a:0.90, b:0.80, c:0.70, d:0.60, e:0.50
  p_c: d:0.90, a:0.85, b:0.78, c:0.75, e:0.70
  p_l: b:0.90, d:0.90, e:0.80, a:0.75, c:0.20
  Result: b:0.78
Our goal is to schedule the order of probes to minimize the number of probes.
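The cost of the naïve strategy on this slide can be sketched in Python. This is a minimal illustration, assuming the three score lists on the slide belong to x, p_c, and p_l in that order; the function name `naive_top_k` is hypothetical.

```python
# Scores as read from the slide: x is the search predicate,
# p_c (cheap) and p_l (large) are the expensive predicates.
x = {"a": 0.90, "b": 0.80, "c": 0.70, "d": 0.60, "e": 0.50}
p_c = {"a": 0.85, "b": 0.78, "c": 0.75, "d": 0.90, "e": 0.70}
p_l = {"a": 0.75, "b": 0.90, "c": 0.20, "d": 0.90, "e": 0.80}


def naive_top_k(k):
    """Probe every expensive predicate on every object, then sort by F = min."""
    probes = 0
    scores = {}
    for oid in x:
        cheap = p_c[oid]   # one probe
        large = p_l[oid]   # another probe
        probes += 2
        scores[oid] = min(x[oid], cheap, large)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], probes


top, probes = naive_top_k(1)
print(top, probes)
```

Every object is probed on both expensive predicates, so the probe count is twice the number of objects regardless of k.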

7 [Figure: search from initial state to goal state; global schedule: H(p_c, p_l)]
Initial state: a:0.9, b:0.8, c:0.7, d:0.6, e:0.5
After pr(a, p_c) = 0.85: a:0.85, b:0.8, c:0.7, d:0.6, e:0.5
Next probe: pr(a, p_l) = 0.75
Goal state: b

OID   x     p_c   p_l   min(x, p_c, p_l)
a     0.90  0.85  0.75  0.75
b     0.80  0.78  0.90  0.78
c     0.70  -     -     -
d     0.60  -     -     -
e     0.50  -     -     -
("-" marks unnecessary probes)

8 Search Strategies?
Depth-first
Breadth-first
Depth-limited / iterative deepening (try every depth limit)
Bidirectional
Iterative improvement (greedy/hill climbing)

9 Best-First Search
Determine which node to explore next, using an evaluation function.
Evaluation function: explore further the object with the highest "upper bound score".
We can show that this evaluation function minimizes the number of evaluations, by evaluating only when "absolutely necessary".

10 Necessary Probes?
Necessary probe: a probe pr(u, p) is necessary if we cannot determine the top-k answers until probing pr(u, p), where u is an object and p is a predicate.
Let the global schedule be H(p_c, p_l). Top-1: b (0.78).
Can we decide top-1 without probing pr(a, p_c)? No: before that probe, a's score is only known to be ≤ 0.90, so a cannot be ruled out against b (0.78). pr(a, p_c) is necessary!
[Figure: score table with columns OID, x, p_c, p_l, min(x, p_c, p_l) for objects a-e]

11 [Figure: complete probe trace; global schedule: H(p_c, p_l)]
Start:                    a:0.9,  b:0.8,  c:0.7, d:0.6, e:0.5
After pr(a, p_c) = 0.85:  a:0.85, b:0.8,  c:0.7, d:0.6, e:0.5
After pr(a, p_l) = 0.75:  b:0.8,  a:0.75, c:0.7, d:0.6, e:0.5
After pr(b, p_c) = 0.78:  b:0.78, a:0.75, c:0.7, d:0.6, e:0.5
After pr(b, p_l) = 0.90:  b:0.78, a:0.75, c:0.7, d:0.6, e:0.5
Top-1: b (0.78)

OID   x     p_c   p_l   min(x, p_c, p_l)
a     0.90  0.85  0.75  0.75
b     0.80  0.78  0.90  0.78
c     0.70  -     -     -
d     0.60  -     -     -
e     0.50  -     -     -
("-" marks unnecessary probes)
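The trace on this slide can be reproduced with a small best-first search over upper-bound scores. This is a sketch of the idea, not the MPro implementation: the function name `best_first_top_k` and the heap layout are assumptions, and the scores are those on the slides.

```python
import heapq

# Scores as read from the slides.
x = {"a": 0.90, "b": 0.80, "c": 0.70, "d": 0.60, "e": 0.50}
expensive = {
    "p_c": {"a": 0.85, "b": 0.78, "c": 0.75, "d": 0.90, "e": 0.70},
    "p_l": {"a": 0.75, "b": 0.90, "c": 0.20, "d": 0.90, "e": 0.80},
}
schedule = ["p_c", "p_l"]  # global schedule H(p_c, p_l)


def best_first_top_k(k):
    """Always work on the object with the highest upper-bound score.

    With F = min, the upper bound of an object is the min of the
    predicate scores known so far (unknown scores count as 1.0).
    """
    # Heap entries: (-upper_bound, oid, number_of_probes_done).
    heap = [(-score, oid, 0) for oid, score in x.items()]
    heapq.heapify(heap)
    probes, answers = [], []
    while heap and len(answers) < k:
        neg_ub, oid, done = heapq.heappop(heap)
        if done == len(schedule):
            # Fully evaluated and still on top: it is an answer.
            answers.append((oid, -neg_ub))
            continue
        pred = schedule[done]
        value = expensive[pred][oid]  # the probe pr(oid, pred)
        probes.append((oid, pred))
        # New upper bound is min(old bound, value), stored negated.
        heapq.heappush(heap, (max(neg_ub, -value), oid, done + 1))
    return answers, probes


answers, probes = best_first_top_k(1)
print(answers, probes)
```

On this data only a and b are ever probed; c, d, and e never reach the top of the queue, which corresponds to the cells marked as unnecessary probes.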

12 Generalization
Access scenarios, by random-access cost r and sorted-access cost s:
  r = 1 (cheap), r = h (expensive), r = ∞ (impossible)
  s = 1 (cheap), s = h (expensive), s = ∞ (impossible)
Existing algorithms by scenario:
  FA, TA, QuickCombine: random access cheap
  CA, SR-Combine: random access expensive
  NRA, StreamCombine: random access impossible
  MPro [SIGMOD02/TODS]
Unified Top-k Optimization [ICDE05a/TKDE]

13 Just for laughs: unified field theory
Strong nuclear force, electromagnetic force, weak nuclear force, gravitational force -> unified field theory
(Adapted from Hyountaek Yong’s presentation)

14 FA, TA, NRA, CA, MPro -> Unified Cost-based Approach

15 Generality
Across a wide range of scenarios: one algorithm for all.

16 Adaptivity
Optimal for the specific runtime scenario.

17 Cost-based Approach
Cost-based optimization: finding the optimal algorithm M_opt for the given scenario, i.e., the one with minimum cost, from a space of algorithms.
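One way to write this objective down, as a sketch: assume an algorithm M performs n_s(M) sorted accesses and n_r(M) random accesses, with unit costs s and r as on the Generalization slide, and call the search space Ω. The symbols n_s, n_r, and Ω are introduced here for illustration, and the linear cost form is an assumed model, not something stated on the slide.

```latex
\[
\mathrm{cost}(M) = s \cdot n_s(M) + r \cdot n_r(M),
\qquad
M_{\mathrm{opt}} = \operatorname*{arg\,min}_{M \in \Omega} \mathrm{cost}(M)
\]
```

Under this reading, setting r or s to ∞ removes every algorithm that uses the corresponding access type from consideration.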

18 Evaluation: Unification and Contrast (vs. TA)
Unification: for a symmetric function, e.g., avg(p_1, p_2), framework NC behaves similarly to TA.
Contrast: for an asymmetric function, e.g., min(p_1, p_2), NC adapts with different behaviors and outperforms TA.
[Figure: two plots of cost against depth into p_1 and depth into p_2, with points labeled T and N]

19 Part II: Rank Formulation
[Figure: query "top-3 houses" (e.g., realtor.com) -> Rank Formulation -> select * from houses order by [ranking function F] limit 3 -> Rank Processing -> ranked results]
Usability: Rank Formulation
Efficiency: Processing Algorithms

20 Learning F from implicit user interactions
Using machine learning techniques (which you will learn soon!) to combine a quantitative model for efficiency with a qualitative model for usability.
Quantitative model
  The query condition is represented as a mapping F of objects to absolute numerical scores.
  DB-friendly: attaches an absolute score to each object.
  Example: F([image]) = 0.9, F([image]) = 0.5
Qualitative model
  The query condition is represented as a relative ordering of objects.
  User-friendly: relieves the user from specifying an absolute score for each object.
  Example: [image] > [image]

21 A Solution: RankFP (RANK Formulation and Processing)
For usability: a qualitative formulation front-end, which enables rank formulation by ordering samples.
For efficiency: a quantitative ranking function F, which can be processed efficiently.
[Figure: RankFP loop, steps 1-5.
  Rank Formulation: Sample Selection generates a new sample S (unordered); a ranking R* over S is obtained; Function Learning learns a new F; check over S whether R_F matches R*; if no, repeat; if yes, output the ranking function F.
  Rank Processing: processing of Q (Q: select * from houses order by F limit k) returns ranked results.]

22 Task 1: Ranking -> Classification
Challenge: unlike a conventional learning problem of classifying objects into groups, we learn a desired ordering of all objects.
Solution: we transform ranking into a classification on pairwise comparisons [Herbrich00]; the learning algorithm is a binary classifier F (+ / -).
Ranking view: c > b > d > e > a
Classification view (pairwise comparisons):
  a-b: -
  b-c: -
  c-d: +
  d-e: +
  a-c: -
  ...
[Herbrich00] R. Herbrich et al. Large margin rank boundaries for ordinal regression. MIT Press, 2000.
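The transformation from the ranking view to the classification view can be illustrated in a few lines of Python. The ranking and the pairs are the ones on the slide; the helper name `label` is hypothetical.

```python
# Ranking view from the slide, best object first: c > b > d > e > a.
ranking = ["c", "b", "d", "e", "a"]
position = {oid: i for i, oid in enumerate(ranking)}


def label(u, v):
    """Label the pair u-v '+' if u is ranked above v, otherwise '-'."""
    return "+" if position[u] < position[v] else "-"


# The pairwise comparisons listed on the slide.
pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c")]
labels = [label(u, v) for u, v in pairs]
print(labels)
```

Each labeled pair is one training example for the binary classifier, so an ordering of n objects yields up to n(n-1)/2 examples.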

23 Task 2: Classification -> Ranking
Challenge: with the pairwise classification function, we need to process ranking efficiently.
Solution: develop a duality that lets F also serve as a global per-object ranking function.
Suppose function F is linear. Then:
  $F(u_i - u_j) > 0 \iff F(u_i) - F(u_j) > 0 \iff F(u_i) > F(u_j)$
Classification view: F(a-b)? F(a-c)? F(a-d)? ...
Ranking view: F(a) = 0.7, ...; rank with F(.), e.g., F(c) > F(b) > F(d) > ...
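The duality can be checked on a toy example. The sketch below swaps in a plain perceptron for the large-margin learner cited in the slides, and the 2-D feature vectors are hypothetical values chosen only for illustration; the desired ranking is the one from the Task 1 slide.

```python
# Hypothetical 2-D feature vectors for the five objects (not from the slides).
features = {
    "a": (0.2, 0.1),
    "b": (0.8, 0.6),
    "c": (0.9, 0.8),
    "d": (0.5, 0.7),
    "e": (0.4, 0.3),
}
desired = ["c", "b", "d", "e", "a"]  # desired ranking, best first


def F(w, u):
    """Linear function: dot product of weights w and vector u."""
    return sum(wi * ui for wi, ui in zip(w, u))


def diff(u, v):
    """Component-wise difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def learn(desired, features, epochs=100):
    """Perceptron on pairwise differences (better minus worse)."""
    w = (0.0, 0.0)
    pairs = [(desired[i], desired[j])
             for i in range(len(desired))
             for j in range(i + 1, len(desired))]
    for _ in range(epochs):
        updated = False
        for better, worse in pairs:
            d = diff(features[better], features[worse])
            if F(w, d) <= 0:  # pair misclassified: move w toward d
                w = tuple(wi + di for wi, di in zip(w, d))
                updated = True
        if not updated:
            break
    return w


w = learn(desired, features)
# Ranking view: score each object once with F and sort.
ranked = sorted(features, key=lambda oid: F(w, features[oid]), reverse=True)
print(ranked)
```

Because F is linear, the classifier trained on differences scores each object independently, so ranking needs one evaluation of F per object instead of one per pair.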

24 Task 3: Active Learning
Finding samples that maximize learning effectiveness:
  Selective sampling: resolving the ambiguity
  Top sampling: focusing on top results
Achieving >90% accuracy in <=3 iterations (<=10 ms).

25 Using Categorization for Intelligent Retrieval
Category structure created a priori (typically a manual process).
At search time: each search result is placed under its pre-assigned category.
Susceptible to skew -> information overload.

26 Categorization: Cost-based Optimization
Categorize results automatically/dynamically:
  Generate a labeled, hierarchical category structure dynamically, based on the contents of the tuples in the result set.
  Does not suffer from the problems of a priori categorization.
Contributions:
  Exploration/cost models to quantify the information overload faced by a user during an exploration
  Cost-driven search to find low-cost categorizations
  Experiments to evaluate the models/algorithms

27 Thank You!
