Presentation is loading. Please wait.

Presentation is loading. Please wait.

Harikrishnan Karunakaran Sulabha Balan CSE 6339.  Introduction  Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model 

Similar presentations


Presentation on theme: "Harikrishnan Karunakaran Sulabha Balan CSE 6339.  Introduction  Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model "— Presentation transcript:

1 Harikrishnan Karunakaran Sulabha Balan CSE 6339

2  Introduction  Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model  Searching for the Best Answers ◦ Backward Expanding Search Algorithm  Browsing through BANKS  Experience and Performance  Related Work  Conclusion

3  With the onset of the web, number of users needing to access online databases have increased  Search engines have popularized use of unstructured querying which just needs the user to type in the keyword and follow links  Same methodology cannot be used in querying databases as knowledge of schema and querying language like SQL is needed  Keyword searching will not also work on datatbases because the data is usually spread across tables/tuples due to normalization

4  Browsing ANd Keyword Searching  Enables Keyword-Based search on Relational Databases along with Data and Schema Browsing  User interacts with data through typing keywords, following hyperlinks and using controls made available.  Absolutely no knowledge of querying or programming languages required of the user

5

6  Makes Joins Implicit and Transparent  Incorporates Notions of Proximity and Prestige  Methods to publish relational data which would otherwise remain invisible on the web are provided  Creates hierarchical and graphical views of data with hyperlinks to navigate through them

7

8

9

10  Answer to query should be sub-graph connecting nodes matching the keywords  Central node that connects all keyword nodes is Information Node and tree is Connection Tree. Foreign Key Dependencies constitute edges in the graph  Edges (References) are given weights according to its type  Weight of the tree is proportional to total of its edge weights and relevance is inversely proportional to its weight

11  To obtain model with edges directed away from information node and preserve directionality we make use of backward edges  Backward Edge assigned weight proportional to in- degree  Information Node is selected from certain sets of nodes in the graph  Backward Edges ensure that tree is rooted at the Information Node  To avoid problem of “hubs”, edges connecting popular nodes are given a higher weight thus lowering proximity

12  Concept of Node weights introduced to include Prestige Rankings  Nodes having more pointers given higher prestige  In BANKS node prestige is assigned based on in-degree of the node  Node weights and Tree Weights combined to obtain relevance score

13  Each tuple Τ has a corresponding node u τ  Each node u has a node weight N(u) depending upon the prestige of the node  Between each pair of related tuples T 1 & T 2, graph contains edge between u τ 1 to u τ 2 and back edge from u τ 2 to u τ 1  Similarity between two relations R 1 and R 2 depends upon the type of link from R 1 to R 2 and is set to infinity if R 1 does not refer to R 2

14 Edge weights  Depending upon importance of the link we set a value to the edge. Default value is 1  The weight of the directed edge(u, v) depends on factors:  If (u, v) exists but (v, u) does not, assign the weight s(R(u),R(v)) to (u, v)  If (u, v) does not exist and (v, u) does, assign the weight INv (u) s(R(v), R(u)) to (u, v) where INv is the indegree of u contributed by the tuples belonging to relation R(v)  If both (u, v) and (v, u) exist in the graph, assign the weight as the minimum of two values min{s(R(u),R(v)), INv (u) s(R(v), R(u))

15  Query  A set of keywords e.g.{k 1,k 2,…k n }  A set of nodes S i = {S 1,S 2,…S n }  Locate nodes matching search terms t 1,t 2,…t n  Answer Model  A rooted directed tree connecting keyword nodes (at least one node from S i ).  Note: Tree may also contain nodes not in any S i, Steiner Tree  Relevance score of an answer tree  Combination of its nodes and its edge weight presented in decreasing order

16  Calculating Relevance Score involves adjustment of both node weights and edge weights along with a factor to control individual weight variations  Node weights  Scaled to N max and depressed using log  Nscore(v) = N(v)/ N max or log(1+N(v)/N max  Overall Nscore taken to be average of node scores  Edge Weights  Normalized Escore(e) obtained by diving edge weight by minimum edge weight  Escore(e) = log(1+w(e)/w min )  Overall Edge Score = 1/(1 + Σ e Escore(e))  Combination of Overall Edge Score and Node Score  Additive : (1-λ)Escore + λNscore  Multiplicative : Escore * Nscore λ

17  We have to use not just the tree with the highest relevance score but also those with high scores  Answers have to be generated incrementally so that the user are provided with the ‘best’ answers at the beginning  Resultant Graph is assumed to fit in memory since only Row IDs and index to map RowIDs to nodes in the graph need to be stored by us.

18  Incrementally computes search results  Start at leaf nodes each containing a query keyword  Run concurrent single source shortest path algorithm from each such node  Traverses the graph edges backwards  Confluence of backward paths identify answer tree roots  Output a node whenever it is on the intersection of the sets of nodes reached from each keyword  Answer trees may not be generated in relevance order  Insert answers to a small buffer (heap)  Output highest ranked answer from buffer to user when buffer is full

19

20

21  Model (Query : Roy Sudarshan)

22  Due to the graphs being Steiner Trees lot of time is spent doing wasteful exploration of the graph  As keyword nodes increase, the feasibility of the algorithm decreases  Connection Trees are only approximately sorted in their increasing order of weights  Node weights are not considered, hence trees may not be produced in exact decreasing order of relevance

23  BANKS system provides  A rich interface to browse data stored in a relational database  Automatically generates browsable views of database relations and query results  Schema browsing and data browsing  A hyperlink to the referenced tuple

24  Functionalities  Columns can be projected away  Selections can be imposed on columns  Joins can be performed with foreign key columns by joining them with referencing tables  Results can be grouped by on columns which returns only distinct values in column being displayed  Sorting can be done on columns

25

26  Cross Tabs  Group By Template to view Data hierarchially  Folder Views modeled after the folder view supported by Windows Explorer etc.  In the form of bar chart, line chart or pie chart with HTML image maps to embed hyperlinks in the graphics

27  Datasets of varying sizes have been tested  No agreed upon benchmarks for Ranking Algorithms in this domain  System was found to return the most intuitive answers

28  Ideal answers were obtained using different queries  Compute absolute value of rank difference of the ideal answers with rank in the answers for given parameter setting  Sum of rank differences gives the raw error score for that parameter  We map error scores against λ and log- scaling of edge weights

29

30  Setting λ = 0.2 produced best results while λ = 1 produced worst with error scores of around 15  Log scaling of edge weight is important as otherwise back-edges from popular nodes would result in correct answers getting low relevance scores  Additive or Multiplicative combination has no effect on ranking  Node weights were abandoned as log-scaling and no log scaling produced same ranking

31  Effective when using queries matching non- metadata keywords  Brings to light data that might not be readily available on the web to the non technical user  Higher the no. of keywords, the less useful backward-expanding search algorithm becomes  Over reliance on Java can at times cause slow down of application


Download ppt "Harikrishnan Karunakaran Sulabha Balan CSE 6339.  Introduction  Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model "

Similar presentations


Ads by Google