Published by Bethanie Rodgers. Modified over 9 years ago.
1
Automated Generation of Object Summaries from Relational Databases: A Novel Keyword Searching Paradigm
Georgios Fakas
Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK
g.fakas@mmu.ac.uk
2
Related Work: Keyword Search in Relational DBs
Full-text search (e.g. Oracle 9i Text)
Keyword searching in relational DBs (DISCOVER, BANKS)
Kw Search: Leverling, Peacock
Result: e3-o2-c2, e4-o6-c2
3
Related Work: Web Search Engines: Keyword Search
Kw Search: Peacock
Result: A ranked set of web pages
5
A Novel Keyword Searching Paradigm: Object Summaries (OSs)
Kw Search: Peacock
Result: A ranked set of OSs
6
A Novel Keyword Searching Paradigm: Object Summaries (OSs)
Kw Search: Peacock
Result: A ranked set of OSs
Problems and challenges: how can we automatically (1) generate and (2) rank OSs, liberating users from needing knowledge of (1) the schema and (2) a query language?
7
OS Generation - Methodology
Our solutions are based on the assumption that each database has central relations (denoted R_DS) that represent the Data Subjects (DSs), e.g. for Northwind, R_DS = {R_Employees, R_Customers}. Relations linked around the R_DS relations hold additional information about the DS.
t_DS: a central tuple containing the Kw; tuples around t_DS contain additional information about the Data Subject.
R_DS: the corresponding central relation; similarly, relations around R_DS contain additional information.
8
OS Generation - Methodology
t_DS: a central tuple containing the Kw; tuples around t_DS contain additional information about the Data Subject.
R_DS: the corresponding central relation; similarly, relations around R_DS contain additional information.
11
OS Generation - Methodology
[Figure: the graph G_DS]
12
OS Generation - Methodology: G_DS
Problem: not all relations in G_DS are relevant. How do we decide (1) which relations to select and (2) when to stop traversing?
Solution: investigate relational semantics (schema connectivity, cardinality, relative cardinality, etc.) and quantify the affinity of relations.
13
Affinity of Relations to R_DS in G_DS
Distance: physical (fd) and logical (ld), where ld = fd - |M:N|
14
Affinity of Relations to R_DS in G_DS
Distance: physical (fd) and logical (ld), where ld = fd - |M:N|
E.g. Orders is closer to Employees than Customer and CustomerDemo are.
15
Affinity of Relations to R_DS in G_DS
Distance: physical (fd) and logical (ld), where ld = fd - |M:N|
E.g. Orders is closer to Employees than Customer and CustomerDemo are.
Hubs: spurious shortcuts that lead to rather irrelevant or lateral information; RC(R1, R2)
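The distance measure above can be sketched in Python. This is a minimal illustration rather than the author's implementation: the schema graph is a hypothetical, simplified slice of Northwind, and it assumes that |M:N| counts the M:N junction relations crossed on the path from R_DS.

```python
from collections import deque

# Hypothetical, simplified Northwind-like schema graph (undirected edges).
SCHEMA = {
    "Employees": ["Orders"],
    "Orders": ["Employees", "Customers"],
    "Customers": ["Orders", "CustomerCustomerDemo"],
    "CustomerCustomerDemo": ["Customers", "CustomerDemographics"],
    "CustomerDemographics": ["CustomerCustomerDemo"],
}
# Assumption: relations that exist only to implement an M:N relationship.
JUNCTIONS = {"CustomerCustomerDemo"}


def distances(root):
    """Return {relation: (fd, ld)} using breadth-first search from root."""
    result = {root: (0, 0)}
    junctions_seen = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        fd = result[current][0]
        for neighbour in SCHEMA[current]:
            if neighbour in result:
                continue
            # Count junction relations on the path, including this one.
            crossed = junctions_seen[current] + (neighbour in JUNCTIONS)
            junctions_seen[neighbour] = crossed
            # ld = fd - |M:N|, under the assumption stated above.
            result[neighbour] = (fd + 1, fd + 1 - crossed)
            queue.append(neighbour)
    return result


for relation, (fd, ld) in distances("Employees").items():
    print(relation, "fd =", fd, "ld =", ld)
```

Under these assumptions Orders has the smallest distance from Employees, and crossing the junction relation shortens the logical distance of CustomerDemographics by one.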
16
Affinity of Relations to R_DS in G_DS
Connectivity:
Schema connectivity (Co_i)
Data-graph connectivity: Relative Cardinality (RC_i→j), i.e. the average number of tuples of R_i that are connected with each tuple from R_j. For 1:M, RC_i→j = |R_i|/|R_j|; for M:1, RC_i→j = 1.
Reverse Relative Cardinality (RRC_i→j) is the reverse of RC_i→j, i.e. RRC_i→j = RC_j→i.
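A small Python sketch of the relative-cardinality definitions may help. The function names and relation sizes are illustrative assumptions, and the reverse measure is read here as swapping the roles of the two relations.

```python
# Assumption: "1:M" means each tuple of R_j links to many tuples of R_i,
# and "M:1" means each tuple of R_j links to exactly one tuple of R_i.
FLIPPED = {"1:M": "M:1", "M:1": "1:M"}


def relative_cardinality(size_i, size_j, kind):
    """RC(i -> j): average number of R_i tuples connected to each R_j tuple."""
    if kind == "1:M":
        return size_i / size_j
    if kind == "M:1":
        return 1.0
    raise ValueError("kind must be '1:M' or 'M:1'")


def reverse_relative_cardinality(size_i, size_j, kind):
    """RRC(i -> j), read as RC(j -> i): swap the sizes and flip the kind."""
    if kind not in FLIPPED:
        raise ValueError("kind must be '1:M' or 'M:1'")
    return relative_cardinality(size_j, size_i, FLIPPED[kind])


# Illustrative sizes only: 800 Orders tuples and 10 Employees tuples.
rc = relative_cardinality(800, 10, "1:M")
rrc = reverse_relative_cardinality(800, 10, "1:M")
print(rc, rrc)
```

With these sizes each employee is linked to 80 orders on average, while each order is linked to exactly one employee.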
17
Affinity of Relations to R_DS in G_DS
DAf(R_i) = {(m_1, w_1), (m_2, w_2), ..., (m_n, w_n)}
m_1 = f_1(ld_i), m_2 = f_1(log(10*RC_i)), m_3 = f_1(log(10*RRC_i)), m_4 = f_1(log(10*Co_i))
f_1(α) = (11 - α)/10
For a hub-child: m_1 = f_1(ld_i * h_i) and m_2 = f_1(RC_i)
Formula 1 (Semantic Affinity): The affinity of R_i to R_DS, with respect to a schema and a database conforming to the schema, can be calculated with the following formula:
[formula not preserved in this transcript]
The formula refers to the affinity of R_i's parent to R_DS, which is 1 if R_Parent ≡ R_DS. □
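The metric definitions on this slide can be sketched as follows. Two things here are assumptions, not taken from the slide: the logarithm is taken to be base 10, and the (metric, weight) pairs are combined as a weighted sum scaled by the parent's affinity. The transcript does not preserve Formula 1, so `affinity` is only one plausible reading.

```python
import math


def f1(alpha):
    """Normalising function from the slide: f1(alpha) = (11 - alpha) / 10."""
    return (11 - alpha) / 10


def metrics(ld, rc, rrc, co):
    """Return (m1, m2, m3, m4) for a non-hub relation; log base 10 is assumed."""
    return (
        f1(ld),
        f1(math.log10(10 * rc)),
        f1(math.log10(10 * rrc)),
        f1(math.log10(10 * co)),
    )


def affinity(ms, weights, parent_affinity=1.0):
    """Hypothetical combination: weighted sum of metrics times parent affinity."""
    return parent_affinity * sum(m * w for m, w in zip(ms, weights))


# Illustrative values: a relation one step from R_DS with RC = 100.
ms = metrics(ld=1, rc=100, rrc=1, co=1)
print(ms)
print(affinity(ms, weights=(0.25, 0.25, 0.25, 0.25)))
```

With base-10 logarithms, a relation at logical distance 1 whose cardinalities and connectivity are all 1 scores 1.0 on every metric, and larger values lower the score.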
18
Affinity of Relations to R_DS in G_DS
[Figure: G_DS(θ)]
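One way to read G_DS(θ) is as G_DS pruned by an affinity threshold θ. The sketch below assumes exactly that; the tree, the local scores, and the rule that a child's affinity is its local score times its parent's affinity are all illustrative assumptions.

```python
# Hypothetical G_DS as a tree rooted at the central relation.
TREE = {
    "Employees": ["Orders", "EmployeeTerritories"],
    "Orders": ["Customers"],
    "Customers": ["CustomerDemo"],
}
# Hypothetical local scores for each non-root relation.
LOCAL_SCORE = {
    "Orders": 0.9,
    "EmployeeTerritories": 0.6,
    "Customers": 0.8,
    "CustomerDemo": 0.5,
}


def prune(root, theta):
    """Return {relation: affinity} for relations whose affinity is >= theta."""
    kept = {root: 1.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in TREE.get(parent, []):
            child_affinity = LOCAL_SCORE[child] * kept[parent]
            # Stop traversing below any relation that misses the threshold.
            if child_affinity >= theta:
                kept[child] = child_affinity
                stack.append(child)
    return kept


print(sorted(prune("Employees", 0.7)))
```

Because the parent's affinity scales the child's, affinity can only shrink along a path in this sketch, so the traversal can stop at the first relation below θ.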
19
OS Ranking
[Figures: a ranked set of partial OSs; a complete OS]
20
OS Ranking - Problems and Challenges
Existing keyword-search ranking semantics: the smaller the result, the higher its ranking.
In contrast, in the proposed paradigm an OS containing many well-connected tuples should certainly have greater importance than an OS with fewer tuples. For instance, a Customer or Employee OS involved in many Orders, or an Author who has authored many important papers and books.
21
OS Ranking - Importance
Im(OS) = [formula not preserved in this transcript], where:
t_i is a tuple of the OS
Im(t_i) is the importance of t_i (i.e. PageRank)
|OS| is the number of tuples in the OS
Af_R(t_i) is the affinity of the relation R that t_i belongs to
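Since the transcript does not preserve the Im(OS) formula, the following Python sketch shows only one possible combining function over the quantities listed above. The form used here (an affinity-weighted sum of tuple importance, divided by a power of |OS|) and the `damping` parameter are assumptions for illustration, not the author's formula.

```python
def os_importance(tuples, damping=0.5):
    """Hypothetical Im(OS) over a list of (Im(t_i), Af(t_i)) pairs.

    damping = 0 gives a plain sum, which favours large OSs;
    damping = 1 gives the average weighted importance per tuple.
    """
    if not tuples:
        return 0.0
    weighted = sum(importance * aff for importance, aff in tuples)
    return weighted / (len(tuples) ** damping)


# Illustrative OS with two tuples: (importance, affinity).
small_os = [(0.5, 1.0), (0.5, 0.5)]
print(os_importance(small_os, damping=0))
print(os_importance(small_os, damping=1))
```

A damping value below 1 lets an OS with many well-connected tuples outrank a smaller one, which matches the ranking intuition stated on the previous slide.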
22
Experimental Evaluation
Databases: MS Northwind and TPC-H
Measures: Precision, Recall, F-score
Compare G_DSs and OSs produced by 12 G_DS(θ) v G_DS(h)
G_DS(h) was proposed by 10 participants
G_DS: average F-score 86.77; OS: average F-score 83
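Precision, recall and F-score for such a comparison can be computed as in the sketch below, which treats the automatically produced G_DS(θ) and the participant-proposed G_DS(h) as sets of relations. The relation names are made up for illustration and are not the experimental data.

```python
def precision_recall_f(produced, reference):
    """Compare a produced set of relations against a reference set."""
    produced, reference = set(produced), set(reference)
    true_positives = len(produced & reference)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Illustrative sets only.
g_theta = {"Orders", "Customers", "Shippers", "Territories"}
g_human = {"Orders", "Customers", "OrderDetails"}
print(precision_recall_f(g_theta, g_human))
```

The same function applies to OSs if each OS is represented as a set of tuple identifiers.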
23
Conclusions - Future Work
Top-k OS results
Top-k size of an OS
Challenge: the weights of new tuples are not monotonic (since a tuple's PageRank may increase while its affinity decreases).
Alternatives to PageRank as a weighting system are currently being investigated, e.g. ObjectRank.
24
Conclusions - Novel Contributions
The formal definition of the novel searching paradigm, which automatically produces a ranked set of OSs for a Data Subject:
minimum contribution from the user (i.e. only a Kw)
no prior knowledge of the DB schema or query language needed
excellent Precision, Recall and F-score results
The formal definition and quantification of a relation's affinity in the context of G_DS, considering both schema design and data distributions.
A novel ranking paradigm to calculate Im(OS):
the quantification of the importance of tuples and OSs
a combining function that considers the weight (e.g. PageRank) of tuples, their affinity, and the size of the OS
25
Affinity of Relations to R_DS in G_DS
26
Affinity Ranking Correctness (Average)