EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data
Guoliang Li et al.
The Problem: Keyword search introduces false positives, e.g. for the query “Conference 2008 Canada Data Integration”.
The Problem: Websites organize their content across separate pages, e.g. “Dr Pain”, “Math 343”, “Linear Algebra”.
The Solution: Combine linked pages into a single search unit and order the results by ranking.
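The Solution: r-Radius Steiner Graph Problem. r-Radius graph definitions: Centric distance of a node: the largest shortest-path distance from that node to any other node in the graph. Radius of a graph: the minimal centric distance over all of its nodes. [Figure: example graph with nodes v, u, t, r, s]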
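The two definitions on this slide can be sketched in a few lines of Python. This is only an illustration, assuming an unweighted, undirected graph stored as a dictionary of neighbor sets; the toy edges between the nodes u, t, r, s, v are hypothetical and are not taken from the slide's figure.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def centric_distance(adj, node):
    """Largest shortest-path distance from node to any node in the graph."""
    dist = shortest_path_lengths(adj, node)
    if len(dist) < len(adj):
        return float("inf")  # some node is unreachable
    return max(dist.values())

def radius(adj):
    """Minimal centric distance over all nodes."""
    return min(centric_distance(adj, v) for v in adj)

# Hypothetical toy graph: u - t - r - s, with v attached to t.
graph = {
    "u": {"t"},
    "t": {"u", "r", "v"},
    "r": {"t", "s"},
    "s": {"r"},
    "v": {"t"},
}

print(centric_distance(graph, "t"))  # centric distance of node t
print(radius(graph))                 # radius of the whole graph
```

In this toy graph the nodes t and r both reach every other node in at most two hops, so the graph would count as a 2-radius graph under the definitions above.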
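The Solution: r-Radius Steiner Graph Problem. Content node: a node that contains a query keyword. Steiner node: a node that lies on a path between two content nodes. [Figure: example graph with nodes u, t, r, v, s; content nodes labeled “Dr Pain” and “Math 343”]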
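A rough sketch of how content nodes and Steiner nodes might be identified is given below. It makes two simplifying assumptions that are not on the slide: keywords are matched as case-insensitive substrings, and "path" is taken to mean a shortest path. The graph edges and the page texts are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs(adj, source):
    """Hop counts from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def content_nodes(node_text, keywords):
    """Nodes whose text contains at least one query keyword."""
    wanted = [k.lower() for k in keywords]
    return {n for n, text in node_text.items()
            if any(k in text.lower() for k in wanted)}

def steiner_nodes(adj, content):
    """Nodes on a shortest path between some pair of content nodes
    (shortest paths are an assumption made for this sketch)."""
    dist = {c: bfs(adj, c) for c in content}
    found = set()
    for u, v in combinations(sorted(content), 2):
        if v not in dist[u]:
            continue  # the two content nodes are not connected
        for s in adj:
            if s in dist[u] and s in dist[v] and dist[u][s] + dist[v][s] == dist[u][v]:
                found.add(s)
    return found

# Hypothetical graph and page texts.
graph = {"u": {"t"}, "t": {"u", "r", "v"}, "r": {"t", "s"}, "s": {"r"}, "v": {"t"}}
pages = {"u": "Dr Pain homepage", "t": "", "r": "", "s": "Math 343 Linear Algebra", "v": ""}

content = content_nodes(pages, ["Dr Pain", "Math 343"])
steiner = steiner_nodes(graph, content)
print(sorted(content), sorted(steiner))
```

Here v is dropped because it does not lie between the two content nodes, which is the pruning effect the Steiner graph is meant to achieve.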
r-Radius Steiner Graph on search (example)
r-Radius Steiner Graph on search
The graph model for the publication database
Adjacency Matrix
Finding r-Radius Graphs. Query: “Shanmugasundaram, Guo, XRANK”
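One way an adjacency matrix can help find candidate r-radius graphs is by raising it to the r-th power, so that a non-zero entry marks a pair of nodes within r hops of each other. The sketch below assumes that approach, uses plain Python lists with boolean arithmetic, and runs on a hypothetical four-node path graph rather than the publication database from the slides.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True if some k links i to j."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def within_r_hops(adj_matrix, r):
    """reach[i][j] is True when node j is at most r hops away from node i."""
    n = len(adj_matrix)
    # Ones on the diagonal let paths shorter than r count as well.
    base = [[bool(adj_matrix[i][j]) or i == j for j in range(n)] for i in range(n)]
    reach = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(r):
        reach = bool_matmul(reach, base)
    return reach

def neighborhood(adj_matrix, r, i):
    """Indices of all nodes within r hops of node i (including i itself)."""
    return [j for j, ok in enumerate(within_r_hops(adj_matrix, r)[i]) if ok]

# Hypothetical path graph 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(neighborhood(adjacency, 1, 1))
print(neighborhood(adjacency, 2, 0))
```

Each neighborhood induces a subgraph around its center node, which can then be checked against the radius definition to decide whether it is an r-radius graph.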
Avoiding Overlap: A maximal r-radius graph is one that is not contained in another r-radius subgraph. Maximal graphs can still overlap; this is handled with graph clustering and graph partitioning.
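The maximality filter can be sketched as a subset test. The snippet assumes each candidate r-radius graph is represented only by its set of node ids (a simplification), and the candidate sets are hypothetical.

```python
def maximal_graphs(node_sets):
    """Keep only the node sets that are not strictly contained in another one."""
    unique = {frozenset(s) for s in node_sets}  # drop exact duplicates
    return [set(s) for s in unique if not any(s < other for other in unique)]

# Hypothetical candidates: {2, 3} is contained in {1, 2, 3} and is dropped.
candidates = [{1, 2, 3}, {2, 3}, {3, 4}, {1, 2, 3}]
maximal = sorted(sorted(g) for g in maximal_graphs(candidates))
print(maximal)
```

The two surviving graphs still share node 3, which illustrates why maximality alone does not remove overlap and why clustering or partitioning is applied afterwards.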
Graph Clustering
Ranking: TF-IDF-based IR ranking (term frequency tf, inverse document frequency idf, normalized document length ndl) is adequate. Better yet: structural compactness-based DB ranking (SIM). A more compact graph is more relevant: the longer the path between keyword nodes, the lower the ranking score.
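The slide does not give the scoring formulas, so the functions below are illustrative stand-ins rather than the paper's definitions: `ir_score` is a textbook-style tf·idf divided by a length normalization, and `sim_score` simply decreases with path length. All input numbers are hypothetical.

```python
import math

def ir_score(tf, df, n_docs, doc_len, avg_doc_len):
    """Illustrative TF-IDF score with document-length normalization (ndl)."""
    idf = math.log(n_docs / df)
    ndl = doc_len / avg_doc_len
    return tf * idf / ndl

def sim_score(path_length):
    """Illustrative compactness score: shorter paths score higher."""
    return 1.0 / (path_length + 1)

# Hypothetical candidates: path length between the two content nodes.
path_lengths = {"A": 1, "B": 3}
ranked = sorted(path_lengths, key=lambda g: sim_score(path_lengths[g]), reverse=True)
print(ranked)
```

Graph A is ranked first because its keyword nodes are only one hop apart, which matches the "more compact, more relevant" rule on the slide.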
Indexing: The IR score and the Sim score are combined. An inverted index (EI-Index) is created; it stores keyword pairs and their scores.
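A keyword-pair inverted index of this kind could look roughly like the sketch below. It is not the EI-Index layout from the paper: the function names, the choice to combine the two scores by multiplication, and all scores in the example are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def build_index(graph_scores):
    """graph_scores maps graph id -> {(kw_a, kw_b): (ir, sim)}.
    Returns keyword pair -> postings [(graph id, combined score)], best first."""
    index = defaultdict(list)
    for gid, pairs in graph_scores.items():
        for (a, b), (ir, sim) in pairs.items():
            key = tuple(sorted((a, b)))          # order-independent pair key
            index[key].append((gid, ir * sim))   # product is an assumption
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return dict(index)

def search(index, keywords):
    """Sum the stored scores of every keyword pair in the query, per graph."""
    totals = defaultdict(float)
    for pair in combinations(sorted(set(keywords)), 2):
        for gid, score in index.get(pair, []):
            totals[gid] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical (ir, sim) scores for two graphs.
graph_scores = {
    "g1": {("guo", "xrank"): (0.8, 0.5), ("guo", "shanmugasundaram"): (0.6, 0.5)},
    "g2": {("guo", "xrank"): (0.9, 0.25)},
}
index = build_index(graph_scores)
results = search(index, ["xrank", "guo"])
print(results)
```

Because the scores are precomputed per keyword pair, a query only needs index lookups and additions at search time.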
Experiments
Results
Results
Results
Results
Strengths of the Paper: Very well written. Deep research on the topic. Mathematically grounded, with proofs. Compared against current baseline methods. Good results.
Weaknesses and Future Work: The method might be too complex. Future work could look for faster ways to find Steiner graphs. It does not consider link-farm or bogus sites.
Questions?