EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Guoliang Li et al.


The Problem
- Keyword search introduces false positives
- e.g. the query “Conference 2008 Canada Data Integration”

The Problem
- Websites are organized through content
- “Dr Pain, Math 343, Linear Algebra”

The Solution Combine linked pages for search, ordered by ranking

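The Solution: r-Radius Steiner Graph Problem
- r-Radius Graph
- Centric distance of a node: the largest of its shortest-path distances to the other nodes
- Radius of a graph: the minimal centric distance over its nodes
(Figure: example graph with nodes v, u, t, r, s)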
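The two definitions on this slide can be sketched in a few lines of Python. This is a minimal sketch, assuming an undirected, unweighted graph stored as an adjacency dictionary; the edges between the nodes u, t, r, v, s are hypothetical, since the slide's figure did not survive, and the function names are illustrative only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def centric_distance(adj, node):
    """Largest shortest-path distance from node to any other node."""
    dist = bfs_distances(adj, node)
    if len(dist) < len(adj):      # some node is unreachable
        return float("inf")
    return max(dist.values())

def radius(adj):
    """Minimal centric distance over all nodes of the graph."""
    return min(centric_distance(adj, node) for node in adj)

# Hypothetical edges; only the node names come from the slide.
graph = {
    "u": ["t"],
    "t": ["u", "r"],
    "r": ["t", "v", "s"],
    "v": ["r"],
    "s": ["r"],
}

print(centric_distance(graph, "u"))  # farthest nodes from u are v and s
print(radius(graph))                 # t and r are the most central nodes
```

Under these assumed edges the graph has radius 2, so it would qualify as an r-radius graph for r = 2.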

The Solution: r-Radius Steiner Graph Problem
- Content node: contains a keyword
- Steiner node: lies on a path between two content nodes
(Figure: example graph with nodes u, t, r, v, s and the labels “Dr Pain” and “Math 343”)
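A rough illustration of the content/Steiner distinction follows. It is a sketch under stated assumptions, not the paper's procedure: "lies on a path" is narrowed here to "lies on a shortest path", the graph edges are hypothetical, and which nodes carry the “Dr Pain” and “Math 343” text is guessed for the sake of the example.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def classify_nodes(adj, text, keywords):
    """Return (content_nodes, steiner_nodes) for an undirected graph.

    Assumption: a Steiner node is a non-content node lying on a
    shortest path between two content nodes.
    """
    content = {
        n for n in adj
        if any(k.lower() in text.get(n, "").lower() for k in keywords)
    }
    dist = {c: bfs_distances(adj, c) for c in content}
    steiner = set()
    for a, b in combinations(sorted(content), 2):
        if b not in dist[a]:
            continue                      # a and b are not connected
        for n in adj:
            if n in content or n not in dist[a] or n not in dist[b]:
                continue
            if dist[a][n] + dist[b][n] == dist[a][b]:
                steiner.add(n)
    return content, steiner

# Hypothetical edges and node texts.
graph = {
    "u": ["t"],
    "t": ["u", "r"],
    "r": ["t", "v", "s"],
    "v": ["r"],
    "s": ["r"],
}
text = {"u": "Dr Pain", "v": "Math 343"}

content, steiner = classify_nodes(graph, text, ["Pain", "Math 343"])
print(sorted(content), sorted(steiner))
```

In this toy graph, s hangs off r without leading to any content node, so it is neither a content node nor a Steiner node and would be dropped from the answer.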

r-Radius Steiner Graph on search: Example

r-Radius Steiner Graph on search

The graph model for the publication database

Adjacency Matrix

Finding r-Radius Graphs
- Query: “Shanmugasundaram, Guo, XRANK”
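One way to connect the adjacency-matrix slide to this one is sketched below: repeated boolean multiplication of the adjacency matrix (with self-loops added) tells which nodes lie within r hops of each node, and each such neighborhood is a candidate r-radius graph centered on that node. This is a sketch assuming an undirected graph and r ≥ 1; the function names and the four-node path graph are hypothetical.

```python
def within_r_hops(matrix, r):
    """Boolean matrix whose (i, j) entry is True iff j is at most r hops from i."""
    n = len(matrix)
    # Add self-loops so that shorter paths are kept when multiplying.
    step = [[i == j or bool(matrix[i][j]) for j in range(n)] for i in range(n)]
    reach = [row[:] for row in step]          # reachability for r = 1
    for _ in range(r - 1):
        reach = [
            [any(reach[i][k] and step[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return reach

def candidate_graphs(matrix, r):
    """For each node i, the set of nodes within r hops of i."""
    reach = within_r_hops(matrix, r)
    return [frozenset(j for j, ok in enumerate(row) if ok) for row in reach]

# Hypothetical path graph 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

for center, nodes in enumerate(candidate_graphs(adjacency, 1)):
    print(center, sorted(nodes))
```

The triple loop is cubic in the number of nodes; a real implementation would more likely use sparse matrices, but the plain-list version keeps the idea visible.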

Avoiding Overlapping
- Maximal r-Radius Graph: it is not contained in another r-Radius subgraph
- But wait! There is still overlap
- No problem: Graph Clustering, Graph Partitioning
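The maximality filter can be illustrated with node sets. The sketch below assumes each candidate r-radius graph is represented only by its set of node ids (a simplification) and uses hypothetical candidates; it keeps a candidate only if no other candidate strictly contains it.

```python
def maximal_sets(candidates):
    """Keep only node sets that are not strictly contained in another one."""
    unique = {frozenset(c) for c in candidates}
    kept = [s for s in unique if not any(s < other for other in unique)]
    return sorted(kept, key=sorted)   # deterministic order for display

# Hypothetical candidates, e.g. 1-hop neighborhoods in a path graph 0-1-2-3.
candidates = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]

maximal = maximal_sets(candidates)
print([sorted(s) for s in maximal])
```

The two surviving sets still share nodes 1 and 2, which is the remaining overlap the slide hands off to graph clustering and partitioning.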

Graph Clustering

Ranking
- TF-IDF-based IR ranking (tf, idf, ndl: term frequency, inverse document frequency, normalized document length) is OK
- Better yet: structural compactness-based DB ranking (SIM)
- More compact, more relevant
- The ranking score is inversely proportional to path length
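The two kinds of score can be made concrete with a toy example. The formulas below are illustrative assumptions, not the ones used in the paper: the IR score is a generic length-normalized TF-IDF variant, and the structural score simply averages the reciprocal path lengths between pairs of content nodes, so that shorter paths (a more compact answer) score higher.

```python
import math

def tf_idf(tf, df, n_docs, doc_len, avg_len):
    """Generic TF-IDF with document-length normalization (assumed form)."""
    ndl = doc_len / avg_len                    # normalized document length
    return (tf / ndl) * math.log(1 + n_docs / df)

def compactness(path_lengths):
    """Average of 1/d over the path lengths between content-node pairs."""
    return sum(1.0 / d for d in path_lengths) / len(path_lengths)

# Two hypothetical answers: one compact, one spread out.
compact_answer = compactness([1, 1])
loose_answer = compactness([2, 4])
print(compact_answer, loose_answer)

# A keyword that occurs more often in a node scores higher, all else equal.
print(tf_idf(tf=2, df=10, n_docs=1000, doc_len=50, avg_len=50))
```

How the two scores are weighted against each other is a design decision the slides do not spell out, so the sketch leaves them separate.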

Indexing
- IR score and Sim score are combined
- An inverted index (EI-Index) is created
- The inverted index stores keyword pairs and scores
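An inverted index keyed by keyword pairs might look roughly like the sketch below. The layout is an assumption made for illustration, not the actual EI-Index structure: each unordered keyword pair maps to a posting list of (graph id, precomputed combined score), and a query is answered by summing the scores of all pairs formed from its keywords. The graph ids and scores are made up.

```python
from collections import defaultdict
from itertools import combinations

def build_pair_index(graph_scores):
    """graph_scores: {graph_id: {(kw_a, kw_b): combined_score}}."""
    index = defaultdict(list)
    for graph_id, pairs in graph_scores.items():
        for (kw_a, kw_b), score in pairs.items():
            key = tuple(sorted((kw_a, kw_b)))   # unordered pair
            index[key].append((graph_id, score))
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return index

def search(index, keywords, top_k=3):
    """Rank graphs by the summed scores of all keyword pairs in the query."""
    totals = defaultdict(float)
    for pair in combinations(sorted(set(keywords)), 2):
        for graph_id, score in index.get(pair, []):
            totals[graph_id] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Hypothetical precomputed scores for two Steiner graphs.
scores = {
    "g1": {("guo", "xrank"): 0.9, ("guo", "shanmugasundaram"): 0.8},
    "g2": {("xrank", "guo"): 0.4},
}
index = build_pair_index(scores)
results = search(index, ["shanmugasundaram", "guo", "xrank"])
print(results)
```

Precomputing pair scores moves the expensive graph work to indexing time, so a query only needs lookups and additions.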

Experiments

Results

Results

Results

Results

Strengths of the Paper
- Very well written paper
- Deep research on the topic
- Mathematically grounded, with proofs
- Compared against current methods as baselines
- Good results

Weaknesses and Future Work
- It might be too complex
- Could work on ways to find Steiner graphs faster
- It doesn’t consider cases of farming sites or bogus sites

Questions?