Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.

Presentation transcript:

Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD : Keyword Search Over Structured Data

Outline: Introduction; Approach: KESOSD (Graph, Tuple Unit, Virtual Document, Ranking, Indexing); Experiments; Conclusion

Introduction: Information comes from a variety of data sources, which can be classified by their structure: unstructured data (plain text), semi-structured data (XML documents, HTML pages), and structured data (tables in a relational database).

Introduction: Current search engines focus mainly on exploring unstructured information, and so lose relevant information that could be found in other data sources, such as semi-structured or structured ones. Keyword search lets users query a database without detailed knowledge of its schema and without learning a formal language like SQL.

Introduction: The paper proposes KESOSD, a novel approach to keyword search over structured data. It aims to improve keyword-based search over structured data and to process queries efficiently, using a score based on classical Information Retrieval.

Query Processing: A database with n tables is turned into a data graph, then into tuple units, then into virtual documents, which are indexed in the KSAI index. Given a keyword query K = k1, k2, …, kn: find the virtual documents that contain K, compute the score of each relevant virtual document, and return the top-k answers with the highest scores.

Graph: Model structured data as an undirected graph, where nodes are tuples and edges are primary-foreign key relationships. Given a database D with n tables, Ri → Rj (labelled k) means that Ri has a foreign key k which refers to the primary key of Rj.
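The graph model above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the table names, column names, and the `build_data_graph` helper are hypothetical, and each tuple is identified by a (table, row position) pair.

```python
def build_data_graph(tables, foreign_keys):
    """Build an undirected graph whose nodes are tuples.

    tables: {table_name: [row_dict, ...]}  (hypothetical toy layout)
    foreign_keys: [(src_table, fk_column, dst_table, pk_column), ...]
    Returns {node: set_of_neighbor_nodes}, node = (table_name, row_position).
    """
    graph = {}
    # Every tuple becomes a node, even if nothing refers to it.
    for name, rows in tables.items():
        for i in range(len(rows)):
            graph[(name, i)] = set()
    # Every primary-foreign key match becomes an undirected edge.
    for src, fk, dst, pk in foreign_keys:
        pk_index = {row[pk]: (dst, j) for j, row in enumerate(tables[dst])}
        for i, row in enumerate(tables[src]):
            target = pk_index.get(row.get(fk))
            if target is not None:
                graph[(src, i)].add(target)
                graph[target].add((src, i))
    return graph


# Hypothetical toy database: two movies, one genre, one linking row.
tables = {
    "movie": [{"id": 1, "title": "Alice in Wonderland"}, {"id": 2, "title": "Heat"}],
    "genre": [{"id": 10, "name": "Fantasy"}],
    "movie_genre": [{"movie_id": 1, "genre_id": 10}],
}
foreign_keys = [
    ("movie_genre", "movie_id", "movie", "id"),
    ("movie_genre", "genre_id", "genre", "id"),
]
graph = build_data_graph(tables, foreign_keys)
```

In this toy data the `movie_genre` row is connected to one movie and one genre, while the second movie has no neighbors.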

Graph: A relational table Ri is called a link table.

Tuple Units: Choose a root node of the data graph. The root together with its set of neighbors, that is, the adjacent nodes connected to it by a primary or foreign key, is called a Tuple Unit.
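As a rough sketch of this definition, assuming the data graph is stored as an adjacency mapping, a tuple unit is just the root plus its direct neighbors. The node labels and the `tuple_unit` helper below are hypothetical.

```python
def tuple_unit(graph, root):
    """Return the root node together with its directly adjacent nodes."""
    return {root} | set(graph[root])


# Hypothetical adjacency mapping: m1 is linked to g1 and a1, m2 only to g1.
graph = {
    "m1": {"g1", "a1"},
    "m2": {"g1"},
    "g1": {"m1", "m2"},
    "a1": {"m1"},
}
unit = tuple_unit(graph, "m1")
```

Choosing a different root gives a different tuple unit, so the same tuple can appear in several units.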

Virtual Document: Create a new document for each neighbor of the root node; all these documents together compose a single Virtual Document.
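One possible reading of this step is sketched below. It assumes each node carries a text string and, as a labeled assumption not stated on the slide, that the root's own text is placed first. The `virtual_document` and `terms` helpers and the sample texts are hypothetical.

```python
def virtual_document(graph, texts, root):
    """Collect one sub-document per neighbor of the root.

    Assumption: the root's own text is included as the first part.
    Neighbors are visited in sorted order so the result is deterministic.
    """
    parts = [texts[root]]
    for neighbor in sorted(graph[root]):
        parts.append(texts[neighbor])
    return parts


def terms(parts):
    """Flatten the parts of a virtual document into lowercase terms."""
    return [word.lower() for part in parts for word in part.split()]


# Hypothetical graph and node texts.
graph = {"m1": {"g1", "a1"}, "g1": {"m1"}, "a1": {"m1"}}
texts = {"m1": "Alice in Wonderland", "g1": "Fantasy", "a1": "Johnny Depp"}
vd = virtual_document(graph, texts, "m1")
```

The term list returned by `terms(vd)` is what a TF·IDF style ranking would count.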

Ranking: The simplest way to score and rank the virtual documents is to use the TF·IDF metric. Let D be the total set of virtual documents, containing N different virtual documents, and let K be the set of query keywords. Consider a virtual document d ∈ D and a keyword ki contained in d.

Ranking: Normalized term frequency of ki, where tf(ki, d) is the number of occurrences of the term ki in a document d. Ex: k1 (Children’s): ntf(k1) = 1 + ln(1 + ln(tf(k1, d))) = …; k2 (Fantasy): ntf(k2) = 1 + ln(1 + ln(tf(k2, d))) = 1.

Inverse document frequency of ki, where df(ki) is the number of virtual documents in D in which the term ki occurs. Ex: k1 (Children’s): idf(k1) = ln(5 / (2 + 1)) ≈ 0.51; k2 (Fantasy): idf(k2) = ln(5 / (1 + 1)) ≈ 0.92.

Ranking: Document normalized length, where |d| is the number of terms in a document d and s is a constant taken from the IR literature, usually set to 0.2. Ex: d1 = vd(Tu1): ndl(d1) = (1 − 0.2) + 0.2 × (14 / (50 / 5)) = 1.08; d2 = vd(Tm2): ndl(d2) = (1 − 0.2) + 0.2 × (12 / (50 / 5)) = 1.04.

Ranking: TF·IDF. Ex: k1 (Children’s): TFIDF(k1, d1) = (ntf(k1) / 1.08) × idf(k1) = …; k2 (Fantasy): TFIDF(k2, d3) = (1 / 1.04) × idf(k2) ≈ 0.88.
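The worked numbers in the ranking slides can be checked with a short sketch. It rests on assumptions that should be read as such: the term frequency is normalized with the double-logarithm form 1 + ln(1 + ln(tf)), which yields the worked value ntf(k2) = 1 for a single occurrence; the document length is normalized as (1 − s) + s × |d| / avgdl, which reproduces 1.08 and 1.04 with an average length of 50 / 5 = 10; and the function names are hypothetical.

```python
import math


def ntf(tf):
    """Normalized term frequency (assumed double-log form), for tf >= 1."""
    return 1 + math.log(1 + math.log(tf))


def idf(n_docs, df):
    """Inverse document frequency, ln(N / (df + 1))."""
    return math.log(n_docs / (df + 1))


def ndl(doc_len, avg_len, s=0.2):
    """Normalized document length (assumed pivoted form)."""
    return (1 - s) + s * doc_len / avg_len


def tfidf(tf, df, doc_len, n_docs, avg_len, s=0.2):
    """TF·IDF weight of one keyword in one virtual document."""
    return ntf(tf) / ndl(doc_len, avg_len, s) * idf(n_docs, df)


# Values taken from the slide example: N = 5 documents, 50 terms in total.
avg_len = 50 / 5
weight_k2 = tfidf(tf=1, df=1, doc_len=12, n_docs=5, avg_len=avg_len)
```

With these inputs the weight for Fantasy comes out at roughly 0.88, which matches (1 / 1.04) × ln(5 / 2).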

Ranking: Score. Ex: k1 (Children’s): Score(k1, d1) = …; k2 (Fantasy): Score(k2, d3) = …; K = Children’s Fantasy: Score(K, d1) = …
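The transcript does not preserve the score formula, so the sketch below only illustrates the last step of query processing under a clearly labeled assumption: the score of a query is taken to be the sum of the per-keyword TF·IDF weights, and only virtual documents containing every keyword are considered. The weights and the `top_k` helper are hypothetical.

```python
import heapq


def top_k(weights, query, k):
    """Return the ids of the k best documents for a keyword query.

    weights: {doc_id: {keyword: weight}}, precomputed per-keyword weights.
    Assumption: the score of a document is the sum of its keyword weights.
    """
    scored = []
    for doc_id, doc_weights in weights.items():
        # Keep only documents that contain every query keyword.
        if all(keyword in doc_weights for keyword in query):
            score = sum(doc_weights[keyword] for keyword in query)
            scored.append((score, doc_id))
    return [doc_id for score, doc_id in heapq.nlargest(k, scored)]


# Hypothetical precomputed weights for three virtual documents.
weights = {
    "d1": {"children's": 0.7, "fantasy": 0.9},
    "d2": {"children's": 0.5},
    "d3": {"children's": 0.2, "fantasy": 0.88},
}
best = top_k(weights, ["children's", "fantasy"], k=2)
```

Here `d2` is skipped because it lacks one of the keywords, and `d1` ranks ahead of `d3`.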

Indexing: The paper proposes a keyword-structure-aware index, the KSAI index. It is similar to the traditional inverted index: the entries of KSAI are also keywords.
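For orientation, a traditional inverted index over virtual documents might look like the sketch below. It is not the KSAI index itself: the slides do not describe how KSAI stores structural information, so that part is left out, and the `build_inverted_index` helper and sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each keyword to the documents containing it and its frequency.

    docs: {doc_id: text}. Returns {keyword: {doc_id: term_frequency}}.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


# Hypothetical virtual documents, already flattened to text.
docs = {
    "d1": "Children's Fantasy movie",
    "d2": "Children's drama",
    "d3": "Fantasy Fantasy epic",
}
index = build_inverted_index(docs)
```

A lookup such as `index["fantasy"]` returns the candidate documents and term frequencies that a TF·IDF computation would need.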

Database: The Internet Movie Database (IMDB) was used for the evaluation. The data graph generated from this dataset has … vertices and … arcs. Elapsed time: building the KSAI index took 741 seconds and needed 355 MB (it includes 4170 keywords, without stop words); loading the index into memory took 108 seconds.

Search efficiency: Evaluated with one hundred queries. KESOSD does not need to use SQL and does not need to discover the relationships.

Search accuracy: KESOSD includes structural information in the index construction and includes the TF·IDF metric.

Search efficiency and accuracy

Conclusion: KESOSD models structured data as a graph and proposes a data index called KSAI to capture the structural relationships. KESOSD achieves high search efficiency and quality for keyword search. Future work includes further testing of KESOSD with other datasets and working on strategies to reduce the size of the index.