Reasoning and Identifying Relevant Matches for XML Keyword Search
Ziyang Liu, Yi Chen
Arizona State University

VLDB 2008, Auckland, New Zealand

Motivation
- Identifying relevant matches is a critical step of processing XML search.
- Query: “Gasol, position”
- [Figure: XML tree with relevant and irrelevant matches marked]

How to Evaluate Various Strategies?
Existing approaches for identifying relevant matches:
- XKSearch (SLCA) [Xu and Papakonstantinou 2005]
- XRank [Guo et al. 2003]
- XSEarch [Cohen et al. 2003]
  - Star-semantics
  - All-semantics
- Schema-free XQuery (MLCA) [Li et al. 2004]
- CVLCA [Li et al. 2007]

How to Evaluate Various Strategies?
The traditional approach
- Obtain the ground truth of query results through user studies on a large number of documents and queries.
- Measure the precision and recall of a strategy with respect to the ground truth.
- Costly.
An axiomatic approach
- Formalize broad intuitions as a collection of simple axioms and evaluate strategies based on the axioms.
- It has been successful in many areas, e.g., mathematical economics, clustering, location theory, and collaborative filtering.
- Cost-effective.
Problem: Is it possible to evaluate and reason about XML keyword search strategies in a formal axiomatic framework?

Roadmap
- Motivation and Problem Definition
- Challenges and Contributions
- Four properties that an XML search engine should satisfy
  - Query Monotonicity/Consistency
  - Data Monotonicity/Consistency
- MaxMatch: the first system that satisfies all four properties
- Experimental Evaluation
- Conclusions

Challenge
- It is easy for an individual to assess the relevance of matches.
- But it is extremely difficult to formalize the relevance assessment independently of any query, data, algorithm, and user.
- Query: “Gasol, position”
- [Figure: XML tree with relevant and irrelevant matches marked]

Example: Similar Queries
- Interestingly, we discovered that some abnormal behaviors can be clearly observed when examining the results that the same search engine produces for two similar queries, or for one query on two similar documents.
- Q1: “Gasol, position”
- Q2: “Grizzlies, Gasol, position”
- Figure annotation: these two “position” nodes should still be irrelevant.

Example: Similar Data
- Q: “Grizzlies, Gasol, Brown, position”
- [Figure: XML tree; labels include position “forward”]
- An empty result after data insertion is abnormal.
- How to capture the logical connection between query results?

Contributions of This Work
- The first work that formally reasons about keyword search in an axiomatic framework.
- We identified four desirable properties that an XML search engine should satisfy.
  - Data/Query Monotonicity capture the desirable changes to the number of query results.
  - Data/Query Consistency capture the desirable changes to the content of a query result.
- We reasoned about existing XML keyword search strategies.
- We proposed MaxMatch, the only XML keyword search strategy that possesses all four properties.
- Experiments verified our intuition and demonstrated the effectiveness and efficiency of MaxMatch.

Properties with Respect to Similar Queries
Query Monotonicity
- When we add a keyword to the query, the query becomes more restrictive; therefore the number of query results should not increase.
Query Consistency
- When we add a new keyword to the query, each delta subtree that newly becomes (part of) a query result should contain the new keyword.

Example: Query Monotonicity/Consistency
- Q1: “forward, name”
- Q2: “forward, USA, name” (new keyword: “USA”)
- Monotonicity: the number of query results decreases from 2 to 1.
- Consistency: in each result, the delta subtree (if one exists) contains “USA”.

Example Revisited: Violation of Query Consistency
- Q1: “Gasol, position”
- Q2: “Grizzlies, Gasol, position”
- An XML keyword search engine that considers these nodes as relevant for the new query violates query consistency.

Properties with Respect to Similar Data
Data Monotonicity
- When we add a node to the data, the data content becomes richer, and the number of query results should not decrease.
Data Consistency
- After we add a node to the data, each delta subtree that becomes (part of) a query result should contain the newly inserted node.
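
Stated symbolically, the two monotonicity properties might be written as below. The notation is an assumption made for this sketch and does not come from the slides: R(Q, D) denotes the set of results of query Q on document D, k is a keyword not already in Q, and n is a node inserted into D.

```latex
\begin{align*}
\text{Query monotonicity:} \quad & |R(Q \cup \{k\},\, D)| \le |R(Q,\, D)| \\
\text{Data monotonicity:}  \quad & |R(Q,\, D \cup \{n\})| \ge |R(Q,\, D)|
\end{align*}
```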

Example: Data Monotonicity/Consistency
- Q: “forward, name”
- [Figure: a node position “forward” in the data is marked as a new match]
- Monotonicity: the number of query results increases from 1 to 2.
- Consistency: in each result, the delta subtree (if one exists) contains the new data node.

Example Revisited: Violation of Data Monotonicity
- Q: “Grizzlies, Gasol, Brown, position”
- [Figure: XML tree; labels include position “forward”]
- An XML keyword search engine that outputs an empty result on the updated data violates data monotonicity.

The Proposed Axiomatic Framework
Four desirable properties:
- Query Monotonicity
- Query Consistency
- Data Monotonicity
- Data Consistency
These properties are:
- Non-trivial: no prior XML keyword search system satisfies all of them.
- Non-redundant: an algorithm may violate any one of them while satisfying the others.
- Satisfiable: we propose a novel technique, MaxMatch, that satisfies all four properties.
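
The four properties above can be sketched as simple checks over the results an engine returns before and after a change. This is an illustration under stated assumptions, not the paper's formalization: each result is modeled as a set of Dewey labels (tuples of integers), and a "delta subtree" is read here as a maximal subtree of a new result that shares no node with any old result. The function names and the example labels are hypothetical.

```python
# Hypothetical sketch: a query result is a set of Dewey labels (tuples of ints).
# A node is an ancestor-or-self of another iff its label is a prefix of the other's.

def subtree(result, root):
    """Nodes of `result` at or below `root`."""
    return {n for n in result if n[:len(root)] == root}

def delta_subtrees(old_results, new_result):
    """Assumed reading of 'delta subtree': maximal subtrees of `new_result`
    that contain no node of any old result."""
    old_nodes = set().union(*old_results)

    def is_new(root):
        return not (subtree(new_result, root) & old_nodes)

    deltas = []
    for n in new_result:
        parent = n[:-1]
        # Keep n only if its subtree is entirely new and its parent's is not.
        if is_new(n) and not (parent in new_result and is_new(parent)):
            deltas.append(subtree(new_result, n))
    return deltas

def query_monotonic(old_results, new_results):
    """Adding a keyword should not increase the number of results."""
    return len(new_results) <= len(old_results)

def data_monotonic(old_results, new_results):
    """Inserting a node should not decrease the number of results."""
    return len(new_results) >= len(old_results)

def consistent(old_results, new_results, is_witness):
    """Every delta subtree must contain a witness: a match of the new keyword
    (query consistency) or the inserted node (data consistency)."""
    return all(any(is_witness(n) for n in delta)
               for result in new_results
               for delta in delta_subtrees(old_results, result))

# Made-up labels: (0,) team, (0,0) team name "Grizzlies", (0,1) players,
# (0,1,0) player Gasol with name (0,1,0,0) and position (0,1,0,1),
# (0,1,1) another player with position (0,1,1,1).
q1 = [{(0, 1, 0), (0, 1, 0, 0), (0, 1, 0, 1)}]          # "Gasol, position"
good_q2 = [q1[0] | {(0,), (0, 0), (0, 1)}]               # adds "Grizzlies"
bad_q2 = [good_q2[0] | {(0, 1, 1), (0, 1, 1, 1)}]        # also pulls in another position

def matches_grizzlies(node):
    return node == (0, 0)

print(query_monotonic(q1, good_q2))                # True
print(consistent(q1, good_q2, matches_grizzlies))  # True
print(consistent(q1, bad_q2, matches_grizzlies))   # False
```

For data consistency, the same `consistent` check applies with `is_witness` testing whether a node is the newly inserted one.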

MaxMatch
- MaxMatch’s name comes from “Maximal Match”.
- MaxMatch preserves each subtree whose set of descendant keyword matches is “maximal” among its siblings.
  - Intuitively, the subtrees that are removed are strictly less relevant to the query, since they contain fewer keywords.

MaxMatch
- Q: Grizzlies, Gasol, Brown, position
- Figure annotation: not as informative as its siblings, so discarded.
- MaxMatch satisfies all four properties.
- Proof details and algorithms can be found in the paper.
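
The sibling-pruning idea described above can be sketched as follows. This is a minimal illustration, not the algorithm from the paper: the `Node` class, the matching rule (a keyword matches a node if it equals the node's label or text, case-insensitively), the choice to drop subtrees with no match at all, and the sample data are all assumptions made for the example. It also recomputes keyword sets naively rather than efficiently.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    text: str = ""
    children: list = field(default_factory=list)

def matched_keywords(node, keywords):
    """Set of query keywords matched anywhere in the subtree rooted at `node`."""
    found = {k for k in keywords if k in (node.label.lower(), node.text.lower())}
    for child in node.children:
        found |= matched_keywords(child, keywords)
    return found

def prune(node, keywords):
    """Return a copy of the subtree in which a child is kept only if its keyword
    set is not a strict subset of a sibling's keyword set."""
    keywords = {k.lower() for k in keywords}
    sets = [matched_keywords(c, keywords) for c in node.children]
    kept = []
    for i, child in enumerate(node.children):
        if not sets[i]:
            continue  # assumption of this sketch: drop subtrees with no match
        dominated = any(sets[i] < sets[j] for j in range(len(sets)) if j != i)
        if not dominated:
            kept.append(prune(child, keywords))
    return Node(node.label, node.text, kept)

def player(name, position):
    return Node("player", children=[Node("name", name), Node("position", position)])

# Made-up data loosely following the slide's query.
team = Node("team", children=[
    Node("name", "Grizzlies"),
    Node("players", children=[
        player("Gasol", "forward"),
        player("Miller", "guard"),   # matches only "position": dominated
        player("Brown", "forward"),
    ]),
])

result = prune(team, {"Grizzlies", "Gasol", "Brown", "position"})
players = next(c for c in result.children if c.label == "players")
print([p.children[0].text for p in players.children])  # ['Gasol', 'Brown']
```

The middle player matches only “position”, a strict subset of what each sibling matches, so its subtree is discarded while the other two are kept.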

Search Quality
- Data sets: Baseball, Mondial
- Query set: 36 queries in total
- Ground truth: obtained by user study.
- User perception of search results on query pairs and document pairs confirms our intuition of the proposed properties.
- [Figure: F-measure of MaxMatch vs. existing approaches]
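
For reference, precision, recall, and F-measure against a ground truth can be computed as in the sketch below. The slides do not say which weighting was used, so the balanced default `beta=1.0` is an assumption, as are the function name and the sample node identifiers.

```python
def f_measure(returned, relevant, beta=1.0):
    """Precision, recall and F-measure of `returned` items against the
    ground-truth `relevant` items (both iterables of hashable ids)."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Made-up example: four nodes returned, three relevant, two in common.
p, r, f = f_measure({1, 2, 3, 4}, {2, 3, 5})
print(p, r, f)
```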

Processing Time
- [Figure: processing time on Mondial data (515 KB) and Baseball data (1014 KB)]

Conclusions
- This is the first work on reasoning about and evaluating XML keyword search strategies using a formal axiomatic framework.
- Four intuitive and elegant properties are proposed: query monotonicity/consistency and data monotonicity/consistency.
- We designed and developed MaxMatch, the only XML keyword search strategy that satisfies all properties.
- Experiments verified the intuition of the properties and the effectiveness and efficiency of MaxMatch.
- MaxMatch is incorporated as part of XSeek [Liu & Chen SIGMOD 07].

Thank You! Questions? Welcome to try MaxMatch at: xseek.asu.edu