Automatic Suggestion of Query-Rewrite Rules for Enterprise Search Date : 2013/08/13 Source : SIGIR’12 Authors : Zhuowei Bao, Benny Kimelfeld, Yunyao Li.

Presentation transcript:

Automatic Suggestion of Query-Rewrite Rules for Enterprise Search
Date: 2013/08/13
Source: SIGIR'12
Authors: Zhuowei Bao, Benny Kimelfeld, Yunyao Li
Advisor: Dr. Jia-Ling Koh
Speaker: Shun-Chen Cheng

Outline
- Introduction
- Recognizing Natural Rules
- Optimizing Multi-Rules Selection
- Experiments
- Conclusions

Introduction
Enterprise search:
- Indexes data and documents from a variety of sources, such as file systems, intranets, document management systems, and databases.
- Integrates structured and unstructured data in its collections.
- Has to cope with dynamic terminology and jargon that are specific to the enterprise domain.
- domain experts maintaining

Introduction
- Relevant documents are often missing from the top matches.
- Administrators influence search results by crafting query-rewrite rules, which is tedious and time-consuming.
- Goal: ease the burden on search administrators by automatically suggesting rewrite rules.

Two Challenges
Challenge 1: Generating intuitive rules
- Suggested rules should correspond to closely related and syntactically complete concepts.
- Addressed with a machine-learning classification approach.

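Two Challenges
Challenge 2: Cross-query effect
- A rule added to fix one query can also fire on other queries and change their rankings.
- Query1 -> r1 -> spreadsheets issi: pushes d2 below d1.
- Query1: spreadsheets download -> r3 -> symphony download: d2 becomes the top match.
- Proposed: heuristic approaches and optimizations thereof.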

Recognizing Natural Rules (1/3)
Candidate generation
- Set S: all the n-grams (subsequences of n tokens) of the query q, with n bounded by a constant (5 in our implementation).
- Set T: the n-grams taken only from the high-quality fields of the document d.
- Candidates: the Cartesian product S × T.
Example
- q = change management info
- fields = welcome to scip strategy & change internal practice
- Candidates include:
  management → scip
  change → strategy & change internal
  change management → scip strategy
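The candidate-generation step can be sketched in a few lines of Python. This is only an illustration under assumptions: tokens are split on whitespace, the high-quality fields are treated as one string (as the example candidates suggest), and the names `ngrams` and `candidate_rules` are hypothetical rather than taken from the paper.

```python
from itertools import product


def ngrams(tokens, max_n=5):
    """Return all contiguous n-grams with 1 <= n <= max_n, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def candidate_rules(query, field_text, max_n=5):
    """Build candidate rules s -> t as the Cartesian product S x T.

    Assumption: whitespace tokenization; duplicates are dropped
    while keeping first-seen order.
    """
    s = list(dict.fromkeys(ngrams(query.split(), max_n)))
    t = list(dict.fromkeys(ngrams(field_text.split(), max_n)))
    return list(product(s, t))


candidates = candidate_rules(
    "change management info",
    "welcome to scip strategy & change internal practice",
)
print(len(candidates))
print(("change management", "scip strategy") in candidates)
```

With 3 query tokens and 8 field tokens, S has 6 n-grams and T has 30, so the product is already 180 candidates for one query-document pair. That growth is why a classifier is needed to filter out the unnatural ones.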

Recognizing Natural Rules (2/3)
Features
- The considered rule is s → t, and u refers to either s or t.

Recognizing Natural Rules (3/3)
Classification models
- SVM
- Decision tree with linear-combination splits (rDTLC)

Optimizing Multi-Rules Selection(1/7) W(q,d)

Optimizing Multi-Rules Selection (2/7)
q = spreadsheets download
- score(d|q): the maximal weight of a path from q to d.
  Ex: score(d2|q) = 3, score(d1|q) = 4
- topk[q|G]: the series of k documents with the highest w(q, d), ordered by descending w(q, d).
  Ex: top1[q|G] is the series (d1); top2[q|G] (as well as top3[q|G]) is the series (d1, d2).
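The scoring and top-k definitions above can be illustrated with a small sketch. Several things here are assumptions rather than facts from the slides: the graph is an acyclic adjacency map, a path's weight is the sum of its edge weights, and the edge weights are invented so that the scores match the slide's example (3 for d2, 4 for d1). The function names are hypothetical.

```python
def max_path_weight(graph, source, target):
    """Maximal total edge weight over paths source -> target in a DAG.

    Returns None if target is unreachable. Assumption: a path's weight
    is the sum of its edge weights; the slides do not spell this out.
    """
    memo = {}

    def visit(node):
        if node == target:
            return 0
        if node in memo:
            return memo[node]
        best = None
        for nxt, weight in graph.get(node, []):
            rest = visit(nxt)
            if rest is not None and (best is None or weight + rest > best):
                best = weight + rest
        memo[node] = best
        return best

    return visit(source)


def top_k(graph, query, documents, k):
    """Up to k reachable documents, ordered by descending score."""
    scored = [(max_path_weight(graph, query, d), d) for d in documents]
    reachable = [(s, d) for s, d in scored if s is not None]
    reachable.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [d for _, d in reachable[:k]]


# Hypothetical weights, chosen only to reproduce the slide's scores.
toy_graph = {
    "spreadsheets download": [("symphony download", 1), ("d1", 4), ("d2", 1)],
    "symphony download": [("d2", 2)],
}
q = "spreadsheets download"
print(max_path_weight(toy_graph, q, "d1"), max_path_weight(toy_graph, q, "d2"))
print(top_k(toy_graph, q, ["d1", "d2"], 1), top_k(toy_graph, q, ["d1", "d2"], 3))
```

Because only two documents are reachable in this toy graph, asking for the top 3 returns the same series as the top 2, as in the slide's example.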

Optimizing Multi-Rules Selection (3/7)
- Quality measure μ: a quality score for each query q, based on the series topk[q|G] and the set δ(q), for a natural number k of choice.
- Measures: MRR and DCGk (without labeled relevance).
- Notation: topk[q|G] = (d1, ..., dj), and each ai is 1 if di ∈ δ(q) and 0 otherwise.
- The top-k quality of G is denoted μk(G, δ).

Optimizing Multi-Rules Selection (4/7)
Example
- Desideratum δ: δ(lotus notes download) = δ( client issi) = {d1}; δ(spreadsheets download) = {d2}
- top1[q1|G] = (d1), top1[q2|G] = (d1), top1[q3|G] = (d1)
- MRR at 1: μ1(G, δ) = (1/1) + (1/1) + (0/2)
- DCG1: μ1(G, δ)
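The MRR arithmetic in the example can be checked with a short sketch. The slide's formulas were not preserved, so the following are assumptions: per-query scores are summed without normalization or query weights (which matches the slide's 1/1 + 1/1 + 0), and DCG uses the common binary-gain form with a log2(i + 1) discount, which may differ from the paper's exact definition. The query labels q1 to q3 and all function names are hypothetical.

```python
import math


def reciprocal_rank(top, desired):
    """1/i for the first position i holding a desired document, else 0."""
    for i, doc in enumerate(top, start=1):
        if doc in desired:
            return 1.0 / i
    return 0.0


def dcg(top, desired):
    """Binary-gain DCG: a_i / log2(i + 1), with a_i = 1 if d_i is desired."""
    return sum(
        (1.0 if doc in desired else 0.0) / math.log2(i + 1)
        for i, doc in enumerate(top, start=1)
    )


def top_k_quality(tops, desideratum, measure):
    """Sum a per-query measure over all queries in the desideratum."""
    return sum(measure(tops[q], desideratum[q]) for q in desideratum)


# The slide's example: every query currently has d1 as its top-1 result.
desideratum = {"q1": {"d1"}, "q2": {"d1"}, "q3": {"d2"}}
tops = {"q1": ["d1"], "q2": ["d1"], "q3": ["d1"]}

print(top_k_quality(tops, desideratum, reciprocal_rank))
print(top_k_quality(tops, desideratum, dcg))
```

Under these assumptions both measures agree at k = 1, since the discount at rank 1 is log2(2) = 1; they only diverge once desired documents appear below the first position.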

Optimizing Multi-Rules Selection(5/7) G-Greedy

Example of G-Greedy (6/7)
Iteration 1: candidates r1, r2, r3, r4
Iteration 2: candidates r1, r3, r4; stop the algorithm
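The pseudocode slide for G-Greedy did not survive in this transcript, so the following is only a generic globally greedy selection loop, not necessarily the authors' exact algorithm. Each iteration evaluates every remaining candidate rule against the overall quality, adds the best one, and stops when no candidate improves the quality. The `quality` callback stands in for μk(G, δ); the toy `fixes` and `breaks` tables are invented purely to imitate the cross-query effect.

```python
def g_greedy(rules, quality):
    """Globally greedy rule selection (generic sketch).

    rules:   iterable of rule identifiers
    quality: function mapping a frozenset of selected rules to a number
    """
    selected = set()
    current = quality(frozenset(selected))
    while True:
        best_rule, best_value = None, current
        for rule in rules:
            if rule in selected:
                continue
            value = quality(frozenset(selected | {rule}))
            if value > best_value:  # strict improvement; earlier rules win ties
                best_rule, best_value = rule, value
        if best_rule is None:
            return selected  # no candidate improves the quality: stop
        selected.add(best_rule)
        current = best_value


# Hypothetical toy data: which queries each rule fixes or breaks.
fixes = {"r1": {"q1"}, "r2": {"q1", "q2"}, "r3": {"q3"}, "r4": {"q2"}}
breaks = {"r1": {"q3"}, "r2": set(), "r3": set(), "r4": {"q1"}}


def toy_quality(selected):
    """Number of queries fixed by some selected rule and broken by none."""
    fixed = set().union(*(fixes[r] for r in selected))
    broken = set().union(*(breaks[r] for r in selected))
    return len(fixed - broken)


chosen = g_greedy(["r1", "r2", "r3", "r4"], toy_quality)
print(sorted(chosen))
```

In this toy run the loop picks r2 first, then r3, and then stops because adding r1 or r4 would break a query that is already fixed. Re-evaluating the global quality for every candidate in every iteration is what makes this style of greedy search expensive.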

Optimizing Multi-Rules Selection(7/7) L-Greedy

Experiments
- Query log: 4 months of intranet search at IBM.
Recognizing Natural Rules
- 1187 rules were randomly selected and manually labeled as either natural or unnatural.
- Weighting: each query is weighted by the number of sessions in which it is posed.
- Measure: accuracy.

Experiments

Optimizing Multi-Rules Selection
- Measures: nDCGk and MRR (top-5).
- Labeled dataset: the administration graph contains 135 queries, 300 r-queries, 423 documents, and a total of 1488 edges.
- Extended dataset: the administration graph contains 1001 queries, … r-queries, 4188 documents, and a total of … edges.

Experiments: Labeled Dataset
Plots: nDCGk (unweighted), nDCGk (weighted), MRR
- L-Greedy and G-Greedy reach the upper bound.
- L-Greedy and G-Greedy score significantly higher than the other alternatives.

Experiments: Running time
- The locally greedy algorithms are over one order of magnitude faster than their globally greedy counterparts.
- The optimized versions are generally over one order of magnitude faster than their unoptimized counterparts.
- The optimized version of our locally greedy algorithm is capable of finding an optimal solution in real time for the typical usage scenarios.

Conclusions
- Heuristic algorithms were proposed to cope with the hardness of the rule-selection problem.
- Experiments on a real enterprise case (IBM intranet search) indicate that the proposed solutions are effective and feasible.
- Future work: extending the techniques to handle significantly more expressive rules.