Date : 2012/10/25 Author : Yosi Mass, Yehoshua Sagiv Source : WSDM’12 Speaker : Er-Gang Liu Advisor : Dr. Jia-ling Koh



Outline: Introduction, Language Model, Ranking Answers, IR Relevance (Language Model), Language Models for Data Graphs, Normalization, Structural weights, Graph Weights, Experiment, Conclusion


Introduction
Example: finding the language spoken in Poland.
Keyword search over a data graph extracts the meaningful parts of the data w.r.t. the keywords: it connects Polish (Type: language) with Poland (Type: country) and returns Polish as the spoken language of Poland.
A traditional search engine, in contrast, returns documents about Poland that might contain the needed information.

Introduction - Data Graph
Structural nodes: tuples. Keyword nodes: keywords.
Edges: foreign-key references.
Edges and nodes may have weights; weak relationships are penalized by large weights.

Introduction - Goal
Efficiently generating candidate answers: the algorithm of K. Golenberg, B. Kimelfeld, and Y. Sagiv (Keyword proximity search in complex data graphs. In SIGMOD) creates a node for each keyword of the given query and connects it by edges to all the nodes of the data graph that contain that keyword.
Ranking the candidate answers (the focus of this paper): finding methods for effectively ranking the generated answers according to their relevance to the user.
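The keyword-node step of candidate generation can be sketched as below. This is a minimal illustration, not the Golenberg et al. implementation: the adjacency-dict representation, the function name `add_keyword_nodes`, the `("kw", keyword)` node encoding, and the toy data are all hypothetical. The sketch also assumes edges point from the containing node to the keyword node, so that keyword nodes end up as leaves of answer trees.

```python
from collections import defaultdict


def add_keyword_nodes(structural_edges, node_words, query):
    """Extend a data graph with one keyword node per query keyword.

    structural_edges: dict mapping each structural node (a tuple id) to the set of
        nodes it references through foreign keys (hypothetical representation).
    node_words: dict mapping each structural node to the set of words it contains.
    query: iterable of query keywords.
    Returns a new adjacency dict; keyword nodes are encoded as ("kw", keyword).
    """
    graph = defaultdict(set)
    for u, targets in structural_edges.items():
        graph[u] |= set(targets)  # copy the foreign-key edges
    for keyword in query:
        kw_node = ("kw", keyword)
        graph.setdefault(kw_node, set())  # keep the keyword node even if nothing matches
        for node, words in node_words.items():
            if keyword in words:
                # Assumed direction: containing node -> keyword node,
                # so keyword nodes become leaves of answer trees.
                graph[node].add(kw_node)
    return dict(graph)


# Hypothetical toy graph: an order tuple references a customer and a product.
structural_edges = {"order1": {"cust1", "prod1"}, "cust1": set(), "prod1": set()}
node_words = {"order1": set(), "cust1": {"Summers"}, "prod1": {"coffee"}}
g = add_keyword_nodes(structural_edges, node_words, ["Summers", "coffee"])
```

Only the tuples whose text contains a keyword get an edge to that keyword's node; the foreign-key edges are copied unchanged.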

Introduction - Data Graph (Query)
Query = {Summers, Cohen, coffee}
A query answer contains all keywords of the query.

Introduction - Data Graph (Query Result)
Query = {Summers, Cohen, coffee}
An answer is a directed subtree of the data graph that has no redundant edges (or nodes).
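The answer definition can be turned into a checker. This is a hedged sketch: the function `is_answer` and the edge-set representation are hypothetical, and it assumes that keyword nodes are leaves (edges point into them) and that the query has at least two keywords. Under those assumptions, "no redundant edges" means every leaf is a keyword node and the root branches, since a root with a single child could be dropped together with its edge.

```python
def is_answer(edges, keyword_nodes):
    """Check that a set of directed edges (u, v) is a directed tree that contains every
    keyword node and has no redundant edges or nodes (sketch, assumptions above)."""
    keywords = set(keyword_nodes)
    if not edges:
        return False  # with two or more keywords, an answer has at least one edge
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    parents = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        parents[v] += 1
        children[u].append(v)
    roots = [n for n in nodes if parents[n] == 0]
    # A directed tree has exactly one root and every other node has exactly one parent.
    if len(roots) != 1 or any(count > 1 for count in parents.values()):
        return False
    root = roots[0]
    # Every node must be reachable from the root (this also rules out cycles).
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    if seen != nodes or not keywords <= nodes:
        return False
    # No redundancy: all leaves are keyword nodes and the root has >= 2 children.
    leaves_ok = all(n in keywords for n in nodes if not children[n])
    return leaves_ok and len(children[root]) >= 2


# Hypothetical answer for Query = {Summers, Cohen, coffee}.
K = [("kw", "Summers"), ("kw", "Cohen"), ("kw", "coffee")]
tree = {("r", "a"), ("r", "b"), ("a", K[0]), ("a", K[1]), ("b", K[2])}
print(is_answer(tree, K))                 # True
print(is_answer(tree | {("r", "c")}, K))  # False: c is a redundant non-keyword leaf
```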



Language Model
Unigram: each word occurs independently of the other words.
This is the so-called "bag-of-words" model (e.g., it cannot distinguish "street market" from "market street").
Relevance(?
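A small illustration of the unigram query-likelihood model and its bag-of-words limitation. This is a sketch under the usual maximum-likelihood estimate P(w | M_D) = count(w, D) / |D|; the function names and the example sentence are made up for illustration.

```python
from collections import Counter


def unigram_mle(doc_words):
    """Maximum-likelihood unigram model: P(w | M_D) = count(w, D) / |D|."""
    counts = Counter(doc_words)
    total = len(doc_words)
    return {w: c / total for w, c in counts.items()}


def query_likelihood(query_words, model):
    """P(Q | M_D) under the independence assumption: a product of word probabilities."""
    p = 1.0
    for w in query_words:
        p *= model.get(w, 0.0)  # unseen words get probability 0 (no smoothing yet)
    return p


doc = "fresh fish at the street market on market street".split()
model = unigram_mle(doc)
# Word order is ignored, so both phrases get the same score.
print(query_likelihood(["street", "market"], model) ==
      query_likelihood(["market", "street"], model))  # True
```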

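Language Model
The zero-probability problem: if W_e does not occur in D1, then P(W_e | M_D1) = 0, so every query containing W_e gets probability 0 for D1.
Remedy: smooth the document-specific unigram model with a collection model (two states, i.e., a mixture of two multinomials).
Example: three documents of 10 words each (30 words in the collection).
Doc D1: P(Wa|M_D1) = 0.4, P(Wb|M_D1) = 0.3, P(Wc|M_D1) = 0.2, P(Wd|M_D1) = 0.1, P(We|M_D1) = 0, P(Wf|M_D1) = 0
Doc D2: P(Wa|M_D2) = 0.2, P(Wb|M_D2) = 0.2, P(Wc|M_D2) = 0.2, P(Wd|M_D2) = 0.1, P(We|M_D2) = 0.1, P(Wf|M_D2) = 0.2
Doc D3: P(Wa|M_D3) = 0, P(Wb|M_D3) = 0.2, P(Wc|M_D3) = 0.2, P(Wd|M_D3) = 0.1, P(We|M_D3) = 0.5, P(Wf|M_D3) = 0
Query Q = W_c, W_d, W_e.
Without smoothing:
$$P(Q \mid M_{D_1}) = P(W_c \mid M_{D_1}) \cdot P(W_d \mid M_{D_1}) \cdot P(W_e \mid M_{D_1}) = 0.2 \times 0.1 \times 0 = 0$$
With smoothing (weight 0.9 on the document model and 0.1 on the collection model; W_c, W_d, and W_e occur 6, 3, and 6 times in the collection):
$$P(Q \mid M_{D_1}) = \left(0.9 \cdot 0.2 + 0.1 \cdot \tfrac{6}{30}\right)\left(0.9 \cdot 0.1 + 0.1 \cdot \tfrac{3}{30}\right)\left(0.9 \cdot 0 + 0.1 \cdot \tfrac{6}{30}\right) = 0.2 \times 0.1 \times 0.02 = 0.0004$$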
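The smoothing example can be reproduced with a linear (Jelinek-Mercer style) mixture. This is a sketch, assuming the document weight 0.9 from the slide and deriving the collection model from the listed per-document probabilities of the three 10-word documents; the names `collection_model`, `smoothed_prob`, and `query_likelihood` are illustrative, not from the paper.

```python
# Linear mixture: P(w | D) = lam * P_ml(w | D) + (1 - lam) * P(w | C).
LAMBDA = 0.9   # weight of the document model, as in the slide example
DOC_LEN = 10   # each example document has 10 words

docs = {
    "D1": {"Wa": 0.4, "Wb": 0.3, "Wc": 0.2, "Wd": 0.1},
    "D2": {"Wa": 0.2, "Wb": 0.2, "Wc": 0.2, "Wd": 0.1, "We": 0.1, "Wf": 0.2},
    "D3": {"Wb": 0.2, "Wc": 0.2, "Wd": 0.1, "We": 0.5},
}


def collection_model(docs, doc_len):
    """Collection model: total count of each word divided by the collection size."""
    counts = {}
    for model in docs.values():
        for word, p in model.items():
            counts[word] = counts.get(word, 0) + round(p * doc_len)
    total = doc_len * len(docs)
    return {word: c / total for word, c in counts.items()}


def smoothed_prob(word, doc_model, coll_model, lam=LAMBDA):
    """Mixture of the document model and the collection model."""
    return lam * doc_model.get(word, 0.0) + (1 - lam) * coll_model.get(word, 0.0)


def query_likelihood(query, doc_model, coll_model, lam=LAMBDA):
    """P(Q | M_D) as a product of (smoothed) unigram probabilities."""
    p = 1.0
    for word in query:
        p *= smoothed_prob(word, doc_model, coll_model, lam)
    return p


coll = collection_model(docs, DOC_LEN)
query = ["Wc", "Wd", "We"]
print(query_likelihood(query, docs["D1"], coll, lam=1.0))  # 0.0 (no smoothing)
print(query_likelihood(query, docs["D1"], coll))           # about 0.0004
```

Setting `lam=1.0` recovers the unsmoothed model, which makes the zero-probability problem visible side by side with the smoothed score.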


Ranking Answers
Answers are ranked by an l-score (the name emphasizes that lower scores are better), which uses two kinds of weights:
Structural weights: assigned to the nodes and edges of the data graph. These weights are derived only from semantic considerations and are static, namely, they are independent of any particular query.
Language models: used to assign weights to the edges that connect the keywords of a given query to the nodes that contain them. These weights are dynamic, depending on the given query.

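IR Relevance
R(Q; A_f) is computed with logarithms to avoid underflow when taking the product of many probabilities, so R(Q; A_f) lies in (−∞, R_max]. It is normalized into the interval [0, 1]:
$$\text{lscore}_{ir}(Q; A_f) = 1 - \frac{1}{\ln\left(e - R(Q; A_f) + R_{max}\right)}$$
A logarithm is used in this equation so that lscore_ir(Q; A_f) does not grow too fast as R(Q; A_f) decreases.
Maximum (R(Q; A_f) = R_max): ln(e − R_max + R_max) = ln(2.718) = 1, so lscore_ir = 1 − 1 = 0 (best).
Minimum (R(Q; A_f) → −∞): ln(e − (−∞) + R_max) = ln(∞) = ∞, so lscore_ir = 1 − 1/∞ = 1 (worst).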
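The log-space scoring and the normalization can be sketched as follows. The normalization formula is an assumption reconstructed from the slide's boundary values (0 at R_max, 1 as R goes to minus infinity); the function names and the two candidate answers with their per-keyword probabilities are hypothetical.

```python
import math


def log_likelihood(probs):
    """R as a sum of log-probabilities, which avoids underflow of long products."""
    return sum(math.log(p) for p in probs)


def lscore_ir(r, r_max):
    """Normalize r in (-inf, r_max] into [0, 1]; lower scores are better.

    Assumed form, reconstructed from the slide's boundary values:
        lscore_ir = 1 - 1 / ln(e - r + r_max)
    """
    if r > r_max:
        raise ValueError("r must not exceed r_max")
    return 1.0 - 1.0 / math.log(math.e + (r_max - r))


# Underflow demo: the direct product of 400 small probabilities is 0.0 in floating point.
tiny = [1e-3] * 400
print(math.prod(tiny), round(log_likelihood(tiny), 1))  # 0.0 -2763.1

# Hypothetical candidate answers with per-keyword probabilities.
answers = {"A1": [0.2, 0.1, 0.02], "A2": [0.05, 0.01, 0.02]}
r_values = {a: log_likelihood(ps) for a, ps in answers.items()}
r_max = max(r_values.values())
scores = {a: lscore_ir(r, r_max) for a, r in r_values.items()}
print(scores)  # A1 (the best answer) gets about 0; A2 gets a larger, worse score
```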


Graph Weights - Weight of a Node
Static weights of nodes: the weight of a node v, denoted by w(v), is static and depends on its incoming edges. With idg(v) the number of incoming edges of v:
$$w(v) = \frac{1}{\ln\left(e + \text{idg}(v) - 1\right)}$$
idg(v) = 1: ln(e) = ln(2.718) = 1, so w(v) = 1 (the maximum).
idg(v) = 3: ln(e + 2) = ln(4.718) = 1.55, so w(v) = 1/1.55 ≈ 0.64.
The more incoming edges there are, the lower the weight is, which means that nodes with many incoming edges are more important. Node weights lie in the interval (0, 1].
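A one-function sketch of the node weight, assuming the form w(v) = 1 / ln(e + idg(v) − 1), which matches the slide's worked values for idg(v) = 1 and idg(v) = 3. The guard for idg(v) < 1 is a choice of this sketch, since the slide only covers nodes with at least one incoming edge.

```python
import math


def node_weight(idg):
    """Static node weight w(v) = 1 / ln(e + idg(v) - 1) (assumed form).

    idg: number of incoming edges of v; more incoming edges give a lower weight.
    """
    if idg < 1:
        raise ValueError("this sketch only covers nodes with at least one incoming edge")
    return 1.0 / math.log(math.e + (idg - 1))


for idg in (1, 3, 10):
    print(idg, round(node_weight(idg), 3))
```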

Graph Weights - Weight of an Edge
Static weights of edges, for an edge e from u to v:
fdg(e): the number of edges that have the same type as e and go from u to nodes with the same type as v.
tdg(e): the number of edges that have the same type as e, emanate from nodes with the same type as u, and point to v.
An edge with fewer similar edges is more unique and, hence, describes a stronger semantic relationship between its nodes.
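The two counts can be computed directly from typed edges. This is a sketch that stops at fdg(e) and tdg(e), because the formula that turns them into an edge weight is not on the slide; the `(u, v, edge_type)` triples, the `node_type` map, and the actor/movie toy graph are hypothetical.

```python
def fdg(e, edges, node_type):
    """fdg(e) for e = (u, v, t): edges of type t that go from u to nodes of v's type."""
    u, v, t = e
    return sum(1 for (u2, v2, t2) in edges
               if u2 == u and t2 == t and node_type[v2] == node_type[v])


def tdg(e, edges, node_type):
    """tdg(e) for e = (u, v, t): edges of type t that point to v from nodes of u's type."""
    u, v, t = e
    return sum(1 for (u2, v2, t2) in edges
               if v2 == v and t2 == t and node_type[u2] == node_type[u])


# Hypothetical toy graph: actors and a director linked to movies.
node_type = {"a1": "actor", "a2": "actor", "a3": "actor",
             "m1": "movie", "m2": "movie", "d1": "director"}
edges = [("a1", "m1", "acted_in"), ("a2", "m1", "acted_in"), ("a3", "m1", "acted_in"),
         ("a1", "m2", "acted_in"), ("d1", "m1", "directed")]

for e in edges:
    print(e, "fdg =", fdg(e, edges, node_type), "tdg =", tdg(e, edges, node_type))
```

In this toy graph the `directed` edge has fdg = tdg = 1, so it has no similar edges and, by the slide's argument, describes the strongest relationship; each `acted_in` edge into m1 shares its target with two other actor edges (tdg = 3).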


Experiment
Datasets: subsets of the Wikipedia, IMDB, and Mondial Web databases
Queries: 50 queries for each dataset including
Metrics: Mean Average Precision (MAP), the number of top-1 relevant results, and reciprocal rank


Experiment - MAP

Experiment – Top 1 & RR

Experiment - MRR


Conclusion
Presented a novel and effective ranking technique for keyword search over data graphs, based on language models and structural weights.
Concluded that systems for keyword search can also be used as effective retrieval engines.

MAP (Mean Average Precision)
Topic 1: there are 4 relevant documents, retrieved at ranks 1, 2, 4, 7.
Topic 2: there are 5 relevant documents, retrieved at ranks 1, 3, 5, 7, 10.
Topic 1 average precision: (1/1 + 2/2 + 3/4 + 4/7) / 4 = 0.83
Topic 2 average precision: (1/1 + 2/3 + 3/5 + 4/7 + 5/10) / 5 = 0.67
MAP = (0.83 + 0.67) / 2 = 0.75
Reciprocal Rank (the reciprocal of the rank of the first relevant document)
Topic 1 reciprocal rank: 1/1 = 1
Topic 2 reciprocal rank: 1/1 = 1
MRR = (1 + 1) / 2 = 1
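The two metrics can be checked with a few lines of code. This sketch uses the standard definitions: average precision is the mean of the precision values at the ranks of the relevant documents (divided by the total number of relevant documents), and reciprocal rank is 1 over the rank of the first relevant document. The function names are illustrative.

```python
def average_precision(relevant_ranks, num_relevant=None):
    """Mean of precision@r over the ranks r of the relevant documents.

    num_relevant: total number of relevant documents (defaults to the number retrieved).
    """
    ranks = sorted(relevant_ranks)
    n = num_relevant if num_relevant is not None else len(ranks)
    return sum((i + 1) / r for i, r in enumerate(ranks)) / n


def reciprocal_rank(relevant_ranks):
    """1 / rank of the first relevant document (0 if none was retrieved)."""
    return 1.0 / min(relevant_ranks) if relevant_ranks else 0.0


topics = {"Topic 1": [1, 2, 4, 7], "Topic 2": [1, 3, 5, 7, 10]}
aps = [average_precision(r) for r in topics.values()]
map_score = sum(aps) / len(aps)
rrs = [reciprocal_rank(r) for r in topics.values()]
mrr = sum(rrs) / len(rrs)
print([round(a, 2) for a in aps], round(map_score, 2))  # [0.83, 0.67] 0.75
print(mrr)  # 1.0
```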

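Mean Reciprocal Rank (MRR) example: five queries (Query 1 to Query 5) are ranked, and each query's reciprocal rank is 1/r, where r is the rank of its first relevant result. The MRR is the average of the five reciprocal ranks (their sum divided by 5), which in this example is 0.617.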