Query Expansion in Information Retrieval using a Bayesian Network-Based Thesaurus. Luis M. de Campos, Juan M. Fernandez, Juan F. Huete.

Introduction
A method for query expansion based on Bayesian networks, in three stages:
Preprocessing: SMART [25].
Learning: construct a Bayesian network (a thesaurus for a given collection) that represents some of the relationships among the terms appearing in that document collection.
Query expansion: given a particular query, instantiate the terms that compose it and propagate this information through the network; select the new terms whose posterior probability is high and add them to the original query.

IRS (information retrieval system)
Indexing: inverted file. Query: indexing.
Cf. the four classic retrieval models: Boolean, vector space, cluster, and probabilistic models [21, 25].
BNs applied to IR: Croft and Turtle's document and query networks [7, 28], Ghazfan et al. [13], Fung et al. [10], and [2, 9, 18, 24].
Building a thesaurus: Schütze and Pedersen [26].

Thesaurus Construction Algorithm
Thesaurus: a Bayesian network (a DAG, restricted to a polytree, i.e. a singly connected graph) built from an inverted file.
Nodes: each node is a term, represented as a binary variable with two values (indexed 0 and 1).
Learning: PA algorithm, RP algorithm.
Propagation.
MWST (maximum weight spanning tree): Kruskal's and Prim's algorithms.

Why a Polytree Instead of a More General BN?
Large number of terms.
Learning phase: see [3, 20].
Propagation phase: see [19].

Algorithm for Learning a Polytree
1. For every pair of nodes x_i, x_j ∈ U, where U is the set of nodes, do
   1.1. Compute Dep(x_i, x_j | ∅).
2. Build a maximum weight spanning tree G, where the weight of each edge x_i - x_j is Dep(x_i, x_j | ∅).
3. For every triplet of nodes x_i, x_j, x_k ∈ U such that x_i - x_k, x_j - x_k ∈ G do
   3.1. If Dep(x_i, x_j | ∅) < Dep(x_i, x_j | x_k) and ¬I(x_i, x_j | x_k), then direct the subgraph x_i - x_k - x_j as x_i → x_k ← x_j.
4. Direct the remaining edges without introducing new head-to-head connections.
5. Return G.
Phases: step 1 calculates the dependency degrees, step 2 constructs the skeleton, and steps 3 and 4 perform the orientation.
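The skeleton and head-to-head steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `learn_polytree_skeleton`, the dictionary layout of `dep` and `cond_dep`, and the use of a small threshold `eps` as the independence test are all assumptions made for the example. Orientation of the remaining edges (step 4) is left out, and conflicting orientations are not resolved.

```python
from itertools import combinations


def learn_polytree_skeleton(nodes, dep, cond_dep, eps=1e-3):
    """Return (skeleton, arcs) for a set of term nodes.

    dep[frozenset({a, b})]           -> marginal dependency degree of a and b
    cond_dep[(frozenset({a, b}), k)] -> dependency degree of a and b given k
    eps is a hypothetical cut-off standing in for the independence test.
    """
    # Skeleton: maximum weight spanning tree via Kruskal with union-find.
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    skeleton = set()
    for pair in sorted(dep, key=dep.get, reverse=True):
        a, b = tuple(pair)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so no cycle
            parent[root_a] = root_b
            skeleton.add(pair)

    # Orientation: detect head-to-head patterns a -> k <- b.
    arcs = set()
    for a, b in combinations(nodes, 2):
        for k in nodes:
            if k in (a, b):
                continue
            if frozenset({a, k}) in skeleton and frozenset({b, k}) in skeleton:
                marginal = dep[frozenset({a, b})]
                conditional = cond_dep.get((frozenset({a, b}), k), 0.0)
                if marginal < conditional and conditional > eps:
                    arcs.add((a, k))
                    arcs.add((b, k))
    return skeleton, arcs


# Toy input: "a" and "b" are marginally unrelated but dependent given "c".
nodes = ["a", "b", "c"]
dep = {
    frozenset({"a", "c"}): 0.5,
    frozenset({"b", "c"}): 0.4,
    frozenset({"a", "b"}): 0.0,
}
cond_dep = {(frozenset({"a", "b"}), "c"): 0.6}
skeleton, arcs = learn_polytree_skeleton(nodes, dep, cond_dep)
```

With this toy input the skeleton keeps the two heaviest edges, a - c and b - c, and both are directed into c.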

Dependency
Marginal dependency degree: Kullback-Leibler cross entropy (mutual information measure).
Conditional dependency degree: conditional mutual information measure.
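As a rough illustration of these two measures, the sketch below estimates mutual information and conditional mutual information from binary term-occurrence vectors (one entry per document). The function names and the plain relative-frequency estimates are assumptions for the example; the slides do not say how the probabilities are estimated.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (natural log) of two equally long value lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    total = 0.0
    for (va, vb), count in joint.items():
        p_ab = count / n
        total += p_ab * math.log(p_ab / ((count_a[va] / n) * (count_b[vb] / n)))
    return total


def conditional_mutual_information(a, b, c):
    """Mutual information of a and b given c, as a weighted sum over values of c."""
    n = len(a)
    total = 0.0
    for vc in set(c):
        idx = [i for i in range(n) if c[i] == vc]
        sub_a = [a[i] for i in idx]
        sub_b = [b[i] for i in idx]
        total += (len(idx) / n) * mutual_information(sub_a, sub_b)
    return total


# Toy occurrence vectors over four documents (1 = term occurs).
term_a = [0, 0, 1, 1]
term_b = [0, 1, 0, 1]
term_c = [0, 1, 1, 0]  # occurs exactly when one of term_a, term_b occurs

marginal = mutual_information(term_a, term_b)
conditional = conditional_mutual_information(term_a, term_b, term_c)
```

In this toy case `term_a` and `term_b` are marginally independent but become fully dependent once `term_c` is known, which is the situation in which the learning algorithm would introduce a head-to-head connection.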

Experimentation
Three standard test collections, Adi, Cranfield and Medlars, from ftp.cs.cornell.edu (with SMART).

Collection   Adi                   Cranfield     Medlars
Subject      Information Science   Aeronautics   Medicine
Documents
Terms
Queries

Query Expansion Process
Assume that all the terms in the query are relevant and instantiate them in the learnt polytree. For every other term, obtain the posterior probability that the term is relevant given that the query terms are relevant. Add to the query every term whose posterior probability is larger than a predetermined threshold.
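The thresholding step can be sketched as below. Note that this example computes posteriors by brute-force enumeration of all joint assignments, which is only feasible for a handful of terms; it stands in for the polytree propagation used in the slides. The network, its probabilities, the term names, and the functions `posterior` and `expand_query` are all hypothetical.

```python
from itertools import product


def posterior(parents, cpt, evidence, target):
    """P(target = 1 | evidence) by enumerating every joint assignment.

    parents[t] -> tuple of parent terms of t
    cpt[t][parent_values] -> P(t = 1 | parents take parent_values)
    """
    names = list(parents)
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[t] != v for t, v in evidence.items()):
            continue  # inconsistent with the instantiated query terms
        p = 1.0
        for t in names:
            p_one = cpt[t][tuple(world[q] for q in parents[t])]
            p *= p_one if world[t] == 1 else 1.0 - p_one
        denominator += p
        if world[target] == 1:
            numerator += p
    return numerator / denominator


def expand_query(parents, cpt, query_terms, threshold):
    """Append every non-query term whose posterior exceeds the threshold."""
    evidence = {t: 1 for t in query_terms}  # query terms are taken as relevant
    added = [
        t for t in parents
        if t not in evidence and posterior(parents, cpt, evidence, t) > threshold
    ]
    return list(query_terms) + added


# Hypothetical three-term network: aircraft -> wing, and an unrelated term.
parents = {"aircraft": (), "wing": ("aircraft",), "soup": ()}
cpt = {
    "aircraft": {(): 0.2},
    "wing": {(0,): 0.05, (1,): 0.9},
    "soup": {(): 0.1},
}
expanded = expand_query(parents, cpt, ["aircraft"], threshold=0.5)
```

Here "wing" is added to the query because its posterior given "aircraft" is 0.9, while "soup" stays at its prior of 0.1 and is left out.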

Concluding Remarks
Contributions: a new approach to learning a thesaurus using BNs; the RP and PA algorithms are combined to learn the polytree (dependency graph).
Further improvements: a more accurate thesaurus learning algorithm; incorporating documents into the model; improving the performance of the propagation process.