Other IR Models: A Taxonomy
User task: Retrieval (ad hoc, filtering) and Browsing.
Classic models: Boolean, Vector, Probabilistic.
Set-theoretic: Fuzzy, Extended Boolean.
Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Networks.
Probabilistic: Inference Network, Belief Network.
Structured models: Non-Overlapping Lists, Proximal Nodes.
Browsing: Flat, Structure Guided, Hypertext.

Another Vector Model: Motivation 1. Index terms have synonyms. [Use thesauri?] 2. Index terms have multiple meanings (polysemy). [Use restricted vocabularies or more precise queries?] 3. Index terms are not independent; think “phrases”. [Use combinations of terms?]

Latent Semantic Indexing/Analysis Basic Idea: Keywords in a query are just one way of specifying the information need. One really wants to specify the key concepts rather than words. Assume a latent semantic structure underlying the term-document data that is partially obscured by exact word choice.

LSI In Brief Map terms into a lower-dimensional space (via SVD) to remove "noise" and force clustering of similar words. Pre-process the corpus to create the reduced vector space; match queries to documents in the reduced space.

SVD for the Term-Document Matrix Write the t x d term-document matrix C as C = T S D^T, with dimensions (t x d) = (t x m)(m x m)(m x d), where m is the rank of C (<= min(t, d)), T is the orthonormal matrix of eigenvectors of the term-term correlation matrix C C^T, S is the diagonal matrix of singular values, and D is the orthonormal matrix of eigenvectors of the document-document correlation matrix C^T C.

Reducing Dimensionality Order the singular values in S by size, keep the k largest, and delete the corresponding rows/columns of S, T and D to form S_k, T_k and D_k. The approximate model C_k = T_k S_k D_k^T is the rank-k matrix with the best possible least-squares fit to C. Pick k large enough to capture the real structure, but small enough to eliminate noise; in practice k is usually on the order of a few hundred.
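As a sketch (toy term-document counts, not from the slides; NumPy assumed), the SVD and rank-k truncation look like:

```python
import numpy as np

# toy term-document matrix C (t=4 terms x d=3 documents); values are hypothetical
C = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# full SVD: C = T @ diag(S) @ D^T; numpy returns singular values in decreasing order
T, S, Dt = np.linalg.svd(C, full_matrices=False)

# keep only the k largest singular values and the matching rows/columns
k = 2
T_k, S_k, Dt_k = T[:, :k], S[:k], Dt[:k, :]

# rank-k approximation: best least-squares fit to C among rank-k matrices
C_k = T_k @ np.diag(S_k) @ Dt_k
print(C_k.shape)  # (4, 3)
```

Here k = 2 is only for the toy data; real collections use far larger k.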

Computing Similarities in LSI How similar are two terms? The dot product between the two corresponding row vectors of C_k (an entry of C_k C_k^T). How similar are two documents? The dot product between the two corresponding column vectors of C_k (an entry of C_k^T C_k). How similar are a term and a document? The value of the individual cell of C_k.

Query Retrieval As before, treat the query as a short pseudo-document: make it column 0 of C. The first row of C_k^T C_k then gives the similarity of every document to the query, i.e., the document ranking.
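A minimal sketch of this scheme, with the query appended as column 0 of a toy term-document matrix (NumPy assumed; all data hypothetical):

```python
import numpy as np

# toy term-document weights (3 terms x 3 documents)
docs = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
])
query = np.array([[1.0], [0.0], [1.0]])   # query uses terms 0 and 2
C = np.hstack([query, docs])              # query becomes column 0 of C

T, S, Dt = np.linalg.svd(C, full_matrices=False)
k = 2
# reduced coordinates of query and documents: columns of diag(S_k) @ D_k^T
coords = np.diag(S[:k]) @ Dt[:k, :]

q_vec = coords[:, 0]
ranks = []
for j in range(1, C.shape[1]):
    d_vec = coords[:, j]
    cos = q_vec @ d_vec / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec))
    ranks.append((cos, j))
ranks.sort(reverse=True)
print([j for _, j in ranks])  # document columns ordered by similarity to the query
```

Comparing reduced-space columns by cosine is one common choice; the dot products of C_k columns give the unnormalized similarities.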

LSI Issues Requires access to the full corpus to compute the SVD; how can this be done efficiently at Web scale? What is the right value of k? Can LSI be used for cross-language retrieval? The size of the training corpus is limited: "one student's reading through high school" (Landauer 2002).

Other Vector Model: Neural Network Basic idea: a 3-layer neural net whose nodes are query terms, document terms, and documents. Signal propagation is based on the classic vector-model similarity computation, and the weights can be tuned.

Neural Network Diagram from Wilkinson and Hingston, SIGIR 1991: query-term nodes k_a, k_b, k_c feed the matching document-term nodes among k_1 ... k_t, which in turn feed the document nodes d_1 ... d_N.

Computing Document Rank The weight from query term i to document-term node i is W_iq = w_iq / sqrt(sum_i w_iq^2), and the weight from document-term node i to document j is W_ij = w_ij / sqrt(sum_i w_ij^2). The rank of document d_j is then the propagated signal sum_i W_iq W_ij, which reproduces the classic cosine similarity.
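A sketch of this propagation with hypothetical tf-idf-style weights (the per-document sum equals the classic cosine similarity):

```python
import math

# toy weights w[i][j] for term i in document j (hypothetical values)
w_doc = [
    [0.5, 0.0],
    [0.8, 0.3],
    [0.0, 0.9],
]
w_query = [1.0, 0.0, 1.0]  # query weight per term

# normalize query-side weights: W_iq = w_iq / sqrt(sum_i w_iq^2)
q_norm = math.sqrt(sum(x * x for x in w_query))
W_q = [x / q_norm for x in w_query]

ranks = []
for j in range(2):
    col = [w_doc[i][j] for i in range(3)]
    # normalize document-side weights: W_ij = w_ij / sqrt(sum_i w_ij^2)
    d_norm = math.sqrt(sum(x * x for x in col))
    W_d = [x / d_norm for x in col]
    # signal propagation: query terms -> document terms -> document node
    ranks.append(sum(W_q[i] * W_d[i] for i in range(3)))
print(ranks)
```

Further propagation rounds (documents back to terms) would implement the thesaurus-like spreading the model allows; this sketch stops after one pass.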

Probabilistic Models Principle: given a user query q and a document d in the collection, estimate the probability that the user will find d relevant. (How?) The user rates a retrieved subset; the system uses those ratings to refine the subset; over time, the retrieved subset should converge on the true relevant set.

Computing Similarity I sim(d_j, q) = P(R | d_j) / P(~R | d_j): the odds that document d_j is relevant to query q versus non-relevant. By Bayes' rule this equals [P(d_j | R) P(R)] / [P(d_j | ~R) P(~R)], where P(d_j | R) is the probability of randomly selecting d_j from the relevant set R, and P(R) is the probability that a randomly selected document is relevant (similarly for the non-relevant set ~R). Since P(R) and P(~R) are the same for every document, they can be dropped for ranking purposes.

Computing Similarity II Assuming independence of index terms, the ranking reduces to sim(d_j, q) ~ sum_i w_iq w_ij [ log( P(k_i | R) / (1 - P(k_i | R)) ) + log( (1 - P(k_i | ~R)) / P(k_i | ~R) ) ], where P(k_i | R) is the probability that index term k_i is present in a document randomly selected from R, and P(k_i | ~R) likewise for ~R.

Initializing Probabilities Initially, assume a constant probability for each index term in the relevant set: P(k_i | R) = 0.5. Assume the distribution of index terms in non-relevant documents matches the overall collection distribution: P(k_i | ~R) = n_i / N, where n_i is the number of documents containing k_i and N is the total number of documents.

Improving Probabilities Assumptions: approximate P(k_i | R) as the fraction of documents retrieved so far that contain index term i: P(k_i | R) = V_i / V; approximate P(k_i | ~R) by assuming all non-retrieved documents are non-relevant: P(k_i | ~R) = (n_i - V_i) / (N - V), where V is the number of documents retrieved so far and V_i the number of those containing k_i.
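The initialization and feedback update above can be sketched as follows (toy corpus; the 0.5 smoothing terms are a common adjustment to avoid zero probabilities, not part of the slides' formulas):

```python
import math

# toy corpus: each document is the set of index terms it contains
docs = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"c"}]
query = {"a", "c"}
N = len(docs)

def score(doc, p_rel, p_nonrel):
    # sum over query terms present in the document of the two log-odds factors
    s = 0.0
    for t in query & doc:
        s += math.log(p_rel[t] / (1 - p_rel[t]))
        s += math.log((1 - p_nonrel[t]) / p_nonrel[t])
    return s

# initial estimates: P(k_i|R) = 0.5, P(k_i|~R) = n_i / N
n = {t: sum(t in d for d in docs) for t in query}
p_rel = {t: 0.5 for t in query}
p_nonrel = {t: n[t] / N for t in query}
initial = sorted(range(N), key=lambda j: score(docs[j], p_rel, p_nonrel), reverse=True)

# feedback update: V = top-ranked docs retrieved so far, V_i = those containing k_i;
# assume non-retrieved documents are non-relevant
V = [docs[j] for j in initial[:2]]
Vi = {t: sum(t in d for d in V) for t in query}
p_rel = {t: (Vi[t] + 0.5) / (len(V) + 1) for t in query}
p_nonrel = {t: (n[t] - Vi[t] + 0.5) / (N - len(V) + 1) for t in query}
final = sorted(range(N), key=lambda j: score(docs[j], p_rel, p_nonrel), reverse=True)
print(final)
```

The update can be iterated; each round re-estimates the probabilities from the current top-ranked set.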

Classic Probabilistic Model Summary Pros: ranking is based on an assessed probability of relevance, which can be approximated without user intervention. Cons: really needs user judgments to determine the set V; ignores term frequency; assumes independence of terms.

Probabilistic Alternative: Bayesian (Belief) Networks A graphical structure that represents the dependence between variables, in which the following holds: 1. the nodes are a set of random variables; 2. there is a set of directed links between nodes; 3. each node has a conditional probability table giving its relationship to its parents; 4. the graph is directed and acyclic.

Belief Network Example (from Russell & Norvig)
Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
P(B) = .001, P(E) = .002
P(A | B, E): B=T,E=T: .95; B=T,E=F: .94; B=F,E=T: .29; B=F,E=F: .001
P(J | A): A=T: .90; A=F: .05
P(M | A): A=T: .70; A=F: .01

Belief Network Example (cont.) Probability of a false notification (the alarm sounded and both people called, but there was no burglary or earthquake): P(J, M, A, ~B, ~E) = P(J | A) P(M | A) P(A | ~B, ~E) P(~B) P(~E) = .90 x .70 x .001 x .999 x .998, approximately .00063.
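The false-notification probability can be checked by multiplying each node's CPT entry given its parents (values from the Russell & Norvig example):

```python
# P(J, M, A, ~B, ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
p_b, p_e = 0.001, 0.002        # priors on Burglary and Earthquake
p_a_given_nb_ne = 0.001        # P(Alarm | no burglary, no earthquake)
p_j_given_a = 0.90             # P(JohnCalls | Alarm)
p_m_given_a = 0.70             # P(MaryCalls | Alarm)

p = p_j_given_a * p_m_given_a * p_a_given_nb_ne * (1 - p_b) * (1 - p_e)
print(round(p, 6))  # 0.000628
```

The chain-rule factorization works because each node is conditionally independent of its non-descendants given its parents.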

Inference Networks for IR Random variables are associated with documents, index terms, and queries. An edge from a document node to a term node indicates that observing the document increases the belief in that term.

Computing Rank in Inference Networks for IR Here q is the keyword query, q1 is a Boolean query, and I is the information need. The rank of document d_j is computed as P(q AND d_j).

Where do the probabilities come from? (Boolean Model) Uniform priors over documents; only the terms occurring in the instantiated document are active; the query is matched to keywords as in the Boolean model.

Belief Network Formulation Uses a different network topology; does not consider each document individually; adopts a set-theoretic view.