Recuperação de Informação B


Recuperação de Informação B
Chapter 02: Modeling (Latent Semantic Indexing & Neural Network Model), Sections 2.7.2 and 2.7.3
September 27, 1999

Latent Semantic Indexing

Classic IR might lead to poor retrieval because:
- unrelated documents might be included in the answer set
- relevant documents that do not contain at least one index term are not retrieved

Reasoning: retrieval based on index terms is vague and noisy. The user information need is more related to concepts and ideas than to index terms, and a document that shares concepts with another document known to be relevant might be of interest.

Latent Semantic Indexing

The key idea is to map documents and queries into a lower-dimensional space, i.e., one composed of higher-level concepts which are fewer in number than the index terms. Retrieval in this reduced concept space might be superior to retrieval in the space of index terms.

Latent Semantic Indexing: Definitions

- Let t be the total number of index terms
- Let N be the number of documents
- Let (Mij) be a term-document matrix with t rows and N columns
- To each element of this matrix is assigned a weight wij associated with the pair [ki, dj]
- The weight wij can be based on a tf-idf weighting scheme
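As a concrete illustration (not from the slides), here is a minimal sketch of building such a matrix in Python; the toy corpus, its tokenization, and the simple tf*idf formula are all illustrative assumptions:

```python
import math
from collections import Counter

import numpy as np

# Toy corpus: each document is a list of index terms (assumed pre-tokenized).
docs = [
    ["gold", "silver", "truck"],
    ["shipment", "gold", "damaged", "fire"],
    ["delivery", "silver", "truck", "silver"],
]

terms = sorted({t for d in docs for t in d})  # the t index terms
N = len(docs)                                 # the N documents

# df[k]: number of documents containing term k (document frequency)
df = Counter(t for d in docs for t in set(d))

# M has t rows and N columns; wij = tf * idf for the pair [ki, dj]
M = np.zeros((len(terms), N))
for j, d in enumerate(docs):
    tf = Counter(d)
    for i, k in enumerate(terms):
        if tf[k]:
            M[i, j] = tf[k] * math.log(N / df[k])
```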

Latent Semantic Indexing

The matrix (Mij) can be decomposed into three matrices by singular value decomposition:

  (Mij) = (K) (S) (D)^t

where:
- (K) is the matrix of eigenvectors derived from (M)(M)^t
- (D)^t is the transpose of the matrix of eigenvectors derived from (M)^t(M)
- (S) is an r x r diagonal matrix of singular values, where r = min(t, N), that is, the rank of (Mij)

Computing an Example

Let (Mij) be given by the matrix shown on the slide. Compute the matrices (K), (S), and (D)^t.
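Since the slide's example matrix is an image, here is a sketch that computes the three factors with numpy's SVD, reusing the toy matrix M from the sketch above:

```python
# Thin SVD of the term-document matrix: M = K @ S @ Dt
K, sigma, Dt = np.linalg.svd(M, full_matrices=False)
S = np.diag(sigma)  # diagonal matrix of singular values, largest first

# Sanity check: the three factors reconstruct M up to floating-point error
assert np.allclose(M, K @ S @ Dt)
```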

Latent Semantic Indexing

In the matrix (S), select only the s largest singular values, and keep the corresponding columns in (K) and (D)^t. The resulting matrix is called (M)_s and is given by:

  (M)_s = (K)_s (S)_s (D)_s^t

where s, s < r, is the dimensionality of the concept space. The parameter s should be:
- large enough to allow fitting the characteristics of the data
- small enough to filter out the non-relevant representational details
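Continuing the sketch, the truncation keeps the first s singular values (numpy returns them in descending order, so these are the largest); the choice s = 2 is arbitrary and purely illustrative:

```python
s = 2                 # dimensionality of the concept space (illustrative)
Ks = K[:, :s]         # corresponding columns of (K)
Ss = S[:s, :s]        # the s largest singular values
Dts = Dt[:s, :]       # corresponding rows of (D)^t

Ms = Ks @ Ss @ Dts    # the rank-s approximation (M)_s
```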

Latent Ranking

The user query can be modelled as a pseudo-document in the original matrix (M). Assume the query is modelled as the document numbered 0 in (M). The matrix (M)_s^t (M)_s quantifies the relationship between any two documents in the reduced concept space. The first row of this matrix provides the rank of all the documents with regard to the user query (represented as the document numbered 0).
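A sketch of the ranking step, under the illustrative assumption that column 0 of M holds the query pseudo-document:

```python
# (M)_s^t (M)_s: document-to-document relationships in the concept space
rel = Ms.T @ Ms

# Row 0 scores every document against the query (document numbered 0)
scores = rel[0, 1:]                # skip the query's similarity to itself
ranking = np.argsort(-scores) + 1  # document indices, best first
print(ranking, scores)
```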

Conclusions

Latent semantic indexing provides an interesting conceptualization of the IR problem. It allows reducing the complexity of the underlying representational framework, which might be explored, for instance, with the purpose of interfacing with the user.

Neural Network Model

Classic IR: terms are used to index documents and queries, and retrieval is based on index term matching.

Motivation: neural networks are known to be good pattern matchers.

Neural Network Model

Neural networks: the human brain is composed of billions of neurons, each of which can be viewed as a small processing unit. A neuron is stimulated by input signals and emits output signals in reaction. A chain reaction of propagating signals is called a spread activation process, and as a result of spread activation the brain might command the body to take physical reactions.

Neural Network Model

A neural network is an oversimplified representation of the neuron interconnections in the human brain:
- nodes are processing units
- edges are synaptic connections
- the strength of a propagating signal is modelled by a weight assigned to each edge
- the state of a node is defined by its activation level
- depending on its activation level, a node might issue an output signal

Neural Network for IR

From the work by Wilkinson & Hingston, SIGIR'91.

[Figure: a three-layer network linking query term nodes (ka, kb, kc) to document term nodes (k1, ..., kt) to document nodes (d1, ..., dj, dj+1, ..., dN).]

Neural Network for IR

The network has three layers, and signals propagate across it.

First level of propagation:
- query terms issue the first signals
- these signals propagate across the network to reach the document nodes

Second level of propagation:
- document nodes might themselves generate new signals which affect the document term nodes
- document term nodes might respond with new signals of their own

Quantifying Signal Propagation

Normalize signal strength (MAX = 1). Query terms emit an initial signal equal to 1.

Weight associated with an edge from a query term node ki to a document term node ki:

  Wiq = wiq / sqrt( sum_i wiq^2 )

Weight associated with an edge from a document term node ki to a document node dj:

  Wij = wij / sqrt( sum_i wij^2 )
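A minimal numpy sketch of these normalizations; the weight matrix w and its values are illustrative, with column 0 holding the query weights wiq:

```python
import numpy as np

# w[i, j]: tf-idf weight of term ki; column 0 holds the query weights wiq,
# the remaining columns hold the document weights wij (illustrative values).
w = np.array([
    [0.5, 0.0, 0.7],
    [0.8, 0.6, 0.0],
    [0.0, 0.9, 0.4],
])

# Wiq = wiq / sqrt(sum_i wiq^2): normalized query-term edge weights
Wq = w[:, 0] / np.sqrt((w[:, 0] ** 2).sum())

# Wij = wij / sqrt(sum_i wij^2): normalized document-term edge weights,
# one normalization per document column
Wd = w[:, 1:] / np.sqrt((w[:, 1:] ** 2).sum(axis=0))
```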

Quantifying Signal Propagation

After the first level of signal propagation, the activation level of a document node dj is given by:

  sum_i Wiq * Wij = ( sum_i wiq * wij ) / ( sqrt( sum_i wiq^2 ) * sqrt( sum_i wij^2 ) )

which is exactly the ranking of the Vector model.

New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle. A minimum threshold should be enforced to avoid spurious signal generation.
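Continuing the sketch above, the first-level activation is the inner product of the normalized weights, which matches the Vector model's cosine ranking; the threshold value is an arbitrary illustrative choice:

```python
# Activation of each document node dj: sum_i Wiq * Wij,
# i.e. the cosine ranking of the Vector model
activation = Wd.T @ Wq

# Enforce a minimum threshold to suppress spurious signals in
# later feedback cycles (threshold chosen for illustration only)
threshold = 0.1
activation = np.where(activation >= threshold, activation, 0.0)
print(activation)
```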

Conclusions

The model provides an interesting formulation of the IR problem, but it has not been tested extensively, and it is not clear what improvements it might provide.