1 Optimization of a Finite State Space for Information Retrieval Qiang Huang School of Computing, Robert Gordon University

2 Outline
- Our previous research
- Challenges and our idea
- Work plan

3 Concept Review
[Figure: an example query (glasgow, monday, temperature) matched against a collection of documents containing terms such as weather and temperature.]

4 Concept Review
Models for information retrieval:
- Document model: P(w|D)
- Query model: P(w|Q)
- Language model: constructing a language model means estimating the probabilities of the observations in a probabilistic space, using some measure of the observed data.
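The slides do not say how P(w|D) and P(w|Q) are estimated. The following is a minimal sketch assuming maximum-likelihood unigram counts with a simple uniform-background smoothing term; the function name unigram_lm and the parameter mu are illustrative, not from the slides.

```python
from collections import Counter

def unigram_lm(tokens, vocab, mu=0.1):
    """Maximum-likelihood unigram model with a uniform-background smoothing term.
    (Illustrative only; the slides do not specify the estimator.)"""
    counts = Counter(tokens)
    total = len(tokens)
    background = 1.0 / len(vocab)
    return {w: (1 - mu) * counts[w] / total + mu * background for w in vocab}

doc = "glasgow weather monday temperature glasgow".split()
query = "glasgow monday temperature".split()
vocab = sorted(set(doc) | set(query))

p_w_given_d = unigram_lm(doc, vocab)    # document model P(w|D)
p_w_given_q = unigram_lm(query, vocab)  # query model P(w|Q)
print(p_w_given_d["glasgow"], p_w_given_q["glasgow"])
```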

5 Concept Review
Terms        Query Model   Document Model
                           D1     D2     D3    ...   DN
a            0             0      0.01   0           0
aa           0             0.01   0.05   0           0.05
glasgow      0.25          0.02   0.5    0.3         0.3
Monday       0.25          0.22   0.4    0.2         0.2
temperature  0.4           0.21   0.01   0.4         0.4
weather      0.1           0.5    0.01   0.1         0.05
zoo          0             0.04   0.02   0           0
...          ...           ...    ...    ...         ...

6 Concept Review
Probability-based measures:
- Kullback-Leibler (KL) divergence between the query and document models: KL(Q||D) = Σ_w P(w|Q) log [ P(w|Q) / P(w|D) ]
Vector-based measures:
- Euclidean distance between the query and document vectors
- Cosine of the angle between the two vectors, query and document
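A minimal sketch of both kinds of measure, using the query-model column and the D1 column from the table on slide 5 as the two distributions. The epsilon term is added only to avoid log(0) and is not part of the slides.

```python
import math

# Query model and document model D1, taken from the table on slide 5.
p_q = {"glasgow": 0.25, "monday": 0.25, "temperature": 0.4, "weather": 0.1}
p_d = {"glasgow": 0.02, "monday": 0.22, "temperature": 0.21,
       "weather": 0.5, "zoo": 0.04, "aa": 0.01}

vocab = set(p_q) | set(p_d)
eps = 1e-12  # avoids log(0) / division by zero

def kl_divergence(p, q):
    """KL(P || Q) = sum_w P(w) log(P(w) / Q(w))."""
    return sum(p[w] * math.log((p[w] + eps) / (q.get(w, 0) + eps))
               for w in vocab if p.get(w, 0) > 0)

def cosine(p, q):
    """Cosine of the angle between the query and document vectors."""
    dot = sum(p.get(w, 0) * q.get(w, 0) for w in vocab)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm

print(kl_divergence(p_q, p_d), cosine(p_q, p_d))
```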

7 Model Optimization
[Figure: the query terms glasgow, monday and temperature, together with a candidate word w_i.]

8 Model Optimization
[Figure: each query term is linked to the candidate word w_i with probabilities P(w_i|glasgow), P(w_i|monday) and P(w_i|temperature).]

9 Framework of Theory
Problem: what information can be used to build the state space?
- words, query terms (single terms and combinations), parts of documents, whole documents, even the intra-relationships between query terms, and other context information such as the user's search history
Example (the subset expansion is sketched in code below):
Q = {glasgow, monday, temperature}
Q' = {{glasgow}, {monday}, {temperature}, {glasgow, monday}, {glasgow, temperature}, {monday, temperature}, {glasgow, monday, temperature}}
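A minimal sketch of the expansion from Q to Q' on slide 9: enumerating every non-empty subset of the query terms so that each subset Q_j can serve as a state/context. The helper name query_subsets is illustrative.

```python
from itertools import combinations

def query_subsets(terms):
    """All non-empty subsets of the query terms, as in the slide 9 example.
    Each subset Q_j becomes a candidate state/context."""
    return [set(c) for r in range(1, len(terms) + 1)
                   for c in combinations(terms, r)]

Q = ["glasgow", "monday", "temperature"]
for Qj in query_subsets(Q):
    print(Qj)   # 7 subsets, from {glasgow} up to the full query
```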

10 Framework of Theory
Algorithms:
- Association rules: used to estimate P(w | Q_j).
  Song, D., Huang, Q., Rüger, S., Bruza, P. Facilitating Query Decomposition in Query Language Modeling by Association Rule Mining Using Multiple Sliding Windows. The 30th European Conference on Information Retrieval (ECIR'2008).
- The aspect model: used to estimate P(w | Q_j) and P(Q_j).
  Huang, Q., Song, D., Rüger, S., Bruza, P. Learning and Optimization of an Aspect Hidden Markov Model for Query Language Model Generation. The 1st International Conference on the Theory of Information Retrieval (ICTIR'2007).
- Markov chain: P(w_i) = Σ_{w_j} P(w_i | w_j) P(w_j) (a small numeric sketch of this iteration follows below).
  Hoenkamp, E., Bruza, P., Huang, Q., Song, D. The Asymptotic Behavior of a Limited Dependencies Language Model. Dutch-Belgian Information Retrieval Workshop (DIR'2008), the Netherlands, 2008.
- The hidden Markov model: used to estimate P(w | Q_j) and P(Q_j), taking into account the intra-relations between the subsets of the query Q.
  Huang, Q., Song, D. A Latent Variable Model for Query Expansion Using the Hidden Markov Model. The 17th Conference on Information and Knowledge Management (CIKM'2008).
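A minimal numeric sketch of the Markov-chain relation P(w_i) = Σ_j P(w_i | w_j) P(w_j): repeatedly applying a transition matrix drives the word distribution toward the asymptotic behaviour referred to in the DIR'2008 reference. The transition values below are made up for illustration.

```python
import numpy as np

# Row j holds P(. | w_j) over the vocabulary {glasgow, monday, temperature};
# each row sums to 1 (a stochastic matrix).
T = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

p = np.array([1.0, 0.0, 0.0])   # start all mass on "glasgow"
for _ in range(50):
    p = T.T @ p                 # p_i <- sum_j P(w_i | w_j) * p_j

print(p)                        # approximate stationary distribution
```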

11 Evaluation
Data sets:
- .GOV2 (~22 million documents, ~500 GB)
- WT10G (~1.6 million documents, ~10 GB)
- TREC Discs 1-5 (~2 million documents, ~9 GB)
Performance: our methods significantly outperform a number of state-of-the-art models, such as the Relevance Model (from the University of Massachusetts).

12 Challenges
- No general way to model the relationships between dynamic contexts
- Lack of mechanisms in classical IR for integrating and mapping between representations of multimedia and structured documents and their respective contexts

13 Why use Quantum Theory (QT)?
QT provides a unified framework for different types of mechanisms:
- geometrical representation of information as vectors in a Hilbert space
- measurement of observables via subspace projection operators
- logical reasoning through lattice structures
- modelling the change of states via evolution operators

14 Issues
- How to represent the search state?
- How to develop operational methods for measuring observables and modelling context?
- How to use the interaction and evolution of contexts?

15 How to do it (PFSA)
A probabilistic finite state automaton (PFSA) is a tuple (S, s_0, F, Σ, {M(w)}, γ), where:
- S is a finite set of n states
- s_0 ∈ S is the initial state
- F ⊆ S is the set of final states
- Σ is a finite alphabet
- M(w) is an n×n stochastic matrix for each letter w ∈ Σ: M(w)_{ij} is the probability of going from state i to state j when w is the input letter
- γ is a probability distribution of a letter w over Σ
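A minimal PFSA sketch under the notation above: one row-stochastic matrix per letter, an initial state, and a set of final states. The states, alphabet and probabilities are made-up examples, not from the slides.

```python
import numpy as np

# Two states {0, 1}, alphabet {"a", "b"}, one stochastic matrix per letter.
M = {
    "a": np.array([[0.7, 0.3],
                   [0.4, 0.6]]),
    "b": np.array([[0.1, 0.9],
                   [0.5, 0.5]]),
}
initial = np.array([1.0, 0.0])   # start in state 0
final = np.array([0.0, 1.0])     # state 1 is the only final state

def pfsa_accept_probability(string):
    """Probability that the PFSA ends in a final state after reading the string."""
    p = initial
    for letter in string:
        p = p @ M[letter]        # p_j <- sum_i p_i * M(letter)_{ij}
    return float(p @ final)

print(pfsa_accept_probability("ab"))
```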

16 How to do it (QFSA)
A quantum finite state automaton (QFSA) is a quantum analogue of a probabilistic automaton:
- the state is an N-state quantum state |ψ⟩, an element of N-dimensional complex projective space
- U_w is an N×N unitary matrix for each letter w ∈ Σ
- Σ is a finite alphabet
- Pr(w), the probability of the state machine accepting a given finite input string w = w_1 … w_k, is Pr(w) = ||P U_{w_k} ⋯ U_{w_1} |ψ_0⟩||², where ||·||² is the L² norm and P is an N×N projection matrix
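A minimal 2-dimensional sketch of the acceptance probability ||P U_{w_k} ⋯ U_{w_1} |ψ_0⟩||². The unitaries, projector and alphabet below are made-up examples, not taken from the slides.

```python
import numpy as np

theta = np.pi / 8
U = {
    "a": np.array([[np.cos(theta), -np.sin(theta)],     # rotation (unitary)
                   [np.sin(theta),  np.cos(theta)]], dtype=complex),
    "b": np.array([[0, 1],                               # swap (unitary)
                   [1, 0]], dtype=complex),
}
psi0 = np.array([1.0, 0.0], dtype=complex)   # initial state |psi_0>
P = np.diag([0.0, 1.0]).astype(complex)      # projector onto the accepting subspace

def qfsa_accept_probability(string):
    """||P U_{w_k} ... U_{w_1} |psi_0>||^2 for the given input string."""
    psi = psi0
    for letter in string:
        psi = U[letter] @ psi
    return float(np.linalg.norm(P @ psi) ** 2)

print(qfsa_accept_probability("aab"))
```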

17 How I will do it (Methods)
Method 1:
- search states: queries, documents, and different types of context
- subspace model: the rich lattice structure of the quantum theory (QT) space
Method 2:
- context operator: the required basis vectors, the relative importance with respect to the context, and the projection of the document onto the subspace
Method 3:
- evolution of states: the use of unitary operators in QT and the dynamics of search involving changing information

18 Plan

