Project Management: The project is due on Friday in week 13.

Presentation transcript:

Project Management: The project is due on Friday in week 13. You have to demo the system to me (a lab will be booked). There will be a test in week 10. The coursework consists of assignment 1, the test in week 10, and the project. The project carries a heavy weight. The test serves two purposes: (1) it is a small part of the coursework and (2) it is training for the final examination.

Extended Boolean Model: Disadvantages of the Boolean model: No term weights are used. Counterexample: for the query q = Kx AND Ky, a document containing just one of the terms, e.g. Kx, is considered as irrelevant as a document containing neither term. The size of the output might be too large or too small.

Extended Boolean Model: The Extended Boolean model was introduced in 1983 by Salton, Fox, and Wu [703]. The idea is to use term weights, as in the vector space model. Strategy: combine Boolean queries with the vector space model. Why not just use the vector space model? Advantage: a Boolean query is easy for the user to provide.

Extended Boolean Model: Each document is represented by a vector of term weights (as in the vector space model). Remember the formula. The query is a Boolean formula. How do we rank the documents?

Fig. Extended Boolean logic considering the space composed of two terms kx and ky only.

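Extended Boolean Model: For the query q = Kx or Ky, (0,0) is the point we try to avoid. Thus, we can use the normalized distance from (0,0), $sim(q_{or}, d) = \sqrt{\frac{x^2 + y^2}{2}}$, to rank the documents, where x and y are the weights of Kx and Ky in document d. The bigger the better.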

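Extended Boolean Model: For the query q = Kx and Ky, (1,1) is the most desirable point. We use the complement of the normalized distance from (1,1), $sim(q_{and}, d) = 1 - \sqrt{\frac{(1-x)^2 + (1-y)^2}{2}}$, to rank the documents, where x and y are the weights of Kx and Ky in document d. The bigger the better.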
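The two-term ranking can be sketched in Python as follows. This is a minimal illustration, assuming the usual normalized Euclidean distances from (0,0) and (1,1); the function names and the three sample documents are hypothetical and only serve to revisit the counterexample about a document that contains just one query term.

```python
import math

def sim_or(x: float, y: float) -> float:
    """Normalized distance from (0, 0), the point an OR query tries to avoid."""
    return math.sqrt((x ** 2 + y ** 2) / 2)

def sim_and(x: float, y: float) -> float:
    """One minus the normalized distance from (1, 1), the ideal point for AND."""
    return 1 - math.sqrt(((1 - x) ** 2 + (1 - y) ** 2) / 2)

# Hypothetical documents given as (weight of Kx, weight of Ky).
docs = {"both": (1.0, 1.0), "only_kx": (1.0, 0.0), "neither": (0.0, 0.0)}

and_scores = {name: sim_and(x, y) for name, (x, y) in docs.items()}
or_scores = {name: sim_or(x, y) for name, (x, y) in docs.items()}

for name in docs:
    print(name, round(and_scores[name], 4), round(or_scores[name], 4))
```

Unlike the plain Boolean model, the document containing only Kx now scores strictly between the document containing both terms and the document containing neither.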

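Extend the idea to m terms using the p-norm, where $x_i$ is the weight of term $k_i$ in document $d_j$:
$q_{or} = k_1 \vee^p k_2 \vee^p \dots \vee^p k_m$, with $sim(q_{or}, d_j) = \left( \frac{x_1^p + x_2^p + \dots + x_m^p}{m} \right)^{1/p}$
$q_{and} = k_1 \wedge^p k_2 \wedge^p \dots \wedge^p k_m$, with $sim(q_{and}, d_j) = 1 - \left( \frac{(1-x_1)^p + (1-x_2)^p + \dots + (1-x_m)^p}{m} \right)^{1/p}$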

Properties: The p-norm as defined above enjoys a couple of interesting properties. First, when p = 1 it can be verified that $sim(q_{or}, d_j) = sim(q_{and}, d_j) = \frac{x_1 + x_2 + \dots + x_m}{m}$. Second, when p = ∞ it can be verified that $sim(q_{or}, d_j) = \max(x_i)$ and $sim(q_{and}, d_j) = \min(x_i)$.
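Both properties can be checked numerically. The sketch below assumes the standard p-norm definitions, sim(q_or, d) = ((x1^p + ... + xm^p)/m)^(1/p) and sim(q_and, d) = 1 - (((1-x1)^p + ... + (1-xm)^p)/m)^(1/p); the function names and the weight vector are hypothetical.

```python
import math
from typing import Sequence

def sim_or_p(xs: Sequence[float], p: float) -> float:
    """p-norm similarity for a disjunctive query over m terms."""
    if math.isinf(p):
        return max(xs)  # limiting case p = infinity
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)

def sim_and_p(xs: Sequence[float], p: float) -> float:
    """p-norm similarity for a conjunctive query over m terms."""
    if math.isinf(p):
        return min(xs)  # limiting case p = infinity
    return 1 - (sum((1 - x) ** p for x in xs) / len(xs)) ** (1 / p)

weights = [0.9, 0.2, 0.4]  # hypothetical term weights of one document

# p = 1: both similarities reduce to the plain average of the weights.
print(sim_or_p(weights, 1), sim_and_p(weights, 1))

# Growing p: the OR score approaches max(weights), the AND score min(weights).
for p in (2, 10, 200):
    print(p, sim_or_p(weights, p), sim_and_p(weights, p))
```

With p = 1 the model behaves like a vector-style average, while large p moves it toward the max/min behaviour of fuzzy logic.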

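Example: For instance, consider the query $q = (k_1 \wedge^p k_2) \vee^p k_3$. The similarity sim(q, dj) between a document dj and this query is then computed as $sim(q, d_j) = \left( \frac{\left( 1 - \left( \frac{(1-x_1)^p + (1-x_2)^p}{2} \right)^{1/p} \right)^p + x_3^p}{2} \right)^{1/p}$. Any Boolean query can be expressed as a numerical formula in this way.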
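A nested query is evaluated from the inside out: the conjunction is scored first, and its score is then treated as one operand of the disjunction. The following sketch assumes the query (k1 AND k2) OR k3 and the p-norm definitions; the function name `sim_query` and the sample weights are hypothetical.

```python
def sim_query(x1: float, x2: float, x3: float, p: float = 2.0) -> float:
    """Similarity for q = (k1 AND k2) OR k3 under the p-norm model."""
    # Inner conjunction: one minus the normalized p-distance from (1, 1).
    inner = 1 - (((1 - x1) ** p + (1 - x2) ** p) / 2) ** (1 / p)
    # Outer disjunction: normalized p-distance from (0, 0) over two operands.
    return ((inner ** p + x3 ** p) / 2) ** (1 / p)

print(sim_query(1.0, 1.0, 1.0))          # document matching every term
print(sim_query(0.0, 0.0, 0.0))          # document matching no term
print(sim_query(0.4, 0.6, 0.2, p=1.0))   # p = 1 reduces to nested averages
```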

Exercise: 1. Give the numerical formula in the extended Boolean model for the query q = (k1 or k2 or k3) and (not k4 or k5). (Assume that there are 5 terms in total.) 2. Assume that the document is represented by the vector (0.8, 0.1, 0.0, 0.0, 1.0). What is sim(q, d) in the extended Boolean model? Also try more exercises with other Boolean formulas.

Fuzzy Set Theory Definition: A fuzzy subset A of a universe of discourse U is characterized by a membership function $\mu_A: U \to [0,1]$ which associates with each element u of U a number $\mu_A(u)$ in the interval [0,1]. Classical set theory: A = {a, b, c}. Subset of A: {a, c}. An element is either in a set or not in the set, i.e., $\mu_A(u)$ is either 0 or 1.

Set Theory: Let U be the set of all elements (the universe). There are three basic operations: $A \cup B$ = {elements in A or in B}; $A \cap B$ = {elements in both A and B}; not A = U − A.

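Definition: Let U be the universe of discourse, A and B be two fuzzy subsets of U, and $\bar{A}$ be the complement of A relative to U. Also, let u be an element of U. Then, $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$, $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$, and $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$.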
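The three fuzzy operations can be sketched over a small universe, assuming the standard definitions (complement as 1 − μ, union as max, intersection as min). Membership functions are stored as dictionaries; the elements `u1`..`u3`, their degrees, and the function names are hypothetical.

```python
from typing import Dict

Fuzzy = Dict[str, float]  # maps each element of U to its membership degree

def fuzzy_complement(a: Fuzzy) -> Fuzzy:
    return {u: 1 - m for u, m in a.items()}

def fuzzy_union(a: Fuzzy, b: Fuzzy) -> Fuzzy:
    return {u: max(a[u], b[u]) for u in a}

def fuzzy_intersection(a: Fuzzy, b: Fuzzy) -> Fuzzy:
    return {u: min(a[u], b[u]) for u in a}

A = {"u1": 0.2, "u2": 0.7, "u3": 1.0}
B = {"u1": 0.5, "u2": 0.4, "u3": 0.0}

print(fuzzy_complement(A))
print(fuzzy_union(A, B))
print(fuzzy_intersection(A, B))
```

When every degree is 0 or 1, these operations coincide with the classical complement, union, and intersection.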

Fuzzy Information Retrieval: We first set up the term-term correlation matrix. For terms $k_i$ and $k_l$, $c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$, where $n_i$ is the number of documents containing $k_i$, $n_l$ is the number of documents containing $k_l$, and $n_{i,l}$ is the number of documents containing both $k_i$ and $k_l$. Note that $c_{i,i} = 1$.
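The correlation matrix can be computed directly from binary document vectors. This is a minimal sketch assuming c(i,l) = n(i,l) / (n(i) + n(l) − n(i,l)); the function name and the four sample documents over three terms are hypothetical, and a term that occurs in no document is given correlation 0 by convention here.

```python
from typing import List, Sequence

def correlation_matrix(docs: Sequence[Sequence[int]]) -> List[List[float]]:
    """Term-term correlation from binary document vectors (1 = term present)."""
    t = len(docs[0])
    # n[i][l] = number of documents containing both term i and term l,
    # so n[i][i] is the number of documents containing term i.
    n = [[sum(1 for d in docs if d[i] and d[l]) for l in range(t)]
         for i in range(t)]
    c = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for l in range(t):
            denom = n[i][i] + n[l][l] - n[i][l]
            c[i][l] = n[i][l] / denom if denom else 0.0
    return c

docs = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 0)]
c = correlation_matrix(docs)
for row in c:
    print([round(v, 3) for v in row])
```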

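Fuzzy Information Retrieval: We define a fuzzy set for each term $k_i$. In the fuzzy set for $k_i$, a document $d_j$ has a degree of membership $\mu_{i,j}$ computed as $\mu_{i,j} = 1 - \prod_{k_l \in d_j} (1 - c_{i,l})$. Example: $c_{1,2} = 0.1$, $c_{1,3} = 0.21$. For $d_1 = (0, 1, 1, 0)$, $\mu_{1,1} = 1 - 0.9 \times 0.79$. For $d_2 = (1, 0, 0, 0)$, $\mu_{1,2} = 1 - 0$ (since $c_{1,1} = 1$). What about $d_3 = (1, 0, 1, 0)$?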

Fuzzy Information Retrieval: Whenever the document $d_j$ contains a term that is strongly related to $k_i$, the document belongs to the fuzzy set of term $k_i$ to a high degree, i.e., $\mu_{i,j}$ is very close to 1. Example: $c_{1,2} = 0.9$, $d_1 = (0, 1, 0, 0)$, so $\mu_{1,1} = 1 - (1 - 0.9) = 0.9$.
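The membership computation can be sketched as below, assuming the degree is one minus the product of (1 − c) over the terms present in the document. The correlation values 0.1, 0.21, and 0.9 come from the examples above; the function name is hypothetical, and the correlation between k1 and k4 is not given in the slides, so it is set to 0.0 purely as a placeholder (no document used here contains k4).

```python
import math
from typing import Sequence

def membership(c_row: Sequence[float], doc: Sequence[int]) -> float:
    """Degree of membership of a binary document vector in the fuzzy set of
    the term whose correlation row is c_row."""
    return 1 - math.prod(1 - c for c, present in zip(c_row, doc) if present)

# Correlations of k1 with (k1, k2, k3, k4); the last entry is a placeholder.
c1 = [1.0, 0.1, 0.21, 0.0]
mu_d1 = membership(c1, (0, 1, 1, 0))   # 1 - 0.9 * 0.79
mu_d2 = membership(c1, (1, 0, 0, 0))   # document contains k1 itself

# Second example: k2 is strongly related to k1.
c1_strong = [1.0, 0.9, 0.0, 0.0]       # zeros are placeholders
mu_strong = membership(c1_strong, (0, 1, 0, 0))

print(mu_d1, mu_d2, mu_strong)
```

Because the correlation of a term with itself is 1, any document that contains the term gets a zero factor in the product and therefore full membership.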

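Query: The query is a Boolean formula, e.g., q = Ka and (Kb or not Kc). In disjunctive normal form over (Ka, Kb, Kc), q = (1, 1, 1) or (1, 1, 0) or (1, 0, 0). Suppose q is written as $q = cc_1 \vee cc_2 \vee cc_3$, where each $cc_i$ is one of these conjunctive components.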

Figure 1. Fuzzy document sets for the query $q = k_a \wedge (k_b \vee \neg k_c)$. Each $cc_i$, $i \in \{1, 2, 3\}$, is a conjunctive component. $D_q$ is the query fuzzy set.

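$\mu_{q,j} = 1 - \prod_{i=1}^{3} (1 - \mu_{cc_i,j}) = 1 - (1 - \mu_{a,j}\mu_{b,j}\mu_{c,j})(1 - \mu_{a,j}\mu_{b,j}(1 - \mu_{c,j}))(1 - \mu_{a,j}(1 - \mu_{b,j})(1 - \mu_{c,j}))$, where $cc_1, cc_2, cc_3$ are the conjunctive components of q, $\mu_{i,j}$, $i \in \{a, b, c\}$, is the membership of $d_j$ in the fuzzy set associated with $k_i$, and $\mu_{q,j}$ is the membership of document j for query q.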

Some changes in the last slide: instead of max{ }, we use the algebraic sum (+); instead of min{ }, we use the algebraic product (×). Exercise: suppose there are 3 documents and 4 terms, with d1 = (1, 0, 1, 0), d2 = (1, 1, 0, 0), and d3 = (0, 1, 1, 0). (1) Compute the term-term correlation matrix $c_{i,j}$. (2) Compute $\mu_{i,j}$ (the membership of document j in the fuzzy set of term i). (3) If the query is q = (1, 0, 0, 0) or (1, 1, 0, 0), compute $\mu_{q,k}$ for each document $d_k$.
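The query membership with the algebraic sum and product can be sketched as follows. It assumes that each conjunctive component is scored as a product of the term memberships (using 1 − μ for a negated term) and that the components are combined as one minus the product of their complements. The function names and the membership values are hypothetical and unrelated to the exercise data; the query is Ka and (Kb or not Kc).

```python
import math
from typing import Sequence

def cc_membership(cc: Sequence[int], mu: Sequence[float]) -> float:
    """Algebraic product over one conjunctive component.
    cc[i] = 1 means term i appears positively, 0 means it appears negated."""
    return math.prod(m if bit else 1 - m for bit, m in zip(cc, mu))

def query_membership(dnf: Sequence[Sequence[int]], mu: Sequence[float]) -> float:
    """Algebraic sum over all conjunctive components of a DNF query."""
    return 1 - math.prod(1 - cc_membership(cc, mu) for cc in dnf)

# q = Ka and (Kb or not Kc), as components over (Ka, Kb, Kc).
dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]

# Hypothetical memberships of one document in the fuzzy sets of Ka, Kb, Kc.
mu_doc = (0.8, 0.5, 0.3)
score = query_membership(dnf, mu_doc)
print(score)
```

Compared with max and min, the sum and product make the score respond smoothly to every term membership instead of only the largest or smallest one.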