Advanced information retrieval Chapter. 02: Modeling (Set Theoretic Models) – Fuzzy model.

Slides:



Advertisements
Similar presentations
Traditional IR models Jian-Yun Nie.
Advertisements

Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Basic IR: Modeling Basic IR Task: Slightly more complex:
Fuzzy Set and Opertion. Outline Fuzzy Set and Crisp Set Expanding concepts Standard operation of fuzzy set Fuzzy relations Operations on fuzzy relations.
The Probabilistic Model. Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework; n Given a user query, there is an.
FUZZY SYSTEMS. Fuzzy Systems Fuzzy Sets – To quantify and reason about fuzzy or vague terms of natural language – Example: hot, cold temperature small,
Modern Information Retrieval by R. Baeza-Yates and B. Ribeiro-Neto
Web Search - Summer Term 2006 II. Information Retrieval (Basics Cont.)
IR Models: Overview, Boolean, and Vector
ISP 433/533 Week 2 IR Models.
Models for Information Retrieval Mainly used in science and research, (probably?) less often in real systems But: Research results have significance for.
Fussy Set Theory Definition A fuzzy subset A of a universe of discourse U is characterized by a membership function which associate with each element u.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Chapter 2Modeling 資工 4B 陳建勳. Introduction.  Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
Modeling Modern Information Retrieval
Project Management: The project is due on Friday inweek13.
IR Models: Latent Semantic Analysis. IR Model Taxonomy Non-Overlapping Lists Proximal Nodes Structured Models U s e r T a s k Set Theoretic Fuzzy Extended.
Vector Space Model CS 652 Information Extraction and Integration.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
IR Models: Review Vector Model and Probabilistic.
Fussy Set Theory Definition A fuzzy subset A of a universe of discourse U is characterized by a membership function which associate with each element.
Recuperação de Informação. IR: representation, storage, organization of, and access to information items Emphasis is on the retrieval of information (not.
Chapter 7 Retrieval Models.
Information Retrieval: Foundation to Web Search Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems August 13, 2015 Some.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
Fuzzy Logic. Lecture Outline Fuzzy Systems Fuzzy Sets Membership Functions Fuzzy Operators Fuzzy Set Characteristics Fuzziness and Probability.
CS344: Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 32-33: Information Retrieval: Basic concepts and Model.
1 CS 430: Information Discovery Lecture 12 Extending the Boolean Model.
Information Retrieval Chapter 2: Modeling 2.1, 2.2, 2.3, 2.4, 2.5.1, 2.5.2, Slides provided by the author, modified by L N Cassel September 2003.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
IR Models J. H. Wang Mar. 11, The Retrieval Process User Interface Text Operations Query Operations Indexing Searching Ranking Index Text quer y.
Chapter. 02: Modeling Contenue... 19/10/2015Dr. Almetwally Mostafa 1.
CS621 : Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 19: Fuzzy Logic and Neural Net Based IR.
Term Frequency. Term frequency Two factors: – A term that appears just once in a document is probably not as significant as a term that appears a number.
Generalized Vector Model n Classic models enforce independence of index terms. n For the Vector model: u Set of term vectors {k1, k2,..., kt} are linearly.
Information Retrieval CSE 8337 Spring 2005 Modeling Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates and Berthier.
1 University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2 nd Semester
Najah Alshanableh. Fuzzy Set Model n Queries and docs represented by sets of index terms: matching is approximate from the start n This vagueness can.
Lógica difusa  Bayesian updating and certainty theory are techniques for handling the uncertainty that arises, or is assumed to arise, from statistical.
1 Patrick Lambrix Department of Computer and Information Science Linköpings universitet Information Retrieval.
Vector Space Models.
CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.
Topic 2 Fuzzy Logic Control. Ming-Feng Yeh2-2 Outlines Basic concepts of fuzzy set theory Fuzzy relations Fuzzy logic control General Fuzzy System R.R.
Modern information retreival Chapter. 02: Modeling (Latent Semantic Indexing)
Information Retrieval Chap. 02: Modeling - Part 2 Slides from the text book author, modified by L N Cassel September 2003.
Information Retrieval and Web Search Probabilistic IR and Alternative IR Models Rada Mihalcea (Some of the slides in this slide set come from a lecture.
The Boolean Model Simple model based on set theory
Information Retrieval and Web Search IR models: Boolean model Instructor: Rada Mihalcea Class web page:
Recuperação de Informação B Cap. 02: Modeling (Set Theoretic Models) 2.6 September 08, 1999.
C.Watterscsci64031 Classical IR Models. C.Watterscsci64032 Goal Hit set of relevant documents Ranked set Best match Answer.
Set Theoretic Models 1. IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models.
Information Retrieval CSE 8337 Spring 2005 Modeling (Part II) Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates.
Introduction n IR systems usually adopt index terms to process queries n Index term: u a keyword or group of selected words u any word (more general) n.
Fuzzy Relations( 關係 ), Fuzzy Graphs( 圖 形 ), and Fuzzy Arithmetic( 運算 ) Chapter 4.
Chapter 3: Fuzzy Rules & Fuzzy Reasoning Extension Principle & Fuzzy Relations (3.2) Fuzzy if-then Rules(3.3) Fuzzy Reasonning (3.4)
CLASSICAL LOGIC and FUZZY LOGIC
Representation of documents and queries
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Models for Retrieval and Browsing - Fuzzy Set, Extended Boolean, Generalized Vector Space Models Berlin Chen 2003 Reference: 1. Modern Information Retrieval,
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Berlin Chen Department of Computer Science & Information Engineering
Recuperação de Informação B
Information Retrieval and Web Design
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Advanced information retrieval
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Berlin Chen Department of Computer Science & Information Engineering
Presentation transcript:

Advanced information retrieval Chapter. 02: Modeling (Set Theoretic Models) – Fuzzy model

Set Theoretic Models The Boolean model imposes a binary criterion for deciding relevance The Boolean model imposes a binary criterion for deciding relevance The question of how to extend the Boolean model to accomodate partial matching and a ranking has attracted considerable attention in the past The question of how to extend the Boolean model to accomodate partial matching and a ranking has attracted considerable attention in the past We discuss now two set theoretic models for this: We discuss now two set theoretic models for this: –Fuzzy Set Model –Extended Boolean Model

Fuzzy Set Model Queries and docs represented by sets of index terms: matching is approximate from the start This vagueness can be modeled using a fuzzy framework, as follows:  with each term is associated a fuzzy set  each doc has a degree of membership in this fuzzy set This interpretation provides the foundation for many models for IR based on fuzzy theory In here, we discuss the model proposed by Ogawa, Morita, and Kobayashi (1991)

Fuzzy Set Theory Framework for representing classes whose boundaries are not well defined Key idea is to introduce the notion of a degree of membership associated with the elements of a set This degree of membership varies from 0 to 1 and allows modeling the notion of marginal membership Thus, membership is now a gradual notion, contrary to the crispy notion enforced by classic Boolean logic

Alternative Set Theoretic Models -Fuzzy Set Model Model Model –a query term: a fuzzy set –a document: degree of membership in this set –membership function Associate membership function with the elements of the class Associate membership function with the elements of the class 0: no membership in the set 0: no membership in the set 1: full membership 1: full membership 0~1: marginal elements of the set 0~1: marginal elements of the set documents

Fuzzy Set Theory A fuzzy subset A of a universe of discourse U is characterized by a membership function µ A : U  [0,1] which associates with each element u of U a number µ A (u) in the interval [0,1] A fuzzy subset A of a universe of discourse U is characterized by a membership function µ A : U  [0,1] which associates with each element u of U a number µ A (u) in the interval [0,1] –complement: –union: –intersection: a class a document document collection for query term

Examples Assume U={d 1, d 2, d 3, d 4, d 5, d 6 } Assume U={d 1, d 2, d 3, d 4, d 5, d 6 } Let A and B be {d 1, d 2, d 3 } and {d 2, d 3, d 4 }, respectively. Let A and B be {d 1, d 2, d 3 } and {d 2, d 3, d 4 }, respectively. Assume  A ={d 1 :0.8, d 2 :0.7, d 3 :0.6, d 4 :0, d 5 :0, d 6 :0} and Assume  A ={d 1 :0.8, d 2 :0.7, d 3 :0.6, d 4 :0, d 5 :0, d 6 :0} and  B ={d 1 :0, d 2 :0.6, d 3 :0.8, d 4 :0.9, d 5 :0, d 6 :0} ={d 1 :0.2, d 2 :0.3, d 3 :0.4, d 4 :1, d 5 :1, d 6 :1} ={d 1 :0.2, d 2 :0.3, d 3 :0.4, d 4 :1, d 5 :1, d 6 :1} = {d 1 :0.8, d 2 :0.7, d 3 :0.8, d 4 :0.9, d 5 :0, d 6 :0} = {d 1 :0.8, d 2 :0.7, d 3 :0.8, d 4 :0.9, d 5 :0, d 6 :0} = {d 1 :0, d 2 :0.6, d 3 :0.6, d 4 :0, d 5 :0, d 6 :0} = {d 1 :0, d 2 :0.6, d 3 :0.6, d 4 :0, d 5 :0, d 6 :0}

Fuzzy Information Retrieval basic idea basic idea –Expand the set of index terms in the query with related terms (from the thesaurus) such that additional relevant documents can be retrieved –A thesaurus can be constructed by defining a term-term correlation matrix c whose rows and columns are associated to the index terms in the document collection keyword connection matrix

Fuzzy Information Retrieval (Continued) normalized correlation factor c i,l between two terms k i and k l (0~1) normalized correlation factor c i,l between two terms k i and k l (0~1) In the fuzzy set associated to each index term k i, a document d j has a degree of membership µ i,j In the fuzzy set associated to each index term k i, a document d j has a degree of membership µ i,j where n i is # of documents containing term k i n l is # of documents containing term k l n i,l is # of documents containing k i and k l

Fuzzy Information Retrieval (Continued) physical meaning physical meaning –A document d j belongs to the fuzzy set associated to the term k i if its own terms are related to k i, i.e.,  i,j =1. –If there is at least one index term k l of d j which is strongly related to the index k i, then  i,j  1. k i is a good fuzzy index –When all index terms of d j are only loosely related to k i,  i,j  0. k i is not a good fuzzy index

Example q=(k a  (k b   k c )) =(k a  k b  k c )  (k a  k b   k c )  (k a   k b   k c ) =cc 1 +cc 2 +cc 3 q=(k a  (k b   k c )) =(k a  k b  k c )  (k a  k b   k c )  (k a   k b   k c ) =cc 1 +cc 2 +cc 3 DaDa DbDb DcDc cc 3 cc 2 cc 1 D a : the fuzzy set of documents associated to the index k a d j  D a has a degree of membership  a,j > a predefined threshold K D a : the fuzzy set of documents associated to the index k a (the negation of index term k a )

Example Query q=k a  (k b   k c ) disjunctive normal form q dnf =(1,1,1)  (1,1,0)  (1,0,0) (1) the degree of membership in a disjunctive fuzzy set is computed using an algebraic sum (instead of max function) more smoothly (2) the degree of membership in a conjunctive fuzzy set is computed using an algebraic product (instead of min function) Recall

Fuzzy Set Model –Q: “gold silver truck” D1:“Shipment of gold damaged in a fire” D2:“Delivery of silver arrived in a silver truck” D3:“Shipment of gold arrived in a truck” –IDF (Select Keywords) a = in = of = 0 = log 3/3 arrived = gold = shipment = truck = = log 3/2 damaged = delivery = fire = silver = = log 3/1 a = in = of = 0 = log 3/3 arrived = gold = shipment = truck = = log 3/2 damaged = delivery = fire = silver = = log 3/1 –8 Keywords (Dimensions) are selected arrived(1), damaged(2), delivery(3), fire(4), gold(5), silver(6), shipment(7), truck(8) arrived(1), damaged(2), delivery(3), fire(4), gold(5), silver(6), shipment(7), truck(8)

Fuzzy Set Model

Sim(q,d): Alternative 1 Sim(q,d): Alternative 1 Sim(q,d 3 ) > Sim(q,d 2 ) > Sim(q,d 1 ) Sim(q,d): Alternative 2 Sim(q,d): Alternative 2 Sim(q,d 3 ) > Sim(q,d 2 ) > Sim(q,d 1 )