CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010
Agenda Fuzzy Sets Fuzzy Operations Fuzzy Set Retrieval Processing Fuzzy Queries Set Issues -level Sets
Fuzzy Sets
Issue Problem – Boolean assumes documents Exist in 1 of 2 states States are Non-Overlapping – CRISP! – Other crisp states may exist Grades: A, B, C, D, F – Can be in one and not the other. – Often, exact boundaries are not known Hot, Warm, Luke-warm, Cool, Chilly, Cold – When does Hot become Warm?
Issue Fuzzy Sets – Informal Definition Fuzzy Set is a class with fuzzy boundaries. Alternatively, fuzzy set is a class with non-crisp boundaries.
Fuzzy Sets – Slightly More Formal Regular (Crisp) set – Let X = {x i } be the universe of objects of interest. Fuzzy Subset of X A. – A is a set of order pairs A = { } – (x) Grade of membership of x in A – : X [0,1]
Example 1 Assume Crisp Set is – X = {x 1, x 2, x 3, x 4 } Fuzzy Subsets – A = {, } – B = { }
Example 2 Membership Functions 4’6” 6’ short mediumtall 1 0 1 0 ’ tall 4’ medium (4’)=0.2, short (4’)=.75 Crisp Fuzzy
Fuzzy Operations
Operations 1 Let – A = { } – B = { } Equality – A = B iff A (x) = B (x), for all x X.
Operations 2 Containment – A B iff A (x) B (x), for all x X. Complement – Ā = { } iff Ā (x) = 1 - A (x), for all x X.
Operations 3 Union – A B = { } iff A B = max( A (x), B (x)), for all x X. Intersection – A B = { } iff A B = min( A (x), B (x)), for all x X.
Fuzzy Set Retrieval Model
Document Representation Boolean Relation – D = { } – D : D x T {0,1} – D (d , t i ) 1, if d contains t i 0, otherwise D t = {d D | D (d , t) = 1} d D d = {t i T | D (d, t i ) = 1}
Document Representation Fuzzy D = { } D : D x T [0,1] D (d , t i ) – The degree of membership of t i to d D t = { | d D >} D d = { | t i T>}
Processing Boolean Query (Method 2) D t1 t2 = {d D | D (d , t 1 ) D (d , t 2 ) = 1} D t1 t2 = {d D | D (d , t 1 ) D (d , t 2 ) = 1} D t = set of Documents containing term t T = {a,b,c,d,e} D a, D b, D c, D d, D e,
Processing Fuzzy Query (Method 2) D t1 t2 = { | d D } D t1 t2 = { | d D } D t = set of pairs containing term t T = {a, b, c, d, e} D a, D b, D c, D d, D e
Retrieval Function Boolean RSV F RSV t (d ) = D (d , t) RSV e (d ) = 1 - RSV e (d ) RSV e1 e2 (d ) = RSV e1 (d ) RSV e2 (d ) RSV e1 e2 (d ) = RSV e1 (d ) RSV e2 (d )
Retrieval Function Fuzzy RSV t (d ) = D (d , t) RSV e (d ) = 1 - RSV e (d ) RSV e1 e2 (d ) = min(RSV e1 (d ), RSV e2 (d )) RSV e1 e2 (d ) = max(RSV e1 (d ), RSV e2 (d ))
Processing Fuzzy Queries
Term-Document Incidence Matrix d1d1 d2d2 d3d3 d4d4 d5d5 d6d6 d7d7 d8d8 t1t t2t t3t t4t t5t
Fuzzy example q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4
Fuzzy example (Method 1) q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4 d d d
Inverted File Processing (Method 2) D t = { | D t (d ) > 0} D e = { | 1-RSV e (d ) > 0} D e1 e2 = { | RSV e1 e2 (d ) > 0} D e1 e2 = { | RSV e1 e2 (d ) > 0}
Processing Fuzzy Query (Method 2) Output D t D e1 D e2 D e1 D e2 D e1 D e2 D\D e1 Query t e1 e2 e1 e2 e1 e2 e1
Set Issues
Set Identities A B = B A (A B) C = A (B C) A B C) = (A B) C) A = A A A’ = S A B = B A (A B) C = A (B C) A B C) = (A B) C) A S = A A A’ =
Definitions of Union and Intersection NameUnionIntersection max, min max, product · probabilistic sum, product· Einstein bold Hamacher Yager Schweizer-Sklard ⊥p⊥p ⊤p⊤p ℇ + · + v v
Operations – Union & Intersection a · b = ab a b = a + b – ab a b = (a + b) / (1 + ab) a b = (ab) / ( 1 + (1-a)(1-b) a b = min(a + b, 1) a b = max(0, a + b - 1) + ℇ
Operations – Union & Intersection a b = (a + b – (1 – )ab) / ( 1- ab – ∞) a b = (ab) / ( a+b – ∞) a b = min(1, (a v + b v ) (1/v) ) – v ∞) a b = 1 - min(1, [(1-a) v + (1-b) v ] (1/v) ) – v ∞) + · v v
Operations – Union & Intersection 1 – ((1-a) -p + (1-b) -p -1) (1/p) p > 0 a bp = 0 1 – ((1-a) -p + (1-b) -p -1) (1/p) p < 0, (1-a) -p + (1-b) -p >1 1p < 0, (1-a) -p +(1-b) -p ≤ 1 a T p b =
Operations – Union & Intersection (a -p + b -p -1) (1/p) p > 0 abp = 0 (a -p + b -p -1) (1/p) p < 0, a -p + b -p >1 0p < 0, a -p +b -p ≤ 1 a T p b =
Example of Set Operations Assume – Union = Probabilistic Sum = a b = a + b – ab – Intersection = Product = a · b = ab
Term-Document Incidence Matrix d1d1 d2d2 d3d3 d4d4 d5d5 d6d6 d7d7 d8d8 t1t t2t t3t t4t t5t
Fuzzy example q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4
Fuzzy example (Method 1) Max, Min q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4 d d d
Fuzzy example (Method 1) probabilistic sum and product q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4 d d d
Differences DocumentMax, MinProb. Sum, Product d3d d6d d8d
-level Sets
-level Sets - Definitions -level Set – D( ) = { | D (d , t) ≥ } – D t ( ) = { d | D ( )} Then meaning of D t( ) of descriptor t T T* – D t( ) = { | d D t ( )} Note: T* has same meaning as E used in Boolean IR If t’ T*, then meaning of D t( ) of descriptor t = t’ – D t( ) = { | 1- D t’( ) (d) > ,d D t’ ( ) }
-level Sets - Definitions If t’, t’’ T*, then meaning of D t( ) of descriptor t = t’ t’’ – D t( ) = { |d D t’ ( ) D t’’ ( ) If t’, t’’ T*, then meaning of D t( ) of descriptor t = t’ t’’ – D t( ) = { |d D t’ ( ) D t’’ ( )
Fuzzy example q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4
Fuzzy example Max, Min q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4 d d d
Fuzzy example Max, Min, *=0.3 q = (t 1 t 2 ) (t 2 t 3 t 4 )) t1t1 t2t2 t2t2 t3t3 t4t4 d d d (0) (0) (0)
Questions?