Extended Boolean Model n Boolean model is simple and elegant. n But, no provision for a ranking n As with the fuzzy model, a ranking can be obtained by relaxing the condition on set membership n Extend the Boolean model with the notions of partial matching and term weighting n Combine characteristics of the Vector model with properties of Boolean algebra
The Idea n The extended Boolean model (introduced by Salton, Fox, and Wu, 1983) is based on a critique of a basic assumption in Boolean algebra n Let, u q = kx ky u wxj = fxj * idf(x) associated with [kx,dj] max(idf(i)) u Further, wxj = x and wyj = y
The Idea: n qand = kx ky; wxj = x and wyj = y dj dj+1 y = wyj x = wxj(0,0) (1,1) kx ky sim(qand,dj) = 1 - sqrt( (1-x) + (1-y) ) 2 22 AND
The Idea: n qor = kx ky; wxj = x and wyj = y dj dj+1 y = wyj x = wxj(0,0) (1,1) kx ky sim(qor,dj) = sqrt( x + y ) 2 22 OR
Generalizing the Idea n We can extend the previous model to consider Euclidean distances in a t-dimensional space n This can be done using p-norms which extend the notion of distance to include p-distances, where 1 p is a new parameter n A generalized conjunctive query is given by u qor = k1 k2... kt n A generalized disjunctive query is given by u qand = k1 k2... kt p p p p p p
Generalizing the Idea u sim(qand,dj) = 1 - ((1-x1) + (1-x2) (1-xm) ) m ppp p 1 u sim(qor,dj) = (x1 + x xm ) m ppp p 1
Properties u sim(qor,dj) = (x1 + x xm ) m u If p = 1 then (Vector like) F sim(qor,dj) = sim(qand,dj) = x xm m u If p = then (Fuzzy like) F sim(qor,dj) = max (wxj) F sim(qand,dj) = min (wxj) ppp p 1
Properties u By varying p, we can make the model behave as a vector, as a fuzzy, or as an intermediary model u This is quite powerful and is a good argument in favor of the extended Boolean model u (k1 k2) k3 F k1 and k2 are to be used as in a vectorial retrieval while the presence of k3 is required. 2
Properties u q = (k1 k2) k3 u sim(q,dj) = ( (1 - ( (1-x1) + (1-x2) ) ) + x3 ) 2 2 2 p p p 1 p 1 p
Conclusions n Model is quite powerful n Properties are interesting and might be useful n Computation is somewhat complex n However, distributivity operation does not hold for ranking computation: u q1 = (k1 k2) k3 u q2 = (k1 k3) (k2 k3) u sim(q1,dj) sim(q2,dj)