1
Fuzzy Set Theory
Definition: A fuzzy subset $A$ of a universe of discourse $U$ is characterized by a membership function $\mu_A: U \to [0,1]$, which associates with each element $u$ of $U$ a number $\mu_A(u)$ in the interval $[0,1]$.
Set theory (crisp sets): $A = \{a, b, c\}$; a subset of $A$: $\{a, c\}$. An element is either in a set or not in a set, so $\mu_A(u)$ is either 0 or 1.
2
Set Theory
Let $U$ be the set of all elements (the universe). There are three basic operations: union, $A \cup B$ = {elements in $A$ or in $B$}; intersection, $A \cap B$ = {elements in both $A$ and $B$}; and complement, not $A$ = $U - A$.
3
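Definition
Let $U$ be the universe of discourse, $A$ and $B$ be two fuzzy subsets of $U$, and $\bar{A}$ be the complement of $A$ relative to $U$. Also, let $u$ be an element of $U$. Then
$\mu_{\bar{A}}(u) = 1 - \mu_A(u)$
$\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$
$\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$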
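A minimal Python sketch of the three fuzzy operations, assuming a fuzzy subset is stored as a dict that maps every element of a small finite universe $U$ to its membership value. The function names and the membership values are hypothetical, chosen only for illustration. A crisp set is the special case where every value is 0 or 1.

```python
def fuzzy_complement(a):
    """Complement relative to U: mu(u) = 1 - mu_A(u)."""
    return {u: 1.0 - mu for u, mu in a.items()}


def fuzzy_union(a, b):
    """Union: mu(u) = max(mu_A(u), mu_B(u)); a and b must both cover all of U."""
    return {u: max(a[u], b[u]) for u in a}


def fuzzy_intersection(a, b):
    """Intersection: mu(u) = min(mu_A(u), mu_B(u))."""
    return {u: min(a[u], b[u]) for u in a}


# Hypothetical memberships over U = {a, b, c}.
A = {"a": 0.25, "b": 1.0, "c": 0.0}
B = {"a": 0.75, "b": 0.5, "c": 0.5}
print(fuzzy_union(A, B))         # {'a': 0.75, 'b': 1.0, 'c': 0.5}
print(fuzzy_intersection(A, B))  # {'a': 0.25, 'b': 0.5, 'c': 0.0}
print(fuzzy_complement(A))       # {'a': 0.75, 'b': 0.0, 'c': 1.0}
```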
4
Fuzzy Information Retrieval
We first set up the term-term correlation matrix. For terms $k_i$ and $k_l$,
$$c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$$
where $n_i$ is the number of documents containing $k_i$, $n_l$ is the number of documents containing $k_l$, and $n_{i,l}$ is the number of documents containing both $k_i$ and $k_l$. Note that $c_{i,i} = 1$, since $n_{i,i} = n_i$.
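A sketch of building the term-term correlation matrix $c_{i,l} = n_{i,l} / (n_i + n_l - n_{i,l})$, assuming each document is given as a binary term vector (1 if the term occurs in the document). The function name and the small four-document collection are hypothetical. The formula is 0/0 only when neither term occurs in any document; this sketch then assumes $c_{i,i} = 1$ and $c_{i,l} = 0$ for $i \ne l$.

```python
def term_correlation(docs):
    """Return the t x t matrix c[i][l] = n_il / (n_i + n_l - n_il)."""
    t = len(docs[0])
    n = [sum(d[i] for d in docs) for i in range(t)]  # n_i: documents containing k_i
    c = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for l in range(t):
            n_il = sum(1 for d in docs if d[i] and d[l])  # documents containing both
            denom = n[i] + n[l] - n_il
            # Assumption: denom == 0 only if neither term occurs anywhere;
            # keep c_ii = 1 and use 0 off the diagonal in that case.
            c[i][l] = n_il / denom if denom else (1.0 if i == l else 0.0)
    return c


# Hypothetical collection: 4 documents over 3 terms.
docs = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
print(term_correlation(docs)[0])  # [1.0, 0.5, 0.5]
```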
5
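Fuzzy Information Retrieval
We define a fuzzy set for each term $k_i$. In the fuzzy set for $k_i$, a document $d_j$ has a degree of membership $\mu_{i,j}$ computed as
$$\mu_{i,j} = 1 - \prod_{k_l \in d_j} \left(1 - c_{i,l}\right)$$
Example: $c_{1,2} = 0.1$, $c_{1,3} = 0.21$. For $d_1 = (0, 1, 1, 0)$: $\mu_{1,1} = 1 - (1 - 0.1)(1 - 0.21) = 1 - 0.9 \times 0.79 = 0.289$. For $d_2 = (1, 0, 0, 0)$: $\mu_{1,2} = 1 - 0 = 1$ (since $c_{1,1} = 1$). What is $\mu_{1,3}$ for $d_3 = (1, 0, 1, 0)$?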
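A sketch of the membership formula $\mu_{i,j} = 1 - \prod_{k_l \in d_j} (1 - c_{i,l})$, assuming the correlation row $c_i$ and the document $d_j$ are plain Python sequences (the function name is ours). It reproduces the worked example; $c_{1,4}$ is not given there, so 0.0 is a hypothetical placeholder that does not affect either document, since neither contains $k_4$.

```python
import math


def term_membership(c_row, doc):
    """mu_{i,j} = 1 - prod over terms k_l present in d_j of (1 - c_{i,l}).

    c_row is row i of the term-term correlation matrix; doc is d_j as a 0/1 vector.
    """
    return 1.0 - math.prod(1.0 - c_row[l] for l, present in enumerate(doc) if present)


# Row 1 from the example: c11 = 1, c12 = 0.1, c13 = 0.21; c14 = 0.0 is a placeholder.
c1 = [1.0, 0.1, 0.21, 0.0]
print(term_membership(c1, (0, 1, 1, 0)))  # d1: approximately 0.289
print(term_membership(c1, (1, 0, 0, 0)))  # d2: 1.0, because c11 = 1
```

A document that contains no terms gets membership 0, because the empty product is 1.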
6
Fuzzy Information Retrieval
Whenever a document $d_j$ contains a term that is strongly related to $k_i$, $d_j$ belongs to the fuzzy set of term $k_i$, i.e., $\mu_{i,j}$ is close to 1. Example: $c_{1,2} = 0.9$, $d_1 = (0, 1, 0, 0)$: $\mu_{1,1} = 1 - (1 - 0.9) = 0.9$.
7
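Query
A query is a Boolean formula, e.g., $q = k_a$ and ($k_b$ or not $k_c$). In disjunctive normal form, q = (1, 1, 1) or (1, 1, 0) or (1, 0, 0), where each tuple gives the values of $(k_a, k_b, k_c)$ in one conjunctive component. Suppose q is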
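A brute-force sketch of turning a Boolean query into its disjunctive normal form: enumerate every 0/1 assignment of the query terms and keep those that satisfy the query. The function name `to_dnf` is ours. It needs $2^t$ evaluations for $t$ query terms, which is fine for short queries.

```python
from itertools import product


def to_dnf(query, n_terms):
    """Return every 0/1 assignment of the query terms that satisfies the query.

    Each returned tuple is one conjunctive component of the disjunctive normal form.
    """
    return [bits for bits in product((1, 0), repeat=n_terms) if query(*bits)]


def q(ka, kb, kc):
    """q = ka and (kb or not kc)."""
    return bool(ka and (kb or not kc))


print(to_dnf(q, 3))  # [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
```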
8
Figure 1. Fuzzy document sets for the query $q$. Each $cc_i$ ($i = 1, 2, 3$) is a conjunctive component. $D_q$ is the query fuzzy set.
9
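$$\mu_{q,j} = \mu_{cc_1+cc_2+cc_3,\,j} = 1 - \prod_{i=1}^{3}\left(1 - \mu_{cc_i,j}\right) = 1 - \left(1 - \mu_{a,j}\,\mu_{b,j}\,\mu_{c,j}\right) \times \left(1 - \mu_{a,j}\,\mu_{b,j}\,(1 - \mu_{c,j})\right) \times \left(1 - \mu_{a,j}\,(1 - \mu_{b,j})\,(1 - \mu_{c,j})\right)$$
where $\mu_{i,j}$ is the membership of $d_j$ in the fuzzy set associated with $k_i$, and $\mu_{q,j}$ is the membership of document $d_j$ for query $q$.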
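A sketch of computing $\mu_{q,j}$ from the conjunctive components, assuming (as in the standard fuzzy model) that a component's membership is the product of $\mu_{i,j}$ for terms set to 1 and $1 - \mu_{i,j}$ for terms set to 0, and that components are combined by the algebraic sum $1 - \prod_{cc} (1 - \mu_{cc,j})$. The function names and the membership values of the document are hypothetical.

```python
import math


def component_membership(component, mu_j):
    """Product of mu_{i,j} for terms set to 1 and (1 - mu_{i,j}) for terms set to 0."""
    return math.prod(m if bit else 1.0 - m for bit, m in zip(component, mu_j))


def query_membership(dnf, mu_j):
    """mu_{q,j} = 1 - prod over conjunctive components cc of (1 - mu_{cc,j})."""
    return 1.0 - math.prod(1.0 - component_membership(cc, mu_j) for cc in dnf)


dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]  # q = ka and (kb or not kc)
# Hypothetical memberships (mu_{a,j}, mu_{b,j}, mu_{c,j}) of one document d_j.
print(query_membership(dnf, (0.5, 0.5, 0.5)))  # 0.330078125
print(query_membership(dnf, (1.0, 1.0, 0.0)))  # 1.0
```

With crisp memberships (all 0 or 1) the result is 1 exactly when the document satisfies the Boolean query, as in the second call.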
10
Exercise: Suppose there are 3 documents and 4 terms: $d_1 = (1, 0, 1, 0)$, $d_2 = (1, 1, 0, 0)$, and $d_3 = (0, 1, 1, 0)$. (1) Compute the term-term correlation matrix $c_{i,j}$. (2) Compute $\mu_{i,j}$ (the membership of document $d_j$ in the fuzzy set of term $k_i$). (3) If the query is q = (1, 0, 0, 0) or (1, 1, 0, 0), compute $\mu_{q,k}$ for each document $d_k$.
11
Change to the previous slide: $\mu_{q,j} = \mu_{cc_1+cc_2+cc_3,\,j} = \max\{\mu_{cc_1,j}, \mu_{cc_2,j}, \mu_{cc_3,j}\}$, where $\mu_{cc_1,j}$, $\mu_{cc_2,j}$, $\mu_{cc_3,j}$ are computed as before.
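A sketch of the revised rule, which takes the maximum over the conjunctive components instead of the algebraic sum. It assumes, as before, that a component's membership is the product of $\mu_{i,j}$ for terms set to 1 and $1 - \mu_{i,j}$ for terms set to 0. The function names and the document's membership values are hypothetical.

```python
import math


def component_membership(component, mu_j):
    """Product of mu_{i,j} for terms set to 1 and (1 - mu_{i,j}) for terms set to 0."""
    return math.prod(m if bit else 1.0 - m for bit, m in zip(component, mu_j))


def query_membership_max(dnf, mu_j):
    """Revised rule: mu_{q,j} = max over conjunctive components of mu_{cc,j}."""
    return max(component_membership(cc, mu_j) for cc in dnf)


dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]  # q = ka and (kb or not kc)
# Hypothetical memberships (mu_{a,j}, mu_{b,j}, mu_{c,j}) of one document d_j.
print(query_membership_max(dnf, (1.0, 0.5, 0.25)))  # 0.375
```

For memberships in $[0,1]$, the maximum never exceeds the algebraic sum $1 - \prod (1 - \mu_{cc,j})$, so the revised rule gives the same or lower scores.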
12
String Matching Allowing Errors
Problem: Given a short pattern P of length m, a long text T of length n, and a maximum allowed number of errors k, find all the text positions where the pattern occurs with at most k errors.
13
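Dynamic Programming
Let $C[i,j]$ be the minimum number of errors needed to match the pattern prefix $P_1 \ldots P_i$ against a substring of the text that ends at $T_j$; $i$ and $j$ are the indices into the pattern and the text. There are three kinds of errors: mismatch $(a, b)$, insertion $(a, \varepsilon)$, and deletion $(\varepsilon, a)$, where $\varepsilon$ denotes the empty string. The matrix is filled with
$$C[0,j] = 0, \quad C[i,0] = i, \quad C[i,j] = \begin{cases} C[i-1,j-1] & \text{if } P_i = T_j \\ 1 + \min(C[i-1,j],\ C[i,j-1],\ C[i-1,j-1]) & \text{otherwise} \end{cases}$$
and the pattern occurs with at most $k$ errors ending at text position $j$ whenever $C[m,j] \le k$.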
14
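The matrix
The dynamic programming algorithm searching for 'survey' in the text 'sxsurgery' with at most two errors (rows: pattern, columns: text).

      s x s u r g e r y
   0  0 0 0 0 0 0 0 0 0
s  1  0 1 0 1 1 1 1 1 1
u  2  1 1 1 0 1 2 2 2 2
r  3  2 2 2 1 0 1 2 2 3
v  4  3 3 3 2 1 1 2 3 3
e  5  4 4 4 3 2 2 1 2 3
y  6  5 5 5 4 3 3 2 2 2

The last-row entries with value at most 2 (text positions 7, 8 and 9) mark where an occurrence ends. Running time: $O(mn)$.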
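A minimal Python sketch of the dynamic programming algorithm, assuming the recurrence $C[0,j] = 0$, $C[i,0] = i$, and $C[i,j] = C[i-1,j-1]$ if $P_i = T_j$, else $1 + \min(C[i-1,j], C[i,j-1], C[i-1,j-1])$. The function names are ours. Text positions are 1-based to match the matrix; the full matrix is kept only so it can be inspected.

```python
def edit_matrix(pattern, text):
    """Fill the (m+1) x (n+1) matrix C for approximate substring matching."""
    m, n = len(pattern), len(text)
    C = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 stays 0: a match may start anywhere
    for i in range(1, m + 1):
        C[i][0] = i  # i pattern characters against an empty text: i errors
        for j in range(1, n + 1):
            if pattern[i - 1] == text[j - 1]:
                C[i][j] = C[i - 1][j - 1]
            else:
                C[i][j] = 1 + min(C[i - 1][j], C[i][j - 1], C[i - 1][j - 1])
    return C


def approx_occurrences(pattern, text, k):
    """1-based text positions where an occurrence with at most k errors ends."""
    C = edit_matrix(pattern, text)
    return [j for j in range(1, len(text) + 1) if C[-1][j] <= k]


print(edit_matrix("survey", "sxsurgery")[-1])       # [6, 5, 5, 5, 4, 3, 3, 2, 2, 2]
print(approx_occurrences("survey", "sxsurgery", 2))  # [7, 8, 9]
```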
15
Exercise
Let ABCABCDDABEDF be the text and ABCDAB the pattern. Find the occurrences of the pattern with at most 1 error.
16
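String Matching Allowing Errors (Fast Algorithm)
Only the cells with value at most $k$ matter. In each column, compute the cells only down to one row past the last active cell of the previous column (the last cell with value at most $k$); cells further down are guaranteed to exceed $k$ and can never lead to a match. This cut-off reduces the expected running time from $O(mn)$ to $O(kn)$.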
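A sketch of this cut-off (usually credited to Ukkonen), computing the matrix column by column and keeping a single column in memory; the function name is ours. Rows below the last active cell are not recomputed and may hold stale values, but every such value is greater than $k$, which is all that is needed to decide correctly whether a cell is at most $k$.

```python
def approx_occurrences_cutoff(pattern, text, k):
    """1-based end positions of occurrences with at most k errors, using the cut-off."""
    m = len(pattern)
    col = list(range(m + 1))  # column j = 0: C[i, 0] = i
    last = min(k, m)          # last active row: largest i with col[i] <= k
    ends = []
    for j, t in enumerate(text, start=1):
        top = min(last + 1, m)  # the last active row moves down by at most 1 per column
        diag = 0                # C[i-1, j-1]; starts as C[0, j-1] = 0
        up = 0                  # C[i-1, j];   starts as C[0, j]   = 0
        for i in range(1, top + 1):
            if pattern[i - 1] == t:
                val = diag
            else:
                val = 1 + min(diag, up, col[i])  # col[i] still holds C[i, j-1]
            diag, col[i], up = col[i], val, val
        last = top
        while col[last] > k:  # col[0] == 0, so this loop always stops
            last -= 1
        if last == m:
            ends.append(j)
    return ends


print(approx_occurrences_cutoff("survey", "sxsurgery", 2))  # [7, 8, 9]
```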
17
Regular Expression Matching
Regular expressions:
1. Any letter $x \in \Sigma$ is a regular expression, where $\Sigma$ is the set of all letters.
2. If $A$ and $B$ are regular expressions, then $A|B$, $A.B$ and $(A)^*$ are regular expressions.
18
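Regular Expression Matching (Not Required)
Given a regular expression $E$ and a string $T$, find all the substrings of $T$ that match $E$. The automaton built for $E$ has a start state and a final state. Let $d(i)$ be the set of all states of the automaton that can be reached after $T_1 T_2 \ldots T_i$ has been read. Given $d(i)$, $d(i+1)$ is easy to compute: it contains the states reachable from a state in $d(i)$ by a transition on $T_{i+1}$, together with the start state (so that a match may begin at any position) and the states reachable from these by empty transitions. Whenever the final state is reached, we have found a substring of $T$ that matches $E$.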
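A sketch of the $d(i)$ computation for the expression $E$ = (A|AA).(B|AB), using a small automaton without empty transitions that we built by hand. Its states 0 to 4 and the names `DELTA` and `find_match_ends` are hypothetical; they do not correspond to the states a to l of the automaton in the slides. The start state is added at every step so that a match may begin anywhere in $T$.

```python
# Hypothetical hand-built automaton for E = (A|AA).(B|AB).
START, FINAL = 0, 4
DELTA = {
    (0, "A"): {1, 2},  # A read: either all of (A|AA), or the first A of AA
    (1, "A"): {2},     # second A of AA
    (2, "B"): {4},     # (B|AB) matched as B
    (2, "A"): {3},     # first symbol of AB
    (3, "B"): {4},     # second symbol of AB
}


def find_match_ends(text, delta=DELTA, start=START, final=FINAL):
    """Return the 1-based positions i where some substring ending at T_i matches E."""
    d = {start}  # d(0)
    ends = []
    for i, ch in enumerate(text, start=1):
        nxt = set()
        for s in d:
            nxt |= delta.get((s, ch), set())  # transitions on T_i
        nxt.add(start)                        # a new match may start after T_i
        d = nxt                               # d(i)
        if final in d:
            ends.append(i)
    return ends


print(find_match_ends("ABBAB"))  # [2, 5]
```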
23
Example: $E$ = (A|AA).(B|AB), $T$ = ABBAB. D(1) = {a, b, d, c}, D(2) = {a, b, d, e, f, g, i}, D(3) = {a, b, c, e, f, g, i, h, l}, D(4) = {a, b, d, c, j}, D(5) = {a, b, d, e, f, g, i, k}.
24
Running time: $O(n^2)$ per text character, where $n$ is the size of the automaton, since $d(i)$ could contain $O(n)$ states, each of which may lead to $O(n)$ states.