1
A Self-Learning Universal Concept Spotter
By Tomek Strzalkowski and Jin Wang
Presented by Iman Sen
2
Introduction
Previously, information taggers were hand-crafted, domain-specific, and/or too reliant on lexical clues such as upper case, format, etc.
The Universal Spotter is one of the first algorithms for unsupervised learning that can identify any category in any large corpus, given some initial examples and context information on what to spot.
3
Basic Idea
Start with some prior examples and context for the things to spot (called the seed) and a large corpus.
Exploit the redundancy of patterns in text: use the examples to find "new" items and context information, and add these to the original set of rules.
Initially precision is high and recall is very low.
Repeat the cycle to maximize recall while maintaining or improving precision.
4
Seeds: What we are looking for
Initially, the seed is some information provided by the user: either examples or contextual information.
Examples can be highlighted in the text ("Microsoft", "toothbrushes").
Context information can also be specified, both internal and external, for example "name ends with Co." (internal) or "appears after produced" (external).
Negative examples and negative context information, such as "not to the right of produced", can also be given.
5
The Cyclic Process
1. Build rules from the initial examples and context info.
2. Find further examples of this concept in the corpus, while trying to maximize precision and recall.
3. As more examples of the concept are found, more contextual information can be found.
4. Use the expanded context info to find more entities.
6
Simple Example
Suppose we initially have the seeds "Co" and "Inc" and the following text:
"Henry Kaufman is president of Henry Kaufman & Co., … president of Gabelli Funds Inc.; Claude N. Rosenberg is named president of Thomson S.A. …"
Use "Co" and "Inc" to pick out Henry Kaufman & Co. and Gabelli Funds Inc.
Use these new seeds to get contextual information, for example "president of" before each of the entities.
Use "president of" to find "Thomson S.A."
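The three steps of this example can be sketched in a few lines of Python. This is only a toy illustration, not the authors' implementation: the capitalized-sequence regex `CANDIDATE`, the two-word `left_context` helper, and the rule "keep a context seen with at least two accepted items" are all assumptions made for the sketch.

```python
import re
from collections import Counter

text = ("Henry Kaufman is president of Henry Kaufman & Co., "
        "president of Gabelli Funds Inc.; Claude N. Rosenberg is named "
        "president of Thomson S.A.")

# Assumed candidate pattern: a run of capitalized tokens, optionally joined by "&".
CANDIDATE = re.compile(r"[A-Z][A-Za-z.]*(?:\s+(?:[A-Z][A-Za-z.]*|&))*")
seed_suffixes = {"Co", "Inc"}

def left_context(match, width=2):
    """Return the `width` words immediately preceding a candidate."""
    return tuple(text[:match.start()].split()[-width:])

candidates = list(CANDIDATE.finditer(text))

# Step 1: the seeds pick out candidates ending in "Co" or "Inc".
accepted = [m for m in candidates
            if m.group().split()[-1].strip(".") in seed_suffixes]

# Step 2: learn contexts shared by the accepted items (assumed cutoff: 2).
context_counts = Counter(left_context(m) for m in accepted)
contexts = {c for c, k in context_counts.items() if k >= 2}

# Step 3: use the learned contexts to spot further entities.
spotted = [m.group() for m in candidates if left_context(m) in contexts]
print(contexts)  # {('president', 'of')}
print(spotted)   # ['Henry Kaufman & Co.', 'Gabelli Funds Inc.', 'Thomson S.A.']
```

The person names "Henry Kaufman" and "Claude N. Rosenberg" are also candidates, but neither matches a seed suffix nor follows "president of", so they are not spotted.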
7
The Classification Task
The goal is to decide whether a sequence of words contains a desired entity or concept. This is done by calculating significance weights (SW) and then combining them.
8
The Process: In Detail
Initially some preprocessing is done, including tokenization, POS tagging, and lexical normalization or stemming. POS tagging helps to delineate which sequences of words might contain the desired entities. These steps reduce the amount of noise.
9
How to calculate SW
Consider a sequence of words W1, W2, …, Wm in the text that is of interest (the central unit). A window of size n on either side of the central unit is searched for contextual information. Then do the following:
Make up pairs of (word, position) for all words within the window, where position is one of preceding context (p), central unit (s), or following context (f).
Similarly, make up pairs of (bigram, position).
Make up 3-tuples of (word, position, distance) for the same sequence of words, where distance is measured from W1 for preceding words and from Wm for following words; for units within W1 through Wm, distance is taken from Wm.
10
An SW Calculation Example
Example: "... boys kicked the door with rage ..." with window n = 2 and central unit "the door". The generated tuples (called evidence items) are:
(boys, p), (kicked, p), (the, s), (door, s), (with, f), (rage, f),
((boys, kicked), p), ((the, door), s), ((with, rage), f),
(boys, p, 2), (kicked, p, 1), (the, s, 2), (door, s, 1), (with, f, 1), (rage, f, 2),
((boys, kicked), p, 1), ((the, door), s, 1), ((with, rage), f, 1)
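The evidence items for this example can be generated mechanically. The sketch below is one possible reading of the slide, not the authors' code: the function name `evidence_items` and its parameters are hypothetical, and taking a bigram's distance as the smaller of its two word distances is an assumption inferred from the example values.

```python
def evidence_items(tokens, start, end, n):
    """Build evidence items for the central unit tokens[start:end], window n."""
    pre = tokens[max(0, start - n):start]   # preceding context (p)
    cen = tokens[start:end]                 # central unit (s)
    fol = tokens[end:end + n]               # following context (f)
    # Attach a distance to each word: from W1 for p, from Wm for s and f.
    contexts = [
        ("p", [(w, len(pre) - i) for i, w in enumerate(pre)]),
        ("s", [(w, len(cen) - i) for i, w in enumerate(cen)]),
        ("f", [(w, i + 1) for i, w in enumerate(fol)]),
    ]
    word_pairs, bigram_pairs, word_triples, bigram_triples = [], [], [], []
    for pos, words in contexts:
        for w, d in words:
            word_pairs.append((w, pos))
            word_triples.append((w, pos, d))
        for (w1, d1), (w2, d2) in zip(words, words[1:]):
            bigram_pairs.append(((w1, w2), pos))
            # Assumption: a bigram's distance is that of its nearest word.
            bigram_triples.append(((w1, w2), pos, min(d1, d2)))
    return word_pairs + bigram_pairs + word_triples + bigram_triples

tokens = "boys kicked the door with rage".split()
items = evidence_items(tokens, start=2, end=4, n=2)
for item in items:
    print(item)
```

For the sentence above this yields the same 18 items listed on the slide, in the same order.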
11
SW Calculation continued
Items fall into two groups: A is the group of accepted items and R is the group of rejected items. These groups are used to calculate SW for an evidence item t:

\[
SW(t) =
\begin{cases}
\dfrac{f(t,A) - f(t,R)}{f(t,A) + f(t,R)} & \text{if } f(t,A) + f(t,R) > s \\
0 & \text{otherwise}
\end{cases}
\]

where s is a constant to filter noise and f(x,X) is the frequency of x in X. SW as described here takes values between -1.0 and 1.0. For some e > 0, SW(t) > e is taken as positive evidence and SW(t) < -e is taken as negative evidence.
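A minimal sketch of the SW formula, assuming the frequencies f(t,A) and f(t,R) are held in two `Counter` objects. The function name `significance_weight`, the default `s=1`, and the counts in the usage example are made up purely for illustration and do not come from the paper.

```python
from collections import Counter

def significance_weight(t, accepted, rejected, s=1):
    """SW(t) = (f(t,A) - f(t,R)) / (f(t,A) + f(t,R)) if the total exceeds s, else 0."""
    fa = accepted[t]   # f(t, A); Counter returns 0 for unseen items
    fr = rejected[t]   # f(t, R)
    total = fa + fr
    if total > s:
        return (fa - fr) / total
    return 0.0         # too rare to trust: filtered out as noise

# Illustrative counts only (not from the paper).
A = Counter({("president", "p"): 8, ("the", "p"): 5})
R = Counter({("president", "p"): 2, ("the", "p"): 5, ("rage", "f"): 1})

print(significance_weight(("president", "p"), A, R))  # 0.6
print(significance_weight(("the", "p"), A, R))        # 0.0
print(significance_weight(("rage", "f"), A, R))       # 0.0
```

An item seen equally often in both groups, such as ("the", p) here, gets weight 0 and so counts as neither positive nor negative evidence.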
12
Combining SW weights
The SW weights are then combined; if the combined weight exceeds a threshold, the evidence items become available during the tagging stage. The primary scheme used by the authors for combining two weights x and y is:

\[
x \oplus y =
\begin{cases}
x + y - xy & \text{if } x > 0 \text{ and } y > 0 \\
x + y + xy & \text{if } x < 0 \text{ and } y < 0 \\
x + y & \text{otherwise}
\end{cases}
\]

Note: values still remain within [-1.0, 1.0].
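The combining scheme is short enough to write out directly. In this sketch the function name `combine` is hypothetical, and folding more than two weights with `functools.reduce` is an assumption about how the pairwise operation is applied to several evidence items.

```python
from functools import reduce

def combine(x, y):
    """Combine two significance weights, each in [-1.0, 1.0]."""
    if x > 0 and y > 0:
        return x + y - x * y   # two positive weights reinforce each other
    if x < 0 and y < 0:
        return x + y + x * y   # two negative weights reinforce each other
    return x + y               # mixed signs (or a zero) partially cancel

print(combine(0.5, 0.5))                 # 0.75
print(combine(-0.5, -0.5))               # -0.75
print(combine(0.5, -0.25))               # 0.25
print(reduce(combine, [0.5, 0.5, 0.5]))  # 0.875
```

Agreeing evidence pushes the result toward 1.0 or -1.0 without ever passing it, which is why the combined value stays within [-1.0, 1.0].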
13
Bootstrapping
The basic bootstrapping process then looks like this:
Procedure Bootstrapping
  Collect seeds
  loop
    Training phase (calculate SW weights, combine, add to rules)
    Tagging phase (use all accumulated rules to tag)
  until Satisfied
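The procedure can be sketched as a generic Python loop. Everything concrete here is an assumption: the slide does not say what "Satisfied" means, so this sketch stops when a cycle tags nothing new or after `max_cycles`; the `train` and `tag` callables and the toy `contexts_of` table are hypothetical stand-ins for the real training and tagging phases.

```python
def bootstrap(seeds, train, tag, max_cycles=10):
    """Alternate training and tagging until no new items are found."""
    accepted = set(seeds)
    rules = set()
    for _ in range(max_cycles):
        rules |= train(accepted)   # training phase: add newly derived rules
        tagged = tag(rules)        # tagging phase: apply all accumulated rules
        if tagged <= accepted:     # assumed "Satisfied" test: nothing new
            break
        accepted |= tagged
    return accepted, rules

# Toy stand-ins: each item is listed with the contexts it occurs in.
contexts_of = {
    "Henry Kaufman & Co.": {"president of _"},
    "Gabelli Funds Inc.": {"president of _"},
    "Thomson S.A.": {"president of _"},
}

def train(accepted):
    return set().union(*(contexts_of.get(item, set()) for item in accepted))

def tag(rules):
    return {item for item, ctx in contexts_of.items() if ctx & rules}

accepted, rules = bootstrap({"Henry Kaufman & Co.", "Gabelli Funds Inc."}, train, tag)
print(sorted(accepted))
print(rules)
```

In the toy run the two seed organizations yield the context "president of _", which in turn tags "Thomson S.A."; the second cycle finds nothing new and the loop stops.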
14
Experiments and Results
Organizations: training on a 7 MB WSJ corpus, testing on 10 selected articles.
Initially, precision was 97% but recall only 49%.
After the 4th cycle, precision was 95% and recall 90%.
A similar experiment for identifying products gave worse results.
15
Improvements
Different weighting and combining schemes.
Universal lexicon lookups: accepted items can be verified in existing online lexical databases.
The program cannot deal with conjunctions of noun phrases, because they are difficult to identify.
16
Some Considerations
It is not clear how many initial seeds were provided.
The program is described as identifying one category of items at a time, but it could be extended to more.
A limitation is that certain contexts or examples might not be spotted because of noise in the data, and the same holds for entities that do not have obvious context patterns.
POS tagger errors are inherited.