Coupling Semi-Supervised Learning of Categories and Relations
by Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr., and Tom M. Mitchell
School of Computer Science, Carnegie Mellon University
Presented by Thomas Packer
Bootstrapped Information Extraction
Semi-supervised:
– Seed knowledge (predicate instances and patterns)
– Pattern learners (use learned instances)
– Instance learners (use learned patterns)
Feedback loop:
– Rel 1 (X, Y)
– Sent 1 (X, Y), Rel 0 (X, Y) → Pat 1
– Pat 1 : Sent 2 (A, B) → Rel 1 (A, B)
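The feedback loop on this slide can be sketched in a few lines of Python. This is a toy illustration only, not the system from the paper: the function name `bootstrap`, the rule that a pattern is the text between the two arguments, and the example sentences are all assumptions made for the sketch.

```python
import re

def bootstrap(sentences, seed_pairs, rounds=2):
    """Toy bootstrapping loop: instances -> patterns -> instances.

    Assumption: a pattern is simply the text found between two known
    arguments in a sentence. Real systems score and filter candidates.
    """
    instances = set(seed_pairs)
    patterns = set()
    for _ in range(rounds):
        # Pattern learner: uses the instances learned so far.
        for x, y in list(instances):
            for s in sentences:
                m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), s)
                if m:
                    patterns.add(m.group(1))
        # Instance learner: uses the patterns learned so far.
        for p in list(patterns):
            for s in sentences:
                m = re.search(r"(\w+)" + re.escape(p) + r"(\w+)", s)
                if m:
                    instances.add((m.group(1), m.group(2)))
    return instances, patterns

# Hypothetical corpus and seed, for illustration only.
sentences = [
    "Ballmer is the CEO of Microsoft",
    "Jobs is the CEO of Apple",
]
instances, patterns = bootstrap(sentences, {("Ballmer", "Microsoft")})
```

Nothing in this loop checks whether a newly extracted pair is plausible, so one bad pattern would feed bad instances into the next round.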
Challenges and Previous Solutions
Semantic drift: the feedback loop amplifies errors and ambiguities. Semi-supervised learning often suffers from being under-constrained.
Learning multiple mutually exclusive predicates: positive examples of one predicate are also negative examples of the others.
Category and predicate learning: predicate arguments must be of certain types.
Does More Look Harder?
Approach
Simultaneous bootstrapped training of multiple categories and multiple relations.
The growing body of related knowledge provides constraints that guide continued learning.
Ontology constraints:
– Mutually exclusive predicates imply negative instances and patterns.
– Hypernyms imply positive instances.
– Relation argument type constraints imply positive category instances and negative relation instances.
Mutual Exclusion Constraint
The “city” and “scientist” categories are mutually exclusive.
If “Boston” is an instance of “city”, then it is also a negative instance of “scientist”.
If “mayor of arg1” is a pattern for “city”, then it is also a negative pattern for “scientist”.
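A minimal sketch of how this constraint might be enforced during promotion, assuming positives and negatives are kept as sets per category. The names `MUTEX`, `mutually_exclusive`, and `promote` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical ontology fragment: unordered pairs of exclusive categories.
MUTEX = {("city", "scientist")}

def mutually_exclusive(a, b):
    return (a, b) in MUTEX or (b, a) in MUTEX

def promote(category, item, positives, negatives, categories):
    """Promote item to a positive of category, unless it is already a
    known negative there. On success, mark it as a negative for every
    category that is mutually exclusive with this one."""
    if item in negatives[category]:
        return False
    positives[category].add(item)
    for other in categories:
        if other != category and mutually_exclusive(category, other):
            negatives[other].add(item)
    return True

categories = ["city", "scientist"]
positives, negatives = defaultdict(set), defaultdict(set)

accepted_city = promote("city", "Boston", positives, negatives, categories)
rejected_scientist = promote("scientist", "Boston", positives, negatives, categories)
```

The same function would work for patterns such as “mayor of arg1” if a second pair of dictionaries were kept for patterns.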
Hypernym Constraints
“athlete” is a hyponym of “person”.
If “John McEnroe” is a positive instance of “athlete”, then it is also a positive instance of “person”.
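One way to picture this constraint is as propagation up a child-to-parent map. The sketch below assumes each category has at most one hypernym and that the hierarchy has no cycles. `HYPERNYM` and `add_with_hypernyms` are illustrative names, not taken from the paper.

```python
# Hypothetical ontology fragment: category -> its hypernym (parent).
HYPERNYM = {"athlete": "person"}

def add_with_hypernyms(category, item, positives):
    """Record item as a positive instance of category and of every
    ancestor category reachable through HYPERNYM (assumed acyclic)."""
    current = category
    while current is not None:
        positives.setdefault(current, set()).add(item)
        current = HYPERNYM.get(current)
    return positives

positives = add_with_hypernyms("athlete", "John McEnroe", {})
```

After the call, “John McEnroe” appears under both “athlete” and “person”.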
Type Checking Constraints
The “ceoOf()” relation must have arguments of type “person” and “company”.
If “bicycle” is not a “person”, then “ceoOf(bicycle, Microsoft)” is a negative instance of “ceoOf()”.
If “ceoOf(Steve Ballmer, Microsoft)” is true, then “Steve Ballmer” is a positive instance of “person”. “Microsoft” is handled similarly, as a positive instance of “company”.
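Both directions of this constraint can be sketched together: known category negatives block a relation instance, and an accepted relation instance adds category positives. This is a simplified illustration. `ARG_TYPES`, `type_check`, and `accept_relation` are hypothetical names, and the negative set for “person” is assumed for the example.

```python
# Hypothetical ontology fragment: relation -> (type of arg1, type of arg2).
ARG_TYPES = {"ceoOf": ("person", "company")}

def type_check(relation, arg1, arg2, cat_negatives):
    """Return False if either argument is a known negative of its required type."""
    t1, t2 = ARG_TYPES[relation]
    return (arg1 not in cat_negatives.get(t1, set())
            and arg2 not in cat_negatives.get(t2, set()))

def accept_relation(relation, arg1, arg2, cat_positives, cat_negatives):
    """Accept a relation instance only if it type-checks. On success, the
    arguments become positive instances of the required categories."""
    if not type_check(relation, arg1, arg2, cat_negatives):
        return False
    t1, t2 = ARG_TYPES[relation]
    cat_positives.setdefault(t1, set()).add(arg1)
    cat_positives.setdefault(t2, set()).add(arg2)
    return True

cat_negatives = {"person": {"bicycle"}}  # assumed: "bicycle" is known not to be a person
cat_positives = {}

bad = accept_relation("ceoOf", "bicycle", "Microsoft", cat_positives, cat_negatives)
good = accept_relation("ceoOf", "Steve Ballmer", "Microsoft", cat_positives, cat_negatives)
```

Here the rejected candidate leaves the category sets untouched, while the accepted one adds “Steve Ballmer” to “person” and “Microsoft” to “company”.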
Coupled Bootstrap Learner
Knowledge Constraints Make Extraction Easier
Conclusion
The results clearly show improvements from adding constraints.
The approach could probably benefit from:
– probabilistic reasoning
– a larger corpus
– higher thresholds
– more contrastive categories
– other techniques discussed in this class
Questions