Organizing and Searching the World Wide Web of Facts: Harnessing the wisdom of crowds
Problem: Extracting the attribute set for a class (e.g., Price, Creator, Genre for the class ‘Video Games’)
Why?
- Attributes are used to extract templates, which in turn are used to extract a large set of facts from the World Wide Web.
- Attributes can be suggested to humans as topics for Web publishing.
- Attributes are useful as a tool for building vertical (topic-specific) search.
Solution
- Extract attributes from Google search queries, which capture the common interests of people.
- Extract templates from the queries using seed attributes and instances of the class.
- Do the same for candidate attributes.
- Calculate the similarity between the seed templates and each candidate's templates, and rank the candidate phrases by that similarity.
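The query-based ranking step can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a template is a query with the class instance replaced by `<I>` and the attribute by `<A>`, that candidate attributes are phrases filling the `<A>` slot of some seed template, and that similarity is cosine similarity between template-count vectors (the slide does not name a measure). The function names and the toy query log are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical helpers: the template format and the cosine measure are
# illustrative assumptions, not the paper's exact definitions.


def template_of(query, instance, attribute):
    """Replace the instance with <I> and the attribute with <A>; None if either is missing."""
    padded = f" {query} "
    inst, attr = f" {instance} ", f" {attribute} "
    if inst not in padded or attr not in padded:
        return None
    return padded.replace(inst, " <I> ").replace(attr, " <A> ").strip()


def match_attribute(query, instance, template):
    """If `query` instantiates `template` with `instance`, return the phrase filling <A>."""
    parts = template.split("<A>")
    if len(parts) != 2:
        return None
    prefix, suffix = (re.escape(p.replace("<I>", instance)) for p in parts)
    m = re.fullmatch(prefix + "(.+)" + suffix, query)
    return m.group(1).strip() if m else None


def template_vector(queries, instances, attribute):
    """Count the templates in which `attribute` co-occurs with any class instance."""
    vec = Counter()
    for q in queries:
        for inst in instances:
            t = template_of(q, inst, attribute)
            if t:
                vec[t] += 1
    return vec


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_candidates(queries, instances, seeds):
    # 1. Aggregate the templates of all seed attributes into one reference vector.
    seed_vec = Counter()
    for attr in seeds:
        seed_vec.update(template_vector(queries, instances, attr))
    # 2. Candidate attributes: non-seed phrases that fill <A> in some seed template.
    candidates = set()
    for q in queries:
        for inst in instances:
            for t in seed_vec:
                phrase = match_attribute(q, inst, t)
                if phrase and phrase not in seeds:
                    candidates.add(phrase)
    # 3. Score each candidate by how similar its templates are to the seeds' templates.
    scores = {c: cosine(template_vector(queries, instances, c), seed_vec) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


queries = [
    "price of halo 3", "halo 3 price", "creator of halo 3",
    "price of zelda", "zelda creator",
    "release date of zelda", "zelda release date",
    "cheats of halo 3", "cheats for halo 3", "buy halo 3 online",
]
# 'release date' shares both seed templates, so it is ranked above 'cheats'.
print(rank_candidates(queries, ["halo 3", "zelda"], ["price", "creator"]))
```

Aggregating all seeds into one vector means a candidate scores well when it is queried in the same ways people query known attributes, which is the intuition behind mining the query log.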
Criticism
- How can all the classes, and example instances for each, be obtained when the method runs at Web scale?
- The whole process has too many heuristic steps.
- The precision of the final extracted facts is questionable (experiments are required to measure it).
How is it related to what we learned in class?
- Template-based extraction, e.g. from “The capital of India is New Delhi”
- The goal is to extract an attribute of the class ‘country’: ‘capital’ in this case.
- This attribute is in turn used to extract templates like “The capital of ___ is ___”.
- Validation using HITS?
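A minimal sketch of the template idea and the HITS question, under assumptions: templates are induced by replacing a known fact's instance and value with `<I>` and `<V>` slots, each slot matches a run of capitalized words (a crude stand-in for entity recognition), and "Validation using HITS?" is read as running HITS on the bipartite graph of templates (hubs) and extracted facts (authorities). All function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import defaultdict

# Crude stand-in for an entity recognizer: a run of capitalized words (assumption).
CAP_PHRASE = r"[A-Z]\w*(?: [A-Z]\w*)*"


def induce_templates(sentences, instance, attribute, value):
    """Turn sentences stating a known fact into templates with <I>/<V> slots."""
    templates = set()
    for s in sentences:
        if instance in s and attribute in s and value in s:
            t = s.replace(instance, "<I>").replace(value, "<V>")
            if t.count("<I>") == 1 and t.count("<V>") == 1:
                templates.add(t)
    return templates


def template_regex(template):
    """Compile a template into a regex whose slots capture capitalized phrases."""
    out = []
    for piece in re.split(r"(<I>|<V>)", template):
        if piece == "<I>":
            out.append(f"(?P<inst>{CAP_PHRASE})")
        elif piece == "<V>":
            out.append(f"(?P<val>{CAP_PHRASE})")
        else:
            out.append(re.escape(piece))
    return re.compile("".join(out))


def extract_facts(sentences, templates):
    """Map each template to the set of (instance, value) facts it extracts."""
    edges = defaultdict(set)
    for t in templates:
        rx = template_regex(t)
        for s in sentences:
            m = rx.search(s)
            if m:
                edges[t].add((m.group("inst"), m.group("val")))
    return edges


def hits(edges, iterations=50):
    """HITS on the template-fact bipartite graph: templates are hubs, facts are authorities."""
    facts = {f for fs in edges.values() for f in fs}
    hub = {t: 1.0 for t in edges}
    auth = {}
    for _ in range(iterations):
        auth = {f: sum(hub[t] for t, fs in edges.items() if f in fs) for f in facts}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {f: a / norm for f, a in auth.items()}
        hub = {t: sum(auth[f] for f in fs) for t, fs in edges.items()}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {t: h / norm for t, h in hub.items()}
    return hub, auth


corpus = [
    "The capital of India is New Delhi.",
    "New Delhi is the capital of India.",
    "The capital of France is Paris.",
    "Paris is the capital of France.",
    "The capital of Japan is Tokyo.",
    "Tokyo is the capital of Japan.",
    "The capital of Mars is Olympus.",  # noisy fact matched by only one template
]
templates = induce_templates(corpus, "India", "capital", "New Delhi")
edges = extract_facts(corpus, templates)
hub, auth = hits(edges)
for fact, score in sorted(auth.items(), key=lambda kv: -kv[1]):
    print(fact, round(score, 3))
```

Here a fact confirmed by several templates receives a higher authority score than one matched by a single template, so low-authority facts could be flagged for review; whether this is enough to guarantee precision is the open question raised in the criticism.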