Searching for Common Sense: Populating Cyc from the Web Presented by Yu-Chung Shen 2007/05/03
Introduction In the last twenty years, over 3 million facts and rules have been entered manually in the Cyc knowledge base by ontologists. Shouldn’t there be a better way ? –Automating the process of gathering and verifying facts from the World Wide Web.
Knowledge acquisition from WWW Gathering information from the web proceeds in six stages –Choosing queries –Searching ( Google ) –Parsing results –KB consistency checking –Google verification –Reviewing and asserting
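The six stages can be read as a filter pipeline. The sketch below is only an illustration of that control flow, not the system described in the talk: `run_cycle` and all six callables are hypothetical names, and each stage is passed in as a plain function so the skeleton stays independent of Cyc and of any search engine.

```python
def run_cycle(queries, search, parse, is_consistent, is_verified, review):
    """Run one pass of the six-stage cycle and return the accepted facts.

    All arguments are hypothetical stand-ins supplied by the caller:
      queries        iterable of queries (stage 1, already chosen)
      search         query -> iterable of raw search results (stage 2)
      parse          (query, result) -> candidate fact or None (stage 3)
      is_consistent  fact -> bool, KB consistency check (stage 4)
      is_verified    fact -> bool, search-engine verification (stage 5)
      review         fact -> bool, human reviewer's decision (stage 6)
    """
    accepted = []
    for query in queries:
        for result in search(query):
            fact = parse(query, result)
            if fact is None:
                continue  # nothing could be parsed from this result
            if not is_consistent(fact):
                continue  # redundant or contradictory with the KB
            if not is_verified(fact):
                continue  # likely a parser error
            if review(fact):
                accepted.append(fact)  # would be asserted into the KB
    return accepted
```

Each later stage only sees what the earlier stages let through, which is why the cheap checks come before human review.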
Learning Cycle
Choosing Queries and Generating Search Strings Example : Queries are limited to a set of 134 binary predicates. Search strings are generated using templates.
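A minimal sketch of template-based search string generation, assuming each predicate maps to a few text templates with a slot for the known argument. The template strings and the `TEMPLATES` table are invented for illustration and are not the templates used in the work.

```python
# Hypothetical templates, keyed by predicate name. A query such as
# (foundingAgent PalestineIslamicJihad ?X) binds arg1 and leaves ?X open.
TEMPLATES = {
    "foundingAgent": ["{arg1} founder", "{arg1} was founded by"],
}

def search_strings(predicate, arg1_name):
    """Return one search string per template for a partially bound query."""
    return [t.format(arg1=arg1_name) for t in TEMPLATES.get(predicate, [])]
```

For example, `search_strings("foundingAgent", "PIJ")` yields the strings "PIJ founder" and "PIJ was founded by"; a predicate with no templates yields an empty list.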
Parsing search results into CycL sentences Example :
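A minimal sketch of this step, not the parser used in the work: it looks for the search phrase in a result snippet, takes the capitalized words that follow as the candidate answer, and maps names to Cyc constants through a small lookup table. The `LEXICON` table, the function name, and the snippet text in the usage note are all assumptions made for illustration.

```python
import re

# Hypothetical name-to-constant table standing in for Cyc's own lexicon.
LEXICON = {
    "Palestine Islamic Jihad": "PalestineIslamicJihad",
    "Auguste Rodin": "AugusteRodin",
}

def parse_snippet(predicate, arg1_name, phrase, snippet):
    """Return a CycL sentence string parsed from a snippet, or None."""
    # Capture the run of capitalized words right after "<arg1> <phrase>".
    pattern = (re.escape(arg1_name) + r"\s+" + re.escape(phrase)
               + r"\s+((?:[A-Z][\w-]*\s?)+)")
    match = re.search(pattern, snippet)
    if match is None:
        return None
    arg1 = LEXICON.get(arg1_name)
    arg2 = LEXICON.get(match.group(1).strip())
    if arg1 is None or arg2 is None:
        return None  # a name could not be mapped to a Cyc constant
    return f"({predicate} {arg1} {arg2})"
```

Given the made-up snippet "Palestine Islamic Jihad was founded by Auguste Rodin in a garbled page", the call `parse_snippet("foundingAgent", "Palestine Islamic Jihad", "was founded by", snippet)` produces the sentence (foundingAgent PalestineIslamicJihad AugusteRodin). A parse like this can be wrong, which is what the later checks are for.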
Checking Cyc KB Consistency Use inference to discard facts that are redundant or contradictory. Example : Fact : (foundingAgent PalestineIslamicJihad AugusteRodin) Cyc knows when AugusteRodin died and when PIJ was founded, and the two dates are incompatible. The fact is contradictory, so it is discarded.
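The check can be sketched with a toy knowledge base. Cyc does this through general inference; the version below hard-codes a single rule (a founder cannot have died before the organization was founded) purely as a stand-in. The constants `OrgA`, `PersonB`, `PersonC` and the years attached to them are invented placeholders, not facts from Cyc.

```python
# Toy KB with invented placeholder constants and years.
KNOWN_FACTS = {("foundingAgent", "OrgA", "PersonC")}
DEATH_YEAR = {"PersonB": 1900}
FOUNDING_YEAR = {"OrgA": 1950}

def check_fact(fact):
    """Classify a (predicate, arg1, arg2) fact as redundant, contradictory or ok."""
    if fact in KNOWN_FACTS:
        return "redundant"  # already in the KB, nothing new to learn
    predicate, org, agent = fact
    if predicate == "foundingAgent":
        died = DEATH_YEAR.get(agent)
        founded = FOUNDING_YEAR.get(org)
        # Hand-written stand-in for inference: dead people found nothing.
        if died is not None and founded is not None and died < founded:
            return "contradictory"
    return "ok"
```

Only facts classified as "ok" would move on to the verification stage.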
Google Verification Guard against parser error. Example : New Fact : (foundingAgent PalestineIslamicJihad xasdawqeqw) Search string : PIJ founder xasdawqeqw
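A minimal sketch of the verification idea, assuming a fact counts as verified when a search for the combined string returns at least a minimum number of hits. The threshold, the function names, and the injected `hit_count` callable are assumptions; no real search API is called here.

```python
def verification_string(arg1_name, phrase, arg2_name):
    """Build a search string that mentions both arguments of the new fact."""
    return f"{arg1_name} {phrase} {arg2_name}"

def is_verified(search_string, hit_count, min_hits=1):
    """Return True if the search engine reports enough hits.

    hit_count is a caller-supplied function (hypothetical) that maps a
    search string to a number of results; min_hits is an assumed threshold.
    """
    return hit_count(search_string) >= min_hits
```

With `verification_string("PIJ", "founder", "xasdawqeqw")` giving "PIJ founder xasdawqeqw", a nonsense token produced by a parser error would be expected to return no hits, so the fact would be rejected.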
Review and Assertion Learned sentences are reviewed by a human curator. If correct, assert learned sentences into Cyc knowledge base.
Experimental Results The majority of the searches expended, about 80%, were performed in the verification phase. The results were as follows : (GAFs : ground atomic formulas, the atomic sentences in the Cyc KB. )
Experimental Results A human reviewer then went through the verified GAFs, and a sample of 53 of the unverified GAFs, and determined their actual correctness.
Conclusions The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier, and more effective. The hope is to provide Cyc with a mechanism to truly acquire knowledge by learning. Q&A ?