Web Crawler Agent (WCA)
Presented by Kirk Martinez
University of Southampton
Introduction
- WCA searches for missing information (fragments) on the Web
- WCA structures the information it finds into an ontology, e.g. “place_of_birth” (Person, Place)
- Techniques used: natural language processing (NLP), information extraction, relation extraction, question answering
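To make the idea of a "missing fragment" concrete, here is a minimal sketch of how a relation such as “place_of_birth” (Person, Place) might be represented, and how the gaps a crawler would need to fill could be listed. The class names `Relation` and `Fact`, the function `missing_fragments`, and the sample data are all assumptions made for illustration; the slides do not describe WCA's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """An ontology relation with a typed domain and range (hypothetical model)."""
    name: str
    domain: str
    range: str


@dataclass(frozen=True)
class Fact:
    """One instance of a relation: relation(subject, value)."""
    relation: Relation
    subject: str
    value: str


# The relation named on the slide: place_of_birth(Person, Place).
PLACE_OF_BIRTH = Relation("place_of_birth", "Person", "Place")


def missing_fragments(subjects, relation, facts):
    """Return the subjects that have no known value for the given relation."""
    known = {f.subject for f in facts if f.relation == relation}
    return [s for s in subjects if s not in known]


# Illustrative knowledge base: one fact is known, one is missing.
facts = [Fact(PLACE_OF_BIRTH, "Rembrandt", "Leiden")]
to_search = missing_fragments(["Rembrandt", "Georges Seurat"], PLACE_OF_BIRTH, facts)
print(to_search)
```

Under these assumptions, each entry in `to_search` is a (subject, relation) gap that a crawler could turn into a Web query.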
Overview
Is it something like “Google”?
- Search for “date_of_birth” (when Rembrandt was born) with Google
Searching information with Google
- The “old” Web search (e.g. Google) is good for retrieving documents but NOT for extracting concise answers
  - e.g. “15-July-1606”
- No analysis is done to “understand” the documents
  - e.g. “Rembrandt” can also be the name of a hotel or a bookstore
Information extraction on the Web
- Data may be low quality and repeated
  - e.g. Georges Seurat's date of death
    - 29 March 1891
    - 19 March 1891
- WCA depends on:
  - Well-structured sentences and documents
  - Good named-entity recognisers
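The Seurat example shows two problems at once: the same value appears in different surface forms, and different pages disagree. The sketch below shows one simple way to handle that, by normalising date strings and counting how often each normalised value occurs. This is an assumed approach for illustration, not WCA's documented method; the function names `normalise_date` and `tally_candidates` and the input snippets are hypothetical.

```python
import re
from collections import Counter

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches day-month-year forms such as "29, March 1891" or "15-July-1606".
DATE_RE = re.compile(r"(\d{1,2})\s*[-,]?\s*([A-Za-z]+)\s*[-,]?\s*(\d{4})")


def normalise_date(text):
    """Return (year, month, day) for the first date found in text, else None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    return (int(year), month, int(day))


def tally_candidates(snippets):
    """Count how many snippets support each normalised date."""
    counts = Counter()
    for snippet in snippets:
        date = normalise_date(snippet)
        if date is not None:
            counts[date] += 1
    return counts


# Made-up snippets standing in for text extracted from different Web pages.
snippets = ["29, March 1891", "29 March 1891", "19, March 1891"]
print(tally_candidates(snippets).most_common())
```

Counting support only ranks the candidates; it does not prove that the most frequent value is correct, which is why verification remains a separate concern.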
Future work
- Verification
- Performance
- Autonomy