Searching for Common Sense: Populating Cyc from the Web Presented by Yu-Chung Shen 2007/05/03.

Presentation transcript:


Introduction
In the last twenty years, over 3 million facts and rules have been entered manually into the Cyc knowledge base by ontologists. Shouldn’t there be a better way?
– Automate the process of gathering and verifying facts from the World Wide Web.

Knowledge acquisition from the WWW
Gathering information from the web proceeds in six stages:
– Choosing queries
– Searching (Google)
– Parsing results
– KB consistency checking
– Google verification
– Reviewing and asserting

Learning Cycle

Choosing Queries and Generating Search Strings
– Query selection is limited to a set of 134 binary predicates.
– Search strings are generated using templates.
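Template-based search string generation can be sketched as follows. This is a minimal illustration, not the system's actual code: the predicate foundingAgent and the constant PalestineIslamicJihad come from the slides, but the template phrasings, the name table, and the function name are assumptions made up for the example.

```python
from typing import Dict, List

# Hypothetical English templates per binary predicate. The slides say only
# that templates are used; these phrasings are invented for illustration.
SEARCH_TEMPLATES: Dict[str, List[str]] = {
    "foundingAgent": ["{arg1} founder", "{arg1} was founded by"],
}

# Hypothetical table of English names for a Cyc constant.
ENGLISH_NAMES: Dict[str, List[str]] = {
    "PalestineIslamicJihad": ["Palestine Islamic Jihad", "PIJ"],
}


def generate_search_strings(predicate: str, arg1_constant: str) -> List[str]:
    """Fill every template for the predicate with every known name of arg1."""
    templates = SEARCH_TEMPLATES.get(predicate, [])
    # Fall back to the raw constant name if no English name is known.
    names = ENGLISH_NAMES.get(arg1_constant, [arg1_constant])
    return [t.format(arg1=name) for name in names for t in templates]


if __name__ == "__main__":
    for s in generate_search_strings("foundingAgent", "PalestineIslamicJihad"):
        print(s)
```

Keeping the templates in a table keyed by predicate means that covering another predicate only requires adding an entry, not new code.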

Parsing search results into CycL sentences
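A rough sketch of the parsing step, under strong simplifying assumptions: the regular expression, the made-up snippet, and the rule that a Cyc constant is the name with spaces removed are all hypothetical. The real system would need to map names onto existing Cyc terms, which this sketch does not attempt.

```python
import re
from typing import Optional

# Hypothetical pattern: capture a capitalized name after "was founded by".
FOUNDER_PATTERN = re.compile(r"was founded by ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def to_cyc_constant(name: str) -> str:
    """Assumed naming rule: drop the spaces, e.g. 'Auguste Rodin' -> 'AugusteRodin'."""
    return "".join(name.split())


def parse_snippet(predicate: str, arg1: str, snippet: str) -> Optional[str]:
    """Return a CycL-style sentence string, or None if nothing matches."""
    match = FOUNDER_PATTERN.search(snippet)
    if match is None:
        return None
    return f"({predicate} {arg1} {to_cyc_constant(match.group(1))})"


if __name__ == "__main__":
    # Made-up snippet, chosen to reproduce the wrong candidate fact
    # discussed on the consistency-checking slide.
    snippet = "The group was founded by Auguste Rodin in an unknown year."
    print(parse_snippet("foundingAgent", "PalestineIslamicJihad", snippet))
```

A parser this naive produces wrong candidates easily, which is why the later consistency and verification stages exist.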

Checking Cyc KB Consistency
Discard facts that are redundant or contradictory, as determined via inference.
Example:
Fact: (foundingAgent PalestineIslamicJihad AugusteRodin)
– Cyc knows the year in which AugusteRodin died.
– Cyc knows the year in which PIJ was founded.
– The fact is contradictory, so it is discarded.
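The redundancy and contradiction tests can be illustrated with a toy check. This is only a sketch: Cyc reaches such conclusions through general inference over its knowledge base, whereas the code below hard-codes a single assumed rule (a founder cannot have died before the organization was founded). The constants SomeOrganization and SomePerson and the years are placeholders, not data from the slides or from Cyc.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (predicate, arg1, arg2)


def check_fact(
    fact: Fact,
    known_facts: Set[Fact],
    death_year: Dict[str, int],
    founding_year: Dict[str, int],
) -> str:
    """Classify a candidate fact as 'redundant', 'contradictory' or 'consistent'."""
    if fact in known_facts:
        return "redundant"  # already in the KB, nothing new to learn
    predicate, organization, agent = fact
    if predicate == "foundingAgent":
        died = death_year.get(agent)
        founded = founding_year.get(organization)
        # Assumed rule: an agent who died before the founding cannot be a founder.
        if died is not None and founded is not None and died < founded:
            return "contradictory"
    return "consistent"


if __name__ == "__main__":
    # Placeholder knowledge, for illustration only.
    known = {("foundingAgent", "SomeOrganization", "KnownFounder")}
    deaths = {"SomePerson": 1900}
    foundings = {"SomeOrganization": 1950}
    candidate = ("foundingAgent", "SomeOrganization", "SomePerson")
    print(check_fact(candidate, known, deaths, foundings))
```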

Google Verification
Guards against parser error.
Example:
New fact: (foundingAgent PalestineIslamicJihad xasdawqeqw)
Search string: PIJ founder xasdawqeqw
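A sketch of how such a verification query might be built and judged. The slides do not state the acceptance criterion, so the hit-count threshold here is an assumption, and the search backend is a stub with invented hit counts instead of a real search API call.

```python
from typing import Callable, Dict


def build_verification_string(subject: str, relation_word: str, candidate: str) -> str:
    """Combine both arguments of the new fact into one query string."""
    return f"{subject} {relation_word} {candidate}"


def is_verified(
    search_string: str,
    hit_count: Callable[[str], int],
    min_hits: int = 1,  # assumed threshold, not taken from the slides
) -> bool:
    """Accept the fact only if the query returns at least min_hits results."""
    return hit_count(search_string) >= min_hits


# Stub search backend with made-up hit counts, for illustration only.
FAKE_HITS: Dict[str, int] = {"PIJ founder Some Person": 12}


def fake_hit_count(query: str) -> int:
    return FAKE_HITS.get(query, 0)


if __name__ == "__main__":
    garbled = build_verification_string("PIJ", "founder", "xasdawqeqw")
    print(garbled, is_verified(garbled, fake_hit_count))
```

Passing the hit counter in as a function keeps the decision logic testable without network access.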

Review and Assertion
Learned sentences are reviewed by a human curator. If correct, they are asserted into the Cyc knowledge base.

Experimental Results
The majority of the searches expended, about 80%, were performed in the verification phase. The results were as follows:
(GAFs: ground atomic formulas, i.e. atomic sentences in the Cyc KB.)

Experimental Results
A human reviewer then went through the verified GAFs and a sample of 53 of the unverified GAFs, and determined their actual correctness.

Conclusions
– The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier, and more effective.
– The hope is to provide Cyc with a mechanism to truly acquire knowledge by learning.
Q&A