Illustrations of different approaches Peter Clark and John Thompson

Presentation transcript:

The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?
Illustrations of different approaches
Peter Clark and John Thompson
Boeing Research, 2004

Premise
Intelligent machines need lots of knowledge, for:
    question-answering
    intelligent search
    information integration
    natural language understanding
    decision support
    modeling
    etc.
Much of this knowledge can be drawn from some general repository of reusable knowledge, e.g., WordNet.
How does one build such a repository?
“No-one considers hand-building a large KB to be a realistic proposition these days” [paraphrase of Daphne Koller, 2004]

1. Build it by Hand
“Let’s roll up our sleeves and get on with it!”
But: It’s a daunting task
Our own work
Cyc
    + Lots in it, (relatively) well-designed ontology
    - 650 person-years of effort so far
    - Still patchy coverage (why?)
    - Difficult to use outside Cycorp

1. Build it by Hand (cont)
WordNet
    + Easy to use
    + Comprehensive
    - Little inference-supporting knowledge in it
    - Ad hoc ontology

1. Build it by Hand (cont)
The Component Library
Claim: can bound the required knowledge by working at a coarse-grained level
    + Large, more doable
    - Hard to use, still very incomplete

2. Extract from Dictionaries
MindNet
    + Automatically built
    - Unusable?
Extended WordNet
    + Won TREC competition
    - Still somewhat incoherent
    - Lots of manual labor

3. Corpus-based Text/Web Mining
Schubert’s system
    + Automatic
    + Lots of knowledge
    - Noisy
    - No word senses
    - Only grabs certain kinds of knowledge
30M entries…

3. Corpus-based Text/Web Mining (cont)
KnowItAll (Etzioni)
    + Automatic
    - Only factoids

4. Community-Based Acquisition
Knowledge entry by the masses
OpenMind
    + Large
    - Full of junk, unusable (?)
Would this work with better acquisition tools? (see next slide for illustration)

5. Use Existing Resources
e.g.,
    databases
    CIA World Factbook
    Web data/services
e.g., SRI/ISI’s ARDA QA system
    + Syntactically simple
    + Available
    - Largely limited to factoids
    - Information integration is a major challenge: different ontologies, contradictory data

Where to?
Can we bound the knowledge needed
    for a particular application?
    for a useful, sharable, general resource?
Which of these approaches seems most realistic?
    build by hand
    extract from dictionaries
    mine text corpora
    community knowledge entry
    use existing resources