QUIRK: QUestion Answering = Information Retrieval + Knowledge
Cycorp, IBM
Presenter: Stefano Bertolo (Cycorp)


Project Goals
- Break answer-by-retrieval bottleneck
- Deep (semantic) understanding of queries and answers
- Integration of heterogeneous sources
- Formalized knowledge to integrate state-of-the-art IR components with state-of-the-art knowledge bases

Answer by retrieval
Q: Who was the first president of Zambia?
… Kenneth Kaunda, the first president, kept Zambia within the Commonwealth of Nations…

Answer by reasoning
Q: Who sponsored Kai’s attack against Pamina?
- …On February 13, Kai detonated the truck in front of Pamina’s HQ…
- …On January 25, Kai bought a truckload of fertilizer drawing against account 9999 at MegaBank…
- …On January 15, Vitas Bayo deposited $50,000 on account 9999 at MegaBank…

QUIRK strategy
Use formalized knowledge for:
- Semantic understanding of queries
- Justification of answers
Use formalized knowledge as:
- Format for data normalization
- ‘Glue’ for data integration of:
  - information extracted from unstructured data
  - SQL queries against structured DBs
  - Cyc’s knowledge

[Architecture diagram. Labels: Blackboard; Query Manager; Answer Manager; Inference Agent; IR Agent; Cyc KB; GuruQA (IBM); DB1, DB2, DB-N; Preemptive annotations; Unstructured Documents]

[Data-flow diagram. Labels: Q-Eng; A-Eng; Q-CycL; A-CycL; Q-Guru; A-Guru; Query Interpreter; GuruQA Assistant; GuruQA (IBM); Cyc English Generator; Cyc Inference Engine; Answer Manager; Query Refiner; Blackboard]

Blackboard architecture
- Add/remove agents without disrupting existing architecture
- Test performance/speed with several combinations of agents
- Operate asynchronously
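The blackboard idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Blackboard` class, the stub agents, and their placeholder outputs are hypothetical and are not QUIRK code. Only the entry kinds (Q-Eng, Q-CycL, A-CycL) come from the slides. The sketch also dispatches entries synchronously for simplicity, whereas the slides describe asynchronous operation.

```python
from typing import Callable, Dict, List

Entry = Dict[str, str]
Agent = Callable[[Entry], List[Entry]]


class Blackboard:
    """Shared store: agents read each posted entry and may post new ones."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.agents: Dict[str, Agent] = {}

    def register(self, name: str, agent: Agent) -> None:
        self.agents[name] = agent

    def unregister(self, name: str) -> None:
        # Removing an agent does not affect the remaining ones.
        self.agents.pop(name, None)

    def post(self, entry: Entry) -> None:
        # Simple synchronous dispatch loop (a real system could run agents concurrently).
        queue = [entry]
        while queue:
            current = queue.pop(0)
            self.entries.append(current)
            for agent in list(self.agents.values()):
                queue.extend(agent(current))


def query_interpreter(entry: Entry) -> List[Entry]:
    # Hypothetical stub: turns an English question into a placeholder CycL query.
    if entry["kind"] != "Q-Eng":
        return []
    return [{"kind": "Q-CycL", "body": f"cycl({entry['body']})"}]


def inference_agent(entry: Entry) -> List[Entry]:
    # Hypothetical stub: answers a CycL query with a placeholder answer.
    if entry["kind"] != "Q-CycL":
        return []
    return [{"kind": "A-CycL", "body": f"answer({entry['body']})"}]


bb = Blackboard()
bb.register("query_interpreter", query_interpreter)
bb.register("inference_agent", inference_agent)
bb.post({"kind": "Q-Eng", "body": "Who opposes the WTO?"})
kinds = [e["kind"] for e in bb.entries]
```

Because agents only react to entry kinds, one can be unregistered (or a new one registered) without touching the others, which is the property the slide highlights.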

Query Interpreter
Q: “Who opposes the WTO?”
(and
  (isa ?WHO Person)
  (thereExists ?EVENT
    (and
      (isa ?EVENT ActOfDissent)
      (performedBy ?EVENT ?WHO)
      (maleficiary ?EVENT WorldTradeOrganization))))

GuruQA Assistant
CycL query =>
PERSON$
oppose(s/d) the WTO
denounce(s/d) the World Trade Organization
attack(s/ed) …

Cyc Inference Engine
CycL Query =>
[(PersonNamedFn “Kai”) JUSTIFICATION-1]
[(PersonNamedFn “Dr. Chen”) JUSTIFICATION-2]
…
[(PersonNamedFn “Kai”) JUSTIFICATION-N]
…

Cyc Justifications
A?
A from [B and C] (source 6743)
B from source
C from source 78539
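A justification of this shape can be modelled as a small tree in which each conclusion records its premises and the source that licenses it. The sketch below is illustrative only: the `Justification` class and the `explain` function are hypothetical names, not Cyc's API, and B's source is left as `None` because the slide does not give it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Justification:
    statement: str
    source: Optional[str] = None  # None when the source is not given
    premises: List["Justification"] = field(default_factory=list)


def explain(j: Justification, depth: int = 0) -> List[str]:
    """Render a justification tree as indented lines, one per statement."""
    origin = f"source {j.source}" if j.source else "source not given"
    if j.premises:
        names = " and ".join(p.statement for p in j.premises)
        line = f"{j.statement} from [{names}] ({origin})"
    else:
        line = f"{j.statement} from {origin}"
    lines = ["  " * depth + line]
    for premise in j.premises:
        lines.extend(explain(premise, depth + 1))
    return lines


b = Justification("B")                  # source id elided on the slide
c = Justification("C", source="78539")
a = Justification("A", source="6743", premises=[b, c])
report = explain(a)
```

Keeping the source on every node is what lets an answer be traced back to the individual assertions and documents that support it.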

Sources for Cyc Inference
- 1.4M+ CycL assertions already in Cyc’s Knowledge Base
- Virtual assertions in databases
- Unsupervised Textract / CycL annotation of unstructured documents

Data Source Integration
- Data Normalization
- Data Fusion

Data Normalization
Interpretation / Search
cat, chat, Katze, gato, gatto, “felis felis”
cat OR chat OR Katze OR gato OR gatto OR “felis felis”
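One way to read this slide is that a single language-independent concept sits between the surface terms: interpretation maps any term to the concept, and search expands the concept back into a disjunction of all its terms. The sketch below assumes that reading. The concept name `Cat`, the `LEXICON` table, and both functions are hypothetical illustrations, not part of Cyc or GuruQA.

```python
from typing import Dict, List

# Hypothetical lexicon: one concept, the surface terms listed on the slide.
LEXICON: Dict[str, List[str]] = {
    "Cat": ["cat", "chat", "Katze", "gato", "gatto", "felis felis"],
}


def interpret(term: str) -> List[str]:
    """Return every concept whose term list contains the given term."""
    wanted = term.lower()
    return [
        concept
        for concept, terms in LEXICON.items()
        if wanted in (t.lower() for t in terms)
    ]


def search_query(concept: str) -> str:
    """Expand a concept into an OR query, quoting multi-word terms."""
    terms = LEXICON[concept]
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


concepts = interpret("gatto")
query = search_query("Cat")
```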

Data Normalization
…Zhang Mei Li, was born on January 1, 1927…

Name          DOB
Zhang Mei Li  ……

(birthDate (PersonNamedFn “Zhang Mei Li”) (DayFn 01 (MonthFn January (YearFn 1927))))
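As a rough illustration of normalizing free text into the CycL form shown on this slide, the sketch below uses a single regular expression for sentences of the form "X was born on Month D, YYYY". This is an assumption made for the example: the slides describe extraction with IBM Textract, not a regular expression, and `birth_assertion` is a hypothetical helper.

```python
import re
from typing import Optional

MONTHS = {
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
}

# Capitalized name words, an optional comma, then "was born on Month D, YYYY".
BIRTH_PATTERN = re.compile(
    r"(?P<name>[A-Z][\w.]*(?: [A-Z][\w.]*)*),? was born on "
    r"(?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})"
)


def birth_assertion(text: str) -> Optional[str]:
    """Return a CycL-style birthDate assertion, or None if nothing matches."""
    match = BIRTH_PATTERN.search(text)
    if match is None or match.group("month") not in MONTHS:
        return None
    return (
        '(birthDate (PersonNamedFn "{name}") '
        "(DayFn {day:02d} (MonthFn {month} (YearFn {year}))))"
    ).format(
        name=match.group("name"),
        day=int(match.group("day")),
        month=match.group("month"),
        year=match.group("year"),
    )


assertion = birth_assertion("...Zhang Mei Li, was born on January 1, 1927...")
```

A database row with the same name and date of birth would normalize to the same assertion, which is what makes the two sources comparable.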

Data Normalization
Language independent representation of:
- entities
- concepts
- relationships
CycL contains 100K+ primitives and can compositionally define infinitely many non-atomic terms.

Data Fusion
- Dr. Chen lives in Fresno
- Zhang Mei Li lives in Oakland
- Kai lives in Los Angeles
- California is in the Pacific Time Zone
=> Any two of Dr. Chen, Zhang Mei Li, and Kai live in the same time zone
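The inference on this slide only goes through with background knowledge the slide leaves implicit: that Fresno, Oakland, and Los Angeles are all in California. The sketch below makes that step explicit with plain dictionaries. The table names and functions are hypothetical and stand in for what would be assertions and rules in Cyc's knowledge base.

```python
from itertools import combinations
from typing import Optional

# Facts stated on the slide.
LIVES_IN = {"Dr. Chen": "Fresno", "Zhang Mei Li": "Oakland", "Kai": "Los Angeles"}
STATE_TIME_ZONE = {"California": "Pacific Time Zone"}

# Background knowledge the slide leaves implicit.
CITY_IN_STATE = {
    "Fresno": "California",
    "Oakland": "California",
    "Los Angeles": "California",
}


def time_zone_of(person: str) -> Optional[str]:
    """Chain person -> city -> state -> time zone; None if any link is missing."""
    city = LIVES_IN.get(person)
    state = CITY_IN_STATE.get(city)
    return STATE_TIME_ZONE.get(state)


def same_time_zone(a: str, b: str) -> bool:
    zone_a, zone_b = time_zone_of(a), time_zone_of(b)
    return zone_a is not None and zone_a == zone_b


pairs = [(a, b) for a, b in combinations(LIVES_IN, 2) if same_time_zone(a, b)]
```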

Heterogeneous Sources
Q: How old is Dr. Chen’s mother?
…Zhang Mei Li, mother of Pamina’s Dr. Chen…

Name          DOB
Zhang Mei Li  ……

Data Fusion
Requires language independent connections/inferential links among:
- Entities
- Concepts
- Propositions (Facts, Rules)
Cyc’s Ontology
Cyc’s Knowledge Base

Consensus Reality
Formalized knowledge about ‘Consensus Reality’ = inferentially enabled ‘glue’ for Data Fusion
E.g. “Was Kai implicated in the Munich 1972 attack (when he was a toddler of 2)?”

DBs as ‘virtual assertions’ stores
(birthDate (PersonNamedFn “Zhang Mei Li”) ?WHEN)
SELECT: DOB
FROM: PERSONAL_DATA
WHERE: NAME = “Zhang Mei Li”
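The mapping from a CycL query with an open variable to a database lookup can be sketched with Python's built-in `sqlite3` module. This is an illustrative assumption, not Cyc's actual database integration: `PREDICATE_TO_COLUMN` and `ask` are hypothetical names, and the sample row reuses the date of birth from the earlier text example because the table slide elides it.

```python
import sqlite3
from typing import List

# Hypothetical mapping: CycL predicate -> (table, key column, value column).
PREDICATE_TO_COLUMN = {"birthDate": ("PERSONAL_DATA", "NAME", "DOB")}


def ask(conn: sqlite3.Connection, predicate: str, person_name: str) -> List[str]:
    """Answer (predicate (PersonNamedFn name) ?VAR) by querying the mapped table."""
    table, key_col, value_col = PREDICATE_TO_COLUMN[predicate]
    # Identifiers come from the trusted mapping; the name is passed as a parameter.
    sql = f"SELECT {value_col} FROM {table} WHERE {key_col} = ?"
    return [row[0] for row in conn.execute(sql, (person_name,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSONAL_DATA (NAME TEXT, DOB TEXT)")
# Sample row for illustration only.
conn.execute(
    "INSERT INTO PERSONAL_DATA VALUES (?, ?)", ("Zhang Mei Li", "1927-01-01")
)
bindings = ask(conn, "birthDate", "Zhang Mei Li")
```

The rows are never copied into the knowledge base; each is treated as an assertion only at the moment a query needs it, which is the sense of "virtual" here.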

Unsupervised Textract / CycL Annotations
IBM Textract relations:
  [Cycorp, Inc. : located-in : Austin, TX]
mapped to CycL assertions:
  (objectFoundInLocation Cycorp CityOfAustinTX)
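A minimal sketch of this mapping, assuming relations arrive as bracketed `subject : relation : object` strings exactly as printed on the slide. The two lookup tables and the `to_cycl` function are hypothetical; a real system would resolve entities and relations against Cyc's lexicon rather than a hand-written dictionary.

```python
from typing import Optional

# Hypothetical lookup tables, filled with the slide's single example.
RELATION_MAP = {"located-in": "objectFoundInLocation"}
ENTITY_MAP = {"Cycorp, Inc.": "Cycorp", "Austin, TX": "CityOfAustinTX"}


def to_cycl(relation_text: str) -> Optional[str]:
    """Map '[subject : relation : object]' to a CycL assertion, or None if unmapped."""
    parts = relation_text.strip().strip("[]").split(" : ")
    if len(parts) != 3:
        return None
    subject, relation, obj = parts
    try:
        return f"({RELATION_MAP[relation]} {ENTITY_MAP[subject]} {ENTITY_MAP[obj]})"
    except KeyError:
        # Leave unmapped relations or entities unannotated rather than guessing.
        return None


assertion_text = to_cycl("[Cycorp, Inc. : located-in : Austin, TX]")
```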

Augmenting Textract Annotations
- Concept Annotation: “Boston” => { CityOfBostonMA, BostonTheBand, … }
- Word Sense Disambiguation: “I went to Boston” => CityOfBostonMA
- Analysis of nominal compounds: “leather jacket” => (SubcollectionOfWithRelationToTypeFn Jacket mainConstituent Leather)

Unsupervised CycL Annotations
- IBM’s Nominator and Parsers to extract Named Entities and basic syntactic dependencies (SUBJ-VERB, VERB-OBJ)
- Map dependencies to CycL event structures

Cyc-to-English generator
(PersonNamedFn “Dr. Chen”) JUSTIFICATION-N
“Dr. Chen opposes the WTO, because people who demonstrate against organizations oppose them (Cyc KB, assertion 99999) and Dr. Chen demonstrated against the WTO in Seattle (document 12345).”

Year 1 Tasks
- Get entire system to run robustly with integration of all the IBM and Cycorp components described
- Improve question understanding and refinement
- Broaden coverage of English to CycL mapping, enabling annotation of large collection of documents

Year 2 Tasks
- Add new agents to the blackboard to represent the user and session context
- Improve integration of answers obtained from GuruQA and Cyc
- Improve integrated IBM and Cycorp modules for unstructured document annotation