QA and Language Modeling (and Some Challenges) Eduard Hovy Information Sciences Institute University of Southern California.


Standard QA architecture (factoids)
- Input Q
- Identify keywords from Q
- Build (Boolean) query
- Retrieve texts using IR
- Rank texts/passages
- Move window over text and score each position
- Rank candidate answers
- Return top N candidates (a list)

[Diagram annotations: Corpus: 35%; + Web: +10% (Microsoft 01) (Waterloo 01); callout: "Replace this by more-targeted matching"]
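The pipeline on this slide can be written down as a short program. This is a minimal illustrative sketch, not the actual system: the stopword list, keyword extraction, Boolean-ish retrieval, and window scoring are toy stand-ins for the real components.

```python
import re

# Hypothetical stopword list; a real system would use a larger one.
STOPWORDS = {"who", "what", "when", "where", "the", "a", "is", "was", "did", "in", "of"}

def extract_keywords(question):
    """Identify keywords from Q: non-stopword word tokens."""
    return [t for t in re.findall(r"\w+", question.lower()) if t not in STOPWORDS]

def retrieve(keywords, corpus):
    """Stand-in for Boolean retrieval: keep documents mentioning any keyword."""
    return [doc for doc in corpus if any(k in doc.lower() for k in keywords)]

def score_windows(text, keywords, size=6):
    """Move a window over the text; score each position by keyword overlap."""
    tokens = re.findall(r"\w+", text)
    for i in range(max(1, len(tokens) - size + 1)):
        window = tokens[i:i + size]
        score = sum(1 for w in window if w.lower() in keywords)
        yield " ".join(window), score

def answer(question, corpus, n=3):
    keywords = extract_keywords(question)
    candidates = []
    for doc in retrieve(keywords, corpus):
        candidates.extend(score_windows(doc, keywords))
    candidates.sort(key=lambda c: -c[1])      # rank candidate answers
    return [w for w, _ in candidates[:n]]     # return top N candidates

corpus = ["Mozart was born in Salzburg in 1756.",
          "Beethoven was born in Bonn."]
print(answer("Where was Mozart born?", corpus, n=1))
```

The slide's point is that the window-scoring step is the weak link: the sketch above ranks windows by bag-of-words overlap, and the rest of the talk is about replacing exactly that step with more-targeted matching.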

TextMap: Knowledge used for pinpointing
- Orthography (rules)
  – ZIP codes, URLs, etc.
- Default numerical info (rules)
  – how many people live in a city?
- Abbreviations / acronyms (rules)
- External sources (WordNet etc.)
  – definitions, instances, etc.
- Syntactic constituents (parse tree)
  – delimit answer extent exactly
- Syntactic and semantic types & relations (parse tree)
  – pinpoint correct syntactic relation
  – pinpoint correct semantic type
  – QA typology (140 types)
- Surface answer patterns (patterns)

[Parse-tree figure: sentences [1] "Lee Harvey Oswald allegedly shot and killed Pres. John Kennedy..." and [2] "Jack Ruby, who killed John F. Kennedy assassin Lee Harvey Oswald", annotated with constituent labels (PRED, SUBJ, OBJ, MOD, DUMMY).]
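The first knowledge type above, orthographic rules, can be illustrated with a small sketch: candidate answers are filtered by regular expressions for the question's expected answer type. The rule set, type names, and `matches_type` helper are hypothetical, not TextMap's actual rules.

```python
import re

# Illustrative orthographic rules keyed by expected answer type (assumed names).
TYPE_RULES = {
    "ZIPCODE": re.compile(r"^\d{5}(-\d{4})?$"),
    "URL": re.compile(r"^https?://\S+$"),
    "YEAR": re.compile(r"^\d{4}$"),
}

def matches_type(candidate, qtype):
    """Keep a candidate answer only if it fits the expected answer type."""
    rule = TYPE_RULES.get(qtype)
    return bool(rule and rule.match(candidate))

print(matches_type("90292", "ZIPCODE"))
print(matches_type("90292", "URL"))
```

A real QA typology (the slide mentions 140 types) would pair each type with such checkers plus the parse-based constraints listed above.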

Language modeling?
- IR stage: as for IR
- Pinpointing stage: learn to generate Qs from As…?
  – for factoids: very brief Qs, very brief As… hard
  – for longer As (biographies, event descriptions, opinion descriptions…): better outlook
- 'Structured' language model: word sequence patterns
  – Learn patterns for each Qtype; apply to pinpoint answer (Soubbotin & Soubbotin 01)
  – Automated learning from web (Ravichandran & Hovy 02)
  – Eventually create FSMs with semantic and syntactic types
- "This is the LM for the semantics of birthdates!"

[Pattern table: precision-ranked surface patterns, e.g. BIRTHDATE: 1.0 "( - )", 0.85 "was born on,", 0.6 "was born in", 0.59 "was born", 0.53 "was born", 0.50 "– (", 0.36 "( -"; LOCATION: 1.0 "'s", 1.0 "regional : :", 1.0 "at the in", 0.96 "the in,", 0.92 "near in".]
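Applying learned surface patterns, in the spirit of Ravichandran & Hovy 02, can be sketched as follows. The precision values echo the slide's BIRTHDATE table, but the concrete regexes (with explicit name/answer slots) and the `find_birthdate` helper are illustrative assumptions, not the published patterns.

```python
import re

# Assumed precision-ranked BIRTHDATE patterns; {name} is the question topic,
# the capture group is the answer slot.
PATTERNS = [
    (1.0,  r"{name}\s*\((\d{{4}})\s*-"),              # "<NAME> (<ANSWER> -"
    (0.85, r"{name} was born on ([\w\s,]+\d{{4}})"),  # "<NAME> was born on <ANSWER>"
    (0.6,  r"{name} was born in (\d{{4}})"),          # "<NAME> was born in <ANSWER>"
]

def find_birthdate(name, texts):
    """Return the (precision, answer) pair of the highest-precision match."""
    best = None
    for prec, template in PATTERNS:
        pattern = template.format(name=re.escape(name))
        for text in texts:
            m = re.search(pattern, text)
            if m and (best is None or prec > best[0]):
                best = (prec, m.group(1).strip())
    return best

texts = ["Wolfgang Amadeus Mozart (1756 - 1791) was a composer.",
         "Mozart was born in 1756."]
print(find_birthdate("Mozart", texts))
```

This is the sense in which such a pattern list acts as a language model for the semantics of birthdates: it assigns high scores to exactly the word sequences in which a birthdate answer appears.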

Moving beyond factoids (easier → harder)
- Structured non-factoid answers: biographies, event stories, opinion 'arguments', etc.
  – Multi-doc summarization
- Answer 'qualifiers': tense, hypotheticals, negation… ("who is the president?" – when?)
  – Linguistics work
- Non-structured long answers
  – Text planning?
- Inference
  – AI? / KR?

Challenges for QA
- Remembering what you learned today; adding that to some (structured) knowledge repository
- Complex answers (and extend QA typology)
- Answer validity / trustworthiness
- Merging answer (pieces) from multiple media sources (speech, databases, etc.)
- Learning the LM / structure for any type of non-factoid answer, moving to more complex models:
  – bag-of-words
  – ngram distributions
  – patterns
  – schemas/templates (decomposition & recomposition)
  – ? user's known-fact list

Thank you