The Winograd Schema Challenge Hector J. Levesque AAAI, 2011
Presented by Oshin Agarwal, 4 Feb 2019

Problem - Can machines think?
The Turing test: the system needs to assume a person's identity.
What counts as acceptable conversation? Evasive, indirect answers; short, minimal replies.
Evaluation is subjective.

Problem - Can machines think? Textual entailment
Pros: no evasiveness; easy evaluation.
Cons: we would have to explain to people what entailment is.
Can we have something that everyone would understand naturally? The Winograd schema.

Outline
WS Contributions
WS Examples
WS Properties
WS Challenges
Future work/questions

Winograd Schema - Contributions
Small binary questions that any English-speaking individual should be able to answer.
The questions require thinking and cannot be answered even with access to a large corpus.
The reader needs to disambiguate a pronoun based on commonsense.
Pros:
Questions are easy for humans to understand.
No trickery or evasiveness can work.
Answers can be evaluated quickly and automatically.

Winograd Schema – Example I
I poured water from the bottle into the cup until it was full. What was full? (Answer: the cup)
I poured water from the bottle into the cup until it was empty. What was empty? (Answer: the bottle)

Winograd Schema – Example II
Paul tried to call George on the phone but he wasn't successful. Who was not successful? (Answer: Paul)
Paul tried to call George on the phone but he wasn't available. Who was not available? (Answer: George)

Winograd Schema - Properties
Two similar parties (male/female/object).
A pronoun is used that could refer to either.
A question is asked that requires disambiguating the pronoun.
The sentence contains a special word which, when replaced by an alternate word, changes the answer to the question.
The special and alternate words need not be opposites.
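The properties above can be captured in a small data structure. A minimal sketch in Python (the field and class names are my own, for illustration; they do not come from the paper):

```python
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    """One schema: a sentence template whose pronoun resolution
    flips when the special word is swapped for the alternate."""
    template: str        # contains "{word}" where special/alternate goes
    question: str        # also contains "{word}"
    candidates: tuple    # the two similar parties
    special: str         # e.g. "full"
    alternate: str       # e.g. "empty"
    answer_special: str  # correct candidate under the special word
    answer_alternate: str  # correct candidate under the alternate word

    def sentence(self, use_alternate=False):
        word = self.alternate if use_alternate else self.special
        return self.template.format(word=word)

# Example I from the talk, encoded in this structure.
bottle_cup = WinogradSchema(
    template="I poured water from the bottle into the cup until it was {word}.",
    question="What was {word}?",
    candidates=("the bottle", "the cup"),
    special="full", alternate="empty",
    answer_special="the cup", answer_alternate="the bottle",
)
```

Swapping `special` for `alternate` leaves the candidates untouched but flips the correct answer, which is exactly the property that defeats shallow statistical cues.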

What do we need to solve WS?
Lexical knowledge - language models? distributional representations?
Structural understanding - syntax? parsing?
Background knowledge - knowledge bases?
John asked George a question, but he was reluctant to [answer/repeat] it. Who was reluctant to [answer/repeat] it?
George asked John a question, but he was reluctant to [answer/repeat] it. Who was reluctant to [answer/repeat] it?
Needed here: the meaning of the words and the sentence; the knowledge that a question needs to be answered; the sentence structure, to decide who asked the question and who should answer it.

Solving the WS Problems
"Resolving Complex Cases of Definite Pronouns" (Rahman & Ng, 2012): rank the two candidate antecedents for the target pronoun, using features based on complex linguistic rules. http://www.hlt.utdallas.edu/~vince/papers/emnlp12.pdf
"Solving Hard Coreference Problems" (Peng, Khashabi & Roth, 2015): treat WS as coreference resolution or clustering. http://cogcomp.org/page/publication_view/762
"The Winograd Schema Challenge and Reasoning about Correlation" (Bailey et al., 2015): determine the correlation between different phrases in the sentence after substituting the correct answer. https://www.cs.utexas.edu/~vl/papers/wsc_SS7.pdf

Solving the WS Problems
"Combing Context and Commonsense Knowledge Through Neural Networks for Solving Winograd Schema Problems" (Liu et al., 2017): learn word representations from the current context and a knowledge base, then predict the correct antecedent for the pronoun either by semantic similarity or with a neural network. https://www.aaai.org/ocs/index.php/SSS/SSS17/paper/viewFile/15392/14552
"Addressing the Winograd Schema Challenge as a Sequence Ranking Task" (Opitz & Frank, 2018): replace the pronoun by each antecedent and generate a plausibility score for each resulting sentence with a neural network. http://aclweb.org/anthology/W18-4105
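The substitution-and-ranking idea behind Opitz & Frank can be sketched generically: substitute each candidate antecedent for the pronoun and keep the candidate whose sentence scores as more plausible. The scorer below is a hand-written stand-in for a trained language model, and the function names are illustrative assumptions, not from the paper:

```python
import re

def resolve_by_ranking(sentence, pronoun, candidates, plausibility):
    """Substitute each candidate antecedent for the pronoun and
    return the candidate whose sentence is judged more plausible."""
    best_score, best = float("-inf"), None
    for cand in candidates:
        # Replace only the first whole-word occurrence of the pronoun.
        variant = re.sub(r"\b%s\b" % re.escape(pronoun), cand, sentence, count=1)
        score = plausibility(variant)
        if score > best_score:
            best_score, best = score, cand
    return best

# Toy plausibility function: a hand-written preference standing in for a
# language model's sentence score (an assumption for illustration only).
def toy_plausibility(sent):
    return 1.0 if "the cup was full" in sent else 0.0

sentence = "I poured water from the bottle into the cup until it was full."
print(resolve_by_ranking(sentence, "it", ["the bottle", "the cup"],
                         toy_plausibility))  # prints "the cup"
```

A real system would plug a neural plausibility model in where `toy_plausibility` sits; the substitution-and-argmax skeleton stays the same.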

Winograd schema in other languages Chinese - https://cs.nyu.edu/davise/papers/WinogradSchemas/WSChinese.html French - http://www.llf.cnrs.fr/winograd-fr Japanese - http://arakilab.media.eng.hokudai.ac.jp/~kabura/collection_ja.html

Challenges with WS - Coming up with questions!
Answers can be too obvious because of properties of the two parties.
The women stopped taking the pills because they were [pregnant/carcinogenic]. Which individuals were [pregnant/carcinogenic]?
Women can't be carcinogenic; pills can't be pregnant.
Such information can easily be learnt from a large corpus via co-occurrence statistics, so such questions are excluded from WS datasets.
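This "too obvious" failure mode can itself be checked automatically: if one candidate co-occurs with the special word far more often than the other in a large corpus, the answer is recoverable from statistics alone. A toy sketch, with invented counts standing in for real corpus statistics (the threshold and numbers are assumptions for illustration):

```python
def too_obvious(counts, cand_a, cand_b, word, ratio=10.0):
    """Flag a schema whose answer is recoverable from raw
    co-occurrence statistics: one candidate dominates the other
    in co-occurrence with the special word."""
    a = counts.get((cand_a, word), 0) + 1  # add-one smoothing
    b = counts.get((cand_b, word), 0) + 1
    return max(a, b) / min(a, b) >= ratio

# Invented counts standing in for corpus co-occurrence statistics.
counts = {("women", "pregnant"): 5000, ("pills", "pregnant"): 2,
          ("cup", "full"): 800, ("bottle", "full"): 750}

print(too_obvious(counts, "women", "pills", "pregnant"))  # True: lopsided
print(too_obvious(counts, "cup", "bottle", "full"))       # False: balanced
```

The pregnant/carcinogenic example is flagged because "women pregnant" swamps "pills pregnant", while the bottle/cup example survives because both candidates plausibly co-occur with "full".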

Challenges with WS - Coming up with questions!
Questions need to be Google-proof, and candidate phrases may fail to be.
Many astronomers are engaged in the search for distant galaxies. They are spread all over the [earth/universe]. What are spread all over the [earth/universe]?
"Galaxies are spread all over the universe" may be found on the web: a direct statement of the fact, though the system would still have to find it and use it.
"There are many galaxies found throughout the universe" may also be found: the system would have to understand that it means the same as the question.
Even though such questions could still be difficult, they are excluded from WS datasets.

Challenges with WS - Coming up with questions!
The human subjects on whom the questions are tested may lack the required knowledge.
The large ball crashed through the table because it was made of [steel/Styrofoam]. What was made of [steel/Styrofoam]?
Some people might not know what Styrofoam is, so such questions are excluded, even though a system might handle them if they appeared in the large corpus it used.

Summary
The Turing test might not be sufficient to say machines are intelligent: it involves deception, and evaluation is subjective.
The Winograd Schema is an alternative test: binary questions involving pronoun disambiguation; no deception; easy to evaluate; but difficult to come up with questions.

Future work/questions
What does it mean if a system can solve WS problems? Can we say machines are intelligent now? Is solving WS-type problems enough?
Different works formulate this problem differently. Is a system doing well only because it was modelled a certain way, without the intelligence or knowledge for other tasks?
How should we compare evaluations of different systems? A classification formulation might use plain accuracy, whereas a coreference formulation would use MUC/B3/CEAF.
Is evaluating observable behaviour enough, or do we need to know how the machine is reasoning internally?