A Naturalistic, Functional Approach to NLU
November 2008
Jerry Ball, Air Force Research Laboratory

2 Introduction
By Naturalistic, I mean…
– Models language behavior below the level of input-output behavior
  Inside the cognitive “black box” (Ball, 2006), but above the neural level
– Adheres to well-established cognitive constraints on human language representation and processing

3 Introduction
By Naturalistic, I mean…
– Avoids computational techniques which are obviously not cognitively plausible, e.g.:
  Algorithmic backtracking
  Requiring the full input in advance
  Strictly autonomous processing modules – staged part-of-speech tagging followed by parsing
  Using the right context to make parsing decisions
  Full unification (unlimited depth of recursion)
  Backward inferencing (running productions in reverse)

4 Introduction
By Functional, I mean…
– Handles a broad range of linguistic inputs
  Not limited to some specialized collection of inputs which tests some isolated psycholinguistic phenomenon or models a toy world
  Doesn’t assume away lexical and structural ambiguity
– Supports the addition of linguistic categories and mechanisms, as needed, to model a broad range of inputs
  Functionally motivated linguistic categories
– Focus on meaning, not just form
– Intended for use in real-world applications
  Synthetic Air Vehicle Operator (AVO) Teammate project

5 Introduction
Empirically validated at a gross level
– Small-scale laboratory studies conducted without a functional system in place are likely to be counter-productive
  They don’t generalize well to more complex systems
– From the functionalist perspective, it is premature to enforce minimalist assumptions in the absence of a functional model
– Ockham’s Razor may well be inappropriate
  Ockham’s Razor favors the simplest model that covers a set of phenomena, but does not simultaneously favor modeling the simplest set of phenomena (Roelofs, 2005)

6 Key Assumption
Given the inherently human nature of language processing, adhering to well-established cognitive constraints may actually facilitate development by pushing development in directions that are more likely to be successful
– Short-term costs associated with adherence to cognitive constraints will ultimately yield long-term benefits
  The system for handling variability in word input form (e.g. H-AREA, h-area, H Area, harea) also supports processing of multi-word expressions (e.g. “kick the bucket”) – see the sketch below
– You don’t know what you’re giving up when you adopt cognitively implausible mechanisms
  The Microsoft parser processes input from right to left!
  – It can’t be integrated with speech recognition systems
  – The full input is required in advance
  – It can’t be used in interactive applications
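A minimal sketch of the shared-mechanism claim, assuming invented names throughout (LEXICON, MWES, normalize, match_mwe are illustrative, not the model’s actual code): one normalize-then-longest-match lookup handles both word-form variants and multi-word expressions.

```python
import re

# Hypothetical lexicon: form variants map onto one canonical entry.
LEXICON = {"h-area": "H-AREA", "h area": "H-AREA", "harea": "H-AREA"}

# Hypothetical multi-word expressions, stored as token sequences.
MWES = {("kick", "the", "bucket"): "KICK-THE-BUCKET"}

def normalize(token: str) -> str:
    """Collapse case and separator variation before lexical lookup."""
    key = re.sub(r"[\s_]+", " ", token.strip().lower())
    return LEXICON.get(key) or LEXICON.get(key.replace(" ", "-")) or key

def match_mwe(tokens: list[str], i: int):
    """Longest-match lookup at position i: the same mechanism that
    canonicalizes single-word variants also recognizes MWEs."""
    for length in range(len(tokens) - i, 1, -1):  # prefer the longest span
        span = tuple(normalize(t) for t in tokens[i:i + length])
        if span in MWES:
            return MWES[span], length
    return None, 1

print(normalize("H Area"))                            # -> H-AREA
print(match_mwe(["to", "kick", "the", "bucket"], 1))  # -> ('KICK-THE-BUCKET', 3)
```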

7 Constraints on Human Language Processing
Visual World Paradigm (Tanenhaus et al. 1995)
– Subjects are presented with a visual scene
– Subjects listen to auditory linguistic input describing the scene
Immediate determination of meaning
– Subjects look immediately at the referents of linguistic expressions, sometimes before the end of the expression
Incremental processing
Interactive, highly context-sensitive processing (Trueswell et al. 1999)
– Ambiguous expressions are processed consistent with the scene
  “the green…”
  “put the arrow on the paper into the box”

8 Constraints on Human Language Processing
Largely serial and deterministic
– Empirical evidence that we don’t retract previously built representations (Christianson et al. 2001)
  “While Mary dressed the baby sat up on the bed”
– Empirical evidence that we don’t carry forward multiple representations in parallel
– Garden path sentences: “The horse raced past the barn fell” (Bever 1970)
Some evidence of parallelism
– Empirical evidence that we may carry forward multiple representations in parallel
– Garden path effects can be eliminated with sufficient context
Sensitive to frequency of language experience
Limited recursive capabilities (no unbounded stack)
– Center-embedded constructions are extremely difficult to process
  “The mouse the cat the dog chased bit ate the cheese”

9 Linguistic Representations
Psycholinguistic studies reveal little about linguistic representations
– Levelt’s early studies are an exception
However, if language processing is highly context sensitive, then linguistic representations are likely to reflect this…
– No autonomous syntactic processing → no strictly syntactic representations

10 Linguistic Representations
Encode syntactic, functional and linguistically relevant semantic information
No sharp distinction between syntax and semantics (or pragmatics)
– Most form-based variation is functional and meaningful
Linguistic categories are functionally motivated
– Handling wh-questions requires mechanisms for recognizing the fronted wh-expression and binding the fronted expression to a trace of an implicit argument (or equivalent functionality)
  “What₁ did he do t₁?”

11 Linguistic Representations
Two key dimensions of meaning which get grammatically encoded are Referential and Relational meaning (Double R Grammar)
– X-Bar Semantics:
  (Ref-Pt) + Spec + Head → Referring Expression (aka Maximal Projection)
  Rel-Head + Complements → Relational Expression
– Nominals refer to objects → Object Referring Expression
– Clauses refer to situations → Situation Referring Expression
Encoding additional dimensions of meaning leads to more complex grammatical representations
– Topic/Focus
– Given/New

12 “Who did he kick the ball to?” – Wh-question
[Parse tree figure: Wh-question → Wh-focus; labels for Operator-Specifier, Subject, Head; annotations mark part of speech and major grammatical unit]
Flat representations akin to Simpler Syntax and Construction Grammar

13 “Who did he kick the ball to?” – Wh-question
[Parse tree figure highlighting grammatical functions: Head; Specifier – Operator Specifier; Modifier – Post-head Modifier; Complement – Subject, Object…]
Functional categories from X-Bar Theory are explicitly represented

14 “Who did he kick the ball to?” – Wh-question
[Parse tree figure highlighting the referring expressions]
All referring expressions have a bind-indx slot

15 “Who did he kick the ball to?” – Wh-question
[Parse tree figure highlighting a relation and its complement]
Relations (verb, preposition, adjective, adverb) take 1 to 4 complements (subj, obj, iobj, sit-comp, loc-comp)

16 “Who did he kick the ball to?” – Wh-question
[Parse tree figure highlighting semantic features]

17 “Who did he kick the ball to?” – Wh-question
[Parse tree figure: the trace (trace-*1*) carries the index *1* of the fronted wh-expression]
The implicit object of the preposition binds to the fronted wh-obj-refer-expr
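To make slides 12–17 concrete, here is a minimal sketch of what such a flat representation might look like as a data structure. This is an assumed rendering, not the model’s actual ACT-R chunks; the slot names follow the slides (bind-indx, wh-focus, loc-comp), but the structure is illustrative.

```python
# "Who did he kick the ball to?" as a flat, slot-filler structure.
who = {"type": "wh-obj-refer-expr", "head": "who", "bind-indx": "*1*"}

wh_question = {
    "type": "wh-question",            # a situation referring expression
    "wh-focus": who,                  # fronted wh-expression
    "op-spec": "did",                 # operator-specifier
    "subject": {"type": "obj-refer-expr", "head": "he", "bind-indx": "*2*"},
    "head": {                         # relation: takes 1 to 4 complements
        "type": "trans-verb", "head": "kick",
        "object": {"type": "obj-refer-expr", "spec": "the",
                   "head": "ball", "bind-indx": "*3*"},
        "loc-comp": {"type": "prep", "head": "to",
                     # implicit object of the preposition: a trace that
                     # carries the index of the fronted wh-expression
                     "object": {"type": "trace", "bind-indx": "*1*"}},
    },
}

# Resolving the trace is just following the shared bind-indx.
trace = wh_question["head"]["loc-comp"]["object"]
assert trace["bind-indx"] == who["bind-indx"]   # *1* binds to "who"
```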

18 “Well-Established” Cognitive Constraint
At a gross level, humans process language incrementally in real time → performance cannot slow down with the length of the input
Non-determinism must somehow be managed at Marr’s algorithmic level
– Via parallel processing
  Spreading activation
– Via non-monotonic processing
  Context accommodation
  Heuristics
– Using probabilities
– (Restricted language)

19 Language Processing in the Model
“Nearly” deterministic serial processing (integration) without backtracking or lookahead!
A parallel, probabilistic, spreading activation mechanism (activation and selection) proposes linguistic constructions which are likely to be correct given the current input and prior context – highly context sensitive (see the sketch below)
If the current input is unexpected given the prior context, then accommodate the input without backtracking
The following example is from the Language Processing Model: “no airspeed or altitude restrictions”
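A rough illustration of activation and selection (a simplified assumption, not the ACT-R implementation; the candidate names, frequencies, and cues are invented): candidate constructions accumulate graded evidence from the current word and prior context in parallel, and the single most active candidate is handed to serial integration.

```python
def select_construction(word, context, candidates):
    """Return the most active candidate construction (winner-take-all)."""
    def activation(c):
        base = c["frequency"]          # frequency of language experience
        cues = sum(1 for cue in c["cues"] if cue == word or cue in context)
        return base + cues             # graded, context-sensitive evidence
    return max(candidates, key=activation)

# Hypothetical candidates with invented frequencies and cues.
candidates = [
    {"name": "nominal-construction", "frequency": 2, "cues": ["no", "the"]},
    {"name": "clause-construction",  "frequency": 1, "cues": ["he", "did"]},
]
print(select_construction("no", [], candidates)["name"])  # nominal-construction
```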

20 no
“no” → object specifier → object referring expression = nominal construction

21 no airspeed
“airspeed” → object head (integration)

22 no airspeed or altitude
“airspeed or altitude” → object head
Accommodation of the conjunction via function overriding (override)

23 no airspeed or altitude restrictions
“airspeed or altitude” → modifier; “restrictions” → object head
Appearance of parallel processing! (“airspeed or altitude” = head vs. “airspeed or altitude” = mod)
Accommodation of the new head via function shift (shift) – see the sketch below
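The walkthrough in slides 20–23 can be condensed into a small sketch of accommodation without backtracking. The representation and control flow are illustrative assumptions, not the model’s productions; the point is that the conjunction triggers function overriding and the late-arriving head triggers function shift, with nothing retracted.

```python
def process(tokens):
    nominal = {"spec": None, "mod": [], "head": None}
    for tok in tokens:
        if tok == "no":                       # "no" -> object specifier,
            nominal["spec"] = tok             #   projecting a nominal
        elif tok == "or":
            # Function overriding: the current head becomes the first
            # conjunct of a conjoined head, without retracting anything.
            nominal["head"] = {"conj": "or", "first": nominal["head"],
                               "second": None}
        elif isinstance(nominal["head"], dict) and nominal["head"]["second"] is None:
            nominal["head"]["second"] = tok   # complete the conjunction
        elif nominal["head"] is None:
            nominal["head"] = tok             # integrate as object head
        else:
            # Function shift: a new head demotes the old head to modifier.
            nominal["mod"].append(nominal["head"])
            nominal["head"] = tok
    return nominal

print(process("no airspeed or altitude restrictions".split()))
# {'spec': 'no', 'mod': [{'conj': 'or', 'first': 'airspeed',
#   'second': 'altitude'}], 'head': 'restrictions'}
```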

24 Combining Serial, Deterministic and Parallel, Probabilistic Mechanisms
[Diagram: frameworks arranged along a range from non-deterministic to nearly deterministic, each pairing a parallel, probabilistic stage with a serial, deterministic stage –
  CFG: rule selection & application
  PCFG: rule selection / rule application
  Lexicalized PCFG: lexical rule selection / rule application
  Probabilistic LTAG: tree supertagging / supertag stapling
  Double R: construction activation & selection / construction integration
  PDP: parallel distributed processing]
The parallel probabilistic substrate makes a nearly deterministic serial processing mechanism possible!

25 Some Pitfalls to Avoid
Risk of becoming detached from empirical reality
– The Competence/Performance distinction allowed generative grammarians to ignore performance
  No theory of performance
  Not constrained by a computational implementation
– The Core/Peripheral distinction exacerbates the problem
  No sharp distinction between core and peripheral grammar – language is full of pseudo-regular constructions
  No sharp distinction between lexicon and grammar
– Grammaticality judgements are the primary empirical tool
  OK as a gross-level tool if used judiciously, but not exclusively

26 Our Empirical Reality

27 Some Pitfalls to Avoid
Computational linguistic systems which use machine learning techniques to identify linguistic categories are at risk of overfitting the data
– Trade-off between simplicity and fit (Tenenbaum, 2007)
– The Bikel reimplementation of the Collins parser learns rules like “if the noun following the verb is ‘milk’, attach low; else attach high,” based on a single occurrence of “milk” following a verb in the Penn Treebank corpus where “milk” was annotated as attaching low (Fong, 2007)
– On our corpus, the Brill part-of-speech tagger tagged “airspeed” as a verb based on the “-ed” ending, due to over-reliance on morphological information and the lack of context for when to apply the rule
  It is silly to tag “airspeed” in “the airspeed” as a verb! (See the sketch below.)
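A toy illustration of the “airspeed” failure (the tags and rules are invented for the example; this is not Brill’s actual rule set): a suffix-only rule fires on the ending alone, while even a single left-context check blocks the verb reading after a determiner.

```python
def suffix_only_tag(word: str) -> str:
    """Morphology-only rule: anything ending in 'ed' looks like a past-tense verb."""
    return "VBD" if word.endswith("ed") else "NN"   # fires on "airspeed"!

def context_tag(word: str, prev_tag: str) -> str:
    """Same rule, but constrained by the preceding tag."""
    tag = suffix_only_tag(word)
    if tag == "VBD" and prev_tag == "DT":           # "the airspeed" ...
        tag = "NN"                                  # ... cannot be a verb
    return tag

print(suffix_only_tag("airspeed"))     # VBD  (the silly tagging)
print(context_tag("airspeed", "DT"))   # NN   (blocked by the determiner)
```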

28 The Problem of Complexity
Manual development may be overcome by inherent complexity
– Computational linguistic systems built using machine learning techniques outperform manually built systems on large corpora, but provide only superficial analysis
Overcoming complexity may require
– Better theories
  Staged models of language processing were never practical for large systems – too much non-determinism, and errors at lower levels get propagated to higher levels!
– Integrating statistical and manual techniques
  Use statistical mechanisms to compute frequencies and probabilities over theoretically motivated linguistic categories (see the sketch below)
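A minimal sketch of the proposed integration, under the assumption that a corpus has been hand-annotated with theoretically motivated construction labels (the tiny corpus and labels below are invented): the statistics are computed over those categories rather than over learned surface configurations.

```python
from collections import Counter

# Invented example corpus: spans annotated with Double R-style categories.
corpus = [("no airspeed restrictions", "obj-refer-expr"),
          ("the ball",                 "obj-refer-expr"),
          ("kick the ball",            "trans-verb-expr")]

counts = Counter(label for _, label in corpus)
total = sum(counts.values())
probabilities = {label: n / total for label, n in counts.items()}
print(probabilities)  # relative frequencies, used to bias construction selection
# {'obj-refer-expr': 0.666..., 'trans-verb-expr': 0.333...}
```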

29 Conclusions
A Naturalistic, Functional approach to NLU has much to recommend it
Adhering to well-established cognitive constraints pushes development in directions that are more likely to be successful
What is needed is a demonstration that the approach is capable of delivering a functional system that is cognitively plausible…

30 Questions?

31 References
Ball, J. (2006). Can NLP Systems be a Cognitive Black Box? In Papers from the AAAI Spring Symposium, Technical Report SS-06-02, 1-6. Menlo Park, CA: AAAI Press.
Ball, J. (2007). A Bi-Polar Theory of Nominal and Clause Structure and Function. Annual Review of Cognitive Linguistics.
Ball, J. (2007). Construction-Driven Language Processing. Proceedings of the 2nd European Cognitive Science Conference.
Ball, J., Heiberg, A. & Silber, R. (2007). Toward a Large-Scale Model of Language Comprehension in ACT-R 6. Proceedings of the 8th International Conference on Cognitive Modeling.
Heiberg, A., Harris, J. & Ball, J. (2007). Dynamic Visualization of ACT-R Declarative Memory Structure. Proceedings of the 8th International Conference on Cognitive Modeling.

32 Other References
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C. & Qin, Y. (2004). An Integrated Theory of the Mind. Psychological Review, 111(4).
Bever, T. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (ed.), Cognition and Language Development. New York: Wiley.
Christianson et al. (2001). Thematic roles assigned along the garden path linger. Cognitive Psychology, 42.
Cooke, N. & Shope, S. (2005). Synthetic Task Environments for Teams: CERTT’s UAV-STE. Handbook on Human Factors and Ergonomics Methods. Boca Raton, FL: CLC Press, LLC.
Prince, A. & Smolensky, P. (1993/2004). Optimality Theory: Constraint interaction in generative grammar. Tech Report, Rutgers University & University of Colorado at Boulder. Revised version published by Blackwell; Rutgers Optimality Archive 537.
Tanenhaus et al. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268.
Trueswell, J., Sekerina, I., Hill, N. & Logrip, M. (1999). The kindergarten path effect: studying on-line sentence processing in young children. Cognition, 73.

33 Some Pitfalls to Avoid
Typical computational linguistic systems perform only low-level analysis of the linguistic input
– “most of the current research on statistical NLP is focused on shallow syntactic analysis, due to the difficulty of modeling deep analysis with basic statistical learning algorithms” (Shen, 2006)
– Sergei & Marge’s system is an exception!

34 Some Pitfalls to Avoid
Risk of proliferation of functional elements
– Incremental addition of categories for each new phenomenon of study can be explosive
– Too many levels of representation and hidden elements in pre-minimalist generative grammar based representations
  No psychological “face validity” (cf. Ferreira, 2000)
  How can hidden elements be learned?
– The Minimalist Program is attempting to simplify grammar to redress the language acquisition problem
  Explanatory adequacy

35 Some Pitfalls to Avoid
Trade-off between simplicity and fit (Tenenbaum, 2007)
– The simplest theory will seldom be the best fit, but we don’t want to overfit the data
– Minimalist syntax is a much simpler theory than its predecessors, but it is a poor fit to much of the linguistic data that earlier theories handled (Culicover & Jackendoff, 2005)
  Descriptive adequacy has been sacrificed in pursuit of a “perfect” system of core grammar

36 Some Pitfalls to Avoid
Culicover and Jackendoff’s Simpler Syntax redresses empirical and functional shortcomings of generative grammar by simplifying syntax and adding a generative semantic component
– Not all meaning distinctions must be represented syntactically → syntax can be simplified
  Scope of quantification, noun-noun combination, binding
– By complicating semantic representations, and the interface between semantic and syntactic representations, syntactic representations can be simplified without loss of empirical coverage
Is overall complexity reduced?