A Naturalistic, Functional Approach to NLU November 2008 Jerry Ball Air Force Research Laboratory
2 Introduction By Naturalistic, I mean… – Models language behavior below the level of input-output behavior Inside the cognitive “black box” (Ball, 2006) (But above the neural level) – Adheres to well established cognitive constraints on human language representation and processing
3 Introduction By Naturalistic, I mean… – Avoids computational techniques which are obviously not cognitively plausible, e.g. Algorithmic backtracking Requiring the full input in advance Strictly autonomous processing modules – Staged part of speech tagging followed by parsing Using the right context to make parsing decisions Full unification (unlimited depth of recursion) Backward inferencing (running productions in reverse)
4 Introduction By Functional, I mean… – Handles a broad range of linguistic inputs Not limited to some specialized collection of inputs which tests some isolated psycholinguistic phenomenon or models a toy world Doesn’t assume away lexical and structural ambiguity – Supports the addition of linguistic categories and mechanisms, as needed, to model a broad range of inputs Functionally motivated linguistic categories – Focus on meaning, not just form – Intended for use in real-world applications Synthetic Air Vehicle Operator (AVO) Teammate project
5 Introduction Empirically validated at a gross level – Small-scale laboratory studies conducted without a functional system in place are likely to be counter-productive (they don’t generalize well to more complex systems) – From the functionalist perspective, it is premature to enforce minimalist assumptions in the absence of a functional model – Ockham’s Razor may well be inappropriate: Ockham’s Razor favors the simplest model that covers a set of phenomena, but does not simultaneously favor modeling the simplest set of phenomena (Roelofs, 2005)
6 Key Assumption Given the inherently human nature of language processing, adhering to well-established cognitive constraints may actually facilitate development by pushing development in directions that are more likely to be successful – Short-term costs associated with adherence to cognitive constraints will ultimately yield long-term benefits System for handling variability in word input form (e.g. H-AREA h-area H Area harea) also supports processing of multi-word expressions (e.g. “kick the bucket”) – Don’t know what you’re giving up when you adopt cognitively implausible mechanisms Microsoft parser – processes input from right to left! – Can’t be integrated with speech recognition systems – Full input required in advance – Can’t be used in interactive applications
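The point about one mechanism serving both word-form variability and multi-word expressions can be illustrated with a minimal sketch. It assumes, purely for illustration, that variants are collapsed to a shared key by lowercasing and dropping spaces and hyphens; the slides do not describe how the model actually does this, and `normalize_form`, `LEXICON` and `lookup` are hypothetical names.

```python
import re
from typing import Optional


def normalize_form(text: str) -> str:
    """Collapse case, hyphens and whitespace so surface variants share one key."""
    return re.sub(r"[\s\-]+", "", text.lower())


# Hypothetical lexicon keyed by normalized form; the entries are illustrative only.
LEXICON = {
    normalize_form("H-AREA"): "h-area",
    normalize_form("kick the bucket"): "kick-the-bucket",
}


def lookup(text: str) -> Optional[str]:
    """Return the lexical entry for any surface variant, or None if unknown."""
    return LEXICON.get(normalize_form(text))
```

Under this assumption, `lookup("H Area")` and `lookup("harea")` reach the same entry, and a multi-word expression such as “kick the bucket” is retrieved by the same key lookup as a single word.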
7 Constraints on Human Language Processing Visual World Paradigm (Tanenhaus et al. 1995) – Subjects presented with a visual scene – Subjects listen to auditory linguistic input describing scene Immediate determination of meaning – Subjects look immediately at referents of linguistic expressions, sometimes before end of expression Incremental processing Interactive, highly context-sensitive processing (Trueswell et al. 1999) – Ambiguous expressions are processed consistent with scene “the green…” “put the arrow on the paper into the box”
8 Constraints on Human Language Processing Largely serial and deterministic – Empirical evidence that we don’t retract previously built representations (Christianson et al. 2001) “While Mary dressed the baby sat up on the bed” – Empirical evidence that we don’t carry forward multiple representations in parallel – Garden Path Sentences “The horse raced past the barn fell” (Bever 1970) Some evidence of parallelism – Empirical evidence that we may carry forward multiple representations in parallel – Garden Path Effects can be eliminated with sufficient context Sensitive to frequency of language experience Limited recursive capabilities (no unbounded stack) – Center embedded constructions are extremely difficult to process “The mouse the cat the dog chased bit ate the cheese”
9 Linguistic Representations Psycholinguistic studies reveal little about linguistic representations – Levelt’s early studies are an exception However, if language processing is highly context sensitive, then linguistic representations are likely to reflect this… – No autonomous syntactic processing, so no strictly syntactic representations
10 Linguistic Representations Encode syntactic, functional and linguistically relevant semantic information No sharp distinction between syntax and semantics (or pragmatics) – Most form-based variation is functional and meaningful Linguistic categories are functionally motivated – Handling wh-questions requires mechanisms for recognizing the fronted wh-expression and binding the fronted expression to a trace of an implicit argument (or equivalent functionality) What_1 did he do t_1?
11 Linguistic Representations Two key dimensions of meaning which get grammatically encoded are Referential and Relational meaning (Double R Grammar) – X-Bar Semantics: (Ref-Pt) + Spec + Head → Referring Expression (aka Maximal Projection); Rel-Head + Complements → Relational Expression – Nominals refer to objects → Object Referring Expression – Clauses refer to situations → Situation Referring Expression Encoding additional dimensions of meaning leads to more complex grammatical representations – Topic/Focus – Given/New
12 Who did he kick the ball to? [Figure: tree diagram of the wh-question; node labels include Wh-question, Wh-focus, Operator-Specifier, Subject and Head; callouts mark part of speech and major grammatical unit] Flat representations akin to Simpler Syntax and Construction Grammar
13 Who did he kick the ball to? [Figure: wh-question diagram with a callout marking grammatical function] Grammatical functions: Head; Specifier -- Operator Specifier; Modifier -- Post-head Modifier; Complement -- Subject, Object… Functional categories from X-Bar Theory explicitly represented
14 Who did he kick the ball to? [Figure: wh-question diagram with a callout marking referring expression] All referring expressions have a bind-indx slot
15 Who did he kick the ball to? Wh-question relation Relations (verb, preposition, adjective, adverb) take 1 to 4 complements (subj, obj, iobj, sit-comp, loc-comp) complement
16 Who did he kick the ball to? Wh-question semantic feature
17 Who did he kick the ball to? Implicit object of preposition binds to fronted wh-obj-refer-expr Wh-question trace-*1* *1*
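The binding step on this slide can be sketched as a small data structure. This is a minimal illustration, assuming that binding amounts to giving the fronted wh-expression and the implicit object the same index in their bind-indx slots; the class and function names are hypothetical and are not taken from the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferringExpression:
    text: str                        # surface form; "" for an implicit (trace) argument
    bind_indx: Optional[int] = None  # mirrors the bind-indx slot named on the slides


def bind_trace(fronted: ReferringExpression,
               trace: ReferringExpression,
               index: int) -> None:
    """Co-index the fronted expression and the implicit argument (assumed mechanism)."""
    fronted.bind_indx = index
    trace.bind_indx = index


# "Who did he kick the ball to?": the implicit object of "to" binds to "who".
who = ReferringExpression("who")
implicit_object = ReferringExpression("")
bind_trace(who, implicit_object, 1)
```

After the call, both expressions carry index 1, which corresponds to the *1* and trace-*1* labels in the diagram.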
18 “Well-Established” Cognitive Constraint At a gross level, humans process language incrementally in real time, so performance cannot slow down with length of input Non-determinism must somehow be managed at Marr’s algorithmic level – Via parallel processing Spreading activation – Via non-monotonic processing Context accommodation Heuristics – Using probabilities – (Restricted language)
19 Language Processing in the Model “Nearly” deterministic serial processing (integration) without backtracking or lookahead! Parallel, probabilistic, spreading activation mechanism (activation and selection) proposes linguistic constructions which are likely to be correct given current input & prior context – highly context sensitive If current input is unexpected given the prior context, then accommodate the input without backtracking The following example is from the Language Processing Model – “no airspeed or altitude restrictions”
20 no “no” object specifier object referring expression = nominal construction
21 no airspeed “airspeed” object head integration
22 no airspeed or altitude “airspeed or altitude” object head Accommodation of conjunction via function overriding override
23 no airspeed or altitude restrictions “airspeed or altitude” modifier “restrictions” object head Appearance of parallel processing! airspeed or altitude = head vs. airspeed or altitude = mod Accommodation of new head via function shift shift
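The four steps above can be sketched as a word-by-word integrator that never backtracks. This is a toy reconstruction under stated assumptions: the word lists, the `Nominal` class and the `integrate` and `process` functions are hypothetical, and the activation and selection stage that proposes constructions in the model is not represented at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy word lists, invented for this example only.
SPECIFIERS = {"no", "the", "a"}
CONJUNCTIONS = {"or", "and"}


@dataclass
class Nominal:
    specifier: Optional[str] = None
    modifiers: List[str] = field(default_factory=list)
    head: Optional[str] = None
    pending_conj: Optional[str] = None  # conjunction waiting for its second conjunct


def integrate(nominal: Nominal, word: str) -> Nominal:
    """Integrate one word; earlier structure is adjusted, never discarded."""
    if word in SPECIFIERS:
        nominal.specifier = word
    elif word in CONJUNCTIONS:
        nominal.pending_conj = word
    elif nominal.head is None:
        nominal.head = word                       # plain integration as head
    elif nominal.pending_conj is not None:
        # Function overriding: the conjoined phrase takes over the head function.
        nominal.head = f"{nominal.head} {nominal.pending_conj} {word}"
        nominal.pending_conj = None
    else:
        # Function shift: the old head becomes a modifier of the new head.
        nominal.modifiers.append(nominal.head)
        nominal.head = word
    return nominal


def process(words: List[str]) -> Nominal:
    nominal = Nominal()
    for word in words:
        integrate(nominal, word)
    return nominal
```

Calling `process("no airspeed or altitude restrictions".split())` passes through the same intermediate states as the slides: “airspeed” is the head, then “airspeed or altitude” is the head, and finally that phrase is shifted to modifier when “restrictions” arrives.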
24 Combining Serial, Deterministic and Parallel, Probabilistic Mechanisms [Figure: parsing approaches (CFG, PCFG, Lexicalized PCFG, Double R, Probabilistic LTAG, PDP) placed between Serial Deterministic and Parallel Probabilistic, along a range from Non-deterministic to Nearly Deterministic; mechanism labels: Rule Selection & Application; Rule Selection, Rule Application; Lexical Rule Selection, Rule Application; Tree Supertagging, Supertag Stapling; Construction Activation & Selection, Construction Integration; Parallel Distributed Processing] The parallel probabilistic substrate makes a nearly deterministic serial processing mechanism possible!
25 Some Pitfalls to Avoid Risk of becoming detached from empirical reality – Competence/Performance distinction allowed generative grammarians to ignore performance No theory of performance Not constrained by computational implementation – Core/Peripheral distinction exacerbates the problem No sharp distinction between core and peripheral grammar – Language full of pseudo-regular constructions No sharp distinction between lexicon and grammar – Grammaticality judgements are the primary empirical tool: an OK gross-level tool if used judiciously, but not exclusively
26 Our Empirical Reality
27 Some Pitfalls to Avoid Computational linguistic systems which use machine learning techniques to identify linguistic categories are at risk of overfitting the data – Trade-off between simplicity and fit (Tenenbaum, 2007) – The Bikel reimplementation of the Collins parser learns a rule like “if the noun following the verb is ‘milk’ attach low, else attach high” based on a single occurrence of “milk” following a verb in the Penn Treebank corpus where “milk” was annotated as attaching low (Fong, 2007) – On our corpus, the Brill part-of-speech tagger tagged “airspeed” as a verb based on the “ed” ending, due to over-reliance on morphological information and lack of context for when to apply the rule Silly to tag “airspeed” in “the airspeed” as a verb!
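The “airspeed” failure can be reproduced with a toy suffix rule. This is not the Brill tagger’s actual rule set; it is a minimal sketch, with a hypothetical determiner list and hypothetical function names, of how a morphological rule applied without context goes wrong and how one piece of left context changes the decision.

```python
from typing import Optional

# Toy determiner list, invented for this example.
DETERMINERS = {"the", "a", "an", "no"}


def naive_tag(word: str) -> str:
    """Suffix-only rule: anything ending in 'ed' is called a verb."""
    return "VERB" if word.endswith("ed") else "NOUN"


def context_tag(prev_word: Optional[str], word: str) -> str:
    """Same rule, but blocked when the previous word is a determiner."""
    if prev_word is not None and prev_word.lower() in DETERMINERS:
        return "NOUN"
    return naive_tag(word)
```

Here `naive_tag("airspeed")` yields a verb tag, while `context_tag("the", "airspeed")` yields a noun tag. The context rule is itself crude (it would also call “painted” a noun in “the painted wall”), which is the same trade-off between simple rules and fit that the slide describes.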
28 The Problem of Complexity Manual development may be overcome by inherent complexity – Computational linguistic systems built using machine learning techniques outperform manually built systems on large corpora, but provide only superficial analysis Overcoming complexity may require – Better theories Staged models of language processing were never practical for large systems – too much non-determinism and errors at lower levels get propagated to higher levels! – Integrating statistical and manual techniques Use statistical mechanisms to compute frequencies and probabilities over theoretically motivated linguistic categories
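The last point, computing frequencies and probabilities over theoretically motivated categories, can be sketched as relative-frequency counting over hand-assigned grammatical functions. The function name and the observations in the usage note are invented for illustration; they are not corpus data from the project.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def function_probabilities(
    observations: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Estimate P(function | word) by relative frequency.

    Each observation is a (word, grammatical function) pair, where the
    function label comes from a manually designed category set.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, function in observations:
        counts[word][function] += 1
    return {
        word: {fn: c / sum(fn_counts.values()) for fn, c in fn_counts.items()}
        for word, fn_counts in counts.items()
    }
```

For example, given the made-up observations `[("airspeed", "head"), ("airspeed", "head"), ("airspeed", "modifier")]`, the estimate for “airspeed” as head is 2/3. The categories stay theory-driven; only the numbers are learned.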
29 Conclusions A Naturalistic, Functional approach to NLU has much to recommend it Adhering to well-established cognitive constraints pushes development in directions that are more likely to be successful What is needed is a demonstration that the approach is capable of delivering a functional system that is cognitively plausible…
30 Questions?
31 References
Ball, J. (2006). Can NLP Systems be a Cognitive Black Box? In Papers from the AAAI Spring Symposium, Technical Report SS-06-02, 1-6. Menlo Park, CA: AAAI Press.
Ball, J. (2007). A Bi-Polar Theory of Nominal and Clause Structure and Function. Annual Review of Cognitive Linguistics.
Ball, J. (2007). Construction-Driven Language Processing. Proceedings of the 2nd European Cognitive Science Conference.
Ball, J., Heiberg, A. & Silber, R. (2007). Toward a Large-Scale Model of Language Comprehension in ACT-R 6. Proceedings of the 8th International Conference on Cognitive Modeling.
Heiberg, A., Harris, J. & Ball, J. (2007). Dynamic Visualization of ACT-R Declarative Memory Structure. Proceedings of the 8th International Conference on Cognitive Modeling.
32 Other References
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C. & Qin, Y. (2004). An Integrated Theory of the Mind. Psychological Review, 111(4).
Bever, T. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (ed.), Cognition and Language Development. New York: Wiley.
Christianson et al. (2001). Thematic roles assigned along the garden path linger. Cognitive Psychology, 42.
Cooke, N. & Shope, S. (2005). Synthetic Task Environments for Teams: CERTT’s UAV-STE. Handbook on Human Factors and Ergonomics Methods. Boca Raton, FL: CRC Press, LLC.
Prince, A. & Smolensky, P. (1993/2004). Optimality Theory: Constraint interaction in generative grammar. Tech Report, Rutgers University & University of Colorado at Boulder. Revised version published by Blackwell. Rutgers Optimality Archive 537.
Tanenhaus et al. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268.
Trueswell, J., Sekerina, I., Hill, N. & Logrip, M. (1999). The kindergarten-path effect: studying on-line sentence processing in young children. Cognition, 73.
33 Some Pitfalls to Avoid Typical computational linguistic systems perform only low level analysis of the linguistic input – “most of the current research on statistical NLP is focused on shallow syntactic analysis, due to the difficulty of modeling deep analysis with basic statistical learning algorithms” (Shen, 2006) – Sergei & Marge’s system is an exception!
34 Some Pitfalls to Avoid Risk of proliferation of functional elements – Incremental addition of categories for each new phenomenon of study can be explosive – Too many levels of representation and hidden elements in pre-minimalist generative grammar based representations No psychological “face validity” (cf. Ferreira, 2000) How can hidden elements be learned? – The Minimalist Program is attempting to simplify grammar to redress the language acquisition problem Explanatory adequacy
35 Some Pitfalls to Avoid Trade-off between simplicity and fit (Tenenbaum, 2007) – The simplest theory will seldom be the best fit, but we don’t want to overfit the data Minimalist syntax is a much simpler theory than its predecessors, but is a poor fit to much of the linguistic data that earlier theories handled (Culicover & Jackendoff, 2005) – Descriptive adequacy has been sacrificed in pursuit of a “perfect” system of core grammar
36 Some Pitfalls to Avoid Culicover and Jackendoff’s Simpler Syntax is redressing empirical and functional shortcomings of generative grammar by simplifying syntax and adding a generative semantic component – Not all meaning distinctions must be represented syntactically, so syntax can be simplified (e.g. scope of quantification, noun-noun combination, binding) – By complicating semantic representations, and the interface between semantic and syntactic representations, syntactic representations can be simplified without loss in empirical coverage Is overall complexity reduced?