Attendee questionnaire
Name
Affiliation/status
Area of study/research
For each of these subjects:
–Linguistics (Optimality Theory)
–Computation (connectionism/neural networks)
–Philosophy (symbolic/connectionist debate)
–Psychology (infant phonology)
please indicate your relative level of
–interest (for these lectures) [1 = least, 5 = most]
–background [1 = none, 5 = expert]
Thank you

Optimality in Cognition and Grammar
Paul Smolensky
Cognitive Science Department, Johns Hopkins University

Optimality in Cognition and Grammar
Paul Smolensky, Cognitive Science Department, Johns Hopkins University

Plan of lectures
1. Cognitive architecture
–Symbols and neurons
–Symbols in neural networks
–Optimization in neural networks
2. Optimization in grammar I: HG → OT
3. Optimization in grammar II: OT
4. OT in neural networks

Cognitive architecture
Central dogma of cognitive science: Cognition is computation.
But what type of computation? What exactly is computation, and what work must it do in cognitive science?

Computation
Functions, cognitive:
–Pixels → objects → locations [low- to high-level vision]
–Sound stream → word string [phonetics + …]
–Word string → parse tree [syntax]
–Underlying form → surface form [phonology]
  petit copain: /pətit + kopɛ̃/ → [pə.ti.ko.pɛ̃]
  petit ami: /pətit + ami/ → [pə.ti.ta.mi]
Reduction of complex procedures for evaluating functions to combinations of primitive operations.
Computational architecture:
–Operations: primitives + combinators
–Data

Symbolic Computation
Computational architecture:
–Operations: primitives + combinators
–Data
The Pure Symbolic Architecture (PSA)
–Data: strings, (binary) trees, graphs, …
–Operations
  Primitives
  –Concatenate(string, tree) = cons
  –First-member(string); left-subtree(tree) = ex0
  Combinators
  –Composition: f(x) =def g(h(x))
  –IF(x = A) THEN … ELSE …
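To make the primitives concrete, here is a minimal sketch in Python, representing binary trees as nested tuples; the tuple encoding and the helper names are illustrative choices, not the lecture's own implementation.

```python
# A minimal sketch of the Pure Symbolic Architecture primitives,
# using nested Python tuples as binary trees.

def cons(left, right):
    """Primitive combinator: build a tree from two subtrees."""
    return (left, right)

def ex0(tree):
    """Primitive extractor: first member / left subtree."""
    return tree[0]

def ex1(tree):
    """Primitive extractor: rest / right subtree."""
    return tree[1]

def compose(g, h):
    """Combinator: composition, f(x) = g(h(x))."""
    return lambda x: g(h(x))

# Example: extract the left subtree of the right subtree.
f = compose(ex0, ex1)
t = cons('A', cons('B', 'C'))
print(f(t))   # -> 'B'
```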

ƒPassive
Few leaders are admired by George → admire(George, few leaders)
ƒ(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
But for cognition, we need a reduction to a very different computational architecture.
[Diagram: the passive parse tree (A, Aux, V, by, P) mapped by ƒPassive to its logical form (V, P, A)]
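Here is a toy run of ƒ on the passive example, reusing the primitives sketched above; the bracketing of the input parse tree, [A [[Aux V] [by P]]], is an assumed structure chosen so that the formula yields admire(George, few leaders).

```python
# Applying the slide's passive-to-LF function to a toy parse tree.
cons = lambda l, r: (l, r)
ex0 = lambda t: t[0]
ex1 = lambda t: t[1]

def f_passive(s):
    return cons(ex1(ex0(ex1(s))),
                cons(ex1(ex1(ex1(s))), ex0(s)))

# Assumed bracketing: [A [[Aux V] [by P]]]
s = ('few leaders', (('are', 'admired'), ('by', 'George')))
print(f_passive(s))
# -> ('admired', ('George', 'few leaders'))  ~  admire(George, few leaders)
```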

PDP Computation
The cognitive architecture: the connectionist hypothesis. At the lowest computational level of the mind/brain:
–Representations: distributed activation patterns
–Primitive operations (e.g.): multiplication of activations by synaptic weights; summation of weighted activation values; non-linear transfer functions
–Combination: massive parallelism
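These primitives amount to a weighted sum pushed through a non-linear transfer function, computed for all units in parallel. A minimal sketch; the layer sizes and the logistic transfer function are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A distributed representation: a pattern of activation over 8 units.
a = rng.normal(size=8)

# Synaptic weights from the 8 input units to 5 output units.
W = rng.normal(scale=0.3, size=(5, 8))

def transfer(net):
    """A non-linear transfer function (logistic, as one common choice)."""
    return 1.0 / (1.0 + np.exp(-net))

# The primitive operations, applied to every unit in parallel:
# multiply activations by weights, sum, pass through the non-linearity.
net_input = W @ a            # multiplication + summation
output = transfer(net_input)
print(output)
```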

Criticism of PDP (e.g., from neuroscientists): "Much too simple."
Misguided: a confusion between two questions.
The relevant complaint would be:
–"Much too complex"
–The target of the computational reduction must be within the scope of neural computation.

The cognitive question for neuroscience
What is the function of each component of the nervous system?
Our question is quite different.

The neural question for cognitive science
How are complex cognitive functions computed by a mass of numerical processors like neurons — each very simple, slow, and imprecise relative to the components that have traditionally been used to construct powerful, general-purpose computational systems?
How does the structure arise that enables such a medium to achieve cognitive computation?

The ICS Hypothesis
The Integrated Connectionist/Symbolic Cognitive Architecture (ICS):
–In higher cognitive domains, representations and functions are well approximated by symbolic computation.
–The Connectionist Hypothesis is correct.
–Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation.

PassiveNet
[Diagram: a network whose input activation pattern encodes the constituents of the passive sentence (Agent, Patient, Aux, by, …) and whose weight matrix W produces the output pattern encoding the logical form]

The ICS Isomorphism
[Diagram: the symbolic mapping ƒPassive from the parse tree (A, Aux, V, by, P) to its logical form (V, P, A), realized at the lower level by the weight matrix W of PassiveNet mapping input to output activation patterns; tensor product representations above, tensorial networks below]

Within-level compositionality:
  ƒ(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
  W = W_cons0 · [W_ex1 W_ex0 W_ex1] + W_cons1 · [W_cons0 (W_ex1 W_ex1 W_ex1) + W_cons1 W_ex0]
Between-level reduction: the symbolic function ƒ is realized by the single weight matrix W.
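The force of the W equation is that once each primitive (ex0, ex1, cons0, cons1) is realized as a linear map on activation vectors, the entire symbolic composition collapses into a single weight matrix. The sketch below checks only that algebra, with random matrices standing in for the primitives; the actual ICS matrices are built from role vectors and are not constructed here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # dimensionality of the activation vectors (arbitrary)

# Placeholder linear realizations of the primitives.
W_ex0, W_ex1, W_cons0, W_cons1 = (rng.normal(size=(n, n)) for _ in range(4))

# Single weight matrix realizing f, per the slide's formula.
W = (W_cons0 @ (W_ex1 @ W_ex0 @ W_ex1)
     + W_cons1 @ (W_cons0 @ (W_ex1 @ W_ex1 @ W_ex1) + W_cons1 @ W_ex0))

# Step-by-step application of the primitives to an input pattern s.
s = rng.normal(size=n)
left = W_ex1 @ (W_ex0 @ (W_ex1 @ s))
right = W_cons0 @ (W_ex1 @ (W_ex1 @ (W_ex1 @ s))) + W_cons1 @ (W_ex0 @ s)
step_by_step = W_cons0 @ left + W_cons1 @ right

# One matrix, one pass: same result.
assert np.allclose(W @ s, step_by_step)
```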

Levels

The ICS Architecture
[Diagram. Symbolic level: the Function ƒ (e.g. dog+s → "dogs" [dɔgz]), the Grammar G (constraints such as NOCODA; the optimal candidate), and representations (syllable structures such as [kæt]). Connectionist level: the Algorithm A, i.e. activation patterns, connection weights, Harmony optimization, spreading activation; processing and learning.]

Processing I: Activation
Computational neuroscience, key sources:
–Hopfield 1982, 1984
–Cohen and Grossberg 1983
–Hinton and Sejnowski 1983, 1986
–Smolensky 1983, 1986
–Geman and Geman 1984
–Golden 1986, 1988

Processing I: Activation
[Diagram: two units a1 and a2 with external inputs i1 (0.6) and i2 (0.5), connected by an inhibitory weight –λ (–0.9)]
Processing — spreading activation — is optimization: Harmony maximization.

The ICS Architecture
[Diagram repeated, with the example 'cat': the Function ƒ maps kæt → "cat"; the Grammar G's constraints (NOCODA) pick the optimal parse, the one of maximal Harmony H; the Algorithm A: activation patterns, connection weights, Harmony optimization, spreading activation.]

Processing II: Optimization
[Diagram: the same two-unit network, a1 and a2 with inputs i1 (0.6), i2 (0.5) and inhibitory weight –λ (–0.9)]
Cognitive psychology, key sources:
–Hinton & Anderson 1981
–Rumelhart, McClelland, & the PDP Group 1986
Processing — spreading activation — is optimization: Harmony maximization.

Processing II: Optimization
Harmony maximization is satisfaction of parallel, violable well-formedness constraints:
–a1 must be active (strength: 0.6)
–a2 must be active (strength: 0.5)
–a1 and a2 must not be simultaneously active (strength: λ = 0.9)
These constraints CONFLICT. Optimal compromise: a1 = 0.79, a2 = –0.21.
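As a check on these numbers: maximizing one standard quadratic Harmony function for this network, H(a) = iᵀa + ½aᵀWa − ½‖a‖² (the decay term −½‖a‖² is an assumption about the exact form behind the slide), recovers the stated compromise a1 ≈ 0.79, a2 ≈ −0.21.

```python
import numpy as np

i = np.array([0.6, 0.5])             # external inputs to a1, a2
W = np.array([[0.0, -0.9],           # mutual inhibition, -lambda = -0.9
              [-0.9, 0.0]])

def harmony(a):
    # Assumed Harmony function: input term + weight term + decay term.
    return i @ a + 0.5 * a @ W @ a - 0.5 * a @ a

# Gradient ascent on H (spreading activation as optimization).
a = np.zeros(2)
for _ in range(2000):
    grad = i + W @ a - a
    a += 0.05 * grad

print(np.round(a, 2))   # -> [ 0.79 -0.21]
```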

Processing II: Optimization
The search for an optimal state can employ randomness. Equations for the units' activation values have random terms:
–pr(a) ∝ e^{H(a)/T}
–T ('temperature') ~ randomness; T → 0 during the search
–Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
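A minimal sketch of the stochastic search idea (not Hinton and Sejnowski's or Smolensky's exact algorithms): binary units are resampled according to pr(a) ∝ e^{H(a)/T} while the temperature T is lowered toward 0, so the network settles into a high-Harmony state. The network size, weights, and annealing schedule below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2                      # symmetric weights,
np.fill_diagonal(W, 0.0)               # no self-connections
b = rng.normal(scale=0.2, size=n)      # biases / external inputs

def harmony(a):
    return 0.5 * a @ W @ a + b @ a

a = rng.integers(0, 2, size=n).astype(float)   # random binary start state
for T in np.geomspace(5.0, 0.05, 400):         # temperature schedule, T -> 0
    k = rng.integers(n)                        # pick a unit to update
    a_on = np.where(np.arange(n) == k, 1.0, a)
    a_off = np.where(np.arange(n) == k, 0.0, a)
    dH = harmony(a_on) - harmony(a_off)
    p_on = 1.0 / (1.0 + np.exp(-dH / T))       # pr(unit on) from e^{H/T}
    a[k] = float(rng.random() < p_on)

print(a, round(harmony(a), 3))                 # a high-Harmony binary state
```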

The ICS Architecture
[Diagram repeated: Harmony optimization is now glossed as optimization / constraint satisfaction; the optimal representation is the one that best satisfies the constraints.]

Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns — data structures, mental representations — evaluated by these constraints?

Representation  Symbolic theory Complex symbol structures Generative linguistics (Chomsky & Halle ’68 …)  Particular linguistic representations Markedness Theory (Jakobson, Trubetzkoy, ’30s …)  Good (well-formed) linguistic representations  Connectionism (PDP) Distributed activation patterns  ICS realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.) will employ ‘local representations’ as well

Representation
[Diagram: the tree [σ k [æ t]] ('cat'), with filler/role bindings σ/rε, k/r0, æ/r01, t/r11]

Tensor Product Representations
[Diagram, built up over three slides: representations are constructed with the tensor product ⊗ of filler vectors (A, B, X, Y) and role vectors (rε = 1, r0 = (1 1), r1 = (1 –1)); a grid of units, indexed by filler components i, j, k ∊ {A, B, X, Y} and by depth-0 versus depth-1 role components, carries the resulting activation values.]

Local tree realizations
Representations: [diagram of local (one unit per filler/role binding) realizations of the tree]

stopped

The ICS Isomorphism
[Diagram repeated: tensor product representations at the symbolic level, tensorial networks (PassiveNet, weight matrix W) at the connectionist level]

Tensor Product Representations
Structuring operation / symbolic formalization (structures; example) / connectionist formalization (vector operation):
–Combining: sets; {c1, c2} → c1 + c2 (vector sum: +)
–Role/filler binding: strings, frames; AB = {A/r1, B/r2} → A ⊗ r1 + B ⊗ r2 (tensor product: ⊗)
–Recursive embedding: trees; [A [B C]] → A ⊗ r0 + [B ⊗ r0 + C ⊗ r1] ⊗ r1 (recursive role vectors: r_left/right-child(x) = r_0/1 ⊗ r_x)

Mental representations are defined by the activation values of connectionist units. When analyzed at a higher level, these representations are distributed patterns of activity — activation vectors.
For core aspects of higher cognitive domains, these vectors realize symbolic structures. Such a symbolic structure s is defined by a collection of structural roles {r_i}, each of which may be occupied by a filler f_i; s is a set of constituents, each a filler/role binding f_i/r_i. The connectionist realization of s is an activity vector s = Σ_i f_i ⊗ r_i.
In higher cognitive domains such as language and reasoning, mental representations are recursive: the fillers or roles of s have themselves the same type of internal structure as s. And these structured fillers f or roles r in turn have the same type of tensor product realization as s.
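A minimal sketch of such a realization for the [σ k [æ t]] tree from the earlier slides, using the role vectors given there (r0 = (1, 1), r1 = (1, −1), recursive roles as Kronecker products) and random filler vectors. Constituents at different depths are kept as separate blocks whose concatenation (direct sum) is the full vector; the order of the Kronecker factors in a recursive role is a convention fixed arbitrarily here. The last lines show dual-role unbinding recovering a filler from the superposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Filler vectors: arbitrary 4-dimensional patterns standing in for the symbols.
f_sigma, f_k, f_ae, f_t = (rng.normal(size=4) for _ in range(4))

# Role vectors from the slides; recursive roles are Kronecker products.
r_eps = np.array([1.0])
r0 = np.array([1.0, 1.0])
r1 = np.array([1.0, -1.0])
r01 = np.kron(r0, r1)   # role of ae in [sigma k [ae t]] (factor order is a chosen convention)
r11 = np.kron(r1, r1)   # role of t

# Filler/role bindings f_i (x) r_i, grouped by tree depth; the full
# representation is the direct sum (concatenation) of the depth blocks.
depth0 = np.kron(f_sigma, r_eps)
depth1 = np.kron(f_k, r0)
depth2 = np.kron(f_ae, r01) + np.kron(f_t, r11)
s = np.concatenate([depth0, depth1, depth2])

# Unbinding: because r01 and r11 are orthogonal, contracting the depth-2
# block with r11 recovers the filler bound to that role.
F = depth2.reshape(4, 4)            # rows: filler units, columns: role units
recovered_t = F @ r11 / (r11 @ r11)
assert np.allclose(recovered_t, f_t)
```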

Representation: Combination

[Slide repeated: The ICS Architecture (Harmony optimization as constraint satisfaction)]

[Slide repeated: Two Fundamental Questions]

[Slide repeated: Representation, the tree [σ k [æ t]] with its filler/role bindings]

[Slide repeated: Two Fundamental Questions]

Constraints
NOCODA: A syllable has no coda. [e.g. Maori]
[Diagram: the constraint is realized in the connection weights W; in the representation a_[σ k [æ t]] of 'cat', the coda t incurs a * (violation)]
H(a_[σ k [æ t]]) = –s_NOCODA < 0
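At the symbolic level, the Harmony contribution of a markedness constraint is just a weighted, negated violation count. A toy illustration of that arithmetic, in the Harmonic Grammar style previewed for the next lecture; the candidate encoding, the second candidate, and the constraint weight are hypothetical.

```python
# Toy weighted-constraint evaluation: H(candidate) = -sum_k s_k * violations_k.
# Syllables are (onset, nucleus, coda) triples; '' means the position is empty.

def nocoda(syllables):
    """NOCODA: one violation per syllable with a coda."""
    return sum(1 for onset, nucleus, coda in syllables if coda)

def harmony(syllables, weights):
    return -sum(s * constraint(syllables)
                for constraint, s in weights.items())

weights = {nocoda: 2.0}                            # constraint strength s_NOCODA (hypothetical)

cat_faithful = [('k', 'ae', 't')]                  # [kaet]: coda t violates NOCODA
cat_open = [('k', 'ae', ''), ('t', 'a', '')]       # e.g. [kae.ta]: no codas

print(harmony(cat_faithful, weights))              # -> -2.0 (one NOCODA violation)
print(harmony(cat_open, weights))                  # -> -0.0 (no violations)
```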

[Slide repeated: The ICS Architecture]

The ICS Architecture
[Diagram repeated, now flagging the open issue of Constraint Interaction at the level of the Grammar G]
NEXT LECTURE: HG, OT