Introduction to Computational Natural Language Learning. Linguistics 79400 (Under: Topics in Natural Language Processing); Computer Science 83000 (Under: Topics in Artificial Intelligence).

Presentation transcript:

Introduction to Computational Natural Language Learning
Linguistics (Under: Topics in Natural Language Processing)
Computer Science (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics
The City University of New York

[Diagram: Elman's Simple Recurrent Network, with input and output banks labeled boy, dog, run, book, rock, see, eat.]
Elman's Simple Recurrent Network: 1) activate from input to output as usual (one input word at a time), but copy the hidden activations to the context layer; 2) repeat step 1 over and over, but now activate from the input AND context (copy) layers through to the output layer. In the diagram, the context layer receives a 1-to-1 exact copy of the hidden activations; all other connections are "regular" trainable weights.
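
To make the copy-and-feed-back mechanism concrete, here is a minimal sketch of one SRN time step in Python with NumPy. The layer sizes, the sigmoid nonlinearity, and the toy word indices are my own illustrative assumptions, not Elman's exact configuration.

```python
import numpy as np

# Minimal sketch of an Elman-style simple recurrent network (SRN).
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 31, 150, 31            # one localist unit per word in/out

W_ih = rng.normal(0, 0.1, (n_hid, n_in))    # input  -> hidden, trainable
W_ch = rng.normal(0, 0.1, (n_hid, n_hid))   # context -> hidden, trainable
W_ho = rng.normal(0, 0.1, (n_out, n_hid))   # hidden -> output, trainable

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, context):
    """One time step: input + context activate the hidden layer, the hidden
    layer activates the output, and the hidden activations are copied 1-to-1
    into the context layer for the next step."""
    hidden = sigmoid(W_ih @ x + W_ch @ context)
    output = sigmoid(W_ho @ hidden)
    return output, hidden.copy()            # the copy becomes the new context

context = np.zeros(n_hid)                   # context starts empty
for word_index in [3, 17, 25]:              # toy sentence, one word at a time
    x = np.zeros(n_in)
    x[word_index] = 1.0                     # localist (one-hot) input
    prediction, context = step(x, context)  # prediction = guess at next word
```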

From Elman (1990). Templates were set up and lexical items were chosen at random from "reasonable" categories.

Templates for the sentence generator:
NOUN-HUM VERB-EAT NOUN-FOOD
NOUN-HUM VERB-PERCEPT NOUN-INANIM
NOUN-HUM VERB-DESTROY NOUN-FRAG
NOUN-HUM VERB-INTRAN
NOUN-HUM VERB-TRAN NOUN-HUM
NOUN-HUM VERB-AGPAT NOUN-INANIM
NOUN-HUM VERB-AGPAT
NOUN-ANIM VERB-EAT NOUN-FOOD
NOUN-ANIM VERB-TRAN NOUN-ANIM
NOUN-ANIM VERB-AGPAT NOUN-INANIM
NOUN-ANIM VERB-AGPAT
NOUN-INANIM VERB-AGPAT
NOUN-AGRESS VERB-DESTROY NOUN-FRAG
NOUN-AGRESS VERB-EAT NOUN-HUM
NOUN-AGRESS VERB-EAT NOUN-ANIM
NOUN-AGRESS VERB-EAT NOUN-FOOD

Categories of lexical items:
NOUN-HUM: man, woman
NOUN-ANIM: cat, mouse
NOUN-INANIM: book, rock
NOUN-AGRESS: dragon, monster
NOUN-FRAG: glass, plate
NOUN-FOOD: cookie, sandwich
VERB-INTRAN: think, sleep
VERB-TRAN: see, chase
VERB-AGPAT: move, break
VERB-PERCEPT: smell, see
VERB-DESTROY: break, smash
VERB-EAT: eat
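
The generation procedure the slide describes (pick a template, fill each slot with a word drawn at random from its category) can be sketched as follows. The category contents and templates come from the slide; the function names are my own.

```python
import random

# Sketch of Elman's (1990) sentence generator.
CATEGORIES = {
    "NOUN-HUM":     ["man", "woman"],
    "NOUN-ANIM":    ["cat", "mouse"],
    "NOUN-INANIM":  ["book", "rock"],
    "NOUN-AGRESS":  ["dragon", "monster"],
    "NOUN-FRAG":    ["glass", "plate"],
    "NOUN-FOOD":    ["cookie", "sandwich"],
    "VERB-INTRAN":  ["think", "sleep"],
    "VERB-TRAN":    ["see", "chase"],
    "VERB-AGPAT":   ["move", "break"],
    "VERB-PERCEPT": ["smell", "see"],
    "VERB-DESTROY": ["break", "smash"],
    "VERB-EAT":     ["eat"],
}

TEMPLATES = [
    ["NOUN-HUM", "VERB-EAT", "NOUN-FOOD"],
    ["NOUN-HUM", "VERB-PERCEPT", "NOUN-INANIM"],
    ["NOUN-HUM", "VERB-DESTROY", "NOUN-FRAG"],
    ["NOUN-HUM", "VERB-INTRAN"],
    ["NOUN-HUM", "VERB-TRAN", "NOUN-HUM"],
    ["NOUN-HUM", "VERB-AGPAT", "NOUN-INANIM"],
    ["NOUN-HUM", "VERB-AGPAT"],
    ["NOUN-ANIM", "VERB-EAT", "NOUN-FOOD"],
    ["NOUN-ANIM", "VERB-TRAN", "NOUN-ANIM"],
    ["NOUN-ANIM", "VERB-AGPAT", "NOUN-INANIM"],
    ["NOUN-ANIM", "VERB-AGPAT"],
    ["NOUN-INANIM", "VERB-AGPAT"],
    ["NOUN-AGRESS", "VERB-DESTROY", "NOUN-FRAG"],
    ["NOUN-AGRESS", "VERB-EAT", "NOUN-HUM"],
    ["NOUN-AGRESS", "VERB-EAT", "NOUN-ANIM"],
    ["NOUN-AGRESS", "VERB-EAT", "NOUN-FOOD"],
]

def generate_sentence():
    """Return one sentence as a list of words."""
    template = random.choice(TEMPLATES)
    return [random.choice(CATEGORIES[slot]) for slot in template]

# One long word stream: 10,000 sentences concatenated, no sentence markers.
corpus = [word for _ in range(10_000) for word in generate_sentence()]
```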

Resulting training and supervisor files (the supervisor file is the training file shifted ahead by one word):

Training data:        woman smash plate cat move man break car boy move girl eat bread dog ...
Supervisor's answers: smash plate cat move man break car boy move girl eat bread dog move ...

Files were 27,354 words long, made up of 10,000 two- and three-word "sentences."
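
In other words, the target ("supervisor's answer") for each input word is simply the next word in the running text. A minimal sketch of that alignment, using a short invented stream of words:

```python
# The supervisor file is just the training file shifted ahead by one word.
stream = ["woman", "smash", "plate", "cat", "move", "man", "break", "car"]

inputs, targets = stream[:-1], stream[1:]
for x, y in zip(inputs, targets):
    print(f"input: {x:6s} -> target: {y}")
```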

Cluster (similarity) analysis. After the SRN was trained, the training file was run through the network again and the activations at the hidden nodes were recorded for every occurrence of every word (I made up the numbers in the example). For simplicity assume only 3 hidden nodes (in fact there were 150). Then the hidden activations recorded for each word were averaged together: every occurrence of, say, boy in sequences like "boy smash plate ... dragon eat boy ... boy eat cookie" contributes one activation vector, and those vectors are averaged into a single vector for boy; likewise for smash, plate, dragon, eat, cookie, and so on. Each of these averaged vectors represents a point in 3-D space; some vectors are close together, some further apart.
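
A sketch of that averaging step. The word/activation pairs below are invented, standing in for what would be recorded while running the trained network over the corpus.

```python
import numpy as np
from collections import defaultdict

# `recorded` stands in for (word, hidden activation) pairs collected from the
# trained network; the numbers are made up, just as in the slide's example.
recorded = [
    ("boy",   np.array([0.1, 0.8, 0.3])),
    ("smash", np.array([0.7, 0.2, 0.9])),
    ("plate", np.array([0.6, 0.1, 0.4])),
    ("boy",   np.array([0.2, 0.7, 0.4])),   # another occurrence of "boy"
]

by_word = defaultdict(list)
for word, hidden in recorded:
    by_word[word].append(hidden)

# One averaged vector per word type: a single point in hidden-unit space.
word_means = {word: np.mean(vecs, axis=0) for word, vecs in by_word.items()}
print(word_means["boy"])                    # -> [0.15 0.75 0.35]
```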

Each of these words represents a point in 150-dimensional space, averaged over all activations generated by the network when processing that word. Each joint in the cluster diagram (where two branches connect) represents the distance between clusters. So, for example, the distance between the animal and human clusters is approximately 0.85, and the distance between ANIMATES and INANIMATES is approximately 1.5.
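
The tree the slide describes can be reproduced with standard hierarchical (agglomerative) clustering. Here is a sketch using SciPy on a toy 3-D stand-in for the real 150-dimensional averaged vectors; the vectors and their groupings are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy averaged hidden-state vectors (invented; real ones are 150-dimensional).
words = ["boy", "girl", "cat", "dog", "book", "rock"]
vecs = np.array([
    [0.9, 0.1, 0.2],   # humans
    [0.8, 0.2, 0.2],
    [0.7, 0.6, 0.1],   # animals
    [0.7, 0.5, 0.2],
    [0.1, 0.2, 0.9],   # inanimates
    [0.2, 0.1, 0.8],
])

# Each row of Z records one merge: the two clusters joined and the distance
# at which they join (the height of the "joint" in the dendrogram).
Z = linkage(vecs, method="average")
print(Z)
# scipy.cluster.hierarchy.dendrogram(Z, labels=words) would draw the tree
# (requires matplotlib).
```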

The network seems to correctly discover nouns vs. verbs, verb subcategorization, animates/inanimates, etc. Cool, eh?
Remarks:
- No category information is represented in the input (the word vectors are localist and orthogonal).
- There are no "rules" in the traditional sense. The categories are learned from statistical regularities in the sentences; there is no structure being provided to the network (more on this in a bit).
- There are no "symbols" in the traditional sense. Classic symbol-manipulating systems use names for well-defined classes of entities (N, V, Adj, etc.). In an SRN the representation of the concept of, say, boy is: 1. distributed (as a vector of activations), and 2. represented in context, with respect to the words that come before (e.g., boy is represented one way when used as an object and another when used in subject position).
- Note, though, that when a cluster analysis is performed on specific occurrences of a word, the cluster is very tight; there is only some variation based on a word's context.

From Elman (1991): constituency, long-distance relations, optionality. A simple context-free grammar was used:
S -> NP VP
NP -> PropN | N | N RC
VP -> V (NP)
RC -> who NP VP | who VP (NP)
N -> boy | girl | cat | dog | boys | girls | cats | dogs
V -> chase | feed | see | hear | walk | live | chases | feeds | sees | hears | walks | lives
PropN -> John | Mary
Plus constraints on number agreement and verb argument subcategorization.
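
A sketch of a random generator for this grammar (my own code, not Elman's). The number-agreement and subcategorization constraints mentioned above are not enforced here; the recursion through N RC is what produces embedded relative clauses.

```python
import random

# Random expansion of the Elman (1991) context-free grammar shown above.
GRAMMAR = {
    "S":     [["NP", "VP"]],
    "NP":    [["PropN"], ["N"], ["N", "RC"]],
    "VP":    [["V"], ["V", "NP"]],
    "RC":    [["who", "NP", "VP"], ["who", "VP"], ["who", "VP", "NP"]],
    "N":     [["boy"], ["girl"], ["cat"], ["dog"],
              ["boys"], ["girls"], ["cats"], ["dogs"]],
    "V":     [["chase"], ["feed"], ["see"], ["hear"], ["walk"], ["live"],
              ["chases"], ["feeds"], ["sees"], ["hears"], ["walks"], ["lives"]],
    "PropN": [["John"], ["Mary"]],
}

def expand(symbol):
    """Recursively expand a symbol; anything not in GRAMMAR is a terminal."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [word for part in production for word in expand(part)]

for _ in range(5):
    print(" ".join(expand("S")) + " .")
```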

This allows a variety of interesting sentences that were used for training. (Note: *'d items were NOT used for training; for the CS people out there, * frequently marks an ungrammatical string.)
Intransitive: Dogs live. *Dogs live cats.
Optionally transitive: Boys see. Boys see dogs. Boys see dog.
Transitive: Boys hit dogs. *Boys hit.
Long-distance number agreement: Dog who chases cat sees girl. *Dog who chase cat sees girl. Dog who cat chases sees girl. Boys who girls who dogs chase see hear.
Ambiguous sentence boundaries: Boys see dogs who see girls who hear. Boys see dogs who see girls. Boys see dogs. Boys see.
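
To illustrate the two constraint types behind the starred strings, here is a toy checker. It is entirely my own, not part of Elman's setup, and only handles flat subject-verb(-object) strings, not relative clauses.

```python
# Toy illustration of number agreement and verb subcategorization constraints.
SINGULAR_N   = {"boy", "girl", "cat", "dog", "john", "mary"}
PLURAL_N     = {"boys", "girls", "cats", "dogs"}
SINGULAR_V   = {"chases", "feeds", "sees", "hears", "walks", "lives"}
PLURAL_V     = {"chase", "feed", "see", "hear", "walk", "live"}
INTRANSITIVE = {"walk", "walks", "live", "lives"}

def looks_grammatical(sentence):
    words = sentence.lower().rstrip(".").split()
    subj, verb = words[0], words[1]
    if subj in SINGULAR_N and verb not in SINGULAR_V:
        return False                          # number agreement violation
    if subj in PLURAL_N and verb not in PLURAL_V:
        return False                          # number agreement violation
    if verb in INTRANSITIVE and len(words) > 2:
        return False                          # intransitive verb given an object
    return True

print(looks_grammatical("Dogs live."))        # True
print(looks_grammatical("Dogs live cats."))   # False, i.e. *Dogs live cats.
```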

Boys who Mary chases feed cats. This is much, much more difficult input than Elman (1990). Long-distance agreement: feed agrees with Boys, but who Mary chases is in the way. Subcategorization: chases is mandatorily transitive, but inside the relative clause the network has to NOT mistake it for the independent sentence Mary chases.

Analysis of results: Principal Component Analysis (PCA). Suppose you have 3 hidden nodes and four vectors of activation that correspond to: boy subj, boy obj, girl subj, girl obj.
[Figure, adapted from Crocker (2001): the four points boy subj, boy obj, girl subj, girl obj plotted against the axes "activation at hidden node 1", "activation at hidden node 2", and "activation at hidden node 3".]
Hierarchical clustering gives you a tree over these points, but PCA gives you more info: it finds the directions in hidden-unit space along which the points vary most.
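
A sketch of PCA on a handful of hidden-state vectors, computed via SVD on the mean-centered data. The four 3-D vectors are invented stand-ins for the boy subj / boy obj / girl subj / girl obj points in the figure.

```python
import numpy as np

# PCA via SVD: project mean-centered hidden states onto the top components.
labels = ["boy subj", "boy obj", "girl subj", "girl obj"]
H = np.array([
    [0.8, 0.2, 0.1],
    [0.7, 0.3, 0.6],
    [0.3, 0.8, 0.1],
    [0.2, 0.7, 0.6],
])

H_centered = H - H.mean(axis=0)                 # center each hidden unit
U, S, Vt = np.linalg.svd(H_centered, full_matrices=False)
scores = H_centered @ Vt[:2].T                  # coordinates on the top 2 PCs

for name, (pc1, pc2) in zip(labels, scores):
    print(f"{name:10s} PC1 = {pc1:+.2f}   PC2 = {pc2:+.2f}")
```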