A computational study of cross-situational techniques for learning word-to-meaning mappings
Jeffrey Mark Siskind
Presented by David Goss-Grubbs, March 5, 2006

The Problem: Mapping Words to Concepts
► Child hears: John went to school
► Child sees: GO(John, TO(school))
► Child must learn:
  - John → John
  - went → GO(x, y)
  - to → TO(x)
  - school → school

Two Problems
► Referential uncertainty: the child may also hypothesize, e.g.,
  - MOVE(John, feet)
  - WEAR(John, RED(shirt))
► Determining the correct alignment: incorrect alignments such as
  - John → TO(x)
  - walked → school
  - to → John
  - school → GO(x, y)

Helpful Constraints
► Partial knowledge
► Cross-situational inference
► Covering constraints
► Exclusivity

Partial Knowledge
► Child hears: Mary lifted the block
► Child sees:
  - CAUSE(Mary, GO(block, UP))
  - WANT(Mary, block)
  - BE(block, ON(table))
► If the child already knows that lift contains CAUSE, the last two hypotheses can be ruled out.

Cross-situational inference
► John lifted the ball → CAUSE(John, GO(ball, UP))
► Mary lifted the block → CAUSE(Mary, GO(block, UP))
► Thus the meaning of lifted must be one of {UP, GO(x, y), GO(x, UP), CAUSE(x, y), CAUSE(x, GO(y, z)), CAUSE(x, GO(y, UP))}
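The intersection behind this inference can be sketched as a set operation. A minimal illustration in Python, assuming each utterance meaning has already been broken into candidate fragments (the fragment strings below are just labels, not parsed expressions):

```python
# Cross-situational inference as intersection: each observation narrows the
# set of meaning fragments a word could map to.

# Fragments licensed by "John lifted the ball" -> CAUSE(John, GO(ball, UP))
obs1 = {"UP", "GO(x, y)", "GO(x, UP)", "CAUSE(x, y)",
        "CAUSE(x, GO(y, z))", "CAUSE(x, GO(y, UP))", "John", "ball"}

# Fragments licensed by "Mary lifted the block" -> CAUSE(Mary, GO(block, UP))
obs2 = {"UP", "GO(x, y)", "GO(x, UP)", "CAUSE(x, y)",
        "CAUSE(x, GO(y, z))", "CAUSE(x, GO(y, UP))", "Mary", "block"}

# Whatever "lifted" means must be consistent with both situations, so
# John, ball, Mary and block drop out; the six shared fragments remain.
lifted_candidates = obs1 & obs2
print(sorted(lifted_candidates))
```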

Covering constraints
► Assume: all components of an utterance's meaning come from the meanings of the words in that utterance.
► If it is known that CAUSE is not part of the meaning of John, the, or ball, it must be part of the meaning of lifted.
► (But what about constructional meaning?)

Exclusivity
► Assume: any portion of the meaning of an utterance comes from no more than one of its words.
► If John walked → WALK(John) and John → John, then walked can contribute no more than WALK(x).
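In set terms, exclusivity lets known word meanings be subtracted from what the remaining words could contribute. A toy sketch:

```python
# "John walked" -> WALK(John); "John" is already known to mean John.
utterance_symbols = {"WALK", "John"}
known_from_john = {"John"}

# Exclusivity: John's contribution cannot also come from "walked",
# so "walked" can contribute at most the remainder, i.e. WALK(x).
walked_at_most = utterance_symbols - known_from_john
print(walked_at_most)   # {'WALK'}
```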

Three more problems
► Bootstrapping
► Noisy input
► Homonymy

Bootstrapping
► Lexical acquisition is much easier if some of the language is already known.
► Some of Siskind's strategies (e.g. cross-situational learning) work without such knowledge.
► Others (e.g. exclusivity) require it.
► The algorithm therefore starts off slowly, then speeds up.

Noise
► Only a subset of all possible meanings will be available to the algorithm.
► If none of the available hypotheses contains the correct meaning, cross-situational learning would cause the words in that utterance never to be acquired.
► Some portion of the input must therefore be ignored.
► (A statistical approach is rejected; it is not clear why.)

Homonymy
► As with noisy input, cross-situational techniques would fail to find a consistent mapping for homonymous words.
► When an inconsistency is found, a split is made.
► If the split is corroborated, a new sense is created; otherwise it is treated as noise.

The problem, formally stated
► From: a sequence of utterances
  - Each utterance is an unordered collection of words.
  - Each utterance is paired with a set of conceptual expressions.
► To: a lexicon
  - The lexicon maps each word to a set of conceptual expressions, one for each sense of the word.
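In data-structure terms, the input and output can be sketched as follows. This is an illustrative typing, not the paper's notation; conceptual expressions are kept opaque here:

```python
from typing import Dict, List, Set, Tuple

Word = str
ConceptualExpression = str            # e.g. "CAUSE(x, GO(y, UP))"
Utterance = List[Word]                # an unordered collection of words
Observation = Tuple[Utterance, Set[ConceptualExpression]]  # utterance + meaning hypotheses
Lexicon = Dict[Word, Set[ConceptualExpression]]            # one expression per sense
```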

Composition
► Select one sense for each word.
► Find all ways of combining these conceptual expressions.
► The meaning of an utterance is derived only from the meanings of its component words.
► Every conceptual expression in the meanings of the words must appear in the final conceptual expression (copies are possible).

The simplified algorithm: no noise or homonymy
► Two learning stages:
  - Stage 1: the set of conceptual symbols, e.g. {CAUSE, GO, UP}
  - Stage 2: the conceptual expression, e.g. CAUSE(x, GO(y, UP))

Stage 1: Conceptual symbol set
► Maintain sets of necessary and possible conceptual symbols for each word.
► Initialize the former to the empty set and the latter to the universal set.
► Utterances grow the necessary set and shrink the possible set, until the two converge on the word's actual conceptual-symbol set.
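A simplified sketch of this bookkeeping, assuming utterance meanings have been reduced to their sets of conceptual symbols. It captures the shrinking possible sets and a simple version of the covering-constraint rule that grows the necessary sets, but omits Siskind's full set of inference rules:

```python
# Stage-1 bookkeeping: per-word necessary and possible conceptual-symbol sets.
# UNIVERSAL stands in for the full conceptual-symbol inventory.

UNIVERSAL = {"CAUSE", "GO", "WANT", "BE", "ON", "TO", "UP",
             "John", "Mary", "ball", "block", "table", "arm"}

necessary = {}   # word -> symbols its meaning must contain (starts empty)
possible = {}    # word -> symbols its meaning may contain (starts universal)


def observe(words, meaning_symbols):
    """Update both tables for one (utterance, meaning) pair."""
    for w in words:
        necessary.setdefault(w, set())
        possible.setdefault(w, set(UNIVERSAL))
        # Cross-situational inference: a word cannot contain a symbol that is
        # absent from this utterance's meaning, so its possible set shrinks.
        possible[w] &= meaning_symbols

    # Covering constraint (simplified): a symbol of the meaning that is
    # possible for exactly one word in the utterance must be necessary
    # for that word.
    for sym in meaning_symbols:
        holders = [w for w in words if sym in possible[w]]
        if len(holders) == 1:
            necessary[holders[0]].add(sym)


# Two observations with a shared verb:
observe(["John", "took", "the", "ball"],
        {"CAUSE", "GO", "TO", "John", "ball"})
observe(["Mary", "took", "the", "block"],
        {"CAUSE", "GO", "TO", "Mary", "block"})

print(sorted(possible["took"]))   # ['CAUSE', 'GO', 'TO']
```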

Stage 2: Conceptual expression
► Maintain a set of possible conceptual expressions for each word.
► Initialize it to the set of all expressions that can be composed from the word's actual conceptual-symbol set.
► New utterances shrink the possible-expression set until only one expression remains.

Example

Word   Necessary   Possible
John   {John}      {John, ball}
took   {CAUSE}     {CAUSE, WANT, GO, TO, arm}
the    {}          {WANT, arm}
ball   {ball}      {ball, arm}

Selecting the meaning: John took the ball
► CAUSE(John, GO(ball, TO(John)))
► WANT(John, ball)
► CAUSE(John, GO(PART-OF(LEFT(arm), John), TO(ball)))
  - The second is eliminated because it contains no CAUSE, which took requires.
  - The third is eliminated because no word has LEFT or PART-OF in its possible set.
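The two elimination tests can be sketched against the example table above. This is a simplified sketch that again treats meanings as symbol sets; `consistent` is a hypothetical helper, not Siskind's procedure:

```python
def consistent(words, hyp_symbols, necessary, possible):
    """A hypothesis survives if (1) it contains every symbol that some word
    in the utterance requires, and (2) each of its symbols is possible for
    at least one word in the utterance."""
    required = set().union(*(necessary[w] for w in words))
    allowed = set().union(*(possible[w] for w in words))
    return required <= hyp_symbols and hyp_symbols <= allowed


words = ["John", "took", "the", "ball"]
necessary = {"John": {"John"}, "took": {"CAUSE"}, "the": set(), "ball": {"ball"}}
possible = {"John": {"John", "ball"},
            "took": {"CAUSE", "WANT", "GO", "TO", "arm"},
            "the": {"WANT", "arm"},
            "ball": {"ball", "arm"}}

hyps = [{"CAUSE", "GO", "TO", "John", "ball"},        # the correct meaning
        {"WANT", "John", "ball"},                      # no CAUSE -> rejected
        {"CAUSE", "GO", "TO", "PART-OF", "LEFT",
         "arm", "John", "ball"}]                       # LEFT, PART-OF -> rejected

print([consistent(words, h, necessary, possible) for h in hyps])
# [True, False, False]
```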

Updated table

Word   Necessary         Possible
John   {John}            {John}
took   {CAUSE, GO, TO}   {CAUSE, GO, TO}
the    {}                {}
ball   {ball}            {ball}

Stage 2: CAUSE(John, GO(ball, TO(John)))

Word   Possible conceptual expressions
John   {John}
took   {CAUSE(x, GO(y, TO(x)))}
the    {}
ball   {ball}

Noise and Homonymy
► Noisy or homonymous data can corrupt the lexicon by
  - adding an incorrect element to the set of necessary elements, or
  - taking a correct element away from the set of possible elements.
► This may or may not create an inconsistent entry.

Extended algorithm
► Necessary and possible conceptual symbols are mapped to senses rather than words.
► Words are mapped to their senses.
► Each sense has a confidence factor.

Sense assignment
► For each utterance, form the cross-product of the senses of its words.
► Choose the "best" consistent sense assignment.
► Update the entries for those senses as before.
► Increment a sense's confidence factor each time it is used in a preferred assignment.
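A schematic sketch of this selection loop. The sense ids (e.g. bank/1) and the `is_consistent` callback are hypothetical stand-ins for the consistency test described earlier, not the paper's implementation:

```python
from itertools import product

def best_assignment(words, senses, confidence, is_consistent):
    """Pick one sense per word: enumerate the cross-product of sense ids,
    keep only assignments the caller judges consistent with some meaning
    hypothesis, and prefer the one whose senses have the highest total
    confidence. The winning senses get their confidence bumped."""
    candidates = [a for a in product(*(senses[w] for w in words))
                  if is_consistent(a)]
    if not candidates:
        return None                      # the utterance is inconsistent
    best = max(candidates, key=lambda a: sum(confidence[s] for s in a))
    for s in best:
        confidence[s] += 1               # reward senses that keep being used
    return best


# Toy usage: "bank" has two senses; suppose only sense bank/1 is consistent
# with the current utterance's meaning hypotheses.
senses = {"the": ["the/1"], "bank": ["bank/1", "bank/2"]}
confidence = {"the/1": 5, "bank/1": 3, "bank/2": 1}
print(best_assignment(["the", "bank"], senses, confidence,
                      lambda a: "bank/2" not in a))
# ('the/1', 'bank/1'); the confidence of these two senses is now 6 and 4
```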

Inconsistent utterances
► Add the minimal number of new senses until the utterance is no longer inconsistent. Three possibilities:
  - The current utterance is noise, and the new senses are bad (they will eventually be ignored).
  - There really are new senses.
  - The original senses were bad, and the right senses are only now being added.
► Occasionally, remove senses with low confidence factors.
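A sketch of the splitting and pruning bookkeeping. The lexicon layout and the pruning threshold below are assumptions made for this sketch, not the paper's data structures:

```python
def add_sense(lexicon, word, universal_symbols):
    """No consistent sense assignment was found: give `word` a fresh sense
    with unconstrained Stage-1 sets and zero confidence. Later utterances
    either corroborate it (its confidence grows) or leave it untouched
    (it was probably noise)."""
    sense = {"necessary": set(),
             "possible": set(universal_symbols),
             "confidence": 0}
    lexicon.setdefault(word, []).append(sense)
    return sense


def prune(lexicon, threshold=1):
    """Occasionally discard low-confidence senses so that noise-induced
    splits do not accumulate. Keeping at least one sense per word is a
    choice made for this sketch."""
    for word, senses in lexicon.items():
        keep = [s for s in senses if s["confidence"] >= threshold]
        lexicon[word] = keep if keep else senses[:1]
```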

Four simulations
► Vary the task along five parameters.
► Vocabulary growth rate by size of corpus.
► Number of required exposures to a word by size of corpus.
► How high can it scale?

Method (1 of 2)
► Construct a random lexicon.
► Vary it by three parameters:
  - Vocabulary size
  - Homonymy rate
  - Conceptual-symbol inventory size

Method (2 of 2)
► Construct a series of utterances, each paired with a set of meaning hypotheses.
► Vary this by the following parameters:
  - Noise rate
  - Degree of referential uncertainty
  - Cluster size (5)
  - Similarity probability (.75)

Sensitivity analysis

Vocabulary size

Degree of referential uncertainty

Noise rate

Conceptual-symbol inventory size

Homonymy rate

Vocabulary Growth

Number of exposures