May 26, 2005: Empiricism versus Rationalism in Language Learning


May 26, 2005: Empiricism versus Rationalism in Language Learning
SYMBOLIC SYSTEMS 100: Introduction to Cognitive Science
Dan Jurafsky and Daniel Richardson, Stanford University, Spring 2005
IP Notice: Much of the text is from the LSA website (Landauer, Dumais, Laham, etc.) and from Peter Chew: http://www.sandia.gov/ACG/focusareas/cud/lsa.ppt

Outline
- Plato's problem
- The "innatist" solution
- The Landauer and Dumais alternative
- The LSA model itself
- Applications of LSA

Plato's Problem
How are humans able to learn so much, given the fragmentary, noisy, and incomplete nature of our interaction with the world?

[Figure: a geometry demonstration in which the square BHID is twice as big as the square ABCD]

Plato's answer
Knowledge of virtue (or geometry) is innate in some way. His version: it is "remembered" from a previous life.

Chomsky's argument from Plato
The Poverty of the Stimulus argument:
- We don't get enough input from the world to learn the structure of human language.
- Therefore the structure of human language is innate, i.e., biologically determined in our genes.

The Landauer and Dumais response
- Something is certainly innate about language (humans have language and rocks don't).
- But maybe it's not the structure of language. Maybe it's something about how we learn?
- We get lots of input! So it's possible that much of human knowledge of language is acquired from experience.
- But what exactly would such a theory be like? How could meaning be induced purely from experience?

Rationalism versus Empiricism
- Rationalists: the key factor in human behavior is innate predisposition.
- Empiricists: the key factor in human behavior is induction from experience.
(Of course the right answer is that both are true, but it's not as shocking to say so.)

LSA: a proof of concept
- L+D make a simplifying assumption: much of meaning can be learned purely from language input.
- L+D propose to build a device for extracting meaning from language data: a "theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text" (http://lsa.colorado.edu/whatis.html).
- Again, the simplifying assumption: none of its "knowledge" comes from perceptual information about the physical world.
- What is the learning method?

Assumptions underlying LSA
- The meaning of a word is defined distributionally.
- Firth (1957): "You shall know a word by the company it keeps."
- Idea: the meaning of a word can be determined by looking at its distribution, i.e., its neighboring words.
- "Distribution" = the range of occurrence of a word among other words.

Distributional induction of meaning
Two words are similar if they tend to occur in similar contexts. How do we define "similar context"? Appearing near similar "word neighbors". (A minimal sketch of this idea in code follows.)
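
The sketch below (assuming NumPy; the tiny corpus, the window size, and all variable names are illustrative, not from the slides) builds raw co-occurrence vectors and compares words by cosine similarity, the bare-bones version of "similar words have similar neighbors":

```python
import numpy as np

corpus = ("the apple has a stem and a pit "
          "the mango has a stem and a pit "
          "debug the software subroutine "
          "the hardware runs the software").split()

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count how often each word pair co-occurs within a +/-2 word window.
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - 2), min(len(corpus), i + 3)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1

def cosine(w1, w2):
    a, b = C[idx[w1]], C[idx[w2]]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine("apple", "mango"))     # high: they occur in similar contexts
print(cosine("apple", "software"))  # lower: their neighbors differ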

Word similarity and vectors
[Figure: the words stem, pit, debug, subroutine, mango, apple, software, hardware plotted as points in a two-dimensional vector space]

How can LSA work if it doesn't know about word order?
Word order is not part of the model. How can LSA still produce intuitive results?
Assume: a vocabulary of 100,000 (10^5) words; sentences are 20 words long; word order is important only within sentences.
Contribution (in bits) to passage 'information':
- From word order: log2(20!) ≈ 61.0774 (15.53%)
- From word choice: log2((10^5)^20) ≈ 332.1928 (84.47%)
- Total: log2(20! × 10^100) ≈ 393.2702 (100.00%)
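
A quick check of these numbers, as a sketch using only the Python standard library:

```python
import math

word_order = math.log2(math.factorial(20))  # bits to order 20 words
word_choice = 20 * math.log2(10**5)         # bits to choose 20 of 100,000 words
total = word_order + word_choice

print(f"order:  {word_order:.4f} bits ({word_order / total:.2%})")
# order:  61.0774 bits (15.53%)
print(f"choice: {word_choice:.4f} bits ({word_choice / total:.2%})")
# choice: 332.1928 bits (84.47%)
```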

Singular Value Decomposition
Singular Value Decomposition (SVD) is a form of factor analysis. Any m × n matrix A can be written using an SVD of the form A = UDV^T, where:
- U is an m × n matrix (a 'hanger' matrix)
- D is an n × n diagonal matrix (a 'stretcher' matrix)
- V^T is an n × n matrix (an 'aligner' matrix)
(see http://www.uwlax.edu/faculty/will/svd/index.html)
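
A minimal check of this factorization (assuming NumPy; the random matrix is only illustrative):

```python
import numpy as np

A = np.random.rand(6, 4)  # any m x n matrix (m=6, n=4 here)
U, d, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: U is m x n
D = np.diag(d)            # singular values on the diagonal of D

# The factorization reconstructs A (up to floating-point error).
assert np.allclose(A, U @ D @ Vt)
```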

Application of SVD to LSA
1. Assemble a large corpus of natural language.
2. Parse the corpus into meaningful passages.
3. Form a matrix with passages as rows and words as columns.
4. Apply SVD to re-represent the words and passages as vectors in a high-dimensional 'semantic space'.
(A sketch of this pipeline appears below.)
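
A minimal sketch of steps 3 and 4 (assuming NumPy and scikit-learn's CountVectorizer; the three passages and the choice of k = 2 are illustrative):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

passages = [
    "human machine interface for abc computer applications",
    "a survey of user opinion of computer system response time",
    "the eps user interface management system",
]

# Step 3: passage-by-word count matrix (passages as rows, words as columns).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(passages).toarray().astype(float)

# Step 4: SVD; keeping only the top k singular values gives the
# low-dimensional 'semantic space'.
k = 2
U, d, Vt = np.linalg.svd(X, full_matrices=False)
passage_vectors = U[:, :k] * d[:k]  # each row: a passage in k-d space
word_vectors = Vt[:k].T * d[:k]     # each row: a word in k-d space
```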

SVD: an example (1)
Titles of Technical Memos:
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

SVD Example: The original matrix
[Figure: the term-by-title count matrix for the nine memo titles]

SVD: an example (3)
SVD of the matrix on the previous slide
[Figure: the matrix written as the product U × D × V^T]

SVD: an example (4)
2-D reconstruction of the original matrix:
r(human, user) = .94
r(human, minors) = -.83
(The sketch below reproduces these correlations.)
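
A sketch reproducing those two correlations (assuming NumPy; the count matrix is the standard term-by-title matrix from Deerwester et al. 1990, built from the nine titles above):

```python
import numpy as np

terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]
#               c1 c2 c3 c4 c5 m1 m2 m3 m4
X = np.array([[1, 0, 0, 1, 0, 0, 0, 0, 0],   # human
              [1, 0, 1, 0, 0, 0, 0, 0, 0],   # interface
              [1, 1, 0, 0, 0, 0, 0, 0, 0],   # computer
              [0, 1, 1, 0, 1, 0, 0, 0, 0],   # user
              [0, 1, 1, 2, 0, 0, 0, 0, 0],   # system
              [0, 1, 0, 0, 1, 0, 0, 0, 0],   # response
              [0, 1, 0, 0, 1, 0, 0, 0, 0],   # time
              [0, 0, 1, 1, 0, 0, 0, 0, 0],   # EPS
              [0, 1, 0, 0, 0, 0, 0, 0, 1],   # survey
              [0, 0, 0, 0, 0, 1, 1, 1, 0],   # trees
              [0, 0, 0, 0, 0, 0, 1, 1, 1],   # graph
              [0, 0, 0, 0, 0, 0, 0, 1, 1]],  # minors
             dtype=float)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
X2 = U[:, :2] @ np.diag(d[:2]) @ Vt[:2]  # rank-2 reconstruction

def r(w1, w2):
    """Pearson correlation between two term rows of the reconstruction."""
    return np.corrcoef(X2[terms.index(w1)], X2[terms.index(w2)])[0, 1]

print(r("human", "user"))    # ~ 0.94
print(r("human", "minors"))  # ~ -0.83
```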

Intuitive evaluation of LSA validity (Landauer 2002: 12)
thing-things: .61
husband-wife: .87
man-husband: .22
chemistry-physics: .65
blackbird-black: .04
go-went: .71

Applications of LSA
- Information retrieval (search engines): computation of passage similarity
- Text assessment: automated grading of essays for quality and quantity of content
- Automatic summarization of text: determine the best subset of a text to portray the same meaning
- As a thesaurus: finding synonyms, vocabulary tests, subject-matter tests

Taking the synonym test
- Test LSA's model of word knowledge: can it pass the TOEFL (Test of English as a Foreign Language)?
- An 80-item test from ETS (Educational Testing Service)
- LSA trained on AP newswire and Grolier's Encyclopedia: 45 million words of text, 60,000-word vocabulary
- Roughly equivalent to what a child would have read by the eighth grade

Taking the synonym test
Given a test item:
levied: A) imposed B) believed C) requested D) correlated
Use LSA to compute the cosine between the test word and each of the putative synonyms, and choose the word with the highest cosine. (A minimal sketch follows.)
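
A minimal sketch of answering one item (assuming precomputed LSA word vectors in a dict `vectors`; the dict and function names are illustrative, not from the slides):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def answer(test_word, choices, vectors):
    """Pick the choice whose vector has the highest cosine with the test word."""
    return max(choices, key=lambda c: cosine(vectors[test_word], vectors[c]))

# e.g. answer("levied", ["imposed", "believed", "requested", "correlated"], vectors)
```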

Thesaurus (word similarity)
[Figure: example word-similarity output]

Using LSA to grade essays
Middle school, high school, college; intro psychology, biology, etc.
Algorithm:
1. Train LSA on texts in the domain.
2. Pre-grade some "training set" essays.
3. Use LSA to compare each "test set" essay to the graded training-set essays.
4. Essays that are most similar to "A" essays get an A, those most similar to "B" essays get a B, etc.
(A sketch of the comparison step follows.)
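
A minimal nearest-neighbor sketch of steps 3 and 4 (assuming each essay has already been mapped to an LSA vector; this is an illustration of the idea, not the Intelligent Essay Assessor's actual algorithm):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def grade(test_vec, graded):
    """graded: list of (vector, grade) pairs from the pre-graded training set.
    Return the grade of the most similar training essay."""
    best_vec, best_grade = max(graded, key=lambda vg: cosine(test_vec, vg[0]))
    return best_grade
```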

Inter-rater reliability for standardized and classroom tests
[Figure]

Scattergram for narrative essays
[Figure]

Plagiarism Detection
- The LSA-based Intelligent Essay Assessor: an example of actual plagiarism detected at a university
- 520 student essays in total
- For a human reader to detect the plagiarism would require 134,940 (520 × 519 / 2) essay-to-essay comparisons
- In this case, both essays were scored by the same reader, and the plagiarism still went undetected

An example of plagiarism
[Figure]

Conclusions
Learning:
- Rationalists focus on the role of innate knowledge (Plato, Chomsky, Pinker).
- Empiricists focus on the role of experience.
Language learning:
- There is lots of information in the input.
- Important open problem: how to generalize beyond just textual information in learning!