Latent Semantic Analysis: A Model of Inductive Knowledge Acquisition Paul Fillmore & Stefanie Wong

Overview
The question of interest
The problem
The proposed solution: LSA
Latent Semantic Analysis: What is it? What can it do? How does it do it?
Evaluation of the model
Additional considerations
Demonstrations of LSA

The Problem of Induction Plato’s problem: the poverty of the stimulus How do people acquire as much knowledge as they do based on the little information they get? Example: Language Acquisition Chomsky (1991) – Observing adult language is insufficient for children’s development of grammar or a typical lexicon Pinker (1994) – Language learning must be innate – a “language instinct”

Problem of induction in cognitive terms... Problem of categorization: what is the mechanism by which concepts (cheetahs, tigers) come to be treated as the same for some purpose (predators that will eat me)? Problem of similarity: how does experience combine disparate things into a feature identity (a “wing” is different for a bird, an insect, a bat)?

Latent Semantic Analysis: What is it? “Latent Semantic Analysis (LSA) is a mathematical/statistical technique for extracting and representing the similarity of meaning of words and passages by analysis of large bodies of text.” More simply, it is a computer model of human associative learning through experience Does not embody human knowledge beyond its general learning mechanism

What can LSA do? Performance on standard vocabulary and subject matter tests comparable to humans Demonstrates similar mechanism for word sorting and category judgments Processes word-word and passage-word lexical priming data It can accurately estimate: Passage coherence Learnability of passages by individual students The quality and quantity of knowledge contained in essays Can perform humanlike generalizations based on learning that isn’t dependent upon primitive perceptual relations/representations

How does LSA work? Definitions Semantic space Singular value decomposition (SVD) Dimensionality Procedure 1) Matrix Input 2) Cell Transformation 3) Singular Value Decomposition 4) Dimension Reduction

Semantic Space A semantic space is a mathematical representation of a large body of text (e.g., encyclopedias, psychology texts). Each term or combination of terms has its own high-dimensional vector representation within the semantic space. Similarity between vectors for words and contexts is measured by the cosine of the angle between them. Note: terms can only be compared within a semantic space, not directly between semantic spaces. If the vectors were projected onto a sphere surrounding the semantic space, points close together would have closer semantic relations.
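
As a minimal sketch of the cosine comparison (the word vectors here are made up for illustration; real LSA vectors come from the SVD described later):

    import numpy as np

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        """Cosine of the angle between two semantic-space vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Hypothetical low-dimensional vectors standing in for LSA word vectors.
    doctor = np.array([0.8, 0.1, 0.3, 0.0, 0.2])
    nurse = np.array([0.7, 0.2, 0.4, 0.1, 0.1])
    tree = np.array([0.0, 0.9, 0.1, 0.6, 0.0])

    print(cosine(doctor, nurse))  # high cosine: near one another in the space
    print(cosine(doctor, tree))   # low cosine: semantically distant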

Example of similarities within Semantic Space Submitting a term/short text and receiving list of terms that are nearest to it in semantic space Matrix comparison of multiple terms

Singular Value Decomposition A mathematical matrix decomposition technique (a general case of factor analysis) that condenses a large matrix of word-by-context data into a smaller matrix. The smaller matrix has a reduced dimensional representation (typically a few hundred dimensions). Choosing the right number of dimensions is critical for optimal simulation.

Dimensionality Knowing the appropriate dimensionality improves estimates. Example: three separate houses, A, B, and C, are arranged as follows: A is 5 units from both B and C, and B and C are separated by 8 units. Oh, also, they are all on the same straight, flat road. In one dimension those measured distances cannot all be right (5 + 5 ≠ 8), so fitting the houses to the correct, lower-dimensional space forces better estimates of the true distances.
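
A sketch of that correction, assuming a simple least-squares fit of the three positions to a one-dimensional road (scipy is used here purely for illustration):

    from scipy.optimize import minimize

    # Measured distances between the houses (mutually inconsistent in 1-D).
    measured = {("A", "B"): 5.0, ("A", "C"): 5.0, ("B", "C"): 8.0}

    def loss(pos):
        a, b, c = pos
        x = {"A": a, "B": b, "C": c}
        return sum((abs(x[i] - x[j]) - d) ** 2 for (i, j), d in measured.items())

    fit = minimize(loss, x0=[0.0, -4.0, 4.0])  # positions along the road
    a, b, c = fit.x
    # The fitted distances (about 4.33, 4.33, 8.67) are now consistent in 1-D.
    print(abs(a - b), abs(a - c), abs(b - c))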

Procedure: Matrix Input Rows = individual word types Columns = meaning-bearing passages (i.e. sentences or paragraphs) Cells = frequency with which a word occurs in a passage
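
A minimal sketch of the matrix input step, using a few made-up passages in place of a real corpus:

    import numpy as np

    # Hypothetical passages; a real run would use thousands of paragraphs.
    passages = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets",
    ]

    # Rows = word types, columns = passages, cells = occurrence frequencies.
    vocab = sorted({w for p in passages for w in p.split()})
    X = np.zeros((len(vocab), len(passages)))
    for j, p in enumerate(passages):
        for w in p.split():
            X[vocab.index(w), j] += 1

    print(vocab)
    print(X)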

Procedure: Cell Transformation Transformation 1: approximates standard empirical growth functions of simple learning by taking the log of a word’s appearance frequency in a passage. Transformation 2: makes the primary association better represent the informative relation between the entities, rather than mere co-occurrence, by weighting each word according to its entropy over passages.
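
A sketch of one common log-entropy weighting used with LSA (the exact transform in the original papers may differ in detail; the tiny count matrix is made up):

    import numpy as np

    def log_entropy(X: np.ndarray) -> np.ndarray:
        """Rows = words, columns = passages, cells = raw counts."""
        n_passages = X.shape[1]
        # P(passage | word), used for each word's entropy over passages.
        p = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
        with np.errstate(divide="ignore", invalid="ignore"):
            plogp = np.where(p > 0, p * np.log(p), 0.0)
        # Global weight in [0, 1]: high for words concentrated in few passages
        # (informative), low for words spread evenly across passages.
        g = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_passages)
        return np.log(1.0 + X) * g  # local log weight * global entropy weight

    X = np.array([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 1.0, 0.0]])
    print(np.round(log_entropy(X), 3))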

Procedure: SVD & Dimension Reduction SVD: X(i x j) = W(i x k) S(k x k) C(j x k)', in which W and C have orthonormal columns, S is a diagonal matrix of singular values, and k <= min(i, j). Dimension reduction: all but the d largest singular values are set to zero, where d is the number of dimensions to be used.
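
A sketch of the SVD and dimension-reduction step on a small made-up word-by-passage matrix (a real run would use a far larger matrix and keep a few hundred dimensions):

    import numpy as np

    X = np.array([[2.0, 0.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 2.0],
                  [0.0, 1.0, 2.0, 0.0],
                  [1.0, 0.0, 0.0, 3.0]])

    # X = W @ diag(s) @ Ct, with orthonormal columns in W and rows in Ct.
    W, s, Ct = np.linalg.svd(X, full_matrices=False)

    d = 2  # number of dimensions to keep
    X_hat = W[:, :d] @ np.diag(s[:d]) @ Ct[:d, :]  # rank-d reconstruction
    word_vectors = W[:, :d] * s[:d]                # reduced word representations

    print(np.round(X_hat, 2))
    print(np.round(word_vectors, 2))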

[Diagram: the word (w) x context (c) matrix X factored as X = W S C', where W and C are orthonormal matrices, S is a diagonal matrix of singular values, and the m columns of W and m rows of C' are linearly independent.]

LSA Example
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi ordering
m4: Graph minors: A survey

After reducing to two dimensions and reconstructing the matrix, r(human, user) = 0.94, even though human and user never appear together in any of the passages above.
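
A runnable sketch of that toy example, building the word-by-passage count matrix for the nine titles and keeping two SVD dimensions (using the content words that occur in at least two titles, as in the classic version of this example):

    import numpy as np

    titles = [
        "human machine interface for abc computer applications",
        "a survey of user opinion of computer system response time",
        "the eps user interface management system",
        "system and human system engineering testing of eps",
        "relation of user perceived response time to error measurement",
        "the generation of random binary ordered trees",
        "the intersection graph of paths in trees",
        "graph minors iv widths of trees and well quasi ordering",
        "graph minors a survey",
    ]
    terms = ["human", "interface", "computer", "user", "system", "response",
             "time", "eps", "survey", "trees", "graph", "minors"]

    # Word-by-passage count matrix.
    X = np.array([[t.split().count(w) for t in titles] for w in terms], dtype=float)

    W, s, Ct = np.linalg.svd(X, full_matrices=False)
    X_hat = W[:, :2] @ np.diag(s[:2]) @ Ct[:2, :]  # two-dimensional reconstruction

    human = X_hat[terms.index("human")]
    user = X_hat[terms.index("user")]
    print(np.corrcoef(human, user)[0, 1])  # high, despite zero co-occurrence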

Evaluating the Model Four Questions to keep in mind: 1. Can a simple linear model acquire knowledge of humanlike word meaning similarities given sufficient input? 2. If successful, is it dependent upon dimensionality of representation? 3. Is the rate of acquisition comparable to a human? 4. What degree of this knowledge is from indirect inferences from combinations of information across samples?

Is It Acquiring Knowledge? The model’s knowledge was tested with a standard multiple-choice synonym test. After training on approx. 2,000 pages of English text, LSA scored as well as average test-takers on the synonym portion of the TOEFL. The acquired knowledge is attributed to indirect inference rather than direct co-occurrence relations.

Two explanations… 1) A substantial portion of the information needed to answer common vocabulary questions can be inferred from the contextual statistics of usage alone. 2) The model employs a means of induction (dimensionality matching) that amplifies its learning ability, resulting in correct inference of similarity relations that are only implicit in the temporal correlations of experience.

Is dimensionality a factor? The number of dimensions retained was varied. Note what happens when there is no dimensionality reduction at all. Choosing the optimal dimensionality approximately triples the number of words learned.

Comparable rate? Learning is comparable to the rate at which school-aged children improve their performance on similar tests as a result of reading. The rate of acquisition for the late elementary and high school years is estimated at roughly 3,000-5,400 words per year (10-15 per day).

Calculating Comparable Rate: Direct & Indirect Effects LSA simulations consider the average number of contexts in which a test word appeared (a model parameter), and the total number of other contexts (those that contained no words from the synonym test items). These were varied by randomly replacing test words with nonsense words and by choosing random subsamples of the total text, giving the joint effects of direct and indirect textual experience.

LSA simulation of total vocabulary gain A model was fit to the data: z = a(log_b T)(log_c S), where T is the total number of text samples analyzed and S is the number of text samples containing the stem word (fit: r = .89). For every word, estimates were made of: the probability that a word of its frequency appears in the next sample; the number of times an individual would have encountered the word previously; the expected increase in z with the addition of a passage containing the word; and the expected increase in z with the addition of a passage that doesn’t contain it. z was converted to probability correct, multiplied by the corresponding word frequencies, and the gains in number correct were cumulated over all individual words in the language to get the total vocabulary gain from reading a single text sample.
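
A purely illustrative sketch of that fitted model, with made-up values for a, b, and c (the slide does not give the fitted parameters) and one plausible reading of the z-to-probability conversion as a normal-deviate transform:

    import math

    def z_gain(T: float, S: float, a: float = 0.1, b: float = 10.0, c: float = 10.0) -> float:
        """z = a * (log_b T) * (log_c S), with hypothetical a, b, c."""
        return a * math.log(T, b) * math.log(S, c)

    def prob_correct(z: float) -> float:
        """Assumes z is a standard normal deviate; converts via the normal CDF."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # More total text (T) and more samples containing the word (S) -> higher z.
    print(prob_correct(z_gain(T=30_000, S=20)))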

Conclusions from Vocabulary Simulations LSA learns meaning similarities of words from text, in an amount equivalent to the test scores of moderately competent English readers. Three-fourths of LSA’s knowledge is a product of indirect induction (exposure to text not containing the word). This expresses the hypothesis that word meanings grow continuously and that correct performance is a stochastic event governed by individual differences in experience, i.e., word meanings are constantly in flux.

Other Considerations Neurocognitive & Psychological Plausibility Neural net models Similarity to biological models Parallels with memory Meaning – Independent of word order? Contextual Disambiguation – In LSA, words have only one vector representation, thus only one meaning

Mathematical Machine Analogy: a three-layered neural net
Layer 1: word types
Layer 2: conceptual representations
Layer 3: text windows

Neural Net Analogy Network is symmetrical – can run in either direction Different computations made to assess similarity between two episodes, event types, or an episode and an event type

Similarity to Biological Models Interneuronal communication Vector multiplication between axons, dendrites and cell bodies Excitation is proportional to dot product of output and sensitivities of surrounding neurons Single-cell recordings Population effects described as vector averages of individual direction representations

Word-versus-context difference: Analogy to Episodic & Semantic Memories Word representations are semantic, meanings abstracted and averaged from many experiences Context representations are episodic, unique combinations that occurred only once ever Both words and episodes represented by same defining dimensions, and relation to one another is still retained

Word-versus-context difference: Analogy to Explicit & Implicit Memories Retrieving a context vector brings past happening to mind - explicit memory Retrieving a word vector instantiates abstraction of many happenings brought together - implicit memory

Meaning: independent of word order? Text segments are treated as “bags of words”: LSA makes no use of word order, syntax, or grammar. Despite assertions that “scrambled sentences would be worthless context for vocabulary instruction” (Durkin, 1983), LSA acquires 100% of its knowledge via “scrambled sentences” and still performs relatively well at deciphering meaning.

Expertise The LSA account of knowledge brings a new perspective on expertise. A simulated expert learns four times more about an item per exposure than a simulated novice. LSA suggests that great masses of knowledge contribute to superior performance through direct application of stored knowledge to a problem, a greater ability to add new knowledge to long-term memory, and the ability to infer indirect relations among bits of knowledge and to generalize from instances and experience.

Contextual Disambiguation A word’s LSA vector is a frequency-weighted average of its predicted usages. This is acceptable for words that generate only one or a few closely related meanings (the majority of words). Balanced homographs such as bear result in an LSA vector that doesn’t resemble any of their major meanings. While LSA’s single-vector representation can’t account for multiple word-meaning phenomena at this stage, it is not a fatal flaw (local context will aid in disambiguation).

Text Comprehension: An LSA Interpretation of Construction-Integration Theory Research in which individual word senses aren’t represented, but the overall meaning of phrases/sentences/paragraphs is constructed from a linear combination of their words. The vector average reflects the overall topic or meaning of a passage.
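
A minimal sketch of that idea, using made-up word vectors in place of real LSA vectors: a passage vector is the average of its word vectors, and passages are compared by cosine.

    import numpy as np

    word_vectors = {
        "doctor": np.array([0.8, 0.1, 0.3]),
        "treats": np.array([0.6, 0.2, 0.2]),
        "patient": np.array([0.7, 0.0, 0.4]),
        "tree": np.array([0.0, 0.9, 0.1]),
        "grows": np.array([0.1, 0.8, 0.2]),
    }

    def passage_vector(words):
        # Overall passage meaning as the average of its word vectors.
        return np.mean([word_vectors[w] for w in words], axis=0)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    p1 = passage_vector(["doctor", "treats", "patient"])
    p2 = passage_vector(["tree", "grows"])
    print(cosine(p1, p2))  # low cosine: the two passages are about different topics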

Criticisms / Further Issues Remember: SVD is just one possible, simple case of a model. Assumption: all necessary semantic information is gleaned from a word’s context (e.g., “love”). Linguistic structures (e.g., syntax) that are of obvious importance for the derivation of meaning should be incorporated.

Educational Applications of LSA Performance on college exams Scoring the content of an essay Selecting most appropriate text for learners with different levels of background knowledge Assisting students to summarize material

Performance on College Exams

Essay Grading

Demonstrations: Write to Learn Promotes writing skills and reading comprehension

Demonstrations: Intelligent Essay Assessor (IEA) Assesses and critiques electronically submitted essays Provides assessment and feedback

Demonstration: Summary Street Web-based reading comprehension and writing instruction tool Compares student summaries to each section of text and provides feedback

Demonstration: Super Manual Program that allows one to identify, develop, and test better ways to organize and present information customized to individual maintainers' level of expertise

Educational Text Selection Predicts how much readers will learn from texts based on estimated conceptual knowledge of topic and information present in the text they read

Demonstration: State the Essence! LSA provides evaluations of student summaries of text and guides students toward content that experts had noted as most significant. It offers a way to measure reading comprehension: summary writing requires the construction of mental representations that join elements of text information with each other and with elements of prior knowledge.

Summary People appear to know significantly more than they could have learned from temporally local experiences. The proposed induction method depends on reconstructing a system of multiple similarity relations in a high-dimensional space. Dimensionality-optimizing induction was implemented through SVD matrix decomposition. The model scored as well as the mean scores of foreign students on the TOEFL exam, and it learned at a rate similar to schoolchildren, largely through induction from data about other words. Because LSA didn’t have access to word-similarity information based on spoken language, morphology, syntax, logic, or perceptual world knowledge, the authors conclude that the induction method is sufficient to account for Plato’s paradox, at least in the domain of knowledge measured by synonym tests.