Adam Houston 1, Chris Westbury 1 & Morton Gernsbacher 2
1 Department of Psychology, University of Alberta, Canada; 2 Department of Psychology, University of Wisconsin, USA

Objectifying subjectivity: A quantitative approach to subjective familiarity

1.) Computational Estimation of Subjective Familiarity Ratings

We used a computational method known as genetic programming (GP) to develop a mathematical model of subjective familiarity ratings as a non-linear combination of well-defined lexical variables. GP uses natural selection to evolve mathematical equations (see sidebar, top right), using any number of input variables and without making assumptions about distribution or linearity. We evolved functions that maximized the correlation between each equation's output and human subjective familiarity ratings for words of orthographic frequency 0 (Gernsbacher, 1984).

The input variables were 52 lexical variables from the WordMine Database (Buchanan & Westbury, 2000). These included measures of orthographic and phonological neighbourhood size and frequency, and length-controlled and length-uncontrolled biphone/bigram sizes and frequencies. None of the 52 variables was significantly linearly correlated with the subjective familiarity measures (none reached p < 0.05).

GP is stochastic, and cannot guarantee that any solution it finds is the best one. Accordingly, we repeated the process many times and kept the best result: 25 runs of 2500 equations each, evolving for 75 generations. The best evolved equation combined seven of the input variables (shown in Table 1) to generate estimates that correlated with the subjective familiarity measures at r = 0.57 (see Figure 2).

How does genetic programming work?

Any mathematical equation can be expressed as a tree. For example, the tree at the top left in Figure 1 expresses the equation w + ((y * z) / x); the one beside it expresses (a / b) * log(w).
We can mate any two equations by randomly swapping the subtrees that compose them, producing children: equations built from the same elements as their parents. The two trees at the bottom of Figure 1 are children of the two at the top. GP ensures that only the best parents are allowed to mate: in this case, the equations whose outputs best predict the target ratings. This selectivity ensures that the children produced will tend to contain elements useful for the problem at hand. Across many generations of selective breeding, average and best fitness increase. Since fitness is defined here by utility for solving the problem, increases in fitness yield better solutions to the problem of interest. The process is formally identical to selective breeding in biology, where the breeder decides which animals are good enough to be allowed to breed. After repeated breeding sessions, we select the best solution that has evolved.

2.) Testing the Computational Estimate of Subjective Familiarity

To test the evolved equation, we used it to select a large number of words and nonwords that it predicted to be of either high or low familiarity. None of the words appeared in the original GP input set, and all were known to be recognized by at least 70% of undergraduate subjects. Generating two disjoint, unique stimulus sets for every subject, we asked 34 right-handed native English speakers to undertake two randomly-ordered tasks.

Task A: Rating Subjective Familiarity

One task was a rating task (Figure 3), which required subjects to use a Likert scale to rate a set of words and nonwords on subjective familiarity. The input file also included some very unword-like strings (containing either only vowels or no vowels) in order to encourage subjects to use the full range of the scale. Subjects rated the nonwords selected as 'low familiarity' by the evolved equation as significantly less familiar than the words selected as 'high familiarity' (t(33) = 4.72; p < 0.05).
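The tree encoding and subtree-swap "mating" step described in the sidebar can be sketched in code. The following is a minimal illustration, assuming a nested-tuple encoding of equation trees (this encoding and the function names are our own, not the authors' implementation):

```python
import math
import random

# An equation tree is either a variable name (a string) or a tuple
# (operator, child1, child2, ...). The two example parents below are
# the trees from Figure 1: w + ((y * z) / x) and (a / b) * log(w).

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs for every node in the tree."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng):
    """Mate two equations: graft a random subtree of parent_b onto a
    randomly chosen site in parent_a, producing one child."""
    path_a, _ = rng.choice(list(subtrees(parent_a)))
    _, sub_b = rng.choice(list(subtrees(parent_b)))
    return replace_at(parent_a, path_a, sub_b)

def evaluate(tree, env):
    """Evaluate a tree given variable bindings in env."""
    if isinstance(tree, str):
        return env[tree]
    op, *args = tree
    vals = [evaluate(a, env) for a in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "*":
        return vals[0] * vals[1]
    if op == "/":
        return vals[0] / vals[1]
    if op == "log":
        return math.log(vals[0])
    raise ValueError(f"unknown operator: {op}")

parent_a = ("+", "w", ("/", ("*", "y", "z"), "x"))
parent_b = ("*", ("/", "a", "b"), ("log", "w"))
child = crossover(parent_a, parent_b, random.Random(0))
```

In a full GP run, fitness would be the correlation between each tree's outputs (via `evaluate`) and the human ratings, and only the fittest trees would be selected for crossover each generation.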
Task B: Lexical Decision

The second task was a lexical decision task (Figure 4), designed to determine whether predicted familiarity would correlate with recognition latencies for words and nonwords predicted to be of high or low subjective familiarity. The input file again included very unword-like strings containing either only vowels or no vowels. Subjects were significantly slower to reject nonwords that had been predicted to be high familiarity than those that had been predicted to be low familiarity (t(33) = 2.82; p < 0.05), perhaps because the nonwords included highly word-like strings.

Figure 1: Some equations expressed as trees.
Figure 3: Familiarity ratings by human subjects, as predicted in advance by the evolved GP equation. Bars are standard errors.
Figure 4: LD RTs for correct responses by human subjects, as predicted in advance by the evolved GP equation. Bars are standard errors. Percentages are correct response rates.

3.) What does it mean? Discussion and conclusions

We have analyzed the evolved equation and simplified it as much as possible; it remains complex even in its simplest form. Essentially, it operates by breaking the variance into two roughly orthogonal components that are added together. The first component uses a function of five variables (heavily weighted toward uncontrolled bigram frequency) if a string has no orthographic neighbours; if there are orthographic neighbours, it instead uses a function of two controlled biphone measures. The second component is a function of uncontrolled bigram frequency and controlled biphone frequency, modulated in the rare cases where there is a high-frequency phonological neighbour. We should not be surprised that most of the variables in the evolved equation are sublexical, since most of the input variables were. What is of interest is that such variables can account for so large a proportion of the variance in measures of subjective familiarity.
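The two-component structure just described can be rendered schematically. The sub-functions below are hypothetical placeholders of our own, standing in for the evolved sub-expressions (which combine the Table 1 variables and are not reproduced here); the sketch only makes the conditional structure concrete:

```python
# Hypothetical stand-ins for the evolved sub-expressions; the weights
# and variable combinations are illustrative, not the actual equation.
def f_no_neighbours(v):
    # weighted heavily toward uncontrolled bigram frequency
    return 0.8 * v["uncontrolled_bigram_freq"]

def f_neighbours(v):
    # a function of two controlled biphone measures
    return v["controlled_biphone_1"] + v["controlled_biphone_2"]

def g(v):
    # uncontrolled bigram frequency combined with controlled biphone frequency
    return v["uncontrolled_bigram_freq"] * v["controlled_biphone_freq"]

def predicted_familiarity(v):
    """Sum of two roughly orthogonal components, as described above."""
    if v["orthographic_neighbours"] == 0:
        c1 = f_no_neighbours(v)
    else:
        c1 = f_neighbours(v)
    return c1 + g(v)
```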
We have shown that it is possible to objectively capture a significant portion of the variance in the notion of 'subjective familiarity', at least for nonwords. We do not wish to claim that our evolved equation is the 'true' function used by human decision makers making familiarity judgments. However, insofar as the equation licenses any inference about human decision making, it suggests that humans have a subtle sensitivity to a complex set of subword frequency measures. Experimental evidence from direct manipulation of small subword features suggests that this is true (Westbury & Buchanan, 2002).

References

Buchanan, L. & Westbury, C. (2000). WordMine database: Probabilistic values for all four- to seven-letter words in the English language.
Gernsbacher, M.A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113(2).
Westbury, C. & Buchanan, L. (2002). The probability of the least likely non-length-controlled bigram affects lexical decision RTs. Brain and Language, 81(1-3).

Abstract: Gernsbacher (1984) showed that low-frequency words vary in their subjective familiarity, as measured by subjects' ratings. She also demonstrated that manipulating stimuli by subjective familiarity rating affects word recognition latencies. This consistency in the subjective ratings of words of equally low frequency raises a question: what aspects of a word do subjects use to make their judgments? The current work addresses this question. We used genetic programming to develop an explicitly-specified mathematical model of Gernsbacher's subjective familiarity ratings for words with an orthographic frequency of 0, and tested the model experimentally using two tasks. It accounted for a significant amount of variance in Gernsbacher's subjective familiarity ratings, and predicted familiarity judgments and recognition latencies for nonwords.
Figure 2: Regression line of the best evolved equation for predicting subjective familiarity.
Table 1: The 7 input variables used by the best evolved equation for predicting subjective familiarity.
Correct response rates shown in Figure 4: 86%, 92%, 99%; 89%, 88%, 98%.