
1 Language Comprehension: Reading

2 Research Methods
Recording eye movements during reading
Computational modeling
Neuropsychology

3 Eye movement analyses
Saccade: a rapid movement of the eyes from one spot to another as one reads
Fixation: the pause between saccades; visual information is taken in only during fixations

4 Eye fixation durations during normal reading
We “take in” words during brief static fixations. Leisure reading proceeds at about 280 words per minute. What can our eye movements tell us about reading?
Eye-tracking device
• Text is presented on a computer screen
• The position of the eye is monitored
• Eye position is mapped onto the position of the text on the screen
Fixations
• Rest periods, typically around 200–250 msec
Saccades
• Rapid eye movements, each taking roughly 20–40 msec
• Ballistic movements
• Typically span 7 to 9 character spaces
• No visual information is taken in during a saccade
Regressions
• Saccades that move backwards through the text
• Roughly 10–15 percent of all saccades
Visual acuity
• The foveal area subtends about 2 degrees of visual angle around the fixation point
• The parafoveal area subtends about 10 degrees of visual angle around the fixation point
Span of fixation (Rayner & Pollatsek, 1988)
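How do such measurements come out of a raw gaze record? As a minimal sketch (not from the slides), the snippet below classifies simulated gaze samples into fixations and saccades with a simple velocity threshold; the sampling rate, the threshold value, and the toy gaze data are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical gaze record: x-position (in character spaces) sampled at 1000 Hz.
# Flat stretches are fixations; fast jumps are saccades.
SAMPLE_RATE_HZ = 1000            # assumed tracker sampling rate
VELOCITY_THRESHOLD = 100.0       # chars/sec; assumed saccade-detection cutoff

x = np.concatenate([
    np.full(220, 5.0),           # ~220 ms fixation at character 5
    np.linspace(5.0, 13.0, 30),  # ~30 ms saccade spanning 8 character spaces
    np.full(250, 13.0),          # ~250 ms fixation at character 13
])

velocity = np.abs(np.diff(x)) * SAMPLE_RATE_HZ
is_saccade = velocity > VELOCITY_THRESHOLD

# Report the duration of each run of fixation/saccade samples.
boundaries = np.flatnonzero(np.diff(is_saccade.astype(int))) + 1
for run in np.split(np.arange(len(is_saccade)), boundaries):
    label = "saccade" if is_saccade[run[0]] else "fixation"
    print(f"{label}: {len(run) / SAMPLE_RATE_HZ * 1000:.0f} ms")
```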

5 Eye movement records compared: normal reader, speed reader, skimmer

6 Moving window technique
THE HANDSOME FROG KISSED THE PRINCESS AND TURNED …
XHZ KLNDSOME FROG KISSED THE PRINCAWS NBD YRWVAA …
GJUI DHABOPLH DROG KISSED THE PRINCESS ANQ DWEVDTA …
Random letters are presented outside a window that moves with the eyes. When the window is large enough, it has no effect on reading. (Rayner, 1975, 1981, 1986)
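The display manipulation itself is easy to sketch. The following toy implementation (mine, not the original experimental software) replaces letters outside a window around the current fixation with random letters, keeping spaces intact so word boundaries survive; the window sizes are hypothetical parameters.

```python
import random
import string

def moving_window(text, fixation, left=4, right=14):
    """Replace letters outside the window around `fixation` with random
    letters. Spaces are kept, so word-boundary information is preserved."""
    out = []
    for i, ch in enumerate(text):
        inside = fixation - left <= i <= fixation + right
        if inside or not ch.isalpha():
            out.append(ch)
        else:
            out.append(random.choice(string.ascii_uppercase))
    return "".join(out)

sentence = "THE HANDSOME FROG KISSED THE PRINCESS AND TURNED"
for fixation in (4, 18, 30):            # simulated fixation positions
    print(moving_window(sentence, fixation))
```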

7 Moving window technique
Perceptual span for identifying words:
~3 letters to the left of fixation
~8 letters to the right of fixation
The span is asymmetric to the right. It reverses for people who read from right to left (e.g. in Hebrew), becoming asymmetric to the left. (Rayner, 1975, 1981, 1986)

8 Reading: from orthography to meaning

9 Connectionist framework for lexical processing
[Diagram: Orthography (text), Phonology (speech), and Semantics (meaning) are interconnected; context, grammar, and pragmatics feed into Semantics. Adapted from Seidenberg and McClelland (1989) and Plaut et al. (1996).]

10 Two routes from print to meaning
[Diagram: the same framework, now highlighting a direct access route (Orthography → Semantics) and a phonologically mediated route (Orthography → Phonology → Semantics). Adapted from Seidenberg and McClelland (1989) and Plaut et al. (1996).]

11 Reading Pathways
There are two possible routes from the printed word to its meaning:
(1) Spelling → meaning: the route from the spelling of the printed word directly to meaning.
(2) Spelling → phonology → meaning: the print is first mapped to a phonological representation, and the phonological code is then linked to meaning, just as in speech perception.
Both routes may be used, to varying degrees.

12 Phonological mediation occurs in reading
Evidence that the phonologically mediated route is used: semantic decisions on homophones (e.g. Van Orden, 1987).
Is ICE CREAM a food? (control item)
Is MEET a food? → slow “no” response
Is ROWS a flower? → slow “no” response

13 But... phonological mediation is not necessary
Some brain-damaged patients can understand (some) written words without any apparent access to their sound patterns. Phonological dyslexics can still read (Levine et al., 1982).
Patient EB:
Reading comprehension slow but accurate
Unable to choose which 2 of 4 written words sounded the same, or rhymed
The relative contribution of the two routes to meaning activation depends on word frequency (e.g. Jared & Seidenberg, 1991, JEP: General).

14 Deep Dyslexia: example patient
Semantic errors:
canoe → kayak
onion → orange
window → shade
paper → pencil
nail → fingernail
ache → Alka Seltzer
Visual errors:
cat → cot
fear → flag
rage → race

15 Modeling Deep Dyslexia
[Diagram: Orthography (text), Semantics (meaning), and Phonology (speech); the mapping between these networks might be disrupted.]
Plaut and Shallice (1993); Hinton, Plaut and Shallice (1993)

16 Neural Network Model for Deep Dyslexia
The network learns the mapping between letter features and meaning features.
Hidden units provide a (non-linear) mapping between letter codes and meaning features.
Feedback connections form a loop that adjusts the meaning output toward stored patterns.
Learning was done with back-propagation.
[Diagram: letter features → hidden units → meaning features, with feedback connections.]
Plaut and Shallice (1993); Hinton, Plaut and Shallice (1993)

17 What the network learns
The network develops semantic attractors: each word meaning is a point in semantic space and has its own basin of attraction.
[Diagram: nearby points in visual space (“cat”, “cot”) map onto separate basins of attraction in semantic space.]

18 Simulating Brain Damage
Damage to the semantic units can change the boundaries of the attractors. This explains both semantic and visual errors: meanings fall into a neighboring attractor.
[Diagram: old vs. new semantic space, with the boundaries of the “cat” and “cot” attractors shifted after damage.]
Visual error: CAT might be called “cot”
Semantic error: BED might be called “cot”
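The Plaut and Shallice model is considerably more elaborate, but the attractor idea can be illustrated with a toy Hopfield-style network; this is a sketch of mine, not their implementation. Two overlapping "meaning" patterns are stored, a noisy input settles into the nearest basin, and zeroing a random half of the connections can shift the basin boundaries so that the same input may land in the neighboring attractor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "word meanings" as +/-1 feature vectors; "cot" shares most
# features with "cat", so their attractors are neighbors in semantic space.
cat = rng.choice([-1, 1], size=50)
cot = cat.copy()
cot[:10] *= -1

patterns = np.stack([cat, cot])
W = (patterns.T @ patterns) / patterns.shape[1]   # Hebbian weight matrix
np.fill_diagonal(W, 0)

def settle(state, weights, steps=20):
    """Iterate the network until the state falls into a basin of attraction."""
    for _ in range(steps):
        state = np.where(weights @ state >= 0, 1, -1)
    return state

# A noisy version of "cat" settles back into the "cat" attractor.
noisy = cat.copy()
noisy[rng.choice(np.arange(10, 50), size=5, replace=False)] *= -1
print("intact:   recovers cat?", np.array_equal(settle(noisy, W), cat))

# "Lesion" the network by zeroing half of the connections at random;
# the basin boundaries shift, and errors become possible.
W_lesioned = np.where(rng.random(W.shape) < 0.5, 0.0, W)
out = settle(noisy, W_lesioned)
print("lesioned: cat?", np.array_equal(out, cat),
      "| cot?", np.array_equal(out, cot))
```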

19 Reading aloud: from orthography to phonology

20 Reading out loud
[Diagram: the same connectionist framework, now highlighting the Orthography → Phonology pathway used in reading out loud.]

21 Dual Route Models of Reading
Lexical route: spelling lookup in the lexicon; necessary for exception words, e.g. PINT, COLONEL.
Sublexical route: grapheme-phoneme conversion rules; necessary for regular and unfamiliar words, e.g. VINT.
[Diagram: Orthography feeds both routes, which converge on Phonology.]
(e.g., Coltheart, Curtis, Atkins, & Haller, 1993)
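A schematic of the dual-route idea in code (a sketch built on an invented toy lexicon and rule set, not the Coltheart et al. implementation): known words are pronounced by lexical lookup, and novel strings fall through to grapheme-phoneme conversion rules applied longest-grapheme-first.

```python
# Toy lexicon of stored pronunciations, including exception words. The
# entries and the pseudo-phonetic spellings are invented for illustration.
LEXICON = {
    "pint":    "PYNT",
    "colonel": "KERNEL",
    "mint":    "MINT",
}

# Toy grapheme-phoneme conversion rules, tried longest-grapheme-first.
GPC_RULES = [
    ("nt", "NT"), ("v", "V"), ("i", "I"),
    ("m", "M"), ("p", "P"), ("t", "T"), ("n", "N"),
]

def read_aloud(word):
    word = word.lower()
    if word in LEXICON:                      # lexical route: spelling lookup
        return LEXICON[word], "lexical"
    phonemes, i = [], 0                      # sublexical route: GPC rules
    while i < len(word):
        for grapheme, phoneme in sorted(GPC_RULES, key=lambda r: -len(r[0])):
            if word.startswith(grapheme, i):
                phonemes.append(phoneme)
                i += len(grapheme)
                break
        else:
            i += 1                           # skip letters with no rule
    return "".join(phonemes), "sublexical"

for w in ("PINT", "MINT", "VINT"):
    print(w, "->", read_aloud(w))
```

Note that if PINT were missing from the lexicon, the rules would regularize it to rhyme with MINT; that is exactly the error pattern of surface dyslexia described on the next slides.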

22 Surface Dyslexia
Difficulty reading irregular words; a tendency to regularize them (e.g. BROAD → “brode”).
Patients read GLOVE as rhyming with COVE, and FLOOD with MOOD.
Damage to the lexical route?

23 Explaining Surface Dyslexia
[Diagram: the dual-route model with the lexical route (spelling lookup, needed for exception words such as PINT and COLONEL) damaged, leaving reading to the sublexical grapheme-phoneme conversion rules.]
(e.g., Coltheart, Curtis, Atkins, & Haller, 1993)

24 Phonological Dyslexia
Difficulty reading nonwords.
Irregular words (e.g. YACHT) and regular words (e.g. CUP) are read correctly.
Damage to the sublexical route?
Video demonstration: Language → Introduction → Reading aloud words/nonwords

25 Explaining phonological dyslexia
[Diagram: the dual-route model with the sublexical grapheme-phoneme conversion route damaged, leaving reading to lexical lookup.]
(e.g., Coltheart, Curtis, Atkins, & Haller, 1993)

26 Neural Network Approach
E.g., Seidenberg and McClelland (1989) and Plaut et al. (1996). Central to these models is the absence of any lexicon: no multiple routes from orthography to phonology are needed. Instead, the models rely on distributed representations. The model has no stored information about individual words; “… knowledge of words is encoded in the connections in the network.”

27 A Neural Network Model
[Diagram: a three-layer network mapping Orthography (print) to Phonology (speech): grapheme input units (th, i, ck) → hidden units → phoneme output units (/th/, /ih/, /k/).]
Plaut et al. (1996)
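A minimal numpy rendering of this architecture (my simplification, not the published code): a 105-100-61 feedforward network with sigmoid units, trained by backpropagation on one toy grapheme-phoneme pair. The random binary vectors stand in for real orthographic and phonological codes; Plaut et al. used a structured onset-vowel-coda encoding.

```python
import numpy as np

rng = np.random.default_rng(1)

N_GRAPHEMES, N_HIDDEN, N_PHONEMES = 105, 100, 61   # layer sizes from the model

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy training pair: sparse binary grapheme and phoneme vectors stand in for
# a real word's orthographic and phonological codes.
x = (rng.random(N_GRAPHEMES) < 0.05).astype(float)
t = (rng.random(N_PHONEMES) < 0.05).astype(float)

W1 = rng.normal(0, 0.1, (N_HIDDEN, N_GRAPHEMES))   # input -> hidden weights
W2 = rng.normal(0, 0.1, (N_PHONEMES, N_HIDDEN))    # hidden -> output weights
lr = 0.5

for epoch in range(500):
    h = sigmoid(W1 @ x)                   # hidden activations
    y = sigmoid(W2 @ h)                   # phoneme (output) activations
    # Backpropagation for cross-entropy loss with sigmoid outputs:
    delta_out = y - t
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)

print("max output error after training:", np.abs(y - t).max())
```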

28 Plaut et al. (1996) Simulations
The network learned from about 3000 written-spoken word pairs by backpropagation.
Performance of the network closely resembled that of adult readers.
Lesions to the model led to decreases in performance on irregular words, especially low-frequency words → simulated performance in surface dyslexia.
The model learns about regularity and consistency.

29 Plaut et al. (1996) Simulations
Predictions that match human data:
Irregular slower than regular: RT(PINT) > RT(POND)
Frequency effect: RT(COTTAGE) > RT(HOUSE)
Consistency effects for nonwords: RT(MAVE) > RT(NUST)
The model learns about regularity and consistency.

30 Demo
http://psych.rice.edu/mmtbn/ Chapter “Language”, section “Word Production II”. The end of the page launches a demo of the Plaut et al. model. From the demo documentation:

Reading aloud from print is a heavily researched topic. The pronunciations of most English words adhere to standard spelling-sound correspondences (e.g., DIVE, MINT), whereas others deviate considerably from this pattern (e.g., GIVE, PINT). Nonetheless, skilled readers are able to process both types of words easily and correctly. What mechanism, then, might be involved when a reader faces these two different types of words?

One general class of models accounting for this phenomenon adopts what is called a "dual-route architecture." Its fundamental property is that skilled readers have at their disposal two different procedures for converting print to speech. If the reader already knows the word, its pronunciation is retrieved by looking it up in an internal lexicon, where an entry with both the printed form and the corresponding pronunciation of that word has been stored. If, however, the reader encounters a letter string never seen before, such as a novel word or a pronounceable nonword, a nonlexical route is taken, which requires the reader to use a system of rules specifying the relationship between letters and sounds in English. In summary, the central tenet of dual-route models is that different mechanisms, each responding to different types of input and operating according to fundamentally different principles, underlie the process that is known collectively as reading.

Although dual-route models have gained popularity among psychologists, many researchers have adopted the alternative PDP approach to studying the same phenomenon. In contrast to dual-route models, this approach assumes that there is a single, uniform procedure for computing a phonological representation from an orthographic representation, and that this mechanism applies to a variety of inputs, including exception words and nonwords as well as regular words. Within such a research paradigm, the microstructure of the cognitive processing takes the form of a PDP network. The units at the input level receive information about the orthography of the word, and the units at the output level generate its phonology; input and output units are connected through intermediate, hidden units. The system learns by adjusting the weights on the connections between units in a way that is sensitive to the contingency between the statistical properties of the environment and the behavior of the network. As a result, there is no sharp distinction between different types of input. Rather, all words (with their respective orthographic and phonological forms) co-exist within a single system whose representations and processing reflect the relative degree of consistency in the mappings for different words.

A number of PDP systems have been developed to simulate the process of reading aloud from print (Seidenberg & McClelland, 1989; Plaut, McClelland, Seidenberg & Patterson, 1996), each with a varying degree of success in accounting for empirical data. To this day there is still controversy among psychologists as to whether the PDP approach is a viable alternative to dual-route models. However, it has also become evident that this approach possesses two important features that the dual-route model currently lacks: the model is computational, and it learns.
By providing a rich set of general computational principles, as well as specifying how such processes might be learned, the PDP approach offers a useful way of thinking about human performance in this domain.

Plaut et al. (1996) reported a study in which they developed a PDP network to simulate English word reading in skilled readers. Their work was based on a careful and thorough linguistic analysis of a large set of monosyllabic words that has been extensively used in empirical reading experiments (Glushko, 1979). Specifically, these researchers condensed the spelling-sound regularities among 2998 monosyllabic words and implemented these regularities directly in the PDP network.

A monosyllabic English word, by definition, contains only a single vowel and may contain both an initial and a final consonant cluster. Because of the structure of the articulatory system, there are strong phonotactic constraints within the initial and final consonant clusters: in both cases a given phoneme can occur only once, and the order of phonemes is considerably constrained. For example, if the phonemes /p/, /h/ and /r/ all appear in the initial consonant cluster, they can appear only in the order /phr/. As a result, if three groups of units are designated at both the input and output levels for the initial consonant cluster, the vowel, and the final consonant cluster respectively, only a small amount of replication is needed to provide a unique representation of virtually every monosyllabic word in the training corpus.

The PDP network that Plaut et al. developed consists of three layers of processing units. The input layer contains 105 orthographic units, each representing a grapheme. The output layer contains 61 phonological units, one for each phoneme. Phonotactic constraints are expressed by grouping phoneme units into mutually exclusive sets and ordering these sets from left to right, in accordance with the ordering constraints imposed within consonant clusters. Between the two layers there is an intermediate layer of 100 hidden units.

To best accommodate human subject data, the model was implemented with various network specifications. In the first experiment, a simple feedforward network structure was adopted, so that the network maps only from orthography to phonology. In a second simulation, in addition to the feedforward structure, each phoneme unit was also connected to every other phoneme unit as well as back to the hidden units.

Plaut et al. reported that the simulations generated results very similar to those obtained in empirical studies. First, the network, once trained, was able to read nonwords in much the same way as human subjects do. Second, the simulations showed the effects of frequency and consistency that are often found in reading experiments. Third, by selectively damaging the network, the model was able to generate results that resemble the reading performance of patients with various forms of dyslexia.

3. The Word Production Simulation

This simulation is based on Plaut et al.'s (1996) network structure. It consists of three layers of processing units. The input layer is made up of 105 units, each representing a distinct orthographic unit (grapheme). The output layer has 61 units, each corresponding to a specific phonological representation (phoneme).
A third, hidden layer, consisting of 100 processing units, mediates between the input and output layers: all the input units feed into each hidden unit, which in turn feeds into each of the output units. This is therefore a simple feed-forward network, mapping only from orthography to phonology. The network architecture is represented in this slide; the specifications of the input (grapheme) and output (phoneme) layers can be viewed by clicking the "Network Structure" button in the simulation.

The simulation is run in two stages. The first is a training stage, during which the network is exposed to a training set to learn the correspondence between words and their pronunciations. This process is repeated until the network is able to read these words on its own. There are three parameters associated with network training: Learning Rate, Momentum, and Number of Training Epochs. The first two determine how the connection weights are modified during training; the Number of Training Epochs needed depends on the combination of the other two. In the simulation, each parameter comes with a default that was empirically tested to be the optimal value, but you are encouraged to try out other combinations of parameters to see how they affect training. Because of the network configuration, training demands a tremendous amount of computation and may take quite some time to complete. If you wish to bypass this stage, you can take a shortcut: clicking the "Shortcut" button in the simulation automatically loads the weights resulting from a previous simulation into the network.

The essence of training a network is to expose it to a large corpus of words and their respective pronunciations, so that the connections among the processing units eventually capture the statistical properties of the orthography-phonology correspondence in English word reading. The network learns by optimizing its weight pattern so as to best reflect the statistical information conveyed in the training set. Throughout the training session, the connection weights between different layers of units can be viewed by clicking the "Weight Graph (from input to hidden)" or "Weight Graph (from hidden to output)" button.

The second phase of the simulation is a testing stage. During this stage the network reads lists of words (or nonwords), and its output is compared to the correct pronunciations. At issue is whether the network, once trained, can read a large corpus of words and pronounceable nonwords as well as skilled readers can. Seven testing sets are used in the simulation. The first is the training set itself; the network is considered fully trained only when it can correctly pronounce all the words in the original training set. The next four testing sets are adapted from Taraban and McClelland's (1987) experiment, and consist of high-frequency consistent words, low-frequency consistent words, high-frequency exception words, and low-frequency exception words, respectively. They are included to test whether the network can replicate the effects of frequency and consistency on naming latency. For this purpose, the average cross entropy, a measure of the difference between the network's generated pronunciation and a word's correct pronunciation, is used as an analogue of naming latency. You should explore whether there is an interaction between frequency and consistency in the network's output.
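The cross-entropy measure mentioned above can be written down directly; this is the standard definition (the simulation's exact normalization may differ). A closer match between output and target gives a lower value, i.e. a shorter simulated naming latency.

```python
import numpy as np

def average_cross_entropy(target, output, eps=1e-7):
    """Cross entropy between a binary target phoneme vector and the
    network's real-valued output, averaged over output units."""
    output = np.clip(output, eps, 1 - eps)
    return -np.mean(target * np.log(output) + (1 - target) * np.log(1 - output))

# A closer match to the target yields a lower "latency".
target = np.array([1.0, 0.0, 1.0, 0.0])
print(average_cross_entropy(target, np.array([0.9, 0.1, 0.8, 0.2])))  # small
print(average_cross_entropy(target, np.array([0.5, 0.5, 0.5, 0.5])))  # larger
```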
The simulation also includes two lists of nonwords, adapted from an experiment by Glushko (1979), as testing sets. One list consists of consistent nonwords derived from regular words (e.g., BEED from BEEF), and the other consists of inconsistent nonwords whose pronunciations were derived from exception words (e.g., BINT from PINT). Glushko reported that human subjects in his experiment read the consistent nonwords with 93.8% accuracy; in contrast, accuracy for the inconsistent nonwords was only 78.3%. In a previous simulation, we found that the network correctly pronounced 90% of the consistent nonwords, but only 78% of the inconsistent nonwords. You should compare the network output when these two different nonword lists are used.

In addition to these seven test sets, the simulation also allows for single word (or nonword) testing. You can provide an input to the trained network by typing in individual letter strings. The pronunciation generated by the network will be displayed below the input entry.

One advantage of PDP networks over other, serial-order models of psychological phenomena is that they allow for graceful degradation of performance. The current simulation demonstrates this by lesioning the network. When the network is lesioned (usually after being trained), a certain percentage of the hidden units are disconnected from the input and output units; a portion of the stored information is therefore lost, and the network's performance deteriorates. Once the percentage for the lesion is decided, the network randomly selects the hidden units to be removed. As a result, the network's performance after lesioning always has a random character: lesioning the network with the same percentage does not yield identical results. Generally speaking, however, the more the network is lesioned, the less accurate its performance.

The current simulation comes with two sets of training words. The first set, consisting of 2998 words, is adopted from Plaut et al.'s (1996) study. Under ideal training conditions, it takes about 300 epochs to fully train the network on this set. For demonstration purposes, a second, condensed set was constructed, including only 200 words randomly chosen from the original set. With an optimal combination of training parameters, it takes much less time to train the network. Accordingly, the two nonword testing sets accompanying the condensed training corpus consist of only part of the corresponding originals. Because the condensed set is a much reduced training corpus, you should expect the network to perform somewhat differently from when it is trained on the original set. However, our previous simulation results showed that even with this limited training set, the network demonstrated the effects of frequency and consistency on naming latency, as well as differential accuracy in pronouncing consistent and inconsistent nonwords.
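The lesioning procedure lends itself to a short sketch as well (a hypothetical helper, not the simulation's actual code): a random subset of hidden units has its incoming and outgoing weights zeroed, disconnecting those units from both the input and output layers. The weight shapes follow the 105-100-61 architecture described above.

```python
import numpy as np

def lesion(W1, W2, proportion, rng):
    """Disconnect a random `proportion` of hidden units by zeroing their
    incoming (W1: hidden x input) and outgoing (W2: output x hidden)
    weights. Returns lesioned copies; the originals are untouched."""
    n_hidden = W1.shape[0]
    n_remove = int(round(proportion * n_hidden))
    dead = rng.choice(n_hidden, size=n_remove, replace=False)
    W1, W2 = W1.copy(), W2.copy()
    W1[dead, :] = 0.0                      # cut connections from the input
    W2[:, dead] = 0.0                      # cut connections to the output
    return W1, W2

# Because the units are chosen at random, repeated lesions at the same
# percentage generally give different results.
rng = np.random.default_rng()
W1 = np.ones((100, 105))
W2 = np.ones((61, 100))
W1_les, W2_les = lesion(W1, W2, 0.3, rng)
print("hidden units silenced:", int((W1_les.sum(axis=1) == 0).sum()))
```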

