Korea Maritime and Ocean University NLP Jung Tae LEE
1. Window Size
Two reasons for choosing a window of seven letters:
First, a significant amount of the information needed to correctly pronounce a letter is contributed by the nearby letters.
Second, we were limited by computational resources to exploring small networks.
=> However, the limited size of the window also meant that some important nonlocal information about pronunciation and stress could not be properly taken into account by our model.
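As a rough illustration of the seven-letter window, the sketch below slides a window across a word and one-hot encodes each window into input groups. The 29-symbol alphabet (26 letters plus space, comma and period) and the space padding at word edges are assumptions made for this sketch, and the names `windows` and `one_hot` are hypothetical.

```python
# Hypothetical 29-symbol alphabet: 26 letters plus space, comma and period
# (an assumption modeled on NETtalk-style input coding, not taken from these slides).
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."
WINDOW = 7  # the center letter plus three letters on each side


def windows(text: str, size: int = WINDOW) -> list[str]:
    """Return one window per letter of `text`, centered on that letter.

    The text is padded with spaces so that letters near the edges
    still get a full-size window.
    """
    text = text.lower()
    half = size // 2
    padded = " " * half + text + " " * half
    return [padded[i:i + size] for i in range(len(text))]


def one_hot(window: str) -> list[int]:
    """Encode a window as len(window) groups of len(ALPHABET) binary units."""
    units = []
    for ch in window:
        group = [0] * len(ALPHABET)
        group[ALPHABET.index(ch)] = 1  # exactly one active unit per group
        units.extend(group)
    return units


if __name__ == "__main__":
    for w in windows("cat"):
        print(repr(w))
    print(len(one_hot(windows("cat")[0])))  # 7 * 29 = 203
```

Each letter of the text becomes the center of exactly one window, so a word of n letters yields n training patterns.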
Figure: Mutual information provided by neighboring letters and the correct pronunciation of the center letter, as a function of distance from the center letter.
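The quantity in this figure can be estimated by counting. The sketch below computes the plug-in (empirical) mutual information, in bits, between the letter at a given offset and the phoneme of the center letter. The toy corpus, its phoneme symbols and the helper names are hypothetical, and the slides do not say which estimator was used for the figure.

```python
import math
from collections import Counter

# Made-up toy corpus: each spelling is aligned one phoneme symbol per letter
# ("-" marks a silent letter). The symbols are illustrative only.
TOY_CORPUS = [
    ("cat", "k@t"),
    ("city", "sIti"),
    ("cent", "sEnt"),
    ("coat", "ko-t"),
    ("cake", "kek-"),
]


def mutual_information(pairs: list[tuple[str, str]]) -> float:
    """Empirical mutual information I(X; Y) in bits from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mi_at_offset(corpus: list[tuple[str, str]], offset: int) -> float:
    """MI between the letter `offset` positions away and the center letter's phoneme.

    Positions that fall outside the word are treated as a blank " ".
    """
    pairs = []
    for spelling, phonemes in corpus:
        for i, ph in enumerate(phonemes):
            j = i + offset
            neighbor = spelling[j] if 0 <= j < len(spelling) else " "
            pairs.append((neighbor, ph))
    return mutual_information(pairs)


if __name__ == "__main__":
    for offset in range(-3, 4):
        print(offset, round(mi_at_offset(TOY_CORPUS, offset), 3))
```

On a corpus this small the plug-in estimate is noisy and biased upward; a curve like the one in the figure needs a large aligned dictionary.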
2. Changes in the network
Changed network performance (dictionary: common English words)
The number of input groups was varied from seven to eleven.
- 11 input groups & 80 hidden units vs. 7 input groups & 80 hidden units
- After 25 passes: 7% higher with 11 input groups
- After 55 passes: 97.5% (11 input groups) vs. 95% (7 input groups)
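Widening the window adds whole input groups, each fully connected to the hidden layer, so the network grows with the window. The sketch below counts trainable parameters for the two configurations compared above, assuming 29 units per input group, 26 output units, and one bias per hidden or output unit; these sizes, the counting convention and the name `weight_count` are assumptions, not figures from the slides.

```python
UNITS_PER_GROUP = 29  # assumption: 26 letters + 3 punctuation/word-boundary symbols
OUTPUT_UNITS = 26     # assumption: NETtalk-style output layer size


def weight_count(input_groups: int, hidden_layers: list[int]) -> int:
    """Count connections plus one bias per hidden/output unit in a fully connected net."""
    sizes = [input_groups * UNITS_PER_GROUP, *hidden_layers, OUTPUT_UNITS]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out + fan_out  # weights into the next layer, plus its biases
    return total


if __name__ == "__main__":
    print(weight_count(7, [80]))   # 18426
    print(weight_count(11, [80]))  # 27706
```

Under these assumptions, going from seven to eleven input groups adds 4 × 29 × 80 = 9,280 weights, all between the input and hidden layers.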
Changed network performance (dictionary: common English words)
Adding an extra layer of hidden units also improved the performance.
- Two layers of 80 hidden units vs. 7 input groups & 120 hidden units
- After 55 passes: 97%
- After 1 pass: 87% (two layers) vs. 85% (120 hidden units)
The network with two layers of hidden units was better at generalization but about the same in absolute performance.
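To make the two architectures concrete, the sketch below builds fully connected sigmoid networks from a list of layer sizes and runs one forward pass, once with a single hidden layer of 120 units and once with two hidden layers of 80. The 203-unit input (7 groups × 29 units), the 26 outputs, the weight initialization range and the function names are assumptions for illustration; training (NETtalk used back-propagation) is not shown.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_layers(sizes: list[int], seed: int = 0) -> list[list[list[float]]]:
    """Create small random weights; each unit's row ends with its bias term."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.3, 0.3) for _ in range(fan_in + 1)] for _ in range(fan_out)]
        for fan_in, fan_out in zip(sizes, sizes[1:])
    ]


def forward(layers: list[list[list[float]]], inputs: list[float]) -> list[float]:
    """Propagate an input vector through every layer of sigmoid units."""
    activations = inputs
    for layer in layers:
        # zip stops at the shorter sequence, so the trailing bias is added separately
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + row[-1])
            for row in layer
        ]
    return activations


if __name__ == "__main__":
    x = [0.0] * 203
    for i in range(7):
        x[i * 29] = 1.0  # one active unit per input group (arbitrary choice)
    one_hidden = init_layers([203, 120, 26])
    two_hidden = init_layers([203, 80, 80, 26])
    print(len(forward(one_hidden, x)), len(forward(two_hidden, x)))  # 26 26
```

Only the `sizes` list changes between the two networks, which is what makes this kind of architectural comparison cheap to set up.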
3. Analysis of the Hidden Units
Graphical representation of the activation of the hidden units: levels of activation in the layer of hidden units for a variety of words, all of which produced the same phoneme, /E/, at the output.
The input string is shown at the left with the center letter emphasized. The area of each white square is proportional to the activity level.
Input strings shown in the figure: Chief_, speak_, negro, nity_, least, believe, equa, arty_, see_, appy_, each, nily_, only_
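Because the figure encodes activity as area rather than side length, a plotting routine has to take a square root. A minimal sketch, assuming activations lie in [0, 1] as produced by sigmoid units; the function name `square_side` and its scale parameter are hypothetical.

```python
import math


def square_side(activation: float, max_side: float = 1.0) -> float:
    """Side length of a square whose *area* is proportional to `activation`.

    Assumes activations lie in [0, 1], so a fully active unit is drawn
    with side `max_side`.
    """
    if not 0.0 <= activation <= 1.0:
        raise ValueError("activation must lie in [0, 1]")
    return max_side * math.sqrt(activation)


if __name__ == "__main__":
    for a in (0.0, 0.25, 1.0):
        print(a, square_side(a, max_side=10.0))
```

A unit at 25% activity is therefore drawn with half the side length of a fully active unit, not a quarter.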
Figure: Hierarchical clustering of hidden units for letter-to-sound correspondences.
A hierarchical clustering technique was used to arrange the letter-to-sound vectors in groups based on a Euclidean metric in the 80-dimensional space of hidden units.
The result shown in the figure was striking:
- The most important distinction was the complete separation of consonants and vowels.
- For the vowels, the next most important variable was the letter.
- The consonants clustered according to a mixed strategy that was based more on the similarity of their sounds.
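The clustering step can be sketched as agglomerative clustering over hidden-unit activation vectors, one vector per letter-to-sound correspondence. The slides only specify a Euclidean metric, so the average-linkage rule here is an assumption, as are the labels and the tiny 2-dimensional toy vectors standing in for 80-dimensional ones.

```python
import math
from itertools import combinations


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(vectors: dict[str, list[float]]) -> list[tuple[tuple[str, ...], tuple[str, ...], float]]:
    """Average-linkage agglomerative clustering under a Euclidean metric.

    Starts with one cluster per label and repeatedly merges the closest
    pair of clusters. Returns the merges in the order they were made.
    """
    def linkage(a: tuple[str, ...], b: tuple[str, ...]) -> float:
        # mean distance over all cross-cluster pairs of members
        total = sum(euclidean(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(label,) for label in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges


if __name__ == "__main__":
    # Hypothetical activation vectors for four letter-to-sound pairs.
    toy = {
        "p->/p/": [0.0, 0.0],
        "b->/b/": [0.0, 0.1],
        "a->/@/": [5.0, 5.0],
        "e->/E/": [5.0, 5.2],
    }
    for a, b, dist in agglomerate(toy):
        print(a, "+", b, round(dist, 3))
```

The last merge joins the two top-level groups, which in a plot like the one described would be the consonant/vowel split. With SciPy available, `scipy.cluster.hierarchy.linkage(X, method="average", metric="euclidean")` performs the same kind of clustering far more efficiently, and `scipy.cluster.hierarchy.dendrogram` can draw the result.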
The same clustering procedure was repeated for three networks starting from different random starting states.
- The patterns of weights were completely different.
- But the clustering analysis revealed the same hierarchies, with some differences in the details, for all three networks.
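One way to check that networks with completely different weights still induce the same grouping is to compare the partitions they produce with a label-invariant score such as the Rand index. This is a swapped-in illustration, not the comparison procedure behind the slides; the function name and the cluster assignments are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a: dict[str, int], labels_b: dict[str, int]) -> float:
    """Fraction of item pairs on which two partitions agree.

    A pair agrees if both partitions put the two items in the same
    cluster, or both put them in different clusters. Cluster ids are
    arbitrary, so renaming clusters does not change the score.
    """
    items = sorted(labels_a)
    if set(items) != set(labels_b):
        raise ValueError("partitions must cover the same items")
    pairs = list(combinations(items, 2))
    agree = sum(
        (labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    net1 = {"a": 0, "e": 0, "p": 1, "b": 1}  # vowels vs. consonants
    net2 = {"a": 7, "e": 7, "p": 3, "b": 3}  # same split, different cluster ids
    print(rand_index(net1, net2))  # 1.0
```

A score of 1.0 means identical groupings even though the cluster ids (like the underlying weights) differ.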
4. Conclusions
NETtalk is an illustration in miniature of many aspects of learning.
1. The network starts out with no "innate" knowledge beyond its input and output representations, and none specific to English => the network could have been trained on any language with the same set of letters and phonemes.
2. The network acquired its competence through practice, went through several distinct stages, and reached a significant level of performance.
3. The information is distributed in the network, so that no single unit or link is essential.
4. The network was fault tolerant and degraded gracefully with increasing damage. => Moreover, the network recovered from damage much more quickly than it took to learn initially.
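The damage experiments behind point 4 can be mimicked by perturbing every weight with random noise and then re-measuring performance. A minimal sketch, assuming uniform noise on [-d, d] as the damage model; the function name `damage` and the toy weight matrix are hypothetical, and measuring performance or retraining after damage is left out.

```python
import random
from typing import Optional


def damage(weights: list[list[float]], d: float, seed: Optional[int] = None) -> list[list[float]]:
    """Return a damaged copy of `weights`.

    Every weight receives independent noise drawn uniformly from [-d, d];
    the original list is left unchanged so that performance before and
    after damage can be compared.
    """
    if d < 0:
        raise ValueError("damage level d must be non-negative")
    rng = random.Random(seed)
    return [[w + rng.uniform(-d, d) for w in row] for row in weights]


if __name__ == "__main__":
    w = [[0.5, -1.2, 0.3], [2.0, 0.0, -0.7]]
    for d in (0.0, 0.5, 1.0):
        noisy = damage(w, d, seed=1)
        worst = max(abs(n - o) for nr, orow in zip(noisy, w) for n, o in zip(nr, orow))
        print(d, round(worst, 3))
```

Sweeping `d` upward and plotting accuracy against it is the usual way to show graceful degradation rather than abrupt failure.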
Conclusions
NETtalk is too simple to serve as a good model for the acquisition of reading skills in humans.
- For example, children first learn to talk, acquiring representations for words and their meanings, and only then learn to read.
This approach would have to be generalized to account for prosodic features in continuous text.
Human-level performance would require the integration of information from several words at once.