Published by Stuart Ross. Modified over 9 years ago.
1
Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World's Languages
Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly
Department of Computer Science & Engineering, Indian Institute of Technology, Kharagpur
2
Redundancy in Natural Systems
Redundancy reduces the risk of information loss, providing fault tolerance.
Examples of redundancy:
Biological systems: codons, genes, proteins, etc.
Linguistic systems: synonymous words
Human brain: perhaps the biggest example of neuronal redundancy
3
Redundancy in Sound Systems
Like any other natural system, human speech sound systems are expected to show redundancy in the information they encode.
In this work we attempt to:
Mathematically formulate this redundancy, and
Unravel the interesting patterns (if any) that result from this formulation.
4
Feature Economy: An Age-old Principle
Sounds, especially consonants, tend to occur in pairs that are highly correlated in terms of their features.
Languages tend to maximize the combinatorial possibilities of a few features to produce many consonants.
If a language has /b/, /p/ and /d/ in its inventory, then it will also tend to have /t/:

Plosives    voiced   voiceless
bilabial    /b/      /p/
dental      /d/      /t/
5
Mathematical Formulation
We use concepts from information theory to quantify feature economy (assuming features are Boolean).
The basic idea is to compute the number of bits required to pass the information of an inventory of size N over a transmission channel.
Ideal scenario (noiseless channel): the information about an inventory of size N arrives undistorted, and $\log_2 N$ bits are required for lossless transmission.
6
Mathematical Formulation (continued)
General scenario (noisy channel): the information about an inventory of size N arrives distorted, and more than $\log_2 N$ bits are required for lossless transmission.
7
Feature Entropy
The actual number of bits required can be estimated by calculating the binary entropy as follows.
$p_f$: number of consonants in the inventory in which feature f is present
$q_f$: number of consonants in the inventory in which feature f is absent
The probability that a consonant chosen at random from the inventory has f is $p_f/N$, and the probability that it does not have f is $q_f/N$ ($= 1 - p_f/N$).
8
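Feature Entropy (continued)
If F denotes the set of all features, the feature entropy is
$$F_E = -\sum_{f \in F} \left[ \frac{p_f}{N} \log_2 \frac{p_f}{N} + \frac{q_f}{N} \log_2 \frac{q_f}{N} \right]$$
Redundancy Ratio (RR):
$$RR = \frac{F_E}{\log_2 N}$$
RR expresses the excess number of bits required to represent the inventory, measured against $\log_2 N$.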
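The two formulas can be tried out on a small inventory. The following is a minimal sketch, not the authors' implementation: it assumes each consonant is encoded as a tuple of 0/1 feature values, treats a term with probability 0 as contributing 0 bits, and uses hypothetical toy inventories. The function names `feature_entropy` and `redundancy_ratio` are illustrative.

```python
import math


def feature_entropy(inventory):
    """Sum of the binary entropies of every feature over the inventory.

    inventory: list of equal-length tuples of 0/1 feature values (assumed encoding).
    """
    n = len(inventory)
    total = 0.0
    for f in range(len(inventory[0])):
        p = sum(consonant[f] for consonant in inventory)  # p_f: feature present
        q = n - p                                         # q_f: feature absent
        for count in (p, q):
            if count:  # a zero count contributes 0 bits by convention
                total -= (count / n) * math.log2(count / n)
    return total


def redundancy_ratio(inventory):
    """RR = F_E / log2(N)."""
    return feature_entropy(inventory) / math.log2(len(inventory))


# Hypothetical toy inventory with features (voiced, bilabial):
# /b/, /p/, /d/, /t/ use every combination of the two features.
economical = [(1, 1), (0, 1), (1, 0), (0, 0)]
print(redundancy_ratio(economical))  # 1.0

# Hypothetical inventory that spends three features on four consonants.
redundant = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(redundancy_ratio(redundant))  # 1.5
```

In the first toy inventory two features distinguish four consonants, so no bits are spent beyond $\log_2 4 = 2$ and RR is 1. In the second, three features carry one bit each, giving RR = 3/2.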
9
Example
10
Experimentation
Data source: UCLA Phonological Segment Inventory Database (UPSID)
Samples data uniformly from almost all linguistic families
Hosts phonological systems of 317 languages
Number of consonants: 541
Number of vowels: 151
11
RR: Consonant Inventories
[Figure: redundancy ratio versus inventory size for the consonant inventories, with a line fit]
The slope of the line fit is -0.0178.
RR is almost invariant with respect to the inventory size.
The result means that consonant inventories are organized to have similar redundancy irrespective of their size. This is important because no such explanation exists yet.
12
The Invariance is not "by chance"
The invariance in the distribution of RRs for consonant inventories did not emerge by chance.
This can be validated by a standard test of hypothesis.
Null hypothesis: the invariance in the distribution of RRs observed across the real consonant inventories is also prevalent across randomly generated inventories.
13
Generation of Random Inventories
Model I: purely random model
The distribution of the consonant inventory sizes is assumed to be known a priori.
Conceive of 317 bins corresponding to the languages in UPSID.
Pick a bin and fill it by randomly choosing consonants (without repetition) from the pool of 541 available consonants.
Repeat the above step until all the bins are packed.
[Figure: a pool of phonemes (/p/, /b/, /d/, /t/, /n/, /k/, /m/, ...) filling Bin 1 through Bin 317 at random; the example bins have sizes 4, 6 and 2]
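The Model I procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function name `model_one`, the `seed` parameter, the small pool, and the three bin sizes (borrowed from the figure) are all hypothetical stand-ins for the 541 consonants and 317 inventory sizes of UPSID.

```python
import random


def model_one(sizes, pool, seed=None):
    """Fill one bin per inventory size with consonants drawn without repetition."""
    rng = random.Random(seed)
    # random.sample never repeats an element within a single draw,
    # so no consonant occurs twice in the same bin.
    return [rng.sample(pool, size) for size in sizes]


# Hypothetical pool and bin sizes, for illustration only.
pool = ["p", "b", "t", "d", "k", "g", "m", "n"]
bins = model_one([4, 6, 2], pool, seed=0)
for number, contents in enumerate(bins, start=1):
    print(number, contents)
```

Each bin is drawn independently, so the same consonant may appear in many bins, but never twice within one bin.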
14
Generation of Random Inventories
Model II: random model based on occurrence frequency
For each consonant c, let the frequency of occurrence in UPSID be denoted by $f_c$.
Let there be 317 bins, each corresponding to a language in UPSID.
$f_c$ bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition.
[Figure: a pool of phonemes with occurrence frequencies, e.g. /t/ (25), /n/ (12), /p/ (100); 25 bins are chosen at random and filled with /t/]
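A sketch of Model II follows, again as an illustration and not the authors' code. The function name `model_two` and the `seed` parameter are hypothetical; the three frequencies are the illustrative values shown in the figure, not a claim about the actual UPSID counts.

```python
import random


def model_two(frequencies, num_bins, seed=None):
    """Place each consonant c into f_c distinct bins chosen uniformly at random."""
    rng = random.Random(seed)
    bins = [[] for _ in range(num_bins)]
    for consonant, f_c in frequencies.items():
        # Sampling bin indices without replacement keeps the consonant
        # from landing in the same bin twice (requires f_c <= num_bins).
        for index in rng.sample(range(num_bins), f_c):
            bins[index].append(consonant)
    return bins


# Illustrative frequencies as shown in the figure.
frequencies = {"t": 25, "n": 12, "p": 100}
bins = model_two(frequencies, num_bins=317, seed=0)
print(sum("t" in b for b in bins))  # 25
```

Unlike Model I, this sketch preserves how often each consonant occurs across languages but does not fix the size of any individual bin, so some bins may stay empty when only a few consonants are placed.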
15
Results
[Figure: average redundancy ratio versus inventory size for Model I, Model II and the real inventories]
Model I: a t-test indicates that the null hypothesis can be rejected with (100 - 9.29e-15)% confidence.
Model II: once again, a t-test shows that the null hypothesis can be rejected with (100 - 2.55e-3)% confidence.
Occurrence frequency governs the organization of the consonant inventories at least to some extent.
16
The Case of Vowel Inventories
[Figure: redundancy ratio versus inventory size for vowels and consonants]
The slope of the line fit is -0.125.
For small inventories RR is not invariant, while for larger ones (size > 12) it is.
Smaller inventories appear to be organized by perceptual contrast, and larger inventories by feature economy.
A t-test shows that we can be 99.93% confident that the two kinds of inventories (vowel and consonant) differ in terms of RR.
17
Error Correcting Capability
[Figure: average Hamming distance versus inventory size for consonants and vowels]
For most of the consonant inventories the average Hamming distance between two consonants is 4, which corresponds to a 1-bit error-correcting capability.
Vowel inventories do not indicate any such fixed error-correcting capability.
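The quantities on this slide can be sketched as follows, assuming consonants are encoded as tuples of 0/1 feature values. The function names and the toy inventory are hypothetical. Note one assumption made explicit here: the standard coding-theory bound, that a code with distance d corrects floor((d - 1) / 2) bit errors, is stated for the minimum distance, whereas the slide applies it to the average distance.

```python
from itertools import combinations


def hamming(a, b):
    """Number of feature positions in which two consonants differ."""
    return sum(x != y for x, y in zip(a, b))


def average_hamming(inventory):
    """Mean Hamming distance over all unordered pairs of consonants."""
    pairs = list(combinations(inventory, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def correctable_errors(distance):
    """Bit errors correctable at a given distance: floor((d - 1) / 2)."""
    return (distance - 1) // 2


# Hypothetical toy inventory over three Boolean features.
toy = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(average_hamming(toy))   # 2.0
print(correctable_errors(4))  # 1
```

With a distance of 4 the bound gives one correctable bit, which matches the 1-bit capability stated for the consonant inventories.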
18
Conclusions
Redundancy ratio is almost an invariant property of the consonant inventories with respect to the inventory size.
This invariance is a direct consequence of the fixed error-correcting capabilities of the consonant inventories.
Unlike the consonant inventories, the vowel inventories are not indicative (at least not all of them) of such an invariance.
19
Discussions
Possible causes of the origins of redundancy in a linguistic system:
Fault tolerance: redundancy acts as a failsafe mechanism against random distortion.
Evolutionary cause: redundancy allows a speaker to successfully communicate with speakers of neighboring dialects; this is the "linguistic junk" pointed out by Lass (Lass, 1997).
20
Thank you