Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.


1 Self-Organization of the Sound Inventories: An Explanation based on Complex Networks

2 Overview of the Talk: Motivation; Approach & Objective; Principle of Occurrence in Consonant Inventories; Principle of Co-Occurrence in Consonant Inventories; Findings; Conclusions and Future Work

3 Sabda Brahma: Sound is Eternity sabda-brahma su-durbodham pranendriya-mano-mayam ananta-param gambhiram durvigahyam samudra-vat –Sound is eternal and also very difficult to comprehend. It manifests within the life air, the senses, and the mind. It is unlimited and unfathomable, just like the ocean.

4 Signals and Symbols Several living organisms can produce sound –They emit sound signals to communicate –These signals are mapped to certain symbols (meanings) in the brain –E.g., mating calls, danger alarms

5 Human Communication Human beings also produce sound signals. Unlike other organisms, they can concatenate these sounds to produce new messages – Language. Language is one of the primary causes/effects of human intelligence

6 Human Speech Sounds Human speech sounds are called phonemes – the smallest units of a language. Phonemes are characterized by certain distinctive features, like I. Place of articulation II. Manner of articulation III. Phonation [Figure: Mermelstein’s Model]

7 Types of Phonemes Vowels: /a/, /i/, /u/ Consonants: /p/, /t/, /k/ Diphthongs: /ai/

8 Choice of Phonemes How does a language choose a set of phonemes in order to build its sound inventory? Is the process arbitrary? Certainly not! What are the forces affecting this choice?

9 Forces of Choice A Linguistic System – How does it look? Speaker: desires “ease of articulation”. Listener / Learner: desires “perceptual contrast” / “ease of learnability”. The forces shaping the choice are opposing – hence there has to be a non-trivial solution

10 Vowels: A (Partially) Solved Mystery Languages choose vowels based on maximal perceptual contrast. For instance, if a language has three vowels then in more than 95% of the cases they are /a/, /i/, and /u/. [Figure: /a/, /i/ and /u/ are maximally distinct.]

11 Consonants: A Puzzle Research: from 1929 to date. No single satisfactory explanation of the organization of the consonant inventories –The set of features that characterize consonants is much larger than that of vowels –No single force is sufficient to explain this organization –Rather, a complex interplay of forces goes on in shaping these inventories

12 The Approach & Objective We adopt a Complex Network Approach to attack the problem of consonant inventories We try to figure out the principle of the distribution of the occurrence of consonants over languages We also attempt to figure out the co-occurrence patterns (if any) that are found across the consonant inventories

13 Principle of Occurrence PlaNet – The “Phoneme-Language Network” –A bipartite network $N = (V_L, V_C, E)$ –$V_L$: Nodes representing languages of the world –$V_C$: Nodes representing consonants –$E$: Set of edges which run between $V_L$ and $V_C$. There is an edge $e \in E$ between two nodes $v_l \in V_L$ and $v_c \in V_C$ if the consonant $c$ occurs in the language $l$. [Figure: The Structure of PlaNet, with language nodes $L_1$ to $L_4$ linked to consonant nodes /m/, /ŋ/, /p/, /d/, /s/, /θ/.]

14 Construction of PlaNet Data Source: UCLA Phonological Segment Inventory Database (UPSID). Number of nodes in $V_L$ is 317. Number of nodes in $V_C$ is 541. Number of edges in $E$ is 7022.
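A minimal sketch of how such a bipartite network could be assembled from a table of inventories. The function names `build_planet` and `node_degrees` and the toy inventories are illustrative assumptions; they are not taken from the talk or from UPSID.

```python
from collections import Counter

def build_planet(inventories):
    """Build (V_L, V_C, E) from a dict mapping language -> set of consonants.

    Hypothetical helper: an edge (language, consonant) exists whenever the
    consonant occurs in that language's inventory.
    """
    v_l = set(inventories)
    v_c = set().union(*inventories.values())
    edges = {(lang, c) for lang, cons in inventories.items() for c in cons}
    return v_l, v_c, edges

def node_degrees(edges):
    """Return (language degrees, consonant degrees) as Counters."""
    lang_deg = Counter(lang for lang, _ in edges)
    cons_deg = Counter(c for _, c in edges)
    return lang_deg, cons_deg

# Toy data, invented for illustration only.
toy = {"L1": {"p", "m"}, "L2": {"p"}, "L3": {"p", "s", "m"}}
v_l, v_c, edges = build_planet(toy)
lang_deg, cons_deg = node_degrees(edges)
```

The degree of a language node is its inventory size, and the degree of a consonant node is the number of languages in which it occurs.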

15 Degree Distribution Degree of a node is defined as the number of edges connected to the node. Degree Distribution (DD) is the fraction of nodes, $p_k$, having degree equal to $k$. The Cumulative Degree Distribution (CDD) is the fraction of nodes, $P_k$, having degree $\geq k$.
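The two definitions can be sketched directly from a list of node degrees. The function names and the sample degrees are assumptions made for illustration; the cumulative version counts nodes whose degree is at least k.

```python
from collections import Counter

def degree_distribution(degrees):
    """p_k: fraction of nodes whose degree equals k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

def cumulative_degree_distribution(degrees):
    """P_k: fraction of nodes whose degree is greater than or equal to k."""
    n = len(degrees)
    return {k: sum(1 for d in degrees if d >= k) / n
            for k in sorted(set(degrees))}

# Invented degrees for four nodes.
sample = [1, 2, 2, 4]
p = degree_distribution(sample)
P = cumulative_degree_distribution(sample)
```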

16 Degree Distribution of PlaNet [Figure, first panel: $p_k$ against language inventory size (degree $k$).] DD of the language nodes follows a β-distribution: $p_k = \mathrm{beta}(k)$ with $\alpha = 7.06$ and $\beta = 47.64$, i.e. $p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\, k^{6.06} (1-k)^{46.64}$; $k_{min} = 5$, $k_{max} = 173$, $k_{avg} = 21$. [Figure, second panel: $P_k$ against degree of a consonant, $k$, on log-log axes.] DD of the consonant nodes follows a power-law with an exponential cut-off: $P_k = k^{-0.71}$. The distribution of consonants over languages follows a power-law

17 Preferential Attachment: The Key to Power Law Power law distributions are observed in –Social Networks –Biological Networks –Internet Graphs –Citation Networks These distributions emerge due to preferential attachment: the rich get richer

18 Synthesis of PlaNet Given: $V_L = \{L_1, L_2, \ldots, L_{317}\}$ sorted in the ascending order of their degrees and 541 unlabeled nodes in $V_C$. Step 0: All nodes in $V_C$ have degree 0. Step t+1: Choose a language node $L_j$ (in order) with cardinality $k_j$ (inventory size). For $c$ running from 1 to $k_j$ do: connect $L_j$ preferentially with a consonant node $C_i \in V_C$, to which it is not already connected, with probability $\Pr(C_i) = \frac{d_i^{\alpha} + \epsilon}{\sum_{x \in V^*} (d_x^{\alpha} + \epsilon)}$, where $d_i$ = degree of node $C_i$ at step $t$, $V^*$ = subset of $V_C$ not connected to $L_j$ at $t$, and $\epsilon$ is the smoothing parameter.
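The synthesis steps can be sketched as follows, reading the attachment weight of a consonant as its degree raised to the power α, plus ε. The function name `synthesize_planet`, the integer labelling of consonant nodes and the `seed` argument are assumptions of this sketch; the default α and ε are the values reported with the simulation result.

```python
import random

def synthesize_planet(inventory_sizes, n_consonants, alpha=1.44, eps=0.5, seed=None):
    """Grow a synthetic language-consonant network by preferential attachment.

    Languages are processed in ascending order of inventory size. Each one
    picks k_j distinct consonants, choosing among those it is not yet
    connected to with probability proportional to degree**alpha + eps.
    """
    if max(inventory_sizes, default=0) > n_consonants:
        raise ValueError("an inventory cannot be larger than the consonant set")
    rng = random.Random(seed)
    degree = [0] * n_consonants
    inventories = []
    for k_j in sorted(inventory_sizes):
        chosen = set()
        for _ in range(k_j):
            candidates = [i for i in range(n_consonants) if i not in chosen]
            weights = [degree[i] ** alpha + eps for i in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            chosen.add(pick)
            # Updating now is safe: a chosen node leaves the candidate set,
            # so its new degree never affects this language's later picks.
            degree[pick] += 1
        inventories.append(chosen)
    return inventories, degree

inventories, degree = synthesize_planet([2, 3, 1], n_consonants=5, seed=0)
```

Setting `alpha=0` makes every candidate equally likely, which gives a purely random baseline to compare against.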

19 The Preferential Mechanism of Synthesis [Figure: the network of language nodes $L_1$ to $L_4$ after step 3 and after step 4.]

20 Simulation Result The parameters α and ε are 1.44 and 0.5 respectively. The results are averaged over 100 runs. [Figure: $P_k$ against degree ($k$) on log-log axes for PlaNet, PlaNet_syn and PlaNet_rand.]

21 Principle of Co-occurrence Consonants tend to co-occur in groups or communities. These groups tend to be organized around a few distinctive features (based on: manner of articulation, place of articulation & phonation) – Principle of feature economy. If a language has some of the consonants shown in the figure in its inventory then it will also tend to have the others. [Figure: the plosives /b/ (voiced bilabial), /p/ (voiceless bilabial), /d/ (voiced dental) and /t/ (voiceless dental).]

22 How to Capture these Co-occurrences? PhoNet – “Phoneme Phoneme Network” –A weighted network $N = (V_C, E)$ –$V_C$: Nodes representing consonants –$E$: Set of edges which run between the nodes in $V_C$. There is an edge $e \in E$ between two nodes $v_{c_1}, v_{c_2} \in V_C$ if the consonants $c_1$ and $c_2$ co-occur in a language. The number of languages in which $c_1$ and $c_2$ co-occur defines the edge-weight of $e$. The number of languages in which $c_1$ occurs defines the node-weight of $v_{c_1}$. [Figure: an example fragment of PhoNet over /kʷ/, /k′/, /k/ and /d′/ with node-weights and edge-weights.]

23 Construction of PhoNet Data Source: UPSID. Number of nodes in $V_C$ is 541. Number of edges is 34012. [Figure: PhoNet]
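A minimal sketch of how the node-weights and edge-weights of such a co-occurrence network could be counted from a table of inventories. The function name `build_phonet` and the toy inventories are assumptions for illustration; each edge is stored under an alphabetically sorted pair of consonants.

```python
from collections import Counter
from itertools import combinations

def build_phonet(inventories):
    """Return (node_weight, edge_weight) for a dict language -> consonants.

    node_weight[c] is the number of languages containing c;
    edge_weight[(c1, c2)] is the number of languages containing both.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for consonants in inventories.values():
        members = sorted(set(consonants))
        node_weight.update(members)
        for c1, c2 in combinations(members, 2):
            edge_weight[(c1, c2)] += 1
    return node_weight, edge_weight

# Toy data, invented for illustration only.
toy = {"L1": {"p", "t", "k"}, "L2": {"p", "t"}, "L3": {"p"}}
node_weight, edge_weight = build_phonet(toy)
```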

24 Community Structures in PhoNet Radicchi et al. algorithm (for unweighted networks) – Counts the number of triangles that an edge is a part of. Inter-community edges will have a low count, so remove them. Modification for a weighted network like PhoNet –Look for triangles where the weights on the edges are comparable. –If they are comparable, then the group of consonants co-occur highly, else it is not so. –Measure strength S for each edge (u,v) in PhoNet, where $S = \frac{w_{uv}}{\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2}}$ if $\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2} > 0$, else $S = \infty$ –Remove edges with S less than a threshold η
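The strength measure and the thresholding step can be sketched as below. The names `edge_strength` and `prune_edges` are illustrative, and treating a missing edge as weight 0 is an assumption of this sketch that the slide does not spell out.

```python
import math

def edge_strength(weights, nodes, u, v):
    """S for edge (u, v): w_uv over the Euclidean distance between the
    weights that u and v have towards every other node."""
    def w(a, b):
        # Assumption: an absent edge has weight 0.
        return weights.get((a, b), weights.get((b, a), 0))
    spread = math.sqrt(sum((w(u, i) - w(v, i)) ** 2
                           for i in nodes if i not in (u, v)))
    return w(u, v) / spread if spread > 0 else math.inf

def prune_edges(weights, nodes, eta):
    """Keep only the edges whose strength S is at least eta."""
    return {edge for edge in weights
            if edge_strength(weights, nodes, *edge) >= eta}

# Invented three-node network.
nodes = {"a", "b", "c"}
weights = {("a", "b"): 10, ("a", "c"): 4, ("b", "c"): 1}
kept = prune_edges(weights, nodes, eta=1.0)
```

In this toy network only the edge between `a` and `b` survives a threshold of 1, because the other two edges are light relative to how differently their endpoints connect to the third node.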

25 Community Formation [Figure: an example six-node weighted network, the strength S computed for each of its edges, and the communities that remain after thresholding with η > 1.] For different values of η we get different sets of communities

26 Consonant Societies! [Figure: communities obtained at η = 1.25, η = 0.72, η = 0.60 and η = 0.35.]

27 Evaluation of the Communities: Occurrence Ratio Hypothesis: The communities obtained from the algorithm should be found frequently in UPSID. We define the occurrence ratio of a community C in a language L to capture the “intensity” of occurrence, $O_L = \frac{M}{N - (R_{top} - 1)}$ –N is the number of consonants in C (ranked by the ascending order of frequency of occurrence), M is the number of consonants of C that occur in a language L and $R_{top}$ is the rank of the highest ranking consonant in L that is also present in C –If a high-frequency consonant is present in L it is not necessary that the low-frequency one should be present; but if a lower one is already present then it is expected that the higher one must be present

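28 Computing Occurrence Ratio: An Example [Figure: a community C of three consonants, /k/, /kʰ/ and /kʷ/, ranked R = 1, R = 2, R = 3, and three languages $L_1$, $L_2$, $L_3$, where X marks a consonant that is absent.] $L_1$: M=3, N=3, $R_{top}$=1, $O_L$ = 3/3 = 1. $L_2$: M=2, N=3, $R_{top}$=2, $O_L$ = 2/2 = 1. $L_3$: M=2, N=3, $R_{top}$=1, $O_L$ = 2/3 = 0.66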

29 Average Occurrence Ratio A given community has an occurrence ratio in each language L in UPSID. We average this ratio over all L as $O_{av} = \frac{\sum_{L \in \mathrm{UPSID}} O_L}{L_{occur}}$, where $L_{occur}$ is the number of languages where at least one of the members of C has occurred
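A minimal sketch of the occurrence ratio and its average. It reads the "highest ranking" consonant as the one with the smallest rank number, which is what the worked example's values imply. The function names, the placeholder consonant labels `c1` to `c3`, and the choice to return 0 when no member of the community occurs are assumptions of this sketch.

```python
def occurrence_ratio(ranked_community, language):
    """O_L = M / (N - (R_top - 1)) for one language.

    ranked_community: list of consonants, index 0 holding rank 1.
    language: set of consonants in the language's inventory.
    """
    n = len(ranked_community)
    present = [rank for rank, c in enumerate(ranked_community, start=1)
               if c in language]
    if not present:
        return 0.0  # assumption: no member present contributes nothing
    m = len(present)
    r_top = min(present)
    return m / (n - (r_top - 1))

def average_occurrence_ratio(ranked_community, languages):
    """Sum of O_L over all languages, divided by the number of languages
    in which at least one member of the community occurs."""
    l_occur = sum(1 for lang in languages
                  if any(c in lang for c in ranked_community))
    if l_occur == 0:
        return 0.0
    total = sum(occurrence_ratio(ranked_community, lang) for lang in languages)
    return total / l_occur

community = ["c1", "c2", "c3"]          # placeholder labels, ranks 1 to 3
languages = [{"c1", "c2", "c3"},        # M=3, R_top=1
             {"c2", "c3"},              # M=2, R_top=2
             {"c1", "c3"}]              # M=2, R_top=1
ratios = [occurrence_ratio(community, lang) for lang in languages]
o_av = average_occurrence_ratio(community, languages)
```

The three toy languages reproduce the M, N and R_top combinations of the example slide.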

30 Results of the Evaluation For η > 0.3, $O_{av}$ > 0.8: consonants show patterns of co-occurrence in 80% or more of the world’s languages

31 The Binding Force of the Communities: Feature Economy Feature Entropy: The idea is borrowed from information theory. For a community C of size N, let there be $p_f$ consonants for which a particular feature f is present and $q_f$ other consonants for which f is absent – the probability that a consonant chosen from C has f is $p_f/N$ and that it does not have f is $q_f/N$ or $(1 - p_f/N)$. Feature entropy can therefore be defined as $F_E = \sum_{f \in F} \left( -\frac{p_f}{N} \log \frac{p_f}{N} - \frac{q_f}{N} \log \frac{q_f}{N} \right)$, where F is the set of all features present in the consonants in C. Essentially, this is the number of bits needed to transmit the entire information about C through a channel.

32 Computing Feature Entropy Lower $F_E$ -> $C_1$ economizes on the number of features. Higher $F_E$ -> $C_2$ does not economize on the number of features
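Feature entropy can be sketched as a sum of binary entropies, one per feature. The function name, the base-2 logarithm (chosen because the talk speaks of bits), the convention that 0·log 0 counts as 0, and the feature labels in the two toy communities are all assumptions made for illustration.

```python
import math

def feature_entropy(community):
    """F_E for a dict mapping consonant -> set of features it carries."""
    n = len(community)
    if n == 0:
        return 0.0
    features = set().union(*community.values())
    total = 0.0
    for f in features:
        p = sum(1 for feats in community.values() if f in feats) / n
        for prob in (p, 1 - p):
            if prob > 0:  # convention: 0 * log(0) contributes 0
                total -= prob * math.log2(prob)
    return total

# A community that reuses a small set of features.
economical = {
    "/p/": {"voiceless", "bilabial", "plosive"},
    "/b/": {"voiced", "bilabial", "plosive"},
    "/t/": {"voiceless", "dental", "plosive"},
    "/d/": {"voiced", "dental", "plosive"},
}
# A community whose members share no features at all.
scattered = {
    "/p/": {"voiceless", "bilabial", "plosive"},
    "/z/": {"voiced", "alveolar", "fricative"},
}
fe_economical = feature_entropy(economical)
fe_scattered = feature_entropy(scattered)
```

A feature shared by every member adds nothing to the sum, so communities built around common features score lower per consonant than communities whose members have little in common.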

33 If the Inventories had Evolved by Chance! Construction of PhoNet_rand –For each consonant c let the frequency of occurrence in UPSID be denoted by $f_c$. –Let there be 317 bins, each corresponding to a language in UPSID. –$f_c$ bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition. –Thus the consonant inventories of the 317 languages corresponding to the bins are generated. –PhoNet_rand can be constructed from these new consonant inventories in the same way as PhoNet. Cluster PhoNet_rand by the method proposed earlier
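The randomization above can be sketched as follows: each consonant keeps its frequency of occurrence but is scattered over the language bins uniformly at random. The function name, the `seed` argument and the toy frequencies are assumptions of this sketch.

```python
import random

def randomize_inventories(frequencies, n_languages=317, seed=None):
    """Pack each consonant c into f_c distinct bins chosen uniformly at random.

    frequencies: dict consonant -> f_c, with 0 <= f_c <= n_languages.
    Returns one consonant set per bin (language).
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_languages)]
    for consonant, f_c in frequencies.items():
        # sample() draws without replacement, so no bin gets c twice.
        for bin_index in rng.sample(range(n_languages), f_c):
            inventories[bin_index].add(consonant)
    return inventories

# Invented frequencies over six bins.
random_inventories = randomize_inventories({"p": 5, "t": 3, "k": 1},
                                           n_languages=6, seed=1)
```

The resulting inventories preserve how often each consonant occurs while discarding any tendency of consonants to occur together, which is what makes them a useful chance baseline.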

34 Comparison between PhoNet and PhoNet_rand [Figure: average feature entropy against community size for PhoNet and PhoNet_rand.] The curve shows the average feature entropy of the communities of a particular size versus the community size

35 Our Findings The distribution of the occurrence of consonants over languages follows a power-law behavior; a preferential attachment-based model can reproduce this distribution of occurrence to a very close approximation (mean error ~0.01); the patterns of co-occurrence of the consonants, reflected through communities in PhoNet, are observed in 80% or more of the world's languages; such patterns of co-occurrence would not have emerged if the consonant inventories had evolved just by chance.

36 The Epilogue How to explain preferential attachment? –Perhaps it is due to the linguistic heterogeneity involved in the process of language change (at the microscopic level) –Consonants belonging to languages that are prevalent among the speakers in one generation have a higher (and higher) chance of getting transmitted to the speakers of the subsequent generations –The above heterogeneity manifests as preferential attachment at the mesoscopic level What is the cause of the origin of feature economy? –Perhaps it is the outcome of the interplay of functional forces, such as perceptual contrast and ease of learnability, that is reflected as feature economy [Figure: Indo-European family of languages]

37 Thank you!

