Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.



Overview of the Talk
– Motivation
– Approach & Objective
– Principle of Occurrence in Consonant Inventories
– Principle of Co-occurrence in Consonant Inventories
– Findings
– Conclusions and Future Work

Sabda Brahma: Sound is Eternity
sabda-brahma su-durbodham pranendriya-mano-mayam ananta-param gambhiram durvigahyam samudra-vat
– Sound is eternal and very difficult to comprehend. It manifests within the life air, the senses, and the mind. It is unlimited and unfathomable, just like the ocean.

Signals and Symbols
Several living organisms can produce sound
– They emit sound signals to communicate
– These signals are mapped to certain symbols (meanings) in the brain
– E.g., mating calls, danger alarms

Human Communication
– Human beings also produce sound signals
– Unlike other organisms, they can concatenate these sounds to produce new messages: language
– Language is one of the primary causes/effects of human intelligence

Human Speech Sounds
– Human speech sounds are called phonemes, the smallest units of sound in a language
– Phonemes are characterized by certain distinctive features, such as:
I. Place of articulation
II. Manner of articulation
III. Phonation
[Figure: Mermelstein’s Model]

Types of Phonemes
– Vowels: /a/, /i/, /u/
– Consonants: /p/, /t/, /k/
– Diphthongs: /ai/

Choice of Phonemes
– How does a language choose the set of phonemes that makes up its sound inventory?
– Is the process arbitrary? Certainly not!
– What are the forces affecting this choice?

Forces of Choice
A Linguistic System: How does it look?
– Speaker: desires “ease of articulation”
– Listener / Learner: desires “perceptual contrast” / “ease of learnability”
– The forces shaping the choice are opposing, hence there has to be a non-trivial solution

Vowels: A (Partially) Solved Mystery
– Languages choose vowels based on maximal perceptual contrast.
– For instance, if a language has three vowels, then in more than 95% of the cases they are /a/, /i/, and /u/.
[Figure: /a/, /i/, and /u/ are maximally distinct]

Consonants: A Puzzle
– Research: from 1929 to date
– No single satisfactory explanation of the organization of the consonant inventories
– The set of features that characterize consonants is much larger than that of vowels
– No single force is sufficient to explain this organization
– Rather, a complex interplay of forces shapes these inventories

The Approach & Objective
– We adopt a complex network approach to attack the problem of consonant inventories
– We try to figure out the principle behind the distribution of the occurrence of consonants over languages
– We also attempt to figure out the co-occurrence patterns (if any) that are found across the consonant inventories

Principle of Occurrence
PlaNet: the “Phoneme-Language Network”
– A bipartite network $N = (V_L, V_C, E)$
– $V_L$: nodes representing languages of the world
– $V_C$: nodes representing consonants
– $E$: set of edges which run between $V_L$ and $V_C$
There is an edge $e \in E$ between two nodes $v_l \in V_L$ and $v_c \in V_C$ if the consonant $c$ occurs in the language $l$.
[Figure: The Structure of PlaNet, with language nodes $L_1$ to $L_4$ linked to consonant nodes /m/, /ŋ/, /p/, /d/, /s/, /θ/]
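The definition of PlaNet can be sketched in a few lines of Python. This is only an illustration: the function name `build_planet` and the four toy inventories are hypothetical and are not taken from UPSID.

```python
from collections import defaultdict

def build_planet(inventories):
    """Build a bipartite language-consonant network.

    inventories: dict mapping a language name to its set of consonants.
    Returns the edge set plus the degree of every language and consonant node.
    """
    # One edge (language, consonant) whenever the consonant occurs in the language.
    edges = {(lang, c) for lang, cons in inventories.items() for c in cons}
    # Degree of a language node = size of its consonant inventory.
    lang_degree = {lang: len(set(cons)) for lang, cons in inventories.items()}
    # Degree of a consonant node = number of languages it occurs in.
    cons_degree = defaultdict(int)
    for _, c in edges:
        cons_degree[c] += 1
    return edges, lang_degree, dict(cons_degree)

# Toy data (hypothetical, for illustration only).
toy_inventories = {
    "L1": {"/m/", "/p/", "/s/"},
    "L2": {"/m/", "/p/", "/d/"},
    "L3": {"/m/", "/ŋ/"},
    "L4": {"/p/", "/θ/", "/s/"},
}
edges, lang_degree, cons_degree = build_planet(toy_inventories)
```

In this representation the degree of a language node is its inventory size, and the degree of a consonant node is the number of languages in which it occurs.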

Construction of PlaNet
– Data source: UCLA Phonological Segment Inventory Database (UPSID)
– Number of nodes in $V_L$ is 317
– Number of nodes in $V_C$ is 541
– Number of edges in $E$ is 7022

Degree Distribution
– The degree of a node is the number of edges connected to the node.
– The Degree Distribution (DD) is the fraction of nodes, $p_k$, having degree equal to $k$.
– The Cumulative Degree Distribution (CDD) is the fraction of nodes, $P_k$, having degree $\geq k$.
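As a minimal sketch of these two definitions, the functions below compute $p_k$ and $P_k$ from a plain list of node degrees. The function names and the sample degree list are hypothetical, and the cumulative distribution is taken to count nodes of degree at least $k$.

```python
from collections import Counter

def degree_distribution(degrees):
    """p_k: fraction of nodes whose degree is exactly k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

def cumulative_degree_distribution(degrees):
    """P_k: fraction of nodes whose degree is at least k."""
    n = len(degrees)
    return {k: sum(1 for d in degrees if d >= k) / n
            for k in sorted(set(degrees))}

# Hypothetical degree sequence of six nodes.
degrees = [1, 1, 2, 3, 3, 3]
p = degree_distribution(degrees)
P = cumulative_degree_distribution(degrees)
```

The cumulative form is usually preferred for plotting because it is less noisy in the tail than $p_k$.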

Degree Distribution of PlaNet
– DD of the language nodes follows a β-distribution: $p_k = \mathrm{beta}(k)$ with $\alpha = 7.06$ and $\beta = 47.64$,
$$p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\, k^{6.06} (1-k)^{46.64}$$
– $k_{min} = 5$, $k_{max} = 173$, $k_{avg} =$ …
– DD of the consonant nodes follows a power law with an exponential cut-off: $P_k = k^{…}$
– The distribution of consonants over languages follows a power law
[Figure: $p_k$ against language inventory size (degree $k$), and $P_k$ against the degree of a consonant, $k$, with the exponential cut-off marked]

Preferential Attachment: The Key to Power Law
– Power-law distributions are observed in social networks, biological networks, Internet graphs, and citation networks
– These distributions emerge due to preferential attachment: the rich get richer

Synthesis of PlaNet
Given: $V_L = \{L_1, L_2, \ldots, L_{317}\}$ sorted in ascending order of their degrees, and 541 unlabeled nodes in $V_C$.
Step 0: All nodes in $V_C$ have degree 0.
Step t+1: Choose a language node $L_j$ (in order) with cardinality $k_j$ (inventory size).
for c running from 1 to $k_j$ do
Connect $L_j$ preferentially with a consonant node $C_i \in V_C$, to which it is not already connected, with probability
$$\Pr(C_i) = \frac{d_i^{\alpha} + \epsilon}{\sum_{x \in V^*} (d_x^{\alpha} + \epsilon)}$$
where $d_i$ is the degree of node $C_i$ at step t, $V^*$ is the subset of $V_C$ not connected to $L_j$ at t, and ε is the smoothing parameter.
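The synthesis procedure can be sketched as a short simulation. This is an illustrative reading of the steps above, not the authors' code: the function name `synthesize_planet` and the fixed random seed are assumptions, and the defaults for α and ε are the values reported with the simulation result (1.44 and 0.5).

```python
import random

def synthesize_planet(inventory_sizes, n_consonants, alpha=1.44, eps=0.5, seed=0):
    """Grow a synthetic language-consonant network by preferential attachment.

    inventory_sizes: degree k_j of every language node.
    n_consonants: number of (unlabeled) consonant nodes.
    Returns the synthetic inventories and the final consonant degrees.
    """
    if max(inventory_sizes) > n_consonants:
        raise ValueError("an inventory cannot exceed the number of consonants")
    rng = random.Random(seed)
    degree = [0] * n_consonants          # Step 0: every consonant has degree 0
    inventories = []
    for k in sorted(inventory_sizes):    # languages enter in ascending degree order
        chosen = set()
        for _ in range(k):
            # V*: consonants not yet connected to the current language.
            candidates = [i for i in range(n_consonants) if i not in chosen]
            # Unnormalized Pr(C_i) = d_i^alpha + eps; choices() normalizes.
            weights = [degree[i] ** alpha + eps for i in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            chosen.add(pick)
            # Safe to update now: pick is no longer a candidate for this language.
            degree[pick] += 1
        inventories.append(chosen)
    return inventories, degree

inventories, degree = synthesize_planet([2, 3, 4], n_consonants=10)
```

The smoothing term ε keeps the probability of a degree-0 consonant above zero, which is what allows unused consonants to enter the network at all.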

The Preferential Mechanism of Synthesis
[Figure: language nodes $L_1$ to $L_4$ and their consonant links after step 3 and after step 4]

Simulation Result
– The parameters α and ε are 1.44 and 0.5 respectively.
– The results are averaged over 100 runs.
[Figure: $P_k$ against degree ($k$) for PlaNet, PlaNet_syn, and PlaNet_rand]

Principle of Co-occurrence
– Consonants tend to co-occur in groups or communities
– These groups tend to be organized around a few distinctive features (based on manner of articulation, place of articulation, and phonation): the principle of feature economy
– If a language has some of the plosives below in its inventory, then it will also tend to have the others

plosive      bilabial   dental
voiced       /b/        /d/
voiceless    /p/        /t/

How to Capture these Co-occurrences?
PhoNet: the “Phoneme-Phoneme Network”
– A weighted network $N = (V_C, E)$
– $V_C$: nodes representing consonants
– $E$: set of edges which run between the nodes in $V_C$
There is an edge $e \in E$ between two nodes $v_{c_1}, v_{c_2} \in V_C$ if the consonants $c_1$ and $c_2$ co-occur in a language. The number of languages in which $c_1$ and $c_2$ co-occur defines the edge weight of $e$. The number of languages in which $c_1$ occurs defines the node weight of $v_{c_1}$.
[Figure: example nodes /kʷ/, /k′/, /k/, /d′/]
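A minimal sketch of how the node and edge weights of PhoNet could be computed from a set of inventories is given below. The function name `build_phonet` and the three toy languages are hypothetical; consonant labels are plain strings.

```python
from collections import Counter
from itertools import combinations

def build_phonet(inventories):
    """Build the weighted consonant co-occurrence network.

    inventories: dict mapping a language name to its set of consonants.
    Returns (node_weight, edge_weight); edge keys are sorted consonant pairs.
    """
    node_weight = Counter()   # number of languages a consonant occurs in
    edge_weight = Counter()   # number of languages a pair co-occurs in
    for consonants in inventories.values():
        consonants = set(consonants)
        node_weight.update(consonants)
        for u, v in combinations(sorted(consonants), 2):
            edge_weight[(u, v)] += 1
    return node_weight, edge_weight

# Toy data (hypothetical, for illustration only).
toy = {
    "L1": {"k", "kw", "k'"},
    "L2": {"k", "kw"},
    "L3": {"k", "d'"},
}
node_weight, edge_weight = build_phonet(toy)
```

PhoNet is the one-mode projection of PlaNet onto its consonant nodes, so the same inventories drive both networks.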

Construction of PhoNet
– Data source: UPSID
– Number of nodes in $V_C$ is 541
– Number of edges is …
[Figure: PhoNet]

Community Structures in PhoNet
– Radicchi et al. algorithm (for unweighted networks): counts the number of triangles that an edge is a part of. Inter-community edges will have a low count, so remove them.
– Modification for a weighted network like PhoNet:
– Look for triangles where the weights on the edges are comparable.
– If they are comparable, then the group of consonants co-occurs highly; otherwise it does not.
– Measure the strength S for each edge (u,v) in PhoNet, where
$$S = \frac{w_{uv}}{\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2}} \quad \text{if } \sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2} > 0, \text{ else } S = \infty$$
– Remove edges with S less than a threshold η
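The edge-strength measure and the thresholding step can be sketched as follows. Two points are assumptions made for this illustration and are not stated on the slide: a missing edge is treated as having weight 0, and communities are read off as the connected components that remain after weak edges are removed. The function names and the toy weights are hypothetical.

```python
import math

def edge_strength(w, nodes, u, v):
    """S = w_uv / sqrt(sum_i (w_ui - w_vi)^2), with i over all nodes except u, v."""
    def weight(a, b):
        # Assumption: an absent edge has weight 0.
        return w.get((a, b), w.get((b, a), 0))
    spread = math.sqrt(sum((weight(u, i) - weight(v, i)) ** 2
                           for i in nodes if i not in (u, v)))
    return weight(u, v) / spread if spread > 0 else math.inf

def communities(w, nodes, eta):
    """Drop edges with S < eta, then return connected components of size > 1."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in w:
        if edge_strength(w, nodes, u, v) >= eta:
            parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return [g for g in groups.values() if len(g) > 1]

# Toy weights: a, b, c co-occur often; d is only weakly tied to a.
nodes = ["a", "b", "c", "d"]
w = {("a", "b"): 10, ("a", "c"): 10, ("b", "c"): 10, ("a", "d"): 1}
strong = communities(w, nodes, eta=1.0)
loose = communities(w, nodes, eta=0.05)
```

In the toy data the edge (a, d) has strength 1/√200, so it survives only at very low thresholds; raising η splits d off from the tight group.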

Community Formation ($S > \eta$)
For different values of η we get different sets of communities

Consonant Societies!
[Figure: communities obtained at η = 1.25, η = 0.72, η = 0.60, and η = 0.35]

Evaluation of the Communities: Occurrence Ratio
– Hypothesis: the communities obtained from the algorithm should be found frequently in UPSID
– We define the occurrence ratio of a community C in a language L to capture the “intensity” of occurrence:
$$O_L = \frac{M}{N - (R_{top} - 1)}$$
– N is the number of consonants in C (ranked in ascending order of frequency of occurrence), M is the number of consonants of C that occur in L, and $R_{top}$ is the rank of the highest-ranking consonant in L that is also present in C
– If a high-frequency consonant is present in L, the low-frequency ones need not be present; but if a lower-frequency one is present, then the higher-frequency ones are expected to be present as well

Computing Occurrence Ratio: An Example
Community C by rank: /kʷ/ (R = 1), /kʰ/ (R = 2), /k/ (R = 3)

Language   Members of C present   M   N   R_top   O_L
L1         /kʷ/, /kʰ/, /k/        3   3   1       3/3 = 1
L2         /kʰ/, /k/              2   3   2       2/2 = 1
L3         /kʷ/, /kʰ/             2   3   1       2/3 = 0.66

Average Occurrence Ratio
– A given community C has an occurrence ratio $O_L$ in each language L in UPSID
– We average this ratio over all L as
$$O_{av} = \frac{\sum_{L \in \mathrm{UPSID}} O_L}{L_{occur}}$$
where $L_{occur}$ is the number of languages in which at least one of the members of C occurs
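The occurrence ratio and its average can be sketched as below. The function names are hypothetical, and the toy community assumes the ranking /kw/ = 1, /kh/ = 2, /k/ = 3 (rank 1 being the least frequent consonant), which is a reading of the example slide rather than something it states outright.

```python
def occurrence_ratio(community_ranked, language):
    """O_L = M / (N - (R_top - 1)).

    community_ranked: consonants of C listed from rank 1 upward.
    language: set of consonants in the language's inventory.
    Returns None when no member of C occurs in the language.
    """
    present = [rank for rank, c in enumerate(community_ranked, start=1)
               if c in language]
    if not present:
        return None
    m = len(present)             # M: members of C found in the language
    n = len(community_ranked)    # N: size of the community
    r_top = min(present)         # R_top: best (smallest) rank found
    return m / (n - (r_top - 1))

def average_occurrence_ratio(community_ranked, languages):
    """O_av: mean of O_L over languages containing at least one member of C."""
    ratios = [occurrence_ratio(community_ranked, lang) for lang in languages]
    ratios = [r for r in ratios if r is not None]
    return sum(ratios) / len(ratios) if ratios else 0.0

# Assumed ranking for illustration: rank 1 = "kw", rank 2 = "kh", rank 3 = "k".
C = ["kw", "kh", "k"]
L1, L2, L3 = {"kw", "kh", "k"}, {"kh", "k"}, {"kw", "kh"}
ratios = [occurrence_ratio(C, L) for L in (L1, L2, L3)]
o_av = average_occurrence_ratio(C, [L1, L2, L3, {"p", "t"}])
```

Languages that contain no member of C contribute nothing to the sum and are not counted in $L_{occur}$, so they are simply skipped when averaging.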

Results of the Evaluation
– Consonants show patterns of co-occurrence in 80% or more of the world’s languages
[Figure: for η > 0.3, $O_{av}$ > 0.8]

The Binding Force of the Communities: Feature Economy
– Feature entropy: the idea is borrowed from information theory
– For a community C of size N, let there be $p_f$ consonants for which a particular feature f is present and $q_f$ other consonants for which f is absent. The probability that a consonant chosen from C has f is $p_f/N$, and the probability that it does not have f is $q_f/N$, or $(1 - p_f/N)$
– Feature entropy can therefore be defined as
$$F_E = \sum_{f \in F} \left( -\frac{p_f}{N} \log \frac{p_f}{N} - \frac{q_f}{N} \log \frac{q_f}{N} \right)$$
where F is the set of all features present in the consonants in C
– Essentially, this is the number of bits needed to transmit the entire information about C through a channel
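A small sketch of the feature-entropy computation follows. The function name and the feature labels are hypothetical; the logarithm is taken to base 2 so that the result is in bits, and a term with probability 0 is taken to contribute 0, which is the usual convention rather than something the slide spells out.

```python
import math

def feature_entropy(community):
    """F_E of a community given as {consonant: set of features}, in bits."""
    n = len(community)
    # F: every feature carried by at least one consonant of the community.
    features = set().union(*community.values())
    total = 0.0
    for f in features:
        # Probability that a consonant drawn from C has feature f.
        p = sum(1 for feats in community.values() if f in feats) / n
        for prob in (p, 1 - p):
            if prob > 0:   # convention: 0 * log(0) = 0
                total -= prob * math.log2(prob)
    return total

# Hypothetical feature labels for the four plosives.
economical = {
    "/p/": {"voiceless", "bilabial", "plosive"},
    "/b/": {"voiced", "bilabial", "plosive"},
    "/t/": {"voiceless", "dental", "plosive"},
    "/d/": {"voiced", "dental", "plosive"},
}
uniform = {"/p/": {"plosive"}, "/t/": {"plosive"}}
fe_economical = feature_entropy(economical)
fe_uniform = feature_entropy(uniform)
```

A feature shared by every member of the community (here "plosive") contributes nothing, so communities built around a few shared features score low.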

Computing Feature Entropy
– Lower $F_E$: community $C_1$ economizes on the number of features
– Higher $F_E$: community $C_2$ does not economize on the number of features

If the Inventories had Evolved by Chance!
Construction of PhoNet_rand
– For each consonant c, let the frequency of occurrence in UPSID be denoted by $f_c$.
– Let there be 317 bins, each corresponding to a language in UPSID.
– $f_c$ bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition.
– Thus the consonant inventories of the 317 languages corresponding to the bins are generated.
– PhoNet_rand can be constructed from these new consonant inventories in the same way as PhoNet.
Cluster PhoNet_rand by the method proposed earlier
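The random baseline can be sketched as follows. The function name, the seed, and the toy frequencies are hypothetical; only the 317 bins and the sampling without repetition come from the description above.

```python
import random

def random_inventories(frequency, n_languages=317, seed=0):
    """Scatter every consonant over f_c distinct, uniformly chosen languages.

    frequency: dict mapping a consonant to its frequency of occurrence f_c.
    Returns a list of n_languages random consonant inventories (sets).
    """
    rng = random.Random(seed)
    bins = [set() for _ in range(n_languages)]
    for consonant, f_c in frequency.items():
        # sample() draws f_c distinct bins, so a consonant never repeats in one.
        for idx in rng.sample(range(n_languages), f_c):
            bins[idx].add(consonant)
    return bins

# Hypothetical frequencies over 10 languages, for illustration only.
toy_frequency = {"p": 5, "t": 3, "k": 10}
bins = random_inventories(toy_frequency, n_languages=10)
```

This keeps each consonant's frequency of occurrence fixed while destroying any co-occurrence structure, which is what makes it a useful null model.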

Comparison between PhoNet and PhoNet_rand
[Figure: average feature entropy against community size, for PhoNet and PhoNet_rand]
The curve shows the average feature entropy of the communities of a particular size versus the community size

Our Findings
– The distribution of the occurrence of consonants over languages follows a power law;
– A preferential attachment-based model can reproduce this distribution of occurrence to a very close approximation (mean error ~0.01);
– The patterns of co-occurrence of the consonants, reflected through communities in PhoNet, are observed in 80% or more of the world's languages;
– Such patterns of co-occurrence would not have emerged if the consonant inventories had evolved just by chance.

The Epilogue
How to explain preferential attachment?
– Perhaps it is due to the linguistic heterogeneity involved in the process of language change (at the microscopic level)
– Consonants belonging to languages that are prevalent among the speakers in one generation have an ever higher chance of being transmitted to the speakers of subsequent generations
– The above heterogeneity manifests as preferential attachment at the mesoscopic level
What is the cause of the origin of feature economy?
– Perhaps it is the outcome of the interplay of functional forces, such as perceptual contrast and ease of learnability, that is reflected as feature economy
[Figure: Indo-European family of languages]

Thank you!