Theory of αBiNs: Alphabetic Bipartite Networks
Animesh Mukherjee
Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur
Collaborators:
o Monojit Choudhury, Microsoft Research India, Bangalore
o Niloy Ganguly, Abyayananda Maiti, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur
o Fernando Peruani, Service de Physique de l'Etat Condense & Complex System Institute Paris - Ile-de-France, Paris, France
o Lutz Brusch and Andreas Deutsch, Centre for Information Services and High Performance Computing, Technical University of Dresden, Germany

Discrete Combinatorial System (DCS)
A DCS is a system whose basic building blocks are a finite set of elementary units. The system itself is a collection of a potentially infinite number of discrete combinations of these units.
Examples include two of the greatest wonders on earth: life and language.
o Life: the elementary units are the nucleotides or codons, and their discrete combinations give rise to the different genes.
o Language: the elementary units are the letters or words, and the discrete combinations are the words or sentences formed from them.

αBiNs to Model a DCS
αBiNs: a special class of complex networks
o Bipartite in nature
o One partition contains nodes corresponding to the basic units (or alphabets). The other contains nodes that represent the discrete combinations of the basic units.
o An edge indicates that a particular basic unit is part of a discrete combination.
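To make the definition concrete, here is a minimal Python sketch that builds the two adjacency maps of an αBiN from a table of discrete combinations. It is not taken from the talk. The function name `build_alphabin` and the toy words are hypothetical. The sketch assumes set membership, so a unit that appears twice in one combination contributes a single edge. That fits phoneme inventories but not letter-word data, where repeats matter.

```python
from collections import defaultdict

def build_alphabin(table):
    """Build an αBiN from {combination: iterable of basic units}.

    Returns (unit_to_combos, combo_to_units); an edge (u, c) means that
    basic unit u is part of discrete combination c.
    """
    unit_to_combos = defaultdict(set)
    combo_to_units = {}
    for combo, units in table.items():
        combo_to_units[combo] = set(units)     # set membership: repeats collapse
        for u in combo_to_units[combo]:
            unit_to_combos[u].add(combo)
    return dict(unit_to_combos), combo_to_units

if __name__ == "__main__":
    toy = {"cat": "cat", "act": "act", "tea": "tea"}   # made-up words as combinations
    units, combos = build_alphabin(toy)
    print({u: len(cs) for u, cs in sorted(units.items())})  # degree of each unit node
```

In this picture, the degree of a unit node is the number of combinations it takes part in. That is the quantity whose distribution the following slides study.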

Example: Phoneme-Language Network (PlaNet)
Basic unit: the phonemes that human beings can articulate
Discrete combination: the phoneme inventory of a language, i.e., the repertoire of phonemes that the speakers of the language use for communication
[Figure: PlaNet, the Phoneme-Language Network, linking languages l1 to l4 to the phonemes /s/, /p/, /k/, /d/, /t/, /n/]

Topological Properties of PlaNet
Degree distribution of language nodes
[Figure: $p_k$ versus language inventory size (degree $k$)]
$p_k = \mathrm{beta}(k)$ with $\alpha = 7.06$ and $\beta = 47.64$:
$$p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\, k^{6.06}\,(1-k)^{46.64}$$
$k_{\min} = 5$, $k_{\max} = 173$, $k_{\mathrm{avg}} = 21$
Degree distribution of phoneme nodes
[Figure: $P_k$ versus degree of a consonant $k$, log-log axes]
$P_k \propto k^{-0.71}$ with an exponential cut-off
The networks are constructed from the data available in the UCLA Phonological Segment Inventory Database (UPSID). UPSID hosts 317 inventories with 541 different consonants found across them.
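The fitted curve for the language nodes is a beta density. Its normalizing constant is $\Gamma(\alpha+\beta)/(\Gamma(\alpha)\Gamma(\beta))$, with $\alpha + \beta = 54.7$. A beta density lives on the unit interval, so the sketch below assumes that inventory size is rescaled as $x = k/k_{\max}$ with $k_{\max} = 173$. That rescaling is our assumption; the slide does not state it. The code evaluates the density in log space with `math.lgamma` to avoid overflow in the Gamma functions.

```python
import math

ALPHA, BETA = 7.06, 47.64   # fitted beta parameters for language-node degrees

def beta_pdf(x, a=ALPHA, b=BETA):
    """Beta density Gamma(a+b) / (Gamma(a) Gamma(b)) * x**(a-1) * (1-x)**(b-1)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

if __name__ == "__main__":
    K_MAX = 173  # ASSUMPTION: inventory size k is rescaled to x = k / K_MAX
    mode_x = (ALPHA - 1) / (ALPHA + BETA - 2)   # location of the density's peak
    mean_x = ALPHA / (ALPHA + BETA)
    print(f"peak near k = {mode_x * K_MAX:.1f}, mean near k = {mean_x * K_MAX:.1f}")
```

Under that rescaling, the mean falls at roughly $k \approx 22$. This is close to the reported average inventory size of 21, which suggests the assumption is at least plausible.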

Network Synthesis
Can we simulate a stochastic network growth model that yields a similar degree distribution (DD)?
Clue: preferential attachment leads to power-law degree distributions in both unipartite and unbounded bipartite networks.

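Evolution of PlaNet
Rules of the game:
o A new language is born.
o It chooses from the set of existing phonemes preferentially, based on their degree $k$:
$$\Pr(\text{phoneme } i) = \frac{k_i + \varepsilon}{\sum_{\text{all phonemes } j} (k_j + \varepsilon)}$$
[Figure: the two partitions, Phonemes and Languages]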
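A minimal simulation of this growth rule might look as follows. It rests on three assumptions:
o Each new language draws an inventory of a given size without replacement, because the phonemes in an inventory form a set.
o Degrees are updated only after the whole inventory has been chosen.
o The caller supplies the inventory sizes.
`grow_planet` and its parameters are hypothetical names. This is a sketch, not the model code behind the ACL 2006 results.

```python
import random

def grow_planet(num_phonemes, inventory_sizes, eps, rng=None):
    """Grow a PlaNet-like bipartite network (hypothetical sketch).

    Each new language picks `size` distinct phonemes; each pick chooses among
    the not-yet-picked phonemes with probability proportional to k + eps.
    Returns (phoneme degrees, list of inventories as sets).
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if rng is None:
        rng = random.Random()
    degree = [0] * num_phonemes
    inventories = []
    for size in inventory_sizes:
        if size > num_phonemes:
            raise ValueError("inventory larger than the phoneme set")
        chosen = set()
        while len(chosen) < size:
            candidates = [p for p in range(num_phonemes) if p not in chosen]
            weights = [degree[p] + eps for p in candidates]
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        for p in chosen:   # degrees change only after the language is complete
            degree[p] += 1
        inventories.append(chosen)
    return degree, inventories

if __name__ == "__main__":
    deg, invs = grow_planet(50, [10, 20, 15, 30], eps=0.5, rng=random.Random(42))
    print(sorted(deg, reverse=True)[:5])   # the most connected phonemes
```

One way to reproduce the comparison with real data is to feed the function inventory sizes taken from an empirical size distribution. The resulting phoneme degree distribution can then be set against the UPSID one.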

Wow! We are quite close. (ACL 2006)

Theoretical Investigation: The Three Sides of the Coin
Sequential attachment
o Only one edge per incoming node
o Exclusive set membership: language – {speaker, webpage}, country – citizen
Parallel attachment with replacement
o Every incoming node has $\mu > 1$ edges
o Sequences: letter – word, word – document
Parallel attachment without replacement
o Sets: phoneme – language, station – train
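The three regimes differ only in how an incoming node picks its targets. The hypothetical helper `attach` below sketches one arrival under each regime. It assumes that targets are drawn with probability proportional to $k + \varepsilon$, and that degrees are updated only after the node has chosen all of its targets.

```python
import random

def attach(degree, eps, mu, mode, rng):
    """Attach one incoming node to the fixed partition and return its targets.

    degree: current degrees of the fixed-partition nodes (updated in place).
    Targets are drawn with probability proportional to k + eps, using the
    degrees as they were before this node arrived. mu is ignored in
    'sequential' mode, where exactly one edge is added.
    """
    nodes = list(range(len(degree)))
    weights = [k + eps for k in degree]
    if mode == "sequential":             # one edge per incoming node
        targets = rng.choices(nodes, weights=weights, k=1)
    elif mode == "with_replacement":     # mu draws; the same node may repeat
        targets = rng.choices(nodes, weights=weights, k=mu)
    elif mode == "without_replacement":  # mu distinct nodes (a set)
        if mu > len(nodes):
            raise ValueError("mu exceeds the size of the fixed partition")
        targets = []
        for _ in range(mu):
            i = rng.choices(range(len(nodes)), weights=weights, k=1)[0]
            targets.append(nodes.pop(i))  # remove the picked node and its weight
            weights.pop(i)
    else:
        raise ValueError(f"unknown mode: {mode}")
    for node in targets:
        degree[node] += 1
    return targets
```

The only difference between the two parallel modes is whether a picked node stays in the pool for the remaining $\mu - 1$ draws. With replacement, it can reappear and receive several edges from the same incoming node.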

Sequential Attachment: Markov Chain Formulation (EPL, 2007)
Notations
o $t$: number of nodes in the growing partition
o $N$: number of nodes in the fixed partition
o $p_{k,t}$: $p_k$ after adding $t$ nodes
o One edge is added per incoming node
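Assume one edge per incoming node and an attachment probability proportional to $k + \varepsilon$, carried over from the PlaNet growth rule. After $t$ arrivals the total weight is $t + N\varepsilon$, so the degree of any fixed-partition node evolves as

$$p_{k,t+1} = \left(1 - \frac{k+\varepsilon}{t+N\varepsilon}\right) p_{k,t} + \frac{k-1+\varepsilon}{t+N\varepsilon}\, p_{k-1,t}.$$

This is our own derivation from the rule and not necessarily the form used in the EPL paper. The sketch below iterates it from $p_{k,0} = \delta_{k,0}$. `iterate_markov_chain` is a hypothetical name.

```python
def iterate_markov_chain(N, eps, t_max):
    """Iterate the sequential-attachment master equation.

    Starts from p_{k,0} = 1 if k == 0 else 0 and returns p with
    p[k] = p_{k,t_max} for k = 0..t_max.
    """
    if N <= 0 or eps <= 0:
        raise ValueError("N and eps must be positive")
    p = [1.0]                         # t = 0: every fixed node has degree 0
    for t in range(t_max):
        denom = t + N * eps           # sum of (k + eps) over the N fixed nodes
        new = [0.0] * (t + 2)         # degrees 0..t+1 are reachable at t + 1
        for k, pk in enumerate(p):
            gain = (k + eps) / denom  # prob. that this node gets the new edge
            new[k] += pk * (1.0 - gain)
            new[k + 1] += pk * gain
        p = new
    return p

if __name__ == "__main__":
    p = iterate_markov_chain(N=100, eps=1.0, t_max=500)
    print(sum(k * pk for k, pk in enumerate(p)))  # mean degree; t/N up to rounding
```

Summing $k\,p_{k,t}$ over $k$ gives exactly $t/N$ in exact arithmetic, which makes a quick sanity check on the iteration.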

The Hard Part
o The average degree of the fixed partition diverges: with one edge per incoming node, it equals $t/N$ and grows without bound.
o Methods based on steady-state and continuous-time assumptions fail.
Closed-form solution (EPL, 2007)
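One way to obtain an exact expression under the same assumptions is a Pólya-urn argument. A given node starts with weight $\varepsilon$, and the other $N-1$ nodes start with total weight $(N-1)\varepsilon$. Every new edge adds 1 to the side that receives it. The number of edges the node holds after $t$ arrivals is then beta-binomial:

$$p_{k,t} = \binom{t}{k}\frac{B\big(k+\varepsilon,\ t-k+(N-1)\varepsilon\big)}{B\big(\varepsilon,\ (N-1)\varepsilon\big)},$$

where $B$ is the Beta function. No steady state or continuous-time limit is needed, which sidesteps the diverging average degree. We have not checked this against the closed form printed in the paper, so treat it as an illustration. `closed_form_pk` is a hypothetical name.

```python
import math

def log_beta_fn(a, b):
    """log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def closed_form_pk(k, t, N, eps):
    """Beta-binomial p_{k,t} for sequential attachment with kernel k + eps
    (Polya-urn argument; an illustration, not the paper's printed formula)."""
    if N < 2 or eps <= 0:
        raise ValueError("need N >= 2 and eps > 0")
    if not 0 <= k <= t:
        return 0.0
    log_binom = math.lgamma(t + 1) - math.lgamma(k + 1) - math.lgamma(t - k + 1)
    rest = (N - 1) * eps              # initial weight of all other fixed nodes
    return math.exp(log_binom
                    + log_beta_fn(k + eps, t - k + rest)
                    - log_beta_fn(eps, rest))

if __name__ == "__main__":
    t, N, eps = 500, 100, 1.0
    p = [closed_form_pk(k, t, N, eps) for k in range(t + 1)]
    print(sum(p), sum(k * pk for k, pk in enumerate(p)))  # total mass and mean degree
```

For large $t$, the ratio $k/t$ of a beta-binomial variable approaches a Beta$(\varepsilon, (N-1)\varepsilon)$ distribution. This agrees with the beta approximation used for parallel attachment when $\mu = 1$.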

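A Tunable Distribution (EPL, 2007)
[Figure: $p_k$ (probability that a randomly chosen node has degree $k$) versus $k$ (degree), for several values of $\varepsilon$, including $\varepsilon = 2$, $\varepsilon = 1$ and $\varepsilon = 4 \times 10^{-4}$]
Regimes: $1 < \varepsilon < \infty$; $\varepsilon < (N/\mu - 1)^{-1}$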
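Suppose the degree distribution is summarized by a beta form with parameters $a = \varepsilon$ and $b = \varepsilon(N/\mu - 1)$, where $\mu = 1$ for sequential attachment. Its shape then follows from the standard rules for the beta distribution:
o Peaked when both parameters exceed 1.
o Monotonically decreasing when $a \le 1 \le b$ (excluding the uniform case $a = b = 1$).
o U-shaped when both are below 1.
The shape therefore changes at $\varepsilon = 1$ and at $\varepsilon = (N/\mu - 1)^{-1}$. The helper `beta_shape` is a hypothetical sketch of this classification, not code from the talk.

```python
def beta_shape(eps, N, mu=1):
    """Classify the shape of Beta(a, b) with a = eps and b = eps * (N / mu - 1)."""
    if eps <= 0 or N <= mu:
        raise ValueError("need eps > 0 and N > mu")
    a, b = eps, eps * (N / mu - 1)
    if a == 1 and b == 1:
        return "uniform"
    if a > 1 and b > 1:
        return "peaked"        # interior mode at (a - 1) / (a + b - 2)
    if a < 1 and b < 1:
        return "U-shaped"      # mass concentrates near both ends
    if a <= 1 <= b:
        return "decreasing"    # highest probability at small k
    return "increasing"        # remaining case: b <= 1 <= a

if __name__ == "__main__":
    for eps in (2, 1, 4e-4):
        print(eps, beta_shape(eps, N=1000))
```

The value of $N$ in the usage lines is an arbitrary choice for illustration. The slide does not state the $N$ behind its curves.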

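Parallel Attachment with Replacement
o Either use an approximation: $p_{k,t} \approx \mathcal{B}(k/t;\ \varepsilon,\ N\varepsilon/\mu - \varepsilon)$, where $\mathcal{B}$ is the beta distribution and $\mu$ (> 1) is the number of edges each incoming node brings.
o Or use an exact Markov chain: we could not solve it exactly, but we have some closer approximations.
To be submitted to PRE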
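The beta approximation treats $x = k/t$ as continuous, so dividing the density by $t$ gives a probability per integer degree. The sketch below, with the hypothetical name `approx_pk`, evaluates it in log space. It requires $N/\mu > 1$ so that the second parameter $N\varepsilon/\mu - \varepsilon$ is positive. It only handles $0 < k < t$, where the density is finite.

```python
import math

def approx_pk(k, t, N, eps, mu):
    """Beta approximation p_{k,t} ~ B(k/t; eps, N*eps/mu - eps) / t for 0 < k < t."""
    a, b = eps, N * eps / mu - eps
    if a <= 0 or b <= 0:
        raise ValueError("need eps > 0 and N / mu > 1")
    if not 0 < k < t:
        raise ValueError("the density form is used here only for 0 < k < t")
    x = k / t
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    density = math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
    return density / t               # convert density in x = k/t to probability per k

if __name__ == "__main__":
    t = 1000
    mass = sum(approx_pk(k, t, N=100, eps=1.0, mu=40) for k in range(1, t))
    print(mass)                      # close to 1 for these parameters
```

With $\varepsilon = 1$, $\mu = 40$ and $N = 100$, the beta parameters are $(1, 1.5)$, and the summed mass comes out close to 1. For very small $\varepsilon$, both parameters drop below 1 and the density diverges at the ends. This is consistent with the approximation becoming unreliable there.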

Parallel Attachment with Replacement: Results
[Figure: two panels, $\varepsilon = 1$ and $\varepsilon = 0.0625$, with $\mu = 40$ and $N = 100$]
o Red broken line: approximation
o Blue symbols: stochastic simulation
o Black line: numerical integration of the Markov chain
For very low $\varepsilon$, the approximation falls out of range.

One-Mode Projection of the Fixed Partition
The one-mode projection onto the nodes of the fixed partition is a network of basic units. Two basic units are connected as many times as they appear together in a discrete combination. Example: the Phoneme-Phoneme Network (PhoNet).
[Figure: PhoNet, the Phoneme-Phoneme Network over /s/, /n/, /k/, /p/, /t/, /d/, with integer edge weights]
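The projection itself is a simple co-occurrence count. The following is a minimal sketch, assuming each discrete combination is given as a collection of basic units. The function name is hypothetical and the phoneme inventories in the usage lines are made up.

```python
from collections import Counter
from itertools import combinations

def one_mode_projection(combos):
    """Project an αBiN onto its basic units: the weight of edge (u, v), u < v,
    is the number of discrete combinations containing both u and v."""
    weights = Counter()
    for units in combos:
        for u, v in combinations(sorted(set(units)), 2):
            weights[(u, v)] += 1
    return weights

if __name__ == "__main__":
    inventories = [{"/p/", "/t/", "/k/"}, {"/p/", "/t/"}, {"/s/", "/t/"}]  # made-up data
    for (u, v), w in sorted(one_mode_projection(inventories).items()):
        print(u, v, w)
```

Sorting each combination before pairing ensures that every unordered pair is stored under a single key.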

Weighted DD
[Figure: two panels, $\mu = 5$ and $\mu = 15$, with $N = 500$ and $\varepsilon = 1$]
o Blue dots: stochastic simulation
o Black line: theory, with $q = k(\mu - 1)$
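The relation $q = k(\mu - 1)$ follows from the projection. Each combination that contains a unit links it to the other $\mu - 1$ units of that combination. So if every combination has exactly $\mu$ distinct units, a unit with bipartite degree $k$ has weighted degree $k(\mu - 1)$. The hypothetical helpers below compute both quantities directly, so the identity can be checked on any data set. When combination sizes vary, as in real inventories, the identity no longer holds exactly.

```python
from collections import Counter

def bipartite_degrees(combos):
    """k: number of combinations each basic unit belongs to."""
    k = Counter()
    for units in combos:
        for u in set(units):
            k[u] += 1
    return k

def weighted_degrees(combos):
    """q: weighted degree in the one-mode projection, i.e. for every combination
    containing the unit, the number of other distinct units in it."""
    q = Counter()
    for units in combos:
        distinct = set(units)
        for u in distinct:
            q[u] += len(distinct) - 1
    return q

if __name__ == "__main__":
    mu = 3
    combos = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}]  # each has mu units
    k, q = bipartite_degrees(combos), weighted_degrees(combos)
    print(all(q[u] == k[u] * (mu - 1) for u in k))
```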

Comparison with Real Data
Not a very good match.

A Lot of Work for the Future
o Derive closed-form solutions for
  o parallel attachment with replacement
  o parallel attachment without replacement
o Devise a model and its associated theory that match the properties of the one-mode projection
o Study other real-world systems with an underlying αBiN structure

To-DAH
