Quantifying the Structure of Language and Music
Damián H. Zanette, Centro Atómico Bariloche & Instituto Balseiro
Language and music are outputs of a very complex system. They must display patterns at several organizational levels. Written language: a 1D array of symbols (words). And music?
Organization in written language
Frequency of words (Zipf’s law), also in music
Ordering of words (a linguistic universal)
Word distribution (the “scales of meaning”)
Zipf’s law: $f(r) \sim r^{-z}$ Make a list of the different words in a text, from the most frequent to the least frequent. The frequency of each word is inversely proportional to a power of its rank in the list. George K. Zipf (1902-1950)
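The ranking procedure described above can be sketched in a few lines. This is a minimal illustration, not the analysis behind the figures: the tokenizer (a simple regular expression) and the least-squares fit of log f against log r are assumptions made here, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from the most to the least frequent word."""
    # Assumption: a word is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(freqs):
    """Estimate z in f(r) ~ r^(-z) by a least-squares line in log-log space.

    Needs at least two ranks; rank 1 is the most frequent word.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance  # slope is -z
```

On real texts a plain least-squares fit is sensitive to the low-frequency tail, so the fitted range of ranks would normally be restricted.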
Zipf’s law
D. H. Zanette, Musicae Scientiae 10, 3 (2006)
However… If words are shuffled at random, Zipf’s law persists but meaning is lost! How much information is stored in the order of words?
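A short check makes the point concrete: shuffling changes the order of the words but not how often each one occurs, so any statistic based only on frequencies, Zipf's law included, is unchanged. The example sentence and the helper name `shuffle_words` are illustrative choices.

```python
import random
from collections import Counter


def shuffle_words(words, seed=0):
    """Return a randomly reordered copy of the word list (seeded for repeatability)."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return shuffled


words = "the origin of species by means of natural selection".split()
shuffled = shuffle_words(words)

# The multiset of words, and hence the rank-frequency list, is identical.
same_frequencies = Counter(words) == Counter(shuffled)
print(same_frequencies)
```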
Shannon and language entropy Claude E. Shannon (1916-2001)
“The Shannon entropy of a symbolic sequence is a lower bound for the length of any lossless compression of the sequence.” Estimators for H can be given from the Lempel-Ziv algorithm
M. A. Montemurro, D. H. Zanette, PLoS ONE 6(5) e19875 (2011) D = 3.2 – 3.5 bits/word
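One way to experiment with this idea is sketched below. It assumes that D denotes the difference between the entropy of the shuffled text and the entropy of the original text, and it uses a simple LZ78 phrase-counting estimate, c log2(c) / n, where c is the number of phrases in the incremental parsing of n symbols. This estimator is a stand-in chosen for brevity and is not necessarily the one used in the cited paper; it converges slowly, so values from short texts are only indicative.

```python
import math
import random


def lz78_phrase_count(symbols):
    """Count phrases in the LZ78 incremental parsing of a sequence."""
    seen = set()
    phrase = ()
    count = 0
    for symbol in symbols:
        phrase += (symbol,)
        if phrase not in seen:  # shortest phrase not seen before
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:
        count += 1  # trailing phrase that was already in the dictionary
    return count


def lz_entropy_estimate(symbols):
    """Entropy estimate c * log2(c) / n, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    c = lz78_phrase_count(symbols)
    return c * math.log2(c) / n


def order_information(words, seed=0):
    """Estimated entropy of a shuffled copy minus that of the original order."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return lz_entropy_estimate(shuffled) - lz_entropy_estimate(words)
```

Applied to a list of words from a long text, `order_information` gives a rough figure in bits per word; averaging over several seeds would reduce the noise from a single shuffle.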
Our conjecture The universal value of the information stored in the order of words is related to a (cognitive?) constraint between the diversity of semantic symbols and the typical lengths of word ordering.
The scales of meaning Burstiness in Darwin’s “On the Origin of Species”
How precisely do words tag the different parts of a text? What is the optimal size of parts?
M. A. Montemurro, D. H. Zanette, Adv. Compl. Sys. 13, 135 (2010) Divide the text into P equal parts of size s and calculate the mutual information between words and parts. Compare with a random shuffling of the text.
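The procedure can be sketched as follows, under stated assumptions: the joint probability of word w and part j is taken as the count of w in part j divided by the total number of words, leftover words that do not fill a whole part are dropped, and the comparison uses a single seeded shuffle. The function names are hypothetical and the cited paper may treat the shuffled baseline differently.

```python
import math
import random
from collections import Counter


def words_parts_mutual_information(words, num_parts):
    """Mutual information (bits) between words and the part they occur in."""
    size = len(words) // num_parts  # part size s
    if size == 0:
        raise ValueError("more parts than words")
    total = size * num_parts  # words beyond P * s are ignored
    word_counts = Counter(words[:total])
    p_part = size / total  # every part has the same probability 1 / P
    mi = 0.0
    for j in range(num_parts):
        part_counts = Counter(words[j * size:(j + 1) * size])
        for word, n_wj in part_counts.items():
            p_joint = n_wj / total
            p_word = word_counts[word] / total
            mi += p_joint * math.log2(p_joint / (p_word * p_part))
    return mi


def excess_information(words, num_parts, seed=0):
    """Mutual information of the original text minus that of a shuffled copy."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return (words_parts_mutual_information(words, num_parts)
            - words_parts_mutual_information(shuffled, num_parts))
```

Scanning `num_parts` over a range of values and looking for the part size where `excess_information` peaks is one way to explore the question of an optimal size of parts.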