Quantifying the Structure of Language and Music

Presentation transcript:

Quantifying the Structure of Language and Music
Damián H. Zanette, Centro Atómico Bariloche & Instituto Balseiro

Language and music are outputs of a very complex system. They must display patterns at several organizational levels. Written language: a 1D array of symbols (words). And music?

Organization in written language and music:
- Frequency of words (Zipf's law)
- Ordering of words (a linguistic universal)
- Word distribution (the "scales of meaning")

Zipf's law
Make a list of the different words in a text, ordered from the most frequent to the least frequent. The frequency of each word is inversely proportional to a power of its rank in the list: f(r) ~ r^(-z). George K. Zipf (1902-1950)
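To make the law concrete, here is a minimal Python sketch that builds the rank-frequency list for a text and fits the exponent z by least squares in log-log coordinates. The tokenization, the fitting method, and the file name are illustrative assumptions, not the exact procedure of the papers cited on these slides.

```python
# Sketch: empirical check of Zipf's law, f(r) ~ r^(-z).
# Assumptions: plain-text input, naive lowercase tokenization, and an
# ordinary least-squares fit in log-log space (illustrative choices).
import math
import re
from collections import Counter

def rank_frequency(text):
    """Return word frequencies sorted from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)

def zipf_exponent(freqs):
    """Fit log f = c - z log r by least squares; return z."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # z > 0 for a decaying power law

if __name__ == "__main__":
    text = open("origin_of_species.txt").read()  # hypothetical file name
    freqs = rank_frequency(text)
    print(f"vocabulary size: {len(freqs)}, z ~ {zipf_exponent(freqs):.2f}")
```

For typical literary texts this fit gives z close to 1, which is the classic form of the law.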

Zipf’s law

D. H. Zanette, Musicae Scientiae 10, 3 (2006)

However… If words are shuffled at random, Zipf's law persists, but meaning is lost! How much information is stored in the order of words?

Shannon and language entropy Claude E. Shannon (1916-2001)

“The Shannon entropy of a symbolic sequence is a lower bound for the per-symbol length of any lossless compression of the sequence.” Estimators for H can be obtained from the Lempel-Ziv algorithm.
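In practice, any LZ-family compressor gives a crude upper-bound estimate of the entropy rate. The sketch below uses Python's zlib as a stand-in for the dedicated Lempel-Ziv estimators referenced on the slide, and estimates the information stored in word order as the gap D between a shuffled version of the text and the original; the absolute numbers are biased upward, but the difference is the quantity of interest.

```python
# Sketch: LZ-based entropy estimate per word, and the word-order
# information D = H(shuffled) - H(original).
# zlib (an LZ77-family compressor) is an assumed stand-in for the
# estimators used in the paper; it only upper-bounds the entropy rate.
import random
import zlib

def bits_per_word(words):
    """Compressed size of the word sequence, in bits per word."""
    data = " ".join(words).encode("utf-8")
    return 8 * len(zlib.compress(data, 9)) / len(words)

def word_order_information(text, seed=0):
    words = text.lower().split()
    h_orig = bits_per_word(words)
    shuffled = words[:]
    random.Random(seed).shuffle(shuffled)  # destroys order, keeps Zipf's law
    return bits_per_word(shuffled) - h_orig  # D, in bits per word

if __name__ == "__main__":
    text = open("origin_of_species.txt").read()  # hypothetical file name
    print(f"D ~ {word_order_information(text):.2f} bits/word")
```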

M. A. Montemurro, D. H. Zanette, PLoS ONE 6(5), e19875 (2011). The information stored in the order of words is roughly the same across texts and languages: D = 3.2–3.5 bits/word.

Our conjecture: the universal value of the information stored in the order of words reflects a (cognitive?) constraint between the diversity of semantic symbols and the typical length scales over which words are ordered.

The scales of meaning Burstiness in Darwin’s “On the Origin of Species”

How precisely do words tag the different parts of a text? What is the optimal size of parts?

M. A. Montemurro, D. H. Zanette, Adv. Compl. Sys. 13, 135 (2010). Divide the text into P equal parts of size s and calculate the mutual information between words and parts; compare with a random shuffling of the text (see the sketch below).
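A minimal version of that computation, under assumed simplifications: naive whitespace tokenization, and a single shuffle as the baseline where the paper averages over many shuffles.

```python
# Sketch: mutual information I(W; P) between word identity and part
# index, for a text cut into P consecutive parts of s words each,
# compared against a random shuffle of the same text.
import math
import random
from collections import Counter

def mutual_information(words, s):
    """I(W;P) in bits, for consecutive parts of s words each."""
    n_parts = len(words) // s
    words = words[: n_parts * s]  # drop the ragged tail
    total = len(words)
    joint = Counter((w, i // s) for i, w in enumerate(words))
    p_word = Counter(words)
    mi = 0.0
    for (w, _part), c in joint.items():
        p_wp = c / total  # joint probability of (word, part)
        mi += p_wp * math.log2(p_wp / (p_word[w] / total * (1 / n_parts)))
    return mi

def excess_information(text, s, seed=0):
    """I(W;P) of the real text minus that of one random shuffle."""
    words = text.lower().split()
    shuffled = words[:]
    random.Random(seed).shuffle(shuffled)
    return mutual_information(words, s) - mutual_information(shuffled, s)

if __name__ == "__main__":
    text = open("origin_of_species.txt").read()  # hypothetical file name
    for s in (100, 1000, 10000):
        print(f"s = {s:6d}: excess I(W;P) ~ {excess_information(text, s):.3f} bits")
```

Scanning the part size s and looking for the scale at which the excess mutual information is largest is what gives the "optimal size of parts" asked about above.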
