INFORMATION THEORY AND IMPLICATIONS Dušica Filipović Đurđević Laboratory for Experimental Psychology, Department of Psychology, Faculty of Philosophy, University of Novi Sad, Novi Sad, Serbia Laboratory for Experimental Psychology, Department of Psychology, Faculty of Philosophy, University of Belgrade, Belgrade, Serbia
OUR PRODUCTS TELL A STORY ABOUT US A feature of a product made to fit us points to a feature of ours. UNDERSTANDING US: from the feature of the product made to fit us to our own feature.
LANGUAGE AS A NATURAL SYSTEM Language structure mirrors the human mind: a window into the human mind. Wundt: higher mental functions can be analysed only through an understanding of their products.
OUR GOAL Relate reaction time (RT) to the complexity of a given aspect of language.
HOW TO DESCRIBE THE COMPLEXITY OF LANGUAGE? Linguistic descriptions: provide a general framework and a detailed systematization. Probability Theory: frequencies and probabilities of language events. Information Theory: brings the two together.
PROBABILITY Relative frequency of an outcome (event e) in a series of n identical experiments (Pascal, 1654): P(e) = f(e)/n. Relative frequency of an item in a corpus.
PROBABILITY: EXAMPLE Imagine a corpus of 100 complex words with three suffixes: 50 -ness, 40 -ly, 10 -less. Probability of finding a particular suffix: P({ness}) = 50/100 = 0.5; P({ly}) = 40/100 = 0.4; P({less}) = 10/100 = 0.1.
INFORMATION LOAD Minus the logarithm of probability (the base of the logarithm can vary): I(e) = -log P(e). The less likely the event, the larger the amount of information it conveys.
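A minimal sketch in Python (not part of the original slides) that reproduces the two worked examples above: probabilities as relative frequencies in the toy suffix corpus, and information load as the negative log probability (base 2, i.e. in bits).

```python
from math import log2

suffix_counts = {"-ness": 50, "-ly": 40, "-less": 10}
n = sum(suffix_counts.values())      # corpus size: 100 complex words

for suffix, count in suffix_counts.items():
    p = count / n                    # P(e) = f(e) / n
    i = -log2(p)                     # information load in bits
    print(f"{suffix}: P = {p:.2f}, I = {i:.2f} bits")
# -ness: P = 0.50, I = 1.00 bits
# -ly:   P = 0.40, I = 1.32 bits
# -less: P = 0.10, I = 3.32 bits
```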
INFORMATION LOAD: PROCESSING EFFECTS Inflectional morphology Serbian inflected forms Kostić, 1991; 1995; Kostić, Marković, & Baucal, 2003 Sentence processing Surprisal Frank, 2010; 2013; Hale, 2001; Levy, 2008
SERBIAN INFLECTIONAL MORPHOLOGY
Nouns (repeated endings show case syncretism)
              Masculine              Feminine               Neuter
              Singular   Plural      Singular   Plural      Singular   Plural
Nominative    konj-ø     konj-i      vil-a      vil-e       sel-o      sel-a
Genitive      konj-a     konj-a      vil-e      vil-a       sel-a      sel-a
Dative        konj-u     konj-ima    vil-i      vil-ama     sel-u      sel-ima
Accusative    konj-a     konj-e      vil-u      vil-e       sel-o      sel-a
Instrumental  konj-em    konj-ima    vil-om     vil-ama     sel-om     sel-ima
Locative      konj-u     konj-ima    vil-i      vil-ama     sel-u      sel-ima
SERBIAN INFLECTIONAL MORPHOLOGY
Feminine nouns
suffix    F(ei)    p(i) = F(ei)/F(e)    Ii = -log p(i)
-a        18715    0.26                 1.94
-e        27803    0.39                 1.36
-i         7072    0.10                 3.32
-u         9918    0.14                 2.84
-om        4265    0.06                 4.06
-ama       4409    0.06                 4.06
F(e) = 72182
NOT JUST FREQUENCY Average probability per syntactic function/meaning
SERBIAN INFLECTIONAL MORPHOLOGY
Feminine nouns
suffix    case(s)                          F(ei)    R(ei)    F(ei)/R(ei)    p(i) = [F(ei)/R(ei)]/Σ    Ii = -log p(i)
-a        Nom. Sg., Gen. Pl.               18715     54      346.57         0.31                      1.47
-e        Gen. Sg., Nom. Pl., Acc. Pl.     27803    112      248.24         0.22                      2.25
-i        Dat. Sg., Loc. Sg.                7072     43      164.47         0.15                      2.74
-u        Acc. Sg.                          9918     58      171.00         0.15
-om       Ins. Sg.                          4265     32      133.28         0.12                      3.32
-ama      Dat. Pl., Loc. Pl., Ins. Pl.      4409     75       58.79         0.05                      5.06
Σ = 1122.35
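The weighting in the table above can be sketched as follows (Python, illustrative; not the original computation by Kostić and colleagues). Each suffix frequency F(ei) is divided by the number of syntactic functions/meanings it carries, R(ei), and the ratios are renormalised; this reproduces the p(i) column of the table.

```python
# (F(ei), R(ei)) pairs for the feminine-noun suffixes, taken from the table above
suffixes = {"-a": (18715, 54), "-e": (27803, 112), "-i": (7072, 43),
            "-u": (9918, 58), "-om": (4265, 32), "-ama": (4409, 75)}

ratios = {s: f / r for s, (f, r) in suffixes.items()}   # F(ei) / R(ei)
total = sum(ratios.values())                            # Σ ≈ 1122.35

for s, ratio in ratios.items():
    print(f"{s}: F/R = {ratio:7.2f}, p(i) = {ratio / total:.2f}")
```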
SERBIAN INFLECTIONAL MORPHOLOGY Kostić, 1991; 1995; Kostić, Marković, & Baucal, 2003
PROBABILITY: ADDITIVITY The probability of finding -ness or -ly or -less equals 1. If two events do not overlap, the probability of finding either of them equals the sum of their probabilities: P(e1 or e2) = P(e1) + P(e2) if e1 ∩ e2 = ∅.
JOINT PROBABILITIES Probability of the joint occurrence of multiple events. Our example: a corpus of 100 words. What is the probability of finding -ness and -ly? We already know that P({ness}) = 0.5 and P({ly}) = 0.4.
JOINT PROBABILITIES Intuitively: we find -ness in 50% of cases, and we find -ly in 40% of cases. Therefore, jointly, we find them in 50% of 40% of cases, that is, in 20% of cases. Formally: P({ness}, {ly}) = P({ness}) · P({ly}) = 0.5 · 0.4 = 0.2. Generally, P(e1, e2) = P(e1) · P(e2) if e1 and e2 are independent events.
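A toy check (Python, illustrative) of the multiplication rule for independent events on the suffix example above:

```python
p_ness = 0.5
p_ly = 0.4

# P(e1 and e2) = P(e1) * P(e2), valid only if the two events are independent
print(p_ness * p_ly)   # 0.2, i.e. 20% of cases, as on the slide
```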
CONDITIONAL PROBABILITY Often, events are dependent. An example: imagine a corpus in which -ly in a word is always followed by -ness. The probability of finding -ness in a word, given that we found -ly in that word, equals 1. Definition: the probability of the event e2 under the assumption that the event e1 has already occurred is called the conditional probability of the event e2 and is written P(e2|e1).
CONDITIONAL PROBABILITY An example: P({ly}) = 0.4, P({ness} | {ly}) = 1. Their joint probability: P({ly}, {ness}) = P({ly}) · P({ness} | {ly}) = 0.4 · 1 = 0.4. Generally: P(e1, e2) = P(e1) · P(e2|e1).
CONDITIONAL PROBABILITY Another example: P({ly}) = 0.4, P({ness} | {ly}) = 0. Their joint probability: P({ly}, {ness}) = 0.4 · 0 = 0.
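A small illustration (Python) of the general multiplication rule behind both examples, P(e1 and e2) = P(e1) · P(e2 | e1):

```python
p_ly = 0.4

p_ness_given_ly = 1.0              # a corpus in which -ly is always followed by -ness
print(p_ly * p_ness_given_ly)      # joint probability = 0.4

p_ness_given_ly = 0.0              # a corpus in which -ly is never followed by -ness
print(p_ly * p_ness_given_ly)      # joint probability = 0.0
```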
SENTENCE PROCESSING Surprisal: I(w_{t+1}) = -log p(w_{t+1} | w_1 … w_t), the information load of a word given the previously seen word(s) (Frank, 2010; 2013; Hale, 2001; Levy, 2008). Predicts reading latencies: less expected, i.e. more surprising, words take more time to process.
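A minimal surprisal sketch (Python): the toy corpus and the bigram estimator below are illustrative assumptions, not the language models used by Frank, Hale, or Levy.

```python
from math import log2
from collections import Counter

corpus = "the dog chased the cat the cat chased the mouse".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def surprisal(prev_word, next_word):
    """I(w_{t+1}) = -log2 p(w_{t+1} | w_t), estimated from bigram counts."""
    p = bigrams[(prev_word, next_word)] / unigrams[prev_word]
    return -log2(p)

print(surprisal("the", "cat"))     # 1.0 bit: a relatively expected continuation
print(surprisal("the", "mouse"))   # 2.0 bits: less expected, hence more surprising
```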
EVENT VS. SYSTEM (GROUP OF EVENTS) Amount of information: a description of one event. What about random variables? How to calculate the amount of information carried by a random variable manifesting through several events, e.g. the uncertainty of feminine noun inflection?
ENTROPY Answer: by calculating the entropy of the probability distribution of the potential values of the given random variable X, that is, by calculating its average uncertainty: H(X) = -Σ p(x) log p(x).
AN EXAMPLE Variable in question: suffix Potential values: ness, ly, less The same example... P({ness}) =50/100=0.5 P({ly}) =40/100=0.4 P({less}) =10/100=0.1
ENTROPY A higher number of events → higher uncertainty. More balanced probabilities → higher uncertainty.
MAXIMUM ENTROPY Maximum uncertainty: the entropy of the distribution with equally probable outcomes, Hmax = log N. An example: P({ness}) = 0.33, P({ly}) = 0.33, P({less}) = 0.33; Hmax = log2(3) ≈ 1.58 bits.
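A sketch (Python, illustrative) of entropy and maximum entropy for the three-suffix example: H = -Σ p log2 p, and H reaches its maximum log2 N when all N outcomes are equally probable.

```python
from math import log2

def entropy(probs):
    """H = -sum p * log2(p): the average uncertainty of the variable."""
    return -sum(p * log2(p) for p in probs if p > 0)

observed = [0.5, 0.4, 0.1]         # -ness, -ly, -less
uniform = [1/3, 1/3, 1/3]          # equally probable outcomes

print(entropy(observed))           # ~1.36 bits
print(entropy(uniform), log2(3))   # both ~1.58 bits: Hmax = log2 N
```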
SHANNON EQUITABILITY Ratio of observed and maximum entropy: E = H/Hmax. A general measure of "order" within a system, that is, of the distance between the observed state of the system and complete unpredictability. Good for comparing systems with different numbers of elements. Our example: E = 1.36/1.58 ≈ 0.86.
REDUNDANCY Complement of Shannon equitability, the two summing to 1: R = 1 - E. Tells the same story as equitability, but in the opposite direction.
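Continuing the same toy example (Python, illustrative): Shannon equitability as the ratio H/Hmax and redundancy as its complement.

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H = entropy([0.5, 0.4, 0.1])    # observed entropy, ~1.36 bits
H_max = log2(3)                 # maximum entropy for three outcomes, ~1.58 bits

equitability = H / H_max        # ~0.86
redundancy = 1 - equitability   # ~0.14; the two always sum to 1
print(equitability, redundancy)
```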
ENTROPY: PROCESSING EFFECTS Morphology Inflectional and derivational entropy Moscoso del Prado Martin, Kostić, & Baayen, 2004; Baayen, Feldman, Schreuder, 2006 Lexical ambiguity Semantic entropy Filipović Đurđević, & Kostić, 2006 Auditory comprehension Cohort entropy Kemps, Wurm, Ernestus, Schreuder, Baayen, 2005
SERBIAN INFLECTIONAL MORPHOLOGY
Feminine nouns
suffix    F(ei)    p(i) = F(ei)/F(e)    -p(i) log p(i)
-a        18715    0.26                 0.51
-e        27803    0.39                 0.53
-i         7072    0.10                 0.33
-u         9918    0.14                 0.40
-om        4265    0.06                 0.24
-ama       4409    0.06                 0.24
F(e) = 72182                            H = 2.25
SEMANTIC ENTROPY Polysemy as sense uncertainty: balanced vs. unbalanced sense probabilities. [Figure: sense probability distributions for two example words, e.g. horn, shell] Filipović Đurđević, 2007
SEMANTIC ENTROPY Measure of uncertainty: higher entropy → higher uncertainty → shorter RT. [Figure: sense probability distributions] Filipović Đurđević, 2007
COHORT ENTROPY Entropy over the cohort of words consistent with the input heard so far (e.g. CA…: cat, captain, caravan, cathedral, …). Kemps, Wurm, Ernestus, Schreuder, & Baayen, 2005
ENTROPY AND THE BRAIN Entropy: hippocampus (expected novelty, before it occurs). Surprisal: sensory processing (novelty per se). Strange, Duggins, Penny, Dolan, & Friston, 2005
JOINT ENTROPY – TOTAL ENTROPY OF THE SYSTEM Additivity of entropy: if we imagine two unrelated (independent) systems, X and Y, the total entropy will be H(X,Y) = H(X) + H(Y).
ENTROPY OF MORPHOLOGICAL FAMILY Hmax = log N, where N is the family size: family size corresponds to the entropy of a family whose members are equally probable, whereas entropy also takes the actual probabilities of the family members into account.
JOINT ENTROPY H([think], [thinker], thinker) = H([think]) + H([thinker] | [think]) + H(thinker | [think], [thinker]) BUT, notice: these events are mutually exclusive. Moscoso del Prado Martin, Kostić, & Baayen, 2004
JOINT ENTROPY – TOTAL ENTROPY OF THE SYSTEM If X and Y describe related events, we look at all possible outcome pairs and their probabilities P(x,y): H(X,Y) = -Σ Σ P(x,y) log P(x,y).
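A sketch (Python) of joint entropy computed over all outcome pairs; the joint probabilities below are invented for illustration.

```python
from math import log2

# p(x, y) for two related variables (invented numbers)
joint_p = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
           ("x2", "y1"): 0.2, ("x2", "y2"): 0.3}

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H_xy = entropy(joint_p.values())        # H(X,Y) over all pairs (x, y)
H_x = entropy([0.4 + 0.1, 0.2 + 0.3])   # marginal entropy H(X)
H_y = entropy([0.4 + 0.2, 0.1 + 0.3])   # marginal entropy H(Y)

# H(X,Y) equals H(X) + H(Y) only when X and Y are independent ("unrelated")
print(H_xy, H_x + H_y)                  # ~1.85 vs. ~1.97 bits
```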
JOINT ENTROPY - CHARACTERISTICS It is never smaller than the entropy of the initial system: H(X,Y) ≥ H(X). Adding a new system can never reduce uncertainty! Two systems taken together can never have larger entropy than the sum of their individual entropies: H(X,Y) ≤ H(X) + H(Y).
CONDITIONAL ENTROPY Reveals the level of uncertainty that remains in random variable X once the value of random variable Y is known: H(X|Y) = H(X,Y) - H(Y). Equals zero if X is completely predictable from Y, that is, when X = f(Y). Maximal, that is, equal to H(X), when X and Y are independent, that is, when Y tells us nothing about X.
CONDITIONAL ENTROPY: PROCESSING EFFECTS Inflectional morphology Paradigm cell filling problem Ackerman, Blevins, & Malouf, 2009; Ackerman, & Malouf, 2013 Sentence processing Syntactic entropy Frank, 2010; 2013; Hale, 2001; Levy, 2008
CONDITIONAL ENTROPY The paradigm cell filling problem: given some inflected forms of a word (e.g. prozoru), how predictable are its remaining forms (e.g. prozore)? [Table: Serbian declension classes (učenik, Slavko, Pavle, prozor, selo, polje, ime, kube, žena, sudija, stvar) × cases (NOM, GEN, DAT, ACC, VOC, INS, LOC), with shared endings shown as merged cells] Ackerman, Blevins, & Malouf, 2009; Ackerman, & Malouf, 2013
ENTROPY
H(gen.sg) = -p(a) log p(a) - p(e) log p(e) - p(i) log p(i)
          = -8/11 log(8/11) - 2/11 log(2/11) - 1/11 log(1/11)
          = …
(the genitive singular endings -a, -e and -i occur in 8, 2 and 1 of the 11 declension classes, respectively)
Ackerman, Blevins, & Malouf, 2009; Ackerman, & Malouf, 2013
CONDITIONAL ENTROPY
H(gen.sg | dat.sg = i) = -p(e) log p(e) - p(i) log p(i)
                       = -2/3 log(2/3) - 1/3 log(1/3)
                       = …
(once the dative singular is known to be -i, only three classes remain: žena, sudija and stvar, with genitive singular -e, -e and -i)
Ackerman, Blevins, & Malouf, 2009; Ackerman, & Malouf, 2013
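A sketch (Python) of the two computations above; the counts come from the table of 11 declension classes (8, 2 and 1 classes with genitive singular -a, -e, -i, and, among the classes with dative singular -i, two with genitive -e and one with -i).

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# H(gen.sg): uncertainty about the genitive singular ending across all 11 classes
print(entropy([8/11, 2/11, 1/11]))   # ~1.10 bits

# H(gen.sg | dat.sg = -i): only the classes with dative singular -i remain
print(entropy([2/3, 1/3]))           # ~0.92 bits: knowing one cell reduces the uncertainty
```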
CONDITIONAL ENTROPY: PROCESSING EFFECTS
Uncertainty about the rest of the sentence. Probability estimation based on positional frequencies of parts of speech in a finite set of words (e.g. N = 10).
H(t) = -Σ p(w_t) log p(w_t)          t: Mary
H(t+1) = -Σ p(w_{t+1} | w_t) log p(w_{t+1} | w_t)          t+1: Mary left
ΔH = H(t) - H(t+1), ΔH > 0
Frank, 2010; 2013; Hale, 2001; Levy, 2008
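A toy sketch of entropy reduction (Python; the continuation probabilities are invented, not estimated from positional part-of-speech frequencies as in the original studies):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H_t = entropy([0.4, 0.3, 0.2, 0.1])   # H(t): uncertainty about the rest of the sentence after "Mary"
H_t1 = entropy([0.7, 0.3])            # H(t+1): fewer continuations remain after "Mary left"

delta_H = H_t - H_t1                  # ΔH > 0: the new word reduced the uncertainty
print(H_t, H_t1, delta_H)
```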
MUTUAL INFORMATION The amount of information shared by X and Y. Equals zero when X and Y are independent (they tell nothing about each other). Maximal, equal to H(X) (= H(Y)), when X and Y are identical: all the information is already contained in X, and adding Y tells nothing new (and vice versa).
RELATIONS AMONG MEASURES
I(X,Y) = H(X) - H(X|Y)
       = H(Y) - H(Y|X)
       = H(X) + H(Y) - H(X,Y)
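A quick numerical check of these identities (Python, with an invented joint distribution): mutual information computed from its definition equals H(X) + H(Y) - H(X,Y).

```python
from math import log2

joint_p = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
           ("x2", "y1"): 0.2, ("x2", "y2"): 0.3}
p_x = {"x1": 0.5, "x2": 0.5}   # marginal of X
p_y = {"y1": 0.6, "y2": 0.4}   # marginal of Y

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# I(X,Y) from the definition: sum of p(x,y) * log2[ p(x,y) / (p(x) p(y)) ]
mi_definition = sum(pxy * log2(pxy / (p_x[x] * p_y[y]))
                    for (x, y), pxy in joint_p.items())

# I(X,Y) from the entropies, as on the slide
mi_entropies = entropy(p_x.values()) + entropy(p_y.values()) - entropy(joint_p.values())

print(mi_definition, mi_entropies)   # both ~0.12 bits
```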
MEASURES OF DISTANCE BETWEEN DISTRIBUTIONS Kullback-Leibler divergence (relative entropy): D(p||q) = Σ p(x) log [p(x)/q(x)], where p is the distribution we are trying to predict and q is the starting distribution. Measures the distance between two distributions, but is not a true distance measure, because it is asymmetric.
MEASURES OF DISTANCE BETWEEN DISTRIBUTIONS Kullback-Leibler divergence (relative entropy) reveals the additional amount of information we need to predict q(x) if we already know p(x). A measure of the inefficiency of assuming that the distribution is q when the true distribution is p: the number of extra bits needed to choose an event from the set of possibilities when the coding scheme is based on q instead of on the true distribution p.
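A sketch (Python) of Kullback-Leibler divergence and of the asymmetry mentioned above; p and q reuse the toy suffix distributions from the earlier slides.

```python
from math import log2

def kl(p, q):
    """D(p || q) = sum of p(x) * log2( p(x) / q(x) )."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.4, 0.1]   # true distribution
q = [1/3, 1/3, 1/3]   # assumed (coding) distribution

print(kl(p, q))       # extra bits paid for coding with q instead of p (~0.22)
print(kl(q, p))       # a different value (~0.30): D is not symmetric, hence not a true distance
```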
MEASURES OF DISTANCE BETWEEN DISTRIBUTIONS Jensen-Shannon divergence (symmetric, a true distance): JSD(p,q) = ½ D(p||m) + ½ D(q||m), where m = ½(p+q). Cross-entropy: H(p,q) = -Σ p(x) log q(x) = H(p) + D(p||q).
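A sketch (Python) of the two measures named above, under the same toy distributions: Jensen-Shannon divergence as a symmetrised KL divergence, and cross-entropy as H(p) plus the KL divergence.

```python
from math import log2

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # midpoint distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cross_entropy(p, q):
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.4, 0.1], [1/3, 1/3, 1/3]
print(jsd(p, q), jsd(q, p))                        # identical: JSD is symmetric, unlike KL
print(cross_entropy(p, q), entropy(p) + kl(p, q))  # the two expressions coincide
```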
MEASURES OF DISTANCE: PROCESSING EFFECTS Inflectional morphology Inflectional paradigms and classes Milin, Filipović Đurđević, & Moscoso del Prado Martin, 2009 Auditory comprehension Balling, & Baayen, 2012 Derivational morphology Derivational mini-paradigms and mini-classes Milin, Kuperman, Kostić, & Baayen, 2009
PARADIGM AND CLASS
Frequency distributions
Inflectional paradigm                  Inflectional class (feminine nouns)
Inflected form    Probability          Suffix    Probability
saun-a            0.31                 -a        0.26
saun-e            0.09                 -e        0.39
saun-i            0.34                 -i        0.10
saun-u            0.16                 -u        0.14
saun-om           0.05                 -om       0.06
saun-ama                               -ama
RELATIVE ENTROPY D(p||q) = Σ p(i) log [p(i)/q(i)] p(i) – probability distribution of the inflected forms of an inflectional paradigm (e.g. of the word knjiga) q(i) – probability distribution of the inflected forms of the inflectional class (e.g. feminine nouns)
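A sketch (Python) of this measure. The class distribution below uses the feminine-noun suffix probabilities from the earlier tables; the paradigm distribution is an illustrative stand-in for a single noun, and the last value of each list is filled in here only so that the toy distributions sum to one.

```python
from math import log2

def relative_entropy(p, q):
    """D(p || q) = sum of p(i) * log2( p(i) / q(i) )."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# suffix order: -a, -e, -i, -u, -om, -ama
paradigm = [0.31, 0.09, 0.34, 0.16, 0.05, 0.05]   # p(i): inflected forms of one noun (illustrative)
klass    = [0.26, 0.39, 0.10, 0.14, 0.06, 0.05]   # q(i): the inflectional class

# the larger D(p||q), the more the paradigm deviates from its class
print(relative_entropy(paradigm, klass))
```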
D(p||q) PREDICTS RT
RELATIVE ENTROPY IN AUDITORY COMPREHENSION [Figure: probabilities of lexical candidates sharing the onset ab (abc, abd, abde, abdef) at successive points in the speech signal] Balling & Baayen, 2012
WEIGHTED RELATIVE ENTROPY Baayen et al., 2011 Masked priming Self-paced sentence reading Visual lexical decision task (VLDT)
DERIVATIONAL MINI-CLASSES AND MINI-PARADIGMS Derived words: suffixes and prefixes; word pairs. PARADIGM: KIND – UNKIND. CLASS: KIND – UNKIND, TRUE – UNTRUE, PLEASANT – UNPLEASANT, … Cross-entropy predicts RT. Milin, Kuperman, Kostić, & Baayen, 2009
CONCLUSION There are many ways to describe language in terms of Information Theory. However, we learn nothing about the implementation. Information Theory helps us understand the constraints of the system, that is, why something is optimal. An important step towards understanding how something is processed and how it is implemented in the brain.
THANK YOU! This research was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia (grants 179033 and 179006).
READING MATERIAL
Chapter 2, "Mathematical foundations", in Manning, C. D., & Schütze, H. (2000). Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press.
Bod, R. (2003). Introduction to elementary probability theory and formal stochastic language theory. In R. Bod, J. Hay, & S. Jannedy (Eds.), Probabilistic Linguistics. The MIT Press.
MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge, UK: Cambridge University Press. http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2005). Articulatory planning is continuous and sensitive to informational redundancy. Phonetica, 62, 146-159.
Wurm, L. H., Ernestus, M., Schreuder, R., & Baayen, R. H. (2006). Dynamics of the auditory comprehension of prefixed words: Cohort entropies and conditional root uniqueness points. The Mental Lexicon, 1, 125-146.
Milin, P., Filipović Đurđević, D., Kostić, A., & Moscoso del Prado Martín, F. (2009). The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian. Journal of Memory and Language, 60(1), 50-64.
Milin, P., Kuperman, V., Kostić, A., & Baayen, R. H. (2009). Paradigms bit by bit: An information theoretic approach to the processing of paradigmatic structure in inflection and derivation. In J. P. Blevins & J. Blevins (Eds.), Analogy in Grammar: Form and Acquisition (pp. 214-252). Oxford: Oxford University Press.
Moscoso del Prado Martín, F., Kostić, A., & Baayen, R. H. (2004). Putting the bits together: An information-theoretical perspective on morphological processing. Cognition, 94, 1-18.
Kostić, A., & Mirković, J. (2002). Processing of inflected nouns and levels of cognitive sensitivity. Psihologija, 35(3-4), 287-297.
Kostić, A., Marković, T., & Baucal, A. (2003). Inflectional morphology and word meaning: Orthogonal or co-implicative cognitive domains? In H. Baayen & R. Schreuder (Eds.), Morphological Structure in Language Processing (pp. 1-45). Berlin: Mouton de Gruyter.
Tabak, W., Schreuder, R., & Baayen, R. H. (2005). Lexical statistics and lexical processing: Semantic density, information complexity, sex, and irregularity in Dutch. In M. Reis & S. Kepser (Eds.), Linguistic Evidence (pp. 529-555). Mouton.
Balling, L., & Baayen, R. H. (2012). Probability and surprisal in auditory comprehension of morphologically complex words. Cognition, 125, 80-106.
Kemps, R., Wurm, L., Ernestus, M., Schreuder, R., & Baayen, R. H. (2005). Prosodic cues for morphological complexity: Comparatives and agent nouns in Dutch and English. Language and Cognitive Processes, 20, 43-73.
Baayen, R. H., Milin, P., Filipović Đurđević, D., Hendrix, P., & Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118, 438-482.
Frank, S. L. (2013). Uncertainty reduction as a measure of cognitive load in sentence comprehension. Topics in Cognitive Science, 5, 475-494.