A Word at a Time: Computing Word Relatedness using Temporal Semantic Analysis
Kira Radinsky, Eugene Agichtein, Evgeniy Gabrilovich, Shaul Markovitch
Introduction
A rich source of information can be revealed by studying the patterns of word occurrence over time
Example: peace and war
Corpus: New York Times over 130 years
Each word is mapped to the time series of its occurrence in NYT articles
Hypothesis: correlation between two words' time series indicates a semantic relation
Proposed method: Temporal Semantic Analysis (TSA)
Introduction
1. TSA
Temporal Semantic Analysis
3 main steps:
1. Represent words as concept vectors
2. Extract temporal dynamics for each concept
3. Extend the static representation with temporal dynamics
1. Words as concept vectors
2. Temporal dynamics
c : a concept represented by a sequence of words w_1^c, …, w_k^c
d : a document
ε : proximity relaxation parameter (ε = 20 in the experiments)
c appears in d if its words appear in d with a distance of at most ε words between each pair w_i^c, w_j^c
Example: Great Fire of London
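The appearance test above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the helper name and data layout (documents as word lists) are assumptions.

```python
def concept_appears(doc_words, concept_words, eps=20):
    """Return True if every pair of the concept's words occurs in the
    document within at most `eps` word positions of each other."""
    # Collect the positions of each concept word in the document.
    positions = {}
    for i, w in enumerate(doc_words):
        if w in concept_words:
            positions.setdefault(w, []).append(i)
    # Every concept word must appear at least once.
    if len(positions) < len(set(concept_words)):
        return False
    # Every pair of concept words must have occurrences within eps.
    for a in positions:
        for b in positions:
            if a == b:
                continue
            if not any(abs(pa - pb) <= eps
                       for pa in positions[a] for pb in positions[b]):
                return False
    return True
```

For the Great Fire of London example, the concept's words ("great", "fire", "london") must all occur close together for the document to count.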
2. Temporal dynamics
t_1, …, t_n : a sequence of consecutive discrete time points (days)
H = D_1, …, D_n : history represented by a set of document collections, where D_i is the collection of documents associated with time t_i
The dynamics of a concept c is the time series of its frequency of appearance in H
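Extracting a concept's dynamics from the history can be sketched as below. This is an assumed, simplified version: documents are word sets, and for brevity the concept test is plain containment rather than the ε-proximity check defined earlier.

```python
def concept_dynamics(history, concept_words):
    """Time series of a concept's frequency of appearance.

    `history` is a list D_1..D_n of document collections, one per
    time point t_i; each document is a set of words. A document is
    counted when all the concept's words appear in it (simplified:
    the eps-proximity constraint is omitted here).
    """
    series = []
    for docs in history:
        hits = sum(1 for d in docs if set(concept_words) <= set(d))
        series.append(hits / len(docs) if docs else 0.0)
    return series
```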
3. Extend static representation
2. Using TSA for computing Semantic Relatedness
Using TSA for computing Semantic Relatedness Compare by weighted distance between time series of concept vectors Combine it with the static semantic similarity measure
Algorithm
t_1, t_2 : words
C(t_1) = {c_1, …, c_n} and C(t_2) = {c_1, …, c_m} : the sets of concepts of t_1 and t_2
Q(c_1, c_2) : a function that determines the relatedness of two concepts c_1 and c_2 using their dynamics (time series)
Algorithm
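One way the pieces above can fit together is sketched below. The aggregation scheme (best-match per concept, weighted by concept importance, symmetrized over both directions) is an assumption for illustration, not necessarily the exact scheme in the paper; `Q` is any time-series relatedness function such as cross-correlation or DTW.

```python
def tsa_relatedness(concepts1, concepts2, Q):
    """Term-term relatedness from concept dynamics (hypothetical sketch).

    concepts1/concepts2: dicts mapping a concept name to a
    (weight, time_series) pair. Q scores two time series.
    """
    def directed(A, B):
        total = sum(w for w, _ in A.values())
        if total == 0 or not B:
            return 0.0
        score = 0.0
        for w, series in A.values():
            # Match each concept with its best-scoring counterpart.
            best = max(Q(series, other) for _, other in B.values())
            score += w * best
        return score / total

    # Symmetrize over both directions.
    return (directed(concepts1, concepts2) + directed(concepts2, concepts1)) / 2.0
```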
Cross-Correlation
Pearson's product-moment coefficient: a statistical method for measuring the similarity of two random variables
Example: computer and radio
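The Pearson coefficient on two aligned time series can be computed as follows (a standalone sketch; in practice a library routine such as `scipy.stats.pearsonr` would be used).

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series.
    Returns a value in [-1, 1]; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0
```

Two concepts whose frequencies rise and fall together over the archive, like computer and radio, score close to 1.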
Dynamic Time Warping
Measures similarity between two time series that may differ in time scale but are similar in shape
Used in speech recognition
Defines a local cost matrix between the two series
A temporal weighting function weights the contribution of different time points
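A textbook DTW distance, built from the local cost matrix described above, can be sketched as follows (absolute difference as the local cost; the paper's temporal weighting is omitted for brevity).

```python
def dtw_distance(x, y):
    """Dynamic Time Warping distance between two time series.
    Fills a cumulative cost matrix D, where D[i][j] is the minimal
    cost of aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # local cost matrix entry
            # Extend the cheapest of the three admissible moves.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because the warping path may stretch or compress the time axis, two series with the same shape at different speeds still get distance 0.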
3. Experiments
Experiments: Setup
New York Times archive (1863–2004)
Each day: on average 50 article abstracts
1.42 GB of text, distinct words
A new algorithm to automatically benchmark word relatedness tasks
Same vector representation for each method tested
Comparison to human judgments (WS-353 and Amazon MTurk)
TSA vs. ESA
TSA vs. Temporal Word Similarity
Word Frequency Effects
Size of Temporal Concept Vector
Conclusion
Two innovations:
o Temporal Semantic Analysis
o A new method for measuring semantic relatedness of terms
Many advantages: robust, tunable, can be used to study language evolution over time
Significant improvements in computing word relatedness