Published by Jaden Farrant. Modified over 10 years ago.
1
A Word at a Time: Computing Word Relatedness using Temporal Semantic Analysis
Kira Radinsky, Eugene Agichtein, Evgeniy Gabrilovich, Shaul Markovitch
2
Introduction
Studying the patterns of word occurrence over time can reveal a rich source of information
Example: peace and war
Corpus: New York Times over 130 years
Each word is represented by the time series of its occurrence in NYT articles
Hypothesis: a correlation between two words' time series indicates a semantic relation
Proposed method: Temporal Semantic Analysis (TSA)
3
Introduction
5
1. TSA
6
Temporal Semantic Analysis
3 main steps:
1. Represent words as concept vectors
2. Extract temporal dynamics for each concept
3. Extend the static representation with temporal dynamics
7
1. Words as concept vectors
8
2. Temporal dynamics
c: a concept represented by a sequence of words $w_{c_1}, \ldots, w_{c_k}$
d: a document
ε: proximity relaxation parameter (ε = 20 in the experiments)
c appears in d if its words appear in d with a distance of at most ε words between each pair $w_{c_i}, w_{c_j}$
Example: Great Fire of London
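The appearance test on this slide can be illustrated with a small sketch. It assumes that documents are already tokenized into lowercase words, that "distance" means the difference between token positions, and that stop words such as "of" are dropped from the concept. The function name `concept_appears` is hypothetical, and the brute-force search over occurrences is chosen for clarity, not efficiency.

```python
from itertools import product


def concept_appears(concept_words, doc_tokens, eps=20):
    """Return True if every concept word occurs in the document and some choice
    of occurrences keeps every pair within eps token positions of each other.

    Assumption: distance is measured as the difference of token indices.
    """
    positions = []
    for word in concept_words:
        occurrences = [i for i, tok in enumerate(doc_tokens) if tok == word]
        if not occurrences:
            return False  # a concept word is missing entirely
        positions.append(occurrences)

    # All pairwise distances are <= eps exactly when max - min <= eps.
    return any(max(combo) - min(combo) <= eps for combo in product(*positions))


# Toy document (hypothetical text, not taken from the NYT corpus).
doc = "the great fire swept through the old city of london".split()
concept = ["great", "fire", "london"]
print(concept_appears(concept, doc))         # words lie within 20 positions
print(concept_appears(concept, doc, eps=5))  # "london" is too far from "great"
```

A larger ε tolerates more intervening words, at the risk of matching documents where the words co-occur by chance.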
9
2. Temporal dynamics
$t_1, \ldots, t_n$: a sequence of consecutive discrete time points (days)
$H = D_1, \ldots, D_n$: the history, represented by a set of document collections, where $D_i$ is the collection of documents associated with time $t_i$
The dynamics of a concept c is the time series of its frequency of appearance in H
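As a rough illustration of this definition, the sketch below builds the time series of a concept from a history of daily document collections. It assumes that "frequency of appearance" means the fraction of documents in each daily collection in which the concept appears; the slide does not spell out the normalization. The names `concept_dynamics` and `appears` are hypothetical.

```python
def concept_dynamics(history, appears):
    """Compute one frequency value per time point.

    history: list of document collections D_1..D_n (one list of documents per day)
    appears: predicate taking a document and returning True if the concept appears

    Assumption: frequency = (documents containing the concept) / (documents that day).
    Days with no documents get frequency 0.0.
    """
    series = []
    for docs in history:
        if not docs:
            series.append(0.0)
            continue
        hits = sum(1 for d in docs if appears(d))
        series.append(hits / len(docs))
    return series


# Toy history of three days (hypothetical documents).
history = [
    ["war declared", "peace talks"],
    ["peace treaty signed", "peace at last"],
    [],
]
print(concept_dynamics(history, lambda d: "peace" in d.split()))  # [0.5, 1.0, 0.0]
```

Any appearance test can be plugged in as `appears`, including a proximity-based one with the ε parameter.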
10
3. Extend static representation
11
2. Using TSA for computing Semantic Relatedness
12
Using TSA for computing Semantic Relatedness
Compare words by a weighted distance between the time series of their concept vectors
Combine this distance with the static semantic similarity measure
13
Algorithm
$t_1, t_2$: words
$C(t_1) = \{c_1, \ldots, c_n\}$ and $C(t_2) = \{c_1, \ldots, c_m\}$: the sets of concepts of $t_1$ and $t_2$
$Q(c_1, c_2)$: a function that determines the relatedness between two concepts $c_1$ and $c_2$ using their dynamics (time series)
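The transcript does not preserve the algorithm itself, so the following is only one plausible way to combine the pieces defined on this slide. It assumes a greedy matching: each concept of the first word is paired with its best remaining match in the second word according to Q, and the summed scores are normalized by the size of the larger concept set. The greedy strategy, the normalization, and the name `word_relatedness` are all assumptions made for illustration.

```python
def word_relatedness(concepts1, concepts2, Q):
    """Greedy sketch of relatedness between two words given their concept sets.

    concepts1, concepts2: concepts of the two words, i.e. C(t1) and C(t2)
    Q: function (c1, c2) -> float scoring two concepts by their dynamics

    Assumption: greedy one-to-one matching, normalized by the larger set size.
    """
    remaining = list(concepts2)
    total = 0.0
    for c1 in concepts1:
        if not remaining:
            break
        # Pick the still-unmatched concept of the second word that scores highest.
        best = max(remaining, key=lambda c2: Q(c1, c2))
        total += Q(c1, best)
        remaining.remove(best)
    denom = max(len(concepts1), len(concepts2))
    return total / denom if denom else 0.0


# Toy Q: identical concepts are fully related, all others unrelated.
toy_q = lambda a, b: 1.0 if a == b else 0.0
print(word_relatedness(["a", "b"], ["b", "a", "c"], toy_q))  # 2 matches out of 3
```

In practice Q would compare the two concepts' time series, for example with cross correlation or dynamic time warping.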
14
Algorithm
15
Cross Correlation
Pearson's product-moment coefficient: a statistical method for measuring the similarity of two random variables
Example: computer and radio
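A minimal sketch of Pearson's coefficient applied to two equal-length time series follows. The second function, which scans a small range of time shifts and keeps the best correlation, is an assumption about how a cross-correlation measure might tolerate delays between series; the slide names only the coefficient. The names `pearson` and `max_lagged_correlation` and the `max_lag` parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's product-moment coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here: a constant series has no correlation
    return cov / math.sqrt(var_x * var_y)


def max_lagged_correlation(x, y, max_lag=2):
    """Best Pearson correlation over shifts in [-max_lag, max_lag] (assumption)."""
    best = -1.0
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[lag:], y[:n - lag]
        else:
            xs, ys = x[:n + lag], y[-lag:]
        if len(xs) >= 2:
            best = max(best, pearson(xs, ys))
    return best


spike_early = [0, 1, 0, 0, 0]
spike_late = [0, 0, 1, 0, 0]
print(pearson(spike_early, spike_late))                    # negative at zero lag
print(max_lagged_correlation(spike_early, spike_late, 1))  # aligned after a one-step shift
```

Two series with the same shape but offset peaks score poorly at zero lag and well once shifted, which is the motivation for allowing a lag.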
16
Dynamic Time Warping
Measures the similarity between 2 time series that may differ in time scale but are similar in shape
Used in speech recognition
It defines a local cost matrix
Temporal Weighting Function
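The classic dynamic-programming form of DTW can be sketched as below. It assumes absolute difference as the local cost and leaves out the temporal weighting function, whose definition is not preserved in this transcript. The name `dtw_distance` and the `cost` parameter are hypothetical.

```python
def dtw_distance(x, y, cost=lambda a, b: abs(a - b)):
    """Standard dynamic time warping distance between two sequences.

    Assumptions: local cost is |a - b| by default; no temporal weighting and
    no warping-window constraint are applied.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(x[i - 1], y[j - 1])
            D[i][j] = local + min(D[i - 1][j],      # x advances alone
                                  D[i][j - 1],      # y advances alone
                                  D[i - 1][j - 1])  # both advance
    return D[n][m]


# Same shape, different time scale: the stretched series aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

A lower distance means more similar series, so it would need to be converted into a similarity score before being used as a relatedness function between concepts.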
17
3. Experiments
18
Experiments: Setup
New York Times archive (1863 – 2004)
Each day: an average of 50 article abstracts
1.42 GB of text
565,540 distinct words
A new algorithm to automatically benchmark word relatedness tasks
Same vector representation for each method tested
Comparison to human judgment (WS-353 and Amazon MTurk)
19
TSA vs. ESA
20
TSA vs. Temporal Word Similarity
21
Word Frequency Effects
22
Size of Temporal Concept Vector
23
Conclusion
Two innovations:
o Temporal Semantic Analysis
o A new method for measuring semantic relatedness of terms
Many advantages (robust, tunable, can be used to study language evolution over time)
Significant improvements in computing word relatedness