Published by Sabina Hampton. Modified over 9 years ago.
1
NEW EVENT DETECTION AND TOPIC TRACKING STEPS
2
PREPROCESSING
Removal of check-ins and other redundant data
Removal of URLs (possibly)
Stemming of words using TRMorph
– Get the root form of a word
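A minimal sketch of the cleaning steps on this slide, under stated assumptions: check-ins are assumed to be Foursquare-style posts starting with "I'm at", and `stem` is a hypothetical placeholder where a TRMorph-based stemmer would be plugged in (it returns the word unchanged here). The names `URL_RE`, `CHECKIN_RE`, `stem` and `preprocess` are illustrative, not from the slides.

```python
import re

# Matches http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: check-ins look like Foursquare-style "I'm at <place>" posts.
CHECKIN_RE = re.compile(r"^\s*i'm at\b", re.IGNORECASE)


def stem(word):
    """Hypothetical hook for a TRMorph-based stemmer; returns the word unchanged."""
    return word


def preprocess(tweet):
    """Return a list of cleaned tokens, or None if the tweet is a check-in."""
    if CHECKIN_RE.search(tweet):
        return None
    text = URL_RE.sub(" ", tweet).lower()
    # Keep a leading '#' or '@' so hashtags and mentions stay distinguishable.
    tokens = re.findall(r"[#@]?\w+", text)
    return [stem(t) for t in tokens]
```

For example, `preprocess("Deprem oldu http://t.co/xyz #deprem")` drops the link and keeps the hashtag as its own token.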
3
PREPROCESSING (2)
Expand tweets with co-occurrence statistics of words
– Ozer Ozdikis, ASONAM (language independent)
– Syntagmatic relations: two words appear together very frequently in texts
– Paradigmatic relations: words can replace each other
Use of WordNet (BalkaNet for Turkish; not so successful)
Latent Semantic Indexing might also be used for expanding the tweets
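The expansion idea can be illustrated with a small sketch. It covers only the syntagmatic case (words that appear together in the same tweet) and is not the method of the cited work; the function names and the `min_count` cutoff are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(token_lists):
    """Count how often each unordered word pair appears in the same tweet."""
    counts = Counter()
    for tokens in token_lists:
        # sorted(set(...)) gives each pair one canonical (a, b) order with a < b.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def expand(tokens, counts, min_count=2):
    """Append words that co-occur with the tweet's words at least min_count times."""
    present = set(tokens)
    extra = set()
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        if a in present and b not in present:
            extra.add(b)
        elif b in present and a not in present:
            extra.add(a)
    return list(tokens) + sorted(extra)
```

With a corpus in which "deprem" and "van" appear together twice, a tweet containing only "deprem" would be expanded with "van". Paradigmatic relations would need a second-order comparison of co-occurrence profiles, which this sketch leaves out.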
4
PREPROCESSING (3)
Normalize the tweets to produce unit-length vectors
Put the tweets and words in a vector space model using the words' tf-idf values
The tf-idf values of hashtags can be increased to get a better result (an idea)
*Times of tweets can be used in some way*
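A sketch of the vector space step, assuming tweets are already tokenized. The smoothed idf formula and the `hashtag_boost` parameter are assumptions: the slide does not say which tf-idf variant is used or how large the hashtag increase should be. Vectors are stored as sparse dictionaries mapping term to weight.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists, hashtag_boost=1.0):
    """Return one unit-length sparse tf-idf vector (dict) per tweet."""
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each term once per tweet

    vectors = []
    for tokens in token_lists:
        vec = {}
        for term, freq in Counter(tokens).items():
            # Smoothed idf (an assumption) so that no term gets weight zero.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            weight = freq * idf
            if term.startswith("#"):
                weight *= hashtag_boost  # optional extra weight for hashtags
            vec[term] = weight
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors
```

Because every vector has unit length, the cosine similarity of two tweets reduces to their dot product.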
5
ALGORITHM
Each cluster is a vector of the average values of the tweets belonging to it
Calculate the cosine similarity between a new tweet and all the clusters
If the similarity is greater than a threshold
– Add the tweet to the corresponding cluster
– Update the cluster
Open question: add the tweet to more than one cluster if the value is above the threshold for more than one cluster?
6
ALGORITHM (2)
If the cosine similarity is below the threshold for all the clusters, the tweet is a new event and starts a new cluster
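The assignment rule on the two algorithm slides can be sketched as a single-pass clustering step. This is an illustration under assumptions, not a definitive implementation: the threshold value 0.5 is arbitrary, a tweet is added only to its single most similar cluster (the slides leave multi-cluster assignment open), and `Cluster`, `cosine` and `assign` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class Cluster:
    def __init__(self, vec):
        self.total = dict(vec)  # running sum of member vectors
        self.size = 1

    def centroid(self):
        """Average of the member vectors."""
        return {t: w / self.size for t, w in self.total.items()}

    def add(self, vec):
        for t, w in vec.items():
            self.total[t] = self.total.get(t, 0.0) + w
        self.size += 1


def assign(vec, clusters, threshold=0.5):
    """Add vec to the most similar cluster or start a new one.

    Returns (cluster_index, is_new_event).
    """
    best_i, best_sim = -1, 0.0
    for i, cluster in enumerate(clusters):
        sim = cosine(vec, cluster.centroid())
        if sim > best_sim:
            best_i, best_sim = i, sim
    if best_i >= 0 and best_sim > threshold:
        clusters[best_i].add(vec)
        return best_i, False
    clusters.append(Cluster(vec))  # no cluster is similar enough: new event
    return len(clusters) - 1, True
```

Keeping a running sum and a member count makes the centroid update cheap, since each new tweet only touches its own nonzero terms.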
7
ALGORITHM (3)
We might extract queries (word groups that represent the topics) for clusters and use them to compute the cluster-tweet similarities [2]
Update the query with each update to the cluster
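One simple way to obtain such a query, sketched here as an assumption rather than as the method of [2], is to keep the k highest-weighted terms of the cluster centroid and recompute them whenever the cluster changes. The function name and the default `k` are hypothetical.

```python
def extract_query(centroid, k=3):
    """Return the k highest-weighted terms of a cluster centroid (a dict)."""
    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(centroid.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

Comparing a new tweet against these few terms instead of the full centroid trades some accuracy for a smaller, easier-to-inspect cluster representation.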
8
EVALUATION
Precision, recall, F-score
Intra-distance similarities [1]
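For reference, precision, recall and the balanced F-score (F1) can be computed for one detected cluster against a labeled event as below. Treating both as sets of tweet ids is an assumption about how the ground truth is stored.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall and F1 of a predicted set of tweet ids against a labeled set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For a cluster containing tweets {1, 2, 3, 4} and a labeled event {2, 3, 5}, two of four predictions are correct (precision 0.5) and two of three relevant tweets are found (recall 2/3).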
9
REFERENCES
[1] http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6425790
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.8942