NEW EVENT DETECTION AND TOPIC TRACKING STEPS

PREPROCESSING
Remove check-ins and other redundant data
Possibly remove URLs
Stem words using TRMorph
– Get the root form of each word
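The cleaning steps above might look roughly like the following sketch. The check-in pattern (`I'm at ...`) and the choice to treat exact duplicates as "redundant data" are assumptions made for illustration; the slides do not define either. Stemming is left out because the interface to TRMorph is not described here.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Hypothetical marker for auto-generated check-in tweets (assumption).
CHECKIN_RE = re.compile(r"^I'm at ", re.IGNORECASE)


def clean_tweets(tweets):
    """Drop check-in tweets and exact duplicates, and strip URLs."""
    seen = set()
    cleaned = []
    for text in tweets:
        if CHECKIN_RE.search(text):
            continue  # check-in tweet: skip entirely
        text = URL_RE.sub("", text)       # remove URLs
        text = " ".join(text.split())     # collapse leftover whitespace
        key = text.lower()
        if not text or key in seen:
            continue  # empty after cleaning, or a duplicate
        seen.add(key)
        cleaned.append(text)
    return cleaned
```

A stemming step would run on the cleaned text before vectorization.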

PREPROCESSING (2)
Expand tweets with co-occurrence statistics of words
– Ozer Ozdikis, ASONAM (language independent)
Syntagmatic relations -> two words appear together very frequently in texts
Paradigmatic relations -> words can replace each other
Use of WordNet (BalkaNet for Turkish, not so successful)
Latent Semantic Indexing might also be used to expand the tweets
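As a rough illustration of expansion through syntagmatic relations, the sketch below counts how often two words occur in the same tweet and appends frequent partners to a tweet. The raw count threshold `min_count` is an assumption chosen for simplicity; it is not the statistic used in the cited work.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tokenized_tweets):
    """Count how many tweets contain each unordered word pair."""
    counts = Counter()
    for tokens in tokenized_tweets:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def expand(tokens, counts, min_count=2):
    """Append words that co-occur with the tweet's words at least min_count times."""
    present = set(tokens)
    extra = set()
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        if a in present and b not in present:
            extra.add(b)
        elif b in present and a not in present:
            extra.add(a)
    return list(tokens) + sorted(extra)
```

With this scheme a tweet that mentions only one word of a frequent pair also gains the other word, so related tweets overlap more in the vector space.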

PREPROCESSING (3)
Put the tweets and words in a vector space model, using the tf-idf values of the words
Normalize the tweets to produce unit-length vectors
Weights of hashtag terms could be increased to get a better result (an idea)
Tweet timestamps might also be used in some way
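A minimal sketch of this vectorization step, using sparse dictionaries as vectors. The smoothed idf formula and the `hashtag_boost` parameter are assumptions; the slides only say that tf-idf values are used and that hashtag weights could be raised.

```python
import math
from collections import Counter


def tfidf_unit_vectors(tokenized_tweets, hashtag_boost=1.0):
    """Return one unit-length tf-idf vector (term -> weight) per tweet."""
    n_docs = len(tokenized_tweets)
    df = Counter()
    for tokens in tokenized_tweets:
        df.update(set(tokens))  # document frequency of each term

    vectors = []
    for tokens in tokenized_tweets:
        vec = {}
        for term, count in Counter(tokens).items():
            # Smoothed idf (assumption): never zero, never divides by zero.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            weight = count * idf
            if term.startswith("#"):
                weight *= hashtag_boost  # optional extra weight for hashtags
            vec[term] = weight
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors
```

Because every vector has length 1, the cosine similarity between two tweets reduces to their dot product.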

ALGORITHM
Each cluster is a vector holding the average values of the tweets that belong to it
Calculate the cosine similarity between a new tweet and all the clusters
If the similarity is greater than a threshold
– Add the tweet to the corresponding cluster
– Update the cluster
Open question: should the tweet be added to more than one cluster if the value is above the threshold for more than one cluster?

ALGORITHM (2)
If the cosine similarity is below the threshold for all the clusters, the tweet is a new event and starts a new cluster
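The assignment rule described in the two algorithm slides above can be sketched as a single-pass clustering loop. The threshold value 0.5 and the choice to add a tweet only to its single best cluster are assumptions; the slides leave both open.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class Cluster:
    def __init__(self, vector):
        self.total = dict(vector)  # running sum of member vectors
        self.size = 1

    def centroid(self):
        """Average of the member vectors."""
        return {t: w / self.size for t, w in self.total.items()}

    def add(self, vector):
        for t, w in vector.items():
            self.total[t] = self.total.get(t, 0.0) + w
        self.size += 1


def assign(vector, clusters, threshold=0.5):
    """Add the tweet to its most similar cluster, or open a new one.

    Returns (cluster_index, is_new_event).
    """
    best, best_sim = None, 0.0
    for i, cluster in enumerate(clusters):
        sim = cosine(vector, cluster.centroid())
        if sim > best_sim:
            best, best_sim = i, sim
    if best is not None and best_sim > threshold:
        clusters[best].add(vector)
        return best, False
    clusters.append(Cluster(vector))  # below threshold everywhere: new event
    return len(clusters) - 1, True
```

Storing the running sum instead of the average keeps each cluster update proportional to the number of terms in the new tweet.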

ALGORITHM (3)
We might extract queries (word groups that represent the topics) for the clusters and use them to compute cluster-tweet similarities [2]
Update the query with each update to the cluster
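One simple way to obtain such a query is to keep the highest-weighted terms of the cluster centroid. This top-k rule and the parameter `k` are assumptions for illustration and are not necessarily the method of [2].

```python
def cluster_query(centroid, k=3):
    """Return the k highest-weighted terms of a centroid (term -> weight).

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(centroid.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

Calling this again after every cluster update keeps the query in step with the cluster.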

EVALUATION
Precision-Recall, F score
Intra-distance similarities [1]
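For the first evaluation item, precision, recall and the F score can be computed from the set of detected items and the set of truly relevant items. The sketch below uses the balanced F1 variant, which is an assumption since the slide does not say which F score is meant.

```python
def precision_recall_f1(detected, relevant):
    """Compute precision, recall and F1 for two collections of item ids."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```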

REFERENCES
[1] http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6425790
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.8942
