NEW EVENT DETECTION AND TOPIC TRACKING STEPS
PREPROCESSING
- Remove check-ins and other redundant data
- Possibly remove URLs
- Stem words with TRMorph
  - Get the root form of each word
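A minimal sketch of these preprocessing steps, under stated assumptions. The check-in pattern (tweets starting with "I'm at") and the `stem` function are hypothetical: `stem` only marks where a TRMorph lookup would go and returns the word unchanged. It does not call TRMorph.

```python
import re

# Matches http(s) links and bare www links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: check-ins look like Foursquare-style tweets starting with "I'm at".
CHECKIN_RE = re.compile(r"^\s*I'm at\b", re.IGNORECASE)


def stem(word):
    """Hypothetical placeholder for a TRMorph root lookup; returns the word as-is."""
    return word


def preprocess(tweets):
    """Drop check-in tweets, strip URLs, tokenize, and stem each token."""
    cleaned = []
    for text in tweets:
        if CHECKIN_RE.search(text):
            continue  # redundant data: skip the whole tweet
        text = URL_RE.sub(" ", text)
        # Keep a leading '#' so hashtags stay distinguishable from plain words.
        # Note: str.lower() does not apply Turkish casing rules (I -> ı).
        tokens = re.findall(r"#?\w+", text.lower())
        cleaned.append([stem(t) for t in tokens])
    return cleaned
```

For example, `preprocess(["Deprem oldu http://t.co/abc #deprem"])` keeps the three tokens and drops the link.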
PREPROCESSING(2)
- Expand tweets with co-occurrence statistics of words (OzerOzdikisAsonam; language independent)
  - Syntagmatic relations: two words appear together very frequently in texts
  - Paradigmatic relations: words can replace each other
- Use WordNet (BalkaNet for Turkish, not so successful)
- Latent Semantic Indexing might also be used to expand the tweets
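The expansion idea can be illustrated with a small sketch that covers only the syntagmatic (first-order co-occurrence) case. It is not the method of the cited work. The function names and the `min_count` and `top_k` parameters are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(docs):
    """Count how many documents each unordered pair of distinct words shares."""
    counts = defaultdict(Counter)
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def expand(tokens, counts, min_count=2, top_k=1):
    """Append each word's most frequent co-occurring words to the tweet."""
    extra = []
    for t in tokens:
        for other, n in counts.get(t, Counter()).most_common(top_k):
            if n >= min_count and other not in tokens and other not in extra:
                extra.append(other)
    return list(tokens) + extra
```

With a corpus where "deprem" and "van" share two tweets, a tweet containing only "deprem" would be expanded with "van". Pairs seen fewer than `min_count` times are ignored.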
PREPROCESSING(3)
- Normalize the tweets to produce unit-length vectors
- Place the tweets and words in a vector space model, weighting words by their tf-idf values
- Idea: increase the weights of hashtagged terms to get a better result
- *Tweet timestamps could also be used in some way*
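As a rough sketch of the vector space step, the function below builds sparse tf-idf vectors and scales them to unit length. The smoothed idf formula and the `hashtag_boost` parameter are assumptions, since the slides fix neither the exact weighting nor how much hashtags should be increased.

```python
import math
from collections import Counter


def tfidf_vectors(docs, hashtag_boost=1.0):
    """Return one unit-length {term: weight} dict per tokenized tweet."""
    n = len(docs)
    # Document frequency: number of tweets containing each term.
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        vec = {}
        for term, count in Counter(tokens).items():
            # Assumption: smoothed idf, so no term gets a zero weight.
            idf = math.log((1 + n) / (1 + df[term])) + 1.0
            weight = count * idf
            if term.startswith("#"):
                weight *= hashtag_boost  # the hashtag idea from the slide
            vec[term] = weight
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors
```

Because every vector has length 1, the cosine similarity between two tweets reduces to their dot product.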
ALGORITHM
- Each cluster is a vector holding the average values of the tweets that belong to it
- Calculate the cosine similarity between a new tweet and all the clusters
- If the similarity is greater than a threshold
  - Add the tweet to the corresponding cluster
  - Update the cluster
- Open question: add the tweet to more than one cluster if the similarity is above the threshold for several clusters?
ALGORITHM(2)
- If the cosine similarity is below the threshold for all the clusters, the tweet is a new event and starts a new cluster
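The clustering steps can be sketched as a single-pass procedure. This version assigns each tweet to at most one cluster (the most similar one), which is only one possible answer to the open question about multiple clusters. The `Cluster` class, the `assign` function, and the default threshold of 0.5 are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class Cluster:
    """A cluster represented by the average of its member tweet vectors."""

    def __init__(self, vector):
        self.size = 1
        self.centroid = dict(vector)

    def add(self, vector):
        # Running mean: new_mean = old_mean + (x - old_mean) / new_size
        self.size += 1
        for t in set(self.centroid) | set(vector):
            old = self.centroid.get(t, 0.0)
            self.centroid[t] = old + (vector.get(t, 0.0) - old) / self.size


def assign(vector, clusters, threshold=0.5):
    """Add the tweet to its best cluster, or start a new one (a new event).

    Returns (cluster_index, is_new_event).
    """
    best, best_sim = None, -1.0
    for i, cluster in enumerate(clusters):
        sim = cosine(vector, cluster.centroid)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is not None and best_sim > threshold:
        clusters[best].add(vector)
        return best, False
    clusters.append(Cluster(vector))
    return len(clusters) - 1, True
```

Starting from `clusters = []`, calling `assign` once per incoming tweet vector grows the cluster list as new events appear. A similarity exactly equal to the threshold is treated here as a new event, a choice the slides leave unspecified.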
ALGORITHM(3)
- We might extract queries (word groups that represent the topics) for the clusters and use them to compute cluster-tweet similarities [2]
- Update the query with each update to the cluster
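One simple way to picture a cluster query is the set of highest-weighted terms in the cluster's average vector. This is a stand-in for illustration and not necessarily the extraction method of [2]. The function names and the parameter `k` are assumptions.

```python
def extract_query(centroid, k=3):
    """Return the k highest-weighted terms of a cluster's {term: weight} vector."""
    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(centroid.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def query_overlap(query, tokens):
    """Fraction of the query terms that occur in a tokenized tweet."""
    if not query:
        return 0.0
    present = set(tokens)
    return sum(1 for term in query if term in present) / len(query)
```

Re-running `extract_query` on the centroid after each cluster update keeps the query in step with the cluster.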
EVALUATION
- Precision, recall, F-score
- Intra-distance similarities [1]
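For reference, the first three measures can be computed as follows. This is a sketch that assumes the system output and the ground truth are both given as sets of item identifiers (for example, the tweet ids assigned to an event). It uses the balanced F-score (F1).

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall and F1 of a predicted set against a ground-truth set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With four predicted items of which two are among three relevant ones, precision is 1/2, recall is 2/3, and F1 is 4/7.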
REFERENCES
[1] p?arnumber=
[2] y?doi=