
EVENT IDENTIFICATION IN SOCIAL MEDIA Hila Becker, Luis Gravano Mor Naaman Columbia University Rutgers University.


1 EVENT IDENTIFICATION IN SOCIAL MEDIA
Hila Becker, Luis Gravano (Columbia University)
Mor Naaman (Rutgers University)

2 Social Media Sites Host Many “Event” Documents
- Photo-sharing: Flickr
- Video-sharing: YouTube
- Social networking: Facebook
“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]
- Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade
- Smaller events, without traditional news coverage: local food drive, street fair
- …
Social media documents for “All Points West” festival, Liberty State Park, New Jersey, 8/8/08

3 Identifying Events and Associated Social Media Documents
- Applications
  - Event search and browsing
  - Local search
  - …
- General approach: group similar documents via clustering
  - Each cluster corresponds to one event and its associated social media documents

4 Event Identification: Challenges
- Uneven data quality
  - Missing, short, uninformative text
  - … but revealing structured context is available: tags, date/time, geo-coordinates
- Scalability
  - Dynamic data stream of event information
- Unknown number of events
  - The number of clusters is a required input for many clustering algorithms
  - Difficult to estimate

5 Clustering Social Media Documents
- Social media document representation
- Social media document similarity
- Social media document clustering
  - Clustering task: definition
  - Ensemble algorithm: combining multiple clustering results
- Preliminary evaluation

6 Social Media Document Representation
Features: Title, Description, Tags, Date/Time, Location, All-Text

7 Social Media Document Similarity
- Text: tf-idf weights, cosine similarity
- Time: proximity in minutes
- Location: geo-coordinate proximity
[Figure: documents A and B compared feature by feature: Title, Description, Tags, Date/Time (keywords and proximity), Location (keywords and proximity), All-Text]
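The three similarity types on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The slide states only that text uses tf-idf weights with cosine similarity, time uses proximity in minutes, and location uses geo-coordinate proximity; the function names, the one-year normalization window for time, and the haversine-based location score are assumptions made for this sketch.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.
    idf = log(N / df), so a term that occurs in every document gets weight 0."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

MINUTES_PER_YEAR = 365 * 24 * 60  # assumed normalization window

def time_similarity(t1_min, t2_min, window=MINUTES_PER_YEAR):
    """1.0 for identical timestamps (in minutes), falling linearly to 0.0
    once the two timestamps are `window` minutes or more apart."""
    return max(0.0, 1.0 - abs(t1_min - t2_min) / window)

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def location_similarity(p, q, max_km=math.pi * EARTH_RADIUS_KM):
    """1.0 for identical coordinates, 0.0 for points max_km or more apart.
    The default max_km is half the Earth's circumference (an assumption)."""
    return max(0.0, 1.0 - haversine_km(*p, *q) / max_km)
```

Each function returns a score in [0, 1], so the scores can be compared or combined across features.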

8 Social Media Document Clustering Framework
Social media documents → Document feature representation → Event clusters

9 Clustering: Ensemble Algorithm
- Clusterers: C_title, C_tags, C_time
- Weights: W_title, W_tags, W_time (learned in a training step)
- Consensus function f(C,W): combine ensemble similarities
- Output: ensemble clustering solution
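One way to read the consensus function f(C,W) is as a weighted vote over the individual clusterers. Slide 11 describes it as a weighted average of the clusterers' binary predictions. The sketch below assumes the binary prediction is "these two documents fall in the same cluster"; that reading, the function name, and the dict-based data layout are assumptions for illustration.

```python
def consensus_score(doc_a, doc_b, clusterings, weights):
    """Weighted average of binary same-cluster predictions.

    clusterings: {clusterer name: {document id: cluster id}}
    weights:     {clusterer name: non-negative weight}
    Returns a value in [0, 1]; 1.0 means every clusterer puts the two
    documents in the same cluster.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    agree = sum(
        w for name, w in weights.items()
        if clusterings[name][doc_a] == clusterings[name][doc_b]
    )
    return agree / total
```

A clusterer with a higher weight has more influence on whether two documents end up grouped together in the ensemble solution.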

10 Clustering: Measuring Quality
- Homogeneous clusters
- Complete clusters
- Metric: Normalized Mutual Information (NMI)
  - Shared information between clustering solution and “ground truth”
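As a rough illustration of the metric, NMI can be computed from two label lists: the cluster assigned to each document and its ground-truth event. The slide does not say which normalization is used; the sketch below assumes the common variant that divides mutual information by the average of the two entropies.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(pred, truth):
    """Mutual information between two labelings of the same documents."""
    n = len(pred)
    joint = Counter(zip(pred, truth))
    pred_counts = Counter(pred)
    truth_counts = Counter(truth)
    return sum(
        (c / n) * math.log(c * n / (pred_counts[p] * truth_counts[t]))
        for (p, t), c in joint.items()
    )

def nmi(pred, truth):
    """NMI = 2 * I(pred; truth) / (H(pred) + H(truth)), in [0, 1].
    If both labelings are a single cluster, they agree trivially: return 1.0."""
    h = entropy(pred) + entropy(truth)
    if h == 0:
        return 1.0
    return 2 * mutual_information(pred, truth) / h
```

NMI does not depend on how clusters are named: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect match, while a clustering that is statistically independent of the ground truth scores 0.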

11 Experimental Setup
- Data: >270K Flickr photos
  - Event labels from Yahoo!’s “Upcoming” event database
  - Split into 3 parts for training/validation/testing
- Clusterers: single-pass algorithm with centroid similarity
- Weighting scheme: Normalized Mutual Information (NMI) scores on validation set
- Consensus function: weighted average of clusterers’ binary predictions
- Final prediction step: single-pass clustering algorithm
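A single-pass clusterer with centroid similarity can be sketched as follows. Each document is compared against the centroid of every existing cluster; it joins the most similar cluster if that similarity reaches a threshold and otherwise starts a new cluster. The threshold value, the dense-vector representation, and the use of cosine similarity here are assumptions for illustration, not details taken from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def single_pass_cluster(vectors, threshold):
    """Cluster vectors in one pass; returns a list of clusters,
    each a list of indices into `vectors`."""
    centroids = []  # running mean vector of each cluster
    members = []    # document indices of each cluster
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            # No centroid is similar enough: start a new cluster.
            centroids.append(list(vec))
            members.append([idx])
        else:
            # Join the best cluster and update its centroid incrementally.
            members[best].append(idx)
            k = len(members[best])
            centroids[best] = [c + (x - c) / k for c, x in zip(centroids[best], vec)]
    return members
```

This style of algorithm processes documents as they arrive and does not need the number of clusters in advance, which matches the scalability and unknown-number-of-events challenges listed on slide 4.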

12 Preliminary Evaluation Results
- Individual clusterer performance
  - Highest NMI: Tags, All-Text
  - Lowest NMI: Description, Title
- Ensemble performance, compared against all individual clusterers
  - Highest overall performance in terms of NMI
  - More homogeneous clusters, with each event spread over fewer clusters
Details in paper

13 Future Work: Alternative Choices
- Document similarity metric
  - Ensemble approach
    - Weight assignment
    - Choice of clusterers
  - Train a classifier to predict document similarity
    - Features correspond to similarity scores: All-text, title, tags, time, location, etc.
    - Numeric values in [0,1]
    - State-of-the-art classifiers: SVM, Logistic Regression, …
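The classifier idea could look roughly like the sketch below: each document pair becomes a feature vector of per-feature similarity scores in [0,1], and a logistic regression model is trained to predict whether the pair belongs to the same event. The tiny hand-written trainer, the hyperparameters, and the function names are assumptions for illustration; in practice an off-the-shelf SVM or logistic regression implementation would be the natural choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent.

    features: list of similarity-score vectors, one per document pair
    labels:   1 if the pair belongs to the same event, else 0
    Returns (weights, bias).
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_same_event(x, w, b):
    """Probability that a pair with similarity scores x shares an event."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

Under this design the learned weights play the role that the per-clusterer weights play in the ensemble approach, but they are fitted jointly rather than assigned from validation-set NMI scores.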

14
- Final clustering step
  - Apply graph partitioning algorithms
    - Requires estimating the number of clusters
- Evaluation metrics: beyond NMI
- Datasets
  - Flickr, LastFM, YouTube
- Exploit social network connections

15 Conclusions
- Identified events and their corresponding social media documents
- Proposed a clustering solution
  - Leveraged different representations of social media documents
  - Employed various social media similarity metrics
  - Developed a weighted ensemble clustering approach
- Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs

