Published by Anabel Martin. Modified over 9 years ago.
1
EVENT IDENTIFICATION IN SOCIAL MEDIA
Hila Becker, Luis Gravano (Columbia University)
Mor Naaman (Rutgers University)
2
Social Media Sites Host Many “Event” Documents
- Photo-sharing: Flickr
- Video-sharing: YouTube
- Social networking: Facebook
“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]
- Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade
- Smaller events without traditional news coverage: local food drive, street fair, …
[Figure: social media documents for the “All Points West” festival, Liberty State Park, New Jersey, 8/8/08]
3
Identifying Events and Associated Social Media Documents
- Applications: event search and browsing, local search, …
- General approach: group similar documents via clustering
  - Each cluster corresponds to one event and its associated social media documents
4
Event Identification: Challenges
- Uneven data quality
  - Missing, short, or uninformative text
  - ... but revealing structured context is available: tags, date/time, geo-coordinates
- Scalability
  - Dynamic data stream of event information
- Unknown number of events
  - Needed as input by many clustering algorithms
  - Difficult to estimate
5
Clustering Social Media Documents
- Social media document representation
- Social media document similarity
- Social media document clustering
  - Clustering task: definition
  - Ensemble algorithm: combining multiple clustering results
- Preliminary evaluation
6
Social Media Document Representation
- Title
- Description
- Tags
- Date/Time
- Location
- All-Text
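The representation above can be sketched as a small record type. This is a minimal sketch, assuming that All-Text is simply the concatenation of title, description, and tags (the slide does not define it); the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class SocialMediaDocument:
    """Hypothetical container for one photo/video/post and its context features."""
    title: str = ""
    description: str = ""
    tags: List[str] = field(default_factory=list)
    taken_at: Optional[datetime] = None              # Date/Time feature
    location: Optional[Tuple[float, float]] = None   # (latitude, longitude) in degrees

    def all_text(self) -> str:
        """Assumed All-Text feature: title, description, and tags joined by spaces.

        Empty fields are skipped so missing text does not add stray spaces.
        """
        parts = [self.title, self.description, " ".join(self.tags)]
        return " ".join(p for p in parts if p)
```

For example, a photo with only a title and tags still yields a usable All-Text string, which matters given the uneven text quality noted under the challenges.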
7
Social Media Document Similarity
- Representations compared: Title, Description, Tags, All-Text, Date/Time-Keywords, Location-Keywords, Date/Time-Proximity, Location-Proximity
- Text: tf-idf weights, cosine similarity
- Time: proximity in minutes
- Location: geo-coordinate proximity
[Figure: example documents labeled A and B plotted along a time axis]
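The three kinds of similarity listed above can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: tf-idf uses raw term counts times log(N / document frequency), and the linear scaling constants that map minutes and kilometres into [0, 1] (`MINUTES_PER_YEAR`, `DISTANCE_SCALE_KM`) are hypothetical choices; the slide only says "proximity in minutes" and "geo-coordinate proximity". Distance uses the standard haversine great-circle formula.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]

# Hypothetical scales: the slides do not say how raw differences map to [0, 1].
MINUTES_PER_YEAR = 365 * 24 * 60
DISTANCE_SCALE_KM = 100.0
EARTH_RADIUS_KM = 6371.0


def tfidf_vectors(docs: Sequence[List[str]]) -> List[Vector]:
    """tf-idf weights: raw term count times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def time_similarity(t1: datetime, t2: datetime) -> float:
    """Proximity in minutes, scaled linearly so that >= 1 year apart scores 0."""
    minutes = abs((t1 - t2).total_seconds()) / 60.0
    return max(0.0, 1.0 - minutes / MINUTES_PER_YEAR)


def haversine_km(p1: Tuple[float, float], p2: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(a) just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def location_similarity(p1: Tuple[float, float], p2: Tuple[float, float]) -> float:
    """Geo-coordinate proximity, scaled linearly so that >= DISTANCE_SCALE_KM scores 0."""
    return max(0.0, 1.0 - haversine_km(p1, p2) / DISTANCE_SCALE_KM)
```

Each function returns a value in [0, 1], so the per-feature scores are directly comparable, which is convenient when several clusterers or a learned combiner consume them.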
8
Social Media Document Clustering Framework
Social media documents → document feature representation → event clusters
9
Clustering: Ensemble Algorithm
- Clusterers C_title, C_tags, C_time, each based on one document representation
- Clusterer weights W_title, W_tags, W_time, learned in a training step
- Consensus function f(C, W): combines the ensemble similarities
- Output: the ensemble clustering solution
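A consensus function f(C, W) can be sketched as below, following the experimental setup's description ("weighted average of clusterers' binary predictions"). This sketch assumes the binary prediction is pairwise, namely whether a clusterer placed two documents in the same cluster; the function name and the input layout are hypothetical.

```python
from typing import Hashable, Mapping


def consensus_similarity(
    assignments: Mapping[str, Mapping[Hashable, int]],
    weights: Mapping[str, float],
    d1: Hashable,
    d2: Hashable,
) -> float:
    """Hypothetical f(C, W): weighted average of binary 'same cluster?' votes.

    assignments[name][doc] is the cluster id that clusterer `name` gave `doc`;
    weights[name] is that clusterer's weight (e.g. its validation-set NMI).
    Returns a score in [0, 1]; 0.0 if all weights are zero.
    """
    total = sum(weights[name] for name in assignments)
    if total == 0:
        return 0.0
    # Each clusterer votes 1 if it co-clusters d1 and d2, else 0.
    agree = sum(
        weights[name]
        for name, labels in assignments.items()
        if labels[d1] == labels[d2]
    )
    return agree / total
```

With weights derived from NMI, clusterers that perform better on the validation set (for example, Tags) have more say in the combined similarity than weaker ones (for example, Description).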
10
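Clustering: Measuring Quality
- Homogeneous clusters: each cluster contains documents of a single event
- Complete clusters: all documents of an event fall into a single cluster
- Metric: Normalized Mutual Information (NMI), the information shared between the clustering solution and the “ground truth”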
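NMI can be computed from the cluster labels and the ground-truth event labels, as sketched below. This sketch normalizes mutual information by the average of the two entropies, which is one common variant; the slides do not specify which normalization was used.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """I(pred; truth) computed from joint and marginal label counts."""
    n = len(pred)
    joint = Counter(zip(pred, truth))
    p_count, t_count = Counter(pred), Counter(truth)
    return sum(
        (c / n) * math.log(n * c / (p_count[p] * t_count[t]))
        for (p, t), c in joint.items()
    )


def nmi(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """NMI normalized by the arithmetic mean of the two entropies."""
    h_p, h_t = entropy(pred), entropy(truth)
    if h_p + h_t == 0:
        return 1.0  # both labelings put everything in one cluster: identical
    return mutual_information(pred, truth) / ((h_p + h_t) / 2)
```

Cluster ids need not match event ids: a clustering that reproduces the true grouping under different labels still scores 1, while one that is independent of the events scores 0.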
11
Experimental Setup
- Data: >270K Flickr photos
  - Event labels from Yahoo!’s “Upcoming” event database
  - Split into three parts for training, validation, and testing
- Clusterers: single-pass algorithm with centroid similarity
- Weighting scheme: Normalized Mutual Information (NMI) scores on the validation set
- Consensus function: weighted average of the clusterers’ binary predictions
- Final prediction step: single-pass clustering algorithm
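A single-pass clusterer with centroid similarity can be sketched as below. Each document is compared with every existing cluster centroid in arrival order. It joins the most similar cluster when that similarity exceeds a threshold, and otherwise starts a new cluster, so the number of events need not be known in advance. The `threshold` value and the sparse-vector input are assumptions; the slides do not give them.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs: List[Vector], threshold: float) -> List[int]:
    """Return a cluster id per document, processing documents once, in order."""
    sums: List[Vector] = []  # running sum of member vectors per cluster
    labels: List[int] = []
    for doc in docs:
        best, best_sim = -1, threshold
        for k, total in enumerate(sums):
            # Cosine is scale-invariant, so the summed vector gives the same
            # similarity as the mean (centroid) vector.
            sim = cosine(doc, total)
            if sim > best_sim:
                best, best_sim = k, sim
        if best == -1:
            sums.append(dict(doc))  # no centroid is close enough: new cluster
            best = len(sums) - 1
        else:
            for t, w in doc.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
        labels.append(best)
    return labels
```

The trade-off is that results depend on document order and on the threshold. In exchange, each document is touched once, which suits the dynamic data stream noted under the challenges.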
12
Preliminary Evaluation Results
- Individual clusterer performance
  - Highest NMI: Tags, All-Text
  - Lowest NMI: Description, Title
- Ensemble performance, compared against all individual clusterers
  - Highest overall performance in terms of NMI
  - More complete clusters: each event is spread over fewer clusters
Details in paper
13
Future Work: Alternative Choices
- Ensemble approach
  - Weight assignment
  - Choice of clusterers
- Document similarity metric
  - Train a classifier to predict document similarity
  - Features correspond to similarity scores: All-Text, title, tags, time, location, etc.
  - Numeric values in [0,1]
  - State-of-the-art classifiers: SVM, Logistic Regression, …
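The proposed classifier-based similarity metric could look like the sketch below. A document pair is described by its per-feature similarity scores in [0,1], and a classifier predicts whether the two documents belong to the same event. This sketch picks plain logistic regression trained by batch gradient descent, one of the listed options. The function names, learning rate, epoch count, and the two-feature toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 1.0, epochs: int = 1000
) -> Tuple[List[float], float]:
    """Batch gradient descent on the logistic (cross-entropy) loss.

    X[i] holds the similarity scores of one document pair; y[i] is 1 if the
    pair belongs to the same event, else 0.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def same_event_probability(w: List[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that a pair with similarity scores x is one event."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

A usage example with toy pairs described by hypothetical (tags, time) similarity scores:

```python
w, b = train_logistic_regression(
    [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]], [1, 1, 0, 0]
)
print(same_event_probability(w, b, [0.85, 0.85]) > 0.5)
```

The learned weights play the same role as the ensemble weights W, but they are fitted jointly instead of being set from each clusterer's NMI.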
14
- Final clustering step
  - Apply graph partitioning algorithms
  - Requires estimating the number of clusters
- Evaluation metrics: beyond NMI
- Datasets: Flickr, LastFM, YouTube
- Exploit social network connections
15
Conclusions
- Identified events and their corresponding social media documents
- Proposed a clustering solution
  - Leveraged different representations of social media documents
  - Employed various social media similarity metrics
  - Developed a weighted ensemble clustering approach
- Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs