Published by Agnes Thomas. Modified over 9 years ago.
1
Topic Clustering of Streaming Tweets Based on a GPU-Accelerated Self-Organizing Map
Group 15: Chen Zhutian, Huang Hengguang
4
An unsupervised clustering algorithm. Organizes large document collections according to textual similarity. Creates a visual result for searching and exploring large document collections.
5
WEBSOM system: based on the Self-Organizing Map. Generates a topic map for documents. Lets you explore a large document collection the way you explore Google Maps.
6
What does WEBSOM look like?
7
Gap
WEBSOM: long documents, static, long training time.
Twitter: short text, dynamic, streaming data.
How to adapt SOM to streaming Twitter data?
8
What our system looks like
11
Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map
Step 1: Detect Event
12
Focus only on unusual events. How can abnormal events be identified on Twitter?
13
Detect Event
1. Similar to TCP’s congestion control mechanism.
2. Count the number of tweets in a moving window.
3. Weighted moving average and variance.
4. Threshold to determine whether it’s an event.
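The four points above can be sketched as a small burst detector. This is only an illustration under stated assumptions: it borrows the smoothed mean and mean-deviation update from TCP's round-trip-time estimator (the part of TCP the slide most plausibly alludes to), uses mean deviation as a cheap stand-in for variance, and the class name `BurstDetector` and the constants `alpha`, `beta`, and `k` are hypothetical. The slides do not give the values the authors used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BurstDetector:
    """Tracks a weighted moving average and deviation of per-window tweet counts.

    alpha, beta and k are illustrative defaults copied from TCP's timeout
    estimator (1/8, 1/4, 4); they are assumptions, not values from the slides.
    """
    alpha: float = 0.125          # weight of the newest count in the running mean
    beta: float = 0.25            # weight of the newest error in the running deviation
    k: float = 4.0                # deviations above the mean that count as an event
    mean: Optional[float] = None  # running mean, unset until the first window
    dev: float = 0.0              # running mean absolute deviation

    def update(self, count: int) -> bool:
        """Feed the tweet count of the latest window; return True on a burst."""
        if self.mean is None:
            # The first window only initialises the state.
            self.mean = float(count)
            self.dev = count / 2.0
            return False
        # Compare against the threshold built from the *previous* windows.
        is_event = count > self.mean + self.k * self.dev
        err = count - self.mean
        self.mean += self.alpha * err
        self.dev += self.beta * (abs(err) - self.dev)
        return is_event


if __name__ == "__main__":
    detector = BurstDetector()
    for minute, count in enumerate([100, 104, 98, 101, 99, 103, 400, 102]):
        if detector.update(count):
            print(f"burst in window {minute}: {count} tweets")
```

Because the threshold adapts to recent traffic, a steady rise in volume does not keep firing; only a window far above the recent average is flagged.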
14
Test Data
15
Detect Event
Time of Peak    What happened?
4:11            First goal!
4:25            3 goals in 3 minutes
4:30            Goal!
5:07            Second half begins
5:25            Goal!
5:35            Goal!
5:46            Goal!
5:50            End!
16
Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map
Step 2: Build Dictionary
18
Build Dictionary
1. Remove stop words.
2. Stemming: Snowball stemmer.
3. Remove words whose occurrence is less than 10%.
4. Remove words whose occurrence is greater than 50%.
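A minimal sketch of these filtering steps follows. Several things here are assumptions rather than details from the slides: "occurrence" is read as document frequency (the fraction of tweets containing the word), the stop-word list is a tiny placeholder, and the function names `tokenize` and `build_dictionary` are hypothetical. Stemming is left as a pluggable `stem` callable that defaults to doing nothing, so a Snowball stemmer can be passed in without this sketch depending on a particular library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and", "for", "on"}


def tokenize(text, stem=lambda w: w):
    """Lowercase, keep runs of letters, drop stop words, then stem.

    `stem` defaults to the identity function; pass a stemmer's stem function
    to apply real stemming. Digits and symbols such as '#' are discarded.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_dictionary(tweets, min_df=0.10, max_df=0.50, stem=lambda w: w):
    """Return a word -> index map of words whose document frequency
    (fraction of tweets containing the word) lies in [min_df, max_df]."""
    doc_freq = Counter()
    for tweet in tweets:
        # set() so a word repeated inside one tweet is counted once.
        doc_freq.update(set(tokenize(tweet, stem)))
    n = len(tweets)
    kept = sorted(w for w, c in doc_freq.items() if min_df <= c / n <= max_df)
    return {w: i for i, w in enumerate(kept)}


if __name__ == "__main__":
    sample = ["goal for brazil", "brazil goal", "goal again", "the goal is late"]
    print(build_dictionary(sample))
```

In the sample, "goal" appears in every tweet and is dropped as too common, while the stop words never reach the frequency count.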
19
Vectorize Tweets
1. Vector space model
2. TF-IDF
3. Normalization
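As a rough illustration of these three steps, the sketch below maps tokenised tweets to unit-length TF-IDF vectors. The weighting is an assumption: raw term counts with idf = log(N / df) and L2 normalization, which is one common variant; the slides do not say which formula the authors used. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists, dictionary):
    """Turn tokenised tweets into L2-normalised TF-IDF vectors.

    token_lists: one list of tokens per tweet.
    dictionary:  word -> column index; words outside it are ignored.
    """
    n = len(token_lists)
    # Document frequency: in how many tweets each dictionary word appears.
    df = Counter()
    for tokens in token_lists:
        df.update(set(t for t in tokens if t in dictionary))

    vectors = []
    for tokens in token_lists:
        vec = [0.0] * len(dictionary)
        for word, tf in Counter(tokens).items():
            if word in dictionary:
                vec[dictionary[word]] = tf * math.log(n / df[word])
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:  # an all-zero tweet is left as zeros
            vec = [x / norm for x in vec]
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    vocab = {"brazil": 0, "goal": 1, "late": 2}
    docs = [["goal", "brazil"], ["goal", "late"], ["goal"]]
    for v in tfidf_vectors(docs, vocab):
        print(v)
```

A word that occurs in every tweet gets idf = log(1) = 0, so it contributes nothing to the vector.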
21
Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map
Step 4: Reduce Dimension
22
Reduce Dimension
Random Projection
1. No training.
2. Matrix operation.
Based on the Johnson-Lindenstrauss lemma.
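The two properties listed above (no training, a single matrix operation) can be seen in a short sketch. It assumes a dense Gaussian projection matrix with entries drawn from N(0, 1/k), where k is the output dimension; the slides do not say which random matrix the authors used, and the function names are hypothetical.

```python
import math
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of i.i.d. Gaussian entries.

    Each entry has variance 1/out_dim, so squared lengths are preserved
    in expectation. Nothing is learned from the data.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vec):
    """Multiply the projection matrix by one input vector."""
    return [sum(r * x for r, x in zip(row, vec)) for row in matrix]


if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.random() for _ in range(500)]
    b = [rng.random() for _ in range(500)]
    R = random_projection_matrix(500, 200)
    before = math.dist(a, b)
    after = math.dist(project(R, a), project(R, b))
    print(f"distance before: {before:.3f}, after: {after:.3f}")
```

The Johnson-Lindenstrauss lemma is what justifies this: pairwise distances are approximately preserved with high probability, and the distortion shrinks as the output dimension grows.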
23
Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map
Step 5: SOM Cluster
24
SOM Cluster
What is SOM? A Self-Organizing Map.
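For readers new to the technique, here is a minimal CPU-only sketch of the standard online SOM training loop: find the best matching unit (BMU) for an input, then pull that node and its grid neighbours toward the input. It is not the authors' GPU implementation. The linear decay schedules, the Gaussian neighbourhood, and all names and default parameters are assumptions chosen for brevity.

```python
import math
import random


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM online; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Random initial weights in [0, 1) for every grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3   # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # move node toward the input
            step += 1
    return weights


def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.05, 0.0], [0.0, 0.05],
              [1.0, 1.0], [0.95, 1.0], [1.0, 0.95]]
    som = train_som(points, rows=3, cols=3)
    for p in points:
        print(p, "->", best_matching_unit(som, p))
```

The distance computation and the weight update are independent for each node, which is the part of the algorithm that lends itself to parallel execution on a GPU.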
27
Test Data: http://web.ist.utl.pt/acardoso/datasets/
28
20 Newsgroup Test
Method          Random Projection   Macro Accuracy (%)   Micro Accuracy (%)
Renato’s SOM    NO                  68                   67
Our Method      YES                 60                   61
Conclusion: Random projection loses precision, so accuracy decreases after dimension reduction.
29
20 Newsgroup Test
Method                                  Random Projection   Macro Accuracy (%)   Micro Accuracy (%)
Renato’s SOM                            NO                  68                   67
Our Method                              YES                 60                   61
Matlab reproduction of Renato’s SOM     NO                  63                   62
Matlab reproduction of Renato’s SOM     YES                 61                   60
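The slides do not define the two accuracy columns. The sketch below assumes the usual reading: micro accuracy is the fraction of all documents labelled correctly, and macro accuracy is the unweighted mean of the per-class accuracies. The function name is hypothetical.

```python
from collections import defaultdict


def micro_macro_accuracy(true_labels, predicted_labels):
    """Return (micro, macro) accuracy as fractions in [0, 1]."""
    correct = defaultdict(int)  # correctly labelled documents per true class
    total = defaultdict(int)    # documents per true class
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        correct[t] += int(t == p)
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


if __name__ == "__main__":
    truth = ["a", "a", "a", "a", "b"]
    guess = ["a", "a", "a", "a", "a"]
    print(micro_macro_accuracy(truth, guess))
```

The two numbers diverge when classes are unbalanced: in the example, always guessing the majority class gives a micro accuracy of 0.8 but a macro accuracy of only 0.5.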
30
FIFA Data
34
Conclusion
35
Thanks for Watching Q & A