Microblogs: Information and Social Network
Huang Yuxin
Millions of Users on Microblogs
By July 2009, Twitter had attracted 41 million users.
By March 2011, Twitter had grown to 175 million users.
By March 2011, the number of registered IDs on Sina Microblog had reached 100 million.
People can publish posts and share information on Microblogs
Social network in Microblogs
What information can we extract from Microblogs?
Plain text
User references (in about 1 of every 2 posts)
Hashtags (in about 1 of every 9 posts)
Retweets
Emoticons
Shortened URLs pointing to resources (in about 1 of every 2 posts)
Post time
Users' geolocation info
Basic features of a tweet (annotated example): Tiny URL, User, Post time, Emoticons, Hashtag, Mention (user reference)
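The following Python is a rough sketch of how the text-level features above might be extracted from a single tweet with regular expressions. The patterns, the `extract_features` function and the `TweetFeatures` record are hypothetical. They are much simpler than Twitter's own entity rules. Post time and geolocation come from the post's metadata rather than its text, so this sketch does not parse them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified patterns (not Twitter's official entity rules).
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")   # @user, not inside e-mail addresses
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")         # #topic
URL_RE = re.compile(r"https?://\S+")              # shortened or full links
EMOTICON_RE = re.compile(r"[:;=][-']?[()DPp]")    # a few Western emoticons
RETWEET_RE = re.compile(r"^RT @\w+")              # classic "RT @user" prefix


@dataclass
class TweetFeatures:
    mentions: list[str] = field(default_factory=list)
    hashtags: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)
    emoticons: list[str] = field(default_factory=list)
    is_retweet: bool = False


def extract_features(text: str) -> TweetFeatures:
    urls = URL_RE.findall(text)
    # Remove URLs first so '#anchor' fragments inside links are not read as hashtags.
    rest = URL_RE.sub(" ", text)
    return TweetFeatures(
        mentions=MENTION_RE.findall(rest),
        hashtags=HASHTAG_RE.findall(rest),
        urls=urls,
        emoticons=EMOTICON_RE.findall(rest),
        is_retweet=bool(RETWEET_RE.match(text)),
    )
```

For example, `extract_features("RT @alice: Big quake in #Tokyo :( http://bit.ly/abc")` reports one mention, one hashtag, one URL, one emoticon, and a retweet flag.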
What is Twitter? (WWW 2010)
Users who are more active tend to have more followers.
The relationship differs for users with very high popularity, because they are celebrities.
Small World
Average path length of Twitter: 4.12
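All-pairs breadth-first search is too expensive on a graph the size of Twitter. A common way to estimate the average path length is to run BFS from a random sample of source users and average the hop counts. The sketch below assumes a follow graph stored as a dict of adjacency lists. The function names are hypothetical, and this is not necessarily how the 4.12 figure was obtained.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable node (graph: node -> neighbors)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(graph, num_sources=None, seed=0):
    """Mean shortest-path length over reachable (source, target) pairs.

    If num_sources is given, BFS runs only from that many randomly sampled
    sources, which gives an estimate instead of the exact value.
    Unreachable pairs are skipped.
    """
    nodes = list(graph)
    rng = random.Random(seed)
    if num_sources is not None and num_sources < len(nodes):
        sources = rng.sample(nodes, num_sources)
    else:
        sources = nodes
    total, pairs = 0, 0
    for s in sources:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")
```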
Reciprocity?
Whole dataset (WWW 2010):
77.9% of user pairs with any link between them are connected only one way.
67.6% of users are not followed by any of the users they follow.
Reciprocity is higher in Asian countries than in America.
Subset of active users (WSDM 2010):
72.4% of users follow more than 80% of their followers.
For 80.5% of users, 80% of the users they follow follow them back.
The difference in conclusions between these two papers is caused by their different data extraction methods.
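The two whole-dataset statistics above can be computed directly from a follow edge list, as sketched below. This sketch assumes edges are `(follower, followee)` pairs with self-follows dropped. It also assumes the "not followed by any of their followings" share is taken over users who follow at least one account. The papers' exact denominators may differ.

```python
from collections import defaultdict


def reciprocity_stats(edges):
    """Return (one_way_fraction, no_followback_fraction).

    edges: iterable of (follower, followee) pairs meaning follower -> followee.
    one_way_fraction: share of linked user pairs connected in only one direction.
    no_followback_fraction: share of users (who follow someone) that none of
    their followings follow back.
    """
    edge_set = {(a, b) for a, b in edges if a != b}  # ignore self-follows
    following = defaultdict(set)
    for a, b in edge_set:
        following[a].add(b)

    # Unordered pairs with at least one link between them.
    pairs = {frozenset((a, b)) for a, b in edge_set}
    one_way = 0
    for pair in pairs:
        a, b = tuple(pair)
        if not ((a, b) in edge_set and (b, a) in edge_set):
            one_way += 1
    one_way_fraction = one_way / len(pairs) if pairs else 0.0

    no_followback = sum(
        1 for a, outs in following.items()
        if not any((b, a) in edge_set for b in outs)
    )
    no_followback_fraction = no_followback / len(following) if following else 0.0
    return one_way_fraction, no_followback_fraction
```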
Celebrities And Popular Topics
Users' participation in topics
A topic attracts only a certain group of users.
Content types on Twitter
Daily chatter
Conversations
Sharing information
Reporting and spreading news
Understanding following behavior (statistics reported in a paper)
Why we follow: professional interest, technology, tone of presentation, keeping up with friends
Why we unfollow: too many posts in general, too much status/personal info, spam, duplicative posts
Interesting Research Topics on Twitter
Vertical search on Twitter (partial indexing + time-sensitive information retrieval)
Static topic detection (topic models)
Burst event detection (topic-specific)
Topic-biased expert recommendation (graph features + activeness + textual features)
Cascading feature analysis (network structure + topic spreading behavior across different topics)
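For topic-biased expert recommendation, one plausible way to combine a graph feature with activeness and textual relevance is a personalized PageRank. Its teleport vector is biased toward users who are active on the topic. This is a generic topic-sensitive PageRank sketch, not the method of any paper cited here. `topic_weight` is a hypothetical per-user score, for example activeness multiplied by textual relevance to the topic.

```python
def topic_biased_pagerank(follows, topic_weight, damping=0.85, iters=50):
    """Personalized PageRank over a follow graph.

    follows: dict user -> list of users they follow (authority flows to followees).
    topic_weight: dict user -> nonnegative topical score (hypothetical input).
    Returns dict user -> score; scores sum to 1.
    """
    users = set(follows) | {v for vs in follows.values() for v in vs} | set(topic_weight)
    total_w = sum(topic_weight.get(u, 0.0) for u in users)
    if total_w == 0:
        teleport = {u: 1.0 / len(users) for u in users}  # fall back to uniform
    else:
        teleport = {u: topic_weight.get(u, 0.0) / total_w for u in users}

    rank = dict(teleport)
    for _ in range(iters):
        new = {u: (1 - damping) * teleport[u] for u in users}
        dangling = 0.0
        for u in users:
            outs = follows.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                dangling += damping * rank[u]
        # Users who follow nobody pass their mass back along the topical teleport vector.
        for u in users:
            new[u] += dangling * teleport[u]
        rank = new
    return rank
```

Biasing only the teleport vector keeps the follow graph as the main signal. Users who never post on the topic can still rank highly if topical users follow them.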
Related Work
People I need to follow vs. content I need to know
(Figure labels: TWEET, Listen)
People I need to follow vs. content I need to know
An active publisher may be interested in many topics.
My page is always filled with the latest low-value chatter.
I may only need to subscribe to certain topics of an author.
Can we automatically classify a user's content and filter out the irrelevant posts?
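One simple way to approach the filtering question is a supervised text classifier that keeps only the posts whose predicted topic the reader subscribes to. The sketch below swaps in a multinomial naive Bayes model with Laplace smoothing. The slides do not name a method, and the class, function names and topic labels are hypothetical. It assumes some labeled posts are available for training.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayesTopics:
    """Multinomial naive Bayes over post tokens, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # topic -> token counts
        self.doc_counts = Counter()              # topic -> number of posts
        self.vocab = set()

    def fit(self, posts, labels):
        for text, label in zip(posts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            n = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for tok in tokenize(text):
                score += math.log((counts[tok] + self.alpha) / (n + self.alpha * v))
            if score > best_score:
                best, best_score = label, score
        return best


def filter_timeline(model, posts, subscribed):
    """Keep only posts whose predicted topic is in the reader's subscriptions."""
    return [p for p in posts if model.predict(p) in subscribed]
```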
Topics spread through the network
(Example: EARTHQUAKE)
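The independent cascade model is a standard way to reason about how a topic spreads through a follower network. The slides do not specify a model, so it is swapped in here as an assumption. In this model, each newly activated user gets exactly one chance to pass the topic to each follower, with probability `p`. The single global `p` and the function name are hypothetical simplifications.

```python
import random


def independent_cascade(followers, seeds, p=0.1, seed=0):
    """Simulate one independent-cascade run.

    followers: dict user -> list of users who follow them (and so see their posts).
    seeds: users who post about the topic first.
    Returns dict user -> step at which the user picked up the topic.
    """
    rng = random.Random(seed)
    adopted = {u: 0 for u in seeds}
    frontier = list(seeds)
    step = 0
    while frontier:
        step += 1
        nxt = []
        for u in frontier:
            for v in followers.get(u, ()):
                # Each (newly active u, inactive follower v) pair is tried once.
                if v not in adopted and rng.random() < p:
                    adopted[v] = step
                    nxt.append(v)
        frontier = nxt
    return adopted
```

Repeating the simulation with different seeds and comparing the resulting cascade sizes and depths across topics is one way to approach the cascading feature analysis listed earlier.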
Detecting hot topics with community keywords and temporal features
Hot topics are biased toward a group of users or a certain time period.
Retweet trees and social networks, together with users' expertise, can all be used in model training.
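As a minimal sketch of the temporal feature, a keyword can be flagged as bursting when its count in a time slot is far above its recent history. The sketch uses a z-score against a trailing window. It is a simple baseline, not a method taken from the slides. The window size, threshold and function name are hypothetical. Running it on per-community counts would reflect the observation that hot topics are biased toward a group of users.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=6, threshold=3.0):
    """Return indices of time slots where a keyword's count bursts.

    counts: per-time-slot frequency of one keyword (optionally within one community).
    A slot is a burst if its z-score against the previous `window` slots
    is at least `threshold`.
    """
    bursts = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = mean(history)
        sigma = pstdev(history)
        # Floor sigma at 1 so a flat, quiet history does not blow up the z-score.
        z = (counts[t] - mu) / max(sigma, 1.0)
        if z >= threshold:
            bursts.append(t)
    return bursts
```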
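Topic Model with Network Regularization (WWW 2008)
Network G: e.g. a coauthor network; each document d is a keyword list.
$O(C,G) = L(C) + R(G,C)$, combining the topic-model likelihood term L(C) on collection C with a network regularizer R(G,C).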
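A regularizer of this kind typically penalizes linked documents for having different topic distributions. The sketch below assumes one common form, a weighted squared difference over network edges, $R(G,C) = \tfrac{1}{2}\sum_{(u,v)\in E} w(u,v)\sum_j \big(p(\theta_j \mid u) - p(\theta_j \mid v)\big)^2$. The exact form, weighting and sign convention in the WWW 2008 paper should be checked against the original. The function name and input layout are hypothetical.

```python
def network_regularizer(edges, topic_dist):
    """Graph-smoothness penalty on per-document topic distributions.

    edges: iterable of (u, v, weight) for linked documents, e.g. shared coauthors.
    topic_dist: dict document -> list of topic probabilities p(theta_j | d).
    Returns 1/2 * sum_edges w(u, v) * sum_j (p_j(u) - p_j(v))**2.
    """
    total = 0.0
    for u, v, w in edges:
        pu, pv = topic_dist[u], topic_dist[v]
        total += w * sum((a - b) ** 2 for a, b in zip(pu, pv))
    return 0.5 * total
```

The penalty is zero when every pair of linked documents shares the same topic mixture. It grows with both the edge weight and the disagreement between the two distributions.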
Rumors have attracted much attention
Intuitions
Rumors spread rapidly and cause heated discussion.
Rumors tend to be controversial (some people spread them, others argue against them).
The source of a rumor: a celebrity? A nobody?
A study of how a particular rumor spreads may be interesting.
Will celebrities clarify the truth?
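The first two intuitions can be turned into rough signals: discussion volume, and how evenly replies split between spreading a claim and arguing against it. The sketch below is a hypothetical heuristic, not an established rumor detector. It assumes each reply already carries a stance label such as `support`, `deny` or `other`, and obtaining those labels is itself a hard problem.

```python
from collections import Counter


def rumor_signal(stances):
    """Return (volume, balance) for the replies to one post.

    stances: stance labels of the replies, e.g. 'support', 'deny', 'other' (assumed given).
    volume: number of replies, a proxy for "hot discussion".
    balance: 1.0 when supporters and deniers are evenly split, 0.0 when one-sided
    or when nobody takes a side, a proxy for controversy.
    """
    c = Counter(stances)
    s, d = c["support"], c["deny"]
    volume = len(stances)
    if s + d == 0:
        return volume, 0.0
    return volume, 1.0 - abs(s - d) / (s + d)
```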
Challenges
How to differentiate rumors from personal views
Most of the comments are subjective (expressions of feelings).
Rumors vs. meaningless topics
Suggestions and ideas are very welcome