Twitter Community Discovery & Analysis Using Topologies
Andrew McClain, Karen Aguar
Outline: Introduction; Motivation; Project Description; Community Discovery; Data Collection; Analysis & Application; Results. Why, What, How?
Introduction: Many people use services like Twitter to stay in contact with groups they belong to, or to interact with other people who share their interests. These groups are considered “communities”.
Community? A network or group of nodes with greater ties internally than to the rest of the network. Communities come in several variations: some are tightly bound together; others are loose associations of people.
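The definition above (more ties inside the group than to the rest of the network) can be checked directly on an edge list. The following is a minimal sketch, not the method used in the project; the function name `tie_counts` and the toy edge list are hypothetical and exist only for illustration.

```python
def tie_counts(edges, members):
    """Count ties inside a candidate community and ties crossing its boundary.

    edges   -- iterable of (u, v) pairs
    members -- nodes proposed as the community
    """
    members = set(members)
    internal = external = 0
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            internal += 1      # both endpoints are inside the group
        elif u_in or v_in:
            external += 1      # exactly one endpoint is inside the group
    return internal, external


# Hypothetical toy network: a triangle a-b-c plus a tail c-x-y.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"), ("x", "y")]
internal, external = tie_counts(edges, {"a", "b", "c"})
print(internal, external)  # 3 internal ties, 1 external tie
```

Under this sketch, a group looks like a community when `internal` is clearly larger than `external`; how much larger is a judgment call that the slides do not specify.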
Motivation: To classify these communities and find real-world implications of their digital associations. Project Description: Discover communities and examine the properties of their graphs to gain insight into the communities themselves. Example: find the organizers of a hobby group from their Twitter activity.
Our Project: The project has two main parts: (1) Twitter community discovery, and (2) analysis of the community graphs and their correlation with the real-world community structure.
Community Discovery: Collected data from a diverse set of individuals in known real-world communities. Generated graphs of the communities. Partitioned the graphs based on in/out degrees to isolate each community.
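One way to picture partitioning by in/out degree is a simple threshold filter on a directed edge list. This is a sketch under stated assumptions: the slides do not give the thresholds or the exact filter used, so `min_in`, `min_out`, the function name `degree_filter`, and the account names are all hypothetical. An edge `(u, v)` is read as "u follows v".

```python
from collections import Counter


def degree_filter(edges, min_in=1, min_out=1):
    """Keep only edges between nodes that meet both degree thresholds.

    Degrees are computed once on the full graph (a single pass, not an
    iterative pruning), which is an assumption made for this sketch.
    """
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    nodes = {n for edge in edges for n in edge}
    keep = {n for n in nodes
            if in_deg[n] >= min_in and out_deg[n] >= min_out}
    return [(s, d) for s, d in edges if s in keep and d in keep]


# Hypothetical data: "dan" follows "ann" but nobody follows "dan".
edges = [("ann", "bob"), ("bob", "ann"), ("bob", "cat"),
         ("cat", "ann"), ("dan", "ann")]
core = degree_filter(edges, min_in=1, min_out=1)
print(core)  # the edge from "dan" is dropped
```

Accounts that only listen (zero in-degree) or only broadcast (zero out-degree) fall outside the kept set, which is one plausible reading of "isolating" the community from its background.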
Community Discovery. Communities: @CNN, @AthensGroupRide, @AthensChurch, @UniversityOfGA, @ChickFilA
Data Collection. Relationships modeled: followed by / following, replies to, mentions. Parameters: 1.5 levels; limit on the number of people included in each network (most limited to ~300).
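A 1.5-level network is commonly understood as the seed account, the accounts directly connected to it, and the connections among those accounts. The sketch below builds one from an in-memory mapping. It is illustrative only: the `following` dictionary, the function name `ego_network_1_5`, and the alphabetical truncation used to enforce the cap are assumptions, and no real Twitter or NodeXL API is called. The default cap of 300 mirrors the approximate limit mentioned above.

```python
def ego_network_1_5(following, ego, max_people=300):
    """Return directed edges (u follows v) of a 1.5-level network around ego.

    following  -- dict mapping each account to the accounts it follows
    max_people -- cap on accounts in the network, ego included
    """
    # Level 1: everyone the ego follows, plus everyone who follows the ego.
    neighbors = set(following.get(ego, ()))
    neighbors |= {u for u, outs in following.items() if ego in outs}
    neighbors.discard(ego)

    # Enforce the size cap. Alphabetical order is an arbitrary, deterministic
    # choice for this sketch, not the selection rule used in the project.
    kept = set(sorted(neighbors)[:max_people - 1]) | {ego}

    # Level 1.5: keep every edge whose endpoints are both in the kept set.
    return [(u, v)
            for u in sorted(kept)
            for v in sorted(following.get(u, ()))
            if v in kept]


# Hypothetical follow lists.
following = {
    "club": ["ann", "bob"],
    "ann": ["club", "bob", "zed"],
    "bob": ["club"],
    "zed": ["ann"],
    "cat": ["club"],
}
network = ego_network_1_5(following, "club")
print(network)  # "zed" is two steps from "club", so it is excluded
```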
Analysis & Application: Manually reconstructed the hierarchy of the known real-world communities. Used Gephi to detect behavior patterns and structures in Twitter communities: shape, interconnectivity, and how information flows through them. Compared the relationships in the graphs against the known community structures.
Analysis via Gephi: Gephi is an open-source graph visualization platform. We used Gephi to isolate each community from the noisy background.
Analysis via Gephi: After isolating the communities, we sized the node labels by in-degree. The assumption is that the people who are listened to most are the ones most followed within the community. The spline on the right shows the scale of the labels. At this time, the analysis of importance is done visually.
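The label sizing described above can be approximated outside Gephi by counting followers within the isolated graph and mapping the counts onto a size range. This is a sketch only: the linear mapping, the size bounds, and the function name `label_sizes` are assumptions, and the spline-based scale mentioned on the slide is not reproduced here.

```python
from collections import Counter


def label_sizes(edges, min_size=8.0, max_size=32.0):
    """Map each node's in-degree linearly onto [min_size, max_size].

    edges -- iterable of (follower, followed) pairs within the community
    """
    edges = list(edges)
    if not edges:
        return {}
    # Start every node at zero so accounts with no followers are included.
    in_deg = Counter({n: 0 for edge in edges for n in edge})
    for _, followed in edges:
        in_deg[followed] += 1
    lo, hi = min(in_deg.values()), max(in_deg.values())
    span = (hi - lo) or 1  # avoid division by zero when all degrees match
    return {n: min_size + (max_size - min_size) * (d - lo) / span
            for n, d in in_deg.items()}


# Hypothetical community: "c" is followed by two accounts, "a" by none.
sizes = label_sizes([("a", "c"), ("b", "c"), ("c", "b")])
ranked = sorted(sizes, key=sizes.get, reverse=True)
print(ranked)  # accounts ordered from largest label to smallest
```

Sorting by the computed size gives a ranked list of candidate influencers, a numeric counterpart to the visual inspection described above.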
Results. What we found: an interesting dichotomy between primarily online and primarily offline communities. “Celebrity” noise effect: once a celebrity is introduced to a community, everyone follows them and they become a central individual in the community structure.
Results
Online Community: Offline Community
Athens Group Ride: in-degree lets us make predictions about who is and is not important.
Athens Church: the most significant members are represented in the graph. A mega-church pastor introduces celebrity noise into the community.
Offline Community
ChickFilA: information distribution is largely a uni-directional relationship. The account does not receive much information.
Semi-Online Communities (in between)
CNN, UniversityOfGA: their graphs reveal information about the community structure, such as the large organizations involved, but not much about the individuals in the network.
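The uni-directional relationship noted for ChickFilA can be expressed as a follow-back ratio: of the accounts that follow a given account, what fraction does it follow in return? The sketch below is one hypothetical way to quantify it; the function name `follow_back_ratio` and the data are invented for illustration and are not results from the project.

```python
def follow_back_ratio(edges, account):
    """Fraction of an account's followers that it follows back.

    edges -- iterable of (follower, followed) pairs
    Returns 0.0 when the account has no followers.
    """
    edges = list(edges)
    followers = {u for u, v in edges if v == account}
    follows = {v for u, v in edges if u == account}
    if not followers:
        return 0.0
    return len(followers & follows) / len(followers)


# Hypothetical broadcast-style account: four followers, one followed back.
edges = [("a", "brand"), ("b", "brand"), ("c", "brand"), ("d", "brand"),
         ("brand", "a")]
ratio = follow_back_ratio(edges, "brand")
print(ratio)  # 0.25
```

A ratio near 0 suggests a broadcast-style account that distributes information without receiving much, while a ratio near 1 suggests mutual ties.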
Results: Athens Group Ride. Ty_Magner, Philgaimon, and Joeyrosskopf are determined to be the most influential.
Results: University of GA
Results: Athens Church. Andy Stanley: celebrity effect
Results: Chick-Fil-A. Offline community
Results: CNN. Extremely small community once filters are applied
Thank You! Questions?
References: S. Parthasarathy, Y. Ruan, V. Satuluri, "Community Discovery in Social Networks: Applications, Methods and Emerging Trends" (2011); gephi.com; nodexl.codeplex.com; twitter.com