CS 765 – Fall 2014 Paulo Alexandre Regis Reddit analysis
Outline ABOUT REDDIT WHY REDDIT PREVIOUS WORKS INITIAL PROPOSAL Q&A
What is reddit? Reddit is an open-source platform that supports the interaction of communities. It has been used as a news hub, a Q&A platform, and a channel for internet hoax/meme propagation.
Features Subreddits Voting Karma Public API
Why reddit? Growing communities Diverse usage Open-source platform Unexplored opportunities
The API Easy to parse: returns JSON objects. Limit of 30 requests per minute; 60 requests per minute if using OAuth. Useful links: Dev community: API documentation:
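To stay under the per-minute limits listed above, a crawler can space its requests evenly. The following is a minimal sketch, not part of the original slides: the `Throttle` class and its parameters are hypothetical names, and the 30 and 60 requests-per-minute figures are taken from the slide rather than from current API documentation.

```python
import time


class Throttle:
    """Spaces successive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the class can be tested without waiting
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.clock()
        self.next_allowed = now + self.interval


# Assumed limits from the slide: 30 requests/minute, 60 with OAuth
anonymous_throttle = Throttle(per_minute=30)
oauth_throttle = Throttle(per_minute=60)
```

Calling `anonymous_throttle.wait()` before each request would keep requests about 2 seconds apart; the OAuth variant allows one per second.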
Previous works PRAW Information and social analysis Identifying social roles Backbone networks
PRAW Python Reddit API Wrapper Open-source Respects Reddit’s guidelines Easy integration Well documented Project website:
Information and social analysis of reddit Insights on the comments section Generated 3 social graphs: – Loose: a comment by user A on user B establishes an edge – Tight: user A comments on user B and user B comments on user A – Strict: user A comments 4 times on user B and vice-versa
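The three graph definitions above can be sketched in a few lines. This is an illustration only, not the original authors' code: the function name `build_graphs`, the `(commenter, target)` input format, and reading "4 times" as "at least 4 times" are all assumptions.

```python
from collections import Counter


def build_graphs(replies, strict_min=4):
    """Build loose, tight and strict edge sets from (commenter, target) pairs.

    Edges are undirected and stored as frozensets of two user names.
    strict_min=4 follows the slide; treating it as a minimum is an assumption.
    """
    counts = Counter(replies)  # (commenter, target) -> number of comments
    loose, tight, strict = set(), set(), set()
    for (a, b), n_ab in counts.items():
        if a == b:
            continue  # ignore users replying to themselves
        edge = frozenset((a, b))
        n_ba = counts.get((b, a), 0)
        loose.add(edge)  # one comment in either direction is enough
        if n_ba > 0:
            tight.add(edge)  # both users commented on each other
        if n_ab >= strict_min and n_ba >= strict_min:
            strict.add(edge)  # both directions reach the threshold
    return loose, tight, strict
```

By construction the strict edges are a subset of the tight edges, which are a subset of the loose edges.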
Information and social analysis of reddit
Limited data collection: – Time constraints – 1% (250) of the top subcommunities crawled Results:
Identifying social roles in reddit Identify a specific role in reddit (answer-person: responds to questions but only in a few different discussions, i.e. Q&A) Sampled top users from top submissions and targeted communities Used PRAW Crawler script is open-source: https://github.com/cbuntain/redditResponseExtractor
(a) Mark Shuttleworth (Ubuntu) IAmA Q&A (b) Regular user from other subreddit
Using backbone networks to map user interests in social media Focus on communities (subreddits) Communities linked by users (bipartite graph) Small-world (shortest path ~= 3.71) Roughly 1/3 of users crawled Anonymized data available:
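The bipartite structure and the shortest-path measurement can be illustrated as follows. This sketch only projects the user/subreddit graph onto subreddits and averages shortest path lengths with breadth-first search; it does not implement the backbone extraction used in the cited work, and the function names and input format are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def project_subreddits(memberships):
    """Link two subreddits whenever at least one user is active in both.

    memberships: iterable of (user, subreddit) pairs (assumed input format).
    """
    by_user = defaultdict(set)
    for user, sub in memberships:
        by_user[user].add(sub)
    adj = {}
    for subs in by_user.values():
        for s in subs:
            adj.setdefault(s, set())  # keep subreddits with no shared users
        for s, t in combinations(sorted(subs), 2):
            adj[s].add(t)
            adj[t].add(s)
    return adj


def average_shortest_path(adj):
    """Mean shortest path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A small average path length relative to the number of nodes is one of the signs of a small-world network.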
Initial proposal Analyze the influence of social hubs in reddit’s network. See if high degree nodes attract more attention from lower degree nodes. An edge would be formed when both nodes comment in the same post. The degree of the nodes would be their predefined “karma”. And it could be compared with other ranking algorithms (e.g. PageRank)
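A rough sketch of how the proposal could be prototyped: build the co-comment graph, then compute PageRank by power iteration so its ordering can be compared against karma. The function names, the `post_commenters` input format, and the damping factor of 0.85 are assumptions for illustration, not part of the proposal.

```python
from itertools import combinations


def co_comment_graph(post_commenters):
    """Connect two users whenever both commented in the same post.

    post_commenters: dict mapping post id -> list of user names (assumed format).
    """
    adj = {}
    for users in post_commenters.values():
        unique = sorted(set(users))
        for u in unique:
            adj.setdefault(u, set())
        for u, v in combinations(unique, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def pagerank(adj, damping=0.85, iterations=50):
    """PageRank by power iteration on an undirected adjacency dict."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iterations):
        # Rank held by users with no neighbours is spread uniformly
        dangling = sum(rank[u] for u in adj if not adj[u])
        new_rank = {}
        for u in adj:
            incoming = sum(rank[v] / len(adj[v]) for v in adj[u])
            new_rank[u] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank
```

Sorting users by `pagerank(...)` and by their karma would give two orderings to compare, for example with a rank correlation measure.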
Questions?