CS 765 – Fall 2014 Paulo Alexandre Regis Reddit analysis.



Outline
ABOUT REDDIT
WHY REDDIT
PREVIOUS WORKS
INITIAL PROPOSAL
Q&A

What is reddit?
Reddit is an open-source platform that supports the interaction of communities. It has been used as a news hub, as a Q&A platform, and for internet hoax/meme propagation.

Features
Subreddits
Voting
Karma
Public API

Why reddit?
Growing communities
Diverse usage
Open-source platform
Unexplored opportunities

Why reddit?

The API
Easy to parse, returns JSON objects
Limit of 30 requests per minute
60 requests per minute if using OAuth
Useful links: Dev community: API documentation:
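A crawler has to stay under the request limits listed above. The following is a minimal sketch of one way to do that, by spacing requests evenly; the class name `RateLimiter` and its parameters are hypothetical and are not part of the reddit API or of any library mentioned in these slides.

```python
import time


class RateLimiter:
    """Spaces consecutive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute  # seconds between two calls
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._last_call + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last_call = now + delay
        return delay


# Limits taken from the slide: 30 requests/minute, or 60 with OAuth.
anonymous_limiter = RateLimiter(per_minute=30)
oauth_limiter = RateLimiter(per_minute=60)
```

Calling `anonymous_limiter.wait()` before each HTTP request would keep requests at least two seconds apart; how the request itself is made is left out on purpose.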

Previous works
PRAW
Information and social analysis
Identifying social roles
Backbone networks

PRAW
Python Reddit API Wrapper
Open-source
Respects Reddit’s guidelines
Easy integration
Well documented
Project website:

Information and social analysis of reddit
Insights on comments section
Generated 3 social graphs:
– Loose: an edge is established when user A comments on user B
– Tight: user A comments on user B and user B comments on user A
– Strict: user A comments 4 times on user B and vice versa
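The three graph definitions above can be sketched as follows. This is an illustration only, not the original authors' code: the function name `build_graphs`, the input format (a list of `(commenter, target)` pairs), the choice of undirected edges, and reading "4 times" as "at least 4 times" are all assumptions.

```python
from collections import Counter


def build_graphs(replies, strict_threshold=4):
    """Return (loose, tight, strict) edge sets from (commenter, target) pairs."""
    # Directed reply counts: (a, b) -> number of times a commented on b.
    counts = Counter(replies)
    loose, tight, strict = set(), set(), set()
    for (a, b), n_ab in counts.items():
        if a == b:
            continue  # ignore users replying to themselves
        edge = frozenset((a, b))  # undirected edge (assumption)
        n_ba = counts.get((b, a), 0)
        loose.add(edge)  # one comment in either direction is enough
        if n_ba >= 1:
            tight.add(edge)  # comments in both directions
        if n_ab >= strict_threshold and n_ba >= strict_threshold:
            strict.add(edge)  # threshold reached in both directions
    return loose, tight, strict


# Made-up sample data: ann and bob reply to each other 4 times each.
replies = ([("ann", "bob")] * 4 + [("bob", "ann")] * 4
           + [("ann", "cat"), ("cat", "ann"), ("dan", "ann")])
loose, tight, strict = build_graphs(replies)
```

By construction each graph is a subset of the previous one: strict ⊆ tight ⊆ loose.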

Information and social analysis of reddit

Limited data collection:
– Time constraints
– 1% (250) of the top subcommunities crawled
Results:

Identifying social roles in reddit
Identify a specific role in reddit, the answer-person: a user who responds to questions but only in a few different discussions (i.e. Q&A)
Sampled top users from top submissions and targeted communities
Used PRAW
Crawler script is open-source: https://github.com/cbuntain/redditResponseExtractor

(a) Mark Shuttleworth (Ubuntu) IAmA Q&A
(b) Regular user from another subreddit

Using backbone networks to map user interests in social media
Focus on communities (subreddits)
Communities linked by users (bipartite graph)
Small-world (shortest path ≈ 3.71)
Roughly 1/3 of users crawled
Anonymized data available:
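To make "communities linked by users" concrete, here is a minimal sketch that projects a user-subreddit bipartite graph onto subreddits and measures the average shortest path of the result. The function names and the sample data are hypothetical, and the backbone extraction step used in that work (filtering out weak links) is not shown.

```python
from collections import defaultdict, deque
from itertools import combinations


def project_communities(memberships):
    """Map (sub_a, sub_b) -> number of users active in both subreddits."""
    by_user = defaultdict(set)
    for user, subreddit in memberships:
        by_user[user].add(subreddit)
    shared = defaultdict(int)
    for subs in by_user.values():
        for a, b in combinations(sorted(subs), 2):
            shared[(a, b)] += 1
    return dict(shared)


def average_shortest_path(edges):
    """Mean hop count over all connected ordered pairs, via BFS from each node."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total, pairs = 0, 0
    for source in list(adj):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # nodes reached, excluding the source
    return total / pairs if pairs else 0.0


# Made-up (user, subreddit) activity records.
memberships = [("u1", "python"), ("u1", "linux"), ("u2", "linux"),
               ("u2", "ubuntu"), ("u3", "python"), ("u3", "linux")]
links = project_communities(memberships)
avg_hops = average_shortest_path(links)  # iterating the dict yields its edge pairs
```

The shared-user counts returned by `project_communities` are the kind of edge weight a backbone filter would operate on.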

Initial proposal
Analyze the influence of social hubs in reddit’s network.
See if high-degree nodes attract more attention from lower-degree nodes.
An edge would be formed when both nodes comment on the same post.
The degree of the nodes would be their predefined “karma”.
This could be compared with other ranking algorithms (e.g. PageRank).
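A minimal sketch of how the proposed comparison could be set up, assuming the crawl yields a mapping from post id to the users who commented on it. The function names, the sample posts, and the karma values are made up for illustration; the PageRank here is a plain power iteration, not a specific library's implementation.

```python
from collections import defaultdict
from itertools import combinations


def co_comment_graph(post_commenters):
    """Undirected graph: two users are linked if they commented on the same post."""
    adj = defaultdict(set)
    for users in post_commenters.values():
        for a, b in combinations(sorted(set(users)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank; every node in adj has at least one neighbour."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in adj}
        for u, neighbours in adj.items():
            share = damping * rank[u] / len(neighbours)
            for v in neighbours:
                new[v] += share
        rank = new
    return rank


def ranking(scores):
    """User names sorted from highest to lowest score (ties broken by name)."""
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical crawl result and hypothetical karma values.
posts = {"p1": ["hub", "a", "b"], "p2": ["hub", "c"], "p3": ["hub", "d"]}
karma = {"hub": 900, "a": 50, "b": 40, "c": 10, "d": 5}

graph = co_comment_graph(posts)
by_karma = ranking(karma)
by_pagerank = ranking(pagerank(graph))
```

Comparing `by_karma` with `by_pagerank` (for example with a rank correlation) would show whether karma and graph-based importance agree; which comparison measure to use is left open.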

Questions?