CS 765 – Fall 2014 Paulo Alexandre Regis Reddit analysis.

Outline: Review; Reddit API; Data collection / cleaning; Network creation; Tools; Conclusion; Q&A

What is Reddit? Reddit is an open-source platform that supports the interaction of communities. It has been used as a news hub, as a Q&A platform, and for propagating internet hoaxes and memes. Its features include voting, posting, and commenting. It has a public API that allows data crawling. It has not been deeply studied.

The API: The limit is 30 requests per minute, with at most 100 results per request, so at most 30 × 100 = 3000 results per minute. PRAW is an API wrapper that takes care of the API limits. The comment tree can be flattened by PRAW (unlike what is described in the report).
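The rate limits above translate directly into a crawl budget. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, nothing here calls the Reddit API or PRAW, and the only figures taken from the slide are 30 requests per minute and 100 results per request.

```python
# Figures stated on the slide.
REQUESTS_PER_MINUTE = 30
RESULTS_PER_REQUEST = 100


def results_per_minute(requests_per_minute=REQUESTS_PER_MINUTE,
                       results_per_request=RESULTS_PER_REQUEST):
    """Upper bound on results retrievable per minute."""
    return requests_per_minute * results_per_request


def crawl_minutes(n_requests, requests_per_minute=REQUESTS_PER_MINUTE):
    """Minimum whole minutes needed to issue n_requests (ceiling division)."""
    return -(-n_requests // requests_per_minute)


def requests_in_days(days, requests_per_minute=REQUESTS_PER_MINUTE):
    """Maximum number of requests that fit into the given number of days."""
    return days * 24 * 60 * requests_per_minute


if __name__ == "__main__":
    print(results_per_minute())   # upper bound per minute
    print(requests_in_days(14))   # request budget for a two-week crawl
```

Under these limits a crawl that needs one request per post is bounded by the request rate, not by the 100-results cap, which is one plausible reason comment collection takes much longer than post collection.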

Comment tree
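To illustrate what flattening a comment tree means, here is a small sketch in plain Python. It does not use PRAW; the nested-dictionary layout (keys "author" and "replies") is a hypothetical stand-in for whatever structure the crawler returns, and the traversal order (parent before its replies) is an assumption.

```python
def flatten_comments(roots):
    """Return all comments in pre-order (each parent before its replies).

    `roots` is a list of top-level comments; each comment is a dict that
    may carry a "replies" list of child comments (hypothetical layout).
    """
    flat = []
    # Explicit stack instead of recursion, so deep threads cannot hit
    # Python's recursion limit.
    stack = list(reversed(roots))
    while stack:
        comment = stack.pop()
        flat.append(comment)
        # Push replies in reverse so the first reply is visited first.
        stack.extend(reversed(comment.get("replies", [])))
    return flat


if __name__ == "__main__":
    tree = [
        {"author": "a", "replies": [
            {"author": "b", "replies": [{"author": "d"}]},
            {"author": "c"},
        ]},
        {"author": "e"},
    ]
    print([c["author"] for c in flatten_comments(tree)])
```

Once flattened, the list of authors per post is all the network-creation step needs; the reply structure itself is discarded.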

Data collection: Total subreddits (17 MB); filtered subreddits; posts (721 MB); comments* (> 2 GB); users: unknown. * Estimated, in progress.

Reddit stats

Data cleaning: Keep only subreddits with at least 300 subscribers. The data is not a snapshot: Reddit doesn’t stop! The crawl returns repeated results. An anonymizer is applied before the data is made available.
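The cleaning steps above could be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the record fields ("subscribers", "id"), the choice of salted SHA-256 hashing as the anonymizer, and the salt value are all hypothetical; only the 300-subscriber threshold comes from the slide.

```python
import hashlib

MIN_SUBSCRIBERS = 300  # threshold stated on the slide


def filter_subreddits(subreddits, min_subscribers=MIN_SUBSCRIBERS):
    """Keep subreddits with at least `min_subscribers` subscribers."""
    return [s for s in subreddits if s["subscribers"] >= min_subscribers]


def drop_repeats(records, key="id"):
    """Remove repeated records, keeping the first occurrence of each key."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def anonymize(username, salt="hypothetical-salt"):
    """Map a username to a stable pseudonym (assumed scheme: salted SHA-256)."""
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return digest[:16]


if __name__ == "__main__":
    subs = [{"name": "big", "subscribers": 5000},
            {"name": "tiny", "subscribers": 12}]
    print([s["name"] for s in filter_subreddits(subs)])
    posts = [{"id": "p1"}, {"id": "p2"}, {"id": "p1"}]
    print(len(drop_repeats(posts)))
    print(anonymize("some_user"))
```

Because the pseudonym is deterministic, the same user maps to the same node across posts, which the network-creation step depends on.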

Data cleaning

Network creation: Nodes are users. An edge connects two users when they comment on the same post. Examine the network when a threshold is applied.
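A minimal sketch of this construction in plain Python follows. The input layout (a mapping from post id to the list of commenting users) is hypothetical, and the threshold is interpreted here as the minimum number of posts two users must share for an edge to be kept; the slide does not say what the threshold applies to, so that reading is an assumption.

```python
from collections import Counter
from itertools import combinations


def build_edges(post_commenters, threshold=1):
    """Build weighted, undirected co-comment edges.

    `post_commenters` maps a post id to the users who commented on it.
    The weight of an edge is the number of posts both users commented on;
    edges with weight below `threshold` are dropped.
    """
    weights = Counter()
    for users in post_commenters.values():
        # set() ignores multiple comments by the same user on one post;
        # sorted() gives each undirected edge a single canonical key.
        for u, v in combinations(sorted(set(users)), 2):
            weights[(u, v)] += 1
    return {edge: w for edge, w in weights.items() if w >= threshold}


if __name__ == "__main__":
    posts = {
        "p1": ["ann", "bob", "cat"],
        "p2": ["bob", "ann", "ann"],
        "p3": ["cat"],
    }
    print(build_edges(posts))
    print(build_edges(posts, threshold=2))
```

Note that a post with n distinct commenters contributes n(n-1)/2 edges, so very popular posts dominate the edge count; raising the threshold is one way to thin the resulting graph.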

Tools: PRAW (did I mention this library is important?); graph visualizers (Pajek, Gephi, igraph).

Analysis proposal: Calculate the node degree (number of links) in different scenarios. Compare the calculated values with the users' “karma”. Compare the network with previous studies of other social networks. Is it power-law? Small-world?
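The degree calculation and the karma comparison might look like the sketch below. It is illustrative only: the edge list and karma values are made up, and using the Pearson correlation as the comparison measure is an assumption, since the slide does not name a statistic.

```python
from collections import Counter
from math import sqrt


def node_degrees(edges):
    """Degree (number of links) of every node in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def degree_distribution(degrees):
    """Map each degree value k to the number of nodes with that degree."""
    return dict(Counter(degrees.values()))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan")]
    degrees = node_degrees(edges)
    print(degrees)
    print(degree_distribution(degrees))
    karma = {"ann": 900, "bob": 40, "cat": 10, "dan": 55}  # made-up values
    users = sorted(degrees)
    print(pearson([degrees[u] for u in users], [karma[u] for u in users]))
```

For the power-law question, the degree distribution is usually inspected on log-log axes, where a heavy tail appears roughly as a straight line; that plot is not produced here.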

Conclusion: Time constraint. Expected crawling time? More than 2 weeks just for comments. Plan B: analyze the data collected so far.

Questions?