Final Presentation Undergraduate Researchers: Graduate Student Mentor: Faculty Mentor: Jordan Cowart, Katie Allmeroth Krist Culmer Dr. Wenjun Zeng Investigating.

Slides:



Advertisements
Similar presentations
A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
Advertisements

Test Case Management and Results Tracking System October 2008 D E L I V E R I N G Q U A L I T Y (Short Version)
University of Illinois Visualizing Text Loretta Auvil UIUC February 25, 2011.
This work was supported by the TRUST Center (NSF award number CCF ) Introduction With recent advances in technology comes an increase in the quantity.
DISTRIBUTED COMPUTING & MAP REDUCE CS16: Introduction to Data Structures & Algorithms Thursday, April 17,
Development of a visual studio plugin to visualize a Blocks-Graph
Building and Analyzing Social Networks Web Data and Semantics in Social Network Applications Dr. Bhavani Thuraisingham February 15, 2013.
Topical Crawling for Business Intelligence Gautam Pant * and Filippo Menczer ** * Department of Management Sciences The University of Iowa, Iowa City,
GenSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work Chris Murphy, Swapneel Sheth, Gail Kaiser, Lauren.
DEPARTMENT OF COMPUTER SCIENCE SOFTWARE ENGINEERING, GRAPHICS, AND VISUALIZATION RESEARCH GROUP 15th International Conference on Information Visualisation.
Web Projections Learning from Contextual Subgraphs of the Web Jure Leskovec, CMU Susan Dumais, MSR Eric Horvitz, MSR.
Software Issues Derived from Dr. Fawcett’s Slides Phil Pratt-Szeliga Fall 2009.
Midterm Presentation Undergraduate Researchers: Graduate Student Mentor: Faculty Mentor: Jordan Cowart, Katie Allmeroth Krist Culmer Dr. Wenjun (Kevin)
Big Data Course Plans at Purdue Ananth Iyer. Big Data/Analytics Coursera course on Big Data by Bill Howe claims that Big Data involves issues of
Web Information Retrieval Projects Ida Mele. Rules Students can work in teams (max 3 people) The project must be delivered by the deadline that will be.
Projects ( ) Ida Mele. Rules Students have to work in teams (max 2 people). The project has to be delivered by the deadline that will be published.
Analysis and Modeling of the Open Source Software Community Yongqin Gao, Greg Madey Computer Science & Engineering University of Notre Dame Vincent Freeh.
A Distributed and Privacy Preserving Algorithm for Identifying Information Hubs in Social Networks M.U. Ilyas, Z Shafiq, Alex Liu, H Radha Michigan State.
Introduction to Information Retrieval CS 5604: Information Storage and Retrieval ProjCINETViz by Maksudul Alam, S M Arifuzzaman, and Md Hasanuzzaman Bhuiyan.
Bibliometric Analysis with Sci2: Choose Your Own Adventure Laura Ridenour School of Library and Information Science, Indiana University.
Fall, Privacy&Security - Virginia Tech – Computer Science Click to edit Master title style Design Extensions to Google+ CS6204 Privacy and Security.
Configuration Management and Server Administration Mohan Bang Endeca Server.
1 BTEC HNC Systems Support Castle College 2007/8 Systems Analysis Lecture 9 Introduction to Design.
Zhonghua Qu and Ovidiu Daescu December 24, 2009 University of Texas at Dallas.
Copyright © Allyn & Bacon 2008 POWER PRACTICE Chapter 7 The Internet and the World Wide Web START This multimedia product and its contents are protected.
Automatic Identification of Concurrency in Handel-C Joseph C Libby, Kenneth B Kent, Farnaz Gharibian Faculty of Computer Science University of New Brunswick.
Data Analysis in YouTube. Introduction Social network + a video sharing media – Potential environment to propagate an influence. Friendship network and.
CS 765 – Fall 2014 Paulo Alexandre Regis Reddit analysis.
Dynamic Content On Edge Cache Server (using Microsoft.NET) Name: Aparna Yeddula CS – 522 Semester Project Project URL: cs.uccs.edu/~ayeddula/project.html.
Introduction to Nutch CSCI 572: Information Retrieval and Search Engines Summer 2010.
Mantid Development introduction Nick Draper 11/04/2008.
Nick Draper 05/11/2008 Mantid Manipulation and Analysis Toolkit for ISIS data.
NEWonline 2.0 What’s in it for you Features and functions of the Network’s all-new web platform June 15, 2009.
Freelib: A Self-sustainable Digital Library for Education Community Ashraf Amrou, Kurt Maly, Mohammad Zubair Computer Science Dept., Old Dominion University.
Zibin Zheng DR 2 : Dynamic Request Routing for Tolerating Latency Variability in Cloud Applications CLOUD 2013 Jieming Zhu, Zibin.
June 25, 2007 ALA Washington, DC Emmanuel d’Alzon Library Assumption College Using Your LibQUAL+ Results Dr. Dawn Thistle Director of Library Services.
Google Refine for Data Quality / Integrity. Context BioVeL Data Refinement Workflow Synonym Expansion / Occurrence Retrieval Data Selection Data Quality.
Introduction to the Semantic Web and Linked Data
Machine Learning as a Service
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Implementation of a Relational Database as an Aid to Automatic Target Recognition Christopher C. Frost Computer Science Mentor: Steven Vanstone.
Mantid Stakeholder Review Nick Draper 01/11/2007.
Reviews Crawler (Detection, Extraction & Analysis) FOSS Practicum By: Syed Ahmed & Rakhi Gupta April 28, 2010.
1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.
GROUP PresentsPresents. WEB CRAWLER A visualization of links in the World Wide Web Software Engineering C Semester Two Massey University - Palmerston.
Topical Analysis and Visualization of (Network) Data Using Sci2 Ted Polley Research & Editorial Assistant Cyberinfrastructure for Network Science Center.
The Anatomy of a Large-Scale Hypertextual Web Search Engine S. Brin and L. Page, Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pages , April.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
CS 540 Database Management Systems Web Data Management some slides are due to Kevin Chang 1.
Indexing The World Wide Web: The Journey So Far Abhishek Das, Ankit Jain 2011 Paper Presentation : Abhishek Rangnekar 1.
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
Data mining in web applications
Big Data is a Big Deal!.
Multi-Layer Network Representation of the NTC Environment Lili Sun, Proof School Arijit Das, Computer Science Introduction The United States Army’s National.
Open Source distributed document DB for an enterprise
Spark Presentation.
NETVIZZ Facebook
Network Visualization Software
A Network Science Approach to Fake News Detection on Social Media
Comparison of Social Networks by Likhitha Ravi
RELATIONSHIP MINING IN SOCIAL NETWORKS
Clustering tweets and webpages
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Analyzing and Securing Social Networks
Data Exploration Of Wikipedia
Collection Management Webpages Final Presentation
Procedural Information Extraction from Text:
Graph and Link Mining.
Academic & More Group 4 谢知晖 王逸雄 郭嘉宋 程若愚.
Analyzing Massive Graphs - ParT I
Presentation transcript:

Final Presentation Undergraduate Researchers: Graduate Student Mentor: Faculty Mentor: Jordan Cowart, Katie Allmeroth Krist Culmer Dr. Wenjun Zeng Investigating the effectiveness of EVC and PCC as community-centered centrality measures

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 2

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 3

Personal Experiences Learned: ▫Python  Built a web crawler ▫Social Networks  Centrality measures  Social Capital ▫Graph Visualization software; Gephi and SNAP  Extend functionality of open source software (PCC Plugin) ▫Research process ▫Related research 4

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 5

Project Overview 6 Which node is more important, A or B?

Project Overview Node A: ▫Current centrality measures would say this node is more important because it has a large number of connections. But what about Node B? ▫It’s connections are more diverse; indicating it has a higher value of social capital 7 Which node is more important, A or B?

Project Overview Goal of project: ▫Investigate if/how current centrality measures take community membership into account and if necessary develop a centrality measure that considers community membership Why is this important: ▫It will allow us to have a better understanding of who the most important individuals are in a network 8

Two measures of centrality Eigenvector Centrality (EVC) Principal Component Centrality (PCC) 9 Ilyas, M.U.; Radha, H. "Identifying Influential Nodes in Online Social Networks Using Principal Component Centrality", Communications (ICC), 2011 IEEE International Conference on, On page 6.

Project Overview 1. Develop a web crawler to extract user data from the Mizzou Facebook graph, including ground-truth user- defined communities 2. Apply Eigenvector Centrality (EVC) and Principal Component Centrality (PCC) to collected and provided data sets to characterize properties of the algorithms 3. Develop analysis tools as needed 4. Develop a centrality measure that takes community membership and resource availability into consideration 10

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 11

Scripts Wrote a ton of scripts to aid us with our work ▫Wrote scripts to:  Clean data  Extract information from a text and convert to a.graph file  To visualize SNAP data in Gephi  Color graph  To compile information to make pie charts (shown later) 12

Gephi Visualization and exploration platform for networks Understanding data like below.. is hard. 13

14 Easier, right?

Web Crawler Built a web crawler in Python to gather publicly available Facebook Data ▫Faster to deploy and more flexible than Facebook’s API Collects only information from Mizzou declared students ▫Accomplishes this by using a targeted, breadth- first search 15

Web Crawler Collected the following data: ▫Major, gender, home town, Mizzou friends, Mizzou related likes Majors are condensed into colleges (e.g. Computer Science -> College of Engineering) Likes are condensed into general community resources (e.g. GPC, MSA, GSA -> Student Governance) 16

Web Crawler 17

Web Crawler Problems encountered: ▫Facebook’s HTML/CSS (Inconsistent format) ▫BeautifulSoup’s Speed and Parsing ▫Logging in + staying logged in ▫Gathering entire Friends/Like list ▫Complete error handling (URLLib / Regex Errors) ▫Facebook probation 18

EVC / PCC Used the EVC implementation found in the Stanford Network Analysis Platform (SNAP) and in Gephi Developed a PCC analysis tool using functions made available in the SNAP C++ Library Problems encountered: ▫Poor documentation ▫Typedef’s for everything 19

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 20

Data Analysis 21

22 Data Analysis - EVC

Data Analysis - PCC 23

Data Analysis Looked at friendship demographics for the top 20% ▫Education majors tend to cluster with themselves ▫Graduates brag about credentials more than undergrads Kris plans to look at portion of friends for undergraduate vs. graduates and see how their demographics compare 24

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 25

Conclusion Based on analysis from open source data sets, it appears that a centrality measure that explicitly takes community membership into account is needed to determine the criticality of a node or nodes in a social network We developed a framework of principles for the new centrality measure ▫Resilience against isolation due to edge severance ▫Number of Inter/Extra-community connections ▫Value of community membership ▫Ability to direct resources to (influence) other nodes 26

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 27

Accomplishments Starting a research project from scratch Learned Python Built a web crawler Learned to work with open source frameworks Survived REU experience 28

Outline Personal Experiences Project Overview Implementations ▫Scripts ▫Gephi ▫Web Crawler ▫EVC / PCC Results ▫Data Analysis Conclusion Accomplishments Future Work 29

Future Work Collect more data from Mizzou Facebook Improve the web crawler ▫Implement Parallel crawls ▫Make available for other researchers Refine our definition of social capital as needed Research on other social networks ▫Twitter, LinkedIn, etc. Using the previously mentioned principles, develop a new centrality method 30

Thank you! 31