RELATIONSHIP MINING IN SOCIAL NETWORKS
CS 8803 AIA
GROUP 16: Abhishek Saxena, Ankit Kharadi, Chirag Rajan
Outline: Introduction, Goal(s) of our Project, Social Networks, Principles, Phases of our project, Components of our project, Applications
Introduction: Social networks are growing rapidly, and they contain a wealth of information about the relationships among their members that can be mined.
Goal(s) of our Project: To uncover hidden associations among social network entities as functions of their visible relationships, and to visualize these relations.
Statistics about Top Social Networks
Table 1: Top 10 Social Networking Sites for April 2006 (US, Home and Work)

Site                 Apr-05   Apr-06   YOY Growth
MySpace               8210    38359       367%
Blogger              10301    18508        80%
Classmates Online    11672    12865        10%
YouTube                 NA    12505         NA
MSN Groups           12352    10570       -14%
AOL Hometown         11236     9590       -15%
Yahoo! Groups         8262     9165        11%
MSN Spaces            1857     7165       286%
Six Apart TypePad     5065     6711        32%
Xanga                 5202     6631        27%
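The YOY Growth column is the relative change between the two months, (Apr-06 - Apr-05) / Apr-05, rounded to a whole percent. A minimal Python check over a few rows of Table 1 (the `yoy_growth` helper is hypothetical; the numbers are copied from the table):

```python
def yoy_growth(before: float, after: float) -> int:
    """Year-over-year growth in percent, rounded to the nearest integer."""
    return round((after - before) / before * 100)

# A few rows copied from Table 1: (Apr-05, Apr-06)
rows = {
    "MySpace": (8210, 38359),
    "Blogger": (10301, 18508),
    "MSN Groups": (12352, 10570),
    "MSN Spaces": (1857, 7165),
}

for site, (apr05, apr06) in rows.items():
    print(f"{site}: {yoy_growth(apr05, apr06)}%")
```

YouTube has no April 2005 figure, so its growth is undefined (NA) rather than computable.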
Top Social Networking Sites
1) MySpace
2) Facebook
3) Takepart
4) Nextcat
5) LinkedIn
6) Friendster
7) Flickr
8) Massify
9) Idealist.org
10) Goodreads.com
Principles: Mining results should be governed by user preferences. Multiple visible relationships play a role in whether a hidden relationship exists.
Phases of our project
Phase 1: Gathering data
Phase 2: Transforming the data into a usable format
Phase 3: Mining named entities using a CRF-based extractor
Phase 4: Applying data mining techniques based on expectation maximization (EM)
Phase 5: Visualizing the results
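Phase 4 names expectation maximization but not the model it is applied to. As one hypothetical illustration, EM can estimate how much each visible relation (for example, co-publishing at a given conference) contributes to a set of observed links, by fitting the mixing weights of a mixture whose component likelihoods are fixed. The function `em_mixture_weights` and the toy likelihoods are assumptions for illustration, not the project's actual model:

```python
def em_mixture_weights(likelihoods, iterations=100):
    """Estimate mixing weights pi_k of a mixture with fixed components via EM.

    likelihoods[i][k] = p_k(x_i), the probability of observation i under
    component (relation) k. Assumes every observation has positive likelihood
    under at least one component. Returns weights that sum to 1.
    """
    n_obs = len(likelihoods)
    n_comp = len(likelihoods[0])
    pi = [1.0 / n_comp] * n_comp  # start from uniform weights

    for _ in range(iterations):
        totals = [0.0] * n_comp
        for row in likelihoods:
            # E-step: responsibility of each component for this observation
            weighted = [pi[k] * row[k] for k in range(n_comp)]
            norm = sum(weighted)
            for k in range(n_comp):
                totals[k] += weighted[k] / norm
        # M-step: each new weight is the average responsibility
        pi = [t / n_obs for t in totals]
    return pi


# Hypothetical: 3 observed links, 2 candidate relations
likelihoods = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(em_mixture_weights(likelihoods))
```

Because the weights stay non-negative and sum to 1, they can be read the same way as per-relation coefficients.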
What we used: We chose DBLP, the computer science bibliography, as the social network to mine. Through this project we hope to shed light on the advantages and issues of mining hidden relationships.
Components of our project: We used the following tools and languages to extract the data and process it into relational information: crawlers, parsers, the Stanford NER library, MATLAB, and JUNG.
Crawlers: We used crawlers (mostly WebSphinx) to crawl researchers' web pages and fetch data about their published papers, including the conferences in which they appeared and the years of publication.
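Once a page is fetched, the conference and year of each paper still have to be pulled out of free text. A minimal sketch, assuming citation text that mentions a known venue acronym followed by a four-digit year; the venue list and the `extract_venue_years` helper are hypothetical, and real publication pages vary far more:

```python
import re

# Hypothetical set of venues to look for; extend as needed.
VENUES = ("VLDB", "ICDE", "KDD", "SIGMOD")

# A known venue acronym, optional spaces/commas/apostrophes, then a 4-digit year.
PATTERN = re.compile(
    r"\b(" + "|".join(VENUES) + r")\b[\s',]*((?:19|20)\d{2})\b"
)

def extract_venue_years(text):
    """Return (venue, year) pairs for every venue-year mention in the text."""
    return [(venue, int(year)) for venue, year in PATTERN.findall(text)]


print(extract_venue_years("Proc. KDD 2005, pp. 1-10. In VLDB, 2003."))
```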
Stanford NER Library: The Stanford NER (named entity recognition) library is an open-source library that labels named entities in text. It detects three kinds of entities: Person, Location, and Organization. It uses a conditional random field (CRF) based extractor.
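The Stanford NER classifier can print its labels inline in a "slashTags" style, where each token is written as `word/TAG` and `O` marks tokens outside any entity. A minimal sketch that groups consecutive tokens sharing a tag into entity spans; the `group_entities` helper and the sample sentence are hypothetical:

```python
def group_entities(tagged):
    """Group consecutive word/TAG tokens with the same non-O tag into entities."""
    entities = []
    current_words, current_tag = [], None
    for token in tagged.split():
        # rsplit keeps any '/' inside the word itself
        word, tag = token.rsplit("/", 1)
        if tag == current_tag and tag != "O":
            current_words.append(word)
            continue
        # Tag changed: close the previous entity, if it was one
        if current_tag not in (None, "O"):
            entities.append((" ".join(current_words), current_tag))
        current_words, current_tag = [word], tag
    if current_tag not in (None, "O"):
        entities.append((" ".join(current_words), current_tag))
    return entities


print(group_entities(
    "Jane/PERSON Doe/PERSON spoke/O at/O Georgia/ORGANIZATION "
    "Tech/ORGANIZATION in/O Atlanta/LOCATION"
))
```

With this simple scheme, two different entities of the same type that appear directly next to each other would be merged into one.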
MATLAB: We used MATLAB to perform matrix analysis on the extracted graphs. MATLAB offers many tools for computing statistical measures over large data sets.
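The project's MATLAB code is not shown. As a hypothetical illustration of this kind of matrix analysis: if A is an author-by-conference matrix where A[i][c] counts author i's papers at conference c, then the product A·Aᵀ (written `A * A'` in MATLAB) scores each pair of authors by how much their publication venues overlap. A plain-Python sketch with made-up counts:

```python
def co_venue_matrix(a):
    """Return A * A^T for an author-by-conference count matrix A (list of rows).

    Entry [i][j] = sum over conferences c of A[i][c] * A[j][c].
    """
    n = len(a)
    return [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(n)]
            for i in range(n)]


# Hypothetical counts; columns are VLDB, ICDE, KDD, SIGMOD
A = [
    [2, 0, 1, 0],  # author 0
    [1, 1, 0, 0],  # author 1
    [0, 0, 3, 1],  # author 2
]
print(co_venue_matrix(A))  # [[5, 2, 3], [2, 2, 0], [3, 0, 10]]
```

The diagonal holds each author's own activity; the off-diagonal entries are the pairwise overlaps.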
JUNG: JUNG (Java Universal Network/Graph Framework) is a Java library for drawing graphs and visualizing complex relationships among entities. We use JUNG to visualize the relationships between authors.
Output
Query: Researcher X, Researcher Y, Researcher Z

Relation   Coefficient
VLDB       0.072
ICDE       0.181
KDD        0.602
SIGMOD     0.145
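The output reads as a weighted combination of per-conference relations: the four coefficients sum to 1.0, and KDD carries most of the weight for this query. Assuming each conference defines an author-by-author relation matrix (here, 1 if both authors published at that venue), a combined relation is the coefficient-weighted sum of those matrices. A minimal sketch using the coefficients above; the `combine_relations` helper and the relation matrices are hypothetical:

```python
# Coefficients from the output above
COEFFICIENTS = {"VLDB": 0.072, "ICDE": 0.181, "KDD": 0.602, "SIGMOD": 0.145}

def combine_relations(relations, coefficients):
    """Weighted sum of same-sized relation matrices: sum_k w_k * M_k."""
    names = list(relations)
    n = len(relations[names[0]])
    combined = [[0.0] * n for _ in range(n)]
    for name in names:
        w = coefficients[name]
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * relations[name][i][j]
    return combined


# Hypothetical 3-author relations: 1 if both authors published at the venue
relations = {
    "VLDB":   [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    "ICDE":   [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    "KDD":    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    "SIGMOD": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
}
combined = combine_relations(relations, COEFFICIENTS)
print(combined[0][1], combined[0][2])
```

Here authors 0 and 2, who share ICDE and KDD, end up more strongly related (about 0.783) than authors 0 and 1, who share VLDB and ICDE (about 0.253).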
Applications: Friend suggestion, Targeted marketing, Network prediction
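Friend suggestion is the most direct of these applications. A common baseline, not necessarily what this project used, ranks users who are not yet connected to someone by how many neighbors they share with that person (common-neighbor count). A minimal sketch over a hypothetical adjacency dictionary; `suggest_friends` is an illustrative helper:

```python
def suggest_friends(graph, user, k=3):
    """Rank users not yet connected to `user` by number of shared neighbors."""
    friends = graph[user]
    scores = {}
    for friend in friends:
        for candidate in graph[friend]:
            if candidate != user and candidate not in friends:
                scores[candidate] = scores.get(candidate, 0) + 1
    # Highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical undirected graph as adjacency sets
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
print(suggest_friends(graph, "ann"))  # ['dan', 'eve']
```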
Thank You Questions/Comments?