Scalable Learning of Collective Behavior Based on Sparse Social Dimensions
Lei Tang, Huan Liu
CIKM ’09
Speaker: Hsin-Lan, Wang
Date: 2010/02/01


2 Outline
- Introduction
- Collective Behavior Learning
- Social Dimensions
- Algorithm
  - Edge-Centric View
  - K-means Variant
- Experiment Setup
- Experiment Results
- Conclusions and Future Work

3 Introduction
- Social media help people from all walks of life connect with each other.
- We study how networks in social media can help predict certain kinds of human behavior and individual preferences.

4 Introduction
- In social media, the connections within a single network are not homogeneous.
- However, this relation-type information is rarely available in practice.
- A framework based on social dimensions has been proposed to address this heterogeneity.

5 Introduction
- In the initial study, modularity maximization was used to extract social dimensions.
- With a huge number of actors, the resulting dimensions cannot even be held in memory.
- In this work, we propose an effective edge-centric approach to extract sparse social dimensions.

6 Collective Behavior Learning
- When people are exposed to a social network environment, their behavior can be influenced by the behavior of their friends.
- People are more likely to connect to others who share certain similarities with them.

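7 Collective Behavior Learning
- Suppose there are $K$ class labels $\mathcal{Y} = \{c_1, \dots, c_K\}$.
- Given a network $G = (V, E, Y)$, where $V$ is the vertex set, $E$ is the edge set, and $Y_i \subseteq \mathcal{Y}$ are the class labels of a vertex $v_i \in V$.
- Given known values of $Y_i$ for some subset of vertices $V^L$.
- How can we infer the values of $Y_i$ for the remaining vertices $V^U = V - V^L$?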

8 Social Dimensions

9 To address the heterogeneity present in connections, we have proposed a framework (SocDim) for collective behavior learning.
The SocDim framework is composed of two steps:
1. social dimension extraction
2. discriminative learning

10 Social Dimensions
- These social dimensions can be treated as features of actors.
- Since the network is converted into features, a typical classifier such as a support vector machine can be employed.

11 Social Dimensions
Concerns about the scalability of SocDim with modularity maximization:
- The social dimensions extracted by modularity maximization are dense.
- It requires computing the top eigenvectors of a modularity matrix of size n × n.
- The dynamic nature of networks calls for efficient updates of the model for collective behavior prediction.
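For readers who want the density concern spelled out, the block below writes the modularity matrix in its standard form (Newman's definition; the slides do not state the formula). The symbols $A$, $d$, $m$, $k$ and $S$ are notation assumed here, not taken from the slides.

```latex
% A: n x n adjacency matrix, d: vector of node degrees, m: number of edges
B = A - \frac{d\,d^{\top}}{2m}, \qquad B_{ij} = A_{ij} - \frac{d_i d_j}{2m}

% S: n x k matrix holding the top-k eigenvectors of B (the social dimensions)
S \in \mathbb{R}^{n \times k}, \qquad \text{storage}(S) = O(nk)
```

Even when $A$ is sparse, the rank-one term $d\,d^{\top}$ is generally dense, and eigenvectors typically have few exact zeros, so $S$ needs on the order of $nk$ stored values.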

12 Algorithm - Edge-Centric View Treat each edge as one instance, and the nodes that define edges as features.
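The edge-centric representation can be sketched as follows. This is a minimal illustration on a made-up toy network; the edge list and the helper name `edge_instances` are hypothetical and do not come from the slides.

```python
# Hypothetical toy network: an undirected edge list over actors 0..4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]


def edge_instances(edges):
    """Represent each edge as a sparse instance whose features are its two end nodes."""
    # Each instance maps node id -> 1; every other node feature is implicitly 0.
    return [{u: 1, v: 1} for u, v in edges]


instances = edge_instances(edges)
for edge, features in zip(edges, instances):
    print(edge, features)
```

Every instance has exactly two non-zero features, so the edge-by-node data stays sparse no matter how many actors the network has.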

13 Algorithm - Edge-Centric View
- Based on the features of each edge, we can cluster the edges into two sets.
- An actor is considered associated with an affiliation as long as any of his or her connections is assigned to that affiliation.

14 Algorithm - Edge-Centric View
- In summary, to extract social dimensions, we cluster the edges of a network, rather than its nodes, into disjoint sets.
- Because an actor cannot have more affiliations than connections, the social dimensions based on edge-centric clustering are guaranteed to be sparse.
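The summary above can be illustrated with a small sketch. It uses an ordinary k-means loop with cosine similarity, not the scalable k-means variant from the talk, and the toy network, the round-robin initialization, and the function names `cluster_edges` and `social_dimensions` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy network: two triangles that share actor 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]


def cluster_edges(edges, k, iters=20):
    """Plain k-means over edges; each edge is a sparse vector with 1s at its two end nodes."""
    # Deterministic round-robin start (an arbitrary choice for this sketch).
    assign = [i % k for i in range(len(edges))]
    for _ in range(iters):
        # Sparse centroid of a cluster: mean of its edge vectors, stored as node -> weight.
        sums = [defaultdict(float) for _ in range(k)]
        sizes = [0] * k
        for (u, v), c in zip(edges, assign):
            sums[c][u] += 1.0
            sums[c][v] += 1.0
            sizes[c] += 1
        centroids = [
            {n: w / sizes[c] for n, w in sums[c].items()} if sizes[c] else {}
            for c in range(k)
        ]

        def similarity(edge, centroid):
            # Cosine similarity between the edge vector and a centroid.
            norm = sum(w * w for w in centroid.values()) ** 0.5
            if norm == 0.0:
                return 0.0
            dot = centroid.get(edge[0], 0.0) + centroid.get(edge[1], 0.0)
            return dot / (norm * 2 ** 0.5)

        new_assign = [
            max(range(k), key=lambda c: similarity(e, centroids[c])) for e in edges
        ]
        if new_assign == assign:
            break
        assign = new_assign
    return assign


def social_dimensions(edges, assign):
    """An actor belongs to an affiliation if any of its edges is assigned to it."""
    dims = defaultdict(set)
    for (u, v), c in zip(edges, assign):
        dims[u].add(c)
        dims[v].add(c)
    return {node: sorted(cs) for node, cs in dims.items()}


assign = cluster_edges(edges, k=2)
dims = social_dimensions(edges, assign)
for node in sorted(dims):
    print(node, dims[node])
```

Whatever partition the clustering returns, each actor's affiliation list can be no longer than its number of edges, which is the sparsity guarantee stated on the slide.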

15 Algorithm - K-means Variant

16 Algorithm

17 Experiment Setup - Social Media Data

18 Experiment Results - Prediction Performance

19 Experiment Results - Prediction Performance

20 Experiment Results - Prediction Performance
Prediction performance on all the studied social media data is around % for the F1 measure. This is partly because:
- the data contain a large number of labels
- only the network information is employed
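The slide does not say which F1 variant is reported. As an illustration only, the sketch below computes a micro-averaged F1 for multi-label predictions; the function name `micro_f1` and the toy label sets are hypothetical and unrelated to the experimental data.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1: pool true/false positives and false negatives over all actors."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for three actors (not data from the experiments).
true_labels = [{"a", "b"}, {"c"}, {"a"}]
pred_labels = [{"a"}, {"c", "d"}, {"b"}]
score = micro_f1(true_labels, pred_labels)
print(score)  # 2 true positives, 2 false positives, 2 false negatives -> 0.5
```

With many candidate labels per actor, each missed or spurious label counts against the score, which is one way to see why a large label set makes high F1 values hard to reach.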

21 Experiment Results - Scalability Study

22 Experiment Results - Scalability Study

23 Experiment Results - Sensitivity Study

24 Conclusions and Future Work
- To address the scalability issue, we propose an edge-centric clustering scheme to extract social dimensions and a scalable k-means variant to handle edge clustering.
- The model based on the sparse social dimensions shows prediction performance comparable to that of earlier approaches to extracting social dimensions.

25 Conclusions and Future Work
- In reality, each edge can be associated with multiple affiliations, while our current model assumes only one dominant affiliation.
- The proposed EdgeCluster model is sensitive to the number of social dimensions.