Mining Social Ties Beyond Homophily Hongwei Liang * Ke Wang * Feida Zhu # * Simon Fraser University, Canada # Singapore Management University, Singapore.

Presentation transcript:

Mining Social Ties Beyond Homophily
Hongwei Liang*, Ke Wang*, Feida Zhu#
* Simon Fraser University, Canada
# Singapore Management University, Singapore
ICDE 2016, Helsinki, Finland

Outline
• Introduction & Motivation
• Problem Formulation
• Solution
• Evaluation
• Conclusion & Future Work

Integrating graphs and demographic data
• Graph data and demographic data are everywhere
• But either aspect of the data may be incomplete within a single social network
• More comprehensive analysis can be done by integrating them

Figure panels: (a) Graph topology, (b) User profile, combined by integration.

User profile table (excerpt):
ID   SEX   RACE     LOCATION
1    F     Asian    US
2    F     Latino   US
…    …     …        …

Facebook Dating App Example
All except black women preferred white men, while all men except Asians preferred Asian women.

Social Ties (Group Relationships)
• Form: (LHS node descriptor) → (RHS node descriptor)
• R1: (Sex: M) → (Sex: F, Race: Asian), conf = 7/14; supp = 7/15
• R2: (Sex: M, Race: Asian) → (Sex: F, Race: Asian), conf = 0; supp = 0
[Figure: example network; legend labels: US, Canada, Finland; Asian, Latino, White; M, F]

Homophily in Social Networks
• Homophily principle: love of the same
• The homophily effect is well known and is often "dominant"
  o R3: (Sex: F, Location: US) → (Sex: M, Location: US), conf = 4/6; supp = 4/15
• Homophily captures the "primary" bond
• The literature largely focuses on applications based on homophily
  o e.g. community detection, link prediction, friend/product recommendation

Beyond Homophily
• Unearth the treasures beyond homophily?
• Assume "Location" is homophilic in a dating network
  o R4: (Sex: F, Location: US) → (Sex: M, Location: Canada)
  o Homophily effect: (Sex: F, Location: US) → (Sex: M, Location: US), whose support is 4/15
• Standard confidence? conf = 2/6, not interesting
  vs.
• A new metric that removes homophily? nhp = 2 / (6 − 4) = 100%, interesting!
• Reads as: if a female from the US does NOT want her partner to be from the US, there is a high chance that she prefers a partner from Canada.
[Figure: example network; legend labels: US, Canada, Finland; M, F]
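The arithmetic on this slide can be replayed on a toy edge list. The sketch below is illustrative only: the edge list is made up so that its counts match the slide (15 links in total, 6 leaving (Sex: F, Location: US), of which 4 go to (Sex: M, Location: US) and 2 to (Sex: M, Location: Canada)), and the helper names `matches`, `count`, `homophily_rhs` and `metrics` are hypothetical. It also assumes that the homophily effect of a GR is obtained by resetting each homophily attribute on the RHS to its LHS value, as the R3 versus R4 comparison suggests; the paper's formal definition may differ in detail.

```python
from fractions import Fraction

def matches(node, descriptor):
    """True if the node carries every attribute value in the descriptor."""
    return all(node.get(attr) == value for attr, value in descriptor.items())

def count(edges, lhs, rhs=None):
    """Number of links whose source matches lhs (and whose target matches rhs, if given)."""
    return sum(1 for src, dst in edges
               if matches(src, lhs) and (rhs is None or matches(dst, rhs)))

def homophily_rhs(lhs, rhs, homophily_attrs):
    """ASSUMPTION: the homophily effect keeps the RHS but copies LHS values
    into every homophily attribute that appears on both sides."""
    out = dict(rhs)
    for attr in homophily_attrs:
        if attr in lhs and attr in rhs:
            out[attr] = lhs[attr]
    return out

def metrics(edges, lhs, rhs, homophily_attrs):
    """Return (supp, conf, nhp) as exact fractions; nhp is None if undefined."""
    n_lr = count(edges, lhs, rhs)
    n_l = count(edges, lhs)
    n_h = count(edges, lhs, homophily_rhs(lhs, rhs, homophily_attrs))
    supp = Fraction(n_lr, len(edges))
    conf = Fraction(n_lr, n_l) if n_l else None
    nhp = Fraction(n_lr, n_l - n_h) if n_l > n_h else None
    return supp, conf, nhp

# Made-up nodes and links, chosen only to reproduce the slide's counts.
f_us = {"Sex": "F", "Location": "US"}
m_us = {"Sex": "M", "Location": "US"}
m_ca = {"Sex": "M", "Location": "Canada"}
edges = [(f_us, m_us)] * 4 + [(f_us, m_ca)] * 2 + [(m_us, f_us)] * 5 + [(m_ca, f_us)] * 4

supp, conf, nhp = metrics(edges, f_us, m_ca, {"Location"})
print(supp, conf, nhp)
```

On this toy data R4 has conf = 2/6 (printed in lowest terms as 1/3) but nhp = 2 / (6 − 4) = 1, which is the contrast the slide draws.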

Potential Applications
• Targeted advertising
  o Homophily pattern: (JOB: Lawyer, PRODUCT: Stocks) → (PRODUCT: Stocks)
  o Non-homophily pattern: (JOB: Lawyer, PRODUCT: Stocks) → (PRODUCT: Bond)
• Helpful in link prediction, beyond homophily
• Friend/dating recommendation
• User behavior/habit analysis
• Profile completion
• Criminal investigation

Non-homophily Preference
• Non-homophily preference (nhp): the probability that a link goes to a node described by the RHS, given the LHS and excluding the homophily effect
• Example: for the GR (Sex: F, Location: US) → (Sex: M, Location: Canada), the homophily effect is (Sex: F, Location: US) → (Sex: M, Location: US)
• Captures "secondary bonds" beyond "primary bonds"
• nhp does not have the regular anti-monotonicity
  o Adding an attribute on the RHS may increase supp(homophily effect)


Problem: Mining Top-k GRs
• Given
  o a multi-dimensional information network
  o the setting of homophily for attributes
  o a supp threshold, an nhp threshold, and an integer k
• Goal
  o discover the top-k GRs, ranked by nhp and then by supp, each of which satisfies the supp and nhp thresholds
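The selection rule in this problem statement can be written down directly once candidate GRs and their scores are available. The sketch below is only a restatement of the ranking rule, not the mining algorithm: it assumes each candidate is a dictionary with hypothetical keys `gr`, `supp` and `nhp`, and the function name `top_k_grs` is made up for illustration.

```python
def top_k_grs(candidates, min_supp, min_nhp, k):
    """Keep GRs meeting both thresholds, rank by nhp then supp (descending), return k."""
    kept = [g for g in candidates
            if g["supp"] >= min_supp and g["nhp"] >= min_nhp]
    # Primary key: nhp; ties are broken by supp.
    kept.sort(key=lambda g: (g["nhp"], g["supp"]), reverse=True)
    return kept[:k]

# Made-up candidates for illustration.
candidates = [
    {"gr": "g1", "supp": 60, "nhp": 0.90},
    {"gr": "g2", "supp": 80, "nhp": 0.90},
    {"gr": "g3", "supp": 40, "nhp": 0.95},  # fails the supp threshold
    {"gr": "g4", "supp": 70, "nhp": 0.40},  # fails the nhp threshold
    {"gr": "g5", "supp": 55, "nhp": 0.60},
]
best = top_k_grs(candidates, min_supp=50, min_nhp=0.5, k=2)
print([g["gr"] for g in best])
```

Enumerating every candidate first and filtering afterwards is exactly the post-processing cost the deck sets out to avoid; the sketch only pins down what the final answer must look like.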

Challenges
• Storage
  o Space = [expression not preserved in the transcript], if a single table is used
• Computation
  o Exponential number of attribute-value combinations
  o nhp does not have anti-monotonicity
  o With supp pruning only: a small threshold is required, and post-processing is needed
• How to deal with them?
  o Storage: favourable data modeling
  o Computation: ingenious enumeration with efficient pruning strategies


Data Model
• Compact 3-table data representation: combines profile data and graph topology
• No redundant records; data are linked by pointers
• Space complexity: [expression not preserved in the transcript]

SFDF Enumeration
• Subset-First Depth-First (SFDF) enumeration
  o Subset-First: a reverse-style order in which all parts of supp, including that of the homophily effect, are available when computing nhp
  o Depth-First: only the current branch is materialized
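The transcript does not show the SFDF enumeration tree itself, so the following is only a generic illustration of the two properties named on this slide, not the authors' procedure. It enumerates the non-empty subsets of a list depth-first so that every set is produced after all of its non-empty proper subsets, and it keeps only the current branch in memory by using a generator. The function name `subset_first_dfs` is hypothetical.

```python
def subset_first_dfs(items):
    """Yield non-empty subsets (as tuples) depth-first, each after all its proper subsets.

    A node is extended only with items that come before its smallest chosen
    item, and roots are visited in list order, so every subset of a set lies
    either on its ancestor path or in an earlier branch.
    """
    def visit(current, smallest_index):
        yield current
        for j in range(smallest_index):
            yield from visit((items[j],) + current, j)

    for i in range(len(items)):
        yield from visit((items[i],), i)

order = list(subset_first_dfs(["A", "B", "C"]))
print(order)
```

For three items the order is A, B, AB, C, AC, BC, ABC. In a miner built on such an order, counts for smaller patterns would already be known when a larger pattern is scored, which is the property the slide attributes to "Subset-First".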

Dynamic Ordering
• How to make nhp anti-monotone?
• Dynamically order the homophily attributes, based on whether the same homophily attributes were enumerated in the LHS
• For GRs with the same [expression not preserved in the transcript], nhp is anti-monotone with the help of dynamic ordering
[Figure: dynamic ordering example, assuming both A and B are homophily attributes]

Multiple Pruning Strategies
• supp-based pruning
• nhp-based pruning
  o enabled with the help of dynamic ordering
• Top-k pruning tightens the nhp threshold
• The mining task finishes in one phase
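One common way to let a top-k requirement tighten a score threshold during a single pass is a bounded min-heap: once k results are held, the weakest of them becomes the new bar. The sketch below illustrates that general idea under stated assumptions; the class name `TopKTracker` and its methods are hypothetical, GRs are represented by plain string labels, and the transcript does not say how the authors implement this step.

```python
import heapq

class TopKTracker:
    """Holds the best k (nhp, supp, label) entries seen so far."""

    def __init__(self, k, min_nhp):
        self.k = k
        self.min_nhp = min_nhp
        self._heap = []  # min-heap: the weakest kept entry sits at index 0

    @property
    def threshold(self):
        """User threshold until k entries are held, then the k-th best nhp."""
        if len(self._heap) < self.k:
            return self.min_nhp
        return max(self.min_nhp, self._heap[0][0])

    def offer(self, nhp, supp, label):
        """Try to add a GR; return True if it enters the current top-k."""
        if nhp < self.threshold:
            return False
        entry = (nhp, supp, label)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return True
        if entry[:2] <= self._heap[0][:2]:
            return False
        heapq.heapreplace(self._heap, entry)
        return True

    def results(self):
        """Entries ranked by nhp, then supp, best first."""
        return sorted(self._heap, reverse=True)

tracker = TopKTracker(k=2, min_nhp=0.5)
tracker.offer(0.6, 10, "g1")
tracker.offer(0.9, 5, "g2")
print(tracker.threshold)  # the bar has risen from the user threshold to the 2nd-best nhp
```

An enumeration that can bound nhp for a whole branch could compare that bound against `tracker.threshold` and skip the branch, which is how a rising threshold translates into pruning.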

Data Partition and Pruning
• Partition attributes while computing supp and nhp?
• Recursive partition with linear CountingSort
• At each node, the GR representing the homophily effect is generated first, e.g. [GR not preserved in the transcript] is generated earlier than [GR not preserved in the transcript]
[Figure: recursive partition tree. Recoverable node labels: B with partitions b1, b2, b3; b1 B with partitions b1b1, b1b2, b1b3; b1b1 A with partitions b1b1a2, b1b1a3; b2 B with partitions b2b2, b2b3, b2b1; b1b2 A with partitions b1b2a1, b1b2a3]
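Counting sort groups records by a small-domain key in time linear in the number of records, so the size of each partition falls out of the sort as the support count. The sketch below shows a recursive partition of that kind under several assumptions that are not in the transcript: records are dictionaries of integer-coded attribute values, attributes are partitioned in a fixed order, and partitions below a minimum support are skipped. The names `counting_sort_partition` and `partition_recursive` are hypothetical, and the homophily-first ordering described on the slide is not modelled.

```python
def counting_sort_partition(rows, key, domain_size):
    """Stable counting sort of rows by key(row) in range(domain_size).

    Returns the reordered rows and a list of (value, start, end) slices.
    """
    counts = [0] * domain_size
    for row in rows:
        counts[key(row)] += 1
    starts = [0] * domain_size
    for v in range(1, domain_size):
        starts[v] = starts[v - 1] + counts[v - 1]
    ordered = [None] * len(rows)
    next_slot = list(starts)
    for row in rows:
        v = key(row)
        ordered[next_slot[v]] = row
        next_slot[v] += 1
    groups = [(v, starts[v], starts[v] + counts[v])
              for v in range(domain_size) if counts[v]]
    return ordered, groups

def partition_recursive(rows, attrs, domains, min_supp, prefix=()):
    """Yield (pattern, count) for each partition holding at least min_supp rows."""
    if not attrs:
        return
    attr = attrs[0]
    ordered, groups = counting_sort_partition(rows, lambda r: r[attr], domains[attr])
    for value, lo, hi in groups:
        if hi - lo < min_supp:
            continue  # supp-based pruning: drop this partition and everything below it
        pattern = prefix + ((attr, value),)
        yield pattern, hi - lo
        yield from partition_recursive(ordered[lo:hi], attrs[1:], domains, min_supp, pattern)

# Made-up integer-coded records for illustration.
rows = [{"A": 0, "B": 1}, {"A": 1, "B": 0}, {"A": 0, "B": 0}, {"A": 0, "B": 1}]
found = list(partition_recursive(rows, ["A", "B"], {"A": 2, "B": 2}, min_supp=2))
print(found)
```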


Experimental Evaluation
• Implementation: C++
• Platform: CentOS 6.4 with Intel 8-core processors at 2.53 GHz and 12 GB of RAM
• Real datasets
  o Pokec social network data (footnote 1): 1,436,515 users and 21,078,140 edges, 6 node attributes
  o DBLP co-authorship data (footnote 2): 28,702 authors and 66,832 directed edges, 2 node attributes and 1 edge attribute
• Evaluation measures
  o Interestingness
  o Efficiency (runtime)
[Zhao et al. SIGMOD 11]

Interestingness: Case Study
• A top GR from the Pokec data derives: [GR pair not preserved in the transcript]
  o This pair suggests a big difference between males and females in their preference for opposite-sex partners when looking for sexual partners
• A top GR from the DBLP data: [GR not preserved in the transcript]
  o Authors in the DB area often collaborate with those in the DM area when collaborating with authors outside their own area

Efficiency Study: Pokec Data
[Figure: efficiency comparison of the configurations A, A+B, A+B+C, and A+B+C+D]
• A: supp-based pruning
• B: compact 3-table data storage
• C: nhp-based pruning
• D: top-k pruning
Default parameter settings: minSupp = 50 (absolute value), minNhp = 50%, k = 100


Conclusion, Extensions and Future Work
• Conclusion
  o Mining social ties beyond homophily, with many potential applications
  o Compact data representation
  o Novel enumeration with multiple pruning strategies
  o Interestingness and efficiency study on real data
• Extensions and future work
  o Alternative metrics other than nhp, such as lift, Laplace, gain, etc.
  o Dealing with unstructured data
  o Predictive model

Q & A? Thanks!