Mining Collaboration Patterns


Mining Collaboration Patterns from a Large Developer Network (PP-DSN)
MF1432031 李玉

Mining Collaboration Patterns
Internet, communication devices
Developers from diverse locations
Globally distributed software development
Common interests
Mining collaboration patterns: learn, aid
Dynamics, properties, weaknesses

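Mining Collaboration Patterns
Extract patterns:
High level of detail: network-level statistics
Low level of detail: topological patterns
SourceForge.net (the world's largest open-source software development platform and repository, providing open-source projects with a platform for storage, collaboration, and release)
A large network graph: node ---> developer, edge ---> collaboration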

Mining Collaboration Patterns
Q1: How connected are the developers? Are all developers connected to every other developer in the network? If not, how many clusters of connected developers are there in the network?
Q2: What are some characteristics of developer collaboration clusters?
Q3: What are some common topological collaboration patterns appearing in these developer collaboration clusters?
Q4: Within a connected collaboration cluster, are all developers connected to every other developer within 6 hops, following the small-world phenomenon?

Definitions and Concepts
Collaboration graph G(N, E, NL): undirected; N is the set of nodes (developers), E the set of edges [u, v], NL the node labels.
Collaboration pattern P(N, E).
Definition 1 (Sub-Graph Isomorphism): Consider two graphs G = (N, E, NL) and G’ = (N’, E’, NL’). A sub-graph isomorphism is an injective function f : N → N’ s.t. (1) ∀n ∈ N, l(n) = l’(f(n)); (2) ∀(u, v) ∈ E, (f(u), f(v)) ∈ E’. The function or mapping f is referred to as the embedding of G in G’.
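
Definition 1 can be checked directly on small graphs. The following is a minimal brute-force sketch, not the method used in the presentation: it tries to build the embedding f node by node with backtracking. The function name `find_embedding` and the list/dict graph representation are assumptions made for illustration, and the search is exponential in the worst case, so it only suits small patterns.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def find_embedding(
    nodes: List[Node], edges: List[Edge], labels: Dict[Node, str],
    big_nodes: List[Node], big_edges: List[Edge], big_labels: Dict[Node, str],
) -> Optional[Dict[Node, Node]]:
    """Return an injective, label- and edge-preserving map f: N -> N', or None."""
    # The graphs are undirected, so store every edge of G' in both directions.
    big_adj: Set[Edge] = set()
    for u, v in big_edges:
        big_adj.add((u, v))
        big_adj.add((v, u))

    def extend(i: int, f: Dict[Node, Node]) -> Optional[Dict[Node, Node]]:
        if i == len(nodes):
            return dict(f)                       # every node of G is mapped
        n = nodes[i]
        for cand in big_nodes:
            if cand in f.values():               # keep f injective
                continue
            if labels[n] != big_labels[cand]:    # condition (1): labels agree
                continue
            f[n] = cand
            # Condition (2), checked on edges whose endpoints are both mapped.
            ok = all((f[u], f[v]) in big_adj
                     for u, v in edges if u in f and v in f)
            if ok:
                result = extend(i + 1, f)
                if result is not None:
                    return result
            del f[n]                             # backtrack
        return None

    return extend(0, {})
```

Called with a two-node path as G and a triangle as G’ (all nodes carrying the same label), the function returns one valid embedding; with the roles swapped it returns None, because a triangle has more edges than a path can host.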

Definitions and Concepts
Definition 2 (Pattern Match): A pattern P = (N, E) matches a graph G = (N’, E’, NL’) if there exists an injective function f : N → N’ s.t. ∀(u, v) ∈ E, (f(u), f(v)) ∈ E’.
Definition 3 (Frequent Pattern Mining): Given a graph dataset GSet and a threshold msup, find all patterns that appear in more than msup graphs in GSet. (If P is frequent, then any P’ that is a sub-graph of P is also frequent.)
Definition 4 (Closed Graphs): Given a set of graphs GSet, a graph pattern g is closed if there does not exist another pattern g’ where g’ is a super-graph of g and both g and g’ match the same set of graphs in GSet. If there exists such a g’, we say that g is subsumed by g’.

Definitions and Concepts Definition 5 (Frequent Closed Pattern Mining): Given a graph dataset GSet and a threshold msup, find all patterns that are frequent and closed.
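
Definitions 2 to 5 can be combined into a small filter. The sketch below is illustrative only: it assumes a list of candidate patterns is already given (a real miner enumerates them itself), so "closed" here means closed relative to that candidate list. The names `graph`, `matches`, and `frequent_closed` are hypothetical, and matching is brute force over node permutations.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple

# A graph is (nodes, undirected edges); each edge is a frozenset of two nodes.
Graph = Tuple[Tuple[str, ...], FrozenSet[FrozenSet[str]]]


def graph(nodes, edges) -> Graph:
    return tuple(nodes), frozenset(frozenset(e) for e in edges)


def matches(p: Graph, g: Graph) -> bool:
    """Definition 2: some injective map sends every edge of p to an edge of g."""
    p_nodes, p_edges = p
    g_nodes, g_edges = g
    for image in permutations(g_nodes, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if all(frozenset(f[u] for u in e) in g_edges for e in p_edges):
            return True
    return False


def frequent_closed(candidates: List[Graph], gset: List[Graph], msup: int) -> List[Graph]:
    # For each candidate, the indices of the graphs in gset that it matches.
    support: Dict[int, FrozenSet[int]] = {
        i: frozenset(j for j, g in enumerate(gset) if matches(p, g))
        for i, p in enumerate(candidates)
    }
    # Definition 3: frequent means matching more than msup graphs.
    frequent = [i for i in support if len(support[i]) > msup]

    def size(q: Graph) -> int:
        return len(q[0]) + len(q[1])

    closed = []
    for i in frequent:
        p = candidates[i]
        # Definition 4: p is subsumed by a strictly larger super-graph
        # that matches exactly the same graphs.
        subsumed = any(
            j != i
            and support[j] == support[i]
            and size(candidates[j]) > size(p)
            and matches(p, candidates[j])
            for j in frequent
        )
        if not subsumed:
            closed.append(p)
    return closed
```

For example, if a single edge and a three-node chain match exactly the same graphs, only the chain is reported, since it subsumes the edge.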

Approach
Build the collaboration network.
Extract clusters, i.e. connected components (cc), and store them in a database (CGD).
Top-k topological pattern mining (graph mining & graph matching):
CGD is divided into CGDl and CGDs.
Mine frequent closed patterns in CGDs, giving sup(P, CGDs).
Match each pattern P in CGDl, giving sup(P, CGDl).
sup(P, CGD) = sup(P, CGDs) + sup(P, CGDl)
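
The first steps of this approach can be sketched as follows, under stated assumptions: clusters are extracted with a breadth-first search, and the database is split by a node-count threshold. The parameter `max_small_nodes` is hypothetical and stands in for whatever size criterion separates CGDs from CGDl.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable


def connected_components(edges: Iterable[Tuple[Node, Node]],
                         nodes: Iterable[Node] = ()) -> List[Set[Node]]:
    """Return the node sets of all connected components (clusters)."""
    adj: Dict[Node, Set[Node]] = {}
    for n in nodes:                      # isolated developers have no edges
        adj.setdefault(n, set())
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen: Set[Node] = set()
    components: List[Set[Node]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp = {start}
        queue = deque([start])
        while queue:                     # breadth-first search from start
            n = queue.popleft()
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    comp.add(m)
                    queue.append(m)
        components.append(comp)
    return components


def split_by_size(components: List[Set[Node]], max_small_nodes: int):
    """Split the component database into (small, large) by node count.

    max_small_nodes is a hypothetical threshold chosen for illustration.
    """
    small = [c for c in components if len(c) <= max_small_nodes]
    large = [c for c in components if len(c) > max_small_nodes]
    return small, large
```

Because every component lands in exactly one of the two parts, the number of components a pattern matches in the whole database is the sum of its counts in the two parts, which is what the formula sup(P, CGD) = sup(P, CGDs) + sup(P, CGDl) expresses.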

Experiment
Initial data: Inactive (no developers & >=100 downloads): 192,706 ---> 28,087
Use contributor information, not SVN/CVS committer identifiers.
How connected are the developers? (55,694 developers)
The network is not a single connected graph; it has many disjoint components (6,744).
The share of developers who work alone is very low: 1.5% (838).
There is one very large cluster of 30,111 developers (54.07%), the core community.
The other clusters are much smaller (the second largest has 117).

Experiment
What are some characteristics of developer collaboration clusters?
Power law (Matthew effect): the product of the number of links a node has and the number of such nodes is a constant (rich-get-richer).
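
A power law has the form y = c · x^(−α), which is a straight line on log-log axes; the description above (the product of x and y is constant) is the special case α = 1. As a rough sketch, the exponent can be estimated with a least-squares line through the log-log points. The function name `fit_power_law` is an assumption, and log-log regression is only a crude estimator, not necessarily the procedure behind the presentation's results.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log y = log c - alpha * log x; returns (alpha, c).

    All xs and ys must be positive, and xs must contain at least two
    distinct values.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx                    # slope of the log-log line
    alpha = -slope
    c = math.exp(mean_y - slope * mean_x)
    return alpha, c
```

Here xs could be cluster sizes (or degrees) and ys the number of clusters (or nodes) with that value. Data in which x · y is exactly constant yields α = 1 and c equal to that constant.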

Experiment
Connectivity: edges/nodes
Degree of nodes: number of neighbors (largest cc)
This suggests that having indirect “neighbors” helps foster more collaboration among developers.
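
Both measures are simple to compute from an edge list. The sketch below assumes an undirected graph given as pairs of developer names, with no self-loops; the function names are chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Node = Hashable


def _unique_edges(edges: Iterable[Tuple[Node, Node]]):
    # Undirected graph: (u, v) and (v, u) are the same collaboration.
    return {frozenset(e) for e in edges}


def connectivity(edges: Iterable[Tuple[Node, Node]]) -> float:
    """Connectivity as defined on the slide: number of edges / number of nodes."""
    unique = _unique_edges(edges)
    nodes = {n for e in unique for n in e}
    return len(unique) / len(nodes)


def degrees(edges: Iterable[Tuple[Node, Node]]) -> Counter:
    """Degree of each node: the number of its distinct neighbors."""
    deg: Counter = Counter()
    for e in _unique_edges(edges):
        for n in e:
            deg[n] += 1
    return deg
```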

Experiment
What are some common topological collaboration patterns?
Large (more than 254 nodes and 254 edges)
Small (more than 254 nodes or 254 edges)
Very small

Experiment
Analysis:
Frequent patterns: small sizes, most 6 nodes.
Triadic closure principle: (G2, G3, 96.5%), (G6, G7, 94.2%), (G7, G8, 99.6%).
Least dense: chain. Most dense: complete graph. (G2, G3, 96.5%), (G4, G8, 92.9%), (G10, G20, 90.0%). This suggests that indirect links are likely to turn into direct links.
Lower-rank patterns are derived from higher-rank ones by expanding a hub: (G2, G1), (G6, G3).
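
The triadic closure principle says that two developers who share a collaborator (an open triad) tend to become directly linked (a closed triad). As a related illustration, and not the computation behind the percentages above, the sketch below counts open and closed triads in a graph; their ratio is the standard transitivity measure. The function names are assumptions.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def triad_counts(edges: Iterable[Tuple[Node, Node]]) -> Tuple[int, int]:
    """Return (open, closed) counts of connected triples a - center - b.

    A triple is closed when a and b are also directly linked. Each triangle
    contributes three closed triples, one per choice of center.
    """
    adj: Dict[Node, Set[Node]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    open_triads = closed_triads = 0
    for center, neighbors in adj.items():
        for a, b in combinations(neighbors, 2):
            if b in adj[a]:
                closed_triads += 1       # the indirect link a..b is also direct
            else:
                open_triads += 1         # a and b are linked only via center
    return open_triads, closed_triads


def transitivity(edges: Iterable[Tuple[Node, Node]]) -> float:
    """Fraction of connected triples that are closed."""
    open_triads, closed_triads = triad_counts(list(edges))
    return closed_triads / (open_triads + closed_triads)
```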

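Experiment
Does six degrees of separation exist?
Six degrees of separation: no more than five people stand between you and any stranger; that is, you can get to know any stranger through at most five intermediaries.
Largest CC: 30,111 nodes; the average shortest path length is 6.55, so the phenomenon exists.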
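
The average shortest path length of a connected component can be computed by running a breadth-first search from every node. This is a minimal sketch assuming an unweighted, undirected, connected graph with at least two nodes; the function name is hypothetical. Its cost grows with nodes × (nodes + edges), so on a component of about 30,000 nodes a pure-Python run would be slow, and sampling source nodes is a common shortcut.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def average_shortest_path(edges: Iterable[Tuple[Node, Node]]) -> float:
    """Mean hop distance over all ordered pairs of distinct nodes."""
    adj: Dict[Node, Set[Node]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                     # BFS gives hop counts from source
            n = queue.popleft()
            for m in adj[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    queue.append(m)
        total += sum(dist.values())
        pairs += len(dist) - 1           # exclude the source itself
    return total / pairs
```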

Conclusions
Not all developers are connected to every other developer; there are many disjoint clusters.
Nodes and connectivity follow a power law, but edges and node degrees do not.
Project sizes, developer participation, cluster sizes (largest one)
The small-world phenomenon exists.
Indirect ---> direct links
Expand hub
Collaborative networks are not random; they are preferentially connected.
Linchpin nodes

Thank you!