A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks Power law graphs Small world graphs.
Advertisements

Complex Networks Advanced Computer Networks: Part1.
The Theory of Zeta Graphs with an Application to Random Networks Christopher Ré Stanford.
Midwestern State University Department of Computer Science Dr. Ranette Halverson CMPS 2433 – CHAPTER 4 GRAPHS 1.
Emergence of Scaling in Random Networks Albert-Laszlo Barabsi & Reka Albert.
Analysis and Modeling of Social Networks Foudalis Ilias.
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
The influence of search engines on preferential attachment Dan Li CS3150 Spring 2006.
Hierarchy in networks Peter Náther, Mária Markošová, Boris Rudolf Vyjde : Physica A, dec
1 Evolution of Networks Notes from Lectures of J.Mendes CNR, Pisa, Italy, December 2007 Eva Jaho Advanced Networking Research Group National and Kapodistrian.
Topology Generation Suat Mercan. 2 Outline Motivation Topology Characterization Levels of Topology Modeling Techniques Types of Topology Generators.
Directional triadic closure and edge deletion mechanism induce asymmetry in directed edge properties.
Networks. Graphs (undirected, unweighted) has a set of vertices V has a set of undirected, unweighted edges E graph G = (V, E), where.
Scale-free networks Péter Kómár Statistical physics seminar 07/10/2008.
Alon Arad Alon Arad Hurst Exponent of Complex Networks.
Towards Understanding: A Study of the SourceForge.net Community using Modeling and Simulation Yongqin Gao Greg Madey Computer Science & Engineering University.
On Distinguishing between Internet Power Law B Bu and Towsley Infocom 2002 Presented by.
DESIGN AND IMPLEMENTATION OF A WEB MINING RESEARCH SUPPORT SYSTEM
Random Graph Models of Social Networks Paper Authors: M.E. Newman, D.J. Watts, S.H. Strogatz Presentation presented by Jessie Riposo.
(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.
Analysis and Modeling of the Open Source Software Community Yongqin Gao, Greg Madey Computer Science & Engineering University of Notre Dame Vincent Freeh.
MINING AND MODELING THE OPEN SOURCE SOFTWARE COMMUNITY
An Investigation into the Free/Open Source Software Phenomenon using Data Mining, Social Network Theory, and Agent-Based Greg Madey Computer Science &
Topology and Evolution of the Open Source Software Community Advisors: Dr. Vincent W. Freeh Dr. Kevin Bowyer Supported in part by the National Science.
Social Network Analysis Prof. Dr. Daning Hu Department of Informatics University of Zurich Mar 5th, 2013.
10 December, 2008 CIMCA2008 (Vienna) 1 Statistical Inferences by Gaussian Markov Random Fields on Complex Networks Kazuyuki Tanaka, Takafumi Usui, Muneki.
Yongqin Gao, Greg Madey Computer Science & Engineering Department University of Notre Dame © Copyright 2002~2003 by Serendip Gao, all rights reserved.
Management Information Systems, 4 th Edition 1 Chapter 8 Data and Knowledge Management.
Class 9: Barabasi-Albert Model-Part I
Lecture 10: Network models CS 765: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
Lake Arrowhead 2005Scott Christley, Understanding Open Source Understanding the Open Source Software Community Presented by Scott Christley Dept. of Computer.
The simultaneous evolution of author and paper networks
Network (graph) Models
Database Systems: Design, Implementation, and Management Tenth Edition
2010 IEEE Global Telecommunications Conference (GLOBECOM 2010)
Chapter 12: Simulation and Modeling
Structures of Networks
OPERATING SYSTEMS CS 3502 Fall 2017
Shortest Path Problem Under Triangular Fuzzy Neutrosophic Information
Chapter 4: Business Process and Functional Modeling, continued
Hiroki Sayama NECSI Summer School 2008 Week 2: Complex Systems Modeling and Networks Network Models Hiroki Sayama
ITEC 3220A Using and Designing Database Systems
Object-Oriented Analysis and Design
Lecture 1: Complex Networks
Learning to Generate Networks
Applications of graph theory in complex systems research
Empirical analysis of Chinese airport network as a complex weighted network Methodology Section Presented by Di Li.
The Entity-Relationship Model
任課教授:陳朝鈞 教授 學生:王志嘉、馬敏修
Section 8.6: Clustering Coefficients
Chapter 2 Database Environment.
Network Science: A Short Introduction i3 Workshop
Section 8.6 of Newman’s book: Clustering Coefficients
The Watts-Strogatz model
An Efficient method to recommend research papers and highly influential authors. VIRAJITHA KARNATAPU.
File Systems and Databases
Effective Social Network Quarantine with Minimal Isolation Costs
Models of Network Formation
Models of Network Formation
Peer-to-Peer and Social Networks Fall 2017
Indexing and Hashing Basic Concepts Ordered Indices
Models of Network Formation
Peer-to-Peer Information Systems Week 6: Performance
Clustering Coefficients
Database Design Hacettepe University
Korea University of Technology and Education
Network Science: A Short Introduction i3 Workshop
A Semantic Peer-to-Peer Overlay for Web Services Discovery
Network Models Michael Goodrich Some slides adapted from:
Advanced Topics in Data Mining Special focus: Social Networks
Presentation transcript:

A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS Jin Xu, Yongqin Gao, Jeff Goett, Gregory Madey Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556 Supported by NSF, CISE/IIS-Digital Science and Technology October 2, 2003 The topic I will be talking about is “A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS”. This work was done with Yongqin Gao, Jeff Goett and Dr. Gregory Madey. This research was partially supported by the US National Science Foundation, CISE/IIS-Digital Science and Technology.

OUTLINE INTRODUCTION SOCIAL NETWORK MODEL OPEN SOURCE SOFTWARE (OSS) NETWORK DOCKING RESULTS CONCLUSIONS & FUTURE WORK I will first give a brief introduction about docking experiment. Then, present different models of social networks. Based on social network theory, Open Source Software network models will be discussed. Next, I will describe docking procedure on four models of Open source software and present docking results. At last, Conclusions and Future work will be given.

INTRODUCTION Three ways of Validation Docking Comparison with real phenomenon Comparison with mathematical models Docking with other simulations Docking Verify simulation correctness Discover pros & cons of toolkits There are three ways to validate an agent-based simulation. The first way is to compare the simulation output with the real phenomenon. This way is relatively simple and straightforward. However, often we cannot get complete real data on all aspect of the phenomenon. The second way compares agent-based simulation results with results of mathematical models. The disadvantage of this way is that we need to construct mathematical models which may be difficult to formulate for a complex system. The third way is by docking with other simulations of the same phenomenon. Docking is the process of aligning two dissimilar models to address the same question or problem, to investigate their similarities and their differences. It can verify the correctness of simulations and discover advantages and disadvantages Of different development toolkits.

OSS DOCKING EXPERIMENT Four Models of OSS Developer Social Network Random graphs Preferential attachment Preferential attachment with constant fitness Preferential attachment with dynamic fitness Agent-based Simulation Swarm Repast Our docking experiment are part of a study of the Open Source Software phenomenon. Data about the SourceForge OSS developer site has been collected for over 2 years. Developer membership in projects is used to model the social network of developers. Social networks based on random graphs, preferential attachment, preferential attachment with constant fitness, and preferential attachment with dynamic fitness are modeled and compared to collected data. We use two agent-based simulation tools—swarm and repast to build simulations for these four models. Repast and Swarm simulations are docked by comparing properties of social networks such as degree distribution, diameter and clustering coefficient.

SOCIAL NETWORK MODEL Graph Representation ER (random) Graph Node/vertex – Social Agent Edge/link – Relationship Index/degree - The number of edges connected to a node ER (random) Graph Edges attached in a random process No power law distribution Social network theory is the basis of the conceptual framework through which we view the OSS developer activities. The theory, built on mathematical graph theory, depicts interrelated social agents as nodes or vertices of a graph and their relationships as links or edges drawn between the nodes. The number of edges (or links) connected to a node (or vertex) is called the index or degree of the node. Early work in this field by Erdos and Renyi focuses on random graphs, i.e., those where edges between vertices were attached in a random process (called ER graphs here). However, the distributions of index values for the random graphs do not agree with the observed power law distribution for many social networks, including the OSS developer network at SourceForge.

SOCIAL NETWORK MODEL(Cont.) Watts-Strogatz (WS) Model Include some random reattachment No power law distribution Barabasi-Albert (BA) Model with Preferential Attachment Addition of preferential attachment Power law distribution BA Model with Constant Fitness Addition of random fitness BA Model with Dynamic Fitness Some other evolutionary mechanisms include: 1) the Watts-Strogatz (WS) model, 2) the Barabasi-Albert (BA) model with preferential attachment, 3) the modified BA model with fitness, and 4) an extension of the BA model (with fitness) to include dynamic fitness based on project life cycle reported. The WS model captures the local clustering property of social networks and was extended to include some random reattachment to capture the small world property, but failed to display the power-law distribution of index values. The BA model added preferential attachment, both preserving the realistic properties of the WS model and also displaying the power-law distribution. The BA model was extended with the addition of random fitness to capture the fact that sometimes newly added nodes grow edges faster than previously added nodes.

OSS NETWORK A Classic Example of a Dynamic Social Network Two Entities: developer, project Graph Representation Node – developers Edge – two developers are participating in the same project Activities Create projects Join projects Abandon projects Continue with current projects The Open Source Software community is a classic example of a dynamic social network. In our model of the OSS collaboration network, there are two entities -- developer and project. The network can be illustrated as a graph. In this network, nodes are developers. An edge will be added if two developers are participating in the same project. Edges can be removed if two developers are no longer participating on the same project. A developer can have several activities: create new projects, join existing projects, abandon Projects or continue with same projects.

OSS MODEL Agent: developer Each Time Interval: Certain number developers generated New developers: create or join Old developers: create, join, abandon, idle Update preference for preferential models We use agent-based modeling to simulate the OSS development community. Unlike developers, projects are passive elements of the social network. Thus, we only define developers as the agents which encapsulate a real developer's possible daily interactions with the development network. Our simulation is time stepped instead of event driven, with one day of real time as a time step. Each day, a certain number of new developers are created. Newly created developers use decision rules to create new projects or join other projects. Also, each day existing developers can decide to abandon a randomly selected project, to continue their current projects, or to create a new project. If a preferential model is used, Developers’ and projects’ preference need to be updated.

Clustering coefficient DOCKING PROCESS Parameters OSS Models Toolkits Diameter ER BA BAC BAD Swarm Degree distribution Docking Clustering coefficient Repast This Figure shows the docking process. The initial simulation was written using Swarm. Our docking process began when the author of the swarm simulation wrote the docking specification. Then, the Repast version was written based on the docking specification. Swarm simulations and Repast simulations are docked for four models of the OSS network. Simulations are validated by comparing four network attributes generated by running these two simulation models—diameter, degree distribution, clustering coefficient and community size. Community size

SWARM SIMULATION ModelSwarm ObserverSwarm main Developer (agent) Creats developers Controls the activities of developers in the model Generate a schedule ObserverSwarm Collects information and draws graphs main Developer (agent) Properties: ID, degree, participated projects Methods: daily actions Our swarm simulation has a hierarchical structure which consists of a {\em developer} class, a {\em modelswarm} class, an {\em observerswarm} class and a {\em main} program. The {\em modelswarm} handles creating developers and controls the activities of developers. In {\em modelswarm}, a schedule is generated to define a set of activities of the agents. The {\em observerswarm} is used to implement data collection and draw graphs. The {\em main} program is a driver to start the whole simulation. The core of a swarm simulation consists of a group of agents. Agents in our simulation are developers. Each developer is an instance of a Java class. A developer has an identification id, a degree which is the number of links, and a list of projects participated by this developer. Furthermore, a developer class has methods to describe possible daily actions: create, join, abandon a project or continue the developer's current collaborations.

REPAST SIMULATON Model Developer (agent) Project Edge Creates and controls the activities of developers Collects information and draws graphs Network display Movie Snapshot Developer (agent) Project Edge Our RePast simulation of OSS developer network consists of a {\em model} class, a {\em developer} class, an {\em edge} class and a {\em project} class. The class structure of the simulation is different from that of the Swarm simulation. This is due in part to the graphical network display feature of Repast. The model classis responsible for creation and control of the activities of developers. Furthermore, information collection and display are also encapsulated in the {\em model} class. The {\em developer} class is similar to that in Swarm simulation. An {\em edge} class is used to define an edge in OSS network. We also create a {\em project} class with properties and methods to simulate a project.

DOCKING PROCEDURE Process: comparisons of parameters corresponding models. Findings: Different Random Generators Databases creation errors in the original version Different starting time of schedulers Our docking process sought to verify our Repast migration against the original Swarm simulation. To do so, the process began with a comparison of network parameters between corresponding models. Upon finding differences, we compared each developer's actions. In our docking process, we found that swarm and Repast use different random generator. For example, the Repast simulation used the COLT random number generator from the European Laboratory for Particle Physics (CERN). Different random generator causes systematic differences between the two simulations' outputted data. We ran the two simulations using the exact same set of random numbers: each simulation used the same random number generator with the same seed and determined that the random number generators did not cause this systematic difference. To determine the exact reasons for this difference, we had the simulations log the action that each developer took during each step. Comparing these logs, two reasons for the differences emerged. First, we determined that one simulation would occasionally throw an SQL Exception (we store simulation data in a relational database for post-simulation analysis). Such an error can cause more discrepancies between the two simulations at future time steps since the developer's previous actions affect its future actions. We found the cause of this error to be a problem with the primary keys in the links table of our SQL database (this is a programming bug). Furthermore, We found that the Swarm scheduler begins at time step 0 while the Repast scheduler begins with time 1. Thus, Swarm had actually performed one extra time step. With these two problems corrected, the corresponding logs of \the developers' actions matched. Using the same sequence of random numbers, the Swarm and Repast simulations produced identical output

DOCKING PARAMETERS Diameter Degree distribution Average length of shortest paths between all pairs of vertices Degree distribution The distribution of degrees throughout a network Clustering coefficient (CC) CCi: Fraction representing the number of links actually present relative to the total possible number of links among the vertices in its neighborhood. CC: average of all CCi in a network Community size Degree distribution, diameter and clustering coefficient are frequent attributes used to describe a network. The diameter of a network is the maximum distance between any pair of connected nodes. The diameter can also be defined as the average length of the shortest paths between any pair of nodes in the graph. In this paper, the second definition is used since the average value is more suitable for studying the topology of the OSS network. The neighborhood of a node consists of the set of nodes to which it is connected. The clustering coefficient of a node is a fraction representing the number of links actually present relative to the total possible number of links among the nodes in its neighborhood. The clustering coefficient of a graph is the average of all the clustering coefficients of the nodes . Degree distribution is the distribution of the degree throughout the network. Degree distribution was believed to be a normal distribution, but Albert and Barabasi recently found it fit a power law distribution in many real networks

DIAMETER ER BA This figure shows the evolution of the diameter of the network with the time period. We can see that Repast simulations and Swarm simulations are docked. In the real SourceForge developer collaboration network, the diameter of the network decreases as the network grows. In our models, we can observe that ER model does not fit the SourceForge network, while other three models match the real network. However, there are slight differences between Swarm results and Repast results. We believe this difference is caused by different random generators associated with RePast and Swarm. BAC BAD

DEGREE DISTRIBUTION ER BA BAC BAD This figure \ref{degree_distribution} gives developer distributions in four models implemented by Swarm and Repast. The $X$ coordinate is the number of projects in which each developer participated, and the $Y$ coordinate is the number of developers in the related categories. From the figure, we can observe that there is no power law distribution in ER model. The distribution looks more like the mathematically proven normal distribution. Developer distributions in the other three models match the power law distribution. BAC BAD

CLUSTERING COEFFICIENT BA Clustering coefficients for the developer network as a function of time is shown in this Figure. All models are docked very well. We can observe the decaying trend of the clustering coefficient in all four models. The reason is that with the evolution of the developer network, two co-developers will less likely join a new project together because their participated projects are approaching their limits. BAC BAD

COMMUNITY SIZE DEVELOPMENT ER BA This figure shows the total number of developers and projects relative to the time period in four models, which describe the developing trends of size of developers and projects in the network. The upper part is developers’ trend for both simulations, and the lower part is the projects’ trend. The size of developers and projects are almost the same for Swarm and RePast simulations. BAC BAD

CONCLUSION Same Results for Both Simulations Better Performance of Repast Better Display Provided by Repast Network display Random Layout The docking process validates four simulation models of OSS developer network using Swarm and RePast. Properties of social networks such as degree distribution, diameter and clustering coefficient are used to dock simulations. This docking process showed that a docking process can also be used to validate a migration of a simulation from one software package to another. In our case, the docking process helped with the transfer to Repast to take advantages of its features. Repast simulation runs faster than Swarm simulation because RePast is written in pure Java while Swarm is originally written in Object C which may cause some overhead for Java Swarm. Furthermore, RePast provides more display library packages such as network package which help users to do analysis. This two graphs show random layout and circular layout generated by Repast network package. Circular Layout

FUTURE WORK Many runs instead of one Statistical analysis More network parameters Average degree Cluster size distribution Fitness and life cycle There are several directions of future work. First, currently, we just compare the results of one run simulations. Because there is slight difference between docking results which we believe is caused by Different random generators. We will run both swarm simulations and Repast simulations many times to see if this difference can be ignored. Second, we will do statistic analysis, for example, hypothesis tests, on these results. Moreover, we will dock more network parameters such as Average degree, Cluster size distribution And Fitness and life cycle