Analysis and Modeling of the Open Source Software Community Yongqin Gao, Greg Madey Computer Science & Engineering University of Notre Dame Vincent Freeh.

Presentation transcript:

Analysis and Modeling of the Open Source Software Community
Yongqin Gao, Greg Madey, Computer Science & Engineering, University of Notre Dame
Vincent Freeh, Computer Science Dept., NCSU
NAACSOS Conference, Pittsburgh, PA, June 25, 2003
Supported in part by the National Science Foundation – Digital Science & Technology

Outline
- Overview
- Data collection
- Network modeling
- Topological statistical analysis
- Conclusion

Overview
- What is OSS?
  - Free to use and distribute
  - Unlimited users and usage
  - Source code available and modifiable
- Potential advantages over commercial software
  - Higher quality
  - Faster development
  - Lower cost
- Our goal
  - Understanding the OSS phenomenon
- Approach
  - SourceForge is the source of our empirical data
  - Modeling as a social network
  - Analysis of topological statistics

Data collection (monthly)
- Web crawler (scripts): Python, Perl, AWK, sed
- Monthly crawls since Jan 2001
- Fields: ProjectID, DeveloperID
- Almost 2 million records
- Stored in a relational database
- Record format: PROJ|DEVELOPER, with project IDs such as 8001 and developer IDs such as dev8972
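
The PROJ|DEVELOPER records described above could be loaded along the following lines. This is only a minimal sketch: the function name `parse_records` and the sample IDs are invented for illustration, and the slides do not show the authors' actual loading scripts.

```python
from collections import defaultdict

def parse_records(lines):
    """Parse 'PROJ|DEVELOPER' lines into two lookup tables.

    Returns (project -> set of developers, developer -> set of projects).
    """
    project_devs = defaultdict(set)
    dev_projects = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.upper().startswith("PROJ|"):
            continue  # skip blank lines and the header row
        proj, dev = line.split("|", 1)
        project_devs[proj].add(dev)
        dev_projects[dev].add(proj)
    return project_devs, dev_projects

# Hypothetical sample rows; these IDs are not from the SourceForge data.
sample = ["PROJ|DEVELOPER", "100|devA", "100|devB", "200|devB", "200|devC"]
project_devs, dev_projects = parse_records(sample)
```

Keeping both mappings makes it cheap to move between the project side and the developer side of the data.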

Modeling as a collaboration network
- What is a collaboration network?
  - A social network representing collaborative relationships
  - Examples: the movie actor network and the scientist collaboration network
- How the SourceForge collaboration network differs
  - Detachment
  - Virtual collaboration
  - Voluntary
  - Global
- Bipartite property of the collaboration network

Collaboration network - bipartite

SourceForge developer network
[Figure: OSS Developer Network (Part). Labeled developers include dev[59], dev[54], dev[49], dev[64], dev[61]; labeled projects include Project 6882, Project 9859, Project 7597, Project 7028.]
- Developers are nodes, projects are links
- 24 developers
- 5 projects
- 2 hub developers
- 1 cluster
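
Treating developers as nodes and projects as links amounts to projecting the bipartite project/developer data onto developers. The sketch below shows one way to do that; `developer_network` and the toy IDs are hypothetical names introduced here, not the authors' code.

```python
from collections import defaultdict
from itertools import combinations

def developer_network(project_devs):
    """Project a bipartite project -> developers map onto developers.

    Two developers are linked if they share at least one project.
    Returns an adjacency map: developer -> set of neighboring developers.
    """
    adj = defaultdict(set)
    for devs in project_devs.values():
        for dev in devs:
            adj[dev]  # touch the key so solo developers still appear as nodes
        for a, b in combinations(sorted(devs), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

# Hypothetical toy data: devB works on both projects and so acts as a hub.
toy_projects = {
    "p1": {"devA", "devB"},
    "p2": {"devB", "devC", "devD"},
    "p3": {"devE"},
}
dev_adj = developer_network(toy_projects)
```

Swapping the roles of the two ID columns gives the project network in the same way, with projects as nodes and shared developers as links.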

Topological analysis
- Statistics inspected
  - Diameter
  - Average degree
  - Clustering coefficient
  - Degree distribution
  - Cluster size distribution
  - Relative size of major cluster
  - Fitness and life cycle
- Evolution of these statistics

Diameter of developer network vs. time
- Diameter: the average of the shortest paths between all pairs of vertices
- The values for the developer network (30,000 – 70,000) are between 6 and 8

Diameter of project network vs. time
- The values for the project network (20,000 – 50,000) are between 6 and 7
- Diameter decreases with time for both the developer network and the project network
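
The diameter as defined in these slides (the average shortest-path length, rather than the longest one) can be computed by running a breadth-first search from every vertex. The sketch below assumes an unweighted, undirected adjacency map and averages only over pairs that are actually connected; how the authors treated disconnected pairs is not stated in the slides.

```python
from collections import deque

def average_shortest_path(adj):
    """Average shortest-path length over all reachable ordered pairs."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1  # every reached vertex except the source
    return total / pairs if pairs else 0.0

# Toy path graph a - b - c (hypothetical data).
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
avg = average_shortest_path(toy)
```

One search per vertex is expensive on networks with tens of thousands of vertices, so sampling source vertices may be a practical compromise.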

Average degree vs. time
- The values for the developer network are between 7 and 8
- The values for the project network are only between 3 and 4

Clustering coefficient of developer network vs. time

Clustering coefficient of project network vs. time

Degree distribution (developers)
- Power law in the developer degree distribution
- R² = [value missing in source]

Degree distribution (projects)
- Power law in the project degree distribution
- R² = [value missing in source]
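
An R² for a power law is commonly obtained by fitting a straight line to the degree histogram on log-log axes. The sketch below assumes that approach (ordinary least squares in log-log space); the slides do not say which fitting method the authors used, and `fit_power_law` is a name introduced here for illustration.

```python
import math
from collections import Counter

def fit_power_law(degrees):
    """Least-squares line through (log degree, log count).

    Returns (slope, intercept, r_squared). Needs at least two
    distinct positive degrees with differing counts.
    """
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Synthetic degrees whose counts follow 64 / k^2 exactly (not real data).
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope, intercept, r2 = fit_power_law(degrees)
```

A slope near -2 on this synthetic input corresponds to a power-law exponent of 2. On real data the sparse tail of the histogram is noisy, so binning or a maximum-likelihood estimate may give a more reliable exponent.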

Cluster size distribution
- Cluster distribution of the developer network
- R² with the major cluster is [value missing in source]
- R² without the major cluster is [value missing in source]

Relative size of major cluster vs. time
- Steady increase in the relative size of the major cluster
- Appears to converge slowly to a fixed percentage of around 35%
- May be an indication of network evolution
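
Since the terminology slide defines a cluster as a connected component, both the cluster size distribution and the relative size of the major cluster can be read off a component search. The following is a sketch on hypothetical toy data, not the authors' implementation.

```python
from collections import Counter

def cluster_sizes(adj):
    """Return the size of every connected component of an undirected graph."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sizes

# Toy graph with components of size 3, 2 and 1 (hypothetical data).
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
sizes = cluster_sizes(toy)
size_distribution = Counter(sizes)      # cluster size -> number of clusters
major_fraction = max(sizes) / len(toy)  # relative size of the major cluster
```

Recomputing `major_fraction` on each monthly snapshot would give the kind of time series this slide plots.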

Existence of fitness
- Tracking the development of a single project can verify the existence of the “young upcomer” phenomenon
- We tracked the development of every project that was new in July 2001 up to now (1660 projects in total)
- Maximal monthly growth per project is 13, while average monthly growth per project is just [value missing in source]

Life cycle of project

Summary of results
- Power laws hold for the degree distributions and the cluster distribution
- Average degree increases with time
- Diameter decreases with time
- Clustering coefficient decreases with time
- Fitness exists in SourceForge
- Projects show life cycle behavior

Conclusion
- Studying the SourceForge collaboration network can help us understand the OSS community
- We investigate not only the topological statistics but also the evolution of these statistics
- Simulation is needed for further investigation of the SourceForge collaboration network

Thank you

Terminology
- Degree: the count of edges connected to a given vertex
- Degree distribution: the distribution of degrees throughout a network
- Cluster: a connected component of the network
- Diameter: the average length of the shortest paths between all pairs of vertices
- Clustering coefficient (CC)
  - CC_i: the fraction of links actually present among the vertices in the neighborhood of vertex i, relative to the total possible number of such links
  - CC: the average of all CC_i in a network
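
The clustering coefficient definition above translates almost directly into code. In this sketch a vertex with fewer than two neighbors is given CC_i = 0; that convention is an assumption, since the slides do not cover the case, and the function names are illustrative.

```python
from itertools import combinations

def local_cc(adj, node):
    """CC_i: links present among the neighbors of `node` / links possible."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention; no pair of neighbors exists
    present = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return present / (k * (k - 1) / 2)

def average_cc(adj):
    """CC: the mean of CC_i over all vertices."""
    return sum(local_cc(adj, n) for n in adj) / len(adj)

# Toy graph: triangle a-b-c plus a pendant vertex d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cc_a = local_cc(toy, "a")
cc = average_cc(toy)
```

For vertex a, only one of the three possible links among its neighbors (b to c) is present, so CC_a is 1/3. Vertices b and c each score 1 and d scores 0, giving an average of 7/12.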