Conceptual Framework for Agent-Based Modeling and Simulation: The Computer Experiment
Yongqin Gao, CSE Department, University of Notre Dame
Vincent Freeh, CS Department, NCSU
Greg Madey, CSE Department, University of Notre Dame
NAACSOS Conference, Pittsburgh, PA, June 25, 2003
Supported in part by the National Science Foundation - Digital Society & Technology Program
The Computer Experiment
Agent-Based Simulation as a Component of the Scientific Method
Modeling (Hypothesis): Social Network Model of F/OSS
Agent-Based Simulation (Experiment): Grow Artificial SourceForge
Observation: Analysis of SourceForge Data
Outline
Investigation: Free/Open Source Software (F/OSS)
Conceptual framework(s)
Model description
ER model
BA model
BA model with constant fitness
BA model with dynamic fitness
Summary
Open Source Software (OSS)
Free …
– to view source
– to modify
– to share
– of cost
Examples
– Apache
– Perl
– GNU
– Linux
– Sendmail
– Python
– KDE
– GNOME
– Mozilla
– Thousands more
(Logos shown: Linux, GNU, Savannah)
Free Open Source Software (F/OSS)
Development
– Mostly volunteer
– Global teams
– Virtual teams
– Self-organized, often a peer-based meritocracy
– Self-managed, but often with a “charismatic” leader
– Often large numbers of developers, testers, support help, end user participation
– Rapid, frequent releases
– Mostly unpaid
Typical Charismatic Leaders?
Linus Torvalds: Linux
Larry Wall: Perl
Richard Stallman: GNU Manifesto
Eric Raymond: Cathedral and Bazaar
F/OSS: Significance
Contradicts traditional wisdom:
– Software engineering
– Coordination, large numbers
– Motivation of developers
– Quality
– Security
– Business strategy
Almost everything is done electronically and is available in digital form
Opportunity for Social Science Research: large amounts of online data available
Research issues:
– Understanding motives
– Understanding processes
– Intellectual property
– Digital divide
– Self-organization
– Government policy
– Impact on innovation
– Ethics
– Economic models
– Cultural issues
– International factors
SourceForge
VA Software
Part of OSDN
Started 12/1999
Collaboration tools
58,685 Projects
80,000 Developers
590,00 Registered Users
Savannah
Uses SourceForge Software
Free Software Foundation
1,508 Projects
15,265 Registered Users
F/OSS: Importance
Major component of e-Technology infrastructure, with major presence in
– e-Commerce
– e-Science
– e-Government
– e-Learning
Apache has over 65% market share of Internet Web servers
Linux on over 7 million computers
Most of the Internet runs on Sendmail
Tens of thousands of quality products
Part of product offerings of companies like IBM and Apple
– Apache in WebSphere, Linux on mainframe, FreeBSD in OSX
Corporate employees participating in OSS projects
Free/Open Source Software
Seems to challenge traditional economic assumptions
Model for software engineering
New business strategies
– Cooperation with competitors
– Beyond trade associations, shared industry research, and standards processes: shared product development!
Virtual, self-organizing and self-managing teams
Social issues, e.g., digital divide, international participation
Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property
Research Model
Data Collection: Monthly
Web crawler (scripts)
– Python
– Perl
– AWK
– Sed
Monthly
Since Jan 2001
ProjectID
DeveloperID
Almost 2 million records
Relational database
Sample table columns: PROJ|DEVELOPER (first row begins 8001|dev…, last row ends …|dev8975)
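As a minimal sketch of how such PROJ|DEVELOPER records could be loaded for analysis, the snippet below groups developers by project. The function name `parse_records` and the sample rows are hypothetical and invented for illustration; the slides do not show the actual scripts or database schema.

```python
from collections import defaultdict

def parse_records(lines):
    """Parse 'PROJ|DEVELOPER' rows into a mapping project -> set of developers."""
    members = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.upper().startswith("PROJ|"):
            continue  # skip blank lines and the header row
        project_id, developer_id = line.split("|", 1)
        members[project_id.strip()].add(developer_id.strip())
    return dict(members)

# Hypothetical sample rows, not taken from the SourceForge data.
sample = ["PROJ|DEVELOPER", "100|devA", "100|devB", "200|devB", "200|devC"]
memberships = parse_records(sample)
```

Keeping one set of developers per project makes it straightforward to derive either the developer network or the project network later.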
F/OSS Developers - Social Network Component
Developers are nodes / Projects are links
Figure: one example cluster with 24 developers, 5 projects and 2 linchpin developers (visible labels include dev[59], dev[54], dev[49], dev[64], dev[61] and Projects 6882, 9859, 7597, 7028)
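The "developers are nodes, projects are links" view can be sketched as follows. This is an illustration under assumptions: a linchpin is taken to mean a developer who belongs to more than one project (the slide does not define the term), and all names and the toy membership data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def developer_network(memberships):
    """Link every pair of developers who share at least one project."""
    adj = defaultdict(set)
    for devs in memberships.values():
        for d in devs:
            adj[d]  # touch the key so solo developers appear as isolated nodes
        for a, b in combinations(sorted(devs), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

def linchpins(memberships):
    """Developers who belong to more than one project (assumed definition)."""
    count = defaultdict(int)
    for devs in memberships.values():
        for d in devs:
            count[d] += 1
    return {d for d, n in count.items() if n > 1}

def clusters(adj):
    """Connected components, found with an iterative depth-first search."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(component)
    return result

# Hypothetical toy data: developer "b" links projects p1 and p2.
memberships = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"d"}}
adj = developer_network(memberships)
```

Here `linchpins(memberships)` identifies the developers that hold projects together, and `clusters(adj)` returns the groups of developers that are connected through shared projects.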
Models of the F/OSS Social Network (Alternative Hypotheses)
General model features
– Agents are nodes on a graph (developers or projects)
– Behaviors: create, join, abandon and idle
– Edges are relationships (joint project participation)
– Growth of network: random or types of preferential attachment, formation of clusters
– Fitness
– Network attributes: diameter, average degree, power law, clustering coefficient
Four specific models
– ER (random graph)
– BA (scale free)
– BA (+ constant fitness)
– BA (+ dynamic fitness)
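The four agent behaviors listed above can be illustrated with a very small simulation loop. This is only a sketch: the probabilities `p_create`, `p_join` and `p_abandon`, the one-new-developer-per-step rule, and the uniform (ER-like) choice of which project to join are all assumptions made for the example, not parameters from the study.

```python
import random

def simulate(steps, seed=0, p_create=0.1, p_join=0.5, p_abandon=0.1):
    """Toy agent loop: one developer arrives per step, then every developer acts."""
    rng = random.Random(seed)
    projects = {}      # project id -> set of developer ids
    next_project = 0
    for dev in range(steps):
        for agent in range(dev + 1):
            r = rng.random()
            if r < p_create:
                # create: start a new project with this agent as first member
                projects[next_project] = {agent}
                next_project += 1
            elif r < p_create + p_join and projects:
                # join: pick a project uniformly at random (ER-like choice)
                target = rng.choice(list(projects))
                projects[target].add(agent)
            elif r < p_create + p_join + p_abandon:
                # abandon: leave one of the agent's current projects, if any
                joined = [p for p, devs in projects.items() if agent in devs]
                if joined:
                    projects[rng.choice(joined)].discard(agent)
            # otherwise the agent idles this step
    return projects

projects = simulate(30, seed=1)
```

Replacing the uniform `rng.choice` in the join branch with a degree-weighted or fitness-weighted choice is what would distinguish the BA-style variants from the ER variant in a sketch like this.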
ER model – degree distribution
The degree distribution is binomial in the ER model, whereas it follows a power law in the empirical data
R² = [value missing] for developer network
R² = [value missing] for project network
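For reference, the two distribution shapes being contrasted can be written out. The symbols are assumptions for illustration (a classic random graph with N nodes and link probability p, and a power-law exponent γ); the slides do not give these formulas or parameter values.

```latex
% ER random graph G(N, p): degree k of a node is binomially distributed
P_{\mathrm{ER}}(k) = \binom{N-1}{k}\, p^{k} (1-p)^{N-1-k}

% Power-law (scale-free) degree distribution with exponent \gamma > 0
P_{\mathrm{PL}}(k) \propto k^{-\gamma}
```

On a log-log plot the power law is a straight line with slope -γ, while the binomial falls off much faster in the tail, which is why a poor linear fit on log-log axes counts against the ER model.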
ER model – diameter
Average degree is decreasing in the ER model, while it is increasing in the empirical data
Diameter is increasing in the ER model, while it is decreasing in the empirical data
ER model – clustering coefficient
The clustering coefficient is relatively low, around 0.4, while it is around 0.7 in the empirical data
The clustering coefficient is decreasing in the ER model, while it is increasing in the empirical data
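The network attributes compared on these slides (average degree, diameter, clustering coefficient) can be computed on a small graph as sketched below. This assumes an undirected graph stored as a dictionary of neighbor sets and uses the mean local clustering coefficient; the slides do not say which definition of the clustering coefficient was used, so treat that choice as an assumption.

```python
from collections import deque

def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def clustering_coefficient(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        # count edges that exist between pairs of this node's neighbors
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def diameter(adj):
    """Longest shortest path, using a breadth-first search from every node.
    For a disconnected graph this reports the largest diameter of any component."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best

# Toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

Tracking these three values after each simulated month, and comparing their trends against the monthly empirical snapshots, is the kind of comparison the ER slides summarize.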
ER model – cluster distribution
The cluster size distribution in the ER model also follows a power law, with R² = [value missing] (without the major cluster), while R² in the empirical data is [value missing] (without the major cluster)
The actual distribution nevertheless differs from the empirical data
The later models (BA and its extensions) behave similarly
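One common way to obtain an R² for a power-law claim is a least-squares line fit on log-log axes. The sketch below assumes that approach; the slides do not state how their R² values were computed, and the function name `power_law_fit` and the sample data are hypothetical.

```python
import math

def power_law_fit(xs, ys):
    """Fit a line through (log x, log y); return (exponent, r_squared)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    syy = sum((b - my) ** 2 for b in ly)
    slope = sxy / sxx                       # estimated power-law exponent
    r_squared = (sxy * sxy) / (sxx * syy)   # squared correlation of the log values
    return slope, r_squared

# Synthetic data that follows an exact power law with exponent -2.
sizes = [1, 2, 4, 8, 16]
counts = [1000.0 / s ** 2 for s in sizes]
exponent, r2 = power_law_fit(sizes, counts)
```

All inputs must be strictly positive, so empty histogram bins have to be dropped before fitting. A high R² alone does not prove a power law, which is consistent with the slide's remark that the ER cluster distribution fits well yet still differs from the empirical one.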
BA model – degree distribution
Power laws in the degree distribution, similar to the empirical data (+ for simulated data and x for empirical data)
For the developer distribution: simulated data has R² = [value missing] and empirical data has R² = [value missing]
For the project distribution: simulated data has R² = [value missing] and empirical data has R² = [value missing]
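Preferential attachment, the mechanism behind the BA model's power-law degrees, can be sketched as follows. This is the plain one-mode textbook form with hypothetical parameters `n` and `m`; the study's own model involves developers and projects and may differ in detail.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes.
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # 'stubs' lists each node once per incident edge, so a uniform draw
    # from it is a degree-proportional draw.
    stubs = [node for node, nbrs in adj.items() for _ in nbrs]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend([t, new])
    return adj

graph = barabasi_albert(200, 2, seed=42)
degrees = [len(nbrs) for nbrs in graph.values()]
```

A histogram of `degrees` from a run like this is what would be plotted against the empirical degree distribution on log-log axes.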
BA model – diameter and CC
Small diameter and high clustering coefficient, like the empirical data
Diameter and clustering coefficient are both decreasing, like the empirical data
BA model with constant fitness
Power laws in the degree distribution, similar to the empirical data (+ for simulated data and x for empirical data)
For the developer distribution: simulated data has R² = [value missing] and empirical data has R² = [value missing]
For the project distribution: simulated data has R² = [value missing] and empirical data has R² = [value missing]
Diameter and CC are similar to the simple BA model
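For orientation, a commonly used way to add fitness to preferential attachment is to weight each node's degree by a fixed fitness value. The form below is an assumption for illustration; the slides do not give the attachment rule, and the symbols (degree k_i, fitness η_i) are introduced here.

```latex
% Plain BA: attachment probability proportional to degree
\Pi_i = \frac{k_i}{\sum_j k_j}

% BA with constant fitness: each node i carries a fixed fitness \eta_i
\Pi_i = \frac{\eta_i\, k_i}{\sum_j \eta_j\, k_j}
```

With all η_i equal, the second rule reduces to the first, so the plain BA model is the special case of uniform fitness.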
BA model with dynamic fitness
Power laws in the degree distribution, similar to the empirical data (+ for simulated data and x for empirical data)
For the developer distribution: simulated data has R² = [value missing] and empirical data has R² = [value missing]
For the project distribution: simulated data has R² = [value missing] and empirical data has R² = [value missing]
Advantage of BA with dynamic fitness
Intuition: fitness should decrease with time.
Statistics: projects show life-cycle behavior that cannot be replicated by the BA model with constant fitness but can be replicated by the BA model with dynamic fitness.
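The intuition that fitness should decrease with time can be sketched as a fitness-weighted choice in which each project's fitness decays with its age. The exponential decay, the `decay` rate, and all names below are assumptions made for the example; the slides do not specify how fitness changes over time.

```python
import math
import random

def pick_project(degrees, fitness0, birth, now, decay, rng):
    """Choose a project with weight degree * fitness, where fitness decays
    exponentially with the project's age (an assumed decay law)."""
    ids = list(degrees)
    weights = [
        degrees[p] * fitness0[p] * math.exp(-decay * (now - birth[p]))
        for p in ids
    ]
    return rng.choices(ids, weights=weights, k=1)[0]

# Hypothetical example: two projects with equal size and initial fitness,
# one created at time 0 and one created at time 10.
rng = random.Random(0)
degrees = {"old": 5, "young": 5}
fitness0 = {"old": 1.0, "young": 1.0}
birth = {"old": 0, "young": 10}
picks = [pick_project(degrees, fitness0, birth, now=10, decay=1.0, rng=rng)
         for _ in range(1000)]
```

With `decay=0.0` the rule falls back to constant fitness, where an old project keeps attracting developers indefinitely. A positive decay lets older projects fade, which is one way a life-cycle pattern could emerge.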
Conceptual Framework
Agent-Based Modeling and Simulation as Components of the Scientific Method
Observation, Hypothesis, Experiment
Summary
We use ABM to model and simulate the SourceForge collaboration network.
A conceptual framework is proposed for agent-based modeling and simulation.
Case study of this framework: the SourceForge study through ER, BA, BA with constant fitness and BA with dynamic fitness.
Thank you