Agent-Based Modeling and Simulation of Collaborative Social Networks Research in Progress Greg Madey Yongqin Gao Computer Science & Engineering University.

1 Agent-Based Modeling and Simulation of Collaborative Social Networks
Research in Progress
Greg Madey, Yongqin Gao, Computer Science & Engineering, University of Notre Dame
Vincent Freeh, Computer Science, North Carolina State University
Renee Tynan, Chris Hoffman, Department of Management, University of Notre Dame
Supported in part by the National Science Foundation - Digital Society & Technology Program
AMCIS2003, Tampa, FL, August 2003

2 Outline
Definitions: Agents, models, simulations, collaborative social networks, computer experiments
Phenomenon: Free/Open Source Software (F/OSS)
Conceptual models
–ER model
–BA model
–BA model with constant fitness
–BA model with dynamic fitness
Experiments and results
Summary
Some discussion questions

3 Agent-Based Modeling and Simulation
Conceptual models of a phenomenon
Simulations are computer implementations of the conceptual models
Agents in models and simulations are distinct entities (instantiated objects)
–Tend to be simple, but with large numbers of them (thousands, or more), i.e., swarm intelligence
–Contrasted with higher level “intelligent agents”
Foundations in complexity theory
–Self-organization
–Emergence

4 Collaborative Social Networks
Research-paper co-authorship, small world phenomenon, e.g., Erdos number (Barabasi 2001, Newman 2001)
Movie actors, small world phenomenon, e.g., Kevin Bacon number (Watts 1999, 2003)
Interlocking corporate directorships
Open-source software developers (Madey et al., AMCIS 2002)
Collaborators are the nodes of a graph, and collaborative relationships are its edges

5 Classical Scientific Method
1. Observe the world
   a) Identify a puzzling phenomenon
2. Generate a falsifiable hypothesis (K. Popper)
3. Design and conduct an experiment with the goal of disproving the hypothesis
   a) If the experiment “fails”, then the hypothesis is accepted (until replaced)
   b) If the experiment “succeeds”, then reject the hypothesis; additional insight into the phenomenon may be obtained and steps 2-3 repeated

6 The Computer Experiment

7 Agent-Based Simulation as a Component of the Scientific Method
[Diagram: cycle of Modeling (Hypothesis), Agent-Based Simulation (Experiment), Observation]

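8 Agent-Based Simulation as a Component of the Scientific Method
[Diagram: cycle of Modeling (Hypothesis), Agent-Based Simulation (Experiment), Observation, annotated for this study]
Modeling (Hypothesis): Social Network Model of F/OSS
Agent-Based Simulation (Experiment): Grow Artificial SourceForge
Observation: Analysis of SourceForge Data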

9 Open Source Software (OSS)
Free …
–to view source
–to modify
–to share
–of cost
Examples
–Apache
–Perl
–GNU
–Linux
–Sendmail
–Python
–KDE
–GNOME
–Mozilla
–Thousands more
[Images: Linux, GNU, Savannah]

10 Free Open Source Software (F/OSS)
Development
–Mostly volunteer
–Global teams
–Virtual teams
–Self-organized, often a peer-based meritocracy
–Self-managed, but often with a “charismatic” leader
–Often large numbers of developers, testers, support help, end user participation
–Rapid, frequent releases
–Mostly unpaid

11 F/OSS Developers
Linus Torvalds: Linux
Larry Wall: Perl
Richard Stallman: GNU, GNU Manifesto
Eric Raymond: Cathedral and Bazaar

12 F/OSS: A Puzzling Phenomenon
Contradicts traditional wisdom:
–Software engineering
–Coordination, large numbers
–Motivation of developers
–Quality
–Security
–Business strategy
Almost everything is done electronically and available in digital form
Opportunity for IS Research: large amounts of online data available
Research issues:
–Understanding motives
–Understanding processes
–Intellectual property
–Digital divide
–Self-organization
–Government policy
–Impact on innovation
–Ethics
–Economic models
–Cultural issues
–International factors

13 SourceForge
VA Software
Part of OSDN
Started 12/1999
Collaboration tools
58,685 Projects
80,000 Developers
590,00 Registered Users

14 Savannah
Uses SourceForge Software
Free Software Foundation
1,508 Projects
15,265 Registered Users

15 F/OSS: Importance
Major component of e-Technology infrastructure, with major presence in e-Commerce, e-Science, e-Government, e-Learning
Apache has over 65% market share of Internet Web servers
Linux on over 7 million computers
Most Internet e-mail runs on Sendmail
Tens of thousands of quality products
Part of product offerings of companies like IBM and Apple: Apache in WebSphere, Linux on mainframe, FreeBSD in OSX
Corporate employees participating on OSS projects

16 Free/Open Source Software
Seems to challenge traditional economic assumptions
Model for software engineering
New business strategies
–Cooperation with competitors
–Beyond trade associations, shared industry research, and standards processes: shared product development!
Virtual, self-organizing and self-managing teams
Social issues, e.g., digital divide, international participation
Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property

17 Research Model

18 Observations
Web mining
Web crawler (scripts)
–Python
–Perl
–AWK
–Sed
Monthly
Since Jan 2001
ProjectID
DeveloperID
Almost 2 million records
Relational database

PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
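The PROJ|DEVELOPER records above are enough to rebuild the collaboration network. The following is a minimal sketch, assuming the records arrive as pipe-delimited text with a header row; the function names `load_memberships` and `collaboration_edges` are illustrative and are not taken from the authors' crawler or database code.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the slide; the real data set has almost 2 million records.
SAMPLE = """PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975"""


def load_memberships(text):
    """Parse PROJ|DEVELOPER lines into a project -> set-of-developers map."""
    members = defaultdict(set)
    for line in text.strip().splitlines()[1:]:  # skip the header row
        project, developer = line.strip().split("|")
        members[project].add(developer)
    return members


def collaboration_edges(members):
    """Link every pair of developers who share at least one project."""
    edges = set()
    for developers in members.values():
        for a, b in combinations(sorted(developers), 2):
            edges.add((a, b))
    return edges


members = load_memberships(SAMPLE)
edges = collaboration_edges(members)
print(len(members), "projects,", len(edges), "developer-developer edges")
```

In this sample, dev8975 appears on both project 8001 and project 8008, which is the kind of shared membership that ties projects into one cluster.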

19 Models of the F/OSS Social Network (Alternative Hypotheses)
General model features
–Agents are nodes on a graph (developers or projects)
–Behaviors: Create, join, abandon and idle
–Edges are relationships (joint project participation)
–Growth of network: random or types of preferential attachment, formation of clusters
–Fitness
–Network attributes: diameter, average degree, degree distribution, clustering coefficient
Four specific models
–ER (random graph) - (1960)
–BA (preferential attachment) - (1999)
–BA (+ constant fitness) - (2001)
–BA (+ dynamic fitness) - (2003)
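The four behaviors (create, join, abandon, idle) and the choice between random and preferential growth can be sketched as a small agent loop. This is a toy illustration only: the probabilities, the number of new developers per step, and the `simulate` function are all assumed placeholders, and the authors' actual simulations were Java programs using Swarm.

```python
import random


def simulate(steps, new_devs_per_step=5, preferential=True, seed=0,
             p_create=0.1, p_join=0.3, p_abandon=0.05):
    """Grow a toy developer-project network; all rates are hypothetical."""
    rng = random.Random(seed)
    projects = {0: {0}}  # project id -> set of developer ids
    n_devs = 1
    for _ in range(steps):
        n_devs += new_devs_per_step  # new developer agents arrive
        for dev in range(n_devs):
            r = rng.random()
            if r < p_create:  # create: start a new project
                projects[len(projects)] = {dev}
            elif r < p_create + p_join:  # join: pick an existing project
                ids = list(projects)
                if preferential:
                    # BA-style: larger projects are more attractive
                    # (+1 keeps empty projects selectable)
                    weights = [len(projects[p]) + 1 for p in ids]
                    target = rng.choices(ids, weights=weights)[0]
                else:
                    # ER-style: every project is equally likely
                    target = rng.choice(ids)
                projects[target].add(dev)
            elif r < p_create + p_join + p_abandon:  # abandon: leave one project
                mine = [p for p in projects if dev in projects[p]]
                if mine:
                    projects[rng.choice(mine)].discard(dev)
            # otherwise the agent idles this step
    return projects


ba_like = simulate(10, preferential=True)
er_like = simulate(10, preferential=False)
print(max(len(m) for m in ba_like.values()), max(len(m) for m in er_like.values()))
```

Switching `preferential` on or off is the only difference between the two runs, which mirrors how the slides treat the ER and BA models as alternative hypotheses about the same growth process.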

20 F/OSS Developers - Collaboration Social Network
[Figure: network graph with labeled developers dev[59], dev[54], dev[49], dev[64], dev[61] and Projects 6882, 9859, 7597, 7028, 15850]
Developers are nodes / Projects are links
24 Developers
5 Projects
2 Linchpin Developers
1 Cluster
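Counts like "2 linchpin developers" and "1 cluster" can be derived from project memberships. The sketch below assumes a linchpin is a developer who belongs to more than one project and a cluster is a connected component of the developer-project graph; the slide does not define either term precisely, and the toy data and function names are hypothetical.

```python
from collections import defaultdict


def linchpins(members):
    """Developers who belong to more than one project (assumed definition)."""
    count = defaultdict(int)
    for developers in members.values():
        for dev in developers:
            count[dev] += 1
    return {dev for dev, n in count.items() if n > 1}


def clusters(members):
    """Connected components of projects and developers, via union-find."""
    parent = {}

    def find(x):
        # setdefault registers unseen nodes as their own root
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for project, developers in members.items():
        root = find(("proj", project))
        for dev in developers:
            parent[find(("dev", dev))] = root  # merge developer's component
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical toy data: developer "b" links projects P1 and P2.
toy = {"P1": {"a", "b"}, "P2": {"b", "c"}, "P3": {"d"}}
print(linchpins(toy), len(clusters(toy)))
```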

21 Computer Experiments
Agent-based simulations
Java programs using Swarm class library
–Validation (docking) exercises using Java/Repast
Grow artificial SourceForges (Epstein & Axtell, 1996)
–Parameterized with observed data, e.g., developer behaviors
  Join rates
  New project additions
  Leave projects
–Evaluation of four models (hypotheses)
–Verification/validation

22 Four Cycles of Modeling & Simulation
[Diagram: cycle of Modeling (Hypothesis), Agent-Based Simulation (Experiment), Observation]
Modeling (Hypothesis): Social Network Models, ER => BA => BA+Fitness => BA+Dynamic Fitness
Agent-Based Simulation (Experiment): Grow Artificial SourceForge
Observation: Analysis of SourceForge Data
–Degree Distribution
–Average Degree
–Diameter
–Clustering Coefficient
–Cluster Size Distribution

23 ER model – degree distribution
Degree distribution is binomial in the ER model, while it follows a power law in the empirical data
Fit fails
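To see why a random graph yields a binomial degree distribution, one can generate a G(n, p) graph and compare its degrees with the binomial formula. This is a minimal sketch with arbitrarily chosen `n` and `p`; it is not the authors' ER simulation, which modeled developers joining projects.

```python
import random
from collections import Counter
from math import comb


def er_degrees(n, p, seed=0):
    """Degrees of a G(n, p) random graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree


def binomial_pmf(k, n, p):
    """P(degree = k) for a node with n - 1 potential neighbours."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


n, p = 500, 0.02  # arbitrary illustration values
degrees = er_degrees(n, p)
observed = Counter(degrees)
for k in sorted(observed):
    # observed fraction of nodes with degree k vs. the binomial prediction
    print(k, observed[k] / n, round(binomial_pmf(k, n, p), 4))
```

The binomial is concentrated around its mean of (n - 1)p, so very high degree nodes are essentially absent, whereas a power law keeps a heavy tail of hubs. That mismatch is what the slide reports as a failed fit.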

24 ER model – diameter
Average degree is decreasing while it is increasing in empirical data
Diameter is increasing while it is decreasing in empirical data
Fit fails
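Average degree and diameter can both be computed from an adjacency map. The sketch below uses breadth-first search and assumes an undirected graph stored as a dict of neighbour sets; for a disconnected graph it reports the largest diameter found within any component. The helper names and the small path graph are illustrative only.

```python
from collections import deque


def eccentricity(adj, source):
    """Longest shortest-path distance from source to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())


def diameter(adj):
    """Largest eccentricity over all nodes (per component if disconnected)."""
    return max(eccentricity(adj, node) for node in adj)


def average_degree(adj):
    return sum(len(nbs) for nbs in adj.values()) / len(adj)


# Hypothetical example: a path 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(diameter(path), average_degree(path))
```

Running one BFS per node costs O(V·(V+E)), which is fine for a sketch but would need sampling or a faster method on a network with tens of thousands of developers.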

25 ER model – clustering coefficient
Clustering coefficient is relatively low, around 0.4, while it is around 0.7 in empirical data
Clustering coefficient is decreasing while it is increasing in empirical data
Fit fails
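For reference, a common definition of the clustering coefficient is the average, over all nodes, of the fraction of a node's neighbour pairs that are themselves linked. The sketch below implements that definition as an assumption; the slides do not say which variant the authors used.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are directly connected."""
    nbs = adj[node]
    if len(nbs) < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1 for a, b in combinations(nbs, 2) if b in adj[a])
    return 2.0 * links / (len(nbs) * (len(nbs) - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical examples: a triangle is fully clustered, a star is not.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(average_clustering(triangle), average_clustering(star))
```

Because every project turns its developers into a fully connected group, collaboration networks of this kind tend to have high clustering, consistent with the empirical value of around 0.7 quoted above.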

26 ER model – cluster distribution
Cluster distribution in the ER model also follows a power law, with R² of 0.6667 (0.9953 without the major cluster), while R² in the empirical data is 0.7457 (0.9797 without the major cluster)
The actual distribution is different from the empirical data
The later models (BA and further models) behave similarly
Fit fails

27 BA model – degree distribution
Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
For developer distribution: simulated data has R² of 0.9798 and empirical data has R² of 0.9712
–Fit succeeds
For project distribution: simulated data has R² of 0.6650 and empirical data has R² of 0.9815
–Fit fails
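The R² values quoted here measure how well a straight line fits the distribution on log-log axes. A minimal sketch follows, assuming an ordinary least-squares fit of log(count) against log(degree); the slides do not state the exact fitting procedure, and the synthetic data below is made up for illustration.

```python
import math


def loglog_r2(ks, counts):
    """Least-squares line through (log k, log count); returns (slope, R^2)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation coefficient
    return slope, r2


# Synthetic exact power law: count = 1000 * k^(-2.5)
ks = list(range(1, 21))
counts = [1000.0 * k ** -2.5 for k in ks]
slope, r2 = loglog_r2(ks, counts)
print(round(slope, 3), round(r2, 3))
```

An exact power law gives a slope equal to the negative exponent and an R² of 1; the further real data strays from a straight line on log-log axes, the lower R² falls.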

28 BA model – diameter and CC
Small diameter and high clustering coefficient like empirical data
Diameter and clustering coefficient are both decreasing like empirical data
Fit succeeds

29 BA model with constant fitness
Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
For developer distribution: simulated data has R² of 0.9742 and empirical data has R² of 0.9712
–Fit succeeds
For project distribution: simulated data has R² of 0.7253 and empirical data has R² of 0.9815
–Fit fails
Diameter and CC are similar to the simple BA model
–Fit succeeds

30 Discovery: BA with dynamic fitness
Problem with BA with constant fitness
–Intuition: Project fitness might change with time
Data mining observation: project “life cycle” property, fitness generally decreases with time
New model not in the literature
–Hypothesis: BA with dynamic fitness of projects
–Computer experiment

31 BA model with dynamic fitness
Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
For developer distribution: simulated data has R² of 0.9695 and empirical data has R² of 0.9712
–Fit succeeds (as before)
For project distribution: simulated data has R² of 0.8051 and empirical data has R² of 0.9815
–Fit is better, but more work needed
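One way to picture dynamic fitness is to weight preferential attachment by a fitness term that shrinks as a project ages. The sketch below assumes an exponential decay with a hypothetical `decay` rate and a fixed chance of starting a new project; the slides only say that fitness generally decreases with time, so the functional form here is an assumption, not the authors' model.

```python
import math
import random


def grow(steps, decay=0.0, seed=0, p_new_project=0.2):
    """Each step, one developer either starts a project or joins one.

    Join probability is proportional to size * exp(-decay * age).
    decay = 0.0 reduces to plain size-based preferential attachment.
    """
    rng = random.Random(seed)
    born = [0]  # creation step of each project
    size = [1]  # number of developers on each project
    for t in range(1, steps + 1):
        if rng.random() < p_new_project:
            born.append(t)
            size.append(1)
        else:
            weights = [s * math.exp(-decay * (t - b)) for s, b in zip(size, born)]
            target = rng.choices(range(len(size)), weights=weights)[0]
            size[target] += 1
    return born, size


born_static, size_static = grow(1000, decay=0.0)
born_dynamic, size_dynamic = grow(1000, decay=0.05)
print(max(size_static), max(size_dynamic))
```

With `decay = 0` the oldest projects keep accumulating developers indefinitely; a positive decay lets newer projects overtake them, which is the "life cycle" behavior the data mining observation describes.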

32 Agent-Based Modeling and Simulation as Components of the Scientific Method
[Diagram: Observation, Hypothesis, Experiment]

33 Summary
Why Agent-Based Modeling and Simulation?
–Can be used as components of the Scientific Method
–A research approach for studying socio-technical systems
Case study: F/OSS - Collaboration Social Networks
–SourceForge conceptual models: ER, BA, BA with constant fitness and BA with dynamic fitness
–Simulations
  Computer experiments that tested conceptual models
  Provided insight into the phenomenon under study and guided data mining of collected observations

34 Discussion
“The social sciences are, in fact, the ‘hard’ sciences”, Herbert Simon (1987)
Computational social science: agent-based modeling and simulation
Kuhn’s periods of “Normal Science” punctuated by “Paradigm shifts”
Karl Popper’s “theory-testing through falsification”
Relevant literature on the role of simulation in the process of scientific discovery

35 Thank you

