Agent-Based Modeling and Simulation of Collaborative Social Networks Research in Progress Greg Madey Yongqin Gao Computer Science & Engineering University.

Presentation transcript:

Agent-Based Modeling and Simulation of Collaborative Social Networks
Research in Progress
Greg Madey, Yongqin Gao: Computer Science & Engineering, University of Notre Dame
Vincent Freeh: Computer Science, North Carolina State University
Renee Tynan, Chris Hoffman: Department of Management, University of Notre Dame
Supported in part by the National Science Foundation - Digital Society & Technology Program
AMCIS 2003, Tampa, FL, August 2003

Outline
• Definitions: agents, models, simulations, collaborative social networks, computer experiments
• Phenomenon: Free/Open Source Software (F/OSS)
• Conceptual models
  – ER model
  – BA model
  – BA model with constant fitness
  – BA model with dynamic fitness
• Experiments and results
• Summary
• Some discussion questions

Agent-Based Modeling and Simulation
• Conceptual models of a phenomenon
• Simulations are computer implementations of the conceptual models
• Agents in models and simulations are distinct entities (instantiated objects)
  – Tend to be simple, but present in large numbers (thousands or more), i.e., swarm intelligence
  – Contrasted with higher-level “intelligent agents”
• Foundations in complexity theory
  – Self-organization
  – Emergence

Collaborative Social Networks
• Research-paper co-authorship: small-world phenomenon, e.g., Erdős number (Barabasi 2001, Newman 2001)
• Movie actors: small-world phenomenon, e.g., Kevin Bacon number (Watts 1999, 2003)
• Interlocking corporate directorships
• Open-source software developers (Madey et al., AMCIS 2002)
• Collaborators are nodes in a graph, and collaborative relationships are the edges of the graph

Classical Scientific Method
1. Observe the world
   a) Identify a puzzling phenomenon
2. Generate a falsifiable hypothesis (K. Popper)
3. Design and conduct an experiment with the goal of disproving the hypothesis
   a) If the experiment “fails”, then the hypothesis is accepted (until replaced)
   b) If the experiment “succeeds”, then reject the hypothesis; additional insight into the phenomenon may be obtained and steps 2-3 repeated

The Computer Experiment

Agent-Based Simulation as a Component of the Scientific Method
• Modeling (Hypothesis)
• Agent-Based Simulation (Experiment)
• Observation

Agent-Based Simulation as a Component of the Scientific Method
• Modeling (Hypothesis): Social Network Model of F/OSS
• Agent-Based Simulation (Experiment): Grow Artificial SourceForge
• Observation: Analysis of SourceForge Data

Open Source Software (OSS)
• Free …
  – to view source
  – to modify
  – to share
  – of cost
• Examples
  – Apache, Perl, GNU, Linux, Sendmail, Python, KDE, GNOME, Mozilla
  – Thousands more
[Images: Linux, GNU, Savannah]

Free/Open Source Software (F/OSS)
• Development
  – Mostly volunteer
  – Global teams
  – Virtual teams
  – Self-organized, often a peer-based meritocracy
  – Self-managed, but often with a “charismatic” leader
  – Often large numbers of developers, testers, support help, end-user participation
  – Rapid, frequent releases
  – Mostly unpaid

F/OSS Developers
• Linus Torvalds: Linux
• Larry Wall: Perl
• Richard Stallman: GNU, GNU Manifesto
• Eric Raymond: The Cathedral and the Bazaar

F/OSS: A Puzzling Phenomenon
• Contradicts traditional wisdom:
  – Software engineering
  – Coordination, large numbers
  – Motivation of developers
  – Quality
  – Security
  – Business strategy
• Almost everything is done electronically and available in digital form
• Opportunity for IS research: large amounts of online data available
• Research issues:
  – Understanding motives
  – Understanding processes
  – Intellectual property
  – Digital divide
  – Self-organization
  – Government policy
  – Impact on innovation
  – Ethics
  – Economic models
  – Cultural issues
  – International factors

SourceForge
• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 58,685 Projects
• 80,000 Developers
• 590,00 Registered Users

Savannah
• Uses SourceForge Software
• Free Software Foundation
• 1,508 Projects
• 15,265 Registered Users

F/OSS: Importance
• Major component of e-Technology infrastructure, with a major presence in
  – e-Commerce
  – e-Science
  – e-Government
  – e-Learning
• Apache has over 65% market share of Internet Web servers
• Linux runs on over 7 million computers
• Most Internet e-mail runs on Sendmail
• Tens of thousands of quality products
• Part of the product offerings of companies like IBM and Apple
  – Apache in WebSphere, Linux on mainframe, FreeBSD in OS X
• Corporate employees participating in OSS projects

Free/Open Source Software
• Seems to challenge traditional economic assumptions
• Model for software engineering
• New business strategies
  – Cooperation with competitors
  – Beyond trade associations, shared industry research, and standards processes: shared product development!
• Virtual, self-organizing and self-managing teams
• Social issues, e.g., digital divide, international participation
• Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property

Research Model

Observations
• Web mining
• Web crawler (scripts)
  – Python
  – Perl
  – AWK
  – Sed
• Monthly
• Since Jan 2001
• ProjectID
• DeveloperID
• Almost 2 million records
• Relational database
• Record layout: PROJ|DEVELOPER (project IDs such as 8001, developer IDs such as dev8975)
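
The PROJ|DEVELOPER records above lend themselves to a simple in-memory representation. The following is a minimal sketch, not the authors' crawler or database schema: the function name `parse_records` and the sample rows are hypothetical, and the only thing taken from the slide is the pipe-separated project/developer layout.

```python
from collections import defaultdict

def parse_records(lines):
    """Parse 'PROJ|DEVELOPER' rows into two membership maps."""
    devs_by_project = defaultdict(set)   # project ID -> developer IDs
    projects_by_dev = defaultdict(set)   # developer ID -> project IDs
    for line in lines:
        line = line.strip()
        if not line or line.upper() == "PROJ|DEVELOPER":
            continue  # skip blank lines and the header row
        project_id, developer_id = line.split("|", 1)
        devs_by_project[project_id].add(developer_id)
        projects_by_dev[developer_id].add(project_id)
    return devs_by_project, projects_by_dev

# Hypothetical rows for illustration only; not taken from the SourceForge data.
sample = ["PROJ|DEVELOPER", "100|devA", "100|devB", "200|devB", "200|devC"]
devs_by_project, projects_by_dev = parse_records(sample)
```

Keeping both maps makes it cheap to ask either "who works on this project?" or "which projects does this developer belong to?", which is what the network measures discussed later need.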

Models of the F/OSS Social Network (Alternative Hypotheses)
• General model features
  – Agents are nodes on a graph (developers or projects)
  – Behaviors: create, join, abandon, and idle
  – Edges are relationships (joint project participation)
  – Growth of network: random or types of preferential attachment, formation of clusters
  – Fitness
  – Network attributes: diameter, average degree, degree distribution, clustering coefficient
• Four specific models
  – ER (random graph) - (1960)
  – BA (preferential attachment) - (1999)
  – BA (+ constant fitness) - (2001)
  – BA (+ dynamic fitness) - (2003)
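
The difference between random growth and preferential attachment can be illustrated with two selection rules. This is a hedged sketch rather than the rules used in the authors' simulations: it assumes a developer picks one project to join, that "degree" means the project's current number of developers, and the names `pick_uniform`, `pick_preferential`, and the sample sizes are invented for the example.

```python
import random

def pick_uniform(project_sizes, rng):
    """ER-style choice: every project is equally likely."""
    return rng.choice(sorted(project_sizes))

def pick_preferential(project_sizes, rng):
    """BA-style choice: probability proportional to current size (degree)."""
    names = sorted(project_sizes)
    weights = [project_sizes[name] for name in names]
    if sum(weights) == 0:
        return rng.choice(names)  # no links yet, so fall back to a uniform pick
    return rng.choices(names, weights=weights)[0]

rng = random.Random(42)
sizes = {"p1": 1, "p2": 9}   # hypothetical project sizes
picks = [pick_preferential(sizes, rng) for _ in range(1000)]
share_p2 = picks.count("p2") / len(picks)   # expected to land near 0.9
```

Under the uniform rule both projects would attract about half of the joins; under the preferential rule the larger project keeps attracting most of them, which is the "rich get richer" mechanism behind power-law degree distributions.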

F/OSS Developers - Collaboration Social Network
• Developers are nodes / projects are links
• 24 developers, 5 projects, 2 linchpin developers, 1 cluster
[Figure: network diagram; visible labels include dev[59], dev[54], dev[49], dev[64], dev[61] and Projects 6882, 9859, 7597, 7028]
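
The "developers are nodes / projects are links" view can be sketched as a projection of project membership onto a developer graph. The code below is illustrative only. It assumes that a linchpin developer is one who belongs to more than one project (the slide does not define the term) and that a cluster is a connected component; all function names and the sample membership are hypothetical.

```python
from itertools import combinations

def collaboration_graph(devs_by_project):
    """Link every pair of developers who share at least one project."""
    adj = {}
    for devs in devs_by_project.values():
        for dev in devs:
            adj.setdefault(dev, set())   # keep solo developers as isolated nodes
        for a, b in combinations(devs, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def linchpins(devs_by_project):
    """Developers on more than one project (assumed meaning of 'linchpin')."""
    counts = {}
    for devs in devs_by_project.values():
        for dev in devs:
            counts[dev] = counts.get(dev, 0) + 1
    return {dev for dev, n in counts.items() if n > 1}

def clusters(adj):
    """Connected components, found with an iterative depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components

# Hypothetical membership: developer "b" joins projects p1 and p2 together.
membership = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"d"}}
graph = collaboration_graph(membership)
```

In this toy input, "b" is the only linchpin, and the graph splits into one three-developer cluster and one isolated developer.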

Computer Experiments
• Agent-based simulations
• Java programs using Swarm class library
  – Validation (docking) exercises using Java/Repast
• Grow artificial SourceForges (Epstein & Axtell, 1996)
  – Parameterized with observed data, e.g., developer behaviors
    • Join rates
    • New project additions
    • Leave projects
  – Evaluation of four models (hypotheses)
  – Verification/validation
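
The idea of growing an artificial SourceForge from parameterized developer behaviors can be sketched in a few lines. This is not the authors' Swarm or Repast code: it is a Python sketch under the assumptions that every developer takes one action per time step, that the four behaviors are drawn with fixed probabilities, and that joins are uniformly random. The function name `simulate` and the rate values are hypothetical.

```python
import random

def simulate(steps, n_devs, rates, seed=0):
    """Run a toy create/join/abandon/idle simulation; returns project -> developers."""
    rng = random.Random(seed)
    projects = {}      # project ID -> set of developer IDs
    next_id = 0
    actions = ["create", "join", "abandon", "idle"]
    weights = [rates[a] for a in actions]
    for _ in range(steps):
        for dev in range(n_devs):
            action = rng.choices(actions, weights=weights)[0]
            if action == "create":
                projects[next_id] = {dev}
                next_id += 1
            elif action == "join" and projects:
                projects[rng.choice(list(projects))].add(dev)
            elif action == "abandon":
                mine = [p for p, devs in projects.items() if dev in devs]
                if mine:
                    projects[rng.choice(mine)].discard(dev)
            # "idle" (or an action that cannot apply) leaves the network unchanged
    return projects

# Hypothetical rates; in the study these would come from observed data.
rates = {"create": 0.1, "join": 0.3, "abandon": 0.1, "idle": 0.5}
world = simulate(steps=10, n_devs=5, rates=rates, seed=1)
```

Swapping the uniform `rng.choice` in the join branch for a size-weighted or fitness-weighted choice is where the ER, BA, and fitness variants would differ in a sketch like this.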

Four Cycles of Modeling & Simulation
• Modeling (Hypothesis): Social Network Models, ER => BA => BA+Fitness => BA+Dynamic Fitness
• Agent-Based Simulation (Experiment): Grow Artificial SourceForge
• Observation: Analysis of SourceForge Data
• Degree distribution, average degree, diameter, clustering coefficient, cluster size distribution

ER model – degree distribution
• The degree distribution is binomial, while it follows a power law in the empirical data
• Fit fails
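
For reference, the two distribution shapes being contrasted can be written out. These are the textbook forms for a simple random graph and a power law, not formulas taken from the slides; the symbols N (number of nodes), p (link probability), γ (power-law exponent), and c (a constant) are introduced here for illustration.

```latex
% ER random graph with N nodes and link probability p: binomial degree distribution
P(k) = \binom{N-1}{k} \, p^{k} (1-p)^{N-1-k}

% Power law with exponent \gamma > 0: a straight line on log-log axes
P(k) \propto k^{-\gamma}
\quad\Longleftrightarrow\quad
\log P(k) = -\gamma \log k + c
```

The binomial form is sharply peaked around its mean, so very high-degree nodes are vanishingly rare, whereas a power law has a heavy tail that allows a few very large hubs.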

ER model - diameter
• Average degree is decreasing, while it is increasing in the empirical data
• Diameter is increasing, while it is decreasing in the empirical data
• Fit fails
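
Average degree and diameter can both be computed directly from an adjacency map. The sketch below is a generic breadth-first-search approach, not the authors' measurement code; the function names and the four-node path graph are made up for the example. On a disconnected graph, `diameter` as written returns the longest shortest path found inside any single component.

```python
from collections import deque

def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def eccentricity(adj, source):
    """Largest shortest-path distance from source to any reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())

def diameter(adj):
    """Longest shortest path, taken over all starting nodes."""
    return max(eccentricity(adj, node) for node in adj)

# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```

Running one BFS per node is quadratic in the worst case, so for a network the size of SourceForge a sampled estimate would likely be more practical than this exact computation.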

ER model – clustering coefficient
• Clustering coefficient is relatively low, around 0.4, while it is around 0.7 in the empirical data
• Clustering coefficient is decreasing, while it is increasing in the empirical data
• Fit fails
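
One common definition of the clustering coefficient is the average, over all nodes, of the fraction of a node's neighbor pairs that are themselves linked. The slides do not say which definition was used, so the sketch below should be read as an assumption; the function names and the small test graph are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of this node's neighbor pairs that are directly connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0   # fewer than two neighbors: no pairs to close
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical graph: triangle a-b-c plus a pendant node d attached to a.
tri = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
cc = average_clustering(tri)
```

Because every project makes its developers mutually connected in the collaboration graph, high clustering values are expected in this kind of network.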

ER model – cluster distribution
• The cluster distribution in the ER model also follows a power law; R² is computed without the major cluster for both the simulated and the empirical data
• The actual distribution is different from the empirical data
• The later models (BA and its extensions) behave similarly
• Fit fails

BA model – degree distribution
• Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
• Developer distribution (R² compared between simulated and empirical data): fit succeeds
• Project distribution (R² compared between simulated and empirical data): fit fails
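
An R² value for a power-law fit is often obtained from a least-squares line on log-log axes. Whether the authors computed R² this way is an assumption; the sketch below simply shows that procedure, with a hypothetical function name `loglog_fit` and synthetic counts that follow an exact power law.

```python
import math

def loglog_fit(degree_counts):
    """Least-squares line through (log10 k, log10 count); returns slope, intercept, R^2."""
    pts = [(math.log10(k), math.log10(c))
           for k, c in degree_counts.items() if k > 0 and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r2

# Synthetic counts following count = 1000 * k**-2 (not SourceForge data).
counts = {1: 1000.0, 2: 250.0, 4: 62.5, 8: 15.625}
slope, intercept, r2 = loglog_fit(counts)
```

For data that follow a power law exactly, the slope recovers the negative exponent and R² is 1; real degree data scatter around the line, and R² summarizes how closely they follow it.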

BA model – diameter and CC
• Small diameter and high clustering coefficient, like the empirical data
• Diameter and clustering coefficient are both decreasing, like the empirical data
• Fit succeeds

BA model with constant fitness
• Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
• Developer distribution (R² compared between simulated and empirical data): fit succeeds
• Project distribution (R² compared between simulated and empirical data): fit fails
• Diameter and CC are similar to the simple BA model: fit succeeds

Discovery: BA with dynamic fitness
• Problem with BA with constant fitness
  – Intuition: project fitness might change with time
• Data mining observation: project “life cycle” property, fitness generally decreases with time
• New model not in the literature
  – Hypothesis: BA with dynamic fitness of projects
  – Computer experiment
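
One way to picture dynamic fitness is to weight each project by its size multiplied by a fitness value that shrinks as the project ages. The slides do not give the fitness function, so the decay form, its parameters, and the names `fitness`, `attachment_weights`, and `pick_project` below are all assumptions made for illustration.

```python
import random

def fitness(age, initial=1.0, decay=0.1):
    """Hypothetical fitness that decreases with project age."""
    return initial / (1.0 + decay * age)

def attachment_weights(projects, now):
    """Weight = current size times age-dependent fitness."""
    return [p["size"] * fitness(now - p["created"]) for p in projects]

def pick_project(projects, now, rng):
    """Choose the index of the project to join, in proportion to its weight."""
    weights = attachment_weights(projects, now)
    return rng.choices(range(len(projects)), weights=weights)[0]

# Two hypothetical projects of equal size; the first is much older.
projects = [{"size": 5, "created": 0}, {"size": 5, "created": 90}]
weights = attachment_weights(projects, now=100)
chosen = pick_project(projects, now=100, rng=random.Random(7))
```

With constant fitness the two projects would be equally attractive; with a decaying fitness the younger project gets the larger weight, matching the "life cycle" observation that attraction fades over time.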

BA model with dynamic fitness
• Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
• Developer distribution (R² compared between simulated and empirical data): fit succeeds (as before)
• Project distribution (R² compared between simulated and empirical data): fit is better, but more work is needed

Agent-Based Modeling and Simulation as Components of the Scientific Method
• Observation
• Hypothesis
• Experiment

Summary
• Why agent-based modeling and simulation?
  – Can be used as components of the scientific method
  – A research approach for studying socio-technical systems
• Case study: F/OSS - Collaboration Social Networks
  – SourceForge conceptual models: ER, BA, BA with constant fitness, and BA with dynamic fitness
  – Simulations
    • Computer experiments that tested conceptual models
    • Provided insight into the phenomenon under study and guided data mining of collected observations

Discussion
• “The social sciences are, in fact, the ‘hard’ sciences”, Herbert Simon (1987)
• Computational social science: agent-based modeling and simulation
• Kuhn’s periods of “Normal Science” punctuated by “Paradigm shifts”
• Karl Popper’s “theory-testing through falsification”
• Relevant literature on the role of simulation in the process of scientific discovery

Thank you