The technical impact of social links in free software development Rishab Aiyer Ghosh MERIT/Infonomics, University of Maastricht Oxford Workshop.

Slides:



Advertisements
Similar presentations
CORRELATION. Overview of Correlation u What is a Correlation? u Correlation Coefficients u Coefficient of Determination u Test for Significance u Correlation.
Advertisements

Correlation and Linear Regression.
Quantitative Techniques
Mining Metrics to Predict Component Failures Nachiappan Nagappan, Microsoft Research Thomas Ball, Microsoft Research Andreas Zeller, Saarland University.
CORRELATION. Overview of Correlation u What is a Correlation? u Correlation Coefficients u Coefficient of Determination u Test for Significance u Correlation.
PSY 307 – Statistics for the Behavioral Sciences
Dr. Mario MazzocchiResearch Methods & Data Analysis1 Correlation and regression analysis Week 8 Research Methods & Data Analysis.
Correlational Designs
Correlation and Regression Analysis
DESIGNING, CONDUCTING, ANALYZING & INTERPRETING DESCRIPTIVE RESEARCH CHAPTERS 7 & 11 Kristina Feldner.
CORRELATIO NAL RESEARCH METHOD. The researcher wanted to determine if there is a significant relationship between the nursing personnel characteristics.
Educational Research: Correlational Studies EDU 8603 Educational Research Richard M. Jacobs, OSA, Ph.D.
Statistical hypothesis testing – Inferential statistics II. Testing for associations.
8/10/2015Slide 1 The relationship between two quantitative variables is pictured with a scatterplot. The dependent variable is plotted on the vertical.
McGraw-Hill © 2006 The McGraw-Hill Companies, Inc. All rights reserved. Correlational Research Chapter Fifteen.
The measure that one trait (or behavior) is related to another
Correlations 11/5/2013. BSS Career Fair Wednesday 11/6/2013- Mabee A & B 12:30-2:30P.
Correlations 11/7/2013. Readings Chapter 8 Correlation and Linear Regression (Pollock) (pp ) Chapter 8 Correlation and Regression (Pollock Workbook)
Open Source: an environment for skills development and economic growth EuroIndia 2004 conference New Delhi, March 2004 Rishab Aiyer Ghosh
Correlation and Correlational Research Slides Prepared by Alison L. O’Malley Passer Chapter 5.
N318b Winter 2002 Nursing Statistics Specific statistical tests: Correlation Lecture 10.
Covariance and correlation
Chapter 14 – Correlation and Simple Regression Math 22 Introductory Statistics.
Jon Curwin and Roger Slater, QUANTITATIVE METHODS: A SHORT COURSE ISBN © Thomson Learning 2004 Jon Curwin and Roger Slater, QUANTITATIVE.
Statistical Analysis A Quick Overview. The Scientific Method Establishing a hypothesis (idea) Collecting evidence (often in the form of numerical data)
Understanding Statistics
1 Are East Asian companies benefiting from Western board practices? John Nowland Discussed by Joseph P.H. Fan Centre of Economics & Finance Chinese University.
Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.
L 1 Chapter 12 Correlational Designs EDUC 640 Dr. William M. Bauer.
MEASURES of CORRELATION. CORRELATION basically the test of measurement. Means that two variables tend to vary together The presence of one indicates the.
When trying to explain some of the patterns you have observed in your species and community data, it sometimes helps to have a look at relationships between.
Correlation Correlation is used to measure strength of the relationship between two variables.
By: Amani Albraikan.  Pearson r  Spearman rho  Linearity  Range restrictions  Outliers  Beware of spurious correlations….take care in interpretation.
N318b Winter 2002 Nursing Statistics Specific statistical tests: Regression Lecture 11.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
Objectives 2.1Scatterplots  Scatterplots  Explanatory and response variables  Interpreting scatterplots  Outliers Adapted from authors’ slides © 2012.
Correlation & Regression Chapter 15. Correlation It is a statistical technique that is used to measure and describe a relationship between two variables.
Slide 1 The introductory statement in the question indicates: The data set to use (2001WorldFactBook) The task to accomplish (association between variables)
Correlation. Correlation Analysis Correlations tell us to the degree that two variables are similar or associated with each other. It is a measure of.
© 2006 by The McGraw-Hill Companies, Inc. All rights reserved. 1 Chapter 12 Testing for Relationships Tests of linear relationships –Correlation 2 continuous.
Understanding Free Software Developers: Findings from the FLOSS Study Rishab Aiyer Ghosh HBS/MIT conference, June 19, 2003 MERIT/Infonomics, University.
© (2015, 2012, 2008) by Pearson Education, Inc. All Rights Reserved Chapter 11: Correlational Designs Educational Research: Planning, Conducting, and Evaluating.
Free/Libre and Open Source Software Rishab Aiyer Ghosh Gregorio Robles Ruediger Glott - Source code analysis - International Institute of Infonomics, Maastricht.
The Correlational Research Strategy Chapter 12. Correlational Research The goal of correlational research is to describe the relationship between variables.
Level 3 Practical Investigation Where to start?. Aim This is the purpose of your practical i.e. what it is that you want to find out This is the purpose.
CORRELATION ANALYSIS.
Educational Research: Data analysis and interpretation – 1 Descriptive statistics EDU 8603 Educational Research Richard M. Jacobs, OSA, Ph.D.
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Research Methods: 2 M.Sc. Physiotherapy/Podiatry/Pain Correlation and Regression.
LOOKING FOR PATTERNS BETWEEN VARIABLES WHY DO SCIENTISTS COLLECT DATA, GRAPH IT AND WRITE EQUATIONS TO EXPRESS RELATIONSHIPS BETWEEN VARIABLES ?
Sustainable Openness ● Designing Cyberinfrastructure for Collaboration and Innovation ● National Science Foundation, Washington DC ● January 29, 2007 ●
Spearman’s Rho Correlation
Selecting the Best Measure for Your Study
Correlations Correlations
Christian Hahn, M.Sc. & Lorne Campbell, PhD
Understanding Standards Event Higher Statistics Award
Module 15-2 Objectives Determine a line of best fit for a set of linear data. Determine and interpret the correlation coefficient.
CHAPTER fourteen Correlation and Regression Analysis
Spearman’s Rank Correlation Test
Inferential Statistics
Spearman’s rho Chi-square (χ2)
Theme 7 Correlation.
Formation of relationships Matching Hypothesis
Educational Research: Correlational Studies
Formation of relationships Matching Hypothesis
Logistic Regression --> used to describe the relationship between
Curvilinear Regression
An Introduction to Correlational Research
Free/Libre and Open Source Software
Presentation transcript:

The technical impact of social links in free software development Rishab Aiyer Ghosh MERIT/Infonomics, University of Maastricht Oxford Workshop on Libre Software, Oxford Internet Institute, June 25, 2004

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Some data on the Linux kernel -number of developers, share of sub-modules

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Some data on the Linux kernel -number of sub-modules per author

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Some data on the Linux kernel -number of co-authors per author

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Social choice of modules? Most modules have just a few authors Most authors have contributed to just one module However, these lone contributions are not made to 1-author modules New contributors choose modules with many other contributors

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Mean developers per module Mean number of “co- developers” per module for 1-module contributors: over 45% have more than 200 co-developers per module.

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Mean developers per module Mean number of “co- developers” per module: - for 1-module contributors: over 45% have more than 200 co- developers per module. - for 2-module contributors (red overlay), only a slight change. I.e. 2 nd modules are not much smaller than 1 st modules contributed to.

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Mean developers per module Correlation: number of modules authored by average number of developers per module avdev v1.0 (n=158) v (n=618) v (n=2263) (Pearson 2-tailed: correlations are significant at the 0.01 level)

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Developer numbers / module size Contributors don’t necessarily know how many co-authors they have / will have for a given module Code size of module is easily available Size is an explicit proxy for “importance” Hypothesis: developers are attracted to “important” projects, partly because they have many other developers

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Developer numbers / module size Module size highly correlated to number of authors Correlation v1.0 (n=30) v (n=60) v (n=169) (Pearson 2-tailed: correlations are significant at the 0.01 level)

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Module association Modules can be linked together to form a graph based on at least two criteria Authorship: one or more contributors are common to modules Code: one module depends on functions defined in another module It turns out that the co-incidence of these two attributes is quite high

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Module association: authors, v1.0

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Module association: code, v1.0

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Module association: both, v1.0

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Module association: both, v2.5.25

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Module association: both, v2.5.25

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Exploring the author-code link Strong degree of coincidence – between author link and code link Significant correlation between strength of author link and presence of code link Indications of dynamic impact of author link on future code links (and vice versa)

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Exploring the author-code link Phi 4-point similarity between binary variables for author link & code link: v1.0 (n=435) v (n=1770) v (n=14196) 0.341

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Exploring the author-code link Spearman’s rho: scalar variables for strength of author link against binary variable for presence of code link: v1.0 (n=435) v (n=1770) v (n=14196) (strength measured by number of common authors; also tested with other strength measures)

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Exploring the author-code link Analytical problems: no good fit model with regression, possibly due to highly skewed data Hard to select strength (rather than binary presence) variable for code dependency link Time lag between versions too big for dynamic analysis

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Future steps Similar exercise using finer dynamic granularity (and possibly CVS data, for activity rather than cumulative authorship) may allow better interpretation Better data on code dependency (especially dependency strength) may help in identifying relationships

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Tentative conclusions Significant relationship between social and technical links between modules Direction of causality is unclear Impact on code development is, however, potentially very high “tip of the iceberg” – e.g. code reuse (including dependency on “trivial” rather than “complex” functions) may be much higher with greater social links

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Future steps Use of clustering algorithms may help identify predictor relationships Common authorship may predict code dependency group-wise more than pair-wise

Oxford, June 25, 2004© Rishab Aiyer Ghosh / MERIT Further information CODD technical papers: orbiten.org/codd/ codd.berlios.de