Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
An Exploration of Power-law in Use-relation of Java Software Systems
Makoto Ichii, Makoto Matsushita, Katsuro Inoue
Osaka University
2008/3/26 ASWEC
Software Component Graph
A software system is composed of software components.
- Software component (component): a building unit of a software system
- Components form complex use-relations with one another
A software component graph (component graph) represents the use-relations between components.
- node: component / edge: use-relation
Many studies use component graphs to analyze software systems, so it is important to understand the nature of component graphs.
Power-law distribution
A graph is characterized by its degree distribution.
Graphs whose degree distribution follows the power law p(x) = C x^(-α) attract attention in various research domains:
- Link structure of WWW pages
- Hosts on the Internet
Such graphs tend to have interesting characteristics:
- Self-similarity
- Fault tolerance
Goal: explore component graphs to determine whether their degree distributions follow the power law.
Questions [1-2/4]
Q. 1 Do the in- and out-degree distributions of the component graph of a single software system follow the power law?
Q. 2 Do the in- and out-degree distributions of a component graph spanning multiple software systems follow the power law?
Questions [3-4/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph follow the power law?
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
Definitions [1/2]
Component: a Java class (including interfaces)
Use-relation: any of the following six relation types, obtained by static analysis of the component source files:
1. A class or an interface extends another class or interface, respectively.
2. A class implements an interface.
3. A class or an interface declares a variable of a class or an interface.
4. A class instantiates a class.
5. A class calls a method of a class or an interface.
6. A class or an interface references a field of a class or an interface.
Definitions [2/2]
Component graph: a directed simple graph
- node: component
- edge: use-relation between components
In-(out-)degree: the number of incoming (outgoing) edges of a node
Example:
    class A { static void exec() { … } }
    class B { … A.exec(); … }
    class C { … A a = new A(); … }
- A: in-degree 2, out-degree 0
- B: in-degree 0, out-degree 1
- C: in-degree 0, out-degree 1
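The definitions above can be sketched in a few lines of Java. This is only an illustration, not the analysis tool used in the study: the class `ComponentGraph` and its methods are hypothetical names, and the sketch assumes that "simple graph" means repeated use-relations between the same pair of components collapse into one edge and that self-references are ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not from the slides): a directed simple graph of components.
class ComponentGraph {
    // Sets keep the graph simple: a repeated use-relation adds no new edge.
    private final Map<String, Set<String>> uses = new HashMap<>();
    private final Map<String, Set<String>> usedBy = new HashMap<>();

    void addComponent(String name) {
        uses.computeIfAbsent(name, k -> new HashSet<>());
        usedBy.computeIfAbsent(name, k -> new HashSet<>());
    }

    // Records the use-relation "from uses to".
    // Assumption: self-references are skipped, since a simple graph has no loops.
    void addUse(String from, String to) {
        addComponent(from);
        addComponent(to);
        if (from.equals(to)) {
            return;
        }
        uses.get(from).add(to);
        usedBy.get(to).add(from);
    }

    // Number of distinct components that use the given component.
    int inDegree(String name) {
        Set<String> s = usedBy.get(name);
        return s == null ? 0 : s.size();
    }

    // Number of distinct components that the given component uses.
    int outDegree(String name) {
        Set<String> s = uses.get(name);
        return s == null ? 0 : s.size();
    }

    public static void main(String[] args) {
        ComponentGraph g = new ComponentGraph();
        g.addUse("B", "A"); // B calls a method of A
        g.addUse("C", "A"); // C declares a variable of type A
        g.addUse("C", "A"); // C instantiates A: same edge, not counted twice
        for (String c : new String[] {"A", "B", "C"}) {
            System.out.println(c + ": in-degree " + g.inDegree(c)
                    + ", out-degree " + g.outDegree(c));
        }
    }
}
```

Storing both directions doubles the memory but makes in-degree and out-degree lookups equally cheap, which is convenient when both distributions are plotted.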
Observing the power law
Plot the cumulative frequency of the in-(or out-)degree on log-log axes. The data forms a straight line if the distribution follows the power law p(x) = C x^(-α).
- Frequency plot: gradient -α
- Cumulative frequency plot: gradient -(α-1)
M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law", Contemporary Physics 46, (2005)
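The gradient of the cumulative plot can be checked with a short derivation. It uses the continuous approximation of the degree distribution and assumes α > 1; the symbol P(x) for the cumulative distribution (the fraction of nodes with degree at least x) is introduced here for illustration.

```latex
\begin{align*}
P(x) &= \int_x^{\infty} C\,t^{-\alpha}\,dt
      = \frac{C}{\alpha-1}\,x^{-(\alpha-1)} \qquad (\alpha > 1),\\
\log P(x) &= \log\frac{C}{\alpha-1} \;-\; (\alpha-1)\,\log x .
\end{align*}
```

On log-log axes the cumulative plot is therefore a straight line with gradient -(α-1), so α is recovered as one minus the fitted gradient.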
Values shown in the experiments
α: exponent of the power law p(x) = C x^(-α)
- Derived from the gradient, -(α-1), of the regression line fitted to the cumulative in-(or out-)degree plot
R*^2: coefficient of determination adjusted for the degrees of freedom
- Measures how well the regression model fits the data; range [0..1]
- A larger value means a better fit
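As a rough sketch of how the two values could be computed, the Java code below fits an ordinary least-squares line to the log-log cumulative frequency plot. The slides do not state which points or which degree range were fitted, so the choices here are assumptions: every distinct positive degree contributes one point, base-10 logarithms are used, and the adjusted coefficient is the usual single-predictor form 1 - (1 - R^2)(n - 1)/(n - 2). The class name `PowerLawFit` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not from the slides): least-squares fit on the log-log cumulative plot.
class PowerLawFit {
    final double alpha;      // estimated exponent: 1 - gradient of the regression line
    final double adjustedR2; // coefficient of determination adjusted for degrees of freedom

    PowerLawFit(double alpha, double adjustedR2) {
        this.alpha = alpha;
        this.adjustedR2 = adjustedR2;
    }

    static PowerLawFit fit(int[] degrees) {
        // Count nodes per positive degree; degree 0 cannot be drawn on a log axis.
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        int total = 0;
        for (int d : degrees) {
            if (d > 0) {
                counts.merge(d, 1, Integer::sum);
                total++;
            }
        }
        int n = counts.size();
        if (n < 3) {
            throw new IllegalArgumentException("need at least three distinct positive degrees");
        }
        double[] x = new double[n];
        double[] y = new double[n];
        int remaining = total; // nodes whose degree is >= the current degree
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            x[i] = Math.log10(e.getKey());
            y[i] = Math.log10((double) remaining / total);
            remaining -= e.getValue();
            i++;
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int j = 0; j < n; j++) {
            meanX += x[j] / n;
            meanY += y[j] / n;
        }
        double sxx = 0.0;
        double sxy = 0.0;
        double syy = 0.0;
        for (int j = 0; j < n; j++) {
            sxx += (x[j] - meanX) * (x[j] - meanX);
            sxy += (x[j] - meanX) * (y[j] - meanY);
            syy += (y[j] - meanY) * (y[j] - meanY);
        }
        double gradient = sxy / sxx;
        double r2 = (sxy * sxy) / (sxx * syy);
        double adjusted = 1.0 - (1.0 - r2) * (n - 1.0) / (n - 2.0);
        return new PowerLawFit(1.0 - gradient, adjusted);
    }

    public static void main(String[] args) {
        // Toy degree list (made up for illustration, not data from the study).
        PowerLawFit f = fit(new int[] {1, 1, 2, 4});
        System.out.printf("alpha = %.2f, adjusted R^2 = %.2f%n", f.alpha, f.adjustedR2);
    }
}
```

Normalizing by the number of positive-degree nodes only shifts the intercept of the line, so it does not affect the estimated exponent.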
Experiment 1
Q. 1 Do the in- and out-degree distributions of the component graph of a single software system follow the power law?
Procedure
1. Set up component sets, each containing a single software system.
2. Analyze the component sets to create component graphs.
3. Plot the cumulative frequency of the degrees on log-log axes.
Component sets
Name      Description                               # of components
JDK       Java 2 SE Software Development Kit 1.4    11,556
ECLIPSE   Eclipse                                   13,941
Result of experiment 1 / JDK
# of nodes: 11,556   # of edges: 107,198
              α            R*^2
in-degree     2.1 ±8.6×
out-degree    3.1 ±8.2×
► The in-degree follows the power law
► The out-degree does not follow the power law
Result of experiment 1 / ECLIPSE
# of nodes: 13,941   # of edges: 140,678
              α            R*^2
in-degree     2.2 ±1.6×
out-degree    3.0 ±7.7×
► Characteristics similar to JDK
- The in-degree follows the power law
- The out-degree does not follow the power law
Experiment 2
Q. 2 Do the in- and out-degree distributions of a component graph spanning multiple software systems follow the power law?
Procedure
1. Set up component sets, each containing multiple software systems. Use-relations across the systems exist.
2. Analyze the component sets to create component graphs.
3. Plot the cumulative frequency of the degrees on log-log axes.
Component sets
Name       Description                                                                            # of components
ASF        Various projects checked out from the repository of the Apache Software Foundation    59,486
SPARS_DB   The components stored in the database of demo.spars.info (includes ASF, JDK)          180,637
Result of experiment 2 / ASF
# of nodes: 59,486   # of edges: 303,755
              α            R*^2
in-degree     2.4 ±1.1×
out-degree    3.4 ±6.4×
► Characteristics similar to Exp. 1
- The in-degree follows the power law
- The out-degree does not follow the power law
Result of experiment 2 / SPARS_DB
# of nodes: 180,637   # of edges: 1,808,982
              α            R*^2
in-degree     2.0 ±1.5×
out-degree    3.7 ±7.0×
► Characteristics similar to Exp. 1
- The in-degree follows the power law
- The out-degree does not completely follow the power law
► The in-degree distribution fits the power-law straight line almost ideally
Experiment 3
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
Procedure
1. Construct subsets of SPARS_DB.
   - Keyword: the components that contain a specified keyword in their source code. The keywords are randomly selected so that the number of resulting components is about 1,000 or 10,000.
   - Random: 1,000 or 10,000 randomly selected components.
2. Analyze the component sets to create component graphs.
3. Plot the cumulative frequency of the degrees on log-log axes.
Component sets
Name     Description                                # of components
KWD1K    The components that contain “labels”       1,002
KWD10K   The components that contain “getstring”    8,938
RND1K    Randomly selected 1,000 components         1,000
RND10K   Randomly selected 10,000 components        10,000
Result of experiment 3 / KWD1K
# of nodes: 1,002   # of edges: 1,564
              α            R*^2
in-degree     2.2 ±3.3×
out-degree    3.7 ±2.0×
► Characteristics similar to SPARS_DB
- The in-degree follows the power law
- The out-degree does not follow the power law
Result of experiment 3 / KWD10K
# of nodes: 8,938   # of edges: 24,317
              α            R*^2
in-degree     2.1 ±9.3×
out-degree    3.4 ±2.7×
► Characteristics similar to SPARS_DB
- The in-degree follows the power law
- The out-degree does not follow the power law
Result of experiment 3 / RND1K
# of nodes: 1,000   # of edges: 52
              α            R*^2
in-degree     2.3 ±1.8×
out-degree    N/A
► The original characteristics are almost lost
Result of experiment 3 / RND10K
# of nodes: 10,000   # of edges: 6,184
              α            R*^2
in-degree     1.9 ±2.1×
out-degree    4.3 ±3.3×
► Characteristics similar to SPARS_DB, although the number of edges is small
Experiment 4
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
Target: SPARS_DB
Procedure
1. List the top-ten components by in-degree and by out-degree.
2. Calculate the correlation between the degrees and metric values, using Spearman's rank correlation coefficient.
Metrics
Metric   Description
LOC      Non-comment source lines of code
WMC1     A variation of weighted methods per class (WMC); weight of a method: constant value (1)
WMC2     A variation of WMC; weight of a method: cyclomatic complexity
LCOM     A variation of lack of cohesion of methods: LCOM5
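For readers unfamiliar with the statistic, the following Java sketch computes Spearman's rank correlation coefficient as the Pearson correlation of the ranks, with tied values sharing the average of their positions. It is a generic textbook formulation, not the tooling used in the study; the class name `Spearman` and the sample numbers in `main` are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper (not from the slides): Spearman's rank correlation coefficient.
class Spearman {
    // Returns 1-based ranks; tied values share the average of their positions.
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(v[a], v[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[order[j + 1]] == v[order[i]]) {
                j++;
            }
            double avg = (i + j) / 2.0 + 1.0; // mean of positions i..j, shifted to 1-based
            for (int k = i; k <= j; k++) {
                r[order[k]] = avg;
            }
            i = j + 1;
        }
        return r;
    }

    // Pearson correlation; yields NaN when either input has zero variance.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i] / n;
            meanB += b[i] / n;
        }
        double saa = 0.0;
        double sab = 0.0;
        double sbb = 0.0;
        for (int i = 0; i < n; i++) {
            saa += (a[i] - meanA) * (a[i] - meanA);
            sab += (a[i] - meanA) * (b[i] - meanB);
            sbb += (b[i] - meanB) * (b[i] - meanB);
        }
        return sab / Math.sqrt(saa * sbb);
    }

    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long samples of size >= 2");
        }
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Made-up metric values and out-degrees for five components.
        double[] loc = {120, 45, 300, 45, 980};
        double[] outDegree = {10, 4, 25, 6, 60};
        System.out.println("Spearman correlation: " + correlation(loc, outDegree));
    }
}
```

Because only ranks enter the calculation, the coefficient is insensitive to the heavy tails of the degree and size distributions, which is one reason a rank-based measure suits this kind of data.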
Result of experiment 4 / In-degree
Top-ten components: components that have a fundamental/general role
     Name                  LOC   In-degree   Out-degree
 1   java.lang.String      675   116,
 2   java.lang.Object      35    98,261      4
 3   java.lang.Class       605   29,
 4   java.lang.Exception   15    21,046      2
 5   java.lang.Throwable   136   19,
 6   java.lang.System      170   19,
 7   java.util.Iterator    5     15,522      1
 8   java.util.List        27    14,462      4
 9   java.util.ArrayList   200   13,
10   java.lang.Integer     285   12,736      9
Correlation with metrics (table: in-degree against out-degree, LOC, WMC1, LCOM)
- The in-degree has low correlation with the metrics
► The in-degree relates to the role of a component
Result of experiment 4 / Out-degree
Top-ten components: simply large/complex classes
     Name                                   LOC
 1   org.apache...FunctionEval
 2   org.jgraph.GPGraphpad                  2,
 3   com.jgraph.GPGraphpad                  2,
 4   org.jgraph.GPGraphpad
 5   org.eclipse...ASTConverter             4,
 6   org.eclipse...JavaEditor               1,
 7   net.sourceforge...GanttProject         3,
 8   it.businesslogic...MainFrame           7,
 9   org...InstConstraintVisitor            1,
10   org...ASTInstructionCompiler           2,
Correlation with metrics (table: out-degree against in-degree, LOC, WMC1, LCOM)
- The out-degree has high correlation with LOC and WMC
► The out-degree relates to the size/complexity of a component
Answers: summary of experiments [1/4]
Q. 1 Do the in- and out-degree distributions of the component graph of a single software system follow the power law?
- The in-degree follows the power law
- The out-degree does not follow the power law; it is a mixture of the power-law distribution and the lognormal distribution
Answers: summary of experiments [2/4]
Q. 2 Do the in- and out-degree distributions of a component graph spanning multiple software systems follow the power law?
- The in-degree follows the power law
- The out-degree does not follow the power law
- The results are similar to those for single software systems
Answers: summary of experiments [3/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
It depends on how the subgraph is created.
- A keyword-based subgraph has characteristics similar to those of its superset, since related components are likely to share words
- A random-selection-based subgraph with a small number of nodes has different characteristics, since few edges exist
Answers: summary of experiments [4/4]
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
In-degree relates to the roles of components
- Most components are used only in a specific part of a system
- Components with a fundamental/general role are used from everywhere
- The larger the component set grows, the larger the in-degree values become
Out-degree relates to the size/complexity of components
- Many components have a reasonable size/complexity
- Some components may have a relatively large size/complexity
- Extremely large components are unreasonable
Summary
Component graphs were investigated to determine whether the in- and out-degree distributions follow the power law. The results reveal the following characteristics:
- The in-degree distribution follows the power law; the in-degree of a component relates to the role of the component
- The out-degree distribution does not follow the power law; the out-degree of a component relates to the size/complexity of the component
- Some kinds of subgraphs of a component graph have the same degree-distribution characteristics as the whole graph
Future work
- Explore other types of component graphs
SPARS_DB Repositories
repositories / 750 modules
ASF Modules (projects)
107 modules:
ant ant-antidote avalon avalon-components avalon-excalibur avalon-logkit avalon-phoenix avalon-sandbox cocoon-1 cocoon-2-historical cocoon-2.0 cocoon-2.1 cocoon-2.2 cocoon-lenya db-commons db-commons-sandbox db-ojb db-torque jakarta-alexandria jakarta-bcel jakarta-bsf jakarta-cactus jakarta-commons jakarta-commons-sandbox jakarta-ecs jakarta-ecs2 jakarta-hivemind jakarta-jetspeed jakarta-jetspeed-2 jakarta-jmeter jakarta-log4j jakarta-log4j-sandbox jakarta-lucene jakarta-lucene-sandbox jakarta-ojb jakarta-oro jakarta-pluto jakarta-poi jakarta-regexp jakarta-servletapi jakarta-servletapi-4 jakarta-servletapi-5 jakarta-slide jakarta-struts jakarta-taglibs jakarta-taglibs-sandbox jakarta-tapestry jakarta-tomcat jakarta-tomcat-4.0 jakarta-tomcat-5 jakarta-tomcat-catalina jakarta-tomcat-connectors jakarta-tomcat-jasper jakarta-tomcat-service jakarta-tools jakarta-turbine-2 jakarta-turbine-3 jakarta-turbine-flux jakarta-turbine-fulcrum jakarta-turbine-jcs jakarta-turbine-jyve jakarta-turbine-orgami jakarta-turbine-stratum jakarta-turbine-tdk jakarta-turbine-torque jakarta-velocity jakarta-velocity-dvsl jakarta-velocity-tools jakarta-watchdog jakarta-watchdog-4.0 james-server logging-log4j logging-log4j-attic logging-log4j-sandbox maven maven-components maven-jelly-tags maven-plugins maven-plugins-sandbox maven-scm maven-wagon ws-admin ws-axis ws-fx ws-jaxme ws-juddi ws-soap ws-wsif ws-wsil ws-wsrp4j ws-xmlrpc xml-admin xml-axkit xml-batik xml-cocoon2 xml-commons xml-contrib xml-crimson xml-fop xml-forrest xml-security xml-stylebook xml-xalan xml-xang xml-xerces xml-xindice xml-xmlbeans
Supplements for Exp. 3 / More results for keyword-based subsets [1/2]
Figure: histograms of α and R*^2 of the in-degree. The keywords are those submitted to demo.spars.info as search queries.
Supplements for Exp. 3 / More results for keyword-based subsets [2/2]
Figure: X-axis: the number of components in the subset.
Supplements for Exp. 4 / Cross-correlation
Supplements for Exp. 4 / Scatter plot [1/2]
Supplements for Exp. 4 / Scatter plot [2/2]
Discussion
Generative models of a power-law graph
- When a node is added to a graph, the nodes with large degree tend to get the edge to the new node: “rich get richer”
Meanings for component graphs
- When a new component is added to (developed for) a software system, the new component uses components that are already used by many components
- The set of frequently used components hardly changes even as software development proceeds
- If that set changes, the fundamental structure (design, architecture) of the software has changed
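The "rich get richer" idea can be illustrated with a small simulation. This is a deliberately simplified growth model written for this note, not a model taken from the slides: each new component adds exactly one use-relation, and an existing component is chosen with probability proportional to its in-degree plus one (the offset, the class name `RichGetRicher` and the seed are assumptions).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical simulation (not from the slides) of a "rich get richer" growth process.
class RichGetRicher {
    // Grows a graph node by node and returns the in-degree of every node.
    static int[] grow(int nodes, long seed) {
        if (nodes < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        Random rnd = new Random(seed);
        int[] inDegree = new int[nodes];
        // A node holds one ticket when created plus one per incoming edge, so a
        // uniform draw over tickets picks a node with probability ~ inDegree + 1.
        List<Integer> tickets = new ArrayList<>();
        tickets.add(0);
        for (int v = 1; v < nodes; v++) {
            int target = tickets.get(rnd.nextInt(tickets.size()));
            inDegree[target]++;   // the new component v uses the chosen component
            tickets.add(target);  // the chosen component becomes more likely next time
            tickets.add(v);       // the new component can be chosen from now on
        }
        return inDegree;
    }

    public static void main(String[] args) {
        int[] d = grow(10_000, 1L);
        int max = 0;
        int unused = 0;
        for (int x : d) {
            max = Math.max(max, x);
            if (x == 0) {
                unused++;
            }
        }
        System.out.println("largest in-degree: " + max + ", components never used: " + unused);
    }
}
```

In runs of this kind a few early nodes typically collect a large share of the edges while most nodes are never used, which mirrors the observation that a small, stable set of components is used from everywhere.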
Clustering Coefficient (of a node)
The connectivity between the nodes connected to the target node:
C_i = n / (k_i C 2)
- n: # of edges between the nodes that connect to node i
- k_i: the degree of node i
- k_i C 2: the number of pairs among the k_i neighbors, k_i (k_i - 1) / 2
Example: 3 edges among 4 neighbors give C_i = 3 / (4 C 2) = 3 / 6 = 0.5
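The formula can be turned into a short Java sketch. Two points are assumptions of this illustration rather than statements from the slides: edges are treated as undirected when collecting the neighbors of a node, and the coefficient of a node with fewer than two neighbors is defined as 0. The class name `Clustering` is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not from the slides): clustering coefficient of a node.
class Clustering {
    private final Map<Integer, Set<Integer>> adj = new HashMap<>();

    // Assumption: edges are treated as undirected; self-loops are ignored.
    void addEdge(int a, int b) {
        if (a == b) {
            return;
        }
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // C_i = n / (k_i choose 2), where n counts edges among the neighbors of the node.
    double coefficient(int node) {
        Set<Integer> neighbors = adj.getOrDefault(node, Collections.emptySet());
        int k = neighbors.size();
        if (k < 2) {
            return 0.0; // assumption: no neighbor pairs means coefficient 0
        }
        int links = 0;
        for (int u : neighbors) {
            for (int v : neighbors) {
                if (u < v && adj.get(u).contains(v)) {
                    links++; // each neighbor pair is visited once because u < v
                }
            }
        }
        return links / (k * (k - 1) / 2.0);
    }

    public static void main(String[] args) {
        Clustering g = new Clustering();
        // Node 0 has four neighbors; three edges exist among them.
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(0, 3);
        g.addEdge(0, 4);
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.addEdge(3, 4);
        System.out.println("C_0 = " + g.coefficient(0));
    }
}
```

The example in `main` reproduces the slide's case of four neighbors, where three connecting edges out of six possible pairs give a coefficient of 0.5.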
Clustering Coefficient and Hierarchical Organization
C(k): average clustering coefficient of the nodes whose degree is k
A graph has hierarchical organization if C(k) ~ k^(-1)
E. Ravasz, A.L. Barabasi, "Hierarchical organization in complex networks", Physical Review E, vol. 67, 2003.
Clustering Coefficient and Hierarchical Organization / Result
- JDK: C(k) ~ k^(-0.80)
- SPARS_DB: C(k) ~ k^(-0.93)
- KWD1K (“labels”): C(k) ~ k^(-0.96)