Published by Laurence Abel Mosley. Modified over 9 years ago.
1
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
An Exploration of Power-law in Use-relation of Java Software Systems
Makoto Ichii, Makoto Matsushita, Katsuro Inoue
Osaka University
2008/3/26, ASWEC 2008
2
Software Component Graph
- A software system is composed of software components
  - Software component (component): a building unit of a software system
  - Complex use-relations are formed between components
- A software component graph (component graph) represents the use-relations between components
  - node: component / edge: use-relation
- Various studies use component graphs to analyze software systems
  - It is important to know the nature of component graphs
3
Power-law distribution
- A graph is characterized by its degree distribution
- Graphs whose degree distribution follows the power-law distribution $p(x) = Cx^{-\alpha}$ attract attention in various research domains
  - Link structure of WWW pages
  - Hosts on the Internet
- Such graphs tend to have interesting characteristics
  - Self-similarity
  - Fault tolerance
Goal: explore component graphs to determine whether their degree distributions follow the power law
4
Questions [1-2/4]
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
Q. 2 Do the in- and out-degree distributions of a component graph of multiple software systems follow the power law?
5
Questions [3-4/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph follow the power law?
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
6
Definitions [1/2]
- Component: a Java class (including interfaces)
- Use-relation: any of the following six relation types, acquired by static analysis of the component source files
  1. A class or an interface extends another class or interface, respectively.
  2. A class implements an interface.
  3. A class or an interface declares a variable of a class or an interface type.
  4. A class instantiates an object of a class.
  5. A class calls a method of a class or an interface.
  6. A class or an interface references a field of a class or an interface.
7
Definitions [2/2]
- Component graph: a directed simple graph
  - node: component
  - edge: use-relation between components
- In-(Out-)degree: the number of incoming (outgoing) edges of a node
Example (B uses A, C uses A):
  class A { void exec() { … } }      A: in-degree 2, out-degree 0
  class B { … A.exec(); … }          B: in-degree 0, out-degree 1
  class C { … A a = new A(); … }     C: in-degree 0, out-degree 1
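The degree definitions above can be illustrated with a small sketch. This is not the analysis tool used in the study; the class `ComponentGraph` and its methods are hypothetical names chosen for illustration, and the sketch assumes that "simple graph" means repeated use-relations between the same pair collapse into one edge and that self-references are ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical directed simple graph of components (illustration only). */
final class ComponentGraph {
    // For each component, the set of components it uses (its outgoing edges).
    private final Map<String, Set<String>> uses = new HashMap<>();

    /** Registers a component so that isolated nodes are known to the graph. */
    void addComponent(String name) {
        uses.computeIfAbsent(name, k -> new HashSet<>());
    }

    /** Records "from uses to"; duplicate edges and self-loops are dropped (assumption). */
    void addUse(String from, String to) {
        addComponent(from);
        addComponent(to);
        if (!from.equals(to)) {
            uses.get(from).add(to);
        }
    }

    /** Number of distinct components that the given component uses. */
    int outDegree(String name) {
        Set<String> targets = uses.get(name);
        return targets == null ? 0 : targets.size();
    }

    /** Number of distinct components that use the given component. */
    int inDegree(String name) {
        int count = 0;
        for (Set<String> targets : uses.values()) {
            if (targets.contains(name)) {
                count++;
            }
        }
        return count;
    }

    /** The three-class example from the slide: B calls A.exec(), C declares and instantiates A. */
    static ComponentGraph slideExample() {
        ComponentGraph g = new ComponentGraph();
        g.addUse("B", "A"); // method call
        g.addUse("C", "A"); // variable declaration
        g.addUse("C", "A"); // instantiation: same pair, so still a single edge
        return g;
    }
}
```

With `slideExample()`, A has in-degree 2 and out-degree 0, while B and C each have in-degree 0 and out-degree 1, matching the slide.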
8
Observing the power law
- Plot the cumulative frequency on log-log axes
- The data form a straight line if the distribution follows the power law
[Figure: log-log plots against in-(or out-)degree. The frequency distribution $p(x) = Cx^{-\alpha}$ has gradient $-\alpha$; the cumulative frequency has gradient $-(\alpha-1)$.]
Reference: M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law", Contemporary Physics 46, 323-351 (2005)
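The gradient $-(\alpha-1)$ of the cumulative plot follows from integrating the density. The short derivation below is a sketch that treats the degree as a continuous variable (an approximation); the symbol $P(x)$ for the cumulative frequency is introduced here and does not appear on the slides.

```latex
% P(x): fraction of nodes whose degree is at least x (continuous approximation).
\[
  P(x) = \int_{x}^{\infty} C\,t^{-\alpha}\,dt
       = \frac{C}{\alpha-1}\,x^{-(\alpha-1)} \qquad (\alpha > 1)
\]
% Taking logarithms gives a straight line with gradient -(alpha - 1).
\[
  \log P(x) = -(\alpha-1)\,\log x + \log\frac{C}{\alpha-1}
\]
```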
9
Values shown in the experiments
- $\alpha$: the exponent
  - Derived from the gradient of the regression line
- $R^{*2}$: the coefficient of determination adjusted for the degrees of freedom
  - Measures how well the regression model fits the data; range [0..1]
  - A large value means a good fit
[Figure: cumulative frequency against in-(or out-)degree on log-log axes; for $p(x) = Cx^{-\alpha}$ the regression line has gradient $-(\alpha-1)$.]
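One way to obtain the two reported values is sketched below. The slides do not give the exact fitting procedure, so this is an assumption: an ordinary least-squares line through the points (log degree, log number of nodes with at least that degree), with $\alpha$ taken as one minus the slope and $R^{*2}$ computed with the usual one-predictor adjustment. The class name `PowerLawFit` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: fits a line to the log-log cumulative degree distribution. */
final class PowerLawFit {
    final double alpha;      // exponent of p(x) = C x^(-alpha)
    final double adjustedR2; // coefficient of determination adjusted for degrees of freedom

    private PowerLawFit(double alpha, double adjustedR2) {
        this.alpha = alpha;
        this.adjustedR2 = adjustedR2;
    }

    static PowerLawFit fit(int[] degrees) {
        // Count nodes per distinct degree; degree 0 is skipped because log(0) is undefined.
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int d : degrees) {
            if (d > 0) {
                counts.merge(d, 1, Integer::sum);
            }
        }
        int n = counts.size();
        if (n < 3) {
            throw new IllegalArgumentException("need at least three distinct positive degrees");
        }
        // Walk from the largest degree down so that 'cumulative' is the number
        // of nodes whose degree is greater than or equal to the current key.
        double[] x = new double[n];
        double[] y = new double[n];
        long cumulative = 0;
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.descendingMap().entrySet()) {
            cumulative += e.getValue();
            x[i] = Math.log10(e.getKey());
            y[i] = Math.log10(cumulative);
            i++;
        }
        // Ordinary least squares of y on x.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int j = 0; j < n; j++) {
            meanX += x[j];
            meanY += y[j];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0.0;
        double sxy = 0.0;
        double syy = 0.0;
        for (int j = 0; j < n; j++) {
            double dx = x[j] - meanX;
            double dy = y[j] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        double slope = sxy / sxx;                 // gradient = -(alpha - 1)
        double r2 = (sxy * sxy) / (sxx * syy);    // plain R^2 for a single predictor
        double adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - 2);
        return new PowerLawFit(1.0 - slope, adjusted);
    }
}
```

For example, the degree list {1, 1, 1, 1, 2, 2, 4, 8} has cumulative counts 8, 4, 2, 1 at degrees 1, 2, 4, 8, which lie on a line of gradient -1, so the sketch returns $\alpha \approx 2$ with $R^{*2} \approx 1$.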
10
Experiment 1
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
- Set up component sets
  - Each set contains a single software system
- Analyze the component sets to create component graphs
- Plot the cumulative frequency of the degrees on log-log axes

| Set | Description | # of components |
|---|---|---|
| JDK | Java 2 SE Software Development Kit 1.4 | 11,556 |
| ECLIPSE | Eclipse 3.0.1 | 13,941 |
11
Result of experiment 1 / JDK

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $2.1 \pm 8.6 \times 10^{-3}$ | 0.99 |
| out-degree | $3.1 \pm 8.2 \times 10^{-2}$ | 0.88 |

# of nodes: 11,556 / # of edges: 107,198
► The in-degree follows the power law
► The out-degree does not follow the power law
12
Result of experiment 1 / ECLIPSE

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $2.2 \pm 1.6 \times 10^{-2}$ | 0.96 |
| out-degree | $3.0 \pm 7.7 \times 10^{-2}$ | 0.86 |

# of nodes: 13,941 / # of edges: 140,678
► Characteristics similar to JDK
  - The in-degree follows the power law
  - The out-degree does not follow the power law
13
Experiment 2
Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?
- Set up component sets
  - Each set contains multiple software systems
  - Use-relations across the systems exist
- Analyze the component sets to create component graphs
- Plot the cumulative frequency of the degrees on log-log axes

| Set | Description | # of components |
|---|---|---|
| ASF | Various projects checked out from the repository of the Apache Software Foundation | 59,486 |
| SPARS_DB | The components stored in the database of demo.spars.info (includes ASF, JDK) | 180,637 |
14
Result of experiment 2 / ASF

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $2.4 \pm 1.1 \times 10^{-2}$ | 0.98 |
| out-degree | $3.4 \pm 6.4 \times 10^{-2}$ | 0.94 |

# of nodes: 59,486 / # of edges: 303,755
► Characteristics similar to Exp. 1
  - The in-degree follows the power law
  - The out-degree does not follow the power law
15
Result of experiment 2 / SPARS_DB

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $2.0 \pm 1.5 \times 10^{-3}$ | 1.00 |
| out-degree | $3.7 \pm 7.0 \times 10^{-2}$ | 0.90 |

# of nodes: 180,637 / # of edges: 1,808,982
► Characteristics similar to Exp. 1
  - The in-degree follows the power law
  - The out-degree does not completely follow the power law
► The in-degree distribution fits the power-law straight line almost ideally
16
Experiment 3
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
- Construct subsets of SPARS_DB
  - Keyword: the components that contain a specified keyword in their source code
    - The keywords are randomly selected so that the number of resulting components is about 1,000 or 10,000
  - Random: 1,000 or 10,000 randomly selected components
- Analyze the component sets to create component graphs
- Plot the cumulative frequency of the degrees on log-log axes

| Set | Description | # of components |
|---|---|---|
| KWD1K | The components that contain “labels” | 1,002 |
| KWD10K | The components that contain “getstring” | 8,938 |
| RND1K | Randomly selected 1,000 components | 1,000 |
| RND10K | Randomly selected 10,000 components | 10,000 |
17
Result of experiment 3 / KWD1K

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $2.2 \pm 3.3 \times 10^{-2}$ | 0.98 |
| out-degree | $3.7 \pm 2.0 \times 10^{-1}$ | 0.93 |

# of nodes: 1,002 / # of edges: 1,564
► Characteristics similar to SPARS_DB
  - The in-degree follows the power law
  - The out-degree does not follow the power law
18
Result of experiment 3 / KWD10K

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $2.1 \pm 9.3 \times 10^{-3}$ | 0.99 |
| out-degree | $3.4 \pm 2.7 \times 10^{-1}$ | 0.93 |

# of nodes: 8,938 / # of edges: 24,317
► Characteristics similar to SPARS_DB
  - The in-degree follows the power law
  - The out-degree does not follow the power law
19
Result of experiment 3 / RND1K

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $2.3 \pm 1.8 \times 10^{-1}$ | 0.93 |
| out-degree | N/A | N/A |

# of nodes: 1,000 / # of edges: 52
► The original characteristics are almost lost
20
Result of experiment 3 / RND10K

| | $\alpha$ | $R^{*2}$ |
|---|---|---|
| in-degree | $1.9 \pm 2.1 \times 10^{-2}$ | 0.98 |
| out-degree | $4.3 \pm 3.3 \times 10^{-1}$ | 0.91 |

# of nodes: 10,000 / # of edges: 6,184
► Characteristics similar to SPARS_DB; however, the number of edges is small
21
Experiment 4
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
- List the top-ten components by in-degree and by out-degree
- Calculate the correlation between degrees and metric values
  - Spearman's rank correlation coefficient
- Target: SPARS_DB

| Metric | Description |
|---|---|
| LOC | Non-comment source lines of code |
| WMC1 | A variation of weighted methods per class (WMC); weight of a method: constant value (1) |
| WMC2 | A variation of WMC; weight of a method: cyclomatic complexity |
| LCOM | A variation of lack of cohesion of methods: LCOM5 |
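Spearman's rank correlation coefficient can be computed as the Pearson correlation of the ranks. The sketch below is a generic textbook formulation, not the authors' tooling; the class name `Spearman` is hypothetical, and giving tied values the average of their rank positions is an assumption about tie handling that the slides do not specify.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper computing Spearman's rank correlation coefficient. */
final class Spearman {
    /** Ranks start at 1; tied values share the average of their positions (assumption). */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices by the value they point to.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> values[i]));
        double[] rank = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double average = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                rank[order[k]] = average;
            }
            start = end + 1;
        }
        return rank;
    }

    /** Pearson correlation of the two rank vectors; NaN if either input is constant. */
    static double correlation(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("inputs must have the same length (at least 2)");
        }
        double[] ra = ranks(a);
        double[] rb = ranks(b);
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += ra[i];
            meanB += rb[i];
        }
        meanA /= n;
        meanB /= n;
        double saa = 0.0;
        double sab = 0.0;
        double sbb = 0.0;
        for (int i = 0; i < n; i++) {
            double da = ra[i] - meanA;
            double db = rb[i] - meanB;
            saa += da * da;
            sab += da * db;
            sbb += db * db;
        }
        return sab / Math.sqrt(saa * sbb);
    }
}
```

Because only ranks matter, any monotonically increasing relation between a degree and a metric yields a coefficient of 1, regardless of whether the relation is linear.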
22
Result of experiment 4 / In-degree
- Top-ten components
  - Components that have a fundamental/general role
- Correlation with metrics
  - In-degree has low correlation with the metrics
- The in-degree relates to the role of a component

| # | Name | LOC | In-degree | Out-degree |
|---|---|---|---|---|
| 1 | java.lang.String | 675 | 116,239 | 21 |
| 2 | java.lang.Object | 35 | 98,261 | 4 |
| 3 | java.lang.Class | 605 | 29,682 | 41 |
| 4 | java.lang.Exception | 15 | 21,046 | 2 |
| 5 | java.lang.Throwable | 136 | 19,519 | 12 |
| 6 | java.lang.System | 170 | 19,175 | 27 |
| 7 | java.util.Iterator | 5 | 15,522 | 1 |
| 8 | java.util.List | 27 | 14,462 | 4 |
| 9 | java.util.ArrayList | 200 | 13,656 | 19 |
| 10 | java.lang.Integer | 285 | 12,736 | 9 |

| | Out-degree | LOC | WMC1 | WMC2 | LCOM |
|---|---|---|---|---|---|
| In-degree | 0.00 | 0.07 | 0.24 | 0.08 | 0.12 |
23
Result of experiment 4 / Out-degree
- Top-ten components
  - Simply large/complex classes
- Correlation with metrics
  - High correlation with LOC and WMC
- The out-degree relates to the size/complexity of a component

| | In-degree | LOC | WMC1 | WMC2 | LCOM |
|---|---|---|---|---|---|
| Out-degree | 0.00 | 0.82 | 0.64 | 0.75 | 0.39 |

| # | Name | LOC | In-degree | Out-degree |
|---|---|---|---|---|
| 1 | org.apache...FunctionEval | 364 | 1 | 354 |
| 2 | org.jgraph.GPGraphpad | 2,196 | 130 | 255 |
| 3 | com.jgraph.GPGraphpad | 2,200 | 131 | 253 |
| 4 | org.jgraph.GPGraphpad | 542 | 209 | 252 |
| 5 | org.eclipse... ASTConverter | 4,520 | 3 | 223 |
| 6 | org.eclipse...JavaEditor | 1,368 | 115 | 220 |
| 7 | net.sourceforge... GanttProject | 3,055 | 98 | 216 |
| 8 | it.businesslogic... MainFrame | 7,177 | 46 | 204 |
| 9 | org... InstConstraintVisitor | 1,626 | 3 | 197 |
| 10 | org... ASTInstructionCompiler | 2,449 | 1 | 189 |
24
Answers: summary of experiments [1/4]
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
- The in-degree follows the power law
- The out-degree does not follow the power law
  - Mixture of the power-law distribution and the lognormal distribution
25
Answers: summary of experiments [2/4]
Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?
- The in-degree follows the power law
- The out-degree does not follow the power law
- The results are similar to those for single software systems
26
Answers: summary of experiments [3/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
- It depends on how the subgraph is created
- A keyword-based subgraph has characteristics similar to its superset
  - Related components are likely to share words
- A random-selection-based subgraph with a small number of nodes has different characteristics
  - Few edges exist
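A back-of-the-envelope estimate suggests why so few edges survive random selection. The sketch below assumes that $n$ of the $N$ nodes are sampled uniformly without replacement and that an edge is kept only when both of its endpoints are sampled; the slides do not state how subgraph edges were determined, so this is an assumption, and the symbols $N$, $E$, $n$, and $E_{\text{sub}}$ are introduced here.

```latex
% Each edge joins two distinct nodes, so it survives with probability n(n-1) / (N(N-1)).
\[
  \mathbb{E}[E_{\text{sub}}] = E \cdot \frac{n(n-1)}{N(N-1)} \approx E \left(\frac{n}{N}\right)^{2}
\]
% SPARS_DB: N = 180,637 nodes and E = 1,808,982 edges.
\[
  n = 1{,}000:\quad 1{,}808{,}982 \times \left(\frac{1{,}000}{180{,}637}\right)^{2} \approx 55
\]
\[
  n = 10{,}000:\quad 1{,}808{,}982 \times \left(\frac{10{,}000}{180{,}637}\right)^{2} \approx 5.5 \times 10^{3}
\]
```

These estimates are of the same order as the 52 edges reported for RND1K and the 6,184 edges reported for RND10K, whereas the keyword-based subsets of comparable size retain far more edges (1,564 and 24,317).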
27
Answers: summary of experiments [4/4]
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
- In-degree relates to the roles of components
  - Most components are used only in a specific part
  - Components with a fundamental/general role are used from everywhere
  - The larger the component set grows, the larger the in-degree values become
- Out-degree relates to the size/complexity of components
  - Many components have a reasonable size/complexity
  - Some components may have a relatively large size/complexity
  - Extremely large components are unreasonable
28
Summary
- Component graphs were investigated to determine whether the in- and out-degree distributions follow the power law
- As a result, the following characteristics were revealed
  - The in-degree distribution follows the power law
    - The in-degree of a component relates to the role of the component
  - The out-degree distribution does not follow the power law
    - The out-degree of a component relates to the size/complexity of the component
  - Some kinds of subgraphs of a component graph have the same degree-distribution characteristics as the whole graph
- Future work
  - Explore other types of component graphs
31
SPARS_DB Repositories
57 repositories / 750 modules
:pserver:anoncvs@cvs.apache.org:/home/cvspublic
:pserver:anoncvs@cvs.netbeans.org:/cvs
:pserver:anonymous@dev.eclipse.org:/home/eclipse
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/azureus
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/borg-calendar
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/care2002
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/celldb
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cycleatlas
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/fipa-os
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/freecbr
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/freemind
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/gals
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ganttproject
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/gnucashtoqif
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/gnujpdf
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/goalseeker
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/hambo
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/hipergate
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/iharder
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ij-plugins
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ipodder
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/irate
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ireport
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jabref
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jajuk
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jatlas
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/javahmo
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/javanmea
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jboss
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jdbcexplorer
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jdrawing
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jedit
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jgnash
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jgraph
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jkaiui
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jose-chess
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/junitee
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/kt-dms
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/luntbuild
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/nwn-j3d
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/openantivirus
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/oreports
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/osexpress
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/proteinmusic
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/safepeer
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/sax
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/schemamap
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/sheltermanager
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/simpleprofiler
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/simplethread
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/sshtools
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/timcam
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ultravnc
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/utbot
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/vnc-tight
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/web-portal
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/xmlhelper
:pserver:anonymous@cvs.sourceforge.net:/cvsroot/xweb
32
ASF Modules (projects)
107 modules
ant ant-antidote avalon avalon-components avalon-excalibur avalon-logkit avalon-phoenix avalon-sandbox cocoon-1 cocoon-2-historical cocoon-2.0 cocoon-2.1 cocoon-2.2 cocoon-lenya db-commons db-commons-sandbox db-ojb db-torque jakarta-alexandria jakarta-bcel jakarta-bsf jakarta-cactus jakarta-commons jakarta-commons-sandbox jakarta-ecs jakarta-ecs2 jakarta-hivemind jakarta-jetspeed jakarta-jetspeed-2 jakarta-jmeter jakarta-log4j jakarta-log4j-sandbox jakarta-lucene jakarta-lucene-sandbox jakarta-ojb jakarta-oro jakarta-pluto jakarta-poi jakarta-regexp jakarta-servletapi jakarta-servletapi-4 jakarta-servletapi-5 jakarta-slide jakarta-struts jakarta-taglibs jakarta-taglibs-sandbox jakarta-tapestry jakarta-tomcat jakarta-tomcat-4.0 jakarta-tomcat-5 jakarta-tomcat-catalina jakarta-tomcat-connectors jakarta-tomcat-jasper jakarta-tomcat-service jakarta-tools jakarta-turbine-2 jakarta-turbine-3 jakarta-turbine-flux jakarta-turbine-fulcrum jakarta-turbine-jcs jakarta-turbine-jyve jakarta-turbine-orgami jakarta-turbine-stratum jakarta-turbine-tdk jakarta-turbine-torque jakarta-velocity jakarta-velocity-dvsl jakarta-velocity-tools jakarta-watchdog jakarta-watchdog-4.0 james-server logging-log4j logging-log4j-attic logging-log4j-sandbox maven maven-components maven-jelly-tags maven-plugins maven-plugins-sandbox maven-scm maven-wagon ws-admin ws-axis ws-fx ws-jaxme ws-juddi ws-soap ws-wsif ws-wsil ws-wsrp4j ws-xmlrpc xml-admin xml-axkit xml-batik xml-cocoon2 xml-commons xml-contrib xml-crimson xml-fop xml-forrest xml-security xml-stylebook xml-xalan xml-xang xml-xerces xml-xindice xml-xmlbeans
33
Supplements for Exp.3 / More Results for Keyword-based subset [1/2]
[Figure: histograms of $\alpha$ and $R^{*2}$ of the in-degree]
- The keywords are those submitted to demo.spars.info as search queries
34
Supplements for Exp.3 / More Results for Keyword-based subset [2/2]
[Figure: X-axis: the number of components in the subset]
35
Supplements for Exp.4 / Cross-correlation
36
Supplements for Exp.4 / Scatter plot [1/2]
37
Supplements for Exp.4 / Scatter plot [2/2]
38
Discussion
- Generative models of a power-law graph
  - When a node is added to a graph, nodes that already have a large degree tend to receive the edge from the new node
  - “rich get richer”
- Meaning for component graphs
  - When a new component is added to (developed for) a software system, the new component uses components that are already used by many components
  - The set of frequently used components hardly changes even as software development proceeds
  - If that set changes, it means that the fundamental structure (design, architecture) of the software has changed
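The "rich get richer" mechanism can be simulated in a few lines. The sketch below is one simple variant, not a model taken from the slides: each new node adds a single edge to an existing node chosen with probability proportional to its in-degree plus one (the "+1" is an assumption so that nodes without incoming edges can still be chosen). The class name `PreferentialAttachment` and the seed parameter are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical "rich get richer" growth simulation (illustration only). */
final class PreferentialAttachment {
    /**
     * Grows a graph to n nodes. Each new node adds one edge to an existing node
     * picked with probability proportional to (in-degree + 1).
     * Returns the in-degree of every node.
     */
    static int[] simulate(int n, long seed) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        Random random = new Random(seed);
        int[] inDegree = new int[n];
        // Node i appears (inDegree[i] + 1) times in 'tickets', so a uniform
        // draw from the list is biased toward nodes that are already popular.
        List<Integer> tickets = new ArrayList<>();
        tickets.add(0);
        for (int node = 1; node < n; node++) {
            int target = tickets.get(random.nextInt(tickets.size()));
            inDegree[target]++;
            tickets.add(target); // extra ticket for the node that gained an edge
            tickets.add(node);   // base ticket for the newly added node
        }
        return inDegree;
    }
}
```

Feeding the returned in-degrees into a log-log cumulative plot is one way to see whether such a growth rule produces a heavy-tailed in-degree distribution; the rule is deliberately simplified and ignores out-degrees larger than one.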
39
Clustering Coefficient (of a node)
- The connectivity between the nodes connected to the target node
- $C_i = n / {}_{k_i}C_2$
  - $n$: the number of edges between the nodes that connect to node $i$
  - $k_i$: the degree of node $i$
- Example: $3 / {}_{4}C_2 = 3/6 = 0.5$
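The formula can be checked against the slide's example with a short sketch. It treats the graph as undirected and returns 0 for nodes with fewer than two neighbours; both choices are assumptions, since the slides do not say how edge direction or low-degree nodes were handled. The class name `ClusteringCoefficient` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper computing the clustering coefficient of a node. */
final class ClusteringCoefficient {
    // Undirected adjacency sets (edge direction is ignored in this sketch).
    private final Map<String, Set<String>> neighbors = new HashMap<>();

    void addEdge(String a, String b) {
        if (a.equals(b)) {
            return; // self-loops are ignored
        }
        neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** C_i = n / (k_i choose 2); returns 0 when the node has fewer than two neighbours. */
    double coefficient(String node) {
        Set<String> adjacentSet = neighbors.get(node);
        if (adjacentSet == null || adjacentSet.size() < 2) {
            return 0.0;
        }
        List<String> adjacent = new ArrayList<>(adjacentSet);
        int k = adjacent.size();
        int links = 0; // n in the formula: edges among the neighbours of the node
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (neighbors.get(adjacent.get(i)).contains(adjacent.get(j))) {
                    links++;
                }
            }
        }
        return links / (k * (k - 1) / 2.0);
    }
}
```

For a node with four neighbours a, b, c, d and the three edges a-b, b-c, c-d among them, `coefficient` returns 3 / 6 = 0.5, as in the slide's example.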
40
Clustering Coefficient and Hierarchical Organization
- $C(k)$: the average clustering coefficient of nodes whose degree is $k$
- A graph has hierarchical organization if $C(k) \sim k^{-1}$
Reference: E. Ravasz, A.-L. Barabási, "Hierarchical organization in complex networks", Physical Review E, vol. 67, 026112, 2003.
41
Clustering Coefficient and Hierarchical Organization / Result
- JDK: $C(k) \sim k^{0.80}$
- SPARS_DB: $C(k) \sim k^{0.93}$
- KWD1K (“labels”): $C(k) \sim k^{0.96}$