1 Overview and Evaluation of Java Component Search System SPARS-J
Reishi Yokomori**, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto**, Katsuro Inoue**
*Japan Science and Technology Agency
**Osaka University

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

2 Outline
- Motivation and research aim
- SPARS-J
  - SPARS-J (Outline)
  - Ranking method
  - System architecture
- Experimental evaluation for SPARS-J
- Conclusion and future work

3 Motivation
A software library is a fount of wisdom.
- Reusing software components improves productivity and quality.
  - Examples of components: source code, documents, ...
- Maintenance is easier with a library.
However, software collections are not used effectively.
- Developers do not know that suitable components exist.
- Many components are available, but they are not organized.
We need a system that manages components and searches for suitable ones.

4 Research aim
We build a system that
- searches for components that suit a user's request
- manages component information
Targets
- Intranet: closed software development environment inside a company
- Internet: source code from many open-source software communities
  - SourceForge, Jakarta Project, etc.

6 SPARS-J (Software Product Archive, analysis and Retrieval System for Java)
SPARS-J is a Java source code search system that
- analyzes and extracts components automatically
  - Component: the source code of a class or interface
- builds a database based on the analysis
  - Use relations, similar components, metrics, ...
- provides keyword search
  - Three ranking methods: KR, CR, KR+CR
  - Analysis information
    - Components using (used by) the component
    - Package hierarchy

7 Ranking method
Ranking search results
1. Component Rank (CR): components used repeatedly (by important components)
   - Ranking based on use relations between components
2. Keyword Rank (KR): components suited to a user request
   - Frequency of word appearance (adapted TF-IDF)
   - Class names, method names, etc. have special importance
3. KR+CR Rank (KR+CR): integrated ranking
   - Components rated highly in both KR and CR are very important
   - Integration by the Borda count method

8 System architecture of SPARS-J (Building a database)
[Diagram: Library (Java source files), Component analysis, Database, Component retrieval, User interface; arrows labeled "store" and "provide"]
Component analysis
- extracts components
- indexes each word that appears
- extracts use relations
- clusters similar components
- calculates Component Rank
Database
- Component information
- Indexes
- Use relations
- Clustered component graph
- Component Rank

9 System architecture of SPARS-J (Searching components)
[Diagram: User, User interface, Component retrieval, Component analysis, Database; flows labeled Query, Result, Request, Components list, Query information]
User interface
- analyzes the query (analysis condition, keywords)
- displays search results
- displays additional information: source code, use relations, similar components, metrics, etc.
Component retrieval
- searches components from the indexes
- sorts components by CR, KR, KR+CR
Database
- Component information
- Indexes
- Use relations
- Clustered component graph
- Component Rank

10 Screenshot (Top page)

11 Screenshot (Search results)

12 Screenshot (Source code)

13 Screenshot (Similar components)

14 Screenshot (Using the component)

15 Screenshot (Used by the component)

16 Screenshot (Package browsing)

18 Experimental evaluation
1. Comparison of the ranking methods in SPARS-J
   - We investigate which ranking method is best.
   - CR vs. KR vs. CR+KR
2. Comparison with other search engines
   - We verify SPARS-J's effectiveness as a software component search engine.
   - vs. Google, Namazu
3. Application of SPARS-J in an actual development environment
   - We confirm that SPARS-J is useful for managing and understanding software.

19 Experiment 1: Comparison of ranking methods in SPARS-J
Purpose of experiment
- We investigate which of the three ranking methods in SPARS-J is best.
  1. CR (based on use relations)
  2. KR (based on TF-IDF)
  3. CR+KR (integrating 1 and 2)
Preparation
- Database: built from publicly available Java source code
  - About 140,000 files from JDK, SourceForge, etc.
- Keywords: 10 queries that assume development of a simple system

20 Experiment 1: Comparison of ranking methods in SPARS-J
Criteria of evaluation
- Precision of the top 10 results: the percentage of suitable components among them
  - Users tend to look only at the higher-ranked results.
  - High precision means that many useful components are within the range the user sees.
- Ndpm: the percentage of component pairs whose rank order differs between two rankings
  - We define the user's ideal ranking in advance and calculate ndpm against it.
    - It is a quantitative indicator of the distance from the ideal ranking.
  - Ndpm considers all the components in a search result.
    - The distance becomes large when required components are ranked low.

21 Result (Experiment 1)
          Precision              Ndpm
Keyword   CR     KR     CR+KR    CR      KR      CR+KR
A         1      1      1        0.036   0.048   0.037
B         1      1      1        0.194   0.261   0.221
C         0.5    0.5    0.5      0.133   0.117   0.092
D         0.4    0.9    0.8      0.123   0.200   0.189
E         0.4    0.4    0.4      0.208   0.192   0.194
F         0.2    0.2    0.2      0.184   0.184   0.160
G         0.9    1      1        0.081   0.103   0.080
H         1      0.8    1        0.047   0.109   0.052
I         0.6    0.7    0.7      0.210   0.324   0.267
J         0.5    0.7    0.7      0.219   0.243   0.114
Ave.      0.65   0.72   0.73     0.143   0.178   0.141

22 Consideration (Experiment 1)
A paired-difference t-test confirmed that the following differences are significant at the 5% level (X ≫ Y: X is significantly better than Y).
- Precision: KR, CR+KR ≫ CR
- Ndpm: CR, CR+KR ≫ KR
Characteristics of each method
- CR
  - CR generally ranks components in a desirable order.
  - Higher-ranked components are important but often have no relevance to the keyword.
- KR
  - KR generally favors components that are strongly relevant to the keyword.
  - In a required component, the keyword does not always appear with high frequency.
- CR+KR
  - CR+KR gives good results for both precision and ndpm.
  - CR+KR combines the strengths of both rankings.
We use CR+KR as the default ranking method.

23 Experiment 2: Comparison with other search engines
Purpose of experiment
- We verify SPARS-J's effectiveness as a software component search engine.
  1. SPARS-J
     - Database built from 140,000 files (same as Experiment 1)
     - CR+KR is used as the ranking method.
  2. Google
     - Well-known web search engine
     - Queries are entered at www.google.co.jp
  3. Namazu
     - Full-text search system for documents
     - Namazu uses TF-IDF to rank documents.
     - Database built from 140,000 files (same files as SPARS-J)
Preparation
- Keywords: 10 queries (same as Experiment 1)
Criterion of evaluation
- Precision of the top 10 results

24 Result (Experiment 2)
Precision of the top 10 results
Keyword   SPARS-J   Google   Namazu
A         1         0.7      0.9
B         1         0.4      0.6
C         0.5       0.3      0.4
D         0.8       0.3      0.6
E         0.4       0.1      0.3
F         0.2       0        0.1
G         1         0.3      0.4
H         1         0.1      0.2
I         0.7       0.4      0.4
J         0.7       0.4      0.7
Ave.      0.73      0.3      0.46

25 Consideration (Experiment 2)
A paired-difference t-test confirmed that the following differences are significant at the 5% level.
- Precision: SPARS-J ≫ Namazu ≫ Google
- (*) SPARS-J (CR, KR, CR+KR) ≫ Namazu
Consideration of results
- Google
  - The results include many pages other than explanations of Java source code.
  - Performance depends on how much description there is.
- Namazu
  - Since the dataset consists only of source code, the result is better than Google's.
  - Without taking the characteristics of Java programs into account, good results cannot be obtained.
For searching software components, SPARS-J is more useful than the other search engines.

26 Experiment 3: Application of SPARS-J in an actual development environment
Purpose of experiment
- We confirm that SPARS-J is useful for managing and understanding software resources.
Criterion of evaluation
- Qualitative evaluation of SPARS-J
Preparation
- We set up SPARS-J at a company.
- 7 employees used SPARS-J for two weeks.
  - All of them are engaged in software development and maintenance.
- We carried out a questionnaire survey about SPARS-J.

27 Result (Experiment 3)
Rating scale: 5 = useful or used repeatedly, 4, 3, 2, 1 = useless or seldom used
Questionnaire item \ examinee: A B C D E F G Mode
- Package browser: 4 5 5 5 4 3 3 5
- Similar components: 4 5 5 2 4 3 5,4
- Components used by the class: 5 5 5 5 5 5 5
- Components using the class: 5 1 5 5 5 5 5
- Metrics of the class: 1 4 1 2 4 5 4,1
- Download of the class: 1 3 5 5 2 5 5
- Contribution to reduction of time cost: 3 5 5 3 4 1 5,3
- Improvement of software quality: 5 3 3 3 4 1 3
- Understanding of software resources: 3 1 5 3 5 2 1 5,3,1
- View-ability of the component-list view: 4 4 5 5 3 3 5 5
- View-ability of the highlighted source code: 3 5 5 5 5 5 5 5

28 Consideration (Experiment 3)
Highly rated questionnaire items
- Reference by package browser
- Reference by similar components
- Reference by components using (used by) the class
- View-ability of the component list view and source code
Activities realized by using SPARS-J
- Listing the applications that use a certain component
- Impact analysis when re-editing components

29 Consideration (Experiment 3)
Other comments
- Response is very quick, and we felt no stress.
- Since nothing needs to be installed on a client, sharing software components is easy.
SPARS-J can support maintenance work effectively.
- Software components are easier to grasp.

30 Conclusion and future work
Conclusion
- We constructed the software component search system SPARS-J.
  - Search engine for Java source code
  - Ranks components taking their characteristics into account
  - Provides useful related information
- We verified the validity of SPARS-J through experimental evaluation.
  - SPARS-J is useful for searching software components.
  - SPARS-J is very helpful for grasping and managing components.
Future work
- Quantitative evaluation of aspects other than ranking performance
- Support for other kinds of software components

34 Outline
- Motivation and research aim
- SPARS-J
  - Outline
    - System architecture
    - Ranking method
  - Each part
    - Analysis part
    - Retrieval part
    - User interface
- Experiment
- Conclusion and future work

35 Component analysis part
Extracts components and their information from Java source files
The process
1. Extract a component
2. Index the component
3. Extract use relations
4. Cluster similar components
5. Rank components based on use relations (CR method)

36 Extract and index a component
Extracting a component
- Find a class or interface block in a Java source file
- Record its location in the file (start line number, end line number)
Indexing
- Extract index keys from the component
  - Index key: a word and its kind
  - Reserved words are not extracted
- Count the frequency of use of each word

Example component:
    public final class Sort {
        /* quicksort */
        private static void quicksort(…) {
            int pivot;
            ：
            quicksort(…);
        }
    }

Index key
word        kind            frequency
Sort        Class name      1
quicksort   Comment         1
quicksort   Method name     1
pivot       Variable name   1
quicksort   Method call     2
：          ：              ：
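The index-key counting on this slide can be sketched as follows. This is only an illustration, not SPARS-J's implementation: the class name `IndexKeyCounter`, its methods, and the "word / kind" key format are hypothetical, and the second `quicksort` method call is assumed to sit in the elided part of the example so that the frequency of 2 from the table is reproduced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Counts how often each index key (a word together with its kind) occurs in one component. */
final class IndexKeyCounter {
    // The "word / kind" key format is an illustrative choice, not taken from SPARS-J.
    private final Map<String, Integer> frequency = new LinkedHashMap<>();

    /** Records one occurrence of a word used as the given kind, e.g. "Method call". */
    void add(String word, String kind) {
        frequency.merge(word + " / " + kind, 1, Integer::sum);
    }

    /** Returns how often the index key was recorded, or 0 if it never occurred. */
    int frequencyOf(String word, String kind) {
        return frequency.getOrDefault(word + " / " + kind, 0);
    }

    /** Index keys of the slide's Sort class; the second call is assumed to be in the elided code. */
    static IndexKeyCounter sortExample() {
        IndexKeyCounter counter = new IndexKeyCounter();
        counter.add("Sort", "Class name");
        counter.add("quicksort", "Comment");
        counter.add("quicksort", "Method name");
        counter.add("pivot", "Variable name");
        counter.add("quicksort", "Method call");
        counter.add("quicksort", "Method call");
        return counter;
    }

    public static void main(String[] args) {
        // Prints one line per index key in insertion order.
        sortExample().frequency.forEach((key, count) -> System.out.println(key + " : " + count));
    }
}
```

Keeping the kind as part of the key is what later allows a class name and a comment containing the same word to be weighted differently.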

37 Extract use relations
- Extract use relations among components using semantic analysis
- Make a component graph from the use relations
  - Node: component
  - Edge: use relation
The kinds of use relation
- Inheritance
- Interface implementation
- Variable type
- Instance creation
- Field access
- Method call

Example:
    public class Test extends Data {
        ：
        public static void main(…) {
            ：
            Sort.quicksort(super.array);
            ：
        }

[Component graph: nodes Sort, Data, Test; edges labeled Inheritance, Field access, Method call]

38 Similar components
- A similar component is a copied component or a component with minor modifications.
- We merge similar components into a single component.
- The merged component has all the use relations that the components had before merging.
[Figure: a component graph with nodes A to G and the corresponding clustered component graph]
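A minimal sketch of the merging rule, assuming the graph is stored as adjacency sets. The edges in `example()` and the cluster name "CG" are made up for illustration, and dropping use relations between members of the same cluster is an assumption the slide does not state.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Builds a clustered component graph in which a merged node keeps every use relation of its members. */
final class ClusterMerge {
    /**
     * uses maps a component to the components it uses; clusterOf maps a component to the
     * name of its cluster. Components missing from clusterOf stay as they are.
     */
    static Map<String, Set<String>> merge(Map<String, Set<String>> uses, Map<String, String> clusterOf) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            String from = clusterOf.getOrDefault(entry.getKey(), entry.getKey());
            Set<String> targets = merged.computeIfAbsent(from, key -> new TreeSet<>());
            for (String to : entry.getValue()) {
                String mergedTo = clusterOf.getOrDefault(to, to);
                if (!mergedTo.equals(from)) {
                    targets.add(mergedTo); // ASSUMPTION: relations inside one cluster are dropped
                }
            }
        }
        return merged;
    }

    /** Hypothetical graph: A uses C, C uses B, G uses C and E; C and G are similar. */
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> uses = new TreeMap<>();
        uses.put("A", new TreeSet<>(Arrays.asList("C")));
        uses.put("C", new TreeSet<>(Arrays.asList("B")));
        uses.put("G", new TreeSet<>(Arrays.asList("C", "E")));
        Map<String, String> clusterOf = new TreeMap<>();
        clusterOf.put("C", "CG");
        clusterOf.put("G", "CG");
        return merge(uses, clusterOf);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints {A=[CG], CG=[B, E]}
    }
}
```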

39 Clustering components
We measure characteristic metrics to decide which components to merge.
- Components are compared by the difference ratio of each metric.
Metrics
- Complexity
  - The number of methods, cyclomatic complexity, etc.
  - Represents a structural characteristic
- Token composition
  - The number of appearances of each token
  - Represents a surface characteristic

40 Ranking based on use relations: Component Rank (CR)
- Reusable components have many use relations.
  - There are many examples of their use.
  - General-purpose components
  - Sophisticated components
- We measure use relations quantitatively and rank components.
  - A component used by many components is important.
  - A component used by important components is also important.
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.

41 Propagating weights
[Figure: graph of nodes A, B, C; weights shown: 0.34, 0.33, 0.17, 0.33]
Ad-hoc weights are assigned to each node.

42 Propagating weights
[Figure: graph of nodes A, B, C; weights shown: 0.33, 0.17, 0.5, 0.175, 0.17, 0.5]
The node weights are redefined by the incoming edge weights.

43 Propagating weights
[Figure: graph of nodes A, B, C; weights shown: 0.5, 0.175, 0.345, 0.25, 0.175, 0.345]
We get new node weights.

44 Propagating weights
[Figure: graph of nodes A, B, C; weights shown: 0.4, 0.2, 0.4, 0.2, 0.4]
We get a stable weight assignment: the next-step weights are the same as the previous ones.
Component Rank: the order of nodes sorted by weight
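The propagation shown on these slides can be sketched as a simple fixed-point iteration. The edge structure used here (A to B and A to C with 50% each, B to C, C to A) is an assumption chosen so that the iteration reproduces the stable weights 0.4, 0.2, 0.4 from the slide; the class name, tolerance, and step limit are likewise illustrative.

```java
import java.util.Arrays;

/** Repeatedly redistributes node weights along edges until the weights stop changing. */
final class ComponentRankSketch {
    /**
     * d[i][j] is the share of node i's weight sent along the edge i -> j
     * (each row sums to 1). Returns the node weights after convergence.
     */
    static double[] propagate(double[][] d, double[] initial, double tolerance, int maxSteps) {
        double[] w = initial.clone();
        for (int step = 0; step < maxSteps; step++) {
            double[] next = new double[w.length];
            for (int i = 0; i < w.length; i++) {
                for (int j = 0; j < w.length; j++) {
                    next[j] += w[i] * d[i][j]; // edge weight i -> j, collected at node j
                }
            }
            double change = 0.0;
            for (int i = 0; i < w.length; i++) {
                change += Math.abs(next[i] - w[i]);
            }
            w = next;
            if (change < tolerance) {
                break; // next-step weights equal the previous ones
            }
        }
        return w;
    }

    /** Three-node example; the edges are an assumption, the initial weights are from the slide. */
    static double[] example() {
        double[][] d = {
            {0.0, 0.5, 0.5}, // A -> B and A -> C, 50% each
            {0.0, 0.0, 1.0}, // B -> C
            {1.0, 0.0, 0.0}, // C -> A
        };
        return propagate(d, new double[] {0.34, 0.33, 0.33}, 1e-9, 10_000);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(example())); // approximately [0.4, 0.2, 0.4]
    }
}
```

With these edges the first step already yields 0.33, 0.17, 0.5, which matches the node weights on slide 42.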

46 Component retrieval part
Searches components from the database and ranks them
The process
1. Search components
2. Rank components by suitability to the user request (KR)
3. Aggregate the two ranks (CR and KR)

47 Search components
Search query
- Words input by the user
- The kind of an index word, package name
Components that contain the given query are retrieved from the database.

48 Ranking suited to a user request: Keyword Rank (KR)
- Components that contain the words given by the user are retrieved.
- Components are ranked using a value calculated from index word weights.
- An index word gets a high weight when it
  - is used frequently in the component
  - is contained only in particular components
  - represents the component's function, such as a class name
- Components are sorted by the sum of the weights of all given words.
- TF-IDF weighting as used in full-text search engines

49 Calculation of KR value
Calculate the weight W_ct of word t in component c.
- TF_i: the frequency with which word t occurs as kind i in component c
- IDF: the total number of components / the number of components containing word t
- kw_i: the weight of kind i
The KR value is the sum of W_ct over all given words.

Kind of word         Weight
Class name           200
Interface name       50
Method name          200
Package name         50
Import               30
Method call          10
Field access         10
Variable type        10
Instance creation    10
Local var access     1
Comment              30
Doc comment          50
Line comment         10
String               1
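The slide names the ingredients of W_ct but the combining formula is not preserved in the text. The sketch below therefore assumes the usual TF-IDF shape, W_ct = (sum over kinds i of kw_i * TF_i) * log(IDF); this formula, the class name, and the example numbers (1000 components, 10 of which contain the word) are assumptions. Only the kind weights come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative keyword-weight calculation with kind-dependent term frequencies. */
final class KeywordRankSketch {
    /** Kind weights kw_i, a subset of the table on the slide. */
    static final Map<String, Integer> KIND_WEIGHT = new LinkedHashMap<>();
    static {
        KIND_WEIGHT.put("Class name", 200);
        KIND_WEIGHT.put("Method name", 200);
        KIND_WEIGHT.put("Method call", 10);
        KIND_WEIGHT.put("Comment", 30);
    }

    /** tfByKind maps a kind i to TF_i, the number of times the word occurs as that kind. */
    static double wordWeight(Map<String, Integer> tfByKind, int totalComponents, int componentsWithWord) {
        double weightedTf = 0.0;
        for (Map.Entry<String, Integer> entry : tfByKind.entrySet()) {
            weightedTf += KIND_WEIGHT.getOrDefault(entry.getKey(), 0) * entry.getValue();
        }
        double idf = (double) totalComponents / componentsWithWord; // IDF as defined on the slide
        return weightedTf * Math.log(idf); // ASSUMPTION: log-scaled IDF, as in standard TF-IDF
    }

    /** The word "quicksort" in the Sort example: one comment, one method name, two method calls. */
    static double example() {
        Map<String, Integer> tf = new LinkedHashMap<>();
        tf.put("Comment", 1);
        tf.put("Method name", 1);
        tf.put("Method call", 2);
        return wordWeight(tf, 1000, 10); // hypothetical collection sizes
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The KR value of a component for a multi-word query would then be the sum of `wordWeight` over the query words.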

50 Aggregate two ranks
Aggregate the two ranks KR and CR
Aggregation method: Borda count method
- Known as a voting system
- Used for single-seat or multiple-seat elections
- This form of voting is very popular for determining awards
SPARS-J
- Ranks components by both KR and CR
- Using KR and CR, it finds components that suit the user's request and are reusable and sophisticated

51 Borda Count method
- There are 10 voters and 5 candidates (A to E).
- Each voter ranks the candidates.
- 1 point for last place, 2 points for second from last place, ..., and N points for first place
  - Here: 1st = 5 points, 2nd = 4 points, ...

Voters   1st   2nd   3rd   4th   5th
3        A     B     C     D     E
3        E     B     C     D     A
2        C     B     A     E     D
2        C     D     B     A     E

A: 15+3+6+4 = 28 points
B: 38 points
C: 38 points
D: 22 points
E: 24 points

Aggregation
1st: B, C (tie)   3rd: A   4th: E   5th: D
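The tally can be checked with a short sketch. The class and method names are illustrative; each ballot is written as a string listing the candidates from first to last place, paired with the number of voters who cast it.

```java
import java.util.Map;
import java.util.TreeMap;

/** Borda count: N points for first place down to 1 point for last place. */
final class BordaCount {
    /** ballots[k] lists candidates from first to last place; voters[k] is how many voters cast it. */
    static Map<Character, Integer> tally(String[] ballots, int[] voters) {
        Map<Character, Integer> points = new TreeMap<>();
        for (int k = 0; k < ballots.length; k++) {
            int n = ballots[k].length();
            for (int place = 0; place < n; place++) {
                // place 0 is first place and earns n points per voter
                points.merge(ballots[k].charAt(place), (n - place) * voters[k], Integer::sum);
            }
        }
        return points;
    }

    /** The 10 voters and 5 candidates from the slide. */
    static Map<Character, Integer> example() {
        return tally(new String[] {"ABCDE", "EBCDA", "CBAED", "CDBAE"}, new int[] {3, 3, 2, 2});
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints {A=28, B=38, C=38, D=22, E=24}
    }
}
```

As a sanity check, 10 voters each hand out 5+4+3+2+1 = 15 points, so the totals must add up to 150.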

53 User interface
Receives a user's query and provides the search results through a web browser
- Microsoft Internet Explorer, Mozilla, etc.
The process
- Parse the query words and the search condition
- Show the results in rank order
- Show analyzed information of the component
  - Components used by/using the component
  - Metrics

54 Analyzed information
The following information is provided for a component.
- Metrics
  - The number of methods and variables
  - LOC, cyclomatic complexity
  - Etc. (metrics measurable in the component itself)
- Components used by/using the component
  - Shows lists of nodes reached by following use relations
- Components that are similar to the component
  - Shows lists of similar components

55 Package browsing
- The naming structure of Java packages is hierarchical.
- A user can easily browse the list of components in the same package as a given component.

57 Experiment (1/2)
Comparison with Google
- About 130,000 components obtained from the Internet are registered.
- Query words: 'calculator applet' and 'chat server client'
- We calculate the relevance ratio of the top 10 results.
  - Relevance: the component is reusable source code
- Google is a web search engine, so we
  - add the term 'java source' to the query words
  - follow one link from the result web page

58 Experiment (2/2)
- Example 1: "calculator applet"
  - SPARS-J: 9 hits, 7 suited components
- Example 2: "chat server client"
  - SPARS-J: 69 hits, 57 suited components
With SPARS-J, suited components are ranked high.

(○ = relevant, × = not relevant; Ratio = relevance ratio of the results down to that order)
        Example 1                           Example 2
        SPARS-J          Google             SPARS-J          Google
Order   Relev.  Ratio    Relev.  Ratio      Relev.  Ratio    Relev.  Ratio
1       ○       1        ○       1          ○       1        ×       0
2       ○       1        ×       0.5        ○       1        ×       0
3       ○       1        ○       0.67       ○       1        ×       0
4       ○       1        ×       0.5        ○       1        ×       0
5       ○       1        ○       0.6        ○       1        ×       0
6       ×       0.83     ○       0.67       ○       1        ×       0
7       ○       0.86     ×       0.57       ○       1        ○       0.14
8       ×       0.75     ○       0.63       ○       1        ×       0.13
9       ○       0.78     ×       0.56       ○       1        ○       0.22
10      -       -        ×       0.5        ○       1        ○       0.3
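The ratio columns are consistent with a running relevance ratio: the number of relevant results down to a given order, divided by that order. A small sketch with illustrative names, using Google's relevance marks for Example 1 as input:

```java
/** Running relevance ratio: relevant results among the first k, divided by k. */
final class RelevanceRatio {
    /** relevant[k] tells whether the result at order k+1 is relevant. */
    static double[] cumulative(boolean[] relevant) {
        double[] ratio = new double[relevant.length];
        int hits = 0;
        for (int k = 0; k < relevant.length; k++) {
            if (relevant[k]) {
                hits++;
            }
            ratio[k] = (double) hits / (k + 1);
        }
        return ratio;
    }

    /** Google's marks for "calculator applet", orders 1 to 10, as listed in the table. */
    static double[] googleExample1() {
        return cumulative(new boolean[] {
            true, false, true, false, true, true, false, true, false, false});
    }

    public static void main(String[] args) {
        for (double r : googleExample1()) {
            System.out.printf("%.2f%n", r);
        }
    }
}
```

The last value is the precision of the top 10 results used as the criterion in the other experiments.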

59 Conclusion and future work
- We developed the component search engine SPARS-J.
  - With SPARS-J, frequently used components can be retrieved easily.
Future work
- Morphological analysis of index keywords
- Collaborative filtering
- Investigating the best ranking method
  - The values of the weights
  - Aggregation of ranks
- Evaluation of SPARS-J
  - Usability

60 End

61 Component graph
[Figure: component graph of System X and System Y with nodes A to I; nodes are components and edges are use relations]

62 Weight of nodes
[Figure: component graph of System X and System Y with nodes A to I; node weights shown: 0.1, 0.2, 0.1, 0.2, 0.05]
- The weight of a node represents the significance of the node.
- Sum of all node weights = 1 ... (1)

63 Weights of edges
[Figure: node A (weight 0.2) sends 0.05 along an outgoing edge with d = 1/4; node B collects incoming edge weights 0.2, 0.05, and 0.15 for a node weight of 0.4]
d: distribution ratio
- The node weight is distributed to each outgoing edge.
  - Sum of all outgoing edge weights = origin node weight ... (2)
- Edge weights are collected at the destination node.
  - Sum of all incoming edge weights = destination node weight ... (3)

64 Definition of weights
Under constraints (1) to (3), we have the simultaneous equation
  $W = D^{t} W$
- $D^{t}$: transposed matrix of distribution ratios
- $W$: node weight vector
This simultaneous equation can be solved by propagating node weights through the edges of the graph.

65 Pseudo use relation
[Figure: nodes A, B, C]
- The weight computation does not always converge.
- Add a pseudo edge from a node to another node if there is no 'real' edge between them.
- Distribution ratios: pseudo edges << real edges

66 Markov model
- The Component Rank model can be considered a Markov chain of the user's focus.
  - The user's focus moves from one component to another along a use relation at fixed time intervals.
  - The node weight represents the probability that the user's focus is on the node in the infinite future.
[Figure: graph annotated with probabilities 0.01, 0.02, 0.01, 0.03, 0.05, 0.001, 0.1]

67 Related works
Markov models of documentation traversal
- Influence Weight: impact factor of journal publications through incoming references
- PageRank: weight of HTML pages on the Internet through incoming web links
  - Explicit use relations
  - No clustering (important for software products)
Measurement of reusability of components or interfaces
- Uses various characteristic metrics
- Indirect indicator of reusability
- Our approach directly reflects the usage of components

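68 1. Ranking based on use relations
Component Rank (CR) method
- Evaluates how much each component has actually been used, based on use relations, and ranks components accordingly
  - A component used by many components is important.
  - A component used by important components is also important.
- Components that are used often, or used in important places, receive high evaluation values.
- General-purpose components with many usage examples rank high.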

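69 2. Ranking based on keyword occurrence frequency
Keyword Rank (KR) method
- A modification of the TF-IDF method used in document retrieval systems
- Gives suitable weights to the index keys that characterize a component
  - Index keys that appear repeatedly in the component
  - Index keys that are rare and appear concentrated in particular components
  - Index keys of token kinds that symbolize the component, such as class definition names
- Ranks components using the sum of the weights as the evaluation value
- Components that match the query rank high.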

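70 3. Ranking that integrates CR and KR
Viewpoints of each ranking
1. How many usage examples a component has and how general-purpose it is: CR method
2. How well the component's content matches the query: KR method
The rankings are integrated so that components that excel in both respects appear at the top of the search results.
- Borda's method
- A fast method is desirable because the integration is performed in the component retrieval part.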

71 Iterative computation based on the component cluster graph
Calculation of CR values: procedure
1. Give each vertex a suitable weight
   - The weights sum to 1.
2. Compute the weight of each directed edge
   - The weight of a vertex is distributed among its outgoing edges.
3. Recompute the weight of each vertex
   - The sum of the weights of the incoming edges is redefined as the weight of the vertex.
4. Repeat steps 2 and 3 until the vertex weights converge
5. Take the converged weight of each vertex as the CR value of the corresponding component cluster
   - The evaluation value of a component is the CR value of the cluster it belongs to.

[Figure: example with component clusters C1, C2, C3; distribution ratios v1 × 50%, v2 × 100%, v3 × 100%]
Step             C1      C2       C3
Initial          0.334   0.333    0.333
1st iteration    0.333   0.167    0.500
2nd iteration    0.500   0.1665   0.3335
Converged        0.400   0.200    0.400

72 Purpose of the research
- Efficient search of software components supports
  - reuse: using existing components in the development of other systems
  - program understanding and maintenance: grasping the relations between components
- We evaluate SPARS-J experimentally and verify the effectiveness of the system.

73 Problems in evaluation
- Evaluating the ranking performance of a search system
  - This is easy if a test collection for evaluation is available.
- No test collection exists for software component search systems.
  - Differences in language, unit of component, etc. make it very difficult to build one.
- We apply our own evaluation experiments.

74 Contents of the experiments
1. Comparison with other search systems
   - By comparing with general search systems, we verify the effectiveness of SPARS-J as a software component search system.
2. Comparison of the ranking methods in SPARS-J
   - We compare the ranking methods and investigate which is the most appropriate.
3. Application of SPARS-J in an actual development environment
   - By having a company actually use it, we confirm that it is useful for managing and understanding software.

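75 Experiment 1: Comparison with other search systems
Purpose of the experiment
- By comparing with general search systems, we verify the effectiveness of SPARS-J as a software component search system.
Systems compared
- Google: a web page search system used for searches with many different purposes
- Namazu: a highly reliable Japanese full-text search system
Evaluation measure
- Precision: the proportion of relevant components in the search results
  - The higher the proportion, the more usable components the search results contain.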

76 Scope of precision
- Computing precision over all search results is meaningless.
  - If the number of relevant components is the same, the evaluation is the same whether they are ranked high or low.
- Precision should be computed for the top-ranked components in the search results.
  - In web page search, users who do not find a relevant document on the first page of results (10 items) tend to change the search keywords rather than look at the second page. †
- We compute precision for the top 10 components of the search results.
† Amanda Spink, B. J. Jansen, D. Wolfram, T. Saracevic: "From E-Sex to E-Commerce: Web Search Changes", IEEE Computer, Vol. 35, No. 3, pp. 107-109, Mar. 2002.

77 Experiment 1: Comparison with other search systems
Preparation
- Database
  - SPARS-J and Namazu: built from the JDK and source code published on the Web (about 140,000 source files)
  - Google: the public search engine is used as is
- Search keywords
  - 10 queries that assume development of a simple system
Procedure
- Compute precision for the top 10 components in the search results of each search system

78 Results of Experiment 1
Comparison of the precision of each search system
Keyword   SPARS-J   Google   Namazu
A         1         0.7      0.9
B         1         0.4      0.6
C         0.5       0.3      0.4
D         0.8       0.3      0.6
E         0.4       0.1      0.3
F         0.2       0        0.1
G         1         0.3      0.4
H         1         0.1      0.2
I         0.7       0.4      0.4
J         0.7       0.4      0.7
Ave.      0.73      0.3      0.46

79 Evaluation of Experiment 1
Paired test of the difference between means
- The following significant differences were found at the 5% significance level.
  - Precision: SPARS-J ≫ Namazu ≫ Google

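80 Discussion of Experiment 1
- Google
  - It is a web page search system and its results include many pages unrelated to Java software components, so the search results could not be narrowed down.
  - The difference is considered to be due to the database.
- Namazu
  - The database is the same as that of SPARS-J.
    - Compared with Google, the search results could be narrowed down.
  - It is a Japanese full-text search system: it does not parse source code and does not consider the characteristics of Java source code.
  - The difference is considered to be due to ranking performance.
SPARS-J is more useful than the other search systems when searching for software components.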

81 Experiment 2: Comparison of the ranking methods in SPARS-J
Purpose of the experiment
- We compare the ranking methods and investigate which is the most appropriate.
- SPARS-J implements ranking by three evaluation methods.
  1. Ranking based on use relations
  2. Ranking based on keyword occurrence frequency
  3. Ranking that integrates methods 1 and 2
Evaluation measures

82 1. Ranking based on use relations
Component Rank (CR) method
- Evaluates the importance of each component from use relations and ranks components accordingly
  - A component used by many components is important.
  - A component used by important components is also important.
- Components that are used often, or used in important places, receive high evaluation values.
- General-purpose components with many usage examples rank high.

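83 2. Ranking based on keyword occurrence frequency
Keyword Rank (KR) method
- A modification of the TF-IDF method used in document retrieval systems
- Gives suitable weights to the index keys that characterize a component
  - Index keys that appear repeatedly in the component
  - Index keys that are rare and appear concentrated in particular components
  - Index keys of token kinds that symbolize the component, such as class definition names
- Ranks components using the sum of the weights as the evaluation value
- Components that match the query rank high.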

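84 3. Ranking that integrates CR and KR
Viewpoints of each ranking
1. How many usage examples a component has and how general-purpose it is: CR method
2. How well the component's content matches the query: KR method
The rankings are integrated so that components that excel in both respects appear at the top of the search results.
- A fast method is desirable because the integration is performed in the component retrieval part.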

85 3. Ranking that integrates CR and KR
Borda's method: assign a score to each rank, then sort the components in ascending order of their total score.
Hereafter, the ranking that integrates CR and KR is called CR+KR.
Example: component set = { A, B, C, D, E }

Rankings by CR and KR
rank   CR   KR
1st    A    D
2nd    E    C
3rd    C    A
4th    B    B
5th    D    E

Score of each component
component   CR   KR   total
A           1    3    4
B           4    4    8
C           3    2    5
D           5    1    6
E           2    5    7

Integrated ranking
1st A, 2nd C, 3rd D, 4th E, 5th B
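The Borda integration in this example can be sketched in a few lines. The function name `borda_merge` is illustrative, and breaking ties by component name is an assumption, since the slide does not say how equal totals are ordered.

```python
def borda_merge(*rankings):
    """Merge rankings by summing rank positions (1 = best) per component."""
    points = {}
    for ranking in rankings:
        for position, component in enumerate(ranking, start=1):
            points[component] = points.get(component, 0) + position
    # Ascending total score; ties are broken by name (an assumption).
    return sorted(points, key=lambda c: (points[c], c))


cr = ["A", "E", "C", "B", "D"]  # CR ranking from the slide
kr = ["D", "C", "A", "B", "E"]  # KR ranking from the slide
print(borda_merge(cr, kr))      # ['A', 'C', 'D', 'E', 'B']
```

Each component is visited once per ranking and then sorted once, so the merge is cheap enough to run at query time.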

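86 Experiment 2. Comparison of the ranking methods of SPARS-J
Purpose: Compare the ranking methods and determine which is the most appropriate.
SPARS-J implements ranking with three evaluation methods:
1. Ranking based on use relations (CR)
2. Ranking based on keyword occurrence frequency (KR)
3. Ranking that integrates methods 1 and 2 (CR+KR)
Evaluation measures:
Precision: the proportion of relevant components among the search results. The higher the proportion, the more usable components the search results contain.
ndpm value: the difference between the user's ranking and the system's ranking. The smaller the value, the closer the ranking is to the ideal.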

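87 How to calculate the ndpm value
The ndpm value is the proportion of pairs whose order differs between the system and the user (m pairs) among all pairs (d, d') in the set of retrieved documents.
Let n be the number of elements in the document set.
(The formula and the calculation example were figures on the slide and are not preserved.)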
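Following the verbal definition on this slide, a simplified version of the measure can be sketched as the number of disagreeing pairs m divided by the number of all pairs, n(n-1)/2. This sketch assumes both rankings are strict orders over the same documents with no ties; the general ndpm definition also handles ties, which is not covered here. The function name `ndpm` and the sample rankings are illustrative.

```python
from itertools import combinations


def ndpm(system_ranking, user_ranking):
    """Fraction of document pairs ordered differently by system and user.

    Both arguments are lists of the same documents, best first, without ties.
    """
    sys_pos = {d: i for i, d in enumerate(system_ranking)}
    usr_pos = {d: i for i, d in enumerate(user_ranking)}
    docs = list(usr_pos)
    n = len(docs)
    if n < 2:
        return 0.0
    # m: pairs whose relative order differs between the two rankings.
    m = sum(
        1
        for d1, d2 in combinations(docs, 2)
        if (sys_pos[d1] < sys_pos[d2]) != (usr_pos[d1] < usr_pos[d2])
    )
    return m / (n * (n - 1) / 2)


print(ndpm(["A", "B", "C"], ["A", "B", "C"]))  # identical rankings
print(ndpm(["A", "B", "C"], ["C", "B", "A"]))  # fully reversed rankings
print(ndpm(["A", "C", "B"], ["A", "B", "C"]))  # one swapped pair out of three
```

A value of 0 means the system ordered every pair as the user did, and a value of 1 means every pair is reversed.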

88 Experiment 2. Comparison of the ranking methods of SPARS-J
Preparation:
Database (same as Experiment 1): built from a collection of 140,000 source code files.
Search keywords (same as Experiment 1): 10 queries are used.
Procedure:
For each ranking method, compute the precision over the top 10 components of the search results.
For each ranking method, compute the ndpm value against the ideal ranking assumed by the user, over all components in the search results.
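The precision step of this procedure can be sketched as follows. The function name `precision_at_k`, the component names, and the choice to divide by the number of results actually returned when fewer than k are available are assumptions for illustration.

```python
def precision_at_k(results, relevant, k=10):
    """Share of relevant components among the top k search results."""
    top = results[:k]
    if not top:
        return 0.0
    hits = sum(1 for component in top if component in relevant)
    # Divide by the number of results inspected (an assumption when
    # a query returns fewer than k components).
    return hits / len(top)


# Hypothetical result list: 12 components, 7 of the first 10 are relevant.
results = [f"C{i}" for i in range(1, 13)]
relevant = {"C1", "C2", "C3", "C5", "C6", "C8", "C9", "C12"}
print(precision_at_k(results, relevant))
```

Only the first ten results are inspected, so the relevant component at position 12 does not affect the value.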

89 Results of Experiment 2

          Precision              ndpm value
keyword   CR     KR     CR+KR    CR      KR      CR+KR
A         1      1      1        0.036   0.048   0.037
B         1      1      1        0.194   0.261   0.221
C         0.5    0.5    0.5      0.133   0.117   0.092
D         0.4    0.9    0.8      0.123   0.200   0.189
E         0.4    0.4    0.4      0.208   0.192   0.194
F         0.2    0.2    0.2      0.184   0.184   0.160
G         0.9    1      1        0.081   0.103   0.080
H         1      0.8    1        0.047   0.109   0.052
I         0.6    0.7    0.7      0.210   0.324   0.267
J         0.5    0.7    0.7      0.219   0.243   0.114
Ave.      0.65   0.72   0.73     0.143   0.178   0.141

90 Evaluation of Experiment 2
A paired test of the difference between means found the following significant differences at the 5% significance level.
Precision: KR, CR+KR ≫ CR
ndpm value: CR, CR+KR ≫ KR

91 Discussion of Experiment 2
CR method: overall, the results tend to be ordered as the user expected.
KR method: relevant components tend to be concentrated at the top.
CR+KR method: it produced rankings that are good in both precision and ndpm value. Components ranked highly by both the CR and KR methods appear to be placed at the top.

92 Experiment 3. Application to software inside a company
Purpose: Have the system used in practice at a company to confirm that it is useful for managing and understanding software.
Content: Qualitative evaluation of SPARS-J. A questionnaire about SPARS-J was given to seven employees who develop and maintain software inside the company.

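93 Results of the questionnaire
Scale: 5 (good) to 1 (bad). Responses are listed in slide order for respondents A to G.

Item                                              Responses              Mode
Use of the package browser                        4, 5, 5, 5, 4, 3, 3    5
Viewing classes in the same group                 4, 5, 5, 2, 4, 3       5, 4
Viewing classes that use the retrieved class      5, 5, 5, 5, 5, 5       5
Viewing classes used by the retrieved class       5, 1, 5, 5, 5, 5       5
Class metric values                               1, 4, 1, 2, 4, 5       4, 1
Source code download function                     1, 3, 5, 5, 2, 5       5
Reduction of time cost                            3, 5, 5, 3, 4, 1       5, 3
Improvement of software quality                   5, 3, 3, 3, 4, 1       3
Understanding of software inside the company      3, 1, 5, 3, 5, 2, 1    5, 3, 1
Readability of the search result list             4, 4, 5, 5, 3, 3, 5    5
Readability of the highlighting                   3, 5, 5, 5, 5, 5, 5    5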

94 Evaluation of the questionnaire
Points rated especially highly:
Use of the package browser
Viewing classes in the same group
Viewing classes that use, or are used by, a class
Readability of the search result list and of the highlighting
What becomes possible with SPARS-J:
Identifying the applications that use a component
Investigating the range of impact when a component is revised

95 Discussion of Experiment 3
These results mean that software components become easier to understand, which supports maintenance work.
Other comments:
The search is fast and causes no stress, because the index terms are extracted in advance.
Nothing needs to be installed on the client and software components can be shared, because a shared database is built.
SPARS-J is useful for searching software components, and it is also very helpful for understanding and managing them.

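96 Summary and future work
Experimental evaluation of the software component search system SPARS-J:
Comparison with other search systems: SPARS-J outperforms Google and Namazu.
Comparison of the ranking methods of SPARS-J: CR and KR each have their own characteristics, and integrating them improved performance.
Application of SPARS-J to software inside a company: it was very effective for maintenance work.
Future work:
Investigation of recall
Quantitative evaluation from viewpoints other than ranking performance
Support for other kinds of software components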

97 END The End
