Published by Erin Webb. Modified over 9 years ago.
1
Overview and Evaluation of Java Component Search System SPARS-J
Reishi Yokomori**, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto**, Katsuro Inoue**
*Japan Science and Technology Agency
**Osaka University
2
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Outline
- Motivation and research aim
- SPARS-J
  – Outline
  – Ranking method
  – System architecture
- Experimental evaluation for SPARS-J
- Conclusion and future work
3
Motivation
- A library of software is a fount of wisdom.
  – Reusing software components improves productivity and quality.
  – Examples of components: source code, documents, ...
  – Maintenance is easier with such a library.
- However, a collection of software is often not used effectively.
  – Developers do not know that a desirable component exists.
  – There are many components, but they are not organized.
- We need a system that manages components and searches for suitable ones.
4
Research aim
- We build a system that
  – searches for components that suit a user's request
  – manages component information
- Targets
  – Intranet: a closed software development environment inside a company
  – Internet: source code from many open-source software communities (SourceForge, Jakarta Project, etc.)
6
SPARS-J (Software Product Archive, analysis and Retrieval System for Java)
- SPARS-J is a search system for Java source code. It
  – analyzes source files and extracts components automatically (component: the source code of a class or interface)
  – builds a database from the analysis (use relations, similar components, metrics, ...)
  – provides keyword search
    » Three ranking methods: KR, CR, KR+CR
    » Analysis information: components using (or used by) a component, package hierarchy
7
Ranking method: ranking search results
1. Component Rank (CR): components used repeatedly (by important components)
   – Ranking based on use relations between components
2. Keyword Rank (KR): components suited to a user request
   – Frequency of word appearance (adapted TF-IDF)
   – Class names, method names, etc. have special importance
3. KR+CR Rank (KR+CR): integrated ranking
   – Components rated highly in both KR and CR are very important
   – Integration by the Borda count method
8
System architecture of SPARS-J (Building a Database)
[Figure: Library (Java source files), Component analysis, Database (store / provide), Component retrieval, User interface]
- Component analysis
  – extracts components
  – indexes each word that appears
  – extracts use relations
  – clusters similar components
  – calculates Component Rank
- Database contents: component information, indexes, use relations, clustered component graph, Component Rank
9
System architecture of SPARS-J (Searching Components)
[Figure: User, User interface (query / result), Component retrieval (request / components list), Database, Component analysis]
- User interface
  – analyzes the query (analysis condition, keywords)
  – displays search results and additional information (source code, use relations, similar components, metrics, etc.)
- Component retrieval
  – searches components using the indexes
  – sorts components by CR, KR, or KR+CR
- Database contents: component information, indexes, use relations, clustered component graph, Component Rank, query information
10
[Screenshot: top page]
11
[Screenshot: search results]
12
[Screenshot: source code]
13
[Screenshot: similar components]
14
[Screenshot: components using the component]
15
[Screenshot: components used by the component]
16
[Screenshot: package browsing]
18
Experimental Evaluation
1. Comparison of the ranking methods in SPARS-J
   – We investigate which ranking method is best: CR vs. KR vs. CR+KR
2. Comparison with other search engines
   – We verify SPARS-J's effectiveness as a software component search engine: vs. Google and Namazu
3. Application of SPARS-J in an actual development environment
   – We confirm that SPARS-J is useful for managing and understanding software.
19
Experiment 1: Comparison of ranking methods in SPARS-J
- Purpose: find the best of the three ranking methods in SPARS-J
  1. CR (based on use relations)
  2. KR (based on TF-IDF)
  3. CR+KR (integration of 1 and 2)
- Preparation
  – Database: publicly available Java source code, about 140,000 files from the JDK, SourceForge, etc.
  – Keywords: 10 queries that assume the development of a simple system
20
Experiment 1: Comparison of ranking methods in SPARS-J
Evaluation criteria
- Precision of the top 10 results: the percentage of suitable components among them
  – Users tend to look only at the highest-ranked results.
  – High precision means that many useful components are within the user's view.
- Ndpm: the percentage of component pairs whose relative order differs between two rankings
  – We define the user's ideal ranking in advance and calculate ndpm against it.
    » A quantitative indicator of the distance from the ideal ranking
  – Ndpm considers all components in a search result.
    » The distance becomes large when required components are ranked low.
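The two criteria can be sketched in Java as follows. This is a minimal illustration, not the authors' evaluation code: the class and method names are hypothetical, and `ndpm` is a simplified form that assumes both rankings are strict orderings of the same items (no ties), so it reduces to the share of pairs whose order is reversed.

```java
import java.util.List;
import java.util.Set;

class RankingMetrics {
    /** Fraction of the first k results (or fewer, if the list is shorter) that are relevant. */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        int n = Math.min(k, ranked.size());
        if (n == 0) return 0.0;
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / n;
    }

    /** Simplified ndpm (assumption: strict rankings of the same items): share of reversed pairs. */
    static double ndpm(List<String> ideal, List<String> actual) {
        int pairs = 0;
        int reversed = 0;
        for (int i = 0; i < ideal.size(); i++) {
            for (int j = i + 1; j < ideal.size(); j++) {
                pairs++;
                // The ideal ranking puts item i before item j; count the pair if actual reverses it.
                if (actual.indexOf(ideal.get(i)) > actual.indexOf(ideal.get(j))) reversed++;
            }
        }
        return pairs == 0 ? 0.0 : (double) reversed / pairs;
    }

    public static void main(String[] args) {
        List<String> ideal = List.of("A", "B", "C", "D");
        List<String> actual = List.of("B", "A", "C", "D");
        System.out.println(precisionAtK(actual, Set.of("A", "C"), 2)); // one of the top two is relevant
        System.out.println(ndpm(ideal, actual));                       // one of six pairs is reversed
    }
}
```

A lower ndpm means the ranking is closer to the ideal one, while a higher precision is better, so the two columns of a result table read in opposite directions.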
21
Result (Experiment 1)

           Precision (top 10)        Ndpm
Keyword    CR     KR     CR+KR       CR      KR      CR+KR
A          1      1      1           0.036   0.048   0.037
B          1      1      1           0.194   0.261   0.221
C          0.5    0.5    0.5         0.133   0.117   0.092
D          0.4    0.9    0.8         0.123   0.200   0.189
E          0.4    0.4    0.4         0.208   0.192   0.194
F          0.2    0.2    0.2         0.184   0.184   0.160
G          0.9    1      1           0.081   0.103   0.080
H          1      0.8    1           0.047   0.109   0.052
I          0.6    0.7    0.7         0.210   0.324   0.267
J          0.5    0.7    0.7         0.219   0.243   0.114
Ave.       0.65   0.72   0.73        0.143   0.178   0.141
22
Consideration (Experiment 1)
- A paired-difference t-test confirmed that the following differences are significant at the 5% level.
  – Precision: KR, CR+KR ≫ CR
  – Ndpm: CR, CR+KR ≫ KR
- Characteristics of each method
  – CR: generally ranks components in a desirable order. Highly ranked components are important but often have no relevance to the keyword.
  – KR: generally favors components with strong relevance to the keyword. However, the keyword does not always appear frequently in the required component.
  – CR+KR: gives good results for both precision and ndpm, combining the strengths of both rankings.
- We use CR+KR as the default ranking method.
23
Experiment 2: Comparison with other search engines
- Purpose: verify SPARS-J's effectiveness as a software component search engine
  1. SPARS-J: database built from 140,000 files (same as Experiment 1); CR+KR is used as the ranking method.
  2. Google: a well-known web search engine; queries are entered at www.google.co.jp.
  3. Namazu: a full-text search system for documents that ranks them by TF-IDF; database built from the same 140,000 files as SPARS-J.
- Preparation
  – Keywords: 10 queries (same as Experiment 1)
  – Evaluation criterion: precision of the top 10 results
24
Result (Experiment 2): precision of the top 10 results

Keyword    SPARS-J   Google   Namazu
A          1         0.7      0.9
B          1         0.4      0.6
C          0.5       0.3      0.4
D          0.8       0.3      0.6
E          0.4       0.1      0.3
F          0.2       0        0.1
G          1         0.3      0.4
H          1         0.1      0.2
I          0.7       0.4      0.4
J          0.7       0.4      0.7
Ave.       0.73      0.3      0.46
25
Consideration (Experiment 2)
- A paired-difference t-test confirmed that the following differences are significant at the 5% level.
  – Precision: SPARS-J ≫ Namazu ≫ Google
  – (*) SPARS-J (CR, KR, CR+KR) ≫ Namazu
- Consideration of results
  – Google: the results contain many pages that do not describe Java source code. Performance depends on how much description the pages contain.
  – Namazu: because the dataset consists only of source code, the results are better than Google's. Without taking the characteristics of Java programs into account, however, good results cannot be obtained.
- For searching software components, SPARS-J is more useful than the other search engines.
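As an illustration of the paired-difference t-test, the sketch below computes the t statistic for SPARS-J versus Google from the per-query precision values of Experiment 2. The class name is hypothetical, and the Google value for query I is read as 0.4, an assumption that agrees with the reported average of 0.3. The slides do not say whether a one-sided or two-sided test was used; the comparison here uses the two-sided 5% critical value for 9 degrees of freedom.

```java
class PairedTTest {
    /** t statistic of the paired differences x[i] - y[i]; it has n - 1 degrees of freedom. */
    static double tStatistic(double[] x, double[] y) {
        int n = x.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) mean += (x[i] - y[i]) / n;
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double deviation = (x[i] - y[i]) - mean;
            sumSquares += deviation * deviation;
        }
        double sd = Math.sqrt(sumSquares / (n - 1)); // sample standard deviation of the differences
        return mean / (sd / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Precision of the top 10 results for queries A to J.
        double[] sparsJ = {1.0, 1.0, 0.5, 0.8, 0.4, 0.2, 1.0, 1.0, 0.7, 0.7};
        double[] google = {0.7, 0.4, 0.3, 0.3, 0.1, 0.0, 0.3, 0.1, 0.4, 0.4};
        double t = tStatistic(sparsJ, google);
        // Two-sided 5% critical value of Student's t with 9 degrees of freedom: about 2.262.
        System.out.println("t = " + t + ", significant at 5%: " + (Math.abs(t) > 2.262));
    }
}
```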
26
Experiment 3: Application of SPARS-J in an actual development environment
- Purpose: confirm that SPARS-J is useful for managing and understanding software resources
- Evaluation criterion: qualitative evaluation of SPARS-J
- Preparation
  – We installed SPARS-J at a company.
  – Seven employees used SPARS-J for two weeks; all of them are engaged in software development and maintenance.
  – We carried out a questionnaire survey about SPARS-J.
27
Result (Experiment 3)
Questionnaire ratings by examinees A to G, followed by the mode where shown (5 = useful or used repeatedly, 1 = useless or seldom used)

Package Browser: 4 555 433 5
Similar components: 4 55 243 5,4
Components used by the class: 5555555
Components using the class: 5 1 55555
Metrics of the class: 14124 5 4,1
Download of the class: 13 55 2 55
Contribution to reduction of time cost: 3 55 341 5,3
Improvement for software quality: 5 333413
Understanding of software resource: 31 5 3 5 21 5,3,1
View-ability of the component-list view: 44 55 33 55
View-ability of the highlighted source code: 3 5555555
28
Consideration (Experiment 3)
- Highly rated questionnaire items
  – Reference by package browser
  – Reference by similar components
  – Reference by components using (or used by) the class
  – Viewability of the component list view and source code
- Activities made possible by SPARS-J
  – Listing the applications that use a certain component
  – Impact analysis when re-editing components
29
Consideration (Experiment 3)
- Other comments
  – Response is very quick, and we felt no stress.
  – Because nothing needs to be installed on the client, sharing software components is easy.
  – SPARS-J can support maintenance work effectively.
    » It makes software components easier to grasp.
30
Conclusion and future work
- Conclusion
  – We built SPARS-J, a software component search system.
    » A search engine for Java source code
    » Ranks components by taking their characteristics into account
    » Provides useful related information
  – We verified the validity of SPARS-J through experimental evaluation.
    » SPARS-J is useful for searching software components.
    » SPARS-J is very helpful for grasping and managing components.
- Future work
  – Quantitative evaluation of aspects other than ranking performance
  – Support for other kinds of software components
34
Outline
- Motivation and research aim
- SPARS-J
  – Outline
  – System architecture
  – Ranking method
- Each part
  – Analysis part
  – Retrieval part
  – User interface
- Experiment
- Conclusion and future work
35
Component analysis part
- Extracts components and their information from Java source files
- The process
  1. Extract a component
  2. Index the component
  3. Extract use relations
  4. Cluster similar components
  5. Rank components based on use relations (CR method)
36
Extract and index a component
- Extracting a component
  – Find a class or interface block in a Java source file
  – Record its location in the file (start line number, end line number)
- Indexing
  – Extract index keys from the component
  – Index key: a word and its kind
  – Reserved words are not extracted
  – Count the frequency of use of each word

Example component:

    public final class Sort {
        /* quicksort */
        private static void quicksort(…) {
            int pivot;
            :
            quicksort(…);
        }
    }

Index keys:

word        kind            frequency
Sort        Class name      1
quicksort   Comment         1
quicksort   Method name     1
pivot       Variable name   1
quicksort   Method call     2
:           :               :
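The counting step of indexing can be sketched as below. This is an assumption-laden illustration: `IndexKeyCounter` and its methods are hypothetical names, and the token stream is written by hand, whereas a real analyzer would obtain it by parsing the Java source and skipping reserved words.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexKeyCounter {
    /** An index key is represented here as a two-element list: [word, kind]. */
    static List<String> key(String word, String kind) {
        return List.of(word, kind);
    }

    /** Counts how often each (word, kind) pair occurs in one component. */
    static Map<List<String>, Integer> count(List<List<String>> tokens) {
        Map<List<String>, Integer> frequency = new LinkedHashMap<>();
        for (List<String> token : tokens) {
            frequency.merge(token, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        // Hand-written tokens for the Sort example; the second method call is assumed to sit in the elided code.
        List<List<String>> tokens = List.of(
            key("Sort", "Class name"),
            key("quicksort", "Comment"),
            key("quicksort", "Method name"),
            key("pivot", "Variable name"),
            key("quicksort", "Method call"),
            key("quicksort", "Method call"));
        count(tokens).forEach((k, n) -> System.out.println(k.get(0) + " / " + k.get(1) + " : " + n));
    }
}
```

Keeping the kind in the key is what later allows a class name and a comment containing the same word to be weighted differently.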
37
Extract use relations
- Extract use relations among components using semantic analysis
- Build a component graph from the use relations
  – Node: component
  – Edge: use relation
- Kinds of use relation: inheritance, interface implementation, variable type, instance creation, field access, method call

Example:

    public class Test extends Data {
        :
        public static void main(…) {
            :
            Sort.quicksort(super.array);
            :
        }
    }

[Figure: component graph with nodes Sort, Data and Test; edge labels: inheritance, field access, method call]
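A component graph of this kind can be held in a nested map, as in the following sketch. The class `ComponentGraph` and its methods are hypothetical, and the use relations are added by hand; in SPARS-J they come from semantic analysis of the source, which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ComponentGraph {
    // using component -> used component -> kinds of use relation observed on that edge
    private final Map<String, Map<String, Set<String>>> edges = new LinkedHashMap<>();

    void addUse(String from, String to, String kind) {
        edges.computeIfAbsent(from, k -> new LinkedHashMap<>())
             .computeIfAbsent(to, k -> new LinkedHashSet<>())
             .add(kind);
    }

    /** Components that the given component uses. */
    Set<String> uses(String from) {
        return edges.getOrDefault(from, Map.of()).keySet();
    }

    /** Components that use the given component. */
    Set<String> usedBy(String to) {
        Set<String> users = new LinkedHashSet<>();
        for (Map.Entry<String, Map<String, Set<String>>> entry : edges.entrySet()) {
            if (entry.getValue().containsKey(to)) users.add(entry.getKey());
        }
        return users;
    }

    public static void main(String[] args) {
        ComponentGraph graph = new ComponentGraph();
        graph.addUse("Test", "Data", "Inheritance");   // class Test extends Data
        graph.addUse("Test", "Data", "Field access");  // super.array
        graph.addUse("Test", "Sort", "Method call");   // Sort.quicksort(...)
        System.out.println("Test uses " + graph.uses("Test"));
        System.out.println("Data is used by " + graph.usedBy("Data"));
    }
}
```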
38
Similar component
- A similar component is a copied component or one with minor modifications
- We merge similar components into a single component
  – The merged component has all the use relations that the components had before merging
[Figure: a component graph with nodes A to G, and the clustered component graph in which C and G are merged into one node]
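Merging can be sketched as relabelling every node with its cluster and taking the union of the edges. The names below are hypothetical, the example edges are invented for illustration (the figure's edges are not recoverable from the text), and dropping edges that fall inside one cluster is an assumption rather than something the slides state.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ClusterMerge {
    /**
     * uses: component -> components it uses.
     * clusterOf: component -> cluster name; components missing from the map stay as they are.
     */
    static Map<String, Set<String>> merge(Map<String, Set<String>> uses, Map<String, String> clusterOf) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            String from = clusterOf.getOrDefault(entry.getKey(), entry.getKey());
            for (String target : entry.getValue()) {
                String to = clusterOf.getOrDefault(target, target);
                if (!from.equals(to)) { // assumption: edges inside a cluster are dropped
                    merged.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical use relations; C and G are treated as similar components.
        Map<String, Set<String>> uses = Map.of(
            "A", Set.of("C"),
            "B", Set.of("G"),
            "C", Set.of("E"),
            "G", Set.of("D"));
        Map<String, String> clusterOf = Map.of("C", "CG", "G", "CG");
        System.out.println(merge(uses, clusterOf)); // the merged node CG inherits the edges of both C and G
    }
}
```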
39
Clustering components
- We measure characteristic metrics to decide which components to merge
  – Based on the difference ratio of each metric between components
- Metrics
  – Complexity: the number of methods, cyclomatic complexity, etc.; represents a structural characteristic
  – Token composition: the number of appearances of each token; represents a surface characteristic
40
Ranking based on use relation
- Component Rank (CR)
  – Reusable components have many use relations
    » There are many examples of their use
    » General-purpose components
    » Sophisticated components
  – We measure use relations quantitatively and rank components
    » A component used by many components is important
    » A component used by an important component is also important
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.
41
Propagating weights
[Figure: graph with nodes A, B, C; weights shown: 0.34, 0.33, 0.17, 0.33]
- Ad-hoc weights are assigned to each node
42
Propagating weights
[Figure: graph with nodes A, B, C; weights shown: 0.33, 0.17, 0.5, 0.175, 0.17, 0.5]
- The node weights are redefined from the incoming edge weights
43
Propagating weights
[Figure: graph with nodes A, B, C; weights shown: 0.5, 0.175, 0.345, 0.25, 0.175, 0.345]
- We get new node weights
44
Propagating weights
[Figure: graph with nodes A, B, C; weights shown: 0.4, 0.2, 0.4, 0.2, 0.4]
- We get a stable weight assignment: the next-step weights are the same as the previous ones
- Component Rank: the order of nodes sorted by weight
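The propagation can be sketched as a simple fixed-point iteration. The code below is an illustration under assumptions, not the SPARS-J implementation: the class name, the convergence tolerance and the three-node graph (A uses B and C, B uses C, C uses A) are chosen for the example. For that graph the weights settle at 0.4, 0.2 and 0.4 for A, B and C.

```java
import java.util.Arrays;

class ComponentRankSketch {
    /**
     * d[i][j] is the share of node i's weight sent along the edge i -> j; every row sums to 1.
     * Returns the node weights after repeated propagation, starting from equal weights.
     */
    static double[] rank(double[][] d, double tolerance, int maxIterations) {
        int n = d.length;
        double[] weights = new double[n];
        Arrays.fill(weights, 1.0 / n); // initial weights sum to 1
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    next[j] += weights[i] * d[i][j]; // collect incoming edge weights at node j
                }
            }
            double change = 0.0;
            for (int i = 0; i < n; i++) change += Math.abs(next[i] - weights[i]);
            weights = next;
            if (change < tolerance) break; // stable: next-step weights equal the previous ones
        }
        return weights;
    }

    public static void main(String[] args) {
        // Rows and columns are A, B, C. A splits its weight between B and C; B sends all to C; C sends all to A.
        double[][] d = {
            {0.0, 0.5, 0.5},
            {0.0, 0.0, 1.0},
            {1.0, 0.0, 0.0}};
        System.out.println(Arrays.toString(rank(d, 1e-9, 1000)));
    }
}
```

Sorting the nodes by the returned weights gives the Component Rank order; here A and C tie ahead of B.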
46
Component retrieval part
- Searches components in the database and ranks them
- The process
  1. Search components
  2. Rank them by suitability to the user request (KR)
  3. Aggregate the two ranks (CR and KR)
47
Search components
- Search query
  – Words that the user inputs
  – The kind of index word, the package name
- Components that contain the given query are retrieved from the database
48
Ranking suited to a user request
- Keyword Rank (KR)
  – Components that contain the words given by the user are retrieved
  – Components are ranked by a value calculated from index word weights
- The index word weight is high for
  – a word used frequently in the component
  – a word contained only in particular components
  – a word that represents the component's function, such as the class name
- Components are sorted by the sum of the weights of all given words
- TF-IDF weighting as used in full-text search engines
49
Calculation of KR value
- Calculate the weight W_ct of word t in component c
  – TF_i: the frequency with which word t occurs as kind i in component c
  – IDF: the total number of components / the number of components containing word t
  – kw_i: weight of kind i
- The KR value is the sum of W_ct over all given words

Kind of word        Weight
Class name          200
Interface name      50
Method name         200
Package name        50
Import              30
Method call         10
Field access        10
Variable type       10
Instance creation   10
Local var access    1
Comment             30
Doc comment         50
Line comment        10
String              1
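The slide's formula for W_ct did not survive extraction, so the sketch below assumes one plausible combination of the listed terms: W_ct = (sum over kinds i of kw_i × TF_i) × IDF, with IDF taken as the plain ratio defined above. That formula, the class name and the example counts are assumptions for illustration; only the kind weights come from the table, and only a subset of them is included.

```java
import java.util.Map;

class KeywordRankSketch {
    // Subset of the kind weights from the table; kinds not listed here count as 0 in this sketch.
    static final Map<String, Integer> KIND_WEIGHT = Map.of(
        "Class name", 200,
        "Method name", 200,
        "Doc comment", 50,
        "Comment", 30,
        "Method call", 10);

    /**
     * Assumed form: W_ct = (sum of kw_i * TF_i over kinds i) * IDF,
     * where IDF = totalComponents / componentsWithWord.
     * tfByKind holds the occurrences of word t in component c, split by token kind.
     */
    static double wordWeight(Map<String, Integer> tfByKind, int totalComponents, int componentsWithWord) {
        double weightedFrequency = 0.0;
        for (Map.Entry<String, Integer> entry : tfByKind.entrySet()) {
            weightedFrequency += KIND_WEIGHT.getOrDefault(entry.getKey(), 0) * entry.getValue();
        }
        double idf = (double) totalComponents / componentsWithWord;
        return weightedFrequency * idf;
    }

    public static void main(String[] args) {
        // "quicksort" in a Sort class: once in a comment, once as a method name, twice as a method call.
        Map<String, Integer> tf = Map.of("Comment", 1, "Method name", 1, "Method call", 2);
        // Hypothetical collection: 1000 components, 10 of which contain the word.
        System.out.println(wordWeight(tf, 1000, 10));
    }
}
```

Under this assumed form, the KR value of a component for a multi-word query would be the sum of `wordWeight` over the query words.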
50
Aggregate two ranks
- Aggregate the two ranks KR and CR
- Aggregation method: Borda count
  – Known as a voting system
  – Used for single-seat or multiple-seat elections
  – This form of voting is very popular for determining awards
- SPARS-J
  – Ranks components by both KR and CR
  – Using KR and CR together favors components that suit the user's request and are reusable and sophisticated
51
Borda Count method
- There are 10 voters and 5 candidates (A to E)
- Each voter ranks the candidates
- 1 point for last place, 2 points for second from last, ..., and N points for first place
  – Here: 1st = 5 points, 2nd = 4 points, ...

Voters   1st   2nd   3rd   4th   5th
3        A     B     C     D     E
3        E     B     C     D     A
2        C     B     A     E     D
2        C     D     B     A     E

Aggregation
- A: 15 + 3 + 6 + 4 = 28 points
- B: 38 points
- C: 38 points
- D: 22 points
- E: 3 + 15 + 4 + 2 = 24 points
- Resulting order: 1st B and C (tie), 3rd A, 4th E, 5th D
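The tally for this example can be checked with a short sketch. The class name and the ballot encoding (a string of candidate letters, best first, mapped to the number of voters) are choices made for the illustration. For the ten ballots above it yields A 28, B 38, C 38, D 22 and E 24, which adds up to the 150 points that ten voters hand out.

```java
import java.util.Map;
import java.util.TreeMap;

class BordaCount {
    /** ballots: ranking string (best candidate first) -> number of voters who cast it. */
    static Map<Character, Integer> tally(Map<String, Integer> ballots) {
        Map<Character, Integer> points = new TreeMap<>();
        for (Map.Entry<String, Integer> ballot : ballots.entrySet()) {
            String order = ballot.getKey();
            int candidates = order.length();
            for (int position = 0; position < candidates; position++) {
                // First place earns N points, last place earns 1 point.
                int score = (candidates - position) * ballot.getValue();
                points.merge(order.charAt(position), score, Integer::sum);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        Map<String, Integer> ballots = Map.of(
            "ABCDE", 3,
            "EBCDA", 3,
            "CBAED", 2,
            "CDBAE", 2);
        System.out.println(tally(ballots));
    }
}
```

In SPARS-J the same idea is applied with two "voters", the CR ranking and the KR ranking, and components as the candidates.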
53
User interface
- Receives a user's query and provides the search results through a web browser (Microsoft Internet Explorer, Mozilla, etc.)
- The process
  – Parse the query words and the search condition
  – Show rank-ordered results
  – Show analyzed information about a component: components used by or using it, metrics
54
Analyzed information
- The information shown for a component is as follows
  – Metrics: the number of methods and variables, LOC, cyclomatic complexity, etc. (metrics measurable on the component itself)
  – Components used by or using the component: lists of the nodes reached by following use relations
  – Components similar to the component: lists of similar components
55
Package browsing
- The naming structure of Java packages is hierarchical
- A user can easily list the components in the same package as a given component
57
Experiment (1/2)
- Comparison with Google
  – Register about 130,000 components obtained from the Internet
  – Query words: 'calculator applet' and 'chat server client'
  – Calculate the relevance ratio of the top 10 results
    » Relevant: the component is reusable source code
- Google is a web search engine, so we
  – add the term 'java source' to the query words
  – follow one link from each result web page
58
Experiment (2/2)
- Example 1: "calculator applet": SPARS-J returns 9 hits, 7 of them suited components
- Example 2: "chat server client": SPARS-J returns 69 hits, 57 of them suited components
- With SPARS-J, suited components are ranked high

Relevance (○ relevant, × not relevant) and relevance ratio by rank:

        Example 1                 Example 2
Rank    SPARS-J     Google        SPARS-J     Google
1       ○ 1         ○ 1           ○ 1         × 0
2       ○ 1         × 0.5         ○ 1         × 0
3       ○ 1         ○ 0.67        ○ 1         × 0
4       ○ 1         × 0.5         ○ 1         × 0
5       ○ 1         ○ 0.6         ○ 1         × 0
6       × 0.83      ○ 0.67        ○ 1         × 0
7       ○ 0.86      × 0.57        ○ 1         ○ 0.14
8       × 0.75      ○ 0.63        ○ 1         × 0.13
9       ○ 0.78      × 0.56        ○ 1         ○ 0.22
10      --          × 0.5         ○ 1         ○ 0.3
59
Conclusion and Future work
- We developed the component search engine SPARS-J
  – With SPARS-J, frequently used components can be retrieved easily.
- Future work
  – Morphological analysis of index keywords
  – Collaborative filtering
  – Investigating the best ranking method: the weight values, the aggregation of ranks
  – Evaluation of SPARS-J: usability
60
End
61
Component graph
[Figure: component graph with nodes A to I belonging to System X and System Y; nodes are components and edges are use relations]
62
Weight of nodes
[Figure: the same graph annotated with node weights such as 0.1, 0.2 and 0.05]
- Sum of all node weights = 1 ... (1)
- The weight of a node represents its significance
63
Weights of edges
[Figure: nodes A and B with node and edge weights (0.2, 0.05, 0.15, 0.4); distribution ratio d = 1/4]
- d: distribution ratio
- A node's weight is distributed to its outgoing edges
- Edge weights are collected at the destination node
- Sum of all outgoing edge weights = origin node weight ... (2)
- Sum of all incoming edge weights = destination node weight ... (3)
64
Definition of weights
- Under constraints (1) to (3), we have the simultaneous equation $W = D^{t} W$
  – $D^{t}$: transposed matrix of distribution ratios
  – $W$: node weight vector
- This simultaneous equation can be solved by propagating node weights through the edges of the graph
65
Pseudo use relation
[Figure: nodes A, B, C]
- Weight computation does not always converge
- Add a pseudo edge from a node to another node if there is no 'real' edge between them
- Distribution ratios: pseudo edges << real edges
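One way to build distribution ratios with pseudo edges is sketched below. The slides only say that pseudo edges receive much smaller ratios than real edges, so the concrete split is an assumption: each node gives a fixed share `realShare` of its weight to its real edges and the remainder to its pseudo edges, with equal division inside each group. The class name, the treatment of self-edges (ignored) and the chain A → B → C are likewise illustrative, and the sketch expects at least two nodes.

```java
class PseudoEdges {
    /**
     * realEdge[i][j] is true when component i really uses component j.
     * Returns a matrix whose rows each sum to 1 (for two or more nodes): real edges share
     * realShare of a node's weight, pseudo edges to all remaining nodes share the rest.
     */
    static double[][] distribution(boolean[][] realEdge, double realShare) {
        int n = realEdge.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            int real = 0;
            for (int j = 0; j < n; j++) {
                if (j != i && realEdge[i][j]) real++;
            }
            int pseudo = (n - 1) - real;
            for (int j = 0; j < n; j++) {
                if (j == i) continue; // assumption: self-edges are ignored
                if (realEdge[i][j]) {
                    // If every other node is a real target, real edges take the whole weight.
                    d[i][j] = (pseudo == 0 ? 1.0 : realShare) / real;
                } else {
                    // If the node has no real edges, pseudo edges take the whole weight.
                    d[i][j] = (real == 0 ? 1.0 : 1.0 - realShare) / pseudo;
                }
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // Chain A -> B -> C; C uses nothing, so all of its outgoing edges are pseudo edges.
        boolean[][] realEdge = {
            {false, true, false},
            {false, false, true},
            {false, false, false}};
        for (double[] row : distribution(realEdge, 0.9)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

With pseudo edges every node can reach every other node, so the weights no longer drain away at components that use nothing, which is the situation that can keep the plain propagation from settling.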
66
Markov model
- The Component Rank model can be regarded as a Markov chain of the user's focus
  – The user's focus moves from one component to another along a use relation at fixed time intervals
  – A node's weight represents the probability that the user's focus is at that node in the infinite future
[Figure: graph annotated with probabilities such as 0.01, 0.02, 0.03, 0.05, 0.001 and 0.1]
67
Related Works
- Markov models of document traversal
  – Influence Weight: impact factor of journal publications through incoming references
  – PageRank: weight of HTML pages on the Internet through incoming web links
  – Explicit use relations
  – No clustering (important for software products)
- Measurement of reusability of components or interfaces
  – Uses various characteristic metrics
  – An indirect indicator of reusability
  – Our approach directly reflects the usage of components
68
1. Ranking based on use relations
- Component Rank (CR) method
  – Evaluates each component's record of use from the use relations and ranks accordingly
    » A component used by many components is important
    » A component used by important components is also important
  – Components that are used often, or used in important places, receive a high score
- General-purpose components with many usage examples come out on top
69
2. Ranking based on keyword frequency
- Keyword Rank (KR) method
  – A modification of the TF-IDF method used in document retrieval systems
  – Gives suitable weights to index keys that characterize a component
    » Index keys that appear repeatedly in the component
    » Rare index keys that appear mostly in particular components
    » Index keys whose token kind symbolizes the component, such as the class definition name
  – Ranks by the sum of the weights as the score
- Components that match the query come out on top
70
3. Ranking that integrates CR and KR
- Viewpoint of each ranking
  1. Number of usage examples and generality of a component: CR method
  2. Fit between the query and the component's content: KR method
- Integrate the ranks so that components strong in both respects appear at the top of the search results
  – Borda's method
  – A fast method is desirable because the integration runs in the component retrieval part
71
Iterative calculation based on the component-group graph
Procedure
1. Assign suitable initial weights to the vertices
   – The weights sum to 1
2. Compute the weight of each directed edge
   – Each vertex's weight is distributed among its outgoing edges
3. Recompute the weight of each vertex
   – The sum of the weights of a vertex's incoming edges is redefined as its weight
4. Repeat steps 2 and 3 until the vertex weights converge
5. Take the converged weight of each vertex as the CR value of the corresponding component group
   – A component's score is the CR value of the group it belongs to
[Figure: calculation of CR values for component groups C1, C2, C3 with edge shares v1 × 50%, v2 × 100%, v3 × 100%. Weights (C1, C2, C3): 0.334, 0.333, 0.333; then 0.333, 0.167, 0.500; then 0.500, 0.1665, 0.3335; finally 0.400, 0.200, 0.400]
72
Purpose of the research
- Through efficient search of software components:
  – Support reuse of existing components in the development of other systems
  – Support program understanding and maintenance by grasping the relations among components
- Evaluate SPARS-J experimentally and verify the effectiveness of the system
73
Problems in evaluation
- Evaluating the ranking performance of a search system
  – Easy if a test collection for evaluation is available
  – No test collection exists for software component search systems
    » Differences in language, component granularity, etc. make building one very difficult
- We apply our own evaluation experiments
74
Contents of the experiments
1. Comparison with other search systems
   – By comparing with general search systems, verify SPARS-J's effectiveness as a software component search system
2. Comparison of the ranking methods of SPARS-J
   – Compare the ranking methods and investigate which is the most appropriate
3. Application of SPARS-J in an actual development environment
   – By having it used in practice at a company, confirm that it is useful for managing and understanding software
75
Experiment 1: Comparison with other search systems
- Purpose
  – By comparing with general search systems, verify SPARS-J's effectiveness as a software component search system
- Systems compared
  – Google: a web page search system used for searches of many kinds
  – Namazu: a highly reliable Japanese full-text search system
- Evaluation measure
  – Precision: the proportion of relevant components among the search results
    » The higher the proportion, the more usable components the search results contain
76
Scope of precision
- Computing precision over all search results is not meaningful
  – If the number of relevant components is the same, the score is the same whether they rank high or low
    » Precision should be computed for the top-ranked components
  – In web page search, when users do not find a relevant document on the first page (10 results), they tend to change the search keywords rather than look at the second page †
- We compute precision over the top 10 components of the search results
† Amanda Spink, B. J. Jansen, D. Wolfram, T. Saracevic: "From E-Sex to E-Commerce: Web Search Changes", IEEE Computer, Vol. 35, No. 3, pp. 107-109, Mar. 2002.
77
Experiment 1: Comparison with other search systems
- Preparation
  – Database
    » For SPARS-J and Namazu: built from the JDK and source code published on the Web (about 140,000 source files)
    » Google: the public search engine is used as it is
  – Search keywords
    » 10 queries that assume the development of a simple system
- Procedure
  – Compute precision over the top 10 components in each search system's results
78
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 78 実験1の結果 keyword SPARS- J GoogleNamazu A 10.70.9 B 10.40.6 C 0.50.30.4 D 0.80.30.6 E 0.40.10.3 F 0.200.1 G 10.30.4 H 10.10.2 I 0.70.4 J 0.70.40.7 Ave.0.730.30.46 各検索システムの適合率の比 較
79
Evaluation of Experiment 1
Paired test of the difference between means
  The following significant differences were found at the 5% significance level.
    Precision: SPARS-J ≫ Namazu ≫ Google
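As a rough check of this evaluation, the sketch below runs a paired t-test on the per-keyword precision values from the results slide. The slides only say "paired test of the difference between means", so the choice of a two-sided paired t-test is an assumption, as is the function name `paired_t`. The Google and Namazu values for keyword I are taken as 0.4 each, which is what the reported averages imply.

```python
import math
import statistics

# Precision per keyword A..J, from the Experiment 1 results slide.
spars_j = [1.0, 1.0, 0.5, 0.8, 0.4, 0.2, 1.0, 1.0, 0.7, 0.7]
google = [0.7, 0.4, 0.3, 0.3, 0.1, 0.0, 0.3, 0.1, 0.4, 0.4]
namazu = [0.9, 0.6, 0.4, 0.6, 0.3, 0.1, 0.4, 0.2, 0.4, 0.7]

def paired_t(xs, ys):
    """t statistic of a paired t-test: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
T_CRIT = 2.262

t_sn = paired_t(spars_j, namazu)   # SPARS-J vs. Namazu
t_ng = paired_t(namazu, google)    # Namazu vs. Google
print(t_sn > T_CRIT, t_ng > T_CRIT)
```

Under these assumptions both statistics exceed the critical value, which is consistent with the ordering reported on the slide.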
80
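Discussion of Experiment 1
Google
  As a Web page search system, it also covers many pages unrelated to Java software components, so it could not narrow down the results.
  The difference from SPARS-J is attributed to the difference in database.
Namazu
  It uses the same database as SPARS-J.
  Compared with Google, it was able to narrow down the results.
  As a Japanese full-text search system, it does not parse source code and does not take the characteristics of Java source code into account.
  The difference from SPARS-J is attributed to the difference in ranking performance.
SPARS-J is more useful than the other search systems for searching software components.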
81
Experiment 2. Comparison of the ranking methods in SPARS-J
Purpose
  Compare the ranking methods and determine which one is the most appropriate.
  SPARS-J implements ranking with three evaluation methods:
    1. Ranking based on use relations
    2. Ranking based on keyword frequency
    3. Ranking that integrates methods 1 and 2
Evaluation measures
82
1. Ranking based on use relations
Component Rank (CR) method
  Evaluates the importance of each component from use relations and ranks components accordingly.
    A component used by many components is important.
    A component used by important components is also important.
  Components that are used often, or used in important places, receive high scores.
    General-purpose components with many usage examples rank high.
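The two principles above resemble link-analysis ranking. The following is a minimal sketch, not the SPARS-J implementation: it assumes a PageRank-style power iteration with a damping factor of 0.85, and the function name `component_rank`, the damping value, the iteration count, and the example components are all hypothetical.

```python
def component_rank(uses, damping=0.85, iterations=50):
    """uses maps each component to the list of components it uses."""
    nodes = sorted(set(uses) | {v for vs in uses.values() for v in vs})
    n = len(nodes)
    rank = {c: 1.0 / n for c in nodes}
    for _ in range(iterations):
        new = {c: (1.0 - damping) / n for c in nodes}
        for c in nodes:
            targets = uses.get(c, [])
            if targets:
                # A component passes its weight to the components it uses.
                share = damping * rank[c] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A component that uses nothing spreads its weight evenly.
                for t in nodes:
                    new[t] += damping * rank[c] / n
        rank = new
    return rank

# Hypothetical use relations: two applications, a parser, a utility class.
uses = {
    "App1": ["Util", "Parser"],
    "App2": ["Util"],
    "Parser": ["Util"],
}
ranks = component_rank(uses)
ordering = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph `Util` is used by every other component and so receives the highest score, matching the idea that general-purpose components rank high.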
83
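2. Ranking based on keyword frequency
Keyword Rank (KR) method
  A modification of the TF-IDF method used in document retrieval systems
    Assigns suitable weights to the index keys that characterize a component:
      Index keys that appear repeatedly in the component
      Index keys that are rare and concentrated in particular components
      Index keys of token kinds that represent the component, such as class definition names
  Components are ranked by the sum of the weights.
    Components that match the query rank high.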
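The three weighting ideas above can be sketched with a plain TF-IDF score multiplied by a per-token-kind weight. This is only an illustration under stated assumptions: the slides do not give the actual KR formula, and the weights in `KIND_WEIGHT`, the function name `keyword_rank`, the smoothed IDF form, and the example components are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical weights per token kind; the slides do not give actual values.
KIND_WEIGHT = {"class_name": 3.0, "method_name": 2.0, "identifier": 1.0}

def keyword_rank(components, query):
    """components: {name: [(token, kind), ...]}; query: list of keywords."""
    n = len(components)
    # Document frequency: number of components that contain each token.
    df = Counter()
    for tokens in components.values():
        df.update({tok for tok, _ in tokens})
    scores = {}
    for name, tokens in components.items():
        score = 0.0
        for tok, kind in tokens:
            if tok in query:
                # Rare tokens get a larger IDF; each occurrence adds to the
                # score, so repeated tokens count more (the TF part).
                idf = math.log(n / df[tok]) + 1.0
                score += KIND_WEIGHT.get(kind, 1.0) * idf
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

components = {
    "Stack": [("stack", "class_name"), ("push", "method_name"),
              ("pop", "method_name"), ("stack", "identifier")],
    "Parser": [("parser", "class_name"), ("stack", "identifier"),
               ("parse", "method_name")],
    "Logger": [("logger", "class_name"), ("log", "method_name")],
}
result = keyword_rank(components, ["stack"])
```

Here `Stack` outranks `Parser` for the query "stack" because the keyword occurs twice in it, once as the class name.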
84
3. Ranking that integrates CR and KR
Viewpoint of each ranking
  1. Number of usage examples and generality of the component: CR method
  2. Fit between the query and the component content: KR method
The two rankings are integrated so that components that do well on both appear at the top of the search results.
  Because the integration is done in the component search part, a fast method is desirable.
85
3. Ranking that integrates CR and KR
Borda's method
  Assign points to each rank and sort in ascending order of the total points.
  Example: component set = { A, B, C, D, E }

  CR and KR rankings
  rank   CR   KR
  1st    A    D
  2nd    E    C
  3rd    C    A
  4th    B    B
  5th    D    E

  Points for each component
  component   CR   KR   total
  A           1    3    4
  B           4    4    8
  C           3    2    5
  D           5    1    6
  E           2    5    7

  Integrated ranking
  1st A, 2nd C, 3rd D, 4th E, 5th B

Hereafter, the ranking that integrates CR and KR is called CR+KR.
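The worked example on this slide can be reproduced with a few lines of code. This is a minimal sketch: the function name `borda_merge` is hypothetical, every ranking is assumed to contain the same components, and ties are broken by first appearance, which the slides do not specify.

```python
def borda_merge(*rankings):
    """Each ranking lists components from best to worst."""
    points = {}
    for ranking in rankings:
        for position, component in enumerate(ranking, start=1):
            # The points for a component are its rank positions added up.
            points[component] = points.get(component, 0) + position
    # A lower total is better; ties keep first-seen order (an assumption).
    merged = sorted(points, key=points.get)
    return merged, points

cr = ["A", "E", "C", "B", "D"]   # CR ranking from the slide
kr = ["D", "C", "A", "B", "E"]   # KR ranking from the slide
merged, points = borda_merge(cr, kr)
print(merged)   # ['A', 'C', 'D', 'E', 'B']
```

Only additions and one sort are needed at query time, which fits the requirement that the integration be fast.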
86
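Experiment 2. Comparison of the ranking methods in SPARS-J
Purpose
  Compare the ranking methods and determine which one is the most appropriate.
  SPARS-J implements ranking with three evaluation methods:
    1. Ranking based on use relations (CR)
    2. Ranking based on keyword frequency (KR)
    3. Ranking that integrates methods 1 and 2 (CR+KR)
Evaluation measures
  Precision: the proportion of relevant components among the search results
    The higher the proportion, the more usable components the results contain.
  ndpm: the difference between the user's ranking and the system's ranking
    The smaller the value, the closer the ranking is to the ideal.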
87
How ndpm is computed
ndpm is the proportion, among all pairs (d, d') in the set of retrieved documents, of the pairs that the system and the user rank in a different order; m denotes the number of such pairs.
With n elements in the document set, there are n(n-1)/2 pairs in total.
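The formula and worked example from this slide did not survive in the transcript, so the sketch below follows only the verbal definition: the number m of pairs ordered differently by the system and the user, divided by the total number of pairs. It assumes both rankings are strict orderings of the same documents (ties are not handled), and the function name `ndpm` and the example data are hypothetical.

```python
from itertools import combinations

def ndpm(system_ranking, user_ranking):
    """Share of document pairs ordered differently by system and user."""
    sys_pos = {d: i for i, d in enumerate(system_ranking)}
    usr_pos = {d: i for i, d in enumerate(user_ranking)}
    pairs = list(combinations(system_ranking, 2))   # n(n-1)/2 pairs
    if not pairs:
        return 0.0
    m = sum(1 for a, b in pairs
            if (sys_pos[a] < sys_pos[b]) != (usr_pos[a] < usr_pos[b]))
    return m / len(pairs)

user = ["A", "B", "C", "D"]
one_swap = ndpm(["A", "C", "B", "D"], user)   # one of six pairs differs
identical = ndpm(user, user)                  # perfect agreement
reversed_order = ndpm(user[::-1], user)       # every pair differs
```

A value of 0 means the system reproduces the user's ideal ordering, and a value of 1 means the ordering is completely reversed.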
88
Experiment 2. Comparison of the ranking methods in SPARS-J
Setup
  Database (same as Experiment 1)
    Built from about 140,000 source code files
  Search keywords (same as Experiment 1)
    10 queries
Procedure
  Compute precision over the top 10 components returned by each ranking method.
  Compute ndpm between each ranking method's ordering of all components in the search results and the ideal ordering expected by the user.
89
Results of Experiment 2

          Precision               ndpm
keyword   CR     KR     CR+KR     CR      KR      CR+KR
A         1.0    1.0    1.0       0.036   0.048   0.037
B         1.0    1.0    1.0       0.194   0.261   0.221
C         0.5    0.5    0.5       0.133   0.117   0.092
D         0.4    0.9    0.8       0.123   0.200   0.189
E         0.4    0.4    0.4       0.208   0.192   0.194
F         0.2    0.2    0.2       0.184   0.184   0.160
G         0.9    1.0    1.0       0.081   0.103   0.080
H         1.0    0.8    1.0       0.047   0.109   0.052
I         0.6    0.7    0.7       0.210   0.324   0.267
J         0.5    0.7    0.7       0.219   0.243   0.114
Ave.      0.65   0.72   0.73      0.143   0.178   0.141
90
Evaluation of Experiment 2
Paired test of the difference between means
  The following significant differences were found at the 5% significance level (≫ means "significantly better than").
    Precision: KR, CR+KR ≫ CR
    ndpm (lower is better): CR, CR+KR ≫ KR
91
Discussion of Experiment 2
CR method
  Overall, components tend to be ordered as the user expected.
KR method
  Relevant components tend to appear near the top.
CR+KR method
  It produced good rankings in terms of both precision and ndpm.
    Components ranked high by both CR and KR are thought to end up at the top.
92
Experiment 3. Application to in-company software
Purpose
  Confirm that SPARS-J is useful for managing and understanding software by having it used in a company.
Content
  Qualitative evaluation of SPARS-J
    A questionnaire about SPARS-J was given to 7 employees who develop and maintain software in the company.
93
Questionnaire results
Respondents: A to G. Scale: 5 (good), 4, 3, 2, 1 (poor).

item                                          ratings given   mode
Use of the package browser                    4 5 5 5 4 3 3   5
Viewing classes in the same group             4 5 5 2 4 3     5, 4
Viewing classes that use the found class      5 5 5 5 5 5     5
Viewing classes used by the found class       5 1 5 5 5 5     5
Class metric values                           1 4 1 2 4 5     4, 1
Source code download function                 1 3 5 5 2 5     5
Reduction of time cost                        3 5 5 3 4 1     5, 3
Improvement of software quality               5 3 3 3 4 1     3
Understanding of in-company software          3 1 5 3 5 2 1   5, 3, 1
Readability of the search result list         4 4 5 5 3 3 5   5
Readability of highlighting                   3 5 5 5 5 5 5   5
94
Evaluation of the questionnaire
Points rated especially highly
  Use of the package browser
  Viewing classes in the same group
  Viewing classes that use, or are used by, the found class
  Readability of the search result list and of highlighting
What SPARS-J makes possible
  Identifying the applications that use a component
  Investigating the range of impact when a component is revised
95
Discussion of Experiment 3
The results indicate that SPARS-J makes software components easier to understand, which in turn supports maintenance work.
Other comments
  Search is fast and causes no frustration.
    This is because index terms are extracted in advance.
  Nothing needs to be installed on the client, and software components can be shared.
    This is because a shared database is built.
SPARS-J is useful for searching software components and is also very helpful for understanding and managing them.
96
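Summary and future work
Experimental evaluation of the software component search system SPARS-J
  Comparison with other search systems
    SPARS-J performed better than Google and Namazu.
  Comparison of the ranking methods in SPARS-J
    CR and KR each have their own characteristics, and integrating them improved performance.
  Application of SPARS-J to in-company software
    It was very effective for maintenance work.
Future work
  Investigating recall
  Quantitative evaluation from viewpoints other than ranking performance
  Support for other kinds of software components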
97
END The End