Overview of Component Search System SPARS-J
Tetsuo Yamamoto*, Makoto Matsushita**, Katsuro Inoue**
*Japan Science and Technology Agency, **Osaka University

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline
・Motivation and research aim
・SPARS-J outline
 – System architecture
 – Ranking method
・Each part
 – Analysis part
 – Retrieval part
 – User interface
・Experiment
・Conclusion and future work

Motivation
・Reuse of software components is a technique for developing new software by using components developed in the past.
 – Examples of reusable components: source code, documents, …
 – Reuse improves productivity and quality and, as a result, cuts development cost.
・However, component reuse is not exploited effectively:
 – Developers do not know that suitable components exist.
 – Although there are a lot of components, they are not organized.
・To take advantage of reuse, components must be managed so that suitable ones can be found easily.

Research aim
We have built a system with the following functions:
・Collects software components eagerly, without preserving their inherent structures
・Manages the component information automatically
・Provides components suited to the user's request
Targets:
・Intranet: closed software development inside a company
・Internet: large open-source software development web sites (SourceForge, the Jakarta Project, etc.)

SPARS-J (Software Product Archive, analysis and Retrieval System for Java)
A Java software product archiving, analyzing and retrieving system:
・Many components are analyzed automatically.
・A search engine is built on the analysis information.
・Component: the source code of a class or interface
Features:
・Keyword search
・Two ranking methods
 – Frequency of use of a word
 – Use relations
・Analyzed information
 – Components using/used by a component
 – Package hierarchy

Structure of SPARS-J
・Component analysis part
 – extracts components from files
 – stores the analyzed information in the database
 – clusters and ranks components using the database
・Database
 – stores the analyzed information and the components
・Component retrieval part
 – searches the database for components matching the query
 – ranks components by the frequency of use of keywords
 – aggregates the two rankings
・User interface part
 – delivers the user's query to the component retrieval part
 – shows the search results
Figure: library (Java source files) → component analysis part → database (component information); user → user interface part (query) → component retrieval part → hit components → user interface part → user (results).

Ranking search results
Ranking methods:
1. Components suited to the user's request: ranking based on the frequency of use of words (Keyword Rank, KR)
2. Components that are used frequently: ranking based on component use relations (Component Rank, CR)
Components that rank high in both 1 and 2 are ranked high: the search results are shown in the aggregate of the two ranks.

Component analysis part
Extracts components and their information from Java source files. The process:
1. Extract a component
2. Index the component
3. Extract use relations
4. Cluster similar components
5. Rank components based on use relations (CR method)

Extract and index a component
・Extracting a component
 – Find each class or interface block in a Java source file.
 – Record its location in the file (start line number, end line number).
・Indexing
 – Extract index keys from the component. An index key is a word together with its kind.
 – Reserved words are not extracted.
 – Count the frequency of use of each word.
Example: index keys of the class Sort

Word | Kind
Sort | Class name
quicksort | Comment
quicksort | Method name
pivot | Variable name
quicksort | Method call

    public final class Sort {
        /* quicksort */
        private static void quicksort(…) {
            int pivot;
            …
            quicksort(…);
        }
        …
    }
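The indexing step can be sketched in Java as below. This is a minimal illustration, not the SPARS-J implementation: it assumes a parser has already turned the component into (word, kind) pairs, and the `Kind` constants, the partial reserved-word list and the class name `IndexKeySketch` are hypothetical. It counts how often each index key (word plus kind) occurs in one component and skips reserved words.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class IndexKeySketch {

    // Word kinds from the Sort example; the constant names are illustrative
    enum Kind { CLASS_NAME, COMMENT, METHOD_NAME, VARIABLE_NAME, METHOD_CALL }

    // A small, incomplete sample of Java reserved words, which are never indexed
    static final Set<String> RESERVED =
        Set.of("public", "final", "class", "private", "static", "void", "int");

    // Counts how often each index key (word, kind) occurs in one component.
    // Assumes occurrences[k] = {word, kindName} was produced by an earlier parsing step.
    static Map<Kind, Map<String, Integer>> index(String[][] occurrences) {
        Map<Kind, Map<String, Integer>> freq = new EnumMap<>(Kind.class);
        for (String[] occ : occurrences) {
            String word = occ[0];
            if (RESERVED.contains(word)) {
                continue; // reserved words are not extracted as index keys
            }
            Kind kind = Kind.valueOf(occ[1]);
            freq.computeIfAbsent(kind, k -> new TreeMap<>()).merge(word, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Occurrences found in the Sort example
        String[][] sortClass = {
            {"Sort", "CLASS_NAME"},
            {"quicksort", "COMMENT"},
            {"quicksort", "METHOD_NAME"},
            {"pivot", "VARIABLE_NAME"},
            {"quicksort", "METHOD_CALL"},
        };
        System.out.println(index(sortClass));
    }
}
```

Keeping the kind in the key means "quicksort" as a method name and "quicksort" in a comment are counted separately, which is what allows the later KR step to weight them differently.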

Extract use relations
・Use relations among components are extracted by semantic analysis.
・A component graph is built from the use relations:
 – Node: component
 – Edge: use relation
・Kinds of use relation: inheritance, interface implementation, variable type, instance creation, field access, method call
Example: in the code below, Test uses Data (inheritance, field access) and Sort (method call), so the component graph has the edges Test → Data and Test → Sort.

    public class Test extends Data {
        …
        public void main(…) { // instance method: super cannot be used in a static context
            …
            Sort.quicksort(super.array);
            …
        }
    }
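A component graph with labelled use relations could be stored as in the following sketch. The class `UseGraphSketch`, its methods, and the choice to keep one edge per ordered pair of components, labelled with every kind of use found between them, are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class UseGraphSketch {

    // The six kinds of use relation listed on the slide
    enum UseKind {
        INHERITANCE, INTERFACE_IMPLEMENTATION, VARIABLE_TYPE,
        INSTANCE_CREATION, FIELD_ACCESS, METHOD_CALL
    }

    // from -> (to -> kinds of use): one edge per ordered pair, labelled with all kinds found
    final Map<String, Map<String, Set<UseKind>>> edges = new TreeMap<>();

    void addUse(String from, String to, UseKind kind) {
        edges.computeIfAbsent(from, k -> new TreeMap<>())
             .computeIfAbsent(to, k -> EnumSet.noneOf(UseKind.class))
             .add(kind);
    }

    // Number of distinct components that use the given component (incoming edges)
    int usedByCount(String component) {
        int count = 0;
        for (Map<String, Set<UseKind>> targets : edges.values()) {
            if (targets.containsKey(component)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        UseGraphSketch g = new UseGraphSketch();
        // Relations found in the Test example
        g.addUse("Test", "Data", UseKind.INHERITANCE);  // extends Data
        g.addUse("Test", "Data", UseKind.FIELD_ACCESS); // super.array
        g.addUse("Test", "Sort", UseKind.METHOD_CALL);  // Sort.quicksort(...)
        System.out.println(g.edges);
    }
}
```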

Similar components
・A similar component is a copied component or a slightly modified one.
・We merge similar components into a single component.
・The merged component has all the use relations that the components had before merging.
Figure: a component graph with nodes A to G, and the clustered component graph in which the similar components C and G are merged into a single node CG.

Clustering components
To decide which components to merge, we measure characteristic metrics and compute the difference ratio of each metric between components:
・Complexity metrics
 – the number of methods, cyclomatic complexity, etc.
 – represent structural characteristics
・Token composition
 – the number of appearances of each token
 – represents surface characteristics
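The slide does not define the difference ratio, so the sketch below assumes one plausible definition, |a − b| / max(a, b) for each complexity metric, and for token composition the summed per-token differences divided by the summed per-token maxima. The threshold of 0.1 and all names (`SimilaritySketch`, `differenceRatio`, `similarMetrics`, `tokenDifferenceRatio`) are illustrative, not SPARS-J's.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SimilaritySketch {

    // Hypothetical difference ratio of one metric: |a - b| / max(|a|, |b|), 0 when both are 0
    static double differenceRatio(double a, double b) {
        double max = Math.max(Math.abs(a), Math.abs(b));
        return max == 0 ? 0.0 : Math.abs(a - b) / max;
    }

    // Structural check: every complexity metric must differ by at most the threshold
    static boolean similarMetrics(double[] m1, double[] m2, double threshold) {
        for (int i = 0; i < m1.length; i++) {
            if (differenceRatio(m1[i], m2[i]) > threshold) {
                return false;
            }
        }
        return true;
    }

    // Surface check: total difference in token counts relative to the larger counts
    static double tokenDifferenceRatio(Map<String, Integer> t1, Map<String, Integer> t2) {
        Set<String> all = new HashSet<>(t1.keySet());
        all.addAll(t2.keySet());
        double diff = 0;
        double total = 0;
        for (String token : all) {
            int x = t1.getOrDefault(token, 0);
            int y = t2.getOrDefault(token, 0);
            diff += Math.abs(x - y);
            total += Math.max(x, y);
        }
        return total == 0 ? 0.0 : diff / total;
    }

    public static void main(String[] args) {
        double[] a = {5, 12}; // {number of methods, cyclomatic complexity}
        double[] b = {5, 13};
        double[] c = {5, 20};
        System.out.println(similarMetrics(a, b, 0.1)); // true: 1/13 is about 0.077
        System.out.println(similarMetrics(a, c, 0.1)); // false: 8/20 = 0.4
        System.out.println(tokenDifferenceRatio(Map.of("if", 3, "for", 2),
                                                Map.of("if", 3, "for", 2, "return", 1)));
    }
}
```

Components judged similar by both checks would then be merged into one node that inherits all their use relations.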

Ranking based on use relations: Component Rank (CR)
・Reusable components have many use relations:
 – there are many examples of their use;
 – they are general-purpose components;
 – they are sophisticated components.
・We measure use relations quantitatively and rank components accordingly:
 – a component used by many components is important;
 – a component used by an important component is also important.
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.

Propagating weights (figure: a graph of nodes A, B and C)
1. Ad hoc weights are assigned to each node.
2. The node weights are redefined by the incoming edge weights.
3. We get new node weights.
4. Repeating this, we get a stable weight assignment: the next-step weights are the same as the previous ones.
Component Rank: the order of the nodes sorted by weight.
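The propagation loop can be sketched as a power iteration over a matrix of distribution ratios. This is an illustrative sketch, not the authors' code: the three-node graph (A uses B and C, B uses C, C uses A), the even split of a node's weight over its outgoing edges, the tolerance, and the names `ComponentRankSketch` and `propagate` are all assumptions.

```java
import java.util.Arrays;

public class ComponentRankSketch {

    // dist[i][j]: share of node i's weight sent along the edge i -> j (each row sums to 1).
    // Returns the stable node weights, whose sorted order is the Component Rank.
    static double[] propagate(double[][] dist, int maxIter, double tol) {
        int n = dist.length;
        double[] w = new double[n];
        Arrays.fill(w, 1.0 / n); // arbitrary initial weights that sum to 1
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    // edge weight = source weight * distribution ratio,
                    // collected at the destination node j
                    next[j] += w[i] * dist[i][j];
                }
            }
            double change = 0;
            for (int k = 0; k < n; k++) {
                change = Math.max(change, Math.abs(next[k] - w[k]));
            }
            w = next;
            if (change < tol) {
                break; // next-step weights are (numerically) the same as the previous ones
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Hypothetical graph: A uses B and C, B uses C, C uses A (indices 0, 1, 2)
        double[][] dist = {
            {0.0, 0.5, 0.5},
            {0.0, 0.0, 1.0},
            {1.0, 0.0, 0.0},
        };
        System.out.println(Arrays.toString(propagate(dist, 1000, 1e-12)));
    }
}
```

For this graph the weights settle at about 0.4, 0.2 and 0.4 for A, B and C, which satisfy w_B = w_A / 2 and w_C = w_A / 2 + w_B. Because the example graph contains cycles of length 2 and 3, the iteration converges; a graph whose cycles all share one length can oscillate instead, which is the problem the pseudo use relations address.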

Component retrieval part
Searches components in the database and ranks them. The process:
1. Search components
2. Rank them by suitability to the user's request (KR)
3. Aggregate the two ranks (CR and KR)

Search components
・Search query
 – words input by the user
 – optionally, the kind of index word and the package name
・Components containing the given query are retrieved from the database.

Ranking suited to the user's request: Keyword Rank (KR)
・Components containing the words given by the user are retrieved.
・Components are ranked by a value computed from index-word weights. A word gets a high weight when:
 – it is used frequently in the component;
 – it appears in only a few particular components;
 – it represents the component's function, as a class name does.
・Components are sorted by the sum of the weights of all given words (TF-IDF weighting, using a full-text search engine).

Calculation of KR value
The weight $W_{ct}$ of word $t$ in component $c$ is

$W_{ct} = IDF \cdot \sum_i kw_i \, TF_i$

where
・$TF_i$: the frequency with which word $t$ occurs as kind $i$ in component $c$
・$IDF$: the total number of components divided by the number of components containing word $t$
・$kw_i$: the weight of kind $i$ (table below)
The KR value of a component is the sum of $W_{ct}$ over all query words $t$.

Kind of word | Weight
Class name | 200
Interface name | 50
Method name | 200
Package name | 50
Import | 30
Method call | 10
Field access | 10
Variable type | 10
Instance creation | 10
Local var access | 1
Comment | 30
Doc comment | 50
Line comment | 10
String | 1
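The KR calculation can be written out directly. A minimal sketch, assuming IDF is the plain ratio stated on the slide (many TF-IDF variants take a logarithm instead); the corpus size N = 1000 and the document frequency 8 in `main` are made-up numbers, and `KeywordRankSketch`, `wordWeight` and `keywordRank` are hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;

public class KeywordRankSketch {

    // Kind weights kw_i copied from the weight table
    enum Kind {
        CLASS_NAME(200), INTERFACE_NAME(50), METHOD_NAME(200), PACKAGE_NAME(50),
        IMPORT(30), METHOD_CALL(10), FIELD_ACCESS(10), VARIABLE_TYPE(10),
        INSTANCE_CREATION(10), LOCAL_VAR_ACCESS(1), COMMENT(30), DOC_COMMENT(50),
        LINE_COMMENT(10), STRING(1);

        final int weight;

        Kind(int weight) {
            this.weight = weight;
        }
    }

    // W_ct = IDF * sum_i kw_i * TF_i, with IDF = N / n_t as stated on the slide (no logarithm)
    static double wordWeight(Map<Kind, Integer> tfByKind, int totalComponents, int componentsWithWord) {
        double idf = (double) totalComponents / componentsWithWord;
        double weightedTf = 0;
        for (Map.Entry<Kind, Integer> e : tfByKind.entrySet()) {
            weightedTf += e.getKey().weight * e.getValue();
        }
        return idf * weightedTf;
    }

    // KR value of a component: the sum of W_ct over all query words t
    static double keywordRank(double... wordWeights) {
        double sum = 0;
        for (double w : wordWeights) {
            sum += w;
        }
        return sum;
    }

    public static void main(String[] args) {
        // "quicksort" in the Sort class: once as comment, method name and method call
        Map<Kind, Integer> quicksortInSort = new EnumMap<>(Kind.class);
        quicksortInSort.put(Kind.COMMENT, 1);
        quicksortInSort.put(Kind.METHOD_NAME, 1);
        quicksortInSort.put(Kind.METHOD_CALL, 1);
        double w = wordWeight(quicksortInSort, 1000, 8); // (30 + 200 + 10) * (1000 / 8)
        System.out.println(w); // 30000.0
    }
}
```

The large class-name and method-name weights mean a word that names a class or method dominates the score, matching the slide's point that such words represent the component's function.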

Aggregate two ranks
・The two ranks KR and CR are aggregated.
・Aggregation method: the Borda Count, a well-known voting system
 – used for single- or multiple-seat elections
 – extremely popular for determining awards
・SPARS-J ranks components by both KR and CR: using the two together, components that suit the user's request and are also reusable and sophisticated are ranked high.

Borda Count method
・Example: 10 voters and 5 candidates (A to E).
・Each voter ranks the candidates: last place earns 1 point, second from last 2 points, …, and first place N points (here 1st = 5 points, 2nd = 4 points, …).

Voters | 1st | 2nd | 3rd | 4th | 5th
3 | A | B | C | D | E
3 | E | B | C | D | A
2 | C | B | A | E | D
2 | C | D | B | A | E

Totals: A = 28, B = 38, C = 38, D = 22, E = 24 points (e.g. A: 3×5 + 3×1 + 2×3 + 2×2 = 28).
Aggregated ranking: 1st B and C (tie), 3rd A, 4th E, 5th D.
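The Borda Count example can be checked with a few lines of Java. This sketch (the class and method names are hypothetical) gives N points for first place down to 1 point for last place, multiplied by the number of voters who cast each ballot.

```java
import java.util.Map;
import java.util.TreeMap;

public class BordaCountSketch {

    // ballots[b] lists the candidates from 1st to last place; voters[b] voters cast that ballot.
    // 1st place earns N points, 2nd N - 1, ..., last place 1 point.
    static Map<Character, Integer> score(String[] ballots, int[] voters) {
        Map<Character, Integer> points = new TreeMap<>();
        for (int b = 0; b < ballots.length; b++) {
            String ballot = ballots[b];
            int n = ballot.length();
            for (int place = 0; place < n; place++) {
                points.merge(ballot.charAt(place), (n - place) * voters[b], Integer::sum);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        String[] ballots = {"ABCDE", "EBCDA", "CBAED", "CDBAE"};
        int[] voters = {3, 3, 2, 2};
        System.out.println(score(ballots, voters)); // {A=28, B=38, C=38, D=22, E=24}
    }
}
```

In SPARS-J the two "ballots" would be the KR order and the CR order of the hit components; with only two voters ties are frequent, and the slides do not say how ties are broken.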

User interface
Receives the user's query and provides the search results through a web browser (Microsoft Internet Explorer, Mozilla, etc.). The process:
・Parse the query words and the search conditions
・Show rank-ordered results
・Show the analyzed information of a component
 – components used by/using the component
 – metrics

Analyzed information
The analyzed information for a component includes:
・Metrics: the number of methods and variables, LOC, cyclomatic complexity, etc. (metrics measurable within the component itself)
・Components used by/using the component: lists of the nodes reached by following use relations
・Components similar to the component: a list of similar components

Package browsing
・Java package names are hierarchical.
・A user can easily list the components in the same package as a given component.

Screenshots: top page; search results; source code; similar components; components using the component; components used by the component; package browsing.

Experiment (1/2): comparison with Google
・About 130,000 components collected from the Internet were registered.
・Query words: "calculator applet" and "chat server client"
・We calculated the relevance ratio of the top 10 results.
・Relevance: the component is reusable source code.
・Since Google is a web search engine:
 – the term "java source" was added to the query words;
 – one link was followed from each result web page.

Experiment (2/2)
・Example 1, "calculator applet": SPARS-J returned 9 hits, 7 of them suitable components.
・Example 2, "chat server client": SPARS-J returned 69 hits, 57 of them suitable components.
・With SPARS-J, suitable components appear at higher ranks.
Relevance (○ = relevant, × = not relevant) and relevance ratio (relevant results among the top k, divided by k) at each rank k:

Rank | Ex.1 SPARS-J | Ex.1 Google | Ex.2 SPARS-J | Ex.2 Google
1 | ○ 1 | ○ 1 | ○ 1 | × 0
2 | ○ 1 | × 0.5 | ○ 1 | × 0
3 | ○ 1 | ○ 0.67 | ○ 1 | × 0
4 | ○ 1 | × 0.5 | ○ 1 | × 0
5 | ○ 1 | ○ 0.6 | ○ 1 | × 0
6 | × 0.83 | ○ 0.67 | ○ 1 | × 0
7 | ○ 0.86 | × 0.57 | ○ 1 | ○ 0.14
8 | × 0.75 | ○ 0.63 | ○ 1 | × 0.13
9 | ○ 0.78 | × 0.56 | ○ 1 | ○ 0.22
10 | – | × 0.5 | ○ 1 | ○ 0.3
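The relevance ratio at each rank can be reproduced from the ○/× marks. This sketch assumes the ratio is the number of relevant results among the top k divided by k, a definition that matches every value in the table; the class name `RelevanceRatioSketch` is hypothetical.

```java
import java.util.Locale;

public class RelevanceRatioSketch {

    // ratios[k - 1] = (number of relevant results among the top k) / k
    static double[] relevanceRatios(boolean[] relevant) {
        double[] ratios = new double[relevant.length];
        int hits = 0;
        for (int k = 1; k <= relevant.length; k++) {
            if (relevant[k - 1]) {
                hits++;
            }
            ratios[k - 1] = (double) hits / k;
        }
        return ratios;
    }

    public static void main(String[] args) {
        // SPARS-J, Example 1 ("calculator applet"): the 9 hits marked ○ or ×
        boolean[] sparsJ1 = {true, true, true, true, true, false, true, false, true};
        for (double r : relevanceRatios(sparsJ1)) {
            System.out.printf(Locale.ROOT, "%.2f%n", r); // 1.00 (x5), 0.83, 0.86, 0.75, 0.78
        }
    }
}
```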

Conclusion and future work
・We developed the component search engine SPARS-J.
・With SPARS-J, frequently used components can be retrieved easily.
Future work:
・Morphological analysis of index keywords
・Collaborative filtering
・Investigating the best ranking method
 – the values of the weights
 – the aggregation of ranks
・Evaluation of SPARS-J usability

End

Component graph
Figure: components A to I in System X and System Y; the edges are component use relations.

Weight of nodes
Figure: the same graph of components A to I in System X and System Y.
・The weight of a node represents its significance.
・Sum of all node weights = 1   (1)

Weights of edges
Figure: an edge from A to B with distribution ratio d = 1/4.
・d: distribution ratio. A node's weight is distributed over its outgoing edges, and edge weights are collected at the destination node.
・Sum of all outgoing edge weights = origin node weight   (2)
・Sum of all incoming edge weights = destination node weight   (3)

Definition of weights
Under constraints (1) to (3), we obtain the simultaneous equation $W = D^{t} W$, where $D^{t}$ is the transposed matrix of distribution ratios and $W$ is the node weight vector. This simultaneous equation can be solved by propagating the node weights through the edges of the graph.
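Spelled out with per-node notation, the constraints lead to this equation as follows. Here $w_i$ is the weight of node $i$, $d_{ij}$ the distribution ratio of the edge from $i$ to $j$ (0 when there is no edge), and $e_{ij}$ the edge weight; these symbols are introduced only for this derivation.

```latex
\begin{aligned}
&\sum_{i} w_i = 1 && (1)\\
&e_{ij} = d_{ij}\, w_i,\quad \sum_{j} d_{ij} = 1 \;\Rightarrow\; \sum_{j} e_{ij} = w_i && (2)\\
&w_j = \sum_{i} e_{ij} = \sum_{i} d_{ij}\, w_i && (3)\\
&\Longrightarrow\; W = D^{t} W, \qquad D = (d_{ij})
\end{aligned}
```

So $W$ is an eigenvector of $D^{t}$ with eigenvalue 1, normalized by (1), and repeatedly applying $W \leftarrow D^{t} W$ (power iteration) is what propagating the weights through the edges computes.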

Pseudo use relations
Figure: nodes A, B and C.
・The weight computation does not always converge.
・We therefore add a pseudo edge from a node to another node wherever there is no real edge.
・The distribution ratios of pseudo edges are much smaller than those of real edges.
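The slides do not give the exact distribution ratios for pseudo edges, so the following is only a sketch under a hypothetical scheme, similar in spirit to PageRank's damping: real edges share 1 − ε of a node's weight, pseudo edges to all other nodes share ε, and a node with no real outgoing edge spreads its weight evenly. On a three-node cycle A → B → C → A, plain propagation from [1, 0, 0] just moves the weight around the cycle forever, while a small ε makes it converge. `PseudoEdgeSketch`, `withPseudoEdges`, `step`, `iterate` and ε = 0.05 are illustrative.

```java
import java.util.Arrays;

public class PseudoEdgeSketch {

    // uses[i][j] == true means a real use relation i -> j; self-loops are ignored.
    // Hypothetical scheme: real edges share (1 - eps) evenly, pseudo edges to every other
    // node without a real edge share eps evenly; a node with no real outgoing edge spreads
    // its whole weight evenly over the other nodes. Assumes at least two nodes.
    static double[][] withPseudoEdges(boolean[][] uses, double eps) {
        int n = uses.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            int real = 0;
            for (int j = 0; j < n; j++) {
                if (j != i && uses[i][j]) real++;
            }
            int pseudo = n - 1 - real;
            for (int j = 0; j < n; j++) {
                if (j == i) continue;
                if (uses[i][j]) {
                    d[i][j] = (pseudo == 0 ? 1.0 : 1.0 - eps) / real;
                } else if (real == 0) {
                    d[i][j] = 1.0 / pseudo;
                } else {
                    d[i][j] = eps / pseudo;
                }
            }
        }
        return d;
    }

    // One propagation step: next_j = sum_i w_i * d_ij
    static double[] step(double[][] d, double[] w) {
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            for (int j = 0; j < w.length; j++) {
                next[j] += w[i] * d[i][j];
            }
        }
        return next;
    }

    static double[] iterate(double[][] d, double[] w, int times) {
        for (int t = 0; t < times; t++) {
            w = step(d, w);
        }
        return w;
    }

    public static void main(String[] args) {
        // Hypothetical cycle A -> B -> C -> A
        boolean[][] uses = {
            {false, true, false},
            {false, false, true},
            {true, false, false},
        };
        double[] start = {1, 0, 0};
        // eps = 0: real edges only, the weight keeps circulating around the cycle
        System.out.println(Arrays.toString(iterate(withPseudoEdges(uses, 0.0), start, 301)));
        // eps = 0.05: the weights settle at approximately 1/3 each
        System.out.println(Arrays.toString(iterate(withPseudoEdges(uses, 0.05), start, 301)));
    }
}
```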

Markov model
・The Component Rank model can be regarded as a Markov chain of the user's focus.
・The user's focus moves from one component to another along a use relation at fixed time intervals.
・A node's weight represents the probability that the user's focus is on that node in the infinite future.

Related work
・Markov models of document traversal
 – Influence Weight: the impact factor of journal publications, computed through incoming references
 – PageRank: the weight of HTML pages on the Web, computed through incoming links
 – Component Rank differs in using explicit use relations and in clustering similar components; these models do no clustering, which is important for software products.
・Measuring the reusability of components or interfaces
 – uses various characteristic metrics, which are only indirect indicators of reusability
・Our approach directly reflects the usage of components.

Iterative computation on the component-group graph
Procedure:
1. Assign an arbitrary weight to each vertex (the weights sum to 1).
2. Compute the weight of each directed edge by distributing each vertex's weight over its outgoing edges.
3. Recompute the weight of each vertex: redefine it as the sum of the weights of its incoming edges.
4. Repeat steps 2 and 3 until the vertex weights converge.
5. Take the converged weight of each vertex as the CR value of the corresponding component group. The score of a component is the CR value of the group it belongs to.
Figure: computation of CR values over component groups C1, C2 and C3 (edge weights such as v1 × 50%, v2 × 100%, v3 × 100%).