Published by Marian Hawkins. Modified over 9 years ago.
1
Development of a Software Search Engine for the World Wide Web

Ken-ichi Matsumoto (松本健一), Akito Monden (門田暁人), Toshiyuki Kamei (亀井俊之), Haruaki Tamada (玉田春昭), Naoki Ohsugi (大杉直樹)
Software Engineering Laboratory, Nara Institute of Science and Technology
2
Needs for Software Search on the WWW

A developer asks:
- What is a typical usage of this library component? (Search for examples)
- Is there any other implementation of this function? (Search for better implementations)
- Is there any useful library for my program? (Search for components the developer is unaware of)
3
Goal

Construct a software search engine for developers that:
- Collects various resources related to software development from the WWW, e.g. source code, executables, tips, developers' blogs, etc.
- Provides a flexible query interface.
- Recommends useful resources.

In this presentation, we focus on Java programs.
4
System Architecture

[Diagram labels: Users, Interface, Query, Result (pointers to resources (URL), recommendations), Collection, Analysis, Retrieval, Software Resource Repository, Resource Summary Repository]
5
Three Major Features of Our Search Engine

1: Finding a typical usage of a component
   User provides: the name of a component M
   Search engine returns: a set of components that use M
2: Finding a similar component
   User provides: an implementation of a component M
   Search engine returns: components that have similar functionality to M
3: Getting a recommendation
   User provides: an unfinished program M
   Search engine returns: a set of components useful for M
6
Three Major Features of Our Search Engine (continued)

We employ Software Birthmark, Similarity Evaluation, and Collaborative Filtering to implement the three features above.
7
Software Birthmark

A set of characteristics of a program p*:
- Constant Values in Field Variables (CVFV birthmark), CVFV(p)
- Sequence of Method Calls (SMC birthmark), SMC(p)
- Inheritance Structure (IS birthmark), IS(p)
- Used Classes (UC birthmark), UC(p)
- etc.

Useful for detecting software theft (plagiarism).
Also useful for detecting sets of programs that have similar functionality (UC birthmark and SMC birthmark).

* H. Tamada, M. Nakamura, A. Monden, and K. Matsumoto, "Design and evaluation of birthmarks for detecting theft of Java programs," in Proc. IASTED Int'l Conf. on Software Engineering, pp. 569-575, Feb. 2004.
8
Example of Software Birthmark for Java

The UC birthmark is the set of classes that a program uses.

    import java.lang.reflect.Array;
    import java.util.Iterator;

    public class ArrayIterator extends Object implements Iterator {
        private Object array;
        private int index = 0;

        public ArrayIterator(Object array) {
            // Reject any argument that is not an array.
            if (!array.getClass().isArray()) {
                throw new IllegalArgumentException("not array type");
            }
            this.array = array;
        }

        public Object next() {
            return Array.get(array, index++);
        }

        public boolean hasNext() {
            return index < Array.getLength(array);
        }
        ...

UC birthmark of ArrayIterator:
- java.lang.reflect.Array
- java.lang.Class
- java.lang.IllegalArgumentException
- java.lang.Object
- java.lang.String
- java.util.Iterator
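As a rough illustration of how a set of used classes might be collected, the sketch below approximates a UC birthmark with reflection. This is an assumption made for illustration, not the extraction method used in the slides: `UcBirthmarkSketch` and `usedClasses` are hypothetical names, and reflection only sees declared signatures (superclass, interfaces, field types, constructor and method signatures). Classes that appear only inside method bodies, such as java.lang.IllegalArgumentException in ArrayIterator, would be missed; a faithful extractor would have to read the classfile itself.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: approximates a UC birthmark from declared signatures only. */
final class UcBirthmarkSketch {

    /** Returns the sorted names of the classes that appear in the declarations of c. */
    static Set<String> usedClasses(Class<?> c) {
        Set<String> used = new TreeSet<>();
        add(used, c.getSuperclass());
        for (Class<?> i : c.getInterfaces()) add(used, i);
        for (Field f : c.getDeclaredFields()) add(used, f.getType());
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            for (Class<?> p : k.getParameterTypes()) add(used, p);
            for (Class<?> e : k.getExceptionTypes()) add(used, e);
        }
        for (Method m : c.getDeclaredMethods()) {
            add(used, m.getReturnType());
            for (Class<?> p : m.getParameterTypes()) add(used, p);
            for (Class<?> e : m.getExceptionTypes()) add(used, e);
        }
        return used;
    }

    /** Records a type by name, unwrapping arrays and skipping primitives and void. */
    private static void add(Set<String> used, Class<?> t) {
        if (t == null) return;                        // e.g. the superclass of Object
        while (t.isArray()) t = t.getComponentType(); // long[] -> long
        if (!t.isPrimitive()) used.add(t.getName());  // isPrimitive() also covers void
    }

    public static void main(String[] args) {
        // Prints the approximate birthmark of a JDK class, one class name per line.
        usedClasses(java.util.BitSet.class).forEach(System.out::println);
    }
}
```

Returning a sorted set keeps the output stable, which makes two birthmarks easy to compare by eye or with set operations.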
9
Similarity between Two Components

The similarity between UC birthmarks i and j is computed with a correlation coefficient, where:
- $U$ is the set of all classfiles
- $R_{u,i} = 1$ if u uses class i, and $R_{u,i} = 0$ if u does not use class i
- $\bar{R}_i = (\text{number of classfiles used by } i) / |U|$

Other similarity measures are also available, e.g. vector (cosine) similarity, adjusted cosine, etc.
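The slide names a correlation coefficient but the formula itself is not reproduced in this transcript. Assuming the standard Pearson form over two 0/1 usage vectors $x$ and $y$ indexed by the classfiles in $U$, that is $\mathrm{sim}(x,y) = \sum_k (x_k-\bar{x})(y_k-\bar{y}) \,/\, \sqrt{\sum_k (x_k-\bar{x})^2 \sum_k (y_k-\bar{y})^2}$, a minimal sketch could look as follows. The class and method names are hypothetical, and returning 0 for a constant vector is a choice made here, since the coefficient is undefined in that case. Cosine similarity is included because the slide lists it as an alternative.

```java
/** Hypothetical helper: similarity between two 0/1 usage vectors of equal length. */
final class BirthmarkSimilarity {

    /** Pearson correlation coefficient; returns 0 when either vector is constant. */
    static double correlation(double[] x, double[] y) {
        requireSameLength(x, y);
        double meanX = mean(x), meanY = mean(y);
        double cov = 0, varX = 0, varY = 0;
        for (int k = 0; k < x.length; k++) {
            double dx = x[k] - meanX, dy = y[k] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // Undefined for a constant vector; this sketch reports 0 in that case.
        return (varX == 0 || varY == 0) ? 0.0 : cov / Math.sqrt(varX * varY);
    }

    /** Vector (cosine) similarity; returns 0 when either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        requireSameLength(x, y);
        double dot = 0, normX = 0, normY = 0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            normX += x[k] * x[k];
            normY += y[k] * y[k];
        }
        return (normX == 0 || normY == 0) ? 0.0 : dot / Math.sqrt(normX * normY);
    }

    private static double mean(double[] v) {
        double sum = 0;
        for (double e : v) sum += e;
        return v.length == 0 ? 0.0 : sum / v.length;
    }

    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        // Made-up usage vectors over five classfiles (1 = used, 0 = not used).
        double[] a = {1, 1, 0, 0, 1};
        double[] b = {1, 1, 0, 1, 0};
        System.out.println("correlation = " + correlation(a, b));
        System.out.println("cosine      = " + cosine(a, b));
    }
}
```

Unlike cosine similarity, the correlation coefficient can be negative, which lets it separate "dissimilar" from merely "unrelated" usage patterns.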
10
Example (1): Search for Typical Usages

Data source: rt.jar (9206 class files)
Query: typical usages of "java.util.BitSet"
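One way to read this query is as a lookup over stored UC birthmarks: return every component whose birthmark contains the queried class. The sketch below assumes the repository is an in-memory map from component name to birthmark; `UsageSearch`, `componentsUsing`, and the component names in `sampleRepository` are hypothetical and are not taken from rt.jar or from the slides' results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: find the components whose UC birthmark contains a given class. */
final class UsageSearch {

    /** Returns the sorted names of the components that use className. */
    static List<String> componentsUsing(Map<String, Set<String>> repository, String className) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : repository.entrySet()) {
            if (entry.getValue().contains(className)) hits.add(entry.getKey());
        }
        Collections.sort(hits);
        return hits;
    }

    /** Toy repository with made-up component names and birthmarks. */
    static Map<String, Set<String>> sampleRepository() {
        Map<String, Set<String>> repo = new HashMap<>();
        repo.put("ComponentA", new HashSet<>(Arrays.asList("java.lang.Object", "java.util.BitSet")));
        repo.put("ComponentB", new HashSet<>(Arrays.asList("java.lang.Object", "java.util.Iterator")));
        repo.put("ComponentC", new HashSet<>(Arrays.asList("java.util.BitSet", "java.lang.String")));
        return repo;
    }

    public static void main(String[] args) {
        System.out.println(componentsUsing(sampleRepository(), "java.util.BitSet"));
    }
}
```

A linear scan is enough for a sketch; a repository of web scale would more plausibly keep an inverted index from class name to components.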
11
Example (2): Search for Similar Components

Data source: a part of bcel5.1 (100 class files)
Query: classfiles similar to "ArithmeticInstruction"
12
Collaborative Filtering (CF)

- Filtering: selecting preferred items from a large collection of items.
- Collaborative: using the other users' preferences to filter the items.

[Figure: a large collection of items (A to T); the other users' opinions ("F is good!", "K is cool!") are used to select the preferred items F and K.]
13
Two Steps in CF

1. Evaluate the similarities between the target user and the other users.
2. Estimate the target user's preference for the target item, using the other users' preferences for that item and their similarities.

[Figure: rating table of Users A to D over Items 1 to 5, with ratings 5 (prefer), 3 (even), and 1 (not prefer); similar and dissimilar users are marked, and the unknown rating "?" (target) is estimated.]
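The two steps can be sketched as follows. The slides do not state the exact estimation formula, so this sketch assumes one common choice: Pearson correlation over co-rated items for step 1, and a mean-centered, similarity-weighted average for step 2. `RatingCf` and its methods are hypothetical names, and the rating table in `main` is made up rather than copied from the slide's figure.

```java
/** Hypothetical sketch of user-based collaborative filtering on a small rating table. */
final class RatingCf {
    /** Marker for "not rated yet". */
    static final double UNKNOWN = Double.NaN;

    /** Step 1: Pearson correlation over the items that both users have rated. */
    static double similarity(double[] a, double[] b) {
        double sumA = 0, sumB = 0;
        int n = 0;
        for (int i = 0; i < a.length; i++) {
            if (Double.isNaN(a[i]) || Double.isNaN(b[i])) continue;
            sumA += a[i];
            sumB += b[i];
            n++;
        }
        if (n == 0) return 0.0; // no co-rated items
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < a.length; i++) {
            if (Double.isNaN(a[i]) || Double.isNaN(b[i])) continue;
            double da = a[i] - meanA, db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return (varA == 0 || varB == 0) ? 0.0 : cov / Math.sqrt(varA * varB);
    }

    /** Mean of the known ratings of one user, or NaN if there are none. */
    static double mean(double[] ratings) {
        double sum = 0;
        int n = 0;
        for (double r : ratings) {
            if (!Double.isNaN(r)) { sum += r; n++; }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    /** Step 2: estimate ratings[target][item] from the users who rated that item. */
    static double predict(double[][] ratings, int target, int item) {
        double base = mean(ratings[target]);
        double weighted = 0, totalWeight = 0;
        for (int v = 0; v < ratings.length; v++) {
            if (v == target || Double.isNaN(ratings[v][item])) continue;
            double s = similarity(ratings[target], ratings[v]);
            weighted += s * (ratings[v][item] - mean(ratings[v]));
            totalWeight += Math.abs(s);
        }
        // Fall back to the target user's own mean when no neighbour carries weight.
        return totalWeight == 0 ? base : base + weighted / totalWeight;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 1, 3, UNKNOWN}, // target user
            {5, 1, 3, 5},       // rates like the target user
            {1, 5, 3, 1},       // rates opposite to the target user
        };
        System.out.println("estimated rating = " + predict(ratings, 0, 3));
    }
}
```

Centering each neighbour's rating on that neighbour's own mean lets a dissimilar user still contribute: a low rating from a user with opposite taste pushes the estimate up.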
14
CF for Software Components

1. Evaluate the similarities between the target component and the other components, based on the UC birthmark.
2. Estimate the usefulness of the target classfile, using the other components' UC birthmarks and their similarities.

[Figure: usage table of Components A to D over Classes 1 to 5 (1: used, 0: not used); similar and dissimilar components are marked, and the unknown entry "?" (target) is estimated as 1 (useful).]
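Applied to 0/1 usage data, the same two steps might be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it uses cosine similarity between UC birthmarks (one of the alternatives the slides mention) and scores each class the target does not yet use by the similarity-weighted share of components that use it. `ComponentRecommender`, `scores`, and `setOf` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: recommend classes to a component from similar components' UC birthmarks. */
final class ComponentRecommender {

    /** Cosine similarity between two birthmarks viewed as 0/1 vectors. */
    static double cosine(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        int common = 0;
        for (String c : a) {
            if (b.contains(c)) common++;
        }
        return common / Math.sqrt((double) a.size() * b.size());
    }

    /** Scores in [0, 1] for every class used by another component but not by the target. */
    static Map<String, Double> scores(Set<String> target, Collection<Set<String>> others) {
        Map<String, Double> weighted = new TreeMap<>();
        double total = 0.0;
        for (Set<String> other : others) {
            double s = cosine(target, other); // step 1: similarity to the target
            total += s;
            for (String c : other) {
                if (!target.contains(c)) weighted.merge(c, s, Double::sum);
            }
        }
        if (total > 0) {
            final double norm = total;
            weighted.replaceAll((c, w) -> w / norm); // step 2: normalise the weighted votes
        }
        return weighted;
    }

    /** Convenience factory for small examples. */
    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // Made-up birthmarks; class names are placeholders.
        Set<String> target = setOf("Class1", "Class2");
        System.out.println(scores(target, Arrays.asList(
                setOf("Class1", "Class2", "Class3"),
                setOf("Class4", "Class5"))));
    }
}
```

Because cosine similarity on sets is never negative, dissimilar components simply contribute nothing; swapping in a correlation coefficient would let them vote against a class.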
15
Example (3): Get Recommendations

Data source: a part of bcel5.1 (100 class files)
Query: recommendations for "ArithmeticInstruction"
[Figure label: "Actually used"]
16
Summary

Three features of a software search engine:
- Providing typical usages of a component
- Providing similar components
- Making recommendations

Three key technologies:
- Software Birthmark
- Similarity Evaluation
- Collaborative Filtering