Development of a Software Search Engine for the World Wide Web Ken-ichi Matsumoto — 松本健一 Akito Monden — 門田暁人 Toshiyuki Kamei — 亀井俊之 Haruaki Tamada — 玉田春昭.

Presentation transcript:

Development of a Software Search Engine for the World Wide Web Ken-ichi Matsumoto — 松本健一 Akito Monden — 門田暁人 Toshiyuki Kamei — 亀井俊之 Haruaki Tamada — 玉田春昭 Naoki Ohsugi — 大杉直樹 Software Engineering Laboratory Nara Institute of Science and Technology

2 Needs for Software Search from WWW
Questions a developer asks, and the kind of search each one calls for:
- What is a typical usage of this library component? (Search for examples)
- Is there any other implementation for this function? (Search for better implementations)
- Is there any useful library for my program? (Search for components the developer is unaware of)

3 Goal
Construct a software search engine for developers that:
- Collects various resources related to software development from the WWW, e.g. source code, executables, tips, developers’ blogs, etc.
- Provides a flexible query interface
- Provides recommendations of useful resources
In this presentation, we focus on Java programs.

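4 System Architecture
[Figure: system architecture. Users send a Query through the Interface and receive a Result consisting of pointers to resources (URLs) and recommendations. Components shown: Collection, Analysis, Retrieval, Software Resource Repository, Resource Summary Repository.]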

5 Three Major Features of Our Search Engine
For each feature, the user gives the software search engine an input and gets back a result:
1: Finding a typical usage of a component. Input: name of a component M. Result: a set of components that use M.
2: Finding a similar component. Input: an implementation of a component M. Result: components that have similar functionality to M.
3: Getting a recommendation. Input: an unfinished program M. Result: a set of components useful for M.

6 Three Major Features of Our Search Engine (continued)
We employ Software Birthmark, Similarity Evaluation, and Collaborative Filtering to implement these features.

7 Software Birthmark
A set of characteristics of a program*
- Constant Values in Field Variables (CVFV birthmark)
- Sequence of Method Calls (SMC birthmark)
- Inheritance Structure (IS birthmark)
- Used Classes (UC birthmark)
- etc.
Useful for detecting software theft (plagiarism).
Also useful for detecting a set of programs that have similar functionality (UC birthmark and SMC birthmark).
[Figure: a program p and its birthmarks CVFV(p), SMC(p), IS(p), UC(p)]
* H. Tamada, M. Nakamura, A. Monden, and K. Matsumoto, “Design and evaluation of birthmarks for detecting theft of Java programs,” In Proc. IASTED Int’l Conf. on Software Engineering, pp , Feb

8 Example of Software Birthmark for Java
UC birthmark is a set of used classes.

    import java.util.Iterator;
    import java.lang.reflect.Array;

    public class ArrayIterator extends Object implements Iterator {
        private Object array;
        private int index = 0;

        public ArrayIterator(Object array) {
            // Reject any argument that is not an array
            if (!array.getClass().isArray()) {
                throw new IllegalArgumentException("not array type");
            }
            this.array = array;
        }

        // Return the current element and advance the index
        public Object next() {
            return Array.get(array, index++);
        }

        // True while elements remain in the array
        public boolean hasNext() {
            return index < Array.getLength(array);
        }
        ...

UC Birthmark of ArrayIterator:
- java.lang.reflect.Array
- java.lang.Class
- java.lang.IllegalArgumentException
- java.lang.Object
- java.lang.String
- java.util.Iterator
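As a rough illustration of what "a set of used classes" looks like in code, the sketch below collects the classes visible in a class's signatures through reflection. This is only an approximation and is not the method described in the cited paper: classes that appear solely inside method bodies (such as java.lang.reflect.Array or java.lang.IllegalArgumentException above) cannot be seen this way, and a real UC birthmark would need analysis of the class file itself. The names SignatureUsedClasses and Sample are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

/** Sketch (assumption): signature-level approximation of a UC birthmark. */
class SignatureUsedClasses {

    /** Returns the sorted names of classes that appear in the signatures of target. */
    static Set<String> of(Class<?> target) {
        Set<String> used = new TreeSet<>();
        add(used, target.getSuperclass());
        for (Class<?> iface : target.getInterfaces()) add(used, iface);
        for (Field f : target.getDeclaredFields()) add(used, f.getType());
        for (Constructor<?> c : target.getDeclaredConstructors()) {
            for (Class<?> p : c.getParameterTypes()) add(used, p);
            for (Class<?> e : c.getExceptionTypes()) add(used, e);
        }
        for (Method m : target.getDeclaredMethods()) {
            add(used, m.getReturnType());
            for (Class<?> p : m.getParameterTypes()) add(used, p);
            for (Class<?> e : m.getExceptionTypes()) add(used, e);
        }
        return used;
    }

    /** Records a type name, unwrapping arrays and skipping primitives and void. */
    private static void add(Set<String> used, Class<?> type) {
        if (type == null) return;               // e.g. superclass of an interface
        while (type.isArray()) type = type.getComponentType();
        if (!type.isPrimitive()) used.add(type.getName());
    }

    /** Hypothetical sample class, loosely modelled on the slide's ArrayIterator. */
    static class Sample implements Iterator<Object> {
        private Object array;
        public boolean hasNext() { return false; }
        public Object next() { throw new IllegalStateException(); }
    }
}
```

Calling SignatureUsedClasses.of(SignatureUsedClasses.Sample.class) yields a set containing java.lang.Object and java.util.Iterator, but not java.lang.IllegalStateException, which shows the gap between this shortcut and a birthmark computed from the class file.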

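9 Similarity between Two Components
Similarity of the UC birthmarks of components i and j, computed as a correlation coefficient:

$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$

where
$U$: the set of all classfiles
$R_{u,i} = 1$ if component $i$ uses class $u$, and $0$ if it does not
$\bar{R}_i = (\text{\# of classfiles used by } i) / |U|$

Other computations are also available, e.g. vector (cosine) similarity, adjusted cosine, etc.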
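The correlation-based similarity can be sketched as follows, assuming each UC birthmark is held as a set of class names and that R is 1 when a component uses a class and 0 otherwise. The class and parameter names are hypothetical, and returning 0 when a component uses none or all of the classes is an assumption made here to avoid dividing by zero; the slides do not say how that case is handled.

```java
import java.util.Set;

/** Sketch (assumption): correlation coefficient between two UC birthmarks. */
class UcSimilarity {

    /**
     * @param universe U, the set of all class files
     * @param usedByI  UC birthmark of component i
     * @param usedByJ  UC birthmark of component j
     * @return Pearson correlation of the two 0/1 usage vectors over U
     */
    static double correlation(Set<String> universe,
                              Set<String> usedByI,
                              Set<String> usedByJ) {
        if (universe.isEmpty()) return 0.0;

        // Mean of each 0/1 vector: (# of class files used) / |U|
        double countI = 0, countJ = 0;
        for (String u : universe) {
            if (usedByI.contains(u)) countI++;
            if (usedByJ.contains(u)) countJ++;
        }
        double meanI = countI / universe.size();
        double meanJ = countJ / universe.size();

        // Accumulate covariance and variances over every class file in U
        double cov = 0, varI = 0, varJ = 0;
        for (String u : universe) {
            double di = (usedByI.contains(u) ? 1.0 : 0.0) - meanI;
            double dj = (usedByJ.contains(u) ? 1.0 : 0.0) - meanJ;
            cov += di * dj;
            varI += di * di;
            varJ += dj * dj;
        }

        // Assumption: a constant vector carries no information, so report 0
        if (varI == 0 || varJ == 0) return 0.0;
        return cov / Math.sqrt(varI * varJ);
    }
}
```

Two components with identical birthmarks score 1, and two components whose birthmarks split U into complementary halves score -1.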

10 Example (1): Search for typical usages Data source: rt.jar (9206 class files) Search for typical usages of “java.util.BitSet”
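The slides do not show how this search is implemented. One plausible reading, given that feature 1 returns "a set of components that use M", is a lookup over stored UC birthmarks. The sketch below assumes the birthmarks are already available as a map from component name to the set of class names it uses; UsageSearch and findUsers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch (assumption): find components whose UC birthmark contains a class. */
class UsageSearch {

    /**
     * @param birthmarks component name -> UC birthmark (names of used classes)
     * @param target     fully qualified class name, e.g. "java.util.BitSet"
     * @return sorted names of the components that use target
     */
    static List<String> findUsers(Map<String, Set<String>> birthmarks, String target) {
        List<String> users = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : birthmarks.entrySet()) {
            if (entry.getValue().contains(target)) {
                users.add(entry.getKey());
            }
        }
        Collections.sort(users);   // stable, readable ordering for display
        return users;
    }
}
```

The returned components are candidates for "typical usage" examples; ranking them, for instance by similarity to one another, is left open here.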

11 Example (2): Search for Similar Component Data source: a part of bcel5.1 (100 class files) Search for classfiles similar to “ArithmeticInstruction”

12 Collaborative Filtering (CF)
Filtering: selecting preferred items from a large collection of items.
Collaborative: using other users’ preferences to filter items.
[Figure: from a large collection of items (A to T), items F and K are selected as preferred items using other users’ preferences (“F is good!”, “K is cool!”).]

13 Two Steps in CF
1. Evaluate similarities between the target user and the other users.
2. Estimate the target user’s preference for the target item from the other users’ preferences for that item and their similarities.
[Figure: ratings by Users A to D for Items 1 to 5 on the scale 1 (not prefer), 3 (even), 5 (prefer). Users are marked as similar or dissimilar to the target user, whose unknown rating “?” for the target item is estimated.]

14 CF for Software Components
1. Evaluate similarities between the target component and the other components based on UC birthmarks.
2. Estimate the usefulness of the target classfile from the other components’ UC birthmarks and their similarities.
[Figure: usage matrix of Components A to D over several classes (0 = not used, 1 = used). Components are marked as similar or dissimilar to the target component, whose unknown entry “?” is estimated (here as useful).]
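The estimation step can be sketched with a similarity-weighted average, a common choice in collaborative filtering. The slides do not give the exact estimator, so the weighting scheme, the decision to ignore components with non-positive similarity, and all names below are assumptions. Similarities to the target component are taken as precomputed input.

```java
import java.util.Map;
import java.util.Set;

/** Sketch (assumption): similarity-weighted usefulness estimate for one class. */
class UsefulnessEstimator {

    /**
     * @param candidateClass     class whose usefulness to the target is estimated
     * @param similarityToTarget other component name -> similarity to the target
     * @param birthmarks         other component name -> UC birthmark
     * @return value in [0, 1]; higher means more of the similar components use the class
     */
    static double estimate(String candidateClass,
                           Map<String, Double> similarityToTarget,
                           Map<String, Set<String>> birthmarks) {
        double weightedUse = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<String, Double> entry : similarityToTarget.entrySet()) {
            double sim = entry.getValue();
            if (sim <= 0) continue;                 // assumption: skip dissimilar components
            Set<String> used = birthmarks.get(entry.getKey());
            if (used == null) continue;             // no birthmark known for this component
            totalWeight += sim;
            if (used.contains(candidateClass)) {    // R = 1 (used), otherwise 0
                weightedUse += sim;
            }
        }
        return totalWeight == 0 ? 0.0 : weightedUse / totalWeight;
    }
}
```

Scoring every class that the target component does not yet use and sorting by this value gives a ranked recommendation list.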

15 Example (3): Get Recommendations Data source: a part of bcel5.1 (100 class files) Search for recommendation for “ArithmeticInstruction” Actually used

16 Summary
Three features of a software search engine
- Providing typical usage of a component
- Providing a similar component
- Making a recommendation
Three key technologies
- Software Birthmark
- Similarity Evaluation
- Collaborative Filtering