Component Search and Retrieval Advanced Reuse Seminars Eduardo Cruz.


Information Retrieval
Structured Documents
Unstructured Documents
  - No software documentation standard
Semi-Structured Documents
Calvin Northrup Mooers

Mooers' Law: “An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it.” Calvin Northrup Mooers, 1959

Mass Produced Software Components [McIlroy, 1968]

“software industry is weakly founded, and that one aspect of this weakness is the absence of a software components subindustry” [McIlroy, 1968]

“The storage and retrieval of software assets is nothing but a specialized form of information storage and retrieval” [Mili, 1998]

Software Library
Browsing – Inspecting without a predefined criterion
Retrieval – Satisfying a predefined matching criterion

Classification Scheme
Facet-based
  - Better than hierarchical classification
  - Manual classification different facets
  - Automatic classification
Controlled Vocabulary
  - Semantic information
Uncontrolled Vocabulary
  - Big software libraries
  - Little or no descriptors
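As a rough illustration of facet-based classification with a controlled vocabulary, the sketch below describes each component by one term per facet and retrieves components whose facets match a query. The facet names (`function`, `object`, `language`), the vocabulary terms, and the component names are all assumptions made up for this example; they do not come from the slides or from any particular tool.

```python
# Hypothetical controlled vocabulary: each facet only admits a fixed set of terms.
CONTROLLED_VOCABULARY = {
    "function": {"sort", "search", "parse"},
    "object": {"list", "tree", "text"},
    "language": {"java", "c"},
}


def classify(**facets):
    """Build a component descriptor, rejecting terms outside the vocabulary."""
    for facet, term in facets.items():
        if term not in CONTROLLED_VOCABULARY.get(facet, set()):
            raise ValueError(f"{term!r} is not an allowed term for facet {facet!r}")
    return dict(facets)


def retrieve(library, **query):
    """Return the names of components whose descriptor matches every queried facet."""
    return sorted(
        name
        for name, descriptor in library.items()
        if all(descriptor.get(facet) == term for facet, term in query.items())
    )


# Hypothetical library, classified by hand for the example.
library = {
    "QuickSort": classify(function="sort", object="list", language="java"),
    "TreeSearch": classify(function="search", object="tree", language="java"),
    "CSort": classify(function="sort", object="list", language="c"),
}

print(retrieve(library, function="sort"))
print(retrieve(library, function="sort", language="java"))
```

With an uncontrolled vocabulary the `classify` check would simply be dropped, which makes classification cheaper for big libraries but leaves the retrieval step without agreed terms to match on.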

Recall and Precision
High Precision – Most retrieved elements are relevant
High Recall – Few relevant elements are left behind
Spreading Activation (Relaxed Search) – Related matches are also retrieved
Coverage – The average number of assets visited, divided by the total size of the library
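Precision and recall can be made concrete with a small calculation. The sketch below uses the standard set-based definitions (precision is the share of retrieved assets that are relevant, recall is the share of relevant assets that were retrieved); the component names are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query against a component library."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result and hypothetical ground truth.
retrieved = ["Stack", "Queue", "Logger", "Deque"]
relevant = ["Stack", "Queue", "Deque", "PriorityQueue", "LinkedList"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 3 of 4 retrieved are relevant; 3 of 5 relevant were found
```

A relaxed search such as spreading activation typically trades some precision for recall, since it returns related matches in addition to exact ones.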

Asset Representation
Library representation is made in full knowledge of the artifact.
User representation is made in ignorance of the artifact.
Asset representation is purposefully abstract, to capture important features while overlooking minor or irrelevant details.
The retrieval literature calls this representation the asset's surrogate.

Asset Retrieval Goals
Exact retrieval – Black box reuse
Approximate retrieval – White box reuse
  - Generative modification – Reusing the design
  - Compositional modification – Using building blocks of the retrieved asset

Information Usually Not Included
Interface description
Non-functional requirements
Interoperability

Situational Model x System Model Component retrieval model [Lucrédio et al., 2004]

“Repository representation is made in full knowledge of the artifact at hand” “User representation is made in ignorance of the artifact” [Mili, 1998]

Scott Henninger

Tools

Component Search Tools
Web
  - Delphi Search Engine
  - Ispey
  - CSourceSearch.net (2004)
  - Gonzui
  - SourceBank
  - Koders (2004)
  - Codase (2005)
Applications
  - Agora (1998)
  - Codebroker (2002)
  - Koders Enterprise (2004)
  - Maracatu (2005)

Delphi Search Engine

Ispey.com

SPARS-J – (2003) Filter

SourceBank Filter

CSourceSearch.Net – (2004)

Koders.com – (2004)

CODASE – Launched Sep 9, 2005
Example Searches
Browsing
Multiple Search Options
“…based on the number of people in your company, starting from $5,000 USD”

CODASE - Browsing

Other Tools

AGORA - Location and Indexing (1998)
[Figure labels: Internet, JavaBeans Agent, JavaBeans Introspector, Filter, AltaVista Search Index Server, Index, AltaVista Query Server, Web Server]

Component Rank (2003)
Graph G, nodes v, edges e, weight w, distribution ratio d
[Figure: graph with nodes v1, v2, v3 and distribution ratios d12 = 0.5, d13 = 0.5, d23 = 1, d31 = 1]
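The weights in a graph like this can be found by repeatedly passing each node's weight along its outgoing edges, split according to the distribution ratios, until the values stop changing. The sketch below does that for the three-node example, with nodes numbered from 0 instead of 1. The function name, the uniform starting weights, and the fixed iteration count are assumptions of this sketch, and it leaves out anything the full method may do for nodes without outgoing edges.

```python
def component_rank(ratios, node_count, iterations=100):
    """Repeatedly redistribute node weights along edges until they settle.

    ratios maps (source, target) to the distribution ratio of that edge;
    the ratios leaving each node are expected to sum to 1.
    """
    weights = [1.0 / node_count] * node_count  # start from uniform weights
    for _ in range(iterations):
        updated = [0.0] * node_count
        for (source, target), ratio in ratios.items():
            # Each node hands a share of its weight to the nodes it uses.
            updated[target] += weights[source] * ratio
        weights = updated
    return weights


# Distribution ratios from the slide: d12 = 0.5, d13 = 0.5, d23 = 1, d31 = 1.
ratios = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.0, (2, 0): 1.0}

weights = component_rank(ratios, 3)
print([round(w, 3) for w in weights])  # → [0.4, 0.2, 0.4]
```

Solving the balance equations by hand gives the same result: w1 = w3, w2 = 0.5 w1, and the three weights sum to 1, so w1 = w3 = 0.4 and w2 = 0.2.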

“Classes defining data structures and their containers are highly ranked”

Clustered Component Graph
[Figure: component graph with nodes V1 to V7, where V1 ≡ V4 and V2 ≡ V6; the clustered graph has nodes V'14, V'26, V'3, V'5 and V7]

NO MORE MULTIPLE DISCONNECTED COMPONENTS
[Figure: component graph with nodes V1 to V7]

Component Rank System Architecture
INPUT: .java file ≡ component
(1) Similarity Measurement
(2) Clustering
(3) Use Relation Extraction
(4) Component Graph Construction
(5) Component Rank Computation by Repetition
(6) De-Clustering to Original Component Graph
OUTPUT: Order of Weights ≡ Component Rank of .java files

Simple Copied Components
[Figure: a non-clustered component graph with copied components (A and B, each appearing twice) and other components (X, Y), next to the clustered graph (A', B', X', Y'). Weight labels: 1/4 with clustering before weight computation; 1/3 and 1/6 with clustering after weight computation]

DO NOT COUNT SIMPLY DUPLICATED COMPONENTS

Copied AND MODIFIED Components
[Figure: a non-clustered component graph with original components (A, B), copied and modified components (A, C) and other components (X, Y), next to two clustered graphs, both captioned "Clustering Before Weight Computation". Weight labels shown on the first: 1/5, 2/5, 1/5; on the second: 1/5, 1/6, 1/3, 1/6]

Beyond Searching and Browsing
Searching and browsing
  - Require users to initiate the information seeking process
Information access and Information Delivery

CodeBroker – (2001)
Component repositories are often so large that software developers cannot learn about all of the components
Component repositories are not static
  - New components added
  - Old components updated
Context-Aware browsing

May not have sufficient knowledge about the reuse repository
May perceive that reuse costs more than developing from scratch
May not be able to use the repository by formulating a proper query
May not be able to understand the found components

[Figure labels: Well Known, Vaguely Known, Belief, Information Islands, L4: Entire Information Space, Unknown components]

CodeBroker
L1: Well Known
L2: Vaguely Known
L3: Belief
L4: Entire Information Space
Information Use:
  - L1 – Use by Memory
  - L2 – Use by Recall
  - L3 – Use by Anticipation
  - L4 – Use by Delivery
[Figure labels: Already Known Components, Irrelevant Components, Task Relevant Information]

Program Aspects Concept  Formal  Informal Indentation, comments, identifier names (semantic)  Executability Code Constraint environment  Signature

Information delivery
Feedback
  - After execution of the action
Feedforward
  - Affects the execution of the action

Information delivery Interruptive Noninterruptive

Latent Semantic Analysis (LSA)
Synonymy
Polysemy
“Text documents and queries are represented as vectors in the semantic space, based on the words contained, and the similarity between a query and a document is determined by the distance of their respective vectors”
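The vector comparison described in the quotation can be sketched as follows. This is a deliberately simplified stand-in: it compares raw term-count vectors with cosine similarity and skips the dimensionality reduction (a truncated singular value decomposition of the term-document matrix) that LSA relies on to cope with synonymy and polysemy. The component names and descriptions are invented for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lower-cased words with their counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (1.0 = same direction)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, name) pairs, best match first."""
    query_vector = term_vector(query)
    scored = [
        (cosine_similarity(query_vector, term_vector(text)), name)
        for name, text in documents.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical component descriptions, e.g. taken from comments.
documents = {
    "Stack": "push pop element stack",
    "Logger": "write log message file",
}

for score, name in rank("pop element", documents):
    print(name, round(score, 3))
```

Because this version matches words literally, a query for "remove item" would score zero against the Stack description even though it means much the same as "pop element"; closing that gap is what the reduced semantic space in LSA is for.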

Comments signature Discourse model User model

Koders Enterprise – (2004)

M.A.R.A.C.A.T.U. – Modern Architecture for Retrieving All Components At The Universe (2005)

Using Structural Context to Recommend Source Code Examples Reid Holmes and Gail C. Murphy University of British Columbia Software Practices Lab

The Problem: A Concrete Example
Frameworks can improve developer productivity.
But developers can become stuck trying to use the APIs
  - Imagine trying to use the Eclipse APIs to place text in the status line of the Eclipse IDE
  - Eclipse has 38,000 public methods

[Figure labels: Structural Context, Project Repository, Development Environment, Examples]

Strathcona: Extract Structural Context ViewPart SampleView setMessage(String) IStatusLineManager setMessage(String)
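One way to picture matching on structural context is to describe each class by a set of structural facts (the type it extends, the methods it calls) and rank repository classes by how many facts they share with the code being written. The sketch below is only an illustration of that idea and should not be read as Strathcona's actual heuristics; the fact encoding, the function names, and the repository contents are all assumptions made up for this example.

```python
def structural_facts(superclass, calls):
    """Describe a class by the type it extends and the methods it calls."""
    facts = {("extends", superclass)}
    facts.update(("calls", target) for target in calls)
    return facts


def recommend(query_facts, repository, top=3):
    """Rank repository classes by the number of structural facts shared with the query."""
    scored = []
    for name, facts in repository.items():
        overlap = len(query_facts & facts)
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top]]


# The query mirrors the slide: SampleView extends ViewPart and wants to call
# IStatusLineManager.setMessage(String).
query = structural_facts("ViewPart", ["IStatusLineManager.setMessage(String)"])

# Hypothetical repository contents, invented for illustration.
repository = {
    "StatusView": structural_facts(
        "ViewPart",
        ["IStatusLineManager.setMessage(String)", "IActionBars.getStatusLineManager()"],
    ),
    "OutlineView": structural_facts("ViewPart", ["TreeViewer.refresh()"]),
    "EditorHelper": structural_facts("Object", ["IStatusLineManager.setMessage(String)"]),
    "Unrelated": structural_facts("Object", ["String.trim()"]),
}

print(recommend(query, repository))  # → ['StatusView', 'EditorHelper', 'OutlineView']
```

The top result shares both the superclass and the call with the query, so it is the most likely to show how a status line manager is obtained in a similar setting.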

Strathcona: Example Navigation
Visual representation
  - Highlights key relationships between example and query
  - Multiple examples can be quickly viewed

Strathcona: Viewing Example Source Code view
  - Example shows how to get a status line manager
  - Example is not a perfect match, but good enough to help

Conclusion
Information Delivery
Similarity Analyser
Ranking – Metrics
Context
Automatic Facet Classification
  - Uncontrolled vocabulary + additional terms

References
[McIlroy, 1968] M. D. McIlroy. Mass Produced Software Components. NATO Software Engineering Conference Report, Garmisch, Germany, October 1968
[Mili, 1998] A. Mili, R. Mili, R. T. Mittermeir. A survey of software reuse libraries. Annals of Software Engineering, Vol. 5, 1998
[Seacord, 1998] Robert C. Seacord, Scott A. Hissam, Kurt C. Wallnau. Agora: A Search Engine for Software Components. IEEE Internet Computing, vol. 02, no. 6, November/December 1998
[Szyperski, 1999] C. Szyperski. Component Software: Beyond Object-Oriented Programming. Addison Wesley, 1999
[Dey, 2001] A. Dey. Understanding and Using Context. Personal Ubiquitous Comput. 5, 1 (Jan. 2001)
[Greengrass, 2001] Ed Greengrass. Information retrieval: A survey. DOD Technical Report TR-R , 2001
[Ye, 2001] Y. Ye and G. Fischer. Context-Aware Browsing of Large Component Repositories. In Proceedings of the 16th IEEE International Conference on Automated Software Engineering (November 2001). ASE. IEEE Computer Society, Washington, DC, 99.
[Ye, 2002] Y. Ye and G. Fischer. Information delivery in support of learning reusable software components on demand. In Proceedings of the 7th International Conference on Intelligent User Interfaces, California, USA
[Ye, 2002] Y. Ye and G. Fischer. Supporting Reuse by Delivering Task Relevant and Personalized Information. In Proceedings of the 24th International Conference on Software Engineering, Orlando, Florida, May 2002

Bibliography
[Inoue, 2003] K. Inoue et al. Component Rank: Relative Significance Rank for Software Component Search. Proceedings of ICSE 2003
[Maxville, 2003] Valerie Maxville, Chiou Peng Lam, Jocelyn Armarego. Selecting Components: a Process for Context-Driven Evaluation. apsec, p. 456, 10th Asia-Pacific Software Engineering Conference (APSEC'03), 2003
[Maxville, 2004] Valerie Maxville, Jocelyn Armarego, Chiou Peng Lam. Intelligent Component Selection. compsac, 28th Annual International Computer Software and Applications Conference (COMPSAC'04)
[Lucrédio et al., 2004] D. Lucrédio, E. S. Almeida, A. F. Prado. A Survey on Software Components Search and Retrieval. In the 30th IEEE EUROMICRO Conference, Component-Based Software Engineering Track, 2004, Rennes, France. IEEE Press, 2004
[Holmes, 2005] R. Holmes and G. C. Murphy. Using structural context to recommend source code examples. In Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA, May 2005). ICSE '05

“Imperfect technology in a working market is sustainable; perfect technology without any market will vanish” [Szyperski, 1999]