Robust Requirements Tracing Via Internet Tech: Improving an IV&V Technique. SAS 2004, July 20, 2004. Alex Dekhtyar, Jane Hayes, Senthil Sundaram, Ganapathy Chidambaram.

Robust Requirements Tracing Via Internet Tech: Improving an IV&V Technique. SAS 2004, July 20, 2004. Alex Dekhtyar, Jane Hayes, Senthil Sundaram, Ganapathy Chidambaram, Sarah Howard. Department of Computer Science, University of Kentucky.

Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level, Potential applications, Ease of finding or availability of data or case studies, Barriers to research or application); Future work.

Who Is Who. Sponsor: NASA IV&V Center, Fairmont, WV. Principal Investigators: Alexander Dekhtyar, Jane Hayes. Ph.D. Student: Senthil Karthekian Sundaram*. M.S. Student: Sarah Howard. Past Undergraduate Students: James Osborne*, Rijo Jose Thozhal. Subcontractor: SAIC. (* Supported by the NASA grant.)

The Problem. How can we automate the tracing of requirements during IV&V? Relevance to NASA: alleviate the work of NASA IV&V analysts; improve the quality of IV&V for NASA software. Importance/Benefits: improve analyst productivity on one of the most time-consuming IV&V tasks.

Approach. Use Information Retrieval techniques for requirements tracing: TF-IDF, thesaurus, probabilistic IR, LSI; analyst feedback. Build RETRO (REquirements TRacing On-target): a special-purpose requirements tracing tool, as a standalone version and integrated with SAIC's SuperTracePlus. Evaluate performance: metrics; MODIS, LOFAR, CM-1 datasets.

Approach: IR for Requirements Tracing. [Diagram: Requirements Document and Design Document → representation → matching algorithm → Analyst (Yes/No) → feedback to the matching algorithm.]
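The loop sketched above feeds the analyst's Yes/No decisions back into the matching algorithm. One standard way to fold such judgments into a vector-space matcher is Rocchio-style relevance feedback; the sketch below is illustrative, not the exact RETRO update, and the constants alpha, beta, gamma and the function name are assumptions.

```python
# Rocchio-style relevance feedback: an illustrative sketch, not necessarily
# the exact update RETRO performs.  Vectors are dicts mapping term -> weight
# (for example, TF-IDF weights).

def rocchio_update(query_vec, accepted_docs, rejected_docs,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Move the high-level requirement's vector toward documents the analyst
    marked Yes and away from those marked No."""
    updated = {t: alpha * w for t, w in query_vec.items()}
    for doc in accepted_docs:
        for t, w in doc.items():
            updated[t] = updated.get(t, 0.0) + beta * w / len(accepted_docs)
    for doc in rejected_docs:
        for t, w in doc.items():
            updated[t] = updated.get(t, 0.0) - gamma * w / len(rejected_docs)
    # Negative weights are conventionally clipped to zero.
    return {t: w for t, w in updated.items() if w > 0.0}
```

After each feedback round, the updated requirement vector is re-matched against the design documents, which is what makes the candidate link lists improve over iterations.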

Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level, Potential applications, Ease of finding or availability of data or case studies, Barriers to research or application); Future work.

Methods. TF-IDF: TF = Term Frequency, IDF = Inverse Document Frequency (rare terms weighted higher). Latent Semantic Indexing (LSI): the term x document matrix is reduced to a "factor" x document matrix, with #factors << #terms. Enhancements: thesaurus, feedback processing, filtering.
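To make the vector-space machinery on this slide concrete, here is a minimal sketch of TF-IDF weighting and cosine-similarity ranking for tracing; the tokenization and the particular weighting variant (raw term frequency times log inverse document frequency) are assumptions for illustration and are simpler than what RETRO actually implements.

```python
import math
from collections import Counter

def build_idf(docs):
    """Inverse document frequency over a collection of token lists;
    rare terms receive higher weight."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def tfidf(doc, idf):
    """Term-frequency times IDF weights for one token list."""
    tf = Counter(doc)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values())) *
            math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Toy trace: one high-level requirement against two low-level requirements.
low_docs = [["archive", "sensor", "data", "daily"],
            ["display", "telemetry", "data"]]
idf = build_idf(low_docs)
low_vecs = [tfidf(d, idf) for d in low_docs]
high_vec = tfidf(["sensor", "data", "shall", "be", "archived"], idf)
ranked = sorted(enumerate(cosine(high_vec, v) for v in low_vecs),
                key=lambda p: p[1], reverse=True)
# LSI would instead factor the term-by-document matrix with an SVD and
# compare the vectors in the reduced "factor" space.
```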

Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level, Potential applications, Ease of finding or availability of data or case studies, Barriers to research or application); Future work.

Metrics. N - number of low-level requirements; M - number of high-level requirements; Hits - number of correct candidate links; Strikes - number of false positives; Misses - number of missed links.

Metrics. N - number of low-level requirements; M - number of high-level requirements; Hits - number of correct candidate links; Strikes - number of false positives; Misses - number of missed links. Precision = Hits / (Hits + Strikes). Recall = Hits / (Hits + Misses).

Metrics. N - number of low-level requirements; M - number of high-level requirements; Hits - number of correct candidate links; Strikes - number of false positives; Misses - number of missed links. Precision = Hits / (Hits + Strikes). Recall = Hits / (Hits + Misses). Selectivity = (Hits + Strikes) / (M * N).

Metrics. N - number of low-level requirements; M - number of high-level requirements; Hits - number of correct candidate links; Strikes - number of false positives; Misses - number of missed links. Precision = Hits / (Hits + Strikes). Recall = Hits / (Hits + Misses). Selectivity = (Hits + Strikes) / (M * N). AvgH = average relevance of Hits; AvgS = average relevance of Strikes; DiffR = AvgH - AvgS.

Metrics. N - number of low-level requirements; M - number of high-level requirements; Hits - number of correct candidate links; Strikes - number of false positives; Misses - number of missed links. Precision = Hits / (Hits + Strikes). Recall = Hits / (Hits + Misses). Selectivity = (Hits + Strikes) / (M * N). Lag(Hit) = number of Strikes for the same high-level requirement with higher relevance than the Hit; Lag = average Lag(Hit) over all Hits.

Metrics. N - number of low-level requirements; M - number of high-level requirements; Hits - number of correct candidate links; Strikes - number of false positives; Misses - number of missed links. Precision = Hits / (Hits + Strikes). Recall = Hits / (Hits + Misses). Selectivity = (Hits + Strikes) / (M * N). Breakpoint = (threshold, Precision, Recall) such that Precision = Recall.

Metrics. Precision: signal-to-noise. Recall: "coverage". Selectivity: improvement in the number of comparisons vs. exhaustive search. AvgH, AvgS, DiffR, Lag: separation between Hits and Strikes in candidate link lists. Breakpoints: effects of filtering.
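As a concrete companion to the definitions on the preceding Metrics slides, here is a minimal sketch that computes Precision, Recall, and Selectivity from a set of candidate links and the true links; the function and variable names are illustrative assumptions, not RETRO's API. AvgH, AvgS, DiffR, and Lag would additionally require the relevance score attached to each candidate link.

```python
def tracing_metrics(candidate_links, true_links, m, n):
    """candidate_links, true_links: sets of (high_id, low_id) pairs.
    m, n: number of high- and low-level requirements, as on the slides."""
    hits = len(candidate_links & true_links)       # correct candidate links
    strikes = len(candidate_links - true_links)    # false positives
    misses = len(true_links - candidate_links)     # true links never suggested
    precision = hits / (hits + strikes) if (hits + strikes) else 0.0
    recall = hits / (hits + misses) if (hits + misses) else 0.0
    selectivity = (hits + strikes) / (m * n)       # fraction of the M*N pairs kept
    return {"precision": precision, "recall": recall, "selectivity": selectivity}

# Example: 2 candidate links, 1 of them correct, against the MODIS sizes.
print(tracing_metrics({(1, 2), (1, 3)}, {(1, 2), (2, 5)}, m=20, n=49))
```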

Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level, Potential applications, Ease of finding or availability of data or case studies, Barriers to research or application); Future work.

RETRO: REquirements TRacing On-target

RETRO Architecture. [Diagram: documents → Build Representation → IR toolbox → Filter → Analyst, with a Feedback processor closing the loop.]
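The Filter component trims each candidate link list before it reaches the analyst. A minimal sketch of the two filtering modes that appear later in the experiments (keep the top N candidates, or keep only candidates above a relevance threshold) follows; the function name and arguments are assumptions, not RETRO's actual interface.

```python
def filter_candidates(scored_links, top_n=None, threshold=None):
    """scored_links: (low_level_id, relevance) pairs for one high-level
    requirement, sorted by descending relevance.  Apply a relevance cutoff
    and/or keep only the first top_n candidates."""
    kept = scored_links
    if threshold is not None:
        kept = [(d, r) for d, r in kept if r >= threshold]
    if top_n is not None:
        kept = kept[:top_n]
    return kept

# e.g. filter_candidates([(7, 0.81), (3, 0.42), (9, 0.05)], top_n=2, threshold=0.1)
# -> [(7, 0.81), (3, 0.42)]
```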

RETRO + SuperTracePlus. [Diagram components: requirements documents, SFEP, RETRO Build Representation, RETRO IR Toolbox, STP Interactive Link Analysis, RETRO Feedback, STP Report Generation, Traceability Reports, STP/RETRO Analyst Review.]

Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level, Potential applications, Ease of finding or availability of data or case studies, Barriers to research or application); Future work.

The Universe of Tests. Method: TF-IDF or LSI*. Thesaurus: yes or no. Filter: Top 1, Top 2, Top 3, Top 4, or a relevance threshold in [0.0…0.5]. Feedback: with or without. (* For LSI, the number of dimensions is also varied, computed from the low-level documents, from the high- and low-level documents together, or from the high-level and low-level documents separately.)
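The test matrix above is essentially the Cartesian product of the method, thesaurus, filter, and feedback settings. A sketch of how the runs could be enumerated is below; the names are purely illustrative, and the LSI dimension choices from the footnote would add another axis.

```python
from itertools import product

methods = ["tf-idf", "lsi"]                  # for LSI the number of dimensions also varies
thesaurus = [True, False]
filters = (["top1", "top2", "top3", "top4"] +
           [0.0, 0.1, 0.2, 0.3, 0.4, 0.5])   # top-N or relevance-threshold filters
feedback = [True, False]

runs = list(product(methods, thesaurus, filters, feedback))
# Each tuple (method, thesaurus?, filter, feedback?) is one configuration
# to run and score on the MODIS / CM-1 datasets.
print(len(runs))   # 2 * 2 * 10 * 2 = 80 combinations
```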

Datasets. MODIS: 20 high-level requirements, 49 low-level requirements, 41 true links. CM-1: ~200 high-level requirements, ~300 low-level requirements; the number of true links is under construction.

MODIS, TF-IDF, Thesaurus, Top 2, Feedback [results chart]

MODIS, TF-IDF, Thesaurus, Top 2, Feedback: filtering at iteration 0; breakpoint [results chart]

MODIS, TF-IDF, Thesaurus, Top 2, Feedback [results chart]

Above 70% [results chart]

MODIS, TF-IDF, No Thesaurus, Top 3, Feedback [results chart]

MODIS, Comparing Feedback Traces [results chart]

Above 70% [results chart]

MODIS, Secondary Measures [results chart]

Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level, Potential applications, Ease of finding or availability of data or case studies, Barriers to research or application); Future work.

NASA Research Information. Technology Readiness Level for RETRO: integrated with an existing software system; engineering feasibility demonstrated; limited documentation available; most functionality available for demonstration and test; most software bugs removed. Potential applications: tracing bug reports to code; identifying related/duplicate bug reports. Ease of finding, or availability of, data or case studies: data available; the issue is the answerset. Barriers to research or application: answerset availability; IV&V analysts for human-factors studies. Publications: one paper accepted to RE 2004; one journal paper submitted, one in progress.

Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level, Potential applications, Ease of finding or availability of data or case studies, Barriers to research or application); Future work.

Next Steps, Conclusions, Plans, Ideas. IR methods work: need to implement more. Productize RETRO (Check!). Data integration with existing tools (Check!). Other IV&V problems may be alleviated. Study "human factors".