Robust Requirements Tracing Via Internet Tech: Improving an IV&V Technique SAS 2004, July 20, 2004 Alex Dekhtyar, Jane Hayes, Senthil Sundaram, Ganapathy Chidambaram, Sarah Howard Department of Computer Science, University of Kentucky
Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level; Potential applications; Ease of finding, or availability of, data or case studies; Barriers to research or application); Future work
Who Is Who: Sponsor: NASA IV&V Center, Fairmont, WV; Principal Investigators: Alexander Dekhtyar, Jane Hayes; Ph.D. Student: Senthil Karthekian Sundaram*; M.S. Student: Sarah Howard; Past Undergraduate Students: James Osborne*, Rijo Jose Thozhal; Subcontractor: SAIC. (* Supported by the NASA grant)
The Problem: How can we automate requirements tracing during IV&V? Relevance to NASA: alleviate the work of NASA IV&V analysts; improve the quality of IV&V for NASA software. Importance/Benefits: improve analyst productivity on one of the most time-consuming IV&V tasks.
Approach: Use Information Retrieval techniques for requirements tracing (TF-IDF, Thesaurus, Probabilistic IR, LSI; analyst feedback); Build RETRO (REquirements TRacing On-target), a special-purpose requirements tracing tool (standalone version; integrated with SAIC’s SuperTracePlus); Evaluate performance (metrics; MODIS, LOFAR, CM-1 datasets).
Approach: IR for Requirements Tracing. [Diagram: the requirements document and design document are converted into a representation; a matching algorithm generates candidate links; the analyst accepts (Yes) or rejects (No) each link, providing feedback.]
Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level; Potential applications; Ease of finding, or availability of, data or case studies; Barriers to research or application); Future work
Methods: TF-IDF (TF = Term Frequency; IDF = Inverse Document Frequency, which emphasizes rare terms); Latent Semantic Indexing (LSI): maps the term x document matrix to a "factor" x document matrix, with #"factors" << #terms; Enhancements: Thesaurus, Feedback Processing, Filtering.
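A minimal sketch of the two base methods, assuming Python with scikit-learn (not the RETRO code): TF-IDF weighting plus cosine similarity to score low-level elements against a high-level requirement, and LSI via truncated SVD to project the term x document matrix onto a few "factors". The example requirement texts are invented for illustration.

```python
# Sketch only: TF-IDF and LSI scoring of candidate links (illustrative data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical high-level requirements and low-level (design) elements.
high_level = ["The system shall log all telemetry packets received from the instrument."]
low_level = ["Module TLM_LOG writes each received telemetry packet to the archive file.",
             "Module CMD_DISPATCH routes ground commands to instrument subsystems."]

# TF-IDF: weight terms by frequency within a document and rarity across the collection.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(high_level + low_level)
hi, lo = tfidf[:len(high_level)], tfidf[len(high_level):]
print(cosine_similarity(hi, lo))  # relevance of each low-level element to each requirement

# LSI: project the term x document matrix onto a small number of "factors".
lsi = TruncatedSVD(n_components=2)  # #factors << #terms
reduced = lsi.fit_transform(tfidf)
print(cosine_similarity(reduced[:len(high_level)], reduced[len(high_level):]))
```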
Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level; Potential applications; Ease of finding, or availability of, data or case studies; Barriers to research or application); Future work
Metrics: M - number of high-level requirements; N - number of low-level requirements; Hits - number of correct candidate links; Strikes - number of false positives; Misses - number of missed true links.
Precision = Hits / (Hits + Strikes); Recall = Hits / (Hits + Misses); Selectivity = (Hits + Strikes) / (M * N).
AvgH = average relevance of Hits; AvgS = average relevance of Strikes; DiffR = AvgH - AvgS.
Lag(Hit) = number of Strikes for the same high-level requirement with higher relevance; Lag = average Lag(Hit) over all Hits.
Breakpoint = (threshold, Precision, Recall) such that Precision = Recall.
Metrics: Precision - signal-to-noise; Recall - "coverage"; Selectivity - improvement in the number of comparisons vs. exhaustive search; AvgH, AvgS, DiffR, Lag - separation between Hits and Strikes in candidate link lists; Breakpoints - effects of filtering.
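As a concrete illustration, here is a small Python sketch (hypothetical function and data names, not the project's tooling) computing Precision, Recall, Selectivity, and Lag from a candidate link list and an answer set, following the definitions above.

```python
# Sketch only: tracing metrics from candidate links and an answer set.
def trace_metrics(candidate_links, true_links, num_high, num_low):
    """candidate_links: {high_id: [(low_id, relevance), ...]}, sorted by relevance, descending.
       true_links: set of (high_id, low_id) pairs from the answer set."""
    hits = strikes = 0
    lags = []
    for high_id, ranked in candidate_links.items():
        for rank, (low_id, _score) in enumerate(ranked):
            if (high_id, low_id) in true_links:
                hits += 1
                # Lag(Hit): strikes in this list with higher relevance than the hit.
                lags.append(sum(1 for l, _ in ranked[:rank]
                                if (high_id, l) not in true_links))
            else:
                strikes += 1
    misses = len(true_links) - hits
    precision = hits / (hits + strikes) if hits + strikes else 0.0
    recall = hits / (hits + misses) if hits + misses else 0.0
    selectivity = (hits + strikes) / (num_high * num_low)
    lag = sum(lags) / len(lags) if lags else 0.0
    return precision, recall, selectivity, lag

# Toy answer set: 2 high-level x 2 low-level elements, 2 true links.
cands = {"H1": [("L1", 0.9), ("L2", 0.2)], "H2": [("L1", 0.4)]}
truth = {("H1", "L1"), ("H2", "L2")}
print(trace_metrics(cands, truth, num_high=2, num_low=2))  # ~ (0.33, 0.5, 0.75, 0.0)
```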
Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level; Potential applications; Ease of finding, or availability of, data or case studies; Barriers to research or application); Future work
RETRO: REquirements TRacing On-target
RETRO Architecture. [Diagram: documents feed the Build Representation component; the IR toolbox produces candidate links, which pass through the Filter; the Feedback processor handles the Analyst's responses.]
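A hypothetical end-to-end sketch, in Python, of how these components could fit together; every name is illustrative, and the Rocchio-style feedback step is an assumption on our part (the architecture only says that analyst feedback is processed).

```python
# Sketch only: build representation -> IR toolbox -> filter -> feedback processor.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_representation(high_docs, low_docs):
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform(high_docs + low_docs).toarray()
    return matrix[:len(high_docs)], matrix[len(high_docs):]

def ir_toolbox(high_vecs, low_vecs):
    """Score every (high, low) pair; a real toolbox would offer TF-IDF, LSI, etc."""
    return cosine_similarity(high_vecs, low_vecs)

def filter_links(scores, threshold=0.1):
    """Keep only candidate links whose relevance exceeds the threshold."""
    return [(i, j, scores[i, j]) for i in range(scores.shape[0])
            for j in range(scores.shape[1]) if scores[i, j] > threshold]

def process_feedback(high_vecs, low_vecs, accepted, rejected,
                     alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update (an assumed strategy): pull each high-level vector
    toward low-level elements the analyst accepted, away from rejected ones."""
    updated = alpha * high_vecs
    for i in range(high_vecs.shape[0]):
        if accepted.get(i):
            updated[i] += beta * low_vecs[list(accepted[i])].mean(axis=0)
        if rejected.get(i):
            updated[i] -= gamma * low_vecs[list(rejected[i])].mean(axis=0)
    return updated

# One analyst iteration on invented documents: trace, filter, apply yes/no feedback, re-trace.
high = ["The system shall log all telemetry packets."]
low = ["TLM_LOG writes telemetry packets to the archive.",
       "CMD_DISPATCH routes ground commands."]
hv, lv = build_representation(high, low)
print(filter_links(ir_toolbox(hv, lv)))
hv = process_feedback(hv, lv, accepted={0: {0}}, rejected={0: {1}})
print(filter_links(ir_toolbox(hv, lv)))
```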
RETRO + SuperTracePlus. [Diagram: requirements documents enter through the STP SFEP; RETRO Build Representation and the RETRO IR Toolbox generate candidate links; STP Interactive Link Analysis supports the analyst, feeding RETRO Feedback; STP Report Generation produces the traceability reports for Analyst Review.]
Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level; Potential applications; Ease of finding, or availability of, data or case studies; Barriers to research or application); Future work
The Universe of Tests: method (TF-IDF, LSI*); thesaurus (Yes / No); feedback (Top 1, Top 2, Top 3, Top 4); filtering threshold in [0.0 ... 0.5]. * LSI also varies the number of dimensions, with representations built from low-level documents, high+low-level documents, or high-level and low-level documents separately.
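For reference, a short sketch enumerating the experimental grid the slide implies; the 0.05 threshold step and the presence of a no-feedback baseline are assumptions.

```python
# Sketch only: enumerate the test configurations implied by the slide above.
from itertools import product

methods = ["TF-IDF", "LSI"]            # LSI runs also vary the number of dimensions
thesaurus = [True, False]
feedback = ["none", "Top 1", "Top 2", "Top 3", "Top 4"]   # "none" baseline assumed
thresholds = [round(0.05 * k, 2) for k in range(11)]      # 0.0 .. 0.5 (step assumed)

configs = list(product(methods, thesaurus, feedback, thresholds))
print(len(configs), "configurations, e.g.", configs[0])
```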
Datasets: MODIS - 20 high-level requirements, 49 low-level requirements, 41 true links. CM-1 - ~200 high-level, ~300 low-level; number of true links under construction.
MODIS, TF-IDF, Thesaurus Top2 Feedback
MODIS, TF-IDF, Thesaurus Top2 Feedback Filtering at Iteration 0 Breakpoint
MODIS, TF-IDF, Thesaurus Top2 Feedback
Above 70%
MODIS, TF-IDF, No Thesaurus, Top3 Feedback
MODIS, Comparing Feedback Traces
Above 70%
MODIS, Secondary Measures
Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level; Potential applications; Ease of finding, or availability of, data or case studies; Barriers to research or application); Future work
NASA Research Information. Technology Readiness Level for RETRO: integrated with an existing software system; engineering feasibility demonstrated; limited documentation available; most functionality available for demonstration and test; most software bugs removed. Potential applications: tracing bug reports to code; identifying related/duplicate bug reports. Ease of finding, or availability of, data or case studies: data available; the issue is the answer set. Barriers to research or application: answer set availability; IV&V analysts for human-factors studies. Publications: paper accepted to RE 2004; one journal paper submitted, one in progress.
Outline: Requirements Tracing and Information Retrieval; Methods; Metrics; RETRO; Experimental Results; NASA Research Information (Technology Readiness Level; Potential applications; Ease of finding, or availability of, data or case studies; Barriers to research or application); Future work
Next Steps, Conclusions, Plans, Ideas: IR methods work - need to implement more; Productize RETRO (Check!); Data; Integration with existing tools (Check!); Other IV&V problems may be alleviated; Study "human factors".