ADAM CZAUDERNA, MAREK GIBIEC, GREG LEACH, YUBIN LI YONGHEE SHIN (PRESENTER), ED KEENAN, JANE CLELAND-HUANG TEFSE’11, 23, MAY, 2011.

Slides:



Advertisements
Similar presentations
QuEdge Testing Process Delivering Global Solutions.
Advertisements

Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
A Vector Space Model for Automatic Indexing
Chapter 5: Introduction to Information Retrieval
Lecture 11 Search, Corpora Characteristics, & Lucene Introduction.
Exploring the Neighborhood with Dora to Expedite Software Maintenance Emily Hill, Lori Pollock, K. Vijay-Shanker University of Delaware.
Introduction Information Management systems are designed to retrieve information efficiently. Such systems typically provide an interface in which users.
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
Using Natural Language Program Analysis to Locate and understand Action-Oriented Concerns David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and.
DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT MAYURI UMRANIKAR.
Modern Information Retrieval
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Retrieval Evaluation. Brief Review Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
1 Ranked Queries over sources with Boolean Query Interfaces without Ranking Support Vagelis Hristidis, Florida International University Yuheng Hu, Arizona.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
Sigir’99 Inside Internet Search Engines: Search Jan Pedersen and William Chang.
The Vector Space Model …and applications in Information Retrieval.
Retrieval Evaluation. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
Online Learning for Web Query Generation: Finding Documents Matching a Minority Concept on the Web Rayid Ghani Accenture Technology Labs, USA Rosie Jones.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
Important Task in Patents Retrieval Recall is an Important Factor Given Query Patent -> the Task is to Search all Related Patents Patents have Complex.
Evaluation of N-grams Conflation Approach in Text-based Information Retrieval Serge Kosinov University of Alberta, Computing Science Department, Edmonton,
Query Operations: Automatic Global Analysis. Motivation Methods of local analysis extract information from local set of documents retrieved to expand.
Chapter 5: Information Retrieval and Web Search
“A Comparison of Document Clustering Techniques” Michael Steinbach, George Karypis and Vipin Kumar (Technical Report, CSE, UMN, 2000) Mahashweta Das
INFORMATION RETRIEVAL VECTOR SPACE MODEL IN-DEPTH PART 2 Thomas Tiahrt, MA, PhD CSC492 – Advanced Text Analytics.
Automated Diagnosis of Software Configuration Errors
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.
Multi-Prototype Vector Space Models of Word Meaning __________________________________________________________________________________________________.
Samhaa R. El-Beltagy, Wendy Hall, David De Roure, and Leslie Carr Intelligence, Agents, Multimedia Department of Electronics and Computer Science University.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
1 A Unified Relevance Model for Opinion Retrieval (CIKM 09’) Xuanjing Huang, W. Bruce Croft Date: 2010/02/08 Speaker: Yu-Wen, Hsu.
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Enrique Alba, Sebastián Luna, and Jamal ToutouhAccuracy and Efficiency in Simulating VANETs MCO’08 Metz, France September 8-10 th, 2008 MCO’08 Metz, France.
Chapter 6: Information Retrieval and Web Search
Lecture 1: Overview of IR Maya Ramanath. Who hasn’t used Google? Why did Google return these results first ? Can we improve on it? Is this a good result.
How to improve quality control in a data conversion process? By extended usage of metadata! Dimitri Kutsenko Entimo AG - Berlin/Germany.
Evaluation of Agent Building Tools and Implementation of a Prototype for Information Gathering Leif M. Koch University of Waterloo August 2001.
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
Web- and Multimedia-based Information Systems Lecture 2.
ICT EMMSAD’05 13/ Assessing Business Process Modeling Languages Using a Generic Quality Framework Anna Gunhild Nysetvold* John Krogstie *, § IDI,
1 A Formal Study of Information Retrieval Heuristics Hui Fang, Tao Tao and ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Multi-Abstraction Concern Localization Tien-Duy B. Le, Shaowei Wang, and David Lo School of Information Systems Singapore Management University 1.
CASE (Computer-Aided Software Engineering) Tools Software that is used to support software process activities. Provides software process support by:- –
Accurate Robot Positioning using Corrective Learning Ram Subramanian ECE 539 Course Project Fall 2003.
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval O. Chum, et al. Presented by Brandon Smith Computer Vision.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Improving the performance of personal name disambiguation.
Motivation  Methods of local analysis extract information from local set of documents retrieved to expand the query  An alternative is to expand the.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
1 Approximate XML Query Answers Presenter: Hongyu Guo Authors: N. polyzotis, M. Garofalakis, Y. Ioannidis.
Sudhanshu Khemka.  Treats each document as a vector with one component corresponding to each term in the dictionary  Weight of a component is calculated.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Robust Requirements Tracing Via Internet Tech:Improving an IV&V Technique SAS 2004July 20, 2004 Alex Dekhtyar Jane Hayes Senthil Sundaram Ganapathy Chidambaram.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Similarity Measurement and Detection of Video Sequences Chu-Hong HOI Supervisor: Prof. Michael R. LYU Marker: Prof. Yiu Sang MOON 25 April, 2003 Dept.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Enhancing Text Clustering by Leveraging Wikipedia Semantics.
Information Retrieval and Web Search IR models: Vector Space Model Term Weighting Approaches Instructor: Rada Mihalcea.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
Automated Information Retrieval
Julián ALARTE DAVID INSA JOSEP SILVA
Guangbing Yang Presentation for Xerox Docushare Symposium in 2011
Block Matching for Ontologies
Automatic Global Analysis
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Presentation transcript:

ADAM CZAUDERNA, MAREK GIBIEC, GREG LEACH, YUBIN LI YONGHEE SHIN (PRESENTER), ED KEENAN, JANE CLELAND-HUANG TEFSE’11, 23, MAY, 2011

MOTIVATION Manual creation of traceability links is a time and effort consuming task. Vector space model has been used to automatically create traceability links based on term frequency, but have room for improvement. There is no standard framework for traceability experiments, making difficult to perform repeated studies on additional data sets and comparison between traceability techniques Individual researchers have to invest time and effort to construct their own experimental environment. 2/10

GRAND CHALLENGES C-GC1: Develop link recovery techniques for textual artifacts that are at least as accurate as manual processes and are much more time and cost effective. L-GC1: Define standard processes and related procedures for performing empirical studies during traceability research. L-GC2: Build benchmarks for evaluating traceability methods and techniques. 3/10

C-GC1: VECTOR SPACE MODEL USING GLOBAL IDF Vector Space Model Computes the similarity between source artifacts and target artifacts using frequency of terms. Each source and target artifact are represented as a vector of weighted terms for the terms in source artifacts. TF-IDF weighting method: Term frequency (TF)*Inverse document frequency (IDF) Similarity is computed as consign of the angle between the two artifacts. Local IDF: Compute IDF from target artifacts (project specific) Global IDF: Compute IDF from general corpus 4/10

L-GC1/L-GC2: TRACELAB An experimental workbench designed for modeling and executing traceability experiments Graphically model and execute a traceability experiment as a workflow of components Provide basic components for experiments and can be extended using TraceLab SDK. 5/10

EXPERIMENT Goal Compare the quality of traceability links generated using the local IDF and global IDF values from the experiment using TraceLab Documents to obtain global IDF American National Corpus (ANC) Data Sets Source (#)Target (#)# of traces CM1-Subset1High level req. (22)Low level req. (53)45 EasyClinicUse cases (30)Classes (47)93 E-TourUse cases (58)Java code (116)327 WV-CCHITRequirements (116)Regulatory requirements (1064) 587 6/10

EXPERIMENT MODELING ON TRACELAB 7/10

RESULTS : AVERAGE PRECISION Data SetsLocal IDFGlobal IDF CM1-Subset Easy Clinic E-Tour WV-CCHIT Average Precision Average of precision at each retrieval of documents Returns a higher value when more relevant documents are retrieved earlier 8/10

RESULTS : INTERPOLATED PRECISION / ROC 9/10

CONCLUSIONS Local IDF approach outperforms global IDF approach especially for the data sets that contain non-english languages and many technical terms. TraceLab can effectively support to model and execute traceability experiments. 10/10

THANK YOU! 11/10