Learn to Predict “Affecting Changes” in Software Engineering Xiaoxia Ren Dec. 8, 2003.

Background
Software systems evolve over time:
  - to adapt to their environment
  - to add desired functionality
Graceful software evolution requires that only the expected functionality changes occur.
How do we guarantee correctness? Unit or regression tests.

Motivation
Given:
  - two versions of a program, before and after editing
  - a unit test that compiles in both versions
Question: for a specified edit (change), can we predict whether it affects the behavior of this unit test?

Data Format

Experimental Setup
  - Data collected from the Daikon system over the year 2002
  - Each week is labeled as a version
  - The data was split at month boundaries, with each resulting dataset containing at least 800 instances
  - Prediction is based on cumulative data
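A minimal sketch of the cumulative prediction setup described above, assuming each weekly version is a list of labeled instances. The function name and toy data here are hypothetical, not from the Daikon study:

```python
# Sketch of cumulative prediction: train on all versions seen so far,
# then predict on the next version. The instance format is hypothetical.
def cumulative_splits(versions):
    """Yield (train, test) pairs: train = versions 0..k-1, test = version k."""
    for k in range(1, len(versions)):
        train = [inst for v in versions[:k] for inst in v]
        test = versions[k]
        yield train, test

# Toy example: three "versions", each a list of (features, label) instances.
versions = [
    [([1, 0], "no"), ([0, 1], "yes")],
    [([1, 1], "no")],
    [([0, 0], "yes"), ([1, 0], "no")],
]

for train, test in cumulative_splits(versions):
    print(len(train), len(test))  # training set grows, test set is one version
```

This expanding-window scheme mirrors how a real project would use the predictor: at any point in time, only past versions are available for training.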

Accuracy Comparison

True Positive Rate for “yes” class

Precision for “yes” class
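The two metrics on these slides can be computed directly from confusion counts. A small sketch; the label lists below are toy data, not results from the Daikon experiments:

```python
# True-positive rate (recall) and precision for the "yes" class,
# computed from paired actual/predicted labels. Toy data only.
def recall_precision(actual, predicted, positive="yes"):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

actual    = ["yes", "no", "yes", "no", "no", "yes"]
predicted = ["yes", "yes", "no", "no", "no", "yes"]
print(recall_precision(actual, predicted))  # (2/3, 2/3): 2 TP, 1 FN, 1 FP
```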

Data Analysis and Some Thoughts
Why are the TP rate and precision so low?
  - Change activity on Daikon was quite diverse.
  - The distribution of the training data differs from that of the test data.
  - The percentage of "yes" instances is too small (5.9%).
Some thoughts:
  - What about a project in a stable stage?
  - Try other ML algorithms?
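The imbalance point can be made concrete: with only 5.9% "yes" instances, even a classifier with reasonable per-class rates will show low precision. A back-of-the-envelope sketch, where the recall and false-positive rates are assumed purely for illustration:

```python
# Why heavy class imbalance depresses precision: with 5.9% positives,
# even a modest false-positive rate on the majority class swamps the
# true positives. The recall and FP rates below are hypothetical.
pos_rate = 0.059                 # fraction of "yes" instances (from the slide)
recall = 0.5                     # assumed TP rate on the "yes" class
fp_rate = 0.10                   # assumed FP rate on the "no" class

tp = pos_rate * recall           # expected true positives per instance
fp = (1 - pos_rate) * fp_rate    # expected false positives per instance
precision = tp / (tp + fp)
print(round(precision, 3))       # well under 0.5 despite decent rates
```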

Q & A?