Oops…. andrian MSR’13.

1 Oops…. andrian MSR’13

2 Inevitable, due to the complexity & novelty of our work
(But rarely reported, which is… suspicious)
What can we learn from those mistakes?

3 An MSR’13 paper: Cross-company learning
Can “us” learn from “them”?
Provided “us” selects the right data from “them”
–Relevancy filtering: [Turhan09] (and any others)
–Selection guided by structure of “us”
If “we” is small and “them” is many:
–Selection guided using kernel functions learned from “them”
–Result #1: out-performed [Turhan09].
–Result #2: Result #1 was a coding error

4 Houston, we have a problem
Mar 15: paper accepted to MSR
–“Better cross-company defect prediction”
Mar 29: camera-ready submitted
?Apr 10: pre-prints go on-line
Apr 29: Hyeongmin Jeon, graduate student at Pusan Natl. Univ.
–Emailed us: can’t reproduce result
May 4: Peters, checking code, found error
–Manic week of experiments…
May 11: results definitely wrong
–Emails to MSR organizers
Btw, < 3 weeks. Wow…

5 Coding error
Distance between test & training instance
–Removed classes
–Ran a distance function
–Re-inserted the classes
But… bad re-insert
–Used the training class
–Not the test class
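The kind of slip described on this slide can be sketched as follows. This is a hypothetical reconstruction for illustration only, not the authors' code: the row layout (features followed by the class in the last position), the nearest-neighbour pairing, and the function names `reinsert_buggy` and `reinsert_fixed` are all assumptions.

```python
import math


def distance(features_a, features_b):
    """Euclidean distance over feature values only (no class column)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(features_a, features_b)))


def nearest(test_features, train_rows):
    """Return the training row whose features are closest to the test features."""
    return min(train_rows, key=lambda row: distance(test_features, row[:-1]))


def reinsert_buggy(test_rows, train_rows):
    """Strip the class, compute distances, then put back the WRONG class."""
    out = []
    for test in test_rows:
        test_features = test[:-1]                    # remove the class
        match = nearest(test_features, train_rows)   # run the distance function
        out.append(test_features + [match[-1]])      # bad re-insert: training class
    return out


def reinsert_fixed(test_rows, train_rows):
    """Same steps, but the test row keeps its own class."""
    out = []
    for test in test_rows:
        test_features = test[:-1]
        nearest(test_features, train_rows)           # distance step, result unused here
        out.append(test_features + [test[-1]])       # correct re-insert: test class
    return out


train = [[0.0, 0.0, "clean"], [10.0, 10.0, "buggy"]]
test = [[1.0, 1.0, "buggy"]]
print(reinsert_buggy(test, train))
print(reinsert_fixed(test, train))
```

In the buggy version the test row silently inherits the label of its nearest training row, so any evaluation against those labels no longer measures what it claims to measure.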

6 Pull the paper?
In the internet age, is that even possible?
–X people now have local copies of that paper
–Which Google might easily stumble across
Old pre-print, found May 15

7 Authors: report your mistakes, openly and honestly
We need to expect, and allow, papers with sections:
–“clarifications”, “errata”, “retractions”
E.g. Murphy-Hill, Parnin, Black. IEEE TSE, Jan 2012

8 Conference organizers: encourage research honesty
Need CFPs with text that encourages
–Repeating and testing and challenging old results

9 Researchers: Share data, check each other’s conclusions
Reinhart & Rogoff [2010]
–“countries with debt over 90% of GDP suffer notably lower economic growth.”
Thomas Herndon, 3rd-year Ph.D. student, U.Mass.
–Unable to replicate with publicly available data
–Asked Reinhart & Rogoff for their data
–Got it (their spreadsheet)
–Found errors in data on economic growth vs debt levels
A triumph for open science
–Sadly, reported in media as grave mistake
–E.g.
–Immature view of the nature of science

10 Supervisors: encourage a culture of research honesty
What will you tell others about this paper?
–A failure? Or a success of the open science method?
–It’s up to you, but understand the implications
If we don’t let grad students report mistakes
–Then they won’t
Students graduate, leave you, the error emerges
–And you are left with the problem

11 Specific lessons
Data mining experiments are complex software prototypes
–Version control (of code and data)
–Code inspections
–Trap and log your random number seeds
–Rewrite data rarely
Pull out the class, process, put it back? Fuhgeddaboudit
Have data headers of different types
–So (say) distance measures can skip over classes
The above error does not affect Peters & Menzies ICSE’12 and TSE’13
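Two of these lessons, typed data headers and logged random seeds, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' tooling: the convention that a header starting with "!" marks a class column, and the names `feature_indexes`, `distance` and `seeded_rng`, are all hypothetical.

```python
import math
import random
import time

CLASS_MARK = "!"  # assumed convention: headers starting with "!" are class columns


def feature_indexes(headers):
    """Positions of the columns that are NOT marked as classes."""
    return [i for i, name in enumerate(headers) if not name.startswith(CLASS_MARK)]


def distance(headers, row_a, row_b):
    """Euclidean distance that skips class columns instead of deleting them."""
    return math.sqrt(sum((row_a[i] - row_b[i]) ** 2 for i in feature_indexes(headers)))


def seeded_rng(seed=None, log=print):
    """Create a random generator and always log the seed that was used."""
    if seed is None:
        seed = int(time.time() * 1000) % (2 ** 32)  # pick a seed, but remember it
    log(f"random seed: {seed}")
    return seed, random.Random(seed)


headers = ["loc", "complexity", "!defective"]
row_a = [3.0, 4.0, 1]
row_b = [0.0, 0.0, 0]
print(distance(headers, row_a, row_b))  # class column is never touched

seed, rng = seeded_rng(42)
print(rng.random())
```

Because the rows are never rewritten, there is no "put the class back" step to get wrong, and a logged seed lets anyone rerun the same experiment.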

12 Open access science
Repeatable, improvable,
–and sometimes even refutable
We should not celebrate the failed paper
But we should celebrate
–The open science community that finds such errors: MSR, PROMISE, etc.
–The grad students that struggle to reproduce results: Hyeongmin Jeon
–The integrity of grad students whose first response on finding an error was to report it: Fayola Peters

13 Was this a “useful” mistake?
Is there insight within this mistake?
What does it mean if using more experience makes the defect predictor worse?
International Workshop on Transfer Learning in Software Engineering
–Nov, ASE’13
