Published by Dortha Welch. Modified over 9 years ago.
1
Digging for diamonds: Identifying valuable web automation programs in repositories
Jarrod Jackson 1, Chris Scaffidi 2, Katie Stolee 2
1 Oregon State University
2 University of Nebraska - Lincoln
2
Web scripts: Enabling users to enhance the browser
IBM CoScripter Web Macro
Problem Approach Evaluation Conclusion
3
Web scripts: Enabling users to enhance the browser
Yahoo Pipe
4
Web scripts: Enabling users to enhance the browser
GreaseMonkey UserScript
5
Repositories of end-user code: The good, the great, and the “other”
Previous study: Of 1445 web macros…
–~ 10% had many runs
–~ 10% had many users
–~ 80% were “other”
The CoScripter repository studied is the largest web macro repository
–> 6000 users, > 3000 “public” scripts
C. Bogart, et al. End-User Programming in the Wild: A Field Study of CoScripter Scripts. VL/HCC 2008.
6
What if our repositories could…
… omit pieces of code from search results if they are unlikely to be reused anyway?
… provide a UI for administrators to review (and remove?) old code that’s unlikely to be used?
… advise programmers, when they upload code, about how to improve the reusability of their code?
7
Needed: a model for predicting reuse
Key questions for discovering such a model…
–What information about the code indicates reusability?
–How do we combine this information to predict reuse?
Similar models have been successful on OO code
–Predicting reuse based on coupling & cohesion
–Predicting bugginess based on code complexity metrics, information about code authors, code churn, …
Web scripts are much simpler (they don’t call each other, don’t have inheritance, etc.), so we need different information here.
8
Prior work found 35 traits (in 8 categories) statistically related to reuse
Mass appeal – e.g., popular keywords
Language – e.g., data values are in English
Annotations – e.g., comments
Flexibility – e.g., parameterization (variables)
Length – e.g., small # distinct lines of code
Author information – e.g., early adopter?
Advanced syntax – e.g., “control-click” keyword
No Preconditions – e.g., no cookies needed
All traits are computed automatically from one of four sources: executable code statements, URLs referenced, annotations, code history.
9
Model that we developed (in words & pictures)
Given a binary measure of reuse, for each trait
–Find the threshold that optimally divides the reused scripts from the un-reused scripts
[Figure: trait level, with the threshold marked]
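The slide does not say what "optimally divides" means, so the following is only a minimal sketch under an assumed criterion: pick the threshold and direction (≥ or ≤) that maximize true-positive rate minus false-positive rate. The function name `best_threshold` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def best_threshold(values: List[float], reused: List[bool]) -> Tuple[float, str, float]:
    """Return (threshold, direction, score) for a single trait.

    direction is ">=" or "<=". score is TP rate minus FP rate, which is an
    ASSUMED optimality criterion; the slides do not name the one actually used.
    """
    positives = sum(reused)
    negatives = len(reused) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both reused and un-reused scripts")

    best = (values[0], ">=", float("-inf"))
    # Every observed trait value is a candidate threshold, in both directions.
    for t in sorted(set(values)):
        for direction in (">=", "<="):
            if direction == ">=":
                fired = [v >= t for v in values]
            else:
                fired = [v <= t for v in values]
            tp = sum(f and r for f, r in zip(fired, reused))
            fp = sum(f and not r for f, r in zip(fired, reused))
            score = tp / positives - fp / negatives
            if score > best[2]:
                best = (t, direction, score)
    return best


# Hypothetical trait values (e.g. number of comments) and reuse labels.
comments = [0, 1, 2, 3, 4, 5]
was_reused = [False, False, False, True, True, True]
print(best_threshold(comments, was_reused))  # (3, '>=', 1.0)
```

Trying both directions matters because some traits favor reuse when large (such as comments) and others when small (such as literals).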
10
Predicting if a macro will be reused
Count how many predictors are satisfied
Predict that the macro will be reused if this count exceeds some minimum
–A tunable parameter
–A higher minimum implies a higher bar that a script must clear to be predicted as reused
–Result: fewer false positives, more false negatives
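A minimal sketch of the counting rule, assuming each script has already been reduced to a list of booleans (one per predictor) and reading "exceeds some minimum" as "at least `minimum`", which matches the "at least n" wording of the example slide. The macro names and outcomes are hypothetical.

```python
from typing import Dict, List


def count_satisfied(outcomes: List[bool]) -> int:
    """Number of predictors that the script satisfies."""
    return sum(outcomes)


def predict_reuse(outcomes: List[bool], minimum: int) -> bool:
    """Predict reuse when at least `minimum` predictors are satisfied."""
    return count_satisfied(outcomes) >= minimum


def predicted_reused(scripts: Dict[str, List[bool]], minimum: int) -> List[str]:
    """Names of the scripts that the model predicts will be reused."""
    return [name for name, outcomes in scripts.items()
            if predict_reuse(outcomes, minimum)]


# Hypothetical predictor outcomes for three macros and four predictors.
scripts = {
    "macro_a": [True, True, True, False],
    "macro_b": [True, False, False, False],
    "macro_c": [True, True, False, False],
}

for minimum in range(1, 5):
    print(minimum, predicted_reused(scripts, minimum))
# 1 ['macro_a', 'macro_b', 'macro_c']
# 2 ['macro_a', 'macro_c']
# 3 ['macro_a']
# 4 []
```

Raising `minimum` shrinks the set of scripts predicted as reused, which is the mechanism behind the trade-off between false positives and false negatives.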
11
Example
Suppose that our measure of reuse is “script is reused more than 75% of other scripts”
Suppose that, based on this measure of reuse, the best thresholds for four predictors are…
–comments ≥ 3
–lines_of_code ≥ 40
–prev_created ≥ 10
–literals ≤ 4
The model would predict that some other script satisfies the reuse measure criterion if that script satisfies at least n of these predictors
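The example can be traced in a few lines of code. The four thresholds below are copied from the slide; the script being scored (`example_script`) and its trait values are hypothetical, chosen only to show how the outcome depends on n.

```python
import operator

# Thresholds from the slide; the operator encodes each predictor's direction.
PREDICTORS = [
    ("comments", operator.ge, 3),
    ("lines_of_code", operator.ge, 40),
    ("prev_created", operator.ge, 10),
    ("literals", operator.le, 4),
]


def satisfied(script):
    """Names of the predictors that this script satisfies."""
    return [name for name, compare, threshold in PREDICTORS
            if compare(script[name], threshold)]


def predict_reuse(script, n):
    """Predict reuse if the script satisfies at least n predictors."""
    return len(satisfied(script)) >= n


# Hypothetical script: well commented, short, experienced author, few literals.
example_script = {"comments": 5, "lines_of_code": 12, "prev_created": 15, "literals": 2}

print(satisfied(example_script))         # ['comments', 'prev_created', 'literals']
print(predict_reuse(example_script, 3))  # True
print(predict_reuse(example_script, 4))  # False
```

The script fails only the lines_of_code predictor, so it is predicted as reused for n ≤ 3 but not for n = 4.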
12
How well does this approach work…
… for different kinds of web scripts?
… for different reuse measures?
… when predicting future reuse based on past data?
… when only a subset of traits are available?
13
Scripts and measures for our evaluation
[Table not recovered; surviving column label: Measure cutoff]
14
Accuracy varied little by measure or script type (e.g., TP ≥ 0.7 at FP = 0.4)
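Assuming TP and FP here denote the true-positive and false-positive rates, one point on the accuracy curve can be computed per value of the tunable minimum. This is a sketch with made-up data; `counts` (predictors satisfied per script) and `reused` (actual outcomes) are hypothetical inputs, not results from the paper.

```python
from typing import List, Tuple


def rates(counts: List[int], reused: List[bool], minimum: int) -> Tuple[float, float]:
    """Return (TP rate, FP rate) when predicting reuse for counts >= minimum."""
    predicted = [c >= minimum for c in counts]
    tp = sum(p and r for p, r in zip(predicted, reused))
    fp = sum(p and not r for p, r in zip(predicted, reused))
    positives = sum(reused)
    negatives = len(reused) - positives
    return tp / positives, fp / negatives


# Hypothetical data: predictors satisfied by each script, and whether it was reused.
counts = [4, 3, 3, 1, 2, 0, 3, 1]
reused = [True, True, True, True, False, False, False, False]

for minimum in (1, 3, 5):
    print(minimum, rates(counts, reused, minimum))
# 1 (1.0, 0.75)
# 3 (0.75, 0.25)
# 5 (0.0, 0.0)
```

Sweeping `minimum` over its full range traces out the TP/FP trade-off, from which an operating point such as the one quoted on the slide could be read.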
15
Yahoo Pipe accuracy slipped a bit when using past to predict future
16
Code-based traits gave nearly the full accuracy
[Figure legend: History, URL, Annotations, Code]
17
Conclusions
Model is equally accurate for a range of uses
–And might only require code-based traits
18
Conclusions and future work
Model is equally accurate for a range of uses
–And might only require code-based traits
–But can we improve accuracy by using information available after reuse is attempted?
–Can we also predict how happy people will be when reusing different pieces of code?
And now to put the model to work…
–Improving search engines
–Providing UI for administrators to review macros
–Giving programmers advice automatically
19
Thank You
To ICISA for this opportunity to present this paper
20
So how do we separate the wheat from the chaff?
Providing such features requires predicting whether code will ever be reused
–Without relying on information that’s only available after code is reused (“chicken and egg”): ratings, reviews, etc. (For some features, of course, we can always add this information in later.)
–With a fairly simple model for making predictions, so that predictions can be explained to users, especially when we’re advising users about how to improve the reusability of their code!