
1 Digging for diamonds: Identifying valuable web automation programs in repositories. Jarrod Jackson¹, Chris Scaffidi², Katie Stolee² (¹Oregon State University, ²University of Nebraska-Lincoln)

2 Web scripts: Enabling users to enhance the browser (IBM CoScripter web macro). Problem  Approach  Evaluation  Conclusion

3 Web scripts: Enabling users to enhance the browser (Yahoo Pipe)

4 Web scripts: Enabling users to enhance the browser (GreaseMonkey user script)

5 Repositories of end-user code: The good, the great, and the "other". CoScripter hosts the largest web macro repository: more than 6,000 users and more than 3,000 "public" scripts. A previous study of 1,445 of these web macros found that roughly 10% had many runs, roughly 10% had many users, and roughly 80% were "other" (C. Bogart et al., End-User Programming in the Wild: A Field Study of CoScripter Scripts, VL/HCC 2008).

6 What if our repositories could... omit pieces of code from search results if they are unlikely to be reused anyway? ...provide a UI for administrators to review (and perhaps remove) old code that is unlikely to be used? ...advise programmers, when they upload code, about how to improve the reusability of their code?

7 Needed: a model for predicting reuse. Key questions for discovering such a model: What information about the code indicates reusability? How do we combine this information to predict reuse? Similar models have been successful on object-oriented code, predicting reuse based on coupling and cohesion, and predicting bugginess based on code complexity metrics, information about code authors, code churn, and more. Web scripts are much simpler (they do not call each other, have no inheritance, etc.), so we need different information here.

8 Prior work found 35 traits (in 8 categories) statistically related to reuse:
– Mass appeal (e.g., popular keywords)
– Language (e.g., data values are in English)
– Annotations (e.g., comments)
– Flexibility (e.g., parameterization via variables)
– Length (e.g., small number of distinct lines of code)
– Author information (e.g., is the author an early adopter?)
– Advanced syntax (e.g., the "control-click" keyword)
– No preconditions (e.g., no cookies needed)
All traits are computed automatically from one of four sources: executable code statements, URLs referenced, annotations, or code history.
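As a rough illustration of computing traits automatically from code (not the authors' actual implementation; the trait names mirror the slide, but the parsing rules here are hypothetical), a few such traits could be extracted from a script's source text like this:

```python
import re

def extract_traits(script_text):
    """Extract a few illustrative traits from a web-macro script.

    The parsing rules below (a '#' comment marker, double-quoted
    literals) are hypothetical stand-ins for whatever syntax the
    real repository's scripts use.
    """
    lines = [ln.strip() for ln in script_text.splitlines() if ln.strip()]
    return {
        # Annotations: number of comment lines
        "comments": sum(1 for ln in lines if ln.startswith("#")),
        # Length: number of distinct non-comment lines of code
        "distinct_lines": len({ln for ln in lines if not ln.startswith("#")}),
        # Flexibility: count of hard-coded string literals
        "literals": sum(len(re.findall(r'"[^"]*"', ln)) for ln in lines),
    }

script = '''# Look up a book price
goto "http://example.com"
enter "python" into the "search" box
click the "Search" button
'''
print(extract_traits(script))
# → {'comments': 1, 'distinct_lines': 3, 'literals': 4}
```

Each trait value would then be compared against a learned threshold, as described on the next slides.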

9 The model that we developed (in words and pictures): given a binary measure of reuse, for each trait, find the threshold that optimally divides the reused scripts from the un-reused scripts. [Figure: distribution of trait levels, with the chosen threshold marked]
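In code, finding the per-trait threshold might look like the following sketch. The slide does not say how "optimally divides" is scored, so the accuracy-maximizing objective and the ≥ direction of the split are assumptions:

```python
def best_threshold(trait_values, reused):
    """Find the trait threshold that best separates reused from
    un-reused scripts, scored here by simple classification accuracy.

    trait_values: list of trait levels, one per script
    reused:       list of booleans (the binary measure of reuse)
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(trait_values)):
        # Predict "reused" when the trait level meets the threshold
        correct = sum((v >= t) == r for v, r in zip(trait_values, reused))
        acc = correct / len(reused)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy data: scripts with more comments tend to be reused
comments = [0, 1, 2, 3, 4, 5, 6]
reused = [False, False, False, True, True, True, True]
print(best_threshold(comments, reused))  # → (3, 1.0)
```

For a trait where lower values indicate reuse (like the literals ≤ 4 example on slide 11), the comparison direction would simply be flipped.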

10 Predicting if a macro will be reused: count how many predictors are satisfied, and predict that the macro will be reused if this count reaches some minimum. The minimum is a tunable parameter: a higher minimum implies a higher bar that a script must clear to be predicted as reused, yielding fewer false positives but more false negatives.

11 Example. Suppose that our measure of reuse is "the script is reused more than 75% of other scripts." Suppose that, based on this measure of reuse, the best thresholds for four predictors are: comments ≥ 3, lines_of_code ≥ 40, prev_created ≥ 10, and literals ≤ 4. The model would then predict that another script will satisfy the reuse criterion if that script satisfies at least n of these predictors.
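Putting slides 10 and 11 together, the prediction rule is a simple n-of-m vote over per-trait predictors. A minimal sketch, using the four example predictors from this slide (the trait names and thresholds come from the slide; the script values below are illustrative):

```python
# Per-trait predictors: each returns True if the script's trait level
# is on the "reused" side of that trait's learned threshold.
PREDICTORS = [
    lambda s: s["comments"] >= 3,
    lambda s: s["lines_of_code"] >= 40,
    lambda s: s["prev_created"] >= 10,  # scripts the author created previously
    lambda s: s["literals"] <= 4,
]

def predict_reuse(script, minimum):
    """Predict reuse if at least `minimum` predictors are satisfied.

    A higher `minimum` raises the bar: fewer false positives,
    more false negatives.
    """
    satisfied = sum(p(script) for p in PREDICTORS)
    return satisfied >= minimum

script = {"comments": 5, "lines_of_code": 25, "prev_created": 12, "literals": 2}
print(predict_reuse(script, minimum=3))  # satisfies 3 of 4 → True
print(predict_reuse(script, minimum=4))  # fails lines_of_code → False
```

Because the model is just a count of interpretable threshold tests, a repository could explain any prediction to a user by listing which predictors the script failed.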

12 How well does this approach work... for different kinds of web scripts? ...for different reuse measures? ...when predicting future reuse based on past data? ...when only a subset of the traits is available?

13 Scripts and measures for our evaluation. [Table: script types and reuse measures, with the cutoff for each measure]

14 Accuracy varied little by measure or script type (e.g., TP ≥ 0.7 at FP = 0.4).

15 Yahoo Pipe accuracy slipped slightly when using the past to predict the future.

16 Code-based traits gave nearly the full accuracy of all four trait sources (history, URLs, annotations, and code).

17 Conclusions. The model is equally accurate for a range of uses, and it might require only code-based traits.

18 Conclusions and future work. The model is equally accurate for a range of uses, and it might require only code-based traits. But can we improve accuracy by using information that becomes available after reuse is attempted? Can we also predict how happy people will be when reusing different pieces of code? And now to put the model to work: improving search engines, providing a UI for administrators to review macros, and giving programmers advice automatically.

19 Thank you to ICISA for this opportunity to present this paper.

20 So how do we separate the wheat from the chaff? Providing such features requires predicting whether code will ever be reused, (a) without relying on information that becomes available only after code is reused, such as ratings and reviews (a "chicken and egg" problem; for some features, of course, we can always add this information in later), and (b) with a fairly simple model for making predictions, so that predictions can be explained to users, especially when we are advising users about how to improve the reusability of their code.


