Modeling Programmer Navigation

A head-to-head empirical evaluation of predictive models
David Piorkowski 1, Scott D. Fleming 1, Christopher Scaffidi 1, Liza John 2, Christopher Bogart 1, Bonnie E. John 2,3, Margaret Burnett 1, Rachel Bellamy 3
1: Oregon State University; 2: Carnegie Mellon University; 3: IBM Research TJ Watson
Thank you. I’d like to talk to you about modeling programmer navigation, specifically in the context of predictive models.
Next: Let’s get started.

Problem
Developers need to gather lots of information: feature mapping, code relationships, defect causes.
Problem: lots of navigation, 35% of time.
But this gathering of information can be costly, resulting in lots of navigation: 35% of their time when debugging.
Next: This is a well-known problem and…
Outline: Introduction, Experiment, Single-Factor, Multi-Factor, Conclusion

Tools for Navigation
Many tools exist to help developers navigate.
How they work: model programmer navigation, emphasize a particular factor, provide shortcuts.
Question: Which models are best and when?
Tools such as Team Tracks, Mylyn and others. So how do these tools work? Essentially, they try to predict where programmers want to navigate and make it easier for them to get there. They contain implicit models of programmer navigation that identify code the programmer is likely to need to visit, and they provide shortcuts to these places. If we dig a little deeper, these models often emphasize a particular factor, for example how frequently code is visited. These tools use a variety of factors and many of them succeed, but what we want to know is…
Next: So to look at this question…

Empirical Evaluation
Part 1: Single-Factor Models. Comparison between models; successful contexts.
Part 2: Multi-Factor Models. Combining factors.
We ran an empirical evaluation in two parts.
Next: What did we evaluate?

Factors Evaluated
Based on Previous Navigations: Recency, Working Set, Frequency
Based on Cost of Navigation: Within-File Distance, Call Depth, Source Topology
Based on Lexical Similarity: Bug Report Similarity
We surveyed the literature and pulled out the key factors we used to predict where programmers want to go. We were able to group the factors into three categories.
Based on what the programmer has done in the past:
Recency: takes into account the sequence of the programmer’s past navigations.
Working Set: similar to Recency in that it takes advantage of the programmer’s navigation history, but uses a limited set of recent navigations.
Frequency: takes into account how often places were visited.
Based on how much effort it takes to navigate to pieces of code:
Within-File Distance: proximity of code within the same file, for example A, B, C in the same file.
Call Depth: based on method call relationships in the code.
Source Topology: extends the call stack idea to include other hierarchical relationships such as inheritance.
Based on text similarity metrics:
Bug Report Similarity: comparison of the bug report text to the code.
Next: To evaluate these factors, we collected navigation data from an actual human.
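
As a rough illustration of the three history-based factors, the sketch below ranks candidate methods from a navigation history. It is not the study's implementation: the function names, the `delta` window parameter, the recency tie-break for Frequency, and the toy method names are all assumptions made for the example. A real tool would probably also drop the method the programmer is already in.

```python
from collections import Counter

def rank_by_recency(history):
    """Most recently visited methods first, each listed once."""
    seen, ranking = set(), []
    for method in reversed(history):
        if method not in seen:
            seen.add(method)
            ranking.append(method)
    return ranking

def rank_by_working_set(history, delta):
    """Recency ranking restricted to the last `delta` navigations (assumed window)."""
    return rank_by_recency(history[-delta:]) if delta > 0 else []

def rank_by_frequency(history):
    """Most frequently visited methods first; ties broken by recency (an assumption)."""
    counts = Counter(history)
    recency_position = {m: i for i, m in enumerate(rank_by_recency(history))}
    return sorted(counts, key=lambda m: (-counts[m], recency_position[m]))

# Toy navigation history (hypothetical method names), oldest visit first.
history = ["A.open", "B.read", "A.open", "B.read", "C.close"]

recency = rank_by_recency(history)            # C.close, B.read, A.open
working_set = rank_by_working_set(history, 2) # only the last two visits
frequency = rank_by_frequency(history)        # B.read and A.open before C.close
```

The three rankings differ on the same history, which is the point of comparing the factors head to head.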

Experiment
One participant: senior in Computer Science, 3 years programming experience.
Two debugging tasks in Eclipse: jEdit session, Memoranda session.
Data collected: navigation logs, video recording.
jEdit: a programmer’s text editor for Java. Memoranda: a diary manager and scheduling tool.
Next: Let’s quickly look at both sessions in more detail.

Experiment Bugs, Session 1: jEdit
BUG: Problem with character-offset counter. In the lower left corner of the jEdit window, there are two counters that describe the position of the text cursor. The first counter gives the number of the line that cursor is on. The second counter gives the character offset into the line. The character-offset counter is broken. When the cursor is at the beginning of a line (i.e., before the first character in the line), jEdit shows the offset as 1. However, the offset should begin counting from 0. Thus, when the cursor is at the end of the line, it will display the number of characters in the line rather than the number of characters plus 1.
Off-by-one error in the character-offset counter.
Successfully debugged in 29 minutes.

Experiment Bugs, Session 2: Memoranda
BUG: Note lost when switching projects - ID: Details: When a note A (date A) for project A is active, switch project. Note B (date A) in project B is totally replaced by note A and note B is LOST forever! note B is the first note of project B on date A. Additional Comments: This only happens if note A and note B are of the same date
Data is overwritten during a context switch.
Debugged in 85 minutes, but introduced a new defect.
Next: Taking the data collected and the factors previously described…

How We Tested Each Factor
Created executable, predictive models: a model for each factor.
Input: participant’s navigation set.
Evaluated each model’s ability to predict the next navigation.
Navigation is a bit of a fuzzy word, so to be more specific:

How We Tested Each Factor
Method-to-method navigations
Click-Based Navigations: logging plug-in for Eclipse; captures text-cursor position.
View-Based Navigations: screen recording; captures what the participant sees.
The logging plug-in was built into the Eclipse environment. The screen recording captured everything that went past the participant’s eyes, such as scrolling. Over both sessions that gave 676 view-based navigations vs 123 click-based navigations.
Next: To recap, we analyzed two things: how successful each model/factor is, and how the different operationalizations of navigation affect predictive ability.
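
To make "method-to-method navigation" concrete, here is a minimal sketch of how a stream of cursor samples could be collapsed into navigations. The log format (one method name per sample) and the function name are assumptions for illustration; the actual plug-in's log format is not described in the talk.

```python
from itertools import groupby

def to_navigations(cursor_log):
    """Collapse consecutive samples in the same method into one visit,
    then pair each visit with the next one to get navigations."""
    visits = [method for method, _ in groupby(cursor_log)]
    return list(zip(visits, visits[1:]))

# Hypothetical cursor samples: the method containing the text cursor at each sample.
cursor_log = ["A", "A", "B", "B", "B", "A", "C", "C"]
navigations = to_navigations(cursor_log)  # (A, B), (B, A), (A, C)
```

The same collapsing step would apply to view-based data, with "method visible on screen" replacing "method containing the cursor".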

Click-Based Hit Ratio
Click-Based Navigations
Hit Ratio: percentage of correct predictions for a given N.
N: the number of predictions to consider.
Next: I’ll explain what these mean in just a moment.
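
The hit-ratio definition can be sketched as a replay loop: at each step the model sees the history so far, offers its top N predictions, and scores a hit if the participant's actual next location is among them. The function names and the toy recency model below are assumptions for illustration, not the study's code.

```python
def recency_model(history):
    """Toy model: past locations, most recent first, skipping the current one."""
    current, seen, ranking = history[-1], set(), []
    for method in reversed(history):
        if method != current and method not in seen:
            seen.add(method)
            ranking.append(method)
    return ranking

def hit_ratio(model, navigations, n):
    """Fraction of steps where the actual next location is in the model's top n."""
    hits, trials = 0, 0
    for i in range(1, len(navigations)):
        history, actual = navigations[:i], navigations[i]
        if actual in model(history)[:n]:
            hits += 1
        trials += 1
    return hits / trials if trials else 0.0

# Hypothetical sequence of visited methods.
visited = ["A", "B", "A", "C", "B"]
ratio_top1 = hit_ratio(recency_model, visited, n=1)  # 1 hit out of 4 steps
ratio_top2 = hit_ratio(recency_model, visited, n=2)  # 2 hits out of 4 steps
```

Raising N can only add hits, so a model's hit ratio never decreases as N grows.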

Click-Based Hit Ratio
[Chart: click-based hit ratio for the single-factor models, built up over four slides. Series: Recency, Working Set, Frequency, Within-File Distance, Undirected Call Depth, Source Topology, Bug Report Similarity, Directed Call Depth.]
What we’re looking at here is the combined results across both sessions. If we look at the Recency model’s top 3 predictions, 36% of the time… Working Set’s similarity to Recency was helpful, but it wasn’t able to consider as many past navigations as Recency.
Bug Report Similarity: first surprise, since it is a popular choice among tools.
Frequency: Parnin and Gorg.
Next: We wanted to analyze in what contexts the models did well.

Click-Based Hit Ratio by Context
Columns: Navigation Action Context; Number of actions; Single-Factor Models: Recency, Working Set (Δ=N), Frequency, Bug Report Similarity, Within-File Distance, Forward Call Depth, Undirected Call Depth, Source Topology
Overall 123 54% 49% 36% 4% 27% 1% 11% 8%
Debug view 45 62% 60% 33% 0% 38% 9%
Java Editor tabs 32 69% 53% 50% 3% 22%
Java Editor 23 52% 39% 57%
Package Explorer view 10 10% 40%
Call Hierarchy view 7 14% 29%
Find Utility 5 20%
Java Outline view 1 100%
Java Perspective 75 48% 43% 7% 19% 15%
Debug Perspective 48 63% 58% 35%
Highlighted: Recency 54% 62% 69% 52% 10% 14% 40% 0% 48% 63%
Highlighted: Java Editor tabs 32 69% 53% 50% 3% 0% 22%
Highlighted: Debug Perspective 48 63% 58% 35% 0% 40% 4% 8%
These are the results when each model is allowed to look at its top ten predictions. I’ll just highlight a couple of results. Recency does well with tab management actions in Eclipse, which boosted its overall score.

View-Based Hit Ratio
View-Based Navigations

View-Based Hit Ratio
[Chart: view-based hit ratio for the single-factor models, built up over six slides, with the click-based Recency and Within-File Distance results shown for comparison. Series: Within-File Distance, Source Topology, Recency, Working Set, Frequency, Undirected Call Depth, Bug Report Similarity, Directed Call Depth.]
Within-File Distance: scrolling!

View-Based Hit Ratio by Context
Columns: Navigation Action Context; Number of actions; Single-Factor Models: Recency, Working Set (Δ=N), Frequency, Bug Report Similarity, Within-File Distance, Forward Call Depth, Undirected Call Depth, Source Topology
Overall 676 45% 35% 21% 7% 87% 4% 15% 72%
Debug view 46 59% 57% 11% 0% 37% 2% 22% 13%
Java Editor tabs 33 55% 39% 6%
Java Editor 574 44% 34% 8% 99% 16% 83%
Package Explorer view 10 10% 30%
Call Hierarchy view 7 29%
Find Utility 5 20% 40% 60%
Java Outline view 1
Java Perspective 559 9% 89% 3% 76%
Debug Perspective 117 69% 64% 14% 74% 5% 52%
Highlighted: Within-File Distance 87% 37% 0% 99% 60% 89% 74%
Highlighted: Java Perspective 559 40% 29% 22% 9% 89% 3% 13% 76%
The Java perspective’s code editor is large by default, so there is more to see; the Debug perspective’s code editor is much smaller. I didn’t point it out, but Recency and Within-File Distance swap in accuracy compared to the click-based context.
Next: That concludes Part 1 of the empirical evaluation.

Empirical Evaluation Part 2: Multi-Factor Models
Part 1: Single-Factor Models. Comparison between models; successful contexts.
Part 2: Multi-Factor Models. Combining factors.
Next: Looking at composing models, perhaps the first question is: what is the best we can do?

Optimal Composite Model
Optimal Composite Model: scored a hit if any of its models had a correct prediction.
[Diagram labels: Recency, Source Topology, Frequency, Multi-Factor]
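
The "hit if any model hit" scoring rule can be sketched as follows. The two toy models, the `FILE_ORDER` layout, and the visited sequence are assumptions invented for the example; the neighbor model is only a crude stand-in for a distance-based factor.

```python
FILE_ORDER = ["A", "B", "C", "D"]  # hypothetical order of methods within one file

def recency_model(history):
    """Past locations, most recent first, skipping the current one."""
    current, seen, ranking = history[-1], set(), []
    for method in reversed(history):
        if method != current and method not in seen:
            seen.add(method)
            ranking.append(method)
    return ranking

def neighbor_model(history):
    """Other methods in the file, nearest to the current one first."""
    here = FILE_ORDER.index(history[-1])
    others = [m for m in FILE_ORDER if m != history[-1]]
    return sorted(others, key=lambda m: abs(FILE_ORDER.index(m) - here))

def composite_hit_ratio(models, navigations, n):
    """Score a hit when at least one model has the actual next location in its top n."""
    trials = len(navigations) - 1
    hits = sum(
        any(navigations[i] in model(navigations[:i])[:n] for model in models)
        for i in range(1, len(navigations))
    )
    return hits / trials if trials > 0 else 0.0

visited = ["A", "B", "A", "D", "A"]
recency_only = composite_hit_ratio([recency_model], visited, n=1)
neighbor_only = composite_hit_ratio([neighbor_model], visited, n=1)
both = composite_hit_ratio([recency_model, neighbor_model], visited, n=1)
```

On this toy sequence each model alone hits half the time, and the composite hits three times out of four because the two models miss on different steps. This scoring is an upper bound: it assumes an oracle that always picks the right model, which a real tool would not have.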

Click-Based Optimal Composites
Model columns (✓ marks the models in each composite): Recency, Frequency, Within-File Distance, Bug Report Similarity, Source Topology
Composite Size, Hit Ratio (N=10), Diff. with Recency (53.7%)
2 63.4% +9.8%
3 66.7% +13%
4 67.5% +13.8%
5 68.3% +14.6%
The best composite of size 2 was Recency and Within-File Distance, and it showed improvement; size 3 improved as well. Diminishing returns after that.
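
One way to find the best composite of each size is an exhaustive search over model subsets, scoring each subset by the union of the navigations its members predicted correctly. This is a sketch under that assumption; the talk does not say how the optimal composites were computed, and the hit sets below are made-up toy data, not the study's results.

```python
from itertools import combinations

def best_composites(hits_by_model, total, max_size):
    """For each composite size, return (hit ratio, models) for the best subset."""
    best = {}
    for size in range(1, max_size + 1):
        scored = [
            (len(set().union(*(hits_by_model[m] for m in combo))) / total, combo)
            for combo in combinations(sorted(hits_by_model), size)
        ]
        best[size] = max(scored, key=lambda pair: pair[0])
    return best

# Toy data: indices of the navigations each model predicted correctly (invented).
hits_by_model = {
    "Recency": {0, 1, 2, 3, 6},
    "Within-File Distance": {2, 3, 4, 5},
    "Frequency": {0, 2},
}
best = best_composites(hits_by_model, total=8, max_size=3)
```

In the toy data the third model adds nothing once the first two are combined, because its hits are already covered. That is the same diminishing-returns pattern the table shows.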

View-Based Optimal Composites
Model columns (✓ marks the models in each composite): Recency, Frequency, Within-File Distance, Bug Report Similarity, Source Topology
Composite Size, Hit Ratio (N=10), Diff. with Within-File Distance (86.7%)
2 91.9% +5.2%
3 92.9% +6.2%
4 93.3% +6.7%
5 93.6% +7.0%
The same two-model combination came out best. There was some improvement, though not as much, and again diminishing returns.
Next: Now that we’ve shed some light on a potential best case, let’s take a look at an existing predictive model.

A Composite Model: PFIS3
Our existing predictive model, based on Information Foraging Theory.
PFIS3 base model: source code topology and words.
Combinable factors: Recency, Bug Report.

Click-Based Hit Ratio – PFIS3
[Chart: click-based hit ratio, built up over three slides. Series: Recency (Single-Factor), PFIS3, PFIS3 + Recency, PFIS3 + Bug, PFIS3 + Recency + Bug.]
Can composite models do better?

View-Based Hit Ratio – PFIS3
[Chart: view-based hit ratio, built up over three slides. Series: Within-File Distance (Single-Factor), PFIS3, PFIS3 + Recency, PFIS3 + Bug, PFIS3 + Recency + Bug.]

Conclusions
Recency most accurate in click-based navigations.
Scrolling navigations were very common.
Low accuracy of Bug Report Similarity.
Opportunities to compose models.
PFIS3 + Recency outperformed a single-factor model.
Scrolling is underrepresented, and the models’ underlying accuracy changed when it was included. Depending on the tool’s intent, this may or may not be useful. There are opportunities to compose models: Recency and Within-File Distance combined well in both of our operationalizations, with diminishing returns beyond that since the models covered similar navigations. PFIS3 + Recency is a nice result for our predictive model, but it highlights that combining models is not trivial or easy to achieve.
