Using Visual Symptoms for Debugging Presentation Failures in Web Applications Sonal Mahajan, Bailan Li, Pooyan Behnamghader, William G. J. Halfond University of Southern California Los Angeles, California, USA Work supported by NSF Grant CCF-1528163
Background Information
What do we mean by presentation? The “look and feel” of the website in a browser.
What is a presentation failure? The web page rendering ≠ its expected appearance.
Importance of presentation: aesthetics impact users’ evaluation [Tractinsky et al. 2006]; presentation impacts trustworthiness and usability [Lindgaard et al. 2011].
Usage Scenario
Regression debugging: modify the current version of the web page to correct a bug or refactor the HTML structure.
[Figure: an example web page (Menu | Contact; News; Username and Password fields; Sign in button; About us | Feedback | FAQ) shown with its table-based layout (<table>, <tr>, <td>) and its refactored div-based layout (<div>).]
Usage Scenario – Difficulties
[Figure: the developer compares the oracle (previous version) with the test web page; two visual differences are marked as Problem 1 and Problem 2.]
Usage Scenario – Difficulties
To debug, the developer must analyze the observed differences between the oracle (previous version) and the test web page, then explore the UI to find the fault (here, the background color of the “Sign in” button).
Manual debugging is difficult: complex interactions between HTML, CSS, and JS, and hundreds of HTML elements and CSS properties, make it labor intensive and error prone. In a prior user study [Mahajan et al. ICST 2015], the correct fault was identified in only 36% of test cases.
Limitations of Existing Techniques
DOM comparison techniques (e.g., XBT): not effective if the DOM has changed significantly.
Invariant specification techniques (e.g., Selenium): not practical, since all correctness properties need to be provided.
Fighting Layout Bugs: checks application-independent problems only.
Our approach: automate debugging of presentation failures.
Three Key Insights
Visual differences (visual symptoms) can help diagnosis.
[Figure: the “Sign in” button in the oracle and in the test page, showing a color-related presentation failure.]
Definition
Visual symptom: a boolean predicate describing a visual difference that gives clues to the fault.
Visual symptoms and the CSS properties they point to:
1. Almost matched element: margin-top, padding-top, etc.
2. Shift bottom element: margin-top, margin-left, etc.
3. Page size changed: height, width, padding, etc.
4. Added color: background-color, color, etc.
. . .
23. All diff. pixels in top of the element: padding-top, border-top-width, etc.
Examples:
Almost matched element – only the position changed – e.g., margin-top – detected by sub-image searching.
Shift bottom element – the element moved downwards – e.g., margin-top – detected by analyzing diff. pixels.
Three Key Insights
Probabilistic correlations can help identify faults.
[Figure: probability tables relating a candidate root cause such as <button, background-color> to the Color symptom and the Size symptom being T or F; values shown: 0.0, 1.0, 0.2, 0.8, 0.95, 0.05, 0.7, 0.3.]
Three Key Insights
Building the probabilistic model. Approaches:
✗ Pool of known presentation failures: differences depend on page layout, so the model is not generalizable.
✗ Historical data for the page: available only for mature pages; requires manual extraction from the bug-tracking system.
✔ Probabilistic models can be automatically generated from the faulty test page.
Our Approach
Goal: automatically identify the fault behind a presentation failure observed in a test page.
Input: test page; oracle image (previous version screenshot, mockup, etc.)
Phases: (P1) detect presentation failures; (P2) build the probabilistic model; (P3) identify the most likely faults
Output: ranked list of likely faults
P1. Detect Presentation Failures
Use WebSee [Mahajan et al. ICST 2015]: a visual comparison of the oracle image and the test web page, based on the computer vision technique Perceptual Image Differencing (PID), reports the presentation failures.
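To make the idea of visual comparison concrete, here is a minimal sketch that lists the pixels at which an oracle image and a test screenshot differ. It is a plain per-pixel comparison, not Perceptual Image Differencing and not WebSee's implementation; the function name `diff_pixels`, the `tolerance` parameter, and the nested-list image representation are assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Image = List[List[Pixel]]      # rows of pixels

def diff_pixels(oracle: Image, test: Image, tolerance: int = 0) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates where any channel differs by more than `tolerance`.

    Simplified stand-in for a perceptual comparison: every pixel is compared
    independently, with no model of human perception.
    """
    if len(oracle) != len(test) or any(len(a) != len(b) for a, b in zip(oracle, test)):
        raise ValueError("images must have the same dimensions")
    diffs = []
    for y, (row_oracle, row_test) in enumerate(zip(oracle, test)):
        for x, (p_oracle, p_test) in enumerate(zip(row_oracle, row_test)):
            if any(abs(a - b) > tolerance for a, b in zip(p_oracle, p_test)):
                diffs.append((x, y))
    return diffs

# Hypothetical 3x2 images: one pixel changes from blue to red.
WHITE, BLUE, RED = (255, 255, 255), (0, 0, 255), (255, 0, 0)
oracle_img = [[WHITE, WHITE, WHITE], [WHITE, BLUE, WHITE]]
test_img = [[WHITE, WHITE, WHITE], [WHITE, RED, WHITE]]
print(diff_pixels(oracle_img, test_img))  # [(1, 1)]
```

The resulting set of difference pixels is the kind of raw evidence from which visual symptoms can later be derived.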
P2. Build the Probabilistic Model
Model based on conditional probability: $P(r \mid S)$, the probability that a potential root cause, r, is faulty given the observed set of visual symptoms, S.
Fault = root cause r = <e, p>, where e = HTML element and p = CSS property.
P2. Build the Probabilistic Model
Generate data samples: inject faults into the test page by assigning different values to potential root causes, observe the resulting visual symptoms, and build the truth table.
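Truth Table – Example
Each data sample records which visual symptoms were observed (T) or not (F) after injecting a value into a root cause.
<p, color> = blue: Added color T; Almost matched element F; Shift top element F; Shift bottom element F; Page size changed F
<div, margin-top> = 0px: Added color F; Almost matched element T; Shift top element T; Shift bottom element F; Page size changed F
<div, margin-top> = 50px: Added color F; Almost matched element T; Shift top element F; Shift bottom element T; Page size changed T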
P2. Build the Probabilistic Model
Generate data samples: inject faults into the test page by assigning different values to potential root causes, observe which visual symptoms are true, and build the truth table.
Calculate probabilities: compute individual symptom and conditional probabilities to learn the correlation between the root cause and the visual symptoms.
Probabilities Calculation
Conditional probability via Bayes’ theorem, where r = root cause and S = set of visual symptoms:
$$P(r \mid S) = \frac{P(S \mid r)\, P(r)}{P(S)}$$
Probabilities Calculation
P(S|r) = probability of the status of the visual symptoms S given that r is the faulty root cause (r = root cause, S = set of visual symptoms).
Assumes the visual symptoms are conditionally independent given the root cause:
$$P(S \mid r) = \prod_{s \in S} P(s \mid r)$$
Advantages: easier to calculate; parallelizable.
Probabilities Calculation
To obtain P(S|r), each P(s|r) is measured in the data samples: observe the visual symptoms that appear when root cause r is seeded.
Conditional Probability Table – Example
P(s|r) is the fraction of the data samples for root cause r in which symptom s is T.
<p, color> (injected: blue): Added color 1.0; Almost matched element 0.0; Shift top element 0.0; Shift bottom element 0.0; Page size changed 0.0
<div, margin-top> (injected: 0px, 50px): Added color 0.0; Almost matched element 1.0; Shift top element 0.5; Shift bottom element 0.5; Page size changed 0.5
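The step from truth table to conditional probability table can be sketched as follows. This is a minimal illustration, assuming that P(s|r) is estimated as the fraction of data samples for r in which s was observed; the three data samples are the ones from the running example, while the names `DATA_SAMPLES`, `SYMPTOMS`, and `build_cpt` are hypothetical and not taken from FieryEye.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

RootCause = Tuple[str, str]  # (HTML element, CSS property)

SYMPTOMS = [
    "added color",
    "almost matched element",
    "shift top element",
    "shift bottom element",
    "page size changed",
]

# (root cause, injected value, symptoms observed as T) from the running example.
DATA_SAMPLES: List[Tuple[RootCause, str, Set[str]]] = [
    (("p", "color"), "blue", {"added color"}),
    (("div", "margin-top"), "0px", {"almost matched element", "shift top element"}),
    (("div", "margin-top"), "50px",
     {"almost matched element", "shift bottom element", "page size changed"}),
]

def build_cpt(samples: List[Tuple[RootCause, str, Set[str]]]) -> Dict[RootCause, Dict[str, float]]:
    """Estimate P(s = T | r) as the share of r's data samples in which s was observed."""
    observations = defaultdict(list)
    for cause, _injected_value, observed in samples:
        observations[cause].append(observed)
    return {
        cause: {s: sum(s in obs for obs in obs_list) / len(obs_list) for s in SYMPTOMS}
        for cause, obs_list in observations.items()
    }

cpt = build_cpt(DATA_SAMPLES)
print(cpt[("div", "margin-top")]["almost matched element"])  # 1.0
print(cpt[("div", "margin-top")]["shift bottom element"])    # 0.5
```

Because each root cause is processed independently, the per-cause work could be distributed across machines, which matches the slide's note that the calculation is parallelizable.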
Probabilities Calculation
P(r) = relative probability of r being the faulty root cause, where r = <e, p>, e = HTML element, and p = CSS property (S = set of visual symptoms).
Assume developers cause faults with uniform probability. Since r = <e, p>, P(r) is expressed in terms of P(e) and P(p).
P(p) Computation – Example
The page defines 2 properties in total, color and margin-top, so P(color) = P(margin-top) = 0.5.
Probabilities Calculation
P(S) = probability of the symptoms in S being T/F for a given page (r = root cause, S = set of visual symptoms).
P(S) is independent of r, since the values of s ∈ S are given.
Probabilities Calculation
P(e) and P(S) are constants, so they do not affect the ranking of root causes: $P(r \mid S) \propto P(p) \prod_{s \in S} P(s \mid r)$.
P3. Identify Most Likely Root Causes
for r ∈ R = {<e1, p1>, …, <en, pn>}:
1. calculate P(p) for r = <e, p>
2. determine visual symptoms, S
3. for s ∈ S, look up P(s|r) in the model
4. calculate $P(r \mid S) \propto P(p) \prod_{s \in S} P(s \mid r)$
Rank root causes by their probabilities.
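The ranking loop can be sketched in a few lines of Python. This is an illustrative sketch, not FieryEye's code: it follows the deck's running example by multiplying P(p) with P(s|r) for the observed symptoms only, and the names `CPT`, `PRIOR_P`, and `rank_root_causes` are hypothetical. The probability values are the ones used in the running example.

```python
from math import prod
from typing import Dict, List, Set, Tuple

RootCause = Tuple[str, str]  # (HTML element, CSS property)

# P(s | r) for the two symptoms observed in the running example.
CPT: Dict[RootCause, Dict[str, float]] = {
    ("p", "color"): {"almost matched element": 0.0, "shift bottom element": 0.0},
    ("div", "margin-top"): {"almost matched element": 1.0, "shift bottom element": 0.5},
}

# P(p): two properties are defined in the page, each with probability 0.5.
PRIOR_P: Dict[str, float] = {"color": 0.5, "margin-top": 0.5}

def rank_root_causes(
    cpt: Dict[RootCause, Dict[str, float]],
    prior_p: Dict[str, float],
    symptoms: Set[str],
) -> List[Tuple[RootCause, float]]:
    """Score each root cause as P(p) * product of P(s | r), highest score first."""
    scores = []
    for (element, prop), table in cpt.items():
        likelihood = prod(table[s] for s in symptoms)  # conditional independence assumption
        scores.append(((element, prop), prior_p[prop] * likelihood))
    return sorted(scores, key=lambda item: item[1], reverse=True)

observed = {"almost matched element", "shift bottom element"}
ranking = rank_root_causes(CPT, PRIOR_P, observed)
for cause, score in ranking:
    print(cause, score)
# ('div', 'margin-top') 0.25
# ('p', 'color') 0.0
```

The constants P(e) and P(S) are left out because they scale every score equally and therefore cannot change the order.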
Empirical Evaluation RQ1: How accurate is our approach in identifying root causes of presentation failures? RQ2: What are the computational resources needed to run our approach?
Implementation Approach implemented in FieryEye (火眼) Building the probabilistic model Parallelized over 200 Amazon EC2 c4.large instances Identifying visual symptoms Used OpenCV to compare screenshots, extract color information, perform sub-image searching, etc.
Experiment Protocol
Regression debugging activity: refactoring of web pages. Refactorings applied: migrate HTML 4 to 5 (<div id=‘head’> to <header>); convert table-based layout to div-based; replace deprecated tags (<font> to CSS font).
Generate test cases. For each subject: download the page (H) and take a screenshot as the oracle; refactor H to get H’; seed a presentation failure in H’ to create a variant.
Performance comparison: run FieryEye on the oracle and variant, and compare against WebSee, XPERT, and a Text Diff Tool (TDT, diff).
Subjects
Selected with a random URL generator (http://www.uroulette.com).
Subject: Size (Total RC), generated # test cases
Perl: 1,592, 36
GTK: 1,121, 30
Konqueror: 6,779, 39
Amulet: 88, 22
UCF: 2,415, 47
RQ1: Accuracy
Metric: ranking of the correct root cause in the result set (the effort required to find the correct root cause).
Other techniques do not rank root causes, so they were adapted to report a rank.
To quantify the range in the way developers may use the results: Ranking U = upper bound on effort; Ranking L = lower bound on effort.
RQ1: Accuracy Results
Rank: FieryEye = 7.9; WebSee rank-L = 10.2
Recall: FieryEye = 100%; WebSee = 65.6%
RQ1: Accuracy Results
In Y% of cases, the correct root cause is ranked in the top X. For the top 5:
FieryEye: 45% of cases
WebSee: 5% (U), 10% (L) of cases
XPERT, TDT: 1% (U and L)
RQ2: Computational Resources
FieryEye
Fast but imprecise
Model building: 200 Amazon EC2 instances; 1 c4.large = $0.11 per hour
Cost = 200 * $0.0018/min * 3 min ≈ $1.08
Model building cost = $1
Summary Technique for finding root cause of presentation failure Image processing to find visual symptoms Probabilistic models to predict root causes Empirical evaluation shows positive results Avg. median correct root cause rank = 7.9 Prediction time = 17 sec Model building cost = $1
Thank you
Sonal Mahajan (spmahaja@usc.edu), Bailan Li (bailanli@usc.edu), Pooyan Behnamghader (pbehnamg@usc.edu), William G. J. Halfond (halfond@usc.edu)
Ranking U and L
The compared techniques report HTML elements rather than root causes. WebSee reports a ranked list of HTML elements; XPERT and TDT report an unsorted set. To obtain a set of root causes, the CSS properties defined for each reported element e are added:
for each reported faulty HTML element e {
  if (e is an incorrect faulty element) {
    rankingU = rankingU + e.getProps()
    rankingL = rankingL + 1
  } else {
    rankingU = rankingU + e.getProps() / 2
    rankingL = rankingL + e.getProps() / 2
  }
}
For the unsorted output of XPERT and TDT:
rankingU = rankingU / 2
rankingL = rankingL / 2
Definitions
Root cause: a tuple <e, p>, where e = HTML element and p = CSS property.
[Figure: oracle image and test web page, with the root cause <div, margin-top> marked.]
Definitions
Visual symptom: a boolean predicate describing a visual difference that gives clues to the root cause.
Almost matched element – only the position changed – e.g., margin-top – detected by sub-image searching.
Shift bottom element – the element moved downwards – e.g., margin-top – detected by analyzing diff. pixels.
[Figure: oracle image and test web page.]
Full Running Example
Running example
[Figure: oracle (previous version) and test web page, each showing Menu | Contact; News; Username and Password fields; Sign in button; About us | Feedback | FAQ.]
P2. Generate Data Samples – Example
The test web page contains 2 HTML elements, <p> (with color defined) and <div> (with margin-top defined), giving 2 potential root causes: <p, color> and <div, margin-top>.
Injecting the value blue into <p, color> and the values 0px and 50px into <div, margin-top> yields 3 data samples:
Data sample 1: <p, color = blue> – 1 visual symptom: Added color
Data sample 2: <div, margin-top = 0px> – 2 visual symptoms: Almost matched element, Shift top element
Data sample 3: <div, margin-top = 50px> – 3 visual symptoms: Almost matched element, Shift bottom element, Page size changed
Truth Table – Example
<p, color> = blue: Added color T; Almost matched element F; Shift top element F; Shift bottom element F; Page size changed F
<div, margin-top> = 0px: Added color F; Almost matched element T; Shift top element T; Shift bottom element F; Page size changed F
<div, margin-top> = 50px: Added color F; Almost matched element T; Shift top element F; Shift bottom element T; Page size changed T
Conditional Probability Table – Example
<p, color> (injected: blue): Added color 1.0; Almost matched element 0.0; Shift top element 0.0; Shift bottom element 0.0; Page size changed 0.0
<div, margin-top> (injected: 0px, 50px): Added color 0.0; Almost matched element 1.0; Shift top element 0.5; Shift bottom element 0.5; Page size changed 0.5
P(p) Computation – Example
The page defines 2 properties in total, color and margin-top, so P(color) = P(margin-top) = 0.5.
Visual Symptoms (S) – Example
Almost matched element – only the position changed – e.g., margin-top – detected by sub-image searching.
Shift bottom element – the element moved downwards – e.g., margin-top – detected by analyzing diff. pixels.
[Figure: oracle image and test web page.]
P3. – Example
All root causes, R = {<p, color>, <div, margin-top>}
r = <p, color>
1. P(p) = 0.5
2. S = {almost matched element, shift bottom element}
3. s1 = almost matched element, P(s1|r) = 0.0; s2 = shift bottom element, P(s2|r) = 0.0; P(S|r) = 0.0 * 0.0 = 0.0
4. P(<p, color> | S) = 0.0 * 0.5 = 0.0
P3. – Example
All root causes, R = {<p, color>, <div, margin-top>}
r = <div, margin-top>
1. P(p) = 0.5
2. S = {almost matched element, shift bottom element}
3. s1 = almost matched element, P(s1|r) = 1.0; s2 = shift bottom element, P(s2|r) = 0.5; P(S|r) = 1.0 * 0.5 = 0.5
4. P(<div, margin-top> | S) = 0.5 * 0.5 = 0.25
P3. – Example
Rank root causes by their probabilities:
1. P(<div, margin-top>) = 0.25 ✔
2. P(<p, color>) = 0.0