HFWEB June 19, 2000 Quantitative Measures for Distinguishing Web Pages Melody Y. Ivory Rashmi R. Sinha Marti A. Hearst UC Berkeley

Research Goals
• Identify key Web page aspects that impact usability
  – Easily quantified measures for information-centric sites
• Examine their effect through user studies
  – Establish concrete thresholds
• Incorporate findings into a simulation tool (Web TANGO)
  – Mimic Web site usage, report quantitative results
  – Enable comparison of alternative designs

Outline
• Methodology
• Data Analysis
• Predicting Web Page Rating
• Wrap-up

Methodology
• Collect quantitative measures from 2 groups
  – Ranked: sites rated favorably via expert review or user ratings
  – Unranked: sites that have not been rated favorably
• Statistically compare the groups
• Predict group membership

Methodology: Quantitative Measures
• Identified 42 aspects from the literature
  – Page Composition (e.g., words, links, images)
  – Page Formatting (e.g., fonts, lists, colors)
  – Overall Page Characteristics (e.g., information and layout quality, download speed)
• Page composition and formatting aspects are easier to measure

Methodology: Metrics Selected
  – Word Count
  – Body Text Percentage
  – Emphasized Body Text Percentage
  – Text Positioning Count
  – Text Cluster Count
  – Link Count
  – Page Size
  – Graphic Percentage
  – Graphics Count
  – Color Count
  – Font Count
  – Reading Complexity
We measured half of the aspects
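
As a rough illustration of how three of these metrics (Word Count, Link Count, Graphics Count) might be computed from raw HTML, here is a minimal sketch using Python's standard `html.parser`. It is not the tool used in the study: the class name `PageMetrics`, the helper `measure`, and the exact counting rules (words are whitespace-separated tokens outside scripts and styles, links are `<a>` tags with an `href`, graphics are `<img>` tags) are assumptions made for this example.

```python
from html.parser import HTMLParser


class PageMetrics(HTMLParser):
    """Hypothetical collector for a few page-composition counts."""

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.graphics_count = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_count += 1  # assumption: only anchors with href count
        elif tag == "img":
            self.graphics_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count whitespace-separated tokens in visible text only.
        if self._skip_depth == 0:
            self.word_count += len(data.split())


def measure(html):
    """Return the three counts for one page as a dict."""
    parser = PageMetrics()
    parser.feed(html)
    parser.close()
    return {
        "word_count": parser.word_count,
        "link_count": parser.link_count,
        "graphics_count": parser.graphics_count,
    }
```

Calling `measure(page_source)` on a page's HTML returns a small dictionary of counts. The other metrics (for example Text Cluster Count or Reading Complexity) need definitions that the slides do not give, so they are left out.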

Methodology: Data Collection
• Collected data for 2,015 English and non-English information-centric pages from 463 sites
  – Education, government, newspaper, etc.
• Data constraints
  – At least 30 words
  – No e-commerce pages
  – Exhibit high self-containment (i.e., no style sheets, scripts, applets, etc.)
• 1,054 pages fit constraints (52%)
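
The data constraints above can be read as a simple page filter. The sketch below is one possible interpretation, not the study's actual procedure. The tag set used to judge self-containment and the names `ConstraintScan` and `fits_constraints` are assumptions. E-commerce status cannot be read off the markup, so it is passed in as a flag by the caller.

```python
from html.parser import HTMLParser

MIN_WORDS = 30  # threshold stated on the slide
# Assumption: tags treated as breaking self-containment.
NON_SELF_CONTAINED = {"script", "style", "applet", "object", "embed"}


class ConstraintScan(HTMLParser):
    """Hypothetical scanner: counts words and flags external dependencies."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.self_contained = True
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in NON_SELF_CONTAINED:
            self.self_contained = False
        if tag == "link":
            rel = dict(attrs).get("rel") or ""
            if "stylesheet" in rel.lower():
                self.self_contained = False
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip > 0:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip == 0:
            self.words += len(data.split())


def fits_constraints(html, is_ecommerce=False):
    """True if the page passes all three constraints as interpreted here."""
    if is_ecommerce:
        return False
    scan = ConstraintScan()
    scan.feed(html)
    scan.close()
    return scan.words >= MIN_WORDS and scan.self_contained
```

Applied to a collection of pages, a filter like this yields the retained subset. In the study that subset was 1,054 of 2,015 pages, which is the 52% reported on the slide.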

Methodology: Data Collection
• Ranked pages
  – Favorably assessed by expert review (ER) or user rating (UR) on expert-chosen sites
  – Sources:
    – Yahoo! 101 (ER)
    – Web 100 (UR)
    – PC Mag Top 100 (ER)
    – WiseCat’s Top 100 (ER)
    – Webby Awards (ER) and People’s Voice (UR)

Methodology: Data Collection
• Unranked pages
  – Not favorably assessed by expert review or user rating on expert-chosen sites
  – Do not assume unranked = unfavorable
  – Sources:
    – WebCriteria’s Industry Benchmark
    – Yahoo Business & Economy Category
    – Others

Methodology: Analysis Data
• 428 pages analyzed
  – 214 ranked pages
  – 214 unranked pages, chosen randomly from the 840 unranked pages
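
Drawing an equal-sized random subset of the unranked pages gives two balanced groups for comparison. A minimal sketch of that step follows. The function name `balanced_sample` and the fixed seed are assumptions for illustration, and the slides do not describe how the random choice was actually made.

```python
import random


def balanced_sample(ranked, unranked, seed=0):
    """Pair all ranked pages with an equal-sized random draw of unranked pages.

    Returns (ranked, chosen_unranked). The seed is fixed only so that
    this example is reproducible.
    """
    if len(unranked) < len(ranked):
        raise ValueError("need at least as many unranked pages as ranked pages")
    rng = random.Random(seed)
    chosen = rng.sample(unranked, k=len(ranked))  # sampling without replacement
    return list(ranked), chosen
```

With 214 ranked and 840 unranked page identifiers as input, this returns 214 + 214 = 428 pages, matching the analysis set size on the slide.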

Data Analysis: Findings
• Several features are significantly associated with ranked sites
• Several pairs of features correlate strongly
  – Correlations mean different things in ranked vs. unranked pages
• Significant features are partially successful at predicting whether a site is ranked

Data Analysis: Significant Differences
• Ranked pages
  – More text clustering (facilitates scanning)
  – More links (facilitates info-seeking)
  – More bytes (more content facilitates info-seeking)
  – More images (clustering graphics facilitates scanning)
  – More colors (facilitates scanning)
  – Lower reading complexity (close to the best numbers in the Spool study; facilitates scanning)
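
The slides do not say which significance test was used to compare the two groups. As one common choice for comparing a single metric across two independent groups, the sketch below computes Welch's t statistic and its approximate degrees of freedom. The function name `welch_t` is hypothetical, and this should be read as an illustration of the kind of comparison involved, not as the study's method.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    group_a and group_b are lists of one metric's values (for example,
    link counts) for ranked and unranked pages respectively.
    """
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na  # squared standard error of mean, group a
    vb = variance(group_b) / nb  # squared standard error of mean, group b
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

A positive t for a metric such as link count would indicate a higher mean among ranked pages. Turning t and the degrees of freedom into a p-value requires a t-distribution table or a statistics library.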

Data Analysis: Metric Correlations
• Hypotheses based on correlations:
  – Ranked pages
    – Colored display text
    – Link clustering
    – Both patterns on all pages in random sample
  – Unranked pages
    – Display text coloring plus body text emphasis or clustering
    – Link coloring or clustering
    – Image links, simulated image maps, bulleted links
    – At least 2 patterns in 70% of random sample

Data Analysis: Example Pages
• Ranked Example
• Unranked Example

Predicting Web Page Ranking
• Linear Regression
  – Explains 10% of difference between groups
  – 63% accuracy (better at unranked prediction)
→ Employ machine learning techniques
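
To make the two reported figures concrete, the sketch below fits an ordinary least-squares line to a 0/1 group label (1 = ranked, 0 = unranked) and reports classification accuracy and R² (the share of variance explained). It is deliberately simplified to a single predictor, whereas the study used several metrics. The function names and the 0.5 decision cutoff are assumptions, and the slides do not give the exact model specification.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate(x, y, slope, intercept, cutoff=0.5):
    """Return (accuracy, r_squared) for labels y in {0, 1}.

    A page is predicted as ranked when its fitted value reaches the
    cutoff (0.5 is an assumption made for this example).
    """
    fitted = [slope * xi + intercept for xi in x]
    predicted = [1 if f >= cutoff else 0 for f in fitted]
    accuracy = sum(p == yi for p, yi in zip(predicted, y)) / len(y)
    my = mean(y)
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return accuracy, 1 - ss_res / ss_tot
```

Read this way, the slide's numbers correspond to an R² of about 0.10 and an accuracy of 0.63. A model can classify somewhat better than chance while still explaining little of the variance, which is consistent with the slide's suggestion to try other learning techniques.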

Predicting Web Page Ranking
• Web site ratings from RateItAll.com
  – User ratings on a 5-point scale (1 = Terrible!, 5 = Great!)
    – No rating criteria
  – Small set of 59 pages (61% ranked)
    – 54% of pages classified consistently
    – Only 17% unranked with high rating: unranked sites properly labeled
    – 29% ranked with medium rating: difference between expert and non-expert review
  – Ranking predicted by graphics count with 70% accuracy
→ Carefully design studies with non-experts

Predicting Web Page Ranking
• Home vs. non-home pages
  – Text cluster count predicts home page ranking
    – 66% accuracy
    – Consistent with primary goal of home pages
  – Non-home page prediction
    – Consistent with full sample results
    – 4 of 6 metrics (link count, text positioning count, color count, reading complexity)
→ Conduct further analysis using functional categories (genres) for pages

Future Work
• New metrics computation tool
  – More quantitative measures
  – Process style sheets
  – Functional categories for pages
  – UI
• Repeat data collection and analysis
  – Larger sample of pages
• Validation studies with users
• More info:

In Summary
• Quantitative measures should be helpful for improving information-centric Web pages
  – We can empirically distinguish between ranked and unranked pages