Automating Assessment of Web Site Usability
Marti Hearst, University of California, Berkeley
IBM Almaden, Oct 2000

The Usability Gap
- Most sites have inadequate usability [Forrester, Spool, Hurst]: users can't find what they want 39-66% of the time
- 196M new Web sites in the next 5 years [Nielsen99]
- A shortage of user interface professionals: only ~20,000 [Nielsen99]

Usability affects the bottom line
- IBM case study [1999]
  - Spent $millions to redesign the site
  - 84% decrease in help usage, 400% increase in sales
  - Attributed to improvements in information architecture
- Creative Good study [1999]
  - Studied 10 e-commerce sites; 59% of attempts failed
  - If 25% of those had succeeded → an estimated additional $3.9B in sales

Talk Outline
- Web Site Design
- Automated Usability Evaluation
- Our approach: WebTANGO
- Some Empirical Results
- Wrap-up
Joint work with Melody Ivory & Rashmi Sinha

Web Site Design (Newman et al. 00)
- Information design: structure, categories of information
- Navigation design: interaction with the information structure
- Graphic design: visual presentation of information and navigation (color, typography, etc.)
(Courtesy of Mark Newman)

Web Site Design (Newman et al. 00), continued
- Information Architecture: includes management and more responsibility for content
- User Interface Design: includes testing and evaluation
(Courtesy of Mark Newman)

Web Site Design Process
- Discovery: assemble information relevant to the project
- Design Exploration: explore alternative design approaches (information, navigation, and graphic)
- Design Refinement: select one approach and iteratively refine it
- Production: create prototypes and specifications
(Courtesy of Mark Newman)

Iteration: Design → Prototype → Evaluate → (repeat)

Usability Evaluation: Standard Techniques
- User studies
  - Potential users use the interface to complete some tasks
  - Requires an implemented interface
- "Discount" usability evaluation
  - Heuristic evaluation: a usability expert assesses the interface against guidelines

Automated UE
- We looked at 124 methods
- Automated UE is greatly under-explored
  - Only 36% of all methods
  - Fewer methods for the web (28%)
- Most techniques require some testing
  - Only 18% are free from user testing
  - Only 6% for the web

Survey of Automated UE
Predominant methods (Web):
- Structural analysis (4): Bobby, Scholtz & Laskowski 98, Stein 97
- Guideline reviews (11)
- Log file analysis (9): Chi et al. 00, Drott 98, Fuller & de Graaff 96, Guzdial et al., Sullivan 97, Theng & Marsden 98
- Simulation (2): WebCriteria (Max), Chi et al. 00

Existing Metrics
- Web metric analysis tools report on what is easy to measure
  - Predicted download time
  - Depth/breadth of site
- We care about harder-to-measure aspects
  - Content
  - User goals/tasks
- We also want to compare alternative designs

WebTANGO: Tool for Assessing NaviGation & Organization
- Goal: automated support for comparing design alternatives
- How:
  - Assess the usability of the information architecture
  - Approximate information-seeking behavior
  - Output quantitative usability metrics

Benefits/Tradeoffs
- Benefits
  - Less expensive than traditional methods
  - Usable early in the design process
- Tradeoffs
  - Accuracy? Validate the methodology with user studies
  - Surfaces different problems than traditional methods do
  - For comparison purposes only
  - Does not capture subjective measures

Information-Centric Sites
- Museums, history
- News, magazines
- Government information

Guidelines
- There are many usability guidelines
- A survey of 21 sets of web guidelines found little overlap (Ratner et al. 96)
- Why? Our hypothesis: they are not empirically validated
- So... let's figure out what works!

An Empirical Study: Which features distinguish well-designed web pages?

Methodology
- Collect quantitative measures from 2 groups
  - Ranked: sites rated favorably via expert review or user ratings
  - Unranked: sites that have not been rated favorably
- Statistically compare the groups
- Predict group membership
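
As a concrete illustration of the comparison step, here is a minimal sketch using a per-metric Welch's t-test; the data file and column names are hypothetical, not artifacts of the actual study:

```python
# Sketch: compare ranked vs. unranked pages on each quantitative measure.
# Assumes a CSV with one row per page, a binary "ranked" column, and one
# column per metric (hypothetical file and column names).
import pandas as pd
from scipy import stats

pages = pd.read_csv("page_metrics.csv")
metrics = ["word_count", "link_count", "color_count", "font_count"]

for m in metrics:
    ranked = pages.loc[pages["ranked"] == 1, m]
    unranked = pages.loc[pages["ranked"] == 0, m]
    # Welch's t-test: no equal-variance assumption between the groups.
    t, p = stats.ttest_ind(ranked, unranked, equal_var=False)
    print(f"{m}: t={t:.2f}, p={p:.4f}")
```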

Quantitative Measures
- Identified 42 aspects from the literature
  - Page composition (e.g., words, links, images)
  - Page formatting (e.g., fonts, lists, colors)
  - Overall page characteristics (e.g., information & layout quality, download speed)

Metrics
- Word count
- Body text percentage
- Emphasized body text percentage
- Text positioning count
- Text cluster count
- Link count
- Page size
- Graphic percentage
- Graphics count
- Color count
- Font count
- Reading complexity
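
To make these concrete, here is a minimal sketch of how a few of the simpler metrics might be computed from a page's HTML; it uses BeautifulSoup, the parsing choices are assumptions, and the actual metrics tool described later is more sophisticated:

```python
# Sketch: compute a few page-level metrics from raw HTML.
from bs4 import BeautifulSoup

def page_metrics(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(separator=" ")
    return {
        "word_count": len(text.split()),
        "link_count": len(soup.find_all("a")),
        "graphics_count": len(soup.find_all("img")),
        # Distinct font faces named in <font> tags (pre-CSS-era pages).
        "font_count": len({f.get("face")
                           for f in soup.find_all("font") if f.get("face")}),
    }
```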

Data Collection
- Collected data for 2,015 information-centric pages from 463 sites
  - Education, government, newspaper, etc.
- Data constraints
  - At least 30 words
  - No e-commerce pages
  - Exhibit high self-containment (i.e., no style sheets, scripts, applets, etc.)
- 1,054 pages fit the constraints (52%)

Data Collection: Ranked Pages
- Favorably assessed by expert review (ER) or user rating (UR) on expert-chosen sites
- Sources:
  - Yahoo! 101 (ER)
  - Web 100 (UR)
  - PC Mag Top 100 (ER)
  - WiseCat's Top 100 (ER)
  - Webby Awards (ER) & People's Voice (UR)

Data Collection: Unranked Pages
- Not favorably assessed by expert review or user rating on expert-chosen sites
- Do not assume unranked = unfavorable
- Sources:
  - WebCriteria's Industry Benchmark
  - Yahoo! Business & Economy category
  - Others

Data Analysis
- 428 pages total: the 214 ranked pages, plus 214 chosen randomly from the 840 unranked pages

Findings
- Several features are significantly associated with ranked sites
- Several pairs of features correlate strongly
- Correlations mean different things in ranked vs. unranked pages
- The significant features are partially successful at predicting whether a site is ranked

Significant Differences
Ranked pages have:
- More text clustering (facilitates scanning)
- More links (facilitates info-seeking)
- More bytes (more content → facilitates info-seeking)
- More images (clustered graphics → facilitates scanning)
- More colors (facilitates scanning)
- Lower reading complexity (close to the best numbers in the Spool study → facilitates scanning)

Metric Correlations
Created hypotheses based on the correlations, then confirmed them by sampling:
- Ranked pages
  - Colored display text
  - Link clustering
  - → Both patterns appeared on all pages in a random sample
- Unranked pages
  - Display text coloring plus body text emphasis or clustering
  - Link coloring or clustering
  - Image links, simulated image maps, bulleted links
  - → At least 2 of these patterns appeared in 70% of a random sample

Two Examples

Example: Ranked Page
- Colored display text
- Link clustering
(screenshot shown on slide)

Example: Unranked Page
- Body text emphasis
- Image links
(screenshot shown on slide)

Predicting Web Page Rating
- Linear regression
  - Explains 10% of the difference between the groups
  - 63% accuracy (better at predicting unranked pages)
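
A minimal sketch of this prediction step; scikit-learn's logistic regression stands in here for the study's linear-regression classifier, and the feature names and data file are assumptions:

```python
# Sketch: predict ranked vs. unranked membership from page metrics.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

pages = pd.read_csv("page_metrics.csv")  # hypothetical file
features = ["text_cluster_count", "link_count", "page_size",
            "graphics_count", "color_count", "reading_complexity"]
X, y = pages[features], pages["ranked"]

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```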

Predicting Web Page Rating: Home vs. Non-Home Pages
- Text cluster count predicts home page ranking with 66% accuracy
  - Consistent with the primary goal of home pages
- Non-home page prediction is consistent with the full-sample results
  - 4 of 6 metrics (link count, text positioning count, color count, reading complexity)

Another Rating System
- Web site ratings from RateItAll.com
  - User ratings on a 5-point scale (1 = Terrible!, 5 = Great!); no rating criteria
  - Small set of 59 pages (61% ranked)
- 54% of pages were classified consistently
  - Only 17% of unranked pages got a high rating → unranked sites are properly labeled
  - 29% of ranked pages got a medium rating → a difference between expert and non-expert review
- Ranking predicted by graphics count with 70% accuracy
  - → Design studies with non-experts carefully

Second Study (new results)
- Better rating data
  - Webby Awards
  - Sites organized into categories
- New metrics computation tool
  - More quantitative measures
  - Processes style sheets, inline frames
- Larger sample of pages

Webby Awards Categories
- We used finance, education, community, living, health, and services
- 100 judges, 6 criteria, 3 rounds of judging
  - We used the first round only
- 2000 sites initially

Webby Awards Criteria
- Content
- Structure & navigation
- Visual design
- Functionality
- Interactivity
- Overall experience
Factor analysis: the first factor accounted for 91% of the variance. Judgements are somewhat normally distributed, with skew.
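
A minimal sketch of that variance check, with PCA as a rough stand-in for the factor analysis actually used; the ratings file and column names are assumptions:

```python
# Sketch: how much variance does the first factor capture across the
# six Webby judging criteria? PCA used here as a simple stand-in.
import pandas as pd
from sklearn.decomposition import PCA

criteria = ["content", "navigation", "visual_design",
            "functionality", "interactivity", "overall"]
ratings = pd.read_csv("webby_ratings.csv")  # hypothetical: one row per site

pca = PCA(n_components=len(criteria))
pca.fit(ratings[criteria])
print(f"first component: {pca.explained_variance_ratio_[0]:.0%} of variance")
```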

New Metrics (table of the expanded metric set shown on slide)

Methodology
- Data collection: 1,108 pages from 163 sites, 3 levels per site
- 14 metrics, computed with about 85% accuracy
  - Text cluster and text positioning counts are less accurate

Preliminary Results
- Linear regression to predict the Webby judges' ratings (top 30% vs. bottom 30%)
- Prediction accuracy:
  - 72% if categories are not taken into account
  - 83% if categories are assessed separately

Significant Metrics by Category (table shown on slide)

Category-Based Profiles
- K-means clustering of good sites, according to the metrics
- Preliminary results suggest the sites do cluster
- Clusters can be used to create profiles of good and poor sites for each category
- These can serve as empirically verified guidelines
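
A minimal sketch of this profiling step, assuming the metrics for highly rated sites are already collected in a table (hypothetical file and column names):

```python
# Sketch: cluster highly rated sites by their metric vectors to derive
# per-category design profiles. Metrics are standardized first, since
# k-means is sensitive to feature scale.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

good = pd.read_csv("good_site_metrics.csv")  # hypothetical file
features = ["word_count", "link_count", "color_count", "graphics_count"]
X = StandardScaler().fit_transform(good[features])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
good["profile"] = kmeans.labels_

# Each cluster's mean metric vector is a candidate profile of good design.
print(good.groupby("profile")[features].mean())
```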

Ramifications
- It is remarkable that such simple metrics predict so well
  - Perhaps good design is good overall
  - There may be other factors
- A foundation for a new methodology
  - Empirical, bottom-up
  - Does this reflect cognitive principles?
- But there is no one path to good design

Longer Term Goal: A Simulator for Comparing Site Designs

Monte Carlo Simulation
- Have a model of the information structure
- Have a set of user goals
- Want to assess the navigation structure:
  - Compare alternatives and tradeoffs
  - Identify bottlenecks
  - Identify critically important pages/links
  - Check all pairs of start/end points
  - Check overall reachability before and after a change

(Figure: one Monte Carlo simulation step for Design 1, Task 1. The simulation starts from the home page; the target information is at Renter Support.)

(Figure: Monte Carlo simulation results for Design 1, Task 1. Simulation runs start from all pages in the site; average navigation times are shown for Tasks 2 & 3.)

Monte Carlo Simulation
- At each step in the simulation:
  - Assume a probability distribution over the set of next choices
  - The next choice is a function of:
    - The current goal
    - The understandability of the choice
    - Prior interaction history
    - The overall complexity of the page
- Varying the distribution corresponds to varying properties of the links
- Spot-check important choices
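
A minimal sketch of such a simulation run over a toy site graph; the pages, link probabilities, and stopping rule are illustrative assumptions, not the WebTANGO model:

```python
# Sketch: a goal-driven random walk over a toy site graph, standing in
# for one Monte Carlo navigation run. Each link's weight is the assumed
# probability that a user pursuing this goal would choose it.
import random

# site[page] -> list of (next_page, choice probability for this goal)
site = {
    "home":     [("products", 0.5), ("support", 0.3), ("about", 0.2)],
    "products": [("home", 0.2), ("support", 0.8)],
    "about":    [("home", 1.0)],
    "support":  [],  # target page: the walk stops here
}

def run(start, target, max_steps=50):
    """Steps taken to reach target, or None if the walk gives up."""
    page = start
    for step in range(max_steps):
        if page == target:
            return step
        pages, probs = zip(*site[page])
        page = random.choices(pages, weights=probs, k=1)[0]
    return None

# Average many runs to estimate navigation cost for this design and task.
runs = [run("home", "support") for _ in range(1000)]
ok = [r for r in runs if r is not None]
print(f"success rate: {len(ok) / len(runs):.0%}, "
      f"mean steps: {sum(ok) / len(ok):.1f}")
```

Repeating such runs from every start page, and varying the link probabilities, gives the kind of per-task navigation-time comparisons shown in the figures above.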

In Summary
- Automated usability assessment should help close the Web usability gap
- We can empirically distinguish between highly rated web pages and other pages
  - Empirical validation of design guidelines
  - Can build profiles of good vs. poor sites
  - We are validating the expert judgements against usability assessments via a user study
- Web use simulation is an under-explored and promising new approach

Current Projects
- Automating Web Usability (Tango): Melody Ivory, Rashmi Sinha
- Text Data Mining (Lindi): Barbara Rosario, Steve Tu
- Metadata in Search Interfaces (Flamenco): Ame Elliott, Andy Chou
- Web Intranet Search (Cha-Cha): Mike Chen, Jamie Laflen

More information:

Automated Usability Evaluation
- Logging/capture
  - Pro: easy
  - Con: requires an implemented system
  - Con: don't know the user's task (web)
  - Con: doesn't present alternatives
  - Con: doesn't distinguish error from success
- Analytical modeling
  - Pro: doable at the design phase
  - Con: models an expert user
  - Con: academic exercise
- Simulation

Research Issues: Navigation Predictions
- Develop a model for predicting link selection
- Requirements:
  - Information need (task metadata)
  - Representation of pages (page metadata)
  - Method for selecting links (relevance ranking)
  - Maintaining the user's conceptual model during site traversal (scent [Fur97, LC98, Pir97])
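
As one illustration of the "method for selecting links" requirement, a minimal sketch that ranks a page's links by word overlap between the information need and each link's anchor text; this Jaccard-style score is an assumption, not the project's actual scent model:

```python
# Sketch: rank a page's links against a user's information need by
# simple word overlap (a crude stand-in for information "scent").
def scent(goal: str, anchor_text: str) -> float:
    goal_words = set(goal.lower().split())
    link_words = set(anchor_text.lower().split())
    if not link_words:
        return 0.0
    return len(goal_words & link_words) / len(goal_words | link_words)

goal = "renter support contact information"
links = ["About Us", "Renter Support", "Products", "Contact Information"]
for text in sorted(links, key=lambda t: scent(goal, t), reverse=True):
    print(f"{scent(goal, text):.2f}  {text}")
```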