CS 422: UI Design and Programming
Intro to User Testing
Housekeeping
GR3 is due Thursday, 3/7. The rubric and examples are posted on Piazza.
Your superhero TAs have posted both PS1 and GR2 grades!
Graduate students: note the upcoming assignments and don't leave them for the last minute.
Midterm syllabus: everything we have covered to date. The format of the midterm exam and the midterm presentations will be reviewed in the next class.
You should already be working on PS2.
User Testing
Today we’ll learn the following:
Different kinds of user testing
Ethics of human-subjects research
Formative evaluation in detail
User testing in class: get some of your GR3 done
What is user testing? We learned about usability and how to design for it. Now we need to put our designs to the test. We can test our user interfaces during design and development or after development. Formative evaluation is done between design iterations. Summative evaluation is done after development (you will do this in GR6). User testing is about putting an interface in front of real users.
System testing vs. User testing
System testing does NOT involve human subjects or real users of the system. It focuses on the functionality and utility of the system, and sometimes on non-functional requirements such as latency. User testing always involves human subjects or real users of the system, and focuses on usability dimensions such as learnability, efficiency, and safety.
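The distinction can be made concrete: a system test can be automated because it checks the system's behavior, not a human's. A minimal sketch, assuming a hypothetical `search()` function under test and an invented 200 ms latency budget:

```python
import time

def search(query):
    # Hypothetical system under test: returns matching catalog items.
    catalog = ["apple", "apricot", "banana"]
    return [item for item in catalog if query in item]

def test_search_functionality():
    # System testing: verify functional behavior.
    assert search("ap") == ["apple", "apricot"]

def test_search_latency():
    # System testing can also cover non-functional requirements.
    start = time.perf_counter()
    search("ap")
    assert time.perf_counter() - start < 0.2  # assumed 200 ms budget

test_search_functionality()
test_search_latency()
```

Learnability or satisfaction cannot be asserted this way; measuring those requires watching real users.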
Different kinds of user testing
Formative evaluation
Field study
Controlled experiment
Formative evaluation
Finds problems for the next iteration of design (during the design process). Formative evaluation doesn't need a full working implementation; it can be done on a variety of prototypes. It evaluates a prototype or implementation, in the lab, on chosen tasks. The results are largely qualitative observations, usually a list of usability problems. You choose the tasks given to users; they are generally realistic (drawn from task analysis, which is based on observation) but nevertheless fake.
Formative evaluation: Key disadvantages
The testing environment is controlled and the tasks are fake. You come up with representative tasks based on your scenarios and your understanding of the problem space. Running a test in a lab environment on tasks of your own invention may not tell you enough about how well your interface will work in a real context on real tasks. How do we fix that?
Field study
A field study can answer these questions by actually deploying a working implementation to real users, then going out to the users' real environment and observing how they use it. Common methods:
Ethnography
Ethnomethodology
Interviews
Diary studies
Data logging
Experience sampling method (ESM)
Surveys
A field study is mostly qualitative and is not used to test a hypothesis.
Controlled Experiment
The goal of a controlled experiment is to test a quantifiable hypothesis about one or more interfaces. Think A/B testing. Controlled experiments happen under carefully controlled conditions using carefully designed tasks, chosen much more carefully than formative evaluation tasks. Hypotheses can only be tested with quantitative measurements of usability, such as time elapsed, number of errors, or subjective ratings.
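A minimal sketch of such a quantitative comparison, using invented task-completion times for two interface variants and Welch's t statistic (stdlib only; a real analysis would use a proper statistics package and a pre-registered significance threshold):

```python
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic for two independent samples (e.g., task times)."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

# Hypothetical task-completion times (seconds) for two interface variants.
interface_a = [41.0, 38.5, 44.2, 40.1, 39.3, 42.7]
interface_b = [35.2, 33.8, 36.9, 34.5, 32.1, 35.7]

t = welch_t(interface_a, interface_b)
print(f"mean A = {mean(interface_a):.1f}s, mean B = {mean(interface_b):.1f}s, t = {t:.2f}")
```

A large positive t here would support the hypothesis that variant B is faster; the point is that the hypothesis is settled by measurement, not by qualitative observation.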
Different kinds of user testing
Formative evaluation: all of you (GR3)
Field study: not covered in this class
Controlled experiment: graduate students only (AS1-3 / RS1-3)
Ethics of Human-Subjects Research
User testing always involves human subjects or real users of the system. Real often means representative users.
Ethics of Human-Subjects Research
Users are human beings.
Human subjects have been seriously abused in the past.
The Nazi-era experiments led to the Nuremberg Code, an international agreement on the rights of human subjects.
Basic Principles (Belmont Report)
Respect for persons: voluntary participation; informed consent; protection of vulnerable populations (children, prisoners, people with disabilities, especially cognitive disabilities)
Beneficence: do no harm; risks vs. benefits: risks to subjects should be commensurate with the benefits of the work to the subjects or to society
Justice: fair selection of subjects
Institutional Review Boards
Research with people is subject to scrutiny IRB oversight is confined to research
UIC IRB
Pressures on a User
Performance anxiety:
Feels like an intelligence test
Comparing self with other subjects
Feeling stupid in front of observers
Competing with other subjects
A programmer with an ironclad ego may scoff at such concerns, but these pressures are real.
Treat the User With Respect
Time: don't waste it
Comfort: make the user comfortable
Informed consent: inform the user as fully as possible
Privacy: preserve the user's privacy
Control: the user can stop at any time
What to do before a test?
Time: pilot-test all materials and tasks
Comfort: "We're testing the system; we're not testing you." "Any difficulties you encounter are the system's fault. We need your help to find these problems."
Privacy: "Your test results will be completely confidential."
Information: brief the user about the purpose of the study; inform them about audiotaping, videotaping, and other observers; answer any questions beforehand (unless doing so would bias the test)
Control: "You can stop at any time."
What to do during a test?
Time: eliminate unnecessary tasks
Comfort: maintain a calm, relaxed atmosphere; take breaks in a long session; never act disappointed; give tasks one at a time; make the first task easy, for an early success experience
Privacy: the user's boss shouldn't be watching
Information: answer the user's questions, as long as they don't bias the test
Control: the user can give up a task and go on to the next, or quit entirely
After the Test
Comfort: say what they've helped you do
Information: answer questions that you had to defer to avoid biasing the experiment
Privacy: don't publish user-identifying information; don't show video or audio without the user's permission
Formative Evaluation: How to
Find some users: they should be representative of the target user class(es), based on user analysis
Give each user some tasks: they should be representative of important tasks, based on task analysis
Watch the users do the tasks
Roles in Formative Evaluation
User
Facilitator
Observers
User's Role
The user should think aloud:
What they think is happening
What they're trying to do
Why they took an action
Problems with thinking aloud:
Feels weird
May alter behavior
Disrupts concentration
Another approach: pairs of users. Two users working together are more likely to converse naturally. Also called co-discovery or constructive interaction.
Facilitator's Role
Does the briefing
Provides the tasks
Coaches the user to think aloud by asking questions: "What are you thinking?" "Why did you try that?"
Controls the session and prevents interruptions by observers
Observer's Role
Be quiet! Don't help, don't explain, don't point out mistakes. Sit on your hands if it helps.
Take notes. Watch for critical incidents: events that strongly affect task performance or satisfaction.
Usually negative: errors, repeated attempts, curses
May be positive: "Cool!" "Oh, now I see."
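Observer notes are easier to analyze later if each critical incident is captured in a consistent structured form. A minimal sketch, with hypothetical field names and example incidents:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CriticalIncident:
    """One observer note about an event that affected task performance."""
    task: str
    description: str
    positive: bool = False          # most critical incidents are negative
    timestamp: datetime = field(default_factory=datetime.now)

# Invented example notes from a session.
notes = [
    CriticalIncident("create account", "retyped password three times"),
    CriticalIncident("search", "said 'Oh, now I see' after finding the filter",
                     positive=True),
]

negative = [n for n in notes if not n.positive]
print(f"{len(negative)} negative incident(s) recorded")
```

A prepared form on paper serves the same purpose; the structure, not the tooling, is what matters.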
How to record observations?
Pen & paper notes: prepared forms can help
Audio recording: for think-aloud
Video recording: usability labs are often set up with two cameras, one on the user's face and one on the screen; the user may be self-conscious; good for closed-circuit viewing by observers in another room; generates too much data
Retrospective testing: go back through the video with the user, discussing critical incidents
Screen capture & event logging: cheap and unobtrusive (e.g., Camtasia, CamStudio)
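Event logging in particular is easy to build into a prototype. A minimal sketch, with invented event names (a real study would log to a file, and only with the user's informed consent):

```python
import time

class EventLog:
    """Timestamped log of UI events for later analysis."""
    def __init__(self):
        self.events = []

    def record(self, name, **details):
        self.events.append({"t": time.time(), "event": name, **details})

    def count(self, name):
        return sum(1 for e in self.events if e["event"] == name)

log = EventLog()
log.record("click", target="Save")
log.record("error", message="file name required")
log.record("click", target="Save")
print(f"{log.count('click')} clicks, {log.count('error')} errors")
```

Counts of errors and repeated attempts extracted from such a log line up directly with the critical incidents observers watch for.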
User testing in class: get some of your GR3 done
GR3: you should do at least two rounds of testing, each round involving at least three users. I suggest you finish one round of testing in class:
Draft tasks
Recruit users
Take notes
Repeat
Why three users? Discount Usability
Each team tested 3 users, and I judged how well they ran the study on a 0–100 scale, with 100 representing my model of the ideal test. In total, the software package had 8 major usability problems (along with many smaller issues that I didn't analyze).
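The "discount usability" argument is often summarized with Nielsen and Landauer's model: the expected fraction of usability problems found by n users is 1 - (1 - λ)^n, where λ is the probability that a single user reveals a given problem (λ ≈ 0.31 in their published data). A quick computation of how fast the curve climbs:

```python
def problems_found(n, discovery_rate=0.31):
    """Expected fraction of usability problems found by n test users,
    per Nielsen & Landauer's model: 1 - (1 - lambda)^n."""
    return 1 - (1 - discovery_rate) ** n

for n in (1, 3, 5, 15):
    print(f"{n:2d} users -> {problems_found(n):.0%} of problems")
```

With λ ≈ 0.31, three users already surface roughly two thirds of the problems, which is why small rounds of testing followed by redesign beat one large round.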
Next class: review for the midterm.