Observation & Experiments


1 Observation & Experiments
Watch, listen, and learn…

2 Observing Users
Not as easy as you think
- One of the best ways to gather feedback about your interface
- Watch, listen, and learn as a person interacts with your system
- Qualitative & quantitative; end users; experimental or naturalistic

3 Observation
Direct
- In same room; can be intrusive
- Users aware of your presence
- Only see it one time
- May use 1-way mirror to reduce intrusiveness
Indirect
- Video or app recording
- Reduces intrusiveness, but doesn't eliminate it
- Cameras focused on screen, face & keyboard
- Gives an archival record, but reviewing it can take a lot of time

4 Location
Observations may be:
In lab (maybe a specially built usability lab)
- Easier to control
- Can have the user complete a set of tasks
In field
- Watch users' everyday actions
- More realistic
- Harder to control other factors

5 Observation Room
State-of-the-art observation room equipped with three monitors to view the participant, the participant's monitor, and a composite picture-in-picture. One-way mirror plus angled glass captures light and isolates sound between rooms. Comfortable and spacious for three people, with room for up to six seated observers. Digital mixer for unlimited mixing of input images and recording to VHS, SVHS, or MiniDV recorders.

6 Task Selection
- What tasks are people performing?
- Are they representative and realistic?
- Tasks dealing with specific parts of the interface you want to test?
- Problematic tasks?
Don't forget to pilot your entire evaluation!!

7 Engaging Users in Evaluation
What's going on in the user's head? Use a verbal protocol, where users describe their thoughts.
Qualitative techniques:
- Think-aloud - can be very helpful
- Post-hoc verbal protocol - review video
- Critical incident logging - positive & negative
- Structured interviews - good questions: “What did you like best/least?” “How would you change..?”

8 Think Aloud
User describes verbally what s/he is thinking and doing:
- What they believe is happening
- Why they take an action
- What they are trying to do
Widely used, popular protocol
Potential problems:
- Can be awkward for the participant
- Thinking aloud can modify the way the user performs the task

9 Cooperative Approach
Another technique: co-discovery learning (constructive interaction)
- Join pairs of participants to work together, using think aloud
- Perhaps have one person be a semi-expert (coach) and one a novice
- More natural (like conversation), so removes some awkwardness of individual think aloud
- Variant: let the coach be from the design team (cooperative evaluation)

10 Alternative
What if thinking aloud during the session would be too disruptive? Use a post-event protocol:
- User performs the session, then watches the video afterwards and describes what s/he was thinking
- Sometimes difficult to recall
- Opens up the door of interpretation

11 What if a user gets stuck?
Decide ahead of time what you will do:
- Offer assistance or not? What kind of assistance?
- You can ask (in cooperative evaluation): “What are you trying to do..?” “What made you think..?” “How would you like to perform..?” “What would make this easier to accomplish..?”
- Maybe offer hints
This is why cooperative approaches are used.

12 Inputs / Outcomes
What you put in:
- Need an operational prototype (could use a Wizard of Oz simulation)
What you get out:
- “Process” or “how-to” information
- Errors, problems with the interface
- Comparison of the user's (verbalized) mental model to the designer's intended model

13 Capturing a Session
1. Paper & pencil
- Can be slow
- May miss things
- Is definitely cheap and easy
[Example coding sheet: task columns across the top, times (10:00, 10:03, 10:08, 10:22) down the side, with start/end (S/e) marks recorded per task.]

14 Capturing a Session
2. Recording (screen, audio and/or video)
- Good for think-aloud
- Multiple cameras may be needed
- Good, rich record of session
- Can be intrusive
- Can be painful to transcribe and analyze

15 Capturing a Session
3. Software logging
- Modify software to log user actions
- Can give time-stamped key press or mouse events
Two problems:
- Too low-level; want higher-level events
- Massive amount of data; need analysis tools
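
A minimal sketch of such instrumentation in Python; the pipe-delimited field layout loosely mirrors the example on the next slide, but the log_event helper and file name are hypothetical, not part of any real logging library:

    import time

    def log_event(logfile, user, module, event, *params):
        # One pipe-delimited, time-stamped record per user action.
        fields = [user, str(int(time.time() * 1000)), module, event]
        fields += [str(p) for p in params]
        logfile.write("|".join(fields) + "\n")

    with open("session.log", "a") as log:
        log_event(log, "hrichter", "MV", "START", 566)
        log_event(log, "hrichter", "MV", "SLIDECHANGE", 5)
        log_event(log, "hrichter", "MV", "STOP", 566, 271191)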

16 Example logs
2303761098721869683|hrichter|1098722080134|MV|START|566
|hrichter| |MV|QUESTION|false|false|false|false|false|false|
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|SLIDECHANGE|5
|hrichter| |MV|SEEK|PRESENTATION-A|566|604189|0
|hrichter| |MV|SEEK|PRESENTATION-A|566|604189|604189
|hrichter| |MV|SEEK|PRESENTATION-A|566|604189|604189
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|SEEK|AGENDA|566|149613|604189
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|SLIDECHANGE|3
|hrichter| |MV|SEEK|PRESENTATION|566|315796|149613
|hrichter| |MV|PLAY|566|315796
|hrichter| |MV|TAB|AV
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|SLIDECHANGE|2
|hrichter| |MV|SEEK|PRESENTATION|566|271191|315796
|hrichter| |MV|TAB|AV
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|TAB|PRESENTATION
|hrichter| |MV|TAB|AV
|hrichter| |MV|TAB|AGENDA
|hrichter| |MV|TAB|AV
|hrichter| |MV|STOP|566|271191
|hrichter| |MV|END

17 Analysis
Many approaches:
Task based
- How do users approach the problem?
- What problems do users have?
- Need not be exhaustive; look for interesting cases
Performance based
- Frequency and timing of actions, errors, task completion, etc.
Can be very time consuming!!
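
For the performance-based side, a small sketch of an analysis tool, assuming the pipe-delimited record layout from slide 16 (session|user|timestamp|module|event|params...); the analyze helper is hypothetical:

    from collections import Counter

    def analyze(log_path):
        # Tally event frequencies and overall session duration.
        counts, stamps = Counter(), []
        with open(log_path) as f:
            for line in f:
                fields = line.rstrip("\n").split("|")
                stamps.append(int(fields[2]))
                counts[fields[4]] += 1
        duration_s = (max(stamps) - min(stamps)) / 1000
        return counts, duration_s

    counts, duration = analyze("session.log")
    print(counts.most_common(3), round(duration, 1), "seconds")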

18 Experiments Testing hypotheses…

19 Experiments
Test hypotheses about your design
Generally quantitative, experimental, with end users

20 Types of Variables
Independent
- What you're studying; what you intentionally vary (e.g., interface feature, interaction device, selection technique, design)
Dependent
- Performance measures you record or examine (e.g., time, number of errors)

21 “Controlling” Variables
Prevent a variable from affecting the results in any systematic way. Methods of controlling for a variable:
1. Don't allow it to vary (e.g., all males)
2. Allow it to vary randomly (e.g., randomly assign participants to different groups)
3. Counterbalance - systematically vary it (e.g., equal number of males and females in each group)
The appropriate option depends on circumstances.
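
A sketch of options 2 and 3 in Python, with a hypothetical participant pool; the "color" and "bw" group names anticipate the example on slide 23:

    import random

    # Hypothetical participant pool: (id, gender).
    pool = [("p01", "F"), ("p02", "M"), ("p03", "F"), ("p04", "M"),
            ("p05", "F"), ("p06", "M"), ("p07", "F"), ("p08", "M")]

    # Option 2: allow gender to vary randomly across groups.
    random.shuffle(pool)
    random_groups = {"color": pool[:4], "bw": pool[4:]}

    # Option 3: counterbalance - equal numbers of each gender per group.
    females = [p for p in pool if p[1] == "F"]
    males   = [p for p in pool if p[1] == "M"]
    balanced = {"color": females[:2] + males[:2],
                "bw":    females[2:] + males[2:]}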

22 Hypotheses
What you predict will happen; more specifically, the way you predict the dependent variable (e.g., accuracy) will depend on the independent variable(s)
“Null” hypothesis (H0)
- States that there will be no effect, e.g., “There will be no difference in performance between the two groups”
- Data are used to try to disprove this null hypothesis

23 Example
Do people complete operations faster with a black-and-white display or a color one?
- Independent variable: display type (color or b/w)
- Dependent variable: time to complete task (minutes)
- Controlled variables: same number of males and females in each group
- Hypothesis: time to complete the task will be shorter for users with the color display
- H0: Time(color) = Time(b/w)
Note: within/between design issues

24 Experimental Designs
Within Subjects Design
Every participant provides a score for all levels or conditions:

         Color       B/W
  P1     … secs      … secs
  P2     … secs      … secs
  P3     … secs      … secs
  ...

25 Experimental Designs
Between Subjects
Each participant provides results for only one condition:

         Color                  B/W
  P1     … secs           P4    … secs
  P2     … secs           P5    … secs
  P3     … secs           P6    … secs
  ...

26 Within Subjects Designs
- More efficient: each subject gives you more data; they complete more “blocks” or “sessions”
- More statistical “power”: each person is their own control
- Therefore, can require fewer participants
- May mean a more complicated design to avoid “order effects”, e.g., seeing color then b/w may be different from seeing b/w then color (see the sketch below)
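
A sketch of one common fix for order effects: counterbalance condition order across participants so each condition appears first equally often (participant ids are hypothetical):

    from itertools import cycle, permutations

    conditions = ["color", "bw"]
    orders = cycle(permutations(conditions))   # (color, bw), (bw, color), ...

    # Each condition appears first for half of the participants,
    # so learning and fatigue effects cancel out across the group.
    assignments = {f"p{i:02d}": next(orders) for i in range(1, 9)}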

27 Between Subjects Designs
- Fewer order effects: no learning carried over from a first condition, no fatigue making a second performance worse
- Simpler design & analysis
- Easier to recruit participants (only one session, less time)
- Less efficient

28 Defining Performance
Based on the task; specific, objective measures/metrics. Examples:
- Speed (reaction time, time to complete)
- Accuracy (errors, hits/misses)
- Production (number of files processed)
- Score (number of points earned)
- …others…?
Preference, satisfaction, etc. (i.e., questionnaire responses) are also valid measurements.
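
As a sketch, measures like speed and accuracy reduce to simple computations over the session record; the trial numbers below are made up for illustration:

    # Hypothetical trial record; timestamps in milliseconds.
    trial = {"start_ms": 1_098_722_080_134, "end_ms": 1_098_722_190_134,
             "hits": 47, "misses": 3}

    speed_s  = (trial["end_ms"] - trial["start_ms"]) / 1000   # time to complete
    accuracy = trial["hits"] / (trial["hits"] + trial["misses"])
    print(f"{speed_s:.0f} s, {accuracy:.0%} accurate")        # 110 s, 94% accurate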

29 What about subjects?
How many?
- Book advice: at least 10
- Other advice: 6 subjects per experimental condition
- Real advice: depends on the statistics (see the sketch below)
Relating subjects and experimental conditions: within/between subjects design
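
“Depends on the statistics” usually means a power analysis. A sketch using statsmodels (assumed to be available); the effect size d = 0.8 is an assumed “large” effect, not a number from the slides:

    from statsmodels.stats.power import TTestIndPower

    # Participants per condition for a between-subjects t-test to detect
    # a "large" effect (d = 0.8) at alpha = .05 with 80% power.
    n_per_group = TTestIndPower().solve_power(effect_size=0.8,
                                              alpha=0.05, power=0.8)
    print(round(n_per_group))   # about 26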

30 Now What…?
- Performed initial data inspection: removed outliers, have a general idea of what occurred
- Descriptive statistics: totals, averages, ranges, etc.; subgroup statistics
- Statistical analysis: t-test and others to determine significance (sketch below)
- More in 2 weeks…
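
A sketch of the significance test with scipy, using made-up task times for the color vs. b/w example from slide 23:

    from scipy import stats

    # Hypothetical task times in minutes for each display condition.
    color = [4.1, 3.8, 4.5, 3.9, 4.2, 3.7]
    bw    = [5.2, 4.9, 5.6, 4.7, 5.1, 5.4]

    t, p = stats.ttest_ind(color, bw)        # independent samples (between subjects)
    print(f"t = {t:.2f}, p = {p:.4f}")       # p < .05 -> reject H0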

31 Feeding Back Into Design
- What were the conclusions you reached?
- How can you improve on the design?
- What are the quantitative benefits of the redesign? e.g., 2 minutes saved per transaction, which means a 24% increase in production, or $45,000,000 per year in increased profit
- What are the qualitative, less tangible benefits? e.g., workers will be less bored, less tired, and therefore more interested -> better customer service

32 Example: Heather's simple experiment
Designing an interface for categorizing keywords in a transcript; wanted a baseline for comparison
Experiment comparing:
1. Pen and paper, not real time
2. Pen and paper, real time
3. Simulated interface, real time

33 Experiment
- Hypothesis: fewer keywords in real time; fewer with the simulated interface
- Independent variables: time, accuracy of transcript, platform
- Dependent variables: number of keywords of each category
- Controlled variables: gender, experience, etc.
- Between subjects design
- 1 hour, mentally intensive task

34 Results

                             Non-Real-Time Rate   Real-Time Rate   Simulated Rate
  Domain-specific tags              7.5                9.4              5.1
  Domain-independent tags          12                  9.8              5.8
  Conversation tags                 1.8                3                2.5

For domain-specific tags, Simulated less than Real Time, p < 0.01
For domain-independent tags, Simulated less than Real Time, p < 0.01
Hypotheses:
- Fewer in real time: not supported
- Fewer with simulated: supported for two categories

35 Example: Web Page Structure
Is breadth or depth of linking better?
- Condition 1: 8 x 8 x 8
- Condition 2: 16 x 32
- Condition 3: 32 x 16
(Each structure exposes the same 512 pages.)
19 experienced users; 8 search tasks for each condition, chosen randomly from a possible 128.
Results:
- Condition 2 fastest (mean 36 s, SD 16)
- Condition 1 slowest (mean 58 s, SD 23)
Implies breadth is preferable to depth, although too many links could hurt performance.
(Larson & Czerwinski, 1998; see page 447 in ID)

36 Questions
- What are the independent variables?
- What are the dependent variables?
- What could be the hypothesis?
- Between or within subjects?
- What was controlled?
- What other data could you gather on this topic?
- What other experiments could you do on this topic?

37 Assignment: Due Thursday
Group evaluation plan: draft. Expect at least the following:
- Usability criteria
- Expected methods, and which criteria each is evaluating
- A few details for each method: tasks you will perform, data you will gather, questions you will ask, etc.

38 Example: add video to IM voice chat?
Compare voice chat with and without video. Plan an experiment:
- Compare message time, or difficulty in communicating, or frequency…
Consider:
- Tasks
- What data you want to gather
- How you would gather it
- What analysis you would do afterwards

