Intro to Evaluation See how (un)usable your software really is…

Why is evaluation done?  Summative: assess an existing system; judge whether it meets some criteria  Formative: assess a system being designed; gather input to inform design  Summative or formative? Depends on:  maturity of the system  how the evaluation results will be used  The same technique can be used for either

Other distinctions  Form of results obtained: quantitative or qualitative  Who is experimenting with the design: end users or HCI experts  Approach: experimental, naturalistic, or predictive

Evaluation techniques  Predictive Evaluation Fitts's law, Hick's law, etc.  Observation  Think-aloud  Cooperative evaluation Watch users perform tasks with your interface Next week…
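As a rough illustration of predictive evaluation, the sketch below computes the two laws named above: Fitts's law in its common Shannon formulation, MT = a + b * log2(D/W + 1), and Hick's law, RT = a + b * log2(n + 1). The function names and the constants a and b are placeholders chosen for illustration, not values from this lecture; in practice they are fitted from measurements.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds (Shannon formulation of Fitts's law).

    a and b are placeholder constants; real values must be fitted from
    measurements on the device and user group in question.
    """
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

def hick_decision_time(n_choices, a=0.2, b=0.15):
    """Predicted choice time in seconds for n equally likely options (Hick's law)."""
    return a + b * math.log2(n_choices + 1)

# Compare a small, far target with a large, near one (units only need to match).
small_far = fitts_movement_time(distance=800, width=20)
large_near = fitts_movement_time(distance=200, width=80)
print(f"small/far target: {small_far:.2f} s, large/near target: {large_near:.2f} s")
print(f"8-item menu decision: {hick_decision_time(8):.2f} s")
```

The point is the comparison between design alternatives, not the absolute numbers, which depend entirely on the assumed constants.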

More techniques  Empirical user studies (experiments) Test hypotheses about your interface Examine dependent variables against independent variables More next lecture…  Interviews  Questionnaires  Focus Groups Get user feedback More in two weeks…
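One way to picture "dependent against independent variables": the interface version is the independent variable and task completion time is the dependent one. The sketch below computes Welch's t statistic for two groups using only the standard library. The data are made up for illustration, and turning the statistic into a p-value would need a statistics package, which is left out here.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical completion times in seconds for two interface versions.
old_interface = [52.1, 47.8, 55.0, 49.3, 51.6]
new_interface = [44.9, 46.2, 41.7, 48.0, 43.5]

t = welch_t(old_interface, new_interface)
print(f"Welch t = {t:.2f} (positive means the old interface was slower on average)")
```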

Still more techniques  Discount usability techniques Use HCI experts instead of users Fast and cheap method to get broad feedback Heuristic evaluation  Several experts examine interface using guiding heuristics (like the ones we used in design) Cognitive Walkthrough  Several experts assess learnability of interface for novices You will do one of each of these

And still more techniques  Diary studies Users relate experiences on a regular basis Can write down, call in, etc.  Experience Sampling Technique Interrupt users with a very short questionnaire on a random-ish basis  Good for getting an idea of regular and long-term use in the field (real world)
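A minimal sketch of the "random-ish" prompting used in experience sampling: pick a few random times inside a daily window while keeping prompts a minimum distance apart. The function name, the window, and the 30-minute gap are assumptions for illustration only.

```python
import random
from datetime import datetime, timedelta

def sample_prompt_times(day_start, day_end, n_prompts, min_gap_minutes=30, seed=None):
    """Pick n_prompts random times in [day_start, day_end], at least min_gap apart."""
    rng = random.Random(seed)
    total_minutes = int((day_end - day_start).total_seconds() // 60)
    # Minutes left over once the mandatory gaps are reserved.
    slack = total_minutes - min_gap_minutes * (n_prompts - 1)
    if slack < 0:
        raise ValueError("window too short for that many prompts with this gap")
    # Draw sorted offsets within the slack, then re-insert the gaps.
    offsets = sorted(rng.randint(0, slack) for _ in range(n_prompts))
    return [day_start + timedelta(minutes=off + i * min_gap_minutes)
            for i, off in enumerate(offsets)]

start = datetime(2024, 1, 15, 9, 0)
end = datetime(2024, 1, 15, 21, 0)
for prompt in sample_prompt_times(start, end, n_prompts=6, seed=1):
    print(prompt.strftime("%H:%M"))
```

Reserving the gaps first guarantees the spacing without rejection sampling; a fixed seed makes a participant's schedule reproducible.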

Evaluation is Detective Work  Goal: gather evidence that can help you determine whether your usability goals are being met  Evidence (data) should be: Relevant Diagnostic Credible Corroborated

Data as Evidence  Relevant Appropriate to address the hypotheses  e.g., Does measuring “number of errors” provide insight into how effectively your new air traffic control system supports the users’ tasks?  Diagnostic Data unambiguously provide evidence one way or the other  e.g., Does asking about the users’ preferences clearly tell you whether the system performs better? (Maybe)

Data as Evidence  Credible Are the data trustworthy?  Gather data carefully; gather enough data  Corroborated Does more than one source of evidence support the hypotheses?  e.g., both accuracy and user opinions indicate that the new system is better than the previous system. But what if completion time is slower?

General Recommendations  Identify evaluation goals  Include both objective & subjective data e.g. “completion time” and “preference”  Use multiple measures, within a type e.g. “reaction time” and “accuracy”  Use quantitative measures where possible e.g. preference score (on a scale of 1-7) Note: Only gather the data required; do so with minimum interruption, hassle, time, etc.

Evaluation planning  Decide on techniques, tasks, materials What are the usability criteria? How much authenticity is required?  How many people, how long  How to record data, how to analyze data  Prepare materials: interfaces, storyboards, questionnaires, etc.  Pilot the entire evaluation Test all materials, tasks, questionnaires, etc. Find and fix problems with wording and assumptions Get a good feel for the length of the study

Recruiting Participants  Various “subject pools” Volunteers Paid participants Students (e.g., psych undergrads) for course credit Friends, acquaintances, family, lab members “Public space” participants - e.g., observing people walking through a museum Email, newsgroup lists  Must fit user population (validity)  Note: Ethics, IRB, Consent apply to *all* participants, including friends & “pilot subjects”

Performing the Study  Be well prepared so the participant’s time is not wasted  Explain procedures without compromising results  Sessions should not be too long; participants can quit at any time  Never express displeasure or anger  Data to be stored anonymously, securely, and/or destroyed  Expect anything and everything to go wrong!! (a little story)

Consent  Why important? People can be sensitive about this process and issues Errors will likely be made, participant may feel inadequate May be mentally or physically strenuous  What are the potential risks (there are always risks)?

Data Inspection  Start by just looking at the data Were there outliers, people who fell asleep, anyone who tried to mess up the study, etc.?  Identify issues: Overall, how did people do? “5 W’s” (Where, what, why, when, and for whom were the problems?)  Compile aggregate results and descriptive statistics
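A first pass over the data can be sketched as follows: compute descriptive statistics and flag possible outliers for a closer look. The completion times are made up, and the 1.5 x IQR rule is just one common convention, not something prescribed in this lecture. Flagged values should be inspected (did the person fall asleep?) rather than dropped automatically.

```python
import statistics

def describe(times):
    """Descriptive statistics plus values outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(times),
        "mean": statistics.mean(times),
        "median": median,
        "stdev": statistics.stdev(times),
        "outliers": [t for t in times if t < low or t > high],
    }

# Hypothetical task completion times in seconds, one per participant.
completion_times = [41.0, 38.5, 44.2, 39.9, 47.3, 40.8, 182.0, 43.1]

summary = describe(completion_times)
for name, value in summary.items():
    print(name, value)
```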

Making Conclusions  Where did you meet your criteria? Where didn’t you?  What were the problems? How serious are these problems?  What design changes should be made? But don’t make things worse…  Prioritize and plan changes to the design  Iterate on entire process

Example: Heather’s study  Software: MeetingViewer interface fully functional  Criteria – learnability, efficiency, see what aspects of interface get used, what might be missing  Resources – subjects were students in a research group, just me as evaluator, plenty of time  Wanted completely authentic experience

Heather’s software

Heather’s evaluation  Task: answer questions from a recorded meeting, use my software as desired  Think-aloud  Videotaped, software logs  Also had a post-study questionnaire  Wrote my own code for log analysis  Watched video and matched behavior to software logs

Example materials

Data analysis  Basic data compiled: Time to answer a question (or give up) Number of clicks on each type of item Number of times audio played Length of audio played User’s stated difficulty with task User’s suggestions for improvements  More complicated: Overall patterns of behavior in using the interface User strategies for finding information
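The basic measures above (time per question, clicks per item type) could be compiled from an interaction log along these lines. This is not the study's actual analysis code: the CSV layout, event names, and sample rows are all invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical log format: one event per row.
SAMPLE_LOG = """\
time_s,participant,event,detail
0.0,P1,question_start,Q1
4.2,P1,click,timeline
9.8,P1,click,audio
15.1,P1,click,timeline
31.5,P1,question_end,Q1
40.0,P1,question_start,Q2
46.3,P1,click,notes
58.9,P1,question_end,Q2
"""

def summarize(log_text):
    """Return (clicks per item type, seconds spent per participant and question)."""
    clicks = Counter()
    started = {}
    durations = {}
    for row in csv.DictReader(io.StringIO(log_text)):
        key = (row["participant"], row["detail"])
        t = float(row["time_s"])
        if row["event"] == "click":
            clicks[row["detail"]] += 1
        elif row["event"] == "question_start":
            started[key] = t
        elif row["event"] == "question_end":
            durations[key] = t - started.pop(key)
    return clicks, durations

clicks, durations = summarize(SAMPLE_LOG)
print("clicks per item type:", dict(clicks))
for (participant, question), seconds in durations.items():
    print(f"{participant} {question}: {seconds:.1f} s")
```

Counts like these cover only the "basic data"; overall behavior patterns and user strategies still need the video and think-aloud record.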

Data representation example

Data presentation

Some usability conclusions  Need fast forward and reverse buttons (minor impact)  Audio too slow to load (minor impact)  Target labels are confusing, need something different that shows dynamics (medium impact)  Need more labeling on timeline (medium impact)  Need different place for notes vs. presentations (major impact)

Your turn: assignment  In one week: draft of evaluation plan What are your goals? How will you test each one (basic idea)? Early drafts of any materials (tasks you want people to do, questionnaires, interview questions, etc.)  Part 2 due in 2 weeks!

Your turn: in class  In your project groups Which usability goals are important for you? How might you measure each one?
