Questionnaires
  contain closed questions (attitude scales) and open questions
  pre- and post-questionnaires obtain ratings on an issue before and after a design change
  can be used to standardise attitude measurement of single subjects following direct observation
  can be used to survey large user groups
Questionnaires are often badly designed, as they are perceived as being trivial.
Types of rating scales
A simple checklist
Can you use the following edit commands?
              yes    no    don't know
  duplicate   [ ]    [ ]   [ ]
  paste       [ ]    [ ]   [ ]
Multipoint checklist
Rate the usefulness of the duplicate command on the following scale.
  The scale runs from 'very useful' at one end to 'of no use' at the other.
Likert scale
  statement of opinion to which the subject expresses their level of agreement
Computers can simplify complex problems
  very much agree | agree | slightly agree | neutral | slightly disagree | disagree | strongly disagree
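Responses on a scale like this are usually coded as numbers before analysis. The sketch below is one possible way to do that, not a prescribed method: the 7-to-1 coding, the function names and the sample responses are all assumptions made for illustration. Because the scale is ordinal, it reports the median and the count per category rather than a mean.

```python
from statistics import median

# Assumed coding of the seven response categories on the scale above:
# 7 = strongest agreement, 1 = strongest disagreement.
LIKERT_CODES = {
    "very much agree": 7,
    "agree": 6,
    "slightly agree": 5,
    "neutral": 4,
    "slightly disagree": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def code_responses(responses):
    """Convert response labels to numeric codes, ignoring case and padding."""
    return [LIKERT_CODES[r.strip().lower()] for r in responses]


def summarise(responses):
    """Return the median code and the number of responses for each code."""
    codes = code_responses(responses)
    counts = {code: codes.count(code) for code in sorted(LIKERT_CODES.values())}
    return median(codes), counts


# Made-up responses from five subjects, for illustration only.
sample = ["agree", "slightly agree", "neutral", "agree", "strongly disagree"]
sample_median, sample_counts = summarise(sample)
print("median code:", sample_median)
print("responses per code:", sample_counts)
```

A label that is not on the scale raises a KeyError, which is a simple way of catching data entry mistakes early.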
Caution! What does 'strongly disagree' mean?
The help facility in system A is much better than the help facility in system B
  very much agree | agree | slightly agree | neutral | slightly disagree | disagree | strongly disagree
The response ‘very much agree’ is clear: A is much better than B.
‘Strongly disagree’ could mean ‘I think B is much better than A’ …
but it could also mean ‘I think there is no difference between A and B, and so I strongly disagree with the opinion stated in the question’.
Semantic differential scale
  uses a series of bipolar adjectives and obtains ratings with respect to each
Rate the Beauxarts drawing package on the following dimensions
          extremely  quite  slightly  neutral  slightly  quite  extremely
  easy                                                                      difficult
  clear                                                                     confusing
  fun                                                                       dreary
This type of question needs to be followed by an open-ended question where the user can explain any negative responses they have given. Simply knowing that the package is ‘extremely difficult’, without knowing why, is of limited value.
Rank order
Place the following commands in order of usefulness (rank the most useful as 1, the least useful as 4)
  paste
  duplicate
  group
  clear
This question lacks task context, i.e. ‘usefulness’ for what? This doesn’t matter if the question is asked after the user has completed a specific task, say as part of a user trial. The question would be fairly meaningless if it formed part of a general survey of user opinion of the interface to a word processing package, for example. In that case, there are many document preparation tasks that the word processor could be used for, and the usefulness of each command would depend on which tasks were being considered.
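Rank-order responses from several subjects can be combined by averaging the rank each command receives. The sketch below assumes this mean-rank approach, which is only one of several ways to aggregate rankings; the function name and the three sets of responses are invented for illustration.

```python
from statistics import mean


def mean_ranks(rankings):
    """Combine rank-order responses from several subjects.

    rankings: list of dicts mapping command name -> rank (1 = most useful).
    Returns (command, mean rank) pairs, most useful first.
    """
    commands = rankings[0].keys()
    expected = list(range(1, len(commands) + 1))
    for response in rankings:
        # Every subject must rank every command, using each rank exactly once.
        if response.keys() != commands or sorted(response.values()) != expected:
            raise ValueError(f"incomplete or tied ranking: {response}")
    averages = {c: mean(r[c] for r in rankings) for c in commands}
    return sorted(averages.items(), key=lambda item: item[1])


# Made-up responses from three subjects, for illustration only.
responses = [
    {"paste": 1, "duplicate": 2, "group": 3, "clear": 4},
    {"paste": 2, "duplicate": 1, "group": 4, "clear": 3},
    {"paste": 1, "duplicate": 3, "group": 2, "clear": 4},
]
overall = mean_ranks(responses)
for command, average in overall:
    print(f"{command}: mean rank {average:.2f}")
```

The validation step rejects responses with missing commands or tied ranks, since either would make the averages hard to interpret.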
Dos and Don'ts with questionnaire evaluation
  do be clear about the information you want to obtain
  don't risk subjects becoming demotivated
  don't be lazy
  do provide specific task reference for questions
  don’t assume that responses will be positive
  do pilot the questionnaire first
Be clear about the information: have a clear idea of what specifically you want information about, and ensure there are questions that directly address these issues.
Don't risk subjects becoming demotivated:
  not interested in the questionnaire
  questionnaire is too long
Don't be lazy:
  focus questions on the specific interface; make sure that all questions apply to the interface being evaluated
  avoid 'not applicable' responses
Task reference: if questions ask for opinions about particular details of the use of the interface, ensure that the task context is clear.
Don't assume positive responses: although you may think that the interface is very good, the questionnaire has to be objective and allow for as many negative comments as positive. Ensure that there is as much opportunity for users to justify negative attitudes as positive ones.
Planning and logistics of questionnaire design
  Quantitative or qualitative?
  Legal requirements: the Data Protection Act
  Confidentiality and anonymity
  Sample size
  Volunteer respondents
  Identifying subject areas
  Determining appropriate length
  Typical time scale
  Main components of questionnaires
Content of items
  Avoiding response set
  Components of attitudes
  Common types of faulty items
    leading questions
    context effects
    double-barrelled questions
    vague and ambiguous terminology
    hidden assumptions
    social desirability
Leading questions and context effects
Would you agree that the government's policies on health are unfair?
  Item wordings should not contain value judgements
How many pints of beer did you drink last night?
  Think how the context of the study would affect the response, say in a
    survey of young people's lifestyles
    survey of health behaviour and heart disease
Double-barrelled questions
Do you believe the training programme was a good one and effective in teaching you new skills?
  avoid questions that involve multiple premises
Vague and ambiguous terminology
How often do you clean your teeth?
  frequently | often | infrequently | never
  What does ‘frequently’ mean?
  Give quantifiers to ensure all respondents understand the same thing by the response categories
Hidden assumptions, social desirability
When did you last borrow a video tape?
  Avoid hidden assumptions - what are these?
Do you ever give to charity?
  May lead to a positive response, as otherwise something negative about the respondent is being conveyed
User diaries
  Used with early releases of complete systems
  People use the system as part of their normal work and keep a log of the tasks they have used the system for and whether or not they were successful in using the system
  Has the advantage of using real tasks, not contrived standard tasks
  Requires that the system is capable of supporting enough tasks to be useful to the person doing the evaluation
  Requires input from the evaluator to maintain the person’s motivation to keep using the diary
Observation and monitoring usage
  User trials
  direct and indirect observation
  verbal protocols
Collecting user opinions
  User diaries over period of extended use
  Surveys
Software logging
User trials - duration
  Aimed at observing people who are typical of the intended user group using the interface (often a prototype)
  People are usually volunteers, which limits the time available for each trial: a serious constraint on what can be done
  How much of your time would you give to help someone test a piece of software?
  Assume a total trial length of 30 to 45 minutes; this has to include introduction, demonstration, data collection and de-brief
  If subjects are paid then longer trials are possible
User trials: structured tasks
  One approach is to give subjects a series of standard tasks to complete using a prototype
  observe subject completing tasks under standardised conditions
  data collection aimed at ensuring that qualitative descriptions of problems during task completion are captured
  Intention is to see whether different people encounter similar problems when using the interface
  What problems are likely to arise in data recording?
Standard tasks in user trials
  structure tasks into incremental difficulty (easy ones first)
  have a clear policy on subjects becoming stuck and on providing help
  have a reason for including each task (avoid unnecessary duplication)
  ensure (all) functional areas of interface usage are covered
  ensure tasks of sufficient complexity are included
Example of standard tasks
  ‘Find the time of the latest train service leaving Leicester that I can take next Tuesday to arrive in Dundee before 8.00 pm’
  ‘Find the cost of a return ticket for 2 adults and 2 children for the journey from Leicester to Bristol with no discounts such as saver or supersaver’
  ‘Find how many copies of Preece ‘Human-Computer Interaction’ the library currently holds’
Note: each task has a definite end point. The user can provide the answer to the question, which is either correct or incorrect.
Unstructured user trials
  Another approach to user trials is to ask the user to browse through information; this is more appropriate for web sites or multimedia presentations
  Browsing behaviour is directed by the user’s interest, rather than the user being asked to retrieve a specific piece of information
  No guarantee that the subject visits all parts of the application or site; how much of the site they visit is often useful information in itself
  Requires that the subject is actually interested in the application or site
Indirect observation - video
  enables post-session debriefing 'talk-through' (post-event protocols)
  enables quantitative data to be extracted, e.g. part task timings
  serves as a diary and visual record of problems
  usually very time consuming to analyse
  usability laboratories: facilities to administer standard tasks, record data and analyse these
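Part task timings can be worked out from the timestamps noted while reviewing a recording. The sketch below assumes the evaluator has written down a start and end time, as "mm:ss" on the video counter, for each part of a task; the part task names, the times and the function names are hypothetical and only illustrate the arithmetic.

```python
def to_seconds(stamp):
    """Convert a video counter reading such as '03:45' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def part_task_timings(observations):
    """Return the duration in seconds of each observed part task.

    observations: list of (part task name, start 'mm:ss', end 'mm:ss').
    """
    timings = {}
    for name, start, end in observations:
        duration = to_seconds(end) - to_seconds(start)
        if duration < 0:
            raise ValueError(f"end before start for part task: {name}")
        timings[name] = duration
    return timings


# Hypothetical notes from one recorded session, for illustration only.
session = [
    ("enter journey details", "02:10", "03:45"),
    ("choose a service", "03:45", "05:02"),
    ("read off the fare", "05:02", "05:30"),
]
timings = part_task_timings(session)
for name, seconds in timings.items():
    print(f"{name}: {seconds} s")
print("total:", sum(timings.values()), "s")
```

Noting the times by hand is what makes video analysis slow; the calculation itself is trivial once the timestamps exist.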
Verbal protocols
  means of enhancing direct observations
  user articulates what they are thinking during task completion (think-aloud protocols)
  but…
    doing this can alter normal behaviour
    subject likely to stop when undertaking complex cognitive activities
    user may rationalise behaviour in post-event protocols
  get subjects working in pairs - co-discovery can overcome some of these problems
Think about driving a car: when the task of driving is not demanding, the driver can normally hold a conversation with a passenger. As soon as something happens which demands the driver's attention, conversation automatically stops while the driver attends to the driving task. The conversation is resumed when the driving situation has passed.
Post-event protocols occur when a user, say, watches a recording of an interaction session and talks through what they were thinking during the session. This needs to take place immediately after the session. A variation on this is where the investigator selects parts of the recorded session which appear to have caused the user problems, and the reasons for the apparent difficulty are talked through. Post-event analysis can add considerable time to an evaluation session.