User Interface Evaluation
(based on Shneiderman chapter 4)
[ lecture notes | CSC 397 | Pete Sanderson | Computer Science | SMSU ]
Table of Contents
Introduction
Usability goals and objectives
Acceptance testing
Expert reviews
Usability testing and laboratories
Other usability tests
Surveys
Controlled psychologically oriented experiments
Other evaluation methods
Resources
Shneiderman chapter 4
The Elements of User Interface Design, Theo Mandel, Wiley Computer Publishing, 1997.
Usability Professionals' Association
University of Maryland Human-Computer Interaction Laboratory
Introduction
In what ways are UI testing and general SW testing (say, of a function) alike?
- test plans can be developed before the product is built
- exhaustive testing generally not possible
- testing can be conducted by someone other than developer
- can be very expensive
In what ways are UI testing and general SW testing (say, of a function) different?
- UI testing can be conducted in different development phases (e.g., requirements, design, testing, acceptance test)
- in UI testing, there are no absolute correct or incorrect results.
Importance of UI evaluation depends on a number of product-specific factors:
- Number of users
- Variety of users
- Importance of safety and reliability
- Consequences of user error
(related to safety but not same: consequence could be loss of online customer)
- Budget
UI Evaluation (or Usability) Plan should be developed and followed.
Many UI Evaluation Techniques exist (all require people and system and feedback)
- Expert Reviews
- Usability Lab Testing
- Written User Surveys
- Interviews with users
- Focus Groups
- Performance data logging
- Online suggestion box / newsgroup / bulletin board
Usability Goals and Objectives
What is usability? Paul Booth, in An Introduction to Human-Computer Interaction (1989), says:
- Usefulness
: degree to which users can achieve their goals
- Effectiveness
: how well users can perform tasks
- Learnability
: how long to reach defined level of competence
- Attitude
: user perceptions and opinions
One approach is GQM: Goals - Questions - Metrics.
All of these need to be determined before the test:
- Goals
can be stated in terms of the usability definition (above)
- Objectives (e.g. Questions)
must express specific desired results ("X% of users will be able to perform Y within time Z")
- Metrics
are the measurements to be made (time required to perform Y, number of hyperlinks to reach X, any interview/survey question)
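As a concrete sketch of turning such a metric into numbers (the completion times and the 60-second limit here are invented, not from the notes):

```python
# Hypothetical GQM check for task Y.
# Question: what fraction of users finish Y within Z seconds?
# Metric: each participant's completion time, summarized below.

def summarize_times(times, limit):
    """Return (mean completion time, fraction of users finishing within limit)."""
    mean = sum(times) / len(times)
    within = sum(1 for t in times if t <= limit) / len(times)
    return mean, within

# Made-up times (seconds) for five participants, with Z = 60 s.
times = [42, 55, 61, 48, 58]
mean, within = summarize_times(times, limit=60)
```

The two returned numbers answer the question directly: the mean characterizes typical performance, and the fraction is exactly the "X% of users ... within time Z" figure.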
Acceptance Tests
"To some extent, usability is a narrow concern compared to the larger issue of system acceptability, which basically is the question of whether the system is good enough to satisfy all the needs and requirements of the users and other potential stakeholders." Jacob Nielsen, 1993.
- Must establish precise acceptance criteria (not vague).
- "X% of users will be able to perform task Y within time Z while making less than Q errors."
- This is a testable nonfunctional requirement!
- Its purpose is to determine conformance to system requirements, not to find flaws or break the system.
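A criterion in this form is mechanically checkable once test data are collected. A minimal sketch, with hypothetical values X = 80%, Z = 60 seconds, Q = 3 (none of these numbers come from the notes):

```python
def meets_criterion(results, pct, time_limit, max_errors):
    """results: list of (completion_time_seconds, error_count), one per user.
    True if at least `pct` of users finished within `time_limit`
    while making fewer than `max_errors` errors."""
    passed = sum(1 for t, e in results if t <= time_limit and e < max_errors)
    return passed / len(results) >= pct

# Made-up acceptance-test data for task Y.
results = [(50, 1), (58, 0), (64, 2), (45, 4), (52, 2)]
ok = meets_criterion(results, pct=0.80, time_limit=60, max_errors=3)
```

Here only 3 of 5 users meet both conditions (one was too slow, one made too many errors), so the 80% criterion is not met and `ok` is False.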
Expert Reviews
Can be staff member or outside consultant
Can be expert in application domain or in usability (or both)
- If expert in app only, may not be able to detect and describe all shortcomings.
- If expert in usability only, will require training in app before testing
Feedback can be:
- List of problems encountered
- Problems plus suggested solutions (carefully)
Review can occur at several development points
(perhaps after usability test has revealed that problems exist)
Different experts will notice different things (this is good and bad)
Difficult for expert to "simulate" typical user
Some Review Methods:
- Heuristic evaluation
: evaluate according to e.g. 8 Golden Rules
- Guidelines review
: evaluate according to developer guidelines
- Cognitive walkthrough
: simulate users working on tasks
- Formal Inspection
: expert and developer formally debate UI problems
Usability Testing and Laboratories
Gaining in popularity (reduces time and cost risk)
Can run range of formality.
- Formal: designed experiment
- Informal: a few subjects running usability test and reporting problems.
Requires: staff usability experts, participants, laboratory
Staff usability experts:
- Operate the lab
- Assist developer in test planning
- Conduct tests
- Report results
Test participants:
- Representative of system users
- The number of relevant attributes (cognitive/physical) depends on formality of test
- Voluntary (not coerced; participants may still be compensated)
- Should be informed about test and consent in writing
Laboratory:
- Possibly devoted to usability testing
- Equipped for monitoring/observation
- Methods of monitoring: audio/video recorder, software (input device usage)
- Methods of observation: seated beside participant, behind 2-way mirror.
The test itself:
- participant works with system to complete a set of tasks.
- participant is encouraged to "think aloud"
- observer may sit with participant
(only to clarify comments; should not coach)
- Test should be max 3 hours.
- At end, specific questions can be asked and comments given.
- Videotaping provides valuable evidence to the developer, and most participants quickly ignore it.
- Input device events can also be logged by software to provide trace and much valuable productivity information.
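Such a software trace can be as simple as a list of timestamped event records. A minimal sketch (the event names are invented for illustration):

```python
import time

class EventLog:
    """Collect timestamped input events during a usability session."""

    def __init__(self):
        self.events = []                    # list of (seconds_since_start, name)
        self.start = time.monotonic()

    def record(self, name):
        """Append one event with its offset from the session start."""
        self.events.append((time.monotonic() - self.start, name))

    def total_count(self, name):
        """How many times a given event occurred."""
        return sum(1 for _, n in self.events if n == name)

# Hypothetical session: two saves with an undo in between.
log = EventLog()
log.record("click:save-button")
log.record("key:Ctrl+Z")
log.record("click:save-button")
```

From the timestamps an analyst can later derive task durations, pauses, and error-correction patterns (e.g., how often undo follows a given action).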
Limitations to this approach
- Expensive; not all developers can afford it
- Participant doesn't have chance to become acquainted with system, so can't test aspects of system designed for expert user.
__________________________
In-class exercise: from SMSU home page (w/o using FindIt search),
1. find the name of the student body president
2. find the name of Economics dept head.
Don't forget to determine GQM ahead of time so you can quantitatively evaluate.
Other usability tests
- A field test (like a beta release) is one possibility -- tested by actual users!
- A formal experiment pitting the system against its predecessor or a competitor.
CASE IN POINT: a usability study conducted in 1995 by International Data Corp (IDC) comparing Win95, MacOS, and OS/2.
Users performed a set of tasks.
Some findings:
Metric                          | Win95  | MacOS      | OS/2
A: Total time spent             | 58 min | 72 min     | 116 min
B: Completed >= 8 of 10 tasks   | 76%    | 58%        | 31%
C: Of B, did it in < 1 hr       | 85%    | 47%        | Not given
D: Of B, total time spent       | ---    | 22% longer | 51% longer
WHAT QUESTIONS DO YOU HAVE ABOUT THE SURVEY?
Some answers (from Mandel book)
- Who funded the study? Microsoft
- PC memory configuration? Minimum for MacOS; twice the minimum (16 MB) for Win95
- According to Apple response:
- Over 20% of Mac users selected from Microsoft list
- Tests "unfairly highlighted" Win95 features
- Win95 terminology was used
- Did not include system-level functions for which Mac is superior
- One test had users change screen resolution but warned against changing # colors (forces reboot on Win95 but not Mac)
By the way, Apple does this sort of thing too.
- (1996 Apple vs Microsoft at Software Publishers Association convention, live on stage: setup system from box, install peripherals, connect to net, install and deinstall apps. Mac won.)
- (iMac demo you can view at SMSU bookstore.)
Surveys
- Valuable companion to usability testing and expert review.
- "The keys to successful surveys are clear goals in advance and then development of focused items that help to attain those goals."
- Example, for OAI model, could ask opinions of task OA, interface OA, metaphors, syntax
- Also ask questions about user background.
- Online surveys cheaper (save printing and mailing costs)
- Beware of bias in online or mail-in surveys (self-selected sample).
- QUIS - Questionnaire for User Interaction Satisfaction. Table 4.1 in text (p 136). (http://www.lap.umd.edu/QUISFolder/quisHome.html).
- QUIS developed by a multi-disciplinary team of researchers in the Human-Computer Interaction Lab (HCIL) at the University of Maryland at College Park. (B. Shneiderman, director)
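Once responses are in, scoring a rating-scale questionnaire is a matter of summarizing per question. A sketch with made-up ratings on a 1-9 scale (the question labels are illustrative, not actual QUIS items):

```python
from statistics import mean

# Hypothetical survey responses: question -> ratings from five users
# (1 = worst, 9 = best).
responses = {
    "screen layout": [7, 8, 6, 9, 7],
    "terminology":   [5, 4, 6, 5, 5],
    "learning":      [8, 7, 9, 8, 8],
}

# Mean rating per question highlights the weak areas (here, terminology).
averages = {q: mean(r) for q, r in responses.items()}
```

Low-scoring questions point the developer at specific parts of the interface, which is exactly why focused items matter more than general satisfaction questions.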
Controlled Psychologically Oriented Experiments
- Designed scientific experiments can yield dramatic results.
- Are being used more frequently in UI testing.
- They are more expensive than less-formal methods.
- Require training in statistical methods
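To illustrate the statistical machinery such experiments require, here is a Welch t statistic comparing task times under two hypothetical interface designs (the data are invented; a real analysis would compare t against a critical value from the t distribution at a chosen significance level):

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples of possibly
    unequal variance: difference of means over its standard error."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Made-up task-completion times (minutes) for five subjects per design.
design_a = [58, 61, 55, 63, 59]
design_b = [72, 70, 75, 68, 74]
t = welch_t(design_a, design_b)   # large negative t: design A was faster
```

A t this far from zero would let the experimenter reject "no difference between designs" at conventional significance levels, but judging that requires the degrees-of-freedom and threshold machinery the bullet above alludes to.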
Other evaluation methods
1. interviews (expensive)
2. focus-group discussions (to determine areas of common concern)
3. consultation available online or by phone
4. online suggestion box
5. BBS, listserv, newsgroup
6. newsletters and conferences
Last reviewed: 25 September 1998
Peter Sanderson ( pete@csc.smsu.edu )