User Interface Evaluation
(based on Shneiderman chapter 4)
[ lecture notes | CSC 397 | Pete Sanderson | Computer Science | SMSU ]
Table of Contents
Introduction
Usability goals and objectives
Acceptance testing
Expert reviews
Usability testing and laboratories
Other usability tests
Surveys
Controlled psychologically oriented experiments
Other evaluation methods
Resources
Shneiderman chapter 4
The Elements of User Interface Design, Theo Mandel, Wiley Computer Publishing, 1997.
Usability Professionals' Association
University of Maryland Human-Computer Interaction Laboratory
Introduction
In what ways are UI testing and general SW testing (say, of a function) alike?
- test plans can be developed before the product is built
- exhaustive testing generally not possible
- testing can be conducted by someone other than developer
- can be very expensive
In what ways are UI testing and general SW testing (say, of a function) different?
- UI testing can be conducted in different development phases (e.g., requirements, design, testing, acceptance test)
- in UI testing, there are no absolute correct or incorrect results.
Importance of UI evaluation depends on a number of product-specific factors:
- Number of users
- Variety of users
- Importance of safety and reliability
- Consequences of user error
(related to safety but not same: consequence could be loss of online customer)
- Budget
UI Evaluation (or Usability) Plan should be developed and followed.
Many UI Evaluation Techniques exist (all require people and system and feedback)
- Expert Reviews
- Usability Lab Testing
- Written User Surveys
- Interviews with users
- Focus Groups
- Performance data logging
- Online suggestion box / newsgroup / bulletin board
Usability Goals and Objectives
What is usability? Paul Booth, in An Introduction to Human-Computer Interaction (1989), says:
- Usefulness
: degree to which users can achieve their goals
- Effectiveness
: how well users can perform tasks
- Learnability
: how long to reach defined level of competence
- Attitude
: user perceptions and opinions
One approach is GQM: Goals - Questions - Metrics.
All of these need to be determined before the test:
- Goals
can be stated in terms of the usability definition (above)
- Objectives (e.g. Questions)
must express specific desired results ("X% of users will be able to perform Y within time Z")
- Metrics
are the measurements to be made (time required to perform Y, number of hyperlinks to reach X, any interview/survey question)
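As a concrete sketch of turning such a metric into numbers (the completion times and the 60-second limit here are invented, not from the notes):

```python
# Hypothetical GQM check for task Y.
# Question: what fraction of users finish Y within Z seconds?
# Metric: each participant's completion time, summarized below.

def summarize_times(times, limit):
    """Return (mean completion time, fraction of users finishing within limit)."""
    mean = sum(times) / len(times)
    within = sum(1 for t in times if t <= limit) / len(times)
    return mean, within

# Made-up times (seconds) for five participants, with Z = 60 s.
times = [42, 55, 61, 48, 58]
mean, within = summarize_times(times, limit=60)
```

The two returned numbers answer the question directly: the mean characterizes typical performance, and the fraction is exactly the "X% of users ... within time Z" figure.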
Acceptance Tests
"To some extent, usability is a narrow concern compared to the larger issue of system acceptability, which basically is the question of whether the system is good enough to satisfy all the needs and requirements of the users and other potential stakeholders." Jacob Nielsen, 1993.
- Must establish precise acceptance criteria (not vague).
- "X% of users will be able to perform task Y within time Z while making less than Q errors."
- This is a testable nonfunctional requirement!
- Its purpose is to determine conformance to system requirements, not to find flaws or break the system.
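A criterion in this form is mechanically checkable once test data are collected. A minimal sketch, with hypothetical values X = 80%, Z = 60 seconds, Q = 3 (none of these numbers come from the notes):

```python
def meets_criterion(results, pct, time_limit, max_errors):
    """results: list of (completion_time_seconds, error_count), one per user.
    True if at least `pct` of users finished within `time_limit`
    while making fewer than `max_errors` errors."""
    passed = sum(1 for t, e in results if t <= time_limit and e < max_errors)
    return passed / len(results) >= pct

# Made-up acceptance-test data for task Y.
results = [(50, 1), (58, 0), (64, 2), (45, 4), (52, 2)]
ok = meets_criterion(results, pct=0.80, time_limit=60, max_errors=3)
```

Here only 3 of 5 users meet both conditions (one was too slow, one made too many errors), so the 80% criterion is not met and `ok` is False.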
Expert Reviews
Can be staff member or outside consultant
Can be expert in application domain or in usability (or both)
- If expert in app only, may not be able to detect and describe all shortcomings.
- If expert in usability only, will require training in app before testing
Feedback can be:
- List of problems encountered
- Problems plus suggested solutions (carefully)
Review can occur at several development points
(perhaps after usability test has revealed that problems exist)
Different experts will notice different things (this is good and bad)
Difficult for expert to "simulate" typical user
Some Review Methods:
- Heuristic evaluation
: evaluate according to e.g. 8 Golden Rules
- Guidelines review
: evaluate according to developer guidelines
- Cognitive walkthrough
: simulate users working on tasks
- Formal Inspection
: expert and developer formally debate UI problems
Usability Testing and Laboratories
Gaining in popularity (reduces time and cost risk)
Can run range of formality.
- Formal: designed experiment
- Informal: a few subjects running usability test and reporting problems.
Requires: staff usability experts, participants, laboratory
Staff usability experts:
- Operate the lab
- Assist developer in test planning
- Conduct tests
- Report results
Test participants:
- Representative of system users
- The number of relevant attributes (cognitive/physical) depends on formality of test
- Voluntary (not coerced; participants may still be compensated)
- Should be informed about test and consent in writing
Laboratory:
- Possibly devoted to usability testing
- Equipped for monitoring/observation
- Methods of monitoring: audio/video recorder, software (input device usage)
- Methods of observation: seated beside participant, behind 2-way mirror.
The test itself:
- participant works with system to complete a set of tasks.
- participant is encouraged to "think aloud"
- observer may sit with participant
(only to clarify comments; should not coach)
- Test should be max 3 hours.
- At end, specific questions can be asked and comments given.
- Videotaping provides valuable evidence to the developer, and most participants quickly ignore it.
- Input device events can also be logged by software to provide trace and much valuable productivity information.
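Such a software trace can be as simple as a list of timestamped event records. A minimal sketch (the event names are invented for illustration):

```python
import time

class EventLog:
    """Collect timestamped input events during a usability session."""

    def __init__(self):
        self.events = []                    # list of (seconds_since_start, name)
        self.start = time.monotonic()

    def record(self, name):
        """Append one event with its offset from the session start."""
        self.events.append((time.monotonic() - self.start, name))

    def total_count(self, name):
        """How many times a given event occurred."""
        return sum(1 for _, n in self.events if n == name)

# Hypothetical session: two saves with an undo in between.
log = EventLog()
log.record("click:save-button")
log.record("key:Ctrl+Z")
log.record("click:save-button")
```

From the timestamps an analyst can later derive task durations, pauses, and error-correction patterns (e.g., how often undo follows a given action).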
Limitations to this approach
- Expensive; not all developers can afford it
- Participant doesn't have chance to become acquainted with system, so can't test aspects of system designed for expert user.
__________________________
In-class exercise: from SMSU home page (w/o using FindIt search),
1. find the name of the student body president
2. find the name of Economics dept head.
Don't forget to determine GQM ahead of time so you can quantitatively evaluate.
Other usability tests
- A field test (like a beta release) is one possibility -- tested by actual users!
- A formal experiment pitting the system against its predecessor or a competitor.
CASE IN POINT: a usability study conducted in 1995 by International Data Corp (IDC) comparing Win95, MacOS, and OS/2.
Users performed a set of tasks.
Some findings:
Metric                          | Win95  | MacOS      | OS/2
A: Total time spent             | 58 min | 72 min     | 116 min
B: Completed >= 8 of 10 tasks   | 76%    | 58%        | 31%
C: Of B, did it in < 1 hr       | 85%    | 47%        | Not given
D: Of B, total time spent       | ---    | 22% longer | 51% longer
WHAT QUESTIONS DO YOU HAVE ABOUT THE SURVEY?
Some answers (from Mandel book)
- Who funded the study? Microsoft
- PC memory configuration? Minimum for MacOS; twice the minimum (16 MB) for Win95
- According to Apple response:
- Over 20% of Mac users selected from Microsoft list
- Tests "unfairly highlighted" Win95 features
- Win95 terminology was used
- Did not include system-level functions for which Mac is superior
- One test had users change screen resolution but warned against changing # colors (forces reboot on Win95 but not Mac)
By the way, Apple does this sort of thing too.
- (1996 Apple vs Microsoft at Software Publishers Association convention, live on stage: setup system from box, install peripherals, connect to net, install and deinstall apps. Mac won.)
- (iMac demo you can view at SMSU bookstore.)
Surveys
- Valuable companion to usability testing and expert review.
- "The keys to successful surveys are clear goals in advance and then development of focused items that help to attain those goals."
- Example, for OAI model, could ask opinions of task OA, interface OA, metaphors, syntax
- Also ask questions about user background.
- Online surveys cheaper (save printing and mailing costs)
- Beware of bias in online or mail-in surveys (self-selected sample).
- QUIS - Questionnaire for User Interaction Satisfaction. Table 4.1 in text (p 136). (http://www.lap.umd.edu/QUISFolder/quisHome.html).
- QUIS developed by a multi-disciplinary team of researchers in the Human-Computer Interaction Lab (HCIL) at the University of Maryland at College Park. (B. Shneiderman, director)
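Once responses are in, scoring a rating-scale questionnaire is a matter of summarizing per question. A sketch with made-up ratings on a 1-9 scale (the question labels are illustrative, not actual QUIS items):

```python
from statistics import mean

# Hypothetical survey responses: question -> ratings from five users
# (1 = worst, 9 = best).
responses = {
    "screen layout": [7, 8, 6, 9, 7],
    "terminology":   [5, 4, 6, 5, 5],
    "learning":      [8, 7, 9, 8, 8],
}

# Mean rating per question highlights the weak areas (here, terminology).
averages = {q: mean(r) for q, r in responses.items()}
```

Low-scoring questions point the developer at specific parts of the interface, which is exactly why focused items matter more than general satisfaction questions.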
Controlled Psychologically Oriented Experiments
- Designed scientific experiments can yield dramatic results.
- Are being used more frequently in UI testing.
- They are more expensive than less-formal methods.
- Require training in statistical methods
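To illustrate the statistical machinery such experiments require, here is a Welch t statistic comparing task times under two hypothetical interface designs (the data are invented; a real analysis would compare t against a critical value from the t distribution at a chosen significance level):

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples of possibly
    unequal variance: difference of means over its standard error."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Made-up task-completion times (minutes) for five subjects per design.
design_a = [58, 61, 55, 63, 59]
design_b = [72, 70, 75, 68, 74]
t = welch_t(design_a, design_b)   # large negative t: design A was faster
```

A t this far from zero would let the experimenter reject "no difference between designs" at conventional significance levels, but judging that requires the degrees-of-freedom and threshold machinery the bullet above alludes to.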
Other evaluation methods
1. interviews (expensive)
2. focus-group discussions (to determine areas of common concern)
3. consultation available online or by phone
4. online suggestion box
5. BBS, listserv, newsgroup
6. newsletters and conferences
Last reviewed: 25 September 1998
Peter Sanderson ( pete@csc.smsu.edu )