Room
Seezimmer 3
Thursday, September 13  »  16:00 - 17:15
Symposium 12
Usability testing in product design
Host
Juergen Sauer (University of Fribourg)
Chair
Juergen Sauer (University of Fribourg)
Discussant
Kai-Christoph Hamborg (University of Osnabrück)
Usability tests have been employed in industry for many years with a view to identifying shortcomings and weaknesses in design before a product is launched on the market. Some research shows remarkable inconsistencies across usability tests with regard to the usability problems identified. This raises the question of which factors chiefly influence the outcomes of usability tests. This symposium examines some of the factors that are thought to affect user behaviour in usability tests. The studies presented investigate the impact of the following factors: (a) prototype fidelity (e.g. computer and paper prototypes), (b) the presence of test observers, (c) users' level of expertise (novices vs. experts), and (d) data collection methods (e.g. the thinking aloud technique). Furthermore, the symposium addresses the use of error classification systems to better evaluate the severity of the usability problems identified.
Speakers
Juergen Sauer
The influence of prototype fidelity in usability tests
Authors
Juergen Sauer (University of Fribourg)
Andreas Sonderegger (University of Fribourg)

This paper presents an empirical study that examined the impact of prototype fidelity on user behaviour, subjective user evaluation and emotion. An experiment with a 3 x 2 between-subjects design was carried out, with prototype fidelity (paper prototype, computer prototype, fully operational appliance) and aesthetic design (highly vs. moderately modern) as independent variables. The 60 participants were asked to complete two typical mobile phone tasks (sending a text message and suppressing the phone number). Performance data as well as a range of subjective measures were collected. The results suggested that task completion time may be overestimated when a computer prototype is used. Furthermore, users appeared to compensate for deficiencies in aesthetic design by overrating the aesthetic qualities of reduced-fidelity prototypes. Finally, user emotions were more positively affected by operating the aesthetically pleasing mobile phone than by operating the less appealing one.
Andreas Sonderegger
The influence of laboratory set-up in usability tests
Authors
Andreas Sonderegger (University of Fribourg)
Juergen Sauer (University of Fribourg)

Laboratory-based usability tests play an important role in product and system development. However, the influence of the test situation and the presence of test observers on users and their behaviour has hardly been examined. According to social facilitation theory, the presence of observers may be assumed to increase users' arousal. This in turn may affect performance, in the form of improvements in simple tasks and impairments in complex tasks. This hypothesis was tested in an experiment with a 3 x 2 mixed design. The test situation was used as a between-subjects variable (presence of several observers, presence of one observer, no observer present). As a within-subjects variable, task difficulty was varied at two levels (low vs. high). Sixty participants were asked to complete two tasks on a computer prototype of a mobile phone. Performance data, subjective measures and physiological parameters (heart rate variability) were collected. Initial analyses of the data suggest an increase in arousal when observers are present and a decrement in performance for the difficult task. This implies that the effect of observers should be taken into account in future usability tests.
Katrin Seibel
The influence of expertise and prototype fidelity on product evaluation
Authors
Katrin Seibel (University of Technology Darmstadt (D))
Bruno Ruettinger (University of Technology Darmstadt (D))

Usability tests allow the evaluation of design options in early phases of the design process. The validity of usability tests depends on the degree to which real usage can be modelled in a testing situation. There are some potential limitations concerning user characteristics (e.g. expertise) and prototype fidelity (e.g. functionality, dimensionality). This raises the question of whether usability tests with different user groups and/or prototypes varying in fidelity yield different results and are therefore perhaps not suitable for predicting usability problems in the fully functioning product. This study examines the effects of user expertise (low vs. high) and prototype fidelity (low, medium, high) on performance and subjective evaluation. Experts (N=24) and novices (N=24) were observed interacting with a prototype and then asked to identify usability problems. Because experts have a better cognitive representation of the product, prototype fidelity was expected to have less influence on their performance and problem identification than on those of novices. The results showed no effect of user expertise on performance, but there was evidence for differences in the number and quality of the identified problems. The problems detected by experts focused on the efficiency and functionality of the system, whereas the problems identified by novices mainly concerned ergonomic aspects such as the location of controls and ease of use.
Kai-Christoph Hamborg
Using the thinking aloud method in usability testing – what kind of data do you really get and what are they good for?
Authors
Kai-Christoph Hamborg (University of Osnabrueck (D))
Thinking aloud is considered one of the most valuable usability testing methods. However, it is currently criticised on the grounds that its application is diverse and not well motivated by theory (Boren & Ramey, 2000). Verbalisations comprising inferences about the subjects' own cognition, information retrieved from long-term memory, and opinions do not reliably reflect mental processes because they require additional cognitive processing beyond that required for task performance. If these aspects are not considered, the reliability of results from usability tests based on the thinking aloud method might be questionable. At the same time, it has to be considered that the function of the method in usability tests, unlike in its classic application area, is to evaluate a software product and not to give insight into cognitive processes. Verbal data resulting from two usability tests applying the thinking aloud method were analysed and categorised. Results show that the verbalisations comprise descriptions of the subjects' behaviour, evaluations of the software, strategic statements and statements reflecting how to reach a goal. With regard to the assumptions about the reliability and validity of verbal data, the verbalisations obtained are indeed partly questionable. The findings are discussed with respect to the quality of the usability data collected by means of the thinking aloud method as well as the utility of the method for identifying usability problems.
Sonja Kleinheinz
Development of a User-Centred Error Classification
Authors
Sonja Kleinheinz (University of Technology Darmstadt (D))
Bruno Rüttinger (University of Technology Darmstadt (D))

User-product interaction may lead to errors of different severity. Existing error taxonomies rarely use exact operationalisations to separate error severities. This aspect is included in a user-centred error classification which distinguishes between four severity degrees by means of the two dimensions “action goal reached?” and “superfluous action carried out?”. The four severity degrees are 1. inefficiency, 2. error (controllability and self-descriptiveness error), 3. inefficiency and error and 4. severe error. In inefficiencies the goal is reached but unnecessary actions increase task completion time. Errors include activities that do not lead to an intended goal, either because the wrong function is operated (self-descriptiveness error) or because the right function is used in a wrong way (controllability error). These two error types represent operationalisations of two concepts of ISO 9241-110. The third error severity degree consists of a combination of the first two severity types. In severe errors the goal is not reached and a threat to person and/or product is present. In a pilot study the product interaction of 96 subjects was videotaped and rated. The interrater reliability was between .91 and 1. A quantitative analysis reveals that most errors occurred on level 2, with more controllability than self-descriptiveness errors. The results underline the relevance of the two ISO concepts of controllability and self-descriptiveness and suggest the need for their consideration in product development.
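The decision logic of such a classification can be illustrated with a minimal sketch. The abstract does not spell out how the two dimensions map onto the four degrees, so the mapping below is an assumption made for illustration only: the names `Severity` and `classify` are hypothetical, the `threat` flag is added to capture the "threat to person and/or product" criterion, and a reached goal without superfluous actions is assumed to mean that no error occurred.

```python
from enum import Enum


class Severity(Enum):
    """Severity degrees named in the abstract, plus an assumed 'no error' case."""
    NONE = 0                    # assumption: goal reached, no superfluous action
    INEFFICIENCY = 1
    ERROR = 2
    INEFFICIENCY_AND_ERROR = 3
    SEVERE_ERROR = 4


def classify(goal_reached: bool, superfluous_action: bool,
             threat: bool = False) -> Severity:
    """Map one observed action sequence to a severity degree.

    Hypothetical mapping, not the authors' published rating scheme.
    """
    if goal_reached:
        # Goal reached: at worst an inefficiency (extra actions cost time).
        # The abstract does not cover a threat with a reached goal, so the
        # threat flag is ignored here.
        return Severity.INEFFICIENCY if superfluous_action else Severity.NONE
    if threat:
        # Goal not reached and a threat to person and/or product is present.
        return Severity.SEVERE_ERROR
    # Goal not reached: an error, possibly combined with an inefficiency.
    if superfluous_action:
        return Severity.INEFFICIENCY_AND_ERROR
    return Severity.ERROR


# Example: the goal is missed after unnecessary detours, without any threat.
print(classify(goal_reached=False, superfluous_action=True).name)
```

Degree 2 would additionally need a rater's judgement on whether the wrong function was chosen (self-descriptiveness) or the right function was used wrongly (controllability); that distinction is left out of the sketch.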