The one takeaway, if nothing else: Any usability issue found by an inspection method is considered a false alarm if it is not also found by usability testing.
The article does a thorough job of discussing how usability testing should work. It belabors the obvious a bit at the start (“The Focus is on Usability”, etc.), but has a really solid section discussing the recording and analysis of data. I found the section on triangulation particularly helpful – a real problem should manifest in several ways – along with the section on identifying positive issues. The rationale for reporting positive findings – to ensure that knowledge of what works is documented – changed my perspective on usability testing. Before reading this section, my unconscious assumption was that UX design/testing was a way of providing a “bug-free” experience. In reality, it needs to be much more than that.
The paper also has an extensive section on low- vs. high-fidelity prototypes, which will be useful later on when what we covered in this class becomes fuzzy – which is why I’m keeping a copy of this one. The part on asynchronous remote testing was also interesting. As a user of Eclipse, I’ve seen the request to gather usage data many times. My guess is that this kind of testing has only grown more common since the paper was written.
Single usability metrics: Sauro and Kindlund (2005) and McGee (2003, 2004) reported on options for a single metric they have used successfully. I need to look into this.
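As a placeholder for that follow-up, here is a minimal sketch of one common way to combine several usability measures into a single score: standardize each metric so they share a scale and direction, then average. This is in the spirit of the single-metric work cited above, not necessarily its exact method, and all the numbers are invented illustration data.

```python
from statistics import mean, stdev

# Hypothetical per-participant measurements for one task.
completion = [1.0, 0.8, 1.0, 0.6, 0.9]    # task completion rate (higher = better)
satisfaction = [4.2, 3.8, 4.5, 3.1, 4.0]  # post-task rating, 1-5 (higher = better)
time_s = [95, 120, 88, 160, 101]          # task time in seconds (lower = better)

def zscores(xs, invert=False):
    """Standardize a metric; invert flips direction so higher is always better."""
    m, s = mean(xs), stdev(xs)
    return [(-1 if invert else 1) * (x - m) / s for x in xs]

# Single usability score per participant: the mean of the standardized metrics.
single = [mean(vals) for vals in zip(
    zscores(completion),
    zscores(satisfaction),
    zscores(time_s, invert=True),
)]
print([round(s, 2) for s in single])
```

The appeal is that stakeholders get one number per task or participant, while the underlying metrics remain available for diagnosis.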
Severity ratings. The article spends some time discussing the lack of consistency in the assignment of severity ratings between evaluators. Interestingly, in the current issue of Communications of the ACM, there is an article on student grading in MOOCs. There has apparently been some success with having students first evaluate several common baseline designs and then grade each other’s work; the grades can then be normalized using the baseline evaluations as the reference. As long as students don’t game the system (see the [iterated?] Prisoner’s Dilemma), this could be a promising approach.
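The normalization step above can be sketched as follows. The grader names, scores, and the bias-correction formula are my own illustration, not taken from the CACM article: each grader scores the shared baseline designs, their average deviation from the consensus baseline scores is treated as a bias, and that bias is subtracted from the grades they assign.

```python
from statistics import mean

# Each grader rates the same three baseline designs.
baseline_scores = {
    "grader_a": [70, 80, 90],   # systematically harsh by ~5 points
    "grader_b": [85, 95, 105],  # systematically lenient by ~10 points
}
# Consensus score for each baseline (e.g., instructor-assigned).
consensus = [75, 85, 95]

# A grader's bias is their average deviation from the consensus.
bias = {g: mean(s) - mean(consensus) for g, s in baseline_scores.items()}

def normalize(grader, raw_grade):
    """Correct a peer grade for the grader's measured bias."""
    return raw_grade - bias[grader]

print(normalize("grader_a", 82))  # 82 - (-5) = 87
print(normalize("grader_b", 92))  # 92 - 10 = 82
```

The same shift-based correction would apply to severity ratings: have each evaluator rate a shared set of reference problems first, then adjust their ratings of new problems by their measured offset.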
On thinking aloud. The paper supports this technique very enthusiastically. I’m personally a fan of using it in more loosely structured situations, where patterns emerge rather than being elicited by a detailed script, but the technique is powerful enough to support this wide variety of applications.
The section on testing special populations is enormous and is best treated as a reference – too much to summarize in any meaningful way.
The section on balancing harm with purpose was thought-provoking but, in my opinion, unhelpfully vague. When “harm” includes the stress of a participant struggling with a UI, the term may have lost its meaning; perhaps we need a term for insignificant harm. Struggle is a part of life. Will we have to submit an IRB application before distributing an exam? In many cases, stress is beneficial in the long run while being uncomfortable in the short run. Consider aerobic training: wind sprints are undeniably stressful, but critical for improving performance.
In the Future Directions section, the discussion of the RITE method was quite interesting. To a significant degree, it has been incorporated into the Agile development process and integrates well with iterative development. Ultimately, development may converge toward some kind of collaborative process that includes users from initial concept to rollout – or, perhaps even more interesting, user-driven development.