The book really shows its age here. I think my favorite part was the couple of paragraphs given over to describing how to use a scan converter to grab the monitor output, with the appropriate caveats that (RS-170) video won’t have very good resolution, and so on.
The questions about reliability and validity with respect to testing are as valid as ever. The question is whether the methods described in this chapter are really performed as described any more. It seems that testing is being “crowdsourced” these days. For example, Google is known for trying out variations of its interface on a subgroup of (randomly chosen?) users to see how user behaviors change, using tools such as “Mouse Move Heatmaps”. The analysis of the results can be entirely(?) automated, with a defensible numeric set of values being provided to indicate the usability of a website or page.
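To make the “crowdsourced testing” idea concrete, here is a minimal sketch of how such an experiment might work. The function names and the hash-based bucketing scheme are my own illustration (not anything Google has documented): each user is deterministically assigned a variant, and a simple per-variant click-through rate stands in for the automated analysis.

```python
import hashlib

def assign_variant(user_id: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user into an interface variant.

    Hashing the user ID means the same user always sees the same
    variant, and over many users the split is roughly uniform.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def click_through_rate(events):
    """events: iterable of (variant, clicked) pairs.

    Returns the fraction of impressions that were clicked, per variant —
    a stand-in for the 'defensible numeric values' mentioned above.
    """
    counts = {}  # variant -> (impressions, clicks)
    for variant, clicked in events:
        shown, clicks = counts.get(variant, (0, 0))
        counts[variant] = (shown + 1, clicks + int(clicked))
    return {v: clicks / shown for v, (shown, clicks) in counts.items()}
```

With enough traffic, comparing these per-variant rates is exactly the kind of analysis that needs no lab and no observer at all.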
- People are exposed to many ideas of what makes for a good/nice/easy GUI, which I would think makes users much more tolerant of a heterogeneous visual environment
- With the cost to access software being so low, there is a strong market component to the evolution of software. It’s easy to find examples of what’s successful and copy it.
- Anyone who wants to can have reasonably sophisticated analytics on their website/app/application. You can have a pretty good idea if people are using your product the way you intended
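The last bullet — knowing whether people use your product the way you intended — is essentially a funnel analysis over an event log. A minimal sketch (the event names and log format are hypothetical) of how analytics can answer that question:

```python
from collections import defaultdict

def funnel_completion(events, funnel):
    """events: (user_id, event_name) pairs in time order.

    Returns, for each step of `funnel`, the fraction of users who got
    that far in order — i.e., how many used the product as intended.
    """
    progress = defaultdict(int)  # user -> index of next expected step
    for user, event in events:
        step = progress[user]
        if step < len(funnel) and event == funnel[step]:
            progress[user] = step + 1
    total = len(progress)
    return [sum(1 for s in progress.values() if s > i) / total
            for i in range(len(funnel))]
```

A steep drop-off at one step is the analytics equivalent of watching a lab participant get stuck — except it comes from the entire user base.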
I would contend that the conditions above are probably both more valid and more reliable than almost anything that can come out of a classic usability lab. This may explain why the only things I see regularly being studied in labs are user interfaces that are dramatically different from anything in use: if an interface is different enough, then there is no user base, and the need to pull meaningful information out of a small sample becomes critical to the initial development of the technology, or at least to how it is embraced by users.
With that in mind, I would contend that the “Thinking Aloud” method may actually be the best method for obtaining this information. Its very generality and lack of constraint mean that the structure of the test is least likely to influence the outcome. Issues can emerge; if they emerge consistently enough, they can be addressed to see whether they continue to emerge. Over time, though, more data is gathered with the interface, which means that analytics can start to contribute. At some point a handoff occurs, and we’re back to analytics. I’d be curious to see how this process worked with the introduction of tablets and multi-touch interfaces.