How to Design an AI that can Perform Usability Testing

The future of fallibility

· usability testing, AI, card sorting

Usability testing is great, but by far the hardest part - and the part which most amateur researchers underestimate in terms of time, expense and complexity - is finding and recruiting representative users.

But what if we didn't need to? What if we could just turn some Artificial Intelligence loose on our designs, and wait for the results to pour in?

In fact, it's happening today with self-driving cars:

Navigating even a relatively simple network of roads populated by real people creates enormous usability issues - for the AI. And in this case, it’s the AI’s “mental model” that is being reprogrammed, rather than the system itself being redesigned.

But if the purpose of usability testing is to refine the system’s behavior, rather than the users’, then you have a slightly different task on your hands.

Avoid Unnecessary Complexity

I think we are getting closer to the point where a sufficiently complex AI could uncover general usability issues. And, in fact, even an insufficiently complex AI could be helpful.

Usability issues generally occur when the system designers’ understanding of how a system should work differs from users’ mental models of that system, and their resulting expectations.

So an AI that mapped a set of expectations purposefully mismatched against the designers’ expectations or intentions might help to uncover usability issues with a design.

Of course, if you could map those different mental models, why not just design the system to match them?

Here’s a simple example: take an information architecture of a relatively large website in English, represented by cards in a card sort. Have an AI “perform” that card sort, but using a model of the English language that was representative of someone for whom English was a second language. (By which I mean, a model which “knew” the primary definition of most English words, but perhaps not their secondary meanings.)
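Here’s a minimal sketch of what that might look like in Python. The PRIMARY_SENSE lexicon is a hypothetical stand-in for a real lexical resource (say, the first sense listed for each word in a dictionary), and the grouping logic is deliberately naive:

```python
from collections import defaultdict

# Hypothetical lexicon: each word maps only to its PRIMARY sense,
# the way a second-language speaker might know it.
PRIMARY_SENSE = {
    "table": "furniture",           # not "a table of fees"
    "board": "plank",               # not "board of directors"
    "returns": "going back",        # not "product returns"
    "orders": "commands",           # not "purchase orders"
    "checkout": "leaving a hotel",  # not "the payment step"
}

def card_sort(cards: list[str]) -> dict[str, list[str]]:
    """Group card labels by the primary sense of the first recognized word."""
    groups = defaultdict(list)
    for card in cards:
        sense = next(
            (PRIMARY_SENSE[w] for w in card.lower().split() if w in PRIMARY_SENSE),
            "unknown",
        )
        groups[sense].append(card)
    return dict(groups)

print(card_sort(["Returns", "Orders", "Checkout", "Board members", "Table of fees"]))
# {'going back': ['Returns'], 'commands': ['Orders'],
#  'leaving a hotel': ['Checkout'], 'plank': ['Board members'],
#  'furniture': ['Table of fees']}
```

Every one of those groupings is “wrong” from the designer’s point of view - which is exactly the point.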

The resulting card sort would likely surface some unexpected pairings and groupings, based on the AI’s divergent understanding of the terms involved.

Admittedly, this would be fairly rudimentary testing; I think it would mostly point out the obvious: complex and ambiguous terminology should be avoided in creating a navigational hierarchy.

But you could also flip this around, and apply the “knowledge” of a “native English-speaking” AI to a navigational structure designed by a non-native English speaker. Such testing could help uncover ambiguities that aren’t immediately obvious.
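A sketch of that reverse test, under the same caveats: the SENSES lexicon below is a hypothetical stand-in for a fuller, “native speaker” model of the language.

```python
# Hypothetical multi-sense lexicon standing in for a "native speaker" model.
SENSES = {
    "board": ["plank", "committee", "to get on a vehicle"],
    "returns": ["goes back", "sends a product back", "investment gains"],
    "menu": ["list of dishes", "list of commands"],
    "about": ["concerning"],
}

def flag_ambiguous(labels: list[str]) -> dict[str, list[str]]:
    """Report each navigation label whose words carry more than one known sense."""
    flagged = {}
    for label in labels:
        for word in label.lower().split():
            if len(SENSES.get(word, [])) > 1:
                flagged[label] = SENSES[word]
    return flagged

print(flag_ambiguous(["Board", "Returns", "Site menu", "About"]))
# Flags "Board", "Returns" and "Site menu" with their competing readings;
# "About" passes clean, since it only has one known sense.
```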

For truly general-purpose usability testing to be performed by an AI, you’re going to need one which can replicate human fallibility - and not just random errors, but purposeful actions which deviate from expected norms.

In short: you’re going to need a self-driving car that occasionally decides to run a red light because it’s in a hurry.
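As a closing toy sketch of what “purposeful” deviation might look like (the hurry parameter and the “(optional)” tagging here are pure assumptions for illustration):

```python
import random

def simulate_user(steps: list[str], hurry: float = 0.7) -> list[str]:
    """Walk an expected task flow as a user in a hurry might: deviations
    are goal-driven (skipping steps judged unnecessary), not random noise."""
    taken = []
    for step in steps:
        # A hurried user doesn't err uniformly at random: they deliberately
        # skip the steps that seem irrelevant to their immediate goal.
        if step.startswith("(optional)") and random.random() < hurry:
            continue
        taken.append(step)
    return taken

print(simulate_user([
    "enter shipping address",
    "(optional) review order summary",
    "(optional) add gift wrapping",
    "confirm payment",
]))
```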