The Five User Problem

Natan Voitenkov

Apr 24, 2024

10 min read

The notion that we should only interview ~5 users per study is attributed to Jakob Nielsen circa 2000, but the idea originally came out of a 1993 paper by Nielsen and Landauer titled “A mathematical model of the finding of usability problems”. Across eleven studies, they found that “the detection of usability problems as a function of the number of users tested or heuristic evaluators employed is well modeled as a Poisson process”, which is just a fancy way of saying there are diminishing returns once you surpass a certain number of usability tests.

Interestingly, the original number in the article isn’t five. It’s fifteen: the mathematical model shows that “you need to test with at least fifteen users to discover all the usability problems in the design.” However, Nielsen recommended an iterative approach instead, probing the usability of a website several times (ideally three), with five users tested at each iteration. This supposedly gets you the best result, and it is how the famed five-user usability test was born.
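To make the diminishing-returns claim concrete, here is a minimal sketch of the discovery curve behind these numbers, assuming the found(n) = N(1 − (1 − λ)^n) form of the model from Nielsen and Landauer, with the λ ≈ 31% average per-user discovery rate Nielsen later cited (actual rates varied considerably across the eleven studies):

```python
# A minimal sketch of the problem-discovery model from Nielsen and
# Landauer: found(n) = N * (1 - (1 - lam)**n), where lam is the
# probability that a single test user surfaces any given usability
# problem. lam ~= 0.31 is the average Nielsen reported; real projects
# varied considerably around it.

def fraction_found(n_users: int, lam: float = 0.31) -> float:
    """Expected fraction of all usability problems found after n_users tests."""
    return 1 - (1 - lam) ** n_users

for n in (1, 5, 15):
    print(f"{n:>2} users -> {fraction_found(n):.1%} of problems found")

# Output:
#  1 users -> 31.0% of problems found
#  5 users -> 84.4% of problems found
# 15 users -> 99.6% of problems found
```

Five users get you most of the way up the curve, which is exactly the diminishing-returns argument. The catch, as we’ll see, is that the model assumes a single, homogeneous population of comparable users.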

A few years after the original 1993 paper by Nielsen and Landauer, Rolf Molich led and published far less famous work. In it, Molich et al. conducted a comparative evaluation of usability tests across four labs, each testing the usability of the same calendar program. Collectively, the labs found 162 usability problems, but only 13 of the issues were found by more than one team. There was very little overlap between team findings, and the usability issues found by each team were not reproduced by the others. This work was an early instance of pushback against the notion of testing with only five users; however, it was mostly overlooked.

But a lot has changed since the 1990s. As Tim Maughan argued in his excellent article, “The Modern World Has Finally Become Too Complex for Any of Us to Understand.” Software has become exponentially more complex, and creating usable, not to mention delightful, user experiences has become increasingly challenging.

There are several reasons for the exponentially increased complexity of the systems we build:

  1. Technological Advancements: Software development tools and capabilities have become much more sophisticated.

  2. Increased User Expectations: Users today expect more functionality and personalized experiences.

  3. Competition: Companies try to differentiate themselves by adding features and services.

  4. Expanding Business Models: Websites now offer a much wider range of products, services, and content than they used to, to a much wider range of people.

Take news websites, such as CNN, as an example. In the early 2000s, news websites were fairly straightforward: mostly text articles, perhaps a few basic images. Fast-forward to today, and you'll find video integration, live updates, interactive graphics, social media feeds, topic hubs, and lots of advertising. It’s a significantly more complex user experience.

And why does this matter? Well, Nielsen wrote back in 2000 that “Elaborate usability tests are a waste of resources”, and it’s possible that back then, he made a solid point. Back when experiences weren’t elaborate, testing them did not require an elaborate approach. But this premise no longer holds, and now that a human generation and several technological eras have passed, maybe it’s time to re-evaluate how we research and test experiences?

Well, fast-forward 23 years or so, and two researchers did. In a study by Felix Chopra and Ingar Haaland, they demonstrated that when you take the insights from ~400 participants interviewed by AI and ask whether random subsets uncover the same patterns, the answer is a resounding NO. The co-occurrence of codes (which translate into insights) across subsets is low, meaning there’s clear value in conducting large-sample qualitative studies.
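To make that check concrete, here is a hypothetical sketch of a subsample-overlap analysis on simulated data. The five-interview subset size is chosen to echo the five-user rule; this is not Chopra and Haaland’s actual analysis, code, or data:

```python
# Hypothetical sketch of a subsample-overlap check: draw pairs of small
# random subsets from a pool of coded interviews and measure how much
# the code sets they surface agree. Low overlap means small samples
# keep missing different insights. Simulated data, not the study's.
import random

random.seed(42)
N_INTERVIEWS = 400
N_CODES = 120  # distinct qualitative codes in the simulated corpus

# Long-tailed corpus: a few codes are common, most are rare.
interviews = [
    {c for c in range(N_CODES) if random.random() < 0.5 / (c + 1)}
    for _ in range(N_INTERVIEWS)
]

def codes_in(sample):
    """Union of all codes surfaced by a sample of interviews."""
    return set().union(*sample)

def jaccard(a, b):
    """Overlap between two code sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / max(1, len(a | b))

overlaps = [
    jaccard(codes_in(random.sample(interviews, 5)),
            codes_in(random.sample(interviews, 5)))
    for _ in range(1000)
]
print(f"mean Jaccard overlap of 5-interview subsets: "
      f"{sum(overlaps) / len(overlaps):.2f}")
```

If five interviews reliably captured the whole picture, the overlap would approach 1.0; in a long-tailed corpus like this one it stays well below that, which is the same pattern the study reports at scale.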

In a recent case study, Genway engaged over 220 interviewees on mobile devices for a mobile-first B2C company. The vast majority chose to complete the interview, and almost 50% extended it beyond the scheduled time.

This may surprise you, but in his 2000 article, even Nielsen acknowledged that sometimes you DO need to test with more than five users.

Here’s the quote:

“You need to test additional users when a website has several highly distinct groups of users. The formula only holds for comparable users who will be using the site in fairly similar ways.”

Note reason #4 for the increased complexity of the systems we’re building: companies are offering much more than ever before, to a much broader and more diverse population of people. At Genway AI, we work with enterprises that have dozens and sometimes hundreds of distinct customer segments, global audiences with vast language and cultural differences, users with disabilities and accessibility needs, and all sorts of niche requirements. It’s a rich tapestry of humans and human needs, which can’t be tested and re-tested with samples of five individuals.

A case study by Genway for a US-based software company covered 23 distinct user segments, represented by different geographies.

In short, things were different when Nielsen published the extremely influential “Why You Only Need to Test with 5 Users” article. Software was more straightforward, and the people using it were perhaps more homogeneous and more comparable to each other.

However, there was one other difference between then and now: back then, teams didn’t typically have the resources to run research on more than a few users at a time. Larger-scale research required more people, and teams didn’t have ‘em.

With Genway AI and the power of AI-led research, you can reach as many participants as you need to fully represent the diverse population of your customer or user base. You’re not limited to a handful of iterations or a single segment of users. You’re not limited to a local audience when you serve a global one. And you’re not limited to people without special needs and requirements, at the cost of everyone who deserves an accessible experience.

There may be differing perspectives and calculations on the ideal number of users for a usability study, but one thing is for sure: anyone building software has a tremendous responsibility to the people they serve. We talk about “customer obsession” constantly, but we need to stand behind that statement. Customers expect our best: they don’t want to go into an MRI machine that was only tested with five doctors. They don’t want to be in a plane tested on a tiny sample of pilots. They want you to test what you’re building rigorously, with dedication to the highest standards of usability.

Genway AI is dedicated to transporting customer-obsessed teams into the next era of research, so you can understand the diverse population of people you serve at the scale and speed that only AI can afford you. Our end-to-end user interviewing solution leverages the latest AI technology to conduct user research autonomously, equipping insight-generating functions with actionable insights on broad, diverse populations in real time. We offer a solution that benefits businesses building inclusive products and everyone they wish to serve.

If you’d like to learn more about what we’re up to at Genway AI, check out our website at www.genway.ai. We’re working hard to leverage AI in ways that benefit our society and help us build technology inclusively.

We’re always looking for feedback. If you’d like to try Genway, reach out at natan@genway.ai or DM me on LinkedIn.

Ready to supercharge your research?

Generate insights with depth and scale using AI Interviewers

Schedule a demo