
What If Your “Successful” AI Project Left Millions on the Table?
Congratulations! Your team just completed an AI project at work. The ML solution you deployed is averaging a 6% improvement, saving more than $2 million per year.
You hear your name being called. As you take the stage at the annual company all-hands to receive the employee of the year award, a strange thought crosses your mind:
Why 6%?
What if it could have been 8, 10, or even 12%?
Standing there, facing the spotlight and a sea of smiling faces, it hits you: you got an answer—but was it the best one? Could you confidently tell the board there’s no way the team could have done better?
The next day, you check in with your lead data scientist.
Yes, she tried several approaches. Cross-validation, hyperparameter tuning, a few different architectures. Eventually, she settled on the model with the best performance.
Should you feel satisfied?
Yes—and no.
Yes, exploring multiple options is better than stopping at the first plausible solution.
But no, it’s still limited. Research shows that asking one person to generate multiple solutions—what’s sometimes called “the crowd within”—can improve results, but only slightly. The gains quickly plateau. (See Müller-Trede, 2023.)
In contrast, simply averaging two independent responses from two different people often performs better.
This is the power of the wisdom of the crowd—an idea dating back to Aristotle and popularized in modern times by James Surowiecki. But unlike its cousin, “the crowd within,” the full version relies on independent, unbiased, and informed perspectives.
Let’s make that real.
The Jellybean Test
At your company’s all-hands event, there’s a giant jar of jellybeans. Whoever guesses the number closest to the actual count wins $5,000.
What’s the best way to guess?
Ask everyone for their individual guesses. Then take the average.
Time and again, this method produces astonishingly accurate estimates—often closer to the true value than any single expert. Jack Treynor showed this in 1987, and others have replicated it since.
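To make that concrete, here is a minimal simulation sketch in Python. The jar size, the number of guessers, and the 40% noise level are invented assumptions for illustration, not data from Treynor's experiment; the point is only that the crowd's average typically lands much closer to the truth than a typical individual guess.

```python
import random

random.seed(0)

TRUE_COUNT = 2_347        # hypothetical jar size (made up for illustration)
NUM_GUESSERS = 200        # independent, unbiased guessers

# Each guess is noisy but unbiased: roughly +/-40% error around the truth.
guesses = [TRUE_COUNT * random.gauss(1.0, 0.4) for _ in range(NUM_GUESSERS)]

crowd_estimate = sum(guesses) / len(guesses)

avg_individual_error = sum(abs(g - TRUE_COUNT) for g in guesses) / len(guesses)
crowd_error = abs(crowd_estimate - TRUE_COUNT)

print(f"Typical individual error: {avg_individual_error:,.0f} beans")
print(f"Crowd-average error:      {crowd_error:,.0f} beans")
```

Because the individual errors are independent and roughly symmetric around the truth, they largely cancel out in the average, which is exactly the property the caveats below can break.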
But there are caveats.
- Independence: If people compare notes or are swayed by the loudest voice, the crowd's strength weakens.
- Unbiasedness: If there’s a big sign saying “2,500 jellybeans,” everyone’s estimates might converge toward that anchor.
- Expertise: If there's a grapefruit hidden in the middle of the jar, someone with knowledge of candy displays (or who saw the jar being filled) might guess differently—and better.
So what does all this have to do with building AI systems?
Building Models with the Crowd in Mind
What if your internal team isn’t the only capable team?
What if you could ask many independent, skilled practitioners to build models, analyze data, or develop competing strategies—then compare or even combine them?
It sounds expensive. Impractical. But it’s becoming increasingly feasible.
Some organizations are experimenting with structured model tournaments, expert panels, and competitive modeling platforms that simulate this effect—giving leaders access to the wisdom of many, rather than relying on the insights of a single team.
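One simple way to combine independently built models mirrors the jellybean average: pool their predictions. The sketch below is illustrative only; it assumes each model exposes a `predict` method and that the outputs are numeric (regression-style), with `ConstantModel` standing in for real, independently trained models.

```python
import statistics

class ConstantModel:
    """Stand-in for an independently built model; returns a fixed prediction."""
    def __init__(self, value):
        self.value = value

    def predict(self, x):
        return self.value

def ensemble_predict(models, x):
    """Average the predictions of independently built models.

    Assumes each model exposes a .predict(x) method returning a number;
    for classification, a majority vote plays the same role.
    """
    return statistics.mean(m.predict(x) for m in models)

# Three hypothetical models from three independent teams.
models = [ConstantModel(6.2), ConstantModel(7.8), ConstantModel(7.1)]
print(ensemble_predict(models, x=None))  # ~7.03
```

Even a naive average like this often outperforms the best single model, provided the models' errors are not strongly correlated, which is the same independence requirement as in the jellybean caveats.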
Now imagine this:
You’re about to pitch the model’s results to the board. Everything’s riding on it—budgets, timelines, strategic bets. And someone asks, “How do we know this is the best we could have done?”
That’s not a moment to second-guess your process. It’s a moment to be ready.
Here are a few ways you can start building that confidence:
- Run internal competitions. Have 2–3 team members independently develop solutions to the same problem, then compare results (see the sketch after this list).
- Invite external reviewers. Before finalizing an analysis, get outside perspectives—especially when the stakes are high.
- Use structured debate. Ask your team to generate multiple hypotheses or approaches, then debate them anonymously before converging.
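For the first idea, internal competitions, the comparison itself can stay simple: have every participant predict on the same held-out test set and score all submissions with one agreed metric. The sketch below assumes mean absolute error as that metric; the names and numbers are made up purely for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between actual outcomes and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rank_submissions(y_true, submissions):
    """Score each independently built solution on the same held-out data.

    submissions: dict mapping a participant's name to their model's
    predictions on the shared test set. Lower error ranks first.
    """
    scores = {name: mean_absolute_error(y_true, preds)
              for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical held-out outcomes and three independent submissions.
y_true = [10, 12, 9, 15]
submissions = {
    "alice": [11, 12, 8, 14],
    "bob":   [9, 13, 10, 16],
    "carol": [10, 11, 9, 15],
}
for name, mae in rank_submissions(y_true, submissions):
    print(f"{name}: MAE = {mae:.2f}")
```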
Some companies even go further, using crowdsourcing platforms to access global pools of expert data scientists and analysts. (At Humyn.ai, the company I helped start, we’ve seen teams run open challenges that attract hundreds of expert submissions per project.) But whether or not you use tools like that, the takeaway is clear:
You don’t need to guess how many jellybeans are in the jar alone.
When it comes to building better AI solutions, more brains—used wisely—can mean better answers, more confidence, and fewer missed opportunities.
As the applause rises and you accept the award, you feel it—not just pride, but peace of mind. You didn’t settle for one answer. You sought the best one.