A large study comparing large language models with 100,000 humans suggests that AI systems can outperform average human scores on some standardized creativity measures, especially tasks centered on semantic divergence. But the broader picture is more nuanced than the headline might suggest: the strongest human performers remain ahead on overall creative performance.
What the study actually tested
The 100,000-human study, published in Scientific Reports, examined semantic divergence in humans and LLMs using the Divergent Association Task (DAT) as well as multiple creative-writing tasks, including haiku, story synopses, and flash fiction. The authors found evidence that LLMs can surpass average human performance on the DAT and approach human creative-writing abilities under objective scoring conditions.
However, the same paper also stresses an important limit: even the top-performing LLMs were still surpassed by the aggregated top half of human participants, suggesting that current models do not exceed the more creative segment of human performance.
Why the headline needs caution
It is tempting to summarize these results as “AI beats humans at creativity,” but that overstates what the research shows. These studies focus on specific, standardized proxies for divergent thinking and semantic originality – not the full range of human creativity, which also includes usefulness, intentionality, emotional resonance, and lived experience. A separate 2024 study in Scientific Reports found GPT-4 outperforming 151 human participants on several divergent-thinking tasks, but the authors explicitly noted that this does not prove AI is more creative “across the board.”
The wider research picture is mixed
The strongest reason to moderate the article’s framing is that another major paper, published in Nature Human Behaviour in 2026, reached a different top-line conclusion. In that large-scale comparison, researchers found that average human creativity is slightly higher than that of LLMs, with humans showing greater variability and stronger performance at the most creative extreme of the distribution.
That means the current evidence does not support a blanket claim that generative AI has generally surpassed humans in creativity. A more accurate conclusion is that LLMs now perform very strongly on certain structured creativity benchmarks and may exceed average human performance on some of them, while still falling short of the most creative human outputs overall.
What this means in practice
For creative work, the implication is less about replacement and more about positioning. AI appears increasingly capable as an idea-generation engine on benchmarked tasks, especially where creativity is measured through semantic novelty or divergent association. But the research does not yet justify treating LLM performance as equivalent to the full depth of human creative ability, which also depends on context, intent, and audience.
Bottom line
The story is real, but the current headline is too broad. A 100,000-human study suggests AI can beat average human scores on some creativity measures. It does not show that AI has generally surpassed humans in creativity, and it does not overturn evidence that top human performers still lead.
