LLMs can seem remarkably capable in some settings and surprisingly limited in others. They write code most humans cannot, yet fail at tasks a mosquito handles effortlessly. How can something be so smart and so stupid at the same time? To answer that, we need to define what "smart" really means - and it starts with generalization.
Every AI lab is racing toward AGI - Artificial General Intelligence. But what exactly does "General" mean in this context? And where are the large models really on that path - are we close, or is it mostly hype?
Generalization is easier to recognize than to define precisely. It is the ability to handle small variations in a task without relearning. You know the route from A to B; an obstacle appears - you adapt and go around it. That is generalization.
Formalized: generalization is how well an entity handles new but related tasks. If X learns Y0, can it perform Y1 without being taught? Picture a chain (Y1, Y2, ..., YN) in which each task grows more distant from Y0 as N increases. Broader generalization means a larger N.
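The formalization can be sketched as a toy measurement. Everything here is illustrative, not a real benchmark: the `RouteFollower` class, its `tolerance` knob, and the idea of task "distance" as a single integer are all assumptions made for the sketch.

```python
# Toy sketch of the generalization scale: an entity learns task Y0,
# then is probed on increasingly distant variants Y1..YN.
# Its score is the largest N it still handles.

def generalization_score(entity, task_chain) -> int:
    """Return the largest N such that the entity solves Y1..YN in order."""
    score = 0
    for n, task in enumerate(task_chain, start=1):
        if entity.solves(task):
            score = n
        else:
            break  # tasks only grow more distant; stop at first failure
    return score

class RouteFollower:
    """Illustrative entity: adapts only within a fixed tolerance of Y0."""
    def __init__(self, tolerance: int):
        self.tolerance = tolerance  # how far from the learned task it can stray

    def solves(self, task_distance: int) -> bool:
        return task_distance <= self.tolerance

# A chain of tasks Y1..Y100, where task YN sits at distance N from Y0.
chain = list(range(1, 101))
program  = RouteFollower(tolerance=1)    # rigid software
mosquito = RouteFollower(tolerance=10)
human    = RouteFollower(tolerance=100)

print(generalization_score(program, chain))   # 1
print(generalization_score(mosquito, chain))  # 10
print(generalization_score(human, chain))     # 100
```

The point of the sketch is only the shape of the definition: the score is a position along the chain, so comparing entities means comparing how far from Y0 they keep succeeding.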
With that definition, we can build a rough scale.
At the bottom sits the traditional computer program: it does exactly what its code says, and fails on any deviation. Apparent flexibility comes from programmers anticipating many cases, not from real adaptation. Call it Y1 - not zero, because heuristics allow a tiny spread, but barely above it.
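That rigidity is easy to show in miniature. The `navigate` function and its command phrases below are invented for the example: a rule-based handler covers exactly the cases its author anticipated, and a trivial rephrasing falls through.

```python
# A rule-based "navigator": flexibility exists only where the
# programmer anticipated it. A tiny variation in the input breaks it.

def navigate(command: str) -> str:
    rules = {
        "go from A to B": "take the main road",
        "go from B to A": "take the main road in reverse",
    }
    try:
        return rules[command]
    except KeyError:
        return "error: unrecognized command"

print(navigate("go from A to B"))    # handled: the author anticipated this case
print(navigate("walk from A to B"))  # fails: a variation nobody wrote a rule for
```

Adding more rules widens the apparent flexibility, but the adaptation still lives in the programmer's head, not in the program.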
At the top sit humans, around Y100.
In between, consider a mosquito. It survives in the wild - finding food, avoiding threats, reproducing. Not elegantly, but well enough that the species endures. And the variation it faces is staggering: mid-flight, it receives a stream of inputs - light, scent, air - that is unique at every instant. No mosquito has ever seen exactly what this one sees right now. The world offers infinite variation at every scale. That kind of real-world adaptability places it around Y10.
So on this rough 0-to-100 scale: computers at 1, mosquitoes at 10, humans at 100. Where do LLMs fall?
Deep-learning LLMs generalize far better than conventional software. We never built a rule-based system that could produce fluent natural language - but we did train neural models to do it. That jump from explicit programming to learned behavior is generalization in action.
But LLMs inhabit a narrow domain: rigid rules, largely predictable outputs. Trained heavily on that predictability, they generalize well - but only inside it. Their responses rarely stray far from the patterns they have seen. A "novel" prompt is usually close enough to familiar ones that their limited generalization suffices. The variation LLMs face is far smaller than what a mosquito must cope with in the wild. On our scale, LLMs sit around 5 - below a mosquito.
Some would argue: a mosquito cannot write code; Claude can. So Claude must be smarter than a mosquito - after all, most people cannot code either.
That holds only if "smart" means code-writing. If "smart" means generalization, it does not - Claude generalizes worse than a mosquito. The definition of smart matters.
So how can Claude write code without broad generalization? The answer: training, and a lot of it. It has seen effectively all the open-source code humanity has produced - GitHub-scale. No human could absorb a fraction of that. LLMs surpass any individual in sheer exposure to knowledge; much of recorded human knowledge is, in effect, instantly accessible to them.
That explains the paradox. Code and natural language are constrained domains - predictable, verifiable, reproducible. Training thrives there. We can check outputs, validate syntax, feed errors back. The physical world a mosquito navigates offers no such structure. Massive training on well-behaved domains produces impressive depth of knowledge, but not breadth of generalization - smart in knowledge, stupid in generalization.
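The "check outputs, validate syntax, feed errors back" loop can be sketched with Python's standard `ast` module. The model call itself is implied, not shown; only the verification step - what makes code a training-friendly domain - appears here.

```python
# Why code is a well-behaved domain: outputs can be mechanically
# verified, and the error message itself becomes a training signal.
import ast

def syntax_feedback(source: str):
    """Return None if the code parses, else an error message to feed back."""
    try:
        ast.parse(source)
        return None
    except SyntaxError as e:
        return f"SyntaxError at line {e.lineno}: {e.msg}"

good = "def add(a, b):\n    return a + b\n"
bad  = "def add(a, b)\n    return a + b\n"   # missing colon

print(syntax_feedback(good))  # None: the output verifies cleanly
print(syntax_feedback(bad))   # a concrete error that can be fed back
```

Nothing comparable exists for a mosquito's world: there is no parser for an air current, no error message for a failed evasive maneuver - only survival or not.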
A mosquito brain contains roughly 200,000 to 225,000 neurons, yet that tiny circuit out-generalizes models built from hundreds of billions of parameters. That alone suggests we are doing something wrong - that we are missing a crucial element in how intelligence really works. We have been optimizing for high knowledge and low generalization, while nature optimizes for the opposite: low knowledge, high generalization. Can we reach that simply by scaling? Increasingly, researchers accept that one more major breakthrough is needed to arrive at AGI - an actual ability to generalize.
