Why are LLMs so good at writing code, but bad at everything else?
Guardrails
There seems to be a general consensus forming that while it’s debatable whether A.I. tools are effective or safe in other professions (education, medicine, law, etc.), one place they clearly shine is software development. Setting aside whether that’s actually true, it’s worth considering why it might be, and what it would imply.
What’s different about programming?
The main reason these tools may seem better at coding than at other tasks is constraints. Programming languages have very precise syntax, and the ecosystem around them provides compilers, linters, automated testing frameworks, and feedback mechanisms (output, logging, etc.). All of these act as guardrails that keep an LLM’s propensity for hallucination in check. LLMs can hallucinate incorrect syntax or non-existent API calls (and they often do), but those mistakes don’t get far because the code won’t compile. They can hallucinate sloppy code and incorrect implementations, but the linters will complain and the tests will fail. Crucially, these failures feed information back to the LLM, so it can correct its mistakes, trial-and-error style, until it produces something that compiles, passes static analysis, and passes the tests. These checks aren’t perfect and don’t guarantee success, but they are far stricter, and deliver feedback far earlier and far more cheaply, than anything available for a legal document or medical advice.
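The trial-and-error loop described above can be sketched in a few lines. This is a minimal illustration, not a real agent: the `generate` function is a hypothetical stand-in for the LLM (it just cycles through canned candidates), while the guardrails themselves are real: Python’s built-in `compile()` rejects syntax hallucinations, and a unit test rejects incorrect implementations.

```python
def generate(attempt):
    # Hypothetical stand-in for an LLM: the first two candidates are flawed,
    # the third is correct.
    candidates = [
        "def add(a, b): return a +",    # hallucinated syntax -> won't compile
        "def add(a, b): return a - b",  # compiles, but the test fails
        "def add(a, b): return a + b",  # passes both guardrails
    ]
    return candidates[attempt]

def passes_guardrails(src):
    try:
        compile(src, "<candidate>", "exec")  # guardrail 1: must parse
    except SyntaxError:
        return False
    ns = {}
    exec(src, ns)                            # define add() in a scratch namespace
    return ns["add"](2, 3) == 5              # guardrail 2: must pass the test

# The feedback loop: keep asking for candidates until one survives the checks.
for attempt in range(3):
    if passes_guardrails(generate(attempt)):
        print(f"accepted on attempt {attempt + 1}")
        break
```

A real coding agent would also run linters and a full test suite, and would feed the error messages back into the model’s context rather than just retrying blindly, but the shape of the loop is the same: generate, verify against hard constraints, repeat.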
It may not be that LLMs are “better at writing code”; they are just more constrained there, so some of their unreliability can be mitigated. That explanation has the advantage of making more sense, and it also suggests how they could be made safer, both in software development and beyond.